
5.1.11. Germany

FLaReNet Summary

In Germany there was a large, long-term national project on the translation of spoken language, called Verbmobil, funded by Germany's Federal Ministry of Research and Technology (BMBF) between 1993 and 2000. A notable outcome of this project is that it led to the creation of a considerable number of companies and of new university chairs.

It was followed by three further large German projects, also funded by the BMBF: SmartKom (a successor of sorts to Verbmobil that added multiple modalities), SmartWeb (Mobile Broadband Access to the Semantic Web), and COLLATE. The latter was particularly important, since it was one of the first projects to concretely address the issues of a language infrastructure, including the creation of an information portal about the field (LT-World), a demonstration centre, and the evaluation of the technology. On the topic of evaluation, a centre was created through cooperation between the Fondazione Bruno Kessler, the “Provincia autonoma di Trento” and the DFKI (see CELCT, Center for the Evaluation of Language and Communication Technologies, in the Italy/Trento region chapter).

Some national projects on the mark-up and annotation of Language Resources were funded in the 1990s and early 2000s, and their results have been (and still are) very influential, not only in Germany. One is the project that developed the STTS (Stuttgart-Tübingen tagset); work on the annotation of language corpora in Germany has built on STTS and extended it. Two projects (NEGRA and TIGER), both partially funded by the Deutsche Forschungsgemeinschaft (DFG), delivered results that have enabled a large number of corpus-based linguistic studies, and parts of their annotation schemes are being developed further in other projects, such as SFB 632 in Potsdam. The annotation schemes proposed by those projects have turned out to constitute de facto standards in the field, and adaptation and abstraction work on those schemes has led to an international standardization of syntactic annotation in the framework of ISO TC 37/SC4 on language resources management. The work done by many representatives of German institutions, both academic and industrial, in standardization initiatives should be stressed, whether in the national context of DIN or in the international context of ISO, and also within W3C, in the context of a running ICT-PSP project on the “Multilingual Web”.

German institutions are involved in the running CLARIN project and contribute to the efforts on the technological infrastructure for language resources and tools. A project funded by the BMBF, called D-SPIN, is the German counterpart to CLARIN. META-NET (a Network of Excellence forging the Multilingual Europe Technology Alliance) is led by DFKI in Germany.

Last but not least, the large Theseus project should be mentioned, whose main goal is “developing the basic technologies and standards necessary to make this knowledge [on the internet] more widely available in the future”. This project is co-funded by the German Federal Ministry of Economics and Technology (BMWI).

The most important funding agencies supporting R&D and infrastructures in the field of LT in Germany are the BMBF, the DFG (including its SFBs, Sonderforschungsbereiche), the BMWI, and the Bundesländer (the federal states).

Apart from the resources described above that are now available in Germany, the following institutions specialize in storing information on language resources and tools: LT-World at DFKI; BAS (Bayerisches Archiv für Sprachsignale); IDS (Institut für Deutsche Sprache); the Wortschatz-Portal at the University of Leipzig; and BBAW (Berlin-Brandenburgische Akademie der Wissenschaften).

Contact Point Input

National/Regional contact: Hans Uszkoreit, Thierry Declerck, DFKI.

Programs

As a first comment, it should be stressed that this first set of information cannot aim at completeness. It tries, however, to provide as many links as possible to existing institutions and initiatives that deal with Language Resources and Technologies in a way that goes beyond “day to day” work and that offer resources and solutions usable by other players in the field. This is why teaching institutions and the like are not listed. A short historical perspective is provided, since some of the larger, long-term projects of the past, funded on either a national or a transnational basis, have had a lasting influence on the current state of language resources and tools, and of the Language Infrastructure in general, in Germany.

The first to be named should be the former Eurotra project (funded by the European Commission between 1978 and 1994, see Eurotra at Wikipedia), which, not only in Germany, helped to establish a trans-European infrastructure for issues related to translation and to language processing in general.

In Germany there was a large, long-term national project on the translation of spoken language, Verbmobil, funded by Germany's Federal Ministry of Research and Technology (BMBF) between 1993 and 2000 (see http://verbmobil.dfki.de/). A notable outcome of this project is that it led to the creation of a considerable number of companies and of new university chairs.

At the European level again, the participation of German research institutes in the EAGLES project (http://www.ilc.cnr.it/EAGLES/home.html) should be mentioned. This project pioneered standards and annotation guidelines across a wide range of fields in Computational Linguistics.

To close the historical overview, three further large German projects funded by the BMBF should be mentioned: SmartKom (a successor of sorts to Verbmobil that added multiple modalities, see SmartKom), SmartWeb (Mobile Broadband Access to the Semantic Web), and COLLATE. The latter was particularly important, since it was one of the first projects to concretely address the issues of a language infrastructure, including the creation of an information portal about the field (LT-World), a demonstration centre, and the evaluation of the technology. On the topic of evaluation, a centre could be created through cooperation between the Fondazione Bruno Kessler, the “Provincia autonoma di Trento” and the DFKI (see CELCT, Center for the Evaluation of Language and Communication Technologies).

Some national projects on the mark-up and annotation of Language Resources were funded in the 1990s and early 2000s and have to be mentioned here, since the resources they generated have been (and still are) very influential, not only in Germany. One is the project that developed the STTS tagset; work on the annotation of language corpora in Germany has built on STTS and extended it. Two projects (NEGRA and TIGER), both partially funded by the Deutsche Forschungsgemeinschaft (DFG) and now completed, delivered results that have enabled a large number of corpus-based linguistic studies, and parts of their annotation schemes are being developed further in other projects, for example at SFB 632 in Potsdam. (SFB stands for Sonderforschungsbereich; SFBs are specialized research programs funded by the DFG, and some of them have funded, and still fund, long-term projects in linguistics and computational linguistics.) It should be stressed that the annotation schemes proposed by those projects have turned out to constitute de facto standards in the field, and that adaptation and abstraction work on those schemes has led to an international standardization of syntactic annotation in the framework of ISO TC 37/SC4 on language resources management (see ISO TC 37/SC4).

At this point, the work done by many representatives of German institutions, both academic and industrial, in standardization initiatives should be stressed, whether in the national context of DIN (see DIN Nat) or in the international context of ISO (see above), sometimes with FLaReNet supporting the travel of representatives to international meetings. Standardization is an important aspect of all work on linguistic infrastructure, not only in collaboration with ISO but also with W3C, especially in the context of a running ICT-PSP project on the “Multilingual Web”.

German institutions are involved in the running CLARIN project and contribute to the efforts on the technological infrastructure for language resources and tools. A nationally funded project (again by the BMBF), called D-SPIN, is the German counterpart to CLARIN. A new European initiative, META-NET (A Network of Excellence forging the Multilingual Europe Technology Alliance), was also started in 2010; it is led by DFKI in Germany.

Last but not least, the large Theseus project should be mentioned, whose main goal is “developing the basic technologies and standards necessary to make this knowledge [on the internet] more widely available in the future”. This project is co-funded by the German Federal Ministry of Economics and Technology (BMWI).

Here is a list of some of the most important national funding agencies, which are supporting R&D and Infrastructures in the field of HLT in Germany, and which have been mentioned in the text above:
   1) BMBF;
   2) DFG, and the SFB (Sonderforschungsbereiche);
   3) BMWI;
   4) The Bundesländer (the federal states).

Apart from the resources described above that are now available in Germany, some institutions that specialize in storing information on HLT, language resources and tools should be mentioned:
   1) LT-World at DFKI;
   2) BAS - Bayerisches Archiv für Sprachsignale, including BASSS (BAS Schiel Service);
   3) IDS - Institut für Deutsche Sprache;
   4) Wortschatz-Portal at the University of Leipzig;
   5) BBAW - Berlin-Brandenburgische Akademie der Wissenschaften.

A presentation by Prof. Wolfgang Wahlster (slides) gives an overview of some aspects of language technologies in Germany over the 30 years up to 2006.

Finally, several associations are active in the field of HLT and, for example, organize conferences (the list is not exhaustive):
   1) DGfS - Deutsche Gesellschaft für Sprachwissenschaft (German Linguistic Society);
   2) GSCL - Gesellschaft für Sprachtechnologie und Computerlinguistik e.V. (German Society for Computational Linguistics & Language Technology);
   3) GAL - Gesellschaft für Angewandte Linguistik.