5.1.10. France

FLaReNet Summary

Funding for research and development in Language Technology comes mostly from the Ministry of Higher Education and Research through the National Research Agency (ANR), from the Ministry of Economy, Finance and Industry through the OSEO agency, and through the Pôles de Compétitivité (Competitiveness Clusters), which bring together industry and researchers and are funded by ministries and by local administrations (departments and regions). The Direction Générale de l’Armement of the Ministry of Defence has its own programs for defense applications, and also works with the agencies mentioned above on cooperative programs for dual-use (civil and defense) technologies, including Language Technologies.

The Techno-Langue program (2003-2005) was supported by the Ministries of Research, Industry and Culture. It included the development of Language Resources (corpora, lexica, dictionaries, etc.) for French and the organization of 8 evaluation campaigns for written and spoken language processing. All data and tools produced within the evaluation campaigns have been distributed by ELRA as Evaluation packages. Techno-Langue was followed, on the same basis, by the Techno-Vision program addressing research in Computer Vision, including OCR (Optical Character Recognition) and document processing (including handwriting recognition). Some of those activities now continue as individual projects supported by the ANR, including the REPERE evaluation campaign on multimedia identification of people in audiovisual broadcasts, launched in 2010.

Thanks to those efforts, French ranks 2nd after English in the number of Language Resources available for the official languages of the European Union, according to the Euromatrix, where 1114 resources had been identified for French (August 2010). Some laboratories achieved the highest performance in international evaluation campaigns, such as those organized on speech recognition by NIST in the USA, or on cross-lingual Question & Answer by the CLEF project in Europe.

OSEO currently supports the very large Quaero program, which gathers 26 industrial and academic partners with public funding of 99 M€ over 5 years (2008-2013). Quaero addresses the development of around 30 technologies for various media (speech, text, music, image, video) to meet the needs of 6 applications related to Multimedia and Multilingual Document processing. The whole program is structured around the systematic comparative evaluation of technologies and the production and use of large amounts of data.

When the European Language Resources Association (ELRA) was created in 1995, the French government expressed its support for hosting the association's Evaluation and Language Resources Distribution Agency (ELDA), which is located in Paris.

In 2005, CNRS also set up a Textual and Lexical Resources Centre (CNRTL) and Centers for Oral Resources (CRDO), which distribute data and tools in the framework of the Digital Resources Center (CRN).

There are about 50 laboratories in France working on speech and language processing, including Sign Language Processing and Multimodal communication, with about 600 researchers in total. Many of them are affiliated with a large research organization (CNRS, INRIA (National Information Technology Institute), CEA (Atomic Energy Agency) or Institut Télécom, which are partners in the Allistene national Alliance). Some public institutes also participate in this research area, such as the Laboratoire National de Métrologie et d’Essais (LNE), which develops activities related to Language Technology assessment, and the INA (Institut National de l’Audiovisuel) and the BNF (Bibliothèque Nationale de France), which process huge amounts of textual or audiovisual data.

Some large companies were active in the field some years ago (Alcatel, Thomson, France Telecom (FT)) but have reduced their research effort, sometimes creating a spin-off company (such as FT with Telisma in speech recognition, later bought by an Indian group). Several SMEs or VSEs are very active in Language and Speech Technologies, such as Vecsys and Vecsys-Research, Sinequa, Synapse, Syllabs, Tagmatica, Pertimm, Arisem, Bertin, Lingway, Systran, Softissimo…, while other companies, either large (Technicolor, Orange…) or small (Exalead, Temis, Jouve, As an Angel, Aldebaran…), develop activities in close relationship with Language Technology providers. Xerox has its European research centre in Grenoble.

Research in Language Technologies is very active in France: there are many laboratories, and many large resources and state-of-the-art technologies have been produced and distributed for the French language. However, the size of the resources and the number of tools are still very limited compared with what exists for English, and still insufficient to cover all the technologies related to the French language.

Contact Point Input

National/Regional contact: Joseph Mariani, LIMSI-CNRS & IMMI-CNRS.
National/Regional contact: Khalid Choukri, ELDA.

Programs

Funding for research and development in Language Technology comes mostly from the Ministry of Higher Education and Research through the National Research Agency (ANR), from the Ministry of Economy, Finance and Industry through the OSEO agency, and through the Pôles de Compétitivité (Competitiveness Clusters), which bring together industry and researchers and are funded by ministries and by local administrations (departments and regions). The Direction Générale de l’Armement of the Ministry of Defence has its own programs for defense applications, and also works with the agencies mentioned above on cooperative programs for dual-use (civil and defense) technologies, including Language Technologies.

Research on the French language has been supported by several programs. The Réseau Francophone d’Ingénierie de la Langue (FRANCIL) was supported by the Francophone Universities Association (AUF) from 1994 to 2000. It comprised cooperation projects between Northern and Southern francophone countries (especially in Africa and Asia), and "coopetition" projects (mixing cooperation and competition) organized as technology evaluation campaigns on both written and spoken language processing.

The Techno-Langue program (2003-2005) was supported by the Ministries of Research, Industry and Culture. It included the development of Language Resources (corpora, lexica, dictionaries, etc.) for French and the organization of 8 evaluation campaigns, on topics such as Syntactic Parsing, Machine Translation, Information Retrieval (Question & Answer) and Broadcast News speech transcription (ESTER campaign). All data and tools produced within the evaluation campaigns have been distributed as Evaluation packages. Techno-Langue was followed, on the same basis, by the Techno-Vision program addressing research in Computer Vision, including OCR (Optical Character Recognition) and document processing (including handwriting recognition).

Some of those activities now continue as individual projects supported by the ANR, including the REPERE evaluation campaign on multimedia identification of people in audiovisual broadcasts, launched in 2010.

CNRS (the National Centre for Scientific Research) has also run several programs in that field over the years (GRECO, CCIIL, CNRTL, CRDO).

Those programs did much to unite the scientific community around a common objective and allowed for the production of data (corpora, lexica, dictionaries…), which are crucial for the development of technologies. Thanks to those efforts, French ranks 2nd after English in the number of Language Resources available for the official languages of the European Union, according to the Euromatrix, where 1114 resources had been identified for French (August 2010).

For example, the Techno-Langue ESTER campaign made it possible to produce, in 2004, 1,700 hours of Broadcast News speech in French, 100 hours of which have been transcribed. This in turn made it possible to develop Broadcast News transcription systems of sufficient quality and demonstrated the feasibility of automatic video transcription and indexing for French. However, this has to be compared with the Broadcast News corpus developed for Chinese within the US DARPA GALE program, which comprises 3,000 hours of speech, 500 of which have been transcribed.

OSEO currently supports the very large Quaero program, which gathers 26 industrial and academic partners with public funding of 99 M€ over 5 years (2008-2013). Quaero addresses the development of around 30 technologies for various media (speech, text, music, image, video) to meet the needs of 6 applications related to Multimedia and Multilingual Document processing (Digitization platform, Social impact media monitoring, Personalized video, Digital heritage, Communication portals and Multimedia search engines). The whole program is structured around the systematic comparative evaluation of technologies and the production and use of large amounts of data.

When the European Language Resources Association (ELRA) was created in 1995, the French government expressed its support for hosting the association's Evaluation and Language Resources Distribution Agency (ELDA), which is located in Paris.

In 2005, CNRS also set up a Textual and Lexical Resources Centre (CNRTL) and Centers for Oral Resources (CRDO) in Aix and Paris, which distribute data and tools in the framework of the Digital Resources Center (CRN).

The French and Francophone NLP scientific community gathers in the ATALA association, which recently celebrated its 50th anniversary and organizes the annual TALN conference. The francophone speech community gathers in the AFCP association, which organizes the biennial JEP conference, alternating with the Interspeech conference in Europe, in close cooperation with the International Speech Communication Association (ISCA), in which it participates as a Special Interest Group. The TALN and JEP conferences are jointly organized from time to time, and a special yearly conference, RECITAL, is devoted to young researchers. ATALA maintains the LN mailing list and, for young researchers’ activities, the Orbital mailing list, as well as the LN-Forum.

Professional associations, such as the APIL (Association des Professionnels des Industries de la Langue) or the Tenor association on speech, existed in the past but currently seem to be inactive.

There are about 50 laboratories in France working on speech and language processing, including Sign Language Processing and Multimodal communication, with about 600 researchers in total. Many of them are affiliated with a large research organization (CNRS, INRIA (National Information Technology Institute), CEA (Atomic Energy Agency) or Institut Télécom, which are partners in the Allistene national Alliance).

Some laboratories achieved the highest performance in international evaluation campaigns, such as those organized on speech recognition by NIST in the USA, or on cross-lingual Question & Answer by the CLEF project in Europe.

Some public institutes also participate in this research area, such as the Laboratoire National de Métrologie et d’Essais (LNE), which develops activities related to Language Technology assessment, and the INA (Institut National de l’Audiovisuel) and the BNF (Bibliothèque Nationale de France), which process huge amounts of textual or audiovisual data.

Some large companies were active in the field some years ago (Alcatel, Thomson, France Telecom (FT)) but have reduced their research effort, sometimes creating a spin-off company (such as FT with Telisma in speech recognition, later bought by an Indian group). Several SMEs or VSEs are very active in Language and Speech Technologies, such as Vecsys and Vecsys-Research, Sinequa, Synapse, Syllabs, Tagmatica, Pertimm, Arisem, Bertin, Lingway, Systran, Softissimo…, while other companies, either large (Technicolor, Orange…) or small (Exalead, Temis, Jouve, As an Angel, Aldebaran…), develop activities in close relationship with Language Technology providers. Xerox has its European research centre in Grenoble.

Research in Language Technologies is very active in France: there are many laboratories, and many large resources and state-of-the-art technologies have been produced and distributed for the French language. However, the size of the resources and the number of tools are still very limited compared with what exists for English, and still insufficient to cover all the technologies related to the French language.