
5.1.21. Portugal

FLaReNet Summary

In Portugal, Research & Development projects are funded mostly by the national funding agency Fundação para a Ciência e a Tecnologia (FCT). Occasionally, the Fundação Calouste Gulbenkian and Instituto Camões also fund some Language Technology projects.

Many entities work in Language Resources and Technologies (LRT) in Portugal: the Centre of Linguistics of the University of Lisbon (CLUL); the Natural Language and Speech (NLX) group of the Faculty of Sciences of the University of Lisbon; the Laboratory of Spoken Language Systems (L2F) at the Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento (INESC-ID); the Centre for Artificial Intelligence (Centria) of the Faculty of Science and Technology of the New University of Lisbon; the Research Center for Informatics and Information Technologies (CITI) of the New University of Lisbon; the Centre of Linguistics of the New University of Lisbon; the Centre of Mathematics of the University of Beira Interior; the Department of Informatics of the School of Sciences and Technology at the University of Évora; the Instituto de Linguística Teórica e Computacional (ILTEC); the Artificial Intelligence and Computer Science Laboratory of the University of Porto; and the Centre of Linguistics of the University of Porto.

Portugal has participated in some European projects, such as EUROTRA (through ILTEC) and LE-PAROLE (with the participation of CLUL and INESC). Currently, Portugal is participating in CLARIN through the NLX research group.

Among the national projects, the corpus built in the LE-PAROLE project was enriched and enlarged under the Portuguese project TagShare, conducted by the NLX group and CLUL. The outcome was the CINTIL corpus, a 1-million-word corpus linguistically annotated and fully verified by experts, together with a whole range of processing tools for tokenization, morphosyntactic category (POS) tagging, inflection analysis, lemmatization, multi-word lexeme recognition, named entity recognition, etc. The annotation schemes developed in the project became de facto standards for Portuguese in the field of LT and have also been used in the Reference Corpus of Contemporary Portuguese (CRPC).

Additionally, the project Computational Processing of Portuguese created Linguateca, a distributed language resource center for Portuguese. Linguateca provides information about several Portuguese projects, such as the AC/DC project (free, uniform access to large quantities of parsed Portuguese text), the CETEMPúblico corpus (a 180-million-word corpus of text from the Portuguese daily newspaper Público), the COMPARA/DISPARA project (free access to Portuguese parallel text aligned with other languages), the Floresta Sintá(c)tica project (a treebank for Portuguese), and the organization of evaluation contests for Portuguese (Morfolimpíadas, CLEF and HAREM).

Furthermore, there is cooperation between the Information and Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University (CMU) and Portuguese universities.

Contact Point Input

National/Regional contact: David Martins de Matos, INESC.
National/Regional contact: Amalia Mendes, CLUL, University of Lisbon.

Programs

A list of programs and activities is available at http://www.linguateca.pt.

  - European projects:

CLARIN – Portuguese institution: NLX, Faculty of Sciences of the University of Lisbon (http://nlx.di.fc.ul.pt).

  - Program CMU-PORTUGAL:

http://www.cmuportugal.org.
Cooperation between the Information and Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University (CMU) and Portuguese universities.

  - National projects:

Funded by the national funding agency FCT (Fundação para a Ciência e a Tecnologia).

  - Main institutions/groups working in LRT in Portugal:

• Centre of Linguistics of the University of Lisbon – CLUL;
• Natural Language and Speech – NLX group, Faculty of Sciences of the University of Lisbon;
• L2F group at INESC-ID, Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento;
• Centre for Artificial Intelligence – Centria, Faculdade de Ciências e Tecnologia, New University of Lisbon;
• Research Center for Informatics and Information Technologies – CITI, New University of Lisbon;
• Centre of Linguistics of the New University of Lisbon;
• University of Beira Interior, Centre of Mathematics;
• Universidade de Évora, School of Sciences and Technology, Department of Informatics;
• Instituto de Linguística Teórica e Computacional – ILTEC;
• Artificial Intelligence and Computer Science Laboratory, University of Porto;
• Centre of Linguistics of the University of Porto.