5.1.16. Latvia

FLaReNet Summary

The last six years have been very important for research and development of language technologies in Latvia. Several large projects have been funded by the government, important tools and resources have been created by industry, and Latvia has participated in the CLARIN initiative since 2006, supported by the Ministry of Education and Science of the Republic of Latvia. Although there is still a gap between Language Resources and Technology (LRT) for Latvian and for the widely used languages, the current LRT for Latvian can already serve as a basic research infrastructure for the Humanities.

Language resources and tools play an important role in the State language policy, which is defined in two major documents: “Guidelines of the State Language Policy for 2005-2014” and “The State Language Policy Programme for 2006-2010”. Significant funding from the Latvian Council of Science (LCS) was received in 2005-2009 for two LT-related projects within the State Research Programs “Scientific Foundations of Information Technology” and “Latvian Studies (Letonica): Culture, Language and History”: the SemTi-Kamols project, which aims at the development and adaptation of semantic web technologies for semantic analysis of Latvian, and the “Database of Latvian Explanatory Dictionaries and Recent Loanwords” project, which mainly deals with semi-automatic transformation of the Dictionary of Standard Latvian Language into a machine-readable format. In addition, two or three smaller LT-related projects have been funded by the LCS every year since 2005.

Taking into account the importance of LT in ensuring the sustainable development of Latvian and other smaller languages, the Language Shore initiative was launched in 2009 under the patronage of the President of Latvia. This initiative fosters a partnership between government, academia and industry to develop an international expertise cluster around LT in Latvia. The first Language Shore pilot projects have been successfully started by Tilde and Microsoft Research, bringing fast advancement in Latvian machine translation (MT), developing a new crowd-sourcing model for MT data collection, and establishing cooperation in terminology data sharing. Several Language Shore related projects in machine translation, speech technologies, content analysis and other LT fields are planned as part of the activities of the Latvian IT Competence Centre.

Contact Point Input

National/Regional contact: Andrejs Vasiljevs, Tilde.

Programs

The last six years have been very important for research and development of language technologies in Latvia. Several large projects have been funded by the government of Latvia, important tools and resources have been created by industry, and Latvia has participated in the CLARIN initiative since 2006, supported by the Ministry of Education and Science of the Republic of Latvia. Although there is still a gap between language resources and technology (LRT) for Latvian and for the widely used languages, the current LRT for Latvian can already serve as a basic research infrastructure for the Humanities.

Language resources and tools play an important role in the State language policy, which is defined in two major documents: “Guidelines of the State Language Policy for 2005-2014” and “The State Language Policy Programme for 2006-2010”. Significant funding from the Latvian Council of Science (LCS) was received in 2005-2009 for two human language technology (HLT) projects within the State Research Programs “Scientific Foundations of Information Technology” and “Latvian Studies (Letonica): Culture, Language and History”: the SemTi-Kamols project (www.semti-kamols.lv), which aims at the development and adaptation of semantic web technologies for semantic analysis of Latvian, and the “Database of Latvian Explanatory Dictionaries and Recent Loanwords” project, which mainly deals with semi-automatic transformation of the Dictionary of Standard Latvian Language into a machine-readable format. In addition, two or three smaller HLT-related projects have been funded by the LCS every year since 2005.

Taking into account the importance of LT in ensuring the sustainable development of Latvian and other smaller languages, the Language Shore initiative was launched in 2009 under the patronage of the President of Latvia. This initiative fosters a partnership between government, academia and industry to develop an international expertise cluster around LT in Latvia. The first Language Shore pilot projects have been successfully started by Tilde and Microsoft Research, bringing fast advancement in Latvian machine translation (MT), developing a new crowd-sourcing model for MT data collection, and establishing cooperation in terminology data sharing (www.valodukrasts.lv). Several Language Shore related projects in machine translation, speech technologies, content analysis and other LT fields are planned as part of the activities of the Latvian IT Competence Centre.

Main Resources and Tools

Latvian National Corpus Initiative and Latvian Language Corpora Resources
See www.korpuss.lv.

Electronic Dictionaries and Terminology Resources
Several machine-readable versions of monolingual dictionaries of modern Latvian have been created by IMCS in cooperation with other research institutions.

Machine Translation
Recent developments include English/Latvian statistical machine translation (SMT) systems built by Tilde, which are publicly available at http://translate.tilde.com. Tilde is also involved in two SMT-related EU projects: LetsMT! (www.letsmt.eu) and ACCURAT (www.accurat-project.eu).

Speech Technologies
Three speech synthesis systems for Latvian have reached the level of practical usability: Visvaris (Tilde), T2S (IMCS) and Balss (SIA Rubuls & Co). So far, there has been no serious research in Latvian speech recognition that would result in a practically usable speech recognition system.

Tools for Natural Language Processing
Morphology tools and syntactic parsers are available for Latvian.