
5.2.10. Turkey

FLaReNet Summary

TUBITAK-UEKAE plays an important role for LT in Turkey in two respects. It is one of the major groups conducting R&D in this area, through the Multimedia Technologies Research and Development Laboratory (MTRD) established in 1999, whose work includes the assessment of speech intelligibility and of speech communication systems. It also coordinates LT research activities in Turkey, in collaboration with the IZMIR Institute of Technology, Boğaziçi University and Sabancı University, with the support of the MULTISAUND (MULTilingualism Integrated to Speech and Audio UNDerstanding) project funded by the EC REGPOT program. TUBITAK-UEKAE has been an active member of the NATO Speech Group (RTO-IST-078) since 1999. MTRD leads the effort to make Turkish available in the European Research Area and provides databases for scientific activities. Several LRs have been produced for Turkish (text corpus, spoken corpus, treebank, wordnet, morphological parser, translation tools, etc.). There is also a Turkish national corpus project.

Contact Point Input

National/Regional contact: Mehmet Ugur Dogan, TUBITAK-UEKAE.

Programs

TUBITAK-UEKAE plays an important role for HLT in Turkey. One of the major R&D groups of UEKAE, the Multimedia Technologies Research and Development Laboratory (MTRD), was established in 1999; its main areas of interest are the assessment of speech intelligibility, communication and quality, speech processing, biometrics, multimedia processing, and system design and implementation. Fifteen dedicated researchers at MTRD currently work on Turkish speech recognition, machine translation for Turkish, machine learning, speaker verification, segmentation and tracking, speech enhancement, speech synthesis, and the assessment and validation of speech communication systems. TUBITAK-UEKAE has been an active member of the NATO Speech Group (RTO-IST-078) since 1999. MTRD has two semi-anechoic acoustic examination rooms and a subjective acoustic test room accommodating 16 subjects, which are used to assess the intelligibility, communicability and quality of communication systems in Turkish. This is the only such facility in Turkey, and MTRD also has a unique Turkish Diagnostic Rhyme Test (DRT).

MTRD has been involved in numerous classified speech processing projects, developing recognition, enhancement, feature extraction, synthesis and coding algorithms as well as natural language processing and multilingual translation algorithms. MTRD has also begun to participate in international competitions and achieved the best scores at the International Workshop on Spoken Language Translation (IWSLT) in 2007. MTRD leads the effort to make Turkish available in the European Research Area and provides databases for scientific activities.

MTRD has gained visibility through the EC REGPOT-2008-1 project MULTISAUND (MULTilingualism Integrated to Speech and Audio UNDerstanding). MULTISAUND will strengthen the existing scientific and technological capacity of MTRD in speech and language technologies by helping MTRD build efficiently on its existing know-how and by opening up new joint research opportunities in FP7/FP8. The project will help MTRD share its experience and capabilities with national and international collaborators. Its results and dissemination activities will raise awareness of interactive speech and language technologies, especially for Turkish. MULTISAUND will also help MTRD strengthen its visibility and its links with the social and economic environment at both national and international level. Finally, the project should allow MTRD to draw on the knowledge and experience available in other regions of Europe and to exchange its own know-how with other European research entities. Within the framework of MULTISAUND, national cooperation has been initiated between MTRD and the IZMIR Institute of Technology, Boğaziçi University and Sabancı University.

The main LR activities for Turkish are the following:
  - METU Turkish Corpus is a collection of 2 million words of post-1990 written Turkish samples (http://fodor.ii.metu.edu.tr/content/metu-turkish-corpus);
  - METU-Sabanci Turkish Treebank is a morphologically and syntactically annotated treebank corpus of 7262 grammatical sentences (http://fodor.ii.metu.edu.tr/content/treebank).

The aim of the METU Spoken Turkish Corpus Project (ODT-STD), which has been conducted in the Department of Foreign Language Education since October 2008, is to construct a linguistically analyzed resource consisting of one million words of face-to-face or mediated interactions in present-day Turkish. The corpus will be made available to academics and researchers in all language-related fields of study.

Turkish national corpus project:
http://tudd.org.tr.

Turkish wordnet:
http://people.sabanciuniv.edu/~oflazer/balkanet/index.htm.

Zemberek is an open-source morphological analyzer/generator:
http://code.google.com/p/zemberek/.

TrMorph is an open source morphological analyzer:
http://www.let.rug.nl/~coltekin/trmorph/.

A project aiming at translation between agglutinative and related languages:
http://ddi.ce.itu.edu.tr/projects/turkcevir.

Turkish dependency parsing:
http://ddi.ce.itu.edu.tr/projects/dependency-parsing-of-turkish.

English to Turkish translation project:
http://ddi.ce.itu.edu.tr/projects/english2turkish.

A collection of links to Turkish language resources can also be found here:
http://denizyuret.blogspot.com/2006/11/turkish-resources.html.