5.1.4. Bulgaria

FLaReNet Summary

There is no national program for Language Resources and Technologies in Bulgaria. Several laboratories conduct projects in this area, especially within the Bulgarian Academy of Sciences (Linguistic Modeling Department of the Institute for Parallel Processing, Mathematical Linguistics Department of the Institute of Mathematics and Informatics, Institute for Bulgarian Language and Institute for Literature), which have produced many language resources (LR) for Bulgarian (dictionaries, grammars, spelling checkers, treebanks, wordnets, etc.). Other institutions active in the field, which have also produced LR for Bulgarian, are the Faculty of Slavic Studies of Sofia University “St. Kliment Okhridski”, Plovdiv University “Paisii Hilendarski”, the New Bulgarian University, the South-West University “Neofit Rilski” and the Konstantin Preslavsky University of Shumen.

Contact Point Input

National/Regional contact: Kiril Simov, IPPBAS - Bulgarian Academy of Sciences.

Programs

The main groups, projects and initiatives within the area of Language Resources and Technologies in Bulgaria are the following:

  - Linguistic Modeling Department of the Institute for Parallel Processing, Bulgarian Academy of Sciences
    Resources, tools and activities:
      • BulTreeBank – An HPSG-based treebank of Bulgarian
      • BulTreeBank Text Archive – Texts annotated up to paragraph level with respect to TEI guidelines
      • BulTreeBank Morphosyntactic Corpus – Texts annotated with grammatical information
      • Bulgarian CLEF Corpus – Supporting the evaluation of question answering and information retrieval systems for Bulgarian
      • Bulgarian LT4eL Corpus – Grammatical and semantic annotation
      • Morphological Dictionary of Bulgarian
      • BulTreeBank Gazetteers – Lexicon of proper names
      • BulTreeBank Partial Grammar – simple NPs and verb forms
      • Dependency Parser for Bulgarian
      • Medical data processing
      • Parallel Bulgarian-English HPSG Treebank

  - Mathematical Linguistics Department of the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
    Resources, tools and activities:
      • Corpora
          o MULTEXT-East Multilingual Parallel Annotated and Aligned Corpus
          o MULTEXT-East Comparable Corpora: Bulgarian fiction, Bulgarian news
          o Bulgarian-Polish Parallel Annotated Corpus
          o Bulgarian-Polish Comparable Corpus
          o Bulgarian-Polish-Lithuanian Parallel and Comparable Corpora
      • Language-specific Resources
          o MULTEXT-East Language-specific Resources - TEI-compliant Morphosyntactic Specifications for Corpora and Lexicon encoding http://nl.ijs.si/ME/CD/docs/mte-d12m/mte2.html - bulgarian
          o Bulgarian Lexicon
          o Bulgarian Corpus
          o Bulgarian LDB for integrated multilingual CONCEDE LDBs
      • Bilingual digital dictionaries
          o Bulgarian-Polish online dictionary
          o LDBs for Bulgarian-Lithuanian online dictionary (in progress)
          o Slovak-Bulgarian Terminology DB (in progress)

  - Institute for Bulgarian Language, Bulgarian Academy of Sciences
    Resources, tools and activities:
      • Bulgarian WordNet
      • Grammar Dictionary of Bulgarian – An electronic grammar dictionary of Bulgarian
      • Automatic spell-checking system: ItaEst - Taka e!
      • Bulgarian written corpus – Original and translated texts in Bulgarian from diverse thematic domains and genres
      • Tagged corpus of Bulgarian – The result of manual POS disambiguation of each word form
      • Semantic Corpus of Bulgarian – Sense-disambiguated lexical items defined in their context of occurrence
      • Digital Dialectological Resources

  - Institute for Literature, Bulgarian Academy of Sciences
    Resources, tools and activities:
      • Digital Mediaeval Resources of Bulgaria

  - Faculty of Slavic Studies, Sofia University “St. Kliment Okhridski”
    Resources, tools and activities:
      • Corpus of Bulgarian Public Speech
      • Bulgarian Speech Corpus
      • Digital Dialectological Resources

  - Plovdiv University “Paisii Hilendarski”
    Resources, tools and activities:
      • Bulgarian WordNet
      • Dictionary of Bulgarian Inflection Morphology
      • Bulgarian POS Tagger
      • Chunker of Bulgarian

  - New Bulgarian University
    Resources, tools and activities:
      • Translation Memory Resources

  - South-West University “Neofit Rilski”
    Resources, tools and activities:
      • Parallel Corpora

  - Konstantin Preslavsky University of Shumen
    Resources, tools and activities:
      • Parallel Corpora