5.1.20. Poland

FLaReNet Summary

There is a national project on a Polish Platform for Homeland Security (PPBW), closely related to the University of Poznań, which includes the projects Technologies for processing and distributing verbal information in internal security systems, and Text processing technologies for Polish in application for public security purposes. A National Corpus of Polish is supported through a national Ministry of Science and Higher Education research/development grant. A new project has been launched to build a System for the transliteration of Parliament speeches – PJWSTK.

Many laboratories are active in this area: the Institute for Computer Science of the Polish Academy of Sciences (IPI PAN, Linguistic Engineering Department), the University of Poznań (Phonetics Department, Department of computer linguistics and AI, Laboratory of Speech and Language (Speechlabs)), the Technical University of Wrocław (Computer Science Department), the PJWSTK (Polish-Japanese Institute of Information Technology), and the AGH University of Science and Technology (DSP group). Several SMEs are also active in the field, such as Ivona, TiP, Polfonetika, Primespeech, SkryBot, and Neurosoft. Polish Telecom has a TP SA Vocal Services Section closely related to France Telecom. Language resources (LR) available for the Polish language have been listed on the CLARIN Web site.

Contact Point Input

National/Regional contact: Zygmunt Vetulani, University of Poznań.
National/Regional contact: Krzysztof Marasek, Polish-Japanese Institute of Information Technology and Speech.

Programs

Publicly available resources (including those distributed by ELRA) can be found at:

http://www.clarin.eu/view_resources?field_resource_type_value_many_to_one=All&field_languages_value_many_to_one=Polish&field_country_value_many_to_one=All&title_op=contains&title=&field_institute_value_op=contains&field_institute_value=&field_distribution_type_value_many_to_one=All&field_institute_fromlist_nid=All.

Others can be found in Annex 2. See also:

http://www.coli.uni-saarland.de/~dominika/resources.html;
http://ec.europa.eu/translation/polish/polish_en.htm;
http://www.aclweb.org/aclwiki/index.php?title=Resources_for_Polish;
http://nlp.ipipan.waw.pl;
http://www.resourcebook.eu/LreMap/faces/views/resourceMap.xhtml.

Commercially available resources: See Web pages:

http://tip.net.pl;
http://polfonetika.com/news.php;
http://www.neurosoft.pl - NLP tools.

Institutes and scientific groups

IPI PAN
Linguistic Engineering Department: http://www.ipipan.eu/dept/bolc-op.html; current projects are listed at: http://nlp.ipipan.waw.pl; interest in: NLP, corpus linguistics, parsing, etc.
Tools:
  • Spejd - a tool for partial parsing and rule-based morphosyntactic disambiguation: http://nlp.ipipan.waw.pl/Spejd/;
  • Poliqarp - "a corpus indexing and search engine": http://poliqarp.sourceforge.net;
  • Dendrarium - "a treebank development system (under development)": http://sourceforge.net/projects/dendrarium/.

University of Poznań
  • Phonetics Department: http://www.staff.amu.edu.pl/~fonetyka/; current projects: Euronounce (http://euronounce.de); also: http://www.speechlabs.pl; interest in: phonetics, speech synthesis and recognition;
  • Department of computer linguistics and AI: http://www.staff.amu.edu.pl/~zlisi/; current projects: ALCALA (robot control using natural language), Text processing technologies for Polish in application for public security purposes (http://www.ppbw.pl/en/projekty_badawcze/projekt_vetulani.html).
Tools:
  • Lemmatizer by Dawid Weiss: http://www.cs.put.poznan.pl/dweiss/xml/projects/lametyzator/index.xml?lang=pl.

Technical University of Wrocław Computer Science Department
  • Current projects: Polish Wordnet (http://plwordnet.pwr.wroc.pl/browser/?lang=en); interest in: NLP, computational linguistics (see: http://www.ii.pwr.wroc.pl/~piasecki/index-en.html).
Tools:
  • WordnetLoom - wordnet viewing and editing application;
  • WordnetWeaver - application for semi-automatic WordNet expansion based on the analysis of large collections of documents;
  • Inforex - Web-based application for text annotations: http://nlp.pwr.wroc.pl/gpw.

PJWSTK (Polish-Japanese Institute of Information Technology)
  • Interest in: Polish speech processing - speech synthesis, speech recognition, dialog systems, spoken resources (kmarasek@pjwstk.edu.pl).

TP SA Vocal Services Section
  • Interest in: Telephony voice services, R&D group at Polish Telecom (closely related to France Telecom).

AGH University of Science and Technology
  • DSP group: http://www.dsp.agh.edu.pl; interest in: speech recognition.

Speechlabs - closely related to the Phonetics Department of the University of Poznań
  • Interest in: speech synthesis, speech recognition (see: http://www.ppbw.pl/en/projekty_badawcze/projekt_demenko.html).

Polish Phonetic Association

Morfologik
  • A set of tools and resources created under the leadership of Marcin Miłkowski; these include a morphological analyzer, a stemmer, and proofing tools.

Companies

http://www.ivona.com/ - speech synthesis;
http://tip.net.pl - dictionaries, NLP;
http://polfonetika.com/news.php - NLP;
http://www.primespeech.pl - speech recognition;
http://skrybot.przepisywanie.pl - speech recognition;
http://www.neurosoft.pl - NLP, speech synthesis;
http://poleng.pl - machine translation, NLP, lexicons, dictionaries.

Projects and grants

Technologies for processing and distributing verbal information in internal security systems: http://www.ppbw.pl/en/projekty_badawcze/projekt_demenko.html.
Text processing technologies for Polish in application for public security purposes: http://www.ppbw.pl/en/projekty_badawcze/projekt_vetulani.html.
National Corpus of Polish ‒ a national Ministry of Science and Higher Education research/development grant (number R17 003 03): http://nkjp.pl.

For a list of projects, see: http://nlp.ipipan.waw.pl, http://www.speechlabs.pl.
Participation in several EC projects, including LUNA, ATLAS, CLARIN, and FLaReNet.

New Polish grants: System for transliteration of Parliament speeches - PJWSTK.

Researchers

A list of researchers may be found in Annex 3.