
5.3.19. Korea

FLaReNet Summary

Since 2001, the Korean government has actively supported the creation and distribution of Language Resources (LRs) as essential infrastructure for promoting the Language Technology industry. Several dedicated organizations in Korea create and distribute LRs.

Contact Point Input

National/Regional contact: Key-Sun Choi, KAIST.
National/Regional contact: Yong-ju Lee, Director of SiTEC and Professor at Wonkwang University. Information sent by Dae Lim Choi (senior researcher working on speech corpus creation at SiTEC, Wonkwang University, Korea).

Programs

Since 2001, the Korean government has actively supported the creation and distribution of Language Resources as essential infrastructure for promoting the Language Technology industry.

Four representative organizations create and distribute Language Resources for common use in Korea.

1. SiTEC (Speech Information Technology and Industry Promotion Center), supported by the Ministry of Commerce, Industry and Energy from 2001:
    http://www.sitec.or.kr.

2. ETRI (Electronics and Telecommunications Research Institute), supported by the Ministry of Information & Telecommunication from 2002:
    http://voice.etri.re.kr.

3. NIKL (National Institute of Korean Language), supported by the Ministry of Culture & Tourism from 1998:
    http://www.korean.go.kr/eng/index.jsp.

4. BORA (Bank of Language Resources), supported by the Ministry of Science & Technology from 2003:
    http://www.bora.or.kr.

SiTEC was founded at Wonkwang University in May 2001. For its first 5 years it received about one million dollars per year from the government and 11 companies acting as a consortium. After those 5 years it became self-supporting, funded by revenue from speech corpora, technical materials, information content, training, etc.

Its work is to create and distribute speech corpora on a continuing basis, to develop and disseminate methodologies for assessing speech recognition and synthesis systems, to collect and distribute technical and industrial information, and to train and foster specialists.

SiTEC has created and distributed 48 speech corpora since May 2001. These include corpora for car applications, foreign languages, and language learning, as well as clean read speech of words and sentences for basic research, among others. A total of 20,000 speakers have been involved, and the total data volume amounts to 800 GBytes.

A large amount of research is also conducted at several institutes:

1. http://swrc.kaist.ac.kr.

2. http://www.sejong.or.kr/eindex.php.

3. http://morph.kaist.ac.kr/kcp.