|The US pioneered the fields of Human Language Technology (HLT), Language Technology (LT) evaluation and Language Resources (LR). Large-scale LT evaluation campaigns are conducted at NIST, while the LDC makes the Language Resources available. Funding is provided by the Department of Defense (DARPA), by IARPA (under the Office of the Director of National Intelligence), by NSF and by the Department of Education. Many universities and non-profit organizations conduct research in this area. Most of the work addresses (American) English.|
Contact Point Input
National/Regional contact: Christopher Cieri, Linguistic Data Consortium, University of Pennsylvania.
Here are links to programs that are or were recently related to language resources and technologies in the US.
* NIST - National Institute of Standards and Technology
NIST Machine Translation Evaluation for GALE (Global Autonomous Language Exploitation)
Automatic Content Extraction (ACE) Evaluation
Language Recognition Evaluation (LRE)
LVDID (Language Variations and Dialect Identification) is LDC's project to support language variety/dialect identification, especially in the NIST LRE campaigns.
NIST Open Machine Translation (OpenMT) Evaluation
NIST Metrics for Machine Translation Challenge
Rich Transcription Evaluation
Speaker Recognition Evaluation (SRE)
Mixer is LDC's project to support speaker identification campaigns (languages: Arabic, English, Mandarin, Russian, Spanish). Mixer Greybeard was a specific LDC project to support speaker identification R&D that is robust to aging, especially within the NIST SRE campaigns.
TRECVid 2009 Evaluation for Event Detection
HAVIC is LDC's project to support web video collection for NIST TRECVid campaigns.
NIST MADCAT Evaluation
MADCAT = Multilingual Automatic Document Classification Analysis and Translation
AVSS 2009 Multi-Camera Tracking Challenge
Spoken Term Detection
Broadcast News Recognition Evaluation
Conversational Telephone Recognition Evaluation
Spoken Document Retrieval Evaluation
Topic Detection and Tracking Evaluation
* NSF - National Science Foundation
Computer & Information Science & Engineering
NSF/CISE/IIS/RI (Robust Intelligence Program): http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503305&org=IIS&from=home
Computer Research Infrastructure
Production of the MASC manually annotated corpus.
Social, Behavioral & Economic Sciences
* DARPA - Defense Advanced Research Projects Agency
Global Autonomous Language Exploitation (GALE)
GALE aims at multilingual transcription, translation into English, and distillation of text into structured information. Its data cover text (news, newsgroups, blogs) and transcribed speech (broadcast news and conversation), translated and aligned at the sentence and sub-sentence level, with annotations for syntactic structure and propositional content and distillation into structured information, for English, Mandarin and Arabic.
Machine Reading (MR)
Multilingual Automatic Document Classification Analysis and Translation (MADCAT)
MADCAT supports systems that perform OCR and MT of handwritten, printed and hybrid text, across variation in scribe, text type, writing instrument, time, speed of writing and paper quality. The first language addressed is Arabic.
Spoken Language Communication and Translation System for Tactical Use (TRANSTAC)
TRANSTAC aims at speech-to-speech (STS) translation in a limited domain, on a portable platform, for Arabic and Persian.
Robust Automatic Transcription of Speech (RATS)
RATS concerns algorithm development and signal processing for Speech Activity Detection, Language Identification, Speaker Identification and Key Word Spotting. It includes data collection and evaluation.
IARPA, the Intelligence Advanced Research Projects Activity, invests in high-risk/high-payoff research. Its activities include:
• Smart Collection
- BEST (Biometrics Exploitation Science & Technology). Multiple biometrics: face, ocular, voice, with challenging collection conditions.
• Incisive Analysis
- ALADDIN (Automated Low-Level Analysis and Description of Diverse Intelligence Video)
- SCIL (Socio-cultural Content in Language)
* Department of Education
International Education Programs Service
IRS (International Research and Studies Program): Reading assistance and assessment tools for morphologically complex languages; Digital dictionaries of Arabic colloquial varieties; Survey of DOE-funded dictionary projects.
* JHU Center for Language and Speech Processing Summer Workshops