
5.3.37. United States of America

FLaReNet Summary

The US pioneered the fields of Human Language Technology (HLT), Language Technology (LT) evaluation and Language Resources (LR). Large LT evaluation activities are conducted at NIST, while the LDC makes Language Resources available. Funding is provided by DARPA (Department of Defense), IARPA (Office of the Director of National Intelligence), the NSF and the Department of Education. Many universities and non-profit organizations conduct research in this area. Most of the work addresses (American) English.

Contact Point Input

National/Regional contact: Christopher Cieri, Linguistic Data Consortium, University of Pennsylvania.

Programs

The following links point to current or recent US programs related to language resources and technologies.

   * NIST - National Institute of Standards and Technology

NIST Machine Translation Evaluation for GALE (Global Autonomous Language Exploitation)
http://www.itl.nist.gov/iad/mig//tests/gale/

Automatic Content Extraction (ACE) Evaluation
http://www.itl.nist.gov/iad/mig//tests/ace/

CLEAR evaluation
http://www.clear-evaluation.org

Language Recognition Evaluation (LRE)
http://www.itl.nist.gov/iad/mig//tests/lre/
LVDID (Language Variations and Dialect Identification) is LDC's project to support language variety/dialect identification, especially in the NIST LRE campaigns.

NIST Open Machine Translation (OpenMT) Evaluation
http://www.itl.nist.gov/iad/mig//tests/mt/

NIST Metrics for Machine Translation Challenge (MetricsMATR)
http://www.itl.nist.gov/iad/mig//tests/metricsmatr/

Rich Transcription Evaluation
http://www.itl.nist.gov/iad/mig//tests/rt/

Speaker Recognition Evaluation (SRE)
http://www.itl.nist.gov/iad/mig//tests/sre/
Mixer is LDC's project to support speaker identification campaigns (languages: Arabic, English, Mandarin, Russian, Spanish). Mixer Greybeard was a dedicated LDC project to support R&D on speaker identification that is robust to aging, especially within the NIST SRE campaigns.

TRECVid 2009 Evaluation for Event Detection
http://www.itl.nist.gov/iad/mig//tests/trecvid/2009/index.html
HAVIC is LDC's project to support web video collection for NIST TRECVid campaigns.

NIST MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Evaluation
http://www.itl.nist.gov/iad/mig//tests/madcat/index.html

AVSS 2009 Multi-Camera Tracking Challenge
http://www.itl.nist.gov/iad/mig//tests/avss/2009/index.html

Spoken Term Detection
http://www.itl.nist.gov/iad/mig//tests/std/

Broadcast News Recognition Evaluation
http://www.itl.nist.gov/iad/mig//tests/bnr/

Conversational Telephone Recognition Evaluation
http://www.itl.nist.gov/iad/mig//tests/ctr/

Spoken Document Retrieval Evaluation
http://www.itl.nist.gov/iad/mig//tests/sdr/

Topic Detection and Tracking Evaluation
http://www.itl.nist.gov/iad/mig//tests/tdt/

   * NSF - National Science Foundation

Computer & Information Science & Engineering
http://nsf.gov/dir/index.jsp?org=cise
NSF/CISE/IIS/RI (Robust Intelligence Program): http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503305&org=IIS&from=home

Computer Research Infrastructure
http://www.nsf.gov/pubs/2006/nsf06597/nsf06597.htm
Production of the MASC manually annotated corpus.

Cyberinfrastructure
http://nsf.gov/dir/index.jsp?org=OCI

Social, Behavioral & Economic Sciences
http://nsf.gov/dir/index.jsp?org=sbe

   * DARPA

Global Autonomous Language Exploitation (GALE)
http://www.darpa.mil/ipto/programs/gale/gale.asp
GALE targets multilingual transcription, translation into English, and distillation of text into structured information. Its data cover English, Mandarin and Arabic and include text (news, newsgroups, blogs) and transcribed speech (broadcast news and conversation), translated and aligned at sentence and sub-sentence level, with annotations for syntactic structure and propositional content.

Machine Reading (MR)
http://www.darpa.mil/ipto/programs/mr/mr.asp

Multilingual Automatic Document Classification Analysis and Translation (MADCAT)
http://www.darpa.mil/ipto/programs/madcat/madcat.asp
MADCAT supports systems that perform OCR and MT on handwritten, printed and hybrid text that varies in scribe, text type, writing instrument, time, writing speed and paper quality. The first language addressed is Arabic.

Spoken Language Communication and Translation System for Tactical Use (TRANSTAC)
http://www.darpa.mil/ipto/programs/transtac/transtac.asp
TRANSTAC aims at speech-to-speech (STS) translation in a limited domain, on a portable platform, for Arabic and Persian.

Robust Automatic Transcription of Speech (RATS)
http://www.darpa.mil/IPTO/solicit/baa/BAA-10-34_Mod01.pdf
RATS covers algorithm development and signal processing for speech activity detection, language identification, speaker identification and keyword spotting. It also includes data collection and evaluation.

   * IARPA

IARPA, the “Intelligence Advanced Research Projects Activity”, invests in high-risk/high-payoff research. Its activities include:
   • Smart Collection
      - BEST (Biometrics Exploitation Science & Technology): multiple biometrics (face, ocular, voice) under challenging collection conditions.
   • Incisive Analysis
      - ALADDIN (Automated Low-Level Analysis and Description of Diverse Intelligence Video)
      - SCIL (Socio-cultural Content in Language)

   * Department of Education

International Education Programs Service
http://www2.ed.gov/about/offices/list/ope/iegps/index.html
IRS: reading assistance and assessment tools for morphologically complex languages; digital dictionaries of Arabic colloquial varieties; survey of DOE-funded dictionary projects.

   * JHU Center for Language and Speech Processing Summer Workshops

http://www.clsp.jhu.edu/workshops/