The Zero Resource Speech Challenge

General presentation

The ultimate goal of the Zero Resource Speech Challenge is to construct a system that learns an end-to-end Spoken Dialog (SD) system, in an unknown language, from scratch, using only information available to a language-learning infant. "Zero resource" refers to zero linguistic expertise (e.g., orthographic or linguistic transcriptions), not to zero information besides audio (visual input, limited human feedback, etc.). The fact that 4-year-olds spontaneously learn language without supervision from language experts shows that this goal is theoretically reachable.

The Zero Resource Speech Challenge addresses a fundamental scientific question (how can a system autonomously acquire language?) which is interesting in its own right, but also has three main practical benefits:

  • Traditional speech and language technologies are trained with massive amounts of textual information. However, most of the world’s languages do not have textual resources or even a reliable orthography. Systems constructed with zero expert resources could serve millions of users of these so-called ‘low-resource’ languages.
  • Languages are disappearing faster than the language documentation effort can preserve them. Zero resource technologies could provide field linguists with tools to (semi-)automatically analyze and annotate audio recordings of these endangered languages with automatically discovered linguistic units (phonemes, lexicon, grammar).
  • Zero Resource Speech technologies provide predictive models of language growth for psychologists and clinicians interested in how sociolinguistic variation in the input affects subsequent normal or abnormal language and cognitive development.

The Zero Resource Challenge series is designed to progress incrementally towards this goal by proposing achievable but progressively harder objectives, building and open-sourcing along the way the core technological components needed for an autonomous SD system.

Weakly supervised and unsupervised learning are tricky to evaluate. We use two kinds of evaluation principles: (1) Unit testing: each core component is evaluated by a specific set of metrics, largely inspired by psychometrics and linguistics. These tests do not guarantee that an entire system will work well, but they are useful for checking and debugging the systems. (2) Application testing: as the challenge aggregates more components, it becomes possible to construct useful applications (e.g., keyword search, document classification, image retrieval from speech, speech-to-speech translation), making it possible to use more standard evaluation techniques.
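To make the "unit testing" idea concrete, below is a minimal sketch of a minimal-pair ABX discrimination test, the kind of psychometrics-inspired metric used to evaluate learned subword representations: given two tokens A and X of the same category (e.g., the same triphone) and a token B of a different category, the test checks whether X is closer to A than to B under a distance computed on the learned representations. This is only an illustration, not the official evaluation code; the function names, the DTW-over-cosine distance, and the tie-handling convention are assumptions made for the sketch.

```python
# Illustrative ABX unit test over learned frame-level representations.
# Assumed shapes: each token is a numpy array of (frames, feature_dims).
import numpy as np

def dtw_distance(x, y):
    """Dynamic-time-warping distance between two feature sequences,
    using frame-wise cosine distance (an assumed, common choice)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    cost = 1.0 - xn @ yn.T            # pairwise frame distances, shape (n, m)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Normalize by sequence lengths so short and long tokens are comparable.
    return acc[n, m] / (n + m)

def abx_error_rate(triples):
    """triples: iterable of (a, b, x) arrays where a and x are tokens of the
    same category and b is a token of a different category.
    Returns the fraction of triples where x is judged closer to b than to a."""
    errors, total = 0.0, 0
    for a, b, x in triples:
        d_ax = dtw_distance(a, x)
        d_bx = dtw_distance(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5             # count ties as half an error
        total += 1
    return errors / total
```

A representation that separates phonemic categories well yields a low ABX error rate (chance is 50%), so the metric can be used to compare candidate subword models without any transcription of the test audio beyond the category labels used to form the triples.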

So far, two ZeroSpeech challenges have been organized, one in 2015, one in 2017. Please click on the corresponding tab for more information.