The Zero Resource Speech Challenge


This page lists links to software of specific interest for unsupervised speech learning, with an emphasis on free software. Please follow the authors' instructions for installing and running the software, cite the appropriate reference, and contact the authors if you run into any issue.

The bootphon team develops pipelines for data analysis, speech processing, and machine learning, and distributes them as open source in the bootphon repo on GitHub.

Discovery of subword units or subword representations

  • Discrete units, Bayesian approaches:
  • Continuous representations, posteriorgrams:
    • Chen, H., Leung, C. C., Xie, L., Ma, B., & Li, H. (2015). Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: A feasibility study. In Proceedings of Interspeech. [code]
    • Michael Heck, Sakriani Sakti, Satoshi Nakamura (2016). Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario. Procedia Computer Science, Volume 81, pp. 73-79. [same code as above, plus Kaldi]
  • Continuous representations, DNNs (this requires spoken term discovery):
    • Synnaeve, G., Schatz, T., & Dupoux, E. (2014, December). Phonetics embedding learning with side information. In Spoken Language Technology Workshop (SLT), 2014 IEEE (pp. 106-111). IEEE. [github]
    • Thiolliere, R., Dunbar, E., Synnaeve, G., Versteegh, M., & Dupoux, E. (2015). A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling. In Sixteenth Annual Conference of the International Speech Communication Association. [github]

Spoken Term Discovery