Speech Recognition with PocketSphinx

This post gathers information from several sources about speech recognition, gives a brief overview of the CMUSphinx project, and shows how to get started with PocketSphinx on Windows.

The CMUSphinx project is an open source speech recognition project developed at Carnegie Mellon University. It consists of various tools used to build speech applications:
  • CMUclmtk — language model tools
  • Sphinxtrain — acoustic model training tools

It also includes the following recognizers (decoders):
  • Pocketsphinx: Written in C and designed to be fast enough for real-time use. It supports desktop applications and mobile devices, and it needs the Sphinxbase library.
  • Sphinx3: Speech recognizer intended for researchers.
  • Sphinx4: Speech recognizer written in Java.

Let’s look at how HMM-based speech recognition works. It first learns the characteristics (or parameters) of a set of sound units, and then uses what it has learned about those units to find the most probable sequence of sound units for a given speech signal. The process of learning about the sound units is called training. The process of using the acquired knowledge to deduce the most probable sequence of units in a given signal is called decoding, or simply recognition.
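To make the decoding step more concrete, here is a minimal sketch of Viterbi decoding, a standard way to find the most probable state sequence in an HMM. The two sound units, the observation labels, and all probabilities are made up for illustration and are not taken from any CMUSphinx model. A real decoder such as PocketSphinx works on acoustic feature vectors and far larger models.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete HMM."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model: two sound units and two coarse observation labels
STATES = ["n", "ow"]
START_P = {"n": 0.8, "ow": 0.2}
TRANS_P = {"n": {"n": 0.5, "ow": 0.5}, "ow": {"n": 0.3, "ow": 0.7}}
EMIT_P = {"n": {"nasal": 0.9, "vowel": 0.1}, "ow": {"nasal": 0.2, "vowel": 0.8}}

if __name__ == "__main__":
    path, logp = viterbi(["nasal", "vowel", "vowel"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, math.exp(logp))
```

In this sketch, training would correspond to estimating the three probability tables from data, and decoding corresponds to the call to viterbi.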

Set up Pocketsphinx on Windows

Environment: Windows 7 and Visual Studio 2012, sphinxbase-0.7, pocketsphinx-0.7

Name the folders sphinxbase and pocketsphinx, because the pocketsphinx project has external dependencies that use relative paths such as “..\..\..\sphinxbase\include\sphinxbase\ad.h”.

To test the installation, let's run pocketsphinx_continuous.exe. This tool performs speech recognition both by listening continuously to the microphone and by transcribing a file continuously. Running it requires:
  • A copy of sphinxbase.dll in the build folder, for example C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\bin\Debug.
  • The parameter -hmm: the directory containing the acoustic model files.
  • The parameter -lm: the word trigram language model input file.
  • The parameter -dict: the main pronunciation dictionary (lexicon) input file.

The following command uses the model files that ship with the project:

pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k ^
-dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic ^
-lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP
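If you prefer to launch the same command from a script, the following Python sketch builds the argument list from the pocketsphinx folder. The function names and the assumption that the executable sits in bin\Debug are mine, for illustration only. The executable name, the -hmm, -dict, and -lm flags, and the model paths come from the command above.

```python
import subprocess
from pathlib import PureWindowsPath

def build_command(base, exe="pocketsphinx_continuous.exe"):
    """Build the argument list for pocketsphinx_continuous.

    `base` is the pocketsphinx folder; the bin\\Debug location is an assumption.
    """
    base = PureWindowsPath(base)
    model = base / "model"
    return [
        str(base / "bin" / "Debug" / exe),
        "-hmm", str(model / "hmm" / "en_US" / "hub4wsj_sc_8k"),
        "-dict", str(model / "lm" / "en_US" / "cmu07a.dic"),
        "-lm", str(model / "lm" / "en_US" / "wsj0vp.5000.DMP"),
    ]

def run_recognizer(base):
    """Start the recognizer and block until it exits (hypothetical helper)."""
    return subprocess.call(build_command(base))
```

Passing the arguments as a list avoids quoting problems when a path contains spaces. For example, run_recognizer(r"C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx") would start the tool with the same three model arguments.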

This is the output of pocketsphinx_continuous when I said “no no no”.

I will take a closer look at the project to evaluate the accuracy of the recognition in more detail.


Language model: assigns a probability P(w1, ..., wm) to a sequence of m words by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence.
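As a rough illustration of how such a probability can be computed, the sketch below estimates a bigram model from a tiny made-up corpus and scores a sentence with the chain rule. The command shown earlier loads a pre-trained trigram model instead. The corpus, the function names, and the unsmoothed maximum-likelihood estimate are assumptions chosen to keep the example short.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and bigrams, using <s> and </s> as boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """P(w1, ..., wm) as a product of P(w_i | w_{i-1}), with no smoothing."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: the unsmoothed estimate is zero
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

# Hypothetical training corpus
CORPUS = ["no no no", "no yes", "yes no"]

if __name__ == "__main__":
    contexts, bigrams = train_bigram(CORPUS)
    print(sentence_probability("no no no", contexts, bigrams))
    print(sentence_probability("yes yes", contexts, bigrams))
```

A sentence containing a word pair that never occurs in the corpus gets probability zero here, which is why practical language models add smoothing or back-off.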

HMM (hidden Markov model): a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.



