The CMUSphinx project is an open source speech recognition project developed at Carnegie Mellon University. It consists of various tools used to build speech applications:
- CMUclmtk — language model tools
- Sphinxtrain — acoustic model training tools
It also includes the following recognizers (decoders):
- Pocketsphinx: A recognizer written in C and designed for speed and real-time use. It supports desktop applications and mobile devices, and it depends on the Sphinxbase library.
- Sphinx3: Speech recognizer intended for researchers.
- Sphinx4: Speech recognizer written in Java.
Let’s look at how HMM-based speech recognition works. It first learns the characteristics (or parameters) of a set of sound units, and then uses what it has learned about those units to find the most probable sequence of sound units for a given speech signal. The process of learning about the sound units is called training. The process of using the acquired knowledge to deduce the most probable sequence of units in a given signal is called decoding, or simply recognition.
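To make the decoding step concrete, here is a minimal sketch of the Viterbi algorithm, a standard way to find the most probable state sequence in an HMM. Everything in the toy model is an assumption made for illustration: the two sound units ("N" and "OW"), the coarse observation labels ("nasal" and "vowel"), and all of the probabilities. A real decoder such as Pocketsphinx works on continuous acoustic features with much larger models, so this only shows the idea.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for an observation sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model (all values invented for this example).
STATES = ["N", "OW"]
START_P = {"N": 0.8, "OW": 0.2}
TRANS_P = {"N": {"N": 0.5, "OW": 0.5}, "OW": {"N": 0.3, "OW": 0.7}}
EMIT_P = {"N": {"nasal": 0.9, "vowel": 0.1}, "OW": {"nasal": 0.2, "vowel": 0.8}}

if __name__ == "__main__":
    obs = ["nasal", "nasal", "vowel", "vowel"]
    path, log_p = viterbi(obs, STATES, START_P, TRANS_P, EMIT_P)
    print(path, math.exp(log_p))
```

In this sketch the "training" step is skipped entirely: the probability tables are written by hand, whereas a real system would estimate them from recorded speech. Log-probabilities are used so that long sequences do not underflow.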
Set up Pocketsphinx on Windows

Environment: Windows 7 and Visual Studio 2012, sphinxbase-0.7, pocketsphinx-0.7
Name the folders exactly sphinxbase and pocketsphinx, because the pocketsphinx project has external dependencies that use relative paths such as “..\..\..\sphinxbase\include\sphinxbase\ad.h”.
To test the installation, let's run pocketsphinx_continuous.exe. This tool performs speech recognition either by listening continuously to the microphone or by transcribing a file continuously. Running it requires:
- sphinxbase.dll copied to the build folder, for example C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\bin\Debug.
- The parameter -hmm: the directory containing the acoustic model files.
- The parameter -lm: the word trigram language model input file.
- The parameter -dict: the main pronunciation dictionary (lexicon) input file.
This is the command, using the model files contained in the project:
pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k -dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic -lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP
This is the output of me saying “no no no”
I will take a closer look at the project to check the accuracy of the recognition more thoroughly.
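Because the three model paths share one root folder, the command line can also be assembled in a small script instead of being typed by hand. The sketch below is only a convenience wrapper: the function name build_command is hypothetical, and the bin\Debug location of the executable is an assumption based on the example build folder mentioned earlier. The flags and model file names are the ones used in the command above.

```python
from pathlib import PureWindowsPath

def build_command(pocketsphinx_root):
    """Build the pocketsphinx_continuous argument list for a given checkout folder."""
    root = PureWindowsPath(pocketsphinx_root)
    models = root / "model"
    return [
        # Assumed location of the Debug build of the tool.
        str(root / "bin" / "Debug" / "pocketsphinx_continuous.exe"),
        "-hmm", str(models / "hmm" / "en_US" / "hub4wsj_sc_8k"),    # acoustic model directory
        "-dict", str(models / "lm" / "en_US" / "cmu07a.dic"),       # pronunciation dictionary
        "-lm", str(models / "lm" / "en_US" / "wsj0vp.5000.DMP"),    # trigram language model
    ]

if __name__ == "__main__":
    command = build_command(r"C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx")
    print(" ".join(command))
```

On a Windows machine with the build in place, the returned list could be passed to subprocess.run to start the recognizer. PureWindowsPath is used so that the paths are formatted with backslashes even if the script is run on another operating system.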
Terminology

Language model: assigns a probability to a sequence of m words, P(w1, ..., wm), by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence.
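As a rough illustration of how a language model assigns a probability to a word sequence, the sketch below estimates a bigram model from a tiny corpus and multiplies the conditional probabilities of each word given the previous one. This is a simplification chosen for brevity: the model file used above is a trigram model, the three-sentence corpus is invented, and there is no smoothing, so any unseen word pair gets probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and word pairs, with <s> and </s> as sentence boundaries."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])             # every word that has a successor
        bigrams.update(zip(words, words[1:]))   # (previous word, word) pairs
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """P(w1, ..., wm) as a product of P(wi | wi-1), using raw relative frequencies."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in the corpus
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

if __name__ == "__main__":
    # Hypothetical training corpus, invented for this example.
    contexts, bigrams = train_bigram(["no no no", "no yes", "yes no"])
    print(sentence_probability("no no", contexts, bigrams))
    print(sentence_probability("yes yes", contexts, bigrams))
```

With this corpus, "no no" receives a higher probability than "yes yes", which never occurs as a pair. A decoder uses scores of this kind to prefer word sequences that are plausible in the language.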
HMM (hidden Markov model): a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
- Recognizer/Decoder versions: http://cmusphinx.sourceforge.net/wiki/versions
- Toolkit overview: http://cmusphinx.sourceforge.net/wiki/tutorialoverview