Sunday, September 9, 2012

Speech Recognition with PocketSphinx

This post collects information from several sources about speech recognition, gives a brief overview of the CMUSphinx project, and shows how to get started with PocketSphinx on Windows.

The CMUSphinx project is an open-source speech recognition project developed at Carnegie Mellon University. It consists of various tools used to build speech applications:
  • CMUclmtk — language model tools
  • Sphinxtrain — acoustic model training tools

It also includes the following recognizers (decoders):
  • Pocketsphinx: a fast recognizer written in C, designed for real-time speed on both desktop applications and mobile devices. It requires the Sphinxbase library.
  • Sphinx3: a speech recognizer intended for researchers.
  • Sphinx4: a speech recognizer written in Java.

Let’s look at how HMM-based speech recognition works. It first learns the characteristics (or parameters) of a set of sound units, and then uses what it has learned about those units to find the most probable sequence of sound units for a given speech signal. The process of learning about the sound units is called training. The process of using the acquired knowledge to deduce the most probable sequence of units in a given signal is called decoding, or simply recognition.
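The decoding step can be illustrated with the Viterbi algorithm, the standard dynamic-programming search for the most probable hidden state sequence. The sketch below is a toy, not PocketSphinx code: the two "sound units", the three quantized observation symbols and all probabilities are hypothetical values chosen for illustration, and viterbi() is a helper name introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES 2   /* hypothetical sound units: 0 = unit A, 1 = unit B */
#define N_SYMBOLS 3  /* hypothetical quantized acoustic observations 0, 1, 2 */
#define MAX_T 64     /* longest observation sequence this sketch accepts */

/* Toy model parameters (made-up numbers, not from any real acoustic model). */
static const double start_p[N_STATES] = {0.6, 0.4};
static const double trans_p[N_STATES][N_STATES] = {
    {0.7, 0.3},  /* from unit A to unit A, unit B */
    {0.4, 0.6},  /* from unit B to unit A, unit B */
};
static const double emit_p[N_STATES][N_SYMBOLS] = {
    {0.5, 0.4, 0.1},  /* unit A emitting symbol 0, 1, 2 */
    {0.1, 0.3, 0.6},  /* unit B emitting symbol 0, 1, 2 */
};

/* Finds the most probable state sequence for obs[0..T-1],
   writes it into path[0..T-1] and returns its probability. */
double viterbi(const int *obs, size_t T, int *path)
{
    double v[MAX_T][N_STATES];  /* best path probability ending in state s at time t */
    int back[MAX_T][N_STATES];  /* predecessor of state s on that best path */

    assert(obs != NULL && path != NULL && T > 0 && T <= MAX_T);

    /* Initialisation: start probability times first emission. */
    for (int s = 0; s < N_STATES; s++) {
        v[0][s] = start_p[s] * emit_p[s][obs[0]];
        back[0][s] = -1;
    }

    /* Recursion: extend the best path into each state, one frame at a time. */
    for (size_t t = 1; t < T; t++) {
        for (int s = 0; s < N_STATES; s++) {
            int best_prev = 0;
            double best = v[t - 1][0] * trans_p[0][s];
            for (int p = 1; p < N_STATES; p++) {
                double cand = v[t - 1][p] * trans_p[p][s];
                if (cand > best) {
                    best = cand;
                    best_prev = p;
                }
            }
            v[t][s] = best * emit_p[s][obs[t]];
            back[t][s] = best_prev;
        }
    }

    /* Termination: pick the best final state, then follow back-pointers. */
    int last = 0;
    for (int s = 1; s < N_STATES; s++)
        if (v[T - 1][s] > v[T - 1][last])
            last = s;

    path[T - 1] = last;
    for (size_t t = T - 1; t > 0; t--)
        path[t - 1] = back[t][path[t]];

    return v[T - 1][last];
}
```

For the observation sequence {0, 1, 2}, viterbi(obs, 3, path) fills path with {0, 0, 1} (unit A, unit A, unit B) and returns about 0.01512. Real recognizers work with log probabilities, so products become sums and long utterances do not underflow, and they prune unlikely paths with a beam instead of keeping every state.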

Setting up PocketSphinx on Windows

Environment: Windows 7 and Visual Studio 2012, sphinxbase-0.7, pocketsphinx-0.7

Name the folders sphinxbase and pocketsphinx and keep them in the same parent folder: the pocketsphinx project refers to its external dependencies through relative paths such as “..\..\..\sphinxbase\include\sphinxbase\ad.h”.

To test the installation, run pocketsphinx_continuous.exe. This tool performs continuous speech recognition, either listening to the microphone or transcribing a file. Running it requires:
  • sphinxbase.dll copied into the build output folder, for example C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\bin\Debug.
  • The -hmm parameter: the directory containing the acoustic model files.
  • The -lm parameter: the word trigram language model input file.
  • The -dict parameter: the main pronunciation dictionary (lexicon) input file.
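The -dict file is a plain-text lexicon that maps each word to the sequence of sound units (phones) it is made of. Assuming the usual CMU dictionary layout, one word per line followed by its whitespace-separated phones, a minimal C sketch of reading one entry could look like this; dict_entry and parse_dict_line() are hypothetical names, not part of the Sphinxbase or PocketSphinx API.

```c
#include <assert.h>
#include <string.h>

#define MAX_PHONES 32  /* hypothetical limit on phones per word */
#define MAX_TOKEN 32   /* hypothetical limit on word/phone length */

/* One lexicon entry: a word and the phones (sound units) it maps to. */
typedef struct {
    char word[MAX_TOKEN];
    char phones[MAX_PHONES][MAX_TOKEN];
    int n_phones;
} dict_entry;

/* Parses a line of the assumed form "WORD PH1 PH2 ..." (whitespace-separated).
   Returns 1 on success, 0 for an empty line, a word without phones,
   or a token that does not fit the fixed-size buffers. */
int parse_dict_line(const char *line, dict_entry *out)
{
    char buf[512];
    char *tok;

    assert(line != NULL && out != NULL);
    if (strlen(line) >= sizeof buf)
        return 0;
    strcpy(buf, line);  /* strtok modifies its input, so work on a copy */

    /* First token is the word itself. */
    tok = strtok(buf, " \t\r\n");
    if (tok == NULL || strlen(tok) >= MAX_TOKEN)
        return 0;
    strcpy(out->word, tok);

    /* Remaining tokens are its phones, in order. */
    out->n_phones = 0;
    while ((tok = strtok(NULL, " \t\r\n")) != NULL) {
        if (out->n_phones >= MAX_PHONES || strlen(tok) >= MAX_TOKEN)
            return 0;
        strcpy(out->phones[out->n_phones++], tok);
    }
    return out->n_phones > 0;
}
```

For example, parse_dict_line("no N OW", &e) would yield the word no with the phones N and OW; this entry is illustrative, so check cmu07a.dic for the exact spelling and phone set it uses. Alternate pronunciations, which CMU-style dictionaries mark with a suffix such as (2), simply stay part of the word string in this sketch.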

This is the command, using the model files included in the project:

pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k ^
-dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic ^
-lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP


This is the output of me saying “no no no”:

[Screenshot: pocketsphinx_continuous.exe output]

I will take a closer look at the project to check the accuracy of the recognition in more detail.

Terminology

Language model: assigns a probability $P(w_1, \ldots, w_m)$ to a sequence of $m$ words by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence.
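In standard n-gram notation (added here as an illustration, not taken from the sources of this post), the chain rule factors the sequence probability exactly, and a trigram model like the one passed with -lm approximates each factor using only the two preceding words, with C(·) counting occurrences in the training text:

```latex
P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})
  \approx \prod_{i=1}^{m} P(w_i \mid w_{i-2}, w_{i-1}),
\qquad
\hat{P}(w_i \mid w_{i-2}, w_{i-1}) = \frac{C(w_{i-2}, w_{i-1}, w_i)}{C(w_{i-2}, w_{i-1})}
```

The first two words are conditioned on sentence-start padding. Because this maximum-likelihood estimate gives unseen trigrams zero probability, language model toolkits apply smoothing and back-off in practice.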

HMM (hidden Markov model): a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
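In common textbook notation (an addition for illustration), an HMM λ = (A, B, π) has initial state probabilities π, transition probabilities a_{ij} and emission probabilities b_j(o). It scores a hidden state sequence Q = q_1, ..., q_T together with observations O = o_1, ..., o_T as:

```latex
P(O, Q \mid \lambda) = \pi_{q_1} \, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} \, b_{q_t}(o_t)
```

Decoding searches for the Q that maximizes this product. In speech recognition the hidden states typically stand for (parts of) sound units and the observations for acoustic feature vectors computed from short frames of the signal.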


12 comments:

  1. I run this example, but I get error :(
  2. Great tutorial!! it worked perfect. Thank you
  3. Do you have a tutorial about how to compile the "hello world" example on Windows?
  4. Getting this...
    ERROR: "acmod.c", line 84: Acoustic model definition is not specified neither with -medef option nor with -hmm
    :(
  5. getting this error plz help me out


    READY....
    ERROR: "pocketsphinx.c", line 625: No search module is selected, did you forget
    to specify a language model or grammar?
    FATAL_ERROR: "continuous.c", line 274: Failed to start utterance
  6. I got this error

    Debug Assertion Failed!

    Program: .....Parser.exe

    file f:\dd\vctools\crt_bld\Self_x86\crt\src\fopen.c

    Line 54

    Expression: (file!=NULL)"

    Could you please help me??
  7. Thanks for the explanation. It works!