Monday, September 17, 2012

Speech Recognition – Setting up sclite word alignment

Word alignment is used to measure the accuracy of a decoder.  Sphinx tutorial references sclite from National Institute of Standards and Technology (NIST). This time I’m going to share some notes on how to run and setup sclite.

Running sclite

Sclite is a tool for scoring and evaluating the output of speech recognition by comparing the hypothesized text (HYP) output by the speech recognizer to the correct, or reference (REF) text. After comparing REF to HYP, (a process called alignment), statistics are gathered during the scoring process and a variety of reports can be produced to summarize the performance of the recognition system.

This is an example output using the files located src\sclite\testdata\tests.hyp and sclite\testdata\tests.ref:

Parameters:
  • The '-h' option is a required argument which specifies the input hypothesis file.
  • The '-r' option, a required argument, specifies the input reference file which the hypothesis file(s) are compared to.

sclite -h C:\Project\SpeechRecognition\CMUSphinx\3rdPartyLibs\sctk-2.4.3\src\sclite\testdata\tests.hyp -r C:\Project\SpeechRecognition\CMUSphinx\3rdPartyLibs\sctk-2.4.3\src\sclite\testdata\tests.ref

New Picture (1)


Setup  sctk-2.4.0-20091110-0958.tar.bz2 on Windows 7

I downloaded Speech Recognition Scoring Toolkit (SCTK) which includes the SCLITE, ASCLITE, tranfilt, hubscr, SLATreport and utf_filt scoring tools.

I could compile it with gcc version 3.4.4, found in the following MinGW setup package  mingw-get-inst-20101030.exe.

It was also necessary to modify 'src/rfilter1/makefile.in' and change the value of OPTIONS to be blank  (as specified in the instructions)

The following compilation error is thrown when compiling using gcc version 4.6.2:

recording.h:122:29: error: 'Filter::Filter' cannot appear in a constant-expression
recording.h:122:36: error: template argument 2 is invalid
recording.h:122:36: error: template argument 4 is invalid
make[3]: *** [main.o] Error 1

Setup sctk-2.4.2-20120810-0938.tar.bz2 on Windows 7

Something similar happened with this version, I could compile it with gcc version 3.4.4.

The following compilation error is thrown when using gcc version 4.6.2:

In file included from asctools.h:23:0,
                 from asctools.cpp:22:
timeval.h:33:8: error: redefinition of 'struct timeval'
c:\mingw\bin\../lib/gcc/mingw32/4.6.2/../../../../include/winsock2.h:109:8: error: previous definition of 'struct timeval'
make[2]: *** [asctools.o] Error 1
make[2]: Leaving directory `/c/Project/SpeechRecognition/CMUSphinx/3rdPartyLibs/
sctk-2.4.2/src/asclite/test'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/c/Project/SpeechRecognition/CMUSphinx/3rdPartyLibs/
sctk-2.4.2/src/asclite'
make: *** [all] Error 2

This is thrown when compiling rfilter1

gcc  -o rfilter1 rfilter1.c
C:\Users\MANGEL~1\AppData\Local\Temp\ccuj1MU6.o:rfilter1.c:(.text+0x760): undefined reference to `strncmpi'
C:\Users\MANGEL~1\AppData\Local\Temp\ccuj1MU6.o:rfilter1.c:(.text+0x7c4): undefined reference to `strncmpi'
C:\Users\MANGEL~1\AppData\Local\Temp\ccuj1MU6.o:rfilter1.c:(.text+0x827): undefined reference to `strncmpi'
C:\Users\MANGEL~1\AppData\Local\Temp\ccuj1MU6.o:rfilter1.c:(.text+0x935): undefined reference to `strncmpi'
collect2: ld returned 1 exit status
make[1]: *** [rfilter1] Error 1
make[1]: Leaving directory `/c/Project/SpeechRecognition/CMUSphinx/3rdPartyLibs/sctk-2.4.2/src/rfilter1'
make: *** [all] Error 2

It is also possible to compile this with gcc version 4.6.2 after removing asclite tests and rfilter1 from make file.

After finishing the setup you are able to run sclite as described initially.

Resources:

Sunday, September 9, 2012

Speech Recognition with PocketSphinx

The following post puts together information from different sources about speech recognition and gives a brief overview of the CMUSphinx project and how to get started with PocketSphinx on Windows.

CMUSphinx project is an open source speech recognition project developed at Carnegie Mellon University, which consists of various tools use to build speech applications:
  • CMUclmtk — language model tools
  • Sphinxtrain — acoustic model training tools

The following recognizers (decoders)
  • Pocketsphinx: Designed to be fast and for real time speed written in C, supports desktop applications and mobile devices. It  needs the library Sphinxbase.
  • Sphinx3: Speed recognizer intended for researchers .
  • Sphinx4: Speech recognition written in the Java.

Let’s learn how HMM based speech recognition is handled: it functions by first learning the characteristics (or parameters) of a set of sound units, and then using what it has learned about the units to find the most probable sequence of sound units for a given speech signal. The process of learning about the sound units is called training. The process of using the knowledge acquired to deduce the most probable sequence of units in a given signal is called decoding, or simply recognition.

Setup Pocketsphinx  on  windows

Environment: Windows 7 and Visual Studio 2012, sphinxbase-0.7, pocketsphinx-0.7

Name the folders (sphinxbase / pocketsphinx ), the project pocketsphinx has external dependencies that use the relative paths like the following  “..\..\..\sphinxbase\include\sphinxbase\ad.h”.

To test the installation let's run pocketsphinx_continuous.exe, this tool runs speech recognition both continuous listening from microphone and continuous file transcription. To run it requires:
  • Copy sphinxbase.dll to the build folder, for example C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\bin\Debug.
  • The parameter –hmm, the directory containing acoustic model files.
  • The parameter –lm, word trigram language model input file.
  • The parameter –dict,  main pronunciation dictionary (lexicon) input file.

This is running the command with the information contained in the project.

pocketsphinx_continuous.exe -hmm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\hmm\en_US\hub4wsj_sc_8k 
-dict C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\cmu07a.dic 
-lm C:\Project\SpeechRecognition\CMUSphinx\pocketsphinx\model\lm\en_US\wsj0vp.5000.DMP


this is the output of me saying “no no no”

New Picture (2)

I will take a closer look at the project to check more the accuracy of the recognition.

Terminology

language model assigns a probability to a sequence of m words P(w1, .., w1) by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing and information retrieval. In speech recognition and in data compression, such a model tries to capture the properties of a language, and to predict the next word in a speech sequence.

HMM: (Hidden_Markov_model)  is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.

Resources:

Tuesday, August 28, 2012

Buildbot Part IV: Customize the scheduler

My previous post was about customizing buildbot to display more information on the waterfall page. Now let’s customize the scheduler to trigger the knightly build only if the continuous one hasn’t failed.

One way to accomplish this is by adding a new scheduler called DependentNightly, this scheduler receives a new parameter called relatedBuilderNames containing a set of builders that the scheduler should watch. The startbuild method will now check, for every related builder, the last finished build status and start the knightly build if no failures are found. Bellow master.cfg changes.

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.schedulers.timed import Nightly
from buildbot.changes import filter
from buildbot.status.builder import SUCCESS, WARNINGS, FAILURE
from twisted.python import log
from twisted.internet import defer

class DependentNightly(Nightly):

    relatedBuilderNames = []
    
    def __init__(self, relatedBuilderNames = [], **kwargs):
        Nightly.__init__(self, **kwargs)
        self.relatedBuilderNames = relatedBuilderNames

    def startBuild(self):
        startBuild = True
        for builderName in  self.relatedBuilderNames:
            builder_status = self.master.status.getBuilder(builderName)
            lastBuild = builder_status.getLastFinishedBuild()
            if lastBuild != None:
                startBuild = startBuild and (lastBuild.getResults() != FAILURE)

        if (startBuild):
            return Nightly.startBuild(self)
        else:
            log.msg(("Nightly Scheduler <%s>: skipping build. " +
                         "- Related builder <%s> failed last build")
                        % (self.name, ", ".join(self.relatedBuilderNames)))
            return defer.succeed(None)   

c['schedulers'] = []
c['schedulers'].append(SingleBranchScheduler(
                            name="Continuous",
                            change_filter=filter.ChangeFilter(branch='master'),
                            treeStableTimer=3*60,
                            builderNames=["Continuous"]))
c['schedulers'].append(DependentNightly(
                            name="Ninghtly",
                            branch='master',
                            hour=13,
                            minute=24,
                            onlyIfChanged=False,
                            builderNames=["Ninghtly"],
                            relatedBuilderNames=["Continuous"] ))


When the scheduler finds out that the build won’t run, a message will be log to twistd.log file as shown bellow.

skippingknightly

I hope this series of blog posts were helpful to give you an overall idea about using buildbot for CI.

Thursday, August 23, 2012

Buildbot Part III: Customizations

On my previous post we configured buildbot to run automated build and tests. Now, let’s add customizations, usually this configurations consists of subclasses set up for use in a regular buildbot configuration file (master.cfg).

Display more information in the waterfall

The first customization will be adding detail information in the waterfall page. One way to do this is writing a new custom buildstep. Most shell commands emit messages to stdout/stderr as they run, we will get output information from xunit.console.exe and use the functions provided by buildbot to perform summarization of the step base on the content of the stdio logfile. Bellow changes done to display details about the tests executed in a dll: total, failed, skipped and the time it took to perform the tests.

class xUnitConsole(ShellCommand):
    def __init__(self, **kwargs):
        ShellCommand.__init__(self, **kwargs)

    def describe(self, done=False):
        description = ShellCommand.describe(self,done)
        if done:
            description.append('total: %d ' % self.step_status.getStatistic('total', 0))
            failed = self.step_status.getStatistic('failed', 0)
            if failed > 0:
                description.append('failed: %d' % failed)
            skipped = self.step_status.getStatistic('skipped', 0)
            if skipped > 0:
                description.append('skipped: %d' % skipped)
            description.append('took: %s seconds' % self.step_status.getStatistic('time', 0))                
            
        return description

    def createSummary(self,log):
        _re_result = re.compile(r'(\d*) total, (\d*) failed, (\d*) skipped, took (.*) seconds')
        
        total = 0
        failed = 0
        skipped = 0
        time = 0
        
        lines = self.getLog('stdio').readlines()
        
        for l in lines:
            m = _re_result.search(l)
            if m:                
                total = int(m.group(1))
                failed = int(m.group(2))
                skipped = int(m.group(3))
                time = m.group(4)
              
        self.step_status.setStatistic('total', total)
        self.step_status.setStatistic('failed', failed)
        self.step_status.setStatistic('skipped', skipped)
        self.step_status.setStatistic('time', time)

def RunTests(factory, testproject, configuration, testdirectory):
        factory.addStep(xUnitConsole(command=['src\\xunit.console\\bin\\%s\\xunit.console.exe' % configuration,
                                '%s\\%s' % (testdirectory, testproject), "/html", "%s\\testresults.html" % testdirectory],
                                     workdir='build\\', name="Run %s" % testproject,
                                     description="Running %s" % testproject ,
                                     descriptionDone="Run %s done" % testproject,
                                     flunkOnFailure=False, warnOnFailure=True))


We switched from using ShellCommand to use the new one xUnitConsole, bellow an image with the new information displayed in the waterfall page. 

buildbot-moreinfo

you can follow a similar approach to display more information in compiling buildstep, check the visual studio buildstep in vstudio.py.

xunit.net report

Next customization will be adding a link to the report generated by xunit.console. The ShellCommand has the parameter logfiles that is useful in this case, when a command logs information in a local file rather than emitting everything to stdout/stderr.  By setting lazylogfiles=True, logfiles will be tracked lazily, meaning they will only be added when and if something is written to them. Finally, the addHtmllog adds a logfile containing pre-formatted HTML.  Bellow the changes done in master.cfg.

class  xUnitConsole(ShellCommand):    

    def __init__(self, xunitreport=None, **kwargs):
        ShellCommand.__init__(self, **kwargs)

    def describe(self, done=False):
        description = ShellCommand.describe(self,done)
        
        if done:
            description.append('total: %d ' % self.step_status.getStatistic('total', 0))
            failed = self.step_status.getStatistic('failed', 0)
            if failed > 0:
                description.append('failed: %d' % failed)
            skipped = self.step_status.getStatistic('skipped', 0)
            if skipped > 0:
                description.append('skipped: %d' % skipped)
            description.append('took: %s seconds' % self.step_status.getStatistic('time', 0))                
            
        return description

    def createSummary(self,log):
        html = self.getLog("xunit-report")
        text = html.getText()
        self.addHTMLLog("xunit-report.html", text)
        
        _re_result = re.compile(r'(\d*) total, (\d*) failed, (\d*) skipped, took (.*) seconds')
        
        total = 0
        failed = 0
        skipped = 0
        time = 0
        
        lines = self.getLog('stdio').readlines()
        
        for l in lines:
            m = _re_result.search(l)
            if m:                
                total = int(m.group(1))
                failed = int(m.group(2))
                skipped = int(m.group(3))
                time = m.group(4)
              
        self.step_status.setStatistic('total', total)
        self.step_status.setStatistic('failed', failed)
        self.step_status.setStatistic('skipped', skipped)
        self.step_status.setStatistic('time', time)

def RunTests(factory, testproject, configuration, testdirectory):
        factory.addStep(xUnitConsole(command=['src\\xunit.console\\bin\\%s\\xunit.console.exe' % configuration,
                                '%s\\%s' % (testdirectory, testproject), "/html", "%s\\testresults.html" % testdirectory],
                                     workdir='build\\', name="Run %s" % testproject,
                                     description="Running %s" % testproject ,
                                     descriptionDone="Run %s done" % testproject,
                                     flunkOnFailure=False, warnOnFailure=True,
                                     logfiles = {"xunit-report" : "%s\\testresults.html" % testdirectory }, lazylogfiles=True))


The waterfall page now has a link to xunit.net report, as you can see bellow:

Buildbot-reportxunitreport

On my next post I’ll show you how to customize the scheduler.

Thursday, August 9, 2012

Buildbot Part II – Running automated build and tests

On my previous post we setup buildbot. Now, let’s configure it to run automated builds and tests. To do this, I’ll be using my xUnit.Net fork hosted on git/codeplex.
Make the following changes to master.cfg:

Connect to the buildslave

c['slaves'] = [BuildSlave("BuildSlave01", "mysecretpwd")]

Track repository – GitPoller

The gitpoller periodically fetches from a remote git repository and process any changes.

from buildbot.changes.gitpoller import GitPoller
c['change_source'] = []
c['change_source'].append(GitPoller(
        'https://git01.codeplex.com/forks/mariangemarcano/xunit',
        gitbin='C:\\Program Files (x86)\\Git\\bin\\git.exe',
        workdir='gitpoller_work',
        pollinterval=300))

Add scheduler

There are different types of schedulers supported by buildbot and you can customize it. Let’s add a single branch scheduler to run the build whenever a change is committed to the repository and knightly schedulers to run the build once a day.

from buildbot.schedulers.basic import SingleBranchScheduler
from buildbot.schedulers.timed import Nightly
from buildbot.changes import filter

c['schedulers'] = []
c['schedulers'].append(SingleBranchScheduler(
                            name="Continuous",
                            change_filter=filter.ChangeFilter(branch='master'),
                            treeStableTimer=3*60,
                            builderNames=["Continuous"]))
c['schedulers'].append(Nightly(
                            name="Ninghtly",
                            branch='master',
                            hour=3,
                            minute=30,
                            onlyIfChanged=True,
                            builderNames=["Ninghtly"]))

Define a Build Factory

It contains the steps that a build will follow. In our case, we need to get source from repository, compile xunit.net projects and run the tests.

Get source from repository: Let’s do an incremental update for continuous build and compile debug; the full build will do a full repository checkout and compile using release configuration. The git factory needs to access git binaries, set the environment path to where you install it for example: C:\Program Files (x86)\Git\bin\.

Compile xunit.net and run tests: Let’s compile using MSbuild.exe and run the tests with xunit.console.exe. By setting flunkOnFailure=False/warnOnFailure=True, test failures will mark the overall build as having warnings.

from buildbot.process.factory import BuildFactory
from buildbot.steps.source.git import Git
from buildbot.steps.shell import ShellCommand

def MSBuildFramework4(factory, configuration, projectdir):
    factory.addStep(ShellCommand(command=['%windir%\\Microsoft.NET\\Framework\\v4.0.30319\\MSbuild.exe',
                                          'xunit.msbuild', '/t:Build', '/p:Configuration=%s' % configuration],
                                 workdir=projectdir, name="Compile xunit.net projects",
                                 description="Compiling xunit.net projects",
                                 descriptionDone="Copile xunit.net projects done"))

def RunTests(factory, testproject, configuration, testdirectory):
        factory.addStep(ShellCommand(command=['src\\xunit.console\\bin\\%s\\xunit.console.exe' % configuration,
                                '%s\\%s' % (testdirectory, testproject), "/html", "%s\\testresults.html" % testdirectory],
                                     workdir='build\\', name="Run %s" % testproject,
                                     description="Running %s" % testproject ,
                                     descriptionDone="Run %s done" % testproject,
                                     flunkOnFailure=False, warnOnFailure=True))

factory_continuous = BuildFactory()
factory_continuous.addStep(Git(repourl='https://git01.codeplex.com/forks/mariangemarcano/xunit',  mode='incremental'))
MSBuildFramework4(factory_continuous, 'Debug', 'build\\')
RunTests(factory_continuous, 'test.xunit.dll', 'Debug', 'test\\test.xunit\\bin\\Debug')
RunTests(factory_continuous, 'test.xunit.gui.dll', 'Debug', 'test\\test.xunit.gui\\bin\\Debug')

factory_ninghtly = BuildFactory()
factory_ninghtly.addStep(Git(repourl='https://git01.codeplex.com/forks/mariangemarcano/xunit', mode='full', method='clobber'))
MSBuildFramework4(factory_ninghtly, 'Release', 'build\\')
RunTests(factory_ninghtly, 'test.xunit.dll', 'Release', 'test\\test.xunit\\bin\\Release')
RunTests(factory_ninghtly, 'test.xunit.gui.dll', 'Release', 'test\\test.xunit.gui\\bin\\Release')

Builder Configuration


from buildbot.config import BuilderConfig

c['builders'] = []
c['builders'].append(
    BuilderConfig(name="Continuous",
      slavenames=["BuildSlave01"],
      factory=factory_continuous))
c['builders'].append(
    BuilderConfig(name="Ninghtly",
      slavenames=["BuildSlave01"],
      factory=factory_ninghtly))

Status notifications

You could also add MailNotifier to configure mail notifications.

c['status'] = []

from buildbot.status import html
from buildbot.status.web import authz, auth

authz_cfg=authz.Authz(
    # change any of these to True to enable; see the manual for more
    # options
    gracefulShutdown = False,
    forceBuild = True, # use this to test your slave once it is set up
    forceAllBuilds = False,
    pingBuilder = True,
    stopBuild = True,
    stopAllBuilds = False,
    cancelPendingBuild = True,
)
c['status'].append(html.WebStatus(http_port=8010, authz=authz_cfg))

Every commit done will trigger a continuous build that automatically compiles and runs tests. In the waterfall page you can see the people who has committed changes to the branch (view image bellow).

Buildbot-waterfall

Click on the person’s name to view commit’s detail information like repository, branch and revision. At the bottom there is a list of changed files. This is very helpful to track down any failures (view image bellow).

 change-detail

On my next blog post I’ll show you how to add customizations.

Tuesday, July 31, 2012

Buildbot Part I - Setting Up

Buildbot is a Python-based continuous integration system that consists of a buildmaster and one or more buildslaves, you can check it out on github.

The buildmaster decides what, when and how the system is build, it has a configuration file called master.cfg where the build process is defined. On the other hand, the buildslaves handle the execution of the commands and return results to the master.

The following is a graph (from buildbot’s wiki) shows buildbot architecture supported repositories and notifiers.

buildbot

Setting up


To setup on windows, the master and slaves require:
Additionally, master requires:
Install buildbotmaster (http://buildbot.googlecode.com/files/buildbot-0.8.6p1.tar.gz) and buildbotslave (http://buildbot.googlecode.com/files/buildbot-slave-0.8.6p1.tar.gz)

For this example, I installed both on same machine, if the installation went ok you should be able to check their versions:

buildbotsetup

The following steps creates and starts the master and slave:

Create BuildbotMaster

C:\Python27\Scripts>buildbot create-master -r C:\BuildMaster

Create BuildbotSlave

C:\Python27\Scripts>buildslave create-slave C:\BuildSlave localhost:9989 BuildSlave01 mysecretpwd

Start BuildbotMaster

C:\Python27\Scripts>buildbot start c:\BuildMaster

Start BuildbotSlave

C:\Python27\Scripts>buildslave start c:\BuildSlave

if everything went fine you should see a page like the following:

buildbotpage

You can check for errors on C:\BuildMaster\twistd.log and C:\BuildSlave\twistd.log.

Who is using buildbot:  Python, Mozilla, Google Chromium and others. The following shows Python 2.7 waterfall page:

PythonBuildbot

On my next post I’ll show you how to run automated builds and tests for a .net application hosted on git/codeplex using buildbot.

Wednesday, June 27, 2012

Virtualization using Proxmox VE

Proxmox Virtual Environment is an open source virtualization platform for running virtual machines with a web UI to manage the VMs and view resource usage. It allows 32/64 bit Windows, Linux and more.
 
I would like to share some information about Proxmox and a few common actions I learned on a previous project. Although I used an earlier version, I just installed version 2.1, here is an installation video. I found that the UI completely changed and more features were added, below a sample screen of the management UI showing the services of the host and another one with the VM guest console.

HostServices

proxmoxconsole

Location of VM and config files

VM images:
proxmox:/var/lib/vz/images# ls
101 102

VM configurations:
proxmox:/etc/pve/openvz# ls
100.conf

or
proxmox:/etc/pve/qemu-server# ls
101.config

Previous version:
proxmox:/etc/qemu-server#


Backup

The backup tool used for OpenVZ containers and KVM is called vzdump; if you are using the UI, first create a backup directory on Datacenter->Storage (using the new UI), and select 'Backup' as content for that storage. If you don’t have a storage created, you will get “Can’t use storage for backups – wrong content type” error.

backup
There are other options to backup an image with no downtime for example using the snapshot mode.


Create a vm image

You can create a vm via command line or using the web UI.

The following example creates windows 2003 vm with 3G of ram and using 1 core/socket CPU.

qm create 101 --name BuildSlave01 --vlan0 virtio=DA:28:61:58:71:5B --ide2 none,media=cdrom --bootdisk ide0
--ostype w2k3 --memory 3072 --sockets 1 --onboot no --cores 1 --virtio0 local:100,format=qcow2

Make sure to assign a new mac address every time a vm is created.


Restore from a backup

To restore use vzrestore for OpenVZ containers or qmrestore for KVM machines.

restorevm


Copy VM image

It is also possible to copy the vm disk image from one virtual machine to another, doing this will keep each vm configuration separate.
proxmox:/var/lib/vz/images# cp 101/vm-101-disk-1.qcow2 106/vm-106-disk-1.qcow2 


Extend vm disk size

The following example will add 20G to the vm.
qemu-img resize vm-101-disk-1.qcow2 +20G


You can use gparted to make this space available on windows server 2003 c:\ drive. First, upload the iso image on proxmox and then boot the vm from cd using uploaded gparted iso image.



Converting image formats

The qemu-img program can be used to convert images from one format to another. For example:
qemu-img convert -O qcow2 MyVmwareImage.vmdk MyProxmoxImage.qcow2