Installing Speech Recognition Libraries in ROS

October 13, 2016 | Adam Allevato

This post describes how to set up speech recognition to be integrated with ROS, using the Sphinx libraries and custom code developed by human-robot interaction researchers at UT Austin and elsewhere.

Platform: Ubuntu 14.04

ROS Version: Indigo

  1. Download the CMU Sphinx library from this link. Choose the link for “sphinxbase-5-prealpha.tar.gz”, which is the Ubuntu source. Extract the folder into an existing catkin workspace, in a special subdirectory for speech, say, ~/catkin_ws/src/speech/sphinxbase.

  2. In general, we will follow these instructions, but we’re not using the latest source, so our procedure is slightly different.
    • Enter your sphinxbase directory and run the following commands:

      ./autogen.sh
      ./configure
      make
      make check
      sudo make install
      
    • You will likely have to install some packages before the ./autogen.sh step succeeds. In my case, I had to run sudo apt-get install autoconf bison swig.

  3. Now we will follow similar steps for pocketsphinx. Download it from this link, choosing “pocketsphinx-5-prealpha.tar.gz”. Extract it to ~/catkin_ws/src/speech/pocketsphinx.

  4. Generally follow the directions here. Run these commands in the pocketsphinx directory:

    ./autogen.sh
    ./configure
    make
    make check
    sudo make install
    
  5. Now we will install the HLP-R package. Go up one directory to ~/catkin_ws/src/speech and execute the following command:

    git clone https://github.com/HLP-R/hlpr_speech.git
    

This will create the hlpr_speech directory, which contains a catkin package.
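
Before running ./autogen.sh in steps 2 and 4, it can save a failed build to check that the build tools are on your PATH. The following is a minimal sketch, not part of Sphinx or HLP-R: missing_tools is a hypothetical helper name, and it assumes each package installs a command of the same name (true for autoconf, bison and swig, the three packages mentioned above).

```shell
# Hypothetical helper: print each named command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools this post needed; extend the list if autogen.sh asks for more.
missing=$(missing_tools autoconf bison swig)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
    echo "Try: sudo apt-get install" $missing
fi
```

If nothing is printed, all three tools were found. Your system may still need other packages, so treat this as a quick first check only.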

  6. catkin_make your workspace.

  7. Source your workspace if you haven’t already.

  8. Set up your speech dictionary using CMU’s lmtool.

    • Go to the path ~/catkin_ws/src/speech/hlpr_speech/hlpr_speech_recognition/data and modify the files you find there. You can make new files if you wish, but you will have to modify the launch files to point to them. It’s easiest to just change the existing files.
    • Open kps.txt and put the commands you want to recognize in the file, one per line.
    • Save and upload kps.txt to the lmtool (link above), and press the COMPILE KNOWLEDGE BASE button.
    • After the files have been generated, download the XXXX.dic and XXXX.lm files. Rename them kps.dic and kps.lm, overwriting the existing files. You will have to build your own kps.yaml, using the existing file as a guide. kps.map is only necessary if you want to use the GUI (see below). If you need it, just copy the syntax of the existing .map file.
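
After renaming the downloaded files, a quick sanity check is to confirm that every word in kps.txt has an entry in kps.dic. The sketch below is an illustration, not part of HLP-R or lmtool. It assumes kps.txt contains only plain words separated by whitespace (no punctuation), and that kps.dic uses the usual CMU dictionary layout: one word per line followed by its phonemes, with alternate pronunciations written like HELLO(2). The function name missing_words is hypothetical.

```shell
# Hypothetical helper: list words used in the command file ($1)
# that have no entry in the pronunciation dictionary ($2).
missing_words() {
    awk '
        FILENAME == ARGV[1] {
            # Dictionary line: the first field is the word, the rest are phonemes.
            word = toupper($1)
            sub(/\([0-9]+\)$/, "", word)   # HELLO(2) -> HELLO
            known[word] = 1
            next
        }
        {
            # Command line: report each unknown word once, in upper case.
            for (i = 1; i <= NF; i++) {
                w = toupper($i)
                if (!(w in known) && !(w in seen)) {
                    seen[w] = 1
                    print w
                }
            }
        }
    ' "$2" "$1"
}

# Example (run from the hlpr_speech_recognition/data directory):
#   missing_words kps.txt kps.dic
```

No output means every command word is covered. Any word that is printed probably means the .dic file was generated from an older version of kps.txt, so re-upload the file to lmtool.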

You are now ready to run speech recognition! There are two launch files included with HLP-R, but you should only ever use one of them: speech_rec.launch, which is what you will want to use for actual listening. It publishes detected speech commands to a ROS topic.

By default, the node will launch a GUI that allows voice commands to be simulated by clicking buttons. If you don’t need the GUI, you can disable it with the parameter specified below. If you’re not using the GUI, you also don’t need to make a .map file, as the only purpose of the .map file is to determine the GUI buttons. To run the speech listener, run

   roslaunch hlpr_speech_recognition speech_rec.launch speech_gui:=false

You may get an error because pyaudio or some other package is missing. If pyaudio is the culprit, simply sudo apt-get install python-pyaudio and try again. Some people have also reported having to add /usr/local/lib to their LD_LIBRARY_PATH:

   export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
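
If you want this setting in your ~/.bashrc, it helps to add the directory only when it is missing. That way the entries already in LD_LIBRARY_PATH are kept (for example, the ROS library directories set when you source your workspace), and re-sourcing the file does not pile up duplicates. This is a minimal sketch; prepend_lib_path is a hypothetical helper name, not something provided by ROS or Sphinx.

```shell
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH
# unless it is already listed there.
prepend_lib_path() {
    lib_dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$lib_dir:"*)
            ;;  # already present, leave the variable untouched
        *)
            # Add the separating colon only if the variable was non-empty
            LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_lib_path /usr/local/lib
```

Calling the function a second time with the same directory leaves the variable unchanged.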

Happy speaking!