Thursday, June 5, 2014

Making Jasper use the AT&T Speech API

So, I've recently begun to mess with Jasper, and I simply wasn't satisfied with the way the on-board Speech to Text engine handled what I said. So, I decided to follow the Jasper project's advice and transition over to using the AT&T Speech API (AT&T just calls it the "Speech API," so that's what it will be referred to as from here on).

 The main concern I have is that the Speech API is rate-limited, but it's very difficult to run up against the limit. You're limited to 1 request per second, and your audio file must be less than a minute in length; Jasper should have no problem with that. The main modification comes to Jasper's "mic.py" file. In order to keep Jasper quick, he'll use his in-house Speech to Text engine to detect his name, but after that, he'll use the Speech API.
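Speaking of that rate limit: if you ever do want a belt-and-braces guard against the 1-request-per-second cap, a tiny wrapper is enough. This is just a sketch (the RateLimitedEngine name is mine, and Jasper doesn't strictly need it), but it shows the idea:

import time

class RateLimitedEngine:
    #Wraps any engine with a getResults(path) method and spaces requests out
    #so they're never closer than min_interval seconds apart.
    def __init__(self, engine, min_interval=1.0):
        self.engine = engine
        self.min_interval = min_interval
        self.last_request = 0.0

    def getResults(self, path):
        wait = self.min_interval - (time.time() - self.last_request)
        if wait > 0:
            time.sleep(wait)
        self.last_request = time.time()
        return self.engine.getResults(path)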

This article does assume you know a little bit about programming, such as what a method is and where it begins and ends. It also assumes you've set up an AT&T Developer Account (Sign up here), that you have a Sandbox app set up for Jasper, and that you have its APP_ID and APP_SECRET.


Changing Mic.py

A few lines need to be changed/added to the mic.py file; this will be the only modification to the Jasper core...

At the very top, you'll need to add an import for the Speech handler; just add

from ATTEngine import ATTSpeech

After that, you'll need to add a variable to the Mic class. Change

    speechRec = None
    speechRec_persona = None

to

    speechRec = None
    speechRec_persona = None
    attEngine = None


Then, we have to add a line to the mic's __init__ method. Add this to the very end of it.

    self.attEngine = ATTSpeech()

Now, find the mic's transcribe method, and change

    if MUSIC:
        self.speechRec_music.decode_raw(wavFile)
        result = self.speechRec_music.get_hyp()
    elif PERSONA_ONLY:
        self.speechRec_persona.decode_raw(wavFile)
        result = self.speechRec_persona.get_hyp()
    else:
        self.speechRec.decode_raw(wavFile)
        result = self.speechRec.get_hyp()

to



    if MUSIC:
        self.speechRec_music.decode_raw(wavFile)
        result = self.speechRec_music.get_hyp()
    elif PERSONA_ONLY:
        self.speechRec_persona.decode_raw(wavFile)
        result = self.speechRec_persona.get_hyp()
    else:
        result = self.attEngine.getResults(audio_file_path)


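One thing worth noting: getResults (added below) hands its transcription back as a two-element list, [transcription, None], so it can stand in for the PocketSphinx hypothesis that the rest of transcribe already reads as result[0]; nothing after this block needs to change. While you're testing, a throwaway debug line right after the if/elif/else makes it easy to see what the Speech API is actually returning (remove it once you're happy):

    print "Transcription result: " + str(result) #Temporary, just for testing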

Adding the ATTEngine.py file

In the same directory as mic.py, we'll need a new file called ATTEngine.py. It houses the methods that communicate with the Speech API. Create the file and add the following:

import subprocess
import urllib2

class ATTSpeech:
    CLIENT_ID = "FillMEIn"
    CLIENT_SECRET = "FillMEIn"
    TOKEN = "None"
    def __init__(self):
        pass

    def getToken(self):
        #Get Access Token via OAuth. The script prints the raw JSON response.
        proc = subprocess.Popen(["../static/getToken.sh", self.CLIENT_ID, self.CLIENT_SECRET], stdout=subprocess.PIPE)
        tmp = proc.stdout.read()
        #Strip the JSON punctuation and keep the first value, which is the
        #32 character access token.
        tmp = tmp.replace("\"","").replace("{","").replace("}","").split(":")[1]
        self.TOKEN = tmp[:32]

    def getResults(self,path):
        #Bail out early with a spoken error if there's no internet connection.
        if not self.haveInternet():
            return ["SPECIAL CASE I COULD NOT CONNECT TO THE INTERNET",None]

        #Post the recording to the Speech API and read back the raw JSON response.
        proc = subprocess.Popen(["sh","../static/getResults.sh",self.TOKEN,path], stdout=subprocess.PIPE)
        tmp = proc.stdout.read()
        #Strip the JSON punctuation and scan the fields for the transcription.
        for strin in tmp.upper().replace("\"","").replace("{","").replace("[","").replace(":","").replace("}","").replace("]","").split(","):
            print strin
            if "HYPOTHESIS" in strin:
                strin = strin.strip().replace("HYPOTHESIS","").strip()
                return [strin,None]
            if "IOEXCEPTION" in strin:
                return ["SPECIAL CASE SPEECH API HAD AN ERROR",None]
            if "UNAUTHORIZED" in strin:
                print "The current token is invalid, getting a new one..."
                self.getToken()
                return self.getResults(path)
        return ["",None]

    def haveInternet(self):
        try:
            response=urllib2.urlopen('http://74.125.228.100',timeout=1) #Google IP. Prevents a long DNS look up for a url. 1 second timeout. Adjust as needed
            return True
        except urllib2.URLError as err: pass
        return False



Change the CLIENT_ID to your APP_ID
Change the CLIENT_SECRET to your APP_SECRET
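
A side note on getToken: the string mangling works because the OAuth response is JSON with the access token as its first field, but it's a little fragile. If you'd rather parse the response properly, add import json to the top of ATTEngine.py and swap getToken for something like this sketch (same script, same arguments; I'm assuming the token comes back under the standard OAuth "access_token" key, which is the same field the hand-rolled parsing grabs):

    def getToken(self):
        #Get Access Token via OAuth, parsing the response as JSON instead of
        #stripping characters by hand.
        proc = subprocess.Popen(["../static/getToken.sh", self.CLIENT_ID, self.CLIENT_SECRET], stdout=subprocess.PIPE)
        response = json.loads(proc.stdout.read())
        #If the request failed there won't be an access_token field; fall back
        #to "None" so getResults will fetch a fresh token on the next
        #UNAUTHORIZED reply.
        self.TOKEN = response.get("access_token", "None")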

Scripts for the Engine

You'll notice the engine makes use of two shell scripts. They need to be located in the ./jasper/static directory. Since getToken.sh is invoked directly rather than through sh, make sure it's executable (chmod +x getToken.sh).

getToken.sh 
#!/bin/bash
curl "https://api.att.com/oauth/token" \
    --header "Content-Type: application/x-www-form-urlencoded" \
    --header "Accept: application/json" \
    --data "client_id=$1&client_secret=$2&scope=SPEECH&grant_type=client_credentials" \
    --request POST


getResults.sh
#!/bin/sh
curl "https://api.att.com/speech/v3/speechToText"  \
        --header "Authorization: Bearer $1" \
        --header "Accept: application/json" \
        --header "Content-Type: audio/wav" \
        --data-binary @$2 \
        --request POST
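
Before wiring everything into Jasper, it's worth checking that the scripts and the engine work on their own. A quick throwaway test script would be something like the sketch below; run it from wherever the relative path ../static/ resolves correctly (the same working directory Jasper normally uses), and point it at any short WAV recording you have handy (test.wav below is just a placeholder):

from ATTEngine import ATTSpeech

engine = ATTSpeech()
engine.getToken() #Fetch an OAuth token via getToken.sh
print engine.getResults("test.wav") #Should print the transcription list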

Adding a new module to catch errors

The Speech API adds a few new errors that Jasper could run into: occasionally the API returns a server error, or Jasper can't connect to the internet. In the event that happens, the ATTEngine will return a special phrase that triggers a module to let you know what went wrong. The module should be added like any other module, and it's shown below.

Special.py

import re

WORDS = ["SPECIAL","CASE"]


def handle(text, mic, profile):
    #Strip out the trigger words and speak whatever error message is left.
    text = text.replace("SPECIAL","").replace("CASE","").strip()
    mic.say(text)

def isValid(text):
    """
        Returns True if the input contains the special error phrase
        returned by the ATTEngine.

        Arguments:
        text -- user-input, typically transcribed speech
    """
    return bool(re.search(r'\bSPECIAL CASE\b', text, re.IGNORECASE))
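
If you want to convince yourself the module fires when it should, you can poke at isValid from a Python shell (run it from the modules directory so the import resolves; the sample phrases below are just the error strings the engine returns):

from Special import isValid

print isValid("SPECIAL CASE SPEECH API HAD AN ERROR") #True
print isValid("WHAT TIME IS IT") #False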


And now Jasper is set up to use the Speech API and will have much, much better recognition of words that aren't in his database, such as numbers or names. The possibilities are endless.