So, I've recently begun to mess with Jasper, and I simply wasn't satisfied with the way the on-board speech-to-text engine handled what I said. So I decided to follow the Jasper project's advice and switch to the AT&T Speech API (AT&T calls it the "Speech API", so that's what it will be referred to from here on).
My main concern was that the Speech API is rate limited, but in practice it's very hard to run up against the limits: you're allowed 1 request per second, and each audio file must be less than a minute long. Jasper should have no problem with that. The main modification is to Jasper's mic.py file. To keep Jasper quick, he'll still use his built-in speech-to-text engine to listen for his name, and only after that will he use the Speech API.
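If you want Jasper to check those two limits before uploading anything, here's a minimal sketch. It's written for Python 3 (the rest of this post's code is Python 2, like Jasper at the time), and the names `wav_duration`, `can_upload`, and `RateLimiter` are my own, for illustration only; they aren't part of Jasper or the Speech API. The duration is read from the WAV header with the standard `wave` module.

```python
import time
import wave

MAX_SECONDS = 60.0  # audio must be under a minute
MIN_INTERVAL = 1.0  # at most 1 request per second


def wav_duration(path):
    """Return the length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def can_upload(path):
    """True if the recording is short enough for a single Speech API request."""
    return wav_duration(path) < MAX_SECONDS


class RateLimiter:
    """Sleeps as needed so that successive wait() calls are min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last = now
```

Calling `limiter.wait()` right before each upload keeps requests at least a second apart. The clock and sleep functions are passed in so the limiter can be tested without actually waiting.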
This article assumes you know a little about programming terms, such as what a method is and where it begins and ends. It also assumes you've set up an AT&T Developer account, created a Sandbox app for Jasper, and have its APP_ID and APP_SECRET.
Changing Mic.py
A few lines need to be changed or added in the mic.py file. This will be the only modification to the Jasper core.
At the very top, add an import for the speech handler:
from ATTEngine import ATTSpeech
After that, add an attribute to the Mic class. Change
speechRec = None
speechRec_persona = None
to
speechRec = None
speechRec_persona = None
attEngine = None
Then add a line to the very end of the Mic's __init__ method:
self.attEngine = ATTSpeech()
Now find the Mic's transcribe method and change
if MUSIC:
    self.speechRec_music.decode_raw(wavFile)
    result = self.speechRec_music.get_hyp()
elif PERSONA_ONLY:
    self.speechRec_persona.decode_raw(wavFile)
    result = self.speechRec_persona.get_hyp()
else:
    self.speechRec.decode_raw(wavFile)
    result = self.speechRec.get_hyp()
to
if MUSIC:
    self.speechRec_music.decode_raw(wavFile)
    result = self.speechRec_music.get_hyp()
elif PERSONA_ONLY:
    self.speechRec_persona.decode_raw(wavFile)
    result = self.speechRec_persona.get_hyp()
else:
    # Everything after the wake word goes to the Speech API, which takes the WAV file's path
    result = self.attEngine.getResults(audio_file_path)
Adding the ATTEngine.py file
In the same directory as mic.py, we'll need a new file called ATTEngine.py. It houses the methods that communicate with the Speech API. Create the file and add the following:
import json
import os
import socket
import subprocess
import urllib2

# getToken.sh and getResults.sh live in jasper/static, one level up from this file,
# so the path no longer depends on the directory Jasper was started from.
STATIC_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "static")

class ATTSpeech:
    CLIENT_ID = "FillMEIn"      # your APP_ID
    CLIENT_SECRET = "FillMEIn"  # your APP_SECRET
    TOKEN = None                # fetched on first use

    def __init__(self):
        pass

    def runScript(self, name, *args):
        # Run one of the helper scripts with sh and return its stdout (the JSON reply).
        script = os.path.join(STATIC_DIR, name)
        proc = subprocess.Popen(["sh", script] + list(args), stdout=subprocess.PIPE)
        return proc.communicate()[0]

    def getToken(self):
        # Get an access token via OAuth and keep it for later requests.
        reply = self.runScript("getToken.sh", self.CLIENT_ID, self.CLIENT_SECRET)
        self.TOKEN = json.loads(reply)["access_token"]

    def getResults(self, path, retry=True):
        if not self.haveInternet():
            return ["SPECIAL CASE I COULD NOT CONNECT TO THE INTERNET", None]
        if self.TOKEN is None:
            self.getToken()
        tmp = self.runScript("getResults.sh", self.TOKEN, path).upper()
        # Flatten the JSON reply into comma-separated "KEYVALUE" fields and scan them.
        for char in "\"{}[]:":
            tmp = tmp.replace(char, "")
        for strin in tmp.split(","):
            if "HYPOTHESIS" in strin:
                return [strin.replace("HYPOTHESIS", "").strip(), None]
            if "IOEXCEPTION" in strin:
                return ["SPECIAL CASE SPEECH API HAD AN ERROR", None]
            if "UNAUTHORIZED" in strin and retry:
                # Token expired or invalid: get a new one and retry once (no endless recursion).
                print("The current token is invalid, getting a new one...")
                self.getToken()
                return self.getResults(path, retry=False)
        return ["", None]

    def haveInternet(self):
        try:
            # Google IP: avoids a slow DNS lookup. 1 second timeout; adjust as needed.
            urllib2.urlopen("http://74.125.228.100", timeout=1)
            return True
        except (urllib2.URLError, socket.timeout):
            return False
Change the CLIENT_ID to your APP_ID
Change the CLIENT_SECRET to your APP_SECRET
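The engine finds the transcript by stripping JSON punctuation and splitting on commas, which cuts the transcript short if it contains a comma: "what time is it, jasper" would come back as just "WHAT TIME IS IT". Parsing the reply with Python's `json` module avoids that. This is a sketch in Python 3, and `extract_hypothesis` is a hypothetical helper; it assumes a successful reply nests the results as `Recognition` → `NBest` → a list of objects with a `Hypothesis` key, so check that against a real response before relying on it.

```python
import json


def extract_hypothesis(raw):
    """Return the best transcript in upper case (as Jasper expects), or None.

    ASSUMPTION: a successful reply looks like
    {"Recognition": {"NBest": [{"Hypothesis": "..."}]}}.
    """
    try:
        reply = json.loads(raw)
    except ValueError:  # not JSON at all, e.g. an empty reply from curl
        return None
    if not isinstance(reply, dict):
        return None
    recognition = reply.get("Recognition") or {}
    nbest = recognition.get("NBest") or []
    if not nbest:
        return None
    hypothesis = nbest[0].get("Hypothesis")
    return hypothesis.upper() if hypothesis else None
```

A `None` result means there was no transcript, so the error checks (IOEXCEPTION, UNAUTHORIZED) would still need to run on the raw reply.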
Scripts for the Engine
You'll notice the engine makes use of two shell scripts. They need to be located in the ./jasper/static directory; make them executable with chmod +x.
getToken.sh
#!/bin/bash
curl "https://api.att.com/oauth/token" \
--header "Content-Type: application/x-www-form-urlencoded" \
--header "Accept: application/json" \
--data "client_id=$1&client_secret=$2&scope=SPEECH&grant_type=client_credentials" \
--request POST
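If you'd rather not shell out to curl, the same token request can be built with Python 3's standard `urllib.request`. This is a sketch, not what the engine above does: `build_token_request` and `parse_token` are hypothetical helpers, and the request isn't sent here. The URL, headers, and form fields are copied from getToken.sh; `access_token` is the standard OAuth 2.0 field name for the token.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://api.att.com/oauth/token"


def build_token_request(client_id, client_secret):
    """Build (but don't send) the same POST that getToken.sh makes with curl."""
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "SPEECH",
        "grant_type": "client_credentials",
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_token(raw):
    """Pull the access token out of the OAuth JSON reply."""
    return json.loads(raw)["access_token"]
```

Passing the request to `urllib.request.urlopen` and feeding `.read()` to `parse_token` would do what getToken.sh plus the string slicing in `getToken` did.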
getResults.sh
#!/bin/sh
curl "https://api.att.com/speech/v3/speechToText" \
--header "Authorization: Bearer $1" \
--header "Accept: application/json" \
--header "Content-Type: audio/wav" \
--data-binary @$2 \
--request POST
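The speech request can be built the same way. Again this is a Python 3 sketch with a hypothetical helper name, `build_speech_request`; the URL and headers are taken from getResults.sh, and the WAV bytes are sent unchanged, as curl's `--data-binary` does.

```python
from urllib.request import Request

SPEECH_URL = "https://api.att.com/speech/v3/speechToText"


def build_speech_request(token, wav_path):
    """Build (but don't send) the same POST that getResults.sh makes with curl."""
    with open(wav_path, "rb") as f:
        audio = f.read()  # raw WAV bytes, uploaded as-is
    return Request(
        SPEECH_URL,
        data=audio,
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
```

`urllib.request.urlopen(build_speech_request(token, path)).read()` would return the JSON reply that the script prints.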
Adding a new module to catch errors
Using the Speech API introduces a few errors that Jasper could run into: occasionally the API returns a server error, or Jasper can't connect to the internet. When that happens, ATTEngine returns a special phrase that triggers a module to let you know what went wrong. Add the module below like any other:
Special.py
import re

WORDS = ["SPECIAL", "CASE"]

def handle(text, mic, profile):
    """
    Speaks the error message that follows "SPECIAL CASE".
    """
    text = text.replace("SPECIAL CASE", "", 1).strip()
    mic.say(text)

def isValid(text):
    """
    Returns True if the input is one of the engine's special-case error phrases.

    Arguments:
    text -- user-input, typically transcribed speech
    """
    return bool(re.search(r'\bSPECIAL CASE\b', text, re.IGNORECASE))
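To try the module without a microphone or a network connection, the sketch below feeds the engine's special phrases straight to `isValid` and `handle`, with a hypothetical `StubMic` class standing in for Jasper's mic object (only its `say` method is needed). The two functions follow Special.py; `handle` here removes the leading "SPECIAL CASE" and trims the leftover whitespace before speaking.

```python
import re

WORDS = ["SPECIAL", "CASE"]


def handle(text, mic, profile):
    # Speak whatever follows the "SPECIAL CASE" marker.
    mic.say(text.replace("SPECIAL CASE", "", 1).strip())


def isValid(text):
    return bool(re.search(r'\bSPECIAL CASE\b', text, re.IGNORECASE))


class StubMic:
    """Hypothetical stand-in for Jasper's Mic that records what it would say."""

    def __init__(self):
        self.spoken = []

    def say(self, phrase):
        self.spoken.append(phrase)


mic = StubMic()
for phrase in ["SPECIAL CASE SPEECH API HAD AN ERROR", "WHAT TIME IS IT"]:
    if isValid(phrase):
        handle(phrase, mic, profile={})
print(mic.spoken)  # prints ['SPEECH API HAD AN ERROR']
```

Only the special phrase reaches `handle`; an ordinary transcript like "WHAT TIME IS IT" is left for the other modules.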
And now Jasper is set up to use the Speech API, and he'll have much better recognition of words that aren't in his database, such as numbers or names. The possibilities are endless.
Hi there,
I've implemented your class in a very simple arecord Python script that records the file to be uploaded by your script, and I am receiving only the following error:
ERRORHTTPCODE500 HTTPRESPONSEMESSAGEINTERNAL SERVER ERROR ERRORTYPESPEECHENGINEEXCEPTION ERRORMESSAGEIOEXCEPTION OCCURRED WHILE CALLING WATSON ASR SERVICE. ['SPECIAL CASE SPEECH API HAD AN ERROR', None]
I have posted a question at stackoverflow:
http://stackoverflow.com/questions/24159867/500-error-with-att-att-speech-to-text-api-python
if you could give me a hand with this problem I would be very grateful.
Thanks,
Ben
Occasionally the API returns that error; I haven't found a way to fix it (yet) except to wait.
Hello, is it possible to have Jasper use AT&T's text-to-speech API?
Thanks,
Nathan
Hi Zach,
I am kind of a newbie here, so please pardon me if my questions are very basic.
Question 1:
==========
I have flashed my Raspberry Pi's SD card with the image linked from the Jasper documentation. There is no transcribe method in the mic.py file, and I am not able to locate the following lines of code in it:
if MUSIC:
    self.speechRec_music.decode_raw(wavFile)
    result = self.speechRec_music.get_hyp()
elif PERSONA_ONLY:
    self.speechRec_persona.decode_raw(wavFile)
    result = self.speechRec_persona.get_hyp()
else:
    self.speechRec.decode_raw(wavFile)
    result = self.speechRec.get_hyp()
In this case, what should I do?
Question 2:
=========
What do you mean by a Sandbox app set up for Jasper? If I don't have one, how do I set one up?
Thanks,
Sankar.
Hi,
I have completed the instructions but get the following error when I start Jasper:
File "/home/pi/jasper/client/mic.py", line 13, in
from ATTSpeech import ATTSpeech
ImportError: No module named ATTSpeech
I added the text "from ATTSpeech import ATTSpeech" under the last from / import entry "from stt import transcriptionMode".
Can you help?
Please have a look at:
https://github.com/jasperproject/jasper-client/pull/244/files