Speech Recognition Mod

Speech recognition on the Mac is something that I have toyed with on and off since it was first built into the Mac OS back in 1993. Having the computer recognize my voice definitely has its “cool factor”, but sometimes it’s just easier to use the mouse.

The other day, I was watching Extreme Makeover: Home Edition. The episode was particularly interesting to me, because the makeover was being done for a man who lost his sight. As usual, they did some really amazing things for this family’s home. One thing that got me curious was the system that they installed that allowed the man to say something like “Computer, what temperature is the house?” and “Set the house temperature to 75 degrees.” and have the computer respond and act on his voice.


At the end of the episode, they said you could go to their website and learn all about the products used in the show. I immediately checked it out, and learned that the system that they used was HAL by Home Automated Living.


After doing some research on HAL, I learned that this system was little more than a voice recognition front end for an X10 home automation system. Sure, it had some other features, but that was basically the foundation of it. Being an Apple guy, I thought “big deal”. I could do that with AppleScript, the built-in Speech Recognition software, and one of the Mac X10 software apps.


I guess now is a good time to mention that my wife, Marie, lost her sight back in 1997 to Optic Neuritis. She was diagnosed with MS in 1998. I have a strong background in computer programming, and I enjoy using my gifts to write applications that can make her life easier. For example, read about how I set things up to allow her to have access to our huge CD collection with Talking TitleTrack.


Getting excited again about voice recognition and how I can enable my wife to be able to do more on her own, I decided to write an AppleScript that will go out and get the weather, and then read it aloud. I dropped it in the Speakable Items folder, and now I can say, “Computer, weather report.” Very cool.
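
The shape of that script is simple: fetch the weather, turn it into a sentence, and hand the sentence to the text-to-speech engine. My script was written in AppleScript; what follows is only a rough Python sketch of the same idea, not the script itself. The `fetch_weather` function and its hard-coded values are placeholders (no real weather service is called), and the field names are made up for illustration. The only real tool it relies on is the `say` command that ships with Mac OS X.

```python
import shutil
import subprocess


def fetch_weather():
    """Placeholder: returns hard-coded data instead of calling a real weather service."""
    return {"conditions": "partly cloudy", "temp_f": 72, "high_f": 78}


def build_report(weather):
    """Turn the weather data into one sentence suitable for text-to-speech."""
    return (
        f"It is currently {weather['temp_f']} degrees and {weather['conditions']}, "
        f"with a high of {weather['high_f']} expected today."
    )


def speak(text):
    """Read text aloud with the macOS `say` command; print it on other systems."""
    if shutil.which("say"):
        subprocess.run(["say", text], check=True)
    else:
        print(text)


if __name__ == "__main__":
    speak(build_report(fetch_weather()))
```

Keeping the sentence-building step separate from the speaking step makes it easy to check the wording on screen before letting the computer read it aloud.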


Excited about my new script, I sat Marie in front of the computer, and had her say “Computer, weather report.” Nothing happened. She said it over and over, and nothing happened. I had her try one of the default commands, “Computer, what time is it?” Nothing. She tried over and over, getting more and more frustrated, and couldn’t get it to work. Of course, it would pick my commands up immediately, which annoyed Marie even more.


Now I remember why I didn’t write anything like this before. I tried using it with Marie a few years ago, and ran into the same problem. My Mac just doesn’t want to listen to Marie’s voice.


Instead of giving up on the whole idea, I tried to figure out why Marie’s voice couldn’t be recognized by my Mac. My wife’s voice is very young sounding. I searched the web, and found some information that led me to believe that higher pitched voices aren’t as easily understood by speech recognition applications. So, I asked Marie to try to talk in a deeper voice. After a few tries, the computer responded to her command! Very exciting!


Unfortunately, it is very difficult (and annoying) for Marie to speak in a lower voice every time she wants to tell the computer to do something. The computer didn’t pick it up every time, and she just sounds and feels silly talking that way. So I thought – I wonder if I can have Marie speak in her normal voice, and have the computer lower her voice programmatically before passing the audio into the Speech Recognition application? Then Marie could speak in her normal voice, but the computer would “hear” a lower pitched voice that it could hopefully understand.


At this point, I turned to the Mac audio gurus – Rogue Amoeba. These guys are amazing. Not only do they put out solid applications, but they’re very reasonably priced, and they have top notch support.


I posted a message on their forums describing my problem, and they quickly introduced me to Audio Hijack Pro. With this program, I can hijack audio from any input or even any application, apply any set of filters I want to it, and then make that altered audio available as a new audio channel (called Soundflower). This is exactly what I was looking for!


I downloaded the demo version of AHP, and set up a test. The first thing I did was to see if the Speech Recognition application would be able to work through the new audio channel that AHP created for me. That seemed to work just fine. So far so good. Then I needed to apply a filter to pitch the sound down. At first I tried Apple’s new AUPitch control. Unfortunately, that thing’s a beast, and brought my Pismo down to a crawl. There’s no way my G3/500 is going to handle that kind of work.


I posted my results to Rogue Amoeba’s forums, and they pointed me to an older plug-in called MadShifta. It’s a freeware plug-in written by Tobybear Productions and SmartElectronix.


I downloaded and installed the plug-in. I configured Audio Hijack Pro to use the MadShifta plug-in and set the pitch control. Much better performance with this plug-in compared to the AUPitch control. I sat my wife in front of my computer, and crossed my fingers.


“Computer, what time is it?” Worked on the first try!


“Computer, weather report?” Worked on the first try!


The hack/mod worked perfectly!


Knowing that Marie will now be able to use the Speech Recognition application, I am very excited about writing more Speakable Items for her, and eventually hooking an X10 component up here and there, too.


A very special thank you goes out to Rogue Amoeba for your guidance. Thanks to Tobybear Productions and SmartElectronix, too, for your free MadShifta plug-in.


Here are the details for this hack/mod:


  • Install Audio Hijack Pro from Rogue Amoeba
  • Install MadShifta into ~/Library/Audio/Plug-Ins/VST
  • From Audio Hijack Pro, choose Install Extras, and install Soundflower
  • From Audio Hijack Pro, create a new Session
  • Set that session’s Audio Source to “Audio Device”
  • Set the Input Device to your microphone
  • Set the Output Device to “Soundflower (2ch)”
  • Click on Effects
  • Insert the VST Effect -> MadShifta
  • In the MadShifta settings, adjust the Tune to -2. Your settings may vary.
  • Launch System Preferences -> Speech
  • Enable Speakable Items
  • Set the Microphone to “Soundflower (2ch)”
  • Adjust other settings in this dialog as appropriate
  • Speech Recognition should now “hear” the voice shifted in real-time by the MadShifta plug-in through Audio Hijack Pro!
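
If you are wondering how far to move the Tune control, it can help to see what a given setting does to the pitch. The sketch below assumes the Tune value is measured in semitones (I believe that is how MadShifta labels it, but treat it as an assumption), and uses the standard equal-temperament rule that each semitone changes the frequency by a factor of 2^(1/12). The 220 Hz starting pitch is just a made-up example of a higher speaking voice, not a measurement.

```python
def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(freq_hz, semitones):
    """Frequency in Hz after shifting freq_hz by the given number of semitones."""
    return freq_hz * pitch_ratio(semitones)


if __name__ == "__main__":
    example_pitch_hz = 220.0  # hypothetical speaking pitch, for illustration only
    for tune in (-1, -2, -3, -4):
        new_pitch = shifted_frequency(example_pitch_hz, tune)
        print(f"Tune {tune:+d}: {example_pitch_hz:.0f} Hz -> {new_pitch:.1f} Hz")
```

Under that assumption, a Tune of -2 lowers the pitch by roughly 11 percent. That is a fairly gentle shift, so it is worth stepping down one notch at a time until recognition becomes reliable.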
