I hate it when things in a project go too well. It usually means that things are due to go not very well in a short while. I’ve got my Large Language Model running on a Pi 5 and I thought I’d use this to create “The Exchange”. This will work with “The Red Telephone”. The idea is that you pick up the receiver on your telephone and dial a “3”. A robotic voice asks you to state your business. You give your question and then put the receiver down. After a decent interval “The Exchange” rings back with the answer.
To make this work the phone needs to capture the caller’s audio, use speech to text to get the question text and send this to the Pi 5 running the Large Language Model. The Pi 5 will respond with the answer and I can use the text to speech in the red phone to deliver it. Sounds simple enough.
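The flow described above can be sketched as a chain of steps. This is only an illustration, not the code in the phone: every stage is passed in as a function, and the names (`handleCall`, `recordQuestion`, `speechToText`, `askModel`, `ringBack`, `speak`) are made up for the example rather than taken from any library.

```javascript
// Sketch of the question and answer flow for "The Exchange".
// All stage names are hypothetical; each one is supplied by the caller.
async function handleCall(stages) {
  const { recordQuestion, speechToText, askModel, ringBack, speak } = stages;

  const audioPath = await recordQuestion();       // caller speaks, then hangs up
  const question = await speechToText(audioPath); // run on the recorded file afterwards
  const answer = await askModel(question);        // ask the Large Language Model on the Pi 5
  await ringBack();                               // ring the phone once the answer is ready
  await speak(answer);                            // text to speech delivers the answer
  return { question, answer };
}
```

Passing the stages in keeps each one replaceable, so a slow or broken step (such as the recorder) can be swapped for a stand-in while the rest is tested.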
I’ve found this lovely library which can run on the Raspberry Pi Zero in the phone and convert speech to text. It’s a bit slow, but I don’t care about that because I can record the question and then use speech to text on the recorded sound file after the user has rung off. So the next thing I need to do is find something I can use to record audio into the Pi. Up pops https://www.npmjs.com/package/node-record-lpcm16 and that works a treat too. At this point my spidey sense is tingling a bit because things are going too well.
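A minimal sketch of recording to a file might look like the following. It assumes the interface shown in the node-record-lpcm16 README, where `record(options)` returns an object with `stream()` and `stop()`. The helper name `recordToFile` and the fixed duration are inventions for this example; the real phone would stop recording when the receiver goes down.

```javascript
const fs = require('node:fs');

// `recorder` is assumed to be the module from require('node-record-lpcm16').
// It is passed in as a parameter so a stand-in can be used when testing.
function recordToFile(recorder, filePath, durationMs, options = {}) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(filePath);
    const recording = recorder.record({ sampleRate: 16000, ...options });
    let stopping = false;

    recording.stream()
      // Ignore errors raised while we are deliberately shutting down.
      .on('error', (err) => { if (!stopping) reject(err); })
      .pipe(file);

    file.on('finish', () => resolve(filePath));
    file.on('error', reject);

    // Stop after a fixed time (an assumption made for this sketch).
    setTimeout(() => {
      stopping = true;
      recording.stop();
    }, durationMs);
  });
}
```

It could be called as `recordToFile(require('node-record-lpcm16'), 'question.wav', 10000)`, which should leave ten seconds of audio in `question.wav` for the speech to text step to work on later.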
So I start to build the program. I write the code that tells the user to state their question and then records their response. It’s bound to work because I’ve tested it. But of course it doesn’t. The speech playback (using eSpeak) works a treat but the audio recorder fails because it can’t find the input device. Everything works fine individually. The only time it fails is when I ask it to do what I want it to do. I get this a lot when writing software.
I do have a fix though. If I run the whole application as a super-user it works. I’ve no idea what the speech generator is doing with the sound device, but giving the sound recorder awesome system powers seems to enable it to find a sound input device and make a recording.
I’ve spent a bit more time investigating the problem. I’ve added a timeout after the speech output finishes to give it time to release resources. I’ve tried different device names rather than the default one. But nothing works.
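The two experiments above (pause after the speech, then try several device names) can be sketched like this. It is an illustration of the approach rather than a fix, since neither step cured the problem. `recordWithFallback` and `tryRecord` are hypothetical names, and the device names in the usage note are only examples of the ALSA style; the right ones depend on the sound hardware.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// tryRecord(device) is a caller-supplied function that attempts a recording
// on the named device and rejects if the device cannot be opened.
async function recordWithFallback(tryRecord, devices, pauseMs = 500) {
  // Give the speech output time to let go of the sound hardware.
  await sleep(pauseMs);

  const failures = [];
  for (const device of devices) {
    try {
      return { device, result: await tryRecord(device) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${device}: ${reason}`);
    }
  }
  // Report every attempt so the failing devices are visible in one place.
  throw new Error('No input device worked:\n' + failures.join('\n'));
}
```

A call such as `recordWithFallback(tryRecord, ['default', 'plughw:1,0', 'hw:1,0'])` tries each name in turn. Collecting the failure messages makes it easier to see whether every device fails in the same way, which is a useful clue when the only difference between working and not working is running as super-user.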
It’s not a huge problem if the application has to run as super-user I suppose (although I’m not a fan of this approach). And I consider “Because it works that way” a perfectly reasonable answer to the question “Why have you done it that way?”. So I’m going to pop the question on the back burner for now and carry on.