I’m a Cursor Pro subscriber, and ever since I started using the IDE I’ve thought about how powerful it would be to use my voice alongside Cursor’s GPT-4 code generator. I couldn’t find this available anywhere else, so I created it myself!
The GitHub repo is here, if you’d like to see the code:
And the link to the VSCode extension is here:
It’s free to use, as it runs on a local version of Whisper. All instructions are outlined in the repo and the extension readme.
It’s only been tested on a Mac, so I’m open to feedback if you notice any issues. I’ve been coding with this for around 2 weeks and it’s definitely improved my Cursor experience!
Unfortunately, this plugin doesn’t work for me. I am running Cursor on Windows and have installed whisper (I can run it from the command line). However, the plugin keeps telling me Whisper is not installed. It’s in my PATH too.
The check for whisper just looks for the whisper command on the command line.
Run ‘whisper -v’ in a terminal to make sure it’s accessible. If it’s there, try restarting Cursor and trying again.
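If it helps with debugging, here is a minimal sketch of that kind of check in Python. It is not the extension’s actual code; the function names and the use of `--help` as a harmless test argument are my own assumptions. The point is that the lookup uses the PATH of the process doing the check, and an app launched from the desktop can inherit a different PATH than your terminal.

```python
import shutil
import subprocess


def find_command(name="whisper"):
    """Return the full path to `name` as seen by THIS process's PATH, or None."""
    # shutil.which also honours PATHEXT on Windows (.exe, .cmd, ...).
    return shutil.which(name)


def command_runs(name="whisper", args=("--help",), timeout=60):
    """Return True if the command is found and exits with status 0.

    `--help` is only an assumed harmless argument for this sketch.
    """
    path = find_command(name)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("whisper found at:", find_command("whisper"))
```

If this prints a path in your terminal but the extension still reports Whisper as missing, the two are probably seeing different PATH values.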
It is accessible. As I said, I was able to use it via the command line to transcribe some audio I recorded. I also tried restarting both Cursor and the terminal.
Oh, I need to star this repo then! I wanted to see how hard it would be to stream the recording and start decoding before we finish speaking. Would that come with the WhisperX implementation?
That would be great. I had a go at this originally but ran out of time to get it working smoothly with SoX.
Unless I am mistaken, I don’t think you can stream using Whisper. You may have to chunk the audio and get the translation / response for each segment. If that is the only approach that works, it would work with WhisperX too (if needed).
If you have the urge, see if there is another approach that would stream the recording with the original Whisper release. It may mean we don’t need to use WhisperX at all!
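To make the chunking idea concrete, here is a rough sketch in Python. It only shows how a buffer of audio samples could be split into fixed-length, slightly overlapping segments; the function name, the default lengths, and the overlap are assumptions of mine, not anything from the extension or from Whisper itself. Each segment would then be handed to whatever transcription call you use.

```python
def chunk_samples(samples, sample_rate, chunk_seconds=5.0, overlap_seconds=0.5):
    """Split a sequence of audio samples into overlapping fixed-length chunks.

    The overlap is there so a word cut at a chunk boundary appears whole
    in at least one of the two neighbouring chunks.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    step = chunk_len - int(overlap_seconds * sample_rate)
    if chunk_len <= 0 or step <= 0:
        raise ValueError("chunk length must be positive and longer than the overlap")

    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + chunk_len])
        # Stop once this chunk reaches the end of the recording.
        if start + chunk_len >= len(samples):
            break
        start += step
    return chunks


if __name__ == "__main__":
    # Toy example: 10 "samples" at 1 Hz, 4-second chunks with 1 second of overlap.
    for chunk in chunk_samples(list(range(10)), sample_rate=1,
                               chunk_seconds=4, overlap_seconds=1):
        print(chunk)
```

Shorter chunks reduce the wait before the first text appears, but give the model less context per segment, so the chunk length is a trade-off you would have to tune.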
I have the same issue. Regardless, the transcription is so slow, even for single-sentence recordings, that it unfortunately doesn’t seem worth using.