Whisper Assistant Extension (Voice to text)

Yeah, I have noticed this.

Simply changing the underlying package from Whisper → WhisperX will improve this dramatically.

WhisperX claims 70x realtime speed, i.e. a minute of audio would take about a second :exploding_head:
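
To put the 70x figure in concrete terms: a realtime factor of N means processing takes roughly the audio length divided by N. Here is a quick sketch of that arithmetic. The function name is made up for illustration, and real throughput will depend on your hardware and the model size.

```python
def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Estimated transcription time for a given realtime speedup factor.

    A speedup of 70 means 70 seconds of audio are processed per second of
    wall-clock time, so the cost is the audio length divided by the factor.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# One minute of audio at the claimed 70x realtime:
print(f"{processing_seconds(60, 70):.2f} s")  # prints: 0.86 s
```

So "about a second" holds: 60 s / 70 is a little under one second.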

I plan on implementing this soon, unless anyone else wants to tackle it and send a PR :slightly_smiling_face:

Oh, a 70x speedup would indeed be good enough! Let’s be realistic: I won’t have time to dive into it in the foreseeable future, and if you say it’s probably hard, I shouldn’t expect to get it working in a bored evening of hacking.

Yeah, if WhisperX lives up to what it promises it should really make this library much more usable.

I’m finishing a project over the next day or so; once it’s done, I’ll take a look at WhisperX.

Couldn’t get this working on Windows. The SoX portable package doesn’t appear to have been updated, so I couldn’t install it via Chocolatey. I tried the non-portable version, but it wasn’t recognised by the extension.

For anyone else, I did get this Chrome-based app/extension working, though. It works well: press the shortcut, say your thing, press the shortcut again, and it auto-pastes the text at the cursor.

Thanks for sharing this, it looks like a great project.

I’ve been looking at ways to make it easier to get started whilst not having to send recordings remotely.

It looks like faster-whisper-server makes this possible and also supports streaming. This means you wouldn’t need a complex setup to run Whisper locally; you’d just spin up a Docker container.
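
For anyone curious what talking to such a server could look like, here is a minimal sketch using only the Python standard library. It assumes (this is not confirmed in the thread) that the container listens on `localhost:8000` and exposes an OpenAI-compatible `/v1/audio/transcriptions` route that accepts a multipart upload with `file` and `model` fields. The function name, file name, and model name are hypothetical placeholders. The sketch only builds the request, so nothing is sent until you pass it to `urlopen`.

```python
import urllib.request
import uuid

# Assumptions, not confirmed in this thread: the container is reachable on
# localhost:8000 and serves an OpenAI-compatible transcription route.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str,
                                model: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with the audio file."""
    boundary = uuid.uuid4().hex
    # Text field telling the server which model to use.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field carrying the raw audio bytes.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=model_part + file_part + closing,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage, once the container is running (hypothetical file and model names):
# with open("note.wav", "rb") as f:
#     req = build_transcription_request(f.read(), "note.wav", "Systran/faster-whisper-small")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Building the multipart body by hand keeps the sketch dependency-free. With the `requests` library you would pass the audio via `files=` and the model via `data=` instead.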

Thanks for the feedback and alternative suggestions.

As mentioned, I will be looking into upgrading this extension soon and will update everyone on this thread once I have something ready.

As you’ve likely seen, that project uses WhisperX with Docker. I might try running that.

There is surprisingly little out there for easy, good-quality live voice transcription on the desktop (I think Mac has an app or two), so extensions like this would definitely be worthwhile for the VS Code/Cursor community.

Is WhisperX faster than Groq’s Whisper?

Not sure about speed, but the key difference is local vs. cloud (and if local, speed will depend on the model size and your computer’s specs).

Yes, local is necessary for me, for privacy but also for cost. I wouldn’t want to pay each time I transcribe.

Not sure if it’s relevant here, but advanced voice mode is now available in the API, and it supports tool use: https://openai.com/index/introducing-the-realtime-api/