Cursor has been a huge productivity boost, and I’m easily writing code 50-100% faster now, to the point where the main bottleneck is how fast I can express my ideas. With typing, that isn’t very fast.
I’ve seen several mentions of people using speech-to-text tools, but most solutions seem to be Mac-only or stuck behind a waiting list. As a Windows dev, I have Windows Voice Typing (Win+H), but it’s not reliable enough and doesn’t work with my way of thinking out loud, which involves a lot of pauses.
Long story short, I decided (with the help of Cursor) to build myself a minimal desktop app in Python for this, and was encouraged to open source it after I shared it with others. Features:
Uses OpenAI Whisper for high-accuracy transcription
Works in ANY text field or editor (including Cursor chat/composer)
Lets you navigate to other windows while continuing to record
Activates globally with the previously useless Caps Lock key
Keeps your recent transcriptions in the system tray
No waitlists, no subscriptions, no platform lock-in. Just clone, add your API key, and start coding with your voice.
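A minimal sketch of how the toggle-to-record and recent-transcriptions behavior described above could be structured. This is an illustration only, not necessarily how the app is implemented: the class name DictationSession, the transcribe callback, and the default history size of 5 are assumptions. A real app would call on_hotkey from a global Caps Lock hook and feed from a microphone stream.

```python
from collections import deque


class DictationSession:
    """Toggle-style recorder state with a short history of transcriptions."""

    def __init__(self, transcribe, history_size=5):
        # transcribe is a hypothetical callable: raw audio bytes -> text
        self.transcribe = transcribe
        self.recording = False
        self.chunks = []  # audio captured during the current take
        self.recent = deque(maxlen=history_size)  # newest first

    def on_hotkey(self):
        """Call on each hotkey press; returns the text when a recording ends."""
        if not self.recording:
            # First press: start a fresh recording.
            self.recording = True
            self.chunks = []
            return None
        # Second press: stop, transcribe, and remember the result.
        self.recording = False
        text = self.transcribe(b"".join(self.chunks))
        self.recent.appendleft(text)
        return text

    def feed(self, chunk):
        """Append captured audio; ignored unless a recording is active."""
        if self.recording:
            self.chunks.append(chunk)
```

Because the recording state lives in the session rather than in any window, switching focus to another app does not interrupt a take, and the bounded deque drops the oldest entry once the history is full.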
Just made some significant updates to this app based on user feedback.
The latest version (v0.6.1) is pretty easy to set up and use, but the biggest improvement has been switching to the OpenAI gpt-4o-transcribe model. Transcription is now extremely good quality, probably the best you can get, and also nearly instant after you stop recording.
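For reference, a transcription request with the official openai Python package looks roughly like the sketch below. The helper name transcribe_file and the file name recording.wav are assumptions for illustration, and the app's own code may be organized differently. The client is passed in so the helper can be tried with a stub before spending API credits.

```python
def transcribe_file(client, path, model="gpt-4o-transcribe"):
    """Send one recorded audio file to the transcription endpoint and return its text."""
    with open(path, "rb") as audio:
        # client is expected to be an openai.OpenAI instance (or a stand-in with the same shape)
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text.strip()


# Example usage (assumes `pip install openai` and OPENAI_API_KEY set in the environment):
#
#   from openai import OpenAI
#   print(transcribe_file(OpenAI(), "recording.wav"))
```

The whole recording is uploaded in one request after you stop talking, which is why long pauses while thinking out loud do not cut the dictation short.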
I made a better voice typing app because apps like Wispr Flow were Mac only for a while. My project is essentially a light UI wrapper around the best speech-to-text APIs, so you just need to bring your own API key, with no monthly plan.
It’s Windows only, but it works with any app that has a text box: you speak, and it inserts what you said with extremely high accuracy.