Voice input button starts audio context but receives silent stream — level meter dead, no transcription (macOS, fresh install)

Has anyone figured out a fix for this yet? The voice input is still not working well. The transcript is not accurate, sometimes disappears, adds multiple “-” between words, and splits words incorrectly. It’s really difficult to work at speed like this. I am on ULTRA plan.

@dato2000, this isn’t the same issue as at the start of the thread. The original bug (the Baseten outage) was fixed server-side yesterday. Dictation is working for you now; it’s just that the output is broken.

What you’re describing (text jumping around, words getting cut off like “ba--” / “sh--”) sounds like what @h-audeto reported above in #39 and #42. I’ve already reported it internally and it’s being tracked separately. No ETA yet.

To help strengthen the ticket, can you share:

  • Cursor version (Help > About)
  • Layout: editor or Glass/unifiedAgent (Cursor Settings > General > Layout)
  • Which mic you’re using (built-in, USB, or Bluetooth)
  • Does it happen on every dictation, or only longer ones (15+ sec)?
  • About censoring bad words: can you give an example in the format “said X, got Y”? I want to figure out if the STT provider is censoring or if it’s something else.

If you can make a short screen recording like @h-audeto in #42, that’d be perfect. No audio needed, just so we can see the text getting rewritten.

@Luisa, yep, I can reproduce it. As soon as there’s any content in the input (like a file reference or text), the mic gets replaced by the send arrow, and you can’t record voice anymore. This is a known UX issue and it’s being tracked separately. The request to keep both controls visible at the same time is already on the radar. I can’t share an ETA yet.

To avoid mixing this with the original Baseten thread, could you open a separate thread for this UX bug? It’ll be easier to track this issue separately from the transcription history.

@Lukas_Prokop, your case (text with hyphens where words get split) is the same issue I mentioned in #48. I’ve already reported it and it’s tracked separately. If you can, please share your Cursor version, your layout (editor or unifiedAgent), and which mic you’re using. That’ll help strengthen the ticket.

Thanks @DeanRie. Honestly I’m not sure it’s a separate issue — from my side it feels like it could still be related to the original thing, since the symptoms started in the same window and I don’t have a clean way to tell them apart. But I’ll defer to you on the triage since you can see what was fixed server-side yesterday vs what’s still open.

Either way, info you asked for:

  • Cursor version: 3.3.27

  • Layout: editor

  • Mic: Built-in MacBook Pro microphone (macOS default input). Teams and Zoom virtual audio devices are installed but not active.

  • Frequency: Happens on most dictations, more visible on longer ones (15+ sec). Short utterances sometimes look fine, probably just because there’s less time for the rewrite-and-delete loop to show up.

  • Example of the garbling: dictation I just did, pasted unedited:

So I’m still experiencing issues with the voice agent and actually sometimes- sometimes during- plan mode, but also just- in general. Oh, yeah. Even general. Sometimes it- doesn’t translate at all. Some- not sure it’s completely- separate from the pro- prior issue. 'Cause- right now- it feels like- it’s typing- sometimes. Then filling- in. Then deleting. Then- filling- in. Then deleting. And then- if I- pause for- a little bit- sometimes- it’s all- there. Sometimes- it’s not.

Pattern: words clipped mid-syllable (“pro- prior”, “Some-”), dash/hyphen artifacts where text gets committed and walked back, and stretches where nothing lands at all. If I pause for a few seconds it sometimes catches up — sometimes it doesn’t.
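In case it helps with collecting “said X / got Y” examples, here is a minimal sketch for counting the clipped tokens in a pasted transcript. The function name `findClippedTokens` is hypothetical and this is not anything Cursor ships; it only flags tokens that end in a dash, so ordinary hyphenated words such as “built-in” are left alone.

```typescript
// Hypothetical helper, not part of Cursor: lists tokens that end in a dash,
// e.g. "pro-" or "Some-", which is how the clipped words show up above.
function findClippedTokens(transcript: string): string[] {
  // Letters and apostrophes (straight or curly), followed by one or more
  // trailing hyphen, en dash, or em dash characters.
  const clipped = /^[A-Za-z'\u2019]+[-\u2013\u2014]+$/;
  return transcript
    .split(/\s+/)
    .filter((token) => clipped.test(token));
}

const sample =
  "Some- not sure it\u2019s completely- separate from the pro- prior issue.";
console.log(findClippedTokens(sample));
```

Dividing the number of clipped tokens by the total token count gives a rough per-dictation artifact rate, which makes it easier to compare short and long dictations.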

  • Censoring: no clean “said X / got Y” example from this session — none of the dropped fragments above are profane. I’ll grab one next time it happens.

I’ll try to capture a screen recording next time I hit it.

Resolved on my end. Thanks for the help!

The user experience of the voice input after the latest fix is pretty poor. It seems the logic has completely changed: it no longer picks up the voice input accurately, keeps rewriting my sentences, misspells words, etc. For now I have stopped using the voice input button in Cursor and started using ‘Wispr Flow’ as a temporary workaround. The voice input was one of my favorite features. Hopefully this will be resolved soon.

@dato2000, thanks for the details and the example. This kind of summary really helps. Confirming this is separate from the original bug in this thread. That one was a server-side regression where transcription didn’t return at all, and we’ve already shipped a fix. What you’re seeing now is a different class of issue: progressive STT commits text, then rewrites it, splits words mid-syllable (“pro- prior”, “Some-”), and sometimes drops whole chunks. Different root cause, it just showed up right after the first fix, so it feels related.

I’ll add your example (3.3.27, macOS, editor, built-in mic, garbling on longer dictations) to the existing ticket for this issue. No ETA yet, I’ll share an update when I have one.

If you can catch an example of censoring in the format “said X / got Y”, or a short screen recording, please send it over. That’ll also go into the ticket.

@artsvuni, based on your description it sounds like the same issue. Your report is included too. If you can share your version / keyboard layout / mic, that’ll help strengthen the signal.

Version: 3.2.21 (Universal)
VSCode Version: 1.105.1
Commit: 806df57ed3b6f1ee0175140d38039a38574ec720
Date: 2026-05-03T01:46:14.413Z
Layout: editor
Build Type: Stable
Release Track: Default
Electron: 39.8.1
Chromium: 142.0.7444.265
Node.js: 22.22.1
V8: 14.2.231.22-electron.0
OS: Darwin arm64 25.3.0

Keyboard layout: British.
Mic: MacBook Pro mic (built-in)

Hope this helps.

Hello. The same issue is back. I click the voice button and it doesn’t respond to my speech. The animation on the voice button doesn’t react and nothing is transcribed. It seemed to work a bit earlier, and now it doesn’t work at all. I did a hard reset of the computer (mic check, etc.) and of Cursor. It’s identical to the prior issue.

Title: Voice-to-text (dictation) button stops capturing audio — recurring regression

Cursor: 3.7.42 (VS Code base 1.105.1, commit 5702c9cfca656d8710fad58402fe37f14345e3a0)
OS: macOS 26.5.1 (25F80), Apple Silicon (arm64)

Description:
The voice input button is unresponsive again. When I click it, recording
appears to start but no audio is captured — nothing is transcribed. This is
a recurrence of a previously reported issue that had been resolved.

Steps to reproduce:

  1. Click the voice/dictation button in the chat input.
  2. Speak.
  3. No audio is picked up; no text appears.

Expected: Speech is captured and transcribed into the input.
Actual: Button activates but no audio is received / nothing transcribes.

Frequency: Was working previously, then regressed (second time this has happened).
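To separate “no audio is received” from “audio is received but nothing transcribes”, it can help to measure the signal level directly. The sketch below is an assumption about how one might check this, not how Cursor's level meter works: it computes the RMS of a buffer of samples and treats anything below a threshold as silence. The names `rms` and `isSilent` and the threshold value are hypothetical. In a browser or Electron renderer, the samples could come from the Web Audio API's `AnalyserNode.getFloatTimeDomainData`.

```typescript
// Root-mean-square level of a buffer of samples in the range [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) {
    return 0;
  }
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical check: the default threshold is an arbitrary illustrative value.
function isSilent(samples: Float32Array, threshold: number = 1e-4): boolean {
  return rms(samples) < threshold;
}

// An all-zero buffer is what a dead input stream typically looks like.
console.log(isSilent(new Float32Array(1024)));
// A buffer with real signal in it is not silent.
console.log(isSilent(new Float32Array([0.5, -0.5, 0.5, -0.5])));
```

If the buffer stays silent while another app records the same mic normally, that points at capture (permissions, device selection) rather than at the transcription service.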

Hey, thanks for the detailed dump, @dato2000. This is exactly what we need.

I get why it looks like the same thing, since the symptoms started in the same window. But these are two different bugs. The first one, with no transcription at all, was a server-side issue on our end, and it’s already fixed. What you’re seeing now (text jumping, words getting cut into syllables like “pro- prior” or “ba--”, chunks disappearing) is a separate issue with progressive STT: it’s rewriting intermediate hypotheses more aggressively than it should. This has already been reported internally and is being tracked separately. No ETA yet.

I’ll add your example (3.3.27, editor layout, built-in mic, more noticeable on longer dictations of 15+ seconds) to the ticket. It helps a lot.

There’s a workaround for now. Dictation in the Agents Window uses a different path. The text shows up as one block after you stop, without live rewriting, so this bug doesn’t happen there. @h-audeto already confirmed this above. Not ideal UX, but it works so you don’t lose chunks.
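For anyone curious why a “one block after you stop” path avoids the rewriting, here is a generic sketch of the difference between live and block rendering. It is an illustration only, not Cursor's implementation: the `SttEvent` shape and the `TranscriptBuffer` class are hypothetical, loosely modeled on streaming recognizers that mark each result as interim or final (the Web Speech API's `isFinal` flag is one example).

```typescript
// Hypothetical event shape; real STT providers differ.
interface SttEvent {
  text: string;
  isFinal: boolean;
}

class TranscriptBuffer {
  private committed: string[] = [];
  private interim = "";

  apply(event: SttEvent): void {
    if (event.isFinal) {
      // Final results are appended once and never changed afterwards.
      this.committed.push(event.text.trim());
      this.interim = "";
    } else {
      // Interim hypotheses replace each other; this is where visible
      // rewriting comes from when they are shown live.
      this.interim = event.text.trim();
    }
  }

  // Live view: finalized text plus the current, still-changing guess.
  liveText(): string {
    return this.committed
      .concat(this.interim)
      .filter((s) => s.length > 0)
      .join(" ");
  }

  // Block view: finalized text only, so nothing is ever walked back.
  finalText(): string {
    return this.committed.filter((s) => s.length > 0).join(" ");
  }
}

const buffer = new TranscriptBuffer();
buffer.apply({ text: "pro", isFinal: false });
buffer.apply({ text: "prior issue", isFinal: false });
console.log(buffer.liveText());
buffer.apply({ text: "prior issue", isFinal: true });
console.log(buffer.finalText());
```

In this sketch, a UI that renders `liveText()` on every event shows each interim guess being replaced, while one that renders `finalText()` after recording stops only ever shows settled text.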

If you catch this again, a short screen recording like @h-audeto’s in #42 and a clean redacted example in the format “said X / got Y” would be super helpful. I want to confirm whether the STT provider is cutting it or if it’s something else.

ok thank you.