Is there any possibility that the Cursor team will integrate the OpenAI Realtime API? Driving most of the development process by voice alone would make it even easier and faster.
Given how much it costs, I think it's very much a no, unless there are substantial reductions in the cost in the future.
In the longer term, I think we will see extensions for local models.
What is it, like $9/hr? My time is worth more than that.
The real question is a) how seamlessly it can be integrated and b) how beneficial the realtime-ness is versus just adding a good speech-to-text model that's specifically good at coding language (variable names, etc.).
GPT-4o is also not that great at coding compared to Sonnet 3.5 and o1.
Seconding this. I have been dreaming about this for years and will build it myself if no one else does. Here's the rundown of what I'd want to see:
- Locally run VAD - if we run voice activity detection (VAD) locally (or just use push-to-talk), we cut down on a lot of costs by not constantly streaming empty audio for OAI's VAD to check
- Direct integration with the existing chat window - from skimming the Realtime API doc here, we should be able to seamlessly mix text and voice so the user can decide whether to speak or type their question
- Passing between 4o (or 4o-mini) and better models for actual problems - a prompt can be constructed that uses the Realtime API's function calling to pass off most of the actual work to other AIs. In my vision of this functionality, the realtime voice is really just a messenger and assistant personality: it uses function calling to request answers to programming questions and to pass summaries of problems to other AIs, which then write the actual prompts used to write code. If I say "please edit X function to do Y", a function call with that info is used to find the function in question, and then instructions are passed either directly to 3.5 Sonnet (or similar), or in some cases o1-mini is called to write a more thorough prompt for another AI
- This is crucial - need the ability to load in locally run RVC models so OpenAI’s cringe voices can be converted to anime girls on the fly. If this feature is not included I will not be able to get any work done, I hope you understand.
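The local-VAD idea in the first bullet can be sketched with a simple energy gate: only frames whose RMS amplitude clears a threshold get uploaded, so silence never hits the API. This is a minimal sketch, not what OpenAI's server-side VAD does; `FRAME_SIZE` and `ENERGY_THRESHOLD` are assumed values you would tune for your mic and sample rate.

```python
import math
import struct

FRAME_SIZE = 160          # 10 ms of 16 kHz mono 16-bit PCM (assumed format)
ENERGY_THRESHOLD = 500.0  # hypothetical RMS cutoff; tune per microphone

def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one little-endian 16-bit PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = ENERGY_THRESHOLD) -> bool:
    """Gate: only frames above the energy threshold would be streamed."""
    return frame_rms(frame) > threshold
```

In practice you'd want hangover logic (keep streaming for a few hundred ms after the energy drops) so word endings aren't clipped, or just bind the gate to a push-to-talk key instead.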
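The messenger-and-dispatcher pattern from the third bullet could look roughly like this: the realtime model emits a tool call, and a local registry routes it to a stronger backend model. Everything here is a hypothetical placeholder - the `ToolCall` shape, the `edit_function` handler, and the routing strings are assumptions, not the Realtime API's actual tool schema.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    """Hypothetical shape of a function call emitted by the realtime model."""
    name: str
    arguments: dict

def edit_function(arguments: dict) -> str:
    # Placeholder: here we would locate the named function in the codebase
    # and forward edit instructions to a stronger model (e.g. Sonnet or o1-mini).
    return f"routed edit of {arguments['function']} to backend model"

# Registry mapping tool names to local handlers that talk to other models.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "edit_function": edit_function,
}

def dispatch(call: ToolCall) -> str:
    """Route a tool call from the realtime voice layer to the right backend."""
    handler = HANDLERS.get(call.name)
    if handler is None:
        return f"unknown tool: {call.name}"
    return handler(call.arguments)
```

The design point is that the voice model never writes code itself; it only fills in arguments, and the dispatcher decides which heavier model does the real work.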