I'm finding that when Cursor gets stuck on something using claude-3.5-sonnet or gpt-4o, I can usually switch to o1-preview and it will figure out the issue.
I use this technique in those situations where the model starts suggesting code edits that are already in the file, or keeps switching back and forth between the same two failing edits while trying to debug or build something. In those cases, I find that switching to o1-preview and making the request again will usually produce an output that fixes the issue.
I'm finding o1-preview costs me about $0.40 per message, so I use it sparingly. But I think it's worth every penny to get past those times when the other models get stuck or hallucinate.
If you are on a high enough OpenAI API tier, you can use the "Toggle OpenAI key" feature and supply your own key for o1-preview requests. I use it this way when it's time to consult o1 on problems that Sonnet can't solve. Depending on your particular context size, the flat $0.40 might be cheaper, but I use o1 sparingly with my own key, and a single request can easily cost less than $0.10.
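To see why bring-your-own-key can come in under $0.10, here's a rough cost sketch. The per-token prices below are my assumptions based on OpenAI's published o1-preview rates at the time ($15 per 1M input tokens, $60 per 1M output tokens); check the current pricing page before relying on them, and the token counts are just an illustrative small request.

```python
# Rough per-request cost estimate for o1-preview via your own OpenAI API key.
# Prices are ASSUMED from OpenAI's published o1-preview rates at the time:
#   $15 / 1M input (prompt) tokens, $60 / 1M output (completion) tokens.
INPUT_PRICE_PER_M = 15.00   # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 60.00  # USD per 1M output tokens (assumed)

def o1_preview_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A modest request: ~3k tokens of context, ~500 tokens of output
print(o1_preview_cost(3_000, 500))  # 0.075 -- under the $0.10 mark
```

Of course, a request that stuffs a large chunk of your codebase into the context will scale the input side accordingly, which is why the flat $0.40 can still win for very large prompts.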
EDIT: perhaps one could use OpenRouter if direct OpenAI use isn't an option.
Same. I don’t even bother using anything except o1-preview anymore, aside from minor edits I know Claude can handle. It costs me around $100 a month, but it saves a lot of time.
What I’m wondering is whether o1-mini could be better in some instances. So far, o1-preview has worked best in my testing (though I haven’t tested it extensively against o1-mini, only against Sonnet).
I’m confused by the benchmarks that say Sonnet is still better for coding (e.g. LiveBench). In most of my use cases (large codebases, large data analyses), o1-preview has been superior.
Perhaps the only time Sonnet was preferable involved a recent API: Sonnet’s training included newer information about that API (thanks to a more recent cutoff date) that o1-preview lacked.
Update: this holds true for me even after the new Sonnet release.