[Solved] Add Claude 3 models

Well, GPT-4 is general purpose; I think coding models have to be trained specifically for those tasks. By the way, have you seen StarCoder2 on Hugging Face? The description says it was trained specifically for coding (on 17 languages, but that doesn't matter much, since models can pick up other syntax if you simply supply a PDF or plain-text cheat sheet of syntax rules in context). I don't know how to use it, though; probably locally, but I'm too lazy (after playing with Stable Diffusion locally and getting a headache from all the customization options and concepts, I'd better leave this to the professionals :laughing:). Let's wait until the Cursor backend gets even better.

How do they benchmark the Haiku model? Could be interesting. If it's better than GPT-4 at that cost, we could try a map-reduce approach: break one request down into many smaller, more specific requests and run them in parallel through that lighter model (each with the 200k context limit, it should be able to give exact, correct factual information based on the latest docs/facts supplied from official sources), and then fact-check and fix the results only once, using the "best" model (say Claude, or whatever the next best choice turns out to be). All of that would result in a very good experience for us in Cursor. Hey, how do we ping the admins? We should file this as a feature request and vote on it :grin: :fire: :rocket: going to be supercool.
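The map-reduce idea above can be sketched in a few lines. This is a minimal, hypothetical outline: `cheap_model` and `best_model` are placeholder stubs standing in for real Haiku/Opus API calls, and the function names are my own, not anything from Cursor or Anthropic.

```python
import concurrent.futures

# Placeholder stubs -- in practice these would be real API calls
# to a cheap model (e.g. Haiku) and a stronger model (e.g. Opus).
def cheap_model(sub_question: str) -> str:
    return f"draft answer for: {sub_question}"

def best_model(prompt: str) -> str:
    return f"reviewed:\n{prompt}"

def map_reduce(question: str, sub_questions: list[str]) -> str:
    # "Map": fan the sub-questions out to the cheap model in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        drafts = list(pool.map(cheap_model, sub_questions))
    # "Reduce": one pass through the stronger model to
    # fact-check and merge the drafts into a final answer.
    combined = "\n".join(drafts)
    return best_model(f"Question: {question}\nDrafts:\n{combined}")
```

The appeal is cost: many cheap parallel calls plus one expensive review call, instead of sending everything through the expensive model.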

so cool! can you show exactly what you did? i’m really curious to test claude 3 in cursor for some of my work.

I keep seeing on YouTube and Reddit that people report Claude 3 Opus is more accurate than GPT-4 Turbo at coding, and that it isn't lazy, so it outputs the full code every time.

Here are two benchmarks I trust. One shows Claude just behind GPT-4, the other shows it ahead:

https://evalplus.github.io/leaderboard.html


Add Claude 3 models - #11 by dioro :smiley:

By the way, aren't you worried that Cursor will send the full 200k max context, or even more, like your whole codebase and docs, every single time to the API, and that it'll cost a lot of $ by mistake? I'm still worried about understanding what exactly, and how much content, the IDE sends to the AI API endpoint :eyes:. Maybe someone could shed light on that? :pray: (Or does it mimic the OpenAI API limits, so it only sends the max amount allowed by the OpenAI endpoint? In that case Claude's 200k simply won't be used even if you connect OpenRouter :man_shrugging:)
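One cheap way to sanity-check the cost worry above is a back-of-the-envelope estimate. The ~4 characters per token figure is only a rough heuristic for English-ish text (the provider's real tokenizer, e.g. `tiktoken`, gives exact counts), and the price-per-million-tokens is a parameter you'd fill in from the provider's pricing page:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English-ish text.
    # Use the provider's actual tokenizer for exact counts.
    return max(1, len(text) // 4)

def estimate_cost_usd(text: str, usd_per_million_tokens: float) -> float:
    # Input-side cost only; output tokens are billed separately.
    return estimate_tokens(text) / 1_000_000 * usd_per_million_tokens
```

So if the IDE really did ship a 200k-token context on every request, you could multiply that by the per-million-token input price yourself and see how fast it adds up.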

No idea; I'm personally not gonna mess with the API for that reason. I just hope the Cursor team manages to add the new models to the Cursor Pro plan, or an even higher tier if necessary. I would gladly pay $50+ for a GPT-4-level model with 200k context.


Cursor team, are there any updates you can give regarding the Claude 3 models?

  • Have you done any benchmarks on your side? Thoughts on how the models performed compared to GPT-4/3?
  • Do you have an ETA for when they might be available? A few days / a week / 2 weeks…
  • Thoughts on higher tier plans for bigger context models?

Thanks


lol this is a good follow up but it’s weekend time :fox_face: :clinking_glasses: :smiley_cat:
Let's see what happens next week. Very curious to try Haiku on my tasks as well :heavy_check_mark: probably the only model whose pricing won't bankrupt us. I'm trying Sonnet on OpenRouter today; it's pretty good, and feels like GPT-4 at understanding longer prompts and tasks in those longer, complicated requests.


You can use it via the API; here are my settings:
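For anyone who can't see the screenshot: OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so the request shape looks like the sketch below. This is only an illustration, not Cursor's actual internals; the API key is a placeholder and the model id should match whatever OpenRouter currently lists for Claude 3:

```python
import json

BASE_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, model: str = "anthropic/claude-3-opus"):
    # Same shape as an OpenAI chat-completions request, which is
    # why swapping the base URL and API key in the settings works.
    headers = {
        "Authorization": "Bearer <OPENROUTER_API_KEY>",  # placeholder
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body)
```

In Cursor's settings this amounts to pointing the OpenAI base URL at OpenRouter and pasting your OpenRouter key where the OpenAI key normally goes.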


Interesting. Could something be wrong in my settings? I just get a blank response in the chat.

Everything looks correct, but what about your balance on OpenRouter? Are there enough funds?


Same problem, funds: more than a few dollars.

Just to be clear: when you do the setup shown above, you put your OpenRouter API key in place of the OpenAI API key?

Yes exactly

Does it work here: Playground | OpenRouter

Does Claude allow setting a budget like OAI does? As in, I'd rather run out of my pre-paid credits than sell a kidney because I asked a question in Cursor in the wrong code base and now owe Claude two bazillion dollars.

Yes.

Maybe Claude shouldn't be used in chat, but rather for a limited number of large-context, whole-codebase reasoning calls, when you really need some very holistic reasoning?

See https://twitter.com/itsandrewgao/status/1766891500921671850 for inspiration.

They don't use GPT-4's maximum context window (I think they limit it to around 10k tokens), which is plenty for a lot of situations. The only time I wish the context were bigger is when you pass a @Doc + @Codebase.