Which AI Model Should I Use for Programming?

I’m currently using Claude 3.5 Sonnet (Oct 22, 2024) for the Cursor Composer agent, and I’m wondering if there are better AI models. Specifically, I’m looking for guidance on how to evaluate and choose the right AI model for coding.

Here are the factors I’m considering:

  1. Accuracy and Correctness: The quality of code generation and error detection is a top priority.
  2. Context Window Length: I understand that longer context windows allow for extended conversations and complex modifications. This seems especially useful for tasks involving large codebases or detailed iterations.

My Questions:

  • Are there any online resources or benchmarks that compare the latest AI models for programming?
  • Beyond accuracy and correctness, how significant is the context window length in real-world use cases?
  • How can I stay updated if better models or versions are released in the future?

I would appreciate any insights, experiences, or recommendations for selecting the most effective AI model for programming tasks. Thanks in advance!

Claude 3.5 Sonnet still seems like the best model for general coding tasks. If I get stuck on a problem using that model, I usually try OpenAI o1, and sometimes that can get me unstuck.

I’ve found the latest Gemini Flash model to be quite useless in Cursor so far. OpenAI’s GPT-4o is almost worthless compared to Sonnet.

I have a feeling the context length is being limited by Cursor itself and not the model APIs.

Sorry for not answering all of your questions, I just wanted to pitch in with my thoughts. It seems that Sonnet is the preferred model by most developers still.


Thanks a lot for your input.
There are two Claude 3.5 Sonnet versions; the latest is “claude-3-5-sonnet-20241022”. Is that the one recommended to use?
My way of confirming is to go to the Claude 3.5 Sonnet page on Anthropic’s site to find the latest one.
But I feel this isn’t the best practice.
I am looking for a better way…
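One better-than-eyeballing option: Anthropic’s dated model IDs end in a YYYYMMDD stamp, so given a list of IDs (e.g. from their models listing) you can pick the newest programmatically. A minimal sketch; `latest_model` is a hypothetical helper, and the assumption is that every versioned ID carries that trailing date stamp:

```python
import re

def latest_model(model_ids):
    """Return the model ID with the most recent trailing YYYYMMDD stamp.

    Lexicographic order on an 8-digit date string matches
    chronological order, so max() on the stamp is enough.
    """
    def date_stamp(model_id):
        match = re.search(r"(\d{8})$", model_id)
        return match.group(1) if match else "00000000"  # undated IDs sort last
    return max(model_ids, key=date_stamp)

models = [
    "claude-3-5-sonnet-20240620",
    "claude-3-5-sonnet-20241022",
]
print(latest_model(models))  # claude-3-5-sonnet-20241022
```

The same idea works on any ID list you fetch from the provider, so you don’t have to re-check a web page by hand.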

I too find Claude to be the best. However, it falls short on very large files, somewhere around 1650 lines in my case, after which it could only see “…”, as it told me. I switched to o1 and it apparently digested the whole thing. I once got an excellent series of progressive unit tests from o1. In general, however, o1 is not as good at guessing what I probably want, and tends to stop short of a complete answer, perhaps because it is pre-prompted not to do too much inference due to the cost.
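One workaround for that large-file truncation is to paste the file in fixed-size line chunks instead of all at once. A minimal sketch; the 400-line chunk size is an arbitrary assumption, not a known Cursor or model limit:

```python
def chunk_lines(text, lines_per_chunk=400):
    """Split text into chunks of at most `lines_per_chunk` lines each."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]

# Example: a 1000-line file becomes chunks of 400 + 400 + 200 lines.
source = "\n".join(f"line {n}" for n in range(1, 1001))
chunks = chunk_lines(source)
print(len(chunks))  # 3
```

Pasting chunks with a short note like “part 2 of 3, reply OK until the last part” keeps each message well inside whatever window the tool actually forwards to the model.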

I’ve been dabbling with Rust for a couple of weeks now and hit lots of issues with Sonnet. My workaround is sometimes to paste my terminal logs into ChatGPT o1 and feed its answer back to Sonnet. That tends to unblock some of Sonnet’s looping errors, especially with Rust.

Claude 3.5 Sonnet is the best right now.