I tried o3, DeepSeek, and none comes close to Claude… maybe it's because it's been integrated for a longer time, but I just can't seem to do anything productive with the rest… what am I doing wrong?
For agent mode I haven't gotten better results than Claude with anything, but for a lot of small tasks I use DeepSeek V3 to save my Claude usage and free up Claude for more difficult tasks.
So you chat with DeepSeek and use Claude as an agent?
Yeah, more or less. Working with both of these models, I can roughly guess when DeepSeek will work and when Claude will work better. For things that require a lot of context, Claude works a lot better, even in chat. For things like splitting a function into multiple smaller ones, writing utilities, minor refactoring, etc., DeepSeek is enough. I don't use any model other than Sonnet in agent mode.
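The routing heuristic described above could be sketched like this. The task categories, the context-size threshold, and the model names are illustrative assumptions, not anything either provider exposes:

```python
# Hypothetical heuristic for routing tasks between a cheaper model
# (DeepSeek V3) and a stronger one (Claude Sonnet), per the workflow above.
# The SMALL_TASKS set and the 8,000-token threshold are assumptions.

SMALL_TASKS = {"split_function", "write_utility", "minor_refactor"}

def pick_model(task_type: str, context_tokens: int) -> str:
    """Route small, low-context tasks to the cheap model; everything
    else (large context, agentic work) goes to Claude."""
    if context_tokens > 8_000:       # large contexts: Claude handles them better
        return "claude-3-5-sonnet"
    if task_type in SMALL_TASKS:     # simple, well-scoped edits
        return "deepseek-v3"
    return "claude-3-5-sonnet"       # default to the more reliable model

print(pick_model("minor_refactor", 2_000))   # deepseek-v3
print(pick_model("minor_refactor", 20_000))  # claude-3-5-sonnet
```

The point is just to make the "when will DeepSeek be enough?" guess explicit, so the cheap model absorbs the routine work.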
No, after extended attempts to use DeepSeek and o3-mini, I have always come back to Claude.
It’s not so much the quality, it’s just the reliability – Claude just seems to know what to do, it has a reliable procedure, it reliably applies code, it’s predictable. It doesn’t always get it right, but it’s consistent.
I occasionally pop over to DeepSeek or o3, but I always come back to Claude Sonnet.
Yes, same here, and I think the speed is also much better. I don't know why, but Claude feels like it was trained inside the IDE, and the others don't feel as natural there. So much talk about these new models, and we're still using the one that has been around for a while.
I actually did, but not by default. My workflow is currently the following:
- I attempt to tackle the problem with Claude in agent mode, especially if it's something that can make use of tools. It works by itself about 30% of the time, and another 30% of the time it works if I keep adjusting and directing it.
- Sometimes it completely fails, doesn't grasp the concept, or can't understand something. In that case I open a chat, reference the summarized composer and the codebase, and try to solve the issue in chat with another model.
Before, I used ChatGPT o1, but now DeepSeek R1 seems like the better option. ChatGPT and Claude think somewhat alike, while DeepSeek seems to think differently, or its "randomness" goes in another direction. Either way, I've found that this setup works.
My general experience (not agent-specific) is that the other models can do everything Claude can (roughly speaking), but Claude is the best model at using context to infer what you want, especially in longer conversations (which, incidentally, is important to our assessment of a model being "smart"). That means your prompts can be very minimal, or poorly worded, and it still figures things out. And that can save you a lot of time, like not having to backtrack when it does the wrong thing after your prompt, or not having to word your prompt carefully.
I've been working on my Technomancer skills for quite a while now, specifically looking to push the limits of what I PERSONALLY want and expect out of my [Electric_Butterly]. OpenAI in all formats has been, IMO, the most nerf'd.
I was attempting to do deep research on powerful subjects, and its bowling bumpers pop up real fast.
Claude is fantastic.
R1 is too fluffy in the dialog… the fluffy language costs money. Model providers should have a token cache for fluffy ■■■■■■■■ words that don't provide actual data on the subject and thus shouldn't count. (There was an equivalent with carriers in the '90s, but I can't recall it now.)
–
OpenAI is literally throwing its weight around, and honestly that makes it less trustworthy to me.
Claude's interface ■■■■■. I've complained forever about search; half my screen is wasted on blank parchment.
Billing customer service is broken (OAI charged me for Pro for 6 months but didn't upgrade my account; chat support says "email support," but when I ask for the email address, it refuses to give it).
At the end of the fn day, the ONLY thing that matters to me is "Non_Linear_Continuity_of_Thought" (NLCoT).
Continuity of thought is not Chain of Thought.
It's the ability to continue and recall any chain of thought at any time, from any point and any perspective, given an AI-Kashic_Record, which is a symbol of the point at which a human, or an agent of the human's intent, wants to reorient the position in context. Context can be thought of as the present point of view from which actions are formulated or invoked, given their position within the NLCoT and the INTENT.
In this, they all fail.
So this is why we are all searching for the Mechanical_Elven_Language that will allow for NLCoT.
I'm currently getting good results from Gemini 2.0 Flash (non-exp). It's not supported in Agent mode, though.
I spend a lot of time testing these models, as my day job is creating custom agents for various automation tasks. The reasoning/thinking models are good, but they are not yet really usable in an agentic workflow. They have limits on the context that can be provided, limitations on input data types, and gaps in support for structured outputs, system messages, or even just temperature that make them non-starters for me. This is why you can have a fantastic experience with o3 on a one-off request in ChatGPT and then find it performs poorly in Cursor.
Despite there being one tool in Cursor that calls itself an agent, the entire software is built on agentic workflows, where many models do small parts of a larger task. I'm sure there is a lot of function calling and grepping and good old-fashioned algorithms being used whenever and wherever possible to ensure the most reliable and repeatable results. I've found that it's best to avoid using AI for any part of an automation task that can be done with plain code. This reduces the impact of sometimes unpredictable AI results.
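As a toy illustration of the "code first, AI only where necessary" point: a deterministic extraction step done with a regular expression, with the model reserved for the genuinely fuzzy step. The ticket format, ID pattern, and the LLM placeholder are all hypothetical:

```python
import re

def extract_order_ids(text: str) -> list[str]:
    """Deterministic step: pull order IDs with a regex instead of asking
    a model to do it. The same input always gives the same output."""
    return re.findall(r"ORD-\d{6}", text)  # assumed ID format, for illustration

def summarize_with_llm(text: str) -> str:
    """Placeholder for the one genuinely fuzzy step where a model is
    actually needed (implementation intentionally omitted)."""
    raise NotImplementedError

ticket = "Customer reports ORD-123456 and ORD-654321 both arrived damaged."
print(extract_order_ids(ticket))  # ['ORD-123456', 'ORD-654321']
```

Everything the regex handles is now repeatable and testable; only the summary step inherits the model's unpredictability.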
With the reasoning models, you've just got to throw an input at them and hope for the best, and all that "thinking" seems to hurt their ability to insert function calls into their process and execute a task that takes longer than a single response. In an ongoing chat, I expect important information may get buried in all that thinking logic.
I expect we'll still see improved performance from non-reasoning models as they progress past Claude 3.5 Sonnet's current capabilities.
In my experience, gemini-2.0-pro-exp starts to become confused on very large contexts much "later" than Sonnet does.