While writing tests, I noticed that Claude 4 Thinking has become really good at planning and analyzing the codebase to write tests. In a single pass, it called tools multiple times and reflected on the implementation (15s/24s/11s/20s/11s/16s) — in total, it spent about 97 seconds thinking. Before, it never did that — it used to be dumb in this regard, like a puppet.
Yes, it always wrote code well, but it never thought for more than 10 seconds. When calling tools, it just looked at the code without reflecting.
Maybe it’s a bug or a stroke of luck — or did you upgrade it and allow it to think while using tools?
If you did upgrade it — serious kudos to you
But if it’s a bug or just a fluke — in that moment, I really felt the presence of a truly thinking Claude 4 Thinking, something we’ve been missing so much.
I also see great improvements in Claude 4 Sonnet Thinking (and MAX). When it was freshly integrated, it wasn't so well tuned.
Note that there are also several AI-related features coming that will help as well. I'll leave that bit for you to discover yourself once the features land in the EAP.
I used to be a supporter of the Ask mode, but now I rarely use it, because the Agent mode has become much better — you can tell the agent not to make any changes, and it still works perfectly.
It seems like Claude 4 Sonnet Thinking has really been improved — it thinks continuously on every task when using tools.
I also saw this improvement. I had Claude 4 Thinking implement an entire admin backend portal, and it did a great job. Just last week it would not have done this good a job. It even summarized what it did very well, which is how I initially noticed the improvement.