GPT 4.1 performs better than Sonnet-3.7 for programming tasks involving long contexts

Doesn’t get lost in its own answers after the fifth question. :grin:

2 Likes

That makes sense; it has a notably larger context length (75k for 3.7 vs. 125k for 4.1)

How do you usually work with GPT 4.1? It would be great if you could share a bit with the community.

I start by sending it the complete project via MCP for analysis, and then we work through the tasks gradually.

1 Like

Are you using MAX MODE sir?

No, GPT-4.1 doesn’t have a Max mode.

1 Like

Sounds great, which MCP works well for such an analysis?

But what I noticed is that while working with a long context (probably after around the 20th question), Cursor with GPT-4.1 started replying in italics and began to forget its earlier answers from the context. Still, it’s better than Sonnet 3.7 when it comes to solving long and complex problems.

1 Like

Yeah, there must still be limits to what it can process, and it eventually runs out of context as well. Did you check at the end of the chat how long the context was when it started having issues? (At the last response, on the bottom right, there are 3 dots; click them and the dropdown menu shows the token count.)

1 Like

I’m using this @modelcontextprotocol/server-filesystem.
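For anyone wanting to try the same setup, here is a minimal sketch of wiring up `@modelcontextprotocol/server-filesystem` in a project-level `.cursor/mcp.json`. The config file location and the project path are assumptions and may vary by Cursor version:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/path/to/your/project"
      ]
    }
  }
}
```

The last argument restricts which directory tree the server is allowed to read, which is also why the chat asks for confirmation before reading directories.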

The last chat shows 88,451 tokens. What is the actual acceptable token limit that shouldn’t be exceeded to reduce errors and avoid losing context?

4.1 is listed at 128k tokens, but counting the actual tokens per file, per chat message and response, and per MCP response, a chat could go over that limit, which would also explain why it forgets previous answers. 88k is getting close, I would say.
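As a rough sanity check, you can estimate how full the window is before the model starts dropping earlier turns. A minimal sketch in Python, using the common ~4 characters-per-token heuristic rather than a real tokenizer (an exact count would need something like `tiktoken`), with the 128k window taken from the discussion above:

```python
# Rough estimate of whether accumulated chat content is nearing a model's
# context window. Uses the ~4 characters-per-token heuristic for English
# text; a real tokenizer would give exact counts.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def context_usage(messages: list[str], window: int = 128_000) -> float:
    """Fraction of the context window used by the messages so far."""
    total = sum(estimate_tokens(m) for m in messages)
    return total / window

# Illustrative chat history: a pasted Django view and a GraphQL schema.
history = [
    "def view(request): ..." * 500,
    "Here is the GraphQL schema ..." * 300,
]
print(f"~{context_usage(history):.1%} of a 128k window used")
```

This is only a ballpark; MCP tool responses and system prompts also count against the window, so the real usage is higher than what you paste yourself.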

Does the MCP output full file content?

I didn’t quite understand your question. When I hand the MCP over to the chat for analysis, it asks several times for confirmation to read the directories and then gives a general overview of the project. Only after that do I give it a specific task, since I already know the real problem that needs solving. Sonnet-3.7, by contrast, gave a more detailed description of the project at first, but then got lost in the amount of information. :grin:

Ah I see, I was not sure if the MCP returns all file content or just a summary (directories and filenames).

There is likely a lot of info in this process, which pushes you toward the token limit.

Yes, it’s not a small project — we’re using Django, GraphQL, Next.js, and supporting four language localizations.

Any reason you are sending the full context rather than targeted folders per chat? That’s a lot of unnecessary context in each request, and it will eventually cause context to be lost!
Usually, if you keep your chats focused on a particular task and @reference the relevant folders (or the full Django module), you can get more rounds of questions in before it starts getting amnesia and causing issues.
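To decide which folders are worth @referencing, one could roughly weigh them by estimated token count first. A hypothetical Python helper (the extension list matches the Django/GraphQL/Next.js stack mentioned above, and the chars/4 heuristic is an assumption, not an exact tokenizer):

```python
# Hypothetical helper: estimate the token "weight" of each top-level folder
# under a project root, so you can pick focused folders to @reference
# instead of sending the whole project. ~4 chars/token heuristic.
from pathlib import Path

def folder_token_estimate(
    root: str,
    exts: tuple[str, ...] = (".py", ".ts", ".tsx", ".graphql"),
) -> dict[str, int]:
    """Return {top-level folder: rough token count}, heaviest first."""
    totals: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            top = path.relative_to(root).parts[0]
            totals[top] = totals.get(top, 0) + len(path.read_text(errors="ignore")) // 4
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))
```

Running this before starting a chat shows which modules would blow the budget on their own and which are cheap enough to include together.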

This is particularly effective on Gemini 2.5, now that most of the connectivity errors seem to be solved.

2 Likes

I had this issue with GPT. I was sending screenshots of my code, and it kept telling me “the error is because you used := and not =” when I definitely didn’t use :=. It was just stuck on this := bullcrap and giving me wrong answers.

It does actually, since the 0.50 update.

That is because they have reduced the context window for slow requests.

1 Like

Which MCP tool do you use?