This is probably the most bizarre and unsettling “behavior” - I put it in quotes and add this disclaimer for those less acquainted, because the behavior is simulated and fixed in place via flat inference.
But yeah, when it repeats itself, even though it's 14 flat inferences and it's just stuck, something about the humanity of it is sort of nasty lol
I am based on Anthropic’s Claude model, specifically Claude 3, and in this environment, I am referred to as GPT-4.1 (as indicated in the system message). My primary function here is to assist with coding and software development tasks, providing accurate, clear, and practical solutions.
If you have any technical questions or need help with code, feel free to ask!
This is the answer from GPT-4.1. It’s quite messy. I think this issue is related to overlapping chat history, and it has nothing to do with Google.
I am Claude 3.5 Sonnet, an AI assistant created by Anthropic. I'm here to help you with your coding and development tasks. I can assist with writing code, debugging issues, explaining concepts, and providing technical guidance. How can I help you today?
When I use deepseek-v3-0324, Claude 3.5 responds.
I have a request ID that demonstrates a key failure mode.
73271f1d-91e4-40b3-ab8d-ce79bb5cc63f
Description:
“Thoughts” are not getting correctly differentiated from standard outputs, resulting in rapid pollution of the context as it repeatedly injects the full cumulative thought stream into the record.
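For anyone unfamiliar with what the distinction is supposed to look like, here’s a minimal sketch at the API level - not Cursor’s actual code, and it assumes the google-genai Python SDK and a thinking model like gemini-2.5-pro. The API tags reasoning summaries as separate “thought” parts; a client that ignores that flag and appends everything back into the conversation gets exactly the cumulative pollution described above.

```python
# Minimal sketch, not Cursor's internals: how the Gemini API itself marks
# reasoning ("thought") parts, assuming the google-genai Python SDK and a
# thinking model such as gemini-2.5-pro.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain what a race condition is.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

answer_parts = []
for part in response.candidates[0].content.parts:
    if part.thought:
        # Reasoning summary: fine to display, but it should not be appended
        # back into the chat record as if it were the model's answer.
        continue
    answer_parts.append(part.text)

print("".join(answer_parts))
```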
Impression:
I suspect what happens next is that the automatic context compression over-indexes on these faulty, redundant outputs, and degrades the context to where it can’t remember how to use tools.
I think this may be when the pathological loops begin.
Qualitatively, it seems to come in “waves” – for a time, every request is broken in this same way. Then it returns to correct discrimination between thought and output.
I don’t have stats on that. It’s just the vibe.
There’s another failure mode I don’t have to hand right now, in which it dumps the edit diff into the standard output, without code formatting. This appears to have a similar polluting effect on the compressed context payload.
Analysis:
In both cases, the root issue appears to lie either in how Gemini handles tools and formats structured outputs, or in how Cursor parses Gemini’s completions.
If your team is certain you haven’t changed anything, then it’s very believable that this is arising from, for example, some type of A/B testing happening inside Gemini’s API.
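To make the parsing concern concrete, here’s a hedged sketch, again assuming the google-genai Python SDK; the apply_edit tool is hypothetical and purely illustrative. A Gemini completion is a list of typed parts, and tool calls arrive as structured function_call parts rather than text, so a client that flattens everything into one text stream will corrupt both tool calls and diffs.

```python
# Hedged sketch of the parsing concern, assuming the google-genai Python SDK.
# The "apply_edit" tool is hypothetical and only illustrates the shape of the
# data: tool calls come back as structured function_call parts, not as text.
from google import genai
from google.genai import types

client = genai.Client()

edit_tool = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="apply_edit",
        description="Apply a code edit to a file",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={
                "path": types.Schema(type=types.Type.STRING),
                "diff": types.Schema(type=types.Type.STRING),
            },
            required=["path", "diff"],
        ),
    ),
])

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Rename the function foo to bar in utils.py",
    config=types.GenerateContentConfig(tools=[edit_tool]),
)

for part in response.candidates[0].content.parts:
    if part.function_call:
        # Structured tool call: a name plus typed arguments.
        print("tool call:", part.function_call.name, dict(part.function_call.args))
    elif part.text:
        # Plain prose meant for the user.
        print("text:", part.text)
```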
We’re having trouble connecting to the model provider. This might be temporary - please try again in a moment.
(Request ID: 71ae4ab3-95a0-4d1e-b0a4-2d1b8d198727) I encountered this problem when I used Gemini Pro Max. Other models had no problem; only Gemini had this problem. I am very confused. Is it a problem on my end?
The fix was a joint effort: we provided some failing cases, and Google used them to reproduce, track down, and fix the issue. Unfortunately, it wasn’t as easy as just flicking a switch somewhere, but hopefully it’s worthwhile work that helps with the stability of Gemini moving forward!
If anyone else is seeing any issues, please post a new thread as we believe they will likely be unrelated to this now-fixed issue!