I keep running into the same scenario:
I ask the AI something in the normal chat, and it suggests, for example, a library.
I reply that it's a good idea, but ask it to handle some part differently.
It then suggests something that doesn't work, because it has forgotten what we discussed in the first message.
I checked the file that was sent as context: it was only 8650 tokens according to the GPT tokenizer. A few conversation messages might have added another 1000 or so, but that still shouldn't hit a 10000 or 20000 token limit.
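For reference, this is roughly how I counted the tokens. A minimal sketch using OpenAI's tiktoken library (the file name is a placeholder, and I'm assuming the `cl100k_base` encoding used by GPT-3.5/4):

```python
import tiktoken

# cl100k_base is the GPT-3.5/4 tokenizer; an assumption, since the
# exact encoding the tool uses internally isn't documented.
enc = tiktoken.get_encoding("cl100k_base")

# "context_file.py" is a placeholder for the file I attached as context.
with open("context_file.py", "r", encoding="utf-8") as f:
    text = f.read()

tokens = enc.encode(text)
print(f"{len(tokens)} tokens")  # printed ~8650 for my file
```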
In the long-context chat this problem doesn't arise; it seems to keep the relevant details in memory.
It feels like the normal chat does additional context trimming, even well before the token limit is reached, which is very inefficient.
The model used was Claude 3.5 Sonnet.
I have barely touched my 500-request monthly quota this month, because this issue has pushed me to rely heavily on the long-context mode.