Hey @andrewh - What does caching have to do with the issue we’re seeing? Can you clarify?
We want to use the 200k-context model, but Cursor keeps invoking the 1M-context model instead, which doubles the cost and comes with lower RPM and TPM quotas than the 200k model. So caching is welcome, but it doesn't solve the bug: we still overpay and can't use Cursor as intended per your docs.
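For context on why the two tiers matter on our side: as I understand Bedrock, the 1M-context tier for Claude has to be opted into explicitly via an Anthropic beta flag, so a plain invocation should stay on the 200k tier. A minimal sketch of what I mean (the model ID and beta flag name are my assumptions, not something I've confirmed from Cursor's side):

```python
import boto3

# Assumed values: adjust the model ID and region to whatever your
# Bedrock setup actually exposes.
MODEL_ID = "anthropic.claude-sonnet-4-20250514-v1:0"  # assumed model ID

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Default invocation: no beta flag, so (as I understand it) this stays on
# the standard 200k-context tier with its pricing and quotas.
resp_200k = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

# The 1M-context tier has to be requested explicitly via an Anthropic beta
# flag (flag name is my assumption). This is the tier we do NOT want Cursor
# selecting for us, since it costs more and has lower RPM/TPM quotas.
resp_1m = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    additionalModelRequestFields={"anthropic_beta": ["context-1m-2025-08-07"]},
)

print(resp_200k["output"]["message"]["content"][0]["text"])
```

If Cursor is effectively adding that opt-in on our behalf, that would explain the billing and quota behavior we're seeing.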
For reference, here is the issue I opened.
I responded in another Bedrock-related thread as well to check whether others see the same behavior.
Separately, if this ties into your recent changes to reduce consumption, I’ve noticed since yesterday that my chats are being summarized far too early, even when they’re well under 5% of the context window, and sometimes during the first agent response. That isn’t expected behavior and isn’t something we want. Screenshot below.
