Hi,
since June 1st, I’ve been consistently unable to use the Claude 4 Sonnet model in Cursor. Every time I try, I get the following error message:
Claude 4 is not currently enabled in the slow pool due to high demand. Please select another model, or enable usage-based pricing to get more fast requests.
I understand how the slow pool works and that high demand can temporarily affect availability. However, it’s been more than 3 days now with no improvement, which is seriously disrupting my workflow.
I use Cursor AI daily for programming, and this limitation makes it very difficult to work efficiently.
Is this a known issue? Are there any updates or plans to resolve it soon?
Is anyone else experiencing the same?
Especially with new models, we have to work to get enough capacity to serve all the requests we get during peak times of the day. At the moment, we don’t have enough capacity available to serve both fast and slow requests, which is why it’s off the cards right now.
This is largely situational: the model providers we work with are still ramping up their own capacity, and we can only get so much from them so far. We work closely with them to get to the best situation possible, but right now we still aren’t at a point where we can reliably offer Claude 4 on the slow pool without the risk of hitting our capacity limits, at which point requests start failing!
I want to be clear that this isn’t a cost issue. If/when more capacity is available, we will try our best to get it for Cursor and enable the slow pool when we can.
The infrastructure to run such a queue at the scale we operate in, while not rocket science, adds more overhead than it may initially sound like it would.
We used to run an actual queue system, but this usually ended up being a worse experience for users during peak times.
If you don’t use a FIFO queue, then it can only mean that requests in the slow pool are not fair, in the sense that some customers are able to send more messages in the same amount of time than other customers.
The current system means that the more you use, the slower your requests get. This is the fairest system, as it ensures that those using the slow pool as a backup, who are just over their fast request limit, see the shortest delays, while someone using thousands of slow requests sees the longest delays, as they are using the most capacity from the slow pool!
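To make the "more you use, the slower it gets" idea concrete, here is a rough sketch in Python. This is not Cursor's actual implementation: the function name `slow_pool_delay`, the linear formula, and all the numbers (base delay, step, cap) are assumptions made up purely for illustration.

```python
def slow_pool_delay(
    slow_requests_used: int,
    base_delay_s: float = 2.0,   # hypothetical minimum wait for any slow request
    step_s: float = 0.5,         # hypothetical extra wait per slow request already used
    max_delay_s: float = 120.0,  # hypothetical cap so delays do not grow forever
) -> float:
    """Return an illustrative delay (in seconds) before a slow request is served.

    The delay grows linearly with the number of slow requests the user has
    already made in the current period, up to a fixed cap.
    """
    if slow_requests_used < 0:
        raise ValueError("slow_requests_used must be non-negative")
    return min(base_delay_s + step_s * slow_requests_used, max_delay_s)


# A user just over their fast request limit waits only briefly...
light_user_delay = slow_pool_delay(5)      # 2.0 + 0.5 * 5 = 4.5 seconds
# ...while a user with thousands of slow requests hits the cap.
heavy_user_delay = slow_pool_delay(1000)   # capped at 120.0 seconds
```

Under these made-up numbers, light users barely notice the slow pool, while heavy users absorb most of the waiting, which is the trade-off described above compared with a strict first-come, first-served queue.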