I agree with all points except number 2. The idea that the slow pool shouldn’t be available for everything? Naaa. It’s called the slow pool for a reason, and it should be accessible for all services I’m paying for. The problem isn’t the use of tokens; it’s that the system hasn’t scaled to meet demand, IMHO. The lack of scaling could also be because the vendor providing the inference can’t scale either.

The slow pool is a good concept precisely because it doesn’t guarantee immediate availability. It acts as a kind of failover, and with different time zones and varying coding hours, usage of the pool will naturally fluctuate. If there were enough resources to meet demand, this wouldn’t be an issue. The real questions are: what wait times are people willing to accept for slow pool access? Should we onboard more users without scaling? What alternative vendor options exist to handle the slow pool?

In my opinion, managing expectations is key. The app should clearly display the number of users in the slow pool and the expected wait times. We should also cap agentic iterations on the slow pool, for example 500 iterations within 4 hours for paying users, after which users would experience longer wait times. However, users shouldn’t be penalized for using the service, as that would undermine the business model. Instead, a fair-use policy should be implemented, with transparent performance metrics for the slow pool. Free users should have reduced access but not be completely cut off, perhaps 30 agentic requests per 4 hours.

The core issue is transparency. Without it, we’ll all assume we’re being treated unfairly, when in reality the team’s hands may be tied by resource limitations beyond their “control”.
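For what it’s worth, a cap like that is straightforward to implement as a sliding-window limit. Here’s a minimal sketch, assuming a rolling 4-hour window; the class name, limits, and numbers are just illustrative, not anything from the actual product:

```python
from collections import deque
import time

class FairUseCap:
    """Sliding-window cap: allow at most `limit` requests per `window_seconds`.
    Numbers are illustrative, e.g. 500 iterations / 4 h for paying users,
    30 / 4 h for free users."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # times of requests still inside the window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        # Over the cap: don't hard-block, just route to a longer-wait queue.
        return False
```

The point being: users over the cap wouldn’t be cut off, just deprioritized, which keeps it a fair-use policy rather than a penalty.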
Ideally we’d prefer a cap on the number of users until scaling demands are met, but given the product’s popularity that’s probably unrealistic. I also have a slight suspicion that not everyone uses the software for coding, which could also bring in the so-called unwanted users.