Built an LLM performance monitoring tool to know when it's "normal" for AI to be slow

Hey everyone! Not sure how many of you have experienced the slowness I have lately, but I wanted to understand whether it’s normal for a model to be slow, and when the peak hours are so I can try to avoid them (we all like our AI to be fast). So I built this tool to track provider and model performance over time.

We’re all somewhat irritated when it takes forever to get a response to a simple request, and I hope this helps “ease” the frustration by showing when things are just plain slow everywhere and not necessarily the fault of Cursor. I know this won’t fix the speed in the moment you need it, but at least you’ll know why it’s happening.

Enjoy, and feel free to share any feedback :slight_smile:

Hi Hugo,

this looks great, I think many will be interested in such info.

  • Do you also track LLM performance by region? Performance varies heavily with regional inference load and time of day.
  • Any chance of tracking LLM performance within Cursor itself? That’s what most people would want to know, but it’s likely not as easy to achieve.

Having looked at the actual site:

  • Wow, it takes 6 to 20 seconds to generate 500 tokens. Are the prompts long?
  • You could cache the data; the initial page loads quickly, but the data takes a while.
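For context, the 6-to-20-seconds-per-500-tokens figure above works out to a fairly wide throughput range; a quick back-of-the-envelope check:

```python
# Throughput implied by the numbers above: 500 tokens in 6-20 seconds.
tokens = 500
fast_s, slow_s = 6, 20

fast_tps = tokens / fast_s   # best case: ~83 tokens/s
slow_tps = tokens / slow_s   # worst case: 25 tokens/s
```

So the spread between a good and a bad run is over 3x, which is exactly the kind of variance the tool is trying to surface.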

Bookmarked!

Thanks, glad you find it useful! :slight_smile: It’s not tracked per region yet, but that’s a good point to consider. Maybe the people at Cursor can tell me otherwise, but I don’t think there’s a way to measure directly from within Cursor (without hacking it), so instead I query the providers’ APIs directly. Since Cursor also uses them directly, the results should be more or less the same (though certainly not 100% accurate).

The timing is measured from start to finish, so latency plus the entire completion of the prompt output, and the prompt is random, up to 512 tokens. The raw data is cached, but the aggregation over all the stored data is what takes the longest, if I’m not mistaken; there’s probably a way to cache the end result as well to speed things up.
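That start-to-finish measurement could be sketched roughly like this, assuming a streaming provider API (here `fake_stream` is just a hypothetical stand-in for the real streaming client, so the example is self-contained):

```python
import time

def fake_stream(n_tokens=8, delay=0.001):
    """Stand-in for a provider's streaming completion (hypothetical)."""
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

def measure(stream):
    """Return (time_to_first_token, total_time, token_count) for a stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            # Latency: time until the first token arrives.
            first = time.perf_counter() - start
        count += 1
    # Total: latency plus the entire completion, as described above.
    total = time.perf_counter() - start
    return first, total, count

ttft, total, n = measure(fake_stream())
```

Tracking time-to-first-token separately from total time would also let the dashboard distinguish "slow to start" from "slow to generate".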

1 Like

Thank you for the update. Caching might still be needed: the cards on top load now, but the graphs don’t.
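Caching the precomputed end results could be as simple as a small TTL cache in front of the expensive aggregation; a minimal sketch (`TTLCache` and `dashboard_stats` are hypothetical names, not part of the actual site):

```python
import time

class TTLCache:
    """Cache precomputed card/graph results for a fixed time window."""
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None  # missing or expired

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=300)

def dashboard_stats():
    cached = cache.get("stats")
    if cached is not None:
        return cached
    stats = {"p50_ms": 6200}  # the expensive aggregation would go here
    cache.set("stats", stats)
    return stats
```

With a TTL matching the data-collection interval, repeat page loads would skip the aggregation entirely and only pay for it once per window.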

Region specific info would help those in different regions with separate speed issues.

A Cursor-internal check might not be necessary if per-region tracking is possible, since that would already be close to real usage.

Yesterday I saw the blip in Anthropic response times matching their error reports, so this will definitely help.

For DeepSeek, do you use the original API or the Fireworks version? (Both are used in Cursor, depending on region.)

Thank you for your feedback! I’ll see what I can do.
For DeepSeek I am using the original API, not the Fireworks version. I know it’s not as accurate, but I don’t have access to Fireworks; I’m open to sponsorship to help cover any related costs.

1 Like

Agreed, this should ideally be sponsored, especially for region-spanning checks and other providers like Fireworks.

1 Like