Built an LLM performance monitoring tool to know when it's "normal" for AI to be slow

Hey everyone! Not sure how many of you have experienced the slowness I have lately, but I wanted to understand whether it’s normal for a model to be slow, and when the peak hours are so I can try to avoid them (we all like our AI to be fast). So I built this tool to track provider and model performance over time.

We’re all somewhat irritated when it takes forever to get a response to a simple request, and I hope this helps “ease” the frustration by showing when things are just plain slow everywhere and not necessarily the fault of Cursor. I know this won’t fix the speed in the moment you need it, but at least you’ll know why it’s happening.

Enjoy, and feel free to share any feedback :slight_smile:

Hi Hugo,

this looks great, I think many will be interested in such info.

  • Do you also track LLM performance by region? Performance varies heavily with regional inference load and time of day.
  • Any chance of tracking LLM performance within Cursor itself? That’s what most people would want to know, but it’s likely not as easy to achieve.

Having looked at the actual site:

  • Wow, it takes 6 to 20 seconds to generate 500 tokens. Are the prompts long?
  • You could cache the data; the initial page loads quickly, but the data takes a while.
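For context, the 6-to-20-seconds-per-500-tokens figure above works out to a fairly wide throughput range; a quick back-of-the-envelope check:

```python
# Throughput implied by the numbers above: 500 tokens in 6-20 seconds.
tokens = 500
fast_s, slow_s = 6, 20

fast_tps = tokens / fast_s   # best case: ~83 tokens/s
slow_tps = tokens / slow_s   # worst case: 25 tokens/s
```

So the spread between a good and a bad run is over 3x, which is exactly the kind of variance the tool is trying to surface.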

Bookmarked!

Thanks, glad you find it useful! :slight_smile: It’s not tracked per region yet, but that’s a good point to consider. Maybe the people at Cursor can tell me otherwise, but I don’t think there’s a way to measure directly from within Cursor (without hacking it), so instead I query the providers’ APIs directly. Since Cursor also uses them directly, the results should be more or less the same (though certainly not 100% accurate).

The timing is measured from start to finish, so latency plus the entire completion of the prompt output, and the prompt is random, up to 512 tokens. The raw data is cached, but the aggregation over all the stored data is what takes the longest, if I’m not mistaken; there’s probably a way to cache the end result as well to speed things up.
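That start-to-finish measurement could be sketched roughly like this, assuming a streaming provider API (here `fake_stream` is just a hypothetical stand-in for the real streaming client, so the example is self-contained):

```python
import time

def fake_stream(n_tokens=8, delay=0.001):
    """Stand-in for a provider's streaming completion (hypothetical)."""
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "tok"

def measure(stream):
    """Return (time_to_first_token, total_time, token_count) for a stream."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        if first is None:
            # Latency: time until the first token arrives.
            first = time.perf_counter() - start
        count += 1
    # Total: latency plus the entire completion, as described above.
    total = time.perf_counter() - start
    return first, total, count

ttft, total, n = measure(fake_stream())
```

Tracking time-to-first-token separately from total time would also let the dashboard distinguish "slow to start" from "slow to generate".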

1 Like

Thank you for the update. Caching might still be needed: the cards on top load now, but the graphs don’t.
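Caching the precomputed end results could be as simple as a small TTL cache in front of the expensive aggregation; a minimal sketch (`TTLCache` and `dashboard_stats` are hypothetical names, not part of the actual site):

```python
import time

class TTLCache:
    """Cache precomputed card/graph results for a fixed time window."""
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None  # missing or expired

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = TTLCache(ttl_seconds=300)

def dashboard_stats():
    cached = cache.get("stats")
    if cached is not None:
        return cached
    stats = {"p50_ms": 6200}  # the expensive aggregation would go here
    cache.set("stats", stats)
    return stats
```

With a TTL matching the data-collection interval, repeat page loads would skip the aggregation entirely and only pay for it once per window.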

Region specific info would help those in different regions with separate speed issues.

A Cursor-internal check might not be necessary if per-region tracking is possible, since that would already be close to real usage.

Yesterday I saw the blip in Anthropic response times matching their error reports, so this will definitely help.

For DeepSeek, do you use the original API or the Fireworks version? (Both are used in Cursor, depending on region.)

Thank you for your feedback! I’ll see what I can do.
For DeepSeek I am using the original API, not the Fireworks version. I know it’s not as accurate, but I don’t have access to Fireworks; I’m open to sponsorship to help cover any related costs.

1 Like

Agreed, this should ideally be sponsored, especially for region-spanning checks and other providers like Fireworks.

1 Like