- Open the AI panel
- Toggle the model to o1-mini
- Ask for help
The o1-mini model returns the full response at once instead of streaming it gradually.
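For anyone who wants to verify this outside the editor, here is a minimal sketch using the OpenAI Python SDK (assuming an `OPENAI_API_KEY` with access to o1-mini; depending on when you run it, the API may reject `stream=True` for this model entirely). It timestamps each streamed chunk so you can see whether tokens arrive gradually or in one burst:

```python
import os
import time

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

start = time.monotonic()
stream = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Explain recursion in one paragraph."}],
    stream=True,
)

# If tokens stream normally, the printed times spread out over the response;
# if everything arrives at once, they all cluster at the same instant.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"{time.monotonic() - start:6.2f}s  {chunk.choices[0].delta.content!r}")
```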
I suspect this may be due to using OpenRouter under the hood.
That's a big conclusion jump - however, I use OpenRouter for my products, and they announced that while streaming is supported for o1-preview and o1-mini specifically, all the tokens come in at one time.
This is probably a wise decision, as the 20 RPM rate limit on their standard OpenAI account would not serve all of us!
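To see OpenRouter's behavior directly, you can point the same SDK at their OpenAI-compatible endpoint. A sketch, assuming an `OPENROUTER_API_KEY` and the `openai/o1-mini` model id:

```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API at this base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# stream=True is accepted, but per OpenRouter's announcement the tokens
# for o1-preview/o1-mini arrive in a single burst rather than gradually.
stream = client.chat.completions.create(
    model="openai/o1-mini",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```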
I have a question: it seems that rate limits are designed to relieve server pressure by delaying request processing. For instance, with a 20 RPM limit, I would expect the server to process my request after a delay and then return tokens starting from the first one, rather than sending all tokens at once.
If I’m misunderstanding how RPM works, please correct me.
OpenAI has not yet enabled streaming for those models through the API.
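Until that changes, the safe client-side pattern is a plain non-streaming request, rendering the whole completion at once. A minimal sketch under the same assumptions as above:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# No stream=True: wait for the full completion, then render it in one go.
response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "Summarize streaming vs. non-streaming."}],
)
print(response.choices[0].message.content)
```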