I’m using Sonnet 4 as my daily driver. Sometimes it will get stuck in the initial “Generating…” stages for ages, and won’t even produce any thoughts. This latency seems to be transient and often if I stop/restart inference it’ll work.
I wonder if it makes sense to allow users to configure some sort of timeout/auto-retry for their inference requests?