Implement timeouts

I’m using Sonnet 4 as my daily driver. Sometimes it will get stuck in the initial “Generating…” stages for ages, and won’t even produce any thoughts. This latency seems to be transient and often if I stop/restart inference it’ll work.

I wonder if it makes sense to allow users to configure some sort of timeout/auto-retry for their inference requests?

This started to happen quite frequently in the last couple of week for me. In addition to that Cursor fails to run Build command for Unreal Engine in approximately 70% of tries. So for now I just skip that step manually every time it gets stuck.