- How long do GPT-5.4 cached tokens last?
- Is it possible for GPT-5.4 to launch a subagent and for the cache to expire by the time the subagent completes its task?
- Does MAX mode change how caching works?
up!
Cache duration: GPT-5.4 supports extended prompt caching of up to 24 hours. The cache is prefix-based, meaning if the beginning of your prompt (system instructions, prior conversation context) matches a previous request, those tokens are served from cache at reduced cost.
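To make "prefix-based" concrete, here is a toy sketch in Python. It is only an illustration: the function name and the word-level "tokens" are made up for this example, and the real matching rules (tokenization, chunk sizes, minimum prompt length) are provider-specific and not shown here.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count the leading tokens two prompts share (toy model of a prefix cache hit)."""
    shared = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first mismatch ends the reusable prefix
        shared += 1
    return shared


# Hypothetical prompts, split into words instead of real tokens.
system = ["You", "are", "a", "helpful", "assistant."]
first_request = system + ["Summarize", "file", "A"]
second_request = system + ["Summarize", "file", "B"]

# Everything up to the first differing token could be served from cache.
hit = cached_prefix_len(first_request, second_request)
miss = cached_prefix_len(first_request, ["Different", "start"] + system)
```

In this toy model `hit` covers the shared system prompt plus the identical start of the user message, while `miss` is zero because the prompts diverge at the first token, even though the same text appears later.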
Subagents and cache: Subagents use the same model infrastructure, so cache behavior is identical. The cache is very unlikely to expire during a subagent task, since cached prefixes can persist for hours (up to 24 with extended caching) and subagent tasks typically finish much sooner. Note that because the cache is prefix-based, a subagent with its own prompt context creates a separate cache entry rather than sharing the parent agent’s cache.
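A rough way to picture this is a toy cache keyed by prompt prefix with a time limit. The sketch below is an assumption-laden model, not the provider's implementation: `ToyPrefixCache`, the prompts, and the fixed 24-hour TTL are hypothetical, and it does not model whether a cache read resets the timer.

```python
HOUR = 3600.0


class ToyPrefixCache:
    """Hypothetical in-memory model: prefix string -> time it was stored."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.entries: dict[str, float] = {}

    def store(self, prefix: str, now: float) -> None:
        self.entries[prefix] = now

    def lookup(self, prompt: str, now: float) -> bool:
        """True if some stored, unexpired prefix starts this prompt."""
        return any(
            prompt.startswith(prefix) and now - stored_at <= self.ttl
            for prefix, stored_at in self.entries.items()
        )


cache = ToyPrefixCache(ttl_seconds=24 * HOUR)  # assumed extended duration

parent_prefix = "SYSTEM: parent agent rules\n"
subagent_prompt = "SYSTEM: subagent rules\nTask: run the tests"

cache.store(parent_prefix, now=0.0)

# The subagent's prompt starts differently, so it cannot reuse the parent's entry.
subagent_hit = cache.lookup(subagent_prompt, now=0.0)

# The parent resumes two hours later, well inside the assumed 24-hour window.
parent_hit_after_2h = cache.lookup(parent_prefix + "User: continue", now=2 * HOUR)

# Only a gap longer than the TTL would lose the parent's cached prefix.
parent_hit_after_25h = cache.lookup(parent_prefix + "User: continue", now=25 * HOUR)
```

Under these assumptions the subagent misses the parent's entry, the parent still hits after a two-hour subagent run, and only a gap longer than the TTL causes a miss.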
MAX mode: MAX mode does not change how caching works. It affects model selection and token limits, but the caching mechanism stays the same regardless.
One general note: extended caching requires that Privacy Mode is not enabled, since the provider needs to retain prompt data to cache it. If Privacy Mode is on, caching falls back to a shorter default duration.
This is important information. Thanks for the answer!