Context length and slow GPT-4

At this point we rely on the 8k context model in production; though if you have 32k access on your own API key, you're welcome to use that.

When you blow out the prompt with large files, we do indeed use embeddings to pick the most relevant parts of the files. Sometimes you'll see the UI offer other options too. When your conversation gets too long, we recursively summarize it so the model retains some of the earlier context.
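For the curious, the embedding-based selection can be sketched roughly like this. This is a toy illustration, not our actual implementation: the `embed` function here is a bag-of-words stand-in for a real embedding model, and the chunking/scoring parameters are made up for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query: str, file_text: str, chunk_size: int = 40, k: int = 2):
    # Split the file into fixed-size chunks, score each against the query,
    # and keep only the k most relevant ones for the prompt.
    words = file_text.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The real thing uses learned embeddings and smarter chunk boundaries, but the shape is the same: chunk, score against the query, keep the top few.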

(FWIW, in our experience the 32k model doesn't really look at things that are more than 8k tokens back in its context window.)