Context length and slow GPT-4

Hi, I just installed Cursor today and have already used up the free quotas available. As a beginner in programming, I’ve found AI to be extremely helpful. Using Cursor feels like a significant step forward for me. I have a few questions before switching to Cursor:

  1. What does “slow” mean in the context of GPT-4’s performance? How slow is slow GPT-4? So far, the free slow GPT-4 I’ve used seems okay speed-wise, but does it get any slower?
  2. What is the context length for GPT-4 in Cursor?
  3. Do the context length and speed remain constant, or are there conditions under which they could be reduced, such as heavy usage?

  1. The speed of slow GPT-4 fluctuates based on how busy our servers are. Personally, I’ve found that slow GPT-4 is often good enough.
  2. 8192 tokens (a quick way to check your own prompts against that limit is sketched below).
  3. Context length remains constant. Speed can fluctuate (see 1).
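
For anyone who wants to see how close a prompt gets to that limit, here is a minimal sketch that counts tokens with OpenAI’s tiktoken library. Using the standard GPT-4 tokenizer is an assumption on my part; how Cursor accounts for tokens server-side isn’t specified here.

```python
# Minimal sketch: check a prompt against an 8192-token window with
# OpenAI's tiktoken library (pip install tiktoken). The gpt-4
# tokenizer is an assumption; Cursor's server-side accounting may
# differ slightly.
import tiktoken

CONTEXT_WINDOW = 8192  # tokens, per the answer above

def fits_in_context(prompt: str) -> bool:
    enc = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(enc.encode(prompt))
    print(f"{n_tokens} tokens used of {CONTEXT_WINDOW}")
    return n_tokens <= CONTEXT_WINDOW

if __name__ == "__main__":
    fits_in_context("Explain what this function does: def add(a, b): return a + b")
```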

Thanks for clarifying. Is there any plan to release a Debian package?

Our Linux download should already work on Debian: AppImages run there, you just need to chmod +x them first. We would like to release a .deb package in the future.

Yeah, I’m already using it with chmod +x; a .deb package would be great in the future. Thanks.


Just a quick suggestion in case you’re not familiar with it: there’s a handy tool called app2deb which seems to work very well with the Cursor AppImage.

It’s a very straightforward console-based tool: pass it the AppImage, get a .deb, install the .deb. I haven’t tried it on pure Debian, as all my machines are headless non-graphical boxes, but it worked flawlessly on Ubuntu.


I read “We largely rely on 8k but sometimes use 32k” (OpenAI's API pricing - #4 by truell20) in August/24.
Is the use of 32k discontinued?

Also, what exactly happens when I fill the prompt with more tokens than the context window allows?
I saw that no error is raised, so I presume it uses embeddings to take only chunks of my prompt. If that’s the case, I’d like it to inform me (the user) about it.

At this point, I think we just rely on 8k in production, though if you have 32k access through your own API key, you’re welcome to use that.

When you blow out the prompt with large files, we do indeed use embeddings to pick the most relevant parts of the files. Sometimes you’ll see the UI offer other options too. When your conversation gets too long, we recursively summarize it so the model still knows about some of the earlier conversation.

(FWIW, in our experience, 32k doesn’t really look at things more than 8k tokens back in its context window.)
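
To make the “use embeddings to pick the most relevant parts” idea concrete, here is a toy sketch of the general technique. Cursor’s actual implementation isn’t public, so the bag-of-words “embedding” below is only a stand-in for a real embedding model, and the budget is counted in words rather than real tokens.

```python
# Toy sketch of embedding-based chunk selection, the general technique
# described above. Cursor's real implementation is not public: the
# bag-of-words "embedding" here stands in for a learned embedding
# model, and the budget is counted in words rather than real tokens.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: a word-frequency vector. A real system would
    # call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_relevant_chunks(query: str, chunks: list[str], budget: int) -> list[str]:
    # Rank chunks by similarity to the query, then keep the best ones
    # until the word budget is spent.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())
        if used + cost <= budget:
            picked.append(chunk)
            used += cost
    return picked

if __name__ == "__main__":
    chunks = [
        "The config file is parsed in parse_config(), which reads TOML.",
        "Changelog: bumped the package version to 1.2 and fixed CI.",
        "load_config() calls parse_config() and caches the result.",
    ]
    print(pick_relevant_chunks("where is the config file parsed?", chunks, budget=15))
```

A production pipeline would embed chunks with a learned model and count tokens with a real tokenizer, but the shape is the same: rank chunks by similarity to the query and keep the best ones that fit the remaining context budget.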