Display number of tokens in current chat

I am not sure if this information is easily available, but I just noticed that when mentioning files with the @ feature in Ctrl + L in Long Context Chat mode, there is a handy feature that displays the token length of each file next to its name.

That is very cool.

It made me wonder if it would be possible to display the cumulative token length of the chat somewhere - i.e. a value that updates as the chat progresses - just to see how large the chat is actually getting.

In case it makes a difference, I saw the token length of the files in this scenario:

  • The ‘Long Context Chat’ option is enabled in the Cursor Beta settings area
  • Ctrl + L
  • Select Long Context Chat mode with claude-3-5-sonnet-200k
  • Press @ to mention some files

(screenshot: long_context_chat_file_token_length)

5 Likes

Bumping this awesome idea.

Sounds like a great feature that would enhance our ability to manage token usage better!

Also, maybe a display to show remaining tokens allowed for that chat.

1 Like

Definitely need this 🙂

I have been thinking recently that I have a very poor conceptual intuition of token counts when it comes to chat length.

In fact, this evening I googled something like:

how to visualise token length in ai models

And it returned results like these:

**01)** What are tokens and how to count them?
https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them

Takeaway:

Here are some helpful rules of thumb for understanding tokens in terms of lengths:

  • 1 token ~= 4 chars in English

  • 1 token ~= ¾ words

  • 100 tokens ~= 75 words

Or

  • 1-2 sentence ~= 30 tokens

  • 1 paragraph ~= 100 tokens

  • 1,500 words ~= 2048 tokens

To get additional context on how tokens stack up, consider this:

  • Wayne Gretzky’s quote “You miss 100% of the shots you don’t take” contains 11 tokens.

  • OpenAI’s charter contains 476 tokens.

  • The transcript of the US Declaration of Independence contains 1,695 tokens.
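The rules of thumb above can be turned into a quick back-of-the-envelope estimator. This is only a sketch of the heuristic, not a real tokenizer (the `estimate_tokens` helper name is my own), but it lands surprisingly close on the Gretzky quote:

```python
# Rough token estimator based on the OpenAI rules of thumb above.
# Heuristics for English text only - not a real tokenizer:
#   1 token ~= 4 characters
#   1 token ~= 3/4 of a word

def estimate_tokens(text: str) -> int:
    """Average the character-based and word-based heuristics."""
    by_chars = len(text) / 4          # 1 token ~= 4 chars
    by_words = len(text.split()) / 0.75  # 1 token ~= 3/4 words
    return round((by_chars + by_words) / 2)

quote = "You miss 100% of the shots you don't take"
print(estimate_tokens(quote))  # 11 - matching the article's count for this quote
```

Real tokenizers split on subword units, so counts for code, punctuation-heavy text, or non-English text can diverge a lot from this estimate.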

**02)** Visualizing Token Limits in Large Language Models
https://galecia.com/blogs/jim-craner/visualizing-token-limits-large-language-models

Takeaway:

“This sentence contains six tokens.” has 6 tokens and 36 characters.

The Gettysburg Address has 310 tokens and 1,453 characters.

The US Declaration of Independence has 1,638 tokens and 8,147 characters.

Anne of Green Gables, chapter 1 has 3,549 tokens and 15,585 characters.
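Plugging those four data points into the “1 token ~= 4 chars” rule shows how rough it is - the ratio drifts between roughly 4.4 and 6 characters per token depending on the text:

```python
# Characters-per-token ratios for the examples above, as a sanity check
# on the "1 token ~= 4 chars" rule of thumb. Figures are taken directly
# from the blog post quoted above.
samples = {
    "Six-token sentence": (36, 6),
    "Gettysburg Address": (1453, 310),
    "US Declaration of Independence": (8147, 1638),
    "Anne of Green Gables, ch. 1": (15585, 3549),
}

for name, (chars, tokens) in samples.items():
    print(f"{name}: {chars / tokens:.2f} chars/token")
```

Short sentences skew high (6.00 chars/token) while longer prose settles near the 4-5 range, which is why the rule of thumb works better the longer the text is.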

**03)** Visualizing the size of Large Language Models
https://medium.com/@georgeanil/visualizing-size-of-large-language-models-ec576caa5557

Takeaway:

If we assume a typical book contains ~100,000 tokens and a typical library shelf holds ~100 books, each shelf would contain about 10 million tokens.
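That analogy also gives a feel for the chat modes in this thread. A quick sketch, using the blog's ~100,000-tokens-per-book figure and the 200k context window of claude-3-5-sonnet-200k mentioned above:

```python
# Back-of-the-envelope scale, using the figures from the analogy above.
tokens_per_book = 100_000
books_per_shelf = 100

tokens_per_shelf = tokens_per_book * books_per_shelf
print(f"{tokens_per_shelf:,} tokens per shelf")  # 10,000,000 tokens per shelf

# A 200k-token context (e.g. claude-3-5-sonnet-200k) holds roughly
# two "books" worth of text.
books_in_200k_context = 200_000 / tokens_per_book
print(f"{books_in_200k_context:.0f} books per 200k context")
```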

I still don’t think these figures help me guess my chat lengths yet!

When things are getting slow, or I want to do something big, I just switch to a long-context chat and hope for the best!

3 Likes

+1
One thing I never know: if I tag certain files or folders as context, do I need to tag them again in subsequent Q&A, or does the chat retain them? And if it already retains them and I tag them again, am I needlessly wasting tons of tokens?

2 Likes

Hi @wm9 ,

I think that question is worthy of its own topic!

Something like:

Do I need to re-tag files and folders in a chat or do they persist in context once added?

1 Like

I wonder how this guy did it in this video: