How Cursor Context works as of 0.45.7

Here’s what I mean:

  • Let’s say I want to introduce new context to Cursor. For example, I have the documentation for a library, crawl4ai, in Markdown format as docs-crawl4ai.md. It’s about 70k tokens.
  • From my current understanding, I have several ways of working with this documentation:
    1. As the @doc context selector, giving it the documentation website’s link (we ignore the local file in this case). This fetches the webpage(s), indexes them into embeddings, and retrieves the results.
    2. As a codebase query, with the file in the project folder. If the codebase is not indexed, or the file is ignored, no search results appear (as expected). Even though only CHAT has the “submit with codebase” option, I found that COMPOSER can find the file as well if I tell it to search for it. This also uses embeddings.
    3. As the @file context selector with the actual docs-openai-api.md. Here Cursor shows “long file details” and displays the chunks it pulled the information from, so I’m assuming this also uses embeddings. This apparently behaves differently in CHAT vs COMPOSER: CHAT searches the file I mentioned directly, while COMPOSER first searches the codebase even though I mentioned the file. I don’t know if this is the intended behaviour.
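For reference, the embeddings-based retrieval these three options seem to use boils down to: split the document into chunks, embed each chunk, and return the chunks most similar to the query. Here is a toy sketch of that idea (bag-of-words cosine similarity stands in for real embedding vectors; this is my illustration, not Cursor’s actual pipeline):

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size word chunks.
    Real indexers use smarter, token-aware splitting."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The point is that only the top-k chunks reach the model, which is why retrieval can miss context that a full-file load would have kept.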

My questions are related to how context behaves:

  1. Is there any way I can keep the file in context for the current chat / composer session? For example, do I need to re-attach the @file on every subsequent query? Ideally I would like the file’s tokens kept in context (taking into account the model’s context limit, e.g. 200k for Claude).
  2. Is there a max-token threshold above which Cursor decides to index a file as embeddings?
  3. Ultimately, I want to know how to work with large documentation: when to use CHAT, when to use COMPOSER, and how to load all the tokens into context instead of relying on embeddings.
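On question 3, a rough way to sanity-check whether a doc could fit in context at all is the common ~4 characters/token rule of thumb for English text (real tokenizers will count differently, and the limit and headroom below are my assumptions):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token heuristic for
    English text. Actual tokenizers (Claude's, tiktoken, etc.) will differ,
    so treat this as a sanity check only."""
    return round(len(text) / chars_per_token)


def fits_in_context(text: str, context_limit: int = 200_000,
                    reserve: int = 20_000) -> bool:
    """Check whether a document plausibly fits in the model's window,
    reserving some headroom for the prompt and the reply."""
    return estimate_tokens(text) <= context_limit - reserve
```

By this estimate a 70k-token doc should fit comfortably in a 200k window, which is why I’d rather load it whole than go through embeddings.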

Are there any team members who can answer these accurately?

I’m sorry if this has been asked before, but I don’t have time to search and scan. Thank you.

On a related question: is there any way to get it to forget context?

Like, stop talking to me about the darn DB when I’ve clearly moved up the stack…

I’ll need to try to get it to ignore things.

I just installed R1-Distilled-1.5B locally in LM Studio, and its responses so far are not as good as what Claude and 4o were giving me… but at least I have it running locally. Now I need to get it connected to the internet, and I was thinking of the following:

So I started making a crawler, and I would like it to be fairly advanced, to the point that I can have YOLO prompt the crawler (with detailed rules), run the crawler, slurp the data, and then have YOLO ingest the context…
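As a starting point, the link-collection part of such a crawler can be sketched with just the Python standard library (the detailed rules, dedup, and rate limiting are left out, and the fetch itself would use urllib or similar; this is a sketch, not my finished crawler):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from a page's HTML.
    A real crawler adds crawl rules, deduplication, and rate limiting."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Fetching the page itself would look roughly like (network code, not run here):
# from urllib.request import urlopen
# html = urlopen(url).read().decode("utf-8", "replace")
```

From there the loop is: fetch, extract links, filter by the prompt’s rules, queue, and dump the page text into Markdown for ingestion.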