Code index fragment association

First and foremost, it must be acknowledged that Cursor is an excellent AI IDE.
I ran into some difficulties with code indexing in my work. Put simply, I split my code into functions using the AST and embedded each function as a vector. However, with RAG I can only retrieve function A, the one the user asked about directly. How can I also retrieve its associated functions B, C, and D?
I have two ideas:
One is to conduct static analysis and recursive retrieval.
The other is to pre-build a code graph during the AST parsing stage.
I read the relevant content on the Cursor website carefully and repeatedly, but still did not find the answer I was looking for. I would appreciate any suggestions.

Hey, classic issue in code RAG. A few approaches are worth combining, since no single one is enough on its own:

  1. Hybrid retrieval, not just vector. Vector search alone often misses calling and called functions because their embeddings can look far from the request. Combine it with keyword search or grep over symbol names. Cursor uses semantic search plus grep for this reason (see Semantic & Agentic Search | Cursor Docs).

  2. Build a code graph at the AST stage, like your second idea. This is usually the most reliable approach. Build a symbol and call graph at parse time, including definitions, references, imports, and class hierarchy. Then at retrieval time, take top-k from vector search and do 1 to 2 hops of graph expansion, like callees, callers, and type dependencies. Tree-sitter plus a language-aware symbol resolver, such as an LSP server, SCIP indexers, or stack-graphs, is what many production setups use.

  3. Do recursive or static analysis on the fly, like your first idea. It can work, but it’s slower and harder to cache. It’s usually best as a refinement step on a small candidate set, not as the main retrieval method.

  4. Add a reranker on top. After vector plus graph expansion, run a cross-encoder or LLM rerank against the user request. This can greatly improve precision and helps when chunk boundaries aren’t great.

  5. Chunking matters too. Function-level AST chunking is good, but for short helper functions or methods in the same class, grouping them or attaching class and file context to each chunk often helps.
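
To make item 2 concrete, here is a minimal sketch of building a call graph at parse time and expanding vector hits by a couple of hops. It assumes Python sources and uses the standard-library `ast` module instead of tree-sitter. The names `SOURCE`, `build_call_graph`, and `expand` are made up for illustration, and the name-based call resolution is a deliberate simplification: it ignores methods, imports, and shadowing, which is exactly where a real symbol resolver earns its keep.

```python
import ast
from collections import defaultdict, deque

# Hypothetical code base, inlined so the sketch is self-contained.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return tokenize(text)

def tokenize(text):
    return text.split()

def unrelated():
    return 42
'''


def build_call_graph(source):
    """Map each top-level function name to the top-level functions it calls.

    Resolution is by bare name only (an assumption for this sketch).
    """
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {name: set() for name in defined}
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            # Only plain calls like foo(...); attribute calls are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    graph[fn.name].add(node.func.id)
    return graph


def expand(seeds, callees, max_hops=2):
    """Breadth-first expansion over callers and callees.

    Returns {function name: hop distance from the nearest seed}.
    """
    # Invert the callee edges so we can also walk "who calls me".
    callers = defaultdict(set)
    for src, dsts in callees.items():
        for dst in dsts:
            callers[dst].add(src)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        neighbours = callees.get(current, set()) | callers.get(current, set())
        for nxt in neighbours:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance


graph = build_call_graph(SOURCE)
# Pretend vector search returned only "load" for the user's question.
related = expand(["load"], graph, max_hops=2)
```

Here `related` contains `load` plus `read`, `parse`, and `tokenize`, while `unrelated` stays out. The hop distance is handy as a feature for the reranker in item 4, or as a cutoff when the expanded set gets too large for the context window.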

For Cursor’s indexing approach, the public part is here: Semantic & Agentic Search | Cursor Docs. We don’t share internal details beyond that.