Code index fragment association

Ronchan0805 · May 8, 2026, 7:45am

First of all, Cursor is an excellent AI IDE.
I ran into some difficulties with code indexing in my work. Put simply, I split my code into per-function chunks using the AST and embedded each chunk as a vector. However, at RAG time I can only retrieve function A, the one the user asked about directly. How can I also retrieve its associated functions B, C, and D?
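For reference, the function-level AST chunking described above might look like the following minimal sketch. It assumes Python source and uses only the standard-library `ast` module; `Chunk` and `chunk_functions` are hypothetical names, since the actual language and tooling in this project are not stated. Methods are recorded with their class name as a prefix (for example `Parser.parse`), which keeps a little structural context attached to each chunk before embedding.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    qualname: str  # "Parser.parse" for a method, "tokenize" for a top-level function
    lineno: int    # 1-based line of the `def` statement
    source: str    # source text of the function; this is what gets embedded


def chunk_functions(source: str) -> list[Chunk]:
    """Split Python source into one chunk per function or method (hypothetical helper)."""
    tree = ast.parse(source)
    chunks: list[Chunk] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                qualname = prefix + child.name
                text = ast.get_source_segment(source, child) or ""
                chunks.append(Chunk(qualname, child.lineno, text))
                visit(child, qualname + ".")  # nested functions keep the outer name
            elif isinstance(child, ast.ClassDef):
                visit(child, prefix + child.name + ".")  # methods get the class prefix
            else:
                visit(child, prefix)  # e.g. functions defined inside `if` blocks

    visit(tree, "")
    return chunks


if __name__ == "__main__":
    sample = (
        "class Parser:\n"
        "    def parse(self, text):\n"
        "        return tokenize(text)\n"
        "\n"
        "def tokenize(text):\n"
        "    return text.split()\n"
    )
    for chunk in chunk_functions(sample):
        print(chunk.qualname, chunk.lineno)  # prints "Parser.parse 2", then "tokenize 5"
```

Each `Chunk.source` would then be passed to the embedding model; the model and the vector store are left out because they are not specified here.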
I have two ideas:
One is to conduct static analysis and recursive retrieval.
The other is to pre-build a code graph during the AST parsing stage.
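A rough illustration of the second idea, assuming Python source and the standard-library `ast` module: build a caller-to-callee graph once at parse time, then expand a retrieved function to its neighbors within a fixed number of hops. The helper names are hypothetical, and the graph only resolves plain calls such as `foo()` to top-level functions in the same file, by bare name; methods, imports, and shadowed names would need a real symbol resolver.

```python
import ast
from collections import defaultdict, deque


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    funcs = [n for n in tree.body
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    defined = {f.name for f in funcs}
    graph: dict[str, set[str]] = {f.name: set() for f in funcs}
    for f in funcs:
        for node in ast.walk(f):
            # Only plain calls like `helper(x)` resolve here; `obj.helper(x)`
            # would need type information, which this sketch does not have.
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                graph[f.name].add(node.func.id)
    return graph


def invert(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Callee -> callers, so the expansion can also walk upward."""
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def expand(graph: dict[str, set[str]], seeds: set[str], hops: int = 2) -> set[str]:
    """Breadth-first walk over callers and callees, up to `hops` edges from the seeds."""
    callers = invert(graph)
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        name, depth = queue.popleft()
        if depth == hops:
            continue
        for nxt in graph.get(name, set()) | callers.get(name, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen
```

For a chain `a -> b -> c -> d`, `expand(graph, {"a"}, hops=1)` returns `{"a", "b"}` and `expand(graph, {"c"}, hops=1)` returns `{"b", "c", "d"}`. Because the graph is built once and stored next to the embeddings, the query-time step is a cheap lookup rather than a fresh static analysis, which is roughly the trade-off between the two ideas.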
I have read the relevant pages on the Cursor website several times but still haven't found the answer I'm looking for. Any suggestions would be appreciated.

deanrie · May 8, 2026, 2:17pm

Hey, classic issue in code RAG. A few approaches are worth combining, since no single one is enough on its own:

Hybrid retrieval, not just vector search. Vector search alone often misses callers and callees because their embeddings can be far from the embedding of the request. Combine it with keyword search or grep over symbol names. Cursor uses semantic search plus grep for this reason (see Semantic & Agentic Search | Cursor Docs).
Build a code graph at the AST stage, like your second idea. This is usually the most reliable approach. Build a symbol and call graph at parse time, including definitions, references, imports, and class hierarchy. Then at retrieval time, take the top-k results from vector search and do 1 to 2 hops of graph expansion, following callees, callers, and type dependencies. Tree-sitter plus a language-aware symbol resolver such as an LSP server, SCIP indexers, or stack-graphs is what many production setups use.
Do recursive or static analysis on the fly, like your first idea. It can work, but it’s slower and harder to cache. It’s usually best as a refinement step on a small candidate set, not as the main retrieval method.
Add a reranker on top. After vector plus graph expansion, run a cross-encoder or LLM rerank against the user request. This can greatly improve precision and helps when chunk boundaries aren’t great.
Chunking matters too. Function-level AST chunking is good, but for short helper functions or methods in the same class, grouping them or attaching class and file context to each chunk often helps.
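The first two suggestions above (hybrid retrieval, then top-k plus 1 to 2 hops of graph expansion) can be sketched as follows. This is an illustration under assumptions, not Cursor's implementation: it merges the vector and keyword rankings with Reciprocal Rank Fusion (RRF), which is one common choice that the reply does not prescribe, using the conventional constant `k = 60`. The inputs are assumed to be lists of function names already ranked by a vector index and by grep, and `call_graph` maps each caller to its callees; all function names here are hypothetical. The reranking step is only marked with a comment.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def expand_with_graph(seeds: list[str], call_graph: dict[str, set[str]],
                      hops: int = 1) -> list[str]:
    """Append callers and callees up to `hops` edges away, keeping seeds first."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for caller, callees in call_graph.items():
        for callee in callees:
            neighbors[caller].add(callee)  # callee direction
            neighbors[callee].add(caller)  # caller direction
    result, seen, frontier = list(seeds), set(seeds), list(seeds)
    for _ in range(hops):
        next_frontier = []
        for name in frontier:
            for n in sorted(neighbors[name]):
                if n not in seen:
                    seen.add(n)
                    result.append(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return result


def hybrid_retrieve(vector_hits: list[str], keyword_hits: list[str],
                    call_graph: dict[str, set[str]],
                    top_k: int = 5, hops: int = 1) -> list[str]:
    """Fuse both rankings, keep the top-k, then expand along the call graph."""
    fused = rrf_fuse([vector_hits, keyword_hits])[:top_k]
    # A cross-encoder or LLM reranker (not shown) would reorder this list next.
    return expand_with_graph(fused, call_graph, hops)
```

For example, with `vector_hits = ["parse", "render", "tokenize"]`, `keyword_hits = ["tokenize", "parse", "lex"]`, and `call_graph = {"parse": {"tokenize", "build_ast"}, "main": {"parse"}, "tokenize": {"lex"}}`, calling `hybrid_retrieve(vector_hits, keyword_hits, call_graph, top_k=2)` keeps `parse` and `tokenize` after fusion and then adds their direct neighbors, returning `["parse", "tokenize", "build_ast", "main", "lex"]`. Keeping the seeds first means a later reranker or context budget sees the directly matched functions before the graph-expanded ones.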

For Cursor’s indexing approach, the public part is here: Semantic & Agentic Search | Cursor Docs. We don’t share internal details beyond that.
