Hi Cursor IDE team and community!
I’d like to share an idea to reduce token costs and improve efficiency when working with large language models (LLMs) in code editors. This approach could make interactions with expensive models (like Gemini 2.5 Pro or Claude 3.7) faster and more cost-effective.
The Problem
Today, even simple tasks (e.g., “Where is `calculateTax` used?”) often require sending massive chunks of code to an LLM. This leads to:
- High token costs: Thousands of tokens wasted on irrelevant code.
- Slow responses: Models waste time parsing unrelated files.
- Noise overload: Important details get lost in bloated contexts.
Proposed Solution: Local Filter + Smart Caching
Use a lightweight local model as a “pre-filter” to identify relevant code snippets before querying the expensive model.
Workflow
- Code Indexing
  - On first run or file changes, parse all files and extract key elements with AST parsing and/or a local LLM:
    - Function/class names.
    - Line numbers and file paths.
    - Brief descriptions (comments/docs).
  - Store this data in a local cache (SQLite, vector DB, or even a JSON file).
- Query Processing
  - User asks: “Fix the bug in `validateForm`.”
  - The local model:
    - Scans the cache to find `validateForm` (e.g., line 120 in `form.js`).
    - Builds a minimal, enriched prompt for the LLM (see the sketch after this list):

  ```
  In file form.js, there's a function validateForm (line 120):
  function validateForm(data) { ... }
  User says: "The email field isn't validating correctly."
  ```
- Send to LLM
  - The expensive model gets a focused context, reducing token use and improving accuracy.
- Cache Updates
  - Dynamically refresh the cache when new code is needed or files change.
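To make the indexing and prompt-building steps concrete, here is a minimal TypeScript sketch. It assumes the TypeScript compiler API for the AST side; the `IndexEntry` shape and the `indexFile`/`buildPrompt` helpers are illustrative names, not an existing Cursor API.

```ts
import * as ts from "typescript";
import { readFileSync } from "fs";

// Illustrative cache entry; a real index might also store docs or embeddings.
interface IndexEntry {
  name: string;    // function or class name
  file: string;    // file path
  line: number;    // 1-based line of the declaration
  snippet: string; // source text of the declaration
}

// Indexing: walk one file's AST and record top-level functions and classes.
function indexFile(file: string): IndexEntry[] {
  const text = readFileSync(file, "utf8");
  const source = ts.createSourceFile(file, text, ts.ScriptTarget.Latest, true);
  const entries: IndexEntry[] = [];
  const visit = (node: ts.Node): void => {
    if (
      (ts.isFunctionDeclaration(node) || ts.isClassDeclaration(node)) &&
      node.name
    ) {
      const { line } = source.getLineAndCharacterOfPosition(node.getStart());
      entries.push({
        name: node.name.text,
        file,
        line: line + 1,
        snippet: node.getText(source),
      });
    }
    ts.forEachChild(node, visit);
  };
  visit(source);
  return entries;
}

// Query processing: resolve a symbol from the cache and build a focused prompt.
function buildPrompt(
  index: IndexEntry[],
  symbol: string,
  userMessage: string
): string | null {
  const hit = index.find((e) => e.name === symbol);
  if (!hit) return null; // fall back to a broader context strategy
  return [
    `In file ${hit.file}, there's a function ${hit.name} (line ${hit.line}):`,
    hit.snippet,
    `User says: "${userMessage}"`,
  ].join("\n");
}
```

The exact-name `find` stands in for what would more realistically be a fuzzy or embedding-based lookup.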
Advanced: Function Relationship Mapping
If a user asks, “Fix `validateForm`, which uses `checkEmail`,” the local model can record a dependency: `validateForm` → `checkEmail`.
- Future queries about email checking will automatically include `validateForm` and `checkEmail` in the context.

The system builds a graph of function interactions, reducing the need for manual exploration.
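A minimal sketch of that graph, assuming a plain in-memory adjacency map; `recordDependency` and `relatedSymbols` are hypothetical helpers. Context expansion is then a bounded walk over the edges:

```ts
// Caller → callee adjacency map, e.g. validateForm → checkEmail.
const graph = new Map<string, Set<string>>();

function recordDependency(caller: string, callee: string): void {
  if (!graph.has(caller)) graph.set(caller, new Set());
  graph.get(caller)!.add(callee);
}

// Collect everything reachable from a symbol up to a depth limit,
// so related functions ride along into the LLM context.
function relatedSymbols(start: string, maxDepth = 2): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const sym of frontier) {
      for (const callee of graph.get(sym) ?? []) {
        if (!seen.has(callee)) {
          seen.add(callee);
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  return seen;
}

recordDependency("validateForm", "checkEmail");
// relatedSymbols("validateForm") → Set { "validateForm", "checkEmail" }
```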
Implementation Steps
Post-Processing Hook:
- After task completion, trigger a lightweight analysis of:
  - User queries (e.g., “Why does `processOrder` fail?”).
  - Code snippets sent to the LLM.
  - LLM responses (e.g., “Modify `processOrder` to call `validatePayment`”).
Update the Cache:
- Store relationships in a graph structure (e.g., Neo4j, Redis Graph, or a simple adjacency list):

```json
{
  "validateForm": ["checkEmail", "checkPassword"],
  "checkEmail": ["sanitizeInput"]
}
```
Example Workflow
- User Query: “Fix `calculateTotal` — it’s not summing tax correctly.”
- LLM Response: modifies `calculateTotal` to call a new `applyTax` function.
- Post-Task Analysis: the local model infers `calculateTotal` → `applyTax` and updates the cache to link these functions.
- Next Query: “Why is `applyTax` returning NaN?” The system automatically includes `calculateTotal` in the context.
P.S. We could enrich the cache from AST parsing alone, but that isn't enough: static analysis can't find connections between entities in an event-based system (e.g., one built on EventEmitter). So it can be a hybrid approach (AST + local LLM) or a local LLM alone.
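To illustrate the EventEmitter point with a hypothetical snippet: there is no direct call edge between `createOrder` and `processOrder` below; the only link is the shared event name, which a call-graph walk over the AST won't connect but a local LLM often can.

```ts
import { EventEmitter } from "events";

const bus = new EventEmitter();

// Registered in one module...
bus.on("order:created", processOrder);

// ...emitted from another. No syntactic call edge exists between
// createOrder and processOrder; the link is the string "order:created".
function createOrder(data: unknown): void {
  bus.emit("order:created", data);
}

function processOrder(data: unknown): void {
  /* ... */
}
```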
Key Benefits
- Token Savings: Only send critical code fragments, not entire projects.
- Speed: Faster responses from both local filtering and smaller LLM contexts.
- Accuracy: Less noise = fewer errors from context overload.
- Scalability: Handles large codebases by focusing on relevant parts.
Call for Discussion
Does this align with Cursor’s roadmap? Are there technical challenges I’m missing? Thank you!