Quick Question About Cursor’s LLM Integration

Hi everyone,

I’ve been really impressed with Cursor’s ability to efficiently and accurately update code, and I’m curious about the strategies behind it. I’m working on a project where I need to integrate an LLM to help users find specific segments of text based on natural language queries.

The challenge I’m facing is that each word in my text has a unique ID, and I need the LLM’s response to include information that allows me to programmatically highlight the correct words in the editor. Does anyone have experience or insights on how to approach this?

Any advice would be greatly appreciated. Thanks!

I wanted to add some more detail to my original question so I can get the most relevant advice.

To clarify:

In my project, each word in a text is associated with a unique ID. For example, the text might be represented as:

[
  {"id": 21, "word": "My"},
  {"id": 22, "word": "favorite"},
  {"id": 23, "word": "fruits"},
  {"id": 24, "word": "are"},
  {"id": 25, "word": "bananas"},
  {"id": 26, "word": "and apples"}
]

The text can be very long, such as a transcript of an hour-long conversation with around 20,000 words. I want to integrate an LLM so that when a user makes a natural language query like "find me all fruits", the system can efficiently identify and return the IDs of all related words.
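
In case it helps, here is roughly what I have been imagining for the request and response, written as a rough Python sketch rather than anything I am committed to. The compact "id:word" serialization, the prompt wording, and the model name are all placeholders; I happen to use the OpenAI Python client here, but any chat-completion API would do. I am not at all sure this is the right structure, which is part of what I am asking below.

import json
from openai import OpenAI  # assumption: OpenAI Python client; any chat-completion API would work

client = OpenAI()

words = [
    {"id": 21, "word": "My"},
    {"id": 22, "word": "favorite"},
    {"id": 23, "word": "fruits"},
    {"id": 24, "word": "are"},
    {"id": 25, "word": "bananas"},
    {"id": 26, "word": "and"},
    {"id": 27, "word": "apples"},
]

def find_word_ids(query, words):
    # Serialize the word-ID pairs compactly, e.g. "21:My 22:favorite 23:fruits ..."
    numbered_text = " ".join(f'{w["id"]}:{w["word"]}' for w in words)
    prompt = (
        "Below is a text where every word is prefixed with its numeric ID.\n\n"
        f"{numbered_text}\n\n"
        f'Query: "{query}"\n'
        "Respond with ONLY a JSON array of the IDs of the words that match the query."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name, not a recommendation
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Optimistically assumes the model really does return a bare JSON array.
    return json.loads(response.choices[0].message.content)

print(find_word_ids("find me all fruits", words))  # hoping for something like [23, 25, 27]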

What I need help with:

  1. How should I structure the input and output for the LLM to ensure it accurately identifies and returns the correct IDs based on natural language queries, especially in the context of very long texts?
  2. Are there best practices for keeping the LLM fast and accurate on very large texts, while ensuring the output is precise and easy to integrate with a system that highlights text segments by their IDs? (A rough sketch of my current idea is below.)
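
The only idea I have come up with so far for the second question is to split the word list into overlapping chunks, run the query against each chunk, and merge the returned IDs. The chunk size and overlap below are numbers I made up, and the code reuses find_word_ids from the sketch above. I am not sure this is a sensible way to handle 20,000 words, which is really what I am asking about.

def chunk_words(words, chunk_size=800, overlap=50):
    # Yield overlapping windows over the word list so matches near a boundary are not lost.
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield words[start:start + chunk_size]

def find_ids_in_long_text(query, words):
    # Query each chunk separately and merge the IDs; reuses find_word_ids from the sketch above.
    matched = set()
    for chunk in chunk_words(words):
        matched.update(find_word_ids(query, chunk))
    return sorted(matched)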

Example Scenario:

  • Input Text:

    [
      {"id": 21, "word": "My"},
      {"id": 22, "word": "favorite"},
      {"id": 23, "word": "fruits"},
      {"id": 24, "word": "are"},
      {"id": 25, "word": "bananas"},
      {"id": 26, "word": "and apples"}
    ]
    

    (Note: In practice, the text can be up to 20,000 words long, representing a full conversation.)

  • User Query: "find me all fruits"

  • Expected Output: [23, 25, 27] (assuming the query should match related fruit words like "bananas" and "apples", not just the literal word "fruits"; a sketch of how I would use these IDs on my side follows below)
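
For completeness, everything I plan to do with the model's output on my side is a simple lookup like the one below, so the hard part is really getting reliable IDs out of the model in the first place.

def ids_to_words(ids, words):
    # Map the returned IDs back to word entries; ignore any IDs the model invented.
    by_id = {w["id"]: w["word"] for w in words}
    return [(i, by_id[i]) for i in ids if i in by_id]

# e.g. ids_to_words([23, 25, 27], words) -> [(23, 'fruits'), (25, 'bananas'), (27, 'apples')]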

I hope this clarifies what I’m looking for. Any insights or suggestions on how to handle large texts efficiently with an LLM would be greatly appreciated!

Thanks again!