Embedding Models for Different LLM Versions (GPT, Claude, etc.) in Cursor

Hello everyone,

I recently began working with Cursor and have a question regarding the embedding models used.

I’m comparing the results from different models (various versions of GPT and Claude), and I’m curious about the embedding models used. Do you always use the same embedding model for every LLM, even though different versions of GPT typically recommend different embedding models?
If there is no universal embedding model that works for all of them, is indexing done in parallel with several embedding models? Does the database of stored embeddings change depending on the LLM selected in chat?

Thank you for your insights.

Hi @PAPP92,

I don’t have an answer for your specific questions, but I have seen some bits and pieces around the forum and in the docs that may be of interest.

Based on these bits of information, my assumption is that:

  • In all interactions (Ctrl + K, Ctrl + L, Ctrl + I, Cursor Tab and Apply), Cursor does not simply embed the input with a model matched to the selected LLM, send it to that LLM and return its response

  • Rather, I imagine there is a more sophisticated interplay in which inputs and outputs are decomposed, and different functions handle different tasks and optimisations on different parts of the data (see the sketch after this list)

But that is a guess.
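
To make the guess concrete, here is a minimal Python sketch of that decomposed architecture. Every function in it (fixed_embed, retrieve_context, call_llm, handle_request) is a toy stand-in I invented; none of this is Cursor's actual code. The only point it illustrates is that retrieval and prompt building could run on one fixed embedding pipeline while the chat LLM remains a swappable final step:

```python
# Purely illustrative stand-ins, not Cursor's actual code.

def fixed_embed(text: str) -> list[float]:
    # One embedding scheme used for every request, whatever chat LLM is chosen.
    return [float(len(text)), float(text.count(" "))]

def retrieve_context(query_vec: list[float]) -> str:
    # Stand-in for a nearest-neighbour lookup over an existing index.
    return "<retrieved code chunks>"

def call_llm(llm: str, prompt: str) -> str:
    # Stand-in for the actual model call (GPT, Claude, ...).
    return f"[{llm}] answered a {len(prompt)}-char prompt"

def handle_request(user_input: str, task: str, llm: str) -> str:
    """Guessed orchestration: retrieve with the fixed pipeline, build a
    task-specific prompt, and only then hand off to the selected LLM."""
    context = retrieve_context(fixed_embed(user_input))
    prompt = f"task: {task}\ncontext: {context}\nrequest: {user_input}"
    return call_llm(llm, prompt)

# Same embedding pipeline, different chat LLMs:
print(handle_request("rename this function", "edit", "gpt-4o"))
print(handle_request("rename this function", "edit", "claude-3.5-sonnet"))
```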

On Cursor Tab and custom models:

Cursor Tab is our native autocomplete feature…powered by a custom model, Cursor Tab can:

  • Suggest edits around your cursor, not just insertions of additional code
  • Modify multiple lines at once
  • Make suggestions based on your recent changes and linter errors

Source: https://docs.cursor.com/tab/overview

Our custom models are hosted with Fireworks…

Source: https://www.cursor.com/security#infrastructure

On prompt building:

Are requests always routed through the Cursor backend?
Yes! Even if you use your API key, your requests will still go through our backend! That’s where we do our final prompt building.

Source: https://docs.cursor.com/privacy/privacy
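
If that is right, the "final prompt building" step might look roughly like this hypothetical Python sketch. The build_final_prompt function, the template, and the field names are all invented for illustration; the only grounded detail is that retrieved chunks come with obfuscated paths and line ranges (see the indexing quotes below):

```python
# Hypothetical sketch of server-side prompt building, not Cursor's code.

def build_final_prompt(user_message: str, code_chunks: list[dict]) -> str:
    """Assemble one prompt from the user's message and retrieved context."""
    blocks = []
    for chunk in code_chunks:
        # Each chunk carries the (obfuscated) path and line range it came from.
        header = f"# {chunk['path']} lines {chunk['start']}-{chunk['end']}"
        blocks.append(f"{header}\n{chunk['text']}")
    return "Relevant code:\n" + "\n\n".join(blocks) + f"\n\nUser request:\n{user_message}"

print(build_final_prompt(
    "Why does parse_config return None?",
    [{"path": "a1b2.c3d4", "start": 10, "end": 24,
      "text": "def parse_config(path):\n    ..."}],
))
```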

On inference, embedding and codebase context (when enabled):

At inference time, we compute an embedding, let Turbopuffer do the nearest neighbor search, send back the obfuscated file path and line range to the client, and read those file chunks on the client locally. We then send those chunks back up to the server to answer the user’s question.

Source: https://www.cursor.com/security#indexing
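
As a way to visualise that round trip, here is a self-contained Python sketch under heavy assumptions: the character-frequency embedding, the in-memory INDEX standing in for Turbopuffer, and the client-side PATH_MAP are all invented. Only the shape of the flow follows the quote: the server searches embeddings and returns obfuscated paths plus line ranges, and the client reads the plaintext chunks locally before sending them back up:

```python
import math

# Toy stand-ins throughout; only the flow mirrors the quoted description.

def toy_embed(text: str) -> list[float]:
    # Character-frequency vector, unit-normalised (invented for this sketch).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Server side: embeddings mapped to obfuscated paths and line ranges; no code text.
INDEX = [
    (toy_embed("def parse_config path open read json"), ("a1b2.c3d4", 10, 24)),
    (toy_embed("class HttpClient request retry timeout"), ("e5f6.g7h8", 1, 40)),
]

def nearest_neighbor(query_vec: list[float]) -> tuple:
    # Cosine similarity reduces to a dot product on unit vectors.
    best = max(INDEX, key=lambda item: sum(a * b for a, b in zip(item[0], query_vec)))
    return best[1]

# Client side: only the client can map obfuscated names back to real files.
PATH_MAP = {"a1b2.c3d4": "src/config.py", "e5f6.g7h8": "src/http.py"}

def answer(question: str) -> str:
    query_vec = toy_embed(question)                      # 1. compute an embedding
    obf_path, start, end = nearest_neighbor(query_vec)   # 2. server-side NN search
    real_path = PATH_MAP[obf_path]                       # 3. client resolves the path
    chunk = f"<contents of {real_path} lines {start}-{end}>"  # 4. local read (stubbed)
    return f"sent to server: question={question!r}, context={chunk}"  # 5. chunks go back up

print(answer("where is the config parsed"))
```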

If you choose to index your codebase, Cursor will upload your codebase in small chunks to our server to compute embeddings, but all plaintext code ceases to exist after the life of the request. The embeddings and metadata about your codebase (hashes, obfuscated file names) are stored in our database, but none of your code is.

Source: https://docs.cursor.com/privacy/privacy#does-indexing-the-codebase-require-storing-code
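
And the indexing side, sketched under the same caveats: the chunk size, the SHA-256 obfuscation and hashing scheme, and the toy_embed placeholder are my assumptions, but the record that persists matches the quote, i.e. embeddings plus hashes and obfuscated file names, with the plaintext chunk discarded after the request:

```python
import hashlib

# Toy sketch of indexing; only the stored record follows the quoted behaviour.

def toy_embed(text: str) -> list[float]:
    return [float(len(text)), float(text.count("\n"))]  # placeholder embedding

def index_file(path: str, source: str, db: list) -> None:
    chunk_size = 20  # lines per chunk; an arbitrary illustrative number
    lines = source.splitlines()
    obf_name = hashlib.sha256(path.encode()).hexdigest()[:8]  # obfuscated file name
    for start in range(0, len(lines), chunk_size):
        chunk = "\n".join(lines[start:start + chunk_size])
        db.append({
            "file": obf_name,                                    # no real path stored
            "hash": hashlib.sha256(chunk.encode()).hexdigest(),  # for change detection
            "range": (start + 1, min(start + chunk_size, len(lines))),
            "vector": toy_embed(chunk),                          # the embedding itself
        })
        # The plaintext `chunk` goes out of scope here; only the record persists.

db: list = []
index_file("src/config.py", "def parse_config(path):\n    return {}\n", db)
print(db[0]["file"], db[0]["range"])  # obfuscated name + line range, no code text
```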

Related posts:

Note: these posts are provided for reference only; for the most up-to-date details, refer to the security and privacy pages and the docs.
