Embedding Models for Different LLM Versions (GPT, Claude, etc.) in Cursor

Hello everyone,

I recently began working with Cursor and have a question regarding the embedding models used.

I’m comparing the results from different models (various versions of GPT and Claude), and I’m curious about the embedding models used. Do you always use the same embedding model for every LLM, even though different versions of GPT typically recommend different embedding models?
If there is no universal embedding model that works for all of them, is indexing done in parallel with several embedding models? Does the database of stored embeddings change depending on the LLM selected in chat?

Thank you for your insights.

Hi @PAPP92,

I don’t have an answer for your specific questions, but I have seen some bits and pieces around the forum and in the docs that may be of interest.

Based on these bits of information, my assumption is that:

  • In all interactions (Ctrl + K, Ctrl + L, Ctrl + I, Cursor Tab and Apply), Cursor does not simply embed the input with a model matched to the selected LLM, send it to that LLM and return its response

  • Rather, I imagine there is a more sophisticated interplay in which inputs and outputs are decomposed, and different functions handle different tasks and optimisations on different parts of the data (see the sketch after this list)

But that is a guess.
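
To make the guess concrete, here is a minimal Python sketch of that decomposed architecture. Every function in it (fixed_embed, retrieve_context, call_llm, handle_request) is a toy stand-in I invented; none of this is Cursor's actual code. The only point it illustrates is that retrieval and prompt building could run on one fixed embedding pipeline while the chat LLM remains a swappable final step:

```python
# Purely illustrative stand-ins, not Cursor's actual code.

def fixed_embed(text: str) -> list[float]:
    # One embedding scheme used for every request, whatever chat LLM is chosen.
    return [float(len(text)), float(text.count(" "))]

def retrieve_context(query_vec: list[float]) -> str:
    # Stand-in for a nearest-neighbour lookup over an existing index.
    return "<retrieved code chunks>"

def call_llm(llm: str, prompt: str) -> str:
    # Stand-in for the actual model call (GPT, Claude, ...).
    return f"[{llm}] answered a {len(prompt)}-char prompt"

def handle_request(user_input: str, task: str, llm: str) -> str:
    """Guessed orchestration: retrieve with the fixed pipeline, build a
    task-specific prompt, and only then hand off to the selected LLM."""
    context = retrieve_context(fixed_embed(user_input))
    prompt = f"task: {task}\ncontext: {context}\nrequest: {user_input}"
    return call_llm(llm, prompt)

# Same embedding pipeline, different chat LLMs:
print(handle_request("rename this function", "edit", "gpt-4o"))
print(handle_request("rename this function", "edit", "claude-3.5-sonnet"))
```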

On Cursor Tab and custom models:

Cursor Tab is our native autocomplete feature…powered by a custom model, Cursor Tab can:

  • Suggest edits around your cursor, not just insertions of additional code
  • Modify multiple lines at once
  • Make suggestions based on your recent changes and linter errors

Source: https://docs.cursor.com/tab/overview

Our custom models are hosted with Fireworks…

Source: https://www.cursor.com/security#infrastructure

On prompt building:

Are requests always routed through the Cursor backend?
Yes! Even if you use your API key, your requests will still go through our backend! That’s where we do our final prompt building.

Source: https://docs.cursor.com/privacy/privacy
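
If that is right, the "final prompt building" step might look roughly like this hypothetical Python sketch. The build_final_prompt function, the template, and the field names are all invented for illustration; the only grounded detail is that retrieved chunks come with obfuscated paths and line ranges (see the indexing quotes below):

```python
# Hypothetical sketch of server-side prompt building, not Cursor's code.

def build_final_prompt(user_message: str, code_chunks: list[dict]) -> str:
    """Assemble one prompt from the user's message and retrieved context."""
    blocks = []
    for chunk in code_chunks:
        # Each chunk carries the (obfuscated) path and line range it came from.
        header = f"# {chunk['path']} lines {chunk['start']}-{chunk['end']}"
        blocks.append(f"{header}\n{chunk['text']}")
    return "Relevant code:\n" + "\n\n".join(blocks) + f"\n\nUser request:\n{user_message}"

print(build_final_prompt(
    "Why does parse_config return None?",
    [{"path": "a1b2.c3d4", "start": 10, "end": 24,
      "text": "def parse_config(path):\n    ..."}],
))
```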

On inference, embedding and codebase context (when enabled):

At inference time, we compute an embedding, let Turbopuffer do the nearest neighbor search, send back the obfuscated file path and line range to the client, and read those file chunks on the client locally. We then send those chunks back up to the server to answer the user’s question.

Source: https://www.cursor.com/security#indexing
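
As a way to visualise that round trip, here is a self-contained Python sketch under heavy assumptions: the character-frequency embedding, the in-memory INDEX standing in for Turbopuffer, and the client-side PATH_MAP are all invented. Only the shape of the flow follows the quote: the server searches embeddings and returns obfuscated paths plus line ranges, and the client reads the plaintext chunks locally before sending them back up:

```python
import math

# Toy stand-ins throughout; only the flow mirrors the quoted description.

def toy_embed(text: str) -> list[float]:
    # Character-frequency vector, unit-normalised (invented for this sketch).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Server side: embeddings mapped to obfuscated paths and line ranges; no code text.
INDEX = [
    (toy_embed("def parse_config path open read json"), ("a1b2.c3d4", 10, 24)),
    (toy_embed("class HttpClient request retry timeout"), ("e5f6.g7h8", 1, 40)),
]

def nearest_neighbor(query_vec: list[float]) -> tuple:
    # Cosine similarity reduces to a dot product on unit vectors.
    best = max(INDEX, key=lambda item: sum(a * b for a, b in zip(item[0], query_vec)))
    return best[1]

# Client side: only the client can map obfuscated names back to real files.
PATH_MAP = {"a1b2.c3d4": "src/config.py", "e5f6.g7h8": "src/http.py"}

def answer(question: str) -> str:
    query_vec = toy_embed(question)                      # 1. compute an embedding
    obf_path, start, end = nearest_neighbor(query_vec)   # 2. server-side NN search
    real_path = PATH_MAP[obf_path]                       # 3. client resolves the path
    chunk = f"<contents of {real_path} lines {start}-{end}>"  # 4. local read (stubbed)
    return f"sent to server: question={question!r}, context={chunk}"  # 5. chunks go back up

print(answer("where is the config parsed"))
```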

If you choose to index your codebase, Cursor will upload your codebase in small chunks to our server to compute embeddings, but all plaintext code ceases to exist after the life of the request. The embeddings and metadata about your codebase (hashes, obfuscated file names) are stored in our database, but none of your code is.

Source: https://docs.cursor.com/privacy/privacy#does-indexing-the-codebase-require-storing-code
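
And the indexing side, sketched under the same caveats: the chunk size, the SHA-256 obfuscation and hashing scheme, and the toy_embed placeholder are my assumptions, but the record that persists matches the quote, i.e. embeddings plus hashes and obfuscated file names, with the plaintext chunk discarded after the request:

```python
import hashlib

# Toy sketch of indexing; only the stored record follows the quoted behaviour.

def toy_embed(text: str) -> list[float]:
    return [float(len(text)), float(text.count("\n"))]  # placeholder embedding

def index_file(path: str, source: str, db: list) -> None:
    chunk_size = 20  # lines per chunk; an arbitrary illustrative number
    lines = source.splitlines()
    obf_name = hashlib.sha256(path.encode()).hexdigest()[:8]  # obfuscated file name
    for start in range(0, len(lines), chunk_size):
        chunk = "\n".join(lines[start:start + chunk_size])
        db.append({
            "file": obf_name,                                    # no real path stored
            "hash": hashlib.sha256(chunk.encode()).hexdigest(),  # for change detection
            "range": (start + 1, min(start + chunk_size, len(lines))),
            "vector": toy_embed(chunk),                          # the embedding itself
        })
        # The plaintext `chunk` goes out of scope here; only the record persists.

db: list = []
index_file("src/config.py", "def parse_config(path):\n    return {}\n", db)
print(db[0]["file"], db[0]["range"])  # obfuscated name + line range, no code text
```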

Related posts:

Note: these posts are provided for reference only; for the most up-to-date details, refer to the security and privacy pages and the docs.
