Hey all
I’m working with a custom programming language inside Cursor, and I’ve got a setup that includes:
- A working Tree-sitter grammar, integrated as an extension (syntax highlighting works)
- A language server (LSP) using `glspc`, also based on the same Tree-sitter grammar
From what I’ve seen in the thread "Codebase indexing VS chat with codebase", Cursor splits code into syntactically meaningful chunks using Tree-sitter and uses those chunks to compute embeddings for its indexing system (used in `@codebase`, completions, explanations, etc.).
However, I’m not sure if this leverages the Tree-sitter grammar provided by a custom extension, or instead uses an internal Tree-sitter parser that’s general-purpose for all languages.
So I’d love some clarification on how much the quality and structure of my Tree-sitter grammar actually affects Cursor’s indexing.
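To make my mental model of that chunking concrete, here is a minimal sketch of what I imagine "splitting at syntactic boundaries" looks like. This is purely my assumption, not Cursor's actual indexer: `chunkSiblings`, `fakeNode`, and the character budget are hypothetical names I made up, and the tree is built by hand with the same field names Tree-sitter nodes expose (`type`, `startIndex`, `endIndex`, `namedChildren`) so it runs without any dependencies.

```javascript
// Hypothetical sketch, NOT Cursor's real algorithm.
// Greedily groups sibling nodes into chunks of at most maxChars,
// cutting only at node boundaries. A single node larger than the
// budget still becomes its own (oversized) chunk in this sketch.
function chunkSiblings(root, source, maxChars) {
  const chunks = [];
  let start = null; // start offset of the chunk being accumulated
  let end = null;   // end offset of the chunk being accumulated

  const flush = () => {
    if (start !== null) chunks.push(source.slice(start, end));
    start = null;
    end = null;
  };

  for (const child of root.namedChildren) {
    // Adding this node would exceed the budget: close the current chunk first.
    if (start !== null && child.endIndex - start > maxChars) flush();
    if (start === null) start = child.startIndex;
    end = child.endIndex;
  }
  flush();
  return chunks;
}

// Hand-built stand-in for a parsed file (made-up language and node names).
const source = "import a;\nfn f() {}\nfn g() {}\n";
const fakeNode = (type, text) => {
  const startIndex = source.indexOf(text);
  return { type, startIndex, endIndex: startIndex + text.length, namedChildren: [] };
};
const root = {
  type: "source_file",
  startIndex: 0,
  endIndex: source.length,
  namedChildren: [
    fakeNode("import_statement", "import a;"),
    fakeNode("function_definition", "fn f() {}"),
    fakeNode("function_definition", "fn g() {}"),
  ],
};

const chunks = chunkSiblings(root, source, 20);
console.log(chunks);
```

With a 20-character budget, the import and the first function end up in one chunk and the second function gets its own. If the real indexer works anything like this, the shape of the tree (what counts as a top-level node) would matter more than what the nodes are called, which is part of what I'm trying to confirm.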
Here’s what I’m trying to figure out:
- How important is the level of structural detail in the grammar?
  - If my grammar produces deeper and more specific trees (vs. shallow or generic rules), does that give Cursor more semantic precision?
  - Do finer-grained distinctions between constructs help the indexer better understand the codebase?
- How important are the actual node names?
  - Are there specific node names Cursor expects or prioritizes (e.g., `function_definition`)?
  - Or is it mostly pattern-based or positional?
  - For example, will Cursor index better if I define nodes like:
    - `function_definition` instead of `unit`
    - `doc_comment` or `comment` instead of `comm`
    - `import_statement` instead of `macro`
- I’m also assuming, and would like to confirm, that Cursor uses the internal VSCode LSP framework under the hood, which (as far as I know) may rely on Tree-sitter for tokenization and syntax parsing. Is that correct?
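To illustrate why I'm asking about node names specifically: if the indexer keys on conventional names, I'd expect something like the lookup below. This is only a guess at the mechanism. The table `CONVENTIONAL_KINDS` and the function `classify` are hypothetical, and I have no evidence Cursor keeps any such list.

```javascript
// Hypothetical name-based lookup; whether Cursor does anything like this
// is exactly what I'm asking.
const CONVENTIONAL_KINDS = new Map([
  ["function_definition", "function"],
  ["doc_comment", "comment"],
  ["comment", "comment"],
  ["import_statement", "import"],
]);

// Returns a coarse kind for a node type, or "unknown" if the name is not recognized.
function classify(nodeType) {
  return CONVENTIONAL_KINDS.get(nodeType) ?? "unknown";
}

// Descriptive names from the list above vs. the terse names my grammar could use.
const descriptiveKinds = ["function_definition", "doc_comment", "import_statement"].map(classify);
const terseKinds = ["unit", "comm", "macro"].map(classify);

console.log(descriptiveKinds);
console.log(terseKinds);
```

Under that assumption, the descriptive names map to useful kinds while `unit`, `comm`, and `macro` all fall through to "unknown", so renaming rules would be worth the effort. If the indexer is purely structural or positional instead, the names would not matter at all.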
Would really appreciate any insight from the team or anyone who’s worked with custom languages in Cursor. Just trying to understand how much control I have by refining the `grammar.js` file.
Thanks a lot!