Which text formats are best understood by LLMs—Markdown, XML, or others? Also, are there any tools that can scrape content from a website or documentation and convert it into the optimal format for LLM processing?
For scraping docs in Cursor, you’ll want to check out the @Docs command (Cursor – @Docs). It’s built to crawl and index documentation sites automatically.
As for text formats: while Cursor works with various formats, we don’t make specific recommendations about which formats are “best” for LLMs, since that’s more of a general ML question than something specific to Cursor!