Adding Documentations that don't have index-page or sitemaps

toe · September 5, 2024, 9:02pm

TL;DR

Use xml-sitemaps.com to create a make-do sitemap from the documentation’s homepage/entrypoint.
IMPORTANT: At the end of the page you’ll find “view HTML sitemap”. Use that as the Entrypoint of the documentation. (Cursor doesn’t recognize XML sitemaps, which most websites publish officially; if it did, this would’ve been a lot easier.)
The prefix is the “repeating” element of the official URL, NOT xml-sitemaps.com.

Full version:
Some documentation sites do not come with an …/index.html, which would make a great Entrypoint for Cursor.
To make matters worse, some websites like docs.langflow.org do not use a traditional hierarchy in their URLs. (By hierarchy I mean the links are not in example.com/page/sub-page/.. format.)
For example, in the case of Langflow, the URL structure is example.com/Topic-SubTopic-SubSubTopic.. which seems to be an unfamiliar format for Cursor’s bot (at least at the time of writing this).
As a result, when I tried to index Langflow’s documentation, Cursor kept looking for links in the hierarchical format I mentioned earlier. It ended up hallucinating links that don’t exist and indexed 404 pages at those hierarchical URLs. Yet it still answered with confidence. Classic AI.
A sitemap can fix this issue, but when there isn’t one, you gotta make your own. Most websites keep an XML sitemap for search engines (example.com/sitemap.xml).
But Cursor doesn’t work with XML links. That’s where xml-sitemaps.com comes in. It not only crawls the given link to make a sitemap, it also gives an HTML version of the result, which can be used as an Entrypoint with Cursor.
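
If the site already publishes a sitemap.xml, the XML-to-HTML step can also be sketched locally. This is only a rough sketch, not something Cursor or xml-sitemaps.com provides: the function name sitemap_xml_to_html is made up for illustration, and it assumes a standard sitemaps.org urlset that you have already downloaded as text. You would still have to host the resulting HTML page somewhere Cursor can reach.

```python
import html
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_xml_to_html(xml_text: str) -> str:
    """Turn the <loc> entries of a sitemap into a plain HTML page of links.

    Hypothetical helper: assumes xml_text is a sitemaps.org urlset.
    """
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    items = "\n".join(
        f'<li><a href="{html.escape(u, quote=True)}">{html.escape(u)}</a></li>'
        for u in urls
    )
    return f"<!DOCTYPE html>\n<html><body><ul>\n{items}\n</ul></body></html>"
```

The output is one flat list of links, which is roughly what the “view HTML sitemap” page gives you.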

Edit: the hierarchical URL structure is called a path-based or directory-style URL. The one with the hyphens can be called a flat/delimited URL, although GPT-4 said it’s not a recognized term.

adgower · September 12, 2024, 2:15pm

Can you explain this again? I used xml-sitemaps.com to generate the HTML link and pasted it into the add new doc dialog. It shows as indexed but displays 0 pages.

I set the prefix to the official URL.

Are there instructions on how I can generate docs that are fully compatible with how they index the data?

toe · September 13, 2024, 7:43am

For example, in the case of example.com/xyz/abc:
use only example.com as the prefix
use the HTML sitemap that xml-sitemaps.com generated for example.com as the Entrypoint
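
One way to find the “repeating” part is to compare a handful of the documentation’s page URLs. The snippet below is just a sketch under assumptions: repeating_prefix is a made-up helper, the URLs in the usage note are placeholders, and nothing here is confirmed to match how Cursor actually applies the prefix.

```python
from urllib.parse import urlsplit


def repeating_prefix(urls: list[str]) -> str:
    """Return the scheme, host and any directories shared by every URL."""
    if not urls:
        raise ValueError("need at least one URL")
    parts = [urlsplit(u) for u in urls]
    origins = {(p.scheme, p.netloc) for p in parts}
    if len(origins) != 1:
        raise ValueError("URLs span more than one scheme/host")
    # Compare directory segments only: the last segment is the page itself.
    dirs = [[s for s in p.path.split("/") if s][:-1] for p in parts]
    shared = []
    for segments in zip(*dirs):
        if len(set(segments)) != 1:
            break
        shared.append(segments[0])
    scheme, host = origins.pop()
    return f"{scheme}://{host}/" + "".join(s + "/" for s in shared)
```

For flat URLs such as https://example.com/Topic-SubTopic and https://example.com/Other-Page this comes out as the bare host, https://example.com/. If every page happens to sit under one directory it returns that deeper path instead, in which case the bare host is the more conservative choice.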

is that what you did?

ron · September 18, 2024, 2:07am

Bro. You. Rule.

Hrnkas · February 21, 2025, 9:23pm

Can anyone tell me what is wrong with my documentation pages: Future-C Documentation
sitemap: futurefactory-software.com Site Map - Generated by www.xml-sitemaps.com

Cursor refuses to index them (indexed 0 pages).

OhoyCaptain1 · March 1, 2025, 1:12pm

Bump on this topic. Is this a broken solution (or a last Hail Mary attempt)? I tried it, for example, with the Yahoo Finance API and also had no luck, despite being able to generate the sitemap without issues.

benjaminfortunato · April 7, 2025, 12:54pm

I’m running into an issue with indexing being blocked:

Starting URL: serverless/docs at main · serverless/serverless · GitHub
Updated on: April 7, 2025, 12:53, 0 pages

Your Sitemap is empty

docs - Blocked by robots.txt file

Looks like your website pages are blocked from indexing, please try our Search Engine Robot Simulator tool to diagnose possible issues.
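
“Blocked by robots.txt” means the site’s robots.txt told the crawler not to fetch those pages, so the generated sitemap ends up empty. A rough way to check this yourself, using Python’s standard urllib.robotparser, is sketched below. The helper name can_crawl and the sample rules are made up for illustration, and which user agent xml-sitemaps.com sends is not known from this thread, so "*" is only an assumption.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file.

    Hypothetical helper: robots_txt is the file's contents, already downloaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration: everything under /docs/ is disallowed.
sample_rules = "User-agent: *\nDisallow: /docs/\n"
print(can_crawl(sample_rules, "https://example.com/docs/page"))
print(can_crawl(sample_rules, "https://example.com/blog/"))
```

If the documentation URL comes back as not allowed, a third-party crawler that honors robots.txt will not be able to build a sitemap for it.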
