I’m having some problems indexing Haystack documentation. I’ve seen a couple threads about documentation indexing problems, but other than updating to the latest version of Cursor (which doesn’t help) I don’t see a fix.
What I’m trying to add: Introduction to Haystack 2.x. Indexing actually seems to progress: I can see different pages being indexed, but at some point it fails (with no log output). There are indeed pages indexed, but there’s still a failed message. And I can’t quite check if everything has been indexed, as the pages are not in order and there are a lot of them.
When I check the ‘Cursor Indexing & Retrieval’ logs in the terminal output, there is nothing related to indexing documentation. Adding a new documentation source, restarting indexing, etc. don’t lead to any new log entries, even when setting the verbosity to trace.
How to reproduce:
- Add a new documentation source: Introduction to Haystack 2.x
Hey, adding the URL https://docs.haystack.deepset.ai/docs/intro worked for me, and is already indexed by our system. Can you confirm this was the URL you were trying, and the prefix was correctly identified?
Sorry for the late reply. That one indeed works now, but I’m getting the same error now with other documentation, like Feedparser: Documentation — feedparser 6.0.11 documentation
Looks like https://feedparser.readthedocs.io/en/latest/ works for me
Does it show ‘Indexed ’ on your machine? This is what I’m seeing (I forgot to attach the screenshot last time):
It also shows a lot more pages. I also just tried it again a moment ago (with a different name, but the same URL, https://feedparser.readthedocs.io/en/latest/):
How does the indexing work? Given the difference in results I assume that it works locally? Are there logs I could check?
Hey, it may show as failed if one page fails, but the rest work fine. As there are 184 pages indexed, I would hope that has worked successfully.
We don’t have any individual fixes for docs that won’t index right now, but we have some improvements we are looking to make soon!
Ah I see. Well, if it shows a failure, that means that at least some of the doc pages have failed to load. I hope the upcoming improvements make this a bit more robust. Thanks for your response!
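For anyone who wants to check on their own which pages of a documentation site fail to load, here is a minimal Python sketch. It does not reflect how Cursor indexes documentation, which isn’t described in this thread. It assumes that a ‘failed’ page is one that doesn’t return HTTP 200, and it only looks at links found on the root page rather than crawling the whole site. All function names (`extract_links`, `fetch_status`, `find_failed_pages`, `check_docs`) are hypothetical helpers written for this example.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, prefix):
    """Return sorted, de-duplicated absolute links that start with prefix."""
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(prefix):
            links.add(absolute)
    return sorted(links)


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None on a network error."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
    except OSError:
        # Covers URLError, connection failures, and timeouts.
        return None


def find_failed_pages(urls, fetch=fetch_status):
    """Return (url, status) pairs for every page that did not return 200."""
    failed = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failed.append((url, status))
    return failed


def check_docs(root_url):
    """Fetch the root page, then report linked pages under it that fail."""
    with urlopen(root_url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return find_failed_pages(extract_links(html, root_url, root_url))
```

Calling `check_docs("https://feedparser.readthedocs.io/en/latest/")` would return a list of the linked pages that did not load, which could at least narrow down whether a specific page is the problem.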