Custom Doc Indexing

I’ve had indexing running on DigitalOcean’s prod docs for a long time. It keeps failing, so I restart it. Each time this happens, does it start over from scratch, or does it continue where it left off?

Hey, can you send over the link to the docs? I’ll take a look on my end.

When you restart documentation indexing, it does start from scratch. However, as long as you use the same URL as other users, the documents are de-duplicated, so if another user indexes the same docs successfully, that version is automatically available to you.
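To make the de-duplication idea concrete, here is a minimal sketch of an index shared between users and keyed by URL. It is not how the real service is implemented: `normalize_url`, `SharedDocIndex`, the normalization rules, and the example.com URLs are all hypothetical, chosen only to illustrate why using the same URL matters.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a docs URL to a canonical key (hypothetical rules)."""
    parts = urlsplit(url.strip())
    # Treat "/guide" and "/guide/" as the same entry point.
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host; drop query string and fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


class SharedDocIndex:
    """Toy in-memory stand-in for a server-side index shared by all users."""

    def __init__(self) -> None:
        self._indexes: Dict[str, List[str]] = {}

    def publish(self, url: str, pages: List[str]) -> None:
        # A successful run by any user replaces the shared copy for that URL.
        self._indexes[normalize_url(url)] = pages

    def lookup(self, url: str) -> Optional[List[str]]:
        # Another user with the same URL gets the shared copy, if one exists.
        return self._indexes.get(normalize_url(url))


index = SharedDocIndex()
index.publish("https://Docs.Example.com/guide/", ["intro", "install"])
shared = index.lookup("https://docs.example.com/guide")
missing = index.lookup("https://docs.example.com/other")
```

In this sketch, a second user who adds the same URL sees the pages the first user already indexed, while a different URL gets its own, separate entry.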

Might be a dumb question, but if a document I have indexed has already been indexed for other people, when do you go back and revisit all of the hundreds of pages to see if there have been updates?

If another user adds it, we always refresh the documentation at that point, as well as every so often behind the scenes!

@danperks

I am confused. This indexing has been going on for many hours. Are these pages being downloaded to my machine and indexed locally?
Previously I saw the indexing show a message saying “Failed”.

Now it just seems to have come to a stop.

Hey, the index is stored in our infrastructure. It may appear to have stopped if another user added the same documentation and the indexing restarted.

For something as popular as the Swift documentation, you should be fine to @ it without worrying about whether it’s out of date or not fully indexed!


@danperks - Something doesn’t seem quite right. 24 hours later, this is what I see:

Now the index has fewer pages. I also notice it always seems to have the status “indexing failed”.

If I were to index it under a different name, would that stop others who are indexing it from continually restarting it?

Hey @danperks, I’ll take advantage of your presence to ask a question on this subject: I’m trying to index the Unreal Engine documentation, but it looks like it doesn’t find the pages. Is this something you’re aware of?
Here is the link to the UE C++ API, for example: https://dev.epicgames.com/documentation/en-us/unreal-engine/API

I believe you’d need a different URL to ensure the documentation was unique to you. However, it may be that the Swift documentation is just too big: even if it were fully indexed, it may not be of much use due to the sheer amount of data to sort through.

My recommendation would be to use the prefix setting to narrow the docs down into smaller chunks, if the structure of the Swift docs allows you to do so.
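As a rough illustration of what narrowing by prefix means, the sketch below keeps only the links that fall under a chosen prefix. The function name `filter_by_prefix` and the example.com URLs are hypothetical, and the real prefix setting may behave differently.

```python
from typing import List


def filter_by_prefix(links: List[str], prefix: str) -> List[str]:
    """Keep only links that start with the given prefix (hypothetical helper)."""
    # Note: a plain startswith check also matches sibling paths such as
    # ".../swiftui-extras", so pick a prefix that ends at a clear boundary.
    return [link for link in links if link.startswith(prefix)]


# Hypothetical links discovered while crawling a large docs site.
discovered = [
    "https://docs.example.com/swift/swiftui/view",
    "https://docs.example.com/swift/swiftui/text",
    "https://docs.example.com/swift/foundation/url",
    "https://docs.example.com/blog/announcement",
]

swiftui_only = filter_by_prefix(discovered, "https://docs.example.com/swift/swiftui/")
```

Here only the two pages under `/swift/swiftui/` survive, which is the kind of smaller chunk that is more likely to index fully and stay relevant.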


Thanks Dan, I will go with your recommendation and narrow down the docs.

It looks like Unreal have put some systems in place to stop automated scraping from occurring, but I do think we have Unreal Engine as a built-in doc already, which may help.

@danperks - Just tried your recommendation with the following URL:

I was expecting many pages to be indexed, but it shows that only one page was indexed. Now I am even more confused. This page has links on it, but the indexer doesn’t seem to see them. Could you try adding this as a doc to see if you get the same result as me?

Any ideas?

Can’t get it to index on my side either. I’ll add these to our internal list to see if we can fix them!

It might help to let users search existing docs as an alternative to providing a URL. That would also avoid having to find the URL in the first place. A project is usually limited to a few frameworks (frontend, backend, language), and most of those can be detected from the packages and libraries it uses.

By the way, the second modal when adding a new doc is unstable. When editing the URL in that second modal, the modal closes most of the time!

We’ve thought about this, but were concerned that things like versions and revisions, or language-specific versions of the same documentation, could cause confusion.

With the second modal, you’ve got to be careful not to click out of it, otherwise it will close.


I am experiencing a similar problem in my environment. If I add or remove a document and then launch Cursor again, the list reverts to its original state.
I am developing a new project, and the documentation has a big impact on the quality of the generated code.