Add local PDF via @Docs

It’s been 2 years already. Is there any timeline?

1 Like

Hey, not yet, unfortunately. We haven’t seen huge demand for this feature yet; remote PDFs should already work by pasting the URL in - it’s only local PDFs that lack good AI integration right now!

Hey,
Just adding an upvote here; I badly need this feature too!

2 Likes

Upvote here. I think this is crucial for many niche domains beyond SDE.

2 Likes

Bump and upvote

I wrote a little tool, bib4llm, that converts PDFs into Markdown + PNGs, either from a directory of PDF files or from a Zotero BibTeX file. It currently uses PyMuPDF4LLM for the conversion and leaves any RAG / indexing to Cursor.

I just drop a .bib file synced with my Zotero collection (Better BibTeX + keep the .bib file updated during export) into my project folder, run bib4llm on it, and start chatting with the papers.
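For anyone who wants to script just the conversion step themselves, here is a minimal sketch using PyMuPDF4LLM directly (the same library bib4llm relies on). The file names are placeholders:

```python
# Minimal sketch: convert one PDF to Markdown with PyMuPDF4LLM,
# then drop the result next to the PDF so Cursor can index it as plain text.
import pathlib
import pymupdf4llm

pdf_path = "papers/example_paper.pdf"        # hypothetical input file
md_text = pymupdf4llm.to_markdown(pdf_path)  # returns a Markdown string

out_path = pathlib.Path(pdf_path).with_suffix(".md")
out_path.write_text(md_text, encoding="utf-8")
print(f"Wrote {out_path} ({len(md_text)} characters)")
```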

Maybe you’ll find it useful. 🙂

bump and upvote

Bumping this again. I deal with a lot of academic papers, and expecting users to do imperfect PDF-to-Markdown conversions just to regain a feature already built into these LLMs is crazy at this point in the product lifecycle.

1 Like

Interesting project. The major issue for my use case is its very limited efficacy at converting equations back to Markdown. I’m not familiar enough with OCR to know how much can be done about this (other than, ironically, feeding the PDFs into an LLM and having it fix the equation rendering…).

Also, Agentic Document Extraction from LandingAI is the latest development in this space, as a reference for the Cursor developers.

Upvote

How many upvotes does the Cursor team need before they feel this is something important?

1 Like

Today I thought it was a random bug that Cursor doesn’t have such a basic function. I thought the model had just broken down and refused to read the PDF, so I started testing other models. Strangely, I also noticed that when I paste a screenshot, Cursor can read the image, but when I add an image from the repository to the chat via an attachment, or just write the path to it, Cursor also refuses to read it, saying it is a text-only model. Why can it read screenshots but not directly attached images? Maybe it’s the same sort of problem, because somebody mentioned that passing the PDF as a URL solves the issue, which is also strange.

You can run markitdown-mcp (markitdown/packages/markitdown-mcp at main · microsoft/markitdown · GitHub) as an MCP server and then add it to the Cursor MCP server list so the agents have a tool to read various binary formats as Markdown.
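Registering it in Cursor looks roughly like the snippet below. This is a sketch assuming you’ve installed the server with `pip install markitdown-mcp` and that it exposes a `markitdown-mcp` command over stdio (check the markitdown README for the exact invocation); drop something like this into `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "markitdown": {
      "command": "markitdown-mcp",
      "args": []
    }
  }
}
```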

This is a crazy difficult request.
PDF is a crappy document protocol and Adobe continuously made it worse over the 30 years it owned the standard.
I’m actually hoping the AI tools will refuse to read PDF files so the standard will die a silent death.

At any rate, I’ve had success creating an MCP that uses pdftoppm to generate images of the pages, then using the built-in OCR to read the pages and generate an MD or MDC file from them.

It doesn’t catch everything, especially for chip datasheets that use a lot of diagrams, but it does a good job of pulling the general text and table data.
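The pdftoppm step from the approach above can be scripted in a few lines; this is just a sketch (paths and DPI are made up), with the OCR / Markdown-generation pass left to whatever tool reads the images afterwards:

```python
# Sketch of the page-rasterization step: render each PDF page to a PNG
# with pdftoppm (poppler-utils), ready for an OCR/vision pass.
import subprocess
from pathlib import Path

pdf = Path("datasheets/example_chip.pdf")  # hypothetical input
out_dir = Path("pages")
out_dir.mkdir(exist_ok=True)

# pdftoppm writes pages/page-1.png, pages/page-2.png, ...
subprocess.run(
    ["pdftoppm", "-png", "-r", "150", str(pdf), str(out_dir / "page")],
    check=True,
)
print(sorted(out_dir.glob("page-*.png")))
```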