Could you make Cursor code based on academic papers and GitHub repositories?

As an AI algorithm developer, I frequently work on implementing cutting-edge approaches in supervised learning, self-supervised learning, and reinforcement learning. However, I consistently encounter two major challenges:

Implementation Barriers

Paper Implementation

When working with academic papers, I often struggle to implement certain algorithmic components due to the absence of reference code. This creates a significant gap between theoretical understanding and practical implementation.

Cross-Language Migration

Even when open-source implementations exist, translating code from one programming language to another for my projects can be challenging. Some implementations contain complex or poorly documented sections that are difficult to interpret and port.

Proposed Impact

Addressing these challenges would:

  • Accelerate AI development workflows, even for the Cursor editor itself

  • Enable faster algorithm prototyping

Other Thoughts on Development Documentation Challenges

GitHub Integration Concerns

The current @doc implementation for integrating GitHub repositories appears suboptimal. When adding repositories with identical main and prefix links, Cursor processes them rapidly but seemingly with limited comprehension depth. This quick processing may be compromising the quality of understanding and subsequent code assistance.

Academic Paper Integration

PDF document integration presents particular challenges:

  • Complex technical diagrams and figures are difficult to interpret

  • Mathematical equations and specialized notation require better parsing

  • The structured format of academic papers needs more sophisticated processing

Suggested Improvements

A more robust documentation system should:

  • Implement deeper repository analysis beyond surface-level scanning, including issues, wikis…

  • Better handle PDF-format academic papers, especially those with visual elements

  • Support intelligent cross-referencing between code and documentation

Interesting… Some time ago, I was experimenting with GraphRAG to handle academic papers for issues similar to those you described. However, I ended up writing a Streamlit app for my specific needs. Basically, having all the latest papers in a knowledge graph (KG) lets me query for the text I need, and then I have an agent to assist with translating it into code. You still have to do most of the work and understand the paper. But if you are digging through academic papers trying to use the latest advancements, I assume you want to understand them as well.
You might be able to use a .cursorrules (tools) to hook into the KG and run the needed queries, but I am not sure. I just started using Cursor tools; they are easy to use and sometimes work very well, but the options are limited.
Otherwise, look into using an open-source AI coding IDE. Then you can fork it and hack in a hook, perhaps a @KG to use the indexed KG…
If you just want to use the latest paper to extract and prototype something without having to understand it, perhaps just use the OpenAI o1 API: use “marker” to convert your papers into Markdown, write your instructions, and see what it can do… At least it would be a place to start, and less time-consuming.
Please share if you end up with an interesting solution
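
The "convert to Markdown, write instructions, send to a model" idea above can be sketched roughly as follows. This is only a minimal sketch: it assumes the paper has already been converted to Markdown (for example by marker) and uses `#`-style headings. The function names `split_sections` and `build_prompt` are made up for illustration, and the actual API call is left out, since any chat-style client could take the resulting string.

```python
import re
from typing import Dict, List


def split_sections(markdown: str) -> Dict[str, str]:
    """Split Markdown text into {heading: body} using '#'-style headings.

    Text before the first heading is stored under "preamble".
    Assumption: headings are unique; a repeated heading overwrites the earlier one.
    """
    sections: Dict[str, str] = {}
    current = "preamble"
    lines: List[str] = []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            # Close the section collected so far and start a new one.
            sections[current] = "\n".join(lines).strip()
            current = match.group(1)
            lines = []
        else:
            lines.append(line)
    sections[current] = "\n".join(lines).strip()
    return sections


def build_prompt(markdown: str, wanted: List[str], instructions: str) -> str:
    """Keep only sections whose heading contains one of the wanted keywords."""
    parts = [instructions.strip(), ""]
    for heading, body in split_sections(markdown).items():
        if any(keyword.lower() in heading.lower() for keyword in wanted):
            parts.append(f"## {heading}\n{body}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical paper text, standing in for a converted PDF.
    paper = "# Title\nintro\n## Method\nWe do X.\n## Results\nNumbers here."
    print(build_prompt(paper, ["method"], "Implement this method in Python."))
```

Sending only the method-related sections keeps the prompt short, but you still need to read the paper to know which sections matter.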

I partly agree that AI is a “co-pilot”, so we humans still need to (at least roughly) know what we are doing/reading.

I came up with the idea here for two reasons: 1. I believe AI can catch many miscellaneous, easily overlooked details and hard-to-understand parts of academic papers, and 2. Claude recently added a feature to read PDFs, including the images inside.

So I do think there could be significant improvement in this field when coding with papers.

And honestly, I’m not skilled enough to handle the customized GraphRAG and KG approach you mentioned. Besides, RAG-related solutions are sprouting up like mushrooms after rain, with new ones coming out daily. So I guess I shouldn’t reinvent the wheel repeatedly but rather let the leading companies, including the Cursor editor team, handle it when bringing this cutting-edge tech into coding. That’s why I posted this thread.

Depending on the number and size of the papers, you can try Tutorial: Adding full repo context, pdfs and other docs
I'm not sure what your specific use case is. But if you have a collection of papers within a topic, you will need multi-hop retrieval, advanced reasoning, etc., and a KG RAG or similar would be my choice. You could try doing something with Obsidian, hosting the information you need and adding it that way. I have not tried it, and I don't know if it is possible.
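
To make "multi-hop" a bit more concrete, here is a toy sketch of what a KG query over papers might look like. It is not how GraphRAG or any particular tool works internally; the triples, entity names, and the helpers `build_graph` and `multi_hop` are all invented for illustration, and a real setup would extract the triples from the papers themselves.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples by subject so outgoing edges can be followed quickly."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def multi_hop(graph: Dict[str, List[Tuple[str, str]]], start: str,
              max_hops: int = 2) -> List[List[Triple]]:
    """Breadth-first search returning every path of 1..max_hops edges from start."""
    paths: List[List[Triple]] = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget used up on this branch
        visited = {start} | {obj for _, _, obj in path}
        for relation, target in graph.get(node, []):
            if target in visited:
                continue  # do not walk in circles
            new_path = path + [(node, relation, target)]
            paths.append(new_path)
            queue.append((target, new_path))
    return paths


if __name__ == "__main__":
    # Made-up triples standing in for facts extracted from papers.
    kg = build_graph([
        ("PaperA", "proposes", "MethodX"),
        ("MethodX", "extends", "MethodY"),
        ("MethodY", "introduced_in", "PaperB"),
    ])
    for path in multi_hop(kg, "PaperA", max_hops=3):
        print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

The three-hop path from PaperA to PaperB is the kind of connection a plain single-document lookup would miss, which is why a KG helps once you have a collection of related papers.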
