As an AI algorithm developer, I frequently work on implementing cutting-edge approaches in supervised learning, self-supervised learning, and reinforcement learning. However, I consistently encounter two major challenges:
Implementation Barriers
Paper Implementation
When working with academic papers, I often struggle to implement certain algorithmic components due to the absence of reference code. This creates a significant gap between theoretical understanding and practical implementation.
Cross-Language Migration
Even when open-source implementations exist, translating code from one programming language to another for my projects can be challenging. Some implementations contain complex or poorly documented sections that are difficult to interpret and port.
Proposed Impact
Addressing these challenges would:
Accelerate the AI development workflow, including development of the Cursor editor itself
Enable faster algorithm prototyping
Other Thoughts on Development Documentation Challenges
GitHub Integration Concerns
The current @doc implementation for integrating GitHub repositories appears suboptimal. When adding repositories with identical main and prefix links, Cursor processes them rapidly but seemingly with limited comprehension depth. This quick processing may be compromising the quality of understanding and subsequent code assistance.
Academic Paper Integration
PDF document integration presents particular challenges:
Complex technical diagrams and figures are difficult to interpret
Mathematical equations and specialized notation require better parsing
The structured format of academic papers needs more sophisticated processing
Suggested Improvements
A more robust documentation system should:
Implement deeper repository analysis beyond surface-level scanning, including issues, wikis…
Better handle PDF-format academic papers, especially those with visual elements
Support intelligent cross-referencing between code and documentation
Interesting… Some time ago, I was experimenting with GraphRAG to handle academic papers for issues similar to those you described. However, I ended up writing a Streamlit app for my specific needs. Basically, having all the latest papers in a knowledge graph (KG) lets me query for the text I need, and then I have an agent to assist with translating it into code. You still have to do most of the work and understand the paper. But if you are digging through academic papers trying to use the latest advancements, I assume you want to understand them as well.
You might be able to use a .cursorrules (tools) to hook into the KG and run the needed queries, but I am not sure. I just started using Cursor tools; it is easy, and sometimes it works very well, but the options are limited.
Otherwise, look into using an open-source AI coding IDE. Then you can fork it and hack in a hook, perhaps a @KG to use the indexed KG…
If you just want to extract and prototype something from the latest paper without having to understand it, perhaps use the OpenAI o1 API: use “marker” to convert your papers into Markdown, write your instructions, and see what it can do… At least it would be a place to start and less time-consuming.
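To make the Markdown-plus-instructions idea concrete, here is a rough sketch of the prompt-building step only. It assumes the paper has already been converted to Markdown (for example with marker) and does not call any model API. The function names, the "method"/"algorithm" keyword filter, and the prompt layout are my own assumptions, not anything prescribed by marker or OpenAI.

```python
import re


def split_markdown_sections(markdown_text):
    """Split Markdown into (heading, body) pairs using ATX headings (#, ##, ...)."""
    sections = []
    heading = "Preamble"  # hypothetical label for text before the first heading
    body_lines = []

    def flush():
        body = "\n".join(body_lines).strip()
        # Skip an empty preamble, keep every real section.
        if body or heading != "Preamble":
            sections.append((heading, body))

    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(2).strip()
            body_lines = []
        else:
            body_lines.append(line)
    flush()
    return sections


def build_prompt(markdown_text, instructions, wanted=("method", "algorithm")):
    """Combine instructions with the sections whose heading contains a wanted keyword."""
    sections = split_markdown_sections(markdown_text)
    selected = [
        (h, b) for h, b in sections
        if any(key in h.lower() for key in wanted)
    ]
    if not selected:
        selected = sections  # fall back to the whole paper
    paper_part = "\n\n".join(f"## {h}\n{b}" for h, b in selected)
    return f"{instructions.strip()}\n\n--- PAPER EXCERPT ---\n{paper_part}"


if __name__ == "__main__":
    paper = "# Title\nIntro text.\n## Method\nWe update w by gradient descent.\n"
    print(build_prompt(paper, "Implement the method section in Python."))
```

The resulting string is what you would send as the user message to whichever model you use. Filtering by heading keeps the prompt small, at the cost of dropping context that lives in other sections.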
Please share if you end up with an interesting solution.
I partly agree that AI is a “co-pilot”, so we humans still need to (at least roughly) know what we are doing/reading.
I came up with the idea here for two reasons: 1. I believe AI can catch many small, easily overlooked details and hard-to-understand parts of academic papers, and 2. Recently, Claude has added a feature to read PDFs, including the images inside.
So I do think there could be significant improvement in this field when coding with papers.
And honestly, I’m not skilled enough to handle the customized GraphRAG and KG approach you mentioned. Besides, RAG-related solutions are sprouting up like mushrooms after rain, with new ones coming out daily. So I guess I shouldn’t reinvent the wheel repeatedly but rather let the leading companies, including the Cursor Editor team, handle it when integrating this cutting-edge tech into coding. That’s why I posted this thread.
Depending on the number and size of the papers, you can try the tutorial “Adding full repo context, pdfs and other docs”.
I'm not sure what your specific use case is. But if you have a collection of papers within a topic, you will need multi-hop retrieval, advanced reasoning, etc., and a KG RAG or similar would be my choice. You could try doing something with Obsidian, hosting the information you need and adding it that way. I have not tried it, and I don't know if it is possible.
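To show what I mean by multi-hop, here is a toy sketch: a knowledge graph stored as (subject, relation, object) triples, and a breadth-first search that follows up to a fixed number of edges from a starting node. This is not how GraphRAG works internally; the triples, relation names, and function names are all made up for illustration.

```python
from collections import deque


def build_graph(triples):
    """Index (subject, relation, object) triples by subject."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
    return graph


def multi_hop(graph, start, max_hops=2):
    """Return every path of at most max_hops edges starting at `start` (BFS order)."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget used up on this branch
        for relation, neighbor in graph.get(node, []):
            # Skip nodes already on this path to avoid cycles.
            if neighbor == start or any(neighbor == step[2] for step in path):
                continue
            new_path = path + [(node, relation, neighbor)]
            paths.append(new_path)
            queue.append((neighbor, new_path))
    return paths


if __name__ == "__main__":
    # Hypothetical triples, as might be extracted from a small set of papers.
    triples = [
        ("PaperA", "proposes", "MethodX"),
        ("MethodX", "extends", "MethodY"),
        ("MethodY", "introduced_in", "PaperB"),
        ("PaperA", "evaluates_on", "DatasetD"),
    ]
    for path in multi_hop(build_graph(triples), "PaperA", max_hops=3):
        print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

A single-hop lookup from PaperA only finds MethodX and DatasetD. Finding that PaperB introduced the method MethodX builds on takes three hops, which is the kind of question plain chunk-based retrieval tends to miss.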