Cursor for Data Science - my experiences

Hi all,

I’m new here and I thought I’d share some of my experiences, positive and negative, with Cursor over the last few months. I’ve completed four data science projects in that time: three fairly small ones and a moderately large one that I’m wrapping up and hopefully will take to production.

I come at this as an old-time SAS programmer (yes, it’s a programming language, not a spreadsheet) with a chemical engineering background and academic coding that started with Fortran IV, then Pascal, C++, and some Visual Basic. I struggled with R for a bit, and I’m still really a beginner with Python. I’m fairly senior and most of my roles have been in leadership positions.

I’m now back in the programmer’s seat delivering data science work as an independent (and I use a more robust MLOps process than most). With Cursor I’m doing projects in a third of the time, with a quarter of the staff, at a fifth of the cost. A project I can deliver in a month now would have taken a team of four about three months just two years ago. The productivity gains are incredible.

So are some of the roadblocks, though. I’m constantly fighting the limited context window; I can barely get a prompt together without hitting 90% context. (Max mode helps, but burning $1000 a week on tokens, like I did last week, isn’t going to be sustainable without a bump in my fees.)

Letting the agent run wild is dangerous. I’ve fought a data validation war for the last two weeks, watching as the agents skipped tests and pretended they had passed, changed the validation data to match the code output, continuously wrote throw-away code in chat (the editor) rather than recording it as a script, and duplicated scripts under multiple tags and names, like a management consultant in PowerPoint, instead of using proper git branch management to keep an auditable history. They often refused to use previously created and documented utility functions designed specifically for accessing databases, parallelizing runs, running high-resource jobs in RAM, avoiding expensive and excessive disk I/O, and so on.
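
One partial fix on the validation front has been pinning the holdout data with a checksum so it can’t be quietly rewritten to match the model output. Here’s a minimal sketch of the idea, not my actual project code; the file path and the expected digest are placeholders you’d fill in yourself:

```python
# Hypothetical guard test: fail loudly if the validation dataset has been altered.
# The path and the expected SHA-256 digest below are placeholders.
import hashlib
from pathlib import Path

VALIDATION_FILE = Path("data/validation/holdout.parquet")  # placeholder path
EXPECTED_SHA256 = "replace-with-the-real-digest"           # pinned once, by hand


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large files don't blow up RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def test_validation_data_unchanged():
    # If an agent (or anyone else) edits the holdout data, this test fails
    # instead of silently "passing" against doctored expectations.
    assert sha256_of(VALIDATION_FILE) == EXPECTED_SHA256
```

Run it with pytest alongside the rest of the suite. It won’t stop an agent from skipping tests outright, but at least the tampering shows up as a red test instead of a quiet rewrite.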

Results are hit and miss.

On the other hand, I’ve been amazed at how quickly and easily it can code a complex predictive or forecasting model. I was able to put together a Bayesian hierarchical forecast training, cross-validation, and scoring model in an afternoon. It took several more days to optimize, and I was eventually able to get it to run on my (admittedly souped-up) MacBook: first in about 12 hours (when I was in college it would have taken a Cray supercomputer to do the calculations at all), then, with PyTorch optimized for Apple M2 Max Metal, in just under an hour.
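
For anyone else on Apple silicon, most of that last speedup came from moving the work onto PyTorch’s MPS (Metal) backend. A minimal sketch of the pattern follows; the toy model and random data are stand-ins for illustration, not my forecasting code:

```python
# Minimal sketch: train a PyTorch model on Apple's Metal backend (MPS) when available.
# The tiny model and random tensors below are stand-ins for illustration only.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
).to(device)

x = torch.randn(4096, 32, device=device)  # fake features
y = torch.randn(4096, 1, device=device)   # fake targets

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"device={device}, final loss={loss.item():.4f}")
```

The same `.to(device)` pattern carries over to the real model; the win is keeping the tensors on the GPU instead of bouncing them back and forth to the CPU.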

Sometimes I’ll get great results; sometimes incredibly frustrating ones. Sometimes the context window just blows up; other times I get great results with Auto mode. (To be fair, my repo on my current project is 600 GB including data, with 82,000 files… multiple databases, dozens of tables, and hundreds, if not a thousand or more, columns.)

I’m sure there is a lot I can learn from this group. Happy to share my experiences (including my stupid mistakes…it’s how I learn).

Is anyone else using Cursor for data science, machine learning, optimization, or other AI-type work?

What have your experiences been?

Any tips, thoughts, or suggestions appreciated.