Hi guys!
We all know what happened with the 0.45 => 0.46 version upgrade. Most of us felt a downgrade in performance. Do we have any generic way to evaluate whether one update, prompt, mix of prompts, or set of settings is better than another?
Also, can I run some benchmarking tests to check my custom modes, settings, or any combination of them?
Ofc, we can release and check what the community says.
I feel we lack a cool, reliable way to evaluate both Cursor AI releases and each user's own experiments.
Any ideas on how to test it out or how to be better at this?
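To make the question concrete, here is a rough sketch of the kind of harness I have in mind: a fixed set of tasks, each with a pass/fail check, run against every setup so the scores are comparable. Everything in it is hypothetical. `old_setup` and `new_setup` are stand-in functions, not a real Cursor API, and in practice each one would have to wrap whatever actually produces the output for a given version, prompt, or settings combo.

```python
from typing import Callable, Dict, List, Tuple

# A task is a prompt plus a checker that decides if the output is acceptable.
Task = Tuple[str, Callable[[str], bool]]
# A runner takes a prompt and returns the output of one setup (hypothetical).
Runner = Callable[[str], str]


def pass_rate(run: Runner, tasks: List[Task]) -> float:
    """Return the fraction of tasks whose output passes its checker."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for prompt, check in tasks if check(run(prompt)))
    return passed / len(tasks)


def compare(configs: Dict[str, Runner], tasks: List[Task]) -> Dict[str, float]:
    """Score every setup on the same task list."""
    return {name: pass_rate(run, tasks) for name, run in configs.items()}


# Toy tasks and stand-in runners, only to show the shape of the comparison.
tasks: List[Task] = [
    ("say hello", lambda out: "hello" in out.lower()),
    ("add 2 and 2", lambda out: "4" in out),
]


def old_setup(prompt: str) -> str:
    # Stand-in for "the previous version / prompt / settings".
    return "hello" if "hello" in prompt else "4"


def new_setup(prompt: str) -> str:
    # Stand-in for "the new version / prompt / settings".
    return "hello"


scores = compare({"old": old_setup, "new": new_setup}, tasks)
print(scores)
```

With real tasks the outputs are not deterministic, so each task would probably need several runs per setup before the pass rates mean much.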