Swearing as the Best Model Quality Metric: Cursor graph

I would love to see a dashboard that shows over time the % of prompts that contain curse words by model on a realtime graph…

:slight_smile:

10 Likes

(and if there’s a uniform spike across all models, then it’s a Cursor issue… :slight_smile:)

1 Like

Yeah, indeed… this would be interesting. I’ve now figured out for myself: once I start cursing, I need to choose a different model and/or approach.

@charles Cursing and swearing often degrade performance, as models associate those words with bad code.

1 Like

True.
But also: there have been some articles around the net saying that if you THREATEN the model (instead of cursing at it), performance can increase :wink: Never tested it myself, though.

That was with earlier models. The latest ones don’t respond to that.

In the past you could also offer a model incentives (money); that doesn’t work either.

1 Like

Models deny reality

1 Like

“We hypothesize that the use of profanity is an indicator of the programmer’s deep emotional involvement with the code and its inherent complexities, thus producing better code based on a thorough, critical, and dialectical code analysis process,” the study report says.

:grinning_face_with_smiling_eyes:

1 Like

I created this tool a few months back:

AGIfMeter :face_with_symbols_on_mouth::bar_chart:

AI Model Performance Analyzer - A tongue-in-cheek Ruby tool that measures AI model quality by analyzing f-word frequency in user prompts.

The Theory :brain:

The premise is simple yet surprisingly insightful: the more frustrated users get with an AI model (measured by f-word usage in their prompts), the worse the model is performing. While this is a fun and irreverent approach, it can actually provide genuine insights into user experience and model effectiveness!

Features :sparkles:

  • :magnifying_glass_tilted_left: Smart Pattern Detection: Detects various f-word spellings, censoring, and creative variations

  • :chart_increasing: Beautiful Terminal Graphs: ASCII charts showing frustration trends over time

  • :bar_chart: Statistical Analysis: Comprehensive metrics including rates, trends, and consistency

  • :bullseye: Performance Ratings: From β€œEXCELLENT” to β€œCRITICAL” based on f-word frequency

  • :date: Timeline Analysis: Tracks changes in user frustration over time

  • :wrench: Flexible Input: Works with any directory containing markdown prompt files
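The pattern-detection idea is roughly this (a minimal Ruby sketch, not AGIfMeter’s actual code; the regex and method names are illustrative):

```ruby
# Hypothetical sketch of f-word pattern detection: match the plain word,
# censored forms like "f*ck", and stretched spellings like "fuuuck" or
# "fck", then report a per-prompt frustration rate.
F_WORD = /\bf+[*u@#]*u*[*c@#]*c*k+\w*/i

def frustration_rate(prompts)
  hits = prompts.sum { |p| p.scan(F_WORD).size }
  hits.to_f / prompts.size
end

prompts = [
  "please fix the build",
  "why the f*ck is this failing again",
  "the fucking tests broke AGAIN"
]
puts frustration_rate(prompts).round(3) # => 0.667 f-words per prompt
```

A real implementation would also handle word lists beyond one expletive and read prompts from the markdown files mentioned above, but the core is just a regex scan plus a count.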

Sample output:

:chart_increasing: F-WORD FREQUENCY OVER TIME:

03/21 β”‚ β”‚ 0.000
03/21 β”‚β–ˆβ–ˆβ–ˆβ–ˆ β”‚ 0.600
03/22 β”‚ β”‚ 0.000
03/22 β”‚ β”‚ 0.000
…
03/27 β”‚ β”‚ 0.000
03/27 β”‚ β”‚ 0.000
03/27 β”‚ β”‚ 0.000
03/27 β”‚β–ˆβ–ˆβ–ˆ β”‚ 0.375
03/29 β”‚ β”‚ 0.000
03/29 β”‚β–ˆβ–ˆ β”‚ 0.286
…
05/10 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 2.800
05/11 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 2.300
05/11 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.286
05/11 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 2.000
05/11 β”‚β–ˆβ–ˆ β”‚ 0.273
05/12 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.683
05/12 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.000
05/13 β”‚β–ˆβ–ˆβ–ˆ β”‚ 0.500
05/13 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.756
05/13 β”‚ β”‚ 0.000
05/13 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.667
05/13 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.857
05/14 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.500
05/14 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ”‚ 5.750
05/14 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 3.200
05/14 β”‚ β”‚ 0.000
…
05/20 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 0.714
05/21 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 1.444
05/22 β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ β”‚ 2.556
05/27 β”‚ β”‚ 0.000
05/27 β”‚β–ˆβ–ˆβ–ˆ β”‚ 0.500
05/28 β”‚ β”‚ 0.000
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
0.0 5.75

:counterclockwise_arrows_button: TREND: :chart_increasing: INCREASING (+50.0% change)

:brain: PERFORMANCE INSIGHTS:

:police_car_light: CRITICAL: High frustration levels! This AI model requires immediate attention.

:triangular_ruler: Statistical Analysis:
Average Rate: 0.3669 f-words per prompt
Standard Deviation: 0.752
Consistency: Low

:light_bulb: Remember: This is a tongue-in-cheek metric, but patterns in user frustration
can actually provide insights into AI model performance and user experience!
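For the curious, the “Statistical Analysis” block above boils down to a mean and standard deviation over per-session rates; a minimal sketch (hypothetical, not the tool’s actual code):

```ruby
# Hypothetical sketch: compute the average f-word rate and its
# (population) standard deviation from a list of per-session rates.
def stats(rates)
  mean = rates.sum.to_f / rates.size
  variance = rates.sum { |r| (r - mean)**2 } / rates.size
  [mean, Math.sqrt(variance)]
end

# Example rates taken from the sample chart above.
mean, sd = stats([0.0, 0.6, 2.8, 5.75, 0.5])
puts format("Average Rate: %.4f f-words per prompt", mean)
puts format("Standard Deviation: %.3f", sd)
```

A high standard deviation relative to the mean is what drives the “Consistency: Low” rating: frustration comes in bursts rather than a steady drizzle.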

4 Likes

I’m not suggesting what people should do; I’m talking about an operational metric for Cursor. Show a realtime graph on the wall: if all models show a spike in swearing at the same time, Cursor has a bug. And historically, a model that provokes more swearing is fundamentally worse than those that don’t. It would be incredibly interesting and insightful.

1 Like