Has cursor become more stupid?

A week ago, Cursor was performing amazingly with Claude 3.5 Sonnet.

Now that I’ve paid for it, Cursor seems to be making repetitive mistakes: overwriting existing code, duplicating features, and making the same errors over and over.

I’ve provided .cursorrules, removed unneeded functionality, and use single-focus conversations to make sure I don’t change too much at the same time.

Anyone else having these issues?
Has anyone got any creative and useful suggestions?


I experienced the same issue recently, where Cursor would just remove my code completely, which is not useful at all.

However, I assume it’s an issue on the model side; for example, Anthropic appears to have experienced some outages recently.

Switching to a different model would help in this case.


Yeah. I thought it was just long chat history, so I partly solved that with my projectContext.md file over at this post: Workaround for lack of memory between chats

That has helped a lot with starting new chats more readily. But recently it seems increasingly sub-par from the get-go. Hard to tell whether it’s the model and/or the auxiliary functionality that’s the cause; probably a little of both.

I wonder if large models like 3.5 Sonnet experience degradation the longer they are out in the wild getting RLHF’d to oblivion; one can draw some parallels with the long-context conundrum.

Of course, there’s nothing empirical to back any of that up, just some rants. I’ll probably focus on keeping my own engineering sharp for a bit, while we wait for whatever’s next.

This is really interesting. Can you share a sample and what you did, please?

Thanks

Thanks @ngochai94

I think this is only partially correct.
The Cursor team have made some silly changes.
For example, the application now continuously ignores its own .cursorrules and the rules set via settings. This has little to do with the LLM, and more to do with how Cursor sends the info to the LLM.

I’m going to ask for a refund.
I think Aider has regained the crown.

Also, Aider support is a bit more on the ball. If you look at solved issues by date, most people’s issues haven’t even been looked at.

Cursor may be great - one day!


Let me join the chorus!
It seems that Cursor (maybe LLMs in general) is amazingly good at coming up with new code, or near-greenfield development.
But I’m often frustrated by Cursor’s lack of understanding of, or respect for, the existing context. When I ask it to modify some functions to add new functionality, it will do so with an OK implementation, but destroy functionality that was already coded.

Here is an example.

First, the code co-written by me and Cursor, which passed my review and tests:

def process_location_date(location: str, df: pd.DataFrame, date: pd.Timestamp, 
                          window_size: int, overheat: float, 
                          prediction_horizon_list: List[int], 
                          scatter: bool = False, 
                          overall_error_metrics: Dict[int, List[float]] = {}
                          ) -> Dict[int, List[float]]:
    """
    Process and plot data for a specific location and date.
    """
    start_date, end_date = get_overheat_date_range(date)
    actual = df[df['Time'].between(start_date, end_date)].sort_values(by='Time')
    logger.info(f"Processing location: {location}, date: {date}, data points: {len(actual)}")

    predictions = []

    for index, horizon in enumerate(prediction_horizon_list):
        # Calculate predictions only for the relevant part of the data
        pred_end = actual['Time'].max() - pd.Timedelta(seconds=horizon)
        actual_for_pred = actual[actual['Time'] < pred_end]
        
        pred = calculate_predictions(actual_for_pred, window_size, horizon)
        
        # Trim actual data to match prediction time range
        pred_start = actual['Time'].min() + pd.Timedelta(seconds=horizon)
        actual_for_merge = actual[actual['Time'] > pred_start]
        
        # Merge prediction with actual data
        # Note: We use pred as the left DataFrame to ensure we match as many prediction rows as possible.
        # This is important because pred may be a subset of actual_for_merge in terms of timestamps.
        merged_df = pd.merge_asof(pred[['PredictionTime', 'Predicted']], actual_for_merge, 
                                  left_on='PredictionTime', 
                                  right_on='Time',
                                  direction='nearest',
                                  tolerance=pd.Timedelta(minutes=5))
        
        logger.info(f"Prediction horizon: {horizon}, merged data points: {len(merged_df)}")
        
        if merged_df.empty:
            logger.warning(f"No overlapping data for horizon {horizon}")
            logger.warning(f"Actual data: {actual}")
            logger.warning(f"Pred data: {pred}")
            continue

        error_metrics, additional_metrics = calculate_error(merged_df, overheat)
        true_pos, false_pos, false_neg, true_neg, sum_residure_squared, max_error = additional_metrics
        
        prior_metrics = overall_error_metrics.get(horizon, [0, 0, 0, 0, 0, 0])
        overall_error_metrics[horizon] = [
            prior_metrics[i] + additional_metrics[i] for i in range(5)
        ] + [
            max(prior_metrics[5], additional_metrics[5])  # max_error
        ]

        annotation_x_coord = index / max(len(prediction_horizon_list) - 1, 1)
        
        predictions.append(PredictionResult(
            dataframe=pred,
            prediction_horizon_seconds=horizon,
            error_metrics=error_metrics,
            annotation_x_coord=annotation_x_coord
        ))
    
    plot_description = {
        'location': location,
        'window_size': window_size,
        'overheat': overheat if actual['Value'].max() >= overheat else None,
        'plot_type': 'scatter' if scatter else 'line',
        'same_y_axis': False
    }
    
    plot_predictions_and_actual(actual, predictions, plot_description)
    return overall_error_metrics

def overheats_prediction_studies(dict_location_df: Dict[str, pd.DataFrame], 
                          overheat: float, window_size: int, 
                          prediction_horizon_list: List[int], 
                          scatter: bool = False) -> Dict[int, ErrorMetrics]:
    """
    Apply the linear model to predict overheat events for multiple locations and dates.
    """
    dict_location_df = {
        location: dft.convert_column_to_float(df, 'Value')
        for location, df in dict_location_df.items() if df is not None
    }

    overheat_location_date_pairs = [
        (location, date)
        for location, df in dict_location_df.items()
        for date in pd.to_datetime(df[df['Value'] >= overheat]['Time'].dt.date.unique())
    ]

    process_func = partial(process_location_date, 
                           window_size=window_size, 
                           overheat=overheat, 
                           prediction_horizon_list=prediction_horizon_list, 
                           scatter=scatter)
    overall_error_metrics = {}
    for location, date in overheat_location_date_pairs:
        overall_error_metrics = process_func(location, dict_location_df[location], date, 
            overall_error_metrics=overall_error_metrics)

    # Compute the final error metrics
    final_error_metrics = {}
    for horizon, metrics in overall_error_metrics.items():
        total_true_pos, total_false_pos, total_false_neg, total_true_neg, total_residure_squared, max_error = metrics
        total_predictions = total_true_pos + total_false_pos + total_false_neg + total_true_neg
        
        overall_precision = total_true_pos / (total_true_pos + total_false_pos) if (total_true_pos + total_false_pos) > 0 else 0
        overall_recall = total_true_pos / (total_true_pos + total_false_neg) if (total_true_pos + total_false_neg) > 0 else 0
        overall_rmse = np.sqrt(total_residure_squared / total_predictions) if total_predictions > 0 else 0
        
        final_error_metrics[horizon] = ErrorMetrics(
            rmse=overall_rmse,
            max_error=max_error,
            recall=overall_recall,
            precision=overall_precision
        )
    
    return final_error_metrics

(I don’t know why, but I cannot write more in the last post. Continuing here.)
Here are the dialogues between me and cursor:

I would like to add the collection of error metrics (recall, precision, RMSE, and max_error) for all the locations and overheat dates, returned as a list of error metrics from the function overheats_prediction_studies. With the returned list, I would like to plot histograms of them respectively.

Certainly! I'll modify the overheats_prediction_studies function to collect error metrics for all locations and overheat dates, and then create a new function to plot histograms of these metrics. Here's the updated code:

def overheats_prediction_studies(dict_location_df: Dict[str, pd.DataFrame], 
                          overheat: float, window_size: int, 
                          prediction_horizon_list: List[int], 
                          scatter: bool = False) -> Tuple[Dict[int, ErrorMetrics], Dict[int, List[ErrorMetrics]]]:
    """
    Apply the linear model to predict overheat events for multiple locations and dates.
    Returns overall error metrics and individual error metrics for each prediction.
    """
    dict_location_df = {
        location: dft.convert_column_to_float(df, 'Value')
        for location, df in dict_location_df.items() if df is not None
    }

    overheat_location_date_pairs = [
        (location, date)
        for location, df in dict_location_df.items()
        for date in pd.to_datetime(df[df['Value'] >= overheat]['Time'].dt.date.unique())
    ]

    process_func = partial(process_location_date, 
                           window_size=window_size, 
                           overheat=overheat, 
                           prediction_horizon_list=prediction_horizon_list, 
                           scatter=scatter)
    overall_error_metrics = {}
    individual_error_metrics = {horizon: [] for horizon in prediction_horizon_list}

    for location, date in overheat_location_date_pairs:
        result = process_func(location, dict_location_df[location], date)
        for horizon, metrics in result.items():
            overall_error_metrics[horizon] = [
                overall_error_metrics.get(horizon, [0, 0, 0, 0, 0, 0])[i] + metrics[i]
                for i in range(6)
            ]
            individual_error_metrics[horizon].append(ErrorMetrics(
                rmse=np.sqrt(metrics[4] / (sum(metrics[:4]))),
                max_error=metrics[5],
                recall=metrics[0] / (metrics[0] + metrics[2]) if (metrics[0] + metrics[2]) > 0 else 0,
                precision=metrics[0] / (metrics[0] + metrics[1]) if (metrics[0] + metrics[1]) > 0 else 0
            ))

    # Compute the final overall error metrics
    final_error_metrics = {}
    for horizon, metrics in overall_error_metrics.items():
        total_true_pos, total_false_pos, total_false_neg, total_true_neg, total_residure_squared, max_error = metrics
        total_predictions = total_true_pos + total_false_pos + total_false_neg + total_true_neg
        
        overall_precision = total_true_pos / (total_true_pos + total_false_pos) if (total_true_pos + total_false_pos) > 0 else 0
        overall_recall = total_true_pos / (total_true_pos + total_false_neg) if (total_true_pos + total_false_neg) > 0 else 0
        overall_rmse = np.sqrt(total_residure_squared / total_predictions) if total_predictions > 0 else 0
        
        final_error_metrics[horizon] = ErrorMetrics(
            rmse=overall_rmse,
            max_error=max_error,
            recall=overall_recall,
            precision=overall_precision
        )
    
    return final_error_metrics, individual_error_metrics
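
The histogram half of the request isn’t shown above, so here is a minimal sketch of what it could look like, assuming matplotlib and the individual_error_metrics dictionary returned by the modified function (a dict mapping each prediction horizon to a list of ErrorMetrics). This is my own illustration, not code from the thread:

import matplotlib.pyplot as plt


def plot_error_metric_histograms(individual_error_metrics, bins: int = 20) -> None:
    """Plot histograms of per-(location, date) error metrics for each prediction horizon."""
    metric_names = ['rmse', 'max_error', 'recall', 'precision']
    for horizon, metrics_list in individual_error_metrics.items():
        fig, axes = plt.subplots(1, len(metric_names), figsize=(16, 4))
        for ax, name in zip(axes, metric_names):
            values = [getattr(m, name) for m in metrics_list]
            ax.hist(values, bins=bins)
            ax.set_title(f"{name} (horizon {horizon}s)")
            ax.set_xlabel(name)
            ax.set_ylabel('count')
        fig.tight_layout()
        plt.show()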

My further feedback and Cursor’s response:

Given that the error_metrics for an individual location and date have already been calculated by process_location_date, would it be simpler to have process_location_date return the error_metrics as well, rather than recalculating the individual error_metrics in overheats_prediction_studies?
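
For readers following along, a minimal sketch of what that suggestion could amount to (my own helper with an assumed ErrorMetrics shape, not code from the thread): process_location_date already computes additional_metrics per horizon, so it could build the per-call ErrorMetrics itself and return them alongside the running totals, leaving overheats_prediction_studies to collect rather than recompute.

from typing import NamedTuple

import numpy as np


class ErrorMetrics(NamedTuple):
    # Assumed shape, matching the fields used elsewhere in this thread
    rmse: float
    max_error: float
    recall: float
    precision: float


def metrics_from_counts(true_pos: int, false_pos: int, false_neg: int, true_neg: int,
                        sum_residual_squared: float, max_error: float) -> ErrorMetrics:
    """Turn raw confusion counts and squared residuals into an ErrorMetrics record.

    Called once per horizon inside process_location_date, so the caller only collects.
    """
    total = true_pos + false_pos + false_neg + true_neg
    return ErrorMetrics(
        rmse=float(np.sqrt(sum_residual_squared / total)) if total > 0 else 0.0,
        max_error=max_error,
        recall=true_pos / (true_pos + false_neg) if (true_pos + false_neg) > 0 else 0.0,
        precision=true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else 0.0,
    )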

Because of the lack of reliability in coding, I find myself reviewing with extreme caution, working with a willing but not trustworthy partner, Cursor.
I’m not saving time, but maybe I’m saving some typing.

It seems that we need new paradigms and methodologies to take advantage of generative AI, to really benefit from it.

The big problem I’m having at the moment is that long context mode only generates up to 300-400 lines of code before I need to press the continue button.

Tests are the key. Not even tests, just acceptance criteria.

  1. Tell cursor to write tests for the given acceptance criteria
  2. Review the tests, which will be way easier than the business logic code
  3. Make cursor write the actual code, and loop: run tests → give output to composer → apply changes → repeat

We all hated BDD because building elaborate tools just to write tests in plain English sounded stupid, but I think it will be the paradigm for a tool like Cursor.
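
As a rough illustration of step 1 (hypothetical names: a discounts module with an apply_discount function, nothing from this thread), the acceptance-criteria tests you’d have Cursor write before any business logic might look like:

import pytest

from discounts import apply_discount  # hypothetical module under test


def test_ten_percent_discount_applied_to_orders_over_100():
    # Acceptance criterion: orders over 100 get a 10% discount.
    assert apply_discount(order_total=150.0) == pytest.approx(135.0)


def test_no_discount_for_orders_at_or_below_100():
    # Acceptance criterion: orders of 100 or less are unchanged.
    assert apply_discount(order_total=100.0) == pytest.approx(100.0)

Reviewing a handful of assertions like these is much cheaper than reviewing the implementation, and the run tests → feed output to composer → apply changes loop then has a concrete pass/fail signal.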

I don’t think it’s Cursor. From yubrshen:

process_func = partial(process_location_date, 
                           window_size=window_size, 
                           overheat=overheat, 
                           prediction_horizon_list=prediction_horizon_list, 
                           scatter=scatter)

What library (or libraries) are you using? I used to use ChatGPT, not Sonnet or Anthropic or anything. We would ask it to write code for simple things, and it would screw up much worse than usual because of the two large libraries we were using. The point being: see if there is a way to narrow or customize the scope of the chat prompt’s code output. If you can better describe the codebase to Cursor, it’ll yield better results.

Same issue. I have to be extra vigilant and check all code carefully to make sure that code outside of my focus is not touched, even when I state clearly not to make changes. I’m starting to just copy and paste, like using Copilot.

I went back to ChatGPT because yesterday, while enhancing a script, Cursor completely changed the file using information from another file that was being used for a completely different purpose.

It requires me to double-check all updates. I feel much more relaxed using the ChatGPT UI; maybe I’m just more used to it.