AI Rules and other best practices

Has anyone tried system prompts or instructions to raise the level of procedural structure and make the generated code more effective?

For example, in my project, built entirely by the agents, I now have huge web pages with lots of copy-pasted sections that could have been extracted into separate, reusable components.

Another thing I have noticed is that even when it creates a function, it doesn't always do it very smartly.

For example, I asked it to show the min and max dates from a list of objects in the UI. That is something you could easily do in one pass, but instead the AI agent created two functions, one for min and one for max. Each of these copy-pasted functions first builds a temporary list of objects from the original list and then iterates it to find the desired value, so it is doing a lot of extra work.
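
To make it concrete, here is a rough sketch of the single-pass version I had in mind (WorkEntry and its Date property are made-up names, not my actual model):

using System;
using System.Collections.Generic;

public sealed record WorkEntry(DateTime Date, string Description);

public static class DateRangeHelper
{
    // One pass over the original list, no temporary copies, both bounds at once.
    public static (DateTime Min, DateTime Max) GetDateRange(IEnumerable<WorkEntry> entries)
    {
        DateTime? min = null, max = null;

        foreach (var entry in entries)
        {
            if (min is null || entry.Date < min) min = entry.Date;
            if (max is null || entry.Date > max) max = entry.Date;
        }

        if (min is null || max is null)
            throw new InvalidOperationException("The list was empty.");

        return (min.Value, max.Value);
    }
}

Compared to two helpers that each build a temporary list and loop over it separately, this touches the original data only once.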

How to avoid this kind of waste?

The third thing is that it produces poor designs that lack error handling and are prone to multi-threading errors, but perhaps that is beyond simple-minded AIs to even think about, so I won't ask about that.

You should look into using frameworks, as they already come with an approach that helps avoid duplicate code.

Also instruct the agent to use SOLID & DRY. Both are approaches that avoid duplication and encourage separating components into logical blocks.

Which programming language are you using? Any framework?

1 Like

Frameworks won't help with the problems I described. Sure, I could and would like to use MudBlazor with the Blazor I'm so fond of, but in my experience its constantly changing API is a source of AI hallucinations. Plain Blazor goes much more smoothly, because the AI actually knows it.

But even with MudBlazor, and even if the AI knew it well, the problem would still be the same, like I said. It would still build huge pages with lots of copy-pasted code, and it would still write inefficient code for the simplest things, like calculating min and max dates.

But SOLID & DRY instructions, that is something I could try. Any idea how to actually do it?

I think that the programming language & framework you use may need to step up for better AI development.

For example, PHP has made great advances for AI-assisted coding, notably the Symfony and Laravel frameworks. You do not need to declare what goes where, or how it needs to be done. They include tooling that makes AI-assisted development in PHP and related languages very practical.

As for DRY & SOLID? Add this as a user rule:

Follow development best practices, SOLID & DRY

As you are the ‘architect’, the system design decisions rest with you. If you vibe code or do not use core guidelines/rules for your project, the AI will find its own way.

Wait, I think I actually have that already (written by ChatGPT). It obviously did not solve this, or maybe it is too long and complex to work as rules?


Coding AI Guidelines (Blazor/C# Focus)
Prefer procedural structure and DRY principles: modify and extend existing code rather than duplicating similar code for each feature.

Refactor common logic into helper methods or services; avoid repeating structures.

In Blazor components, avoid using shared component-level fields/properties inside async flows. Take required values into local variables (immutable snapshots) at the start of the method to prevent mid-operation changes.
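
For illustration, a minimal sketch of the snapshot pattern this rule describes (Customer, Order, OrdersPanel, and LoadOrdersAsync are placeholder names, not code from the actual project):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed record Customer(int Id, string Name);
public sealed record Order(int Id, DateTime Date);

public partial class OrdersPanel
{
    private Customer? SelectedCustomer { get; set; }
    private IReadOnlyList<Order> _orders = Array.Empty<Order>();

    private async Task RefreshOrdersAsync()
    {
        var customer = SelectedCustomer;   // immutable snapshot of shared component state
        if (customer is null)
            return;

        // Even if SelectedCustomer changes during this await, the rest of the
        // method keeps working with the one consistent value captured above.
        _orders = await LoadOrdersAsync(customer.Id);
    }

    private Task<IReadOnlyList<Order>> LoadOrdersAsync(int customerId) =>
        Task.FromResult<IReadOnlyList<Order>>(Array.Empty<Order>()); // stand-in for a real data call
}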

Perform scheduled or continuous work in the background (in-process), e.g., using PeriodicTimer, BackgroundService, or a dedicated thread — do not overload components with heavy processing.

Send background work results to components via events or simple event propagation. Avoid overly complex, repetitive subscription patterns.

Ensure thread safety (immutable data, local copies, locking only when necessary) and use CancellationToken in long-running operations.
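
For illustration, a rough sketch of how the background-work, event-propagation, and cancellation rules above could fit together (WorkItem, WorkResultService, and DataRefreshService are placeholder names, not code from the actual project):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed record WorkItem(DateTime Date, string Description);

// Components subscribe to this once and just render what they receive.
public sealed class WorkResultService
{
    public event Action<IReadOnlyList<WorkItem>>? ResultsUpdated;
    public void Publish(IReadOnlyList<WorkItem> results) => ResultsUpdated?.Invoke(results);
}

// Recurring work runs in-process, off the components, and honours cancellation.
public sealed class DataRefreshService : BackgroundService
{
    private readonly WorkResultService _results;

    public DataRefreshService(WorkResultService results) => _results = results;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromMinutes(1));
        try
        {
            while (await timer.WaitForNextTickAsync(stoppingToken))
            {
                var items = await LoadItemsAsync(stoppingToken);
                _results.Publish(items);
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown path: the host cancelled stoppingToken.
        }
    }

    private Task<IReadOnlyList<WorkItem>> LoadItemsAsync(CancellationToken ct) =>
        Task.FromResult<IReadOnlyList<WorkItem>>(Array.Empty<WorkItem>()); // stand-in for real work
}

In a real app these would presumably be registered in Program.cs, e.g. with AddSingleton<WorkResultService>() and AddHostedService<DataRefreshService>().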

In C#: always use var when possible; for financial/numeric accuracy, prefer decimal (perform necessary conversions).
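
A tiny illustration (arbitrary values) of why the decimal part of this rule matters:

double approx = 0.1 + 0.2;      // 0.30000000000000004 - binary floating point drifts
decimal exact = 0.1m + 0.2m;    // 0.3 - decimal keeps the value exact
var total = exact * 3m;         // var is fine here; the compiler infers decimal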

Environment: Windows/PowerShell examples only (no Linux unless explicitly requested).

In Blazor JS interop: avoid capital letters in JavaScript-side names.

The rules sound overall quite fine. My feedback would be:

  • Avoid too many negative statements like "avoid …". Negative statements may become associated with bad code, because that is how such phrasing tends to appear in the training data.
  • Back up positive statements with reasoning. The reasoning gives the AI a reason why something is actually important (not because we say so, but because it has consequences).

Perform scheduled or continuous work in the background (in-process), using PeriodicTimer, BackgroundService, or a dedicated thread as this prevents overloading components with heavy processing.

4 Likes

Good catch, thanks! I fed it to ChatGPT and it gave me new rules. Interested to see how it goes :smiley:
—————–
Coding AI Guidelines (Blazor/C# Focus)

  • Extend existing code and keep the flow procedural
    Do new features by modifying/extending what’s there instead of starting parallel copies, because a single, evolving path reduces maintenance overhead and shrinks the bug surface.

  • Extract shared logic into helpers/services (DRY)
    Centralizing cross-cutting logic creates one source of truth, which simplifies testing, speeds refactors, and keeps feature behavior consistent.

  • Keep components thin; put ongoing work in background services
    Run recurring/heavy work in-process (e.g., PeriodicTimer, BackgroundService) so the UI thread stays responsive and lifecycle concerns (start/stop) are handled cleanly.

  • Snapshot state in async methods
    Capture needed values into local variables at the start of an async method, because component state can legitimately change between awaits and snapshots make behavior deterministic.

  • Deliver results to components via simple app-level events
    Raise .NET events (or a minimal event aggregator) from the background layer so components subscribe once and just render, which keeps wiring simple and reduces repeated boilerplate.

  • Favor immutable data and local copies; synchronize only at boundaries
    Immutable inputs and per-call copies make concurrency reasoning straightforward, which lowers the need for locks and reduces race conditions.

  • Propagate CancellationToken through long-running operations
    Cooperative cancellation enables fast teardown on navigation/dispose, improving UX and freeing resources promptly.

  • Use var when the type is clear
    This lowers visual noise and keeps refactors safer (the compiler infers the right type even if the right-hand side changes).

  • Prefer decimal for exact numeric work
    decimal avoids binary floating-point rounding issues, which is crucial for money and other precision-sensitive calculations.

  • Use Windows/PowerShell in examples and tooling
    Matching the target environment ensures copy-paste works immediately and reduces context switching.

  • Use lower-case identifiers for Blazor JS interop
    Consistent lower-case naming on the JavaScript side and in interop strings prevents casing mismatches, leading to more reliable bindings.

1 Like

Rules. They are fundamentally critical to getting Cursor Agent and an LLM to build good quality code.

Further, rules shouldn’t just be a few lines, a little blurb here or there. Rules need to be comprehensive, nuanced, detailed. Because otherwise, you end up with LOOPHOLES all over the place. I started with simple rules, but very quickly my rules became like the below example, after a few cycles of having the Agent itself, check my rules for loopholes and close them.

This rule has been iterated on three times, I think. It was originally about 25 lines. It's been modified by the agent itself (mostly Sonnet and now Grok Code) each time I investigate it for loopholes (which is usually right after the LLM seems to find one and use it!)

With rules like this, I have just about totally corralled the agent with respect to the topic of each rule, in this case unit testing. It covers not just how to test, but exactly how to run CLI commands to perform testing, examples of good vs. bad usage, and MANDATORY requirements (these become critical when you really want the agent to DO EXACTLY what you require it to do!)

RULES. They are not a passing thought. They aren't simple or lightweight. They are the infrastructure that GOVERNS the agent and llm! Rules will generally come out of you running into the agent doing things you do not like, and over time, as you work on keeping the agent from doing more and more of those things, your rules WILL, and SHOULD, grow beyond a simple 3-line blurb. With rule files like these, I no longer have issues with unit testing, integration testing, design for things like data service layer, business logic layer, api service layer, api controller layer, support service types, etc. I am still working on code style and such, as those are often so extensive.

KEY FACT: The agent and llm, they don’t really know anything! Not about what YOU WANT. They know the fundamentals. They do not know any specifics. UNLESS YOU TELL THEM! Rules tell the agent and llm, WHAT YOU WANT. WHAT YOU EXPECT. WHAT YOU REQUIRE. WHAT YOUR EXPECTATIONS ARE.

Tell the agent! RULES!

KEY FACT: The agent and llm, do NOT have any actual intelligence! An LLM is a highly advanced knowledge base with human-like interactive capabilities. They possess no intelligence! BYOI: Bring Your Own Intelligence! Together the agent and llm can behave semi-intelligently, but again, they cannot really behave exactly correctly without YOUR INSTRUCTION. BYOI, and deliver the intelligence the LLM lacks.

RULES. They rule!

NOTE: The .mdc file below the next line is a SINGLE rule file!

=== unit-testing.mdc ===

---
description: Comprehensive unit testing guidelines covering test creation, execution, fixing, and best practices
globs: *.test.ts,*.test.js,*.spec.ts,*.spec.js,*.integration.spec.ts,*.integration.spec.js
alwaysApply: false
---

Comprehensive Unit Testing Guidelines

Activation Triggers (Apply Intelligently)

The assistant should automatically apply this rule when working with tests. Triggers (case-insensitive):

  • Keywords: test, spec, unit test, integration test, mock, stub, jest
  • Test files: *.spec.ts, *.test.ts, *.integration.spec.ts, *.timing.spec.ts
  • Phrases: run tests, fix tests, create tests, write tests, test coverage
  • Actions: testing, mocking, stubbing, test failure, test broken

Targeted Test Execution (MANDATORY)

CRITICAL: Use direct Jest commands for targeted testing. NEVER use npm test for specific file testing.

CRITICAL PARAMETER WARNING

:cross_mark: NEVER USE THESE INCORRECT PARAMETERS:

# WRONG - Will run full test suite (1300+ tests)
npx jest --testPathPatterns="pattern"  # ❌ Plural form is INVALID
npx jest --testNamePatterns="pattern"  # ❌ Plural form is INVALID
npx jest --testRegexs="pattern"        # ❌ Plural form is INVALID

:white_check_mark: ALWAYS USE THESE CORRECT PARAMETERS:

# CORRECT - Will run targeted tests
npx jest --testPathPattern="pattern"   # ✅ Singular form is CORRECT
npx jest --testNamePattern="pattern"   # ✅ Singular form is CORRECT  
npx jest --testRegex="pattern"         # ✅ Singular form is CORRECT

VALIDATION CHECK: If your Jest command runs more than ~50 tests when you expect to run 1-5 tests, you are using INCORRECT parameters.

Parameter Validation Methods

Before executing any Jest command, VERIFY:

  1. Check parameter spelling:

    # Quick validation - list tests that would run (doesn't execute tests)
    npx jest --listTests --testPathPattern="asset-data.service"
    
  2. Expected test count verification:

    • Single file: Should show 1 file
    • Module directory: Should show 2-10 files typically
    • If you see 50+ files, you have the wrong parameter
  3. Use shell aliases to prevent mistakes:

    # Load safe aliases (prevents typos)
    source .jest-aliases.sh
    
    # Use safe aliases instead of typing parameters
    jest-pattern="asset-data"           # Safe: uses --testPathPattern
    jest-unit                          # Safe: correct unit test pattern
    

Why Direct Jest Commands

The npm test script often has pre-configured filters that conflict with additional patterns:

"test": "jest --testPathIgnorePatterns='.*\\.timing\\.spec\\.ts$' --testPathIgnorePatterns='.*\\.integration\\.spec\\.ts$'"

Direct Jest commands provide precise control and prevent running thousands of tests unnecessarily.

Targeted Execution Commands

Single File Testing:

# Test specific file
npx jest src/domains/entity/entity-data.service.spec.ts

# Test with verbose output
npx jest src/domains/entity/entity-data.service.spec.ts --verbose

# Test with coverage for specific file
npx jest src/domains/entity/entity-data.service.spec.ts --coverage --collectCoverageFrom="src/domains/entity/entity-data.service.ts"

Multiple File Testing:

# Test multiple specific files
npx jest src/domains/entity/entity-data.service.spec.ts src/domains/entity/entity-orchestration.service.spec.ts

# Test entire directory
npx jest src/domains/entity/

# Test by pattern
npx jest --testPathPattern="assets.*\.spec\.ts$"

Test Suite Targeting:

# Test specific describe block
npx jest --testNamePattern="EntityDataService"

# Test specific functionality across files
npx jest --testNamePattern="soft deletion"

# Combine file and test name patterns
npx jest src/domains/assets/ --testNamePattern="createOrUpdate"

Test Type Filtering:

# Unit tests only (exclude integration/timing)
npx jest --testPathIgnorePatterns=".*\.integration\.spec\.ts$" --testPathIgnorePatterns=".*\.timing\.spec\.ts$"

# Integration tests only
npx jest --testPathPattern=".*\.integration\.spec\.ts$"

# Timing tests only
npx jest --testPathPattern=".*\.timing\.spec\.ts$"

When to Use Full Test Suite

ONLY use npm test in these scenarios:

  • Final validation before completing work
  • Explicitly requested by user
  • CI/CD pipeline execution
  • Regression testing after major changes

NEVER use npm test for:

  • Iterative development testing
  • Single file validation
  • Debugging specific test failures
  • Performance-sensitive testing workflows

Test Creation Best Practices

Test Structure and Organization

File Naming Conventions:

# Unit tests
component.spec.ts
service.spec.ts

# Integration tests
component-api.integration.spec.ts
service.integration.spec.ts

# Timing/Performance tests
component.timing.spec.ts

Test Structure Pattern:

describe("ComponentName", () => {
  let component: ComponentName;
  let mockDependency: jest.Mocked<DependencyType>;

  beforeEach(() => {
    // Setup for each test
  });

  afterEach(() => {
    // Cleanup after each test
    jest.clearAllMocks();
  });

  describe("methodName", () => {
    it("should handle normal case", () => {
      // Arrange
      // Act
      // Assert
    });

    it("should handle edge case", () => {
      // Test edge cases
    });

    it("should handle error case", () => {
      // Test error scenarios
    });
  });
});

Test Naming Conventions

Descriptive Test Names:

// ❌ Poor naming
it("should work", () => {});
it("test create", () => {});

// ✅ Good naming
it("should create user with valid email and password", () => {});
it("should throw ValidationError when email is invalid", () => {});
it("should return empty array when no results found", () => {});

Test Categories:

  • Happy Path: Normal successful operations
  • Edge Cases: Boundary conditions, empty inputs, limits
  • Error Cases: Invalid inputs, system failures, exceptions
  • Integration: Cross-component interactions

Mocking and Stubbing Guidelines

Mock External Dependencies:

// ❌ Don't test external systems
it("should call real database", async () => {
  const result = await realDatabaseService.findUser(id);
  expect(result).toBeDefined();
});

// ✅ Mock external dependencies
it("should handle database user lookup", async () => {
  const mockUser = { id: "123", name: "Test User" };
  mockDatabaseService.findUser.mockResolvedValue(mockUser);

  const result = await userService.getUser("123");

  expect(mockDatabaseService.findUser).toHaveBeenCalledWith("123");
  expect(result).toEqual(mockUser);
});

Sparse Object Pattern (Approved):

// ✅ Use sparse objects for testing - this is explicitly allowed
const mockUser = {
  id: "123",
  email: "[email protected]",
  // Only include properties needed for test
} as unknown as User;

Mock Creation Patterns:

// Service mocking
const mockUserService = {
  findById: jest.fn(),
  create: jest.fn(),
  update: jest.fn(),
  delete: jest.fn(),
} as jest.Mocked<Partial<UserService>>;

// Class mocking with jest
const mockPrismaService = {
  user: {
    findUnique: jest.fn(),
    create: jest.fn(),
    update: jest.fn(),
    delete: jest.fn(),
  },
  $transaction: jest.fn(),
} as jest.Mocked<PrismaService>;

Test Fixing Best Practices

Root Cause Analysis (MANDATORY)

Before making ANY changes:

  1. Read the full error message - understand what’s actually failing
  2. Identify the source - is it test code, production code, or environment?
  3. Understand the intent - what was the test supposed to verify?
  4. Check for side effects - are other tests affected?

Investigation Steps:

# Run failing test in isolation with verbose output
npx jest path/to/failing.spec.ts --verbose

# Run with debugging information
npx jest path/to/failing.spec.ts --no-cache --detectOpenHandles

# Check if it's environment-related
npx jest path/to/failing.spec.ts --runInBand

Common Test Failure Patterns

Async/Promise Issues:

// ❌ Missing await
it("should handle async operation", () => {
  service.asyncMethod(); // Missing await
  expect(result).toBeDefined(); // Will fail
});

// ✅ Proper async handling
it("should handle async operation", async () => {
  const result = await service.asyncMethod();
  expect(result).toBeDefined();
});

Mock State Pollution:

// ❌ Mocks not cleared between tests
describe("UserService", () => {
  it("should create user", () => {
    mockDatabase.create.mockResolvedValue(user);
    // Test runs, mock has call count = 1
  });

  it("should find user", () => {
    // This test might fail if it expects mock to be clean
    expect(mockDatabase.create).not.toHaveBeenCalled(); // Fails!
  });
});

// ✅ Proper mock cleanup
describe("UserService", () => {
  afterEach(() => {
    jest.clearAllMocks(); // Clean state between tests
  });

  // Tests now run independently
});

Type Issues:

// ❌ Don't use | null
interface TestUser {
  id: string;
  name: string | null; // Avoid this
}

// ✅ Use optional properties or | undefined
interface TestUser {
  id: string;
  name?: string; // Preferred
  email: string | undefined; // If absolutely necessary
}

Test Determinism Requirements

Tests MUST be deterministic:

  • Same input → Same output (always)
  • No dependency on external state
  • No reliance on system time, random values, or network
  • No shared mutable state between tests

Making Tests Deterministic:

// ❌ Non-deterministic
it("should create user with current timestamp", () => {
  const user = createUser(); // Uses Date.now() internally
  expect(user.createdAt).toBe(Date.now()); // Will fail due to timing
});

// ✅ Deterministic
it("should create user with specified timestamp", () => {
  const fixedDate = new Date("2023-01-01T00:00:00Z");
  jest.useFakeTimers().setSystemTime(fixedDate);

  const user = createUser();
  expect(user.createdAt).toBe(fixedDate.getTime());

  jest.useRealTimers();
});

Test Validation Process

Individual Test File Validation (MANDATORY)

After making changes to any test file:

  1. Run the specific test file:

    npx jest src/path/to/modified.spec.ts
    
  2. Verify build succeeds for the test:

    # Check TypeScript compilation
    npx tsc --noEmit src/path/to/modified.spec.ts
    
  3. Run with coverage to verify completeness:

    npx jest src/path/to/modified.spec.ts --coverage --collectCoverageFrom="src/path/to/production-file.ts"
    

Multi-File Validation

When changes affect multiple files:

  1. Run affected test files together:

    npx jest src/domains/users/ --coverage
    
  2. Check for test interaction issues:

    npx jest src/domains/users/ --runInBand --detectOpenHandles
    

Full Suite Validation (Final Step Only)

Only after ALL individual tests pass:

npm test

Performance and Efficiency

Fast Test Execution

Optimize test setup:

// ❌ Expensive setup in each test
beforeEach(() => {
  database = new TestDatabase(); // Heavy operation
  database.migrate();
});

// ✅ Reuse expensive setup
beforeAll(() => {
  database = new TestDatabase();
  database.migrate();
});

beforeEach(() => {
  database.clearData(); // Light operation
});

Avoid unnecessary async operations:

// ❌ Unnecessary async
it("should validate input", async () => {
  const result = validateEmail("[email protected]"); // Sync function
  expect(result).toBe(true);
});

// ✅ Keep it synchronous
it("should validate input", () => {
  const result = validateEmail("[email protected]");
  expect(result).toBe(true);
});

Resource Management

Clean up resources:

describe("FileService", () => {
  let tempFiles: string[] = [];

  afterEach(() => {
    // Clean up created files
    tempFiles.forEach((file) => fs.unlinkSync(file));
    tempFiles = [];
  });
});

Error Prevention and Recovery

Prohibited Actions

NEVER do these:

  • Comment out or delete failing tests without understanding why they fail
  • Skip test validation steps
  • Use any type to bypass TypeScript errors in tests
  • Create tests that depend on external systems (databases, APIs, file system)
  • Write tests that modify global state without cleanup

Recovery Procedures

If tests fail after changes:

  1. Isolate the failure - run individual test files
  2. Check recent changes - what code was modified?
  3. Verify mocks - are mocks properly configured?
  4. Check async handling - are promises properly awaited?
  5. Reset environment - clear caches, restart if needed

If build fails for test files:

  1. Check imports - are all imports correct?
  2. Verify types - do test objects match expected types?
  3. Check mock types - are mocks properly typed?

Integration with Development Workflow

During Feature Development

  1. Write tests first (TDD) or alongside implementation
  2. Run targeted tests frequently during development
  3. Use watch mode for active development:
    npx jest src/path/to/work-area/ --watch
    

Before Committing

  1. Run all affected tests:

    npx jest src/affected-areas/
    
  2. Verify no side effects:

    npm test  # Full suite only before commit
    

Code Review Preparation

  • Ensure all tests have clear, descriptive names
  • Verify good coverage of happy path, edge cases, and error cases
  • Check that mocks are realistic and properly configured
  • Confirm tests run quickly and deterministically

Best Practices Summary

Test Quality Checklist

  • Tests are deterministic and repeatable
  • External dependencies are mocked/stubbed
  • Test names clearly describe what is being tested
  • Tests cover happy path, edge cases, and error scenarios
  • Tests run quickly (< 100ms each for unit tests)
  • No test pollution between test cases
  • Proper async/await usage where needed
  • TypeScript types are properly maintained

Execution Checklist

  • Use npx jest for targeted testing
  • Individual test files validated before integration
  • Full test suite run only at completion
  • Build verification for modified test files
  • Coverage verification for critical paths

Maintenance Checklist

  • Root cause analysis before fixing failing tests
  • Obsolete tests removed (only when truly obsolete)
  • Test setup and teardown properly configured
  • Mock state cleaned between tests
  • Resource cleanup after tests complete

Integration with Other Rules

This rule works in conjunction with:

  • API v2 implementation testing requirements
  • Story implementation testing phases
  • Commit message standards for test changes
  • Code quality and architecture guidelines
  • Domain data services testing patterns

Remember: Good tests are fast, reliable, and maintainable. They should give you confidence in your code without slowing down your development process.

4 Likes

I confess I'm a little bit lazy about adding any permanent instructions. I suppose memory would be the perfect tool for my current workflow, where I just ask, it does the work, and then we refine further after that, but I tend to forget even that :laughing:

1 Like

Memories, I find, are not as effective as rules. There is no real info or insight into when they apply, how, why, etc. I stopped using memories, due to their ambiguity, and how they might conflict with actual rules.

I highly recommend rules. You mention you are lazy. However, you started a thread noting some significant issues with your codebase. I know people vibe code to keep it light. However, in the long run, without rules, it's never light, and the unmitigated disasters that can (and frequently DO) happen could be avoided.

FWIW…I have my rules locked up tight enough now that I am able to, from a single prompt and a user story in Linear, trigger the agent to do 30-50 minutes of work IN ONE GO. It isn't perfect, but I can crank out layers of code from a single prompt and a decent story. The rules make sure that the agent and llm know how to process and implement the requirements documented in the story. Not just "code this way" but "operate this way", where operate goes beyond code into procedures like creating new branches, how to run those commands at the terminal, checking the build, running test cases and fixing broken tests, and finally committing, once the work is all done!

While the results are not “perfect”, with tight, detailed rules, they have become quite good, rather consistent (for an LLM anyway), following patterns, practices, -ilities, etc.

If you want to be able to instruct the LLM and have it actually DO WHAT YOU NEED IT TO DO, then build up your rules. To be clear, that doesn’t mean writing them yourself! :smiley: Vibe code the rules, too! Tell the agent to generate a rule about X. Then, when you find that rule X is not always being followed right, instruct the agent to find loopholes in your rules for X, and to improve them. Rinse and repeat. You don’t have to write your rules. I rarely do (the occasional manual tweak here and there). Instruct the agent to write them. Over time, you’ll get better code. Over time, the agent will work more and more how YOU WANT it to work.

1 Like

Heh, I know I started it, but to be quite honest, most of the time I don't really care what code it is producing as long as it works. Only in the rare cases where I'm forced to go and look at the code do I feel bad about the way it was done :zany_face: Benefits of being a "vibe coder", I suppose.

The project I'm vibing right now started from my Excel file of work history, which I wanted to turn into a service with a database and a UI instead. It is now (according to VS metrics) 51206 lines long and growing each day. All of the lines were invented by the AI, and I'm fairly happy even though the program is still not perfect for actual use yet, because I keep inventing new use cases for it.

But anyway, once those newly refined rules from earlier start kicking in, I will report whether they were any help or not. It's difficult to know exactly, because I have neither the time nor the interest to read all the changes that have been made. It will probably take a few days.

IMO, quality code these days, is less about human readability, and more about model efficacy and product stability. 50k LOC is a lot, for a small project. :wink:

Something I have found, is that when the code follows some key design and architectural principles, the agent and LLM have an easier time working with it. The chaos of bad code, affects models just as much as humans (in fact, probably affects the model more!)

Code with good design, IMHO, helps the model be more effective, which in turn requires less churn, which reduces cost (fewer tokens burned getting things working), which in the long run should be good for the vibe coding experience: faster results and ability to actually realize the value of those results more frequently.

Again, you wouldn’t necessarily be writing the rules. :wink:

2 Likes

In addition to not designing anything before starting the project, I also started it with gpt5-high, and then, after getting tired of the constant "oh, should I also do this and that" questions instead of it just finishing the tasks, I switched to the Claude models, mostly Sonnet. That adds even more to the mess, because those models have their own styles (especially without rules), and they don't always understand another model's intent as well as their own.

Any idea if GPT-5 would keep working for longer with some simple rule?

I find that GPT-5 is super fussy. I was frustrated using it as well. It is not as effective, IMO, as Claude 4 Sonnet. You also have to be much more careful about HOW you prompt, what kind of human language you use in your prompts, with GPT-5, as that model seems to be sensitive to those factors. Claude Sonnet, on the other hand, seems to be able to handle variations in natural human language much better.

GPT-5 is also monstrously "thinking" heavy. It spends an EXORBITANT amount of time thinking, largely to no benefit. So I stopped using it for most things, and went back to Claude 4 Sonnet. The latter is just a more natural tool, and seems to handle natural prompts (which are much more common with actual vibe coding) better.

You are correct that the different models have different styles, opinions, etc.! It is a real pain in the ■■■■! I am actually very much a multi-model user, and I use three primarily now, with usage of others here and there: Grok Code, Claude 4 Sonnet, GPT-5. In that order.

Now, Grok Code! This is FREE right now, and still for a few more days. This model KICKS ■■■. It's blazing fast, and its thought cycles are super fast (GPT-5: at least 30 seconds on average, as much as 90 or so all too often; Grok Code? 1-2 seconds the vast majority of the time!). Grok Code has a 2 MILLION tpm output rate, vs. Sonnet, which I think is around 1/10th of that or so (200-300k tpm). So you can get a lot more done, much quicker, with Grok Code. It is apparently competitively priced, so even once the free period is over, it sounds like it will be competitive with at least Sonnet, maybe cheaper. I say cheaper because I find that Grok Code is even better than Sonnet at natural language handling, and its understanding of programming problems seems much deeper than either GPT-5's or Sonnet's.

SO, I highly recommend you use Grok Code! It will likely save you both time and money in the end.

Now, rules. IMO, I think the reason I CAN float between models so easily, is because I do now have a decent set of rules set up, that clearly explain my goals to every model. They do still have their “personal” quirks, preferences, individual nuances. But all of them, are generally much better at doing WHAT I want, than doing what they want, now that I DO have rules wrapped around everything.

My rules are not perfect. They aren’t even excellent. They are good. They keep evolving. I am regularly asking the agent to evaluate Rule X or Y for loopholes regarding some problem, and having it plug the loopholes. As I do that, the rules get better, they control everything more explicitly, and each model does things more consistently according to MY PERSONAL WAY.

A LOT of my rules revolve around terminal command usage. That was really chaotic at first. Now, a lot of the terminal commands that the agent+llm want to run on a regular basis are pretty consistent across models. I've found ways for the agent to use certain commands that work optimally with the agent and llm (e.g. in my unit testing rules I have it run npx jest directly and clarify what the parameter names are, because this is one of those areas that is usually all over the map). Every LLM now uses jest and npm test correctly 95-97% of the time (there is always that edge case, though. ;P). I have the same sort of rules set up around other common problems. The agent is now setting up "apply intelligently trigger" sections, to help guide when a rule is actually factored in and used.

Over time, things will just get more consistent, more reliable, as my rules become more refined. And refinement is a continual, ongoing process. Little by little, you tighten things up, and make Cursor, its agent and the llm work FOR YOU, the way YOU WANT. It really helps. IMO, it is the key that a lot of vibe coding is missing, which is why I'm sharing. :wink:

So: Grok Code + Rules. I think it will change a lot for you.

1 Like

Ah, a good example just cropped up. I’m planning a new story for an epic on a rather large project (refactoring a critical table of a database!) I had grok-code-fast-1 :brain: read the stories that existed, and evaluate the requirements in all of them as a whole, against something else I hadn’t accounted for previously: updating our seed scripts for dev/test environments to accommodate the schema changes.

Grok gave me a great, detailed, clear and explicit plan for how to update the seed scripts. However when I told it to create a story in Linear for it, it heavily summarized all the detail, dropped all code examples and templates, all the nuances and explicit information that is ESSENTIAL to having the agent actually implement something like this, implement it well, implement it in one shot, and do it right (or 97% right) the first time through.

So I had the agent (and Grok) examine my linear-mcp.mdc rules, look for loopholes that would allow such summarization. It updated the rule with this section:

Plan Preservation & Story Content Guidelines

CRITICAL: NEVER Summarize Plans

MANDATORY RULE: When creating Linear stories from detailed plans developed with the agent, you must include the FULL PLAN as-is. NEVER summarize, condense, or abbreviate the plan content unless explicitly instructed to do so by the user.

WHY THIS MATTERS:

  • Plans ARE the stories; stories ARE the plans

  • Implementation teams use stories to drive development

  • Summarized plans lose critical technical details

  • Full plans enable proper estimation and execution

REQUIRED BEHAVIOR:

  • Copy the complete plan verbatim into the story description

  • Preserve all technical details, code examples, and specifications

  • Maintain original formatting and structure

  • Include all phases, timelines, and success criteria

  • Only summarize if the user explicitly requests: “summarize this plan” or “create a summary story”

ENFORCEMENT:

  • This rule takes precedence over any other content guidelines

  • Violation requires immediate correction and re-creation of the story

  • Default behavior: FULL DETAIL unless explicitly instructed otherwise

Hey, speaking of database tables, how does your agent communicate with them? I managed to get the mssql-python MCP working after huge struggles, and it was working just fine. But after 1.5.x I have noticed that many times, when it wants to know what data a table has, it says the tool did not return any actual results.

I also wanted to do a somewhat more advanced migration from one structure to another, and while the agent was doing a fine job itself, it decided to migrate only a few records as a sample. I was not able to force it to migrate everything, no matter how much I wanted it to. I'm not sure whether that is a problem with the MCP (not being able to read what data exists), or maybe some optimization implemented in Cursor. The latter seems like the better explanation anyway.

Anyway, for context, I should maybe mention that I did fall back to giving it the old data as XML instead of having the MCP read it. The agent was supposed to delete the old data from the database, but even that failed. Eventually I did it with ChatGPT-generated SQL scripts that read directly from the XML, and while that is a good idea in itself, the problem was that I wanted the LLM to recognize some things from the data (written by humans on invoice lines) that cannot be automated with a mere SQL script.

So, I don’t let my agents communicate directly with my DBs. I’ve got three decades of experience in the industry, and, I am just not at a point where I can allow any current generation LLM direct access to my DB. :wink:

So I manage the DB in other ways. DETERMINISTIC PROCESSES. This is a key difference between LLMs and other software tools: an LLM is probabilistic in nature, and thus highly NON-DETERMINISTIC. You just don't know what they might do! Even with rules, they can choose to ignore them (that 3-5% edge case I mentioned earlier.) So, when it comes to CRITICAL INFRASTRUCTURE, I use deterministic tools.

Another reason I probably wouldn't allow the agent/llm to manage…my db, or any other critical data store, is the instability of Cursor itself. It's got a lot of issues, bugs, stability problems, etc. I just can't give it that kind of control yet. So I haven't run into issues like you mentioned, as I just don't do that. Maybe some day…IMHO, this tech just isn't here yet. I've also read about vibe coders who have nuked their prod dbs, or entire prod environments, so I know that these risks are REAL, and not that uncommon (sadly.)

Regarding getting the LLM to do record migrations: I DO think that rules would help. So here is my recommendation for you: before you try to do that again, talk with the agent, explain your goals, and probe it to see if it actually understands what you want it to do. Then ask it HOW it might go about doing it. One thing to keep in mind with data…data is often much too large for the agent to handle. Further, you probably don't even WANT the agent to be handling all your data anyway…it'll burn up tokens FAST, cost you a LOT more money, etc. Instead, you probably want to approach it this way, which will probably be more effective, not to mention safer:

  1. Before allowing the agent to do anything, instruct it to create a backup. This is CRITICAL. MAN-DA-TO-RY! :wink:
  2. Have the agent sample the data you need it to work with. A few rows.
  3. Explain what you need it to do. In detail, but, don’t get too deep into the weeds. Sufficient detail. Explain things you need done, things you WILL NOT ALLOW, things you ALWAYS EXPECT.
  4. Ask the agent to create a rule that will codify how you want it to work with that kind of data.
  5. Once the rule is in place, ask the agent to generate a script to perform the necessary operations you require. Maybe it's just a .SQL script, or maybe it's in your preferred programming language. Have it generate a script.
  6. YOU run the script!

Note…the script? It's DETERMINISTIC! Once written, it'll do exactly the same thing, over and over, EVERY single time you run it. Very different from an LLM!
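
To make this concrete, here is a hedged sketch of the kind of re-runnable script I mean, assuming the Microsoft.Data.SqlClient package; the WorkHistoryOld/WorkHistory table names, their columns, and the MIGRATION_DB environment variable are all hypothetical:

using System;
using Microsoft.Data.SqlClient;

// Illustrative only: table, column, and environment-variable names are hypothetical.
var connectionString = Environment.GetEnvironmentVariable("MIGRATION_DB")
                       ?? throw new InvalidOperationException("MIGRATION_DB is not set.");

await using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();

// One transaction: either every row moves, or none do.
await using var transaction = connection.BeginTransaction();

var copy = new SqlCommand(@"
    INSERT INTO dbo.WorkHistory (CustomerId, WorkDate, Description)
    SELECT s.CustomerId, s.WorkDate, s.Description
    FROM dbo.WorkHistoryOld AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.WorkHistory AS t
                      WHERE t.CustomerId = s.CustomerId
                        AND t.WorkDate  = s.WorkDate);",
    connection, transaction);

var rows = await copy.ExecuteNonQueryAsync();
await transaction.CommitAsync();

Console.WriteLine($"Migrated {rows} rows.");

The WHERE NOT EXISTS guard is what makes it safe to run against the clone first and then, once you trust it, against the real table.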

Now, since you have a backup, if something goes ■■■■, you can recover. In fact, in some cases, after a backup, you might even have the agent generate a script to clone your table, and initially operate on the clone. This works fine for tables that don't have massive amounts of data. With thousands of records, even tens of thousands, create a clone of the table and test your operations on that first. When you know the script works (since LLMs are non-deterministic, it's usually a process to get things like that working right!), then, and only then, do you apply it to the real table.

This is a level of…know-how, knowledge, with a specificity, that is just beyond the BUILT-IN capabilities of anything. Cursor can't do this for you. The LLM just doesn't know your specific data out of the box. You have to provide the necessary CONTEXT, KNOWLEDGE, INTENT and RULES, so that Cursor and the LLM can perform the work you require effectively. Once you start doing that…bridging the gap between the agent's built-in capabilities and the LLM's generalized knowledge…then you can really start making them work for you.

Heh, yes, running rampant on a production database would be really, really bad :zany_face: One time, with just sqlcmd since the MCP was disabled for some reason, it outright wanted to DROP THE WHOLE DATABASE. That was definitely not something I wanted to do, especially because the database also has ASP Identity tables that have nothing to do with the business data tables. Thank god sqlcmd is not on my whitelist :sweat_smile:

But that said, I don't feel bad letting it update and read data from my development database, which originated from Excel and has not even seen production status yet. Even then I think it can read just fine, but it should prefer the EF Core migration files it is so eager to generate anyway. The reason for this is that sometimes I might make changes to the database myself using SQL Server Management Studio. I have noticed the AI is not very smart with relational database structures either, and planning them myself might save a lot of headache later. But then the database models need to be synchronized back to the C# classes somehow..

I definitely feel I should maybe write some rules for this, though. Maybe I will talk with ChatGPT about that tomorrow :thinking:

1 Like

I think rules will help. You don't have to start super advanced. Get something in place, and work on it over time. I think it will help your vibe coding be more effective and efficient, and you'll get more of what you want and less of what you don't. It's a journey.

1 Like

What would we think about this kind of rule? Could it work in practice?
—————————-
When unsure, propose options and ask for feedback.
Being “unsure” means situations where requirements are ambiguous, multiple valid implementations exist,
trade-offs cannot be decided without user preference, or external context is missing (e.g., file paths, naming, environment details).
In such cases, present 2–3 actionable options (each with brief pros/cons, required inputs, and expected impact),
state any assumptions, and recommend a default. Then ask one concise confirmation question.
This keeps progress moving while ensuring alignment and preventing wasted work.