Anyone here using cursor with a team? what's been the hardest part?

i’ve been using cursor solo for a while now and it works well, but i’m curious how it scales to teams. if i ever work with other devs again or take on a bigger project, how do people actually share setups?

a few things i wonder about:

  • do you check .cursorrules / .cursor/rules into version control? does that cause conflicts?
  • has anyone figured out a good way to onboard new devs onto a cursor-heavy workflow?
  • what breaks first when you go from “one dev using cursor” to “whole team using cursor”?

We absolutely do check coding rules into the repo.

We don’t use .cursorrules.

Instead, we use agents.md and have a directory called “devprompts” in the repo containing things that are “cursorrules-like”. The agents file points to that directory, listing each file in it and giving instructions on when each applies.
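To give a rough flavour (the file names here are illustrative, not our real ones):

```markdown
# AGENTS.md (sketch - file names are illustrative)

Detailed instructions live in the devprompts/ directory. Read the
relevant file before starting work in that area:

- devprompts/react-conventions.md - any change touching .tsx files
- devprompts/testing.md - writing or changing tests
- devprompts/db-migrations.md - anything altering the schema
```

The point is that the agents file stays short and acts as a routing table, while the detailed material lives in one file per concern.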

Most of the team now use Cursor, but there’s a fair bit of Jules for simpler stuff (we use N8N to automatically create jobs in Jules for tickets dragged into a specific Trello list). Stuff does get manually tested, and issues are normally fixed in Cursor rather than Jules.

And we have someone who uses Antigravity.

We are trialling Claude Code this month, which is annoying in that it doesn’t support the agents.md convention.

We use Cursor Pro, and next week we pay the annual renewals. There was a point late in 2025 when we seriously considered whether or not to renew, but overall it got sufficiently better in December and January that we’ve decided to renew.

One of the team has two days left to go on the monthly token cycle and is at 98% of her token allowance. Most of us stay around 80%, typically by using Auto mode for everything: always plan first, then execute once we’ve done a detailed review of the plan. Premium tokens only really get used in the planning phase, when the Auto plan doesn’t feel redeemable, and even then we flip back to Auto for the coding phase.

We are on a team plan and use Cursor rules heavily. However, since the field is still evolving, it’s not perfect. The non-deterministic nature of AI models makes it like a wild horse: very unpredictable.

We tend to run out of allocated credits in Cursor, so we have a backup Google AI Pro plan for the team, which gives us Antigravity and the Gemini CLI.

Hi @nedcodes

My team and I use Cursor on a huge monorepo.

Agents, hooks, rules, and skills must be tracked in Git so you can iterate on them and easily distribute them to your team.

One caveat: depending on your Git strategy (for example, trunk-based development), your team can lag behind when you introduce new rules or update agents. We use trunk-based development, so the “time to master” for users is very short.

Humans are humans; they don’t change. You need to have your company’s agent “harness” ready, and your AI feedback loop must be solid. The AI should be able to lint, run tests, and perform changes on your codebase without encountering issues like port conflicts or missing dependencies. A localhost-first setup is a strong advantage.

You need a very clean repository where everything can be started with a single command, and where code generation parts are properly handled in each app’s startup process. Once this works, the magic happens, and adoption becomes much easier.
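As a sketch of what “single command” can mean in practice (target and script names here are assumptions, not a real setup):

```makefile
# Sketch of a single-command dev entrypoint (names are illustrative)
.PHONY: dev codegen test

dev: codegen            # start everything locally with one command
	docker compose up -d db
	npm run start:all

codegen:                # regenerate clients before startup, so agents never hand-edit generated code
	npm run generate:api-client

test: codegen           # the loop the AI iterates in until green
	npm run test:unit && npm run test:e2e
```

The design point is that the agent and the human use the exact same entrypoints, so the feedback loop the AI runs in is identical to the one you debug.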

From a team perspective, things break quickly if your contribution workflow is not AI-friendly. If the AI cannot iterate on the project and run tests without human intervention, productivity drops. For example, frontend applications need proper end-to-end testing (like Cypress). You need strong test coverage: unit, integration, and E2E tests.

Progressively, you’ll realize that the review process on GitHub or GitLab becomes the bottleneck. You may lack reviewers, depending on your team’s experience level. Some team members who were strong at “producing” code will need to become strong reviewers and that transition is not easy for everyone.

In the long term, the review process becomes the main constraint, but it must not be bypassed. AI-generated code needs to be reviewed. You can add AI review tools like CodeRabbit, Cursor BugBot, etc., to help catch technical issues, but you will still need human review if you don’t want to put your company at risk.

didn’t expect this much detail, appreciate it.

@yourpropertyexpert the agents.md + devprompts setup is interesting. do you find the AI actually follows those prompts consistently, or do you still end up catching stuff in review that it should’ve known from the rules?

@Moumouls you mentioned review is the bottleneck. what are the most common things you catch that the AI got wrong? like is it ignoring specific rules, or more subtle stuff like using the wrong patterns for your codebase?

@valentinoPereira the “wild horse” thing resonates. when it doesn’t follow rules, is it random or are there specific types of rules it tends to ignore?

@nedcodes

Some of it is AI slop, but you can progressively prevent this with the right skills and rules. AI tends to duplicate code, sometimes ignores existing utilities, applies overly defensive patterns, or creates suboptimal solutions. However, AI is still subject to the same issues as humans: wrong approaches, flawed business logic, security breaches, or dangerous operations.

Like humans, AI will fail when given misleading specifications, missing context, or incomplete requirements.

The real bottleneck is that in today’s world it’s easier and faster than ever to produce code, but there is no magic on the review side. Some companies may ship unreviewed code, but the mid- and long-term consequences can be severe.

Currently, in my company, we use Cursor IDE. We start with Plan Mode to validate the approach, then let the AI implement the code. The AI iterates within our loop (mostly running tests until everything passes). Then:

  1. First review by the developer in the IDE to reduce slop and double-check the final implementation.

  2. Second automated review once the code is pushed, using tools like CodeRabbit.

  3. Once CI is green and all CodeRabbit comments are resolved, a final review is done by another human developer.

You can see two important things here:

  • The “producer” developer is accountable for the AI-generated code because they technically approve every line change inside the IDE during the local review process.

  • The human reviewer is involved only at the final stage of the PR, to avoid wasting time too early and to use this valuable resource at the right moment, especially since there is now a large volume of code to review.

Using this technique, we produce reliable, non-sloppy, industrial-grade code that is ready for critical production environments.

Last note: we never approve or use AI-generated code that we do not understand, even if the AI’s approach seems smarter than ours.

the review bottleneck point is the thing i wouldn’t have expected. everyone talks about code quality from AI being the problem but you’re saying the actual constraint is that there aren’t enough humans to review it all. which makes sense, if the team is producing 3x more PRs the review pipeline doesn’t magically scale with it.

i use cursor solo so i don’t hit this but i’ve been wondering about the rules distribution side. trunk-based makes it easy since everyone pulls from main, but what happens when someone pushes a rule change that breaks another person’s workflow? like if i update a testing rule and now the agent behaves differently for the whole team mid-sprint.

The thing is, rules are followed well by certain models, especially Claude’s and the latest GPT ones. We had many developers using Auto mode, or Composer-1, GPT-5, Gemini 2.5, Grok 4, and a botched-up Claude Sonnet 4 - these always had trouble following rules.

E.g., we had convention rules for writing TSX files for React and for RSpec files, but these models always overlooked them. It’s only the somewhat more expensive thinking variants that actually followed them.

So it was a bit confusing what to do and what not to do, because Cursor was clearly a black box and we did not know what was going on. Even if you look through the forums, there are a couple of threads where the Cursor team themselves suggest a particular model that works well, but who is going to keep track of every forum thread? It’s tedious.

+1

We are facing the review bottleneck ourselves. Developers are creating PRs more frequently and people are not able to review them. We set a team rule to dedicate at least 10 minutes per day to code review, because each of us has a tight schedule - but that’s also not enough.

do you find the AI actually follows those prompts consistently, or do you still end up catching stuff in review that it should’ve known from the rules?

It’s pretty good most of the time.

But it seems to “wobble” early afternoon UK time.

My hypothesis, which I have no way of testing, is that “something” (model choice, context size) is being dynamically downgraded, either as a result of our recent usage, or load on the Cursor Mothership. [Early afternoon UK time being “when the Americans get into work.”]

I ought to go into a bit more detail about the standard workflow for a ticket:

  1. Get Cursor to plan in “Auto”.
  2. Review the plan in detail. We actually spend most of our human time here.
  3. Build [see below]
  4. Review and make tweaks if needed
  5. Tell it to update the agents.md / update any devprompts it used with information it learned as a result of those tweaks

Some more detail on the Build stage (since I can’t see how to do multi-level numbered lists)

  • If the plan is golden, just go to build
  • If the plan feels redeemable with a few followup prompts, do so…
  • If the plan feels completely unworkable, either start a new agent with a revised prompt, or the same prompt but picking a high-end model … once you’re happy with THAT plan, switch back to “auto” and build.

But every 3-4 weeks, we run a detailed job that is just “review the agents.md and all the devprompts to look for inconsistencies, duplication, or ways to improve them.”
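Part of that job can even be mechanical. A rough sketch (file and directory names are assumptions based on the layout I described) that flags devprompts the agents file never mentions:

```python
# Sketch: flag devprompts/*.md files that AGENTS.md never references.
# "AGENTS.md" and "devprompts/" are assumed names - adjust to your repo.
from pathlib import Path


def find_unlisted_prompts(agents_md: str, devprompts_dir: Path) -> list[str]:
    """Return devprompt filenames not mentioned anywhere in the agents file."""
    return sorted(
        p.name
        for p in devprompts_dir.glob("*.md")
        if p.name not in agents_md
    )
```

It’s a naive substring check (an indirect or partial mention would fool it), so we’d treat its output as a prompt for the human review ticket rather than a CI gate.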

One final thought: In terms of “maximising the human time”, most of us will, quite often, have a second agent in “plan” mode working on a different ticket while the previous agent is in “build mode.” Just make sure you enforce your git discipline, don’t let two builds run in the same branch, and constantly remind your junior devs that they have to pull from main before each build :slight_smile:

@valentinoPereira the model inconsistency thing is frustrating. i’ve tested Auto and the opacity is the main problem. you can’t tell which model ran, so if one person’s getting Opus and another’s getting something weaker, the team has no way to know. that’s a coordination problem on top of a rules problem.

@yourpropertyexpert the 3-4 week agents.md review cycle is interesting. i hadn’t thought about rules drifting over time as the codebase changes. the parallel agents on different tickets makes sense for throughput but do you ever run into merge conflicts where two agents touched overlapping files?

Not really, but I’m already used to dealing with reasonable sized teams, and in this sense the “parallel agents” issue is just an extension of the “parallel human programmers” issue.

We already have reasonably good mechanisms in place to prioritise tickets that won’t clash in this way.

Agentic programming just means that we have to consider “more tickets per day”, and therefore puts a far higher priority on a fast review cycle.

So, we have a fairly comprehensive [there’s always room for more] CI setup that checks lots of things…

… and use GCR (Gemini Code Review) automatically on every PR.

And trunk-based development is a vital part of dealing with that.

makes sense that it maps to the same coordination problem you already solved with human teams. the GCR on every PR is smart. do you find Gemini catches different stuff than what a human reviewer would, or is it more of a first-pass filter so the human reviewers can focus on architecture decisions?

Both :slight_smile:

It’s more “picky” than I’d expect my human reviewers to be.

As a CTO I find that acceptable because it gives me the benefit of tighter review without bruising egos. That’s to say, the programmers whose code is being reviewed feel it’s being done more objectively and never a personal attack on them.

So stuff getting as far as a human reviewer tends to already be higher-quality than it was in the past.

But that means that the human reviewer does a first pass by considering everything that GCR has picked up on… and, increasingly, hardly ever finds anything else.

The effective outturn is that:

  • When the programmer submitting the code was one of my seniors, then we hardly bother to do human review once GCR has passed it.
  • That leaves the seniors more time both to write their own code, and review the code of our juniors.

The thing that gets much more detailed human review is the set of changes being made to the instructions for AIs.

Mentally, I’m a long way toward thinking of the prompts and those instructions as our source… and the code being written as akin to something compiled from that, or at least some intermediate pseudo-code.

if the prompts and instructions are effectively the source and the code is more like compiled output, then the review process for those instructions matters way more than reviewing individual PRs. which is kind of the opposite of how most teams think about it.

do you version control the AI instructions the same way you’d version control code? like full PR review, commit history, blame, the whole thing? or is it more informal than that?

i ask because the failure mode that worries me is someone tweaks a rule or prompt, it subtly changes behavior across the whole codebase, and nobody notices until something breaks weeks later. if the instructions are the real source of truth, they’d need the same rigor as any other critical config.
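i’d guess the mechanical half of the answer is something like a CODEOWNERS entry, so any change to the instruction files always routes to senior reviewers (paths and team name here are hypothetical):

```
# CODEOWNERS (sketch - paths and team name are hypothetical)
/AGENTS.md       @org/senior-devs
/devprompts/     @org/senior-devs
/.cursor/rules/  @org/senior-devs
```

that at least guarantees rule changes can’t merge without the same scrutiny as critical code.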