Hey Anatole! Thanks a lot. I had Cursor try to set up the environment for Kind again; it also tried K3s. But it's still failing unfortunately, though at a different point. It's really too bad, because the Cursor Cloud setup is now much better than GitHub workflows, with direct access and a GUI. But GitHub workflows come with Kind support (it's actually installed out of the box) and the Docker daemon is also a bit easier to use there. Anyway, here are its new investigations:
Kubernetes-in-Docker: Deep Technical Analysis
Investigation of the Cursor Cloud VM incompatibility with Kind/K3D/K3s
1. Environment Fingerprint
Kernel: 6.12.58+ (Firecracker micro-VM)
PID 1: /pod-daemon (custom Cursor init, not systemd)
Cmdline: systemd.unified_cgroup_hierarchy=1
Root FS: /dev/vda (virtio), overlay NOT supported as upperdir
cgroup: v2 unified (nsdelegate, memory_recursiveprot)
iptables: legacy mode (nftables modules absent)
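The fingerprint above can be collected with a short read-only probe (a sketch; the values quoted in this report came from the Cursor Cloud VM, other machines will differ):

```shell
#!/bin/sh
# Read-only environment fingerprint probe (sketch). Paths are the standard
# Linux ones; nothing here modifies the system.
echo "kernel:  $(uname -r)"
echo "pid1:    $(tr '\0' ' ' < /proc/1/cmdline 2>/dev/null || echo unknown)"
echo "cmdline: $(cat /proc/cmdline 2>/dev/null || echo unknown)"
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  echo "cgroup:  v2, controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
  echo "         delegated:       $(cat /sys/fs/cgroup/cgroup.subtree_control)"
else
  echo "cgroup:  v1 or not mounted"
fi
if command -v iptables >/dev/null 2>&1; then
  echo "iptables: $(iptables --version 2>/dev/null || echo unavailable)"
else
  echo "iptables: not installed"
fi
```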
2. Failure Cascade — Three Layers
Running Kubernetes in Docker hits three independent failure layers. Each must be resolved for pods to run.
Layer 1: cgroup subtree_control is frozen (blocks K3s agent, Kind)
Observation:
/sys/fs/cgroup/cgroup.controllers: cpuset cpu io memory hugetlb pids
/sys/fs/cgroup/cgroup.subtree_control: cpuset cpu pids
Six controllers exist, but only three are delegated. The memory, io, and hugetlb controllers are stuck.
Root cause: PID 1 (/pod-daemon) resides directly in the root cgroup. cgroup v2’s “no internal processes” rule prevents modifying subtree_control while any process lives directly in the cgroup. PID 1 cannot be moved (it’s the cgroup-namespace init):
# Reproducible:
echo 1 > /sys/fs/cgroup/child/cgroup.procs # → ENOTSUP
echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control # → ENOTSUP
Impact:
- K3s agent:
level=fatal msg="failed to find memory cgroup (v2)" — K3s checks cgroup.controllers of its container’s cgroup, finds only cpuset cpu pids, aborts.
- Kind: systemd inside node containers needs memory controller for process management.
Who sets this: The Firecracker host writes subtree_control for the VM’s cgroup before spawning /pod-daemon. The VM can never change it afterward.
Layer 2: Child cgroups are domain invalid (blocks runc container creation)
Observation:
mkdir /sys/fs/cgroup/test_child
cat /sys/fs/cgroup/test_child/cgroup.type # → domain invalid
echo $PID > /sys/fs/cgroup/test_child/cgroup.procs # → ENOTSUP
echo "+cpu" > /sys/fs/cgroup/test_child/cgroup.subtree_control # → ENOTSUP
New child cgroups start as domain invalid and are completely non-functional. Neither cgroup.procs (domain operation) nor cgroup.subtree_control works. The only way to make a child cgroup usable is to set it to threaded mode:
echo "threaded" > /sys/fs/cgroup/test_child/cgroup.type # succeeds
echo $PID > /sys/fs/cgroup/test_child/cgroup.threads # succeeds
This is why Docker’s cgroupfs driver creates all container cgroups as threaded — it’s the only functional path. But threaded cgroups cannot run Kubernetes components.
Impact on runc (container creation):
When K3s kubelet schedules a pod, runc tries to create a domain cgroup for the container:
runc create failed: unable to apply cgroup configuration:
cannot enter cgroupv2 "/sys/fs/cgroup/k8s.io" with domain controllers
-- it is in an invalid state: unknown
This is the final blocker — even when the K3s control plane and kubelet are running, no pod containers can actually be created.
Impact on Kind: systemd inside Kind node containers needs domain-type cgroups with cgroup.procs support. In a threaded cgroup, cgroup.procs returns ENOTSUP:
Failed to create /init.scope control group: Structure needs cleaning
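The Kind failure reduces to a tiny reproducer (a sketch; `repro_threaded` is a hypothetical scratch cgroup name, and the script needs root plus a writable cgroup v2 mount, reporting "skipped" otherwise):

```shell
#!/bin/sh
# Reproducer sketch for the threaded-cgroup behaviour systemd hits inside a
# Kind node. Needs root and a writable cgroup v2 mount; no-op elsewhere.
CG=/sys/fs/cgroup/repro_threaded   # hypothetical scratch cgroup
repro="skipped: need root and a writable cgroup v2 mount"
if [ "$(id -u)" = 0 ] && mkdir "$CG" 2>/dev/null; then
  echo threaded > "$CG/cgroup.type" 2>/dev/null || true
  # In a threaded cgroup, process-level attachment via cgroup.procs is the
  # operation that fails for systemd; thread attachment uses cgroup.threads.
  if echo $$ > "$CG/cgroup.procs" 2>/dev/null; then
    repro="cgroup.procs accepted a process (domain cgroups work here)"
  else
    repro="cgroup.procs write rejected (what systemd sees inside Kind)"
  fi
  rmdir "$CG" 2>/dev/null || true
fi
echo "$repro"
```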
Layer 3: Missing kernel modules (blocks CNI networking)
Observation:
sudo iptables -t nat -A OUTPUT -j ACCEPT -m comment --comment "test"
# → Warning: Extension comment revision 0 not supported, missing kernel module?
# → iptables: No chain/target/match by that name.
The xt_comment iptables module is missing from the kernel. This breaks every CNI plugin that uses iptables NAT rules with comments (bridge, flannel, calico all use this).
Additionally:
vxlan kernel module: absent (flannel VXLAN backend fails)
overlay filesystem: present in /proc/filesystems but the root filesystem doesn’t support it as upperdir (likely tmpfs or similar)
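These observations can be re-checked with a read-only probe (a sketch; `modprobe -n` does a dry-run resolution only and loads nothing):

```shell
#!/bin/sh
# Probe the networking/storage prerequisites named above (sketch).
if grep -qw overlay /proc/filesystems 2>/dev/null; then
  echo "overlay: listed in /proc/filesystems"
else
  echo "overlay: absent"
fi
for mod in xt_comment vxlan; do
  if grep -qw "$mod" /proc/modules 2>/dev/null; then
    echo "$mod: loaded"
  elif command -v modprobe >/dev/null 2>&1 && modprobe -n "$mod" 2>/dev/null; then
    echo "$mod: loadable"
  else
    echo "$mod: missing (or module loading disabled)"
  fi
done
```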
3. Approaches Tested
| # | Approach | Result | Failure Point |
|---|----------|--------|---------------|
| 1 | Kind v0.22.0 (default) | Failed | systemd can't create /init.scope (Layer 2) |
| 2 | Kind v0.27.0 (latest) | Failed | Same Layer 2 failure |
| 3 | K3D (K3s in Docker) | Failed | K3s fatal: no memory cgroup (Layer 1+2) |
| 4 | Mount cgroup v1 (mount -t cgroup -o memory) | Failed | dmesg: cgroup: Unknown subsys name 'memory' — v1 memory disabled at kernel level |
| 5 | Mount fresh cgroup v2 | Failed | Same hierarchy, same frozen subtree_control |
| 6 | unshare --cgroup (new namespace) | Failed | Same hierarchy visible, same limitations |
| 7 | Docker cgroupns=private | Failed | Container sees threaded cgroup, memory absent |
| 8 | Docker cgroupns=host | Failed | Container sees host root, still can't modify subtree_control |
| 9 | K3s in Docker with kubelet bypass flags | Failed | failed to evacuate root cgroup (threaded cgroup.procs ENOTSUP) |
| 10 | K3s direct on host, --disable-agent | Works | Control plane works: API server, scheduler, controller-manager all run |
| 11 | K3s direct on host, full (with agent) | Partial | Kubelet starts, node registers as Ready, but runc can't create pod containers (Layer 2) |
| 12 | Fake cgroup via tmpfs | Not viable | Would need to fake the entire cgroup v2 API surface |
Best result achieved:
$ sudo k3s server --snapshotter=native --flannel-backend=none \
--disable=traefik,servicelb,metrics-server \
--kubelet-arg=cgroups-per-qos=false \
--kubelet-arg=enforce-node-allocatable="" \
--write-kubeconfig-mode=644
$ kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
cursor   Ready    control-plane,master   33s   v1.31.5+k3s1
$ kubectl get pods -A
NAMESPACE     NAME          READY   STATUS              RESTARTS   AGE
kube-system   coredns-...   0/1     ContainerCreating   0          28s
Node is Ready. Pods are scheduled but runc can’t create containers because of the domain invalid cgroup issue (Layer 2).
4. What The Host Must Change
Change 1 (P0): Delegate all controllers in subtree_control
Before spawning /pod-daemon, the host must write:
echo "+cpuset +cpu +io +memory +hugetlb +pids" \
> /sys/fs/cgroup/<vm-cgroup>/cgroup.subtree_control
Why: This is the single configuration line that determines which controllers child cgroups can use. It must be set before any process is placed in the cgroup (before PID 1 starts), because once PID 1 exists in the root, the “no internal processes” rule prevents modification.
Implications:
- Memory accounting overhead: ~1-2% additional memory usage per cgroup (kernel tracks RSS, cache, swap per cgroup)
- IO accounting overhead: minimal, kernel already tracks block IO
- No security downside: these controllers only enable resource visibility and limits, they don’t grant capabilities
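A small check script makes the "one line" verifiable from inside the VM after the fix (a sketch; `missing_controllers` is a helper invented here, not part of any tool):

```shell
#!/bin/sh
# Verify Change 1 from inside the VM (sketch): the root subtree_control
# should list all six controllers once the host delegates them.
missing_controllers() {  # $1 = wanted list, $2 = present list
  out=""
  for c in $1; do
    case " $2 " in *" $c "*) ;; *) out="$out$c ";; esac
  done
  printf '%s' "$out"
}
want="cpuset cpu io memory hugetlb pids"
have=$(cat /sys/fs/cgroup/cgroup.subtree_control 2>/dev/null)
miss=$(missing_controllers "$want" "$have")
if [ -z "$miss" ]; then
  echo "delegation OK: $have"
else
  echo "still missing: $miss"
fi
```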
Change 2 (P0): Ensure domain-type child cgroups work
The current behavior where child cgroups are domain invalid and can’t accept processes needs investigation. This may be caused by:
- Insufficient controller delegation: if subtree_control doesn't delegate memory, the kernel may refuse domain operations on children. Fixing Change 1 may automatically fix this.
- Root cgroup type domain threaded: this type allows both domain and threaded children, but in practice only threaded works. The host may need to either:
  - Create the VM's cgroup in pure domain mode (not domain threaded)
  - Or pre-create a child cgroup in domain mode and set it as the cgroup namespace root
Verification after fix:
mkdir /sys/fs/cgroup/test
cat /sys/fs/cgroup/test/cgroup.type # Should be: domain (or domain invalid → domain after accepting proc)
echo $$ > /sys/fs/cgroup/test/cgroup.procs # Should succeed
rmdir /sys/fs/cgroup/test
Change 3 (P1): Enable xt_comment and VXLAN kernel modules
The kernel needs these modules for Kubernetes networking:
xt_comment: required by every major CNI plugin for iptables rule management
vxlan: required by Flannel VXLAN backend (the most common K8s network)
Either:
- Compile these into the kernel (CONFIG_NETFILTER_XT_MATCH_COMMENT=y, CONFIG_VXLAN=y)
- Or load them as modules if the kernel supports module loading (currently nomodule is in the kernel cmdline)
Workaround if modules can’t be added: Use CNI plugins that don’t need iptables comments (e.g., ptp without ipMasq), or use host-gw networking instead of VXLAN. This is fragile.
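As a concrete sketch of that workaround (file names and the subnet are illustrative, not from the report): a minimal ptp CNI config with ipMasq disabled avoids the NAT rules that need xt_comment, at the cost of requiring routable pod addresses:

```shell
#!/bin/sh
# Write a minimal ptp CNI config with masquerading disabled (sketch).
# Demo path /tmp/cni-demo stands in for /etc/cni/net.d; the subnet is
# illustrative.
conf='{
  "cniVersion": "0.4.0",
  "name": "ptp-nomasq",
  "type": "ptp",
  "ipMasq": false,
  "ipam": { "type": "host-local", "subnet": "10.42.0.0/24" }
}'
mkdir -p /tmp/cni-demo
printf '%s\n' "$conf" > /tmp/cni-demo/10-ptp.conf
echo "wrote /tmp/cni-demo/10-ptp.conf"
```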
Change 4 (P1): Enable overlay2 filesystem support
dmesg: overlay: filesystem on /upper not supported as upperdir
The root filesystem doesn’t support overlay mounts. This forces Docker and containerd to use fuse-overlayfs (userspace, slow) or native (copies all layers, very slow and disk-heavy).
Fix: Use ext4 or xfs for the VM’s root filesystem with d_type support enabled. The kernel already has CONFIG_OVERLAY_FS compiled in.
Change 5 (P2): Consider systemd as PID 1
With systemd as PID 1:
- Docker can use the systemd cgroup driver (cleaner cgroup management)
- cgroup delegation works via Delegate=yes in systemd unit files
- The domain threaded vs threaded issue may not arise
This is a significant architectural change and may not be desired. It’s not strictly necessary if Changes 1-2 are implemented.
5. What Would Work TODAY (Without Host Changes)
K3s control plane only (no pods)
k3s server --disable-agent --write-kubeconfig-mode=644
This gives you a working Kubernetes API server for testing manifests, RBAC, CRDs, etc. No pods will actually run.
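For example, server-side dry-runs exercise validation and admission without a single node (a sketch assuming K3s's default kubeconfig path; it degrades to a message when kubectl is absent):

```shell
#!/bin/sh
# Exercise the control-plane-only API server (sketch). Assumes the default
# K3s kubeconfig path; prints a message instead when kubectl is missing.
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
if command -v kubectl >/dev/null 2>&1; then
  # Validate an object server-side without persisting or scheduling anything.
  kubectl create configmap demo --from-literal=key=value \
    --dry-run=server -o yaml || echo "API server not reachable"
else
  echo "kubectl not installed; nothing to exercise"
fi
```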
Docker containers directly (current approach)
Docker itself works because it uses threaded cgroups. All services that can run as Docker containers (which is everything in Big Boom Phase 1) work perfectly.
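This claim is easy to cross-check from a shell (a sketch; it starts a throwaway alpine container and reads its cgroup.type via /proc, skipping cleanly when Docker is unavailable):

```shell
#!/bin/sh
# Read the cgroup.type of a live Docker container (sketch) to see which
# cgroup mode the runtime is using. Skips cleanly without Docker.
result="skipped: docker not available"
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  cid=$(docker run -d --rm alpine sleep 10)
  pid=$(docker inspect -f '{{.State.Pid}}' "$cid")
  # cgroup v2 entries in /proc/<pid>/cgroup look like "0::/path".
  cgrel=$(awk -F: '$1=="0"{print $3}' /proc/"$pid"/cgroup)
  result="cgroup.type=$(cat /sys/fs/cgroup"$cgrel"/cgroup.type 2>/dev/null)"
  docker rm -f "$cid" >/dev/null 2>&1 || true
fi
echo "$result"
```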
6. Summary
| Failure | Layer | Fix Owner | Fix Complexity | Impact |
|---------|-------|-----------|----------------|--------|
| memory controller not delegated | cgroup subtree_control | Host provisioning | Low (one-line change) | Unblocks K3s memory check |
| domain invalid cgroups unusable | cgroup hierarchy | Host provisioning | Low-Medium (may be automatic with the fix above) | Unblocks runc container creation |
| xt_comment module missing | Kernel config | Kernel build | Medium | Unblocks CNI networking |
| overlay2 unsupported | Filesystem | VM image | Medium | Enables fast container storage |
| vxlan module missing | Kernel config | Kernel build | Medium | Enables Flannel VXLAN |
| systemd not PID 1 | VM architecture | VM image | High | Cleaner cgroup management |
Minimal viable fix: Changes 1 + 3 (delegate all controllers + enable xt_comment module). This would likely enable K3s running directly on the host with basic networking.
Full fix: Changes 1 + 2 + 3 + 4 would enable both Kind and K3D with full functionality.