Cursor needs awareness of the UI 👀

This AG-UI seems to be a collaboration between all the major players; OpenAI is working on it, for starters.

The agent needs direct UI awareness with continuous feedback loops. Currently humans bridge the gap between agent and UI instead of directing the vision. More agent-UI interaction, less human mediation. Let’s go!

8 Likes

+1 Great idea mate!

1 Like

Yes! And the ability to test the GUI automatically in the background, although I imagine that’s much harder.

2 Likes

By “test”, do you mean run structured, repeatable interaction tests? Or test the code it’s currently trying to figure out?

@WokeBloke @ssdevilj I just converted this into a feature request, can you +1 it? Thx!! :raising_hands:

@curious_coder

Probably both, but I was thinking of the case where I tell it to implement a feature involving a GUI, it implements it, and then it tests its implementation in the background to make sure it’s working as specified.

1 Like

Yes, that is exactly what I’m thinking. Too much of my time is spent taking screenshots of the browser inspector tab and relaying inspection details to Cursor; this is something that can and should be automated. It took me 2 hours and about 30-50 prompts (waiting for slow mode o4-mini, and in between other chores) to line up a button with a text field yesterday… I finally told it to “strip away all css and start from scratch”, and that finally did it. It should not have taken so long.

1 Like

I believe the BrowserTools MCP tries to solve this:

It works, but is somewhat convoluted. You need 3 separate components running for it to work.

1 - the browser extension, 2 - the MCP tools, and 3 - the BrowserTools Server.

Once that’s up and running, though, development with Cursor being UI-aware is pretty seamless (except for the massive, annoying banner on ALL your browser windows informing you that BrowserTools is running, which, after performing three separate steps, you’re already acutely aware of!)

1 Like

I agree! What I suggested in a Cursor engineering call was at least two things:

  1. Cursor grows the ability to actually proxy itself between the UI and the backend, e.g. Cursor advertises the UI port on somehost:1234 whereas the “actual UI” is on somehost:5678, and all user interaction goes through Cursor, so now it literally knows what the UI is doing and how the backend is responding (see the sketch after this list).
  2. Cursor has a Chrome extension that allows it to access the console debugging log information (I’m not sure if this is actually possible, for security reasons) - if it were, then at least it could also debug a web UI that it created or helped create in full production mode.
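
To make the first idea concrete, here is a minimal sketch of an “agent-in-the-middle” proxy using Node’s built-in `http` module. The port numbers, the log format, and the idea that Cursor would read this log are all illustrative assumptions on my part, not anything Cursor actually does today:

```ts
// proxy.ts — hypothetical "agent-in-the-middle" proxy sketch.
// The human points their browser at :1234 while the real dev server
// listens on :5678; every request/response pair gets logged where an
// agent could read it.
import http from "node:http";

const AGENT_PORT = 1234;   // port the browser is pointed at (assumption)
const REAL_UI_PORT = 5678; // where the actual dev server runs (assumption)

const server = http.createServer((clientReq, clientRes) => {
  const proxyReq = http.request(
    {
      host: "localhost",
      port: REAL_UI_PORT,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (proxyRes) => {
      // The interesting part: the agent now "sees" every interaction
      // between the UI and the backend.
      console.log(
        `[agent-log] ${clientReq.method} ${clientReq.url} -> ${proxyRes.statusCode}`
      );
      clientRes.writeHead(proxyRes.statusCode ?? 500, proxyRes.headers);
      proxyRes.pipe(clientRes);
    }
  );
  clientReq.pipe(proxyReq);
});

server.listen(AGENT_PORT, () =>
  console.log(`UI proxied on :${AGENT_PORT}, real UI on :${REAL_UI_PORT}`)
);
```

The point is simply that everything the browser sends and everything the backend answers flows through one process the agent controls.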
1 Like

There are quite a few browser MCP servers that do this.

1 Like

Just want to jump in and say this is something we are looking at internally! While we don’t have anything to add right now, the majority of Cursor’s features and power is around context sources and management, and adding more information about your app as it executes, both its structure and its behavior, is an obvious extension for us to move into, I think!

4 Likes

Seems too bleeding-edge for me. I read the page, and what sticks out as an issue is:
* Currently selected DOM elements

That still has humans too much in the loop for what I want. I’m looking for more of an autonomous loop.

Let’s take the super basic real-world example of mine that took 30+ calls to resolve!!

The goal was simple: align the text box with the button.

For 20 prompts with o4-mini, I sent it screenshots of the results and the browser’s inspector tab for various elements… and it kept failing. It wasn’t until I prompted “strip away all css, start from scratch and solve it” that it finally succeeded.

All of those 20 prompts should have happened between the browser and the agent, providing a closed feedback loop the agent can iterate on, learn from, and use to avoid such basic mistakes.

I can imagine one thing it would do is something like the following (a rough sketch follows the list):

  1. Take both elements
  2. Look at their DOM and box models, as well as the DOM and box models for all parents/relevant siblings
  3. Make a change
  4. Take a screenshot
  5. Analyze the screenshot
  6. Go back to step 1 until it thinks it’s solved
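
Here is a rough sketch of what one pass through that loop could look like with Puppeteer. The URL, the selectors, and the “aligned” threshold are placeholder assumptions; the actual change and analysis steps are where the agent’s reasoning would plug in:

```ts
// align-loop.ts — hypothetical agent-side feedback loop using Puppeteer.
// Assumes a dev server on localhost:3000 and two made-up selectors.
import puppeteer from "puppeteer";

async function checkAlignment() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://localhost:3000");

  for (let attempt = 0; attempt < 5; attempt++) {
    // Steps 1-2: grab the box models of the two elements in question.
    const button = await page.$("#submit-btn");   // placeholder selector
    const field = await page.$("#name-field");    // placeholder selector
    const buttonBox = await button?.boundingBox();
    const fieldBox = await field?.boundingBox();
    if (!buttonBox || !fieldBox) throw new Error("elements not found");

    // Steps 5-6: the "analysis" here is just a vertical-alignment check;
    // a real agent would reason over the screenshot plus the box models.
    const misalignment = Math.abs(buttonBox.y - fieldBox.y);
    if (misalignment < 1) break; // close enough, loop is done

    // Step 3: the agent would edit the CSS here, then reload.
    // Step 4: capture evidence for the next round of reasoning.
    await page.screenshot({ path: `attempt-${attempt}.png` });
    await page.reload();
  }

  await browser.close();
}

checkAlignment().catch(console.error);
```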

The web would be the easiest starting point; once that is mastered, it can branch out into other UIs: iOS, Android, etc.

To summarize my main point again: we programmers should be removed from this feedback loop, as the AI should be able to get the information better and faster than we can. Excited to hear the Cursor team is working on this, and can’t wait to try it out.

1 Like

I built something to solve this for myself.

It’s a Chrome extension + VSCode plugin that lets a user select an element in the browser, shows a prompt popover for easy input, and sends the user’s prompt along with the HTML elements to the open chat editor.
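
For anyone wondering what the browser half of something like this involves, here is a rough, hypothetical sketch of a content script that captures an Alt-clicked element’s HTML and posts it with a prompt to a local endpoint. The endpoint, port, and payload shape are invented for illustration and are not this extension’s actual protocol:

```ts
// content-script.ts — hypothetical element picker (not the actual extension).
// On Alt+click, capture the element's outerHTML and send it, plus a prompt,
// to a locally running editor-side plugin at an assumed address.
document.addEventListener("click", async (event) => {
  if (!event.altKey) return; // only intercept Alt+clicks
  event.preventDefault();
  event.stopPropagation();

  const el = event.target as HTMLElement;
  const prompt = window.prompt("What should the agent do with this element?");
  if (!prompt) return;

  // Assumed local endpoint exposed by the editor-side plugin.
  await fetch("http://localhost:4321/element-prompt", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      html: el.outerHTML,
      selectorHint: el.tagName.toLowerCase() + (el.id ? `#${el.id}` : ""),
    }),
  });
});
```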

It’s allowed me to be much more targeted with my edits and has saved me a considerable amount of time.

Needs a better name :sweat_smile:

1 Like

I’ve faced that problem many times; loved your comment on “strip away all css, start from scratch and solve it”. Sometimes I’ve found myself just going back on a branch to the point where the feature was added, trying to regenerate it (“treating it like a slot machine lol”), then taking the code and restoring it to main.

1 Like

Hey folks, this is literally what Puppeteer does. That’s the simplest and most industry-standard way to give Cursor a view into the DOM.

Don’t bother with cheap copies. That’s the one you want.
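
For reference, here is a minimal example of the kind of visibility Puppeteer gives out of the box (the URL is a placeholder dev server); capturing console output and the rendered DOM like this is roughly the raw material any agent-side integration would consume:

```ts
// inspect.ts — minimal Puppeteer example: console output plus rendered DOM.
import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Forward the page's console messages (the thing people are screenshotting today).
  page.on("console", (msg) =>
    console.log(`[browser console] ${msg.type()}: ${msg.text()}`)
  );

  await page.goto("http://localhost:3000"); // placeholder dev-server URL
  const html = await page.content();        // full rendered DOM as a string
  console.log(html.slice(0, 500));          // a peek at what the agent would see

  await browser.close();
})();
```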

Does Puppeteer have an MCP? As a JS library, it isn’t really connected to Cursor and the browser at the same time.

2 Likes