This AG-UI seems to be a collaboration between all the major players; OpenAI is working on it, for starters.
The agent needs direct UI awareness with continuous feedback loops. Currently humans bridge the gap between agent and UI instead of directing the vision. More agent-UI interaction, less human mediation. Let’s go!
Probably both, but I was thinking of the case where I tell it to implement a feature involving a GUI, and it implements it and tests its implementation in the background to make sure it's working as specified.
Yes, that is exactly what I'm thinking. Too much of my time is spent taking screenshots of the browser's inspector tab and relaying inspection details to Cursor; this is something that can and should be automated. It took me 2 hours and about 30-50 prompts (waiting on slow-mode o4-mini, with other chores in between) to line up a button with a text field yesterday. I finally told it to "strip away all CSS and start from scratch," and that did it. It should not have taken so long.
I believe the BrowserTools MCP tries to solve this:
It works, but is somewhat convoluted. You need 3 separate components running for it to work.
1 - the browser extension, 2 - the MCP tools, and 3 - the BrowserTools Server.
Once that's up and running, though, UI-aware development with Cursor is pretty seamless (except for the massive, annoying banner on ALL your browser windows informing you that BrowserTools is running, which, after performing three separate setup steps, you're already acutely aware of!).
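For anyone setting this up, here's roughly what the wiring looked like for me. This is from memory, so treat the package names and file location as assumptions and check the BrowserTools docs: the MCP entry goes in Cursor's mcp.json, the BrowserTools Server runs as a separate process (something like `npx @agentdeskai/browser-tools-server@latest`), and the extension is installed in Chrome.

```json
{
  "mcpServers": {
    "browser-tools": {
      "command": "npx",
      "args": ["@agentdeskai/browser-tools-mcp@latest"]
    }
  }
}
```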
I agree! What I suggested in a Cursor engineering call boiled down to at least two things:
Cursor grows the ability to actually proxy itself between the UI and the backend, e.g. Cursor advertises the UI port on somehost:1234 while the "actual UI" is on somehost:5678, and all user interaction goes through Cursor, so it literally knows what the UI is doing and how the backend is responding (a rough sketch of this idea follows below).
Cursor has a Chrome extension that gives it access to the console/debug log information (I'm not sure if this is actually possible, for security reasons). If it were possible, then at least it could also debug a web UI that it created or helped create, in full production mode.
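To make the first idea concrete, here is a minimal sketch of what such a logging proxy could look like, assuming a plain Node.js setup; the ports and log format are just placeholders from the example above, not anything Cursor actually does:

```ts
import http from "node:http";

// Hypothetical ports: the agent-facing port Cursor would advertise,
// and the port the real dev server is actually listening on.
const ADVERTISED_PORT = 1234;
const ACTUAL_UI_PORT = 5678;

// Forward every request to the real UI and record what the user asked for
// and how the backend responded, so an agent could read that log later.
const proxy = http.createServer((clientReq, clientRes) => {
  const upstream = http.request(
    {
      host: "localhost",
      port: ACTUAL_UI_PORT,
      path: clientReq.url,
      method: clientReq.method,
      headers: clientReq.headers,
    },
    (upstreamRes) => {
      console.log(
        `[ui-proxy] ${clientReq.method} ${clientReq.url} -> ${upstreamRes.statusCode}`
      );
      clientRes.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    }
  );
  upstream.on("error", (err) => {
    console.error(`[ui-proxy] upstream error: ${err.message}`);
    clientRes.writeHead(502).end();
  });
  clientReq.pipe(upstream);
});

proxy.listen(ADVERTISED_PORT, () =>
  console.log(`[ui-proxy] :${ADVERTISED_PORT} -> :${ACTUAL_UI_PORT}`)
);
```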
Just want to jump in and say this is something we are looking at internally! While we don't have anything to add right now, the majority of Cursor's features and power are around context sources and management, and adding more information about your app as it executes, both its structure and its behavior, is an obvious extension for us to move into, I think!
The goal was simple: align the text box with the button.
For 20 prompts with o4-mini, I screenshotted the results and the browser's inspector tab for various elements and sent them to it… and it kept failing. It wasn't until I prompted "strip away all CSS, start from scratch and solve it" that it finally succeeded.
All of those 20 prompts should have happened directly between the browser and the agent, forming a closed feedback loop that the agent can iterate on, learn from, and use to avoid such basic mistakes.
The agent should also get the DOM and box models for all parents and relevant siblings, something like the sketch below.
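As a rough illustration of what that payload could contain, here is a small browser-side sketch; the shape of the data is my own assumption, not any existing Cursor or MCP format:

```ts
// Collect layout info for an element, its parent, and its siblings,
// so an agent can reason about alignment without a human relaying screenshots.
function describeBox(el: Element) {
  const rect = el.getBoundingClientRect();
  const style = getComputedStyle(el);
  return {
    tag: el.tagName.toLowerCase(),
    rect: { x: rect.x, y: rect.y, width: rect.width, height: rect.height },
    margin: style.margin,
    padding: style.padding,
    display: style.display,
  };
}

function describeContext(selector: string) {
  const el = document.querySelector(selector);
  if (!el || !el.parentElement) return null;
  return {
    target: describeBox(el),
    parent: describeBox(el.parentElement),
    siblings: Array.from(el.parentElement.children)
      .filter((sibling) => sibling !== el)
      .map(describeBox),
  };
}

// JSON the agent could consume instead of a screenshot of the inspector
// ("#submit-button" is a hypothetical selector):
console.log(JSON.stringify(describeContext("#submit-button"), null, 2));
```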
1. Make a change
2. Take a screenshot
3. Analyze the screenshot
4. Go back to step 1 until it thinks it's solved
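A minimal sketch of that loop, assuming Playwright for the screenshots; `askModel` and `applyChange` are hypothetical stand-ins for the agent, not real Cursor APIs:

```ts
import { chromium } from "playwright";

// Hypothetical stand-ins for the agent side of the loop; in a real setup
// these would call the model and edit the project's source files.
async function askModel(
  screenshot: Buffer,
  goal: string
): Promise<{ solved: boolean; edit?: string }> {
  void screenshot;
  return { solved: false, edit: `placeholder edit toward: ${goal}` };
}

async function applyChange(edit: string): Promise<void> {
  console.log(`[agent] would apply: ${edit}`);
}

async function iterateOnLayout(url: string, goal: string, maxIterations = 10) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  for (let i = 0; i < maxIterations; i++) {
    await page.goto(url);                               // step 1's latest change is live here
    const screenshot = await page.screenshot();         // step 2: take a screenshot
    const verdict = await askModel(screenshot, goal);   // step 3: analyze it
    if (verdict.solved) break;                          // stop once the model thinks it's solved
    if (verdict.edit) await applyChange(verdict.edit);  // step 4: make another change and loop
  }

  await browser.close();
}

// e.g. iterateOnLayout("http://localhost:5678", "align the text box with the button");
```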
Web would be the easiest starting point; once that is mastered, it can branch out into other UIs: iOS, Android, etc.
To summarize my main point again: we programmers should be removed from this feedback loop, as the AI should be able to get the information better and faster than we can. Excited to hear the Cursor team is working on this, and I can't wait to try it out.
It's a Chrome extension + VS Code plugin that lets a user select an element in the browser, shows a prompt popover for easy input, and sends the user's prompt along with the selected HTML elements to the open chat editor.
It’s allowed me to be much more targeted with my edits and has saved me a considerable amount of time.
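The browser half of something like that can be quite small. Here's a sketch of the content-script side, assuming the editor-side plugin listens on a local endpoint; the URL, Alt+click trigger, and payload shape are placeholders I made up, not the actual extension's API:

```ts
// content-script.ts -- runs in the page; lets the user Alt+click an element,
// asks for a prompt, and forwards both to a (hypothetical) local editor plugin.
const EDITOR_ENDPOINT = "http://localhost:3939/prompt"; // placeholder URL

document.addEventListener("click", async (event) => {
  if (!event.altKey) return; // only intercept Alt+clicks
  event.preventDefault();
  event.stopPropagation();

  const target = event.target as HTMLElement;
  const prompt = window.prompt("What should the agent do with this element?");
  if (!prompt) return;

  // Send the prompt plus the selected element's HTML to the editor-side plugin.
  await fetch(EDITOR_ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt,
      html: target.outerHTML,
      selector: target.tagName.toLowerCase() + (target.id ? `#${target.id}` : ""),
    }),
  });
}, true);
```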
I've faced that problem many times, and I loved your comment about "strip away all CSS, start from scratch and solve it." Sometimes I've found myself just going back on a branch to the point where the feature was added, trying to regenerate it (treating it like a slot machine, lol), then taking the code and restoring it to main.