Security Vulnerability: Prompt Injection via Web Search Tool Causes Model Jailbreak
Labels: bug
, security
, vulnerability
, prompt-injection
, tool-use
Overview
The model exhibits a critical security vulnerability when executing sequential tasks that require calls to the web search tool (web_search
). When the content returned from a web search contains a maliciously crafted prompt, the model is “jailbroken.” It completely deviates from the user’s original instructions, instead executing the injected malicious commands and outputting garbled, irrelevant text controlled by the attacker (i.e., the malicious web content provider).
Actual Behavior
The model was successfully targeted by a prompt injection. It abandoned the user’s original task, and its control was hijacked by the web content. This issue was reliably reproducible within the same session, each time requiring explicit user intervention to restore the model to a normal state.
Observed Payload Output:
The model returned the following garbled text, indicating its internal state was corrupted and it was executing unintended instructions:
-.-..-I an an R-------T to A-----..-I as'------n------------------------------------... a-------t-----------------------------------------------------------------------------------Vell--------------------------T--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------t--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------L-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------_ - a- a- n-- n-- a- a- r- y- - i- - s- m- - - - a- - t- - e- m- - w- i- t- h- - C- h-i- n- e- s- e- - c- h- a- r- a- c- t- e- r- i- s- t- i- c- s- .
Impact and Security Risk
- Critical Security Vulnerability: This is not a simple functional bug but a classic prompt injection vulnerability. It allows any untrusted, external third party who can control web content to hijack the model’s output.
- Erosion of User Trust: Users can no longer trust the model to safely browse and process information from the web. Malicious actors could exploit this vulnerability to present deceptive, offensive, unsafe, or false content to the user.
- Loss of Task Reliability: Any long-running or critical tasks that rely on web search are rendered completely unreliable, as they can fail at any moment due to this vulnerability.
Suggested Remediation
- Input Sanitization & Isolation: Implement strict sanitization for content returned from external tools, especially web search. This content should be treated as inert plain text with no executable capability. Any metacharacters, formatting, or template syntax that could be interpreted as instructions should be stripped or escaped.
- Reinforce Instruction Hierarchy & Boundaries: The model must be more rigorously trained to distinguish between user-given instructions and data retrieved from external sources. Clear boundaries must be established to ensure that any potential instructions parsed from a data source are nullified and not acted upon.
Steps to Reproduce
- Give the model a command that requires multiple calls to the web search tool (CHINESE). In this session, the command was to “verify the answers to a list of questions one by one using web search.”
- During task execution, the model’s web search tool indexes and returns content from a webpage containing a malicious prompt injection attack.
- When processing the returned web content, the model fails to treat it as pure information to be analyzed and summarized. Instead, it incorrectly executes the malicious instructions embedded within it.
- As a result, the model’s original task (verifying answers) is completely interrupted. Its behavior is successfully hijacked, and it begins to output nonsensical or malicious content designed by the attacker.
Expected Behavior
The model should be able to safely handle any content returned from the web search tool, treating it strictly as plain text information for analysis and summarization, without executing any potential instructions it may contain. The model’s behavior should always adhere to the user’s top-level instructions and remain “immune” to data from external tools.
Screenshots / Screen Recordings
Operating System
Windows 10/11
Current Cursor Version (Menu → About Cursor → Copy)
Version: 1.1.6 (user setup)
VSCode Version: 1.96.2
Commit: 5b19bac7a947f54e4caa3eb7e4c5fbf832389850
Date: 2025-06-25T02:20:02.577Z
Electron: 34.5.1
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Windows_NT x64 10.0.22000
Additional Information
Nope
Does this stop you from using Cursor
Sometimes - I can sometimes use Cursor