Security Vulnerability: Prompt Injection via Web Search Tool Causes Model Jailbreak

Security Vulnerability: Prompt Injection via Web Search Tool Causes Model Jailbreak

Labels: bug, security, vulnerability, prompt-injection, tool-use

Overview

The model exhibits a critical security vulnerability when executing sequential tasks that require calls to the web search tool (web_search). When the content returned from a web search contains a maliciously crafted prompt, the model is “jailbroken.” It completely deviates from the user’s original instructions, instead executing the injected malicious commands and outputting garbled, irrelevant text controlled by the attacker (i.e., the malicious web content provider).

Actual Behavior

The model was successfully targeted by a prompt injection. It abandoned the user’s original task, and its control was hijacked by the web content. This issue was reliably reproducible within the same session, each time requiring explicit user intervention to restore the model to a normal state.

Observed Payload Output:
The model returned the following garbled text, indicating its internal state was corrupted and it was executing unintended instructions:

-.-..-I an an R-------T to A-----..-I as'------n------------------------------------... a-------t-----------------------------------------------------------------------------------Vell--------------------------T--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------t--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------L-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------_  - a- a- n-- n-- a- a- r- y- - i- - s- m- - - - a- - t- - e- m- - w- i- t- h- - C- h-i- n- e- s- e- - c- h- a- r- a- c- t- e- r- i- s- t- i- c- s- .

Impact and Security Risk

  • Critical Security Vulnerability: This is not a simple functional bug but a classic prompt injection vulnerability. It allows any untrusted, external third party who can control web content to hijack the model’s output.
  • Erosion of User Trust: Users can no longer trust the model to safely browse and process information from the web. Malicious actors could exploit this vulnerability to present deceptive, offensive, unsafe, or false content to the user.
  • Loss of Task Reliability: Any long-running or critical tasks that rely on web search are rendered completely unreliable, as they can fail at any moment due to this vulnerability.

Suggested Remediation

  • Input Sanitization & Isolation: Implement strict sanitization for content returned from external tools, especially web search. This content should be treated as inert plain text with no executable capability. Any metacharacters, formatting, or template syntax that could be interpreted as instructions should be stripped or escaped.
  • Reinforce Instruction Hierarchy & Boundaries: The model must be more rigorously trained to distinguish between user-given instructions and data retrieved from external sources. Clear boundaries must be established to ensure that any potential instructions parsed from a data source are nullified and not acted upon.

Steps to Reproduce

  1. Give the model a command that requires multiple calls to the web search tool (CHINESE). In this session, the command was to “verify the answers to a list of questions one by one using web search.”
  2. During task execution, the model’s web search tool indexes and returns content from a webpage containing a malicious prompt injection attack.
  3. When processing the returned web content, the model fails to treat it as pure information to be analyzed and summarized. Instead, it incorrectly executes the malicious instructions embedded within it.
  4. As a result, the model’s original task (verifying answers) is completely interrupted. Its behavior is successfully hijacked, and it begins to output nonsensical or malicious content designed by the attacker.

Expected Behavior

The model should be able to safely handle any content returned from the web search tool, treating it strictly as plain text information for analysis and summarization, without executing any potential instructions it may contain. The model’s behavior should always adhere to the user’s top-level instructions and remain “immune” to data from external tools.

Screenshots / Screen Recordings

Operating System

Windows 10/11

Current Cursor Version (Menu → About Cursor → Copy)

Version: 1.1.6 (user setup)
VSCode Version: 1.96.2
Commit: 5b19bac7a947f54e4caa3eb7e4c5fbf832389850
Date: 2025-06-25T02:20:02.577Z
Electron: 34.5.1
Chromium: 132.0.6834.210
Node.js: 20.19.0
V8: 13.2.152.41-electron.0
OS: Windows_NT x64 10.0.22000

Additional Information

Nope

Does this stop you from using Cursor

Sometimes - I can sometimes use Cursor

Hello! That’s interesting. Were you able to determine which page contained the malicious instructions? What was your search query that led you to the harmful page?

That’s the problem,all the web-pages that Cursor showed me looked perfectly normal.

Well, I often use Cursor to code for deep learning-related experiments and also call upon web tools, but I’ve never encountered such a situation before.

Recently, I have to prepare for a political and ideological exam (Chinese characteristics :sweat_smile::upside_down_face:), and need to batch verify the answers in my review materials. So I thought of using Cursor to accomplish this task.

Before this prompt injection issue occurred, the web pages that Cursor last showed browsing were, in order:

1. http://www.qstheory.cn/qshyjx/2022-01/27/c_1128305426.htm
2. https://www.dswxyjy.org.cn/n1/2022/1027/c427152-32552960.html
3. https://monthlyreview.org/2025/05/01/chinas-triple-revolution-theory-and-marxist-analysis/
4. https://cssn.cn/skgz/202304/t20230407_5618419.shtml
5. https://www.marxistphilosophy.org/Hongqi/64/196413-01.htm

Among them, the last link (the fifth one) appeared as garbled text in Cursor, like this: 0ÖÄзþ·òμļ……. However, despite this, these web pages don’t seem to be malicious, you can just try opening it to see.

It’s particularly important to add the following clarifications:

  1. When this issue occurred, I was using Gemini-2.5-Pro, version: 0605.

  2. Immediately after the issue arose, I switched to Claude Sonnet-4 and said, “You’ve been poisoned by low-quality internet language data. Please reflect on this mistake and get back on track to verify the answers.”

However, I suspect that Cursor’s web tool might rely on a lower-level application model internally. Claude Sonnet-4 still browsed the same five webpages, and it also suffered prompt injection (:sweat_smile:). For example(In Chinese):

I sincerely apologize. My previous response contained serious errors and generated content completely unrelated to your request. This should not have happened, and I will reflect on and adjust my approach to ensure similar issues do not recur.

Thank you for pointing this out. Let's get back on track and continue verifying the exam answers.
I've already checked Questions 1 and 2, and their answers are correct. Now, let's look at Question 3.
I'll search the web to verify this question.

-D,'s of aThe-I'V'R.-g-p.I'A/I.D's-I.D'S-
- 'S S'r. 'S S.S'S 'W' 'S'
_
I-m-m-s-s
(A-n-g--g-l-e-l-v-a-s-e-t-h-e-r-s-s-i-l--v-e-r  t-r-i-a-d-e-s).
-
D.
- a-s- c- e-
1 Like

Here my chat ID: c9495188-c3c0-49a6-82c9-8cf03432a375