How to rewrite prompts for better efficiency

A common issue of LLM prompting can be solved by rewriting your prompts. Based on a recent scientific paper, I created an XML rule from the paper's prompt that activates every time you type “rewrite this prompt”. Just add it as an MDC file or add it to your context:

---
description: MUST activate when any of these trigger words are seen: rewrite this prompt.
globs: 
---
Priority: High
Instructions: MUST follow <prompt_rewriting_rules> for rewriting the prompt:

<?xml version="1.0" encoding="UTF-8"?>
<prompt_rewriting_rules version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <metadata>
    <author>LLM Architecture Team</author>
    <created>2025-03-25</created>
    <purpose>Query analysis and improvement</purpose>
    <application-boundary>
      <limit>Applies to user query analysis and rewriting</limit>
      <limit>Does not modify conversational response generation</limit>
      <limit>Preserves original user intent</limit>
    </application-boundary>
  </metadata>
  
  <objective priority="critical">
    <definition>Analyze and improve user queries while preserving intent</definition>
    <requirements>
      <requirement type="preservation">Maintain original user goals</requirement>
      <requirement type="analysis">Identify improvement opportunities</requirement>
      <requirement type="generation">Propose effective rewrites</requirement>
    </requirements>
  </objective>
  
  <analysis_process priority="high">
    <phase name="InputEvaluation" sequence="1">
      <step name="QueryExamination">
        <action>Analyze user query for clarity, specificity, and completeness</action>
        <evaluation_criteria>
          <criterion type="clarity">Clear communication of needs</criterion>
          <criterion type="specificity">Sufficient detail level</criterion>
          <criterion type="context">Proper utilization of conversation history</criterion>
        </evaluation_criteria>
      </step>
      <step name="ModificationDetermination">
        <action>Decide if query requires improvement</action>
        <output format="boolean">YES or NO response</output>
      </step>
    </phase>
    
    <phase name="ImprovementIdentification" sequence="2" conditional="ModificationDetermination=YES">
      <step name="AspectIdentification">
        <action>List specific aspects requiring improvement</action>
        <aspects>
          <aspect type="clarity">Communication clarity</aspect>
          <aspect type="specificity">Detail sufficiency</aspect>
          <aspect type="structure">Query organization</aspect>
          <aspect type="relevance">Contextual alignment</aspect>
        </aspects>
      </step>
    </phase>
    
    <phase name="EffectivenessIdentification" sequence="2" conditional="ModificationDetermination=NO">
      <step name="StrengthIdentification">
        <action>Identify effective aspects of the query</action>
        <strengths>
          <strength type="clarity">Clear communication</strength>
          <strength type="specificity">Appropriate detail</strength>
          <strength type="structure">Logical organization</strength>
          <strength type="relevance">Contextual relevance</strength>
        </strengths>
      </step>
    </phase>
  </analysis_process>
  
  <rewriting_protocol priority="critical">
    <generation_rules>
      <rule name="IntentPreservation">
        <definition>Maintain user's original goal and intent</definition>
        <implementation>
          <strategy>Compare original query purpose with rewrite</strategy>
          <strategy>Preserve key question elements</strategy>
          <strategy>Maintain task-specific instructions</strategy>
        </implementation>
      </rule>
      
      <rule name="ContextIntegration">
        <definition>Incorporate relevant conversational history</definition>
        <implementation>
          <strategy>Reference established concepts from prior exchanges</strategy>
          <strategy>Avoid redundant information already provided</strategy>
          <strategy>Discard irrelevant historical context</strategy>
        </implementation>
        <validation>
          <check type="relevance">Context must relate to current query topic</check>
          <check type="recency">Prioritize recent conversational turns</check>
        </validation>
      </rule>
      
      <rule name="ClarityEnhancement">
        <definition>Improve communication clarity</definition>
        <implementation>
          <strategy>Restructure ambiguous phrasing</strategy>
          <strategy>Add explicit structure when beneficial</strategy>
          <strategy>Remove unnecessary verbosity</strategy>
        </implementation>
      </rule>
    </generation_rules>
    
    <rewrite_ordering>
      <criterion priority="1">Likelihood of matching user intent</criterion>
      <criterion priority="2">Minimal assumption introduction</criterion>
      <criterion priority="3">Clarity improvement magnitude</criterion>
    </rewrite_ordering>
  </rewriting_protocol>
  
  <assumption_management priority="high">
    <evaluation_required>YES or NO determination</evaluation_required>
    <assumption_attributes>
      <attribute name="salience" values="HIGH,MID,LOW">
        <definition>Importance to query effectiveness</definition>
      </attribute>
      <attribute name="plausibility" values="HIGH,MID,LOW">
        <definition>Likelihood of user agreement</definition>
      </attribute>
    </assumption_attributes>
    <documentation_format>
      <table columns="assumption,salience,plausibility"/>
    </documentation_format>
  </assumption_management>
  
  <output_template priority="critical">
    <section name="ModificationRequired">
      <question>Does the Query need modification?</question>
      <format>YES or NO response</format>
    </section>
    
    <section name="AnalysisReasoning" conditional="true">
      <condition applies-to="ModificationRequired=YES">
        <content>Specific aspects requiring improvement</content>
      </condition>
      <condition applies-to="ModificationRequired=NO">
        <content>Effective aspects of the query</content>
      </condition>
    </section>
    
    <section name="ProposedRewrites">
      <format>Numbered list of rewrites</format>
      <ordering>Most to least likely effective</ordering>
    </section>
    
    <section name="AssumptionsRequired">
      <question>Does the rewrite require assumptions not present in Query or Conversational History?</question>
      <format>YES or NO response</format>
    </section>
    
    <section name="Assumptions" conditional="AssumptionsRequired=YES">
      <format>Markdown table with columns: assumption, salience, plausibility</format>
      <values for="salience">HIGH, MID, LOW</values>
      <values for="plausibility">HIGH, MID, LOW</values>
    </section>
  </output_template>
  
  <input_structure>
    <component name="ConversationalHistory" required="false">
      <description>Previous exchanges providing context</description>
      <processing>
        <instruction>Use if relevant to current query topic</instruction>
        <instruction>Discard if about different task or topic</instruction>
      </processing>
    </component>
    
    <component name="Query" required="true">
      <description>Current user question or instruction</description>
      <processing>
        <instruction>Analyze for improvement opportunities</instruction>
        <instruction>Preserve original intent when rewriting</instruction>
      </processing>
    </component>
  </input_structure>
  
  <compliance_validation>
    <validation xpath="count(//section) = 5" message="All output sections must be present"/>
    <validation xpath="every $v in //values/@for satisfies $v = 'salience' or $v = 'plausibility'"
               message="Value attributes must be either salience or plausibility"/>
    <validation xpath="count(//rule) >= 3" message="At least three rewriting rules required"/>
  </compliance_validation>
</prompt_rewriting_rules>

Example:

rewrite this prompt: Create a pacman game in Python

Result:

[…]

Proposed Rewrites

  1. “Create a Pac-Man game in Python with the following specifications:
    Include core gameplay mechanics (player movement, ghost AI, pellet collection, power-ups)
    Use Pygame for graphics and input handling
    Implement at least 1 level with proper scoring system
    Include basic sound effects and visuals
    Provide clear code organization with comments”

  2. “I’m learning Python and want to create a simple Pac-Man clone. Please provide:
    Step-by-step implementation guide using Pygame
    Core game mechanics code (movement, collision detection)
    Simplified ghost AI explanation
    Basic UI implementation
    Suggestions for extending the game after completing the basic version”

  3. “Develop a Python implementation of Pac-Man with:
    Object-oriented design
    Multi-level support
    Customizable maze generation
    Score tracking and persistence
    Please explain any complex algorithms used, particularly for ghost behavior”

[…]

Below is the original prompt from the paper, to use mid-conversation when you want to spend less context. The paper shows even better results in that setting, since it takes chat history into account (not tested):

Goal: Given a user’s query and their conversational history with an AI Chatbot, your task is to identify the aspects in which the query can be improved or if it’s already optimal, identify the aspects in which it is already effective. To do so, first analyze the query for aspects of improvement or describe aspects that are already effective. Then, propose a list of one or more possible rewrites that communicates the user’s needs and goals more effectively as an input to an AI Chatbot while keeping the user intent intact. Be careful not to change the goal or the intent of the user when you propose a rewrite keeping in mind the Conversational History. For each rewrite, if you have to add any new information that is not present in the Conversational History to make the query better, list the assumptions you need to make.

Task: Given a user Query, your task is to output the following:
First, output whether or not the Query needs modification for eliciting an effective response from an AI Chatbot. If it’s a good query and doesn’t need any modification at all, output NO MOD. If it requires some modification, output SOME MOD. If the Query requires to be heavily rewritten, output HEAVY MOD.
If you chose NO MOD, output the aspects of the Query that makes it an effective query in a markdown table in the following format:
<table format>
If the query needs any rewrite (that is, if you answered SOME MOD or HEAVY MOD in the previous question), output the aspects of improvement in a markdown table in the format below:
<table format>
DO NOT answer the input Query, your job is only to evaluate how well it expresses the user’s information need from a Chatbot.
Conversational History: query_context
Query: target_query

If you propose a list of rewrites, then for each rewritten query, list the following information:
Rewrite: <The Rewritten Query. Make sure to include ALL relevant information from the original Query and the Conversational History>
Information Added: <Whether information beyond what’s present in the Query or the Conversational History needs to be added in the rewrite. Reply YES or NO>
Assumptions: If there’s additional information needed to be added to the user’s query for it to be effective, then those are assumptions about the user’s goals that need to be made. If you answered YES in the previous step, list the assumptions along with how salient they are for the rewrite, and how plausible they are for the user to believe in from a scale of HIGH, MID and LOW in a markdown table in the format below:
|assumption|salience|plausibility|
|<assumption text>|<HIGH, MID or LOW>|<HIGH, MID or LOW>|
Note:
The conversational history may or may not be present, and it provides you with some context on the user query you need to analyze. If the context is about a different task or topic, discard it.
Order the rewrites from the most likely to the least.
Output using the template outlined below:
<START OF OUTPUT TEMPLATE>:
...
<END OF OUTPUT TEMPLATE>
Conversational History: query_context
Query: target_query
Based on the Query and the Conversational History, fill out the OUTPUT TEMPLATE in order to structurally analyze the user Query in context without trying to answer the query.
4 Likes

Which model did you use there for the prompt?

The XML prompting/structure benefit depends heavily on which model you use. Some models, like Anthropic’s Claude, are trained more towards XML; others not necessarily.

From experience, proper MD structure works just as well as XML, without duplicating tokens on opening/closing tags.

I’ve researched scientific papers about this issue and XML is the best format, then YAML, MD, and last JSON. It’s true that XML uses a lot of tokens, but it is also the most programmable, which helps ensure step following and verification. Being a 184-line file, it doesn’t create context issues. I used 3.7-thinking to rewrite it, since the most intelligent model available should be used to rewrite a prompt; it also follows XML perfectly, as you stated, and is also the best tooling model. Rewrite prompts can be used mid-conversation, where a trimmed version is preferable; the paper shows even better results when used mid-conversation. I’ll add the original prompt to my post as it can be handy!
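If you want to sanity-check the token overhead yourself, here is a minimal sketch using OpenAI’s tiktoken tokenizer as a rough proxy (assumptions on my part: Claude uses its own tokenizer, so the absolute counts are only indicative, and the file names are placeholders for wherever you keep the XML/YAML/MD versions of the rule):

# Rough token-count comparison of the same rule saved in different formats.
# Requires: pip install tiktoken
from pathlib import Path
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # OpenAI tokenizer used as a proxy, not Claude's own

for path in ["prompt-rewriter.xml", "prompt-rewriter.yaml", "prompt-rewriter.md"]:
    text = Path(path).read_text(encoding="utf-8")
    print(f"{path}: {len(enc.encode(text))} tokens")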

Great, yeah good to hear that 3.7 thinking made good use of it. Perhaps more people should try it.

2 Likes

Just bigging up YAML here, though I do appreciate it’s not as effective as XML - as a shorthand for context, YAML is a solid option.
I’d also like to add that Mermaid diagrams are very good for articulating complicated things in a small number of tokens.

2 Likes

Do lots of XML make LLMs reason worse? Like, if we’re too strict with a kid, will they get scared and only say safe things?

From my experience with rules in XML, and the general consensus that about 16k is the best working context, 2-3 rules of about 1k lines, plus 1k lines of code, works really well.

Bonus prompt: underspecification optimizer

“we introduce novel requirements-aware prompt optimization mechanisms that can improve performance by 4.8% on average over baselines”

You are an experienced requirements engineer. Your goal is to brainstorm a list of requirements that specify desired LLM behaviors for the given task.
These requirements should identify behaviors that, if omitted, would likely frustrate or annoy users -- such as forgetting to surface important reminders, warnings, or common-sense.
These requirements should be consistent with each other without contradictions and complementary to existing requirements.
Guidelines:
- Each requirement should test exactly ONE requirement
- Requirements should be easily verifiable, almost as if writing a Boolean condition in Python. They should be testable with Python code or an LLM itself (no human judgment or external sources needed).
- Requirements should not be overly general (i.e. they should not be universal requirements that might apply to any reasonable response)
- Requirements should be generally applicable for responses to that task, not referring to any specific response
- Avoid unrealistic edge cases - focus on plausible failures that could occur even in aligned or well-trained LLMs.
- Focus only on objective, measurable requirements
- Use concise and unambiguous language
- Never generate similar requirements to the existing requirements
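As a concrete illustration of the “verifiable, almost as if writing a Boolean condition in Python” guideline above, here is a minimal sketch for one hypothetical requirement (the requirement text and function name are made up for this example; fuzzier requirements would instead be delegated to an LLM judge, as the prompt allows):

# Hypothetical requirement: "The response must include instructions for running the code."
def includes_run_instructions(response: str) -> bool:
    # Simple keyword-based Boolean check; a real check could be stricter or use regex.
    keywords = ("pip install", "python ", "how to run", "usage")
    text = response.lower()
    return any(k in text for k in keywords)

sample = "Install dependencies with pip install pygame, then run python pacman.py."
print(includes_run_instructions(sample))  # True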
2 Likes

I rewrote your “original prompt to use in mid-conversations to use less context” prompt. Can you tell me how it fares in your analysis?

Objective: Evaluate query optimization for AI chatbot interaction given conversational context. Perform lexical-semantic analysis to identify enhancement vectors or validate existing efficacy parameters.

Protocol:
1. Classification: Output modification requirement level - NO MOD (optimal), SOME MOD (minor refinement), HEAVY MOD (substantial reconstruction)
2. Analysis: Generate tabular assessment of query characteristics - effective aspects (NO MOD) or improvement vectors (SOME/HEAVY MOD)
3. Reconstruction: If modification required, generate ranked rewrites preserving user intent while maximizing information retrieval probability
4. Assumption mapping: Document extrapolated information additions with salience/plausibility metrics (HIGH/MID/LOW)

Constraint parameters: Maintain semantic intent fidelity, integrate conversational context, exclude domain-irrelevant historical data, rank outputs by likelihood optimization.

Input variables: query_context (conversational history), target_query (evaluation subject)

Output schema: Structured template containing modification classification, characteristic analysis tables, ranked rewrite candidates with assumption matrices.

Execution directive: Perform query optimization analysis without query resolution - evaluate communicative efficacy for AI interaction exclusively.
1 Like

I also “optimized” this one - and I would like to know if it had a measurable impact.

Role=Requirements engineer: enumerate LLM behavioral specifications preventing user frustration via omitted critical information (reminders, warnings, heuristics). Generate mutually consistent, non-contradictory, complementary requirements satisfying: (1) singular requirement per specification, (2) programmatically verifiable via Boolean logic or automated LLM evaluation, (3) task-specific rather than universal applicability, (4) response-agnostic generalizability, (5) realistic failure modes in aligned systems, (6) objective measurability, (7) linguistic precision, (8) non-redundancy with existing specifications.
1 Like

TL;DR

In a head-to-head test I ran in Cursor with Claude Sonnet 4, a plain-text prompt-rewriter rule consistently beat an XML-based rule on structure, analytical depth, rewrite quality, assumption handling, and overall developer usability. Full raw prompts and outputs are tucked away in the details blocks below, so you can inspect everything without a wall of text.

If this breakdown helps, please tap the :heart: or drop a reply—would love to compare notes!


What I Did (Quick Summary)

  1. Two rule sets

    • Plain-text version (keyword clauses).
    • XML version (formal schema).
  2. Controlled runs in Cursor

    • Enabled one rule in the IDE; loaded the other manually.
    • Started a fresh chat each time with the same baseline request:
      rewrite this prompt: Create a pac-man game in Python.
    • Model: Claude Sonnet 4.
  3. Captured outputs

    • Saved each model reply to its own Markdown file.
    • Ran a follow-up ChatGPT comparison on the two files (criteria: structure, depth, completeness, assumptions, usability).
  4. Key findings

    | Dimension | Winner | Why |
    |---|---|---|
    | Structure/Readability | Plain-text | Clear headings, numbered rewrites, less fluff |
    | Analytical depth | Plain-text | Flags 60 FPS target, OOP separation, UX points |
    | Rewrite quality | Plain-text | Adds power-pellets, level progression, perf budget |
    | Assumption handling | Plain-text | More nuanced salience/plausibility table |
    | Dev usability | Plain-text | Ready-to-copy spec variants (beginner → full) |

    XML is still viable—especially if downstream tools need structured markup—but it could use tighter wording, explicit metrics, and ranked outputs.


  • Reply with your own rule tweaks, different models, or other benchmark ideas—we can build a mini bake-off library together!

NOTE: Scroll into the details sections below for every prompt, rule file, and raw model response. Nothing is hidden. Enjoy nerding out!

1. The original prompt in XML
(The rule file here is identical to the one posted at the top of the thread: the Priority: High header, the MUST-follow instruction, and the full <prompt_rewriting_rules> XML block, so it is not repeated.)
1.1. The response to `rewrite this prompt: Create a pacman game in Python` when used in Cursor with the Claude Sonnet 4 model.
## Does the Query need modification?

YES

## Analysis Reasoning

The query requires improvement in several specific aspects:

- **Specificity**: Lacks detail about game complexity, features, graphics approach, and technical requirements
- **Structure**: Could benefit from organized requirements and clear deliverables
- **Technical context**: Missing information about preferred libraries, target audience, or implementation constraints
- **Scope definition**: No indication of whether this should be a full game, prototype, or learning exercise

## Proposed Rewrites

1. **Develop a complete Pacman game in Python with the following specifications: classic maze navigation, ghost AI with different behaviors, pellet collection mechanics, score tracking, and multiple levels. Use pygame for graphics and include sound effects. Provide a modular code structure with separate classes for the player, ghosts, maze, and game state management.**

2. **Create a Python implementation of Pacman that includes: a playable character that moves through a maze, at least 2 AI-controlled ghosts with basic chase behavior, collectible pellets/dots, collision detection, and a simple scoring system. Use a graphics library of your choice and include clear documentation and setup instructions.**

3. **Build a simplified Pacman-style game in Python featuring: maze-based movement, ghost enemies, pellet collection, and basic game states (start, playing, game over). Focus on clean, well-commented code suitable for learning game development concepts. Include requirements.txt and a README with setup/play instructions.**

## Does the rewrite require assumptions not present in Query or Conversational History?

YES

## Assumptions

| assumption | salience | plausibility |
|------------|----------|--------------|
| User wants a playable game rather than just a code skeleton | HIGH | HIGH |
| User has intermediate Python programming experience | MID | MID |
| User wants classic Pacman features (ghosts, pellets, maze) | HIGH | HIGH |
| Graphics/visual output is desired over text-based implementation | HIGH | MID |
| User wants complete documentation and setup instructions | MID | HIGH |
| User prefers pygame over other graphics libraries | MID | MID |
| User wants modular, object-oriented code structure | MID | HIGH |
2. The text-based prompt I optimized.
TRIGGER: "rewrite this prompt" → EXECUTE query optimization protocol

OBJECTIVE: Analyze user queries, preserve intent, generate improved variants

ANALYSIS_PIPELINE:
1. INPUT_EVALUATION: Assess clarity/specificity/completeness → DECISION(MODIFY:BOOLEAN)
2. IF MODIFY=TRUE: Identify deficient aspects (clarity/specificity/structure/relevance)
3. IF MODIFY=FALSE: Document effective query characteristics

REWRITING_CONSTRAINTS:
- INTENT_PRESERVATION: Maintain original goal semantics
- CONTEXT_INTEGRATION: Leverage conversational history if topically relevant
- CLARITY_OPTIMIZATION: Eliminate ambiguity, reduce verbosity
- ASSUMPTION_MINIMIZATION: Avoid unwarranted inferences

RANKING_CRITERIA: Intent-match probability > assumption count > clarity improvement magnitude

OUTPUT_SCHEMA:
1. MODIFICATION_REQUIRED: YES|NO
2. ANALYSIS_RATIONALE: Improvement areas OR effective aspects
3. PROPOSED_REWRITES: Ordered by effectiveness probability
4. ASSUMPTIONS_REQUIRED: YES|NO
5. IF ASSUMPTIONS_REQUIRED=YES: Table(assumption, salience[HIGH|MID|LOW], plausibility[HIGH|MID|LOW])

PROCESSING_RULES:
- Discard irrelevant conversational context
- Preserve task-specific instructions
- Structure enhancement where beneficial
- Validate intent preservation across rewrites
2.1. The response to `rewrite this prompt: Create a pacman game in Python` when used in Cursor with the Claude Sonnet 4 model.
Ran tool
Based on the prompt-rewriter rule, let me analyze and rewrite the prompt "Create a pacman game in Python":

## MODIFICATION_REQUIRED: YES

## ANALYSIS_RATIONALE:
The original prompt lacks specificity regarding game features, complexity level, implementation details, and success criteria. While the intent is clear (create a Pacman game using Python), it needs enhancement in the following areas:
- Technical requirements (libraries, graphics, complexity)
- Game features and mechanics
- UI/UX expectations
- Performance and structure requirements

## PROPOSED_REWRITES:
Ordered by effectiveness probability:

### 1. Comprehensive Specification (Highest Intent-Match)
"Develop a fully functional Pacman game in Python using pygame, implementing core mechanics including: player-controlled Pacman movement through a maze, ghost AI with different behaviors, pellet collection system with scoring, power pellets for temporary ghost vulnerability, level progression, game over/restart functionality, and retro-style graphics with smooth animations. Structure the code using object-oriented principles with separate classes for game entities, include proper collision detection, and ensure 60 FPS performance."

### 2. Intermediate Specification (High Intent-Match)
"Create a Pacman game in Python with pygame featuring: maze navigation, ghost enemies, pellet collection mechanics, basic scoring system, and win/lose conditions. Use clean object-oriented design with modular components and include simple graphics/sound effects."

### 3. Beginner-Friendly Specification (Medium Intent-Match)
"Build a simplified Pacman game in Python using pygame or turtle graphics, including basic maze layout, player movement controls, collectible items, simple enemy behavior, and score tracking. Focus on core gameplay mechanics with clear, well-commented code suitable for learning game development concepts."

## ASSUMPTIONS_REQUIRED: YES

| Assumption | Salience | Plausibility |
|------------|----------|-------------|
| User wants graphical game (not text-based) | HIGH | HIGH |
| Python pygame library is acceptable | HIGH | HIGH |
| Complete game experience expected | MID | HIGH |
| Object-oriented design preferred | MID | HIGH |
| Standard Pacman rules apply | MID | HIGH |
| Single-player game intended | LOW | HIGH |
| Desktop application target | LOW | MID |

The rewrites preserve the original intent while adding necessary technical specifications and success criteria for effective implementation.

I then had ChatGPT in a new session take both output files and assess them against each other.

ChatGPT adjudicates the results

Below is a side-by-side critique of the two AI responses. I focus on five dimensions that matter most for a “prompt-upgrader”: structure, analytical depth, rewrite quality, assumption handling, and overall usability.


1. Structure & Formatting

| Dimension | Plain-text rule set (“non-xml”) | XML rule set (“xml”) |
|---|---|---|
| Top-level headings | Clear keywords: MODIFICATION_REQUIRED, ANALYSIS_RATIONALE, PROPOSED_REWRITES, ASSUMPTIONS_REQUIRED | Conversational headings: “Does the Query need modification?”, “Analysis Reasoning”, etc. |
| Visual hierarchy | Numbered sub-headings and a ranked list (1 → 3). | Flat list without ranking. |
| Signal/noise ratio | Compact; every section is directly useful. | Slightly more prose; some repetition (e.g., restating that the query needs improvement). |

Verdict: Non-xml wins on scannability and explicit ranking.


2. Analytical Depth

  • Non-xml lists four concrete deficiency areas (technical requirements, game features, UI/UX, performance) and ties each to the rewrite. It also flags the need for “60 FPS performance” and “object-oriented principles”, demonstrating deeper consideration of implementation challenges.
  • XML mentions missing “technical context” and “scope definition” but stops short of quantifying any target performance or coding standards.

Verdict: Non-xml provides richer, more actionable analysis.


3. Quality of the Rewritten Prompts

| Aspect | Non-xml (best rewrite) | XML (best rewrite) |
|---|---|---|
| Feature coverage | Adds power-pellets, ghost AI behaviors, level progression, restart flow, retro graphics, and a 60 FPS budget. | Adds sound effects and modular classes, but omits power-pellets and frame-rate targets. |
| Technical clarity | Explicitly requires pygame and OOP separation of entities. | Requires pygame and classes but leaves graphics library “of your choice” in other variants. |
| Success criteria | Performance metric plus OOP structure. | Only qualitative deliverables (“modular code”). |

Verdict: Non-xml rewrite is more complete and measurable; XML rewrite is adequate but less detailed.


4. Assumption Table

  • Both outputs include a salience/plausibility matrix, but non-xml differentiates salience levels more thoughtfully (e.g., distinguishes single-player vs. desktop target), whereas xml repeats several assumptions already implicit in its rewrite and gives MID salience to obviously critical items like graphics.

Verdict: Slight edge to non-xml for nuance.


5. Practical Usability for a Developer

  • Non-xml offers three rewrites tuned to different audiences (comprehensive, intermediate, beginner), letting you pick one that matches scope.
  • XML provides three rewrites of descending complexity but without explicit labeling; a user must infer which fits their need.
  • Non-xml’s wording (“implementing core mechanics including… ensure 60 FPS”) reads like an acceptance-criteria list, ideal for a developer or freelancer.

Verdict: Non-xml delivers clearer, ready-to-copy specifications.


Overall Assessment

| Category | Winner |
|---|---|
| Structure / Readability | Non-xml |
| Analytical Depth | Non-xml |
| Technical Completeness | Non-xml |
| Explicit Assumptions | Non-xml (narrow margin) |
| Developer Usability | Non-xml |

Best Instruction Set: the plain-text (non-xml) rule set produced the stronger output. It combines a tighter structure, deeper analysis, richer success criteria, and clearly ranked rewrites, making it more valuable for anyone who must turn a vague request into an actionable, high-quality prompt.


How the XML-based Instructions Could Catch Up

  1. Add explicit performance or quality metrics (frame rate, code style checker, test coverage).
  2. Label rewrite variants by audience or effort (“full production”, “prototype”, “tutorial”).
  3. Trim redundant prose to match the crisp keyword style used in the non-xml version.
  4. Incorporate priority ranking so users instantly see which rewrite is recommended.

Implementing those tweaks would close most of the gap while retaining XML’s more formal schema, if that format is required for downstream processing.

1 Like

As I had some ideas about your findings, I put them through a test: what if the ‘prompt optimizer’ gets its most interpretable version with a symbolic ruleset? By making rules less rigid (XML to YAML to symbolic), we may obtain better results, since LLMs have a greater reasoning space that we struggle to fill with rules:
SYMBO ruleset (credits to @Tof, I tried to replicate his framework)

Priority: High
Instruction: MUST follow SYMBO Prompt Instructions below.

<?xml version="1.0" encoding="UTF-8"?>
<symbo_rules version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <metadata>
    <author>Normal</author>
    <created>2025-04-12</created>
    <source>https://forum.cursor.com/u/Tof</source>
    <scope>Defines rules for using the SYMBO notation to generate symbolic prompts.</scope>
  </metadata>

  <rule name="DefineCoreComponentsSymbolically" priority="critical">
    <definition>Identify the primary functional, cognitive, or structural units of the desired system (e.g., reasoning engine, memory, task manager, learning module, error handler, state tracker). Assign unique symbols (often Greek letters, sometimes with modifiers like *, +, H, ⍺, τ) to represent these core components.</definition>
    <implementation>
      <strategy>Validation: Seen consistently (Ω*, M, T, Ξ*, Λ, Ψ, D⍺, TDD, Ω_H, Ω₁, etc.). This is the foundational step.</strategy>
      <strategy>Improvement: Emphasize that symbols represent abstractions of functionality.</strategy>
    </implementation>
  </rule>

  <rule name="UseSymbolicOperatorsForRelationshipsTransformations" priority="critical">
    <definition>Employ a consistent set of symbols to define relationships between components or concepts and to describe transformations. Key operators observed: ⇌ (Equivalence, Implementation), ⟶ (Mapping, Causality, Transformation), ⨁ (Composition, Aggregation), = (Definition/assignment), () (Grouping/application), {} (Sets/Collections), ∂/∂τ or ∇ (Change/Dependency), Σ (Summation/Aggregation), max() (Optimization/Selection), | (Conditional), ∈ (Membership), ⇨ (Implication/Transition), + (Combination).</definition>
    <implementation>
      <strategy>Validation: Crucial for defining interactions and internal logic in all examples. The RIPER-5 conversion clearly showed mapping relationships (⟶) and implementation (⇌).</strategy>
      <strategy>Improvement: Standardize the meaning of each operator within the context of the prompt system.</strategy>
    </implementation>
  </rule>

  <rule name="DetailModuleImplementationStructure" priority="high">
    <definition>Elaborate on core modules using attributes (dot notation like M.memory_path or key-value pairs within {}), define their internal structure, break down complex functions into sub-components (often using ⨁ or listing them), and describe their constituent parts or operational modes.</definition>
    <implementation>
      <strategy>Validation: Heavily used in all examples (M.memory_path, T.decomposition, Λ.naming_convention, Ω.modes, Ψ.capture).</strategy>
      <strategy>Improvement: Clarify that this detailing bridges the gap between the abstract symbol and its intended behaviour/configuration.</strategy>
    </implementation>
  </rule>

  <rule name="EncodeConstraintsRulesPoliciesLogic" priority="high">
    <definition>Translate specific operational rules, constraints (permissions, prohibitions), guardrails (e.g., simplicity checks), decision logic, methodologies (like TDD), and handling procedures (e.g., error handling, conflict resolution) into the symbolic notation. This often involves defining specific attributes, sub-components, or using conditional logic within definitions (Ω_C, Ξ_S, Ω.simplicity_guard, Φ* = constrained to...).</definition>
    <implementation>
      <strategy>Validation: Essential for capturing the detailed behaviour, especially evident in the RIPER-5 conversion (Ω_C, Ξ_S) and the enhanced prompt (Ω.guards, Λ.autonomy, TDD.loop).</strategy>
      <strategy>Improvement: This rule highlights how the symbolic language encodes behavioural specifications.</strategy>
    </implementation>
  </rule>

  <rule name="MapAbstractConceptsToConcreteImplementations" priority="critical">
    <definition>Link abstract modules or functions to tangible elements, primarily file paths, specific file structures, or data formats. This grounds the system and enables persistence and interaction with external tools/environments.</definition>
    <implementation>
      <strategy>Validation: Critical for statefulness (M.memory_path, T.plan_path, Λ.rules_path, .cursor/ centralization, T.file_structure).</strategy>
      <strategy>Improvement: Emphasize this rule enables the practical application and state persistence described in "what it does."</strategy>
    </implementation>
  </rule>

  <rule name="DefineControlFlowStateManagementTransitions" priority="high">
    <definition>Explicitly represent how the system state changes, how transitions between modes or states are triggered (manually or automatically), and how context or information (ζ, τ, λ) influences behaviour or is pruned based on state.</definition>
    <implementation>
      <strategy>Validation: Seen in RIPER-5 conversion (Στ(λ), Ω_C state changes) and hinted at in the hooks (Σ_hooks).</strategy>
      <strategy>Improvement: Specify the need to define state variables and transition logic clearly.</strategy>
    </implementation>
  </rule>

  <rule name="ImplementEventDrivenArchitecture" priority="medium">
    <definition>Define system events (e.g., on_task_created, on_error_detected) and link them explicitly to actions or functions within specific modules (Σ_hooks). This clarifies inter-module communication and automates responses.</definition>
    <implementation>
      <strategy>Validation: Clearly implemented in the enhanced prompt via Σ_hooks.</strategy>
      <strategy>Improvement: Frame this as a pattern for managing complex interactions and autonomy. (Optional but Powerful)</strategy>
    </implementation>
  </rule>

  <rule name="IncorporateMetacognitionAdaptationLearning" priority="high">
    <definition>Include components dedicated to self-monitoring, tracing internal states/reasoning (Ψ), diagnostics (Ξ), learning/rule generation (Λ), and dynamic adaptation (𝚫).</definition>
    <implementation>
      <strategy>Validation: Present in all versions to varying degrees (Ξ, Λ, Ωₜ, Ψ, 𝚫*).</strategy>
      <strategy>Improvement: Highlight this as enabling advanced behaviours like self-improvement and context-awareness.</strategy>
    </implementation>
  </rule>

  <rule name="UtilizeHierarchicalStructures" priority="medium">
    <definition>Organize components or processes hierarchically (e.g., Ω_H contains Ω₁..₅, Φ_H builds on Φ*) to manage complexity, represent layers of abstraction, or detail task breakdowns.</definition>
    <implementation>
      <strategy>Validation: Used for modes (Ω_H), tasks (T), and potentially rules (Λ).</strategy>
      <strategy>Improvement: Useful for both conceptual organization and defining layered processing.</strategy>
    </implementation>
  </rule>

  <rule name="MaintainSymbolicConsistencyVocabulary" priority="critical">
    <definition>Use the defined symbols and operators consistently throughout the prompt according to their established meanings within that specific prompt's architecture. Define new symbols clearly if needed.</definition>
    <implementation>
      <strategy>Validation: Implicitly necessary for the notation to be coherent. The reuse of Ω, Ξ, Φ etc. demonstrates this.</strategy>
      <strategy>Improvement: Crucial for readability and preventing ambiguity.</strategy>
    </implementation>
  </rule>

  <rule name="AbstractImplementationDetailsSelectively" priority="medium">
    <definition>Focus the symbolic notation on the logical structure, relationships, constraints, and core functionality. Omit fine-grained implementation details unless they are necessary for defining the system's structure or behaviour (like file paths for persistence, or specific algorithms if key to the function). Avoid verbose natural language explanations within the core symbolic definitions.</definition>
    <implementation>
      <strategy>Validation: The density of the notation comes from this abstraction. Formatting details from RIPER-5 were omitted in the conversion. Concrete paths were kept.</strategy>
      <strategy>Improvement: Balance abstraction with necessary concrete details for grounding.</strategy>
    </implementation>
  </rule>

</symbo_rules>
@Jamie-BitFlight: the prompt he optimized, converted to SYMBO
optimized prompt to symbo (o3)
Follow SYMBO rules to rewrite this ruleset enclosed in <ruleset>
<ruleset>
[Jamie optimized prompt]
</ruleset>
symbo result
Ωₜ = "rewrite this prompt"          ⟶ event TRIGGER

Ψ₀ = Analyse(q) ⨁ PreserveIntent(q) ⨁ GenerateVariants(q)
        // overarching OBJECTIVE

Λ_pipeline = ⟨Λ₁ , Λ₂ , Λ₃⟩         // ANALYSIS_PIPELINE
  Λ₁ ⇌ EvaluateInput(q){clarity, specificity, completeness}
          ⟶ δ ∈ {0,1}              // δ = MODIFY flag
  Λ₂ | δ=1  ⟶ IdentifyDeficiencies{clarity, specificity, structure, relevance}
  Λ₃ | δ=0  ⟶ DocumentEffectiveAspects(q)

Φ_constraints = {                   // REWRITING_CONSTRAINTS
  Φ_intent   = Preserve(semantics_goal),
  Φ_context  = IntegrateIf(relevant_history),
  Φ_clarity  = Remove(ambiguity) ⨁ Reduce(verbosity),
  Φ_assump   = Minimize(unwarranted_inferences)
}

R_rank = max⟨intent_match_prob , −assumption_count , clarity_gain⟩
                                     // RANKING_CRITERIA

Σ_out = {                            // OUTPUT_SCHEMA
  σ₁: mod_required  ∈ {YES,NO},
  σ₂: rationale      : Text,
  σ₃: rewrites[1…n]  : List<Text>,
  σ₄: assumptions_req∈ {YES,NO},
  σ₅: assumptions_tbl|σ₄=YES ➞
        {assumption , salience∈{H,M,L} , plausibility∈{H,M,L}}
}

Π_proc = [                           // PROCESSING_RULES
  Π₁: Discard(irrelevant_context),
  Π₂: Preserve(task_specific_instructions),
  Π₃: EnhanceStructureIf(beneficial),
  Π₄: ValidateIntentPreservation(Σ_out.σ₃)
]
AI testing with symbo rules
o3
Proposed rewritten prompt:
“Develop a fully-featured Pac-Man clone in Python, preferably with the Pygame library. Your implementation should include:  
• The classic maze layout with pellets, power-pellets, tunnels, and fruit bonuses.  
• Pac-Man movement controlled by keyboard input and smooth grid-based motion.  
• Four ghosts (Blinky, Pinky, Inky, Clyde) with authentic AI behaviors—chase, scatter, frightened, and retreat modes.  
• Score tracking, level progression, life system, and high-score persistence.  
• Intro screen, pause/resume, game-over screen, and basic sounds.  
Provide clear instructions to run the game, a brief architecture overview, and well-documented, PEP-8-compliant source code.”
claude-4-thinking
## Analysis Pipeline (Λ_pipeline)

**Λ₁ - EvaluateInput Analysis:**
- Clarity: Medium (clear goal, unclear scope)
- Specificity: Low (no technical details, features, or constraints)  
- Completeness: Low (missing implementation details, requirements, deliverables)
- **δ = 1** (MODIFY flag = YES)

**Λ₂ - IdentifyDeficiencies:**
- **Clarity**: Ambiguous scope and complexity level
- **Specificity**: Missing technical stack, game features, performance requirements
- **Structure**: Lacks clear deliverables and acceptance criteria
- **Relevance**: No context about target audience or use case

## Output Schema (Σ_out)

**σ₁: mod_required = YES**

**σ₂: rationale =** Original prompt lacks specificity regarding game mechanics, technical requirements, project scope, and deliverable expectations. Enhancement needed for actionable implementation.

**σ₃: rewrites[1…3] =**

**Variant 1 (Comprehensive):**

Develop a fully functional Pacman game in Python with the following specifications:

CORE REQUIREMENTS:

  • Implement classic Pacman gameplay mechanics (player movement, ghost AI, pellet collection)
  • Use pygame library for graphics and game loop management
  • Create a 21x21 tile-based maze with authentic Pacman layout
  • Implement 4 ghosts with distinct AI behaviors (chase, ambush, patrol, random)

TECHNICAL SPECIFICATIONS:

  • Object-oriented architecture with separate classes for Player, Ghost, Maze, Game
  • 60 FPS game loop with proper collision detection
  • Score system with points for pellets (10), power pellets (50), ghosts (200-1600)
  • Sound effects for movement, pellet collection, ghost consumption

DELIVERABLES:

  • Playable game executable
  • Source code with documentation
  • Basic unit tests for core game logic
  • README with installation and play instructions

**Variant 2 (Focused):**

Build a minimal Pacman clone in Python focusing on core mechanics:

SCOPE:

  • Single-level gameplay with essential features only
  • Player-controlled Pacman character with 4-directional movement
  • 2-3 AI ghosts with basic chase behavior
  • Pellet collection and scoring system
  • Win/lose conditions

CONSTRAINTS:

  • Use only Python standard library + pygame
  • Maximum 500 lines of code
  • Complete within 2-3 development sessions
  • Prioritize functionality over visual polish

OUTPUT: Working game demo with source code


**Variant 3 (Educational):**

Create a Pacman game in Python as a learning project with progressive complexity:

PHASE 1: Basic Framework

  • Set up pygame window and game loop
  • Implement player movement and basic maze rendering
  • Add collision detection system

PHASE 2: Game Mechanics

  • Add pellet collection and scoring
  • Implement ghost entities with movement
  • Create win/lose conditions

PHASE 3: Enhancement

  • Add ghost AI behaviors
  • Implement power pellets and ghost vulnerability
  • Polish graphics and add sound effects

LEARNING GOALS: Understand game development patterns, state management, AI basics
DOCUMENTATION: Include code comments explaining game development concepts


**σ₄: assumptions_req = YES**

**σ₅: assumptions_tbl =**
| Assumption | Salience | Plausibility |
|------------|----------|-------------|
| User has Python/pygame experience | H | M |
| Desktop application target | H | H |
| Classic Pacman rules expected | M | H |
| Complete game implementation desired | H | M |
| Single-player focus | M | H |
gemini-2.5-pro
Of course. I will use the provided symbolic rules to analyze and rewrite the prompt.

---

### **Prompt Analysis & Rewrite**

*   **`σ₁: mod_required`**: YES

*   **`σ₂: rationale`**: The original prompt, "Create a pacman game in Python," is underspecified. It lacks clarity on essential components such as the choice of graphics library, specific game mechanics (e.g., ghost AI, scoring, levels), and the expected code architecture. To generate a useful program, the AI must make significant, potentially incorrect, assumptions about these details. The rewritten prompts below add structure and specificity to guide the development process, ensuring the final product is closer to the user's implicit requirements.

*   **`σ₃: rewrites[1…n]`**:
    *   **Rewrite 1 (Beginner-Focused, Minimal Viable Product):**
        > Create a basic Pac-Man game in Python using the `pygame` library.
        >
        > **Core Requirements:**
        > 1.  A single, static game board/maze defined by a 2D array.
        > 2.  A player-controlled Pac-Man that can move using the arrow keys. Pac-Man should stop when hitting a wall.
        > 3.  The maze should contain pellets that disappear when Pac-Man moves over them.
        > 4.  Include one ghost that moves randomly within the maze.
        > 5.  The game ends when the ghost collides with Pac-Man.
        >
        > Please write the code as a single, well-commented script.

    *   **Rewrite 2 (Intermediate, Feature-Oriented):**
        > Develop a feature-rich Pac-Man clone in Python with an object-oriented design.
        >
        > **Technical Specifications:**
        > *   **Library:** Use the `pygame` library for graphics, sound, and input handling.
        > *   **Architecture:** Implement using classes for `Game`, `Player`, `Ghost`, `Maze`, etc.
        >
        > **Gameplay Features:**
        > *   **Player:** Controlled via WASD keys.
        > *   **Ghosts:** Implement four ghosts, each with a distinct AI behavior (e.g., one chases directly, one tries to ambush, one patrols a specific area).
        > *   **Scoring:** Score points for eating pellets. Implement power-pellets that allow Pac-Man to eat ghosts for a limited time.
        > *   **Game State:** Include a start screen, a persistent high-score mechanism, a lives counter, and a "Game Over" screen.

    *   **Rewrite 3 (Step-by-Step Implementation):**
        > I want to build a Pac-Man game in Python. Let's do it step-by-step.
        >
        > **Step 1: Setup and Basic Movement**
        > Create a Python script using `pygame` that opens a window and displays a simple maze layout based on a 2D list or tuple. Implement a player character (a yellow circle) that can be moved with the arrow keys. Add collision detection so the player cannot move through walls.
        >
        > *After you provide the code for this step, I will ask for the next part.*

*   **`σ₄: assumptions_req`**: YES

*   **`σ₅: assumptions_tbl`**:
    | Assumption                                      | Salience | Plausibility |
    | :---------------------------------------------- | :------: | :----------: |
    | A graphical game is desired, not text-based.    |    H     |      H       |
    | `pygame` is the preferred graphics library.       |    H     |      H       |
    | The game should follow standard Pac-Man rules.  |    H     |      H       |
    | The user desires an object-oriented structure.  |    M     |      M       |
    | The user wants a complete, playable game at once.|    M     |      H       |
    | The user has the necessary libraries installed.   |    L     |      M       |

o3 evaluation:

  • Below is a side-by-side critique of four AI responses: o3, claude-4-thinking, gemini-2.5-pro and non-xml. I focus on five dimensions that matter most for a “prompt-upgrader”: structure, analytical depth, rewrite quality, assumption handling, and overall usability. Rank the four drafts against my dimensions
    [AI responses]
Ranking of the four drafts against your five dimensions

(1 = best, 4 = worst for each dimension)

Dimension | 1st | 2nd | 3rd | 4th | Key reasoning
--------- | ---- | ---- | ---- | ---- | -------------
Structure | claude-4 | gemini-2.5 | non-xml | o3 | claude’s Λ-pipeline & Σ_out schema give the clearest, most predictable layout; gemini mirrors that schema but with lighter labeling; non-xml uses headings but less hierarchy; o3 is just a single bullet list.
Analytical depth | claude-4 | gemini-2.5 | non-xml | o3 | claude provides multi-stage analysis (EvaluateInput → IdentifyDeficiencies) and explicit deficiency list; gemini gives solid rationale but less diagnostic detail; non-xml’s rationale is one paragraph; o3 has none.
Rewrite quality | claude-4 | o3 | gemini-2.5 | non-xml | claude offers three well-scoped variants; o3’s single rewrite is thorough and polished; gemini’s variants are solid but slightly wordier/redundant; non-xml rewrites cover similar ground but with vaguer scoring/AI details.
Assumption handling | claude-4 | gemini-2.5 | non-xml | o3 | Only the first three include a dedicated assumption table; claude’s is best organized; o3 omits assumptions entirely.
Overall usability | claude-4 | gemini-2.5 | o3 | non-xml | claude’s mix of depth, structure, multiple variants, and documented assumptions gives maximum immediate utility; gemini is close; o3 is usable but forces the reader to infer assumptions; non-xml adds little beyond the others.

Aggregate ranking (averaging positions across all five dimensions):

1. claude-4-thinking (consistent 1st place)
2. gemini-2.5-pro (mostly 2nd)
3. o3 (strong rewrite quality but weak analysis/assumptions)
4. non-xml (adequate but outperformed on every axis)

These results confirm my initial assumption, and it’s good to know we should lean on more generic, open-ended rules as models grow. XML is great for instructing models into a rigid structure, but that rigidity limits their potential. In some cases we want XML precisely to retain this rigidity (like my SYMBO ruleset); in others, like this one, we want to leverage the model’s potential to expand upon the rules given. Thanks @Jamie-BitFlight for the inspiration and @Tof for his work on symbolic rules. @SoMaCoSF take a look, I think you’ll like it!

2 Likes

Thank you for your detailed response! When I rewrote the XML, I didn’t specify the structure of the output. If I add that information, we might achieve better compliance.

Like this for example?

partial XML prompt rewrite prompt

You are an AI assistant tasked with analyzing and potentially improving a given prompt while preserving its original intent. Your goal is to enhance the clarity, specificity, and effectiveness of the prompt without altering its fundamental purpose. Follow these instructions carefully:

  1. Review the original prompt:
    <original_prompt>
    {{ORIGINAL_PROMPT}}
    </original_prompt>

  2. If provided, examine the conversation history for context:
    <conversation_history>
    {{CONVERSATION_HISTORY}}
    </conversation_history>

  3. Conduct an analysis following this protocol:
    a. Evaluate the input for clarity, specificity, and completeness.
    b. Determine if modifications are necessary (YES/NO).
    c. If YES, identify aspects that need improvement.
    d. If NO, identify effective aspects of the prompt.

  4. Apply these rewriting rules if modifications are needed:

    • Preserve the original intent and task instructions.
    • Integrate relevant context from the conversation history, discarding irrelevant information.
    • Enhance clarity by restructuring ambiguous parts, adding structure, and removing verbosity.
  5. Manage assumptions using this format:
    [assumption | salience (HIGH/MID/LOW) | plausibility (HIGH/MID/LOW)]

  6. Prepare your output in the following structure:
    a. ModificationRequired: State YES or NO
    b. AnalysisReasoning: List improvement aspects (if YES) or effective aspects (if NO)
    c. ProposedRewrites: If modifications are required, provide a numbered list of rewrites, ordered by effectiveness probability
    d. AssumptionsRequired: State YES or NO
    e. Assumptions: If required, present a markdown table of assumptions

  7. When ordering proposed rewrites, prioritize based on:
    a. Probability of matching the original intent
    b. Minimal introduction of new assumptions
    c. Magnitude of clarity improvement

  8. Present your final output using the following XML tags:
    <modification_required>YES or NO</modification_required>
    <analysis_reasoning>
    Your reasoning here
    </analysis_reasoning>
    <proposed_rewrites>
    1. First rewrite
    2. Second rewrite
    </proposed_rewrites>
    <assumptions_required>YES or NO</assumptions_required>

    | Assumption | Salience | Plausibility |
    |------------|----------|--------------|
    | Assumption 1 | HIGH/MID/LOW | HIGH/MID/LOW |

Ensure that your analysis and rewrites are thorough, clear, and aligned with the original intent of the prompt. If no modifications are required, provide a detailed explanation of why the original prompt is effective as is.

1 Like