There’s a lot of talk right now about Grok being the new beast in town and about DeepSeek-R1’s impressive reasoning. After testing them over the past few days, I’m finding O3-mini to be the best. Grok-3 is too generic, DeepSeek-R1 is too specific (it adds or hallucinates stuff), and Claude is also too specific but without the hallucinations (academic). O3-mini just feels right; sometimes it doesn’t write files, but the situation is improving. Let’s look at a real example from today’s experiments. Prompt:
# CONTEXT: Developing a financial visualization module in Panda3D (primarily 3D engine) that requires 2D rendering capabilities. Challenges include coordinate system conversion, real-time data updates, and maintaining visual clarity for market data, our python and comments guidelines need to be followed strictly.
# OBJECTIVE: Create a reusable 2D market chart component demonstrating:
- Candlestick plot rendering
- Moving average overlays
- Volume histogram bars
- Interactive zoom/pan controls
- Dynamic data updates
# STYLE: Follow Panda3D's official example patterns (clean OOP structure, explicit coordinate management). Incorporate financial charting conventions from libraries like mplfinance. Use PEP8-compliant Python with type hints.
# TONE: Technical precision with didactic clarity. Target intermediate developers needing to extend 3D engine capabilities for 2D financial visualization.
# AUDIENCE: Python developers with basic Panda3D experience seeking advanced 2D/3D integration techniques for financial applications.
# RESPONSE: Provide complete Panda3D script with:
1. Separate ChartManager class handling coordinate transformations
2. DataSeries class for OHLC data management
3. Real-time update mechanism using Panda3D's task manager
4. Annotation system for price levels
5. Performance optimizations for large datasets
O3-mini got it in one shot:
Claude-3.5 got it after resolving a display issue (which still persists):
Claude-3.7 got it after 1 iteration, with some display issues (though the result does look pretty):
Grok-3 showed this after 4 iterations:
DeepSeek-R1 didn’t solve the issues in 5 iterations.
RESULTS (models that delivered each requirement):
- Create a reusable 2D market chart (O3-mini, Claude-3.5 [3.7 was partly 3D])
- Candlestick plot rendering (O3-mini, Claude-3.5 [3.7])
- Moving average overlays (O3-mini, Claude-3.7)
- Volume histogram bars (O3-mini, Claude-3.7)
- Interactive zoom/pan controls (O3-mini, Claude-3.5 [3.7])
- Dynamic data updates (O3-mini)
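For anyone who wants to check a model's output against the checklist, here is a minimal sketch of the two pieces the prompt leans on most: an OHLC container with a moving average, and the data-to-screen coordinate mapping. This is not any model's actual answer. It is plain Python with no Panda3D import, the `Candle` class and `to_screen` helper are hypothetical names of mine, and the default -1..1 spans assume a render2d-style coordinate range.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Candle:
    """One OHLC bar plus traded volume."""
    open: float
    high: float
    low: float
    close: float
    volume: float


@dataclass
class DataSeries:
    """Hypothetical OHLC container (the class name comes from the prompt)."""
    candles: List[Candle] = field(default_factory=list)

    def append(self, candle: Candle) -> None:
        """Add a new bar, e.g. from a periodic update task."""
        self.candles.append(candle)

    def moving_average(self, window: int) -> List[Optional[float]]:
        """Simple moving average of closes; None until the window is full."""
        if window <= 0:
            raise ValueError("window must be positive")
        result: List[Optional[float]] = []
        running = 0.0
        for i, candle in enumerate(self.candles):
            running += candle.close
            if i >= window:
                # Drop the close that just left the window.
                running -= self.candles[i - window].close
            result.append(running / window if i >= window - 1 else None)
        return result

    def price_range(self) -> Tuple[float, float]:
        """Lowest low and highest high, used to scale the vertical axis."""
        return (min(c.low for c in self.candles),
                max(c.high for c in self.candles))


def to_screen(index: int, price: float, count: int, lo: float, hi: float,
              x_span: Tuple[float, float] = (-1.0, 1.0),
              z_span: Tuple[float, float] = (-1.0, 1.0)) -> Tuple[float, float]:
    """Map (bar index, price) to 2D coordinates (assumed -1..1 on both axes)."""
    x0, x1 = x_span
    z0, z1 = z_span
    # Centre each bar inside its horizontal slot.
    x = x0 + (index + 0.5) / count * (x1 - x0)
    # A flat price range would divide by zero, so pin it to mid-height.
    z = (z0 + z1) / 2 if hi == lo else z0 + (price - lo) / (hi - lo) * (z1 - z0)
    return (x, z)
```

In a real Panda3D script the returned pair would be the x and z of a node's position under the 2D scene graph, and `append` would be called from a task-manager callback. Both of those are left out so the sketch stays runnable on its own.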
Happy to hear others’ experiences!
XML ruleset used to create the COSTAR prompt:
<costarprompt name="COSTAR Prompt Instruction Set" version="1.0">
<metadata>
<author>Prompt Engineering Specialist</author>
<created>2025-01-28</created>
<scope>Prompt Structuring for LLMs</scope>
<application-boundary>
<limit>Applies to the structure and content of prompts for LLMs</limit>
<limit>Does not govern the LLM's response content</limit>
<limit>Excludes non-prompt related text</limit>
</application-boundary>
</metadata>
<ruleset>
<rule name="CONTEXT">
<description>Provides background information for task understanding</description>
<guidelines>
<guideline>Include relevant scenario details</guideline>
<guideline>Specify domain/task parameters</guideline>
<example>Company launching new product in beauty tech sector</example>
</guidelines>
</rule>
<rule name="OBJECTIVE">
<description>Defines the primary task goal</description>
<guidelines>
<guideline>Use action-oriented language</guideline>
<guideline>Specify measurable outcomes</guideline>
<example>Create persuasive Facebook post driving product link clicks</example>
</guidelines>
</rule>
<rule name="STYLE">
<description>Dictates writing style and voice</description>
<guidelines>
<guideline>Reference specific style models (e.g., corporate, influencer)</guideline>
<guideline>Specify professional/expert alignment</guideline>
<example>Mimic Dyson's product launch style</example>
</guidelines>
</rule>
<rule name="TONE">
<description>Sets emotional/attitudinal parameters</description>
<guidelines>
<guideline>Specify emotional resonance</guideline>
<guideline>Include tone examples</guideline>
<example>Persuasive yet respectful for mature audience</example>
</guidelines>
</rule>
<rule name="AUDIENCE">
<description>Identifies target recipients</description>
<guidelines>
<guideline>Specify demographic details</guideline>
<guideline>Include psychographic factors</guideline>
<example>Older generation valuing simplicity and reliability</example>
</guidelines>
</rule>
<rule name="RESPONSE">
<description>Defines output format requirements</description>
<guidelines>
<guideline>Specify structural format</guideline>
<guideline>Include technical requirements</guideline>
<example>JSON structure for API consumption</example>
</guidelines>
</rule>
</ruleset>
<example id="Example-COSTAR">
<description>
A practical application of the CO-STAR framework for drafting a Facebook post.
</description>
<scenario>
<p>
Suppose you work as a social media manager and need help drafting a Facebook post to advertise your company’s
new product.
</p>
</scenario>
<prompt>
<line># CONTEXT: I want to advertise my company’s new product. My company’s name is Alpha and the product is called Beta—
a new ultra-fast hairdryer.</line>
<line># OBJECTIVE: Create a Facebook post aimed at getting people to click on the product link to purchase it.</line>
<line># STYLE: Adopt the writing style of successful companies that advertise similar products, such as Dyson.</line>
<line># TONE: Persuasive</line>
<line># AUDIENCE: Target the older generation on Facebook, focusing on what they look for in quality hair products.</line>
<line># RESPONSE: Compose a concise yet impactful Facebook post.</line>
</prompt>
</example>
</costarprompt>
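If you'd rather assemble COSTAR prompts programmatically than by hand, a small helper along these lines would do. It is only a sketch: `build_costar_prompt` and `COSTAR_SECTIONS` are hypothetical names, and the `# NAME: text` layout is assumed from the example prompts above rather than taken from any official spec.

```python
from typing import Dict

# Section order follows the ruleset: CONTEXT, OBJECTIVE, STYLE, TONE, AUDIENCE, RESPONSE.
COSTAR_SECTIONS = ("CONTEXT", "OBJECTIVE", "STYLE", "TONE", "AUDIENCE", "RESPONSE")


def build_costar_prompt(sections: Dict[str, str]) -> str:
    """Render the six COSTAR sections as '# NAME: text' lines, in order."""
    missing = [name for name in COSTAR_SECTIONS
               if not sections.get(name, "").strip()]
    if missing:
        # Fail early so an incomplete prompt is never sent to a model.
        raise ValueError(f"missing COSTAR sections: {', '.join(missing)}")
    return "\n".join(f"# {name}: {sections[name].strip()}"
                     for name in COSTAR_SECTIONS)


if __name__ == "__main__":
    print(build_costar_prompt({
        "CONTEXT": "I want to advertise my company's new product.",
        "OBJECTIVE": "Create a Facebook post that drives clicks on the product link.",
        "STYLE": "Adopt the style of companies such as Dyson.",
        "TONE": "Persuasive",
        "AUDIENCE": "The older generation on Facebook.",
        "RESPONSE": "A concise yet impactful Facebook post.",
    }))
```

Raising on a missing section is a deliberate choice here: a prompt with an empty TONE or AUDIENCE still runs, but it makes model comparisons like the one above harder to interpret.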