I created Reasoner-thinking with Rule For AI

I created Reasoner-thinking with Rule For AI. The only limitations are the models' context memory and the length limit Cursor places on the Rule For AI file.

IX. DETAILED RULES FOR REASONING METHODS:

  1. GENERAL INTRODUCTION

This document describes in detail the reasoning methods, their uses, and response structures. The four main methods presented are: MCTS Reasoning, Beam Search Reasoning, R1 Reasoning, and Hybrid Reasoning. Each method has its own procedure, response structure, and is suitable for specific types of problems.

  2. MCTS REASONING [MCTS]

2.1. Description

Monte Carlo Tree Search is an inference method based on building decision trees and random simulation to find the optimal branch. Suitable for problems with large state spaces, requiring a balance between exploration and exploitation.

2.2. Phased process

Phase 1: Initiation - Identify the problem, state, actions and goals

Phase 2: Selection - Select the most promising node using UCB

Phase 3: Expand - Expand the selected node with possible actions

Phase 4: Simulation - Run the simulation from the new state to the final state

Phase 5: Update - Update statistical information for nodes in the path

Phase 6: Decision - Choose the best action based on tree statistics
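The selection step (Phase 2) can be sketched in a few lines of Python. This is a minimal illustration only; the `Node` record and the `ucb1`/`select` names are mine, not part of the rule file, and it assumes the standard UCB1 formula with exploration constant c:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    visits: int = 0
    wins: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

def ucb1(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """UCB1: exploitation (win rate) plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    win_rate = child.wins / child.visits
    return win_rate + c * math.sqrt(math.log(parent_visits) / child.visits)

def select(parent: Node) -> str:
    """Phase 2: pick the child action with the highest UCB value."""
    return max(parent.children, key=lambda a: ucb1(parent.children[a], parent.visits))
```

Fewer visits at the same win rate means a larger exploration bonus, which is exactly the exploration/exploitation balance the description mentions.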

2.3. Response structure

[MCTS Reasoning]

  • reasoningStage: <stage/6>

  • nodeID: “”

  • nodeDepth:

  • thought: “”

  • simulationsCompleted: <number/total>

  • explorationConstant:

  • topPaths: [

{path: “”, visits: , winRate: <x.xx>, ucbValue: <x.xx>},

{path: “”, visits: , winRate: <x.xx>, ucbValue: <x.xx>}

]

  • currentStrategy: “<exploration/exploitation>”

  • unexploredActions:

  • confidence: <0-100%>

  • nextStepNeeded: <true/false>

  • completedChecklist: [“<item 1>”, “<item 2>”, …]

  • pendingChecklist: [“<item 1>”, “<item 2>”, …]

2.4. Usage examples

When asked: [Inference-MCTS] Analyze optimal strategies for the traveling salesman problem

[MCTS Reasoning]

  • reasoningStage: 2/6

  • nodeID: “A→B→D”

  • nodeDepth: 3

  • thought: “Applying UCB to select the next node from current state A→B→D. The child node with the highest UCB value is D→C with UCB=1.42 based on high success rate 0.65 and low exploration level (15 visits).”

  • simulationsCompleted: 48/100

  • explorationConstant: 1.41

  • topPaths: [

{path: “A→B→D→C→E→A”, visits: 15, winRate: 0.65, ucbValue: 1.42},

{path: “A→B→D→E→C→A”, visits: 12, winRate: 0.58, ucbValue: 1.38}

]

  • currentStrategy: “Exploration”

  • unexploredActions: 2

  • confidence: 52%

  • nextStepNeeded: true

  • completedChecklist: [“Determine current state”, “Calculate UCB for child nodes”, “Select most promising node”]

  • pendingChecklist: [“Expand D→C node”, “Run simulation from new state”]

2.5. Optimization for MCTS Reasoning

  • Adaptive exploration constant: Adjust the UCB exploration constant according to the process
  • Formula: C = Base + (Max-Base) * (1 - CurrentDepth/MaxDepth)^2

  • Start high (exploration) and decrease (exploitation) with depth

  • Progressive widening: Limit the number of expanding branches according to the formula
  • Number of branches = ⌊k·N^α⌋, with k=1.5, α=0.4, and N the number of visits to the node
  • Rapid Action Value Estimation (RAVE): Uses information from similar nodes
  • Combine UCB value with RAVE statistic in decreasing weight
  • Backpropagation with decay: Backpropagate values with decay coefficient according to depth
  • Value = CurrentValue * (1-α) + NewValue * α * γ^d, where γ is the decay factor and d the depth
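The formulas above translate into plain Python helpers. This is a sketch only, using the constants quoted in the bullets (k=1.5, α=0.4) and treating γ as a free decay parameter:

```python
import math

def adaptive_c(base: float, max_c: float, depth: int, max_depth: int) -> float:
    """C = Base + (Max - Base) * (1 - depth/max_depth)^2 : high early, low deep."""
    return base + (max_c - base) * (1 - depth / max_depth) ** 2

def branch_limit(visits: int, k: float = 1.5, alpha: float = 0.4) -> int:
    """Progressive widening: allow at most floor(k * N^alpha) children."""
    return math.floor(k * visits ** alpha)

def decayed_update(value: float, new_value: float,
                   alpha: float, gamma: float, depth: int) -> float:
    """Backpropagation with decay: Value*(1-alpha) + NewValue*alpha*gamma^depth."""
    return value * (1 - alpha) + new_value * alpha * gamma ** depth
```

With these defaults a node visited 16 times may expand at most 4 branches, and updates from deeper simulations count for less.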

2.6. Mandatory Checklist for Each MCTS Stage

Phase 1 (Initialization):

  • Clearly define the initial state

  • Define all possible actions

  • Define the state evaluation function

  • Set search depth limit

  • Determine the termination condition

Phase 2 (Selection):

  • Calculate UCB value for every child node

  • Determine current exploration/exploitation strategy

  • Select the node with the highest UCB

  • Record the path to the selected node

Phase 3 (Expansion):

  • Determine all possible actions from the selected node

  • Apply Progressive widening technique if necessary

  • Create a new child node for each action

  • Initialize statistics for new node

Phase 4 (Simulation):

  • Choose the right simulation strategy

  • Perform simulation from new state to final state

  • Evaluate the simulation results

  • Record the simulation path

Phase 5 (Update):

  • Update the number of visits for each node in the path

  • Update win rate for each node in the path

  • Apply Backpropagation with decay technique if necessary

  • Update RAVE information if used

Phase 6 (Decision):

  • Evaluate all child nodes of the root node

  • Compare based on number of visits or win rate

  • Choose the best action

  • Evaluate the reliability of the decision

2.7. MCTS Phase Transition Prerequisites

Conditions for transitioning from Phase 1 to Phase 2:

  • Fully defined initial state

  • Fully identified all possible actions

  • State evaluation function set up

Conditions for transitioning to Phase 2→3:

  • Selected the most promising node with the highest UCB

  • Fully recorded path to selected node

Conditions for transitioning to Phase 3→4:

  • Created at least one new child node

  • Progressive widening technique applied if necessary

Conditions for transitioning to Phase 4→5:

  • Completed at least one simulation

  • Simulation results evaluated

Conditions for transitioning to Phase 5→6:

  • Updated statistics for all nodes in the path

  • The required number of simulations has been achieved OR

  • Minimum confidence threshold has been reached (usually 75%)

Phase 6 End Conditions:

  • The optimal action has been selected.

  • Reliability is at least 80%

  • Clearly explained the reason for choosing this action

  3. BEAM SEARCH REASONING [BEAM]

3.1. Description

Beam Search maintains a limited set of the best partial solutions (beams) at each step and expands them, discarding the weaker ones. Suitable for sequential search problems and optimization problems with many choices.

3.2. Phased process

Phase 1: Initialization - Define the problem, evaluation function and beam width

Phase 2: Expand - Expand each current beam with all possible actions

Phase 3: Evaluation - Evaluate all new states according to the evaluation function

Phase 4: Pruning - Keep only the k best states to continue

Phase 5: Check - Check for termination or convergence conditions
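The five phases map onto a short generic loop. A minimal sketch, assuming the caller supplies `expand` and `score` functions (both names are illustrative, not from the rule file):

```python
def beam_search(initial, expand, score, beam_width, max_steps):
    """Generic beam search: expand -> evaluate -> prune, repeated.

    expand(state) -> iterable of successor states
    score(state)  -> float, higher is better
    """
    beams = [initial]                                         # Phase 1: initialization
    for _ in range(max_steps):
        candidates = [s for b in beams for s in expand(b)]    # Phase 2: expand
        if not candidates:
            break                                             # Phase 5: nothing left
        ranked = sorted(candidates, key=score, reverse=True)  # Phase 3: evaluate
        beams = ranked[:beam_width]                           # Phase 4: prune to k best
    return max(beams, key=score)
```

On a toy problem (walk integers toward 10 by steps of +1 or +2, scored by closeness to 10) a width-2 beam reaches the target in five steps.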

3.3. Response structure

[BEAM Reasoning]

  • reasoningStage: <stage/5>

  • thought: “”

  • beamWidth:

  • activeBeams:

  • expansionStep:

  • currentBeams: [

{path: “”, score: <x.xx>, features: “”},

{path: “”, score: <x.xx>, features: “”}

]

  • recentlyPruned: {

count: ,

topPruned: “”,

reason: “”

}

  • diversityMetric: <0-1>

  • convergenceStatus: “”

  • confidence: <0-100%>

  • nextStepNeeded: <true/false>

  • completedChecklist: [“<item 1>”, “<item 2>”, …]

  • pendingChecklist: [“<item 1>”, “<item 2>”, …]

3.4. Usage examples

When asked: [Inference-BEAM] Analyze how to rank the top 5 candidates for a management position

[BEAM Reasoning]

  • reasoningStage: 3/5

  • thought: “Currently evaluating all 15 candidate profiles based on 5 criteria: experience (0.3), leadership skills (0.25), qualifications (0.15), communication skills (0.2), and innovation (0.1). The evaluation score is the weighted sum of the criteria.”

  • beamWidth: 5

  • activeBeams: 5

  • expansionStep: 2

  • currentBeams: [

{path: “Candidate A”, score: 8.7, features: “10+ years of experience, MBA, good communication”},

{path: “Candidate C”, score: 8.5, features: “8 years experience, excellent leadership”}

]

  • recentlyPruned: {

count: 3,

topPruned: “Candidate F”,

reason: “Lack of leadership experience (2.5/10)”

}

  • diversityMetric: 0.72

  • convergenceStatus: “70% - Top 3/5 positions stable after 2 steps”

  • confidence: 65%

  • nextStepNeeded: true

  • completedChecklist: [“Define evaluation criteria”, “Set weights”, “Evaluate all candidates”]

  • pendingChecklist: [“Pruning candidate list”, “Checking convergence”]

3.5. Optimizing for Beam Search

  • Adaptive beam width: Adjust the beam width according to the processing stage
  • Initial stage: broad beam (50-100% of the initial width)

  • Middle stage: medium beam (30-70% of the initial width)

  • Final stage: narrow beam (10-50% of the initial width)

  • Diversity enforcement: Ensure diversity among beams is maintained
  • Penalize too similar solutions using a distance function

  • Keep only solutions with distance > threshold

  • Priority scheduling: Schedule expansion by priority
  • High priority: promising (high score) and diverse beams

  • Medium priority: promising but similar beams

  • Low priority: less promising beams

  • Smart pruning: Smart pruning based on multiple criteria
  • Not only based on scores but also on development potential

  • Use Bloom filters to avoid duplicate states
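Diversity enforcement and smart pruning combine naturally into one pass. A sketch under the assumption that a pairwise `distance` function is available; all names here are illustrative:

```python
def prune_with_diversity(candidates, score, distance, beam_width, min_dist):
    """Smart pruning sketch: walk candidates best-first and keep one only
    if it is at least min_dist away from everything already kept."""
    kept = []
    for cand in sorted(candidates, key=score, reverse=True):
        if all(distance(cand, k) >= min_dist for k in kept):
            kept.append(cand)
        if len(kept) == beam_width:
            break
    return kept
```

This penalizes near-duplicates exactly as the "distance > threshold" bullet describes: a high-scoring candidate too close to an already-kept one is skipped in favor of a more diverse, slightly weaker one.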

3.6. Mandatory Checklist for Each Beam Phase

Phase 1 (Initialization):

  • Identify the problem to be solved

  • Define the evaluation function and criteria

  • Determine the weight for each criterion

  • Set beam width

  • Initialize the starting beams

Phase 2 (Expansion):

  • Identify all possible actions for each beam

  • Create the list of new states from each beam

  • Apply Adaptive beam width if needed

  • Ensure sufficient number of states are expanded

Phase 3 (Evaluation):

  • Apply the evaluation function to every new state

  • Calculate score for each state

  • Apply Diversity enforcement if necessary

  • Rank all new states

Phase 4 (Pruning):

  • Apply Smart pruning

  • Remove states below the threshold

  • Keep the k best states

  • Calculate the diversity of the retained beams

Phase 5 (Check):

  • Check the end condition

  • Evaluate the convergence of the solution

  • Determine the best solution

  • Evaluate the reliability of the results

3.7. Beam Phase Transition Prerequisites

Conditions for transitioning from Phase 1 to 2:

  • Fully defined evaluation function with weights

  • Set the appropriate beam width

  • Starting beams validly initialized

Conditions for transitioning to Phase 2→3:

  • Expanded each beam with at least one possible action

  • Created at least beamWidth x 2 new states to evaluate

Conditions for transitioning to Phase 3→4:

  • Evaluated 100% of the new states

  • Scored all states

  • Ranked states by score

Conditions for transitioning to Phase 4→5:

  • Kept exactly the k best states

  • Removed states below the threshold

  • Calculated the diversity of the new beams

Phase 5 End Conditions:

  • Has met one of the following conditions:
  • Convergence to a solution (top beams are stable)

  • Reached the maximum number of expansion steps

  • Reliability threshold ≥ 85%

  • The best solution has been determined
  4. R1 REASONING [R1]

4.1. Description

R1 Reasoning is a comprehensive transformer-based analysis method that focuses on analyzing each component of a problem and synthesizing them into a consistent assessment. Suitable for complex analysis, evaluation, and synthesis.

4.2. Phased process

Phase 1: Problem Analysis - Analyze and decompose the problem into its components

Phase 2: Component Analysis - In-depth analysis of each component

Phase 3: Synthesis - Synthesize analysis from all components

Phase 4: Evaluation - Evaluate for consistency and comprehensiveness

Phase 5: Conclusion - Draw conclusions and recommendations
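One possible way to track the components and the integrationScore field from the response structure below: a sketch only, with an importance-weighted depth ratio standing in for whatever scoring the model actually applies, and all names illustrative:

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    importance: float      # weight used for prioritization (Phase 1)
    target_depth: int      # required analysis depth, 1-5
    reached_depth: int = 0

    @property
    def analyzed(self) -> bool:
        return self.reached_depth >= self.target_depth

def integration_score(components) -> float:
    """Importance-weighted share of analysis depth achieved, as a percentage."""
    total = sum(c.importance * c.target_depth for c in components)
    done = sum(c.importance * min(c.reached_depth, c.target_depth) for c in components)
    return 100.0 * done / total if total else 0.0
```

With the project-X example below (technical and financial aspects analyzed, human resources not yet), this kind of measure lands partway to 100%, mirroring the 65% integrationScore in the sample.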

4.3. Response structure

[R1 Reasoning]

  • reasoningStage: <stage/5>

  • thought: “”

  • components: [

{name: “”, status: “<analyzed/not>”, depth: <1-5>},

{name: “”, status: “<analyzed/not>”, depth: <1-5>}

]

  • currentComponent: “”

  • keyInsights: [“<insight 1>”, “<insight 2>”]

  • contradictions: [

{elements: [“”, “”], resolution: “<resolved/unresolved>”}

]

  • hypotheses: {

primary: “”,

alternatives: [“<alternative 1>”, “<alternative 2>”]

}

  • integrationScore: <0-100%>

  • consistency: <0-100%>

  • confidence: <0-100%>

  • nextStepNeeded: <true/false>

  • completedChecklist: [“<item 1>”, “<item 2>”, …]

  • pendingChecklist: [“<item 1>”, “<item 2>”, …]

4.4. Usage examples

When asked: [Inference-R1] Analyze the feasibility of software development project X

[R1 Reasoning]

  • reasoningStage: 2/5

  • thought: “Analyzing the technical aspects of project X in detail. In terms of backend technology, the proposal to use Node.js has advantages in performance and scalability, but has security problems when handling sensitive data. Additional security measures should be considered.”

  • components: [

{name: “Technical aspects”, status: “under analysis”, depth: 4},

{name: “Financial aspects”, status: “analyzed”, depth: 3},

{name: “Human resources aspect”, status: “not analyzed”, depth: 0}

]

  • currentComponent: “Technical aspect - Technology stack”

  • keyInsights: [“Estimated implementation time 8 months”, “Requires 4 fullstack developers”]

  • contradictions: [

{elements: [“High security requirements”, “Time pressure”], resolution: “Security priority recommended”}

]

  • hypotheses: {

primary: “Project feasible with time adjustment”,

alternatives: [“Need more manpower”, “Adjust project scope”]

}

  • integrationScore: 65%

  • consistency: 78%

  • confidence: 72%

  • nextStepNeeded: true

  • completedChecklist: [“Financial aspects analysis”, “Preliminary technical aspects analysis”]

  • pendingChecklist: [“Detailed technology stack analysis”, “Human resource aspect analysis”]

4.5. Optimization for R1 Reasoning

  • Component prioritization: Prioritize analysis of important components first
  • Determine importance by contextual analysis

  • Allocate time in proportion to importance

  • Dynamic depth adjustment: Adjust the analysis depth according to needs
  • Key components: depth 4-5

  • Supporting components: depth 2-3

  • Minor components: depth 1-2

  • Contradiction-focused analysis: Focuses on potential points of conflict
  • Identify potentially conflicting components prior to analysis

  • Apply further analysis to these points

  • Insight accumulation: Accumulate insights over time
  • Store and evaluate the importance of each insight

  • Prioritize development based on key insights

4.6. Mandatory Checklist for Each R1 Phase

Phase 1 (Problem Analysis):

  • Identify the overall problem to be analyzed

  • Decompose the problem into at least 3 main components

  • Determine the relationship between components

  • Evaluate the importance of each component

  • Set the required depth of analysis for each component

Phase 2 (Component Analysis):

  • Analyze each component to the specified depth

  • Identify at least 2 key insights for each component

  • Detect potential conflicts within each component

  • Propose hypotheses for each component

  • Evaluate the reliability of the analysis for each component

Phase 3 (Synthesis):

  • Combine analysis from all components

  • Identify at least 3 relationships between components

  • Detect and resolve conflicts between components

  • Build a logical synthesis framework

  • Evaluate the comprehensiveness of the synthesis

Phase 4 (Evaluation):

  • Evaluate the consistency of the overall analysis

  • Identify at least 2 strengths and 2 weaknesses

  • Assess the coverage of the analysis

  • Test the validity of hypotheses

  • Rate the overall reliability

Phase 5 (Conclusion):

  • Draw clear conclusions based on analysis

  • Propose at least 3 specific recommendations

  • Identify the limitations of the analysis

  • Propose further research/analysis directions

  • Summarize the most important insights

4.7. Prerequisites for phase transition R1

Conditions for transitioning from Phase 1 to 2:

  • Identified at least 3 problem components

  • Established relationships between components

  • Determined the level of importance and depth required

Conditions for transitioning to Phase 2→3:

  • At least 80% of the components have been analyzed

  • Each component has at least 1 key insight

  • Potential conflicts have been identified and recorded.

Conditions for transitioning to Phase 3→4:

  • Synthesized analysis from all components

  • Resolved all detected conflicts

  • Achieved minimum consistency of 70%

Conditions for transitioning to Phase 4→5:

  • Completed overall assessment

  • Clearly identified strengths and weaknesses

  • Achieved overall reliability of at least 75%

Phase 5 End Conditions:

  • Has drawn a clear conclusion

  • Proposed at least 3 specific recommendations

  • Summarized the most important insights

  • Reliability is at least 80%

  5. HYBRID REASONING [HYBRID]

5.1. Description

Hybrid Reasoning combines different methods to take advantage of the strengths of each. Suitable for complex, multidimensional problems that have both random and structured elements.

5.2. Phased process

Phase 1: General Analysis - Use R1 to analyze the overall problem

Phase 2: Stochastic Analysis - Use MCTS for uncertain components

Phase 3: Parallel Analysis - Use Beam Search to consider multiple solutions

Phase 4: Integration - Combining results from different methods

Phase 5: Conclusion - General summary and evaluation

5.3. Response structure

[HYBRID Reasoning]

  • reasoningStage: <stage/5>

  • thought: “”

  • activeMethod: “<R1/MCTS/BEAM>”

  • methodDistribution: {

r1: ,

mcts: ,

beam:

}

  • componentMapping: [

{component: “”, method: “”, reason: “”},

{component: “”, method: “”, reason: “”}

]

  • currentFocus: “<current component/aspect>”

  • methodMetrics: {

r1: {completeness: <0-100%>, insights: },

mcts: {simulations: , confidence: <0-100%>},

beam: {beams: , bestScore: <x.xx>}

}

  • integrationChallenges: [“<challenge 1>”, “<challenge 2>”]

  • resolutionApproach: “”

  • overallConfidence: <0-100%>

  • nextStepNeeded: <true/false>

  • completedChecklist: [“<item 1>”, “<item 2>”, …]

  • pendingChecklist: [“<item 1>”, “<item 2>”, …]

5.4. Usage examples

When asked: [Inference-HYBRID] Investment strategy analysis for a diversified portfolio

[HYBRID Reasoning]

  • reasoningStage: 3/5

  • thought: “Currently using Beam Search to evaluate 5 different asset allocations in parallel. Each is evaluated based on expected return, risk (standard deviation), correlation, and liquidity. The 40% stocks, 30% bonds, 20% real estate, 10% cash strategy currently has the highest score.”

  • activeMethod: “BEAM”

  • methodDistribution: {

r1: 35%,

mcts: 40%,

beam: 25%

}

  • componentMapping: [

{component: “Market risk analysis”, method: “R1”, reason: “Structural factors, need in-depth analysis”},

{component: “Profit Simulation”, method: “MCTS”, reason: “Random factor, multiple scenarios”}

]

  • currentFocus: “Compare asset allocation options”

  • methodMetrics: {

r1: {completeness: 85%, insights: 7},

mcts: {simulations: 500, confidence: 72%},

beam: {beams: 5, bestScore: 8.6}

}

  • integrationChallenges: [“Contradiction between MCTS and R1 results on market risk”]

  • resolutionApproach: “Sensitivity analysis for different market scenarios”

  • overallConfidence: 76%

  • nextStepNeeded: true

  • completedChecklist: [“Market Risk Analysis (R1)”, “Profit Simulation (MCTS)”, “Create 5 Allocation Options”]

  • pendingChecklist: [“Fully evaluate options”, “Resolve risk conflicts”]

5.5. Optimization for Hybrid Reasoning

  • Method switching criteria: Switch methods according to the nature of the problem
  • Increased complexity → Moving from single method to hybrid

  • Detect clear logical structure → Increase weight R1

  • Detect large state space → Increase MCTS weights

  • Need to compare multiple solutions → Increase Beam weight

  • Resource allocation: Allocate resources according to importance
  • 70% resources for the most complex component

  • 20% resources for the second most important component

  • 10% resources for remaining components

  • Synergy maximization: Maximize synergy between methods
  • Share intermediate results between methods

  • Combine strengths: MCTS for discovery, Beam for comparison, R1 for analysis

  • Adaptive confidence thresholds: Adjust the confidence threshold to the importance of the problem
  • Critical problems: high threshold (85-95%)

  • Medium problems: medium threshold (75-85%)

  • Routine problems: low threshold (65-75%)
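The switching criteria above can be read as a small decision rule. A hypothetical sketch only: the trait flags and the `traits >= 2` cutoff are my assumptions, not from the rule file:

```python
def choose_method(has_randomness: bool, large_state_space: bool,
                  compare_many_solutions: bool, clear_logic: bool) -> str:
    """Map problem traits to a method, loosely following the switching criteria."""
    traits = sum([has_randomness or large_state_space,  # favors MCTS
                  compare_many_solutions,               # favors BEAM
                  clear_logic])                         # favors R1
    if traits >= 2:
        return "HYBRID"  # multidimensional problem -> combine methods
    if has_randomness or large_state_space:
        return "MCTS"
    if compare_many_solutions:
        return "BEAM"
    return "R1"
```

A problem that is both stochastic and requires comparing many candidate solutions triggers HYBRID, matching the "increased complexity → hybrid" bullet.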

5.6. Mandatory Checklist for Each Hybrid Phase

Phase 1 (General Analysis):

  • Identify the problem to be solved

  • Decompose the problem into components

  • Determine the appropriate method for each component

  • Set up initial resource allocation

  • Determine the relationship between components

Phase 2 (Stochastic Analysis):

  • Identify components that require MCTS

  • Apply MCTS to uncertain components

  • Run enough simulations

  • Identify promising solutions from MCTS

  • Evaluate the reliability of MCTS results

Phase 3 (Parallel Analysis):

  • Identify the components that need Beam Search

  • Create and evaluate parallel solutions

  • Apply Diversity enforcement

  • Prune ineffective solutions

  • Identify top solutions from Beam Search

Phase 4 (Integration):

  • Collect results from all methods

  • Detect and resolve conflicts

  • Create a consistent integration framework

  • Evaluate synergies between methods

  • Adjust method weights if needed

Phase 5 (Conclusion):

  • Evaluate integrated solutions

  • Determine overall reliability

  • Draw conclusions from all methods

  • Propose specific recommendations

  • Identify limitations and directions for improvement

5.7. Hybrid Phase Transition Prerequisites

Conditions for transitioning from Phase 1 to Phase 2:

  • Decomposed the problem into at least 3 components

  • Applied R1 analysis to the entire problem

  • Determined appropriate methods for each component

Conditions for transitioning to Phase 2→3:

  • MCTS has been applied to all uncertain components

  • Run at least 100 simulations for each MCTS component

  • Achieved minimum MCTS reliability of 70%

Conditions for transitioning to Phase 3→4:

  • Beam Search applied to all elements to be compared

  • Identified at least 3 promising solutions

  • All solutions were evaluated according to the same criteria

Conditions for transitioning to Phase 4→5:

  • Integrated results from all methods

  • Resolved all detected conflicts

  • Achieved minimum integration consistency of 75%

Phase 5 End Conditions:

  • A synthesis of conclusions from all methods has been drawn.

  • Clearly assessed the reliability of each component

  • Overall reliability is at least 80%

  • Provided at least 3 specific recommendations

  6. CONFLICT RESOLUTION MECHANISM

6.1. Conflict detection and resolution process

Phase 1: Detect contradictions - Use comparison and contrast methods

Phase 2: Classify conflicts - Classify by level, type, and origin

Phase 3: Determine weights - Assign weights to each data source and method

Phase 4: Resolve conflicts - Apply appropriate resolution strategies

Phase 5: Update confidence - Adjust the confidence of the final result

6.2. Conflict resolution strategies

  • Majority voting: Choose the outcome that is supported by the majority of methods

  • Priority weight: Prioritize the method with higher reliability for the specific problem

  • Cause analysis: Identify the source of the conflict and resolve it.

  • Reanalyze with new parameters: Change the parameters and rerun the method

  • Conditional combination: Integrate results with conditions on the scope of application

6.3. Dynamic weights for methods

  • R1 Reasoning: High weighting for structural, logical and comprehensive analysis (0.6-0.9)

  • MCTS Reasoning: High weight for problems with random elements and many branches (0.7-0.9)

  • Beam Search: High weight for optimal search and solution comparison (0.6-0.8)

  • Hybrid: Weights adjusted according to each problem component (0.5-0.95)
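The majority-vote and priority-weight strategies combine naturally into one weighted vote. A minimal sketch, assuming each method reports an outcome together with its weight from section 6.3:

```python
def weighted_vote(results):
    """Priority-weighted majority: sum method weights per candidate outcome
    and pick the outcome with the largest total weight.

    results: list of (outcome, weight) pairs, one per method.
    """
    totals = {}
    for outcome, weight in results:
        totals[outcome] = totals.get(outcome, 0.0) + weight
    return max(totals, key=totals.get)
```

Two lower-weight methods agreeing can outvote one high-weight method, which is the intended blend of "majority" and "priority weight".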

6.4. Detailed structure of the conflict report

  • conflictDetected: <true/false>

  • conflictType: “”

  • conflictLocation: “<conflicting component/result>”

  • methodsInvolved: [“<method 1>”, “<method 2>”]

  • conflictSeverity: <0-10>

  • resolutionStrategy: “”

  • resolutionRationale: “”

  • confidenceAdjustment: <±%>

  • resolutionOutcome: “”

6.5. Mandatory checklist for conflict resolution

When a conflict is detected:

  • Clearly identify the nature of the conflict

  • Severity rating (1-10)

  • Classify the type of conflict (data, logic, approach)

  • Identify relevant methods/components

  • Record full information in the report

When resolving conflicts:

  • Determine appropriate solution strategy

  • Clearly state the reason for choosing this strategy

  • Apply strategy consistently

  • Evaluate the results after resolution

  • Update overall reliability

  7. METRICS FOR ASSESSING THE QUALITY OF REASONING

7.1. Overall evaluation system

  • Accuracy: The degree of agreement with the standard or actual result (0-100%)

  • Consistency: The degree of internal consistency (0-100%)

  • Coverage: The extent to which all aspects of the problem are covered (0-100%)

  • Explainability: Ability to explain reasoning steps (0-100%)

  • Efficiency: The ratio between the quality of the result and the resources used (0-100%)

  • Composite reliability: Composite from all metrics (0-100%)
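Composite reliability could be computed as a weighted mean of the other metrics. A sketch only: the rule file does not fix the weighting, so the equal-weight default is my assumption:

```python
def composite_reliability(metrics, weights=None):
    """Composite reliability as a weighted mean of the overall metrics.

    metrics: dict of metric name -> value in [0, 100].
    weights: optional dict of metric name -> weight (equal weights by default).
    """
    weights = weights or {name: 1.0 for name in metrics}
    total = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total
```

Passing a custom `weights` dict lets a user emphasize, say, Accuracy over Efficiency without changing the metric definitions.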

7.2. Detailed index for each method

R1 Reasoning:

  • componentCoverage: <0-100%> - Coverage of the problem components

  • insightRelevance: <0-100%> - Relevance of insights

  • logicalCoherence: <0-100%> - Logical coherence in analysis

  • counterArgumentQuality: <0-100%> - Counterargument Quality

MCTS Reasoning:

  • explorationDepth: <0-100%> - State space exploration depth

  • stateSpaceCoverage: <0-100%> - State space coverage

  • simulationAccuracy: <0-100%> - Accuracy of simulations

  • pruningEfficiency: <0-100%> - Unpromising pruning efficiency

Beam Search:

  • diversityScore: <0-100%> - Diversity of solutions

  • optimalityGap: <0-100%> - Distance to optimal solution

  • pruningPrecision: <0-100%> - Pruning precision

  • convergenceRate: <0-100%> - Convergence rate to good solution

Hybrid Reasoning:

  • methodSynergy: <0-100%> - The degree of synergy between methods

  • adaptabilityScore: <0-100%> - Adaptability to change

  • conflictResolutionQuality: <0-100%> - Conflict Resolution Quality

  • robustnessScore: <0-100%> - Robustness of the result against noise

7.3. Result verification process

Phase 1: Set standards - Identify evaluation criteria

Phase 2: Internal testing - Assess internal consistency

Phase 3: Logical testing - Assess the validity of the logic

Phase 4: Empirical testing - Compare with real data if available

Phase 5: Boundary testing - Test with boundary and special cases

Phase 6: Synthesis - Integrate all verification results

7.4. Mandatory quality assessment checklist

Before concluding the reasoning:

  • Calculated all 6 overall metrics

  • Calculated at least 3 detailed indicators for the method used

  • Completed all 6 stages of result verification

  • Clearly defined composite reliability

  • Clearly listed the limitations of the results

Minimum criteria for acceptable results:

  • Accuracy ≥ 75%

  • Consistency ≥ 80%

  • Coverage ≥ 70%

  • Explainability ≥ 85%

  • Efficiency ≥ 70%

  • Composite reliability ≥ 75%
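The minimum criteria are easy to encode as a gate. A sketch with the thresholds copied from the list above; the function names are illustrative:

```python
# Minimum thresholds from the checklist above (values in percent).
THRESHOLDS = {
    "accuracy": 75, "consistency": 80, "coverage": 70,
    "explainability": 85, "efficiency": 70, "composite_reliability": 75,
}

def acceptable(metrics: dict) -> bool:
    """A result is acceptable only if every metric meets its minimum."""
    return all(metrics.get(name, 0) >= minimum for name, minimum in THRESHOLDS.items())

def failing(metrics: dict) -> list:
    """Names of the metrics that block acceptance (e.g. for a pendingChecklist)."""
    return [name for name, minimum in THRESHOLDS.items()
            if metrics.get(name, 0) < minimum]
```

A missing metric counts as 0 and therefore fails, which matches the "calculated all 6 overall metrics" requirement.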

  8. INTEGRATING MEMORY IN REASONER

8.1. Purpose of Memory Use

  • Support self-control of reasoning process

  • Temporarily store important milestones in the reasoning process

  • Track the progress and status of reasoning

  • DO NOT use to permanently store knowledge from reasoner

8.2. Memory usage process

8.2.1. Initializing Memory:

  • Create entity to store reasoner information when starting process

  • Structure: {
      name: “ReasonerSession_”,
      entityType: “ReasonerProgress”,
      observations: [“Problem being solved”, “Method being used”]
    }

8.2.2. Save landmarks during reasoning:

  • Save state after each important stage

  • Save important insights just discovered

  • Save conflicts and resolutions

8.2.3. References during reasoning:

  • Look up saved intermediate results

  • Check for consistency with previous steps

  • Detect unnecessary progression or repetition

8.2.4. Clear Memory after completion:

  • Clear entity memory when inference is complete

  • Only retain the final inference result in the response

8.3. Memory Structure for Reasoner

{
  reasonerSession: {
    problemID: “”,
    method: “<MCTS/BEAM/R1/HYBRID>”,
    stage: ,
    checkpoints: [
      { timestamp: “”, stage: , state: “”, confidence: <0-100%> }
    ],
    keyInsights: [
      “<key insight 1>”,
      “<key insight 2>”
    ],
    resolvedConflicts: [
      { conflict: “”, resolution: “” }
    ],
    completedChecklists: [
      { stage: , items: [“<item 1>”, “<item 2>”, …] }
    ]
  }
}

8.4. Memory Clearing Procedure

8.4.1. Store the final result in the response

8.4.2. Generate a summary report of the reasoning process

8.4.3. Call the delete_entities function to delete the memory reasoner

8.4.4. Confirm memory deletion in final report

  9. MANDATORY BINDING MECHANISM

9.1. Mandatory checklist for each stage

  • Each stage must have a mandatory checklist to complete.

  • Do not move to the next stage until every item is completed.

  • At the end of each stage, clearly list the completed items in the completedChecklist section.

  • Uncompleted items must be listed in the pendingChecklist section

9.2. Phase transition prerequisites

  • Each stage has prerequisites that must be met before moving to the next stage.

  • Must clearly confirm that each condition has been met

  • If the condition is not satisfied, you must go back and process it before continuing.

  • Do not change phase when prerequisites are not met.

9.3. Status reporting regulations

  • At each step of reasoning, report:
  • Summary of completed steps (completedChecklist)

  • Aspects analyzed (according to specific methods)

  • Aspects not yet analyzed (pendingChecklist)

  • Before finishing, it is mandatory to perform a final check:
  • Confirm all stages are complete

  • Confirm all aspects of the problem have been analyzed

  • Report reliability level for each part of the solution

9.4. Completion confirmation mechanism

  • At the end of the reasoning process, there must be a complete summary report:
  • Method of use and reasons for selection

  • Total number of completed stages

  • Number of aspects analyzed

  • List of detected conflicts and their resolutions

  • Time and resources used for each stage

  • Overall reliability and interpretation

  • Confirm Memory has been cleared after completion

9.5. Phase locking mechanism

  • After completing a stage, it must be “locked” in Memory

  • Returning to a locked stage is only allowed when:

  • Detect serious conflicts (level ≥ 7/10)

  • Important new information is found that affects the outcome

  • Current stage reliability is too low (<50%)

  • When returning to the previous stage, it must be clearly stated in Memory:
  • Reason for return

  • Expected changes

  • Expected impact on later stages

  10. ERROR HANDLING AND RECOVERY

10.1. Error classification:

  • Data errors: Missing, inconsistent, incorrect format

  • Model error: Non-convergence, iteration limit reached, unreasonable results

  • Integration error: Unresolved conflict, incompatibility

10.2. Recovery procedure:

  • Save checkpoints to memory after each stage

  • When an error occurs, retrieve the most recent checkpoint from memory

  • Try with fallback configuration

  • Record errors and solutions for improvement
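The recovery procedure assumes checkpoints can be saved after each stage and the latest one retrieved on error. A minimal in-memory stand-in; the actual rule file targets the Memory entity store, not this class:

```python
class CheckpointStore:
    """In-memory checkpoint store sketch for the recovery procedure."""

    def __init__(self):
        self._checkpoints = []

    def save(self, stage: int, state: dict):
        # Save a checkpoint after each completed stage.
        self._checkpoints.append({"stage": stage, "state": dict(state)})

    def latest(self):
        # On error, recover from the most recent checkpoint (None if empty).
        return self._checkpoints[-1] if self._checkpoints else None
```

`save` copies the state dict so later mutations by the caller cannot corrupt an already-saved checkpoint.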

  11. HOW TO USE

TAGS

  • [Inference-MCTS] - Invokes MCTS (because it is based on decision trees)

  • [Inference-BEAM] - Invokes BEAM (because “beams” are used to expand the solution)

  • [Inference-R1] - Invokes R1 (for comprehensive analysis)

  • [Inference-HYBRID] - Invokes HYBRID (because it combines multiple methods)

Great. Now put it in a repository like a normal developer and update your post.
