I think I've found the problem: it's called underthinking.
In complex reasoning (like math) there's a known problem called underthinking, evidenced in this paper: https://arxiv.org/pdf/2501.18585
The paper uses TIP (a thought-switching penalty) applied at decoding time:
> We conducted the experiments using QwQ-32B-Preview, as the DeepSeek-R1-671B API does not allow for the modification of logits.
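For context, here's a minimal sketch of what a TIP-style penalty could look like as a Hugging Face `LogitsProcessor`. The token ids, penalty strength, and duration below are illustrative placeholders, not the paper's exact settings; the point is only that the technique works by editing next-token logits, which is exactly what the hosted R1-671B API doesn't expose.

```python
from transformers import LogitsProcessor

class ThoughtSwitchPenalty(LogitsProcessor):
    """Sketch of a TIP-style penalty: push down the logits of tokens that
    typically open a new line of thought (e.g. "Alternatively") so the model
    keeps digging into its current approach instead of switching."""

    def __init__(self, switch_token_ids, penalty=3.0, duration=512):
        self.switch_token_ids = switch_token_ids  # ids for "Alternatively", "Wait", ...
        self.penalty = penalty                    # illustrative strength, not tuned
        self.duration = duration                  # only penalize early decoding steps

    def __call__(self, input_ids, scores):
        # scores: (batch, vocab_size) next-token logits for the current step.
        # Counting from the start of input_ids (prompt included) is a simplification.
        if input_ids.shape[-1] < self.duration:
            scores[:, self.switch_token_ids] -= self.penalty
        return scores
```

You'd pass this to `model.generate(..., logits_processor=LogitsProcessorList([...]))` on a locally runnable model, which is why the authors had to use QwQ-32B-Preview instead of the R1-671B API.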
Searching further, I found this gist that explores the paper with AI and uses prompts instead of modifying logits: AI Prompt python programming usecase (TIE Methodology) · GitHub
A first try with a clean context shows a behavior difference:
They would produce different results in cases like these (sketched below):
* `event.start < segment.start` but close to the boundary
* Floating point precision edge cases near the `min_gap` threshold
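The original functions aren't included here, but a hypothetical pair showing the kind of boundary difference the model flagged might look like this (strict `<` vs `<=` at the `min_gap` threshold, plus the usual floating point surprises):

```python
# Hypothetical functions for illustration only; the ones from the actual
# comparison aren't shown in this post.

def overlaps_v1(event_start, segment_start, min_gap):
    # strict inequality: an event exactly at the threshold is NOT counted
    return segment_start - event_start < min_gap

def overlaps_v2(event_start, segment_start, min_gap):
    # non-strict inequality: an event exactly at the threshold IS counted
    return segment_start - event_start <= min_gap

# Exactly at the min_gap threshold the two disagree:
print(overlaps_v1(0.0, 0.5, 0.5))  # False
print(overlaps_v2(0.0, 0.5, 0.5))  # True

# Floating point precision can also flip results near the threshold:
print(0.1 + 0.2 <= 0.3)  # False, because 0.1 + 0.2 == 0.30000000000000004
```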
A second try using the mega-prompt (quoted below) shows no behavior difference:
Conclusion:
Both functions exhibit identical behavior across all possible input scenarios. The structural differences in condition checking are logically equivalent.
Try using this mega-prompt from the gist for complex reasoning tasks:
> Solve this problem using a focused and persistent approach. Begin by selecting a single line of reasoning and explore it as deeply as possible. Demonstrate sustained reasoning by meticulously showing every step of your logic, even if the solution path becomes challenging or complex. Prematurely changing your strategy is strongly discouraged. Do not use phrases that suggest a shift in approach, such as "Alternatively, " unless your current strategy is definitively and demonstrably incorrect. If, and only if, you have completely exhausted all possibilities within your initial approach and can prove it will not lead to a correct solution, you may then consider a new strategy. If a strategy change becomes absolutely necessary, provide a clear and detailed justification, explaining why the first approach failed before moving on. Your goal is to demonstrate thoroughness and depth in your reasoning process, prioritizing a complete exploration of a single approach over premature exploration of multiple approaches.
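If you want to bake the mega-prompt in via the API rather than pasting it into chat, a minimal sketch using the OpenAI-compatible Python client could look like the following. The base URL and model name are assumptions about DeepSeek's endpoint; swap in whatever provider you're actually calling.

```python
from openai import OpenAI

# Assumed endpoint and model name; adjust to your actual provider.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

# Paste the full mega-prompt quoted above; shortened here for brevity.
MEGA_PROMPT = """Solve this problem using a focused and persistent approach. ..."""

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": MEGA_PROMPT},
        {"role": "user", "content": "Compare these two Python functions for behavioral differences: ..."},
    ],
)
print(response.choices[0].message.content)
```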
It's possible that DeepSeek chat automatically applies a thought-switching penalty for complex reasoning, since the model used by Cursor is the 671B one, as tested by this user (and confirmed by the Cursor team): Is Cursor using the full power of DeepSeek R1? - #23 by Kirai