Two things:
-
What’s the point of having mid-response “Think” blocks, if you remove the previous Think block from the response? This leaves the model repeating OVER AND OVER the exact same CoT. Incredibly inneficient.
-
The models, instead of applying one large edit on the whole file, they go and change one little thing at a time, and WAIT FOR THE SLOW APPLY MODEL to finish every little edit before continuing. Solution: let the models generate all their edits, and then apply everything at once.