I propose a new set of tests: getting to the same working state, one we can all agree on, in as few tokens as possible. So rather than trying to maximize the complexity of our (pre)prompt, this task quantifies, per LLM provider, which one can reach a given objective with the least signal/entropy presented as input.
Let me rephrase this:
whoever makes the dankest game of snake from the prompt:
make the game snake
wins
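
To make the scoring idea concrete, here is a rough sketch of how it could be tallied. Everything in it is an assumption for illustration: the `Run` record, the provider labels, and especially `prompt_cost`, which just counts whitespace-separated words as a stand-in for a real tokenizer. Whether a run "passed" (reached the agreed working state) is taken as a given input here, not something the code decides.

```python
from dataclasses import dataclass


@dataclass
class Run:
    provider: str   # hypothetical provider label
    prompt: str     # the exact prompt that was sent
    passed: bool    # did the output reach the agreed working state?


def prompt_cost(prompt: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(prompt.split())


def rank(runs: list[Run]) -> list[tuple[str, int]]:
    # Keep the cheapest passing prompt per provider, then sort ascending.
    best: dict[str, int] = {}
    for run in runs:
        if not run.passed:
            continue
        cost = prompt_cost(run.prompt)
        if run.provider not in best or cost < best[run.provider]:
            best[run.provider] = cost
    return sorted(best.items(), key=lambda item: item[1])


runs = [
    Run("provider_a", "make the game snake", True),
    Run("provider_b", "make the game snake", False),
    Run("provider_b", "make a playable snake game with score and restart", True),
]
print(rank(runs))  # [('provider_a', 4), ('provider_b', 9)]
```

In this made-up data, provider_a wins because it gets there on the four-word prompt, while provider_b needs a longer one. Swapping in each provider's own tokenizer would change the numbers but not the shape of the ranking.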