LLM says "You have my word" in response

Ask the model to write a report on which rules compelled it to say that. The report should list every influence from its training data, system prompt, rules, and the prompt itself, and say whether each one scored for or against this behavior.
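
If you want to reuse that request, here is a minimal sketch of a follow-up prompt builder. The function name `build_report_prompt`, the constant `INFLUENCE_SOURCES`, and the exact wording are my own assumptions for illustration. It only builds the text and does not call any model API, so you would paste or send the result yourself.

```python
# Hypothetical helper: the names and wording are illustrative,
# not part of any library or vendor API.
INFLUENCE_SOURCES = ("training data", "system prompt", "rules", "prompt")


def build_report_prompt(phrase: str) -> str:
    """Build a follow-up message asking the model why it used `phrase`."""
    # One bullet per source the report should cover.
    sources = "\n".join(f"- {s}" for s in INFLUENCE_SOURCES)
    return (
        f'You said "{phrase}" in your last response.\n'
        "Write a report on what rules compelled you to say that.\n"
        "For each source below, list the influences and mark each one "
        "as positive (toward) or negative (against) this behavior:\n"
        f"{sources}"
    )


if __name__ == "__main__":
    print(build_report_prompt("You have my word"))
```

Send the resulting text as the next message in the same conversation so the model still has the original response in context.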
