It seems like you could build a model that could reasonably predict the cost range for a prompt, given the selected model, the prompt characteristics, the codebase characteristics, and the user history. This would help users better understand and improve their prompts.
You have the perfect dataset for this kind of thing, and it would be very useful.
You could potentially even cluster users and then fine-tune the model on those clusters for better accuracy. It feels like this could be done with a relatively simple model (not a very deep network) if the features are precomputed and selected well. So it could be fast and cheap, and might even run locally.
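To make the "simple model" idea concrete, here is a rough sketch of a range predictor. Everything in it is my own assumption: the single feature (prompt size in thousands of tokens), the synthetic data, and the function names `fit_line`, `fit_range_model`, and `predict_range`. It fits a straight line and then uses the spread of the leftover errors to turn a point estimate into a range. A real version would use many more features and probably a proper quantile model.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_range_model(xs, ys, coverage=0.8):
    """Fit the line, then keep the residual quantiles that bracket `coverage`."""
    a, b = fit_line(xs, ys)
    residuals = sorted(y - (a + b * x) for x, y in zip(xs, ys))
    lo_idx = int((1 - coverage) / 2 * (len(residuals) - 1))
    hi_idx = int((1 + coverage) / 2 * (len(residuals) - 1))
    return a, b, residuals[lo_idx], residuals[hi_idx]


def predict_range(model, x):
    """Return a (low, high) cost estimate for feature value x."""
    a, b, lo, hi = model
    centre = a + b * x
    return centre + lo, centre + hi


# Synthetic, made-up history: prompt size (thousands of tokens) vs. cost (dollars).
random.seed(0)
xs = [random.uniform(1, 50) for _ in range(500)]
ys = [0.01 * x + random.gauss(0, 0.02) for x in xs]

model = fit_range_model(xs, ys)
low, high = predict_range(model, 20.0)
print(f"20k-token prompt: roughly ${low:.3f} to ${high:.3f}")
```

Something this small trains instantly and needs no GPU, which is why I think it could plausibly run locally.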
Then, if the features are human-understandable, you could even provide an explanation of why a prompt is more or less expensive, thereby helping us become better users of AI.
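As a sketch of what such an explanation could look like, assuming a linear model: each feature's contribution is just its weight times how far it sits from a typical prompt. The feature names, weights, and baseline values below are all hypothetical numbers I made up for illustration, not anything from your product.

```python
# Hypothetical weights: dollars added per unit of each feature.
WEIGHTS = {
    "prompt_tokens_k": 0.004,
    "files_referenced": 0.010,
    "uses_large_model": 0.050,
}

# Hypothetical "typical prompt" that contributions are measured against.
BASELINE = {
    "prompt_tokens_k": 2.0,
    "files_referenced": 3.0,
    "uses_large_model": 0.0,
}


def explain(features):
    """Rank features by how much they move the cost away from the baseline."""
    contributions = {
        name: WEIGHTS[name] * (features[name] - BASELINE[name])
        for name in WEIGHTS
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = explain(
    {"prompt_tokens_k": 12.0, "files_referenced": 3.0, "uses_large_model": 1.0}
)
for name, delta in ranked:
    print(f"{name}: {delta:+.3f} dollars vs. a typical prompt")
```

For a linear model these contributions add up exactly to the difference from the baseline prediction, so the explanation is faithful to the model rather than an approximation of it.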
Need help doing this? Joking, I know your team has primo talent.
Having said all of that, I wouldn’t be surprised if you already have this internally.