That’s actually a really nice setup, especially the part where you try to reduce your dependency on external AI services.
Building your own LLM-powered system + integrating it with real use cases (Discord bot, CMS, live website editing) shows solid hands-on work.
I also like the idea of having a fallback when credits run out — that’s something many people don’t think about.
From a technical perspective, a few trade-offs are worth keeping in mind (not criticism):
Model quality: Self-hosted or alternative models usually can’t fully match Claude/GPT in reasoning and consistency, so depending on your use case, responses might feel weaker or less reliable.
Performance & infrastructure: Running your own models (even smaller ones) can get heavy pretty fast — GPU, RAM, latency, scaling, etc. Without strong infra, things can slow down.
Maintenance: With your own system, you basically become responsible for everything (updates, improvements, bugs, scaling), which can grow into a big workload over time.
Safety / control: Especially with things like live website editing, it’s important to have strict safeguards in place to avoid unintended actions.
Costs over time: Sometimes self-hosting looks cheaper at first, but infra + maintenance can end up costing more if not optimized well.
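On the safety point for live website editing, here is a rough sketch of what a guard in front of model-proposed edits could look like. I don't know how your editor is wired up, so everything here is an assumption for illustration: the `EditRequest` shape, the allowlisted directories, the size limit, and the `approved` flag are all made-up names and values.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical policy values; adjust to your own site layout.
ALLOWED_ROOTS = (PurePosixPath("content"), PurePosixPath("templates"))
MAX_SIZE_CHANGE = 2000  # max difference in length between old and new text


@dataclass
class EditRequest:
    path: str               # file the model wants to change, relative to the site root
    old_text: str
    new_text: str
    approved: bool = False  # set by a human or a separate review step


def check_edit(req: EditRequest) -> list[str]:
    """Return the reasons to reject an edit; an empty list means it may proceed."""
    problems = []
    path = PurePosixPath(req.path)
    if path.is_absolute() or ".." in path.parts:
        problems.append("path escapes the site root")
    elif not any(path.is_relative_to(root) for root in ALLOWED_ROOTS):
        problems.append("path is outside the allowlisted directories")
    # Crude size check: compares lengths only, not an actual diff.
    if abs(len(req.new_text) - len(req.old_text)) > MAX_SIZE_CHANGE:
        problems.append("change is larger than the size limit")
    if not req.approved:
        problems.append("edit has not been approved")
    return problems


print(check_edit(EditRequest("content/about.md", "old", "new", approved=True)))
print(check_edit(EditRequest("content/../.env", "", "SECRET=1")))
```

The idea is just that the model never writes directly: it proposes an edit, and nothing is applied unless the check comes back empty. Needs Python 3.9+ for `is_relative_to`.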
Overall though, this is a strong direction — especially if you combine it with a hybrid approach (your own system + external APIs where needed). That usually gives the best balance between quality and control.
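For the hybrid part, a minimal sketch of the fallback idea, assuming each backend can be wrapped as a plain function that takes a prompt and returns text. The two backends below are stand-ins I made up, not real clients; you would swap in your own API call and local model call.

```python
from typing import Callable, Sequence

# A backend is any function that takes a prompt and returns text.
Backend = Callable[[str], str]


class AllBackendsFailed(RuntimeError):
    pass


def ask(prompt: str, backends: Sequence[tuple[str, Backend]]) -> tuple[str, str]:
    """Try each backend in order; return (name, reply) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. quota exhausted, timeout, server down
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Stand-ins for real clients (hypothetical).
def external_api(prompt: str) -> str:
    raise RuntimeError("out of credits")


def local_model(prompt: str) -> str:
    return f"[local] {prompt}"


print(ask("hello", [("external", external_api), ("local", local_model)]))
```

The order of the list is the policy: put the external API first if quality matters most, or the local model first if you want to save credits and only escalate when it fails.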
Curious to see how others are approaching this too.