How do you stop an AI's guess from quietly becoming "fact" in the next tool?

@deanrie nudged me to spin this out of the “handling context across different AI coding tools” thread, so here it is.

Quick recap if you missed the other thread. Everyone’s been talking about pruning. Keeping memory small, deciding what’s worth storing. But there’s a different problem sitting underneath that nobody really named. Not “is this worth keeping” but “is this even true.”

Here’s the thing that kept biting me. After a few hops between Claude Code, Codex and Cursor, my memory had stuff the agent just made up and never checked, sitting right next to stuff I’d actually confirmed. Same shape, same weight, looks equally legit. Then one of those guesses gets pulled into a fresh session like it’s gospel, and the next agent quietly builds on it. That’s how cross-tool memory rots.

So the thing I’ve been building (piia-engram, local-first, open source) basically runs on three rules.

First, nothing the agent writes is trusted by default. It can’t promote its own stuff. It only becomes “confirmed” if I sign off, or if a hard signal does, like a passing test. No grading its own homework.

Second, “confirmed” isn’t one flavor. A test result is ground truth. Me saying yes is strong. The agent’s own reasoning is just a guess until something real touches it. Different sources, different weight.

Third, and this is the one I actually cared about. The tag lives in one local store that every tool reads through the same MCP server. Not plain text I’m praying each tool re-reads, not a separate memory per tool. One store, so the tag just travels with it. The catch, and I’ll be honest, it only works for tools that actually read that store. Anything walled off stays walled off.
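
To make the first two rules concrete, here is a rough sketch. It is not the actual piia-engram code. The names `Trust`, `Fact` and `promote` are made up for illustration, and the tier ordering is just how I described it above.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical tiers: higher means a harder signal stands behind the fact."""
    GUESS = 0   # the agent's own reasoning, unchecked
    HUMAN = 1   # a person signed off
    TEST = 2    # a hard signal such as a passing test


@dataclass(frozen=True)
class Fact:
    text: str
    written_by: str              # which tool wrote it, e.g. "codex"
    trust: Trust = Trust.GUESS   # rule one: nothing is trusted by default


def promote(fact: Fact, confirmed_by: str) -> Fact:
    """Return a copy with the tier that this confirmer is allowed to grant.

    Only "human" and "test" can promote. Anything else, including the
    agent that wrote the fact, is refused.
    """
    granted = {"human": Trust.HUMAN, "test": Trust.TEST}
    if confirmed_by not in granted:
        raise PermissionError(f"{confirmed_by!r} cannot promote a fact")
    return replace(fact, trust=granted[confirmed_by])
```

The point of the shape: there is no code path where an agent's write ends up above `GUESS` without something outside the agent touching it.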

Stuff I’d genuinely like to argue about:

  • where’s the line between “a human confirmed it” and “a test confirmed it,” and should they go stale at different speeds
  • how do you keep a “confirmed” fact from rotting after the world moves on (the classic “we use Jest” surviving a switch to Vitest)
  • for tools that just won’t read a shared store, is there an honest fix or do you just accept the walls

Repo’s in my profile if you want to see how it’s wired. Mostly I just want people to poke holes in it.

Glad you moved this into a separate thread. This topic deserves its own space outside the pruning discussion.

The most valuable thing in your approach, in my view, is the second principle: confirmed is not a single flavor. Most memory approaches collapse everything into a binary yes or no, and that is exactly where the rot you describe comes from. Provenance plus different weights by source is what is usually missing.

On your three questions, here’s what I think:

  • The line between a human-confirmed fact and a test-confirmed fact, and how fast each expires. I’d separate them by nature, not just by weight. A test is a re-runnable truth. You can recheck it cheaply, so it doesn’t need a TTL, it needs a trigger. The fact is valid as long as the test is green. A human yes can’t be rechecked without asking a human again, so that’s what should decay over time. So the test doesn’t really expire slower, it’s tied to a signal, not to the clock.

  • “We use Jest” surviving a move to Vitest. This is exactly the kind of case where TTL won’t help. The fact didn’t get old with time, it got invalidated by an event. It makes sense to attach facts to observable anchors, like a dependency or config existing in the repo, so when the anchor changes, the fact automatically drops from confirmed back to guess instead of waiting for a timeout.

  • Tools that don’t read the shared store. The honest answer is walls are still walls, there’s no magic here. But you can make the boundary explicit. Mark facts coming from non-reading tools as unverified on input. Then walled-off tools don’t silently poison the store, they go into the same “guess until touched” bucket as the agent’s own assumptions.
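
The anchor idea from the second point could look roughly like this. It’s a minimal sketch, assuming the anchor is a dependency name in package.json. `AnchoredFact`, `anchor_present` and `recheck` are names I’m inventing here, not anything from your repo.

```python
import json
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnchoredFact:
    text: str
    status: str   # "confirmed" or "guess"
    anchor: str   # hypothetical: dependency name the fact relies on


def anchor_present(package_json_text: str, dependency: str) -> bool:
    """True if the dependency is still listed in the manifest."""
    try:
        manifest = json.loads(package_json_text)
    except json.JSONDecodeError:
        return False  # an unreadable manifest cannot vouch for anything
    if not isinstance(manifest, dict):
        return False
    for section in ("dependencies", "devDependencies"):
        if dependency in (manifest.get(section) or {}):
            return True
    return False


def recheck(fact: AnchoredFact, package_json_text: str) -> AnchoredFact:
    """Drop a confirmed fact back to guess as soon as its anchor is gone."""
    if fact.status == "confirmed" and not anchor_present(package_json_text, fact.anchor):
        return replace(fact, status="guess")
    return fact
```

Run `recheck` whenever the manifest changes and “we use Jest” falls back to guess on the same commit that removes jest, with no timer involved.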

On the Cursor side, Rules and Memories cover some of this pain inside one tool, but they don’t have a trust and provenance layer like yours. So your approach is interesting. Drop the repo link in the thread and I’ll take a look at how the promotion logic works.

Thanks for the close read. You took this further than I had.

Your split is better than mine. Right now my decay is all time-based. Fresh, aging, stale, just by age. But you’re right. A test is something you can re-run. So it doesn’t need a timer, it needs a trigger. If the test is green, the fact stays true. Only a human “yes” should fade with time. I’ll split those two.

Same with Jest to Vitest. A timer is the wrong tool there. The fact didn’t get old. An event killed it. Better idea: tie a confirmed fact to something real in the repo, a dependency or a config line. When that thing changes, the fact drops back to “guess.” I’m adding this.

“Unverified on input” is a good call too. Right now I just keep outside stuff out of the store. Letting it in as “guess until checked” is more honest, and more useful.

Code is here: Patdolitse/piia-engram on GitHub. Local-first AI memory you can see, edit, and override, portable across Claude Code, Codex, Cursor, Windsurf, and other MCP coding tools.

The part you asked about: an agent can’t set its own trust level. strip_untrusted_trust_fields in storage.py removes any tier it tries to give itself. To become “verified,” it has to go through staging review. Provenance and the time-based freshness are in provenance.py.

Two honest notes, since you’ll read the code. One, the trust gating is opt-in right now, not on by default. I need to tighten that. Two, the trigger and anchor ideas aren’t built yet. Your reply just named the gap.

This thread really helped. If you can point me to how Cursor Memories decides what to surface, I’d read it.

Thanks, I read it carefully. The foundation is solid: pure, stdlib-only, dicts aren’t mutated, so calling it on the read path is actually safe. No questions there.

But one structural issue outweighs everything else, and it comes straight from our earlier chat. We agreed that a test-confirmed fact is held by a trigger, and time-based fading is only for human-yes. Right now compute_freshness applies the same FRESH_MAX_DAYS and AGING_MAX_DAYS to any entry, without looking at the source. That means a green test fact will become stale after 90 days even though nothing broke. You already have resolve_source_agent and trust tiers, but freshness doesn’t use them at all. Until you close that gap, you can’t let freshness drive promotion or demotion because it’ll sink valid facts just because a timer ran out.

How I’d untangle it: freshness should check the source. Time decay should be only for human-confirmed. Test and anchor facts should be taken off the timeline entirely, either via a separate status like trigger_bound, or by simply skipping time decay. And I’d move the thresholds out of module-level constants into arguments for compute_freshness, since once decay is source-aware, you’ll want different thresholds for different sources.
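
Roughly what I mean, as a sketch and not a patch against your provenance.py. The function name, the source labels and the status strings are my assumptions, and what to do with plain agent guesses is a choice I made just for the example.

```python
from typing import Optional

# Hypothetical status values for this sketch.
FRESH, AGING, STALE = "fresh", "aging", "stale"
UNKNOWN = "unknown"
TRIGGER_BOUND = "trigger_bound"


def freshness_for(
    source: str,
    age_days: Optional[float],
    fresh_max_days: float,
    aging_max_days: float,
) -> str:
    """Source-aware freshness: only human-confirmed facts decay with time."""
    if source in ("test", "anchor"):
        # Held by a trigger (green test, anchor still present), not the clock.
        return TRIGGER_BOUND
    if source != "human":
        # Never confirmed, so there is nothing to decay (assumption of the sketch).
        return UNKNOWN
    if age_days is None or age_days < 0:
        # Missing or future-dated timestamp: flag it instead of calling it fresh.
        return UNKNOWN
    if age_days <= fresh_max_days:
        return FRESH
    if age_days <= aging_max_days:
        return AGING
    return STALE
```

With thresholds as arguments, a caller can pass different numbers per source later without touching the function. The values you pass are yours to pick; I’m not suggesting any.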

Smaller stuff for later:

  • _clean_identifier: the docstring promises “not free text or paths,” but in practice it only handles newlines and length. A path or short content will get through. If the goal is anti-injection, you need an explicit ban on separators.
  • Future timestamps are clamped with max(0.0, ...). That prevents crashes, but a garbage or skewed date will silently become fresh. I’d flag those instead of treating them as fresh.
  • annotate_freshness does a shallow dict(item). Nested provenance is still by reference. It’s fine now, but it’ll bite if someone starts touching nested data.
  • basis returns the string “none”, while status uses the constant UNKNOWN. Harmless, but make basis a constant too for consistency.
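
For the first two items, something along these lines would do. Again a sketch: `clean_identifier_strict` and `age_days` are hypothetical names, and the allowed character set and the 64-character cap are placeholders I picked, not values from your code.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# Allowlist instead of stripping: no slashes, colons, dots or whitespace at all.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_-]{1,64}")


def clean_identifier_strict(value: str) -> Optional[str]:
    """Return the value only if it is a plain identifier, otherwise None."""
    return value if _IDENTIFIER.fullmatch(value) else None


def age_days(written_at: datetime, now: datetime) -> Tuple[float, bool]:
    """Return (age in days, skewed).

    A timestamp in the future is reported as skewed rather than being
    silently clamped to zero and treated as fresh.
    """
    delta = (now - written_at).total_seconds() / 86400
    if delta < 0:
        return 0.0, True
    return delta, False


if __name__ == "__main__":
    now = datetime(2025, 1, 10, tzinfo=timezone.utc)
    print(clean_identifier_strict("claude-code"))
    print(clean_identifier_strict("src/app.py"))
    print(age_days(datetime(2025, 2, 1, tzinfo=timezone.utc), now))
```

The caller then decides what a skewed entry means. The important part is that it can see the flag.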

The fundamentals are right. The main thing is to tie decay to the source before freshness starts moving trust. Once you’ve got source-aware decay and anchor triggers, share it. I’d be interested to see it on real history.

This is really generous. Thanks for actually reading the code.

You nailed the main one. It’s the real hole. Right now compute_freshness is source-blind. It puts the same FRESH_MAX_DAYS and AGING_MAX_DAYS on everything. So a green test fact goes stale at 90 days even though nothing broke. resolve_source_agent and the tiers are right there. Freshness just never looks at them. So yeah, I can’t let freshness drive promotion or demotion yet. It would sink good facts on a timer.

Here’s what I’m taking from this.

  • time decay only for human-confirmed facts
  • test and anchor facts come off the time line completely, a trigger_bound status, not a clock
  • thresholds move out of module constants into args on compute_freshness, since once it knows the source you want different numbers per source

And the order matters like you said. Source-aware decay first. Then let freshness touch trust. Not the other way round.

The smaller ones are all fair.

  • _clean_identifier overpromises. It only strips newlines and length. A short path slips through. If it’s meant to block injection it needs a hard ban on separators. I’ll make the code and the docstring agree.
  • future timestamps quietly turning fresh is wrong. I’ll flag a skewed date instead of clamping it.
  • annotate_freshness only shallow copies. Nested provenance is still a reference. I’ll deep copy before anything touches nested data.
  • basis returns the string “none”. status uses UNKNOWN. I’ll make basis a constant too.

I’ll build source-aware decay and the anchor triggers, then bring it back here on real history like you offered. This gave me a much sharper target than I had. Thanks again.

I don’t really get the concept here. Am I understanding the problem you are trying to solve correctly? You are trying to deal with agent hallucination/mistakes/lies via a memory server system? That’s an interesting idea. I do think a good memory system will inevitably cut down on such problems since they are often caused by the agent not remembering previous choices/actions. But I am unclear how you are trying to go about solving this problem. It seems a very open-ended problem to try to run tests on saved data to confirm accuracy. I suppose if the project is very narrow this could be possible, like if a set of Playwright tests, for instance, was all you were concerned with verifying. But what if the type of information saved is broader and involves multiple projects, multiple testing systems, different architectures, etc.? It seems like if you tried to somehow verify all that data, you would do little else. If you don’t mind laying out your approach a little more, I am curious.

@neverinfamous, I think it’s worth rephrasing a bit, since there’s a small misunderstanding here. @Patdolitse, correct me if I’m wrong.

The idea isn’t to run tests against all saved data and “verify” it that way. That really wouldn’t scale. The point is provenance: every fact in memory carries a tag showing where it came from, and the trust level travels with that fact across tools.

Three different sources, three different modes:

  • agent guess: not trusted by default, and the agent can’t promote it by itself
  • human-confirmed: trusted a lot, but it fades over time (TTL), since you can only re-check by asking the human again
  • test or anchor-confirmed: tied to a trigger, not a timer. The fact is valid while the test is green or while the anchor (a dependency, a config line) is still there. If the anchor changes, the fact automatically drops back to guess

So nobody is “verifying everything” at runtime. For broad cases with lots of projects and architectures, this works via source tagging, not exhaustive testing. Expensive checks are only used where there’s a cheap, re-runnable signal.

So your point that “good memory reduces hallucinations” is right. This project just adds an extra layer on top: “how much can we trust this memory item at all.”

Please don’t take me as hostile. But I want to ask some hard questions so I can understand better. Maybe they aren’t hard questions and I just don’t grok. But provenance sounds useful, for sure. If you can efficiently record who makes each and every change to code and even documentation, that sounds great, if you are a coder who modifies code yourself rather than always through agents. Then you could distinguish between code you added/changed and code the agent added/changed. You could do this with commit history, right? But what if you jump in to modify/fix code the agent writes? Won’t that get confused? I guess I just don’t see how you are going to be able to label every piece of data saved in memory without that itself becoming a large amount of overhead and without introducing more chances of hallucination/misinformation. If the agent is relied on to do the labeling, what’s to prevent it from making mistakes/hallucinations in the process and thereby increasing error? Also, is the assumption that the human makes fewer mistakes than the agent? That may be, but it seems unlikely if the human has to manually tag all these things. I would think the human would give up and just rubber-stamp everything.


The three-tier model (agent guess → human-confirmed → test/anchor-confirmed) reframed how I was thinking about the problem, and it’s been rattling around in my head since.

Where it led me is this: what if you didn’t have to label trust at all?

My concern with any system where the agent assigns its own trust level is that the labels themselves become another surface for hallucination. You’re relying on potentially unreliable information to certify other potentially unreliable information. And if you put that burden on humans instead, the realistic outcome is rubber-stamping — especially across multiple projects.

What I keep coming back to is that most repositories already contain an immutable provenance record that nobody has to create or maintain. Git history tells you who changed something, when they changed it, and exactly what changed. That information exists whether humans or agents are doing the work, and neither party can fabricate it after the fact. A commit SHA is cryptographic. You can’t hallucinate one into existence.

So instead of trying to make each memory item carry its own trust score, I’ve been thinking about giving agents tools to answer targeted questions against that existing record:

  • Has this file changed since this memory was created?
  • Which commits relate to this issue?
  • Is this documentation describing the current state of the code?
  • Was this change made by a human or an agent?

None of that requires exhaustive verification. It’s cheap and targeted. An agent checking whether a piece of code has been modified since a journal entry was written is one query, not a full audit.
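
The first question is the one I’d sketch like this. It assumes each memory entry stores the HEAD commit SHA from when it was written, which is my assumption and not something either of us has built. `changed_since` and the injectable `git` argument are names I made up, and the check only sees committed changes, not a dirty working tree.

```python
import subprocess
from typing import Callable, Sequence


def _git(args: Sequence[str], repo: str) -> str:
    """Run a git command inside the given repo and return its stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def changed_since(
    repo: str,
    path: str,
    memory_sha: str,
    git: Callable[[Sequence[str], str], str] = _git,
) -> bool:
    """True if any commit after memory_sha (up to HEAD) touched path."""
    out = git(["log", "--format=%H", f"{memory_sha}..HEAD", "--", path], repo)
    return bool(out.strip())
```

If it returns True, the agent treats the memory as possibly stale and re-reads the file before building on it. The `git` parameter is only there so the function can be exercised without a real repository.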

I recently began embedding commit SHAs directly into changelogs alongside agent-optimized descriptions. That way an agent reading a changelog entry can trace it back to the exact diff without any additional lookup overhead, and when it does need to dig deeper, the SHA is right there.

The broader picture I’m planning now thanks to this discussion is combining structured memory with git history mining — not to verify everything, but to give agents a way to detect when something they “know” might be stale, superseded, or contradicted by later changes. A lot of what we call hallucination is really just context drift and lost project history. Better memory helps with that, but access to objective provenance data would help more.

Your framing of “the fact is valid while the anchor is still there” is the piece that clicked for me. I’m just betting that for software projects, git is the most universal and lowest-overhead anchor available.

What do you think?