What a verifiable AI code review actually looks like
Most "AI code review" tools answer fluently. Almost none let you click a claim and land on the line of code that backs it. Here is what that costs you — and what the alternative looks like in practice.
The fluency trap
If you have ever asked an LLM "is this codebase production-ready," you have seen the trap: a confident, well-organised paragraph that sounds correct, contains specific-sounding nouns ("good test coverage," "reasonable error handling," "modular architecture"), and offers no way to check any of it.
For a quick second opinion that is sometimes fine. For an answer you have to defend — to a customer security team, to an acquisition committee, to your own management — fluency without traceability is worse than nothing. It feels like progress. It commits the reviewer to a position they cannot back up. When the buyer asks "where in the code does it say that?" the reviewer is stuck.
"AI code review" as marketed today optimises for the first kind of answer. It writes well. It moves fast. It produces no audit trail.
What a verifiable answer requires
An answer is verifiable when three properties hold:
- Every claim points to a specific piece of source. Not "the codebase has tests" but "internal/auth/login_test.go:42 exercises the wrong-password path with a 5-attempt-rate-limit assertion." A reader can open that file and check.
- The pointer is rendered alongside the claim, not as a footnote. Inline anchors keep readers from dropping the link. A footnote at the bottom of an answer is rarely clicked; an inline pill that says auth/login.go:42 next to the claim is.
- The model is told it can only use the supplied evidence. If the prompt allows freelancing, the model freelances. The architecture has to physically gate the answer to anchors that exist in the index. No anchor → no claim → the model says so. (A minimal sketch of that gating follows this list.)
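To make the third property concrete, here is a minimal, hypothetical sketch of the gating step: a claim survives into the answer only if every anchor it cites exists in the evidence index, and otherwise the caller is expected to say so explicitly. The `Claim` and `AnchorIndex` names are illustrative, not Verabase's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    text: str                  # the sentence the model wants to assert
    anchors: tuple[str, ...]   # evidence ids it cites, e.g. "r0/internal/auth/login.go:42"

class AnchorIndex:
    """Set of anchor identifiers that actually exist in the scanned snapshot."""
    def __init__(self, anchor_ids: set[str]):
        self._ids = anchor_ids

    def gate(self, claim: Claim) -> Claim | None:
        # No anchor -> no claim. A claim citing any unknown anchor is dropped,
        # and the answer layer surfaces "the snapshot does not show X" instead.
        if not claim.anchors:
            return None
        if any(a not in self._ids for a in claim.anchors):
            return None
        return claim

# Only claims whose every anchor resolves make it into the grounded answer.
index = AnchorIndex({"r0/internal/auth/login_test.go:42"})
kept = index.gate(Claim("The wrong-password path is tested.",
                        ("r0/internal/auth/login_test.go:42",)))
dropped = index.gate(Claim("Sessions rotate hourly.",
                           ("r0/internal/session/rotate.go:10",)))
assert kept is not None and dropped is None
```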
None of this is research-grade novelty. It is discipline. The discipline is what most AI code review tools skip in exchange for sounding more conversational.
What it looks like in Verabase
Verabase shows every grounded answer as a paragraph where each claim is wrapped in an anchor pill linking back to the source. A pill might read r0/internal/auth/login.go; clicking it opens the file at the line the model used. Different repos in a multi-repo run get different prefixes (r0, r1, …) so cross-service claims stay attributable.
Below the answer, a small "Anchored to" footer lists every file path the model used. There is no path the model touched that is not listed; there is no claim that does not resolve to one of those paths.
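For illustration only, here is one hypothetical shape such a grounded answer could take, and how the "Anchored to" footer can be derived directly from the claims so the two can never disagree. The field names are assumptions, not Verabase's documented schema.

```python
# Hypothetical grounded-answer payload; field names are illustrative only.
answer = {
    "claims": [
        {"text": "Login attempts are rate-limited.",
         "anchor": "r0/internal/auth/login.go:42"},
        {"text": "The wrong-password path is covered by a test.",
         "anchor": "r0/internal/auth/login_test.go:42"},
    ],
}

def anchored_to(answer: dict) -> list[str]:
    # The footer is computed from the claims themselves, so every listed path
    # was actually used and every claim resolves to a listed path.
    paths = {c["anchor"].rsplit(":", 1)[0] for c in answer["claims"]}
    return sorted(paths)

print(anchored_to(answer))
# ['r0/internal/auth/login.go', 'r0/internal/auth/login_test.go']
```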
If the available evidence does not support a strong answer, the answer says so. The model is allowed to write "the snapshot does not show X" — a more useful signal, in a diligence context, than a hallucinated reassurance.
You can see a real, public example here:
(The dashboard's demo mode runs against a cached scan of facebook/react. The same flow on your own repos costs $10/month for one repo, or $49/month for up to ten repos queried in a single grounded answer.)
What changes for the reviewer
The shift in workflow is small but compounding:
- Forwarding goes from "trust me" to "here, click." Sending an evidence-anchored answer to a security questionnaire owner or an investor's technical lead converts the conversation from interpretation to verification. Either they accept your reading of the source, or they have a specific line to argue with.
- You stop second-guessing the model. Without anchors, every confident-sounding sentence triggers the same internal "wait, is that actually true?" When the sentence is followed by r0/migration/0007.sql:14, you check once and move on.
- The cost of being wrong is contained. A wrong claim with an anchor is correctable: open the file, see what the model misread, push back. A wrong claim without an anchor lives forever in someone else's notes.
Why this is not the default
Three real reasons. The first two are technical, the third is commercial.
Indexing the right thing is hard. Anchored answers need an evidence index where every entry is small enough for a model to cite, large enough to carry context, and stable across builds. Most "vector RAG" stacks index whole files or arbitrary chunks; they retrieve "relevant" snippets but cannot guarantee they will be cited verbatim. Verabase's evidence layer is structured: AST anchors, deterministic identifiers, snippet-bounded.
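As a toy illustration of what "structured" means here, the sketch below builds anchor ids for one Python file from its AST: the identifier is derived from the path and definition line (so it is deterministic across builds of the same snapshot), and the stored snippet is bounded. This is a sketch of the general technique, not Verabase's actual indexer, which handles more languages than Python.

```python
import ast

def index_file(repo_prefix: str, path: str, source: str) -> dict[str, str]:
    """Toy evidence index for one Python file: deterministic anchor id -> bounded snippet."""
    tree = ast.parse(source)
    anchors = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Deterministic identifier: prefix + path + definition line, not a byte offset.
            anchor_id = f"{repo_prefix}/{path}:{node.lineno}"
            snippet = ast.get_source_segment(source, node) or ""
            # Snippet-bounded: small enough to cite, large enough to carry context.
            anchors[anchor_id] = snippet[:2000]
    return anchors

src = "def login(user, pw):\n    ...\n"
print(index_file("r0", "internal/auth/login.py", src))
# {'r0/internal/auth/login.py:1': 'def login(user, pw):\n    ...'}
```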
Constraining the model is hard. Anchored output requires the model to refuse to invent identifiers — even when the user clearly wants an answer. Most prompt engineering is steered the other way: be helpful, fill gaps, confabulate when uncertain. Anchored prompts work against the model's natural pull toward fluency.
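The prompt is only half of that constraint; the other half is checking the draft before it ships. Below is a rough sketch of the output-side check, under the assumption that anchors follow an r<N>/path:line pattern. The prompt text and regex are illustrative, not Verabase's implementation.

```python
import re

ANCHOR_RE = re.compile(r"\br\d+/[\w./-]+:\d+\b")  # e.g. r0/internal/auth/login.go:42

SYSTEM_PROMPT = (
    "Answer only from the evidence snippets provided. Every substantive claim must "
    "cite one of the supplied anchor ids verbatim. If the evidence does not support "
    "an answer, say 'the snapshot does not show this' instead of guessing."
)

def invented_anchors(draft: str, index_ids: set[str]) -> set[str]:
    """Anchor-shaped citations in the draft that do not exist in the evidence index."""
    return {m.group(0) for m in ANCHOR_RE.finditer(draft)} - index_ids

# A draft that cites an id the index never produced gets rejected and regenerated
# (or downgraded to an explicit "not shown"), rather than shipped to the reviewer.
index_ids = {"r0/internal/auth/login.go:42"}
draft = "Sessions rotate hourly (r0/internal/session/rotate.go:10)."
assert invented_anchors(draft, index_ids) == {"r0/internal/session/rotate.go:10"}
```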
"Sounds confident" sells faster than "this is what we can prove." A demo where the model writes a beautiful paragraph wins more pilots than a demo where the model says "the snapshot does not show that." Most AI code review startups optimise for the demo. The answers do not survive contact with a real diligence committee.
How to check this yourself
If you want to test whether your current AI code review tool is verifiable, run this single check:
Ask it a specific question about a part of your code where you already know the truth. Then look at the answer. Is each substantive claim followed by a pointer to source — a file, a line, a function name — that you can click and check? Or is it a paragraph of confident prose with the source vaguely "implied"?
The first kind of answer is auditable. The second kind is plausible. There is no middle ground that survives a real review.
Verabase produces grounded answers like this for any GitHub repo. Try it free (no signup needed for the demo) at verabase.ai, or read the example output on the landing page.