Claim Check · BYO-Judge
Ship agent code with confidence
Coding agents help your team ship faster — but a plausible-looking diff doesn't mean the change did what the task asked. Claim Check catches that gap before review using deterministic probes, with an optional second opinion from your own agent (BYO-Judge).
Your agent finishes a task and writes a confident summary — “Added retry handling for failed Stripe payment webhooks.” The diff looks plausible. But did it bind a handler to the payment-failure event, or refactor something nearby and call it done? Claim Check answers that one question, so you can ship the change instead of stopping to re-read the whole diff. Deterministic probes check the change against the specific, falsifiable expectations the task sets up — and when you want a second opinion on the long tail, it comes from your own agent, clearly labeled and advisory. You move faster because you can trust what the agent shipped, not because you skipped the check.
- Catch claim gaps before review
- BYO-Judge runs in your own agent
- No Shipmoor model · no source upload
- deterministic decides
- LLM only advises
$ shipmoor scan --diff main...HEAD \
    --intent "persist the order and charge the customer, then emit order.paid" \
    --agent "claude -p" --author-model-id my-authoring-model

Claim check  GAP DISCLOSED · coverage 3/4

probes · deterministic
  ✓ satisfied       order row persisted to orders
  ✗ unsatisfied     payment captured on checkout
  ◦ cannot_check    refund path (no probe yet)

llm_inferred · BYO-Judge (claude -p) · advisory second opinion
  ~ change may charge in a sibling service; verify manually

1 gap caught before review, surfaced while it's still cheap to fix

Claim Check on a payment change: deterministic probes find the gap; the BYO-Judge (your own agent) only advises.
Run it
Claim Check appears when you scan a changeset and supply the task's intent. The BYO-Judge is opt-in, and it runs only on the long tail (claims no deterministic probe covers) and only when intent confidence is medium or higher.
- Check a change against its task
  shipmoor scan --staged --intent "add retry to the webhook client"
- Two agreeing sources raise confidence
  shipmoor scan --staged --intent "…" --prompt "…"
- Opt into the BYO-Judge (your own agent)
  SHIPMOOR_INTENT_DRIFT_STAGE3=1 shipmoor scan --diff main...HEAD --intent "…" --agent "claude -p"
- Assert judge isolation
  shipmoor scan --diff main...HEAD --intent "…" --agent "codex exec" --author-model-id my-model --strict-judge-isolation
- Turn the gate on (deterministic only)
  shipmoor scan --staged --intent "…" --prompt "…" --verdict-policy .shipmoor/verdict-policy.yaml
No intent supplied? The scan output is identical to a plain Community scan. Offline mode (SHIPMOOR_OFFLINE=1) disables the BYO-Judge entirely.
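The rules above for when the BYO-Judge is consulted can be summarized as a small predicate. This is a minimal Python sketch, not Shipmoor's implementation: the `Confidence` levels, the `judge_should_run` function, and its boolean inputs are hypothetical names chosen for illustration, and modeling confidence as three ordered levels is an assumption (the page only says "medium-or-higher").

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical intent-confidence levels, ordered low to high."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def judge_should_run(opted_in: bool, offline: bool,
                     confidence: Confidence, has_probe: bool) -> bool:
    """Sketch: decide whether the BYO-Judge is consulted for one claim.

    All inputs are hypothetical stand-ins for what the page describes:
    opt-in, offline mode, intent confidence, and probe coverage.
    """
    if offline:
        return False  # offline mode disables the judge entirely
    if not opted_in:
        return False  # the judge is strictly opt-in
    if confidence < Confidence.MEDIUM:
        return False  # needs medium-or-higher intent confidence
    return not has_probe  # only the long tail that no probe covers


# A claim with no probe, opted in, online, at high confidence:
print(judge_should_run(True, False, Confidence.HIGH, has_probe=False))  # prints True
```

Nothing in this predicate feeds the verdict; it only decides whether an advisory opinion is requested at all.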
What BYO-Judge gives you
The optional LLM second opinion handles the long tail Shipmoor has no probe for — on your terms: your model, your provider, your call on what to do with it.
- your model
Shipmoor hosts no model
Shipmoor builds, hosts, and calls no model, and introduces no network boundary of its own. The call rides your agent under your existing provider relationship.
- advisory
Advisory by default
The opinion is labeled inferred and excluded from the score. It surfaces the gap and stays out of your way — you decide what to do, and any optional CI gate runs on deterministic evidence only.
- masked
No source upload
The agent sees only a masked change signal, so secrets never reach the prompt, and Shipmoor records what was asked, not a replay of the model's output.
- isolation
Judge isolation
Declare the authoring model with --author-model-id. If it matches the judge, Shipmoor warns loudly; --strict-judge-isolation turns that warning into a hard error. No model gets to grade its own work.
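The isolation rule reduces to a comparison between two model identifiers. The sketch below is hypothetical: `check_judge_isolation` and `JudgeIsolationError` are illustrative names, and it assumes the judge's model ID is available as a plain string, which the page does not specify (the --agent flag takes a command such as "claude -p", not a model ID).

```python
import warnings
from typing import Optional


class JudgeIsolationError(RuntimeError):
    """Hypothetical error for strict mode: the judge authored the change."""


def check_judge_isolation(author_model_id: Optional[str],
                          judge_model_id: str,
                          strict: bool = False) -> None:
    """Sketch: warn, or fail in strict mode, if the judge is the author."""
    if author_model_id is None:
        return  # no authoring model declared, so nothing to compare
    if author_model_id == judge_model_id:
        msg = (f"judge model {judge_model_id!r} also authored this change; "
               "its opinion cannot independently check the claim")
        if strict:
            raise JudgeIsolationError(msg)  # like --strict-judge-isolation
        warnings.warn(msg)  # default: loud warning, scan continues


check_judge_isolation("my-model", "another-model")  # distinct models: silent
```

Exact string equality is the simplest possible check; a real tool might normalize aliases or model versions, which this sketch does not attempt.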
Your model, your machine, your call
Claim Check runs locally. The deterministic core never leaves your machine, and the optional LLM second opinion runs in your own agent under your own provider — Shipmoor hosts no model and uploads no source. The result is a verdict you can defend, not a vibe you have to trust.
How Claim Check works
Four steps, and only deterministic evidence moves the verdict. The LLM, when you opt in, only ever advises — so you act on falsifiable evidence, not a guess.
- Resolve the intent
- Deterministic probes check it
- BYO-Judge advises (opt-in)
- Verdict + evidence
The result is advisory by default — it surfaces the gap and stays out of your way. If you choose to gate CI, only deterministic evidence counts toward the verdict; a low-confidence intent or an LLM opinion never can. You stay in control of what ships.
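The gating rule can be sketched as a filter over findings. This is a minimal, hypothetical Python model, not Shipmoor's verdict engine: the `Finding` record and `gate_blocks` function are illustrative, the statuses are borrowed from the example output at the top of the page, and treating only an unsatisfied deterministic probe as blocking is an assumption (the page does not say how a verdict policy handles cannot_check).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one checked claim."""
    claim: str
    source: str  # "deterministic" or "llm_inferred"
    status: str  # e.g. "satisfied", "unsatisfied", "cannot_check", "advisory"


def gate_blocks(findings: List[Finding], low_confidence_intent: bool) -> bool:
    """Sketch of a deterministic-only gate: True means block the change."""
    if low_confidence_intent:
        return False  # a low-confidence intent can never fail the gate
    return any(
        f.source == "deterministic" and f.status == "unsatisfied"
        for f in findings  # llm_inferred findings never count
    )


findings = [
    Finding("order row persisted to orders", "deterministic", "satisfied"),
    Finding("payment captured on checkout", "deterministic", "unsatisfied"),
    Finding("refund path", "deterministic", "cannot_check"),
    Finding("change may charge in a sibling service", "llm_inferred", "advisory"),
]
print(gate_blocks(findings, low_confidence_intent=False))  # prints True
```

Drop the unsatisfied probe and the gate passes even though the LLM still flags a possible problem. That is the design: the second opinion surfaces the gap for a human but cannot block a change on its own.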
Ship your next agent change with confidence
Install the free CLI, sign in to Shipmoor IC, and check your next agent change against the task it was given.
Get Shipmoor CLI
One installer. One shipmoor command. Free Community scans.
Claim Check & BYO-Judge FAQ
How the deterministic core and the optional LLM second opinion fit together.