How the Shipmoor scan works
The deterministic, local-first engine that catches the defects AI coding agents leave behind, before a human reviewer sees the change. Five stages, four languages, honest degradation.
This page covers the defect layer (the free Community CLI), which targets code that is broken or invented. Claim Check (claim-admissibility review, which checks whether a change did what the task asked) is a separate layer built on top of this one, and is covered elsewhere.
1. The problem
AI agents write code that compiles, lints clean, and often passes the obvious tests, yet still ships defects a careful reviewer would catch. These defects share a shape:
- imports of packages that do not exist, or that are not declared in the project manifest
- function bodies that are just pass, ..., or throw new Error("not implemented")
- safety bypasses like any, @ts-ignore, or bare except: blocks
- debug output left behind, broad panic(err) swallows, unreachable code after return
These are not style issues. They are signals that the agent finished the shape of the work but skipped or invented the substance. Existing linters and security scanners were designed for human code at full repository scope; they catch these patterns only incidentally and tend to be loud about everything else.
Shipmoor narrows the focus: one moment in the workflow, one set of agent-shaped defects, deterministic answers.
2. Where Shipmoor runs
Shipmoor runs in the gap between the agent finishing and the human starting to review.
The agent produces an edit. The developer runs shipmoor scan --changed. Shipmoor reads the changed files, classifies findings, and prints a one-line verdict. If the verdict is Needs work, the change goes back to the agent or the developer before any human reviewer is asked to spend attention on it.
The same engine runs in CI on the pull request diff, so the gate is enforced even if a local run was skipped.
3. The pipeline at a glance
Every Shipmoor invocation is a one-shot pipeline. There is no daemon, no history database, no cloud round-trip. Each scan is deterministic: same input, same finding ids.
The five stages:
- Resolve the scan target (a directory, a Git change set, or a patch file) into a normalized list of files plus the line ranges of any changes.
- Dispatch each file to the analyzer that matches its language.
- Detect findings using AST-based or line-based rules.
- Classify each finding by severity, confidence, category, and whether it was introduced by the change or already existed.
- Render the result and exit with a code based on the gate threshold.
4. What Shipmoor scans
Shipmoor accepts five input modes. Exactly one mode is active per scan.
| Mode | Use case |
|---|---|
| shipmoor scan . | scan the whole project on disk |
| shipmoor scan --changed | scan everything staged, unstaged, or untracked in the working tree |
| shipmoor scan --staged | scan only the Git index (pre-commit hook usage) |
| shipmoor scan --diff main...HEAD | scan files changed in a branch range (CI usage) |
| shipmoor scan --patch agent.patch | scan a unified diff file before applying it (agent handoff) |
Files outside the supported languages are skipped. Files inside .git, node_modules, .venv, dist, build, and similar directories are skipped. .gitignore patterns and project-level ignore: entries are honored. Lock files are never scanned.
When --patch adds a file that does not exist on disk yet, Shipmoor materializes a temporary copy and scans it in place, so a patch produced by an agent gets the same finding ids it would get after the patch is applied. This property is called patch and changed parity: --patch and --changed yield the same finding ids for the same change.
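The skip rules in this section can be approximated with a small path filter. This is a minimal sketch under stated assumptions, not Shipmoor's implementation: should_scan, SKIPPED_DIRS, and LOCK_FILES are hypothetical names, the lock-file list is a guess at common cases, and .gitignore and ignore: handling is left out.

```python
from pathlib import PurePosixPath

# Directories named in the text; the "and similar" ones are not enumerated here.
SKIPPED_DIRS = {".git", "node_modules", ".venv", "dist", "build"}
# Extensions of the four supported languages.
SUPPORTED_SUFFIXES = {".py", ".pyi", ".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".go"}
# Assumption: an illustrative set of lock-file names, not Shipmoor's actual list.
LOCK_FILES = {"package-lock.json", "yarn.lock", "poetry.lock", "go.sum"}


def should_scan(path: str) -> bool:
    """Return True when a repo-relative path passes the skip rules sketched above."""
    p = PurePosixPath(path)
    if p.name in LOCK_FILES:
        return False
    # Only parent directories are checked, so a file named build.ts is still scanned.
    if any(part in SKIPPED_DIRS for part in p.parts[:-1]):
        return False
    return p.suffix in SUPPORTED_SUFFIXES


print(should_scan("src/app.py"))
print(should_scan("node_modules/left-pad/index.js"))
```

Checking the directory parts rather than substrings avoids skipping files whose names merely contain a skipped directory name.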
5. Supported languages
Shipmoor analyzes four languages without invoking their compilers.
| Language | Extensions | Approach |
|---|---|---|
| Python | py, pyi | full AST parse via the standard library |
| TypeScript and JavaScript | ts, tsx, js, jsx, mjs, cjs | line-aware regex plus package.json and tsconfig.json resolution |
| Go | go | line-aware regex plus go.mod resolution |
Generated files (// Code generated ... DO NOT EDIT), test corpora, abstract interfaces, and Python files that are pure re-exports are detected and skipped. Optional Python imports inside try / except ImportError blocks are not flagged.
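The optional-import exemption can be illustrated with the standard library ast module. The sketch below is an assumption about how such a check might look, not the shipped rule; optional_import_lines and _caught_names are hypothetical names, and only the plain ImportError name (alone or in a tuple) is recognized.

```python
import ast


def _caught_names(handler: ast.ExceptHandler) -> set[str]:
    """Names listed in an except clause, e.g. {'ImportError'}."""
    expr = handler.type
    if isinstance(expr, ast.Name):
        return {expr.id}
    if isinstance(expr, ast.Tuple):
        return {e.id for e in expr.elts if isinstance(e, ast.Name)}
    return set()  # bare except or dotted name: not treated as an import guard here


def optional_import_lines(source: str) -> set[int]:
    """Line numbers of imports placed directly inside try / except ImportError."""
    guarded: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        if not any("ImportError" in _caught_names(h) for h in node.handlers):
            continue
        for stmt in node.body:
            if isinstance(stmt, (ast.Import, ast.ImportFrom)):
                guarded.add(stmt.lineno)
    return guarded


example = "import os\ntry:\n    import ujson\nexcept ImportError:\n    ujson = None\n"
print(optional_import_lines(example))
```

A phantom-import rule could then skip any import whose line number appears in the returned set.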
6. What Shipmoor looks for
The rule catalog is intentionally small. Every rule fits into one of five categories, and every rule carries a default severity that follows a cross-language policy.
| Category | What it catches |
|---|---|
| phantom_dependency | imports that the project, the registry, or the filesystem cannot resolve |
| placeholder_logic | function bodies that are empty, constant, or a TODO panic or throw |
| regression_risk | trust bypasses (any, @ts-ignore), ignored errors, unreachable code |
| quality_signal | debug output, mutable defaults, bare excepts, oversized functions |
| syntax_error | source that cannot be parsed at all |
Severity ceilings are consistent across languages. A phantom import is high in every language because the code cannot run as authored. A bare catch-all is low everywhere because the pattern has too many legitimate uses to block on by default.
7. The phantom import check
Phantom import detection is Shipmoor’s headline feature and covers all four supported languages. Each finding carries a subtype that names the specific failure mode.
The four subtypes:
| Subtype | What it means |
|---|---|
| hallucinated_package | the package name returned 404 on the public registry; the agent invented it |
| missing_manifest_entry | the package exists on the registry but is not declared in package.json, requirements.txt, pyproject.toml, or go.mod |
| broken_relative_path | a relative import (from .foo import bar) where the target file does not exist |
| unresolved_local_module | the import looks like a local module but no project manifest was found to attempt resolution |
Two design choices keep this check accurate:
Monorepo-aware resolution. Shipmoor walks up from each source file to find the nearest manifest, descending into top-level subdirectories during initial discovery. A repo with backend/requirements.txt and frontend/package.json resolves each side against its own dependency set rather than producing cross-stack false positives.
Honest degradation. Registry lookups use a short timeout and are cached. When the network is unavailable, Shipmoor still flags the import, but the message notes that registry confirmation was not possible. When --patch is used against a checkout that has no manifest, phantom-dependency findings are downgraded to medium with an annotation that the context could not be resolved. The framework refuses to be confidently wrong.
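The walk-up half of monorepo-aware resolution can be sketched as follows. This is an illustration under assumptions, not Shipmoor's code: nearest_manifest is a hypothetical helper, the caller is assumed to pass the manifest names relevant to the file's language, and the initial discovery pass over top-level subdirectories is not shown.

```python
from pathlib import Path
from typing import Optional, Sequence

# Manifest names mentioned in the text.
MANIFEST_NAMES = ("package.json", "requirements.txt", "pyproject.toml", "go.mod")


def nearest_manifest(
    source_file: Path, repo_root: Path, names: Sequence[str] = MANIFEST_NAMES
) -> Optional[Path]:
    """Walk up from source_file toward repo_root and return the first manifest found."""
    root = repo_root.resolve()
    directory = source_file.resolve().parent
    while True:
        for name in names:
            candidate = directory / name
            if candidate.is_file():
                return candidate
        # Stop at the repo root, or at the filesystem root if the file is outside the repo.
        if directory == root or directory == directory.parent:
            return None
        directory = directory.parent
```

With backend/requirements.txt and frontend/package.json in one repository, a file under backend/ resolves against the Python manifest and a file under frontend/ against package.json, which is the behavior the paragraph above describes.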
8. The placeholder logic check
Placeholder detection catches stub-shaped code: function bodies the agent emitted as scaffolding and never came back to fill in.
The shapes Shipmoor recognizes:
- a body that is only pass, ..., NotImplemented, or empty after the docstring
- a body that is a single return None, return True, return False, return 0, return "", return [], or return {}
- a throw new Error("TODO" / "not implemented" / "fixme") in TypeScript or JavaScript
- a panic("TODO" / "not implemented" / "fixme") in Go
- an unresolved // TODO, // FIXME, or // HACK comment in non-test Go source
Functions decorated with @abstractmethod, methods of classes that inherit from ABC or Protocol, and files in test corpora are not flagged. Interfaces are supposed to be empty.
9. Trust suppression and quality checks
These rules detect deliberate safety bypasses and debug residue.
The TypeScript and JavaScript checks:
- trust.any_boundary: an exported function with any in its signature
- trust.as_any: a value cast through as any
- trust.ts_ignore: a @ts-ignore directive
- debug.console: a console.log (or .debug, .info, .warn, .error) in non-test source
- placeholder.not_implemented: a throw new Error("not implemented")
- control_flow.unreachable_code: a statement after a terminal return, throw, or process.exit
The Go checks:
- error.ignored_error: an assignment that discards a likely error return into _
- error.panic_error: panic(err) as a broad fallback
- debug.fmt_print: fmt.Print family output in non-test source
- structure.god_function: a function body of 60 lines or more
The Python checks:
- quality.mutable_default: a default argument that is a list, dict, or set
- quality.bare_except: a bare except: with no exception type
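The two Python checks are simple to express against the AST. This is a minimal sketch, assuming literal defaults only (a call such as set() is not detected); python_quality_findings is a hypothetical name and the (line, rule id) tuple is a simplification of a full finding.

```python
import ast


def python_quality_findings(source: str) -> list[tuple[int, str]]:
    """Return (line, rule_id) pairs for mutable defaults and bare excepts."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults holds None for keyword-only arguments without a default.
            defaults = node.args.defaults + [d for d in node.args.kw_defaults if d is not None]
            if any(isinstance(d, (ast.List, ast.Dict, ast.Set)) for d in defaults):
                findings.append((node.lineno, "quality.mutable_default"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "quality.bare_except"))
    return sorted(findings)


sample = (
    "def build(payload, tags=[]):\n"
    "    return tags\n"
    "def safe():\n"
    "    try:\n"
    "        build({})\n"
    "    except:\n"
    "        pass\n"
)
print(python_quality_findings(sample))
```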
10. The finding contract
Every analyzer emits the same shape. A finding is the atomic unit Shipmoor produces.
Each finding carries:
- a stable id of the form SHM- followed by a 16-character fingerprint prefix
- a rule_id and language
- a severity (critical, high, medium, low, info) and confidence (low, medium, high)
- a category and optional subtype
- a path and start_line plus end_line location
- a one-sentence message (why), root_cause (how), and recommendation (what to do)
- an evidence map with the function name, the source line, the import name, or similar context
- a change_status of introduced, existing, or unknown
- a SHA-256 fingerprint of the stable fields, used for suppression in upstream systems
The fingerprint deliberately excludes severity and recommendation text. A cosmetic copy edit to a recommendation does not break a suppression that someone added in GitHub Code Scanning or another SARIF consumer.
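One way to get a fingerprint with these properties is to hash a canonical serialization of the stable fields. The sketch below is an assumption, not the documented algorithm: the exact set of stable fields is not specified in this page, so STABLE_FIELDS is a guess, and fingerprint and finding_id are hypothetical names.

```python
import hashlib
import json

# Assumption: which fields count as "stable". Severity and recommendation are
# deliberately left out, as the text requires.
STABLE_FIELDS = ("rule_id", "language", "category", "subtype", "path", "start_line", "end_line")


def fingerprint(finding: dict) -> str:
    """SHA-256 hex digest over a canonical JSON form of the stable fields."""
    stable = {key: finding.get(key) for key in STABLE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def finding_id(finding: dict) -> str:
    """SHM- followed by the first 16 characters of the fingerprint."""
    return "SHM-" + fingerprint(finding)[:16]
```

Because severity and recommendation never enter the hash, a severity override or a reworded recommendation leaves both the fingerprint and the id unchanged.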
11. Diff-aware classification
When the scan was launched in a change-aware mode (--changed, --staged, --diff, --patch), every finding is classified against the diff.
By default Shipmoor reports only findings that intersect the changed line ranges. This is why pre-merge scans stay quiet on a repo with pre-existing debt: existing findings the agent did not touch are suppressed. Configuration can flip this to report existing findings too, but the default keeps the signal aligned with what is in front of the reviewer.
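The classification reduces to an interval-overlap test between a finding's location and the changed line ranges. The sketch below is illustrative only: classify and reportable are hypothetical names, and treating a file with no diff information as unknown (and keeping such findings) is an assumption.

```python
from typing import Dict, List, Optional, Sequence, Tuple

LineRange = Tuple[int, int]  # inclusive (first_line, last_line)


def classify(start_line: int, end_line: int, changed: Optional[Sequence[LineRange]]) -> str:
    """Return introduced, existing, or unknown for one finding."""
    if changed is None:
        return "unknown"  # no diff information for this file
    for first, last in changed:
        if start_line <= last and end_line >= first:  # the two ranges overlap
            return "introduced"
    return "existing"


def reportable(findings: List[dict], changed_by_path: Dict[str, Sequence[LineRange]],
               only_introduced: bool = True) -> List[dict]:
    """Attach change_status and drop existing findings when only_introduced is set."""
    kept = []
    for finding in findings:
        status = classify(finding["start_line"], finding["end_line"],
                          changed_by_path.get(finding["path"]))
        if only_introduced and status == "existing":
            continue
        kept.append({**finding, "change_status": status})
    return kept
```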
12. The review gate
The gate is the pass/fail decision Shipmoor produces. It compares the highest-severity finding against a threshold, and the outcome is reported as one of three verdict states.
| Threshold (--fail-on) | What blocks |
|---|---|
| none | nothing blocks; the gate always passes |
| critical | only critical blocks |
| high (default) | critical and high block |
| medium | critical, high, and medium block |
The verdict line printed at the top of every scan reflects the gate decision:
- Ready when there are no findings at all
- Needs a look when there are findings but none block at the current threshold
- Needs work when at least one finding blocks
Exit codes:
| Code | Meaning |
|---|---|
| 0 | gate passed (Ready or Needs a look) |
| 1 | gate failed (Needs work) |
| 2 | usage error (bad flag, bad config) |
| 3 | unexpected scan failure |
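The threshold table, verdict states, and exit codes 0 and 1 combine into a small decision function. This is a sketch of the logic as described in this section, with gate and SEVERITY_RANK as hypothetical names; exit codes 2 and 3 concern usage errors and scan failures and are outside its scope.

```python
from typing import Iterable, Tuple

SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(severities: Iterable[str], fail_on: str = "high") -> Tuple[str, int]:
    """Return (verdict, exit_code) for the severities found in one scan."""
    found = list(severities)
    if not found:
        return ("Ready", 0)
    if fail_on != "none":
        threshold = SEVERITY_RANK[fail_on]
        if any(SEVERITY_RANK[s] >= threshold for s in found):
            return ("Needs work", 1)
    return ("Needs a look", 0)


# Two high, two medium, and one low finding at the default threshold.
print(gate(["high", "high", "medium", "medium", "low"]))
```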
13. Outputs
One scan result, four ways to consume it.
Human terminal: a verdict line, a project context line (manifests detected, file count, gate threshold), findings grouped by file with blockers first, and a footer with the next command to run.
JSON: a stable shipmoor.scan.v1 schema with tool metadata, scan metadata, summary counts, and the full findings list. Stable ordering by path, then line, then rule id makes diffs against previous reports meaningful.
SARIF 2.1.0: full SARIF output with severity mapped to SARIF levels (critical and high become error, medium becomes warning, low and info become note), partial fingerprints for suppression, and properties carrying confidence, subtype, change status, and evidence. This is what GitHub Code Scanning consumes.
Markdown summary: a compact table suitable for posting into a pull request description or a CI step summary.
The JSON output is also written to .shipmoor/last-scan.json after every scan so shipmoor explain can drill into a single finding without re-scanning.
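The SARIF severity mapping described above can be sketched as a conversion from one finding to one SARIF result object. The SARIF property names (ruleId, level, message, locations, partialFingerprints, properties) come from the SARIF 2.1.0 format; to_sarif_result and the partial-fingerprint key shipmoorFingerprint/v1 are hypothetical, and the real output may be shaped differently.

```python
SARIF_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def to_sarif_result(finding: dict) -> dict:
    """Convert one finding into a SARIF 2.1.0 result object."""
    return {
        "ruleId": finding["rule_id"],
        "level": SARIF_LEVEL[finding["severity"]],
        "message": {"text": finding["message"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": finding["path"]},
                "region": {
                    "startLine": finding["start_line"],
                    "endLine": finding["end_line"],
                },
            }
        }],
        # Hypothetical key name; SARIF lets the producer choose it.
        "partialFingerprints": {"shipmoorFingerprint/v1": finding["fingerprint"]},
        "properties": {
            "confidence": finding.get("confidence"),
            "subtype": finding.get("subtype"),
            "change_status": finding.get("change_status"),
            "evidence": finding.get("evidence", {}),
        },
    }
```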
14. The explain view
shipmoor explain <id> reads the last scan report (or one passed via --from report.json) and prints a single finding in a fixed grammar:

    high phantom import python.phantom_import
    src/flask/ai_helpers.py:7 SHM-fd914abf5fdea281 confidence high phantom_dependency
    why
      Package 'incidentlib' does not exist on PyPI.
    root cause
      The import name could not be found in the Python package registry.
    fix
      No package named 'incidentlib' exists on PyPI. Ask the agent to use a
      real package or remove the import.
    evidence
      import_name: incidentlib
      registry_lookup: missing

The id can be a unique prefix. The same grammar is used inline by scan, so users learn one format.
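Accepting a unique prefix implies rejecting both unknown and ambiguous prefixes. A minimal sketch of that lookup, with resolve_id as a hypothetical name and the error types chosen for illustration:

```python
from typing import Iterable


def resolve_id(prefix: str, ids: Iterable[str]) -> str:
    """Return the single finding id that starts with prefix, or raise."""
    matches = [finding_id for finding_id in ids if finding_id.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no finding matches {prefix!r}")
    raise ValueError(f"prefix {prefix!r} is ambiguous: {len(matches)} findings match")
```

In a real report the ids would come from the last scan's findings list; how that list is keyed inside the JSON file is not specified here.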
15. Configuration
A single optional file, .shipmoor.yaml, controls scan behavior. shipmoor init writes a starter version.

    schema_version: 1
    languages:
      enabled: [python, typescript, javascript, go]
    ignore:
      - .shipmoor/
    rules:
      disabled: []
      severity_overrides: {}
    thresholds:
      fail_on: high
    diff:
      only_introduced: true
    output:
      default_format: human

The hierarchy is: command-line flags win, then .shipmoor.yaml, then built-in defaults. Disabled rules are filtered out after detection. Severity overrides are applied to the finding before classification.
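The precedence order (flags, then file, then defaults) is a layered lookup, which collections.ChainMap models directly. This sketch flattens the nested YAML keys for brevity; effective_config and BUILTIN_DEFAULTS are hypothetical names, and representing an unset flag as None is an assumption.

```python
from collections import ChainMap

# Built-in defaults, flattened from the starter file shown above.
BUILTIN_DEFAULTS = {"fail_on": "high", "only_introduced": True, "default_format": "human"}


def effective_config(cli_flags: dict, file_config: dict) -> ChainMap:
    """Layer settings so that flags win, then .shipmoor.yaml, then defaults."""
    # A flag the user did not pass (None) must not shadow the lower layers.
    passed = {key: value for key, value in cli_flags.items() if value is not None}
    return ChainMap(passed, file_config, BUILTIN_DEFAULTS)


config = effective_config(
    {"fail_on": "critical", "default_format": None},
    {"fail_on": "medium", "only_introduced": False},
)
print(config["fail_on"], config["only_introduced"], config["default_format"])
```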
16. CI integration
The same engine, three deployment surfaces.
Local pre-merge: shipmoor scan --changed --fail-on high after the agent finishes. The exit code tells the developer whether to send the change back or commit it.
Pre-commit hook: shipmoor scan --staged --fail-on high. The hook aborts the commit on Needs work.
Pull request CI: shipmoor scan --diff origin/main...HEAD --sarif --output shipmoor.sarif --markdown-summary $GITHUB_STEP_SUMMARY --fail-on high. The SARIF is uploaded to GitHub Code Scanning, the Markdown lands in the PR step summary, and the exit code controls the check status.
17. End-to-end example
A file an agent might plausibly add to a real project. Here it is in Flask’s source tree, where the project manifests are present, so legitimate imports resolve and only the invented ones flag:

    """AI-assisted helpers for Flask request handling."""
    from incidentlib.ai import summarize
    from sqlalchemy.orm import Session


    def build_payload_summary(payload, tags=[]):
        tags.append(payload.get("kind"))
        return summarize.compact(payload, tags=tags)


    def record_audit(session: Session, request_id: str, text: str) -> None:
        # TODO: implement actual persistence
        pass


    def safe_record(session: Session, request_id: str, text: str) -> None:
        try:
            record_audit(session, request_id, text)
            session.commit()
        except:
            pass

Running shipmoor scan --changed produces:

    Needs work 2 of 5 findings block review
    pyproject.toml (8 deps), examples/celery/requirements.txt (21 deps), examples/celery/pyproject.toml (2 deps), examples/javascript/pyproject.toml (2 deps), examples/tutorial/pyproject.toml (2 deps) 1 file gate high
    src/flask/ai_helpers.py 5 findings
      high :7 phantom import python.phantom_import
        Package 'incidentlib' does not exist on PyPI.
        -> No package named 'incidentlib' exists on PyPI. Use a real package or remove the import.
      high :8 phantom import python.phantom_import
        Package 'sqlalchemy' is imported but not declared in requirements.txt or pyproject.toml.
        -> 'sqlalchemy' is used but not declared. Add it to the manifest or remove the import.
      medium :11 mutable default python.quality.mutable_default
        Function 'build_payload_summary' uses a mutable default argument.
      medium :21 empty body python.placeholder.empty_body
        Function 'record_audit' has no meaningful implementation.
      low :30 bare except python.quality.bare_except
        Bare except catches all exceptions.
    gate fail 2 high blocks at threshold "high" exit 1
    fix the 2 blockers, then re-run shipmoor scan --changed --fail-on high
    drill into one shipmoor explain SHM-fd914abf5fdea281
    2 medium 1 low won't block, worth a look.

Two things to read carefully. First, the same rule (python.phantom_import) prints two
different messages because the subtypes differ. incidentlib is a hallucinated_package
(no such thing on PyPI, the agent invented it), while sqlalchemy is a missing_manifest_entry,
a real package the change forgot to declare. A reviewer can tell at a glance which is which.
Second, the context line shows real manifests, not degraded resolvers: legitimate imports
resolve against the project’s actual dependencies, so only the planted defects surface, and
across Flask’s other source files the scan stays silent. The signal is the change, not the tree.
18. Design principles
The framework is small on purpose. A few principles explain the shape:
- Deterministic only. Same input, same finding ids, same exit code. No ML scoring, no calibration, no rolling thresholds. Reproducibility is more useful than nuance at this stage of the workflow.
- Local first. No account, no upload, no telemetry. The whole engine runs offline. Registry lookups are the only network calls and they fail gracefully.
- One moment, one job. Shipmoor scans changes at the agent-to-human handoff. The bet is that timing and shape matter more than breadth of rule catalog at this point in the workflow.
- Discrete findings, not scores. Every defect is a fingerprinted finding with a precise message. There is no aggregate quality score because suppression, triage, and CI gates all need atomic units, not numbers.
- Honest degradation. When Shipmoor cannot prove a finding, it says so in the message and lowers its confidence rather than guessing.
19. The verdict loop
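The simplest way to think about Shipmoor: the agent produces an edit, the developer runs shipmoor scan --changed, and the verdict decides whether the change goes back to the agent or forward to a human reviewer.
The job of the framework is to make that decision quickly, with enough evidence that the developer can act without thinking about it.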
20. In one sentence
Timing and shape, not breadth. Shipmoor scans the agent’s change at the moment it hands off to human review, names a small set of high-confidence defects, and prints a verdict you can act on in seconds. The whole engine is deterministic and local: same input, same finding ids, same exit code, no source upload. Where it cannot prove a finding, it says so and lowers its confidence rather than guessing.
This is the defect layer. Claim Check, which checks whether a change did what the task asked, builds on the same deterministic foundation.
Try it
Install the Community CLI and run it on your next agent-authored change:
curl -fsSL https://dl.shipmoor.dev/install-community-cli.sh | bash
cd path/to/your/repo
shipmoor scan --changed
It is free, local, and needs no account. See pricing for Team and Enterprise, or read what AI code integrity means.