How the Shipmoor scan works

The deterministic, local-first engine that catches the defects AI coding agents leave behind, before a human reviewer sees the change. Five stages, four languages, honest degradation.

This page covers the defect layer (the free Community CLI), which targets code that is broken or invented. Claim Check (claim-admissibility review, which checks whether a change did what the task asked) is a separate layer built on top of this one and is covered elsewhere.


1. The problem

AI agents write code that compiles, lints clean, and often passes the obvious tests, yet still ships defects a careful reviewer would catch. These defects share a shape:

  • imports of packages that do not exist, or that are not declared in the project manifest
  • function bodies that are just pass, ..., or throw new Error("not implemented")
  • safety bypasses like any, @ts-ignore, or bare except: blocks
  • debug output left behind, panic(err) used as a broad fallback, unreachable code after return

These are not style issues. They are signals that the agent finished the shape of the work but skipped or invented the substance. Existing linters and security scanners were designed for human code at full repository scope; they catch these patterns only incidentally and tend to be loud about everything else.

Shipmoor narrows the focus: one moment in the workflow, one set of agent-shaped defects, deterministic answers.


2. Where Shipmoor runs

Shipmoor runs in the gap between the agent finishing and the human starting to review.

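Participants: AI agent · Developer · Shipmoor · Human reviewer

  AI agent -> Developer: produces a change
  Developer -> Shipmoor: scan the change
  Shipmoor -> Developer: verdict
  if verdict is Needs work:
    Developer -> AI agent: send back with findings
  else (verdict is Ready or Needs a look):
    Developer -> Human reviewer: open PR for review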

The agent produces an edit. The developer runs shipmoor scan --changed. Shipmoor reads the changed files, classifies findings, and prints a one-line verdict. If the verdict is Needs work, the change goes back to the agent or the developer before any human reviewer is asked to spend attention on it.

The same engine runs in CI on the pull request diff, so the gate is enforced even if a local run was skipped.


3. The pipeline at a glance

Every Shipmoor invocation is a one-shot pipeline. There is no daemon, no history database, no cloud round-trip. Each scan is deterministic: same input, same finding ids.

Input: directory · Git changes · patch file
  Stage 1  Resolve target
  Stage 2  Dispatch per file by language
  Stage 3  Detect findings
  Stage 4  Classify and filter
  Stage 5  Render and exit
Output: terminal · JSON · SARIF · Markdown
Exit code: 0 pass · 1 block

The five stages:

  1. Resolve the scan target (a directory, a Git change set, or a patch file) into a normalized list of files plus the line ranges of any changes.
  2. Dispatch each file to the analyzer that matches its language.
  3. Detect findings using AST-based or line-based rules.
  4. Classify each finding by severity, confidence, category, and whether it was introduced by the change or already existed.
  5. Render the result and exit with a code based on the gate threshold.
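
The five stages above can be sketched as a single function. This is a minimal illustration, not Shipmoor's implementation: the `Finding` dict shape, the `run_pipeline` name, and the idea of passing analyzers and a keep/drop predicate as arguments are all assumptions made for the example. Stage 1 is assumed to have already produced the file list.

```python
from typing import Callable

Finding = dict  # hypothetical shape: {"path", "line", "rule_id", "severity"}
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def run_pipeline(
    files: list[str],
    analyzers: dict[str, Callable[[str], list[Finding]]],
    keep: Callable[[Finding], bool],
    fail_on: str = "high",
) -> tuple[list[Finding], int]:
    """Run stages 2-5 over an already resolved file list (stage 1 output)."""
    findings: list[Finding] = []
    for path in sorted(files):
        # Stage 2: pick the analyzer by file extension; skip unsupported files.
        suffix = path.rsplit(".", 1)[-1] if "." in path else ""
        analyzer = analyzers.get(suffix)
        if analyzer is None:
            continue
        # Stage 3: detect findings.
        findings.extend(analyzer(path))
    # Stage 4: classify and filter (reduced here to a keep/drop predicate).
    kept = [f for f in findings if keep(f)]
    kept.sort(key=lambda f: (f["path"], f["line"], f["rule_id"]))
    # Stage 5: derive the exit code from the gate threshold.
    blocked = any(
        SEVERITY_RANK[f["severity"]] >= SEVERITY_RANK[fail_on] for f in kept
    )
    return kept, 1 if blocked else 0
```

Because every step is a pure function of its input and the result is sorted before it is returned, running the sketch twice on the same files yields the same list and the same exit code, which is the property the section describes.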

4. What Shipmoor scans

Shipmoor accepts five input modes. Exactly one mode is active per scan.

Every mode resolves to the same thing: a normalized file list plus changed line ranges.

Mode                                 Use case
shipmoor scan .                      scan the whole project on disk
shipmoor scan --changed              scan everything staged, unstaged, or untracked in the working tree
shipmoor scan --staged               scan only the Git index (pre-commit hook usage)
shipmoor scan --diff main...HEAD     scan files changed in a branch range (CI usage)
shipmoor scan --patch agent.patch    scan a unified diff file before applying it (agent handoff)

Files outside the supported languages are skipped. Files inside .git, node_modules, .venv, dist, build, and similar directories are skipped. .gitignore patterns and project-level ignore: entries are honored. Lock files are never scanned.

When --patch adds a file that does not exist on disk yet, Shipmoor materializes a temporary copy and scans it in place, so a patch produced by an agent gets the same finding ids it would get after the patch is applied. This is called patch and changed parity.


5. Supported languages

Shipmoor analyzes four languages without invoking their compilers.

Language                     Extensions               Approach
Python                       py · pyi                 full AST parse via the standard library
TypeScript and JavaScript    ts tsx js jsx mjs cjs    line-aware regex plus package.json and tsconfig.json resolution
Go                           go                       line-aware regex plus go.mod resolution

Generated files (// Code generated ... DO NOT EDIT), test corpora, abstract interfaces, and Python files that are pure re-exports are detected and skipped. Optional Python imports inside try / except ImportError blocks are not flagged.
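
A rough sketch of the dispatch step, assuming the analyzer is chosen purely from the file extension and that generated files are recognized from a header comment. The names `ANALYZER_BY_SUFFIX` and `pick_analyzer`, the analyzer labels, and the exact header regex are hypothetical; only the extension list and the `// Code generated ... DO NOT EDIT` marker come from the text above.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Extension-to-analyzer table; the analyzer labels are illustrative.
ANALYZER_BY_SUFFIX = {
    ".py": "python", ".pyi": "python",
    ".ts": "typescript_javascript", ".tsx": "typescript_javascript",
    ".js": "typescript_javascript", ".jsx": "typescript_javascript",
    ".mjs": "typescript_javascript", ".cjs": "typescript_javascript",
    ".go": "go",
}

# Assumed form of the generated-file marker.
GENERATED_HEADER = re.compile(r"^// Code generated .* DO NOT EDIT\.?$")


def pick_analyzer(path: str, head: str = "") -> Optional[str]:
    """Return the analyzer label for a file, or None when it should be skipped."""
    if any(GENERATED_HEADER.match(line) for line in head.splitlines()):
        return None  # generated source is skipped, not analyzed
    return ANALYZER_BY_SUFFIX.get(PurePosixPath(path).suffix)
```

For example, `pick_analyzer("web/src/App.tsx")` selects the TypeScript/JavaScript analyzer, while a Go file whose first line is a generated-code header returns `None`.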


6. What Shipmoor looks for

The rule catalog is intentionally small. Every rule fits into one of five categories, and every rule carries a default severity that follows a cross-language policy.

Category              What it catches
phantom_dependency    imports that the project, the registry, or the filesystem cannot resolve
placeholder_logic     function bodies that are empty, constant, or a TODO panic or throw
regression_risk       trust bypasses (any, @ts-ignore), ignored errors, unreachable code
quality_signal        debug output, mutable defaults, bare excepts, oversized functions
syntax_error          source that cannot be parsed at all

Severity ceilings are consistent across languages. A phantom import is high in every language because the code cannot run as authored. A bare catch-all is low everywhere because the pattern has too many legitimate uses to block on by default.


7. The phantom import check

Phantom import detection is Shipmoor’s headline feature and covers all four supported languages. Each finding carries a subtype that names the specific failure mode.

Import statement detected:

  1. Relative import?
       Yes: walk parent dirs and check that the target file exists.
            Resolves: no finding
            Does not resolve: broken_relative_path
       No: continue
  2. Standard library?  Yes: no finding.  No: continue
  3. Declared in project manifest?  Yes: no finding.  No: continue
  4. Local file resolvable?  Yes: no finding.  No: continue
  5. Any manifest in scope?  No: unresolved_local_module.  Yes: continue
  6. Look up the name on the public registry:
       exists               missing_manifest_entry
       404 missing          hallucinated_package
       offline / timeout    missing_manifest_entry, marked unverified

The four subtypes:

Subtype                    What it means
hallucinated_package       the package name returned 404 on the public registry; the agent invented it
missing_manifest_entry     the package exists on the registry but is not declared in package.json, requirements.txt, pyproject.toml, or go.mod
broken_relative_path       a relative import (from .foo import bar) where the target file does not exist
unresolved_local_module    the import looks like a local module but no project manifest was found to attempt resolution

Two design choices keep this check accurate:

Monorepo-aware resolution. Shipmoor walks up from each source file to find the nearest manifest, descending into top-level subdirectories during initial discovery. A repo with backend/requirements.txt and frontend/package.json resolves each side against its own dependency set rather than producing cross-stack false positives.
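
The walk-up part of that resolution might look like the sketch below. It is an illustration under assumptions: `nearest_manifest` is a hypothetical name, the repository is modeled as a set of relative paths instead of a real filesystem, and the initial discovery pass over top-level subdirectories is left out.

```python
from pathlib import PurePosixPath
from typing import Optional


def nearest_manifest(
    source: str, repo_files: set[str], manifest_names: tuple[str, ...]
) -> Optional[str]:
    """Walk up from the source file's directory and return the first manifest found."""
    directory = PurePosixPath(source).parent
    while True:
        for name in manifest_names:
            candidate = str(directory / name)
            if candidate in repo_files:
                return candidate
        if directory == directory.parent:
            return None  # reached the repository root without a match
        directory = directory.parent
```

With `backend/requirements.txt` and `frontend/package.json` in the repository, a file under `backend/` resolves to the Python manifest and a file under `frontend/` resolves to package.json, so neither side is checked against the other's dependencies.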

Honest degradation. Registry lookups use a short timeout and are cached. When the network is unavailable, Shipmoor still flags the import but downgrades the message to note that registry confirmation was not possible. When --patch is used against a checkout that has no manifest, phantom-dependency findings are downgraded to medium with an annotation that the context could not be resolved. The framework refuses to be confidently wrong.
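
The decision tree and the offline behavior can be condensed into one function. This is a sketch of the documented branching only: `classify_import` and its boolean parameters are hypothetical stand-ins for the real resolution steps (file lookup, manifest parsing, registry call), which are not shown.

```python
from typing import Optional


def classify_import(
    *,
    relative: bool,
    target_exists: bool = False,
    stdlib: bool = False,
    declared: bool = False,
    local_file: bool = False,
    manifest_in_scope: bool = True,
    registry: str = "exists",  # "exists", "missing", or "offline"
) -> Optional[tuple[str, bool]]:
    """Return (subtype, verified), or None when the import produces no finding."""
    if relative:
        return None if target_exists else ("broken_relative_path", True)
    if stdlib or declared or local_file:
        return None
    if not manifest_in_scope:
        return ("unresolved_local_module", True)
    if registry == "missing":
        return ("hallucinated_package", True)
    if registry == "exists":
        return ("missing_manifest_entry", True)
    # Offline or timed out: still flagged, but marked unverified.
    return ("missing_manifest_entry", False)
```

The last branch is the honest-degradation case: the import is still reported, but the result carries a flag saying the registry could not confirm it.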


8. The placeholder logic check

Placeholder detection catches stub-shaped code: function bodies the agent emitted as scaffolding and never came back to fill in.

Function definition:

  1. Suppressed (@abstractmethod · ABC · Protocol · tests)?  Yes: no finding.
  2. Strip leading docstring.
  3. Classify the body shape:
       empty · pass · ellipsis · NotImplemented    empty_body
       single constant return                      constant_return
       throw not implemented                       not_implemented
       panic TODO                                  todo_panic
       TODO comment in Go                          todo_comment
       real logic                                  no finding

The shapes Shipmoor recognizes:

  • a body that is only pass, ..., NotImplemented, or empty after the docstring
  • a body that is a single return None, return True, return False, return 0, return "", return [], or return {}
  • a throw new Error("TODO" / "not implemented" / "fixme") in TypeScript or JavaScript
  • a panic("TODO" / "not implemented" / "fixme") in Go
  • an unresolved // TODO, // FIXME, or // HACK comment in non-test Go source

Functions decorated with @abstractmethod, methods of classes that inherit from ABC or Protocol, and files in test corpora are not flagged. Interfaces are supposed to be empty.


9. Trust suppression and quality checks

These rules detect deliberate safety bypasses and debug residue.

The TypeScript and JavaScript checks:

  • trust.any_boundary: an exported function with any in its signature
  • trust.as_any: a value cast through as any
  • trust.ts_ignore: a @ts-ignore directive
  • debug.console: a console.log (or .debug, .info, .warn, .error) in non-test source
  • placeholder.not_implemented: a throw new Error("not implemented")
  • control_flow.unreachable_code: a statement after a terminal return, throw, or process.exit

The Go checks:

  • error.ignored_error: an assignment that discards a likely error return into _
  • error.panic_error: panic(err) as a broad fallback
  • debug.fmt_print: fmt.Print family output in non-test source
  • structure.god_function: a function body of 60 lines or more

The Python checks:

  • quality.mutable_default: a default argument that is a list, dict, or set
  • quality.bare_except: a bare except: with no exception type
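
The two Python checks map directly onto AST node types. A minimal sketch, assuming the rules only look at literal list, dict, and set defaults and at `except:` handlers with no type; the function name `python_quality_findings` is hypothetical, and calls such as `list()` or `set()` used as defaults are not covered here.

```python
import ast


def python_quality_findings(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, rule) pairs for mutable defaults and bare excepts."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional defaults plus keyword-only defaults (None means "no default").
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            if any(isinstance(d, (ast.List, ast.Dict, ast.Set)) for d in defaults):
                findings.append((node.lineno, "quality.mutable_default"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "quality.bare_except"))
    return sorted(findings)
```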

10. The finding contract

Every analyzer emits the same shape. A finding is the atomic unit Shipmoor produces.

Finding (analyzer output, stable contract): id (SHM-…), rule_id, language, severity (critical…info), confidence (low…high), category, subtype, path, start_line, end_line, message (why), root_cause (how), recommendation (what to do), evidence, change_status, fingerprint (SHA-256).

Each finding carries:

  • a stable id of the form SHM- followed by a 16-character fingerprint prefix
  • a rule_id and language
  • a severity (critical, high, medium, low, info) and confidence (low, medium, high)
  • a category and optional subtype
  • a path and start_line plus end_line location
  • a one-sentence message (why), root_cause (how), and recommendation (what to do)
  • an evidence map with the function name, the source line, the import name, or similar context
  • a change_status of introduced, existing, or unknown
  • a SHA-256 fingerprint of the stable fields, used for suppression in upstream systems

The fingerprint deliberately excludes severity and recommendation text. A cosmetic copy edit to a recommendation does not break a suppression that someone added in GitHub Code Scanning or another SARIF consumer.
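
One way to get that behavior is to hash a canonical serialization of a fixed field subset. The sketch below is an assumption-heavy illustration: the source only states that severity and recommendation text are excluded, so the `STABLE_FIELDS` selection, the JSON serialization, and the function names are all hypothetical.

```python
import hashlib
import json

# Assumed set of stable fields; severity and recommendation are deliberately absent.
STABLE_FIELDS = (
    "rule_id", "language", "category", "subtype",
    "path", "start_line", "end_line",
)


def fingerprint(finding: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the stable fields."""
    stable = {key: finding.get(key) for key in STABLE_FIELDS}
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def finding_id(finding: dict) -> str:
    """Stable id: SHM- followed by the first 16 hex characters of the fingerprint."""
    return "SHM-" + fingerprint(finding)[:16]
```

Under this scheme, overriding a rule's severity or rewording its recommendation leaves the id untouched, while moving the finding to another path produces a new one.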


11. Diff-aware classification

When the scan was launched in a change-aware mode (--changed, --staged, --diff, --patch), every finding is classified against the diff.

Finding with line range:

  1. Mode is change aware?  No: change_status = unknown, keep finding.
  2. Look up changed ranges for this file.
  3. Lines intersect a changed range?  Yes: change_status = introduced, keep finding.
  4. Config only_introduced?
       false             change_status = existing, keep finding
       true (default)    drop finding

By default Shipmoor reports only findings that intersect the changed line ranges. This is why pre-merge scans stay quiet on a repo with pre-existing debt: existing findings the agent did not touch are suppressed. Configuration can flip this to report existing findings too, but the default keeps the signal aligned with what is in front of the reviewer.
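
The classification reduces to an interval-overlap test. A minimal sketch, assuming inclusive line ranges and using `None` to represent a scan that is not change-aware; `classify_change` is a hypothetical name.

```python
from typing import Optional


def classify_change(
    start_line: int,
    end_line: int,
    changed_ranges: Optional[list[tuple[int, int]]],
    only_introduced: bool = True,
) -> Optional[str]:
    """Return the change_status to record, or None when the finding is dropped."""
    if changed_ranges is None:
        return "unknown"  # not a change-aware mode: keep the finding as-is
    # Two inclusive ranges overlap when each starts before the other ends.
    if any(start_line <= hi and lo <= end_line for lo, hi in changed_ranges):
        return "introduced"
    return None if only_introduced else "existing"
```

A finding on lines 20 to 22 with changed range (22, 30) is `introduced`; the same finding with changed range (40, 50) is dropped by default and reported as `existing` when `only_introduced` is false.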


12. The review gate

The gate is the pass-or-fail decision Shipmoor produces. It compares the highest-severity finding against a threshold, and the result is reported as one of three verdict states.

Findings list: any finding at or above fail_on?

  no findings at all     Ready           exit 0
  present, none block    Needs a look    exit 0
  at least one blocks    Needs work      exit 1

Threshold (--fail-on)    What blocks
none                     nothing blocks; the gate always passes
critical                 only critical blocks
high (default)           critical and high block
medium                   critical, high, and medium block

The verdict line printed at the top of every scan reflects the gate decision:

  • Ready when there are no findings at all
  • Needs a look when there are findings but none block at the current threshold
  • Needs work when at least one finding blocks

Exit codes:

Code    Meaning
0       gate passed (Ready or Needs a look)
1       gate failed (Needs work)
2       usage error (bad flag, bad config)
3       unexpected scan failure
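
The verdict and exit code for a completed scan follow from the severities alone. A small sketch of the rules above, with a hypothetical `gate` function; exit codes 2 and 3 concern failures outside the gate and are not modeled.

```python
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(severities: list[str], fail_on: str = "high") -> tuple[str, int]:
    """Return (verdict, exit code) for the severities of the reported findings."""
    if not severities:
        return ("Ready", 0)
    blocks = fail_on != "none" and any(
        SEVERITY_RANK[s] >= SEVERITY_RANK[fail_on] for s in severities
    )
    return ("Needs work", 1) if blocks else ("Needs a look", 0)
```

Two high, two medium, and one low finding at the default threshold give `("Needs work", 1)`; the same findings with `fail_on="critical"` give `("Needs a look", 0)`.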

13. Outputs

One scan result, four ways to consume it.

Scan result
  -> Human terminal
  -> JSON (shipmoor.scan.v1), also written to .shipmoor/last-scan.json for shipmoor explain
  -> SARIF 2.1.0 (GitHub Code Scanning)
  -> Markdown summary (PR / CI step summary)

Human terminal: a verdict line, a project context line (manifests detected, file count, gate threshold), findings grouped by file with blockers first, and a footer with the next command to run.

JSON: a stable shipmoor.scan.v1 schema with tool metadata, scan metadata, summary counts, and the full findings list. Stable ordering by path, then line, then rule id makes diffs against previous reports meaningful.

SARIF 2.1.0: full SARIF output with severity mapped to SARIF levels (critical and high become error, medium becomes warning, low and info become note), partial fingerprints for suppression, and properties carrying confidence, subtype, change status, and evidence. This is what GitHub Code Scanning consumes.

Markdown summary: a compact table suitable for posting into a pull request description or a CI step summary.

The JSON output is also written to .shipmoor/last-scan.json after every scan so shipmoor explain can drill into a single finding without re-scanning.
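
The stable ordering and the severity-to-level mapping are both small enough to show directly. In this sketch the SARIF property names (`ruleId`, `level`, `message`, `locations`, `partialFingerprints`, `properties`) are standard SARIF 2.1.0 fields, while the function names, the input dict shape, and the `shipmoorFingerprint/v1` key are assumptions for illustration.

```python
SARIF_LEVEL = {
    "critical": "error", "high": "error",
    "medium": "warning",
    "low": "note", "info": "note",
}


def stable_order(findings: list[dict]) -> list[dict]:
    """Order by path, then line, then rule id, so reports diff cleanly."""
    return sorted(findings, key=lambda f: (f["path"], f["start_line"], f["rule_id"]))


def to_sarif_result(finding: dict) -> dict:
    """Build one SARIF result object from a finding dict."""
    return {
        "ruleId": finding["rule_id"],
        "level": SARIF_LEVEL[finding["severity"]],
        "message": {"text": finding["message"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": finding["path"]},
                "region": {
                    "startLine": finding["start_line"],
                    "endLine": finding["end_line"],
                },
            }
        }],
        # Hypothetical fingerprint key; consumers use it to match suppressions.
        "partialFingerprints": {"shipmoorFingerprint/v1": finding["fingerprint"]},
        "properties": {
            "confidence": finding["confidence"],
            "subtype": finding.get("subtype"),
            "change_status": finding["change_status"],
            "evidence": finding.get("evidence", {}),
        },
    }
```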


14. The explain view

shipmoor explain <id> reads the last scan report (or one passed via --from report.json) and prints a single finding in a fixed grammar:

high  phantom import  python.phantom_import
src/flask/ai_helpers.py:7  SHM-fd914abf5fdea281  confidence high  phantom_dependency

why
  Package 'incidentlib' does not exist on PyPI.

root cause
  The import name could not be found in the Python package registry.

fix
  No package named 'incidentlib' exists on PyPI. Ask the agent to use a
  real package or remove the import.

evidence
  import_name: incidentlib
  registry_lookup: missing

The id can be a unique prefix. The same grammar is used inline by scan, so users learn one format.
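
Prefix lookup can be sketched as a filter that insists on exactly one match. The `resolve_id` name, the list-of-dicts input, and the choice of `LookupError` are assumptions; the real report schema is not reproduced here.

```python
def resolve_id(prefix: str, findings: list[dict]) -> dict:
    """Return the single finding whose id starts with the prefix."""
    matches = [f for f in findings if f["id"].startswith(prefix)]
    if not matches:
        raise LookupError(f"no finding matches {prefix!r}")
    if len(matches) > 1:
        ids = ", ".join(f["id"] for f in matches)
        raise LookupError(f"prefix {prefix!r} is ambiguous: {ids}")
    return matches[0]
```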


15. Configuration

A single optional file, .shipmoor.yaml, controls scan behavior. shipmoor init writes a starter version.

schema_version: 1
languages:
  enabled: [python, typescript, javascript, go]
ignore:
  - .shipmoor/
rules:
  disabled: []
  severity_overrides: {}
thresholds:
  fail_on: high
diff:
  only_introduced: true
output:
  default_format: human

The hierarchy is: command-line flags win, then .shipmoor.yaml, then built-in defaults. Disabled rules are filtered out after detection. Severity overrides are applied to the finding before classification.
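
The precedence and the rule handling can be sketched as two small functions. The flattened setting keys, the function names, and the convention that an unset flag is `None` are assumptions for illustration; the default values mirror the starter file above.

```python
DEFAULTS = {"fail_on": "high", "only_introduced": True, "default_format": "human"}


def effective_settings(file_config: dict, flags: dict) -> dict:
    """Layer built-in defaults, then .shipmoor.yaml values, then command-line flags."""
    settings = dict(DEFAULTS)
    settings.update({k: v for k, v in file_config.items() if v is not None})
    settings.update({k: v for k, v in flags.items() if v is not None})
    return settings


def apply_rule_config(
    findings: list[dict], disabled: set[str], severity_overrides: dict[str, str]
) -> list[dict]:
    """Drop disabled rules after detection, then apply severity overrides."""
    result = []
    for finding in findings:
        if finding["rule_id"] in disabled:
            continue
        severity = severity_overrides.get(finding["rule_id"], finding["severity"])
        result.append({**finding, "severity": severity})
    return result
```

For instance, a file that sets `fail_on: medium` is overridden by `--fail-on critical` on the command line, while an unset flag leaves the file value in place.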


16. CI integration

The same engine, three deployment surfaces.

Surface            Command
Local pre-merge    shipmoor scan --changed --fail-on high
Pre-commit hook    shipmoor scan --staged --fail-on high
Pull request CI    shipmoor scan --diff origin/main...HEAD --sarif --fail-on high

Exit code: 0 = pass · 1 = block

Local pre-merge: shipmoor scan --changed --fail-on high after the agent finishes. The exit code tells the developer whether to send the change back or commit it.

Pre-commit hook: shipmoor scan --staged --fail-on high. The hook aborts the commit on Needs work.

Pull request CI: shipmoor scan --diff origin/main...HEAD --sarif --output shipmoor.sarif --markdown-summary $GITHUB_STEP_SUMMARY --fail-on high. The SARIF is uploaded to GitHub Code Scanning, the Markdown lands in the PR step summary, and the exit code controls the check status.


17. End-to-end example

A file an agent might plausibly add to a real project. Here it is in Flask’s source tree, where the project manifests are present, so legitimate imports resolve and only the invented ones flag:

"""AI-assisted helpers for Flask request handling."""

from incidentlib.ai import summarize
from sqlalchemy.orm import Session


def build_payload_summary(payload, tags=[]):
    tags.append(payload.get("kind"))
    return summarize.compact(payload, tags=tags)


def record_audit(session: Session, request_id: str, text: str) -> None:
    # TODO: implement actual persistence
    pass


def safe_record(session: Session, request_id: str, text: str) -> None:
    try:
        record_audit(session, request_id, text)
        session.commit()
    except:
        pass

Running shipmoor scan --changed produces:

Needs work  2 of 5 findings block review
pyproject.toml (8 deps), examples/celery/requirements.txt (21 deps), examples/celery/pyproject.toml (2 deps), examples/javascript/pyproject.toml (2 deps), examples/tutorial/pyproject.toml (2 deps)  1 file  gate high

src/flask/ai_helpers.py  5 findings
  high     :7   phantom import     python.phantom_import
    Package 'incidentlib' does not exist on PyPI.
    -> No package named 'incidentlib' exists on PyPI. Use a real package or remove the import.
  high     :8   phantom import     python.phantom_import
    Package 'sqlalchemy' is imported but not declared in requirements.txt or pyproject.toml.
    -> 'sqlalchemy' is used but not declared. Add it to the manifest or remove the import.
  medium   :11  mutable default    python.quality.mutable_default
    Function 'build_payload_summary' uses a mutable default argument.
  medium   :21  empty body         python.placeholder.empty_body
    Function 'record_audit' has no meaningful implementation.
  low      :30  bare except        python.quality.bare_except
    Bare except catches all exceptions.

gate fail  2 high blocks at threshold "high"  exit 1
  fix the 2 blockers, then re-run  shipmoor scan --changed --fail-on high
  drill into one  shipmoor explain SHM-fd914abf5fdea281
2 medium  1 low won't block, worth a look.

Two things to read carefully. First, the same rule (python.phantom_import) prints two different messages because the subtypes differ. incidentlib is a hallucinated_package (no such thing on PyPI, the agent invented it), while sqlalchemy is a missing_manifest_entry, a real package the change forgot to declare. A reviewer can tell at a glance which is which. Second, the context line shows real manifests, not degraded resolvers: legitimate imports resolve against the project’s actual dependencies, so only the planted defects surface, and across Flask’s other source files the scan stays silent. The signal is the change, not the tree.


18. Design principles

The framework is small on purpose. A few principles explain the shape:

  • Deterministic only. Same input, same finding ids, same exit code. No ML scoring, no calibration, no rolling thresholds. Reproducibility is more useful than nuance at this stage of the workflow.
  • Local first. No account, no upload, no telemetry. The whole engine runs offline. Registry lookups are the only network calls and they fail gracefully.
  • One moment, one job. Shipmoor scans changes at the agent-to-human handoff. The bet is that timing and shape matter more than breadth of rule catalog at this point in the workflow.
  • Discrete findings, not scores. Every defect is a fingerprinted finding with a precise message. There is no aggregate quality score because suppression, triage, and CI gates all need atomic units, not numbers.
  • Honest degradation. When Shipmoor cannot prove a finding, it says so in the message and lowers its confidence rather than guessing.

19. The verdict loop

The simplest way to think about Shipmoor:

Agent finishes a change
  -> shipmoor scan --changed
  -> Verdict
       Ready · Needs a look: open PR or commit
       Needs work: send back to agent or fix, then scan again

The job of the framework is to make that decision quickly, with enough evidence that the developer can act without thinking about it.


20. In one sentence

Timing and shape, not breadth. Shipmoor scans the agent’s change at the moment it hands off to human review, names a small set of high-confidence defects, and prints a verdict you can act on in seconds. The whole engine is deterministic and local: same input, same finding ids, same exit code, no source upload. Where it cannot prove a finding, it says so and lowers its confidence rather than guessing.

This is the defect layer. Claim Check, which checks whether a change did what the task asked, builds on the same deterministic foundation.

Try it

Install the Community CLI and run it on your next agent-authored change:

curl -fsSL https://dl.shipmoor.dev/install-community-cli.sh | bash
cd path/to/your/repo
shipmoor scan --changed

It is free, local, and needs no account.
