How the Shipmoor scan works

The deterministic, local-first engine that catches the defects AI coding agents leave behind, before a human reviewer sees the change. Five stages, four languages, honest degradation.

This page covers the defect layer (the free Community CLI), which detects code that is broken or invented. Claim Check (claim-admissibility review, which checks whether a change did what the task asked) is a separate layer built on top of this one and is covered elsewhere.

1. The problem

AI agents write code that compiles, lints clean, and often passes the obvious tests, yet still ships defects a careful reviewer would catch. These defects share a shape:

imports of packages that do not exist, or that are not declared in the project manifest
function bodies that are just pass, ..., or throw new Error("not implemented")
safety bypasses like any, @ts-ignore, or bare except: blocks
debug output left behind, panic(err) used as a broad fallback, unreachable code after return

These are not style issues. They are signals that the agent finished the shape of the work but skipped or invented the substance. Existing linters and security scanners were designed for human code at full repository scope; they catch these patterns only incidentally and tend to be loud about everything else.

Shipmoor narrows the focus: one moment in the workflow, one set of agent-shaped defects, deterministic answers.

2. Where Shipmoor runs

Shipmoor runs in the gap between the moment the agent finishes and the moment the human starts reviewing.

Sequence (AI agent · Developer · Shipmoor · Human reviewer): the agent produces a change -> the developer asks Shipmoor to scan the change -> Shipmoor returns a verdict. If the verdict is Needs work: send back with findings. If the verdict is Ready or Needs a look: open PR for review.

The agent produces an edit. The developer runs shipmoor scan --changed. Shipmoor reads the changed files, classifies findings, and prints a one-line verdict. If the verdict is Needs work, the change goes back to the agent or the developer before any human reviewer is asked to spend attention on it.

The same engine runs in CI on the pull request diff, so the gate is enforced even if a local run was skipped.

3. The pipeline at a glance

Every Shipmoor invocation is a one-shot pipeline. There is no daemon, no history database, no cloud round-trip. Each scan is deterministic: same input, same finding ids.

Input (directory · Git changes · patch file) -> Stage 1: Resolve target -> Stage 2: Dispatch per file by language -> Stage 3: Detect findings -> Stage 4: Classify and filter -> Stage 5: Render and exit -> Output (Terminal · JSON · SARIF · Markdown) and exit code (0 pass · 1 block)

The five stages:

Resolve the scan target (a directory, a Git change set, or a patch file) into a normalized list of files plus the line ranges of any changes.
Dispatch each file to the analyzer that matches its language.
Detect findings using AST-based or line-based rules.
Classify each finding by severity, confidence, category, and whether it was introduced by the change or already existed.
Render the result and exit with a code based on the gate threshold.

4. What Shipmoor scans

Shipmoor accepts five input modes. Exactly one mode is active per scan.

shipmoor scan selects one mode, and every mode produces a normalized file list plus changed line ranges:

default: filesystem, walk a directory
staged: Git index only
changed: staged + unstaged + untracked
diff range: between two refs
patch file: unapplied diff

Mode	Use case
`shipmoor scan .`	scan the whole project on disk
`shipmoor scan --changed`	scan everything staged, unstaged, or untracked in the working tree
`shipmoor scan --staged`	scan only the Git index (pre-commit hook usage)
`shipmoor scan --diff main...HEAD`	scan files changed in a branch range (CI usage)
`shipmoor scan --patch agent.patch`	scan a unified diff file before applying it (agent handoff)

Files outside the supported languages are skipped. Files inside .git, node_modules, .venv, dist, build, and similar directories are skipped. .gitignore patterns and project-level ignore: entries are honored. Lock files are never scanned.

When --patch adds a file that does not exist on disk yet, Shipmoor materializes a temporary copy and scans it in place, so a patch produced by an agent gets the same finding ids it would get after the patch is applied. This property is called patch and changed parity: a --patch scan and a --changed scan of the same edit agree.

5. Supported languages

Shipmoor analyzes four languages without invoking their compilers.

Each source file is routed by extension, and every analyzer emits findings:

py · pyi: Python analyzer (full AST parse)
ts · tsx · js · jsx · mjs · cjs: TypeScript / JavaScript analyzer (regex + package.json + tsconfig.json)
go: Go analyzer (regex + go.mod)

Language	Approach
Python	full AST parse via the standard library
TypeScript and JavaScript	line-aware regex plus `package.json` and `tsconfig.json` resolution
Go	line-aware regex plus `go.mod` resolution

Generated files (// Code generated ... DO NOT EDIT), test corpora, abstract interfaces, and Python files that are pure re-exports are detected and skipped. Optional Python imports inside try / except ImportError blocks are not flagged.
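
The extension routing described in this section can be pictured as a plain lookup table. The following is a minimal sketch, not Shipmoor's implementation: the extensions come from the list above, while the analyzer labels and the pick_analyzer helper are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions as listed in this section. The analyzer labels on the right
# are illustrative names, not identifiers taken from Shipmoor itself.
ANALYZER_BY_EXTENSION = {
    ".py": "python",
    ".pyi": "python",
    ".ts": "typescript_javascript",
    ".tsx": "typescript_javascript",
    ".js": "typescript_javascript",
    ".jsx": "typescript_javascript",
    ".mjs": "typescript_javascript",
    ".cjs": "typescript_javascript",
    ".go": "go",
}


def pick_analyzer(path: str) -> Optional[str]:
    """Return the analyzer label for a file, or None when the file is skipped."""
    suffix = PurePosixPath(path).suffix.lower()
    return ANALYZER_BY_EXTENSION.get(suffix)
```

A file with an unsupported extension simply maps to None, which matches the rule that files outside the supported languages are skipped.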

6. What Shipmoor looks for

The rule catalog is intentionally small. Every rule fits into one of five categories, and every rule carries a default severity that follows a cross-language policy.

Rule catalog: phantom_dependency (unresolvable imports) · placeholder_logic (empty / constant / TODO bodies) · regression_risk (trust bypasses, dead code) · quality_signal (debug output, bare except) · syntax_error (unparseable source)

Category	What it catches
`phantom_dependency`	imports that the project, the registry, or the filesystem cannot resolve
`placeholder_logic`	function bodies that are empty, constant, or a TODO panic or throw
`regression_risk`	trust bypasses (`any`, `@ts-ignore`), ignored errors, unreachable code
`quality_signal`	debug output, mutable defaults, bare excepts, oversized functions
`syntax_error`	source that cannot be parsed at all

Severity ceilings are consistent across languages. A phantom import is high in every language because the code cannot run as authored. A bare catch-all is low everywhere because the pattern has too many legitimate uses to block on by default.

7. The phantom import check

Phantom import detection is Shipmoor’s headline feature and covers all four supported languages. Each finding carries a subtype that names the specific failure mode.

Decision flow for each detected import statement, checked in order:

Relative import: walk the parent directories and check that the target file exists. If it resolves, no finding; otherwise broken_relative_path.
Standard library: no finding.
Declared in the project manifest: no finding.
Local file resolvable: no finding.
No manifest in scope: unresolved_local_module.
Otherwise, look up the name on the public registry. Exists: missing_manifest_entry. 404 missing: hallucinated_package. Offline or timeout: missing_manifest_entry, marked unverified.

The four subtypes:

Subtype	What it means
`hallucinated_package`	the package name returned 404 on the public registry; the agent invented it
`missing_manifest_entry`	the package exists on the registry but is not declared in `package.json`, `requirements.txt`, `pyproject.toml`, or `go.mod`
`broken_relative_path`	a relative import (`from .foo import bar`) where the target file does not exist
`unresolved_local_module`	the import looks like a local module but no project manifest was found to attempt resolution

Two design choices keep this check accurate:

Monorepo-aware resolution. Shipmoor walks up from each source file to find the nearest manifest, descending into top-level subdirectories during initial discovery. A repo with backend/requirements.txt and frontend/package.json resolves each side against its own dependency set rather than producing cross-stack false positives.

Honest degradation. Registry lookups use a short timeout and are cached. When the network is unavailable, Shipmoor still flags the import but downgrades the message to note that registry confirmation was not possible. When --patch is used against a checkout that has no manifest, phantom-dependency findings are downgraded to medium with an annotation that the context could not be resolved. The framework refuses to be confidently wrong.
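
The decision order above can be written down as a small pure function. This is a sketch under stated assumptions rather than Shipmoor's code: the ImportFacts record, its field names, and the "exists" / "missing" / "unavailable" status strings are hypothetical, and the expensive work (filesystem walks, manifest parsing, registry calls) is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ImportFacts:
    """Pre-computed facts about one import (hypothetical shape)."""
    is_relative: bool = False
    relative_target_exists: bool = False
    is_stdlib: bool = False
    declared_in_manifest: bool = False
    local_file_resolvable: bool = False
    manifest_in_scope: bool = True
    registry_status: str = "unavailable"  # "exists", "missing", or "unavailable"


def classify_import(facts: ImportFacts) -> Optional[Tuple[str, bool]]:
    """Return (subtype, verified), or None when the import produces no finding."""
    if facts.is_relative:
        return None if facts.relative_target_exists else ("broken_relative_path", True)
    if facts.is_stdlib or facts.declared_in_manifest or facts.local_file_resolvable:
        return None
    if not facts.manifest_in_scope:
        return ("unresolved_local_module", True)
    if facts.registry_status == "missing":
        return ("hallucinated_package", True)
    if facts.registry_status == "exists":
        return ("missing_manifest_entry", True)
    # Offline or timed out: still flag the import, but mark it unverified.
    return ("missing_manifest_entry", False)
```

Keeping the network result as an input rather than a call inside the function is one way to keep the classification itself deterministic: the same facts always yield the same subtype.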

8. The placeholder logic check

Placeholder detection catches stub-shaped code: function bodies the agent emitted as scaffolding and never came back to fill in.

Decision flow for each function definition:

Suppressed (@abstractmethod · ABC · Protocol · tests): no finding.
Otherwise, strip the leading docstring and classify the body shape:
empty · pass · ellipsis · NotImplemented: empty_body
single constant return: constant_return
throw not implemented: not_implemented
panic TODO: todo_panic
TODO comment in Go: todo_comment
real logic: no finding

The shapes Shipmoor recognizes:

a body that is only pass, ..., NotImplemented, or empty after the docstring
a body that is a single return None, return True, return False, return 0, return "", return [], or return {}
a throw new Error("TODO" / "not implemented" / "fixme") in TypeScript or JavaScript
a panic("TODO" / "not implemented" / "fixme") in Go
an unresolved // TODO, // FIXME, or // HACK comment in non-test Go source

Functions decorated with @abstractmethod, methods of classes that inherit from ABC or Protocol, and files in test corpora are not flagged. Interfaces are supposed to be empty.

9. Trust suppression and quality checks

These rules detect deliberate safety bypasses and debug residue.

The TypeScript and JavaScript checks:

trust.any_boundary: an exported function with any in its signature
trust.as_any: a value cast through as any
trust.ts_ignore: a @ts-ignore directive
debug.console: a console.log (or .debug, .info, .warn, .error) in non-test source
placeholder.not_implemented: a throw new Error("not implemented")
control_flow.unreachable_code: a statement after a terminal return, throw, or process.exit

The Go checks:

error.ignored_error: an assignment that discards a likely error return into _
error.panic_error: panic(err) as a broad fallback
debug.fmt_print: fmt.Print family output in non-test source
structure.god_function: a function body of 60 lines or more

The Python checks:

quality.mutable_default: a default argument that is a list, dict, or set
quality.bare_except: a bare except: with no exception type

10. The finding contract

Every analyzer emits the same shape. A finding is the atomic unit Shipmoor produces.

Finding (analyzer output · stable contract) fields: id (SHM-…), rule_id, language, severity (critical…info), confidence (low…high), category, subtype, path, start_line, end_line, message (why), root_cause (how), recommendation (what to do), evidence, change_status, fingerprint (SHA-256)

Each finding carries:

a stable id of the form SHM- followed by a 16-character fingerprint prefix
a rule_id and language
a severity (critical, high, medium, low, info) and confidence (low, medium, high)
a category and optional subtype
a path and start_line plus end_line location
a one-sentence message (why), root_cause (how), and recommendation (what to do)
an evidence map with the function name, the source line, the import name, or similar context
a change_status of introduced, existing, or unknown
a SHA-256 fingerprint of the stable fields, used for suppression in upstream systems

The fingerprint deliberately excludes severity and recommendation text. A cosmetic copy edit to a recommendation does not break a suppression that someone added in GitHub Code Scanning or another SARIF consumer.
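
To make the relationship between fingerprint and id concrete, here is a minimal sketch using hashlib. The choice of which fields count as stable is an assumption made for illustration: the text above only states that severity and recommendation text are excluded, and that the id is SHM- followed by a 16-character fingerprint prefix. The serialization format is likewise assumed.

```python
import hashlib
import json

# ASSUMPTION: this field list is illustrative. The document only says that
# severity and recommendation are excluded from the fingerprint.
STABLE_FIELDS = ("rule_id", "language", "category", "subtype", "path", "start_line", "end_line")


def fingerprint(finding: dict) -> str:
    """SHA-256 hex digest over a canonical encoding of the stable fields."""
    stable = {key: finding.get(key) for key in STABLE_FIELDS}
    payload = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def finding_id(finding: dict) -> str:
    """Stable id: SHM- followed by the first 16 characters of the fingerprint."""
    return "SHM-" + fingerprint(finding)[:16]
```

Because severity is not part of the hashed payload, a severity override in configuration would leave the id unchanged under this scheme, which is the property suppression relies on.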

11. Diff-aware classification

When the scan was launched in a change-aware mode (--changed, --staged, --diff, --patch), every finding is classified against the diff.

Decision flow for each finding with a line range:

Mode is not change aware: change_status = unknown, keep finding.
Otherwise, look up the changed ranges for this file.
Lines intersect a changed range: change_status = introduced, keep finding.
No intersection and only_introduced is false: change_status = existing, keep finding.
No intersection and only_introduced is true (default): drop finding.

By default Shipmoor reports only findings that intersect the changed line ranges. This is why pre-merge scans stay quiet on a repo with pre-existing debt: existing findings the agent did not touch are suppressed. Configuration can flip this to report existing findings too, but the default keeps the signal aligned with what is in front of the reviewer.
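
The classification reduces to an interval-overlap test. A minimal sketch, assuming inclusive (start, end) line ranges and a hypothetical classify_change helper that returns None when the finding should be dropped:

```python
from typing import Iterable, Optional, Tuple


def classify_change(
    start_line: int,
    end_line: int,
    changed_ranges: Iterable[Tuple[int, int]],
    change_aware: bool,
    only_introduced: bool = True,
) -> Optional[str]:
    """Return the change_status for a finding, or None when it is dropped."""
    if not change_aware:
        return "unknown"
    for range_start, range_end in changed_ranges:
        # Two inclusive ranges overlap when each starts before the other ends.
        if start_line <= range_end and range_start <= end_line:
            return "introduced"
    return None if only_introduced else "existing"
```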

12. The review gate

The gate is the pass-or-fail decision Shipmoor produces. It compares the highest-severity finding against a threshold; combined with whether any findings exist at all, that yields one of three verdict states.

Gate flow over the findings list (any finding at or above fail_on?):

no findings at all: Ready, exit 0
findings present, none block: Needs a look, exit 0
at least one blocks: Needs work, exit 1

Threshold (`--fail-on`)	What blocks
`none`	nothing blocks; the gate always passes
`critical`	only `critical` blocks
`high` (default)	`critical` and `high` block
`medium`	`critical`, `high`, and `medium` block

The verdict line printed at the top of every scan reflects the gate decision:

Ready when there are no findings at all
Needs a look when there are findings but none block at the current threshold
Needs work when at least one finding blocks

Exit codes:

Code	Meaning
0	gate passed (`Ready` or `Needs a look`)
1	gate failed (`Needs work`)
2	usage error (bad flag, bad config)
3	unexpected scan failure
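
The threshold table and the three verdicts fit in a few lines. This is a sketch of the documented behavior, not the shipped code; the gate function and BLOCKING table are hypothetical names, and only the four threshold values listed above are accepted.

```python
from typing import Iterable, Tuple

# Mirrors the --fail-on table: which severities block at each threshold.
BLOCKING = {
    "none": frozenset(),
    "critical": frozenset({"critical"}),
    "high": frozenset({"critical", "high"}),
    "medium": frozenset({"critical", "high", "medium"}),
}


def gate(severities: Iterable[str], fail_on: str = "high") -> Tuple[str, int]:
    """Return (verdict, exit code) for the severities of the reported findings."""
    if fail_on not in BLOCKING:
        raise ValueError(f"unsupported threshold: {fail_on!r}")
    found = list(severities)
    if not found:
        return ("Ready", 0)
    if any(severity in BLOCKING[fail_on] for severity in found):
        return ("Needs work", 1)
    return ("Needs a look", 0)
```

For a scan with two high, two medium, and one low finding, gate(["high", "high", "medium", "medium", "low"]) gives ("Needs work", 1) at the default threshold, while the same findings under fail_on="critical" give ("Needs a look", 0).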

13. Outputs

One scan result, four ways to consume it.

Scan result -> Human terminal · JSON (shipmoor.scan.v1, also written to .shipmoor/last-scan.json for shipmoor explain) · SARIF 2.1.0 (GitHub Code Scanning) · Markdown summary (PR / CI step summary)

Human terminal: a verdict line, a project context line (manifests detected, file count, gate threshold), findings grouped by file with blockers first, and a footer with the next command to run.

JSON: a stable shipmoor.scan.v1 schema with tool metadata, scan metadata, summary counts, and the full findings list. Stable ordering by path, then line, then rule id makes diffs against previous reports meaningful.

SARIF 2.1.0: full SARIF output with severity mapped to SARIF levels (critical and high become error, medium becomes warning, low and info become note), partial fingerprints for suppression, and properties carrying confidence, subtype, change status, and evidence. This is what GitHub Code Scanning consumes.

Markdown summary: a compact table suitable for posting into a pull request description or a CI step summary.

The JSON output is also written to .shipmoor/last-scan.json after every scan so shipmoor explain can drill into a single finding without re-scanning.
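
Two of the rules in this section are mechanical enough to sketch: the stable ordering of findings and the severity-to-SARIF-level mapping. The helper names are hypothetical, and findings are assumed to be plain dictionaries carrying the path, start_line, rule_id, and severity fields from the finding contract.

```python
from typing import Dict, List

# Severity to SARIF level, as described above.
SARIF_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}


def order_findings(findings: List[Dict]) -> List[Dict]:
    """Sort by path, then line, then rule id, so report diffs stay meaningful."""
    return sorted(findings, key=lambda f: (f["path"], f["start_line"], f["rule_id"]))


def sarif_level(finding: Dict) -> str:
    """Map a finding's severity to its SARIF level."""
    return SARIF_LEVEL[finding["severity"]]
```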

14. The explain view

shipmoor explain <id> reads the last scan report (or one passed via --from report.json) and prints a single finding in a fixed grammar:

high  phantom import  python.phantom_import
src/flask/ai_helpers.py:7  SHM-fd914abf5fdea281  confidence high  phantom_dependency

why
  Package 'incidentlib' does not exist on PyPI.

root cause
  The import name could not be found in the Python package registry.

fix
  No package named 'incidentlib' exists on PyPI. Ask the agent to use a
  real package or remove the import.

evidence
  import_name: incidentlib
  registry_lookup: missing

The id can be a unique prefix. The same grammar is used inline by scan, so users learn one format.
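
Prefix lookup of this kind is usually a filter followed by an ambiguity check. A small sketch, assuming a hypothetical resolve_finding_id helper and raising LookupError for both failure cases; how the real command reports these cases is not specified here.

```python
from typing import Iterable


def resolve_finding_id(prefix: str, known_ids: Iterable[str]) -> str:
    """Return the single id that starts with prefix, or raise LookupError."""
    matches = sorted({fid for fid in known_ids if fid.startswith(prefix)})
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no finding id starts with {prefix!r}")
    raise LookupError(f"prefix {prefix!r} is ambiguous: {', '.join(matches)}")
```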

15. Configuration

A single optional file, .shipmoor.yaml, controls scan behavior. shipmoor init writes a starter version.

schema_version: 1
languages:
  enabled: [python, typescript, javascript, go]
ignore:
  - .shipmoor/
rules:
  disabled: []
  severity_overrides: {}
thresholds:
  fail_on: high
diff:
  only_introduced: true
output:
  default_format: human

The hierarchy is: command-line flags win, then .shipmoor.yaml, then built-in defaults. Disabled rules are filtered out after detection. Severity overrides are applied to the finding before classification.
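
The precedence order can be sketched as a layered merge. This is a simplified illustration with flattened keys, not the actual loader: the real file nests these values under thresholds, diff, and output, and the resolve_settings helper is a hypothetical name. A value of None stands for "not set at this layer".

```python
from typing import Any, Dict

# Built-in defaults, flattened from the starter .shipmoor.yaml shown above.
BUILT_IN_DEFAULTS: Dict[str, Any] = {
    "fail_on": "high",
    "only_introduced": True,
    "default_format": "human",
}


def resolve_settings(file_config: Dict[str, Any], cli_flags: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers so that flags beat the config file, which beats defaults."""
    settings = dict(BUILT_IN_DEFAULTS)
    for layer in (file_config, cli_flags):  # later layers win
        settings.update({key: value for key, value in layer.items() if value is not None})
    return settings
```

Filtering on None rather than on truthiness matters here: a flag that explicitly sets only_introduced to False must still override a True default.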

16. CI integration

The same engine, three deployment surfaces.

Three surfaces (local pre-merge · pre-commit hook · pull request CI), one exit-code contract: 0 = pass · 1 = block.

Local pre-merge: shipmoor scan --changed --fail-on high after the agent finishes. The exit code tells the developer whether to send the change back or commit it.

Pre-commit hook: shipmoor scan --staged --fail-on high. The hook aborts the commit on Needs work.

Pull request CI: shipmoor scan --diff origin/main...HEAD --sarif --output shipmoor.sarif --markdown-summary $GITHUB_STEP_SUMMARY --fail-on high. The SARIF is uploaded to GitHub Code Scanning, the Markdown lands in the PR step summary, and the exit code controls the check status.

17. End-to-end example

Consider a file an agent might plausibly add to a real project. Here it sits in Flask’s source tree, where the project manifests are present, so legitimate imports resolve and only the invented or undeclared ones are flagged:

"""AI-assisted helpers for Flask request handling."""

from incidentlib.ai import summarize
from sqlalchemy.orm import Session


def build_payload_summary(payload, tags=[]):
    tags.append(payload.get("kind"))
    return summarize.compact(payload, tags=tags)


def record_audit(session: Session, request_id: str, text: str) -> None:
    # TODO: implement actual persistence
    pass


def safe_record(session: Session, request_id: str, text: str) -> None:
    try:
        record_audit(session, request_id, text)
        session.commit()
    except:
        pass

Running shipmoor scan --changed produces:

Needs work  2 of 5 findings block review
pyproject.toml (8 deps), examples/celery/requirements.txt (21 deps), examples/celery/pyproject.toml (2 deps), examples/javascript/pyproject.toml (2 deps), examples/tutorial/pyproject.toml (2 deps)  1 file  gate high

src/flask/ai_helpers.py  5 findings
  high     :7   phantom import     python.phantom_import
    Package 'incidentlib' does not exist on PyPI.
    -> No package named 'incidentlib' exists on PyPI. Use a real package or remove the import.
  high     :8   phantom import     python.phantom_import
    Package 'sqlalchemy' is imported but not declared in requirements.txt or pyproject.toml.
    -> 'sqlalchemy' is used but not declared. Add it to the manifest or remove the import.
  medium   :11  mutable default    python.quality.mutable_default
    Function 'build_payload_summary' uses a mutable default argument.
  medium   :21  empty body         python.placeholder.empty_body
    Function 'record_audit' has no meaningful implementation.
  low      :30  bare except        python.quality.bare_except
    Bare except catches all exceptions.

gate fail  2 high blocks at threshold "high"  exit 1
  fix the 2 blockers, then re-run  shipmoor scan --changed --fail-on high
  drill into one  shipmoor explain SHM-fd914abf5fdea281
2 medium  1 low won't block, worth a look.

Two things to read carefully. First, the same rule (python.phantom_import) prints two different messages because the subtypes differ. incidentlib is a hallucinated_package (no such thing on PyPI, the agent invented it), while sqlalchemy is a missing_manifest_entry, a real package the change forgot to declare. A reviewer can tell at a glance which is which. Second, the context line shows real manifests, not degraded resolvers: legitimate imports resolve against the project’s actual dependencies, so only the planted defects surface, and across Flask’s other source files the scan stays silent. The signal is the change, not the tree.

18. Design principles

The framework is small on purpose. A few principles explain the shape:

Deterministic only. Same input, same finding ids, same exit code. No ML scoring, no calibration, no rolling thresholds. Reproducibility is more useful than nuance at this stage of the workflow.
Local first. No account, no upload, no telemetry. The engine runs fully offline; registry lookups are the only network calls, and they fail gracefully.
One moment, one job. Shipmoor scans changes at the agent-to-human handoff. The bet is that timing and shape matter more than breadth of rule catalog at this point in the workflow.
Discrete findings, not scores. Every defect is a fingerprinted finding with a precise message. There is no aggregate quality score because suppression, triage, and CI gates all need atomic units, not numbers.
Honest degradation. When Shipmoor cannot prove a finding, it says so in the message and lowers its confidence rather than guessing.

19. The verdict loop

The simplest way to think about Shipmoor:

Agent finishes a change -> shipmoor scan --changed -> verdict. Ready or Needs a look: open PR or commit. Needs work: send back to agent or fix, then iterate.

The job of the framework is to make that decision quickly, with enough evidence that the developer can act on it without second-guessing.

20. In one sentence

Timing and shape, not breadth. Shipmoor scans the agent’s change at the moment it hands off to human review, names a small set of high-confidence defects, and prints a verdict you can act on in seconds. The whole engine is deterministic and local: same input, same finding ids, same exit code, no source upload. Where it cannot prove a finding, it says so and lowers its confidence rather than guessing.

This is the defect layer. Claim Check, which checks whether a change did what the task asked, builds on the same deterministic foundation.

Try it

Install the Community CLI and run it on your next agent-authored change:

curl -fsSL https://dl.shipmoor.dev/install-community-cli.sh | bash
cd path/to/your/repo
shipmoor scan --changed

It is free, local, and needs no account. See pricing for Team and Enterprise, or read what AI code integrity means.