
Engineering

How an AI lab team runs Shipmoor as a CI gate on a brownfield Python monorepo

A field report on how an AI lab team adopted the free Shipmoor Community CLI as a diff-scoped CI gate across eight services in a complex Python monorepo, running it alongside Trivy with SARIF output and stable exit codes, and on what running it against real agent-written code taught us.


One of the clearest signals that pre-merge integrity checks belong in the workflow comes from teams already living the problem. This is a field report from one of them — an AI lab team running Shipmoor in continuous integration on a large, brownfield Python monorepo. At their request we’re keeping them anonymous; what follows is the shape of the deployment and what it taught us, not a logo.

The setting

The team builds on a complex Python monorepo — the kind that accretes over years, spans many services, and never has every third-party dependency installed in any one environment. They had also leaned into agent-assisted development, which meant more code arriving faster, and more of it landing in pull requests with a confident summary attached. The reviewers’ problem wasn’t a shortage of tools; it was a shortage of attention for the volume of plausible-looking change.

They wanted a check that ran before a human spent review time — one that was honest about what it could and couldn’t see, and that didn’t ship their source anywhere.

The deployment

They adopted the free Shipmoor Community CLI as a diff-scoped CI gate in GitHub Actions, across eight services in the monorepo. It runs alongside Trivy — Trivy for dependency and vulnerability scanning, Shipmoor for the generated-code failure modes a vulnerability scanner doesn’t look for: phantom imports, hallucinated APIs, and stub paths.
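To make those failure modes concrete, here is a minimal, standard-library sketch of two of them: a phantom-import check (an absolute import whose package cannot be resolved in the current environment) and a stub-path check (a function whose body is only `pass`, `...`, or `raise NotImplementedError`). This is not Shipmoor's implementation; the names `phantom_imports` and `stub_functions` are hypothetical, and the heuristics are deliberately simple.

```python
"""Two generated-code checks in miniature: phantom imports and stub bodies.

Illustrative only; this is not Shipmoor's implementation.
"""
import ast
import importlib.util


def phantom_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, module) for absolute imports whose top-level package can't be found."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are out of scope here
        for name in names:
            # Checking only the top-level package avoids importing (and so
            # executing) parent packages, which find_spec does for dotted names.
            if importlib.util.find_spec(name.split(".")[0]) is None:
                missing.append((node.lineno, name))
    return missing


def _is_placeholder(stmt: ast.stmt) -> bool:
    """True for `pass`, a bare `...`, or `raise NotImplementedError[(...)]`."""
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return stmt.value.value is Ellipsis
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        exc = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
        return isinstance(exc, ast.Name) and exc.id == "NotImplementedError"
    return False


def stub_functions(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for functions whose body is nothing but placeholders."""
    stubs = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        first = body[0]
        if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            body = body[1:]  # skip the docstring
        if body and all(_is_placeholder(stmt) for stmt in body):
            stubs.append((node.lineno, node.name))
    return stubs


if __name__ == "__main__":
    sample = '''
import os
import not_a_real_pkg_for_demo
from json import loads

def todo():
    """Agent-written placeholder."""
    raise NotImplementedError

def real():
    return os.getcwd()
'''
    print(phantom_imports(sample))  # [(3, 'not_a_real_pkg_for_demo')]
    print(stub_functions(sample))   # [(6, 'todo')]
```

Note that `phantom_imports` answers "is this installed here?", not "does this package exist?", so its results depend entirely on the environment it runs in. A real checker would also need to exempt legitimate stubs such as abstract methods and `Protocol` members.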

Two properties made it fit cleanly into their pipeline:

  • SARIF output, uploaded to GitHub code scanning, so Shipmoor findings show up in the same place as the rest of their security signal — no new dashboard to learn.
  • Stable exit codes, so the gate behaves predictably: a clean scan exits 0, a finding that meets the threshold exits 1 (the gate firing, not a tooling error), and genuine tooling failures exit with a code distinct from both. That contract is what makes it safe to block a merge on.
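A wrapper that consumes this exit-code contract only has to distinguish three cases. The sketch below assumes exactly the mapping described above (0 clean, 1 gate fired, anything else a tooling error) and treats every code other than 0 and 1 as an error, since the specific error codes are not listed here. `GateResult`, `classify`, and `run_gate` are hypothetical names.

```python
"""Consuming a three-way exit-code contract in a CI wrapper (a sketch).

Assumed mapping: 0 = clean, 1 = gate fired, anything else = tooling error.
"""
import enum
import subprocess
import sys


class GateResult(enum.Enum):
    PASSED = "passed"
    BLOCKED = "blocked"  # findings at or above the threshold
    ERROR = "error"      # the scanner itself failed: neither clean nor blocked


def classify(returncode: int) -> GateResult:
    if returncode == 0:
        return GateResult.PASSED
    if returncode == 1:
        return GateResult.BLOCKED
    # Any other code, including a negative one from a signal, is a tooling error.
    return GateResult.ERROR


def run_gate(cmd: list[str]) -> GateResult:
    completed = subprocess.run(cmd, check=False)
    return classify(completed.returncode)


if __name__ == "__main__":
    # Stand-in command that exits 1, in place of a real scanner invocation.
    print(run_gate([sys.executable, "-c", "import sys; sys.exit(1)"]))  # GateResult.BLOCKED
```

Keeping `ERROR` separate is the point: a crashed scanner should neither pass silently as clean nor be reported to the author as findings. On POSIX, Python reports a process killed by a signal as a negative `returncode`, which this sketch also treats as an error.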

The whole thing was validated end to end in their real GitHub Actions setup — not a toy repo — before it gated anything.

Why diff-scoped is the whole game

The most important configuration decision was scoping the scan to the change. On a brownfield monorepo where a CI job doesn’t install every service’s third-party dependencies, a full-repo scan will light up with phantom-import findings for code nobody touched — noise that drowns the signal. Scoping the gate to the pull request’s diff (shipmoor scan --changed, or a --diff range) keeps the check focused on exactly the code under review. That’s also the code an agent most likely just wrote, which is precisely where the high-confidence failure modes cluster.

- name: Install Shipmoor CLI
  run: curl -fsSL https://dl.shipmoor.dev/install.sh | bash
- name: Run Shipmoor
  run: |
    "$HOME/.shipmoor/bin/shipmoor" scan --changed \
      --sarif --output shipmoor.sarif \
      --fail-on high
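- name: Upload SARIF
  # Upload even when the gate fires (exit 1); without this, the findings
  # that blocked the merge never reach code scanning. The job also needs
  # the `security-events: write` permission for this step.
  if: always()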
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: shipmoor.sarif
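The effect of diff scoping can be sketched in a few lines: list the Python files changed since the merge base, then drop findings in any other file. This illustrates the idea only and is not how `shipmoor scan --changed` works internally; the `Finding` shape and the helper names are hypothetical, and filtering by file is coarser than filtering by changed line.

```python
"""Diff scoping in miniature: keep only findings in files the change touched.

A sketch, not Shipmoor's implementation; Finding and the helpers are hypothetical.
"""
import subprocess
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    line: int
    rule: str


def parse_name_only(output: str) -> set[str]:
    """Parse `git diff --name-only` output, keeping Python files only."""
    return {line for line in output.splitlines() if line.endswith(".py")}


def changed_python_files(base: str, head: str = "HEAD") -> set[str]:
    """Python files added, copied, modified, or renamed since the merge base."""
    # `base...head` diffs against the merge base, so commits that landed on
    # the base branch after the PR branched off are not counted as changed.
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...{head}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_name_only(out)


def scope_to_diff(findings: list[Finding], changed: set[str]) -> list[Finding]:
    """Drop findings in untouched files: the noise a full-repo scan would report."""
    return [f for f in findings if f.path in changed]


if __name__ == "__main__":
    # Hypothetical findings: one in a file the PR edited, one in legacy code
    # whose dependencies this CI job never installed.
    findings = [
        Finding("svc_a/handler.py", 12, "phantom-import"),
        Finding("svc_legacy/report.py", 40, "phantom-import"),
    ]
    changed = parse_name_only("svc_a/handler.py\nREADME.md\n")
    print(scope_to_diff(findings, changed))
    # [Finding(path='svc_a/handler.py', line=12, rule='phantom-import')]
```

In GitHub Actions, `actions/checkout` fetches a single commit by default (`fetch-depth: 1`), so a merge-base diff like the one in `changed_python_files` needs more history, for example `fetch-depth: 0`.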

What running on real code surfaced

Putting Shipmoor in a real pipeline, on real agent-written changes, did what dogfooding on toy repos can’t: it surfaced things worth fixing on our side. The engagement turned up one server-side install issue, which we’ve since fixed, and two product-feedback items that fed directly into the CLI’s roadmap. A check that’s honest about its limits earns that kind of feedback — teams tell you where the edges are when the tool doesn’t overclaim.

Notably, none of this required the team to send us their source. The Community CLI runs entirely on their runners; only SARIF — findings, not code — leaves the job, and it goes to their own GitHub code scanning, not to Shipmoor. There is no Shipmoor cloud in the path of a scan.
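SARIF can carry more than findings: the 2.1.0 format allows a producer to embed source text in `region.snippet`, `contextRegion.snippet`, and `artifacts[].contents`. A team that wants to verify the "findings, not code" property for itself could audit the file before the upload step. The sketch below is a hypothetical helper, not a Shipmoor feature; it walks the JSON and reports any of those fields.

```python
"""Audit a SARIF file for embedded source text before it leaves the job.

Hypothetical helper, not a Shipmoor feature.
"""
import json
import sys
from typing import Any, Iterator

# SARIF 2.1.0 properties that can carry file content: region/contextRegion
# `snippet` and artifact `contents`.
EMBEDDED_TEXT_KEYS = {"snippet", "contents"}


def embedded_text_paths(node: Any, path: str = "$") -> Iterator[str]:
    """Yield JSONPath-style locations of every embedded-text property."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in EMBEDDED_TEXT_KEYS:
                yield child
            yield from embedded_text_paths(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from embedded_text_paths(item, f"{path}[{index}]")


def audit_sarif(text: str) -> list[str]:
    return list(embedded_text_paths(json.loads(text)))


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as fh:
        hits = audit_sarif(fh.read())
    for hit in hits:
        print(f"embedded source text at {hit}")
    sys.exit(1 if hits else 0)
```

Run as, say, `python audit_sarif.py shipmoor.sarif` (a hypothetical filename) between the scan and upload steps, a non-zero exit would stop the job before any embedded source text left it.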

Where this goes next

A diff-scoped structural gate catches the code that’s broken. The natural next step for a team this far along is catching the code that runs but doesn’t do what the task asked — the claim gap. That’s what Claim Check adds in the Shipmoor IC plan: it compares an agent’s change to the task it was given, with deterministic probes deciding and an LLM only ever advising. For teams that want the check to live inside the agent loop, Agent Skills run the same checks from inside Claude, Codex, Cursor, or Aider.
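The "deterministic probes decide, LLM advises" rule can be stated structurally: the verdict is a function of probe results alone, and any model output is attached afterwards as a note. The sketch below is hypothetical throughout (`Probe`, `Verdict`, and `decide` are not the Claim Check API) and only shows that ordering.

```python
"""A hypothetical "probes decide, LLM advises" verdict rule.

Not the Claim Check API; it only illustrates the ordering of decision and advice.
"""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Probe:
    name: str
    check: Callable[[], bool]  # deterministic: same inputs, same answer


@dataclass
class Verdict:
    passed: bool
    failed_probes: list[str]
    advisory: Optional[str] = None  # shown to reviewers, never used to decide


def decide(probes: list[Probe], advisor: Optional[Callable[[], str]] = None) -> Verdict:
    failed = [p.name for p in probes if not p.check()]
    verdict = Verdict(passed=not failed, failed_probes=failed)
    if advisor is not None:
        # Advice is attached only after the pass/fail outcome is fixed.
        verdict.advisory = advisor()
    return verdict


if __name__ == "__main__":
    probes = [
        Probe("endpoint_exists", lambda: True),
        Probe("returns_json", lambda: False),
    ]
    v = decide(probes, advisor=lambda: "Looks complete to me.")
    print(v.passed, v.failed_probes)  # False ['returns_json']
```

Because `advisory` is written after `passed` has been computed, an upbeat model summary cannot turn a failing probe into a pass.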

If you’re handing more of a complex codebase to coding agents and review can’t keep up, the starting point is free and local:

curl -fsSL https://dl.shipmoor.dev/install.sh | bash
shipmoor scan --changed

No account, no telemetry, no source upload — just a short list of high-confidence risks on the change in front of you.
