What is Surface?
Documentation, governed like code.
You anchor a sentence to the code it describes. When that code’s logic changes, surf check
fails the build until a human re-confirms the sentence still holds — the same way a broken test
blocks a merge. Deterministic: no model, no network, no API key in the core.
Part of Gradient.
The problem
Section titled “The problem”You write a context file for your codebase — an architecture note, an AGENTS.md, a hub for the
auth flow. The day you write it, it’s accurate.
Then the code moves. Someone refactors the function you described; the behavior changes on purpose, the tests get updated to match, CI goes green, the PR merges. Everything is correct — except the paragraph that described that function. Nobody touched it, for two ordinary reasons: they didn’t know it existed, and there was no standard place to look for it. It now says something that is no longer true.
Nothing failed. Nothing fired. The only thing that broke is the explanation the next engineer — and every agent on every run — will trust and reason from. A codebase can be fully green on tests and full of confident, well-written, completely false documentation. That second failure quietly poisons everyone who reads it, and nothing in your toolchain catches it.
Surface closes that gap two ways: hubs/ gives documentation a standard home so people and
agents actually find it, and surf check governs the prose like a test so it can’t silently
rot.
What Surface does that tests don’t
Section titled “What Surface does that tests don’t”A test asserts that behavior matches an expectation written in code. Surface asserts that prose still matches the code it describes. Those are different expectations, and they drift apart at exactly the moment someone updates one and forgets the other.
| docs accurate | docs stale | |
|---|---|---|
| code correct | fine | ← nothing else catches this |
| code broken | your tests catch this | both might fire |
The bottom-left cell is what tests are for. The top-right cell — code that works fine but no longer does what your docs claim — has no owner. You can’t write a unit test for “the README still describes this accurately,” because the thing that drifted is human-language understanding, and tests don’t speak that language. Surface owns that cell.
How it works, in one breath
Section titled “How it works, in one breath”You anchor a sentence to the code it’s about:
# auth/_hub.md (a "hub" — frontmatter + prose, lives next to the code it describes)anchors: - claim: "refresh rotation is single-use; reuse triggers global logout" at: "src/auth/refresh.ts > rotateRefreshToken" hash: 9b1c33asurf check runs in CI. For each anchor it finds the symbol, reduces it to pure logic (ignoring
formatting, comments, and renames), fingerprints that, and compares the fingerprint to the one
stored from the last time a human confirmed the sentence was true.
- Fingerprint matches → the logic didn’t meaningfully change → silent pass.
- Fingerprint differs → the code diverged from its description → block the merge with a precise report: which hub, which claim, old code vs. new code.
It’s a tamper-evident seal on the logic of exactly the code your docs claim to describe. Quiet on cosmetics, loud on logic — see How the gate works.
What Surface does NOT do
Section titled “What Surface does NOT do”Read this part. It’s the difference between a tool you trust and one that burns you.
- It does not tell you your docs are true. It tells you the code they point at changed, so a human should re-read the prose. A green check means “nothing drifted since the last sign-off,” not “everything is correct.” That’s a deliberately weaker promise than a passing test, because meaning isn’t mechanically decidable.
- It only watches what you anchored. If a change in a file no hub points at quietly invalidates a documented invariant, Surface will not see it. Catching that is security review and taint analysis — a different discipline. Surface guards the spans you chose to describe, nothing more.
- It is not a retrieval system. It doesn’t search, embed, or serve context. There are good tools for that. Surface optimizes a different thing: trust in what you retrieve.
If you want the fuzzy “is this claim still true” judgment, that lives in an optional reviewer plugin that reads Surface’s JSON output. The core never depends on it. Pull every plugin out and the gate blocks and passes exactly the same.
Is Surface for you?
Section titled “Is Surface for you?”Honestly? Maybe not. Roughly, it earns its keep when
codebase complexity × change velocity × (humans + agents) reading it
is high. A small, slow, simple codebase doesn’t need this — your team can just read the code, and two well-kept markdown files beat the whole apparatus. Use Surface where rebuilding the mental model from source is genuinely expensive and the code moves fast enough to drift.
One thing pushes the math toward “yes”: AI agents. A human onboards onto a domain once and amortizes the cost over months. An agent re-onboards every session and amortizes nothing — it’s a new hire on its first day, every day, paying the full cost of wrong context on every invocation. A bigger context window doesn’t fix this; it lets an agent read every line and still confidently derive a wrong model, because it can’t tell a deliberate invariant from incidental code. If your team runs agents hard, your effective reader count is enormous, and governed context stops being optional.
But Surface is justified without a single agent in the loop. “Your architectural invariants are now governed like code” pays off for human onboarding, review, and incident response on its own. Agents are a multiplier, not the foundation.
- Install · Quickstart
- Authoring hubs · CI integration · Examples
- Reference: Commands · Configuration · How the gate works · FAQ