How TokenShift works

TokenShift sits between a coding agent and the shell. When the agent runs a command — git diff, npm test, kubectl get pods — TokenShift intercepts the call, runs the real command, rewrites the output into a shorter form, and hands the trimmed text back to the agent. The full original is kept locally so the agent can fetch it back at any time.

That’s the main loop, but two other things happen alongside it. On every hook call, telemetry ships home (if you have opted in), and on a submitted prompt, a small on-device model classifies what kind of work is happening. Both are covered below.

The compression loop

agent runs:  git diff
     ↓
TokenShift recognizes the command
     ↓
runs the real command
     ↓
keeps the changes, trims the noise
     ↓
caches the full output on disk
     ↓
returns the short version to the agent

Five steps, every invocation. None of it requires configuration from the user — the rules ship in the binary.
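The five steps can be sketched in a few lines of Python. This is only an illustration of the flow, not TokenShift's actual code: the rule table, the toy compressor, the 12-character SHA-256 prefix, and the in-memory dict standing in for the on-disk cache are all assumptions made for the example.

```python
import hashlib
import subprocess
from typing import Callable, Dict, List, Tuple


def drop_blank_lines(text: str) -> str:
    """Toy compressor (hypothetical): keep only non-empty lines."""
    return "\n".join(line for line in text.splitlines() if line.strip())


# Hypothetical rule table: the first two argv words select a compressor.
RULES: Dict[Tuple[str, ...], Callable[[str], str]] = {
    ("git", "diff"): drop_blank_lines,
}


def run_real(argv: List[str]) -> str:
    """Run the real command and capture its stdout."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


def handle(argv: List[str], cache: Dict[str, str],
           runner: Callable[[List[str]], str] = run_real) -> str:
    compressor = RULES.get(tuple(argv[:2]))  # 1. recognize the command
    full = runner(argv)                      # 2. run the real command
    if compressor is None:
        return full                          # unrecognized: pass through untouched
    short = compressor(full)                 # 3. keep the changes, trim the noise
    digest = hashlib.sha256(full.encode("utf-8")).hexdigest()[:12]
    cache[digest] = full                     # 4. cache the full output (dict stands in for disk)
    return f"{short}\ntokenshift recover {digest}"  # 5. return the short version
```

Passing a fake `runner` makes the flow easy to exercise without a repository: `handle(["git", "diff"], {}, runner=lambda argv: "a\n\nb\n")` returns the two non-blank lines followed by a recover line.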

What gets compressed

Source control

git diff, git log, git show, git status. Keeps every change but trims unchanged context, omits binary blobs, and collapses long history.
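As a rough illustration of trimming unchanged context, the sketch below drops every context line from a unified diff and keeps headers, hunk markers, and changed lines. It is a simplification under stated assumptions: the function name is hypothetical, and the real rules may keep some context rather than none.

```python
def trim_diff_context(diff_text: str) -> str:
    """Drop unchanged context lines from a unified diff.

    In unified diff format, context lines start with a single space,
    while additions start with '+', removals with '-', and hunk
    headers with '@@'. Everything that is not context is kept.
    """
    kept = []
    for line in diff_text.splitlines():
        if line.startswith(" "):
            continue  # unchanged context: omitted in this sketch
        kept.append(line)
    return "\n".join(kept)
```

The hunk headers stay in place, so the line numbers of each change remain recoverable even with the context removed.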

Test output

pytest, jest, vitest, go test, cargo test. Keeps failing tests with stack traces and assertion details; drops the wall of pass lines.
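One way to picture this rule is a filter that drops per-test pass lines and replaces them with a count. The sketch assumes verbose pytest-style lines containing the word PASSED; the pattern, the function name, and the summary wording are illustrative assumptions, not TokenShift's real rules.

```python
import re

# Assumption: a pass line contains the standalone word PASSED,
# as in verbose pytest output ("path::test_name PASSED").
PASS_LINE = re.compile(r"\bPASSED\b")


def compress_test_output(text: str) -> str:
    """Keep everything except pass lines, then note how many were dropped."""
    kept = []
    passed = 0
    for line in text.splitlines():
        if PASS_LINE.search(line):
            passed += 1
        else:
            kept.append(line)  # failures, tracebacks, assertion details
    if passed:
        kept.append(f"({passed} passing tests omitted)")
    return "\n".join(kept)
```

Failure lines and the assertion detail that follows them pass through unchanged, which is the part the agent actually needs.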

Build & lint

npm, cargo, go build, eslint, tsc. Keeps errors and warnings; drops the noise around them.

Search

rg, grep, find. Groups results by file, trims duplicate matches.
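Grouping by file can be sketched as follows, assuming grep-style `path:line:text` input. What counts as a duplicate is not specified here, so this example treats two matches in the same file with identical text as duplicates; that choice and the function name are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def group_matches(lines: Iterable[str]) -> str:
    """Group 'path:lineno:text' matches under one heading per file."""
    grouped: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # not in path:lineno:text form; skipped in this sketch
        path, lineno, text = parts
        text = text.strip()
        bucket = grouped.setdefault(path, [])
        # Assumed duplicate rule: same matched text already seen in this file.
        if all(seen_text != text for _, seen_text in bucket):
            bucket.append((lineno, text))
    out: List[str] = []
    for path, entries in grouped.items():
        out.append(path)
        out.extend(f"  {lineno}: {text}" for lineno, text in entries)
    return "\n".join(out)
```

Printing the path once per group, instead of once per match, is where most of the savings come from on large result sets. Paths that themselves contain a colon would need more careful parsing than this sketch does.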

Listings

ls, tree, git ls-files. Bounds depth, summarizes large directories.

Infrastructure

kubectl, docker, gh, aws, gcloud. Keeps the structure of resources; drops boilerplate fields the model already knows.

What never gets compressed

Anything the agent writes (prompts, file edits, code) — TokenShift only touches command output, never input.
Commands TokenShift doesn’t recognize — they pass through untouched.
Local files — TokenShift never reads the filesystem outside of running commands you’ve asked the agent to run.

Two more things happening in the background

Compression is the main loop, but the same hooks that trigger it also trigger two other things:

Telemetry ships home

Every tool call becomes a signal — what ran, on which harness, how many tokens it cost, how much was saved. Redacted on the device, encrypted before it ships. See Security overview and the Data contract for exactly what’s in it.

Prompts get classified, locally

On a submitted prompt, a small on-device model works out what kind of work it is — debugging, feature work, research, and so on. Only that label ships in telemetry; the prompt text itself never leaves the machine, and never reaches an external model just to be categorized.

How the classifier actually runs: it’s not embedded in the main binary. The first time a harness session needs it, TokenShift starts a small background helper that loads a compact local embedding model and scores the prompt against it, falling back to a plain keyword-based scorer if the model isn’t available for any reason. That helper is not a persistent system service — it lives only as long as you’re actively using the harness and shuts itself down after about an hour of inactivity. Classification is fully non-blocking: it runs in a detached background step, so a cold or unreachable classifier never adds latency to the agent’s prompt — the same fail-open principle as everything else TokenShift does.
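The fallback behavior can be illustrated with a small sketch: try the model if one is available, and fall back to keyword scoring if it is missing or fails. The keyword lists, the "unknown" label, and the function names are invented for the example; only the three label names come from the text above. The detached background step is not shown.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical keyword lists; the real scorer's vocabulary is not documented here.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "debugging": ("error", "traceback", "bug", "crash", "fix"),
    "feature work": ("add", "implement", "build", "support"),
    "research": ("why", "how", "compare", "explain"),
}


def keyword_label(prompt: str) -> str:
    """Plain keyword scorer: the label with the most keyword hits wins."""
    words = prompt.lower().split()
    scores = {
        label: sum(words.count(keyword) for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] > 0 else "unknown"


def classify(prompt: str,
             model: Optional[Callable[[str], str]] = None) -> str:
    """Prefer the model; fail open to the keyword scorer on any problem."""
    if model is not None:
        try:
            return model(prompt)
        except Exception:
            pass  # model unavailable or broken: fall through to keywords
    return keyword_label(prompt)
```

Only the returned label would go into telemetry; the prompt string never needs to leave the function that scores it.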

The model itself is downloaded once (signed and verified the same way the tokenshift binary’s own updates are) and cached locally; it’s checked for a newer version at most once a day, never on every prompt.
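A once-a-day check is commonly implemented with a timestamp file, roughly as below. The stamp file, its location, and the function name are assumptions for illustration; signature verification of the downloaded model is out of scope for this sketch.

```python
import time
from pathlib import Path
from typing import Optional

CHECK_INTERVAL_SECONDS = 24 * 60 * 60  # at most one check per day


def should_check_for_update(stamp: Path, now: Optional[float] = None) -> bool:
    """Return True, and record the time, if the last check was a day or more ago."""
    now = time.time() if now is None else now
    try:
        last = float(stamp.read_text().strip())
    except (FileNotFoundError, ValueError):
        last = None  # never checked, or the stamp file is unreadable
    if last is not None and now - last < CHECK_INTERVAL_SECONDS:
        return False  # checked recently: skip the network entirely
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(str(now))
    return True
```

Because the decision reads only a local file, prompts submitted between checks never trigger a network request.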

Two ways it can be installed

Transparent, the default. A one-time install command writes hooks into the AI agent’s settings; Claude Code, Cursor, GitHub Copilot CLI, and OpenAI Codex are currently supported. From then on, every command the agent runs flows through TokenShift automatically. The agent needs no special handling; it simply sees shorter output, plus the recovery line described below.

Explicit, for scripts and shells. You can also call TokenShift directly — tokenshift git diff, tokenshift npm test. Useful for shell aliases or quick checks. Rarely needed in normal use.

Nothing is permanently hidden

Every compressed response ends with one extra line:

… tokenshift recover <hash>

That hash points to the full, uncompressed output, stored on the local machine. If the agent ever needs the original, it asks for it by hash and gets it back verbatim. The compressed view is a summary, not a truncation.
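Recovery by hash is a content-addressed lookup. The sketch below assumes the hash is a SHA-256 hex digest of the full output and that each entry is a file named after its digest; both are illustrative assumptions, as are the function names.

```python
import hashlib
from pathlib import Path
from typing import Optional


def store(cache_dir: Path, full_output: str) -> str:
    """Write the full output to disk under its own digest and return the digest."""
    data = full_output.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / digest).write_bytes(data)
    return digest


def recover(cache_dir: Path, digest: str) -> Optional[str]:
    """Return the original output verbatim, or None if nothing was cached."""
    # Accept only hex digests so a crafted value cannot escape the cache directory.
    if not digest or any(ch not in "0123456789abcdef" for ch in digest):
        return None
    path = cache_dir / digest
    if not path.is_file():
        return None
    return path.read_bytes().decode("utf-8")
```

Writing and reading bytes rather than text keeps line endings intact, which matters if the recovered output is meant to match the original byte for byte.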

One exception: harnesses that run shell commands inside a sandbox — Cursor and OpenAI Codex — block writes outside the workspace, so the recovery cache (under ~/.tokenshift) isn’t populated for them. Compression and telemetry still work; tokenshift recover <hash> just has nothing to replay for those two.

What stays local, what goes off-device

Command output, full and compressed: the text stays on the device. Telemetry may carry a one-way hash of it for correlation, never the content itself.
Local files: never touched. TokenShift only sees what commands print.
Prompt text: stays local. It’s read once, on-device, to produce an intent label (see above); the prompt itself is never sent anywhere, not even to power the classification.
Telemetry: opt-in and encrypted. Sensitive fields — output, prompts, file contents, paths — are one-way hashed on the device before send, so the raw content never leaves. See Security overview for the full breakdown.
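On-device hashing of sensitive fields might look like the following. The field names and the use of plain SHA-256 are assumptions for the example; the actual field list is defined in the Data contract, and encryption before send is not shown.

```python
import hashlib
from typing import Any, Dict

# Hypothetical field names, mirroring the categories listed above.
SENSITIVE_FIELDS = {"output", "prompt", "file_contents", "path"}


def redact(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with sensitive values replaced by one-way hashes."""
    redacted: Dict[str, Any] = {}
    for key, value in event.items():
        if key in SENSITIVE_FIELDS:
            # The hash allows correlation between events without revealing the content.
            redacted[key] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            redacted[key] = value
    return redacted
```

Equal inputs produce equal hashes, which is what makes correlation possible. One design consideration: short or predictable values can be guessed by hashing candidates, so a keyed hash such as HMAC with a per-device secret would be the more robust variant of this idea.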

Harness coverage: how completely each of the four harnesses is covered today, scored the same way per OS.
Core commands: doctor, dashboard, recover, backfill, and the rest.