
Data contract

Audience: customer admins, security reviewers, and executives. Scope: every record TokenShift emits over the wire.

TokenShift has no control plane. Customers configure the binary entirely at install time via local files deployed by their MDM; PointFive only ingests data. There is no remote policy push, no remote rule update, no remote opt-out. See Installation & distribution for the install and configuration model.

Every record on the wire is a typed JSON document. The client hybrid-encrypts it against PointFive’s public key (pinned in the binary at build time) and posts it to a single HTTPS endpoint, where it is decrypted and persisted keyed by signal. Each record carries full per-event fidelity so any analytical question is answerable at query time. Each record also carries a schema_version field.

The contract is enforced by a single explicit allowlist. Anything not enumerated below is dropped at the wire boundary, fail-closed.


TokenShift emits exactly five record types and nothing else.

| # | Signal | Min tier | Fires when |
| --- | --- | --- | --- |
| 1 | tool_invocation | 1 | The agent makes a tool call. |
| 2 | recovery_retrieved | 1 | A developer pulls back the original output that was compressed away. |
| 3 | session_summary | 1 | An agent session ends. |
| 4 | compression_sample | 4 | Triple-gated, off by default: a sampled, redacted before/after pair for rule tuning. |
| 5 | client_state | 1 | Periodically (≤ once / 24h per machine): version + install metadata. |

tool_invocation is the atomic record: one per tool call the agent makes, regardless of whether TokenShift compressed it. It carries the tool kind, model, activity bucket, project (hashed by default), a stable redactor-derived command_hash, the original token count, and an outcome enum (unmatched / skipped / compressed / fallback / bypassed). At Tier 2+ it also carries a redacted command_shape and rewritten_command_shape: the redactor output of what the agent wrote and what was actually executed. No raw command bytes ship at any tier.

recovery_retrieved fires when a developer pulls back the original (uncompressed) output via the recovery command. It is a strong signal that compression destroyed context the developer actually needed. It carries only a join key (recovery_id → tool_invocation), the recovering session, and an age in seconds bounded by the cache TTL. No content. No file paths.

session_summary is one best-effort record per agent session, fired at session end. It carries what you can’t reconstruct from tool_invocation alone: end_reason (clean / timeout / crash / interrupted), turn count, and peak context-window usage. It also carries small denormalized aggregates (total tokens in, total tokens saved, per-outcome counts) for fast single-session dashboards.

compression_sample is a redacted before/after pair captured for PointFive rule tuning. It is the only signal that carries actual output content, and the most tightly gated record in the contract. Off by default. Three independent gates must all be true for any sample to leave the process: the tenant has enabled Tier 4 in their manifest, the matched rule is on the per-rule sample allowlist, and a random draw lands inside the configured sampling percentage. Never surfaced on customer dashboards. Carries no user_id (defense in depth). Hard-capped retention: 7 days.
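
The three gates can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not TokenShift’s implementation: the function name, the parameter names, the rule identifier in the usage note, and the choice to express the sample rate as a fraction between 0 and 1 are all hypothetical.

```python
import random
from typing import Callable, Mapping, Set


def should_emit_sample(
    effective_tier: int,
    rule_id: str,
    sample_allowlist: Set[str],
    sample_rates: Mapping[str, float],
    draw: Callable[[], float] = random.random,
) -> bool:
    """Return True only when all three gates pass (hypothetical helper)."""
    # Gate 1: Tier 4 must be in effect.
    if effective_tier < 4:
        return False
    # Gate 2: the matched rule must be on the per-rule sample allowlist.
    if rule_id not in sample_allowlist:
        return False
    # Gate 3: a random draw in [0, 1) must land inside the configured rate.
    # A missing rate is treated as 0.0, so nothing is emitted by default.
    rate = sample_rates.get(rule_id, 0.0)
    return draw() < rate
```

With the documented defaults (empty allowlist, 0% rate) the predicate is always false: `should_emit_sample(4, "rule-a", set(), {})` never passes the second gate.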

client_state is periodic meta-telemetry about the binary itself: version, OS, architecture, install method, enrollment age, and whether the binary’s internal self-check (redactor regression corpus, allowlist invariants, manifest validity) passed. Rate-limited to at most once per 24 hours per client_id; new installs always emit on first session. No PII, no per-tool-call detail. Without a control plane this is the only way PointFive and customer admins see fleet hygiene: stale versions, dominant install methods, enrollment age.
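
The rate limit can be sketched as a check against the last emission time. This is a sketch only: the function name and the idea of storing a per-client_id timestamp (in seconds) are assumptions, not documented behavior.

```python
from typing import Optional

# At most one client_state record per 24 hours per client_id.
CLIENT_STATE_INTERVAL_SECONDS = 24 * 60 * 60


def should_emit_client_state(now: float, last_emitted_at: Optional[float]) -> bool:
    """Decide whether a client_state record may be emitted (hypothetical helper)."""
    # A new install has no previous emission, so it always emits on first session.
    if last_emitted_at is None:
        return True
    return now - last_emitted_at >= CLIENT_STATE_INTERVAL_SECONDS
```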


The tier is a single integer set by user choice and capped from above by install-time config. The effective tier on the wire is min(user_choice, admin_cap). Each higher tier strictly adds; no tier silently relaxes a guarantee made at a lower tier.

| Tier | What leaves the laptop |
| --- | --- |
| 0 | Nothing. Local stats only. |
| 1 | tool_invocation (no shape fields), recovery_retrieved, session_summary, client_state. |
| 2 | Tier 1 + command_shape and rewritten_command_shape on tool_invocation. |
| 4 | Tier 2 + compression_sample. Triple-gated; off by default even when permitted. |

Default tier when enrolled: 1. Tier 4 is off by default even when an admin permits it — sampling additionally requires a per-rule allowlist and a configured sample rate.
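
The tier rules can be sketched in a few lines of Python. This is a minimal illustration, not TokenShift’s implementation: the function names and the SIGNAL_MIN_TIER mapping are hypothetical, with the minimum tiers copied from the signal table in this document.

```python
# Minimum tier per signal, as listed in the signal table.
SIGNAL_MIN_TIER = {
    "tool_invocation": 1,
    "recovery_retrieved": 1,
    "session_summary": 1,
    "compression_sample": 4,
    "client_state": 1,
}


def effective_tier(user_choice: int, admin_cap: int) -> int:
    """The tier on the wire is the user's choice capped by the admin."""
    return min(user_choice, admin_cap)


def may_emit(signal: str, user_choice: int, admin_cap: int) -> bool:
    """True when the effective tier reaches the signal's minimum tier."""
    min_tier = SIGNAL_MIN_TIER.get(signal)
    if min_tier is None:
        return False  # unknown signal name: fail closed
    return effective_tier(user_choice, admin_cap) >= min_tier


def shape_fields_allowed(user_choice: int, admin_cap: int) -> bool:
    """command_shape and rewritten_command_shape ship only at Tier 2+."""
    return effective_tier(user_choice, admin_cap) >= 2
```

For example, a user who opts into Tier 4 under a Tier 1 cap emits at Tier 1, so `may_emit("compression_sample", 4, 1)` is false. Passing this check is necessary but not sufficient for compression_sample, which is also gated by the sample allowlist and sample rate.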


These are the knobs a tenant admin sets when deploying TokenShift via MDM. There is no runtime config push: changing any control below requires regenerating the manifest from the PointFive portal and redeploying via MDM. There are no user-level overrides; users cannot relax or extend beyond what the manifest permits.

| Control | Default | Effect |
| --- | --- | --- |
| Tier cap | Tier 1 | Maximum tier any client in the tenant can emit. |
| Project identifier mode | Hashed git remote URL | Choices: hashed remote URL, hashed directory name, raw directory name. |
| Recovery cache enabled | On | Off disables the local one-hour cache. |
| Sample allowlist | Empty | Per-rule list of which rules may be sampled at Tier 4. Empty = no samples ever emitted. |
| Sample rate | 0% | Per-rule sampling probability used at Tier 4. |
| Retention extensions | Defaults below | Negotiated via contract for four of the five signals. compression_sample is hard-capped regardless. |

  • tenant_id — customer organization, from the enrollment manifest.
  • client_id — random per-machine UUID, created on first run. Stable for the life of the install.
  • user_id — pseudonymous user identifier computed locally by the binary as HMAC(per_tenant_key, email). Email is read once from git config --global user.email (fallback: $TOKENSHIFT_USER_EMAIL), HMAC’d, then discarded. The raw email never leaves the developer’s machine. The HMAC key is per-tenant. user_id is deliberately absent from compression_sample (defense in depth) and client_state.
  • project — hashed git remote URL by default; falls back to hashed directory name; install-time config may permit raw. The hash is a correlation defense, not a secrecy defense.
  • session_id, invocation_id, recovery_id — opaque UUIDs used as join keys between signals.
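
The user_id derivation described in the list above can be sketched with Python’s standard hmac module. The document specifies only HMAC(per_tenant_key, email); the SHA-256 digest, the hex encoding, the whitespace and case normalization, and the function name are assumptions made for illustration.

```python
import hashlib
import hmac


def derive_user_id(per_tenant_key: bytes, email: str) -> str:
    """Compute a pseudonymous user_id locally (hypothetical helper).

    Assumptions: SHA-256 as the HMAC digest, hex output, and trimming plus
    lowercasing of the email before hashing.
    """
    normalized = email.strip().lower()
    mac = hmac.new(per_tenant_key, normalized.encode("utf-8"), hashlib.sha256)
    # Only the digest is returned; the caller discards the raw email.
    return mac.hexdigest()
```

Because the key is per-tenant, the same email yields unrelated identifiers in two tenants, and the identifier cannot be recomputed from the email without the key.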

Raw email, hostnames, OS usernames, IPs, MACs, and raw file paths are never captured and have no code path that would attach them.


Four fields can carry shape-like data derived from real commands: command_shape and rewritten_command_shape on tool_invocation (Tier 2+), and the two halves of compression_sample (Tier 4). All four pass through the same redactor before emission, in this order:

  1. Known-secret patterns — sixteen patterns covering common cloud, SaaS, and language-runtime keys (AWS / GitHub / GitLab / Slack / Google / Stripe / Anthropic / OpenAI / JWT / private keys / DB-URL-with-creds / generic password/token/secret/api_key assignments).
  2. Structural value redaction — about thirty common command-line tools are recognized; flag names are preserved, flag values are replaced with placeholders. Unknown binaries are replaced with a generic placeholder.
  3. Length cap — 200 bytes post-redaction on command_shape.
  4. Backend pass — a final redaction pass on the receiving side as a net.

A 40+ case regression corpus runs in CI on every change and as part of the binary’s startup self-check. A redactor warning on any shape field drops the whole record — never partial.
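
The first three client-side passes can be sketched as follows. This is a toy illustration under stated assumptions, not the real redactor: it uses two example patterns instead of sixteen, three known tools instead of about thirty, hypothetical placeholder strings, and it replaces every positional argument. Treating unparseable input as a warning and truncating at the length cap are also assumptions. The backend pass is not modeled.

```python
import re
import shlex
from typing import Optional

# Two illustrative secret patterns; the real set is described as sixteen.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"(?i)\b(password|token|secret|api_key)=\S+"),  # generic assignment
]
KNOWN_TOOLS = {"git", "curl", "grep"}  # stand-in for the recognized tool list
MAX_SHAPE_BYTES = 200


def redact_shape(command: str) -> Optional[str]:
    """Return a redacted shape, or None when the whole record must be dropped."""
    # Pass 1: known-secret patterns.
    for pattern in SECRET_PATTERNS:
        command = pattern.sub("<SECRET>", command)
    # Pass 2: structural value redaction.
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None  # unparseable input is treated as a redactor warning
    if not tokens:
        return None
    shape = [tokens[0] if tokens[0] in KNOWN_TOOLS else "<BIN>"]
    for token in tokens[1:]:
        if token.startswith("-"):
            name, sep, _value = token.partition("=")
            shape.append(name + "=<VAL>" if sep else name)  # keep flag name only
        else:
            shape.append("<VAL>")  # positional arguments are treated as values
    # Pass 3: length cap, applied after redaction.
    encoded = " ".join(shape).encode("utf-8")
    return encoded[:MAX_SHAPE_BYTES].decode("utf-8", errors="ignore")
```

For example, `redact_shape("git commit -m 'fix bug'")` yields `git <VAL> -m <VAL>`, and a command with an unbalanced quote yields `None`, which the caller would treat as a reason to drop the record.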


Anything that doesn’t go right ends in a drop, never a raw emit.

| Trigger | Action |
| --- | --- |
| Unknown attribute on a record | Attribute stripped before send. |
| Unknown signal name | Record dropped. |
| Effective tier below the signal’s minimum | Signal not emitted. |
| Redactor warning on any shape field | Whole record dropped. |
| Per-rule sample allowlist miss / sample-rate draw miss | Sample not emitted. |
| Local outbox overflow | Oldest records dropped first; developer’s command never blocked. |
| Network unavailable | Records queued in the bounded local outbox until network returns. |
| Enrollment manifest absent or unparseable | Binary refuses to send and surfaces the error. |
| Encryption fails | Record dropped. |
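
The first two rows (unknown attribute, unknown signal name) can be sketched as one allowlist filter. The mapping below is deliberately partial and illustrative: it lists only one signal and only field names that appear in this document, and the function name is hypothetical.

```python
from typing import Any, Dict, Optional, Set

# Partial, illustrative allowlist; the other four signals are omitted for brevity.
ALLOWED_ATTRIBUTES: Dict[str, Set[str]] = {
    "tool_invocation": {
        "schema_version", "tenant_id", "client_id", "user_id",
        "session_id", "invocation_id", "project", "command_hash",
    },
}


def filter_record(signal: str, record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Strip unknown attributes; return None when the signal itself is unknown."""
    allowed = ALLOWED_ATTRIBUTES.get(signal)
    if allowed is None:
        return None  # unknown signal name: drop the record
    # Unknown attributes are stripped before send rather than passed through.
    return {key: value for key, value in record.items() if key in allowed}
```

The filter builds the outgoing record from allowlisted keys only, so a newly added field stays off the wire until it is explicitly allowlisted.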

  • One HTTPS endpoint, hardcoded in the binary.
  • Bearer-token auth, token shipped in the manifest; rotated by regenerating the manifest in the portal.
  • Per-record hybrid encryption against PointFive’s pinned public key. The matching private key is held by PointFive and never leaves PointFive’s key-management system. See Installation & distribution for the key model.
  • gzip on wire (after encryption).
  • Stateless ingest. No intermediate storage, no brokers, no records at rest in PointFive’s ingest path.
  • Offline tolerance. A bounded local outbox holds pending records and drains when the network returns; overflow drops oldest first. The developer’s command is never blocked by network state.
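
The offline-tolerance behavior can be sketched with a bounded deque. This is a minimal in-memory illustration; the class name, the capacity value, and the send callback are hypothetical, and the real outbox presumably persists across restarts.

```python
from collections import deque
from typing import Any, Callable, List


class Outbox:
    """Bounded queue of pending records; overflow drops the oldest first."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards from the opposite end on append.
        self._queue: deque = deque(maxlen=capacity)

    def enqueue(self, record: Any) -> None:
        """Never blocks and never raises on overflow."""
        self._queue.append(record)

    def pending(self) -> List[Any]:
        return list(self._queue)

    def drain(self, send: Callable[[Any], bool]) -> int:
        """Send oldest-first; stop at the first failure and keep the rest queued."""
        sent = 0
        while self._queue:
            if not send(self._queue[0]):
                break
            self._queue.popleft()
            sent += 1
        return sent
```

A record is removed only after `send` reports success, so a network failure mid-drain leaves the remaining records queued for the next attempt.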

| Signal | Default retention | Tenant-extensible? |
| --- | --- | --- |
| tool_invocation | 90 days | Yes |
| recovery_retrieved | 90 days | Yes |
| session_summary | 90 days | Yes |
| client_state | 90 days | Yes |
| compression_sample | 7 days | No (hard cap). |

The 7-day hard cap on compression_sample is non-negotiable. Samples are the only signal that carries redacted output content; the rule-tuning use case doesn’t benefit from longer retention, and every extra day is unnecessary blast radius.
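
The retention rules can be sketched as a lookup with a hard cap. The function name is hypothetical, and the assumption that a negotiated extension can only lengthen retention (never shorten it) is mine, not stated in the contract.

```python
from typing import Optional

DEFAULT_RETENTION_DAYS = {
    "tool_invocation": 90,
    "recovery_retrieved": 90,
    "session_summary": 90,
    "client_state": 90,
    "compression_sample": 7,
}
HARD_CAPPED_SIGNALS = {"compression_sample"}


def retention_days(signal: str, extension_days: Optional[int] = None) -> int:
    """Resolve retention for a signal, ignoring extensions on hard-capped signals."""
    base = DEFAULT_RETENTION_DAYS[signal]  # KeyError for an unknown signal
    if signal in HARD_CAPPED_SIGNALS or extension_days is None:
        return base
    # Assumption: an extension lengthens retention and never shortens it.
    return max(base, extension_days)
```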


What the contract explicitly does not send

The wire never carries any of the following. None are on the allowlist; no code path attaches them. An automated test asserts only allowlisted keys appear on the wire — a change that adds any of these fails the build before merge.

  1. Bash stdout / stderr content, in any form (including hashes).
  2. Tool inputs: file contents, web content, MCP tool-call results.
  3. Agent transcripts, prompts, messages, message hashes.
  4. Environment variables — names or values.
  5. Raw file paths.
  6. Hostnames, OS usernames, real names, IP addresses, MAC addresses.
  7. Free-text feedback / notes / comments. None exist by design.
  8. Rule contents or rule diffs — only the rule’s identifier.
  9. Stack traces, panic messages, crash report bodies.
  10. Branch names, commit messages, git refs.
  11. Raw email. Read once locally to compute user_id, then discarded. Only the HMAC-derived user_id ships; the email never does.

  • Allowlist — Explicit list of what is permitted; everything else is denied.
  • Enrollment manifest — A small JSON file generated by the PointFive customer portal and deployed via MDM. Carries tenant_id, bearer_token, user_id_hmac_key, and install-time config. The ingest URL and PointFive public key are pinned in the binary at build time, not carried in the manifest.
  • Fail-closed — On any error, drop the data rather than emit it raw.
  • MDM — Mobile Device Management / endpoint management system (Jamf, JumpCloud, Intune, Ansible, etc.).
  • Pseudonymous — Identifies a unique entity but cannot be reversed to a person without separately held data.
  • Signal — A named, schema’d record type on the wire (one of the five above).
  • Tier — Level of data sharing (0/1/2/4). Capped from above by install-time config.