Data contract
Audience: customer admins, security reviewers, and executives. Scope: every record TokenShift emits over the wire.
TokenShift has no control plane. Customers configure the binary entirely at install time via local files deployed by their MDM; PointFive only ingests data. There is no remote policy push, no remote rule update, no remote opt-out. See Installation & distribution for the install and configuration model.
Every record on the wire is a typed JSON document. It is hybrid-encrypted client-side against PointFive’s public key (pinned in the binary at build time), posted to a single HTTPS endpoint, then decrypted and persisted by PointFive, keyed by signal. Each record carries full per-event fidelity so any analytical question is answerable at query time. Each record also carries a schema_version field.
The contract goes through one explicit allowlist. Anything not enumerated below is dropped at the wire boundary, fail-closed.
Signal index
TokenShift emits exactly five typed records and never anything else.
| # | Signal | Min tier | Fires when |
|---|---|---|---|
| 1 | tool_invocation | 1 | The agent makes a tool call. |
| 2 | recovery_retrieved | 1 | A developer pulls back the original output that was compressed away. |
| 3 | session_summary | 1 | An agent session ends. |
| 4 | compression_sample | 4 | Triple-gated, off by default — sampled redacted before/after pair for rule tuning. |
| 5 | client_state | 1 | Periodically (≤ once / 24h per machine) — version + install metadata. |
1. tool_invocation
The atomic record. One per tool call the agent makes, regardless of whether TokenShift compressed it. Carries the tool kind, model, activity bucket, project (hashed by default), a stable redactor-derived command_hash, the original token count, and an outcome enum (unmatched / skipped / compressed / fallback / bypassed). At Tier 2+ also carries a redacted command_shape and rewritten_command_shape: the redactor output of what the agent wrote and what was actually executed. No raw command bytes ship at any tier.
2. recovery_retrieved
Fires when a developer pulls back the original (uncompressed) output via the recovery command. Strong signal that compression destroyed context the developer actually needed. Carries only a join key (recovery_id → tool_invocation), the recovering session, and an age in seconds bounded by the cache TTL. No content. No file paths.
3. session_summary
One best-effort record per agent session, fired at session end. Carries the things you can’t reconstruct from tool_invocation alone: end_reason (clean / timeout / crash / interrupted), turn count, and peak context-window usage. It also carries small denormalized aggregates (total tokens in, total tokens saved, per-outcome counts) for fast single-session dashboards.
4. compression_sample
A redacted before/after pair captured for PointFive rule tuning. The only signal that carries actual output content, and the most tightly gated record in the contract. Off by default. Three independent gates must all be true for any sample to leave the process: the tenant has enabled Tier 4 in their manifest, the matched rule is on the per-rule sample allowlist, and a random draw lands inside the configured sampling percentage. Never surfaced on customer dashboards. Carries no user_id (defense in depth). Hard-capped retention: 7 days.
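The three gates amount to a single conjunction. The sketch below is illustrative only: the function name, the parameter names, and the choice to model the first gate as an effective tier of 4 are assumptions, not TokenShift's actual code.

```python
import random
from typing import Callable, FrozenSet, Mapping


def should_emit_sample(
    effective_tier: int,
    rule_id: str,
    sample_allowlist: FrozenSet[str],
    sample_rate_percent: Mapping[str, float],
    draw: Callable[[], float] = random.random,
) -> bool:
    """Return True only when all three gates pass (hypothetical helper)."""
    # Gate 1: Tier 4 must be in effect for this client.
    if effective_tier < 4:
        return False
    # Gate 2: the matched rule must be on the per-rule sample allowlist.
    if rule_id not in sample_allowlist:
        return False
    # Gate 3: a random draw in [0, 1) must land inside the configured
    # percentage. A missing rate is treated as 0%, so nothing is emitted.
    rate = sample_rate_percent.get(rule_id, 0.0)
    return draw() * 100.0 < rate
```

With the documented defaults (empty allowlist, 0% rate) this function can never return True, which matches the "off by default" posture.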
5. client_state
Periodic meta-telemetry about the binary itself: version, OS, architecture, install method, enrollment age, and whether the binary’s internal self-check (redactor regression corpus, allowlist invariants, manifest validity) passed. Rate-limited to at most once per 24 hours per client_id; new installs always emit on first session. No PII, no per-tool-call detail. Without a control plane this is the only way PointFive and customer admins see fleet hygiene: stale versions, dominant install methods, enrollment age.
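The rate limit reduces to a timestamp comparison. A minimal sketch, assuming the client persists the time of its last client_state emission locally; the function and constant names are hypothetical.

```python
from typing import Optional

# 24 hours, the minimum spacing between client_state records per client_id.
CLIENT_STATE_MIN_INTERVAL_SECONDS = 24 * 60 * 60


def should_emit_client_state(last_emitted_at: Optional[float], now: float) -> bool:
    """Decide whether a client_state record may be emitted (hypothetical helper).

    last_emitted_at is None for a new install, which always emits on its
    first session.
    """
    if last_emitted_at is None:
        return True
    return now - last_emitted_at >= CLIENT_STATE_MIN_INTERVAL_SECONDS
```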
Tiers
The tier is a single integer set by user choice and capped from above by install-time config. The effective tier on the wire is min(user_choice, admin_cap). Each higher tier strictly adds; no tier silently relaxes a guarantee made at a lower tier.
| Tier | What leaves the laptop |
|---|---|
| 0 | Nothing. Local stats only. |
| 1 | tool_invocation (no shape fields), recovery_retrieved, session_summary, client_state. |
| 2 | Tier 1 + command_shape and rewritten_command_shape on tool_invocation. |
| 4 | Tier 2 + compression_sample. Triple-gated; off by default even when permitted. |
Default tier when enrolled: 1. Tier 4 is off by default even when an admin permits it — sampling additionally requires a per-rule allowlist and a configured sample rate.
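The tier arithmetic can be sketched in a few lines. The minimum tiers are copied from the signal index; the function names are assumptions made for illustration.

```python
from typing import List

# Minimum tier per signal, as listed in the signal index.
SIGNAL_MIN_TIER = {
    "tool_invocation": 1,
    "recovery_retrieved": 1,
    "session_summary": 1,
    "compression_sample": 4,
    "client_state": 1,
}


def effective_tier(user_choice: int, admin_cap: int) -> int:
    """The admin cap bounds the user's choice from above."""
    return min(user_choice, admin_cap)


def permitted_signals(user_choice: int, admin_cap: int) -> List[str]:
    """Signals whose minimum tier is met (hypothetical helper).

    compression_sample appearing here only means the tier gate passed; the
    per-rule allowlist and sample-rate gates still apply separately.
    """
    tier = effective_tier(user_choice, admin_cap)
    return sorted(name for name, minimum in SIGNAL_MIN_TIER.items() if tier >= minimum)
```

For example, a user who chooses Tier 4 under an admin cap of 1 ends up at Tier 1, so no shape fields and no samples leave the laptop.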
Install-time configuration
Every knob a tenant admin sets when deploying TokenShift via MDM. There is no runtime config push: changing any control below requires regenerating the manifest from the PointFive portal and redeploying via MDM. There are no user-level overrides: users cannot relax or extend beyond what the manifest permits.
| Control | Default | Effect |
|---|---|---|
| Tier cap | Tier 1 | Maximum tier any client in the tenant can emit. |
| Project identifier mode | Hashed git remote URL | Choices: hashed remote URL, hashed directory name, raw directory name. |
| Recovery cache enabled | On | Off disables the local one-hour cache. |
| Sample allowlist | Empty | Per-rule list of which rules may be sampled at Tier 4. Empty = no samples ever emitted. |
| Sample rate | 0% | Per-rule sampling probability used at Tier 4. |
| Retention extensions | Defaults below | Negotiated via contract for four of the five signals. compression_sample is hard-capped regardless. |
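To make the defaults concrete, here is a rough sketch of loading install-time config from a manifest. The JSON layout (a "config" object), the control key names, and the rule that unknown controls are rejected are all assumptions for illustration; only the defaults and the three top-level manifest fields come from this page.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


class ManifestError(Exception):
    """Raised when the manifest is unparseable or incomplete."""


@dataclass(frozen=True)
class InstallConfig:
    # Defaults mirror the table above; the field names are hypothetical.
    tier_cap: int = 1
    project_identifier_mode: str = "hashed_remote_url"
    recovery_cache_enabled: bool = True
    sample_allowlist: Tuple[str, ...] = ()
    sample_rate_percent: Dict[str, float] = field(default_factory=dict)


REQUIRED_KEYS = ("tenant_id", "bearer_token", "user_id_hmac_key")


def load_install_config(manifest_text: str) -> InstallConfig:
    """Parse the manifest, refusing rather than guessing on any problem."""
    try:
        manifest = json.loads(manifest_text)
    except json.JSONDecodeError as exc:
        raise ManifestError("manifest is not valid JSON") from exc
    if not isinstance(manifest, dict):
        raise ManifestError("manifest must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in manifest]
    if missing:
        raise ManifestError(f"manifest is missing {missing}")
    try:
        # Unknown controls (or a non-object "config") raise TypeError here.
        return InstallConfig(**manifest.get("config", {}))
    except TypeError as exc:
        raise ManifestError("unrecognized install-time control") from exc
```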
Identifiers
- tenant_id: customer organization, from the enrollment manifest.
- client_id: random per-machine UUID, created on first run. Stable for the life of the install.
- user_id: pseudonymous user identifier computed locally by the binary as HMAC(per_tenant_key, email). Email is read once from git config --global user.email (fallback: $TOKENSHIFT_USER_EMAIL), HMAC’d, then discarded. The raw email never leaves the developer’s machine. The HMAC key is per-tenant. user_id is deliberately absent from compression_sample (defense in depth) and client_state.
- project: hashed git remote URL by default; falls back to hashed directory name; install-time config may permit raw. The hash is a correlation defense, not a secrecy defense.
- session_id, invocation_id, recovery_id: opaque UUIDs used as join keys between signals.
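The user_id derivation can be sketched with Python's standard hmac module. The hash function (SHA-256), the hex encoding, and the email normalization step are assumptions; this page specifies only HMAC(per_tenant_key, email).

```python
import hashlib
import hmac


def compute_user_id(per_tenant_key: bytes, email: str) -> str:
    """Derive a pseudonymous user_id from an email (illustrative sketch)."""
    # Assumption: normalize so the same mailbox always maps to one user_id.
    normalized = email.strip().lower()
    # Assumption: SHA-256 as the HMAC digest, hex-encoded for the wire.
    digest = hmac.new(per_tenant_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the key is per-tenant, the same email produces unrelated identifiers in two different tenants, and the identifier cannot be recomputed without the key.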
Raw email, hostnames, OS usernames, IPs, MACs, and raw file paths are never captured and have no code path that would attach them.
The redactor
Four fields can carry shape-like data derived from real commands: command_shape and rewritten_command_shape on tool_invocation (Tier 2+), and the two halves of compression_sample (Tier 4). All four pass through the same redactor before emission, in this order:
- Known-secret patterns: sixteen patterns covering common cloud, SaaS, and language-runtime keys (AWS / GitHub / GitLab / Slack / Google / Stripe / Anthropic / OpenAI / JWT / private keys / DB-URL-with-creds / generic password/token/secret/api_key assignments).
- Structural value redaction: about thirty common command-line tools are recognized; flag names are preserved, flag values are replaced with placeholders. Unknown binaries are replaced with a generic placeholder.
- Length cap: 200 bytes post-redaction on command_shape.
- Backend pass: a final redaction pass on the receiving side as a safety net.
A 40+ case regression corpus runs in CI on every change and as part of the binary’s startup self-check. A redactor warning on any shape field drops the whole record — never partial.
Fail-closed posture
Anything that doesn’t go right ends in a drop, never a raw emit.
| Trigger | Action |
|---|---|
| Unknown attribute on a record | Attribute stripped before send. |
| Unknown signal name | Record dropped. |
| Effective tier below the signal’s minimum | Signal not emitted. |
| Redactor warning on any shape field | Whole record dropped. |
| Per-rule sample allowlist miss / sample-rate draw miss | Sample not emitted. |
| Local outbox overflow | Oldest records dropped first; developer’s command never blocked. |
| Network unavailable | Records queued in the bounded local outbox until network returns. |
| Enrollment manifest absent or unparseable | Binary refuses to send and surfaces the error. |
| Encryption fails | Record dropped. |
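The first three rows of the table can be sketched as one filter at the wire boundary. The allowlist below is abbreviated to two signals, and attribute names such as invocation_id placement and age_seconds are hypothetical; the real allowlist is not reproduced on this page.

```python
from typing import Any, Dict, Optional

# Abbreviated, illustrative allowlist: signal -> attribute -> minimum tier.
ATTRIBUTE_MIN_TIER: Dict[str, Dict[str, int]] = {
    "tool_invocation": {
        "schema_version": 1,
        "invocation_id": 1,
        "command_hash": 1,
        "outcome": 1,
        "command_shape": 2,
        "rewritten_command_shape": 2,
    },
    "recovery_retrieved": {
        "schema_version": 1,
        "recovery_id": 1,
        "session_id": 1,
        "age_seconds": 1,
    },
}
SIGNAL_MIN_TIER = {"tool_invocation": 1, "recovery_retrieved": 1}


def filter_for_wire(
    signal: str, record: Dict[str, Any], effective_tier: int
) -> Optional[Dict[str, Any]]:
    """Return the allowlisted subset of a record, or None to drop it."""
    allowed = ATTRIBUTE_MIN_TIER.get(signal)
    if allowed is None:
        return None  # unknown signal name: record dropped
    if effective_tier < SIGNAL_MIN_TIER[signal]:
        return None  # effective tier below the signal's minimum
    # Unknown attributes, and attributes above the current tier, are stripped.
    return {
        key: value
        for key, value in record.items()
        if key in allowed and effective_tier >= allowed[key]
    }
```

The default path through this function is denial: a key reaches the wire only if it is explicitly enumerated for that signal and tier.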
Transport
- One HTTPS endpoint, hardcoded in the binary.
- Bearer-token auth, token shipped in the manifest; rotated by regenerating the manifest in the portal.
- Per-record hybrid encryption against PointFive’s pinned public key. The matching private key is held by PointFive and never leaves PointFive’s key-management system. See Installation & distribution for the key model.
- gzip on wire (after encryption).
- Stateless ingest. No intermediate storage, no brokers, no records at rest in PointFive’s ingest path.
- Offline tolerance. A bounded local outbox holds pending records and drains when the network returns; overflow drops oldest first. The developer’s command is never blocked by network state.
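The offline-tolerance behaviour maps naturally onto a bounded queue. A minimal sketch using collections.deque, whose maxlen option discards the oldest entry on overflow; the class name, the capacity, and the send callback are assumptions, and the real outbox presumably persists to disk.

```python
from collections import deque
from typing import Callable, Deque


class Outbox:
    """Bounded in-memory outbox that never blocks the caller (illustrative)."""

    def __init__(self, capacity: int) -> None:
        # With maxlen set, appending to a full deque drops the oldest item.
        self._queue: Deque[bytes] = deque(maxlen=capacity)

    def enqueue(self, encrypted_record: bytes) -> None:
        self._queue.append(encrypted_record)

    def drain(self, send: Callable[[bytes], bool]) -> int:
        """Send records oldest-first; stop at the first failure."""
        sent = 0
        while self._queue:
            if not send(self._queue[0]):
                break  # network still down: keep the record for later
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)
```

A record is removed only after send reports success, so a failed drain leaves the queue intact.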
Retention
| Signal | Default retention | Tenant-extensible? |
|---|---|---|
| tool_invocation | 90 days | Yes |
| recovery_retrieved | 90 days | Yes |
| session_summary | 90 days | Yes |
| client_state | 90 days | Yes |
| compression_sample | 7 days | No (hard cap). |
The 7-day hard cap on compression_sample is non-negotiable. Samples are the only signal that carries redacted output content; the rule-tuning use case doesn’t benefit from longer retention, and every extra day is unnecessary blast radius.
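The retention rules can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and it assumes a negotiated extension can only lengthen retention, never shorten it.

```python
from typing import Optional

DEFAULT_RETENTION_DAYS = {
    "tool_invocation": 90,
    "recovery_retrieved": 90,
    "session_summary": 90,
    "client_state": 90,
    "compression_sample": 7,
}
HARD_CAPPED_SIGNALS = {"compression_sample"}


def retention_days(signal: str, negotiated_days: Optional[int] = None) -> int:
    """Effective retention for a signal (illustrative helper)."""
    default = DEFAULT_RETENTION_DAYS[signal]
    # The hard cap ignores any negotiated value.
    if signal in HARD_CAPPED_SIGNALS or negotiated_days is None:
        return default
    # Assumption: an extension never reduces retention below the default.
    return max(default, negotiated_days)
```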
What the contract explicitly does not send
The wire never carries any of the following. None are on the allowlist; no code path attaches them. An automated test asserts only allowlisted keys appear on the wire, so a change that adds any of these fails the build before merge.
- Bash stdout / stderr content, in any form (including hashes).
- Tool inputs: file contents, web content, MCP tool-call results.
- Agent transcripts, prompts, messages, message hashes.
- Environment variables — names or values.
- Raw file paths.
- Hostnames, OS usernames, real names, IP addresses, MAC addresses.
- Free-text feedback / notes / comments. None exist by design.
- Rule contents or rule diffs — only the rule’s identifier.
- Stack traces, panic messages, crash report bodies.
- Branch names, commit messages, git refs.
- Raw email. Read once locally to compute user_id, then discarded. The hash ships; the email never does.
Glossary
- Allowlist: Explicit list of what is permitted; everything else is denied.
- Enrollment manifest: A small JSON file generated by the PointFive customer portal and deployed via MDM. Carries tenant_id, bearer_token, user_id_hmac_key, and install-time config. The ingest URL and PointFive public key are pinned in the binary at build time, not carried in the manifest.
- Fail-closed: On any error, drop the data rather than emit it raw.
- MDM: Mobile Device Management / endpoint management system (Jamf, JumpCloud, Intune, Ansible, etc.).
- Pseudonymous: Identifies a unique entity but cannot be reversed to a person without separately held data.
- Signal: A named, schema’d record type on the wire (one of the five above).
- Tier: Level of data sharing (0/1/2/4). Capped from above by install-time config.