Hypernym Infinite Memory / Librarian · CTO research board · Direct IP only

Memory control plane for small-model long-memory inference.

True north: prove that a 4,096-token-class ~3B local model can use controller-curated memory to handle long-memory pressure, isolated recall, and provenance-controlled retrieval at materially lower serving cost than brute-force long context.

Current decision: continue research and build V3 around selected-current payloads, controller-owned metadata, and tail-anchored output contracts. v0.61 shows the tail-contract and tail-schema variants passed both 1024 and 2048 pressure; the original-control 2048 row hit one transport disconnect.

Why this page exists: it is not a pretty dashboard. It is the current decision record: what we can say, what we cannot say, and what the next eval must prove before a CTO should change direction.

Question: Can a 4,096-context-class small model behave like a useful long-memory system when a controller curates memory?

Evidence: 80 live rows, 1,199,692 prompt tokens, 84.6% true-north score.

Constraint: Failed-slice rerun coherence stayed at 25.0%; the weak cluster is real.

Current test: Granite Pilots 8192 passed with 210,631 reported prompt tokens; single-shot 16384 hit gateway HTTP 413. Live chunking found a different boundary: one chunk plus compact recall works semantically, but a second chunk returns server 500 Invalid input batch even at 1024x2.

Largest observed pressure

198,888

48.6x the 4,096-token native context class (198,888 / 4,096). This is measured successful pressure, not a base-context increase.

Transport liveness

3 / 4

Known compact transport probes that completed under the request-path wall.

Pocket semantic score

83.3%

Latest pocket-generalization semantic mean across completed rows.

Personal-memory score

84.6%

v0.51 live true-north score across the complete 80-row matrix.

v0.53 semantic score

46.3%

Format/schema/pressure diagnostic. This is a constraint finding, not a demo score.

v0.54 semantic score

85.2%

Runtime-selected minimal-payload diagnostic. This is the strongest V3 direction signal.

v0.55 pressure score

11.1%

Semantic pass under 1024 pressure in deterministic envelope live run. This is the next hard constraint.

v0.57 HTTP 200

9 / 9

Timeout-aware shard completed every row, including 2048-pressure rows, without 503 contamination.

v0.58 HTTP 200

0 / 1

First row stopped after repeated chat-slot busy responses; health was OK, but no quality row completed.

v0.59 HTTP 200

14 / 15

Idle-gated retry avoided the busy cascade; stopped on one fast HTTP 500 Invalid input batch.

v0.60 HTTP 200

6 / 6

Focused agent-loop replay completed all rows. Original controller-expands passed at 1024/2048.

v0.61 HTTP 200

5 / 6

Tail-contract and tail-schema variants passed 4/4 scored rows; original-control 2048 disconnected.

Pilots pressure pass

1,024

Highest strict pass band on the session endpoint; max probe prompt tokens 24,584, max probe latency 58.1629s.

Pilots 2048 pass

52,933

Reported prompt tokens in the independent 2048 extension; elapsed 55.1825s, strict/semantic 100%, forbidden hits 0.

Pilots 4096 pass

105,498

Reported prompt tokens in the independent 4096 extension; elapsed 106.0359s, strict/semantic 100%, forbidden hits 0.

Pilots 8192 pass

210,631

Reported prompt tokens in the independent 8192 extension; elapsed 196.4813s, strict/semantic 100%, forbidden hits 0.

16384 ceiling

HTTP 413

The 16384 probe hit nginx Request Entity Too Large in 0.4259s. This is a request-admission ceiling, not a memory-quality failure.

What This System Is

A controller-mediated memory layer around a small local model. The model is not simply given an infinite prompt; the controller selects, scopes, and verifies memory payloads so the model can answer from long-memory state without brute-force long-context serving.

Serving target: local/self-hosted Granite GGUF through the isolated direct-IP lane.
Security target: per-tenant memory isolation, active binding precedence, revoked/stale memory rejection, and empty-result honesty.
Business target: reduce the cost and reliability penalty of long-memory inference across many users.
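
The selection behavior described above (tenant isolation, active binding precedence, revoked/stale rejection, empty-result honesty) can be illustrated with a minimal sketch. This is not the project's controller: the `MemoryRecord` fields, the `version` ordering, and the payload shape are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemoryRecord:
    # Hypothetical record shape; field names are assumptions, not the real schema.
    tenant_id: str         # owner of the memory (isolation boundary)
    key: str               # logical memory key, e.g. "editor"
    value: str
    version: int           # higher version = newer binding
    revoked: bool = False  # revoked records must never be served


def select_current(records: List[MemoryRecord], tenant_id: str, key: str) -> Optional[MemoryRecord]:
    """Return the active binding for (tenant_id, key), or None if nothing qualifies."""
    candidates = [
        r for r in records
        if r.tenant_id == tenant_id and r.key == key and not r.revoked
    ]
    if not candidates:
        return None
    # Active binding precedence: the newest non-revoked version wins,
    # so older (stale) versions are never selected.
    return max(candidates, key=lambda r: r.version)


def build_payload(records: List[MemoryRecord], tenant_id: str, key: str) -> dict:
    """Build the payload handed to the model; never invent a placeholder memory."""
    selected = select_current(records, tenant_id, key)
    if selected is None:
        # Empty-result honesty: report that nothing was selected.
        return {"status": "empty", "memory": None}
    return {
        "status": "selected",
        "memory": {"key": selected.key, "value": selected.value, "version": selected.version},
    }
```

The point of the sketch is that the model only ever sees the selected current record, or an explicit empty status; stale, revoked, and foreign-tenant records are filtered before prompt construction.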

What Is Actually Strong

Pressure handling: successful prompts have exceeded the 4,096-token native context class by up to 48.6x (198,888 prompt tokens).
Structured output: strict JSON / digest-handle patterns are working in prepared evals.
Control-plane safety: empty result, duplicate conflict, and ambiguous candidate abstention tests are now explicit repeatable harnesses.
Product-shaped eval: v0.51 live covered research, story canon, relationship boundaries, personal preferences, and long-running agent continuity.
Cost guard: current prompt-injection runs are dry-run or direct self-hosted; frontier generation endpoints are blocked by policy audit.

Current Verdict, In Human Terms

Deck-safe claim

Small local models can be made materially more useful for long-memory workflows when memory selection, rejection, and provenance handling are moved into a controller layer instead of being left to raw prompt text.

Hard proof

v0.51 completed 80/80 live rows with 80 HTTP 200 responses, 100.0% parse success, 100.0% forbidden-fact absence, and 95.4% required-fact recall.

Hard weakness

The system is not yet robust enough for arbitrary exact long-term recall: failed-slice rerun coherence was 25.0%, especially in update precedence and long-running agent entity binding.

Research action

Do fewer visuals. After each major run, publish only this form: question, evidence, limits, next decision. v0.61 moves the agent-loop direction from "do not shorten" to "keep the original prompt body, but add the output contract at the tail."

Current Eval Harness State

The latest work extended the Granite Pilots lane from the earlier 0/256/1024 ladder through independent 2048, 4096, and 8192 pressure bands, then measured the next ceiling: a single-shot 16384 probe was rejected by nginx with HTTP 413 before model inference. We then tested chunked session continuation live. The backend accepted one pressure chunk, and a compact final probe recovered all current facts semantically, but every two-chunk shape tested failed on the second chunk with server 500 Invalid input batch, including 1024x2. The most recent harness work adds a bounded adaptive queue so that future Pilots jobs run one after another on the same durable session handle, under the shared limit of two active pilots.
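
As a rough illustration of the chunked-continuation shape described above, the sketch below splits a target pressure band into bounded chunks. The function name and constants are hypothetical; the 8192 value is taken from the highest single-shot band reported as passing on this page. It only plans the shape: as noted above, live two-chunk runs currently fail on the second chunk, so a multi-chunk plan is a test shape, not proven capability.

```python
from typing import List

# Highest single-shot band reported as passing live on this page.
SINGLE_SHOT_PASS_CEILING = 8192


def plan_pressure_chunks(target_band: int, max_chunk: int = SINGLE_SHOT_PASS_CEILING) -> List[int]:
    """Split an accumulated pressure band into chunks no larger than max_chunk."""
    if target_band <= 0 or max_chunk <= 0:
        raise ValueError("target_band and max_chunk must be positive")
    chunks: List[int] = []
    remaining = target_band
    while remaining > 0:
        size = min(remaining, max_chunk)
        chunks.append(size)
        remaining -= size
    return chunks
```

For example, `plan_pressure_chunks(16384)` yields two 8192 chunks, and `plan_pressure_chunks(2048, max_chunk=1024)` yields the 1024x2 shape mentioned above.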

Artifact	Status	Hard fact	Why it matters
Objective audit	not_complete_live_gate_blocked	15 satisfied · 2 partial live · 6 blocked · contract IDs 23	Goal remains incomplete until live evidence covers story, relationship, psychology, agent, Q2 sequential state, and full-domain threshold pressure. Blocked IDs: `REQ-LIVE-STORY, REQ-LIVE-RELATIONSHIP, REQ-LIVE-PSYCHOLOGY, REQ-LIVE-AGENT, REQ-LIVE-Q2-SEQUENTIAL, REQ-LIVE-THRESHOLD-COVERAGE`.
Objective contract	False	satisfied IDs 15; blocked IDs 6; rule: live gate-admitted artifacts only	Machine-readable contract for CTO/API consumers. Dry-runs, gate refusals, and transport rejections are explicitly barred from counting as memory-quality proof.
Threshold frontier	analysis_ready_live_gate_blocked	live domains 1/5; highest live lower bound 2,048; Q2 live False; dry-exclusion live domains 0	Current hard claim: only `research_development` has a live pressure lower bound, at 2048. Unproven domains: `long_running_agent_workflows, personal_psychology_preference, relationship_boundary_editing, story_world_canon`. Separate dry-source exclusion proof prevents dry Q1/Q4 artifacts from becoming live threshold claims.
Final validation	pass	failures 0; warnings 0; suite index verified 1; closure plan verified 1; lock guard verified 1; adaptive queue verified 1; live boundary verified 1	The manifest, packet, comparison, checklist, objective closure plan, first-live subset, finalizer, full-suite artifact index, Granite Pilots session lock guard, adaptive scheduler, adaptive queue, chunked continuation runner, and live boundary analysis agree.
Admission gate refresh	pass	mode dry_run; manifest updated True; endpoint touched False; gate block_memory_quality_run; allow False	Fresh dry-run calibration plus v0.66 gate refresh. The current gate remains blocked because there is no HTTP 200 same-size calibration and no lease, but the manifest now follows the refreshed gate artifact.
Granite Pilots router	dry_run_plan_only	handle `velvet-starfall-crusader-9`; max active pilots 2; endpoint touched False	New session-aware route uses one durable handle and enforces the shared two-active-pilot policy before live calls.
Granite Pilots lock guard	verified	router dry_run_plan_only; probe dry_run_pass; pressure pass; endpoint touched false	All live-capable Pilots tools now share a nonblocking local session-handle lock plus the remote berth gate. This prevents two local runs from accidentally consuming multiple berths while another researcher is active.
Granite Pilots adaptive pressure	ready_no_live_endpoint	pass ceiling 8,192; gateway ceiling 16,384; target safe single-shot False	Future pressure runs are now mode-gated: replay 8192 safely, recheck 16384 only as an explicit gateway test, and use chunked session continuation for >8192 accumulated pressure. Safe replay command: `bash research/tracks/hypernym-infinite-mim/forge_runner.sh run-granite-pilots-pressure-ladder --live --run-id <run>_p8192_replay --bands 8192 --timeout 300`
Granite Pilots adaptive queue	pass	jobs 2/2; plan entries 2; handle velvet-starfall-crusader-9; endpoint touched False; endpoint touch allowed False	Sequential execution guard for shared Pilots capacity. The dry-run proves the route job and a 1024x2 chunked job run in order without live traffic, and now records per-job session handle, expected artifact, queue-lock, child-lock, and berth-gate requirements. Live mode adds a queue lock and the same two-active-pilots berth gate before any child runner executes.
Granite Pilots chunked continuation	pass	chunks 2; max chunk 8,192; accumulated band 16,384; endpoint touched False; compact final probe True	Dry-run verifies the correct next test shape for >8192: two bounded 8192 pressure turns on the same handle, then strict/semantic JSON recall at 1.0 without repasting the pressure block. This is not live capability evidence until run with `--live`.
Granite Pilots live chunk boundary	live_boundary_measured	single 4096 chunk semantic 100.0%; recall 100.0%; forbidden absence 100.0%; second-chunk failure bands 4	Measured live: one 4096 chunk plus compact final probe recalled all five current facts and no forbidden facts. Two-chunk runs failed on chunk 2 with `500 Invalid input batch` at 8192x2, 4096x2, 2048x2, and 1024x2.
Granite Pilots live probe	pass	turns 9; semantic 100.0%; strict 100.0%; undocked True	Measured live one-handle session memory: five current private facts recalled across research/story/relationship/psychology/agent domains, stale/rejected/foreign controls absent, berth freed after use.
Granite Pilots pressure ladder	pass	bands [0, 256, 1024]; highest strict 1,024; highest semantic 1,024; max prompt tokens 24,584	Measured live session recall under rising pressure on the same durable handle. 1024 band passed strict JSON recall with no stale/rejected/foreign leakage and then undocked.
Granite Pilots 2048 extension	pass	band [2048]; highest strict 2,048; highest semantic 2,048; max prompt tokens 52,933	Independent single-band extension. 2048 passed strict JSON recall in 55.1825s with no stale/rejected/foreign leakage and then undocked.
Granite Pilots 4096 extension	pass	band [4096]; highest strict 4,096; highest semantic 4,096; max prompt tokens 105,498	Independent single-band extension. 4096 passed strict JSON recall in 106.0359s with no stale/rejected/foreign leakage and then undocked.
Granite Pilots 8192 extension	pass	band [8192]; highest strict 8,192; highest semantic 8,192; max prompt tokens 210,631	Independent single-band extension. 8192 passed strict JSON recall in 196.4813s with no stale/rejected/foreign leakage and then undocked.
Granite Pilots 16384 ceiling	partial_or_fail	band [16384]; probe HTTP 413; elapsed 0.4259s; undocked True	Gateway/request-size ceiling. Setup turns were admitted, but the 16384 probe was rejected by nginx before model inference, so this is transport-admission evidence.
Next-live evidence matrix	ready_but_gate_blocked	6 gaps · 27 rows/turns · Q1 4, Q2 15, Q4 8; parallel live calls False; max pilots 2; queue pass	Each blocked objective gap has a specific label or turn trace. Queue and no-parallel controls are machine-readable before the next admitted run.
Objective closure plan	ready_but_gate_blocked	6/6 blocked requirements mapped; first-live 27 rows/turns; overlap sum 52; max pilots 2	Human/CTO handoff for the remaining work: REQ-LIVE-STORY, REQ-LIVE-RELATIONSHIP, REQ-LIVE-PSYCHOLOGY, REQ-LIVE-AGENT, REQ-LIVE-Q2-SEQUENTIAL, REQ-LIVE-THRESHOLD-COVERAGE. Each item names the exact runner, labels, close condition, stop condition, and gate dependency. No live traffic until the manifest-selected v0.66 gate admits it.
Indexed dry run	dry_run_pass	steps: validate-eval-suite, q1-cross-domain-tail-contract, q2-multi-turn-personal-memory, q3-tenant-foreign-boundary-regression, q4-sensitive-preference-boundary-abstention	Harness proof only. It exercises Q1/Q2/Q3/Q4 without endpoint traffic.
Indexed live refusal	blocked_by_gate	live endpoint touched: False	Under v0.66, full live suite refuses before touching the direct-IP endpoint.
Full-suite finalizer	certification_artifacts_ready	Q3 safety-control: True; Q1/Q2/Q4 live-like: Q1 False, Q2 False, Q4 False	Consumes the suite artifact index and produces threshold plus version-comparison artifacts. Dry-index use remains harness evidence only.
Version packet	v3-candidate	fingerprinted files: 38; command modes: preflight, audit, validation, dry_run, pilots_queue_dry, gate_check, first_live_subset, first_live_threshold_finalize, live_when_gate_allows, full_suite_certification_finalize, version_compare	Future V3 or V-next can be rerun against the same objective, manifest-selected gate artifact, queue preflight, and comparison contract.
Comparator	comparison_scaffold_ready_no_candidate_artifacts	candidate Q1/Q2/Q3/Q4 artifacts supplied: none	Correctly refuses to create a capability delta until live candidate artifacts exist.
Launch checklist	ready_but_gate_blocked	steps: 11; first-live minimum rows: 27	Operator path is explicit: status, audit, validation, suite dry-run, Pilots queue dry-run, gate, first-live subset, threshold finalizer, live suite, full-suite finalizer, version compare.

Artifact index command trace: bash research/tracks/hypernym-infinite-mim/forge_runner.sh compare-version-eval-results --run-id 20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_compare --candidate-version <version> --q1-scores research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q1/scores.json --q2-scores research/tracks/hypernym-infinite-mim/results/q2-multi-turn-personal-memory-session/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q2/scores.json --q3-scores research/tracks/hypernym-infinite-mim/results/q3-tenant-foreign-boundary-regression-plan/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q3_plan/plan.json --q4-scores research/tracks/hypernym-infinite-mim/results/q4-sensitive-preference-boundary-abstention/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q4/scores.json

Threshold command trace: bash research/tracks/hypernym-infinite-mim/forge_runner.sh analyze-threshold-boundary --run-id 20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_threshold --live-scores research/tracks/hypernym-infinite-mim/results/v0.63-unscored-domain-drain-resume/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q1/scores.json --live-scores research/tracks/hypernym-infinite-mim/results/q4-sensitive-preference-boundary-abstention/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q4/scores.json --q2-live-scores research/tracks/hypernym-infinite-mim/results/q2-multi-turn-personal-memory-session/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1_q2/scores.json

Evidence Ladder

Layer	Status	Evidence	Meaning
Transport under pressure	measured live	198,888 prompt tokens, 48.6x native class	The endpoint path can accept and complete much larger prompt pressure than the model's native context class.
Quality floor	measured live	Semantic 100.0%; provenance 80.0%	Basic structured recall can work under controlled pressure.
Pocket generalization	measured mixed	825,958 prompt tokens total; exact provenance 28.6%	There is a real exactness pocket, but the system drifts outside that pocket.
Empty-result contract	prepared dry-run	v0.47 verifier 100.0%	The harness now tests that no memory selected means no invented placeholder memory.
Duplicate conflict rejection	prepared dry-run	v0.48 page safe 100.0%; bundle safe 83.3%	The harness now tests stale/shadow duplicate binding conflicts before live launch.
Ambiguous candidate abstention	prepared dry-run	v0.49 abstention modes 100%; aggregate 95.8%	The harness now tests that candidate hints do not become recalled memory without controller selection.
Personal entity coherence	measured live	v0.51 true-north 84.6%; coherence 81.2%; rows 80/80; prompt tokens 1,199,692	First complete live eval shaped like single-user long-term memory across research, story, relationship, psychology, and agent workflows.
Timeout-aware pressure retry	measured live	v0.57 strict true-north 74.1%; semantic true-north 74.1%; rows 9/9; prompt tokens 370,933	Serving contamination from v0.56 was fixed; quality still drops at 1024/2048 pressure.
Granite Pilots session pressure	measured live	bands [0, 256, 1024]; highest strict pass 1,024; 1024-band prompt tokens 24,584; max probe latency 58.1629s	The session endpoint preserved five current private facts and kept stale/rejected/foreign controls out of its answers under rising pressure on a persistent one-handle memory lane.
Granite Pilots 2048 extension	measured live	band [2048]; prompt tokens 52,933; latency 55.1825s; strict/semantic pass 2,048/2,048	The next pressure boundary passed independently with the same durable handle, max two-pilot policy, no forbidden hits, and mandatory undock.
Granite Pilots 4096 extension	measured live	band [4096]; prompt tokens 105,498; latency 106.0359s; strict/semantic pass 4,096/4,096	The next pressure boundary passed independently with the same durable handle, max two-pilot policy, no forbidden hits, and mandatory undock.
Granite Pilots 8192 extension	measured live	band [8192]; prompt tokens 210,631; latency 196.4813s; strict/semantic pass 8,192/8,192	The next pressure boundary passed independently with the same durable handle, max two-pilot policy, no forbidden hits, and mandatory undock.
Granite Pilots 16384 ceiling	measured live	band [16384]; probe status HTTP 413; elapsed 0.4259s; prompt tokens n/a	The current shared gateway rejects the next band before inference. The proven memory pass ceiling is therefore 8192 until request-size limits are changed.
Chunked continuation route	live boundary found	1 × 4096 semantic recall pass; 2 × 1024 failed on second chunk; 2 × 2048 failed; 2 × 4096 failed; 2 × 8192 failed	The next optimization is not prompt wording. The backend needs multi-chunk session admission, compaction between pressure chunks, or a mode that prevents the prior pressure block from making the next batch invalid.
Current-fact locking retry	serving blocked	v0.58 attempted 1 row, HTTP 200 0/1, stop reason `non_200_after_v058-research_update_precedence-baseline_verbatim_current_facts-p1024`	No memory-quality evidence yet. `/health` can be OK while the single chat slot is busy.
Idle-gated fact-locking retry	partial live	v0.59 attempted 15 rows, HTTP 200 14/15, strict/semantic true-north 100.0%/100.0%	Chat-slot idle gate worked; completed rows were clean, but one agent-loop locked-ID row hit HTTP 500 Invalid input batch.
Agent-loop input-batch hardening	measured live	v0.60 completed 6 rows, HTTP 200 6/6; original replay coherence 100.0%	The v0.59 500 did not reproduce. Original controller-expands passed; shorter hardened/budgeted variants returned plain text and failed parse.
Agent-loop tail output contract	measured live	v0.61 completed 6 rows, HTTP 200 5/6; tail-contract coherence 100.0%; tail-schema coherence 100.0%	Tail-anchored output contract fixed the v0.60 short-prompt plain-text collapse on scored rows. Original-control 2048 had one remote disconnect and should be treated as a serving boundary, not a memory-quality pass.

v0.51 Live Findings

Transport: 80/80 rows returned HTTP 200; no non-200 rows recorded.
Format control: parse success 100.0%; entity id exactness 92.5%.
Memory safety: forbidden-fact absence 100.0%; no stale/forbidden phrase leakage in scored answers.
Recall: required fact recall 95.4%; coherence pass 81.2%.
Focused audit: rerunning the original failed slices produced 25.0% coherence while keeping parse and forbidden-fact absence at 100.0%/100.0%.

v0.51 Constraints

Latency: p50 2.4586s; p90 89.9753s; max 93.018s across completed rows.
Pressure behavior: 1024-band rows were slow but scored best: coherence 100.0%.
Weakest domain: long-running agent task at 56.2%; failures were mostly entity-id exactness and temporal update handling, not fact leakage.
Temporal updates: some rows remembered core facts but missed the newest update condition; V3 needs explicit update precedence.

Scenario Scorecard

This is the part that should drive product and CTO decisions. It shows where the memory system is already useful, and where V3 should focus.

Scenario	Coherence	Meaning	Decision
Relationship boundary memory	100.0%	Best current product fit: safety-sensitive stated preferences and boundaries stayed clean.	candidate demo lane
Story-world canon	87.5%	Strong for creative continuity; still needs explicit supersession rules for canon changes.	keep testing
Personal preference / psychology	81.2%	Useful but must be conservative because wrong continuity can feel personally invasive.	guard heavily
Research program continuity	81.2%	Promising for long-running research, but update precedence is the blocker.	V3 target
Long-running agent task	56.2%	The main failure cluster: facts are often present, but entity binding and task identity drift.	do not demo yet

v0.52 Targeted Follow-Up

v0.52 is live evidence now. It targeted the v0.51 failure cluster: update precedence and stable memory-key/entity binding under stale and foreign near-duplicate pressure.

Composite live result: 24 rows, 24 HTTP 200 after focused replacement, 74.1% true-north score.
Safety preserved: stale-fact absence 100.0%; foreign-fact absence 100.0%.
Unexpected result: prose memory coherence 91.7%; structured control-plane coherence 66.7%.
Pressure cliff: 1024-pressure coherence 50.0%; 64-pressure coherence 100.0%.

Convergence Standard

Future pages should only publish after the run has a named hypothesis, stored scores, and a convergence/review artifact. Raw visual output without this context is not a research result.

Every page must state the research question and why it matters to true north.
Every chart/table must have a decision attached to it.
Every institutional claim must distinguish measured fact, inference, and open question.
Dry-runs may validate harness logic; they must not be framed as live model capability.
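
The rule that dry-runs, gate refusals, and transport rejections must not count as live capability can be expressed as a small predicate. This is a sketch under assumptions: the `Artifact` fields below are hypothetical and do not reflect the project's actual artifact schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    # Hypothetical fields for illustration only.
    name: str
    mode: str                   # "live" or "dry_run"
    endpoint_touched: bool      # did the run send traffic to the endpoint?
    gate_admitted: bool         # did the admission gate allow the run?
    http_status: Optional[int]  # None when no request was made


def counts_as_memory_quality_proof(artifact: Artifact) -> bool:
    """Only live, gate-admitted runs that completed with HTTP 200 may count."""
    return (
        artifact.mode == "live"
        and artifact.endpoint_touched
        and artifact.gate_admitted
        and artifact.http_status == 200
    )
```

Under this predicate, a dry-run pass and an HTTP 413 gateway rejection are both excluded from memory-quality claims, even though each is still useful harness or transport evidence.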

v0.53 Format / Schema / Pressure Diagnostic

v0.53 is live evidence. It tested whether the v0.52 failure cluster was caused by record format, output schema, or pressure budget. The result is constraining: every row returned HTTP 200, but quality fell sharply under the diagnostic matrix.

Slice	Measured result	CTO meaning
All rows	36 rows, 36 HTTP 200, 811,194 prompt tokens	Transport is not the blocker in this run; quality and binding are.
Strict true north	25.9%	The strict machine-contract path is not yet robust enough for production memory claims.
Semantic true north	46.3%	Even allowing field recovery, many rows lose rejected-key completeness or admit stale/foreign facts.
Best memory format	prose coherence 50.0%; compact KV 25.0%; verbose JSONL 8.3%	More explicit structure did not solve the problem. Prose remains the current reliable baseline.
Output contract	strict JSON coherence 55.6%; field-tagged coherence 0.0%	Relaxing the answer envelope did not recover coherence. Use strict JSON plus verifier/repair, not looser text.
Pressure cliff	0-pressure semantic pass 72.2%; 1024-pressure semantic pass 33.3%	Pressure still materially degrades reliable memory behavior.

v0.54 Runtime-Selected Payload Diagnostic

v0.54 is live evidence. It tested whether the system improves when the controller sends selected memory instead of asking the model to sort through stale and foreign fact bodies. This is the strongest post-v0.53 result and should drive V3.

Slice	Measured result	CTO meaning
All rows	36 rows, 36 HTTP 200, 822,190 prompt tokens	Transport stayed clean; this is a quality/control-plane signal.
Strict true north	77.8%	Recovered sharply from v0.53 strict true-north 25.9%.
Semantic true north	85.2%	Recovered sharply from v0.53 semantic true-north 46.3%.
Noisy full payload	coherence 58.3%; semantic 58.3%	Letting the model see stale and foreign fact bodies is worse.
Selected current only	coherence 100.0%; semantic 100.0%	Best quality result. Runtime selection works when audit handles are not required.
Selected + rejected handles	coherence 66.7%; semantic 91.7%	Promising audit envelope, but exact rejected-key set formatting still needs repair.
Verifier repair	repair attempted 13.9%; repair improved 0.0%	The current repair prompt does not fix failures. V3 needs deterministic repair or a stronger constrained decoder.

v0.55 Deterministic Envelope Diagnostic

v0.55 is live evidence. It tested whether the controller should fill rejected handles and identity metadata instead of making the model emit the full memory-control envelope. The answer is mixed: controller-filled envelopes are better than model-filled full JSON, but pressure recall remains the blocker.

Slice	Measured result	CTO meaning
All rows	18 rows, 18 HTTP 200, 363,059 prompt tokens	Transport stayed clean; failures are quality/control-plane under pressure.
Strict true north	29.6%	Deterministic envelope alone did not recover v0.54 quality.
Semantic true north	33.3%	Semantic recovery is also low because current-fact recall collapses under pressure.
Model fills full JSON	coherence 0.0%; semantic 16.7%; rejected-key recall 50.0%	Do not ask the model to own the full audit envelope. Strict coherence was zero in this run.
Controller fills rejected handles	coherence 50.0%; semantic 50.0%	Better than full model envelope, but still vulnerable when the model must emit IDs and current facts under pressure.
Controller fills all IDs	coherence 66.7%; semantic 66.7%; rejected-key recall 100.0%	Best envelope mode. Controller should own identity and rejection metadata.
Pressure split	0-pressure semantic 77.8%; 1024-pressure semantic 11.1%	The next optimization is not more envelope filling; it is preserving current-fact recall under pressure.
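
The "controller fills all IDs" mode above can be sketched as follows: the model emits only current facts, and the controller attaches identity and rejection metadata deterministically. The field names (`entity_id`, `memory_key`, `rejected_handles`, `current_facts`) are assumptions for illustration, not the harness's real envelope.

```python
import json
from typing import List


def assemble_envelope(
    model_output: str,
    entity_id: str,
    memory_key: str,
    rejected_handles: List[str],
) -> dict:
    """Merge model-emitted facts with controller-owned identity and rejection metadata."""
    parsed = json.loads(model_output)
    if not isinstance(parsed, dict):
        raise ValueError("model output must be a JSON object")
    facts = parsed.get("current_facts", [])
    if not isinstance(facts, list):
        raise ValueError("current_facts must be a list")
    # Any IDs the model emitted are ignored; the controller's values are authoritative.
    return {
        "entity_id": entity_id,
        "memory_key": memory_key,
        "rejected_handles": sorted(rejected_handles),
        "current_facts": facts,
    }
```

This keeps identity and rejected-key exactness out of the model's hands, which matches the table's finding that the remaining problem is current-fact recall under pressure rather than metadata formatting.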

v0.56 Selected-Current Controller Envelope Pressure Recall

v0.56 is live evidence, but not a completed quality matrix. It tested the v0.54+v0.55 synthesis: selected-current payloads plus controller-owned identity/rejection metadata at 0, 1024, and 2048 pressure. The important result is a serving boundary: the 2048-pressure row exceeded the 150s client timeout and the backend continued processing it, so subsequent sequential rows got single-flight 503 Backend busy.

Slice	Measured result	CTO meaning
Completed matrix	27 attempted rows, 2 HTTP 200, 41,539 prompt tokens counted from successful calls	The matrix ran, but quality conclusions are valid only for the first two completed rows.
0-pressure row	coherence 100.0%; semantic 100.0%	The selected-current/controller envelope path works cleanly without pressure in the observed slice.
1024-pressure row	current-fact recall 66.7%; semantic 0.0%	At 1024 pressure, exact IDs and rejected handles stayed correct, but one current fact was replaced by a weaker memory-key statement.
2048-pressure row	first 2048 row timed out after 150s; later rows returned 503 busy	This is a serving-control boundary, not a memory-quality verdict. V3 harness needs cancellation-aware timeout, retry-after sleep, or per-row cooldown.
Rejected handles	rejected-key recall 100.0%; exact-clean 100.0% over completed rows	Controller-owned rejected handles remained exact in completed rows; pressure recall, not metadata exactness, is the unresolved issue.
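
One of the mitigations named above, retry-after sleep with a per-row cooldown, might look like the sketch below. It does not cover request cancellation. The `send` callable is a stand-in for whatever client the harness uses and is assumed to return the HTTP status plus an optional retry-after value in seconds; the default cooldown is an arbitrary illustrative number.

```python
import time
from typing import Callable, Optional, Tuple


def call_with_cooldown(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 4,
    default_cooldown_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Retry on 503 only, sleeping between attempts. Returns (final_status, attempts)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, retry_after = send()
        if status != 503:
            # Success or a non-busy failure: stop and let the caller decide.
            return status, attempt
        if attempt < max_attempts:
            # Prefer the server's retry-after hint; fall back to a fixed cooldown.
            sleep(retry_after if retry_after is not None else default_cooldown_s)
    return status, max_attempts
```

Injecting `sleep` keeps the function testable without real waiting. A runner could stop the matrix when the final status is still 503 rather than letting later rows inherit a busy backend.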

v0.57 Timeout-Aware Sharded Pressure Retry

v0.57 is live evidence. It reran the v0.56 question with a smaller 9-row matrix, 300s timeout, cooldowns, and stop-after-timeout behavior. This separates serving control from memory quality: the serving path completed cleanly, but recall quality remained pressure-dependent.

Slice	Measured result	CTO meaning
Serving control	9/9 rows completed, 9/9 HTTP 200, stop_reason `None`, partial `False`	The v0.56 503 cascade was harness/serving contamination, not proof that 2048 pressure cannot complete.
Token pressure	370,933 prompt tokens and 1,199 completion tokens across 9 rows	The run is a real pressure eval over the direct-IP lane, not a dry-run or local-only verifier result.
Overall quality	strict true-north 74.1%; semantic true-north 74.1%; coherence 77.8%	Useful memory behavior exists, but it is not yet a production-grade arbitrary recall guarantee.
0 pressure	coherence 100.0%; semantic 100.0%	The selected-current/controller-owned envelope is clean when not under added pressure.
1024 pressure	coherence 66.7%; semantic 66.7%; current-fact recall 100.0%	At 1024 pressure, current facts remained present, but one row hit length/parse/entity-output failure.
2048 pressure	coherence 66.7%; semantic 66.7%; current-fact recall 77.8%	2048-pressure rows can complete under a 300s wall, but research-update precedence lost current facts in one row.
Safety invariants	stale-fact absence 100.0%; foreign-fact absence 100.0%	The most important safety invariant stayed clean across this retry: stale and foreign facts were not admitted.

v0.58 Current-Fact Locking + Output Budget

v0.58 is staged, approved, and dry-run validated, but the first live attempt did not reach quality scoring. The first row returned 503 Backend busy on all four attempts. Preflight and final /health were 200 OK, which means health is not sufficient proof that the single inference slot is idle.

Slice	Measured result	CTO meaning
Dry-run	18/18 rows, strict/semantic true-north 100%	Harness/scoring mechanics are valid; this is not live model evidence.
Live attempt	1/18 rows attempted, 0/1 HTTP 200, stop reason `non_200_after_v058-research_update_precedence-baseline_verbatim_current_facts-p1024`	The runner stopped safely before contaminating the matrix.
First-row response	HTTP 503 Backend busy after configured retries	The endpoint can be healthy while the chat slot is occupied. Retry must add chat-slot idle probing or a longer cool-down.
Quality score	not available	Do not claim fact-ID locking improved recall until a clean live matrix exists.

v0.59 Chat-Slot Idle-Gated Fact Locking

v0.59 is partial live evidence. It reran the v0.58 fact-locking matrix after proving the chat slot was idle with a tiny authenticated chat probe. This avoided the v0.58 503 busy cascade and produced 14 successful scored rows before one fast HTTP 500 Invalid input batch stopped the matrix.

Slice	Measured result	CTO meaning
Idle gate	idle probe success `True`; attempts 1	The correct readiness probe is a tiny chat request, not only `/health`.
Live attempt	15/18 rows attempted, 14/15 HTTP 200, stop reason `non_200_after_v058-agent_loop_directive-locked_fact_ids_controller_expands-p1024`	The run produced quality evidence before stopping; it is partial, not a full 18-row matrix.
Completed-row quality	strict true-north 100.0%; semantic true-north 100.0%; coherence 100.0%	Every completed scored row was coherent and semantically correct under the harness.
Safety invariants	stale absence 100.0%; foreign absence 100.0%; rejected-key recall 100.0%	The completed rows preserved the core safety invariant: no stale/foreign memory admitted.
Best variant	budgeted fact-ID output: current-fact-ID recall 100.0%; n=4	Budgeted fact-ID output is the strongest completed V3 direction, but it has only four completed rows in this partial run.
Boundary	non-200 row: `v058-agent_loop_directive-locked_fact_ids_controller_expands-p1024`	This is not the v0.58 busy failure. It is an input-batch validity edge on a specific agent-loop locked-ID prompt shape.
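
The idle gate described above, a tiny chat request rather than `/health` alone, can be sketched as below. The probe is injected as a callable so the sketch stays runnable without an endpoint; `wait_for_idle_chat_slot`, the attempt count, and the cool-down value are assumptions, not the harness's real settings.

```python
import time
from typing import Callable


def wait_for_idle_chat_slot(
    probe: Callable[[], int],
    max_attempts: int = 4,
    cooldown_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Send a tiny chat probe until it returns 200 or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if probe() == 200:
            return {"idle": True, "attempts": attempt}
        if attempt < max_attempts:
            sleep(cooldown_s)  # give the single inference slot time to drain
    return {"idle": False, "attempts": max_attempts}


# Hypothetical probe: busy twice, then idle. Sleeping is stubbed out.
statuses = iter([503, 503, 200])
gate = wait_for_idle_chat_slot(lambda: next(statuses), sleep=lambda s: None)
```

A matrix run would start only when `gate["idle"]` is true; otherwise the runner should stop before touching quality rows.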

v0.60 Agent-Loop Input-Batch Hardening

v0.60 is live evidence. It isolated the v0.59 agent-loop HTTP 500 boundary with a six-row matrix: original replay, hardened JSON-ID instruction, and budgeted ID output, each at 1024 and 2048 pressure. All six rows returned HTTP 200, so the v0.59 HTTP 500 Invalid input batch error did not reproduce.

Slice	Measured result	CTO meaning
Serving	6/6 rows completed, 6/6 HTTP 200, stop reason `None`	The v0.59 500 was not a stable reproducible input-batch failure under identical replay.
Original replay	coherence 100.0%; current-fact-ID recall 100.0%; n=2	The original controller-expands instruction remains the best agent-loop ID contract at 1024/2048 pressure.
Hardened JSON IDs	coherence 0.0%; parse 0.0%	The shorter literal-array instruction caused plain-text output, not safer JSON.
Budgeted output	coherence 0.0%; parse 0.0%	Budgeted ID output was not robust for the agent-loop slice even though it looked strong in v0.59's completed non-agent rows.
Safety invariant	stale absence 100.0%; foreign absence 100.0%	Even failed parse rows did not admit stale or foreign memory text; the failure mode was output-contract loss.
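
The parse figures above separate JSON output from plain-text collapse. A minimal sketch of such a check follows, assuming the contract asks for a JSON array of fact-ID strings; the real output schema is not shown on this board, and the fact IDs are hypothetical.

```python
import json
from typing import Iterable


def parses_as_id_list(text: str) -> bool:
    """True when the text is a JSON array whose items are all strings."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False  # plain-text answers land here
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def parse_rate(outputs: Iterable[str]) -> float:
    """Percentage of outputs that satisfy the assumed ID-list contract."""
    rows = list(outputs)
    if not rows:
        raise ValueError("no scored rows")
    return 100.0 * sum(parses_as_id_list(row) for row in rows) / len(rows)


# Hypothetical outputs: one contract-compliant row, one plain-text row.
example_rate = parse_rate(['["F-12", "F-31"]', "The current facts are F-12 and F-31."])
```

The plain-text row names the right IDs but still scores as a parse failure, which is the output-contract loss the v0.60 table describes.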

v0.61 Agent-Loop Tail Output Contract

v0.61 is live evidence. It kept the agent-loop prompt body close to the working v0.60 original-control path and moved the JSON/output contract to the prompt tail. The measured result is not a clean 6/6 transport pass: one original-control 2048 row disconnected after 152s. The two new tail-contract variants passed all four rows they owned at 1024 and 2048 pressure.

Slice	Measured result	CTO meaning
Serving	6 rows attempted, 5/6 HTTP 200, non-200 `v061-agent_loop_directive-controller_expands_original_control-p2048`	Do not overclaim a full transport pass. One original-control 2048 row hit `RemoteDisconnected`.
Tail contract	coherence 100.0%; parse 100.0%; n=2	Tail-appended output contract preserved JSON/ID compliance without the plain-text collapse seen in v0.60's shortened variants.
Tail schema example	coherence 100.0%; parse 100.0%; n=2	The schema-example tail also passed 1024 and 2048 pressure; it is now the best candidate to rerun with wider probe diversity.
Safety invariant	scored-row stale absence 100.0%; foreign absence 100.0%; rejected-key recall 100.0%	On the scored rows, the system kept current facts and rejected stale/foreign handles under pressure.
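
The tail-anchored idea above can be sketched as a prompt builder that always places the output contract last. The function name and the example strings are hypothetical; the actual v0.61 prompt text is not reproduced on this board.

```python
def build_tail_anchored_prompt(memory_block: str, task: str, contract: str) -> str:
    """Join the prompt parts so the output contract is the final section."""
    parts = [memory_block.strip(), task.strip(), contract.strip()]
    if not all(parts):
        raise ValueError("memory_block, task, and contract must be non-empty")
    # Contract goes last so it sits closest to the generation point.
    return "\n\n".join(parts)


# Hypothetical content for illustration only.
prompt = build_tail_anchored_prompt(
    memory_block="[F-12] Office: Lisbon\n[F-31] Plan: tier B",
    task="List the fact IDs that are current for this user.",
    contract='Return only a JSON array of fact IDs, for example ["F-12"].',
)
```

Keeping the body unchanged and moving only the contract mirrors the v0.61 design choice: the working prompt path is preserved, and only the position of the output instruction varies.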

What Is Not Proven

Not proven: general random-access exact recall across arbitrary positions.
Not proven: production multi-tenant safety under live traffic.
Not proven: transfer of the same multiplier to frontier-scale models.
Not proven: stable correction on known failed slices; focused rerun confirmed the weak cluster.
Not proven: leaderboard-certified MTRAG performance; no result on this board is a certified MTRAG result.

V3 Optimization Targets

Make controller-selected payloads first-class: selected, candidate, stale, revoked, and foreign-tenant states must be impossible to confuse.
Improve arbitrary-position exactness, especially non-tail provenance recovery.
Promote focused-row/resume execution into the standard harness so hard suite walls do not create partial matrices.
Promote update precedence to a first-class memory field: original fact, superseding fact, effective timestamp, and scope.
Add stable entity-id constraints for agent-task memories; current failures often recall facts but return the wrong entity id.
Use runtime-selected payloads as the V3 default; v0.54 shows selected-current-only hit 100% coherence in that matrix.
Make controller-filled identity and rejection metadata the default; v0.55 shows `controller_fills_all_ids` is the best envelope mode.
Do not assume deterministic envelopes solve recall; v0.55 1024-pressure semantic pass was only 11.1%.
Add cancellation-aware serving control before larger pressure sweeps; v0.56 showed client timeout can leave the backend busy and contaminate later sequential rows.
Add pressure-aware output budgeting; v0.57 showed one 1024-pressure row failed on output length and parsing even though the current facts were available.
Add current-fact locking for update precedence; v0.57 showed one 2048-pressure research-update row dropped to 33.33% current-fact recall.
Keep chat-slot idle probing; v0.59 showed it avoids the v0.58 busy cascade.
Do not over-shorten agent-loop ID prompts; v0.60 showed original controller-expands passed while shorter hardened/budgeted variants returned plain text.
Use tail-anchored output contracts for agent-loop prompts; v0.61 showed tail-contract and schema-example variants passed 1024/2048 scored rows while preserving parse and ID recall.
Do not rely on the current repair prompt; v0.54 repair attempts produced 0% measured improvement.
Expose substrate observability per run: what was selected, what was rejected, and why.
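
Two of the targets above, unambiguous payload states and update precedence as a first-class field, can be sketched as plain data types. This is one possible shape under stated assumptions: the class names, field names, and the latest-effective-timestamp rule are hypothetical and are not the V3 schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class PayloadState(Enum):
    SELECTED = "selected"
    CANDIDATE = "candidate"
    STALE = "stale"
    REVOKED = "revoked"
    FOREIGN_TENANT = "foreign_tenant"


@dataclass(frozen=True)
class FactUpdate:
    fact_id: str
    value: str
    effective_at: datetime
    scope: str
    supersedes: Optional[str] = None  # fact_id of the fact this one replaces


def admit(state: PayloadState) -> bool:
    """Only controller-selected payloads may enter the prompt."""
    return state is PayloadState.SELECTED


def current_fact(updates: List[FactUpdate], scope: str) -> Optional[FactUpdate]:
    """Assumed precedence rule: the latest effective timestamp in scope wins."""
    in_scope = [u for u in updates if u.scope == scope]
    return max(in_scope, key=lambda u: u.effective_at, default=None)


# Hypothetical update chain: F-2 supersedes F-1 within the same scope.
updates = [
    FactUpdate("F-1", "Porto", datetime(2026, 1, 5), "user:office"),
    FactUpdate("F-2", "Lisbon", datetime(2026, 3, 1), "user:office", supersedes="F-1"),
]
winner = current_fact(updates, "user:office")
```

An enum with a single admitting state makes the confusion the first target warns about a type-level question rather than a prompt-level one.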

Next Decision

Continue: build V3 update-precedence and stable-entity binding tests from the confirmed failure cluster.
Tighten: replace broad dashboards with decision boards after major rounds only.
Measure: add latency/cost normalization to true-north score so slow 1024-band wins do not hide serving constraints.
Do not claim: “infinite perfect memory” or “general exact recall” yet.

Deck-Safe One-Liner

Hypernym Infinite Memory is a memory control plane for model fleets: per-tenant memory stores, controller-curated recall, provenance-handle verification, and lower long-memory serving cost for small local models under extreme context pressure.

Deck-safe caveat: current evidence supports pressure handling and prepared control-plane safety tests; general exact arbitrary recall remains the optimization target.

Data Trace

Every row below is a retrieval handle. A future agent, CTO, or local API client should be able to use these paths to pull the raw score object, reconstruct the claim, and follow the run into the ledger or CXDB handoff. This mirrors the RMT/Hermes lineage frame: run id to score artifact to finding to durable handoff.

Compound Research Chain

These are the prior pages and local artifacts this board compounds. Public links are directly openable. Local artifact paths are intended for Forge/CXDB/API retrieval from the workstation or repository.

Document	Kind	Reference
Current public board	public page	https://hypernym-infinite-memory-v09.pages.dev/
Previous immutable board with v0.55	public page	https://2f1e75b6.hypernym-infinite-memory-v09.pages.dev
Previous immutable objective-audit board	public page	https://1b03ee20.hypernym-infinite-memory-v09.pages.dev/
Previous immutable board with v0.54	public page	https://1db5d7c7.hypernym-infinite-memory-v09.pages.dev
Institutional 2026-06-09 page	local artifact	`.forge/artifacts/hypernym-infinite-memory-institutional-20260609.html`
CTO findings 2026-06-09 page	local artifact	`.forge/artifacts/hypernym-infinite-memory-cto-findings-20260609.html`
Research ledger JSON	local artifact	`.forge/artifacts/hypernym-infinite-memory-research-ledger.json`
Compound visualization standard	local standard	`research/tracks/hypernym-infinite-mim/compound-research-visualization-standard.md`

API Pull Targets

Canonical local query shape for future automation:

jq '.summary' research/tracks/hypernym-infinite-mim/results/<suite>/<run_id>/scores.json
jq '.latest_results' .forge/artifacts/cxdb-hypernym-infinite-mim-handoff-20260610.json
jq '.runs.v057_timeout_aware_sharded_pressure_retry_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v058_current_fact_locking_output_budget_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v059_chat_slot_idle_gated_fact_locking_retry_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v060_agent_loop_input_batch_hardening_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.runs.v061_agent_loop_tail_output_contract_live' .forge/artifacts/hypernym-infinite-memory-research-ledger.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-session-memory-probe/20260610T_granite_pilots_session_memory_probe_live_codex_v2/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_live_codex_v1/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_2048_live_codex_v1/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_4096_live_codex_v1/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_8192_live_claude_v1/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_16384_live_codex_v1/scores.json
jq '.summary' research/tracks/hypernym-infinite-mim/results/granite-pilots-chunked-session-continuation/20260610T_granite_pilots_chunked_session_continuation_dryrun_codex_v1/scores.json
jq '.hard_facts' research/tracks/hypernym-infinite-mim/results/granite-pilots-chunked-session-boundary/20260610T_granite_pilots_chunked_session_boundary_live_codex_v1/analysis.json
jq '{summary, execution_plan}' research/tracks/hypernym-infinite-mim/results/granite-pilots-adaptive-job-queue/20260610T_granite_pilots_adaptive_job_queue_plantrace_dryrun_codex_v1/queue.json
Suggested CXDB lineage key: research:hypernym-infinite-mim:<run_id>:<finding_id>
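
The query shape and lineage key above can be generated rather than typed by hand. A minimal sketch follows, assuming only the two templates shown on this board; the helper names and the `finding-001` identifier are hypothetical, since the board does not define a finding-id format.

```python
TRACK = "hypernym-infinite-mim"


def scores_path(suite: str, run_id: str) -> str:
    """Local path to a run's score object, following the canonical query shape."""
    return f"research/tracks/{TRACK}/results/{suite}/{run_id}/scores.json"


def lineage_key(run_id: str, finding_id: str) -> str:
    """Build the suggested key: research:<track>:<run_id>:<finding_id>."""
    for part in (run_id, finding_id):
        if not part or ":" in part:
            # A colon inside a part would make the key ambiguous to split.
            raise ValueError(f"invalid key part: {part!r}")
    return f"research:{TRACK}:{run_id}:{finding_id}"


# Run id and suite taken from the byte-threshold row of the data trace.
path = scores_path("byte-threshold", "20260608T202409Z")
key = lineage_key("20260608T202409Z", "finding-001")  # finding id is illustrative
```

Rejecting colons inside a part keeps the key safely splittable on `:` into exactly four fields.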

Claim	Run / id	Type	Source path	Used for
largest observed pressure	`20260608T202409Z`	measured fact	`research/tracks/hypernym-infinite-mim/results/byte-threshold/20260608T202409Z/scores.json`	48.56x native-context pressure claim
v0.51 personal entity coherence	`20260610T_personal_entity_coherence_combined_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.51-personal-entity-coherence-threshold/20260610T_personal_entity_coherence_combined_live_codex_v1/scores.json`	single-user memory true-north baseline
v0.52 update precedence / entity binding	`20260610T_update_precedence_entity_binding_combined_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.52-update-precedence-entity-binding/20260610T_update_precedence_entity_binding_combined_live_codex_v1/scores.json`	prose-vs-structured prompt comparison
v0.53 record format / pressure diagnostic	`20260610T_record_format_schema_pressure_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.53-record-format-schema-pressure-diagnostic/20260610T_record_format_schema_pressure_live_codex_v1/scores.json`	strict vs semantic contract failure cluster
v0.54 selected payload recovery	`20260610T_runtime_selected_minimal_payload_strict_repair_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.54-runtime-selected-minimal-payload-strict-repair/20260610T_runtime_selected_minimal_payload_strict_repair_live_codex_v1/scores.json`	selected-current payload V3 lever
v0.55 deterministic envelope constraint	`20260610T_deterministic_rejected_handle_envelope_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.55-deterministic-rejected-handle-envelope/20260610T_deterministic_rejected_handle_envelope_live_codex_v1/scores.json`	controller metadata helps; pressure recall still fails
v0.56 pressure serving boundary	`20260610T_selected_current_controller_envelope_pressure_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.56-selected-current-controller-envelope-pressure-recall/20260610T_selected_current_controller_envelope_pressure_live_codex_v1/scores.json`	2048-pressure timeout and single-flight busy contamination boundary
v0.57 timeout-aware pressure retry	`20260610T_timeout_aware_sharded_pressure_retry_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.57-timeout-aware-sharded-pressure-retry/20260610T_timeout_aware_sharded_pressure_retry_live_codex_v1/scores.json`	serving-control fix and pressure-dependent memory quality score
v0.58 serving-slot busy stop	`20260610T_current_fact_locking_output_budget_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.58-current-fact-locking-output-budget/20260610T_current_fact_locking_output_budget_live_codex_v1/scores.json`	health-ok but chat-slot-busy boundary before quality evidence
v0.59 idle-gated fact-locking partial live result	`20260610T_chat_slot_idle_gated_fact_locking_retry_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.59-chat-slot-idle-gated-fact-locking-retry/20260610T_chat_slot_idle_gated_fact_locking_retry_live_codex_v1/scores.json`	chat-idle gate success, fact-ID variant signal, and HTTP 500 input-batch boundary
v0.60 agent-loop input-batch hardening result	`20260610T_agent_loop_input_batch_hardening_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.60-agent-loop-input-batch-hardening-fact-locking-completion/20260610T_agent_loop_input_batch_hardening_live_codex_v1/scores.json`	proves v0.59 HTTP 500 did not reproduce and original controller-expands passed agent-loop 1024/2048
v0.61 agent-loop tail output-contract result	`20260610T_agent_loop_tail_output_contract_live_codex_v1`	measured fact	`research/tracks/hypernym-infinite-mim/results/v0.61-agent-loop-tail-output-contract/20260610T_agent_loop_tail_output_contract_live_codex_v1/scores.json`	tail contract and schema-example variants passed 1024/2048; original-control 2048 had one transport disconnect
current objective readiness audit	`20260610T_objective_completion_threshold_frontier_codex_v1`	gate/audit artifact	`research/tracks/hypernym-infinite-mim/results/objective-readiness-audit/20260610T_objective_completion_threshold_frontier_codex_v1/audit.json`	current completion matrix and objective contract: 15 satisfied harness/control requirements, 2 with partial live evidence, 6 blocked live-certification requirements; includes 8192 pass, 16384 gateway ceiling, and manifest-backed gate refresh
current eval-suite validation	`20260610T_eval_objective_closure_plan_codex_v3`	validation artifact	`research/tracks/hypernym-infinite-mim/results/eval-suite-manifest-validation/20260610T_eval_objective_closure_plan_codex_v3/report.json`	proves the current suite artifact index, audited 8192/16384 boundary, objective contract, closure plan, manifest-selected gate, Granite Pilots controls, packet, comparison, checklist, first-live subset, threshold finalizer, and full-suite certification finalizer are internally consistent
threshold frontier analysis	`20260610T_threshold_boundary_frontier_codex_v1`	threshold frontier artifact	`research/tracks/hypernym-infinite-mim/results/threshold-boundary-analysis/20260610T_threshold_boundary_frontier_codex_v1/analysis.json`	proves only research_development has current live lower-bound evidence at 2048, while dry sources are excluded from live threshold claims
admission gate refresh	`20260610T_admission_gate_refresh_dry_codex_v2`	admission-control refresh	`research/tracks/hypernym-infinite-mim/results/admission-gate-refresh/20260610T_admission_gate_refresh_dry_codex_v2/refresh.json`	fresh dry-run calibration plus v0.66 gate decision; no live endpoint touch; manifest points to the refreshed blocked gate
Granite Pilots session router	`20260610T_pilot_session_router_dryrun_codex_v1`	session routing guard	`research/tracks/hypernym-infinite-mim/results/granite-pilots-session-router/20260610T_pilot_session_router_dryrun_codex_v1/route.json`	enforces one durable handle for this track, session field on chat bodies, max two active pilots, and undock-when-idle shared-capacity behavior
Granite Pilots live session-memory probe	`20260610T_granite_pilots_session_memory_probe_live_codex_v2`	measured live fact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-session-memory-probe/20260610T_granite_pilots_session_memory_probe_live_codex_v2/scores.json`	proves the one-handle Granite Pilots lane can recall five current private memory facts across domains, reject stale/rejected/foreign controls, and undock after use under the two-active-pilot cap
Granite Pilots live pressure ladder	`20260610T_granite_pilots_pressure_ladder_live_codex_v1`	measured live fact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_live_codex_v1/scores.json`	measures private session recall under sequential pressure bands 0, 256, and 1024; highest strict pass 1024 with 24,584 reported prompt tokens and no stale/rejected/foreign leakage
Granite Pilots 2048 pressure extension	`20260610T_granite_pilots_pressure_ladder_2048_live_codex_v1`	measured live fact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_2048_live_codex_v1/scores.json`	independently measures the next pressure boundary: 2048 band passed strict and semantic recall with 52,933 reported prompt tokens, 55.18s latency, no forbidden hits, and mandatory undock
Granite Pilots 4096 pressure extension	`20260610T_granite_pilots_pressure_ladder_4096_live_codex_v1`	measured live fact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_4096_live_codex_v1/scores.json`	independently measures the next pressure boundary: 4096 band passed strict and semantic recall with 105,498 reported prompt tokens, 106.04s latency, no forbidden hits, and mandatory undock
Granite Pilots 8192 pressure extension	`20260610T_granite_pilots_pressure_ladder_8192_live_claude_v1`	measured live fact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_8192_live_claude_v1/scores.json`	independently measures the next pressure boundary: 8192 band passed strict and semantic recall with 210,631 reported prompt tokens, 196.48s latency, no forbidden hits, and mandatory undock
Granite Pilots 16384 pressure ceiling	`20260610T_granite_pilots_pressure_ladder_16384_live_codex_v1`	measured gateway ceiling	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_16384_live_codex_v1/scores.json`	identifies the current request-size ceiling: setup turns admitted, 16384 probe rejected by nginx HTTP 413 Request Entity Too Large in 0.43s, no memory-quality verdict, mandatory undock
Granite Pilots local session lock guard	`20260610T_granite_pilots_pressure_ladder_lock_dryrun_codex_v1`	tooling guard artifact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-pressure-ladder/20260610T_granite_pilots_pressure_ladder_lock_dryrun_codex_v1/scores.json`	proves dry-run lock metadata and validates live-capable Pilots tools now use a shared nonblocking local session-handle lock plus remote berth gate
Granite Pilots adaptive pressure plan	`20260610T_granite_pilots_adaptive_pressure_plan_codex_v3`	pressure scheduler artifact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-adaptive-pressure-plan/20260610T_granite_pilots_adaptive_pressure_plan_codex_v3/plan.json`	encodes 8192 as measured pass ceiling, 16384 as gateway ceiling, and directs >8192 exploration toward chunked session continuation rather than oversized single-shot probes
Granite Pilots chunked session continuation	`20260610T_granite_pilots_chunked_session_continuation_dryrun_codex_v1`	verified harness artifact	`research/tracks/hypernym-infinite-mim/results/granite-pilots-chunked-session-continuation/20260610T_granite_pilots_chunked_session_continuation_dryrun_codex_v1/scores.json`	proves the next >8192 route accumulates 16384 equivalent pressure as two <=8192 session chunks, then asks a compact final probe without repasting the synthetic bundle
Granite Pilots live chunked boundary	`20260610T_granite_pilots_chunked_session_boundary_live_codex_v1`	measured live boundary	`research/tracks/hypernym-infinite-mim/results/granite-pilots-chunked-session-boundary/20260610T_granite_pilots_chunked_session_boundary_live_codex_v1/analysis.json`	proves current backend accepts one pressure chunk plus compact semantic recall, but rejects a second pressure chunk with 500 Invalid input batch at 1024x2/2048x2/4096x2/8192x2
Granite Pilots adaptive job queue	`20260610T_granite_pilots_adaptive_job_queue_plantrace_dryrun_codex_v1`	sequential execution guard	`research/tracks/hypernym-infinite-mim/results/granite-pilots-adaptive-job-queue/20260610T_granite_pilots_adaptive_job_queue_plantrace_dryrun_codex_v1/queue.json`	proves future Pilots work has a bounded two-job sequential queue with per-job session handle, lock, berth-gate, endpoint-touch, and expected-artifact trace
next live evidence matrix	`20260610T_next_live_evidence_matrix_threshold_frontier_codex_v1`	objective-to-row ledger	`research/tracks/hypernym-infinite-mim/results/next-live-evidence-matrix/20260610T_next_live_evidence_matrix_threshold_frontier_codex_v1/matrix.json`	maps each of the six unproved objective gaps to exact labels, turns, runners, stop conditions, and the 27-row first-live certification subset
objective completion contract	`20260610T_objective_completion_threshold_frontier_codex_v1`	machine-readable objective contract	`research/tracks/hypernym-infinite-mim/results/objective-readiness-audit/20260610T_objective_completion_threshold_frontier_codex_v1/audit.json`	stable requirement IDs for the six still-blocked live-memory capabilities plus satisfied harness/control-plane capabilities
objective closure plan	`20260610T_objective_closure_plan_threshold_frontier_codex_v1`	requirement-keyed closure artifact	`research/tracks/hypernym-infinite-mim/results/objective-closure-plan/20260610T_objective_closure_plan_threshold_frontier_codex_v1/closure-plan.json`	maps REQ-LIVE-STORY, RELATIONSHIP, PSYCHOLOGY, AGENT, Q2-SEQUENTIAL, and THRESHOLD-COVERAGE to exact runners, labels, row counts, close conditions, gate dependency, and max-two Pilots queue discipline
hardened admission lease validation	`20260610T_admission_gate_valid_lease_validation_codex_v1`	gate validation artifact	`research/tracks/hypernym-infinite-mim/results/v0.66-admission-control-gate/20260610T_admission_gate_valid_lease_validation_codex_v1/decision.json`	proves the admission gate accepts only a fresh track-scoped, endpoint-scoped, sequential-only lease; wrong-track, stale, wrong-endpoint, missing, and parallel-unsafe fixtures are refused
runtime runner refusal guard	`20260610T_admission_runtime_runner_refusals_codex_v1`	runtime guard artifact	`research/tracks/hypernym-infinite-mim/results/admission-runtime-guard-validation/20260610T_admission_runtime_runner_refusals_codex_v1/report.json`	proves first-live, full-suite, Q2, and Q4 runners refuse expired allow decisions at runtime before endpoint traffic
indexed full-suite dry-run	`20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1`	harness artifact, not capability evidence	`research/tracks/hypernym-infinite-mim/results/personal-memory-eval-suite/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1/suite-run.json`	machine-readable Q1/Q2/Q3/Q4 artifact index and ready-to-copy threshold/version-comparison commands
full-suite artifact index	`artifact-index`	machine-readable trace	`research/tracks/hypernym-infinite-mim/results/personal-memory-eval-suite/20260610T_personal_memory_eval_suite_indexed_dryrun_codex_v1/artifact-index.json`	future live run path reconstruction for Q1/Q2/Q3/Q4 comparisons without manual lookup
indexed full-suite live gate refusal	`20260610T_personal_memory_eval_suite_indexed_live_gate_refusal_codex_v1`	gate artifact	`research/tracks/hypernym-infinite-mim/results/personal-memory-eval-suite/20260610T_personal_memory_eval_suite_indexed_live_gate_refusal_codex_v1/suite-run.json`	proves the full live suite refuses before endpoint traffic while v0.66 blocks memory-quality rows
full-suite certification finalizer	`20260610T_full_suite_certification_dry_index_codex_v2`	certification guard	`research/tracks/hypernym-infinite-mim/results/full-suite-certification-finalizer/20260610T_full_suite_certification_dry_index_codex_v2/finalizer.json`	consumes full-suite artifact-index.json; refuses non-live Q1/Q2/Q4 unless explicitly allowed; treats Q3 as safety-control evidence when page safety passes
versioned eval packet v74	`20260610T_versioned_eval_packet_v3_candidate_codex_v74`	fingerprinted rerun packet	`research/tracks/hypernym-infinite-mim/results/versioned-eval-packet/20260610T_versioned_eval_packet_v3_candidate_codex_v74/packet.json`	future V3/V-next reruns with identical objective, manifest-selected gate artifact, queue preflight, catalog, scoring contract, and manifest fingerprint
version-comparison scaffold v77	`20260610T_version_comparison_scaffold_codex_v77`	comparison guard	`research/tracks/hypernym-infinite-mim/results/version-comparison/20260610T_version_comparison_scaffold_codex_v77/comparison.json`	refuses dry-run, gate-refusal, health-only, or missing Q1/Q2/Q3/Q4 artifacts as capability evidence
live launch checklist v79	`20260610T_live_launch_checklist_codex_v79`	operator checklist	`research/tracks/hypernym-infinite-mim/results/live-launch-checklist/20260610T_live_launch_checklist_codex_v79/checklist.json`	explicit launch order: status, audit, validation, suite dry-run, Pilots queue dry-run, gate, first-live subset, threshold finalizer, live suite, full-suite finalizer, version compare
current compound ledger	`ledger`	compiled artifact	`.forge/artifacts/hypernym-infinite-memory-research-ledger.json`	machine-readable index of source runs
current CXDB handoff	`cxdb-hypernym-infinite-mim-handoff-20260610`	durable handoff	`.forge/artifacts/cxdb-hypernym-infinite-mim-handoff-20260610.json`	resume/API import packet
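
The 48.56x figure in the first row of the table above is the ratio of the largest observed pressure (198,888 tokens, as reported at the top of this board) to a 4,096-token native context. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def pressure_multiple(observed_tokens: int, native_context: int = 4096) -> float:
    """Ratio of observed prompt pressure to the native context size."""
    if native_context <= 0:
        raise ValueError("native_context must be positive")
    return observed_tokens / native_context


multiple = pressure_multiple(198_888)
print(f"{multiple:.2f}x")  # prints 48.56x
```

The ratio describes measured successful pressure handled through the controller; it does not mean the model's base context grew.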