# Local-first multimodal medical records

Canonical page: https://capsules.run/research/local-first-multimodal-medical-records/

A portable file as the source of truth for between-visit patient data.

## The frame

Between-visit patient data usually lives in a vendor cloud, a device-specific app, or a note the patient brings to the appointment. None of those compose cleanly with an on-device multimodal LLM that can interpret a photo, a voice memo, or a free-text note.

If the model runs on-device and the patient owns the data, the transport format itself has to be portable, signed, and verifiable. The narrow claim is simple: the record should be a file the patient can carry and a clinician can verify without asking a vendor for permission.

## File as source of truth

In the medical-journal example, the encrypted `.capsule` on the patient's phone is the source of truth. Browser storage is working state. The chain of events, media payloads, clinic trust anchor, manifest, and signed envelope live in the file.

Every log entry follows the same write path: open the capsule, decrypt the inner content, append an event, recompute hashes, seal a deterministic ZIP, sign the envelope, encrypt to the patient and clinic recipients, and write the file back.

The cost is write amplification, because every entry reseals and rewrites the whole file. The benefit is a clear continuity property: the data outlives the app, the device, and the specific model that produced an interpretation.
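The hashing and sealing steps of that write path can be sketched with the Python standard library. This is a minimal illustration, not the Capsule v0.6 implementation: the event fields (`seq`, `prev`, `payload`, `hash`), the canonical-JSON encoding, and the function names are all assumptions, and the signing and encryption steps are left out because they depend on key handling the text does not specify.

```python
import hashlib
import io
import json
import zipfile

# Fixed timestamp so archive bytes never depend on the wall clock (assumption).
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def canonical(obj) -> bytes:
    """Serialize to JSON with sorted keys and no whitespace, so equal data gives equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_event(chain: list, payload: dict) -> dict:
    """Append an event whose hash covers its sequence number, the previous hash, and its payload."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"seq": len(chain), "prev": prev_hash, "payload": payload}
    event = dict(body, hash=hashlib.sha256(canonical(body)).hexdigest())
    chain.append(event)
    return event


def seal_deterministic_zip(files: dict) -> bytes:
    """Write entries in sorted order with fixed metadata so identical content yields identical bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_STORED  # no compressor-version variance
            info.create_system = 3                   # pin the "created on" field across platforms
            info.external_attr = 0o644 << 16         # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()
```

Calling `append_event` and then `seal_deterministic_zip({"chain.json": canonical(chain)})` produces the inner archive that would then be signed and encrypted. Because the output is byte-stable, a signature over the archive stays valid for any reader that rebuilds it from the same content.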

## Multimodal capture

The model's role is constrained. A patient can log text, a photo, audio, or structured readings. The original bytes go into `payload/`. The event records the path and hash reference. Any LLM-authored interpretation is marked as untrusted model output in the event payload.

The split between patient-authored bytes and model-authored interpretation is the important seam. A later reader can render the original evidence, show the model output, and decide how much weight to give it.
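One way to picture that seam is an event that keeps the patient's bytes and the model's reading in separate, labeled fields. The sketch below is illustrative only: the field names (`payload_path`, `payload_sha256`, `interpretation`, `trusted`) and the `build_capture_event` helper are hypothetical, not the format's actual schema.

```python
import hashlib
from typing import Optional


def build_capture_event(
    media_name: str,
    media_bytes: bytes,
    interpretation: Optional[str] = None,
    model_id: Optional[str] = None,
) -> dict:
    """Describe one capture: a reference to the original bytes plus optional model output."""
    event = {
        "author": "patient",
        # The original bytes live under payload/; the event only points at them.
        "payload_path": f"payload/{media_name}",
        "payload_sha256": hashlib.sha256(media_bytes).hexdigest(),
    }
    if interpretation is not None:
        # Model output is kept apart from the evidence and flagged as untrusted.
        event["interpretation"] = {
            "author": "model",
            "trusted": False,
            "model_id": model_id,
            "text": interpretation,
        }
    return event
```

A reader can then recompute the SHA-256 of the file at `payload_path`, compare it with `payload_sha256`, and render the interpretation with a visible "model output" label or ignore it entirely.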

## Device-side skills

The example is organized around small skills:

1. A medical-journal skill owns the chain and sealing flow.
2. A clinical-probe skill asks clarifying questions and produces structured observations from local model calls.
3. A clinic-recipient skill returns the clinic's recipient key and identity metadata.

The file ties these skills together. The host may be Edge Gallery, a browser reader, a local app, or another future runtime.

## Trust establishment

The example does not claim a global identity authority. The clinic publishes an installable skill and public recipient key through an out-of-band channel. The patient installs it manually, sees the clinic name and fingerprint, and later encrypts exports to that key.

The capsule proves the math: content, hashes, signatures, and recipients. The relationship between a key and a real clinic remains an operational trust step.
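The manual fingerprint check could look roughly like the following. The fingerprint scheme here (SHA-256 of the raw public key bytes, truncated to 128 bits and grouped for reading aloud) is an assumption chosen for illustration; the example's real fingerprint format may differ, and both function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes, group: int = 4) -> str:
    """Render a short, human-comparable fingerprint of a recipient key."""
    digest = hashlib.sha256(public_key).hexdigest().upper()
    short = digest[:32]  # 128 bits, an assumed truncation for display
    return " ".join(short[i:i + group] for i in range(0, len(short), group))


def confirm_recipient(public_key: bytes, expected_fingerprint: str) -> bool:
    """Compare the installed key against a fingerprint received out of band."""
    def normalize(text: str) -> str:
        return "".join(text.split()).upper()

    return hmac.compare_digest(
        normalize(fingerprint(public_key)),
        normalize(expected_fingerprint),
    )
```

The check only shows that the installed key matches the fingerprint the patient was given. Whether that fingerprint really came from the clinic is the operational step the file cannot prove.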

## Clinician side

The clinician reader is a static browser surface. It opens a capsule, verifies the signature and chain, decrypts with the clinic key, renders media, and shows deterministic analytics. A model can add chain-grounded Q&A later, but verification and timeline rendering do not require a model.
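The chain check needs no model, only hashing. The sketch below assumes a hypothetical event layout in which each event carries `seq`, `prev`, `payload`, and a `hash` that is the SHA-256 of the canonical JSON of the other fields; signature verification and decryption are omitted because the text does not specify the schemes.

```python
import hashlib
import json


def event_hash(body: dict) -> str:
    """Hash an event body using sorted-key, whitespace-free JSON (assumed canonical form)."""
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def verify_chain(chain: list) -> bool:
    """Return True only if every event links to its predecessor and matches its own hash."""
    prev = "0" * 64  # assumed sentinel for the first event
    for seq, event in enumerate(chain):
        body = {k: v for k, v in event.items() if k != "hash"}
        if body.get("seq") != seq or body.get("prev") != prev:
            return False  # reordered, dropped, or inserted event
        if event.get("hash") != event_hash(body):
            return False  # event content changed after it was hashed
        prev = event["hash"]
    return True
```

Because the check is a pure function of the file's contents, two readers given the same capsule reach the same verdict, which is what lets the timeline render without trusting any model output.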

## Status and trajectory

The shipped medical-journal example is a concrete vertical for the Capsule v0.6 format. It tests local multimodal capture, patient-held keys, multi-recipient encryption, offline verification, and a browser-only clinician reader.

Next work includes richer clinician skills, key rotation and recovery, lazy-seal mode, and external time anchoring.

Example: /examples/medical-journal/

