How I Design Replay-Safe HL7 v2 to FHIR Ingestion

Published: 2026-03-08
Tags: HL7 v2, FHIR R4, Replay Safety, Correlation IDs, Idempotency, Go, Kubernetes

Why Replay Safety Matters

HL7 v2 feeds are retry-heavy by design. Interface engines, network interruptions, and downstream backpressure all cause message re-delivery. If ingestion logic is not replay-safe, duplicate encounters and out-of-order updates pollute the clinical record and create reconciliation work for operations and revenue-cycle teams.

The goal is simple: same message replayed 1x or 10x should produce the same final state.

Reference Architecture

+------------------+      +-------------------+      +---------------------+
| HL7 v2 Sender(s) | ---> | Ingress/API Layer | ---> | Validation + Parsing |
+------------------+      +-------------------+      +---------------------+
                                   |                          |
                                   v                          v
                          +----------------+          +---------------------+
                          | Dedup Store    | <------> | Correlation Service |
                          | (idempotency)  |          | (message lineage)   |
                          +----------------+          +---------------------+
                                   |
                                   v
                          +---------------------+
                          | Mapping + Enrichment|
                          | HL7 v2 -> FHIR R4   |
                          +---------------------+
                                   |
                                   v
                          +---------------------+
                          | Transaction Writer  |
                          | (upsert + version)  |
                          +---------------------+
                                   |
                                   v
                          +---------------------+
                          | Audit + Telemetry   |
                          | logs/traces/metrics |
                          +---------------------+

Core Design Decisions

1) Idempotency Key Strategy

I use a deterministic key derived from fields that identify the business event, not delivery transport noise.

Candidate fields: sending facility, message type, patient identifier, visit/encounter identifier, event timestamp.
Key format: stable hash of canonicalized values.
TTL: long enough to cover replay windows and downstream retries.

2) Correlation IDs Everywhere

Every stage carries a correlation ID so I can answer:

Which inbound HL7 message produced this FHIR resource?
Was this event replayed, skipped, or applied?
How long did each transformation step take?

3) Deterministic Mapping

Mapping rules must be pure and deterministic:

Same input segment values always map to the same FHIR output fields.
Optional enrichment is versioned so transformations are reproducible.
Null/empty HL7 semantics are explicitly handled to avoid accidental field erasure.

4) Transaction-Safe Writes

FHIR writes use guarded upsert behavior:

Conditional create/update for known identifiers.
Transaction bundles for multi-resource consistency.
Conflict handling policy for late or out-of-order events.

Pseudocode: Replay-Safe Pipeline

func HandleHL7Message(msg HL7Message) Result {
    correlationID := EnsureCorrelationID(msg)
    key := BuildIdempotencyKey(msg)

    if DedupStore.Exists(key) {
        EmitMetric("hl7.replay.skipped", 1)
        Audit("skipped_replay", correlationID, key)
        return Result{Status: "skipped"}
    }

    if err := Validate(msg); err != nil {
        EmitMetric("hl7.validation.failed", 1)
        Audit("validation_failed", correlationID, key, err)
        return Result{Status: "rejected", Err: err}
    }

    resources, err := MapHL7ToFHIR(msg)
    if err != nil {
        EmitMetric("hl7.mapping.failed", 1)
        Audit("mapping_failed", correlationID, key, err)
        return Result{Status: "failed", Err: err}
    }

    txResult, err := FHIRWriter.UpsertTransaction(resources)
    if err != nil {
        EmitMetric("hl7.fhirtx.failed", 1)
        Audit("transaction_failed", correlationID, key, err)
        return Result{Status: "failed", Err: err}
    }

    DedupStore.MarkProcessed(key)
    EmitMetric("hl7.processed", 1)
    Audit("processed", correlationID, key, txResult.ResourceIDs)

    return Result{Status: "applied", IDs: txResult.ResourceIDs}
}

Metrics I Track in Production

Replay skip rate (% of inbound messages deduplicated)
Mapping failure rate (by segment and rule version)
End-to-end ingestion latency (p50/p95/p99)
FHIR transaction success rate
Queue drain time during replay storms

Practical Guardrails

Backfill mode and real-time mode use the same idempotency contract.
Rule changes are versioned; rollouts are canaried.
Audit events are immutable and queryable by correlation ID.
Alerting focuses on business risk (duplicate writes, stale queues), not just CPU/memory.

Closing

Replay-safe ingestion is less about one dedup table and more about end-to-end determinism. When idempotency keys, correlation IDs, deterministic mapping, and transaction-safe writes work together, you can replay data confidently without breaking downstream clinical and billing workflows.