How I Design Replay-Safe HL7 v2 to FHIR Ingestion
Published: 2026-03-08
Tags: HL7 v2, FHIR R4, Replay Safety, Correlation IDs,
Idempotency, Go, Kubernetes
Why Replay Safety Matters
HL7 v2 feeds are retry-heavy by design. Interface engines, network interruptions, and downstream backpressure all cause message re-delivery. If ingestion logic is not replay-safe, duplicate encounters and out-of-order updates pollute the clinical record and create reconciliation work for operations and revenue-cycle teams.
The goal is simple: same message replayed 1x or 10x should produce the same final state.
Reference Architecture
+------------------+ +-------------------+ +---------------------+
| HL7 v2 Sender(s) | ---> | Ingress/API Layer | ---> | Validation + Parsing |
+------------------+ +-------------------+ +---------------------+
| |
v v
+----------------+ +---------------------+
| Dedup Store | <------> | Correlation Service |
| (idempotency) | | (message lineage) |
+----------------+ +---------------------+
|
v
+---------------------+
| Mapping + Enrichment|
| HL7 v2 -> FHIR R4 |
+---------------------+
|
v
+---------------------+
| Transaction Writer |
| (upsert + version) |
+---------------------+
|
v
+---------------------+
| Audit + Telemetry |
| logs/traces/metrics |
+---------------------+
Core Design Decisions
1) Idempotency Key Strategy
I use a deterministic key derived from fields that identify the business event, not delivery transport noise.
- Candidate fields: sending facility, message type, patient identifier, visit/encounter identifier, event timestamp.
- Key format: stable hash of canonicalized values.
- TTL: long enough to cover replay windows and downstream retries.
2) Correlation IDs Everywhere
Every stage carries a correlation ID so I can answer:
- Which inbound HL7 message produced this FHIR resource?
- Was this event replayed, skipped, or applied?
- How long did each transformation step take?
3) Deterministic Mapping
Mapping rules must be pure and deterministic:
- Same input segment values always map to the same FHIR output fields.
- Optional enrichment is versioned so transformations are reproducible.
- Null/empty HL7 semantics are explicitly handled to avoid accidental field erasure.
4) Transaction-Safe Writes
FHIR writes use guarded upsert behavior:
- Conditional create/update for known identifiers.
- Transaction bundles for multi-resource consistency.
- Conflict handling policy for late or out-of-order events.
Pseudocode: Replay-Safe Pipeline
func HandleHL7Message(msg HL7Message) Result {
correlationID := EnsureCorrelationID(msg)
key := BuildIdempotencyKey(msg)
if DedupStore.Exists(key) {
EmitMetric("hl7.replay.skipped", 1)
Audit("skipped_replay", correlationID, key)
return Result{Status: "skipped"}
}
if err := Validate(msg); err != nil {
EmitMetric("hl7.validation.failed", 1)
Audit("validation_failed", correlationID, key, err)
return Result{Status: "rejected", Err: err}
}
resources, err := MapHL7ToFHIR(msg)
if err != nil {
EmitMetric("hl7.mapping.failed", 1)
Audit("mapping_failed", correlationID, key, err)
return Result{Status: "failed", Err: err}
}
txResult, err := FHIRWriter.UpsertTransaction(resources)
if err != nil {
EmitMetric("hl7.fhirtx.failed", 1)
Audit("transaction_failed", correlationID, key, err)
return Result{Status: "failed", Err: err}
}
DedupStore.MarkProcessed(key)
EmitMetric("hl7.processed", 1)
Audit("processed", correlationID, key, txResult.ResourceIDs)
return Result{Status: "applied", IDs: txResult.ResourceIDs}
}Metrics I Track in Production
- Replay skip rate (% of inbound messages deduplicated)
- Mapping failure rate (by segment and rule version)
- End-to-end ingestion latency (p50/p95/p99)
- FHIR transaction success rate
- Queue drain time during replay storms
Practical Guardrails
- Backfill mode and real-time mode use the same idempotency contract.
- Rule changes are versioned; rollouts are canaried.
- Audit events are immutable and queryable by correlation ID.
- Alerting focuses on business risk (duplicate writes, stale queues), not just CPU/memory.
Closing
Replay-safe ingestion is less about one dedup table and more about end-to-end determinism. When idempotency keys, correlation IDs, deterministic mapping, and transaction-safe writes work together, you can replay data confidently without breaking downstream clinical and billing workflows.