§ 03 · Read

System architecture

How the product is composed: the inside/outside one-way seam, components, data flow, segmentation, and trust model.

Frame

Read this as three layers. The Industrial Independence Architecture (IIA) is the security concept: an open spec, claim-by-name, that anyone may implement. The role spec is the shape every conversational factory takes: an ingest data plane (passive or active) → a data source of record → a one-way seam → an outside copy → a query plane. Conversational Factory is the open, local product that fills the shape: one coherent, source-available, deployable bundle you run on-site, with every line of the seam auditable. The concept is canonical, the product is a reference, and every role is swappable; when a concept and a piece of code disagree, the concept wins. The witness is an example of a passive-ingest data plane, discovery an example of an active-ingest one, and modelpond an example of a query plane (the hosted SaaS form); none defines its role. Status: coming soon, source-available.

Conversational Factory (CF) is built on the Industrial Independence Architecture (IIA), a separate architectural specification governing how a sovereign-unit-per-zone industrial system is shaped. IIA is the principle; CF applies the same posture to itself: specify the roles, treat the code as reference.

For the canonical role boundaries — inside historian, outside historian, MCP gateway — including deployment topologies and the diode constraints, see role definitions.

Inside vs outside

The product crosses one structural seam, and the entire design is organized around it: the inside holds the authoritative historian on the trusted network; the outside holds an expendable copy and the query plane (MCP gateway). Between them is a one-way sync — datagrams out, no acknowledgement, no return socket — optionally enforced by a hardware data diode.

Why the seam is shaped this way:

  1. The dangerous direction must not exist. Any interface that can answer a question from outside is an interface an attacker can try to reach the plant through. The only way to remove that risk entirely is to remove the return path entirely — not gate it, remove it.
  2. AI lives where humans are. Claude on a workstation, a model in the cloud, a small model at the edge. The gateway has to be near the client; the client is never inside the trusted boundary, so the gateway sits on the outside copy.
  3. LLM-adjacent complexity stays out of the cell. Model versions, prompt drift, token costs, vendor APIs, response variance — IT-side concerns that do not belong inside a sovereign appliance governed by Safety, Reliability, Performance.
  4. The plant cannot depend on the outside. The inside historian is authoritative and self-sufficient. Cut the link and the plant is unaffected; only the outside view goes dark.
  5. Audit cleanliness. The gateway keeps an operator-side ledger (“what was asked”); the inside keeps its own record (“what left”). Different retention, different concerns, correlated by request_id.

The architectural seam is the one-way sync. Everything on the trusted side of it is authoritative and never reachable from outside. Everything on the far side of it is a copy and safe to lose.

Components

Inside historian

The authoritative system of record on the trusted network, in a standardized schema. It is complete and self-sufficient on its own.

  • Holds the time-aligned operational record consumers ask questions against.
  • Standardized schema, so the outside copy and every downstream reader work without bespoke integration.
  • Runs with zero outbound reach. The only thing that ever leaves it is a copy.
  • Fed by the ingest data plane (below). The specific feed is site-dependent; the role is not.

The ingest data plane

A plant already holds the data, but in scattered pools: Modbus registers, controller context, alarm and event logs, batch and recipe history, OPC tags. Most of it is untapped. The ingest data plane is the role that taps those pools into the standardized record, upstream of the data source of record, with zero process side effects. It comes in two forms, and a factory may run both:

  • Passive-ingest — observe and ingest without touching the process: never transmit, never probe. Example: the witness (passive capture and decode of plant traffic).
  • Active-ingest — reach out to read: poll, query, or discover devices to obtain values they don’t volunteer. Strictly read-only — it may ask, never write or control. Example: discovery (active enumeration where there’s no existing traffic to observe).

Both land the same standardized schema and share the binding invariant: read-only, no process side effects. Passive adds “no emission at all”; active relaxes that to “reads only, never writes.” Which software fills the role — and whether a deployment uses one form or both — is its own concern; the constraints are what the architecture cares about.
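
As a rough illustration of "two forms, one schema", the sketch below has both ingest forms emit the same record type. Everything named here is an assumption made for the example: the `Record` fields, the function names, and the idea of passing a read-only callable into the active form. The actual standardized schema is defined by the spec, not by this snippet.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Record:
    """Hypothetical standardized record; the real schema is defined by the spec."""
    ts: int
    source: str
    tag: str
    value: float


def passive_ingest(captured: Iterable[tuple[int, str, float]],
                   source: str) -> Iterator[Record]:
    """Passive form: consume already-observed samples; this code never transmits."""
    for ts, tag, value in captured:
        yield Record(ts, source, tag, value)


def active_ingest(read: Callable[[str], float], tags: Iterable[str],
                  now: Callable[[], int], source: str) -> Iterator[Record]:
    """Active form: poll each tag through a read-only callable.

    Only a reader is passed in, so this function has no way to write or control.
    """
    for tag in tags:
        yield Record(now(), source, tag, read(tag))
```

Because the active form receives only a `read` callable, the read-only invariant is visible in the function signature rather than left to convention.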

One-way sync

The transport that mirrors the inside historian outward. This is the moat.

  • Datagrams out only. No acknowledgement, no retry-on-request, no return socket.
  • Forward error correction and/or redundant transmission for loss resilience; a local buffer for operational continuity across short outages of the far side — not guaranteed delivery.
  • Optionally enforced by a hardware data diode, making the one-wayness physical: no software bug on either side can create a path back.
  • One-way by transport, not by policy. There is no setting that opens the reverse direction because there is no reverse direction.
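
The send side can be pictured with a minimal sketch, assuming UDP as the datagram transport and plain redundant transmission (each frame sent several times, with a sequence number so the receiver can drop duplicates). The frame layout, function names, and `copies` parameter are hypothetical and are not the reference wire format. Forward error correction and the local buffer are left out for brevity.

```python
import json
import socket
import struct
from typing import Optional

HEADER = struct.Struct("!Q")  # hypothetical frame header: 8-byte sequence number


def encode_frame(seq: int, record: dict) -> bytes:
    """Serialize one record as a self-contained datagram payload."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return HEADER.pack(seq) + body


def decode_frame(frame: bytes) -> tuple[int, dict]:
    """Receiver-side inverse of encode_frame."""
    (seq,) = HEADER.unpack_from(frame)
    return seq, json.loads(frame[HEADER.size:].decode("utf-8"))


def send_one_way(sock: socket.socket, dest: tuple[str, int], seq: int,
                 record: dict, copies: int = 3) -> int:
    """Fire-and-forget: send the same frame `copies` times and never read a reply."""
    frame = encode_frame(seq, record)
    for _ in range(copies):
        sock.sendto(frame, dest)  # no recv, no ack, no retry-on-request
    return len(frame)


class Deduplicator:
    """Receiver-side helper: accept each sequence number once.

    The seen-set is unbounded here; a real receiver would use a sliding window.
    """

    def __init__(self) -> None:
        self._seen: set[int] = set()

    def accept(self, frame: bytes) -> Optional[dict]:
        seq, record = decode_frame(frame)
        if seq in self._seen:
            return None
        self._seen.add(seq)
        return record
```

In use, the sender would open `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_one_way` once per record. The sketch never calls `recv`, which mirrors the "no return socket" rule, although in software that is still only a convention; the hardware diode is what makes it physical.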

Outside historian

A standardized copy of the historian on the outside, designed to be lost.

  • Compromise it completely and the attacker holds historical data and no route to anything — no socket, no interface, no path home.
  • Standardized schema and read API: any client, model, dashboard, or downstream system reads it without custom glue.
  • Optionally forwards to a cloud or off-site server over MQTT in real time, so the data can be evaluated there as well. Opt-in and additive.

The query plane (the MCP gateway)

The off-copy front door for AI clients: the role that turns natural-language questions into bounded, read-only source calls and composes grounded, audited answers. Often an MCP server, hence “MCP gateway.”

  • MCP server exposing a small, fixed set of read-only verbs as MCP tools.
  • Natural-language translation. A query like “why did line 3 lose throughput last shift?” expands into a sequence of reads — identify line 3, fetch its components, range-query history, fetch findings in the window, compose.
  • Answer composer. Aggregates records into model-readable context and returns grounded answers with citations into the audit chain.
  • Audit binding. Every query, every read, every composed answer is bound to the operator-side audit ledger, so an operator can later ask “why did the AI tell you that?” and trace it.
  • Read-only by surface. No tool maps to a write. Read-only is a property of the call surface and the topology, not a prompt directive.
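
One way to picture "read-only by surface" is a fixed registry of verbs in which a write has no handler to dispatch to. This is a sketch, not the gateway's actual tool set: the verb names (`list_components`, `range_query`), the in-memory `COPY` data, and `call_tool` are all hypothetical, and no MCP library is used.

```python
from types import MappingProxyType

# Hypothetical stand-in for the outside copy: tag -> list of (timestamp, value).
COPY = {
    "line3.throughput": [(100, 42.0), (160, 40.5), (220, 31.0)],
    "line3.motor_temp": [(100, 61.2), (220, 74.8)],
}


def list_components(prefix: str) -> list[str]:
    """Return the tags under a prefix, sorted for stable output."""
    return sorted(tag for tag in COPY if tag.startswith(prefix))


def range_query(tag: str, start: int, end: int) -> list[tuple[int, float]]:
    """Return samples with start <= timestamp <= end."""
    return [(ts, v) for ts, v in COPY.get(tag, []) if start <= ts <= end]


# The whole call surface. MappingProxyType exposes a read-only view of the
# registry, so no verb can be added through it at runtime.
TOOLS = MappingProxyType({
    "list_components": list_components,
    "range_query": range_query,
})


def call_tool(name: str, **kwargs):
    """Dispatch a tool call; unknown verbs (including any write) are rejected."""
    if name not in TOOLS:
        raise PermissionError(f"no such tool: {name}")
    return TOOLS[name](**kwargs)
```

A question such as the line 3 example would then expand into a sequence like `call_tool("list_components", prefix="line3.")` followed by `call_tool("range_query", tag="line3.throughput", start=150, end=250)`. A prompt asking the model to "set" something has nothing to call.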

The query plane is deliberately source-agnostic — it does not know or care where the data sits. The outside copy is simply one source it can be pointed at, alongside MCP servers, time-series databases, SQL historians, or search indexes. modelpond is a reference implementation of this role — a single-binary web client (read-only to sources, grounded-or-silent, fail-closed audit, egress disclosed, pluggable model). It is a complement to CF, not a part of it: its genericness is the point, and it carries none of the OT/one-way framing. One way to fill the role, not the definition of it.

Inference

Inference is model-agnostic and runs wherever the deployment allows:

  • On-prem / edge. Specialized small models next to the outside copy, no outbound connectivity — the air-gapped path.
  • Frontier / preferred. Any frontier or preferred model pointed at the larger external dataset, when the site permits it.

Same data, same query surface — only the model placement changes.

Data flow

The product is read-only and one-way at the only seam that matters. Steps are labelled by the part that owns the work; see role definitions for responsibilities and limits.

  1. Inside record. (Inside historian.) The authoritative, standardized operational record is maintained on the trusted network. Self-sufficient; never reachable from outside.
  2. Inside → one-way sync. (Inside historian.) A copy is exported as datagrams over the one-way transport — no acknowledgement, no return socket. Optionally across a hardware data diode.
  3. One-way sync → outside copy. (Transport.) The copy materializes on the outside. The transport carries one direction only; there is no channel back.
  4. Outside copy → standard read API. (Outside historian.) The expendable copy exposes its data through a standardized read-only interface. Optionally also forwarded to cloud/off-site over MQTT.
  5. Read API → query plane. (Query plane / MCP gateway.) The query plane turns natural-language questions into reads against the copy, composes grounded answers, and binds everything to the audit ledger.
  6. MCP gateway → inference. (Edge or frontier model.) An on-prem small model or any preferred frontier model consumes the composed context and returns the answer. No path from here reaches the plant.

There is no inverse arrow at any step because there is no return socket: the export is datagrams over a one-way transport. Compromise the entire outside and you hold a copy and no way back.

Segmentation model

Segmentation is governed by IIA. Conversational Factory inherits the model:

  • One unit per zone. Each zone is a complete inside / one-way / outside unit, sovereign for its scope.
  • The only crossing is outward. The trusted side has no inbound listener that the untrusted side can reach. The sole boundary crossing is the one-way sync, away from the plant.
  • Hierarchy. Cloud (optional) → Site → Zone → Subzone. Data flows outward and upward only; nothing flows back down through the seam.
  • Inter-zone visibility. Where a higher scope needs a lower scope’s data, it consumes that zone’s outside copy — never the inside. Aggregation composes copies; it never opens a path inward.
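
The aggregation rule can be sketched as a small data model in which a higher scope only ever sees the outside copies of the scopes beneath it. The class names, field names, and record tuple layout are assumptions made for the example. Nothing in the model represents an inside historian, which is the point: there is nothing inward to reach.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class OutsideCopy:
    """Immutable snapshot a zone has already published across its seam."""
    zone: str
    records: tuple[tuple[int, str, float], ...]  # (timestamp, tag, value)


@dataclass
class Scope:
    """A node in the Cloud -> Site -> Zone -> Subzone hierarchy."""
    name: str
    outside: Optional[OutsideCopy] = None
    children: list["Scope"] = field(default_factory=list)


def aggregate(scope: Scope) -> list[tuple[str, int, str, float]]:
    """Compose a higher-scope view from outside copies only, ordered by timestamp."""
    rows = []
    if scope.outside is not None:
        rows.extend((scope.outside.zone, ts, tag, v)
                    for ts, tag, v in scope.outside.records)
    for child in scope.children:
        rows.extend(aggregate(child))
    return sorted(rows, key=lambda r: r[1])
```

A site view is then `aggregate(site)`, where `site.children` holds the zones; data moves upward through copies and nothing in the structure points back down.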

Trust and security model

  • The guarantee is the transport. The one-wayness is a property of the wire (and optionally a hardware diode), not a firewall rule, a policy, or a config someone can fat-finger. A misconfiguration cannot open a path that does not physically exist.
  • Expendable outside. The threat model assumes the attacker fully owns the outside. The designed answer: a copy of historical data and zero reach back to the plant.
  • No inbound at the boundary. No listener on the trusted side that the untrusted side can address. No outbound HTTP from the inside — no registry pulls, no rule feeds, no telemetry, no CRL/OCSP. Updates arrive as signed bundles.
  • Configuration is a signed artifact. No live mutation API on the trusted side. Configuration is generated offline, signed, and applied by a constrained-grammar parser; the parser is the trust boundary.
  • Read-only by surface and topology. No tool maps to a write, and the only reachable thing is a copy. Read-only is structural, not a prompt directive.
  • Audit chain. Append-only, externally verifiable, externally publishable. Every query, every read, every composed answer writes to it.
  • Provenance without a back channel. Because the receiver cannot tell the sender anything, provenance signing uses pre-shared keys or out-of-band trust anchors, and inside-side timestamps are authoritative.
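
To make "append-only, externally verifiable" concrete, here is a minimal sketch assuming a SHA-256 hash chain in which each entry commits to the hash of the entry before it. The entry fields, the `AuditLedger` class, and the `verify` function are hypothetical; the source does not specify the ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload encoding."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


class AuditLedger:
    """In-memory chain; a real ledger would persist entries durably."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, request_id: str, kind: str, detail: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = {"request_id": request_id, "kind": kind, "detail": detail}
        entry = {"prev": prev, "payload": payload, "hash": _digest(prev, payload)}
        self.entries.append(entry)
        return entry["hash"]


def verify(entries: list[dict]) -> bool:
    """Recompute the chain from the start; any edit or reordering breaks it."""
    prev = GENESIS
    for e in entries:
        if e["prev"] != prev or e["hash"] != _digest(prev, e["payload"]):
            return False
        prev = e["hash"]
    return True
```

A third party holding only the entries can run `verify` without trusting the gateway. Publishing the latest hash externally would additionally pin the history up to that point.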

Repo boundaries

The canonical artifact is the role specification — the data source, the one-way seam, the outside copy, the query plane, and the read-only/one-way concepts that bind them. Code satisfies the spec; it does not define it.

The reference implementation in this repository covers:

  • The standardized read API / schema the outside copy exposes (the open contract).
  • The one-way sync transport and its diode-compatible profiles.
  • Reference packaging — deployment, install, ops, customer-facing docs.

Roles filled by separate reference components:

  • IIA — the architectural specification CF inherits its posture from.
  • The ingest data plane — produces the standardized record, passive or active. The witness is one reference (passive); discovery another (active). The specific feed is site-dependent.
  • The query plane — turns questions into bounded read-only calls. modelpond is one reference; source-agnostic and intentionally uncoupled from the OT framing.

Each is an example of its role, not the role itself.

Diagrams. Architecture diagrams for the inside / one-way / outside model are being redrawn and are intentionally omitted here until they reflect the current model rather than an earlier one.