January 26, 2026 Hitoshi Murakami

Inside the dodoAI Agent Runtime: Architecture Decisions for On-Prem Deployment

Building an agent runtime for air-gapped enterprise deployment forces you to make decisions that cloud-native AI products don't need to make. No outbound network access means no remote model API calls, no telemetry phone-home, no automatic dependency updates. The entire system has to run on hardware the customer already has, in a network that may have no outbound connectivity to the public internet, managed by an IT team whose primary job is keeping the existing ERP and networking infrastructure running — not operating ML systems.

This post describes the architecture of the dodoAI runtime, the decisions that shaped it, and the trade-offs we made. We're publishing this because these architectural choices aren't widely discussed in the AI product space — most agent platforms are cloud-first and treat on-premises as an afterthought or an enterprise tier add-on. We think the constraints of air-gapped deployment produce a more disciplined architecture than cloud-default, and it's worth explaining why.

The Inference Layer: No External API Calls

The most fundamental constraint is that all inference must happen locally. We cannot call any external model endpoint. This means we run inference on hardware inside the customer's network.

For the reasoning and decision-making components of the agent, we use a quantized LLM running via llama.cpp or Ollama, depending on the customer's preference for container management. The inference server exposes a local HTTP endpoint that conforms to the OpenAI API schema — this matters because it means the agent code doesn't need to know whether it's talking to a local inference server or a remote API, which keeps the higher layers clean.

Quantization is necessary because the hardware we target — commodity x86 servers with 64-128 GB RAM, no GPU required — can't run full-precision 70B parameter models at acceptable latency. We target 4-bit or 8-bit quantized versions of models in the 7B to 13B parameter range for most tasks. This produces noticeably lower reasoning quality than frontier cloud models. We accept that trade-off. The tasks we're automating — structured policy evaluation, document classification, approval routing — don't require frontier-level reasoning. They require consistent rule application and reliable information extraction, which a well-prompted 7B model handles adequately.

We're explicit with customers about this: if you need the reasoning quality of GPT-4 or Claude 3 Opus, you cannot have air-gapped deployment. Those models require compute that exceeds what commodity enterprise servers can run. The choice between sovereignty and capability at the frontier is real, and we don't pretend otherwise. For the process automation use cases we focus on, we believe the 7B-13B range is sufficient.

Container Packaging: Podman Over Docker

We ship the runtime as a container, but we support Podman as the primary container runtime in addition to Docker. This is a deliberate choice driven by the enterprise environments we deploy into.

Most Osaka-area manufacturers running RHEL 8 or RHEL 9 as their server OS have moved to Podman as their default container runtime, following Red Hat's direction. Asking them to install Docker requires an exception to their OS support policy, adds a dependency on the Docker daemon (which runs as root), and introduces a software vendor relationship the IT team may not want to manage.

Podman's rootless execution mode is also directly relevant to our security requirements. Our container processes do not run as root inside the container. The Podman rootless mode maps the container's UID namespace to an unprivileged user on the host, which means even if a container process is compromised, it doesn't have write access to host system paths. Running rootless with Podman on RHEL 9 also works cleanly with SELinux in enforcing mode, which most security-conscious IT teams require.

The deployment command a customer's IT team runs looks like this:

podman run -d \
  --name dodo-agent \
  --user 1001:1001 \
  --security-opt no-new-privileges \
  -v /opt/dodo/config:/app/config:ro,Z \
  -v /opt/dodo/data:/app/data:Z \
  -v /opt/dodo/logs:/app/logs:Z \
  --network dodo-internal \
  dodoai/runtime:2.1.0

The :Z suffix on volume mounts handles SELinux context relabeling automatically. The --network dodo-internal argument restricts the container to a Podman-managed bridge network that has no outbound internet access. We provision this network as part of the deployment setup script so the customer's IT team isn't configuring network namespaces manually.

State and Persistence: Deliberate Minimalism

One of the more counterintuitive architectural decisions we made is to minimize the agent's stateful footprint. The agent runtime is designed to be largely stateless: process context is fetched from the ERP at the start of each evaluation, decisions are written to the decision log, and no long-lived state is maintained in the agent's own database beyond configuration and logs.

This seems to contradict how most agent frameworks work — they maintain conversation history, tool call results, and intermediate state across multi-step processes. We do maintain step-level state within a single process execution, but that state is ephemeral and lives in process memory, not in a persistent database that needs to be backed up and managed.

The reason is operational simplicity. An enterprise IT team that's already managing ERP databases, network infrastructure, and user directories doesn't want another database to maintain. If the agent runtime goes down and restarts, it should come back to a clean state, re-read its configuration, and continue processing from the ERP's own queue. The ERP is the authoritative state source; the agent is a processor that reads from and writes back to it.

The decision log is the one database the runtime maintains. It's an append-only SQLite database by default, or PostgreSQL if the customer prefers a managed relational database. Append-only SQLite is suitable for most deployments — it handles thousands of decisions per day without performance issues, and SQLite's single-file storage makes backup as simple as copying a file. For customers with higher volume requirements or who want to integrate the log into their existing database infrastructure, we support PostgreSQL connections.

Configuration Management and Hot Reload

Agent configuration — the rule sets, the ERP connection parameters, the escalation paths — is stored as YAML files on the host filesystem, mounted into the container read-only. The runtime watches the configuration directory with inotify and reloads configuration when files change, without restarting the container.

Hot reload is important because policy updates should not require a maintenance window. If the procurement team updates an approval threshold, the IT team should be able to update the configuration file and have the agent apply the new threshold within seconds, not after the next scheduled deployment window. We use a staged reload: the new configuration is parsed and validated first, and if validation fails (missing required fields, invalid rule syntax), the reload is rejected and an error is logged. The running configuration stays in effect until a valid update is provided.

The configuration validation step runs the same logic as our offline rule-set validation tool, so an IT team can test a configuration change locally before pushing it to the production deployment:

dodo-validate --config /opt/dodo/config/rules.yaml
# Output:
# ✓ Syntax: OK
# ✓ Schema: OK
# ✓ Rule conflicts: 0 detected
# ✓ Escalation paths: all targets resolvable
# Configuration valid. 14 rules loaded.

Observability Without External Dependencies

In a cloud deployment, observability is handled by external services: application logs go to a log aggregation service, metrics go to a monitoring platform, traces go to a distributed tracing system. In an air-gapped environment, none of those external services are available.

We handle observability in three ways. Application logs are written in structured JSON format to the mounted log volume, which the customer's existing log management infrastructure (typically a syslog daemon or a local ELK stack) can ingest. We don't require any specific log collector — we produce structured logs, and the customer pipes them wherever they already pipe their application logs.

Metrics are exposed via a Prometheus-compatible /metrics endpoint on the container's internal network. If the customer runs Prometheus (common in environments that have started adopting container workloads), it can scrape the agent's metrics without any configuration changes on the agent side. If they don't, the endpoint is there but unused — it doesn't depend on an external service to function.

For health checking, the runtime exposes a simple /health endpoint that returns the inference backend status, the ERP connection status, and the last successful decision timestamp. A customer's monitoring system can poll this endpoint and alert on degraded status without understanding anything about the agent's internal architecture.

Upgrade Mechanism for Air-Gapped Environments

Software updates in air-gapped environments require a different approach than cloud deployments where you can pull a new container image directly from a registry. Our customers can't run podman pull dodoai/runtime:2.2.0 because there's no outbound access to the container registry.

We handle this with an offline update bundle: a tarball containing the new container image layers in OCI format, the updated model weights if the LLM has changed, and any updated configuration schemas. The customer's IT team transfers the bundle into the secure network via whatever mechanism they use for authorized software distribution — usually an internal file server or a DVD/USB procedure for highly restricted environments. The update script loads the container image from the bundle into the local Podman image store and performs the container restart.

Model updates require significantly larger bundles — a 7B quantized model at 4-bit precision is roughly 4-5 GB. We try to ship model updates infrequently and only when reasoning quality improvements are substantial enough to justify the transfer and validation overhead. Configuration and rule schema updates are small and can ship more frequently.

What This Architecture Doesn't Do Well

We deliberately excluded several capabilities that cloud-native agent frameworks typically offer, because they either require external connectivity or add operational complexity that doesn't serve our target environment.

We don't do real-time model fine-tuning or feedback learning. The LLM's weights are fixed at deploy time. If the agent's decisions are consistently wrong in a specific pattern, the fix is a rule set change, not model retraining. This is a deliberate constraint: retraining in a customer environment requires ML expertise and compute resources that most enterprises don't have. Rule-based corrections are reproducible, auditable, and don't require ML expertise to implement.

We don't support multi-agent orchestration in the current version. The runtime is a single agent with a defined scope. We've seen interest in chaining multiple agents for complex multi-step processes, but multi-agent coordination in an air-gapped environment adds significant complexity — you need a reliable message bus, consensus mechanisms for shared state, and distributed tracing to debug failures. We're not ready to recommend that complexity to customers whose IT teams are running the system without ML specialists. Single-agent, well-scoped processes first.

The architecture reflects a specific set of constraints and priorities: air-gapped deployment, operational simplicity, explainable decisions, and compatibility with existing enterprise IT infrastructure. These constraints shaped an architecture that's substantively different from cloud-native agent platforms. We think those differences are features of the design, not limitations of it.

Interested in sovereign AI for your enterprise?

We deploy inside your perimeter. Your data never leaves. Start with a discovery call to map your use case and environment.

Talk to the Team Read More Articles