May 12, 2025 Hitoshi Murakami

On-Premises AI Deployment: A Technical Checklist for Enterprise IT

Deploying an AI agent inside an enterprise network for the first time involves a different set of checkpoints than deploying a conventional application. The infrastructure requirements, network topology considerations, identity integration, and logging architecture all have AI-specific dimensions that catch IT teams off guard if they approach the deployment with only their standard application checklist.

This post is a practical technical checklist drawn from our deployment experience at dodoAI. It covers the infrastructure, security, and integration points that matter for a production on-premises AI agent deployment. Not every item applies to every environment — a small deployment on a single dedicated server has different considerations than a multi-node deployment in a segmented enterprise network. Use it as a starting point and adjust for your specific architecture.

1. Infrastructure and Compute

Minimum hardware specification for inference: A modern quantized language model suitable for enterprise workflow processing (7B–13B parameter range) can run inference on CPU with sufficient RAM. For production throughput at a mid-size enterprise, plan for a dedicated server with at minimum 16 CPU cores and 128 GB RAM. SSD storage for model weights reduces cold-start time significantly. GPU is not required for this parameter range at enterprise workflow volumes, but a CUDA-capable GPU (NVIDIA T4 or equivalent) will roughly triple throughput if your inference volume justifies it.
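As a rough illustration of the sizing guidance above, the following preflight sketch compares a Linux host against minimum core and memory thresholds. The function names and the default memory threshold of 120 are assumptions made for this example: the kernel usually reports a few GiB less than the installed RAM, so a 128 GB server would narrowly fail a literal check for 128.

```shell
meets_minimum() {
    # usage: meets_minimum <actual> <required>; succeeds when actual >= required
    [ "$1" -ge "$2" ]
}

check_host() {
    # usage: check_host [min_cores] [min_mem_gb]
    cores=$(nproc)
    # MemTotal in /proc/meminfo is reported in kB; convert to whole GiB
    mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
    echo "cores=$cores mem_gb=$mem_gb"
    meets_minimum "$cores" "${1:-16}" || echo "WARN: fewer than ${1:-16} CPU cores"
    meets_minimum "$mem_gb" "${2:-120}" || echo "WARN: less than ${2:-120} GB usable RAM"
}
```

Running `check_host` with no arguments applies the defaults; pass different thresholds if your sizing differs.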

Containerization: Deploy the agent runtime in containers (Docker / containerd). This is not optional for maintainability — bare-metal process deployment creates dependency management problems at update time that compound over the agent's lifecycle. Define your container image at deployment and version-control it. Use a local registry if your network policy restricts outbound connections to public container registries.

# Sample: verify container runtime on target server
docker info | grep -E "Server Version|Storage Driver|Cgroup"
# Confirm no internet pull is needed by pre-loading images locally
docker load -i /opt/dodoai/images/agent-runtime-latest.tar

Storage planning: Model weights for a 7B quantized model run 4–8 GB on disk. Inference logs and audit trails grow at a rate proportional to workflow volume. Plan for at minimum 500 GB dedicated storage for a mid-size deployment, with a clear data retention and archival policy in place before go-live. Unmanaged logs become a compliance and operational problem by month six.
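To sanity-check a storage figure against your own volumes, a back-of-the-envelope calculation can help. The sketch below is only arithmetic. The inputs (20,000 decisions per day, 4 KiB per audit entry, five years of retention) are illustrative assumptions, not measurements from any deployment.

```shell
estimate_log_gb() {
    # usage: estimate_log_gb <decisions_per_day> <bytes_per_entry> <retention_days>
    # Prints the uncompressed log volume over the retention period in whole GB (10^9 bytes)
    echo $(( $1 * $2 * $3 / 1000000000 ))
}

# Illustrative inputs only: 20,000 decisions/day, 4 KiB per entry, 1825 days (5 years)
estimate_log_gb 20000 4096 1825
```

With these inputs the function prints 149, which leaves room within a 500 GB allocation for model weights, debug logs, and growth. Substitute your measured entry size and volume before relying on the result.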

High availability: For production deployments where the agent is in the critical path of a business process, plan for at minimum a two-node deployment with load distribution. Full active-active HA adds complexity; for most enterprise workflow agents, a warm standby that can be promoted within 5–10 minutes is sufficient and simpler to operate.

2. Network Segmentation and Firewall Rules

Determine outbound connectivity requirements before deployment: The agent runtime should have a documented, minimal outbound connectivity profile. The specific list depends on your configuration, but as a general principle: if the runtime requires outbound connections to operate core inference, that is a design issue to resolve before production. Outbound connections that are acceptable for an on-premises deployment are those to internal resources — your ERP, your LDAP/AD, your document management system — not to external endpoints.

Network segment for AI workloads: Place the agent runtime in a network segment with explicit firewall rules. The segment needs inbound access from the workflow intake system (the application or users submitting requests) and outbound access only to the specific internal systems the agent needs to read from and write to. A flat internal network where the agent can reach any internal system is an unnecessary exposure surface.

ERP connectivity: Document the exact protocols and ports required. SAP ERP integration typically uses RFC over port 33NN, where NN is the instance number (3300 for instance 00), or HTTP/HTTPS for the NetWeaver Gateway REST API. Oracle ERP typically uses JDBC over port 1521, the default Oracle listener port. OBIC and other domestic Japanese ERP systems often use proprietary connectors, so confirm the protocol before planning the firewall ruleset.

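# Example firewall rules for agent segment (iptables notation)
# Allow loopback and return traffic for connections that are already established
# (without this, replies to the inbound requests on 8080 would hit the final DROP)
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow inbound from workflow intake subnet
iptables -A INPUT -s 10.10.20.0/24 -p tcp --dport 8080 -j ACCEPT
# Allow outbound to ERP (SAP NetWeaver Gateway)
iptables -A OUTPUT -d 10.10.30.10 -p tcp --dport 443 -j ACCEPT
# Allow outbound to LDAP
iptables -A OUTPUT -d 10.10.1.5 -p tcp --dport 636 -j ACCEPT
# Default deny: drop all other outbound traffic, including anything bound for the internet
iptables -A OUTPUT -j DROP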

Verify air-gap behavior: Before go-live, disconnect the agent from all outbound internet access and run a full workflow test. Confirm that inference, logging, audit trail generation, and ERP write operations all complete successfully. Any component that silently fails or produces degraded output when internet connectivity is absent is a problem to resolve before production.
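Alongside the full workflow test, a small connectivity probe can confirm that the firewall behaves as intended. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the agent host. The helper names are hypothetical, the internal addresses are the placeholders from the firewall example, and 1.1.1.1 stands in for any public endpoint.

```shell
can_connect() {
    # usage: can_connect <host> <port>
    # Succeeds only if a TCP connection opens within 3 seconds (uses bash's /dev/tcp)
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

expect_reachable() {
    if can_connect "$1" "$2"; then
        echo "PASS reachable $1:$2"
    else
        echo "FAIL unreachable $1:$2"
        return 1
    fi
}

expect_blocked() {
    if can_connect "$1" "$2"; then
        echo "FAIL reachable $1:$2"
        return 1
    else
        echo "PASS blocked $1:$2"
    fi
}

# Example run (all addresses are placeholders):
# expect_reachable 10.10.30.10 443   # ERP gateway
# expect_reachable 10.10.1.5 636     # LDAPS
# expect_blocked   1.1.1.1 443       # internet must not be reachable
```

A probe like this only checks TCP reachability. It does not replace the end-to-end workflow test, which is what reveals components that degrade silently without internet access.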

3. Identity and Access Integration

LDAP or Active Directory integration: The agent needs to authenticate requestors and validate their access rights against the appropriate systems. This requires read access to your directory service — LDAP over port 389 (or 636 for LDAPS) to your AD or OpenLDAP instance. Configure a dedicated service account for the agent with minimal directory read permissions. Do not use an administrator account. Do not use a personal account. Document the service account, its permissions, and the process for rotating its credentials.

Service account for ERP writes: The agent needs a service account in your ERP with write permissions limited to the specific operations it performs — vendor master updates, PO status flags, approval workflow state. Avoid giving it a role with broad ERP access. In SAP, define a custom role that grants access only to the specific transaction codes the agent needs. Audit this role definition before go-live and review it at least quarterly.

API key management: If the agent runtime communicates with internal services via API keys or tokens, store these in a secrets management system: HashiCorp Vault, an on-premises equivalent of a cloud secrets manager such as AWS Secrets Manager, or at minimum a secrets store that is not a flat configuration file checked into version control. Rotate service credentials at least quarterly, with automatic rotation preferred.

4. Audit Logging and Decision Records

Structured logging format: Audit logs from the agent runtime should be in a structured format (JSON Lines, with one JSON object per line, is typical) that can be ingested by your SIEM or log management system. Unstructured text logs are acceptable for debugging but not for compliance audit purposes. Each log entry for a workflow decision should include: timestamp (ISO 8601 with timezone), workflow ID, requestor identity, document reference, decision output, and the policy clauses evaluated.
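To make the field list concrete, here is a sketch of a single decision record emitted as one JSON line. The field names and sample values are hypothetical, not the schema of any particular runtime. The function does no JSON escaping, so it assumes values contain no quotes or backslashes; a production runtime would build records with a proper JSON library.

```shell
log_decision() {
    # usage: log_decision <workflow_id> <requestor> <document_ref> <decision> <policy_clauses_json_array>
    # Emits one JSON object per line with a UTC ISO 8601 timestamp
    printf '{"timestamp":"%s","workflow_id":"%s","requestor":"%s","document_ref":"%s","decision":"%s","policy_clauses":%s}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "$4" "$5"
}

# Example (hypothetical identifiers):
# log_decision WF-1001 jdoe PO-4500012345 approved '["PROC-3.2","PROC-4.1"]' >> decisions.jsonl
```

Because each record is self-contained on its own line, the file can be appended to safely and ingested line by line.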

Log integrity: Decision records that may be used in compliance or dispute contexts benefit from tamper-evidence. At minimum, hash each batch of decision records with a cryptographic hash such as SHA-256 and store the hash in a separate system. This does not need to be complex: a daily hash of the log file, written to a separate storage location whose access controls differ from those of the log store, is sufficient for most enterprise requirements.
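The daily hash described above can be sketched with standard tools. This example assumes `sha256sum` from coreutils; the function names and the manifest file are illustrative. In practice the manifest would live on storage that the log operators cannot write to.

```shell
record_log_hash() {
    # usage: record_log_hash <log_file> <manifest_file>
    # Appends "<sha256>  <log_file>" to a manifest kept on separately controlled storage
    sha256sum "$1" >> "$2"
}

verify_log_hashes() {
    # usage: verify_log_hashes <manifest_file>
    # Fails if any file listed in the manifest no longer matches its recorded hash
    sha256sum -c "$1" > /dev/null
}

# Example (hypothetical paths), run once per day after the log file is closed:
# record_log_hash /var/log/agent/decisions-2025-05-12.jsonl /mnt/audit-hashes/manifest.sha256
```

Use absolute paths when recording, since `sha256sum -c` resolves the recorded file names relative to the directory it is run from.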

Retention policy: Define and document the retention period for agent decision logs before go-live. For procurement and compliance approvals, a five-year retention period is common in Japanese regulated industries, though your specific obligation depends on your industry and the nature of the decisions. Configure log rotation and archival to implement the policy automatically — do not rely on manual archival.
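One way to automate this is a scheduled job that compresses older logs and deletes archives once they pass the retention period. The sketch below assumes GNU `find` and `gzip`, and log files named `*.jsonl`. The thresholds in the example (30 days before compression, 1826 days for roughly five years) are placeholders for whatever your policy specifies.

```shell
apply_retention() {
    # usage: apply_retention <log_dir> <compress_after_days> <delete_after_days>
    # Compress decision logs older than the first threshold (gzip keeps the original mtime)
    find "$1" -type f -name '*.jsonl' -mtime +"$2" -exec gzip {} \;
    # Delete compressed archives once they are older than the retention period
    find "$1" -type f -name '*.jsonl.gz' -mtime +"$3" -delete
}

# Example (hypothetical path), suitable for a daily cron job:
# apply_retention /var/log/agent 30 1826
```

Deleting records is irreversible, so test the thresholds against a copy of the log directory before scheduling the job.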

Access controls on audit logs: The people who operate the AI agent should not have unilateral write access to the audit logs. This is a standard separation of duties control. Implement it with filesystem permissions, RBAC on your log management system, or both. An audit trail that the system operator can modify is not an audit trail in any meaningful compliance sense.

5. Update and Maintenance Procedures

Model update process: Document how model updates will be applied. On-premises deployments require a defined procedure: where updated model weights are obtained, how they are tested in a staging environment before production promotion, and how the previous version is preserved for rollback. A model update that degrades decision quality for your specific workflows is a production incident. Treat model updates with the same change management rigor you apply to application releases.
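A simple way to preserve the previous version is to keep each set of weights in its own directory and switch a symlink at promotion time. The following sketch assumes a hypothetical layout of `<models_dir>/<version>/model.bin`, with `current` and `previous` symlinks, and a checksum supplied by whoever provides the weights. It illustrates the idea and is not the procedure of any specific runtime.

```shell
promote_model() {
    # usage: promote_model <models_dir> <version> <expected_sha256>
    dir=$1; version=$2; expected=$3
    actual=$(sha256sum "$dir/$version/model.bin" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $version; not promoting" >&2
        return 1
    fi
    # Remember the active version so that rollback is a single symlink swap
    if [ -L "$dir/current" ]; then
        ln -sfn "$(readlink "$dir/current")" "$dir/previous"
    fi
    ln -sfn "$version" "$dir/current"
}

rollback_model() {
    # usage: rollback_model <models_dir>; repoints "current" at the preserved previous version
    [ -L "$1/previous" ] || return 1
    ln -sfn "$(readlink "$1/previous")" "$1/current"
}
```

The checksum step guards against corrupted or substituted weight files. It says nothing about decision quality, which still has to be validated in staging.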

Agent runtime updates: The agent runtime software itself will need updates: bug fixes, security patches, feature additions. Define the update frequency and process. Container-based deployment makes this manageable: pull the new image, run it in staging, and promote it to production with a rollback path. With a well-designed runtime, the update should not require downtime.

Policy store updates: When your approval policies change — new spending thresholds, updated escalation rules, revised vendor tier definitions — the agent needs to reflect the change. Confirm that your agent runtime has a defined mechanism for policy updates that does not require a full redeploy. Policy updates should be versioned and logged: when a policy changed, who authorized the change, and what the previous policy was. This history matters when auditing decisions made under different policy versions.

6. Pre-Go-Live Validation

Before moving any production workflows to agent processing, complete a structured validation period. Run the agent in shadow mode alongside the existing manual process: the agent processes the same requests as the manual approvers, and you compare outcomes. This serves two purposes — it validates that the agent's decisions align with your policy intent, and it surfaces edge cases in your policy rules that produce unexpected outcomes.

The shadow mode period should be long enough to cover a representative sample of your workflow volume, including any seasonal or periodic variations. For a procurement approval workflow, a four-to-six-week shadow period that spans a month-end cycle is a reasonable minimum. For compliance workflows with low-frequency high-stakes decisions, the shadow period may need to be longer to accumulate sufficient cases.

Document the outcomes of the shadow period formally: agreement rate between agent and human decisions, breakdown of disagreement cases by type, policy clarifications that were required. This document becomes the baseline for monitoring agent decision quality in production and for future policy audits.
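The agreement rate and the disagreement breakdown can be computed from a simple export of shadow-mode results. This sketch assumes a hypothetical CSV with a header row and the columns `workflow_id,agent_decision,human_decision`; your export format will likely differ.

```shell
agreement_rate() {
    # usage: agreement_rate <csv_file>; prints "<agreed>/<total> <percent>%"
    awk -F, 'NR > 1 { total++; if ($2 == $3) agreed++ }
             END { if (total == 0) { print "0/0 n/a"; exit }
                   printf "%d/%d %.1f%%\n", agreed, total, 100 * agreed / total }' "$1"
}

disagreements_by_type() {
    # usage: disagreements_by_type <csv_file>
    # Prints a count for each "agent -> human" mismatch, most frequent first
    awk -F, 'NR > 1 && $2 != $3 { print $2 " -> " $3 }' "$1" | sort | uniq -c | sort -rn
}
```

For a file with four rows where the agent and the human approver agree on three, `agreement_rate` prints `3/4 75.0%`. The breakdown by mismatch type is usually the more useful output, because it points at the policy rules that need clarification.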

On-premises AI deployment is more work than signing up for a cloud API. That work is front-loaded — most of it happens during the infrastructure and security setup phase rather than during ongoing operation. Organizations that do it carefully end up with a deployment that is auditable, maintainable, and genuinely under their operational control. The checklist above is where that work starts.

