How OpenAI Codex Sandboxes AI Code Execution

HubAI Asia

Why Sandboxing AI Code Matters

Codex executes your code in 5 milliseconds. That’s faster than the blink of a human eye, but what happens during those milliseconds—and the seconds that follow—is an intricate choreography of isolation, restriction, and destruction that keeps your machine safe from AI-generated code gone wrong.

When OpenAI launched Codex in May 2025 as a cloud-based software engineering agent, the biggest question wasn’t whether it could write code. It was whether you could trust an AI to run that code without burning your system down. The answer lives inside a multi-layered sandbox architecture that combines Google’s gVisor runtime, Linux namespaces, seccomp filters, and cgroups into a defense-in-depth execution environment. As of May 2026, over 4 million people use Codex weekly—and every single task runs inside this locked-down container.

Key Facts Most People Don’t Know

  • Codex sandboxes use gVisor, Google’s container runtime that intercepts 237 Linux system calls to prevent kernel exploits
  • Each Codex execution environment allocates exactly 512MB RAM and terminates after 30 seconds to prevent resource exhaustion attacks
  • OpenAI’s sandbox blocks network access by default using iptables rules that drop all packets except to whitelisted IP ranges like pypi.org on port 443

The Two-Phase Runtime Model

OpenAI addressed the fundamental trust problem with what they call a “two-phase runtime model.” The setup phase runs first and can access the network to install dependencies you’ve specified. Then the agent phase runs offline by default, with no internet access whatsoever. Secrets configured for cloud environments are available only during setup and are wiped before the agent phase starts.

This separation means the AI can bootstrap its environment—installing packages, downloading configuration—but can’t phone home with your data once it’s actively executing code. It’s a simple architectural decision with profound security implications: every network request after setup is automatically dropped at the kernel level.
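The two-phase split can be pictured with a toy model. This is only a sketch under stated assumptions, not OpenAI's implementation: the class `TaskEnvironment` and its method names are hypothetical, and "network access" is reduced to a boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class TaskEnvironment:
    """Toy model of a two-phase runtime (all names are hypothetical)."""

    secrets: dict = field(default_factory=dict)
    network_enabled: bool = False
    phase: str = "created"

    def start_setup(self) -> None:
        # Setup phase: network is available and secrets are visible.
        self.phase = "setup"
        self.network_enabled = True

    def start_agent(self) -> None:
        # Agent phase: wipe secrets first, then cut network access.
        self.secrets.clear()
        self.network_enabled = False
        self.phase = "agent"

    def request(self, url: str) -> str:
        # Any outbound request made while the network is off is refused.
        if not self.network_enabled:
            raise PermissionError(f"network disabled in phase {self.phase!r}: {url}")
        return f"fetched {url}"
```

The ordering inside `start_agent` is the point of the sketch: secrets are gone before the agent runs anything.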

The 8-Step Sandbox Boot Process

Step 1: Container Spawn with User Namespace Remapping

When Codex receives your code string, it spawns an isolated gVisor runsc container with user namespace remapping to UID 65534—the “nobody” user. gVisor isn’t a traditional container runtime like runc. It’s an application kernel, written in Go, that sits between the container and the host kernel. Instead of passing syscalls directly to Linux, gVisor intercepts them, validates them against its own security policy, and either executes them safely or rejects them outright.

Because gVisor intercepts 237 Linux system calls, even if the AI generates code that tries to exploit a kernel vulnerability, the exploit hits gVisor’s user-space kernel first, not your actual Linux kernel.

Step 2: Seccomp Profile Loads—89% of Syscalls Blocked

Before any user code runs, a seccomp-bpf profile is loaded for the container’s processes. The profile compiles to Berkeley Packet Filter bytecode, a mechanism originally built for filtering network packets, but here it is applied to system calls. It blocks dangerous syscalls like ptrace (which can inspect and control other processes), mount (which can attach new filesystems), clone (which can create processes in new namespaces), unshare (which can move a process into new namespaces), and keyctl (which can access kernel keyrings).

Only 43 whitelisted syscalls survive the filter. That’s out of roughly 400 available in a modern Linux kernel—meaning 89% of the syscall surface area is blocked before a single line of Python executes. The remaining calls are the essentials: read, write, openat, mmap, exit, and similar fundamentals needed for normal program execution.
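The allowlist logic and the 89% figure can be checked with a short sketch. It models the decision only; a real seccomp filter is BPF bytecode evaluated by the kernel, not Python. The `ALLOWED` set below contains just the five syscalls named in this article, and the function names are made up for illustration.

```python
# Subset of the allowlist named in the article; the full list is not published here.
ALLOWED = frozenset({"read", "write", "openat", "mmap", "exit"})


def decide(syscall: str, allowed: frozenset = ALLOWED) -> str:
    """Default-deny: anything not explicitly allowed is rejected."""
    return "ALLOW" if syscall in allowed else "DENY"


def blocked_fraction(allowed_count: int, total: int) -> float:
    """Share of the syscall surface that the filter removes."""
    return 1 - allowed_count / total


if __name__ == "__main__":
    for name in ("read", "ptrace", "mount", "openat"):
        print(name, decide(name))
    # 43 allowed out of roughly 400 leaves about 89% blocked.
    print(f"{blocked_fraction(43, 400):.1%}")
```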

Step 3: Overlayfs Mounts with Immutable Base Layer

The container’s filesystem is built on overlayfs, a union mount filesystem that layers an ephemeral writable layer on top of a read-only base. The read-only base layer weighs 2.1GB and contains Python 3.8.10 plus 47 pre-installed packages: a fixed, verified environment that cannot be modified by the AI or the code it generates.

The ephemeral upper layer uses tmpfs (memory-backed storage) limited to 100MB. Any file the AI creates, edits, or downloads goes into this upper layer. When the container terminates, this layer is discarded entirely, so nothing persists between runs: an AI can write files during execution, but those files vanish the instant the container stops.
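The layering behaviour is easy to model with two dictionaries. This is a simplified sketch, assuming an in-memory stand-in for the filesystem: the `Overlay` class is hypothetical, and real overlayfs also handles copy-up, directories, and deletions (whiteouts), which are left out here.

```python
import errno


class Overlay:
    """Toy union filesystem: writable upper layer over a read-only lower layer."""

    def __init__(self, lower: dict, upper_limit: int):
        self._lower = dict(lower)   # read-only base image
        self._upper = {}            # ephemeral, size-capped write layer
        self._limit = upper_limit   # e.g. 100 * 1024 * 1024 for a 100MB tmpfs

    def read(self, path: str) -> bytes:
        # The upper layer shadows the lower layer.
        if path in self._upper:
            return self._upper[path]
        return self._lower[path]

    def write(self, path: str, data: bytes) -> None:
        # Writes never touch the lower layer; they count against the cap.
        used = sum(len(v) for k, v in self._upper.items() if k != path)
        if used + len(data) > self._limit:
            raise OSError(errno.ENOSPC, "upper layer is full")
        self._upper[path] = data

    def discard_upper(self) -> None:
        # What happens at container teardown: every change disappears.
        self._upper.clear()
```

After `discard_upper()`, reads fall through to the untouched base again, which is why nothing survives between runs.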

Step 4: Network Firewall—Default Deny with Whitelist

OpenAI’s sandbox blocks network access by default using iptables rules that drop all packets except those to whitelisted IP ranges, such as the addresses serving pypi.org on port 443. The firewall is configured in two stages. First, the OUTPUT chain gets a DROP policy, so every outbound packet is denied. Then, specific ACCEPT rules are added for DNS (port 53) and HTTPS (port 443) to the whitelisted destinations only.

This is why Codex’s agent phase is described as “offline by default.” Even if the AI generates code that tries to POST data to an external server, the packet never leaves the container. The only traffic that escapes is DNS resolution and HTTPS to pre-approved package repositories. For the local CLI and IDE extension, Codex also defaults to network access turned off—you have to explicitly enable it in your configuration.
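A default-deny rule set comes down to one question per packet: does it match an ACCEPT rule? The sketch below models that check in Python rather than iptables syntax. The allowlisted range is a placeholder (203.0.113.0/24 is reserved for documentation), not an address that OpenAI or PyPI actually uses, and `verdict` is a hypothetical name.

```python
import ipaddress

# Placeholder allowlist: a documentation-only range, not a real package mirror.
ALLOWED_NETS = [ipaddress.ip_network("203.0.113.0/24")]
ALLOWED_PORTS = {53, 443}  # DNS and HTTPS, as described above


def verdict(dst_ip: str, dst_port: int) -> str:
    """Return ACCEPT only for allowlisted destination and port; DROP otherwise."""
    addr = ipaddress.ip_address(dst_ip)
    if dst_port in ALLOWED_PORTS and any(addr in net for net in ALLOWED_NETS):
        return "ACCEPT"
    return "DROP"  # chain policy: everything else is dropped
```

A POST to an arbitrary server fails the address check, so it falls through to the DROP policy regardless of port.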

Step 5: Resource Limits via cgroups v2

Resource exhaustion is a real attack vector. An AI could generate a fork bomb or an infinite memory allocator. To prevent this, the container sets hard limits using cgroups v2: 512MB for memory.max, 1 CPU core for cpu.max, and 10,000 for pids.max. On top of those limits, each execution is terminated after 30 seconds.

These limits are enforced by the kernel itself, so user code cannot escape them. If a process exceeds its memory limit, the OOM killer terminates it immediately. If it spawns too many child processes, new forks fail with EAGAIN. And if it simply runs too long, step 7 kicks in.
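In cgroups v2, each limit is a small text file inside the cgroup directory. The sketch below computes the values that would be written for the numbers quoted above. The file names `memory.max`, `cpu.max`, and `pids.max` are the standard cgroup v2 interface; the helper name and the 100,000 microsecond period are assumptions (the period is the kernel default).

```python
def cgroup_v2_settings(memory_mb: int = 512, cpus: float = 1.0,
                       max_pids: int = 10_000, period_us: int = 100_000) -> dict:
    """Map human-readable limits to cgroup v2 interface file contents."""
    quota_us = int(cpus * period_us)  # CPU time allowed per period
    return {
        "memory.max": str(memory_mb * 1024 * 1024),  # bytes
        "cpu.max": f"{quota_us} {period_us}",        # "<quota> <period>" in microseconds
        "pids.max": str(max_pids),                   # max tasks in the cgroup
    }


if __name__ == "__main__":
    for name, value in cgroup_v2_settings().items():
        print(f"{name}: {value}")
```

One full core is expressed as a quota equal to the period, which is why `cpu.max` comes out as two identical numbers.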

Step 6: Python Interpreter Launches with Restricted Path

With the container fully configured, the Python interpreter launches inside the namespace with PYTHONPATH restricted to /sandbox/lib and /tmp directories only. This prevents the AI from importing modules from outside the sandbox—no loading custom shared libraries from /usr/local, no importing from the host’s site-packages.

The interpreter runs as UID 65534, inside a PID namespace where it sees itself as PID 1, inside a mount namespace where it can only see the overlayfs filesystem. From the Python process’s perspective, it’s running on a normal Linux machine. From the host’s perspective, it’s completely isolated.
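A launcher can pin the interpreter's environment by building it from scratch instead of inheriting the parent's. The following is a minimal sketch, assuming a hypothetical helper `build_launch`; the two directories come from the text above, while the `PATH` value is an assumption. Note that `PYTHONPATH` only adds search directories, so in practice the isolation comes from the mount namespace, not from this variable alone.

```python
import os
import sys


def build_launch(script_path: str):
    """Return (argv, env) for starting a script with a minimal environment."""
    env = {
        # Search path limited to the two directories named in the article.
        "PYTHONPATH": os.pathsep.join(["/sandbox/lib", "/tmp"]),
        # Assumed minimal PATH; nothing else is inherited from the parent.
        "PATH": "/usr/bin:/bin",
    }
    # -s: do not add the per-user site-packages directory to sys.path.
    argv = [sys.executable, "-s", script_path]
    return argv, env
```

The pair would typically be passed to `subprocess.run(argv, env=env)`, so the child sees only the variables listed here.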

Step 7: Execution with 30-Second Kill Switch

Code executes under a timer that delivers SIGKILL after 30 seconds. This isn’t a graceful shutdown: SIGKILL cannot be caught, blocked, or ignored by user code. The kernel delivers it directly to the process group, terminating everything inside the container instantly.
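A hard deadline of this kind can be approximated with Python's standard library. This is a sketch rather than the mechanism Codex uses: `run_with_deadline` is a hypothetical helper, and it kills only the direct child, whereas the article describes a signal to the whole process group.

```python
import subprocess
import sys


def run_with_deadline(argv, seconds: float):
    """Run a command; kill it if it outlives the deadline.

    Returns (returncode, stdout_bytes, timed_out).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, _err = proc.communicate(timeout=seconds)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        proc.kill()                     # sends SIGKILL on POSIX systems
        out, _err = proc.communicate()  # reap the child and drain its pipes
        return proc.returncode, out, True


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(slow, seconds=1.0))
```

Because the child cannot handle SIGKILL, no cleanup code inside it gets a chance to run, which is the property the kill switch relies on.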

All stdout and stderr output is captured to a 1MB ring buffer. This means only the most recent megabyte of output is preserved: if the AI generates massive output (perhaps accidentally, perhaps as an exfiltration attempt), older output is overwritten. The ring buffer acts as both a log mechanism and a size constraint.
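A ring buffer keeps the newest bytes and lets the oldest fall off once capacity is reached. The sketch below shows that behaviour with a plain `bytearray`; it trims from the front instead of using a true circular index, which gives the same observable result. The class name is hypothetical.

```python
class RingBuffer:
    """Fixed-capacity byte buffer that retains only the most recent data."""

    def __init__(self, capacity: int):
        self.capacity = capacity        # e.g. 1024 * 1024 for a 1MB buffer
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data
        overflow = len(self._buf) - self.capacity
        if overflow > 0:
            del self._buf[:overflow]    # drop the oldest bytes

    def getvalue(self) -> bytes:
        return bytes(self._buf)
```

With a capacity of 8, writing `b"abcdef"` and then `b"ghijkl"` leaves `b"efghijkl"`: the first four bytes are gone.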

Step 8: Container Termination and Output Sanitization

After execution completes—or is killed—the container terminates. The overlayfs upper layer is discarded, removing every file the AI created. The output buffer is sanitized: absolute paths are removed before returning results to the API. This prevents the AI from leaking information about the host filesystem structure through error messages or debug output.
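Path scrubbing of this kind is usually a regular-expression pass over the captured output. The following is a rough sketch, not the actual sanitizer: the pattern and the `<path>` placeholder are assumptions, it handles only POSIX-style paths without spaces, and it deliberately leaves URLs and relative paths alone.

```python
import re

# An absolute path: a "/" that does not follow a word character, ".", "/" or ":"
# (which would indicate a relative path or a URL), then one or more segments.
_ABS_PATH = re.compile(r"(?<![\w./:])/(?:[\w.-]+/)*[\w.-]+")


def sanitize(output: str) -> str:
    """Replace absolute filesystem paths with a neutral placeholder."""
    return _ABS_PATH.sub("<path>", output)
```

For example, a traceback line such as `File "/sandbox/lib/app.py", line 3` comes back as `File "<path>", line 3`.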

The entire lifecycle—from spawn to death—typically takes between 1 and 30 minutes for a real Codex task. But the sandbox mechanisms described here protect every second of that execution, not just the initial code run. When Codex reads terminal output, runs test suites, and iterates on fixes, each command goes through the same sandbox pipeline.

Local vs. Cloud: Two Sandboxing Approaches

Codex runs in two distinct environments, and the sandboxing differs accordingly.

Codex Cloud (chatgpt.com/codex) uses the gVisor-based container architecture described above. Each task gets its own isolated container, preloaded with your repository. The two-phase runtime—setup with network, then agent without—ensures the AI can bootstrap but can’t exfiltrate.

Codex CLI and IDE extension use OS-level sandboxing instead. On macOS, this leverages Apple’s sandbox-exec and entitlements system. On Linux, it uses the same namespace/seccomp/cgroups stack, but managed by the local Codex binary rather than gVisor. The default mode is workspace-write: the AI can read and edit files in your current directory but cannot access the network or write outside the workspace. Protected paths like .git, .agents, and .codex are always read-only.

A particularly clever detail: Codex detects whether your project folder is version-controlled on launch. If it has a .git directory, Codex starts in Auto mode (workspace write + on-request approvals). If not, it starts read-only—because without version control, there’s no easy way to undo what the AI might change.
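The startup decision described here amounts to a single filesystem check. A minimal sketch, assuming a hypothetical function `initial_mode` and the mode labels "auto" and "read-only"; it checks only for a `.git` directory in the given folder, and how Codex treats parent directories or worktrees is not covered by this article.

```python
from pathlib import Path


def initial_mode(project_dir) -> str:
    """Pick a starting mode based on whether the folder is a Git repository."""
    if (Path(project_dir) / ".git").is_dir():
        return "auto"        # workspace write + on-request approvals
    return "read-only"       # no version control, so no easy undo
```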

The Approval Layer: When the Sandbox Isn’t Enough

Sandboxing limits what the AI can technically do. Approval policies limit when the AI can act without asking you first. These are two different security layers that work together.

In the default “on-request” approval mode, Codex can read files, make edits, and run commands in the workspace automatically. But it must ask for approval to edit outside the workspace or access the network. In “untrusted” mode, even known-safe operations require approval—only pure read operations proceed automatically.
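The two approval modes can be written down as a small decision function. This sketch encodes only the rules stated above; the `Action` type and `needs_approval` are hypothetical names, and treating out-of-workspace commands like out-of-workspace edits is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                    # "read", "edit", or "run"
    in_workspace: bool
    needs_network: bool = False


def needs_approval(action: Action, mode: str) -> bool:
    """Return True if the user must approve the action in the given mode."""
    if mode == "untrusted":
        # Only pure read operations proceed automatically.
        return action.kind != "read"
    if mode == "on-request":
        # Workspace edits and commands are automatic; leaving the
        # workspace or touching the network requires a prompt.
        outside = not action.in_workspace and action.kind in ("edit", "run")
        return outside or action.needs_network
    raise ValueError(f"unknown approval mode: {mode!r}")
```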

OpenAI recently introduced auto-review, where a secondary model evaluates approval requests before surfacing them to you. The reviewer checks for data exfiltration, credential probing, security weakening, and destructive actions. Low and medium risk actions can proceed automatically; critical risk actions are always denied. The default reviewer policy is open-source, published in the Codex GitHub repository.
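The reviewer's outcome rule is a mapping from risk level to decision. A sketch under stated assumptions: the function name and return labels are hypothetical, and the handling of any level other than low, medium, or critical (for instance a "high" level, which the text above does not describe) is assumed to fall back to asking the user.

```python
def review_decision(risk: str) -> str:
    """Map a reviewer risk rating to an outcome."""
    if risk in ("low", "medium"):
        return "approve"     # proceeds automatically
    if risk == "critical":
        return "deny"        # always denied
    return "ask_user"        # assumption: anything else is surfaced to the user
```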

Hooks: Your Code in the Agentic Loop

The newest layer of control is Codex Hooks—an extensibility framework that lets you inject your own scripts into the agentic loop. Hooks fire at six lifecycle events: SessionStart, PreToolUse, PermissionRequest, PostToolUse, UserPromptSubmit, and Stop. Each hook receives a JSON object on stdin with the session context, the tool being called, and the command being executed.

Teams use hooks to scan prompts for accidentally pasted API keys, run custom validators after each turn, log conversations to analytics engines, and enforce project-specific policies. Hooks are discovered automatically from ~/.codex/hooks.json, .codex/config.toml, and plugin manifests. Non-managed hooks require explicit trust before they run—Codex prints a warning at startup and directs you to the /hooks command to review.
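A prompt-scanning hook might look roughly like the script below. It is illustrative only: the `prompt` field name, the exit code used to signal a block, and the two key patterns are all assumptions, so check the Codex hooks documentation for the real payload schema and return conventions before relying on any of them.

```python
import json
import re
import sys

# Two common key shapes: "sk-" style tokens and AWS access key IDs.
_KEY_PATTERN = re.compile(r"\b(?:sk-[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16})\b")


def find_secrets(payload: dict) -> list:
    """Return anything in the prompt text that looks like an API key."""
    prompt = payload.get("prompt", "")   # field name is an assumption
    return _KEY_PATTERN.findall(prompt)


def main(stdin=sys.stdin) -> int:
    payload = json.load(stdin)           # the hook receives JSON on stdin
    hits = find_secrets(payload)
    if hits:
        print(f"blocked: {len(hits)} possible API key(s) in prompt", file=sys.stderr)
        return 2                         # hypothetical "block" exit code
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the detection in `find_secrets` and the I/O in `main` makes the scanner easy to unit-test without a running session.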

What This Means for Developers

The sandboxing architecture reveals something important: OpenAI is treating AI code execution as an adversarial problem, not just a performance problem. Every layer—gVisor, seccomp, cgroups, iptables, overlayfs, approval policies, hooks, auto-review—exists because the assumption is that AI-generated code might do something you don’t want.

For developers, this means you can experiment with Codex aggressively. The worst case is a destroyed sandbox, not a destroyed machine. The container will be killed, the tmpfs layer discarded, and a fresh one spawned for the next task. Your code, credentials, and filesystem stay protected behind multiple independent barriers.

The architecture also reveals where the trust boundary is moving. Early AI coding tools suggested code; you copied and pasted. Modern AI agents execute code; the sandbox keeps you safe. The next generation will likely execute code across multiple machines; the sandboxing patterns established here—isolation, default-deny networking, resource limits, approval gates—are the foundation for that future.

But what happens when Codex encounters code that tries to escape using the 43 syscalls that remain unblocked?
