# How OpenAI’s Jalapeño Chip Is Built With Broadcom
OpenAI’s once-secret chip could threaten Nvidia’s dominance. On June 24, 2026, OpenAI and Broadcom officially unveiled Jalapeño, OpenAI’s first custom Intelligence Processor, designed from scratch for LLM inference and taken from initial design to tape-out in a reported nine months.
- OpenAI codenamed its custom inference chip ‘Jalapeño’ in internal documents leaked through procurement channels in late 2023
- Broadcom assigned over 120 engineers specifically to OpenAI’s project under a multi-year contract worth an estimated $500 million starting in 2023
- The Jalapeño chip targets fabrication on TSMC’s 5nm process node, aiming for 3x better performance-per-watt than Nvidia’s H100 for transformer inference
The chip, delivered to OpenAI CEO Sam Altman and President Greg Brockman by Broadcom CEO Hock Tan and President Charlie Kawwas, represents the most aggressive vertical integration move by an AI lab to date. Engineering samples are already running GPT-5.3-Codex-Spark in the lab at production target frequency and power. But the real story isn’t the announcement — it’s how this chip was actually built, and what it means for the $300 billion AI infrastructure market.
## Why OpenAI Built Its Own Chip
For two years, OpenAI burned through Nvidia GPUs like they were disposable. Every ChatGPT query, every Codex agent execution, every API call ran on someone else’s hardware — hardware that came with Nvidia’s legendary margin premium. Reuters reported in February 2026 that OpenAI was “unsatisfied” with certain Nvidia chips and actively seeking alternatives.
The core problem: inference costs. When you run ChatGPT for 400 million weekly users, the cost of generating each token matters enormously. Nvidia’s H100 and B200 chips are powerful general-purpose AI accelerators, but they carry circuits for FP32 training workloads that OpenAI doesn’t need during inference. That’s wasted silicon, wasted power, and wasted money.
Jalapeño is designed to cut that overhead. The chip focuses exclusively on INT8 and INT4 quantized inference and omits FP32 training circuits, which reduces die size by approximately 40%. A smaller die means more chips per wafer, lower power consumption, and ultimately a much lower cost per token.
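To make "quantized inference" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization. The scheme and the function names `quantize_int8` and `dequantize` are illustrative assumptions; the article does not say which quantization method Jalapeño uses.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # one float scale for the whole tensor
    ints = [max(-127, min(127, round(v / scale))) for v in values]
    return ints, scale


def dequantize(ints, scale):
    """Recover approximate floats from the stored integers."""
    return [i * scale for i in ints]


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.03, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now occupies one byte instead of four, and the rounding error per weight is bounded by half the scale. Hardware that only ever sees integers like these can use much smaller multipliers than FP32 units.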
## The Broadcom Connection
This isn’t Broadcom’s first rodeo with custom AI silicon. Broadcom previously worked with Google on its TPU v4 and v5 chips, which reportedly process over 90% of Google’s internal AI inference workloads as of 2024. That track record is precisely why OpenAI publicly announced a partnership with them in October 2025 for what would become the Jalapeño project.
That engagement, with more than 120 dedicated Broadcom engineers and a multi-year contract worth an estimated $500 million, reflects both the complexity of the chip and the strategic importance both companies placed on breaking Nvidia’s grip on AI compute.
## How Jalapeño Was Actually Built: The 8-Step Process
### Step 1: Define the Transformer Architecture Requirements
OpenAI started by defining specific transformer architecture requirements — including attention head counts, layer depths, and token sequence lengths their models use. Unlike general-purpose GPU designers who optimize for every possible workload, OpenAI knew exactly which tensor dimensions, attention patterns, and memory access sequences their models would execute. This “workload-first” approach gave Broadcom’s silicon architects a precise target to optimize against.
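As a rough illustration of what a workload-first specification might contain, the sketch below derives the main matrix-multiply shapes of one standard decoder layer from a handful of model dimensions. `WorkloadSpec`, `gemm_shapes`, and every number here are hypothetical; OpenAI’s actual model dimensions are not public, and a textbook multi-head attention layer is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSpec:
    d_model: int   # hidden size
    n_heads: int   # attention heads per layer
    n_layers: int  # layer depth
    seq_len: int   # token sequence length
    d_ff: int      # feed-forward inner size


def gemm_shapes(spec):
    """Return (name, M, K, N) for the matmuls in one decoder layer (full-sequence pass)."""
    head_dim = spec.d_model // spec.n_heads
    s, d = spec.seq_len, spec.d_model
    return [
        ("qkv_proj", s, d, 3 * d),
        ("attn_scores_per_head", s, head_dim, s),
        ("attn_values_per_head", s, s, head_dim),
        ("out_proj", s, d, d),
        ("mlp_up", s, d, spec.d_ff),
        ("mlp_down", s, spec.d_ff, d),
    ]


# Placeholder dimensions, not taken from any real model.
spec = WorkloadSpec(d_model=4096, n_heads=32, n_layers=32, seq_len=2048, d_ff=16384)
for name, m, k, n in gemm_shapes(spec):
    print(f"{name}: ({m} x {k}) @ ({k} x {n})")
```

A fixed, short list of shapes like this is what lets a silicon team size its math units for a known workload instead of for arbitrary ones.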
### Step 2: Map Requirements to Custom Silicon Blocks
Broadcom’s architects then mapped these requirements to custom silicon blocks, creating specialized matrix multiplication units optimized for the specific tensor dimensions OpenAI’s models use. Where an Nvidia GPU runs thousands of CUDA cores that handle any math you throw at them, Jalapeño’s math units are hardwired for the exact matrix shapes that GPT-class models produce — cutting clock cycles from every inference pass.
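One way to see why matching the hardware to the matrix shape matters: a fixed-size multiply array wastes work whenever a matrix dimension is not a multiple of the array dimension, because the edges must be padded. The sketch below computes that utilization. The function name and the tile sizes are assumptions for illustration, not Jalapeño’s real geometry.

```python
import math


def array_utilization(m, n, tile_m, tile_n):
    """Fraction of multiply-array slots doing useful work for an m x n output."""
    padded_m = math.ceil(m / tile_m) * tile_m  # rows rounded up to whole tiles
    padded_n = math.ceil(n / tile_n) * tile_n  # columns rounded up to whole tiles
    return (m * n) / (padded_m * padded_n)


# Hypothetical 128 x 128 output block on two hypothetical array sizes.
print(array_utilization(128, 128, 128, 128))  # array sized to the workload
print(array_utilization(128, 128, 96, 96))    # mismatched array, padded edges
```

When the array matches the workload, every slot is busy. With the mismatched 96-wide array, fewer than half of the slots do useful work on this shape.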
### Step 3: Design Custom On-Chip SRAM Hierarchies
Memory bandwidth is the silent killer of AI inference performance. Engineers designed custom on-chip SRAM hierarchies with sizes matching OpenAI’s model layer dimensions to minimize external memory bandwidth bottlenecks. Instead of fetching weights from HBM memory across a narrow bus (the typical GPU bottleneck), Jalapeño keeps the most-accessed model parameters in on-chip SRAM that’s sized precisely for the layers it’s running.
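The sizing question can be sketched as simple arithmetic: how many bytes do one layer’s weights occupy at a given precision, and does that fit in a given SRAM budget? The layer dimensions and the 128 MiB SRAM capacity below are hypothetical placeholders; the article gives no real figures for either.

```python
MIB = 2**20


def layer_weight_bytes(d_model, d_ff, bits_per_weight):
    """Weight storage for one standard decoder layer (biases and norms ignored)."""
    attention_params = 4 * d_model * d_model  # Q, K, V and output projections
    mlp_params = 2 * d_model * d_ff           # up and down projections
    return (attention_params + mlp_params) * bits_per_weight // 8


def fits_in_sram(d_model, d_ff, bits_per_weight, sram_bytes):
    return layer_weight_bytes(d_model, d_ff, bits_per_weight) <= sram_bytes


# Hypothetical layer size and on-chip SRAM capacity.
d_model, d_ff, sram = 4096, 16384, 128 * MIB
for bits in (16, 8, 4):
    size = layer_weight_bytes(d_model, d_ff, bits)
    print(bits, size // MIB, fits_in_sram(d_model, d_ff, bits, sram))
```

Under these assumed numbers, the layer only fits on-chip once its weights are stored in 4 bits, which is why SRAM sizing and weight precision have to be decided together.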
### Step 4: Implement Custom Dataflow Patterns
This is where Jalapeño diverges most dramatically from GPU architectures. Broadcom implemented custom dataflow patterns where activation data flows directly between processing units without round-tripping to external HBM memory. In a traditional GPU, each layer’s output gets written back to global memory, then read again for the next layer. Jalapeño’s dataflow architecture keeps data moving through the compute pipeline, slashing latency and power consumption simultaneously.
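A back-of-the-envelope model of the difference, under simplifying assumptions: in the round-trip case every layer reads its input from external memory and writes its output back, while in the streaming case only the first read and the last write touch external memory. The function name and the sizes are illustrative, and real GPUs also fuse some operations, so treat this as an upper bound on the gap.

```python
def activation_traffic_bytes(n_layers, activation_bytes, streaming):
    """External-memory bytes moved for activations across a stack of layers."""
    if streaming:
        # Read the input once, write the final output once.
        return 2 * activation_bytes
    # Every layer reads its input and writes its output externally.
    return 2 * n_layers * activation_bytes


# Hypothetical values: 32 layers, 8 KiB of activations per token.
round_trip = activation_traffic_bytes(32, 8192, streaming=False)
streamed = activation_traffic_bytes(32, 8192, streaming=True)
print(round_trip, streamed, round_trip // streamed)
```

In this idealized model the saving grows linearly with layer count, which is the intuition behind keeping activations on-chip.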
### Step 5: Build Compressed Weight Decoders
The design team created specialized decoders for compressed weight formats, allowing 4-bit weights to be decompressed on-the-fly during computation cycles. This is critical: storing weights in 4-bit format instead of 16-bit reduces memory requirements by 4x, but you need silicon that can decompress them fast enough to keep the compute units fed. Jalapeño’s dedicated decompression circuits solve exactly this problem.
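In software terms, the decode step looks roughly like the sketch below, which packs two signed 4-bit weights into each byte and unpacks them again. The nibble order and the helper names `pack_int4` and `unpack_int4` are assumptions; the article does not describe Jalapeño’s actual compressed format, and the hardware would do this in dedicated circuits rather than in a loop.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if any(v < -8 or v > 7 for v in values):
        raise ValueError("values must fit in a signed 4-bit range")
    padded = list(values) + [0] * (len(values) % 2)  # pad to an even count
    out = bytearray()
    for low, high in zip(padded[0::2], padded[1::2]):
        out.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed, count):
    """Decode `count` signed 4-bit integers from packed bytes."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return out[:count]


weights = [-8, 7, 0, -1, 3]  # hypothetical quantized weights
packed = pack_int4(weights)
print(len(packed), unpack_int4(packed, len(weights)))
```

Five weights fit in three bytes here, versus ten bytes at 16-bit precision. The unpack loop is the work that a dedicated decoder performs inline so the compute units never wait on it.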
### Step 6: Partition Into Inference Tiles
Physical design engineers partitioned the chip into multiple identical inference tiles, each capable of processing independent user requests simultaneously. This is the multi-tenancy advantage: a single Jalapeño chip can serve multiple ChatGPT users at once without context-switching overhead, because each tile operates independently. It’s a fundamentally different approach from a GPU’s time-slicing model.
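A toy model of the tile idea: each tile owns its own queue, new requests go to the least-loaded tile, and all tiles make progress in the same step without sharing state. `TileScheduler` and its policy are hypothetical; the article does not describe how Jalapeño actually assigns requests to tiles.

```python
from collections import deque


class TileScheduler:
    """Toy dispatcher: one independent request queue per inference tile."""

    def __init__(self, n_tiles):
        self.queues = [deque() for _ in range(n_tiles)]

    def submit(self, request_id):
        """Send a request to the least-loaded tile and return that tile's index."""
        tile = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[tile].append(request_id)
        return tile

    def step(self):
        """Every tile finishes the request at the head of its own queue, if any."""
        return [q.popleft() if q else None for q in self.queues]


scheduler = TileScheduler(n_tiles=4)
placements = [scheduler.submit(f"req-{i}") for i in range(6)]
print(placements)
print(scheduler.step())
print(scheduler.step())
```

Because no tile ever saves or restores another request’s state, there is nothing to context-switch. The trade-off is that an idle tile cannot lend its compute to a busy one in this simple model.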
### Step 7: Fabricate at TSMC’s 5nm Node
TSMC fabricates the chip using EUV lithography at the 5nm node, with each wafer yielding approximately 400–600 chips depending on die size. The 40% die-size reduction from eliminating training circuits directly translates to higher yields and lower per-chip cost — a double economic advantage over Nvidia’s larger GPU dies.
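The economics can be sketched with two textbook approximations: a common dies-per-wafer estimate that subtracts edge loss, and a Poisson yield model. The die areas and the defect density below are hypothetical values picked so the result lands in the article’s 400–600 range; they are not disclosed figures for Jalapeño or TSMC.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: gross wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson model: probability that a die contains zero defects."""
    return math.exp(-(die_area_mm2 / 100) * defects_per_cm2)


# Hypothetical: a 200 mm^2 baseline die versus one 40% smaller, on a 300 mm wafer.
for area in (200, 120):
    candidates = dies_per_wafer(300, area)
    good = candidates * poisson_yield(area, defects_per_cm2=0.1)
    print(area, candidates, round(good))
```

With these assumed inputs, the smaller die wins twice: more candidate dies fit on the wafer, and a larger fraction of them are defect-free.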
### Step 8: Integrate Into Custom Server Infrastructure
OpenAI integrates the chips into custom server boards with optimized PCIe Gen5 interconnects and liquid cooling systems for deployment in its data centers. Celestica handles board, rack, and system integration. Broadcom’s Tomahawk networking silicon connects the racks together. The result is a full-stack inference platform where every component, from chip to rack to network, was co-designed for one purpose.
## The Nine-Month Miracle
Perhaps the most staggering claim from OpenAI: Jalapeño went from initial design to manufacturing tape-out in just nine months. OpenAI calls it “the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors.”
How? Two factors. First, deep software-hardware co-development: OpenAI’s engineers worked side-by-side with Broadcom’s silicon team, not in the traditional arm’s-length vendor relationship. Second, OpenAI’s own AI models accelerated parts of the design and optimization process. The same models that ChatGPT serves to users helped engineers design the chips that will run future models.
That recursive loop — AI helping build better AI infrastructure — is the part that should worry Nvidia the most. If each generation of chip gets faster to design because the AI helping design it gets smarter, the cadence advantage compounds.
## What This Means for the AI Chip Market
Jalapeño is inference-only. It won’t train GPT-6 or GPT-7 — those workloads will still run on Nvidia’s most powerful GPUs for the foreseeable future. But inference is where 80–90% of AI compute spending actually goes. Every ChatGPT response, every API call, every Codex agent execution is inference. If Jalapeño delivers even 2x better performance-per-watt than current alternatives, the economics shift dramatically.
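To see how performance-per-watt maps to money, note that tokens per second per watt is the same thing as tokens per joule. The sketch below converts that into an electricity cost per million tokens. The baseline efficiency and the electricity price are made-up inputs for illustration, and the model ignores hardware, cooling, and networking costs.

```python
def energy_cost_per_million_tokens(tokens_per_joule, usd_per_kwh):
    """Electricity cost of generating one million tokens."""
    joules = 1_000_000 / tokens_per_joule
    kwh = joules / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh * usd_per_kwh


# Hypothetical baseline: 10 tokens per joule at $0.10 per kWh.
baseline = energy_cost_per_million_tokens(10, 0.10)
doubled = energy_cost_per_million_tokens(20, 0.10)
print(baseline, doubled, baseline / doubled)
```

Energy cost scales inversely with efficiency, so a 2x gain in performance-per-watt halves the power bill for the same token volume, and the same data center power budget serves twice the traffic.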
Google already runs most of its inference on custom TPU silicon. Amazon’s Trainium and Inferentia chips handle a growing share of AWS AI workloads. Now OpenAI is joining the custom silicon club — and they’re doing it with Broadcom, the company that built Google’s TPUs.
Hock Tan, Broadcom’s CEO, made the ambition clear: “This is just the beginning of a multi-generation roadmap.” Jalapeño is the first chip in what OpenAI describes as a platform that will expand to gigawatt-scale data centers with Microsoft and other partners beginning in 2026.
## The Full-Stack Flywheel
OpenAI’s announcement emphasized a concept they call the “full-stack advantage.” Because they build the models, the serving systems, the products, and now the chips, every layer can be optimized around the same goal. Better infrastructure drives compute efficiency. Greater efficiency enables better training and serving. Better models become better products. Better products drive more revenue. More revenue funds the next generation of infrastructure.
It’s a flywheel that Nvidia can’t replicate because Nvidia doesn’t run ChatGPT. Google can replicate it — and has, with TPUs. But OpenAI’s bet is that their direct understanding of LLM inference patterns, combined with Broadcom’s silicon expertise, produces a chip that no general-purpose accelerator can match.
But there’s one critical bottleneck Broadcom still hasn’t solved that could make this entire chip obsolete — and we’ll explore what that is in our next deep dive.

