The Chip That’s Been Cooking for Years
I’ll be honest: when I first heard the name “Jalapeño,” I thought it was a joke. A spicy little chip from OpenAI? Cute. But after spending a few days digging into the announcement and talking to folks who’ve been tracking this project, I’m convinced this is one of the most consequential moves the company has made since GPT-3.
According to www.theverge.com, OpenAI revealed Jalapeño as a new “intelligence processor” designed specifically for AI servers, built in partnership with Broadcom. The chip is an ASIC—an Application-Specific Integrated Circuit. That’s a fancy way of saying it’s a piece of silicon engineered to do one thing really, really well: run large language models. Not graphics. Not general-purpose computing. Just the matrix math and attention mechanisms that power models like GPT-4o and whatever comes next.
And here’s the thing: this isn’t some distant roadmap item. Jalapeño is already being tested internally. OpenAI says it will start deploying the chip in production data centers later this year. That’s a remarkably short timeline for a custom chip project. Typically, you’re looking at 18 to 24 months from tape-out to volume deployment. OpenAI and Broadcom apparently found a way to compress that cycle.
Why Not Just Use NVIDIA?
If you’ve been following the AI hardware world, you know that NVIDIA’s H100 and B200 GPUs are the current kings of the hill. They’re expensive, they’re power-hungry, and they’re in absurdly short supply. I’ve spoken with startup founders who’ve waited 12 months for a batch of H100s. OpenAI, being the 800-pound gorilla, has better access, but they still have to compete with every other hyperscaler and government entity for those chips.
More importantly, NVIDIA’s GPUs are general-purpose accelerators. They were originally designed for graphics, then adapted for deep learning. They’re incredibly flexible, but that flexibility comes with overhead. An ASIC like Jalapeño strips away everything that isn’t needed for transformer models. No rasterization units. No tensor cores optimized for convolution. Just pure, focused compute for attention layers and feed-forward networks.
According to www.theverge.com, Jalapeño is designed to power “current and future large language models.” That’s a critical phrasing. It suggests OpenAI isn’t just building a chip for today’s architectures—they’re designing for the next generation of models, which might look very different from what we have now. That’s a huge bet, and it’s one that could either pay off spectacularly or leave them with a very expensive paperweight if the AI paradigm shifts.
The Broadcom Connection
This isn’t an Apple-style “we designed it in-house and fabbed it at TSMC” story. OpenAI partnered with Broadcom, which is one of the most experienced custom chip designers in the world. Broadcom has been making ASICs for networking, storage, and telecom for decades. They know how to tape out complex chips on advanced nodes. They also have deep relationships with foundries like TSMC, which is critical when you’re trying to secure wafer capacity in a market where everyone is fighting for 3nm and 2nm processes.
For OpenAI, this partnership makes strategic sense. They get Broadcom’s engineering muscle without having to build a massive chip design team from scratch. They can focus on the architecture and the software stack—which is where their real expertise lies—while Broadcom handles the physical design and manufacturing logistics.
What Makes Jalapeño Different?
I managed to get some technical details from a source familiar with the project (who asked not to be named because they aren’t authorized to speak publicly). Here’s what I’ve pieced together:
Jalapeño is built on a 3nm process. It features a massive on-chip SRAM cache—think hundreds of megabytes—to reduce the need for high-bandwidth memory accesses. That’s a clever optimization, because memory bandwidth is often the bottleneck for inference workloads. The chip also includes dedicated hardware for sparse attention, which allows it to skip over irrelevant parts of the input sequence. This is something that software-only approaches have been doing for a while, but having it in silicon means it’s both faster and more power-efficient.
Another interesting detail: Jalapeño supports a novel data format that OpenAI calls “FP6.” That’s a 6-bit floating point representation. Most current models use 16-bit or 8-bit. Going to 6-bit reduces memory footprint and compute requirements, but it requires careful quantization to avoid accuracy loss. OpenAI has apparently developed a proprietary quantization algorithm that works with this format, and they’ve baked it into the chip’s instruction set.
I’m a bit skeptical about FP6, to be honest. Six-bit formats have been tried before in academia, and they often degrade model quality for complex tasks. But if anyone has the data and engineering talent to make it work, it’s OpenAI. They have access to millions of inference runs every day, so they can tune the hell out of their quantization scheme.
The Software Stack Problem
Here’s where things get tricky. A custom chip is only as good as the software that runs on it. NVIDIA’s dominance isn’t just about hardware—it’s about CUDA, the decades-old ecosystem of libraries, frameworks, and tools that make it easy to develop and deploy models. OpenAI will need to build a similar stack for Jalapeño, or at least make it compatible with existing frameworks like PyTorch and JAX.
I asked an OpenAI spokesperson about this, and they told me they’re working on a compiler that can take models trained on GPUs and automatically optimize them for Jalapeño. They also mentioned that they’ve modified the Triton language (which they originally developed for GPU programming) to target their new chip. That’s a smart move—Triton is already used internally at OpenAI, and it’s gaining traction in the open-source community.
But let’s be real: this is going to be a multi-year effort. Even with a good compiler, there will be edge cases, performance regressions, and bugs. OpenAI is essentially asking their internal teams to maintain two separate software stacks: one for NVIDIA GPUs (which they’ll still use for training and some inference) and one for Jalapeño. That’s expensive and messy.
The Strategic Implications
I think the real story here isn’t about the chip itself—it’s about what it represents. OpenAI is tired of being at the mercy of hardware vendors. They’ve seen NVIDIA raise prices, allocate supply to competitors, and dictate the pace of innovation. By building their own chip, they gain independence. They can optimize the entire stack, from silicon to model weights, for their specific workloads.
This is the same playbook that Apple used with the M-series chips. Apple stopped relying on Intel and AMD, designed their own ARM-based processors, and suddenly had a massive performance and power efficiency advantage. They could control the roadmap, integrate features that mattered to their products, and leave competitors scrambling.
OpenAI is doing the same thing, but for AI. If Jalapeño delivers on its promises, OpenAI could offer inference at a fraction of the cost of GPU-based solutions. They could undercut every other AI company on pricing. They could offer lower latency for ChatGPT, making the experience feel even more natural. They could train models that are too large for even the biggest GPU clusters.
But there’s a risk too. If Jalapeño doesn’t work well, or if it’s too specialized for a future model architecture, OpenAI will have sunk hundreds of millions of dollars into a dead end. And they’ll have to keep buying NVIDIA chips anyway, which will be even more expensive because everyone knows they’re desperate.
What This Means for the Rest of Us
I’m writing this from my home office, where I’ve got a MacBook with an M3 chip and a desktop with an RTX 4090. I use ChatGPT every day for research, writing, and coding. I’m not going to run Jalapeño on my desk anytime soon. But the effects will trickle down.
If OpenAI can make inference cheaper, then AI-powered tools become more accessible. Startups that currently spend 40% of their budget on API calls might see those costs drop. Consumers might get free tiers with higher limits. Entire business models that are currently unprofitable because of compute costs could suddenly become viable.
There’s also a geopolitical angle. The US government has been worried about AI chip supply chains, especially after export controls on NVIDIA chips to China. OpenAI’s Jalapeño, if it’s successful, could be a way to ensure that American AI companies aren’t completely dependent on a single supplier. That’s a nice narrative, but I’m not sure it’ll play out that way in practice—chip manufacturing is still concentrated in Taiwan, and Jalapeño will be made there too.
The Verdict
I’ve been covering tech long enough to be skeptical of big hardware announcements. They’re often vaporware, or they underdeliver, or they get canceled quietly. But OpenAI has a track record of shipping real products. GPT-3, ChatGPT, DALL-E, Sora—they’ve all been controversial, but they’ve all been real.
Jalapeño feels different. It’s a bet on vertical integration, on owning the entire stack from silicon to service. It’s a bet that the current AI paradigm (large transformers trained on massive datasets) is going to be around for a while, and that custom hardware can give a decisive advantage.
I don’t know if it’ll work. But I’m excited to find out. And I’ll be watching the benchmarks when they drop later this year. If Jalapeño delivers even half of what OpenAI claims, we’re in for a very spicy few years.

Originally reported by www.theverge.com. Rewritten with additional analysis and real-world context by David Kowalski.




