How the Transformer Inventor Walking Out on Google Changes Everything

HubAI Asia
HubAI AsiaCompare & Review the Best AI Tools

Google paid $2.7 billion to get him back — and now he’s gone again. On June 18, 2026, Noam Shazeer, the co-lead of Google’s Gemini AI project and one of the original architects behind the transformer revolution, announced he was leaving Google to join OpenAI, Reuters first reported. The move sends shockwaves through an industry already reeling from the Anthropic-Fable 5 export control crisis and raises a blunt question: can Google hold onto its top AI talent?

Key Facts Most People Don’t Know

  • Noam Shazeer co-authored the 2017 ‘Attention Is All You Need’ paper with 7 other Google researchers, introducing the transformer architecture that now powers every major LLM
  • In 2021, Shazeer left Google to found Character.AI, which reached 20 million users by 2023 and was valued at $1 billion within 18 months
  • Google acquired Character.AI in August 2024 for $2.7 billion primarily to rehire Shazeer, making it one of the most expensive acqui-hires in tech history

The Man Who Invented the Architecture Behind Every ChatGPT

To understand why this departure matters, you need to understand what Shazeer built. Noam Shazeer co-authored the 2017 “Attention Is All You Need” paper with 7 other Google researchers, introducing the transformer architecture that now powers every major LLM — from GPT-4 to Claude to Gemini itself. Without that paper, there is no ChatGPT, no AI boom, no trillion-dollar market caps.

But Shazeer didn’t stop at the transformer. During his first Google tenure from 2000 to 2021, he held 46 patents, including the sparse mixture-of-experts architecture that powers GPT-4’s rumored 1.76 trillion parameters. He also invented the SwiGLU activation function in 2020, which improved LLaMA model performance by 1.2% over standard GELU and is now used in GPT-4 and Claude. These aren’t incremental contributions — they’re load-bearing walls in the cathedral of modern AI.

The $2.7 Billion Boomerang

Shazeer’s relationship with Google has been anything but smooth. In 2021, he and colleague Daniel de Freitas built an internal chatbot called Meena. Google refused to release it publicly. Frustrated, Shazeer left to found Character.AI, which reached 20 million users by 2023 and was valued at $1 billion within 18 months. Time Magazine named Shazeer one of the 100 most influential people in AI.

Then came the boomerang. Google acquired Character.AI in August 2024 for $2.7 billion primarily to rehire Shazeer, making it one of the most expensive acqui-hires in tech history. Since Shazeer owned 30–40% of the company, he personally netted an estimated $750 million to $1 billion. He returned as a Vice President of Engineering and was installed as co-lead of Gemini alongside Jeff Dean and Oriol Vinyals.

Google paid $2.7 billion to bring back an AI genius who quit in frustration — only to watch him walk out the door again less than two years later.

How the Transformer Actually Works (And Why Shazeer’s Departure Threatens Google’s Edge)

The transformer architecture that Shazeer helped create isn’t just a research paper — it’s the engine inside every frontier model. Here’s how it processes language, step by step:

Step 1: Token Embeddings

Input tokens are converted into 512 or 768-dimensional embedding vectors representing semantic meaning in continuous space. Every word or subword becomes a point in a high-dimensional landscape.

Step 2: Positional Encoding

Since transformers process all tokens simultaneously (unlike RNNs), positional encoding adds sine and cosine wave patterns at different frequencies to each embedding, injecting sequence order information. Without this, “dog bit man” and “man bit dog” would look identical.

Step 3: Query, Key, and Value Matrices

This is the heart of attention. Query, Key, and Value matrices are created by multiplying embeddings with three separate learned weight matrices. Think of it as every word preparing three signals: “what I’m looking for” (Query), “what I contain” (Key), and “what I’ll contribute” (Value).

Step 4: Attention Score Calculation

Attention scores are calculated by taking the dot product of Query with all Keys, then dividing by the square root of dimension size (typically 64). This scaling prevents gradient vanishing in deep networks — a subtle fix that made the whole architecture trainable.

Step 5: Softmax Normalization

The softmax function normalizes attention scores into a probability distribution, determining how much each token attends to every other token. This is where context emerges: the word “bank” knows whether it’s next to “river” or “investment.”

Step 6: Weighted Value Sum

A weighted sum of Value vectors is computed using attention probabilities, creating context-aware representations. Each token now carries information from every other token, weighted by relevance.

Step 7: Multi-Head Attention

Multi-head attention runs 8 to 16 parallel attention operations simultaneously, each learning different relationship patterns. One head might track syntax, another coreference, another sentiment — all in parallel.

Step 8: Feed-Forward Processing

A feed-forward network with two linear layers and an activation function processes each position independently, expanding to 4x dimension then compressing back. This is where the model stores and transforms the knowledge it has attended to.

Why Shazeer Left (Again)

The official reason for Shazeer’s departure hasn’t been disclosed. But the timing speaks volumes. His move comes as Google faces intensifying pressure on multiple fronts:

  • Gemini’s competitive position. Despite massive investment, Gemini has struggled to match the public mindshare of ChatGPT or the developer loyalty of Claude. Internal frustration with Google’s cautious release strategy — the same frustration that drove Shazeer to leave in 2021 — appears to be ongoing.
  • The OpenAI talent vacuum. OpenAI has been aggressively recruiting top researchers, and Shazeer’s arrival gives it something no amount of compute can buy: the person who literally invented the core architecture. With Sam Altman’s team reportedly working on next-generation models that push beyond the transformer paradigm, Shazeer’s expertise in both transformers and mixture-of-experts architectures is uniquely valuable.
  • The Mythos effect. The US government’s export control order against Anthropic’s Fable 5 and Mythos 5 models has created uncertainty across the entire frontier AI landscape. If the government can kneecap one company’s flagship model overnight, no AI lab is truly safe from political interference. OpenAI, with its closer government ties, may offer a more stable perch.

What This Means for Google’s AI Strategy

Losing Shazeer is more than a PR hit. He wasn’t just a manager — he was the technical architect steering Gemini’s evolution. His mixture-of-experts work directly influenced how Gemini handles efficiency at scale, and his departure leaves a gap in both technical vision and institutional knowledge.

Google still has Jeff Dean and Oriol Vinyals leading Gemini, along with DeepMind’s formidable research bench. But Shazeer’s departure follows a pattern: Google builds world-class AI research, then watches key people leave when the company can’t ship products fast enough to satisfy them. It happened with Meena. It happened with Character.AI. And now it’s happening again.

The $2.7 billion Character.AI deal was supposed to end this cycle. Instead, it turned into a very expensive two-year rental.

The Bigger Picture: AI Talent Wars Enter a New Phase

Shazeer’s move to OpenAI crystallizes a broader shift. The AI industry is no longer competing just on compute or data — it’s competing on the handful of people who understand how to build and improve frontier models at the architecture level. These researchers are fewer than you think, and their decisions can reshape competitive dynamics overnight.

In February 2026, Shazeer was elected to the National Academy of Engineering — a recognition that places him among the most consequential engineers of his generation. The fact that such a person is now choosing OpenAI over Google sends a signal that the balance of power in AI may be shifting faster than quarterly earnings reports suggest.

But wait until you see what Shazeer’s sparse expert routing actually does to inference costs — that might be the real reason OpenAI wanted him.

💡 Sponsored: Need fast hosting for WordPress, Node.js, or Python? Try Hostinger → (Affiliate link — we may earn a commission)

📬 Get AI Tool Reviews in Your Inbox

Weekly digest of the best new AI tools. No spam, unsubscribe anytime.

🎁

Built by us: Exit Pop Pro

Turn your WordPress visitors into email subscribers with an exit-intent popup that gives away a free PDF. $29 one-time — no monthly fees, no SaaS lock-in.

Get it →
📺 YouTube📘 Facebook