Technology · March 30, 2026 · 11 min read · youtu.be

AI Chip & Silicon Round-up 2026 | SemiAnalysis

"2026 will be a massive year for AI hardware no matter if GPUs, ASICs, or beyond, and I'm not sure we can still talk about an emerging industry." - SemiAnalysis Host [00:00:01]

"If Nvidia is Goliath, this company would be David. At least I'm sure that's what AMD would like to hear." - SemiAnalysis Host [00:02:16]

Disclaimer: Original content is owned by or sourced from third parties and does not represent the views of the 'Nuggets' platform or its team. AI is used extensively across this platform, including for summaries; accuracy is not guaranteed and there may be mistakes. Nothing on this platform is financial, legal, or investment advice. Do your own research. For complete disclosures, see the Terms of Use and Full Disclaimer.

Related nuggets

Jun 2, 2026

AI Is Escaping the Screen | 01 Jun 2026 | Coatue

Coatue : AI is entering a new phase: moving beyond digital tools and into fully autonomous systems operating in the physical world. From advanced manufacturing and surgical robotics to robots in the home, the next wave of innovation will b…

Jun 2, 2026

Kalshi Monthly Volume - Politics ($M) | Chart of the Day | Coatue

Coatue: Kalshi's political volume has scaled dramatically, and the American Power Index KPOW is what that scale enables: a single number gauge of the current balance of political power and where markets expect it to move, which Kalshi bill…

Jun 2, 2026

The BlackBerry Problem |18 May 2026 | The Mistakes Series | Malcolm Gladwell's Revisionist History

"My mistake and naivity was to think that people are were with me so you're flying around the world you're trying to get people on side and you think they're on side but they're not mhm mhm and you get blindsight" Jim Balsillie 00:01:34 ht…

Jun 2, 2026

Partnership Perspectives: Network International | 2 Jun 2026 | Brookfield Perspectives

Actions

Reading

Published
March 30, 2026
Read time
11 min read
Progress0%

"While everyone was hyped about blockchains, Google already prepared for AI." - SemiAnalysis Host [00:03:40]

"The chip is designed in a way that every step, every calculation runs like clockwork. This is called deterministic execution." - SemiAnalysis Host [00:06:37]

"More and more inference is moving away from Nvidia to custom silicon." - SemiAnalysis Host [00:11:12]


Speakers & Credentials

  • SemiAnalysis Host: Industry analyst representing SemiAnalysis, a premier boutique semiconductor research and consulting firm specializing in deep-dive technical evaluations of AI hardware, supply chains, and hyperscaler deployments.

1. Executive Summary

  • The AI accelerator market in 2026 will witness a mature but intensely competitive landscape, moving beyond early-stage development into highly differentiated architectures across multiple major players [00:00:01].
  • Nvidia maintains its dominant "Goliath" position with its upcoming Vera Rubin architecture, which posts headline figures such as 35 petaflops of FP4 compute [00:07:32], while securing its inference flank by acquiring Groq, a competitor built around deterministic execution, for $20 billion [00:06:59].
  • Merchant silicon competitors, notably AMD with its MI455X, are aggressively targeting Nvidia's memory bottleneck by pushing capacity up to 432 GB of next-generation HBM4 [00:02:51].
  • Hyperscalers (Google, Meta, AWS, Microsoft) are dramatically scaling their custom, in-house silicon (ASICs) to optimize Total Cost of Ownership (TCO) for massive internal inference and training workloads, effectively reducing their reliance on Nvidia [00:11:12].
  • Alternative architectures prioritizing raw memory bandwidth and latency, such as Cerebras' wafer-scale engines and Groq's LPUs, continue to carve out specialized niches in the inference and ultra-fast serving markets [00:04:55].

2. Chronological Table of Contents

  • [00:00:01] Introduction: The Maturation of AI Hardware
  • [00:00:57] Qualcomm: From AI 100 to AI250
  • [00:02:16] AMD: The David to Nvidia's Goliath (MI455X)
  • [00:03:33] Google: Ironwood (TPU v7) and Optical Networking
  • [00:04:55] Cerebras: Wafer Scale Engine 3 (WSE-3)
  • [00:05:54] Groq: Deterministic Execution via LPUs
  • [00:07:14] Nvidia: The Vera Rubin Era (VR200)
  • [00:08:17] Meta: MTIA v3 for Internal Recommendation Models
  • [00:09:26] Amazon/AWS: Trainium 3 for Large-Scale Deployment
  • [00:10:32] Microsoft: Maia 200 Inference ASIC
  • [00:11:20] Intel: Jaguar Shores (2027 Prospect)
  • [00:12:05] Conclusion & Industry Benchmarks

3. Detailed Thematic Summary

The Merchant Silicon Challengers: Qualcomm & AMD [00:00:57]

  • Qualcomm is attempting to break into the inference market with its AI200 ASIC, which packs 70 billion transistors and is manufactured on TSMC's N3E node [00:01:23]. To sidestep HBM supply constraints, it uses 768 GB of LPDDR5X (low-power DDR5X) memory, though rapidly rising LPDDR5X prices may blunt its cost advantage [00:01:35]. Its successor, the AI250, is slated to use a compute-near-memory architecture with next-generation LPDDR6 that is claimed to deliver a 10x increase in effective memory bandwidth [00:02:01].
  • AMD is positioning itself as a primary alternative to Nvidia with the MI455X and the Helios rack [00:02:31]. Built on the CDNA 5 architecture, the MI455X is enormous: 320 billion transistors spread across 12 logic chiplets on a mix of 2nm and 3nm processes, joined with 3.5D packaging [00:02:41]. Its primary weapon is memory capacity: 432 GB of next-generation HBM4 with nearly 20 terabytes per second of bandwidth [00:02:51].
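Capacity matters because a model's weights have to fit in memory before anything else is counted. The sketch below checks a weights-only footprint against the two capacities quoted above (768 GB for the AI200, 432 GB for the MI455X). The 671B-parameter model and the bits-per-weight values are hypothetical inputs, and KV cache, activations and runtime overhead are ignored, so real headroom is smaller.

```python
# Weights-only footprint check: does a model fit in one device's memory?
# Capacities are the figures quoted in the summary; the model size and
# precisions below are hypothetical. KV cache, activations and runtime
# overhead are deliberately ignored.

DEVICE_MEMORY_GB = {
    "Qualcomm AI200 (LPDDR5X)": 768,
    "AMD MI455X (HBM4)": 432,
}


def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Bytes needed for the weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


def fits(params_billion: float, bits_per_weight: int, capacity_gb: float) -> bool:
    """True if the weights alone fit within the given capacity."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= capacity_gb


if __name__ == "__main__":
    model_params_b = 671  # hypothetical model size, billions of parameters
    for bits in (16, 8, 4):
        need = weight_footprint_gb(model_params_b, bits)
        for name, cap in DEVICE_MEMORY_GB.items():
            verdict = "fits" if fits(model_params_b, bits, cap) else "does not fit"
            print(f"{bits:>2}-bit weights: {need:7.1f} GB -> {name}: {verdict}")
```

With these inputs, 8-bit weights (671 GB) fit on the AI200 but not on the MI455X, 4-bit weights (335.5 GB) fit on both, and 16-bit weights (1,342 GB) fit on neither.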

The Incumbent Dominator: Nvidia Vera Rubin [00:07:14]

  • Nvidia is set to maintain its market leadership with its 2026 flagship, the VR200 (Vera Rubin), manufactured on TSMC's N3B node [00:07:21].
  • A single VR200 superchip will deliver an unprecedented 35 Petaflops of FP4 compute performance [00:07:32].
  • It is paired with 288 GB of HBM4 memory capable of 22 terabytes per second in bandwidth [00:07:41].
  • At the system level, Nvidia will scale this with the NVL72 rack, which combines 72 VR200 chips over its proprietary scale-up network [00:07:49]. It is also planning a "Rubin Ultra" variant that expands memory to 1 terabyte of HBM [00:08:04].
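Rack-level figures follow from multiplying the per-chip numbers above by the chip count. The sketch below assumes perfectly linear aggregation across all 72 VR200 packages, which is only an upper bound: interconnect limits, parallelism overhead and utilization keep delivered performance below these totals. The `Accelerator` and `rack_totals` names are illustrative, not any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    fp4_pflops: float    # FP4 compute per chip, petaflops (as quoted)
    hbm_gb: float        # HBM capacity per chip, GB
    hbm_tb_per_s: float  # HBM bandwidth per chip, TB/s


def rack_totals(chip: Accelerator, chips_per_rack: int) -> dict:
    """Naive upper-bound aggregates: per-chip figures times chip count.

    Units are scaled by 1000 (PFLOPS -> EFLOPS, GB -> TB, TB/s -> PB/s).
    """
    return {
        "fp4_exaflops": chip.fp4_pflops * chips_per_rack / 1000,
        "hbm_tb": chip.hbm_gb * chips_per_rack / 1000,
        "hbm_pb_per_s": chip.hbm_tb_per_s * chips_per_rack / 1000,
    }


# Per-chip Vera Rubin figures from the summary above.
VR200 = Accelerator("Nvidia VR200", fp4_pflops=35, hbm_gb=288, hbm_tb_per_s=22)

if __name__ == "__main__":
    for key, value in rack_totals(VR200, chips_per_rack=72).items():
        print(f"NVL72 {key}: {value:.3f}")
```

For 72 chips this gives about 2.52 exaflops of FP4, 20.7 TB of HBM4 and 1.58 PB/s of aggregate memory bandwidth; these are ceilings from straight multiplication, not measured rack performance.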

Hyperscaler Custom Silicon: Google, Meta, AWS, and Microsoft [00:03:33]

  • Google: The grandfather of AI hardware is releasing Ironwood (TPU v7) on TSMC's N3E node, with over 100 billion transistors across two chiplets and 192 GB of HBM3E [00:03:50]. Google's true differentiator is its Optical Circuit Switches (OCS), which use physical mirrors to steer light between ports. These highly efficient optical interconnects enable "super pods" of up to 9,216 TPUs [00:04:08].
  • Meta: Now on its third generation, MTIA v3 uses TSMC's N3P node and features over 100 billion transistors [00:08:40]. Crucially, Meta is switching from LPDDR5X to HBM for this generation [00:08:47]. The chip primarily targets Meta's internal recommendation algorithms (Facebook, Instagram, Threads), letting the company reserve Nvidia GPUs for frontier model training while optimizing inference TCO in-house [00:08:57].
  • AWS (Amazon): Trainium 3 is designed for massive deployment, following the hundreds of thousands of Trainium 2 chips already successfully deployed in AWS data centers located in Canton and New Carlisle [00:09:35]. Built on TSMC's N3P, it holds 125 billion transistors and 144 GB of HBM3E [00:09:53]. It has garnered massive external customer adoption, with OpenAI committing to use 2 Gigawatts of Trainium compute [00:10:22].
  • Microsoft: Maia 200 is a massive 825 mm² chip packing 140 billion transistors on TSMC's N3P node [00:10:48]. Equipped with 216 GB of HBM3, it is optimized for FP8 and FP4 inference workloads, delivering 5 to 10 petaflops to run Microsoft's in-house models and future ChatGPT models [00:11:05].
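Google's pitch for OCS is that thousands of TPUs can be pooled into one machine. A minimal sketch of what that pooling means for memory, using the Ironwood figures above (192 GB of HBM3E per TPU, up to 9,216 TPUs per super pod). The 2-trillion-parameter, 8-bit model is a hypothetical workload, and the chip count assumes perfectly even sharding with nothing but weights in memory.

```python
import math

# Ironwood (TPU v7) figures quoted in the summary.
IRONWOOD_HBM_GB = 192        # HBM3E per TPU
SUPERPOD_MAX_TPUS = 9_216    # max TPUs joined by Optical Circuit Switches


def pooled_hbm_pb(chips: int, hbm_gb_per_chip: float) -> float:
    """Total HBM across a pod in petabytes (decimal units), ignoring any
    memory reserved by the runtime."""
    return chips * hbm_gb_per_chip / 1e6


def min_chips_for_weights(weights_gb: float, hbm_gb_per_chip: float) -> int:
    """Smallest chip count whose combined HBM can hold the weights,
    assuming perfectly even sharding and nothing else in memory."""
    return math.ceil(weights_gb / hbm_gb_per_chip)


if __name__ == "__main__":
    total = pooled_hbm_pb(SUPERPOD_MAX_TPUS, IRONWOOD_HBM_GB)
    print(f"Full super pod HBM: {total:.3f} PB")
    # Hypothetical 2-trillion-parameter model at 1 byte (8 bits) per weight.
    weights_gb = 2_000e9 * 1 / 1e9
    print("TPUs needed just to hold weights:",
          min_chips_for_weights(weights_gb, IRONWOOD_HBM_GB))
```

At full scale that is roughly 1.77 PB of pooled HBM3E, while the hypothetical model's weights need at least 11 TPUs before any KV cache or activations are counted, a small slice of one super pod.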

The Third GPU Player: Intel [00:11:20]

  • Intel is making another push into the AI GPU space with Jaguar Shores, potentially targeting a 2027 release [00:11:28].
  • The chip boasts a highly competitive paper specification, leveraging Intel's 18A process node to pack 175 billion transistors alongside 288 GB of HBM4 [00:11:34].
  • Despite strong specs, Intel faces immense pressure to prove it can deliver both the hardware and the necessary software support to viably compete as a third major GPU player [00:11:50].

Alternative Architectures: Cerebras & Groq [00:04:55]

  • Cerebras: The Wafer Scale Engine 3 (WSE-3) uses an entire silicon wafer on TSMC's N4P node to hold 4 trillion transistors [00:05:28]. Its 44 GB of on-chip SRAM delivers 21 petabytes per second of memory bandwidth [00:05:11]. Although 44 GB is becoming a limiting factor for the largest models, the WSE-3 remains dominant for ultra-fast serving, and the industry is keeping its "fingers crossed" for a WSE-4 announcement in 2026 [00:05:38].
  • Groq: Taking a radically different approach, Groq's LPU (Language Processing Unit) relies on deterministic execution to eliminate variable latency [00:06:37]. The current generation uses an older 14nm GlobalFoundries process with 55 billion transistors and just 230 MB of ultra-fast SRAM placed right next to the compute cores, so no external memory is used at all [00:06:21]. Underscoring both the threat and the value of this specialized inference architecture, Nvidia acquired the company for $20 billion [00:06:59]. A highly anticipated second-generation LPU on Samsung's 4nm process is on the horizon [00:07:06].
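The SRAM trade-off in these two bullets can be put in numbers. The sketch below uses the quoted capacities (44 GB on the WSE-3, 230 MB per first-generation LPU) and the WSE-3's 21 PB/s bandwidth; the 70B-parameter model stored at 8 bits per weight is a hypothetical example, and activations, KV cache and any memory the real systems reserve are ignored.

```python
import math

# On-chip SRAM per device, as quoted in the summary.
SRAM_BYTES = {
    "Cerebras WSE-3": 44e9,     # 44 GB
    "Groq LPU (gen 1)": 230e6,  # 230 MB
}

WSE3_BANDWIDTH_BYTES_PER_S = 21e15  # 21 PB/s


def devices_to_hold(params: float, bytes_per_param: float, sram_bytes: float) -> int:
    """Minimum device count whose combined SRAM holds the weights, assuming
    perfectly even sharding and no room reserved for activations or KV cache."""
    return math.ceil(params * bytes_per_param / sram_bytes)


def full_sweep_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read every byte of memory once at peak bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model
    for name, sram in SRAM_BYTES.items():
        n = devices_to_hold(params, 1.0, sram)  # 1 byte = 8-bit weights
        print(f"{name}: {n} device(s) to hold 70B params at 8 bits")
    t = full_sweep_seconds(SRAM_BYTES["Cerebras WSE-3"], WSE3_BANDWIDTH_BYTES_PER_S)
    print(f"WSE-3 full-SRAM sweep: {t * 1e6:.2f} microseconds")
```

Under these assumptions the weights need 2 WSE-3 wafers but 305 first-generation LPUs, and the WSE-3 can read its entire SRAM in roughly 2.1 microseconds. Small per-chip memory means an SRAM-only design has to spread a model across many chips.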

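Deterministic execution, as described in the Groq bullet above, can be illustrated with a toy scheduler. When every operation has a fixed cost and its operands already sit in on-chip SRAM, a compiler can compute the exact cycle on which each step starts, so end-to-end latency is known before the program runs. The op names, cycle counts and the jitter range for the contrasting "dynamic" case are invented for illustration; this is not Groq's compiler, ISA or timing.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    cycles: int       # hypothetical fixed cost, operands already in on-chip SRAM
    deps: tuple = ()  # names of ops that must finish first


def static_schedule(ops):
    """Assign each op a start cycle ahead of time ("compile time").

    Every op's cost is fixed and known, so each start/finish cycle, and
    therefore end-to-end latency, is fully determined before execution.
    Ops must be listed in dependency order.
    """
    finish, schedule = {}, {}
    for op in ops:
        start = max((finish[d] for d in op.deps), default=0)
        schedule[op.name] = start
        finish[op.name] = start + op.cycles
    return schedule, max(finish.values())


def dynamic_latency(ops, rng, fetch_jitter=(0, 40)):
    """Same dataflow, but each op first waits on a variable-latency
    external-memory fetch (hypothetical jitter), so the total changes
    from run to run."""
    finish = {}
    for op in ops:
        start = max((finish[d] for d in op.deps), default=0)
        finish[op.name] = start + rng.randint(*fetch_jitter) + op.cycles
    return max(finish.values())


# Hypothetical toy dataflow graph for one inference step.
OPS = [
    Op("embed", 4),
    Op("pos_encoding", 10),
    Op("matmul_qkv", 20, ("embed", "pos_encoding")),
    Op("attention", 30, ("matmul_qkv",)),
    Op("matmul_out", 20, ("attention",)),
    Op("mlp", 40, ("matmul_out",)),
]

if __name__ == "__main__":
    sched, total = static_schedule(OPS)
    print("static start cycles:", sched)
    print("static latency (every run):", total)
    rng = random.Random(0)
    print("dynamic latencies:", [dynamic_latency(OPS, rng) for _ in range(5)])
```

`static_schedule` returns the same start cycles and a 120-cycle total on every call, while `dynamic_latency` varies between runs because each op first waits on a randomly delayed fetch, which is the run-to-run variability a deterministic design aims to remove.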
The Reference Vault

4. Data & Figures

Data Point | Value | Context | Timestamp
Qualcomm AI 100 Deployment | 1,024 chips | Small-scale, older cluster deployment mentioned for context. | [00:01:16]
Qualcomm AI200 Transistors | 70 billion | Transistor count for Qualcomm's N3E inference chip. | [00:01:23]
Qualcomm AI200 Memory | 768 GB LPDDR5X | Non-HBM memory approach to circumvent HBM supply constraints. | [00:01:35]
Qualcomm AI250 Bandwidth | 10x increase | Expected effective memory bandwidth gain from compute-near-memory. | [00:02:01]
AMD MI455X Transistors | 320 billion | Distributed across a multi-chiplet logic package. | [00:02:41]
AMD MI455X Chiplet Config | 12 chiplets | Mix of 2nm and 3nm logic chiplets using 3.5D packaging. | [00:02:41]
AMD MI455X Memory Capacity | 432 GB HBM4 | Next-generation HBM4; a large memory buffer targeting Nvidia's perceived weakness. | [00:02:51]
AMD MI455X Memory Bandwidth | ~20 TB/s | Very high bandwidth to feed the 320B-transistor package. | [00:03:01]
Google Ironwood (TPU v7) Transistors | >100 billion | Spread across two large compute chiplets on TSMC N3E. | [00:03:50]
Google Ironwood Memory | 192 GB HBM3E | High Bandwidth Memory capacity of the TPU v7. | [00:03:59]
Google Super Pod Scale | 9,216 TPUs | Maximum TPU count connected via Optical Circuit Switches. | [00:04:16]
Cerebras WSE-3 Memory Capacity | 44 GB SRAM | Entirely on-chip static RAM; no external memory. | [00:05:11]
Cerebras WSE-3 Bandwidth | 21 PB/s | Theoretical wafer-scale memory bandwidth, unmatched in this round-up. | [00:05:20]
Cerebras WSE-3 Transistors | 4 trillion | Fabricated on a single wafer on TSMC's N4P node. | [00:05:28]
Groq 1st Gen LPU Transistors | 55 billion | Manufactured on an older 14nm GlobalFoundries node. | [00:06:21]
Groq 1st Gen LPU Memory | 230 MB SRAM | Very small but extremely low-latency memory placed next to compute. | [00:06:37]
Groq Acquisition Value | $20 billion | Price Nvidia paid to acquire Groq and its deterministic LPU technology. | [00:06:59]
Groq 2nd Gen LPU Node | Samsung 4nm | Expected foundry and node for Groq's upcoming second generation. | [00:07:06]
Nvidia VR200 Compute | 35 PFLOPS (FP4) | Raw compute per Vera Rubin package. | [00:07:32]
Nvidia VR200 Memory Capacity | 288 GB HBM4 | Standard memory configuration for the baseline Vera Rubin chip. | [00:07:32]
Nvidia VR200 Bandwidth | 22 TB/s | Target speed of the Vera Rubin memory interface. | [00:07:41]
Nvidia NVL72 Rack Scale | 72 superchips | Number of VR200 chips integrated into a single scale-up rack. | [00:07:49]
Nvidia Rubin Ultra Memory | 1 TB HBM | Upgraded VR200 variant pushing memory capacity further. | [00:08:04]
Meta MTIA v3 Transistors | >100 billion | Manufactured on TSMC's N3P node. | [00:08:40]
AWS Trainium 2 Deployments | Hundreds of thousands | Existing scale of Trainium 2 chips in AWS data centers. | [00:09:35]
AWS Trainium 3 Transistors | 125 billion | Built on TSMC's N3P node. | [00:09:53]
AWS Trainium 3 Memory | 144 GB HBM3E | Memory spec of the combined training and inference ASIC. | [00:09:53]
OpenAI AWS Compute Deal | 2 GW | Power scale of Trainium compute OpenAI is committing to use. | [00:10:22]
Microsoft Maia 200 Die Size | 825 mm² | Very large die, close to the ~858 mm² reticle limit. | [00:10:48]
Microsoft Maia 200 Transistors | 140 billion | Fabricated on TSMC N3P. | [00:10:48]
Microsoft Maia 200 Memory | 216 GB HBM3 | Memory spec of Microsoft's in-house inference chip. | [00:10:56]
Microsoft Maia 200 Compute | 5 to 10 PFLOPS | Performance optimized for FP8 and FP4 math. | [00:11:05]
Intel Jaguar Shores Node | 18A | Advanced Intel manufacturing node targeted for its upcoming GPU. | [00:11:34]
Intel Jaguar Shores Transistors | 175 billion | Spec-sheet transistor count for Intel's 2027 GPU contender. | [00:11:34]
Intel Jaguar Shores Memory | 288 GB HBM4 | Competitive on-paper memory specification. | [00:11:43]

5. Core Frameworks & Mental Models

  • Deterministic Execution (The Groq Model): [00:06:37] Traditional GPUs suffer variable latency from dynamic scheduling and external memory fetches. Groq instead designs hardware in which every operation runs on an exact, predictable clock-cycle schedule. Because all memory is on-chip (SRAM) and the arrival time of every piece of data is known in advance, the compiler can orchestrate compute precisely, eliminating variable latency for AI inference.
  • Scale-out vs. Scale-up via Optical Interconnects: [00:04:08] Google's architectural moat relies heavily on its Optical Circuit Switches (OCS). Instead of trying to put thousands of chips on a massive copper bus (which is limited by physical distance), Google uses physical mirrors to route data as light. This lets it build tightly integrated "super pods" of up to 9,216 TPUs that function as a unified compute engine.
  • The TCO / Margin Substitution Strategy (The Meta Model): [00:09:06] Meta uses expensive, general-purpose Nvidia hardware strictly for the unpredictable, intensive task of training frontier models, while deploying its cheaper, specialized MTIA silicon for fixed, high-volume internal recommendation workloads. This dual-track strategy improves margins by optimizing Total Cost of Ownership (TCO) wherever workloads are known and stable.

6. Anecdotes

  • Google's Bet on AI over Blockchain: [00:03:40] The host highlights Google's foresight by contrasting its 2015 launch of the first TPU with the broader tech industry's behavior at the time. While much of the tech ecosystem was caught up in the blockchain and cryptocurrency hype cycle, Google was quietly laying the physical infrastructure for the AI revolution years before it entered mainstream consciousness.
  • Nvidia's $20 Billion Groq Acquisition: [00:06:59] Highlighting the value of Groq's radical architectural departure (abandoning external memory in favor of deterministic execution), the host emphasizes that Nvidia, the Goliath of AI hardware, acquired the startup for $20 billion. The deal signals that even the undisputed leader in AI hardware sees specialized inference architectures as a serious threat.

7. References & Recommendations

  • Companies & Ecosystems Mentioned: Nvidia, AMD, Qualcomm, Google, Cerebras, Groq, Meta, Amazon (AWS), Anthropic, OpenAI, Microsoft, Intel, TSMC, GlobalFoundries, Samsung.
  • Key Chips & Products Referenced: Qualcomm AI 100/AI200/AI250, AMD MI455X & Helios Rack, Google Ironwood (TPU v7), Cerebras WSE-3 & WSE-4, Groq LPU (Gen 1 & Gen 2), Nvidia Vera Rubin (VR200), NVL72 Rack & Rubin Ultra, Meta MTIA v3, AWS Trainium 2 & Trainium 3, Microsoft Maia 200, Intel Jaguar Shores.
  • Recommended SemiAnalysis Literature/Tools:
    • SemiAnalysis article on AMD's AI Strategy (Deep Dive into MI455X & Helios)
    • SemiAnalysis TPU v7 Deep Dive
    • SemiAnalysis Vera Rubin Deep Dive
    • SemiAnalysis Trainium 3 Deep Dive (with TCO numbers)
    • SemiAnalysis Accelerator and HBM Model (Market shipment numbers and ASPs)
    • SemiAnalysis Inference X Dashboard (Real-world performance benchmarks)