On this page


  • Speakers & Credentials [00:00:06]
  • 1. Executive Summary [00:00:06]
  • 2. Chronological Table of Contents [00:00:06]
  • 3. Detailed Thematic Summary
  • The Shift from Code Gen to Platforms [00:00:06]
  • Inference X and the Automation of Pareto Curves [00:03:11]
  • Kernel Optimizations, Evaluation Variance, and Compute Economics [00:05:29]
  • GTC Keynote Triumphs [00:08:00]
  • The Reference Vault
  • 4. Data & Figures [00:00:51]
  • 5. Core Frameworks & Mental Models [00:02:30]
  • 6. Anecdotes [00:01:17]
  • 7. References & Recommendations [00:00:32]

Technology/March 31, 2026/9 min read/youtu.be

Waleed Atallah (Makora) x Dylan Patel | Researcher Conversations at GTC | SemiAnalysis


"I think the entire Silicon Valley, like 80% of the startups here, are starting to rethink everything now that coding has come this far." - Waleed Atallah [00:00:51]

"The point is that you cannot have code generation itself be your product. I think we're going to see a lot of basically taking the codegen capability and then making these kind of end-to-end platforms." - Waleed Atallah [00:02:30]

References

  1. Original source (youtu.be)

Disclaimer: Original content is owned by or sourced from third parties. It does not represent the views of the 'Nuggets' platform or its team. AI is used extensively across this platform, including for summaries. Accuracy is not guaranteed, and there may be mistakes. Nothing on this platform is financial, legal, or investment advice. Do your own research. For complete disclosures, see the Terms of Use and the Full Disclaimer.


"We only implemented GSM8K so far, but even in GSM8K you see a 10% swing, even on different points on a curve for the same hardware, really because of something they did on the curve, something they did on the kernels." - Dylan Patel [00:06:06]

"I love open source models, but like, stop being poor. Just pay Claude, pay Anthropic, pay daddy." - Dylan Patel [00:07:06]

"More tokens per second is more intelligence, yeah, more intelligence per second, and more intelligence per second is more revenue eventually, even though it's mostly just a cost." - Dylan Patel [00:07:41]

"He spent five minutes on our slide, said my name. He only said two people's names in the entire keynote: mine and the Open Club bro." - Dylan Patel [00:08:06]


Speakers & Credentials [00:00:06]

  • Dylan Patel: Chief Analyst at SemiAnalysis, host of the conversation, and key contributor to the open-source "Inference X" project. He is a leading voice in AI hardware, semiconductor economics, and model deployment infrastructure.
  • Waleed Atallah: Founder/Executive at Makora (formerly Mako). His expertise lies in high-performance computing, kernel generation, and building end-to-end infrastructure platforms that leverage advanced LLMs for low-level code compilation and validation.

1. Executive Summary [00:00:06]

  • The conversation centers on a major shift in AI tooling: because raw code generation has become highly proficient, startups must move from selling code generation as a product to building comprehensive workflow platforms around it.
  • Makora's strategic pivot exemplifies this trend: the company is moving up the stack and treating ever-improving LLM coding ability (e.g., Claude Code and Opus 4.6) as a permanent tailwind rather than as its core offering.
  • Dylan Patel also shares insights from the Inference X project, showing how low-level kernel optimizations, chosen to place a model at a given point on the latency-throughput Pareto curve, can change benchmark accuracy, with swings of up to 10% in GSM8K scores on identical hardware.
  • The briefing closes with a blunt economic philosophy for AI startups: default to closed-source frontier models and pay for the fastest inference ("Fast Mode"), because more tokens per second means faster iteration, more intelligence per second, and, eventually, more revenue.

2. Chronological Table of Contents [00:00:06]

  • [00:00:06] Introduction at GTC & Makora's Strategic Pivot
  • [00:01:50] The Evolution of Kernel Generation and AI Coding Platforms
  • [00:03:11] Inference X, Pareto Curves, and Automated Model Configuration
  • [00:05:29] Infrastructure Partnerships & The Hidden Cost of Kernel Optimization on Evals
  • [00:06:59] Open Source vs. Closed Source Models & The "Fast Mode" Philosophy
  • [00:08:00] GTC Keynote Reflections & Conclusion

3. Detailed Thematic Summary

The Shift from Code Gen to Platforms [00:00:06]

  • Makora (formerly Mako) has executed a strategic pivot to move higher up the software stack [00:00:32].
  • Historically, the company focused on improving the quality of kernel generation and validation. However, as LLM capabilities have accelerated, Atallah estimates that roughly 80% of Silicon Valley startups are starting to rethink their entire strategies [00:00:51].
  • The catalyst for this industry-wide shift is the proficiency of coding models and tools like Claude Code, Codex, and Sonnet 3. As a striking example, the newly announced Opus 4.6 model was tasked with building a C compiler from scratch and completed it for roughly $20k in compute and token spend [00:01:17].
  • Consequently, raw code generation can no longer serve as a standalone product; companies must instead build comprehensive, end-to-end platforms that validate and deploy these kernels, treating advancing LLM capabilities as a permanent technological tailwind [00:02:30].

Inference X and the Automation of Pareto Curves [00:03:11]

  • The conversation turns to the operational reality of model deployment through the lens of Inference X, an open-source project that aims to achieve day-zero support for every new model [00:05:29].
  • The team has expanded the project rapidly, doubling or tripling the number of supported models over roughly the past month [00:04:02].
  • Finding optimal serving configurations requires mapping a latency-versus-throughput Pareto curve. Inference X automates this by prompting Claude to generate all candidate configurations from first principles and immediately discarding illogical setups (e.g., an ill-suited eight-stage pipeline-parallel, or PP8, configuration) [00:04:22].
  • Leading inference engines such as vLLM and SGLang still rely heavily on a manual, "engineer-in-the-loop" process to tune these configurations. By automating the culling of bad configs, Inference X reduces the compute needed to map the Pareto curve, moving the industry closer to dynamic, automated model submissions [00:04:46].
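The workflow above (enumerate candidate configurations, cull the illogical ones, then keep only the points on the latency-throughput Pareto frontier) can be sketched in a few lines of Python. This is a minimal, rule-based illustration under stated assumptions, not Inference X's actual code: the source says Inference X prompts Claude to generate and prune configurations, whereas here the `Config` type, the `candidate_configs` and `pareto_front` helpers, and the pruning rules (TP × PP must equal the GPU count, layers must split evenly across pipeline stages, the batch must be large enough to keep every stage busy) are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    """Hypothetical serving configuration for one node."""
    tp: int     # tensor-parallel degree
    pp: int     # pipeline-parallel degree
    batch: int  # max concurrent requests


def candidate_configs(num_gpus, num_layers, batches=(1, 8, 32, 128)):
    """Enumerate (TP, PP, batch) combinations and cull obviously bad ones.

    The pruning rules below are illustrative assumptions, not Inference X's.
    """
    degrees = [d for d in (1, 2, 4, 8) if d <= num_gpus]
    kept = []
    for tp, pp, batch in product(degrees, degrees, batches):
        if tp * pp != num_gpus:      # must use exactly the GPUs available
            continue
        if num_layers % pp != 0:     # layers must split evenly across stages
            continue
        if pp > 1 and batch < pp:    # too few requests to fill every stage
            continue
        kept.append(Config(tp, pp, batch))
    return kept


def pareto_front(points):
    """Keep the non-dominated (name, latency_ms, throughput_tok_s) points.

    A point is dominated if another point has latency <= and throughput >=,
    and is strictly better on at least one of the two axes.
    """
    front = []
    for name, lat, thr in points:
        dominated = any(
            l2 <= lat and t2 >= thr and (l2 < lat or t2 > thr)
            for _, l2, t2 in points
        )
        if not dominated:
            front.append((name, lat, thr))
    return sorted(front, key=lambda p: p[1])  # order by latency


# Hypothetical usage: prune configs, then (after benchmarking) keep the frontier.
configs = candidate_configs(num_gpus=8, num_layers=32)
measured = [("a", 10, 100), ("b", 20, 300), ("c", 25, 250), ("d", 30, 400)]
frontier = pareto_front(measured)
```

Pruning before benchmarking is where the compute savings come from: only surviving configurations need to be measured, and the frontier step then discards any measured point beaten on both latency and throughput (here, "c" is dominated by "b").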

Kernel Optimizations, Evaluation Variance, and Compute Economics [00:05:29]

  • Inference X is supported by a broad group of infrastructure partners providing free GPU compute, including CoreWeave, Microsoft, Oracle, Nebius, Crusoe, TensorWave, AMD, and NVIDIA [00:05:36].
  • A key finding from their validation work is that low-level kernel optimizations can materially change a model's measured accuracy. Only GSM8K has been implemented so far, yet even there, moving the same model to a different point on the Pareto curve on the exact same hardware produced as much as a 10% swing in scores [00:06:06].
  • This variance suggests that when users accuse major labs (like OpenAI or Anthropic) of silently "downgrading" models under heavy traffic, the perceived degradation may come from serving on throughput-optimized kernels rather than from any change to the model weights [00:06:44].
  • On the open-source landscape, Dylan expresses a strong preference for closed-source frontier models. He argues that a seemingly minor 3-to-6-month capability lag at the model level translates into a year or more of lost traction, customer feedback, and product iteration [00:07:20].
  • His recommended strategy is to maximize token-generation speed even at a higher immediate cost ("Fast Mode"). The reasoning: more tokens per second yields more intelligence per second, which eventually drives more revenue, even though in the near term it is mostly just a cost [00:07:41].
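To make the GSM8K swing concrete: the check amounts to scoring the same prompts against each serving configuration on the same hardware and comparing accuracies. The sketch below is a hypothetical illustration, not the Inference X harness; `extract_answer`, `accuracy`, and `eval_swing` are invented names, the responses are made up, and "take the last number in the response" is a common simplification for GSM8K, whose reference answers are numeric.

```python
import re


def extract_answer(text):
    """Return the last number in a model response, or None if there is none.

    Assumption: GSM8K gold answers are plain numbers, so the final number in
    the response is treated as the model's answer (a simplifying heuristic).
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def accuracy(responses, golds):
    """Fraction of responses whose extracted answer matches the gold answer."""
    correct = 0
    for response, gold in zip(responses, golds):
        answer = extract_answer(response)
        if answer is not None and abs(answer - gold) < 1e-6:
            correct += 1
    return correct / len(golds)


def eval_swing(scores_by_config):
    """Spread (max minus min) of accuracy across configs, in percentage points."""
    scores = list(scores_by_config.values())
    return 100 * (max(scores) - min(scores))


# Hypothetical usage: same model, same prompts, two points on the Pareto curve.
golds = [18, 3, 70]
latency_optimized = ["... so she makes 18 dollars", "The answer is 3", "70"]
throughput_optimized = ["... so she makes 18 dollars", "The answer is 4", "70"]
scores = {
    "latency_optimized": accuracy(latency_optimized, golds),
    "throughput_optimized": accuracy(throughput_optimized, golds),
}
print(f"swing: {eval_swing(scores):.1f} points")
```

Reporting the spread in percentage points keeps it distinct from a relative change; a real harness would run the full benchmark per configuration rather than three toy responses.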

GTC Keynote Triumphs [00:08:00]

  • The briefing closes with reflections on the NVIDIA GTC keynote.
  • Dylan highlights a professional milestone: NVIDIA CEO Jensen Huang spent about five minutes on a SemiAnalysis slide during the keynote [00:08:06].
  • Jensen also named only two individuals in the entire keynote, Dylan Patel and the "Open Club bro," a sign of SemiAnalysis's visibility in the hardware and AI ecosystem [00:08:06].

The Reference Vault

4. Data & Figures [00:00:51]

| Data Point | Value | Context | Timestamp |
|---|---|---|---|
| Silicon Valley startups rethinking strategy | ~80% | Atallah's rough estimate of the share of startups rethinking their approach as advanced coding models commoditize raw code generation. | [00:00:51] |
| Cost to generate a C compiler | ~$20k | Reported compute/token cost for Opus 4.6 to autonomously build and debug a C compiler. | [00:01:17] |
| Inference X model growth | 2x-3x | Growth in the number of models the project supports over roughly the last month. | [00:04:02] |
| Evaluation score variance (GSM8K) | Up to 10% | Benchmark swing on identical hardware, attributed to different kernel optimizations along the Pareto curve. | [00:06:06] |
| Open-source capability lag | 3-6 months | Estimated lag of open-source model capabilities behind frontier closed-source models. | [00:07:20] |
| Resulting product-cycle delay | 1+ years | Downstream delay in traction and product iteration attributed to building on weaker open-source models. | [00:07:20] |
| GTC keynote feature time | ~5 minutes | Time Jensen Huang spent on SemiAnalysis's slide during the GTC keynote. | [00:08:06] |

5. Core Frameworks & Mental Models [00:02:30]

  1. The "Platform Overlay" Imperative: [00:02:30] As foundational capabilities (like raw code generation) are commoditized by frontier models, startups cannot rely on core generation as their moat. Instead, they must build end-to-end validation platforms and workflows that treat ongoing AI improvements as a persistent, compounding tailwind.
  2. The Intelligence-to-Revenue Conversion ("Fast Mode" Economics): [00:07:41] A framework arguing against cost-cutting on inference. It posits that paying for faster inference yields "more tokens per second," which equates to "more intelligence per second." The faster iteration cycle enables faster shipping, faster feedback, and, eventually, more revenue, even though in the near term the spend is, in Patel's words, "mostly just a cost."
  3. The Kernel-Evaluation Discrepancy: [00:06:06] A mental model for understanding perceived model degradation. A model's effective intelligence is not fixed by its weights alone; it also depends on low-level serving infrastructure. Optimizing kernels for throughput versus latency along a Pareto curve can shift mathematical-reasoning scores (e.g., a 10% GSM8K swing), changing how a model "feels" to end users without any change to the network's weights.
  4. The Capability/Traction Time Dilation: [00:07:20] A strategic warning for builders: saving money by using open-source models that are "only 3 to 6 months behind" the state of the art carries a compounding penalty. Because weaker models typically demand more debugging and deliver a worse user experience, a 3-to-6-month capability gap can translate into a delay of a year or more in market traction and product-development cycles.

6. Anecdotes [00:01:17]

  • Opus 4.6 Building a C Compiler: [00:01:17] To illustrate how far coding capabilities have progressed, the speakers noted that the new Opus 4.6 model was tasked with writing a C compiler. The model ran autonomously, iterating on and fixing its own bugs, and ultimately completed this complex systems-engineering task at a cost of roughly $20k in tokens.
  • The GTC Keynote Feature: [00:08:06] Dylan Patel recounted how NVIDIA CEO Jensen Huang spent five minutes on a SemiAnalysis slide during the GTC keynote and named only two people in the entire presentation: Dylan and "the Open Club bro."

7. References & Recommendations [00:00:32]

  • Companies & Organizations: Makora (formerly Mako), SemiAnalysis, CoreWeave, Microsoft, Oracle, Nebius, Crusoe, TensorWave, AMD, NVIDIA, Anthropic, OpenAI, Hugging Face.
  • Models & Software Tools: Claude Code, Codex, Sonnet 3, Opus 4.6, GPT-5, PyTorch, YDP disaggregated prefill/decode, vLLM, SGLang.
  • Benchmarks: GSM8K (grade-school math word-problem benchmark for LLMs).
  • Key People: Jensen Huang (NVIDIA CEO).