Session

The CXL Fabric End-Game: Bandwidth Realities and Networked Memory for AI Scale

Speakers

PJ Waskiewicz

Label

Moonshot

Session Type

Talk

Description

Industry attention on Compute Express Link (CXL) has centered on memory expansion devices that use CXL.mem to add capacity. As large-scale AI models continue to demand massive, distributed memory footprints, the conversation has naturally shifted toward using CXL.mem to architect network-attached memory pools.

However, current implementations gloss over a severe architectural bottleneck: bandwidth. The common industry assumption is that CXL memory can simply be treated as a “far” NUMA node, which implies that latency is the primary hurdle. While the latency is manageable, CXL link bandwidth can be orders of magnitude lower than that of native, multi-channel DDR. This disparity in bandwidth available per core changes the economics and execution realities of scaling large AI models across distributed memory.

CXL 2.0 implementations have only recently introduced initial support for hardware-level memory pooling, so much of the work to realize true memory-coherent clusters remains ahead of us. This talk looks past simple memory expansion and evaluates what is required to achieve native cluster execution across fully memory-coherent CXL fabrics. It outlines the “moonshot” requirements necessary to bridge this gap, detailing the critical architectural work and future integration needed across switching infrastructure, firmware, and the Linux kernel networking and memory subsystems.