As you read this on your phone or laptop, spare a thought for the tiny slivers of silicon that power them. Computer chips, or integrated circuits, are technological marvels, packing billions of transistors into fingernail-sized pieces of semiconducting material.
And they just keep getting denser year after year, doubling transistor counts around every two years. This trend of exponential growth is known as Moore's law, named after Intel co-founder Gordon Moore. But eventually, physicists assure us, Moore's law must end as we reach the physical limits of how small transistors can get.
The Need for Speed
So what happens when you go the other way, making chips bigger instead of smaller? That's the radical experiment underway at Silicon Valley startup Cerebras Systems. They've created the world's largest computer chip, the Wafer-Scale Engine (WSE). With a die area of 46,225 square millimeters, roughly 21.5 centimeters on a side, it's bigger than an iPad mini!
Don't plan on slotting this into your iPhone anytime soon. The WSE is intended for specialized artificial intelligence workloads, crunching unprecedented volumes of data for today's machine learning models.
Let's dive deeper into Cerebras' huge chip and the engineering feats making it possible. What does a computer chip the size of your face enable? And could wafer-scale designs be the next evolution of Moore's law? I'll break down everything you need to know about the planet's biggest chip.
The Need for Specialized AI Hardware
First, why build such an enormous processor in the first place? Today, artificial intelligence is powering breakthroughs in fields from drug discovery to self-driving cars. But training complex AI models requires staggering amounts of computation and data.
We're talking billions or trillions of parameters. Internet giants like Google and Microsoft invest heavily in specialized AI supercomputers made up of thousands of graphics processing units (GPUs) working in parallel.
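To get a feel for why that takes thousands of chips, here is a quick back-of-envelope sketch. The 16-bytes-per-parameter figure is a common rule of thumb for mixed-precision training with an Adam-style optimizer, not a vendor number:

```python
# Back-of-envelope memory footprint of training state for large models.
# Rule of thumb: mixed-precision training with an Adam-style optimizer needs
# roughly 16 bytes per parameter (fp16 weights + gradients, fp32 master copy
# plus two optimizer moments). Illustrative figures only.
BYTES_PER_PARAM = 16

for params in (1e9, 100e9, 1e12):  # 1 billion, 100 billion, 1 trillion parameters
    footprint_gb = params * BYTES_PER_PARAM / 1e9
    print(f"{params / 1e9:>6.0f}B params -> ~{footprint_gb:,.0f} GB of training state")
```

A trillion-parameter model needs many terabytes of live training state before you even count activations, which is why it gets spread across huge clusters of accelerators.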
But all those discrete GPU chips still aren't efficient for model training. There's overhead in moving data back and forth between separate chips. Heat is another major limiting factor when GPUs and CPUs are packed tightly together. Plus you need an entire server rack just to house all that hardware!
Engineers at Cerebras Systems knew there had to be a better approach. They realized the answer was a single enormous chip, bigger than any processor ever manufactured. That became the Wafer-Scale Engine.
Introducing the Wafer-Scale Engine
The second-generation WSE packs 850,000 cores and 2.6 trillion transistors onto its 46,225 square millimeter surface. For comparison, Nvidia's powerful A100 GPU used for AI acceleration has just 54 billion transistors!
Notably, Cerebras achieves that density without the most advanced process available: the current WSE is built on TSMC's 7 nm node, a step behind the leading edge, which speaks to the strength of the design.
The Wafer Scale Engine compared to the giant Fujitsu processor and an average CPU.
(Image credit: Cerebras)
| Specification | Wafer-Scale Engine |
|---|---|
| Die Size | 46,225 mm² |
| Manufacturing Process | TSMC 7 nm |
| Transistor Count | 2.6 trillion |
| On-Chip Memory | 40 GB high-bandwidth SRAM |
| Cores | 850,000 Sparse Linear Algebra Compute (SLAC) cores |
| Interconnect | Swarm communication fabric |
| Peak Performance | 9.6 PetaFLOPS FP32 |
Key technical specs for the 2nd generation WSE
So what does a trillion-plus transistor budget enable? Each of those 850,000 cores provides execution resources to run model training workloads in parallel. Backing this computational muscle is a whopping 40 GB of on-chip SRAM. And it feeds data to the cores at a lightning-fast 20 petabytes per second! In computing terms, the WSE is an absolute beast.
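Dividing those headline specs out per core gives a feel for the balance of the design. This is rough arithmetic on the numbers quoted above, nothing more:

```python
# Rough per-core arithmetic from the WSE-2 headline specs quoted above.
CORES = 850_000
SRAM_BYTES = 40e9       # 40 GB of on-chip SRAM
SRAM_BANDWIDTH = 20e15  # 20 PB/s aggregate memory bandwidth

print(f"SRAM per core:      ~{SRAM_BYTES / CORES / 1e3:.0f} KB")
print(f"Bandwidth per core: ~{SRAM_BANDWIDTH / CORES / 1e9:.1f} GB/s")
```

Roughly 47 KB of local memory and over 20 GB/s of bandwidth per core means each tiny core sits right next to the data it works on.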
No surprise, this monster chip consumes a lot of electricity: the CS-2 system built around it draws on the order of 20 kilowatts. Cerebras tames that heat with a closed-loop liquid cooling system that pumps coolant across a cold plate behind the wafer, much like a supercomputer. That keeps the silicon within thermal limits so the power budget goes toward sustained AI throughput instead of throttling. Pretty clever!
Now you might be wondering: at that size, how do they manufacture a flawless chip? The short answer is that they don't. Cerebras partners with TSMC to leverage its advanced wafer fabrication, but some defects are inevitable on a die this large. So the WSE is designed with spare cores and redundant fabric links, allowing it to route around any flaws found during test. That built-in redundancy is what makes such a giant processor commercially viable.
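Cerebras has not published the internals of its redundancy scheme, so here is only a conceptual sketch of the idea: expose a slightly smaller logical core count and map it onto the physical grid while skipping positions flagged as defective during wafer test. All names and numbers below are hypothetical:

```python
# Conceptual sketch of routing around defects: expose fewer logical cores than are
# physically fabricated, and map logical indices onto the physical array while
# skipping positions flagged as bad during wafer test. This illustrates the idea
# of redundancy only; it is not Cerebras' actual mechanism.
def build_core_map(num_physical: int, defective: set[int], num_logical: int) -> list[int]:
    good = [core for core in range(num_physical) if core not in defective]
    if len(good) < num_logical:
        raise RuntimeError("not enough spare cores to cover the defects")
    return good[:num_logical]  # logical core i runs on physical core good[i]

# Hypothetical example: 1,000 cores fabricated, 12 found defective, 980 exposed.
bad_cores = {17, 42, 99, 103, 250, 311, 404, 500, 666, 777, 808, 909}
core_map = build_core_map(num_physical=1_000, defective=bad_cores, num_logical=980)
print(core_map[:5], "...")  # first few logical-to-physical assignments
```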
The key to the WSE's blazing speed is the Swarm interconnect fabric etched onto the silicon wafer itself. This network ties all 850,000 cores together for rapid, low-latency communication critical to distributed training algorithms. Information on gradient updates can ripple almost instantly across the entire chip surface without bottlenecking. And adjacent SRAM banks feed nearby computation units with minimal delay or energy costs. It's all densely packed together by design.
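As a toy illustration of the communication pattern the Swarm fabric is built to accelerate, here is a minimal data-parallel step in plain NumPy. On the WSE the gradient exchange would ride the on-chip fabric; the sketch just shows what has to be communicated:

```python
import numpy as np

# Toy data-parallel step: each "core" computes gradients on its own slice of data,
# then every core needs the *average* gradient before updating its copy of the weights.
# On the WSE that exchange rides the on-chip Swarm fabric rather than PCIe or Ethernet.
rng = np.random.default_rng(0)
num_cores, num_weights = 8, 4                # tiny toy sizes
local_grads = rng.normal(size=(num_cores, num_weights))

avg_grad = local_grads.mean(axis=0)          # the all-reduce result every core must see
weights = np.zeros(num_weights)
weights -= 0.1 * avg_grad                    # one synchronous SGD update
print("average gradient:", avg_grad)
print("updated weights: ", weights)
```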
Real-World Performance Advantages
Of course, all these flashy specs don't mean much on their own. The proof is in the pudding: can the WSE accelerate actual AI development?
The answer is a resounding yes! Cerebras has published benchmarks showing the CS-2 server with a single WSE training popular natural language understanding models 80x faster than an Nvidia DGX A100 server. In some cases that means cutting training cycles from weeks down to minutes!
Where does this tremendous performance advantage come from? It mainly stems from the WSE minimizing data movement latencies. There's no need to shuttle data between separate DRAM, SRAM caches, and external GPUs. Instead, inputs flow quickly and directly into the math engines thanks to the unified architecture.
Plus the unified 40 GB of SRAM acts as a lightning-fast buffer pool feeding all those parallel cores simultaneously. In essence, Cerebras achieves its speedups by maximizing the density of communication between components that were previously discrete pieces. That monolithic approach is the antithesis of conventional server chips from Intel, AMD, and others, which rely on divided resources and external connections.
When optimized well, integration wins.
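To put the data-movement argument in concrete numbers, here is a quick back-of-envelope sketch. The WSE bandwidth is the 20 PB/s figure quoted above; the GPU memory and interconnect numbers are typical ballpark values rather than any specific product's spec:

```python
# Time to sweep a 10 GB working set once at different bandwidths.
WORKING_SET_BYTES = 10e9

bandwidths = [
    ("WSE on-chip SRAM (~20 PB/s)", 20e15),   # figure quoted above
    ("GPU HBM (~2 TB/s)", 2e12),              # typical ballpark, not a specific product
    ("Inter-GPU link (~0.6 TB/s)", 0.6e12),   # typical ballpark, not a specific product
]
for name, bw in bandwidths:
    microseconds = WORKING_SET_BYTES / bw * 1e6
    print(f"{name:<32} {microseconds:>12,.1f} us")
```

Sweeping the same working set takes about half a microsecond on-chip versus milliseconds once data has to cross external memory and links, and that gap compounds over millions of training steps.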
Optimized for Modern AI Workloads
Now there is a downside to going against the modular computing grain – reduced flexibility. The WSE is purpose-built for AI and machine learning rather than general-purpose use. But given how quickly artificial intelligence workloads are growing, being high performance yet specialized makes sense.
Plus customers can mix and match other processors to handle alternate tasks.
In terms of target applications, natural language processing is low-hanging fruit. Whether it's parsing search queries or analyzing social media conversations, NLP powers today's most popular digital experiences. And symbolic AI for more contextual understanding of language is an emerging field needing lots of model training cycles.
Medical imaging diagnostics via computer vision neural networks is another promising use case. Healthcare companies want to flag abnormalities in scans rapidly to aid doctors. Drug designers are also harnessing AI to discover new molecular combinations. Wherever enormous datasets meet huge models, the WSE looks to accelerate exploration.
Ecosystem Partners Maximizing Impact
Cerebras collaborates closely with industry leaders to ensure their wafer-scale systems integrate optimally for real-world applications.
The CS-2 with WSE-2 accelerators ships with Canonical's Ubuntu Linux distribution. This delivers compatibility with major AI software frameworks like TensorFlow and PyTorch, so partners can build models leveraging cutting-edge techniques out of the box.
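As a rough illustration of what framework compatibility means in practice, here is a completely standard PyTorch training step. The point is that model code like this does not need to be rewritten for the hardware; the Cerebras-side integration details are deliberately left out here rather than guessed at:

```python
import torch
import torch.nn as nn

# A completely standard PyTorch model and one training step. Nothing here is
# Cerebras-specific: the selling point of framework compatibility is that model
# code like this stays the same while the vendor's stack maps it to the hardware.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)                  # dummy batch of features
y = torch.randint(0, 10, (32,))           # dummy labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.3f}")
```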
On the workflow side, Cerebras has validated its solution with container orchestrators like Docker Swarm. These streamline moving trained models into production without dependency issues. And Kubeflow adds automated MLOps for monitoring ongoing model quality over time.
Initial customers span industries from automotive to pharmaceuticals. For example, Jaguar Land Rover uses Cerebras to accelerate design simulations and synthetic data generation powered by neural networks. In drug discovery, startups like Atomwise need vast experimentation – trying new molecular structures guided by AI predictions. That demands data throughput and model capacity only possible with systems like the CS-2 based on the WSE architecture.
Supercomputing Scale AI with Andromeda
Now, one Wafer-Scale Engine is powerful by itself. But like CPUs and GPUs, multiple WSEs can interconnect for additive gains when training ever-larger models. This is the idea behind Cerebras' latest creation, Andromeda: grouping 16 of their flagship chips into one AI appliance with legitimate supercomputer performance.
Unveiled in late 2022, Andromeda already claims top marks on MLPerf, a leading industry benchmark for machine learning hardware, again outperforming racks of Nvidia's flagship A100 GPUs. How? By minimizing communication delays between WSE chips compared to traditional network links between compute nodes.
Cerebras achieves this using a high-bandwidth, low-latency Swarm fabric. So Andromeda isn't just 16 isolated WSEs. It pools them into one coherent system with tight data orchestration between chips. Optimized memory hierarchies keep adjacent components fed with fresh data while avoiding duplication. It's akin to a hive mind!
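Here is a rough sketch of the aggregate numbers, using the per-chip specs quoted earlier and an assumed scaling efficiency. Cerebras advertises near-linear scaling, but the 90% figure below is purely my assumption:

```python
# Rough aggregate numbers for a 16-WSE cluster like Andromeda, assuming the
# per-chip figures quoted earlier and a hypothetical scaling efficiency.
PER_WSE_CORES = 850_000
PER_WSE_PFLOPS = 9.6          # FP32 peak from the spec table above
NUM_WSE = 16
SCALING_EFFICIENCY = 0.9      # assumption; advertised scaling is "near-linear"

print(f"Total cores:        {PER_WSE_CORES * NUM_WSE:,}")
print(f"Peak (ideal):       {PER_WSE_PFLOPS * NUM_WSE:.1f} PFLOPS")
print(f"Peak (assumed 90%): {PER_WSE_PFLOPS * NUM_WSE * SCALING_EFFICIENCY:.1f} PFLOPS")
```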
Ultimately Andromeda promises to accelerate product timelines for partners in pharmaceutical, manufacturing, and research domains relying on AI algorithms. These customers have valuable enterprise data yet lack internal expertise to translate it into insights via custom models. By handling the intensive model building exercise, Cerebras hopes to speed up their real-world impact.
Economic Impact
As a private company, Cerebras does not disclose detailed financials publicly. However, we can analyze estimated market size and growth to model their potential revenue opportunity.
Globally, spending on AI systems hardware and software is projected to surpass $500 billion annually by 2024, according to IDC. Cerebras seems poised to capture a slice of this rapidly expanding market by serving clients unable to adopt AI otherwise.
Let's conservatively assume Cerebras garners 2% share over the next five years. That would equal over $5 billion in cumulative sales by 2027. And with expanding profit margins on their premium systems, Cerebras may realize $500 million in cumulative net income over the same period.
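Here is one way to sanity-check that arithmetic. The market sizes extend the IDC projection above, and the share ramp is entirely hypothetical:

```python
# Quick sanity check on the revenue scenario above. Market sizes extend the IDC
# projection quoted earlier; the ramp toward a 2% share is a purely hypothetical assumption.
market_bn = [500, 550, 600, 660, 720]            # assumed annual AI spend 2023-2027, in $B
share = [0.002, 0.005, 0.010, 0.015, 0.020]      # assumed share ramp toward 2%

annual = [m * s for m, s in zip(market_bn, share)]
print([f"${x:.1f}B" for x in annual])
print(f"Cumulative 2023-2027: ${sum(annual):.1f}B")  # comfortably clears the $5B figure
```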
Among hardware vendors at a similar growth stage, valuation multiples of around 6x revenue are common. That puts Cerebras on a path well above their last private valuation of $2.5 billion. I expect they will delay going public until crossing $100 million in annual recurring revenue, but with a war chest now above $750 million, they can fund growth until then.
Of course, the key business metric to watch will be case studies proving that AI projects accelerated on Cerebras deliver material ROI. If they can consistently demonstrate 50%+ productivity gains, customer adoption should soar. Saving months of computing time does wonders for a pharmaceutical firm's bottom line!
Comparisons with Top Supercomputer Chips
While the WSE stands alone in the AI accelerator market, it's interesting to compare its specs versus traditional supercomputing processors. These chips power the world's fastest Top500 supercomputers, applying simulation and modeling to scientific challenges.
AMD's newest EPYC "Genoa" CPU packs an impressive 96 cores and roughly 90 billion transistors across its chiplets, built on a 5 nm process, yet it is still dwarfed by the WSE. Nvidia's flagship H100 data-center GPU, announced in 2022, likewise looks small beside Cerebras' single-chip offering: 80 billion transistors on an 814 mm² die.
When it comes to maximizing cores and memory bandwidth at scale, Cerebras' Wafer-Scale Engine stands supreme. Of course, yields drop dramatically for dies much beyond 400 mm², which constrains how big AMD and Nvidia can push their chip dimensions. But the WSE shows what becomes economically feasible for specialized workloads, in exchange for supercomputing-class cooling requirements.
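The economics behind that size limit follow from the classic Poisson yield model, yield ≈ exp(−D·A). Here is a quick sketch; the defect density is an assumed ballpark value, not a TSMC figure:

```python
import math

# Classic Poisson yield model: yield ~ exp(-D * A), with D defects per cm^2 and A the
# die area in cm^2. D = 0.1 defects/cm^2 is an assumed ballpark, not a foundry figure.
DEFECT_DENSITY = 0.1  # defects per cm^2 (assumption)

for area_mm2 in (100, 400, 800, 46_225):  # small die, large die, ~reticle limit, WSE
    area_cm2 = area_mm2 / 100.0
    naive_yield = math.exp(-DEFECT_DENSITY * area_cm2)
    print(f"{area_mm2:>7} mm^2 die -> expected yield without redundancy: {naive_yield:.1%}")
```

At wafer scale the naive yield is effectively zero, which is exactly why the built-in redundancy described earlier is what makes the WSE manufacturable at all.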
The Road Ahead
Cerebras has aggressive plans to push wafer-scale computing forward over the next several years. They aim to enhance the core WSE architecture while expanding product options.
For the Wafer-Scale Engine itself, roadmaps call for new iterations with refined manufacturing processes, more cores, and memory capacity increases every 12-18 months. This follows the tempo of past Intel, Nvidia, and AMD datacenter chips riding smaller silicon geometries to higher transistor budgets.
Around the WSE, Cerebras plans new systems configurations to broaden the customer universe. Today only major corporations or well-funded research labs can afford a full-blown CS-2 setup with dedicated cooling infrastructure. By offering shared cloud access to Andromeda-like superclusters, Cerebras could democratize AI acceleration for small and mid-sized organizations. I expect them to offer this HPC-style option within 2 years.
Perhaps most revolutionary, Cerebras is pioneering multi-wafer packaging to interconnect several WSE dies. Today they bond silicon wafers to printed circuit boards carrying power regulators and data communication links around the core logic.
But tomorrow Cerebras envisions directly fusing multiple full-reticle-sized chips for additive computing resources. This wafer-to-wafer technique could enable connecting 4 or even 9 WSEs into one system with immense model capacity and training speed. Watch for those exotic multi-tile parts enabling new AI capabilities that push silicon to the very edge!
The Future of Wafer-Scale Computing
Looking ahead, innovations like the WSE and Cerebras' wafer-scale approach could disrupt an industry that has marched steadily smaller under Moore's law. There are signs Moore's law is plateauing as foundries shove ever more transistors into diminishing space on 300 mm wafers. Costs are skyrocketing too: new plants can run $20 billion plus!
Maybe expanding outward with oversized chips, managing heat through aggressive liquid cooling, is an alternate path. It certainly seems viable for select use cases like AI and HPC workflows needing tightly coupled communication across cores. Though I don't expect wafer-sized silicon to slot into iPhones anytime soon!
Ultimately, pioneering experiments like the WSE push computing capabilities forward through manufacturing and architectural advances. What we learn from designing and optimizing these exotic chips may well trickle down to influence commercial processors 5-10 years from now.
Moore's law has amazed before thanks to determined engineers – don't be shocked if wafer-scale computing amazes in the future too!
I hope you've enjoyed peering inside the mind-blowing Wafer-Scale Engine powering a new era of artificial intelligence. What wild inventions do you think could be on the horizon 20 years from now? Share your predictions with me on Twitter @tech_explainer. And don't hesitate to ask this chip expert any lingering questions!