Supercomputers represent the pinnacle of computing power, able to crunch through data-intensive simulations beyond the capabilities of everyday computers. But supporting these massive digital workloads requires remarkably powerful supercomputer chips customized for parallel processing across thousands of cores.
In this comprehensive guide, we’ll examine 5 record-breaking supercomputer processors powering the world’s fastest machines, delve into their cutting-edge architectures optimized for high performance computing, and consider what their computational capacity makes possible.
The Making of a Supercomputer Chip
Creating computer chips for this elite echelon of supercomputers necessitates engineering specialized designs and leveraging leading-edge manufacturing techniques to achieve unmatched levels of throughput, efficiency and scalability.
Several key considerations influence supercomputer processor architecture:
Parallelism
Running thousands of calculations simultaneously requires massively parallel chips boasting high core counts and interconnects between cores including:
- Many-core designs maximizing the number of simpler cores over complex cores for parallelism at scale. Top supercomputer chips pack in hundreds of cores.
- Ultra high-bandwidth links like AMD’s Infinity Fabric and Intel’s Ultra Path Interconnect enable low-latency communication not only between CPU cores and sockets but also with memory, network and accelerators.
- NVLink provides another fast interconnect from NVIDIA for attaching GPUs, essential for accelerated computing.
Specialized Processing
While consumer PC chips focus on single thread performance, supercomputer chips instead optimize for:
- High floating point performance essential for scientific computations
- AI acceleration features like bfloat16 throughput important for machine learning
- Wide vector pipelines and instruction sets including AVX-512 critical for parallel vector calculations
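To make the bfloat16 point concrete: the format keeps float32’s 8-bit exponent but only 7 mantissa bits, so a value can be converted by keeping the top 16 bits of its float32 encoding. The sketch below is a minimal illustration that assumes simple truncation (real hardware typically rounds to nearest even), and the helper name `to_bfloat16` is made up for this example.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Return x truncated to bfloat16 precision (hypothetical helper).

    bfloat16 is the upper 16 bits of an IEEE 754 float32: 1 sign bit,
    8 exponent bits and 7 mantissa bits. Truncation is an assumption made
    for simplicity; hardware usually rounds to nearest even instead.
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 & 0xFFFF0000  # zero the 16 low-order mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits16))[0]


for value in (1.0, 3.14159265, 1e30):
    print(value, "->", to_bfloat16(value))
```

The range of float32 survives (1e30 is still representable) while precision drops to two or three significant decimal digits, which is the trade machine learning workloads tend to tolerate well.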
Energy Efficiency
With supercomputers consuming megawatts of power across thousands of chips, efficiency is paramount:
- Large numbers of slower, simpler cores provide more operations per watt versus complex, power-hungry cores
- Optimized memory subsystems reduce expensive data movement
- Advanced manufacturing nodes shrink transistors and lower operating voltages, cutting the energy spent per operation
Now let’s profile 5 examples of these impressive supercomputing processors powering the world’s most capable HPC rigs.
AMD EPYC 7A53
Advancing processor technology relies not just on cutting-edge architectures but also continual manufacturing innovations to pack more performance into tiny silicon features.
The EPYC 7A53 is a custom 64-core part, code-named “Trento”, from AMD’s third-generation EPYC family. It pairs Zen 3 cores built on TSMC’s 7nm process with an I/O die reworked to link directly to AMD Instinct GPUs, keeping accelerators fed for growing HPC workloads like AI and analytics.
- 64 cores / 128 threads
- 2.0 GHz base frequency
- 256MB L3 cache + 32MB L2 cache per chip
- 8 DDR4 memory channels
- 228W TDP
- Infinity Fabric links connecting the CPU directly to the node’s GPUs
AMD’s newer EPYC generations push further: the Zen 4 “Genoa” chips move to TSMC 5nm and reach 96 cores, while the “Bergamo” lineup delivers up to 128 cores for incredible parallel throughput. AMD also offers variants with 3D V-Cache stacking, adding another 64MB of L3 atop each core die for rapid access to data.
All this bleeding-edge silicon design comes together in the Frontier exascale supercomputer at Oak Ridge National Laboratory. Spanning over 9,000 nodes, each pairing one EPYC CPU with four AMD Instinct MI250X GPUs, Frontier reached 1.1 exaflops on the LINPACK benchmark to seize the title of world’s fastest.
Fujitsu A64FX
While AMD leads the latest Top500 supercomputer rankings, the now second-place Fugaku system featuring Fujitsu’s A64FX chip held the top spot not long ago thanks to its custom high performance architecture tailored for massively parallel workloads.
Fujitsu packs 48 relatively simple compute cores into each A64FX node. The cores implement the Arm instruction set with the 512-bit Scalable Vector Extension (SVE). By emphasizing high memory bandwidth and low latency access, Fujitsu optimized data movement across cores to speed parallel computing rather than relying on complex branch prediction logic.
- 48 SVE cores
- 2.0 GHz normal mode, 2.2 GHz boost mode
- 32GB HBM2 memory
- Roughly 3 TFLOPS peak double precision performance (48 cores x 32 FLOPs per cycle x 2.0 GHz)
- High memory bandwidth (1024GB/s)
- Energy efficient at roughly 15 GFLOPS/watt on HPL
Over 158,000 Fujitsu A64FX nodes deliver 442 petaflops of 64-bit floating point muscle on HPL and about 2 exaflops on the mixed-precision HPL-AI benchmark. While each A64FX core lags the fastest server CPU cores, Fugaku’s enormous node count and system scalability showcase the value of balanced system design and software innovation over sheer per-core muscle.
Researchers in Japan relied on Fugaku’s tailored architecture for breakthroughs in COVID-19, disaster mitigation and other critical scientific fields.
IBM POWER9
While manufacturers like AMD push x86 performance to new heights, IBM’s POWER architecture offers its own twist on enterprise and scientific computing. POWER designs focus on accelerating data-first workloads with leading I/O throughput.
IBM’s POWER9 packs cores with abundant cache and bandwidth. Each chip delivers up to 24 cores with four-way simultaneous multithreading (SMT4), or 12 larger cores with SMT8, for up to 96 hardware threads per socket either way. With clocks reaching up to 4GHz, POWER9 rates roughly 500 to 750 GFLOPS of double-precision performance per chip.
- Up to 24 cores (SMT4) or 12 cores (SMT8) / 96 threads
- Clock speeds up to 4.0 GHz
- 120MB shared L3 cache per chip
- Tuned for high memory bandwidth
- 8 DDR4 memory channels on scale-out parts (up to 120 GB/s sustained)
- NVLink 2.0 direct links between POWER9 CPUs and NVIDIA Volta GPUs (25 GB/s per link in each direction)
- Enhanced vector capabilities (2x VSX8 with quad FMA)
Thousands of IBM’s POWER9 processors power elite supercomputers like Summit and Sierra, both long-time fixtures near the top of the Top500. Summit scores 148.6 petaflops on HPL (about 200 petaflops theoretical peak) and Sierra 94.6 petaflops. Both machines combine scale-up POWER9 nodes and high speed NVLink to NVIDIA GPUs for optimum data-intensive computing.
Sunway SW26010
While Sunway’s SW26010 chip may seem an underdog by specs alone, the magic comes from sheer scale – over 10 million cores in the Sunway TaihuLight supercomputer.
The SW26010 capitalizes on a “many core” design packing 260 cores onto each chip: 256 simple RISC compute cores plus four management cores that handle scheduling and coordination. While each core operates at a modest 1.45 GHz using older manufacturing technology, together they provide parallel throughput for certain supercomputing workloads rivaling processors with the latest advanced features.
- 256 compute cores + 4 management cores = 260 total per chip
- 1.45 GHz clock speed
- Shared distributed memory
- 32GB memory per chip
- 1.31PB aggregate memory distributed across system
- Customized high-speed interconnects
- Full-bandwidth cross chassis links
- Specialized topology minimizing latency between nodes
- 93 petaflops High Performance Linpack
By scaling over 40,000 homegrown SW26010 processors into its massively parallel architecture, Sunway TaihuLight achieves impressive supercomputing performance despite relying on older generation technology. While the 10-million-plus-core system has since slipped down the global HPC rankings, it showcases the benefits of purpose-built architectures for targeted workloads.
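The system-level totals follow directly from per-chip figures. Here is a quick sanity check, assuming 40,960 chips with 260 cores and 32 GB of memory each (figures taken from published TaihuLight specifications, so treat them as assumptions); `system_totals` is a name invented for this sketch.

```python
# Per-chip figures are assumptions based on published TaihuLight specs.
CHIPS = 40_960            # SW26010 processors, one per node
CORES_PER_CHIP = 260      # 256 compute cores + 4 management cores
MEMORY_PER_CHIP_GB = 32


def system_totals(chips, cores_per_chip, memory_per_chip_gb):
    """Return (total core count, total memory in decimal petabytes)."""
    total_cores = chips * cores_per_chip
    total_memory_pb = chips * memory_per_chip_gb / 1e6  # 1 PB = 1e6 GB
    return total_cores, total_memory_pb


total_cores, total_memory_pb = system_totals(
    CHIPS, CORES_PER_CHIP, MEMORY_PER_CHIP_GB
)
print(f"{total_cores:,} cores, {total_memory_pb:.2f} PB of memory")
```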
Intel Xeon E5-2692 v2
AMD and IBM chips may power more of today’s headline-grabbing systems, but Intel remains a juggernaut in HPC, pairing steady CPU enhancements with software innovations across its Xeon stack.
The E5-2692 v2 hails from Intel’s Ivy Bridge generation, leveraging 22nm process technology to fit 12 cores onto a high performance server-targeted die for about 211 GFLOPS of double precision throughput per chip (12 cores x 2.2 GHz x 8 FLOPs per cycle). While it has since been superseded by newer Intel architectures, thousands of E5-2692 v2 chips still serve as host CPUs in China’s Tianhe-2A, which reaches over 61 petaflops on HPL with the help of Matrix-2000 accelerators.
- 12 cores / 24 threads
- 30 MB shared L3 cache
- Quad DDR3 memory channels
- 2.2GHz base / 3.0GHz turbo freq
- 40 lanes of PCIe Gen 3 provide low latency add-in connections
- Advanced vector extensions (AVX)
By tuning the E5-2692 v2 to scale efficiently across fat nodes each with 24 DIMMs (384GB) of high speed memory, Tianhe-2A leverages Intel’s mature processor platform as a balanced HPC building block.
Benchmarking competitive merit between processors challenges even industry veterans. Weighing microarchitectures and implementations designed for radically different systems complicates direct comparisons. A mix of metrics provides perspective:
LINPACK HPL
The gold standard for ranking supercomputer performance remains High Performance Linpack (HPL), which times the solution of a dense system of linear equations and reports the result as FLOPS throughput. The AMD-powered Frontier recently topped the HPL charts at 1.1 EFLOPS.
System | Chip / Node | Power (MW) | HPL Rmax (PFLOPS) | Power Efficiency (GFLOPS/W) |
---|---|---|---|---|
Frontier | AMD EPYC 7A53 + AMD Instinct MI250X | 21.1 | 1,102.0 | 52.2 |
Fugaku | Fujitsu A64FX | 29.9 | 442.0 | 14.8 |
Summit | IBM POWER9 + NVIDIA Volta | 10.1 | 148.6 | 14.7 |
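Power efficiency is just HPL Rmax divided by power draw. The sketch below shows the unit conversion, using Rmax and power figures as we recall them from the June 2022 TOP500 list (assumptions worth checking against the published list); `gflops_per_watt` is a name invented for this example.

```python
def gflops_per_watt(rmax_pflops: float, power_mw: float) -> float:
    """Convert HPL Rmax (PFLOPS) and power draw (MW) to GFLOPS per watt."""
    gflops = rmax_pflops * 1e6  # 1 PFLOPS = 1e6 GFLOPS
    watts = power_mw * 1e6      # 1 MW = 1e6 W
    return gflops / watts


# (Rmax in PFLOPS, power in MW): assumed values, June 2022 TOP500 list
systems = {
    "Frontier": (1102.0, 21.1),
    "Fugaku": (442.01, 29.899),
    "Summit": (148.6, 10.096),
}

for name, (rmax, power) in systems.items():
    print(f"{name}: {gflops_per_watt(rmax, power):.1f} GFLOPS/W")
```

Because both conversions multiply by the same factor, PFLOPS per MW and GFLOPS per watt are numerically identical, which makes table values easy to spot-check by hand.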
Theoretical Peak Performance
We can also compare theoretical peak performance, computed from chip specs as cores x clock x floating-point operations per cycle. Note that the fraction of this roofline limit a system actually achieves depends heavily on the problem type and on software efficiency.
Chip | Clock (GHz) | DP Cores | DP FLOPs per Cycle per Core | Peak DP TFLOPS |
---|---|---|---|---|
AMD EPYC 7A53 | 2.0 | 64 | 16 | 2.05 |
Fujitsu A64FX | 2.2 | 48 | 32 | 3.38 |
IBM POWER9 | 4.0 | 24 | 8 | 0.77 |
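Theoretical peak is simple arithmetic: cores times clock times FLOPs per cycle per core. A minimal sketch follows. The FLOPs-per-cycle values are assumptions derived from each core’s vector width (two 512-bit FMA pipes for the A64FX, one 256-bit add plus one 256-bit multiply per cycle for Ivy Bridge AVX), and `peak_gflops` is a name invented here.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPS for one chip."""
    return cores * clock_ghz * flops_per_cycle


# Assumed per-core DP FLOPs per cycle:
#   A64FX: 2 pipes x 8 doubles (512-bit) x 2 (fused multiply-add) = 32
#   Ivy Bridge AVX: 4-wide add + 4-wide multiply per cycle = 8
a64fx_peak = peak_gflops(cores=48, clock_ghz=2.2, flops_per_cycle=32)
ivy_bridge_peak = peak_gflops(cores=12, clock_ghz=2.2, flops_per_cycle=8)

print(f"A64FX: {a64fx_peak / 1000:.2f} TFLOPS")
print(f"E5-2692 v2: {ivy_bridge_peak:.1f} GFLOPS")
```

Real codes rarely sustain this number, since it assumes every vector lane issues a useful operation on every cycle.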
Real Application Performance
Benchmarks like High Performance Conjugate Gradients (HPCG) stress sparse linear algebra and irregular memory access, which track the behavior of real simulation codes more closely than HPL does. Benchmarking specific domain workloads directly on systems gives the most precise projections tailored to customer needs.
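For a feel of the kind of kernel HPCG exercises, here is a minimal conjugate gradient solver in plain Python. It is a sketch under loud assumptions: it uses a tiny 1D Laplacian instead of HPCG’s 3D problem, skips the preconditioner the real benchmark applies, and all function names are invented for illustration.

```python
def laplacian_matvec(x):
    """Apply the 1D Laplacian stencil (2 on the diagonal, -1 beside it)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given as a matvec."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x, with x starting at zero
    p = list(r)               # first search direction
    rs_old = dot(r, r)
    if rs_old == 0.0:
        return x, 0           # b is zero, so x = 0 already solves it
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


n = 20
b = laplacian_matvec([1.0] * n)   # right-hand side whose solution is all ones
solution, iterations = conjugate_gradient(laplacian_matvec, b)
print(f"converged in {iterations} iterations")
```

Each iteration is dominated by one sparse matrix-vector product and a few dot products, so performance hinges on memory bandwidth rather than raw FLOPS, which is why HPCG results sit far below HPL numbers on the same machine.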
Timeline of Top Supercomputer Chips
Tracing the silicon engines powering #1 systems shows the relentless drive to advance HPC hardware.
![Timeline of TOP500 Chips](https://www.researchgate.net/profile/Julien-Jaeger/publication/343691516/figure/fig3/AS:949526988861448@1603402984901/Roadmap-of-HPC-systems-and-their-respective-CPUs-The-timeline-highlights-the– TOP500.png)
These elite supercomputing processors power incredible breakthroughs by governments, academics and businesses analyzing massive datasets, running complex models and advancing bleeding edge technologies from personalized medicine to fusion energy.
Continued exponential progress satisfying the world’s unrelenting hunger for ever greater computing power relies on new innovations across hardware, system architectures and software programming environments.
Balancing performance gains with economic viability and environmental sustainability also grows more crucial in this industry where leading rigs consume many megawatts of power.
With so many technology, business and ethical factors driving high performance computing into the future, one thing is clear – we’ll need many more generations of brilliant supercomputer chip designs like the ones profiled here!