The 5 Most Powerful Supercomputer Chips

Supercomputers represent the pinnacle of computing power, able to crunch through data-intensive simulations beyond the capabilities of everyday computers. But supporting these massive digital workloads requires remarkably powerful chips designed to split work across many cores, replicated across thousands of nodes.

In this comprehensive guide, we’ll examine 5 record-breaking supercomputer processors powering the world’s fastest machines, delve into their cutting-edge architectures optimized for high performance computing, and look at what their computational muscle makes possible.

The Making of a Supercomputer Chip

Creating computer chips for this elite echelon of supercomputers necessitates engineering specialized designs and leveraging leading-edge manufacturing techniques to achieve unmatched levels of throughput, efficiency and scalability.

Several key considerations influence supercomputer processor architecture:

Parallelism

Running thousands of calculations simultaneously requires massively parallel chips with high core counts and fast interconnects, including:

Many-core designs maximizing the number of simpler cores over complex cores for parallelism at scale. The chips profiled here pack anywhere from 12 to 260 cores each.
High-bandwidth links such as AMD’s Infinity Fabric and Intel’s Ultra Path Interconnect provide low-latency communication between cores, dies and sockets; Infinity Fabric also connects CPUs to memory and, in systems like Frontier, to GPU accelerators.
NVIDIA’s NVLink is another fast interconnect, used to attach GPUs to each other and, on IBM POWER systems, directly to CPUs.
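To make the idea concrete, here is a minimal Python sketch of block decomposition, the simplest way to split one large calculation across many cores: each worker reduces its own contiguous slice and the partial results are combined at the end. The function names are hypothetical, and a thread pool stands in for hardware cores.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_sum_of_squares(values, n_workers=4):
    """Each worker reduces its own block; the partial sums are combined at the end."""

    def partial(bounds):
        lo, hi = bounds
        return sum(v * v for v in values[lo:hi])

    # Threads stand in for cores here. In CPython the GIL means this shows the
    # structure of the decomposition, not a real speedup.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial, block_ranges(len(values), n_workers)))
    return sum(partials)


print(block_ranges(10, 4))
print(parallel_sum_of_squares(list(range(10))))
```

Production HPC codes typically express the same split-then-reduce pattern with MPI ranks across nodes and OpenMP threads within a chip.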

Specialized Processing

While consumer PC chips focus on single-thread performance, supercomputer chips instead optimize for:

High floating point performance essential for scientific computations
AI acceleration features like bfloat16 throughput important for machine learning
Wide vector pipelines and instruction sets including AVX-512 critical for parallel vector calculations
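bfloat16 keeps float32’s sign bit and 8-bit exponent but only 7 mantissa bits, so it covers the same numeric range in half the storage, trading precision for machine learning throughput. Because its layout is simply the top half of a float32, conversion can be sketched with Python’s standard `struct` module. The helper names are hypothetical, and this version truncates rather than rounding to nearest even as many hardware conversion instructions do.

```python
import struct


def float_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding.

    The sign and 8-bit exponent sit in the top 16 bits of a float32, along with
    the 7 most significant mantissa bits, so dropping the low 16 bits is enough.
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16):
    """Widen a bfloat16 pattern back to a Python float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value


pi_bits = float_to_bfloat16_bits(3.14159265)
print(hex(pi_bits), bfloat16_bits_to_float(pi_bits))
```

Pi survives the round trip as 3.140625: the exponent is exact, but only about two to three significant decimal digits remain.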

Energy Efficiency

With supercomputers consuming megawatts of power across thousands of chips, efficiency is paramount:

Large numbers of slower, simpler cores deliver more operations per watt than complex, power-hungry cores
Optimized memory subsystems reduce expensive data movement
Advanced manufacturing nodes reduce the energy spent per operation as transistors shrink
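A standard first-order model of CMOS switching power shows why slower cores can be more efficient. Dynamic power scales with the activity factor α, switched capacitance C, supply voltage V and clock frequency f. The numbers in the second line are illustrative assumptions, not measurements of any chip profiled here:

```latex
P_{\text{dyn}} = \alpha \, C \, V^{2} f

% Assumption: halving the clock lets the supply voltage drop to 80%
\frac{P'}{P} = \frac{f'}{f}\left(\frac{V'}{V}\right)^{2} = 0.5 \times 0.8^{2} = 0.32

% Two such cores match one full-speed core's throughput on perfectly parallel work
2 \times 0.32 = 0.64
```

Under these assumed numbers, two half-speed cores do the same work for roughly two thirds of the power; real voltage-frequency curves vary by chip and process, and the gain only materializes when the workload parallelizes well.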

Now let’s profile 5 examples of these impressive supercomputing processors powering the world’s most capable HPC rigs.

AMD EPYC 7A53

Advancing processor technology relies not just on cutting-edge architectures but also continual manufacturing innovations to pack more performance into tiny silicon features.

The EPYC 7A53, code-named Trento, is a custom version of AMD’s third-generation EPYC (Milan) processor built for GPU-accelerated supercomputers. It pairs 64 Zen 3 cores, built on TSMC’s 7nm process, with a modified I/O die whose Infinity Fabric links connect directly to AMD Instinct MI250X GPUs, which do the heavy lifting for simulation and AI workloads.

64 cores / 128 threads (Zen 3)
2.0 GHz base clock
256MB L3 cache per chip
8 DDR4 memory channels
Infinity Fabric links to four AMD Instinct MI250X GPUs per node

AMD’s newer EPYC generations push further still: the Zen 4-based EPYC 9004 “Genoa” family moves to TSMC 5nm and up to 96 cores, the “Bergamo” variant reaches 128 denser Zen 4c cores for incredible parallel throughput, and 3D V-Cache models stack an extra 64MB of L3 atop each compute die for rapid access to data.

All this silicon comes together in the Frontier exascale supercomputer at Oak Ridge National Laboratory. Harnessing 9,408 nodes, each pairing one EPYC 7A53 with four MI250X GPUs, Frontier became the first system to break the exascale barrier on the LINPACK benchmark, seizing the title of world’s fastest at 1.1 exaflops in June 2022.

Fujitsu A64FX

While the AMD-powered Frontier leads the latest Top500 supercomputer rankings, the now second-place Fugaku system, built around Fujitsu’s A64FX chip, held the top spot from 2020 to 2021 thanks to its custom architecture tailored for massively parallel workloads.

Fujitsu packs 48 compute cores implementing the Arm instruction set with the 512-bit Scalable Vector Extension (SVE) into each A64FX, with one chip per node. Rather than relying on a large L3 cache (the chip has none), Fujitsu optimized data movement by placing 32GB of HBM2 memory on the package, feeding the cores with about a terabyte per second of bandwidth.

48 Arm cores with 512-bit SVE
2.0 GHz normal mode, 2.2 GHz boost mode
32GB HBM2 memory
About 3.4 TFLOPS peak double precision performance (at 2.2 GHz)
High memory bandwidth (1024GB/s)
Energy efficient at roughly 15 GFLOPS/watt across the full Fugaku system
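Fujitsu’s bet on memory bandwidth is easiest to see with the roofline model, which caps a kernel’s performance at the lower of peak compute and memory bandwidth × arithmetic intensity (FLOPs per byte moved). A minimal Python sketch, assuming a peak of 48 cores × 2.2 GHz × 32 double-precision FLOPs per cycle (two 512-bit SVE FMA pipelines) and 1,024 GB/s of HBM2 bandwidth; the function and constant names are hypothetical.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline model: performance is capped by compute or by memory traffic.

    arithmetic_intensity is FLOPs performed per byte moved to or from memory.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Assumed A64FX figures: 48 cores x 2.2 GHz x 32 DP FLOP/cycle, 1,024 GB/s HBM2.
A64FX_PEAK_GFLOPS = 48 * 2.2 * 32
A64FX_BW_GBS = 1024.0

# A STREAM-triad-like loop, a[i] = b[i] + s * c[i], does 2 FLOPs per 24 bytes
# (two 8-byte reads and one 8-byte write, ignoring cache effects).
triad_intensity = 2 / 24

print(attainable_gflops(A64FX_PEAK_GFLOPS, A64FX_BW_GBS, triad_intensity))
print(ridge_point(A64FX_PEAK_GFLOPS, A64FX_BW_GBS))
```

Under these assumptions the triad loop tops out near 85 GFLOPS, about 2.5% of peak, and a kernel needs roughly 3.3 FLOPs per byte before compute rather than memory becomes the limit. That is why high bandwidth matters so much for memory-bound scientific codes.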

Fugaku’s 158,976 A64FX nodes deliver 442 petaflops of 64-bit floating point muscle on HPL and about 2 exaflops on the mixed-precision HPL-AI benchmark. While each A64FX core lags leading CPUs, Fugaku’s huge node count and system scalability show that system-level design matters as much as beefy silicon.

Researchers in Japan relied on Fugaku’s tailored architecture for COVID-19 research, disaster mitigation and other critical scientific fields.

IBM POWER9

While manufacturers like AMD push x86 performance to new heights, IBM’s POWER architecture offers its own twist on enterprise and scientific computing. POWER designs focus on data-intensive workloads, with high memory and I/O throughput.

IBM’s POWER9 packs cores with abundant cache and bandwidth. Each chip comes in two flavors: up to 24 cores with four-way simultaneous multithreading (SMT4) or up to 12 cores with SMT8, for 96 hardware threads per socket either way. With clocks reaching about 4GHz, POWER9 rates roughly 500 to 750 GFLOPS of double-precision performance per chip.

Up to 24 cores (SMT4) or 12 cores (SMT8), 96 threads per chip
Clock speeds up to about 4 GHz (Summit’s chips run at 3.07 GHz)
120MB shared L3 cache per chip
Tuned for high memory and I/O bandwidth
- 8 DDR4 memory channels
- NVLink 2.0 direct links between POWER9 CPUs and NVIDIA Volta GPUs
Vector-Scalar Extension (VSX) units for 128-bit SIMD math

Thousands of IBM’s POWER9 processors power elite supercomputers like Summit and Sierra, which held the top two spots on the Top500 from November 2018 through November 2019. Summit reaches 148.6 petaflops on HPL and Sierra 94.6 petaflops; both combine POWER9 nodes with NVLink-attached NVIDIA V100 GPUs for data-intensive computing.

Sunway SW26010

While Sunway’s SW26010 chip may seem an underdog by specs alone, the magic comes from sheer scale: over 10 million cores in the Sunway TaihuLight supercomputer.

The SW26010 capitalizes on a “many core” design organized into four core groups, each pairing one management processing element (MPE) with an 8 × 8 mesh of 64 simpler compute processing elements (CPEs), for 260 cores per chip. While each core operates at a modest 1.45 GHz using older manufacturing technology, together they provide parallel throughput for certain supercomputing workloads that rivals processors with the latest advanced features.

4 core groups × (1 MPE + 64 CPEs) = 260 cores per chip
1.45 GHz clock speed
Distributed memory
- 32GB memory per chip (8GB per core group)
- 1.31PB aggregate memory across the system
Customized high-speed interconnects
- Full-bandwidth cross chassis links
- Specialized topology minimizing latency between nodes
93 petaflops High Performance Linpack

By scaling 40,960 homegrown SW26010 processors into its massively parallel architecture, Sunway TaihuLight achieves impressive supercomputing performance despite relying on older generation technology. While TaihuLight, which topped the Top500 from 2016 to 2017, has since dropped in the global HPC rankings, this massive system showcases the benefits of purpose-built architectures for targeted workloads.
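TaihuLight’s headline totals follow from simple arithmetic on the chip layout. A short Python check, assuming the SW26010’s organization of four core groups, each with one MPE and an 8 × 8 mesh of 64 CPEs, and a system of 40,960 single-chip nodes with 32GB of memory each:

```python
# Assumed SW26010 organization: 4 core groups, each with one management
# processing element (MPE) and an 8 x 8 mesh of compute processing elements (CPEs).
CORE_GROUPS = 4
MPES_PER_GROUP = 1
CPES_PER_GROUP = 8 * 8

cores_per_chip = CORE_GROUPS * (MPES_PER_GROUP + CPES_PER_GROUP)

# Assumed TaihuLight configuration: one SW26010 and 32 GB of memory per node.
NODES = 40_960
MEMORY_PER_NODE_GB = 32

total_cores = NODES * cores_per_chip
total_memory_pb = NODES * MEMORY_PER_NODE_GB / 1e6  # decimal units: 1 PB = 1e6 GB

print(cores_per_chip, total_cores, round(total_memory_pb, 2))
```

That gives 260 cores per chip, 10,649,600 cores in total and about 1.31PB of memory.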

Intel Xeon E5-2692 v2

AMD and IBM chips may grab the headlines at the very top of the TOP500, but Intel remains a juggernaut, with Xeon processors powering the majority of listed systems and a broad HPC software stack behind them.

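The E5-2692 v2 hails from Intel’s Ivy Bridge generation, using 22nm process technology to put 12 cores onto a server-targeted die for 211.2 GFLOPS of peak double precision throughput per chip (12 cores × 2.2 GHz × 8 FLOPs per cycle with AVX). Although it has since been superseded by newer Intel architectures, 32,000 E5-2692 v2 chips, alongside 48,000 Xeon Phi coprocessors, powered China’s Tianhe-2, which topped the TOP500 from 2013 to 2015 at 33.9 petaflops. Its upgraded successor, Tianhe-2A, keeps the Ivy Bridge Xeons but swaps in Chinese-designed Matrix-2000 accelerators.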

12 cores / 24 threads
24 MB shared L3 cache
Quad DDR3 memory channels
2.2GHz base / 3.0GHz turbo freq
40 lanes of PCIe 3.0 for add-in connections
Advanced vector extensions (AVX)

By pairing two E5-2692 v2 chips with 64GB of memory and three Xeon Phi coprocessors in each node, Tianhe-2 used Intel’s mature processor platform as a balanced HPC building block.

Comparing processors is hard even for industry veterans: microarchitectures and implementations designed for radically different systems resist direct comparison. A mix of metrics provides perspective:

LINPACK HPL

The gold standard for ranking supercomputer performance remains High Performance Linpack (HPL), which times the solution of a large dense system of linear equations and reports the result in FLOPS. The AMD-powered Frontier topped the HPL charts at 1.1 EFLOPS in June 2022.

System	Processors	Power (MW)	HPL Rmax (PFLOPS)	Power Efficiency (GFLOPS/W)
Frontier	AMD EPYC 7A53 + AMD Instinct MI250X	21.1	1,102	52.2
Fugaku	Fujitsu A64FX	29.9	442.0	14.8
Summit	IBM POWER9 + NVIDIA Volta V100	10.1	148.6	14.7
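HPL’s FLOPS figure is a known operation count divided by run time, and power efficiency is simply Rmax divided by power. A minimal Python sketch; the helper names are hypothetical and the system figures are rounded June 2022 TOP500 values:

```python
def hpl_gflops(n, seconds):
    """Approximate HPL rate for solving an n x n dense system in `seconds`.

    LU factorization dominates the work at about (2/3) * n**3 floating point
    operations; HPL's official count adds a lower-order n**2 term, which this
    sketch omits.
    """
    return (2.0 / 3.0) * n**3 / seconds / 1e9


def gflops_per_watt(rmax_pflops, power_mw):
    """Energy efficiency. 1 PFLOPS = 1e6 GFLOPS and 1 MW = 1e6 W, so the factors cancel."""
    return (rmax_pflops * 1e6) / (power_mw * 1e6)


# Rounded June 2022 TOP500 figures: (HPL Rmax in PFLOPS, power in MW).
SYSTEMS = {
    "Frontier": (1102.0, 21.1),
    "Fugaku": (442.0, 29.9),
    "Summit": (148.6, 10.1),
}

for name, (rmax, power) in SYSTEMS.items():
    print(f"{name}: {gflops_per_watt(rmax, power):.1f} GFLOPS/W")
```

The loop prints about 52.2, 14.8 and 14.7 GFLOPS/W, which shows how much of Frontier’s lead comes from efficiency rather than raw scale.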

Theoretical Peak Performance

We can also compare theoretical peak performance computed from chip specs: cores × clock × floating point operations per cycle. Real applications rarely reach this roofline limit; how close they get depends heavily on the problem type and software efficiency.

Chip	Clock (GHz)	DP Cores	DP FLOPs/cycle/core	Peak DP TFLOPS
AMD EPYC 7A53	2.0	64	16	2.05
Fujitsu A64FX	2.2	48	32	3.38
IBM POWER9 (Summit)	3.07	22	8	0.54
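Peak figures like these come from multiplying cores, clock and the double-precision FLOPs each core can retire per cycle, which in turn is SIMD width in doubles × FMA pipes × 2 (an FMA counts as two operations). A Python sketch, assuming the clocks and core counts of the chips used in Frontier, Fugaku and Summit; the POWER9 configuration of two 128-bit FMA pipes per core is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


def flops_per_cycle(vector_bits, fma_pipes):
    """A vector of vector_bits holds vector_bits / 64 doubles; each FMA counts as 2 FLOPs."""
    return (vector_bits // 64) * fma_pipes * 2


@dataclass
class Chip:
    name: str
    clock_ghz: float
    cores: int
    dp_flops_per_cycle: int  # per core

    def peak_dp_tflops(self):
        # cores x clock (GHz) x FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS
        return self.cores * self.clock_ghz * self.dp_flops_per_cycle / 1000.0


chips = [
    Chip("AMD EPYC 7A53", 2.0, 64, flops_per_cycle(256, 2)),  # Zen 3: two 256-bit FMA pipes
    Chip("Fujitsu A64FX", 2.2, 48, flops_per_cycle(512, 2)),  # two 512-bit SVE FMA pipes
    Chip("IBM POWER9 (Summit)", 3.07, 22, flops_per_cycle(128, 2)),  # assumed: two 128-bit pipes
]

for chip in chips:
    print(f"{chip.name}: {chip.peak_dp_tflops():.2f} TFLOPS")
```

This prints roughly 2.05, 3.38 and 0.54 TFLOPS. In Frontier and Summit, most of each node’s peak comes from the attached GPUs rather than these CPUs.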

Real Application Performance

Complementary benchmarks like High Performance Conjugate Gradients (HPCG) exercise the sparse, memory-bound operations common in real simulations, giving a better picture of application performance than HPL alone. Benchmarking specific domain workloads directly on a system gives the most reliable projections for a customer’s needs.
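The kernel behind HPCG is the conjugate gradient method, whose sparse matrix-vector products and dot products move a lot of data per FLOP, so it stresses memory bandwidth more than peak arithmetic. Below is a plain, unpreconditioned CG sketch in Python on a small 1D Laplacian; HPCG itself uses a 3D 27-point stencil with a multigrid preconditioner, and the function names here are hypothetical.

```python
def laplacian_1d_matvec(x):
    """y = A x for the tridiagonal matrix with 2 on the diagonal and -1 off it (stored implicitly)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only a matvec function."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)  # the sparse matvec: memory-bound on real hardware
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * apj for ri, apj in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pj for ri, pj in zip(r, p)]
        rs_old = rs_new
    return x


print(conjugate_gradient(laplacian_1d_matvec, [1.0] * 5))
```

For this 5 × 5 system the exact solution is [2.5, 4, 4.5, 4, 2.5], which CG reaches in at most five iterations up to rounding. Each iteration does only a few FLOPs per value it loads, the pattern that keeps HPCG scores far below HPL.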

Timeline of Top Supercomputer Chips

Tracing the silicon engines powering #1 systems shows the relentless drive to advance HPC hardware.

![Timeline of TOP500 Chips](https://www.researchgate.net/profile/Julien-Jaeger/publication/343691516/figure/fig3/AS:949526988861448@1603402984901/Roadmap-of-HPC-systems-and-their-respective-CPUs-The-timeline-highlights-the– TOP500.png)

These elite supercomputing processors power incredible breakthroughs by governments, academics and businesses analyzing massive datasets, running complex models and advancing bleeding edge technologies from personalized medicine to fusion energy.

Continued exponential progress to satisfy the world’s unrelenting hunger for computing power relies on innovation across hardware, system architectures and software programming environments.

Balancing performance gains with economic viability and environmental sustainability also grows more crucial in this industry where leading rigs consume many megawatts of power.

With so many technology, business and ethical factors accelerating high performance computing into the future, one thing is clear: we’ll need many more generations of brilliant supercomputer chip designs like the ones profiled here!