Wednesday, December 18, 2024

Ampere Introduces 192-Core CPU and Controversial Benchmarks

Ampere’s new AmpereOne CPU includes 192 cores and an entirely new microarchitecture.

Ampere announced its AmpereOne processors for cloud datacenters this week. They are the industry’s first general-purpose CPUs with up to 192 cores, and they can also be used for AI inference.

The new chips consume more power than their predecessors, Ampere Altra (which will remain in Ampere’s stable for the foreseeable future), but the company claims that its processors with up to 192 cores still provide higher computational density than AMD and Intel CPUs. Some of those performance claims are open to debate.

192 Custom Cloud Native Cores

Ampere’s AmpereOne processors have 136–192 cores (compared to 32–128 cores for Ampere Altra) that operate at up to 3.0 GHz. The cores are the company’s own design, implement the Armv8.6+ instruction set architecture, and feature two 128-bit vector units that support FP16, BF16, INT16, and INT8 formats. Each core has 2MB of 8-way set-associative L2 cache, and the SoC also contains a 64MB system-level cache in addition to the L1 and L2 caches. Depending on the SKU, the new CPUs are rated for 200W to 350W, up from 40W to 180W for the Ampere Altra.
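To put those per-core figures in context, here is a minimal sketch that does the arithmetic on the numbers quoted above. The function names are illustrative only, and the lane counts simply describe how many elements fit across both vector units; they make no assumption about how many operations a core actually issues per cycle.

```python
# Figures quoted in the article; names below are illustrative, not Ampere's.
VECTOR_UNITS_PER_CORE = 2
VECTOR_WIDTH_BITS = 128
L2_PER_CORE_MB = 2

# Bit width of each supported data format.
FORMAT_BITS = {"FP16": 16, "BF16": 16, "INT16": 16, "INT8": 8}


def lanes_per_core(fmt: str) -> int:
    """Number of elements of `fmt` that fit across both 128-bit vector units."""
    return VECTOR_UNITS_PER_CORE * VECTOR_WIDTH_BITS // FORMAT_BITS[fmt]


def total_l2_mb(cores: int) -> int:
    """Aggregate private L2 capacity for a part with `cores` cores."""
    return cores * L2_PER_CORE_MB


if __name__ == "__main__":
    for fmt in FORMAT_BITS:
        print(f"{fmt}: {lanes_per_core(fmt)} elements per core")
    for cores in (136, 192):
        print(f"{cores} cores: {total_l2_mb(cores)} MB of L2 in total")
```

On these numbers, a 192-core part carries 384MB of private L2 in aggregate, six times the size of the 64MB system-level cache.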

The company asserts that its new cores have been further optimised for cloud and AI workloads and deliver ‘power and area efficient’ instructions per clock (IPC) gains, which most likely means higher IPC (compared to Arm’s Neoverse N1 used for Altra) without a discernible increase in power consumption or die area. Speaking of die area, Ampere does not provide any information on it but does state that the AmpereOne is produced using a TSMC 5nm-class manufacturing process.

Although Ampere does not reveal all of the details about its AmpereOne core, it does say that it includes a highly accurate L1 data prefetcher (which reduces latency, ensures that the CPU spends less time waiting for data, and reduces system power consumption by minimising memory accesses), refined branch misprediction recovery (the sooner the CPU detects and recovers from a branch misprediction, the lower the latency and the less power is wasted), and sophisticated memory disambiguation.

While the list of AmpereOne core architectural upgrades appears short on paper, these enhancements can greatly increase performance. They also required extensive study (e.g., which factors slow down a cloud datacenter CPU the most?) and a lot of effort to put into practice.

I/O and Advanced Security

Because the AmpereOne SoC is designed for cloud datacenters, it offers I/O to match: eight DDR5 channels for up to 16 modules supporting up to 8TB of memory per socket, plus 128 lanes of PCIe Gen5 with 32 controllers and x4 bifurcation.
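The I/O figures above imply a few derived numbers, which the following sketch works out. It assumes binary units (1TB = 1024GB) and that modules are spread evenly across channels; the function names are hypothetical.

```python
GB_PER_TB = 1024  # assumption: binary units


def gb_per_module(max_tb: int, modules: int) -> int:
    """Module capacity needed to reach the maximum memory per socket."""
    return max_tb * GB_PER_TB // modules


def modules_per_channel(modules: int, channels: int) -> int:
    """Modules per DDR5 channel, assuming an even spread."""
    return modules // channels


def x4_links(total_lanes: int, lanes_per_link: int = 4) -> int:
    """How many x4 links the PCIe lanes can be split into."""
    return total_lanes // lanes_per_link


if __name__ == "__main__":
    print(gb_per_module(8, 16), "GB per module to reach 8TB")
    print(modules_per_channel(16, 8), "modules per channel")
    print(x4_links(128), "x4 links from 128 lanes")
```

Reaching 8TB therefore takes 512GB modules at two per channel, and 128 lanes split x4 gives 32 links, which lines up with the 32 controllers quoted.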

Reliability, availability, and serviceability (RAS) features and security features are also required in datacenters. To that end, the SoC supports, to mention a few, ECC memory, single-key memory encryption, memory tagging, secure virtualization, and nested virtualization. AmpereOne also contains a number of security features, such as crypto and entropy accelerators, speculative side-channel attack mitigation, and ROP/JOP attack mitigation.

Curious Benchmark Results

Without question, Ampere’s AmpereOne SoC is an impressive piece of silicon, built to tackle cloud workloads and the first in the industry to offer 192 general-purpose cores. Ampere, however, uses some rather unusual benchmark results to make its case.

Ampere’s key advantage is the compute density of its AmpereOne. According to the company, a 42U 16.5kW rack filled with 192-core AmpereOne SoC-based 1S machines can support up to 7926 virtual machines, while a rack powered by AMD’s 96-core EPYC 9654 ‘Genoa’ CPUs can handle 2496 VMs and a rack powered by Intel’s 56-core Xeon Scalable 8480+ ‘Sapphire Rapids’ CPUs can handle 1680 VMs. Within a 16.5kW power budget, this comparison makes sense.
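A short sketch of what those rack figures amount to, using only the VM counts and the 16.5kW budget quoted above. The dictionary labels and function names are illustrative.

```python
RACK_POWER_KW = 16.5

# VMs per 42U rack, as claimed by Ampere.
VMS_PER_RACK = {
    "AmpereOne (192 cores)": 7926,
    "AMD EPYC 9654 (96 cores)": 2496,
    "Intel Xeon 8480+ (56 cores)": 1680,
}


def vms_per_kw(vms: int, rack_kw: float = RACK_POWER_KW) -> float:
    """VMs hosted per kilowatt of rack power budget."""
    return vms / rack_kw


def density_ratio(vms_a: int, vms_b: int) -> float:
    """How many times more VMs rack A hosts than rack B."""
    return vms_a / vms_b


if __name__ == "__main__":
    ampere = VMS_PER_RACK["AmpereOne (192 cores)"]
    for name, vms in VMS_PER_RACK.items():
        print(f"{name}: {vms_per_kw(vms):.1f} VMs/kW, "
              f"AmpereOne advantage {density_ratio(ampere, vms):.2f}x")
```

By Ampere’s own numbers, the claimed advantage is roughly 3.2x over the EPYC rack and 4.7x over the Xeon rack. Both ratios hold only at this specific power budget.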

However, 42U rack power density is increasing, and hyperscalers such as AWS, Google, and Microsoft are prepared for this, especially for their performance-demanding applications. According to a 2020 Uptime Institute poll, 16% of firms had deployed conventional 42U racks with power densities ranging from 20kW to above 50kW. As AMD’s latest and previous-generation CPUs raised their TDPs relative to their predecessors, the number of typical installations with 20kW racks has climbed, not fallen.

To show the advantages of its 160-core AmpereOne-based system with 512GB of memory, Ampere compares it against systems based on AMD’s 96-core EPYC 9654 CPU with 256GB of memory (meaning that the AMD system worked in an eight-channel mode, not the 12-channel mode that Genoa supports). The workloads were generative AI (Stable Diffusion) and AI recommenders (DLRM). The Ampere-based machines delivered over 2X as many queries per second for AI recommendations and 2.3X as many frames per second for generative AI.

In this scenario, however, Ampere’s systems crunched data at FP16 precision, whereas the AMD-based machines calculated at FP32 precision, which is not a like-for-like comparison. Additionally, many FP16 workloads run on GPUs rather than CPUs, and massively parallel GPUs frequently deliver outstanding performance for workloads including generative AI and AI recommendations.
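To see why mixing precisions skews such a comparison, the sketch below uses Python’s standard `struct` module to compare IEEE half precision (format code `e`) with single precision (format code `f`). The 128-bit register width is a generic illustration, not a model of either vendor’s pipeline, and the helper names are hypothetical.

```python
import struct

REGISTER_BITS = 128  # assumption: a generic 128-bit vector register


def bytes_per_value(fmt_code: str) -> int:
    """Storage size of one value: 'e' is FP16, 'f' is FP32."""
    return struct.calcsize("<" + fmt_code)


def values_per_register(fmt_code: str) -> int:
    """How many values of this format fit in one register."""
    return REGISTER_BITS // (8 * bytes_per_value(fmt_code))


def round_trip(value: float, fmt_code: str) -> float:
    """Store a value in the given format and read it back."""
    packed = struct.pack("<" + fmt_code, value)
    return struct.unpack("<" + fmt_code, packed)[0]


if __name__ == "__main__":
    for name, code in (("FP16", "e"), ("FP32", "f")):
        print(name, values_per_register(code), "values per register,",
              "0.1 stored as", round_trip(0.1, code))
```

Because FP16 values are half the size, twice as many fit in the same register and the same memory bandwidth, at the cost of precision. A system running FP16 therefore starts with a structural advantage over one running FP32, independent of the silicon.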

Summary

The AmpereOne general-purpose CPUs from Ampere are the first in the industry to offer up to 192 cores. They also bring strong I/O capabilities, up-to-date security features, and improved instructions per clock (IPC). They can handle AI tasks in INT8, INT16, BF16, and FP16 formats as well.

But when it comes to benchmark results, the company chose some dubious methods to support its claims, which casts doubt on its accomplishments. That said, it will be particularly interesting to see the results of independent tests of AmpereOne-based servers.

 

 
