How memory bandwidth is killing AMD's 32-core Threadripper performance

Here's how much memory bandwidth constraints might be hurting its performance.

Credit: Gordon Mah Ung

AMD's 32-core Threadripper 2990WX is the fastest consumer CPU ever sold. And let's be clear: we stand by that. But we'd also be the first to say it has its limitations.

The most glaring is the lack of consumer applications that can truly exploit all of those cores. The other limitation is apparent in the diagram below, which shows how AMD built this 32-core monster. Rather than building a single chip with every CPU core on it, AMD connects four dies using its high-speed Infinity Fabric.

Why memory bandwidth affects the 32-core Threadripper

If you look closer at the diagram, you can see that two of the dies don't have their own memory controllers or PCIe access. Instead, they have to talk to an adjacent CPU die.

It's essentially like a two-apartment unit where the occupants of the second apartment can reach the hallway only by walking through the first.
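To make that layout concrete, here's a minimal Python sketch that counts how many Infinity Fabric hops each die needs to reach a die with its own memory controller. The die labels (`die0` to `die3`) are hypothetical, and we assume every die has a direct link to every other die; which two dies own the memory controllers is also an assumption for illustration.

```python
from collections import deque

# Hypothetical die labels. Assumption: die0 and die2 have memory controllers
# (two of the four dies do, per AMD's diagram); die1 and die3 do not.
HAS_MEMORY = {"die0": True, "die1": False, "die2": True, "die3": False}

# Assumption: every die has a direct Infinity Fabric link to every other die.
LINKS = {d: [o for o in HAS_MEMORY if o != d] for d in HAS_MEMORY}


def hops_to_memory(start: str) -> int:
    """Breadth-first search for the fewest fabric hops to a memory-attached die."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        die, hops = queue.popleft()
        if HAS_MEMORY[die]:
            return hops
        for nxt in LINKS[die]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    raise ValueError(f"{start} cannot reach any memory controller")


for die in HAS_MEMORY:
    print(die, hops_to_memory(die))
```

The dies without memory controllers always pay at least one extra hop, which is the "walk through the first apartment" penalty in code form.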

Threadripper 2990WX die topology (updated). Credit: IDG

AMD says the four-die Threadripper has 25GBps of bandwidth shared among all of the dies.

Perhaps more important is the overall bandwidth available. AMD initially said the bandwidth between the four CPU dies was 25GBps bidirectional, then amended its documentation to state that 25GBps is the total bandwidth. Compare that with the 16-core Threadripper 2950X, which has 50GBps of bandwidth and two links between its two dies (also updated information from AMD).

Threadripper 2950X die topology (updated). Credit: AMD

A two-die 16-core Threadripper 2950X has 50GBps of bandwidth and two links between its two dies, vs. the 25GBps among four dies on the 2990WX that AMD originally claimed (and then amended).
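As a back-of-envelope comparison, here's a Python sketch that splits each chip's quoted fabric bandwidth evenly across its dies and cores. This even split is a deliberate simplification we're assuming for illustration: real fabric traffic doesn't divide evenly, and this figure is not the same thing as DRAM bandwidth.

```python
def naive_share(total_gbps: float, parts: int) -> float:
    """Split a bandwidth figure evenly across parts (a deliberate simplification)."""
    return total_gbps / parts


# Fabric figures from AMD's amended documentation, as quoted above.
CHIPS = {
    "2990WX": {"fabric_gbps": 25.0, "dies": 4, "cores": 32},
    "2950X": {"fabric_gbps": 50.0, "dies": 2, "cores": 16},
}

for name, chip in CHIPS.items():
    per_die = naive_share(chip["fabric_gbps"], chip["dies"])
    per_core = naive_share(chip["fabric_gbps"], chip["cores"])
    print(f"{name}: {per_die:.2f} GBps per die, {per_core:.3f} GBps per core")
```

Even under this crude model, each 2990WX core gets roughly a quarter of the fabric share a 2950X core does (0.781 vs. 3.125 GBps).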

Many believe this is the Threadripper 2990WX's main weakness: a lack of memory bandwidth per core hurts it in memory-intensive tasks such as compression and encoding. Even worse for the Threadripper 2990WX, that bandwidth has to be shared among 14 more cores than Intel's 18-core Core i9-7980XE has.

Below, you can see the results of Sandra 2018 Titanium's memory bandwidth test and the available bandwidth per core. Bandwidth per core plummets from almost 5GBps with 8 or 16 cores active to just 2GBps when all 32 cores are in use.

Threadripper 2990WX Sandra 2018 per-core memory bandwidth. Credit: IDG

SiSoftware Sandra 2018 Titanium's per-core memory bandwidth results say the Threadripper has only 2GBps per core available.

Synthetic memory bandwidth tests are one thing. To dig further into performance on memory-intensive workloads, we fired up the newest version of the free 7-Zip application. Written by Igor Pavlov, this open-source compression and decompression utility is popular and generally awesome. For example, when I test a laptop, decompressing Cinebench R15.08 and its thousands of small files with Windows 10's built-in utility takes several minutes. I can connect to the Internet, download 7-Zip, and decompress Cinebench R15.08 with it in less time than the built-in Windows utility takes to do the job.

The GUI version's benchmark runs two tests, one for compression and one for decompression. The overall score appears to be a simple average of the two results.
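If the overall rating really is a plain mean, as it appears, it can be sketched in Python as below. The averaging is our observation rather than documented behavior, and the example ratings are hypothetical. The practical consequence is that a strong decompression result can partly mask a weak compression result in the headline number.

```python
def overall_score(compression: float, decompression: float) -> float:
    """Assumed 7-Zip GUI overall rating: the plain mean of the two sub-scores."""
    return (compression + decompression) / 2


# Hypothetical ratings: a weak compression score hidden by a strong decompression score.
print(overall_score(40_000, 120_000))  # 80000.0
```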

What 7-Zip tests

You can read more about the test on the 7-cpu.com website, but we've highlighted some of the key information here. Regarding the compression test, the website discusses the factors that influence the results, saying it "strongly depends from memory (RAM) latency, Data Cache size/speed and TLB. Out-of-Order execution feature of CPU is also important for that test." The site goes on: "The compression test has big number of random accesses to RAM and Data Cache. So big part of execution time the CPU waits the data from Data Cache or from RAM."
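The random, dependent memory accesses the site describes are usually reproduced with a "pointer chase," where each load's address comes from the previous load, so the CPU can't prefetch or overlap them. This Python sketch builds a single random cycle with Sattolo's algorithm and walks it. It is not 7-Zip's code, and in Python the interpreter overhead swamps DRAM latency, so treat it as an illustration of the access pattern rather than a usable benchmark; a real measurement would be written in C.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 0) -> list[int]:
    """Next-index table forming one cycle through all n slots (Sattolo's algorithm)."""
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single n-length cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def chase(next_idx: list[int], steps: int) -> int:
    """Follow the chain; each load depends on the result of the previous one."""
    i = 0
    for _ in range(steps):
        i = next_idx[i]
    return i


if __name__ == "__main__":
    n = 1 << 18
    table = sattolo_cycle(n)
    start = time.perf_counter()
    chase(table, n)
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n * 1e9:.1f} ns per dependent load (interpreter overhead included)")
```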

About the Decompression test, the website says it "strongly depends on CPU integer operations. The most important things for that test are: branch misprediction penalty (the length of pipeline) and the latencies of 32-bit instructions ('multiply', 'shift', 'add' and other). The decompression test has very high number of unpredictable branches."

How we retested Threadripper vs. Core i9

For our retest, we locked both the Threadripper 2990WX and the Core i9-7980XE at 3GHz to remove variability from each CPU's boost scheme, so the results reflect the workload rather than clock-speed differences between the two. We also set both to DDR4/3200, and both were run in quad-channel mode except where noted. To be up-front: the Threadripper system had a slight edge in memory timings at CL14 and 1T, while the Core i9 was running at CL15 and 2T. As in our original review, both were running Founders Edition GTX 1080 cards using the same drivers and the same version of Windows 10 Enterprise Edition.
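To put that timing gap in absolute terms, the standard first-word CAS latency arithmetic divides the CAS cycle count by the memory I/O clock, which is half the DDR data rate (1,600MHz for DDR4/3200). This Python sketch covers CAS latency only; the 1T vs. 2T command-rate difference is not modeled.

```python
def cas_latency_ns(cl: int, data_rate_mts: int) -> float:
    """CAS latency in ns: CL cycles at an I/O clock of data_rate/2 MHz, i.e. CL * 2000 / data_rate."""
    return cl * 2000 / data_rate_mts


print(f"{cas_latency_ns(14, 3200)} ns")  # Threadripper system: 8.75 ns
print(f"{cas_latency_ns(15, 3200)} ns")  # Core i9 system: 9.375 ns
```

That works out to a gap of well under a nanosecond (0.625ns) in the Threadripper's favor.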

Because much of the concern over Threadripper is its per-core memory bandwidth, we decided to run from 1 thread to the maximum number of threads on each CPU. We also wanted to see whether the Threadripper's performance would change if you turned off dies, so we ran it with a single die (8 cores/16 threads), two dies (16 cores/32 threads), and all four (32 cores/64 threads).

In the integer-focused decompression component of 7-Zip, performance was quite good. Although we don't see perfect scaling, there's little difference in 7-Zip decompression performance as you switch off dies.

All of the tests were run using the GUI version of 7-Zip 18.05 with the default dictionary size of 32MB (although we did also compile our own version).
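We used the GUI, but a similar thread sweep could be scripted against 7-Zip's command-line benchmark. This Python sketch only builds the command lines; it assumes the `7z b` benchmark command with `-mmt{N}` for thread count and `-md{N}` for a 2^N-byte dictionary (so `-md25` is 32MB), which you should confirm against the help for your 7-Zip version. Each list could then be passed to `subprocess.run(cmd, capture_output=True, text=True)`.

```python
import shlex


def bench_commands(max_threads: int, dict_log2: int = 25) -> list[list[str]]:
    """One `7z b` invocation per thread count, 1..max_threads (assumed: 2**25 bytes = 32MB dictionary)."""
    return [["7z", "b", f"-mmt{t}", f"-md{dict_log2}"] for t in range(1, max_threads + 1)]


# Thread counts for the configurations we tested.
CONFIGS = {"2990WX 1 die": 16, "2990WX 2 dies": 32, "2990WX 4 dies": 64, "Core i9-7980XE": 36}

for name, threads in CONFIGS.items():
    cmds = bench_commands(threads)
    print(f"{name}: {len(cmds)} runs, last: {shlex.join(cmds[-1])}")
```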

Threadripper 2990WX 7-Zip 18.05 LZMA decompression performance per die configuration. Credit: IDG

There's no apparent change in the decompression performance by moving between one, two, or four dies on the 32-core Threadripper.

You're probably more interested in the Core i9 vs. Threadripper 2990WX, so we ran that, of course. For the most part, it's not bad for either part. Interestingly, Threadripper 2990WX seems to have that slight fall-off in decompression performance as you cross the threshold of 8 cores. Core i9 has a decent performance advantage up to about 16 cores, but after that it runs out of steam and ends up losing to the 32-core Threadripper 2990WX CPU.

Threadripper 2990WX vs. Core i9 7-Zip decompression performance. Credit: IDG

7-Zip's LZMA decompression is more sensitive to integer performance, branch prediction, and instruction latency. Although the Core i9 has some advantage, it's clear that more cores win in the end.

This shouldn't surprise many people, though. The Threadripper 2990WX's performance when it doesn't run out of memory bandwidth is a known quantity. You only have to look at our multi-threaded rendering tests to see that it's simply a monster.

The question is, what happens in tests that stress memory bandwidth or memory latency? Here are the results of the Threadripper 2990WX in 7-Zip's compression test. It's not pretty, but the good news is that switching dies off didn't seem to matter. As you can see, the CPU appears to hit a ceiling at 26 threads, and then it just gets worse from there.

Threadripper 2990WX 7-Zip 18.05 LZMA compression performance per die configuration. Credit: IDG

We ran the Threadripper 2990WX in single-die, dual-die, and quad-die configurations to see whether the memory bandwidth issues would ease.
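One way to picture a ceiling like that is a simple bandwidth-bound model: throughput grows linearly with threads until their combined demand exceeds what the memory system can supply, and then it flattens. The per-thread rate and ceiling in this Python sketch are made-up numbers chosen so the knee lands at 26 threads, mirroring the chart; they are not our measurements. The model also can't explain the decline past the peak, which would come from contention effects it ignores.

```python
def modeled_throughput(threads: int, per_thread: float, ceiling: float) -> float:
    """Throughput scales with thread count until it hits a fixed memory-bound ceiling."""
    return min(threads * per_thread, ceiling)


def saturation_point(per_thread: float, ceiling: float) -> int:
    """First thread count whose modeled throughput reaches the ceiling."""
    threads = 1
    while threads * per_thread < ceiling:
        threads += 1
    return threads


# Hypothetical numbers, for the shape of the curve only.
PER_THREAD, CEILING = 1.0, 26.0
for t in (1, 8, 16, 26, 32, 64):
    print(t, modeled_throughput(t, PER_THREAD, CEILING))
print("saturates at", saturation_point(PER_THREAD, CEILING), "threads")
```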

Perhaps worse is how it compares with the Core i9-7980XE. Again, remember that both CPUs were fixed at 3GHz and running DDR4/3200.

Threadripper 2990WX vs. Core i9 7-Zip compression performance. Credit: IDG

7-Zip's compression test is said to be sensitive to memory latency, cache, and out-of-order efficiency. Clearly, it doesn't do well on the 32-core Threadripper.

That's just not a good look for the 32-core Threadripper 2990WX, and it does seem to confirm that latency- and bandwidth-sensitive workloads suffer greatly on it.

But can limited memory bandwidth also hurt the Core i9? To find out, we switched the Core i9 system from quad-channel mode to single-channel mode. Unfortunately, we had to drop total memory from 32GB to 16GB because we lacked higher-density modules. The good news is that the 7-Zip benchmark with the default dictionary fits comfortably, and we don't believe memory capacity was a factor. Overall memory bandwidth on the Intel part, as measured in Sandra 2018, fell from 77GBps in quad-channel mode to 18.5GBps in single-channel mode. Per-core memory bandwidth went from 4.8GBps in quad-channel mode to 1GBps in single-channel mode.
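Putting those Sandra figures in percentage terms takes one line of arithmetic, shown here in Python using only the numbers quoted above.

```python
def percent_cut(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


# Sandra 2018 figures for the Core i9-7980XE, quad-channel -> single-channel.
print(f"total bandwidth: {percent_cut(77.0, 18.5):.0f}% lower")   # 77 -> 18.5 GBps
print(f"per-core bandwidth: {percent_cut(4.8, 1.0):.0f}% lower")  # 4.8 -> 1 GBps
```

That is roughly a 76 percent cut in total bandwidth and 79 percent per core, leaving the single-channel Core i9 with about half the 2GBps per core that Sandra reported for the quad-channel 2990WX at 32 cores.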

Threadripper 2990WX vs. Core i9 (single-channel mode) 7-Zip compression performance. Credit: IDG

Does cutting memory bandwidth on the Core i9-7980XE also kill its 7-Zip compression performance? Yup.

As you can see, the Core i9-7980XE's performance also suffers when its memory bandwidth is drastically cut. It doesn't suffer as much as the Threadripper 2990WX, but the drop suggests the Threadripper's poor showing isn't simply the fault of some pro-Intel code at work.

Linux tests bring a surprise. Keep reading!


Gordon Mah Ung

PC World (US online)