The CPU Overload 2020 Suite

Our new CPU tests cover a number of main areas: Web tests using our un-updateable version of Chromium, opening tricky PDFs, emulation, brain simulation, AI, 2D image to 3D model conversion, rendering (ray tracing, modeling), encoding (compression, AES, video and HEVC), office-based tests, and our legacy tests (throwbacks from another generation of code but interesting to compare). Over the next few pages we’ll go over each test at a high level.

As mentioned in passing on the previous page, we run a number of registry edit commands again at the start of the benchmark suite to ensure that various system features are turned off. This includes disabling Cortana, the GameDVR functionality, Windows Error Reporting, Windows Defender (as much as possible, again), and updates, as well as re-implementing power options and removing OneDrive, in case it sprouted wings again.
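The exact registry edits used by the suite are not listed here, so the following is only a minimal sketch of what such a step could look like. The three keys shown are commonly documented Windows settings and are assumptions, not the suite's actual list; the helper names (`REGISTRY_TWEAKS`, `build_reg_command`, `build_all_commands`) are hypothetical. The script only builds and prints `reg add` command lines rather than executing them.

```python
# Hypothetical sketch: the article does not list its actual registry edits.
# The keys below are commonly documented Windows settings, used as assumptions.
REGISTRY_TWEAKS = [
    # (key path, value name, DWORD data, purpose)
    (r"HKLM\SOFTWARE\Policies\Microsoft\Windows\Windows Search",
     "AllowCortana", 0, "disable Cortana"),
    (r"HKCU\System\GameConfigStore",
     "GameDVR_Enabled", 0, "disable GameDVR"),
    (r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting",
     "Disabled", 1, "disable Windows Error Reporting"),
]


def build_reg_command(key, name, data):
    """Return a `reg add` command line that sets a DWORD value (/f overwrites without prompting)."""
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {data} /f'


def build_all_commands(tweaks=REGISTRY_TWEAKS):
    """Build one command line per tweak, ignoring the human-readable purpose."""
    return [build_reg_command(key, name, data) for key, name, data, _purpose in tweaks]


if __name__ == "__main__":
    for command in build_all_commands():
        # Printed only; applying them would need an elevated command prompt.
        print(command)
```

Keeping the tweaks in a single table makes it easy to re-apply the same list at the start of every run, which is the behavior described above.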

A number of these tests have been requested by our readers, and we’ve split our tests into a few more categories than normal as our readers have been requesting specific focal tests for their workloads. A recent run on a Core i5-10600K, just for the CPU tests alone, took around 20 hours to complete.

Power

  • Peak Power (y-Cruncher using latest AVX)
  • Per-Core Loading Power using POV-Ray

Office

  • Agisoft Photoscan 1.3: 2D to 3D Conversion
  • Application Loading Time: GIMP 2.10.18 from a fresh install
  • Compile Testing (WIP)

Science

  • 3D Particle Movement v2.1 (Non-AVX + AVX2/AVX512)
  • y-Cruncher 0.78.9506 (Optimized Binary Splitting Compute for mathematical constants)
  • NAMD 2.13: Nanoscale Molecular Dynamics on ApoA1 protein
  • AI Benchmark 0.1.2 using TensorFlow (unoptimized for Windows)

Simulation

  • Digicortex 1.35: Brain stimulation simulation
  • Dwarf Fortress 0.44.12: Fantasy world creation and time passage
  • Dolphin 5.0: Ray Tracing rendering test for Wii emulator

Rendering

  • Blender 2.83 LTS: Popular rendering program, using PartyTug frame render
  • Corona 1.3: Ray Tracing Benchmark
  • Crysis CPU-Only: Can it run Crysis? What, on just the CPU at 1080p? Sure
  • POV-Ray 3.7.1: Another Ray Tracing Test
  • V-Ray: Another popular renderer
  • CineBench R20: Cinema4D Rendering engine

Encoding

  • Handbrake 1.32: Popular Transcoding tool
  • 7-Zip: Open source compression software
  • AES Encoding: Instruction accelerated encoding
  • WinRAR 5.90: Popular compression tool

Legacy

  • CineBench R10
  • CineBench R11.5
  • CineBench R15
  • 3DPM v1: Naïve version of 3DPM v2.1 with no acceleration
  • X264 HD3.0: Vintage transcoding benchmark

Web

  • Kraken 1.1: Deprecated web test with no successor
  • Octane 2.0: More comprehensive test (but also deprecated with no successor)
  • Speedometer 2: List-based web-test with different frameworks

Synthetic

  • Geekbench 4
  • AIDA Memory Bandwidth
  • Linux OpenSSL Speed (rsa2048 sign/verify, sha256, md5)
  • LinX 0.9.5 LINPACK

SPEC (Estimated)

  • SPEC2006 rate-1T
  • SPEC2017 rate-1T
  • SPEC2017 rate-nT

It should be noted that due to the terms of the SPEC license, because our benchmark results are not vetted directly by the SPEC consortium, we have to label them as ‘estimated’. The benchmark is still run and we get results out, but those results have to have the ‘estimated’ label.

Others

  • A full x86 instruction throughput/latency analysis
  • Core-to-Core Latency
  • Cache-to-DRAM Latency
  • Frequency Ramping
  • A y-cruncher ‘sprint’ to see how 0.78.9506 scales with increasing digit compute

Some of these tests also have AIDA power wrappers around them in order to provide an insight into the way the power is reported through the test.

2020 CPU Gaming (GPU) Benchmarks

For our new set of CPU Gaming tests, we wanted to think big. There are a lot of users in the ecosystem that prioritize gaming above all else, especially when it comes to choosing the correct CPU. If there is a chance to save $50 and get a better graphics card for no loss in performance from the CPU, then this is the route that gamers would prefer to tread. The angle here though is tough - lots of games have different requirements and cause different stresses on a system, with various graphics cards having different reactions to the code flow of a game. Then users also have different resolutions and different perceptions of what feels 'normal'. This all amounts to more degrees of freedom than we could hope to test in a lifetime, only for the data to become irrelevant in a few months when a new game or new GPU comes into the mix. Just for good measure, let us add in DirectX 12 titles that make it easier to use more CPU cores in a game to enhance fidelity.

When it comes down to gaming tests, some of the same rules apply as for the CPU tests. If we can get standalone versions of tests, then perfect – even better if they will never update, because that gives us a consistent codebase to work with. However, given the nature of Steam or Origin or the Epic Store, having a consistent codebase is not always possible. So for our gaming tests, where we could find offline DRM-free variants (such as those from GOG), we used those instead. Otherwise we rely on Steam for the most part, because it is the only storefront that offers an external API to allow us to check if an account is online – and thus allows a single account to be used across multiple systems. When scaling out automation, it can be difficult when there are multiple accounts to deal with, so as we aim for fewer than 10 systems running simultaneously, one account is enough.
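Which API call the suite uses for this check is not stated. As a minimal sketch, assuming something like the Steam Web API's `GetPlayerSummaries` method, whose response reports a `personastate` field per player (0 meaning offline), the check could be reduced to parsing that response. The function name `account_is_online` and the sample Steam ID are hypothetical, and fetching the response over the network is deliberately left out.

```python
import json


def account_is_online(payload: dict, steam_id: str) -> bool:
    """Return True if the given account appears with a non-offline persona state.

    Assumes the response layout of GetPlayerSummaries:
    {"response": {"players": [{"steamid": "...", "personastate": 0..6}]}}
    where personastate 0 means offline.
    """
    players = payload.get("response", {}).get("players", [])
    for player in players:
        if player.get("steamid") == steam_id:
            return player.get("personastate", 0) != 0
    # Account missing from the response (e.g. private or unknown): treat as offline.
    return False


if __name__ == "__main__":
    # Hand-written sample response with a made-up Steam ID, for illustration only.
    sample = json.loads(
        '{"response": {"players": [{"steamid": "00000000000000001", "personastate": 1}]}}'
    )
    print(account_is_online(sample, "00000000000000001"))
```

A test bed could poll a check like this before launching a title, and wait its turn if the shared account is already in use elsewhere.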

I could speak for a few days about the gripes of automating gaming benchmarks – the ones that do it well compared to the ones that have no consideration for those who want to use an in-game benchmark repeatedly. There’s also the discussion of in-game benchmarks vs native benchmarks, which I’ve had many times with colleagues and peers, and which I might go into in depth sometime. But I have thrown benchmark titles out for the stupidest things – updates that cause *new* splash screens are why I’ve cut games like AoTS and Civ6 in the past. Or Ubisoft games that offer benchmark modes that do not output benchmark results files. Or those that create HTML files that need to be pruned for the correct data, rather than a simple text file. Or shall we go into games that have their settings not as simple ini files, but embedded in the registry!? Total War gets thrown out for not allowing key presses in its menus, and then having cheat detection when you try to emulate mouse movements. I have, on multiple occasions, spent a day of work trying to code for a game that just doesn’t want to work – as a result, it gets thrown out of our benchmark suite.

In the past, we’ve tackled the GPU benchmark set in several different ways. We’ve had one GPU run multiple games at one resolution, or multiple GPUs take a few games at one resolution, then as the automation progressed into something better, multiple GPUs take a few games at several resolutions. However, based on feedback, running the best GPU we can get hold of across over a dozen games at several resolutions seems to be the best bet.

Normally securing GPUs for this testing is difficult, as we need several identical models for concurrent testing, and very rarely is a GPU manufacturer, or one of its OEM partners, happy to hand me 3-4+ of the latest and greatest. In that aspect, over the years, I have to thank ECS for sending us four GTX 580s in 2012, MSI for sending us three GTX 770 Lightnings in 2014, Sapphire for sending us multiple RX 480s and R9 Fury X cards in 2016, and in our last test suite, MSI for sending us three GTX 1080 Gaming cards in 2018.

For our testing on the 2020 suite, we have secured three RTX 2080 Ti GPUs direct from NVIDIA. These GPUs are well optimized for, both in drivers and in gaming titles, and given how rare our updates are, we are thankful for getting the high-end hardware. (It’s worth noting we won’t be updating to whatever RTX 3080 variant is coming out at some point for a while yet.)

On the topic of resolutions, this is something that has been hit and miss for us in the past. Some users state that they want to see the lowest resolution and lowest fidelity options, because this puts the most strain on the CPU, such as a 480p Ultra Low setting. In the past we have found this unrealistic for all use cases, and even if it does give the best shot for a difference in results, the actual point where you become GPU limited might be at a higher resolution. In our last test suite, we went from the 720p Ultra Low up to 1080p Medium, 1440p High, and 4K Ultra settings. However, our most vocal readers hated it, because even by 1080p Medium, we were GPU limited for the most part.

So to that end, the benchmarks this time round attempt to follow this basic pattern where possible:

  1. Lowest Resolution with lowest scaling, Lowest Settings
  2. 2560x1440 with the lowest settings (1080p where not possible)
  3. 3840x2160 with the lowest settings
  4. 1920x1080 at the maximum settings

Point (1) should give the ultimate CPU limited scenario. We should see that lift as we move up through (2) 1440p and (3) 4K, with 4K low still being quite strenuous in some titles.

Point (4) is essentially our ‘real world’ test. The RTX 2080 Ti is overkill for 1080p Maximum, and we’ll see that most modern CPUs pull well over 60 FPS average in this scenario.

What will be interesting is that for some titles, 4K Low is more compute heavy than 1080p Maximum, and for other titles that relationship is reversed.
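As a rough illustration of the four-point pattern, here is a minimal sketch of how the default test points for one title might be written down as data. The `Scenario` class and `default_pattern` function are hypothetical names, and the 640x360 value for the lowest resolution is an assumed 16:9 reading of a "360p" entry; real titles deviate from the pattern, as the game lists show.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    """One resolution/quality combination to benchmark."""
    label: str
    resolution: str
    quality: str


def default_pattern(lowest_resolution: str) -> List[Scenario]:
    """Return the four default test points; only point (1) varies per game."""
    return [
        Scenario("1: CPU limited", lowest_resolution, "lowest"),
        Scenario("2: 1440p", "2560x1440", "lowest"),
        Scenario("3: 4K", "3840x2160", "lowest"),
        Scenario("4: real world", "1920x1080", "maximum"),
    ]


if __name__ == "__main__":
    # Assumption: a "360p" lowest setting corresponds to 640x360.
    for scenario in default_pattern("640x360"):
        print(scenario.label, scenario.resolution, scenario.quality)
```

Treating the pattern as a default that individual titles can override keeps the exceptions (such as titles without a 1440p option) visible in one place.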

So we have the following benchmarks as part of our script, automated to the point of a one-button run, with the results popping out approximately 10 hours later, per GPU. Also listed are the resolutions and settings used.

Offline Games

  1. Chernobylite, 360p Low, 1440p Low, 4K Low, 1080p Max
  2. Civilization 6, 480p Low, 1440p Low, 4K Low, 1080p Max
  3. Deus Ex: Mankind Divided, 600p Low, 1440p Low, 4K Low, 1080p Max
  4. Final Fantasy XIV: 768p Min, 1440p Min, 4K Min, 1080p Max
  5. Final Fantasy XV: 720p Standard, 1080p Standard, 4K Standard, 8K Standard
  6. World of Tanks enCore: 768p Min, 1080p Standard, 1080p Max, 4K Max

Online Games

  1. Borderlands 3, 360p VLow, 1440p VLow, 4K VLow, 1080p Badass
  2. F1 2019, 768p ULow, 1440p ULow, 4K ULow, 1080p Ultra
  3. Far Cry 5, 720p Low, 1440p Low, 4K Low, 1080p Ultra*
  4. Gears Tactics, 720p Low, 4K Low, 8K Low, 1080p Ultra
  5. Grand Theft Auto 5, 720p Low, 1440p Low, 4K Low, 1080p Max
  6. Red Dead Redemption 2, 384p Min, 1440p Min, 4K Min, 1080p Max
  7. Strange Brigade DX12, 720p Low, 1440p Low, 4K Low, 1080p Ultra
  8. Strange Brigade Vulkan, 720p Low, 1440p Low, 4K Low, 1080p Ultra

For each of the games in our testing, we take the frame times where we can (the two that we cannot are Chernobylite and FFXIV). For these games, at each resolution/setting combination, we run as many loops as possible within a given time limit (often 10 minutes per resolution). Results are then taken as average frame rates and 95th percentiles.
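To make the two reported metrics concrete, here is a minimal sketch of how they could be derived from a list of frame times. The exact definitions used by the suite are not given, so two assumptions are made: the average frame rate is total frames divided by total time, and the 95th percentile is the nearest-rank 95th percentile frame time converted into a frame rate. The function name `summarize_frame_times` is hypothetical.

```python
def summarize_frame_times(frame_times_ms):
    """Return (average FPS, 95th percentile FPS) from frame times in milliseconds.

    Assumptions: average FPS = frames / total time; the percentile is taken
    on frame times (nearest-rank method) and then converted to FPS, so it
    reflects the slower frames rather than the faster ones.
    """
    count = len(frame_times_ms)
    if count == 0:
        raise ValueError("need at least one frame time")

    total_ms = sum(frame_times_ms)
    average_fps = 1000.0 * count / total_ms

    ordered = sorted(frame_times_ms)
    # Nearest-rank percentile: smallest 1-based rank covering 95% of frames,
    # computed with integer arithmetic to avoid floating-point rounding.
    rank = (95 * count + 99) // 100
    p95_frame_time_ms = ordered[rank - 1]
    p95_fps = 1000.0 / p95_frame_time_ms

    return average_fps, p95_fps


if __name__ == "__main__":
    # Synthetic data: 94 fast frames (10 ms) and 6 slow frames (20 ms).
    frames = [10.0] * 94 + [20.0] * 6
    average, p95 = summarize_frame_times(frames)
    print(f"average: {average:.1f} FPS, 95th percentile: {p95:.1f} FPS")
```

With the synthetic data in the example, the handful of slow frames barely moves the average but halves the percentile figure, which is why both numbers are worth reporting.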

Some of the games are still being evaluated for usefulness, and may eventually be dropped – Far Cry 5 has taken more time than I care to admit to get to work. Some of these titles require the exact CPU/GPU combination to be part of the settings files, otherwise the settings file will be discarded, which gets increasingly frustrating.

*Update 7/20 : I recently found that Far Cry 5 has additional requirements regarding monitor resolution support. If the settings file requests a resolution that it can’t detect in the monitor on the test bed, then it defaults to 1080p. My test beds contain two brands of 4K monitor – Dell UP2415Qs and cheap 27-inch TN displays, in a 50:50 split. For whatever reason, FC5 doesn’t really like any resolution changes on the Dell monitors. I can adjust the resolution scale (0.5x-2.0x) for this game, and quality, but I only found this out on 7/20, which means we have to rerun chips for this data.

If there are any game developers out there involved with any of the benchmarks above, please get in touch at ian@anandtech.com. I have a list of requests to make benchmarking your title easier!

The other angle is DRM, and some titles have limits of 5 systems per day. This may limit our testing in some cases; in other cases it is solvable.


110 Comments

  • PeachNCream - Tuesday, July 21, 2020 - link

    You don't get what it means to perform a controlled test do you?
  • Aspernari - Wednesday, July 22, 2020 - link

    It's important to note that the environment is not actually well-controlled.

    https://twitter.com/IanCutress/status/128480609693...

    We don't know temperature for the operating conditions for these tests, which matters more and more for boost behavior for CPUs and GPUs. He says 36c when he got into the office, we'll never know what the temperature peaked at, nor how often similar conditions were reached.

    A standard platform is a good choice, but a controlled environment is also important. Unfortunately, the results aren't as reliable as they otherwise might have been.
  • PeterCollier - Wednesday, July 22, 2020 - link

    And that's why this entire test is a complete waste of time. Something like Geekbench or especially Userbench is much, much better because it gives you a range of scores. Instead of trying to create false precision by saying that a AMD 4700U scored, say, a "979" on a benchmark, Userbench will say that all the 4700U's tested scored from 899 to 1008, and break it down into percentiles. This way, you have a range of expected performance in mind instead of being fixated on that "979" number, which could have been obtained in an unrealistic scenario.
  • Rudde - Saturday, July 25, 2020 - link

    Isn't userbench a synthetic together with geekbench? What exactly are they testing? Instead of knowing which of Intel i7 10700k and AMD ryzen 7 3800X is better at rendering, video encoding, number crunching or whatever your use case is, you'll get a distribution based on a largely unknown test. The Intel and AMD processors might end up being within error margins of each other in your use case, but that in itself tells something too. All benchmarks are inherently bad; there is not a single benchmark that captures every use case while not being affected by its environment (ram speeds, temperatures, etc). I prefer tests that I understand, over tests that I do not understand.
  • bananaforscale - Wednesday, July 22, 2020 - link

    One could ask what the point of Userbenchmark is in these days of quadcores being basically entry level while the benchmark has DECREASED its multicore weighting.
  • A5 - Monday, July 20, 2020 - link

    For my own personal test, getting an i7-4770K in the list would be a big help.

    Once you have a compile test, a Xeon E5-1680v3 would be nice to see so that I can sell my corp on newer workstations...
  • Shmee - Wednesday, July 22, 2020 - link

    Those are great Haswell EP CPUs, and they OC too! I have an E5-1660v3 in my X99 rig.
  • Mockingtruth - Monday, July 20, 2020 - link

    I have a 3570k and a E8600 spare with respective motherboards and ram if useful?
  • CampGareth - Monday, July 20, 2020 - link

    Personally I'd like to see a Xeon E5-2670 v1 benchmarked. I'm still running a pair of them as my workstation but these days AMD can beat the performance on a single socket and halve the power consumption.
  • Samus - Tuesday, July 21, 2020 - link

    Do you run them in an HP Z620? I ran the same system with the same CPU’s for years at one of my clients. What a beast.
