Huawei & Honor's Recent Benchmarking Behaviour: A Cheating Headache
by Andrei Frumusanu & Ian Cutress on September 4, 2018 8:59 AM EST
The Raw Benchmark Numbers
Section By Andrei Frumusanu
Before we go into more detail, we're going to look at how much of a difference this behavior makes to benchmark scores. The key is the difference between having Huawei/Honor's benchmark detection mode on and off. We are using our mobile GPU test suite, which includes Futuremark’s 3DMark and Kishonti’s GFXBench.
The analysis right now is limited to the P20s and the new Honor Play, as I don’t yet have newer stock firmware on my Mate 10s. It is likely that the Mate 10 will exhibit similar behaviour; Ian also confirmed that he's seeing cheating behaviour on his Honor 10. This points to most (if not all) Kirin 970 devices released this year being affected.
Without further ado, here are some of the differences we identified between running the same benchmarks while being detected by the firmware (cheating) and the default performance that applies to any non-whitelisted application (True Performance). The non-whitelisted application is a version provided to us by the benchmark developer; it is undetectable and not publicly available (otherwise it would be easy to spot).
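The firmware's detection method is not public; all we know is that whitelisted applications get different treatment from everything else. As a purely hypothetical sketch, assuming detection keys on the application's package name, the selection logic reduces to a lookup like the one below. The package names, the `PowerProfile` type, `select_profile` and the trip temperatures are all illustrative placeholders, not Huawei's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerProfile:
    name: str
    throttle_temp_c: float  # hypothetical trip temperature for the thermal governor


# Illustrative values only; the real thresholds are not public.
DEFAULT_PROFILE = PowerProfile("default", throttle_temp_c=35.0)
BENCHMARK_PROFILE = PowerProfile("benchmark", throttle_temp_c=60.0)

# Hypothetical whitelist keyed on exact package names.
WHITELIST = frozenset({"com.example.benchmark3d", "com.example.gfxtest"})


def select_profile(package_name: str) -> PowerProfile:
    """Return the relaxed profile only for exact, whitelisted package names."""
    return BENCHMARK_PROFILE if package_name in WHITELIST else DEFAULT_PROFILE


if __name__ == "__main__":
    for pkg in ("com.example.benchmark3d", "com.example.benchmark3d.internal", "com.example.game"):
        print(f"{pkg} -> {select_profile(pkg).name}")
```

Under this assumption the decision depends on the app's identity rather than its workload, so an otherwise identical build under a different name, like the internal versions used for our True Performance numbers, falls through to the default profile.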
We see a stark difference between the resulting scores, with our internal versions of the benchmarks performing significantly worse than the publicly available versions. All three smartphones perform almost identically in the higher power mode, as they all share the same SoC. This contrasts significantly with the real performance of the phones, which is anything but identical because the three phones have different thermal limits as a result of their different chassis and cooling designs. Consequently, the P20 Pro, being the largest and most expensive, has better thermals in the 'regular' benchmarking mode.
Raising Power and Thermal Limits
What is happening here with Huawei is a bit unusual compared to how we’re used to seeing vendors cheat in benchmarks. In the past we’ve seen vendors raise the SoC frequencies or lock them to their maximum states, raising performance beyond what’s usually available to generic applications.
What Huawei is doing instead is boosting benchmark scores by coming at it from the other direction: the benchmarking applications are the only use-cases where the SoC actually performs at its advertised speeds. Meanwhile, every other real-world application is throttled significantly below that state due to the thermal limitations of the hardware. What we see with unthrottled performance is perhaps the 'true' form of an unconstrained SoC, although this is completely academic compared to what users actually experience.
To demonstrate the power behaviour between the two different throttling modes, I measured the power on the newest Honor Play. Here I’m showcasing total device power at fixed screen brightness; for GFXBench the 3D phase of the benchmark is measured for power, while for 3DMark I’m including the totality of the benchmark run from start to finish (because it has different phases).
The differences here are astounding: in the 'true performance' state, the chip is already reaching 3.5-4.4W. These are the kinds of power figures you would want a smartphone to limit itself to in 3D workloads. By contrast, the 'cheating' variants of the benchmarks completely blow through the power budget, with power figures above 6W and T-Rex reaching an insane 8.5W. In a 3D battery test, these figures very quickly trigger an 'overheating' notification on the device, showing that this power level pushes the phone beyond the thermal limits its own software expects.
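To put the gap in proportion, the measured figures can be expressed as ratios. The text gives only a 3.5-4.4W range for the 'true performance' runs rather than a per-test value, so this sketch assumes T-Rex's true-performance figure lies somewhere within that range and bounds the ratio from both ends. `ratio_bounds` is a hypothetical helper named for illustration.

```python
def ratio_bounds(cheat_w: float, true_low_w: float, true_high_w: float) -> tuple[float, float]:
    """Bound cheat/true power when the true figure is only known to lie in [low, high]."""
    # Dividing by the highest possible true power gives the smallest ratio, and vice versa.
    return cheat_w / true_high_w, cheat_w / true_low_w


# T-Rex 'cheating' power (8.5W) against the 3.5-4.4W 'true performance' range.
low, high = ratio_bounds(8.5, 3.5, 4.4)
print(f"T-Rex draws {low:.2f}x to {high:.2f}x the power of its true-performance run")
```

Under that assumption, T-Rex in 'cheating' mode draws roughly 1.9x to 2.4x the power of its throttled counterpart.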
Note that the 'true performance' figures aren’t actually stable: they strongly depend on the device’s temperature (which is typical for most phones). Huawei/Honor are not actually blocking the GPU from reaching its peak frequency state; instead, the default behavior applies a very harsh thermal throttling mechanism that tries to maintain significantly lower SoC temperatures and overall power consumption.
The net result is that in the phones' normal mode, peak power consumption during these tests can reach the same figures posted by the unthrottled variants, but the numbers very quickly fall back drastically. In some cases the device throttles down to 2.2W, reducing performance quite a lot.
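The pattern described above, where both modes start at the same peak power but the default mode quickly falls back, is what a temperature-triggered governor produces. The following is a toy model under stated assumptions: a lumped first-order thermal model, a governor that steps power down while the temperature is above a trip point, and made-up constants. Only the 2.2W floor comes from our measurements; the 6W peak is in line with the figures above, while the trip temperatures, thermal resistance and time constant are hypothetical. None of this is Huawei's actual governor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalModel:
    # Hypothetical lumped first-order model; constants are illustrative, not measured.
    ambient_c: float = 25.0
    resistance_c_per_w: float = 5.0  # steady-state temperature rise per watt
    tau_s: float = 60.0              # thermal time constant in seconds


def simulate(trip_c: float, peak_w: float = 6.0, floor_w: float = 2.2,
             step_w: float = 0.2, duration_s: int = 600,
             model: ThermalModel = ThermalModel()) -> list[float]:
    """Return per-second power under a simple trip-point governor."""
    temp = model.ambient_c
    power = peak_w
    trace = []
    for _ in range(duration_s):
        trace.append(power)
        # First-order thermal response: temperature relaxes toward ambient + R * P.
        target = model.ambient_c + model.resistance_c_per_w * power
        temp += (target - temp) / model.tau_s
        # Governor: step power down above the trip point, back up toward peak below it.
        if temp > trip_c:
            power = max(floor_w, power - step_w)
        else:
            power = min(peak_w, power + step_w)
    return trace


def peak_and_sustained(trace: list[float], window: int = 60) -> tuple[float, float]:
    """Peak power and mean power over the final `window` seconds."""
    tail = trace[-window:]
    return max(trace), sum(tail) / len(tail)


if __name__ == "__main__":
    for label, trip in (("default (harsh trip point)", 35.0),
                        ("whitelisted (relaxed trip point)", 60.0)):
        peak, sustained = peak_and_sustained(simulate(trip))
        print(f"{label}: peak {peak:.1f} W, sustained {sustained:.1f} W")
```

With these made-up constants, the harsh trip point pins the simulated device at the 2.2W floor within the first minute, while the relaxed trip point never triggers and power stays at its peak. Both runs start from the same peak, mirroring the observation that peak power in the normal mode can match the unthrottled variants.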
84 Comments
sing_electric - Tuesday, September 4, 2018
At some point, Huawei (and other Chinese OEMs) need to decide whether they want to build their brands globally or just in their home market. "Other Chinese OEMs lie so we've got to as well" ends up doing nothing but providing ammunition for those who say that Chinese phones are "cheap," under-performing knock-offs.
The Nexus 6P showed many years ago that Huawei can make good hardware. HiSilicon's chips obviously aren't doing them many favors in the GPU department, but that just means they need to target appropriate segments where they are competitive (I'm convinced that there's a large niche of people who want stylish devices that feel premium but don't really care much about performance), rather than lying and perpetuating a stereotype that will hurt their brand long after they've abandoned those practices.
A5 - Tuesday, September 4, 2018
Calling the 6P "good" hardware is a bit generous. The battery subsystem has a devastating defect rate, especially since the phone is sealed. At one point Google ran out of refurbs and had to give out Pixel XLs to people as replacements.
ventrolis - Tuesday, September 4, 2018
Are the charts for Aztec Normal/High flipped? Somehow I imagine the 'High' test would be more difficult and have lower frame rates than 'Normal'.
Andrei Frumusanu - Tuesday, September 4, 2018
Thank you for pointing it out; indeed, the labels were flipped.
CityZ - Tuesday, September 4, 2018
Why not simply do a combined performance & power test where you run the benchmark continuously until the phone shuts down? If a phone maker tries to cheat on the performance side, they'll look bad on the run-time side. If a phone throws up an "I'm running too hot" screen, consider that the end of the test. Such a test not only shows how fast your game may perform, but also for how long you can game.
Andrei Frumusanu - Tuesday, September 4, 2018
Screen resolution, V-Sync and other device differences make this kinda hard. In my view there's no added value over just peak & sustained performance as well as just measuring power.
wow&wow - Tuesday, September 4, 2018
"A Cheating Headache"The worst one should be the "Intel's repeatedly not following the specs" that causes the problems of "Meltdown", requiring OS memory relocation, the industry's 1st and only, and "Foreshadow" that the mitigation can only "reduce" the risk but "not eliminated" it!
Xex360 - Tuesday, September 4, 2018
Because I don't consider phones to be gaming devices (for that I have a PC and a console), benchmarks are worthless to me. The most important things in a phone are the OS (unfortunately I'm stuck with Android; iOS isn't well suited to my use), the screen (high resolution and no notch) and finally battery life (around one day, an OLED screen).
A5 - Tuesday, September 4, 2018
Android increasingly uses the GPU to render the OS, and apps like Google Maps use it extensively as well. Sustained GPU performance isn't just relevant to gamers.
eastcoast_pete - Tuesday, September 4, 2018
First and foremost: Thanks Andrei and Ian! This kind of article is why I come to Anandtech again and again (and more frequently than other computer tech websites). Yes, those benchmarks are not only misleading, they also steer the manufacturers towards optimizing for an artificial use (benchmarking), often at the expense of actually optimizing their smartphones for real world use. Who knows just how much better Huawei's phones could have been for everyday use if the time and energy invested in cheating for benchmarks had instead gone into optimizing their phones for productivity, real world applications, and battery life (Huawei gets a bonus for historically having large capacity batteries, though!). As they are, those benchmarking suites would probably come in handy if one needs to use the phone as a hand warmer in winter. One minor edit: the "High" and "Normal" labels on the Aztec Ruins graphs are probably switched around. The fps numbers for "high" are really high for all devices. However, that is a very minor, cosmetic, point in an otherwise very good article!