Section by Andrei Frumusanu

Performance & Efficiency

In terms of scalability and performance, what we can generally say is that one G76 core is roughly equal to two G72 cores. This also changes the configuration options that Arm offers, as the maximum core count for the largest GPU is now an MP20 configuration.

When going all-out in laying down cores, this means we have a 25% higher maximum performance point. To date we haven’t seen vendors come near the maximum configuration option of MP32 for the G71 and G72, as the largest Mali implementation was the Exynos 8895 with a G71MP20.
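The 25% figure follows from the core equivalence above. A minimal sketch of the arithmetic, assuming the rough "one G76 core equals two G72 cores" ratio holds at every core count (the function and variable names are illustrative, not anything Arm publishes):

```python
# Assumption taken from the text: one G76 core is roughly equal to two G72 cores.
G72_CORES_PER_G76_CORE = 2


def g72_equivalent_cores(g76_cores: int) -> int:
    """Express a G76 configuration as a rough G72-equivalent core count."""
    return g76_cores * G72_CORES_PER_G76_CORE


max_g76_cores = 20  # largest G76 configuration (MP20)
max_g72_cores = 32  # largest G71/G72 configuration (MP32)

equivalent = g72_equivalent_cores(max_g76_cores)
uplift = equivalent / max_g72_cores - 1

print(f"G76MP{max_g76_cores} is roughly a G72MP{equivalent}")
print(f"Maximum performance point uplift: {uplift:.0%}")
```

The same helper puts a G76MP12 at roughly 24 G72-equivalent cores, which is a handy way to compare it against existing G72 designs.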

Improving the performance density of the cores by consolidating functional blocks and execution engines into fewer “cores” improves the PPA of the GPU dramatically. At iso process and frequency, and at similar area configurations, the G76 is said to improve the fps/mm² metric by 39% in Manhattan 3.0 and, thanks to the improvements in the geometry pipelines, by a significant 65% in Car Chase. The casual gaming benchmark here represents simpler fill-rate-bound workloads such as Angry Birds and Candy Crush.

In terms of power efficiency, the metrics presented here depict the performance improvement at iso process node and frequency, and at an iso peak power target of 2.3W for the GPU alone. It’s worth remembering that for 3D workloads there’s significant power overhead from the memory subsystem and DRAM, which is why this figure is lower than the total platform active power figures I’ve usually published in the past.

In general, the improvement we’re looking at in common benchmarks like Manhattan is a 1.3x increase in performance at equal power, area, process and frequency.

How this would look in a late 2018 / early 2019 SoC would be something like the following projection:

GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)

| Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency |
|---|---|---|---|---|
| Mali G76MP12 SoC Projection | 7/8nm class | 69.00 | 4.08 | 16.90 fps/W |
| Galaxy S9+ (Snapdragon 845) | 10LPP | 61.16 | 5.01 | 11.99 fps/W |
| Galaxy S9 (Exynos 9810) | 10LPP | 46.04 | 4.08 | 11.28 fps/W |
| Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26 fps/W |
| LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90 fps/W |
| Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78 fps/W |
| Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94 fps/W |
| Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78 fps/W |
| Galaxy S7 (Exynos 8890) | 14LPP | 29.41 | 5.95 | 4.94 fps/W |
| Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16 fps/W |
| Nexus 6P (Snapdragon 810 v2.1) | 20SoC | 21.94 | 5.44 | 4.03 fps/W |
| Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77 fps/W |
| Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77 fps/W |
| Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55 fps/W |

Arm states that the 1.5x target improvement in performance on a future G76 in 7nm would happen thanks to a relative increase of the GPU capabilities, scaling from a G72MP18 to a G76MP12. So it seems natural to take the Exynos 9810 as the baseline for the performance projections. Assuming the power target doesn’t change, we’d see a G76MP12 on the upcoming process node outperforming the current generation leader, the Snapdragon 845, by 13% in Manhattan 3.1. Power efficiency at peak performance would also be 47% better.

Obviously the competition won’t be standing still. Although Qualcomm had a bit of a misstep in terms of power efficiency with the Adreno 630, it’s possible it will catch up in next year’s iteration, not to mention that the process node improvements alone would then be sufficient to retake the lead on the GPU side.
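The projection can be roughly reproduced from the table figures. This is a minimal sketch assuming the 1.5x target is applied directly to the Exynos 9810 result and that average power stays at the baseline value; the variable names are illustrative, and the table rounds the projected score to 69.00 fps.

```python
# Measured figures taken from the table above.
baseline_fps = 46.04      # Galaxy S9 (Exynos 9810), Manhattan 3.1 Offscreen
baseline_power_w = 4.08   # average system active power of the same device
sd845_fps = 61.16         # Galaxy S9+ (Snapdragon 845)

# Assumption: Arm's 1.5x target applies directly to the baseline score
# and the power target does not change.
target_uplift = 1.5

projected_fps = baseline_fps * target_uplift
projected_perf_per_w = projected_fps / baseline_power_w
lead_over_sd845_pct = (projected_fps / sd845_fps - 1) * 100

print(f"Projected G76MP12 score: {projected_fps:.2f} fps")
print(f"Projected efficiency:    {projected_perf_per_w:.2f} fps/W")
print(f"Lead over Snapdragon 845: {lead_over_sd845_pct:.1f}%")
```

The results land within rounding of the projected row in the table and of the 13% lead quoted above.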

End Remarks

All in all, the Mali G76 provides extremely solid advancements: 30% better performance at the same area and power is a hefty generational improvement. However, while this will greatly improve the competitiveness of Mali GPUs, I don’t think it will be quite sufficient to catch up with the competition.

In terms of the microarchitectural changes, I think Arm made the right choices in consolidating the cores and beefing them up. Currently it seems that the high core count in Mali GPUs is a double-edged sword: while it does provide extremely fine-grained configurability and allows vendors to pick exactly the core count that fits their GPU area budget, it also causes inevitable overhead.

The Mali G76 demonstrates the kind of improvement that comes from simply avoiding overhead control logic. Arm envisions an MP12 configuration for a flagship SoC, and I still think this is rather too many cores. Compared to the 4-core Adreno 540, the 2-core Adreno 630 or even the 3-core Apple A11 GPU, it’s easy to see why Mali lags behind in power efficiency and area. I hope that in the future we’ll see another doubling of the computational resources per core, as that would bring another large improvement and help close the gap to the competition.

For now, I’m looking forward to how the landscape will change with upcoming SoCs and how the G76 will perform in actual silicon.


  • levizx - Friday, June 1, 2018 - link

    " the size of a wavefront is typically a defining feature of an architecture. For long-lived architectures, especially in the PC space, wavefront sizes haven’t changed for years.."

    That's self-contradictory, if something stays the same across years of different μarch, it's by definition NOT a defining feature.
  • levizx - Friday, June 1, 2018 - link

    "Arm is touting a 2.7x increase in machine learning performance"

    No they are not. They are claiming 2.7x the performance, 1.7x increase.
  • Quantumz0d - Friday, June 1, 2018 - link

    I remember how bad the S8s Exynos GPU was, plus older Kirin SoCs power guzzlers. If it was delivering performance that would be still okay but in this age of slim era glass backed phones Multicore configurations will end up throttling. Still a progress is welcomed.
  • newblar - Monday, June 4, 2018 - link

    I always wondered why ARM didn't just buy imagination technologies on the cheap so they could get their GPU tech.
