Apple's M1 Pro, M1 Max SoCs Investigated: New Performance and Efficiency Heights
by Andrei Frumusanu on October 25, 2021 9:00 AM EST
Last week, Apple unveiled its new generation of MacBook Pro laptops, a new range of flagship devices that bring significant updates to the company’s professional and power-user oriented user base. The new devices particularly differentiate themselves in that they’re now powered by two new additions to Apple’s own silicon line-up, the M1 Pro and the M1 Max. We covered the initial reveal in last week’s overview article of the two new chips, and today we’re getting the first glimpses of the performance we can expect from the new silicon.
The M1 Pro: 10-core CPU, 16-core GPU, 33.7bn Transistors
Starting off with the M1 Pro, the smaller sibling of the two, the design appears to be a new implementation of the first-generation M1 chip, but this time designed from the ground up to scale to larger sizes and higher performance. The M1 Pro is, in our view, the more interesting of the two designs, as it offers most of what power users will deem generationally important in terms of upgrades.
At the heart of the SoC we find a new 10-core CPU setup in an 8+2 configuration, with 8 performance Firestorm cores and 2 efficiency Icestorm cores. We had indicated in our initial coverage that Apple’s new M1 Pro and Max chips appear to use a similar, if not the same, generation of CPU IP as the M1, rather than updating to the newer cores used in the A15. We can seemingly confirm this, as we’re seeing no apparent changes in the cores compared to what we found on the M1.
The CPU cores clock up to a peak of 3228MHz, but vary in frequency depending on how many cores are active within a cluster, dropping to 3132MHz with 2 cores active, and 3036MHz with 3 or 4 cores active. I say “per cluster” because the 8 performance cores in the M1 Pro and M1 Max actually consist of two 4-core clusters, each with its own 12MB L2 cache and each able to clock its CPUs independently of the other, so it’s possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz.
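To make the per-cluster clocking concrete, here is a minimal Python sketch built on the frequency points measured above; it is a rough illustrative model of the behaviour, not Apple’s actual DVFS logic.

```python
# Rough model of the P-core frequency behaviour described above: each 4-core
# cluster picks its clock based on how many of its own cores are active,
# independently of the other cluster. Frequency points are the measured values.
P_CLUSTER_FREQ_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}

def cluster_clocks(active_per_cluster):
    """Expected clock (MHz) of each P-core cluster, given active core counts."""
    return [P_CLUSTER_FREQ_MHZ[min(n, 4)] if n > 0 else 0
            for n in active_per_cluster]

# The example from the text: four active cores in one cluster, one in the other.
print(cluster_clocks([4, 1]))  # -> [3036, 3228]
```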
The two E-cores in the system clock up to 2064MHz; as opposed to the M1, there are only two of them this time around, however Apple still gives them their full 4MB of L2 cache, the same as on the M1 and the A-series derived chips.
One large feature of both chips is their much-increased memory bandwidth and interfaces – the M1 Pro features a 256-bit LPDDR5 memory interface at 6400MT/s, corresponding to 204GB/s of bandwidth. This is significantly higher than the M1’s 68GB/s, and also generally higher than competing laptop platforms, which still rely on 128-bit interfaces.
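For reference, these theoretical peak figures fall straight out of the interface width and transfer rate; a quick Python sketch (assuming LPDDR4X-4266 for the original M1, with small rounding differences versus the quoted numbers):

```python
# Theoretical peak DRAM bandwidth = (bus width in bytes) x (transfer rate).
# Real-world achievable bandwidth is always somewhat lower.
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    return (bus_width_bits / 8) * transfer_rate_mts / 1000  # GB/s

print(peak_bandwidth_gbs(128, 4266))  # M1, LPDDR4X-4266    -> ~68 GB/s
print(peak_bandwidth_gbs(256, 6400))  # M1 Pro, LPDDR5-6400 -> ~205 GB/s
print(peak_bandwidth_gbs(512, 6400))  # M1 Max, LPDDR5-6400 -> ~410 GB/s
```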
We’ve been able to identify the “SLC”, or system level cache as we call it, as coming in at 24MB on the M1 Pro and 48MB on the M1 Max, a bit smaller than we initially speculated, but it makes sense given the SRAM die area – representing a 50% increase over the per-block SLC on the M1.
The M1 Max: A 32-Core GPU Monstrosity at 57bn Transistors
Above the M1 Pro we have Apple’s second new M1 chip, the M1 Max. The M1 Max is essentially identical to the M1 Pro in terms of architecture and in many of its functional blocks – but what sets the Max apart is that Apple has equipped it with much larger GPU and media encode/decode complexes. Overall, Apple has doubled the number of GPU cores and media blocks, giving the M1 Max virtually twice the GPU and media performance.
The GPU and memory interfaces are by far the most differentiated aspects of the chip: instead of a 16-core GPU, Apple doubles things up to a 32-core unit. On the M1 Max we tested for today, the GPU runs at up to 1296MHz – quite fast for what we consider mobile IP, but still significantly slower than what we’ve seen from the conventional PC and console space, where GPUs can now run at up to around 2.5GHz.
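To put that clock in context, here is a back-of-the-envelope FP32 throughput estimate – it assumes 128 FP32 ALUs per GPU core, each retiring one FMA (two FLOPs) per clock, a commonly cited figure for Apple’s GPU architecture rather than something stated in this article:

```python
# Back-of-the-envelope FP32 throughput for the 32-core M1 Max GPU.
# Assumption (not from the article): 128 FP32 ALUs per core, 1 FMA/clock each.
cores, alus_per_core, flops_per_fma = 32, 128, 2
clock_ghz = 1.296  # measured peak GPU clock

tflops = cores * alus_per_core * flops_per_fma * clock_ghz / 1000
print(f"{tflops:.1f} TFLOPS")  # ~10.6 TFLOPS, close to Apple's advertised ~10.4
```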
Apple also doubles up on the memory interfaces, using a whopping 512-bit-wide LPDDR5 memory subsystem – unheard of in an SoC, and rare even amongst historical discrete GPU designs. This gives the chip a massive 408GB/s of bandwidth; how this bandwidth is accessible to the various IP blocks on the chip is one of the things we’ll be investigating today.
The memory controller caches come in at 48MB on this chip, theoretically amplifying memory bandwidth for the various SoC blocks as well as reducing off-chip DRAM traffic, thus also reducing the chip’s power and energy usage.
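A toy model illustrates the idea: only SLC misses have to go out to DRAM, so even modest hit rates cut off-chip traffic substantially. The hit rates below are purely illustrative, not measured values.

```python
# Toy model: only system-cache misses generate off-chip LPDDR5 traffic.
# Hit rates are illustrative only - we have no measured SLC hit-rate data.
def dram_traffic_gbs(requested_gbs, slc_hit_rate):
    return requested_gbs * (1.0 - slc_hit_rate)

for hit_rate in (0.0, 0.25, 0.5):
    print(f"hit rate {hit_rate:.0%}: {dram_traffic_gbs(300, hit_rate):.0f} GB/s to DRAM")
# A workload demanding 300 GB/s needs only 150 GB/s of DRAM bandwidth at a
# 50% hit rate, with a corresponding reduction in DRAM power.
```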
Apple’s die shot of the M1 Max was a bit odd initially, in that we weren’t sure whether it actually represents physical reality – in particular, on the bottom part of the chip we noted what appears to be a doubled-up NPU, something Apple doesn’t officially disclose. A doubled-up media engine makes sense, as that’s part of the chip’s feature set; however, until we can get a third-party die shot to confirm that this is indeed what the chip looks like, we’ll refrain from speculating further.
493 Comments
richardnpaul - Wednesday, October 27, 2021 - link
I'm not saying that it's not a great and energy-efficient marvel of technology (although you're forgetting that the compared part is a 35W Zen 3 mobile part, which has 12MB rather than 32MB of L3, and that's partly because it's a small die on 7nm). They mentioned Metal, they mentioned how they can't get directly comparable results; this is one of the downsides of this chip, and the others from Apple. Great as it is, it has drawbacks that hamper it which have nothing to do with the architecture.
OreoCookie - Wednesday, October 27, 2021 - link
I don’t think I’m forgetting anything here. I am just saying that Anandtech should compare the M1 Max against actual products rather than speculate how it compares to future products like Alder Lake or Zen 3 with V-Cache. Your claim was that the article “comes across as a fanboi article”, and I am saying that they are giving the chip a great review because in their low-level benchmarks it outclasses the competition in virtually every way. That’s not fanboi-ism, it is just rooted in fact. And yes, they explained the issue with APIs and the lack of optimization of games for the Mac. Given that Mac users either aren’t gamers or (if they are gamers) tend not to use their Macs for gaming, we can argue how important that drawback actually is. In more GPU compute-focussed benchmarks (e.g. by Affinity, who make cross-platform creativity apps), the results of the GPU seem very impressive.
richardnpaul - Thursday, October 28, 2021 - link
My main disagreement was not with them comparing with Zen 3, but more that I felt they failed to adequately cover how the change would impact this use case between M1 versions, given that comparing Zen 2 to Zen 3 has been covered (and AMD have already said that the V-Cache will mainly impact gaming and server workloads, by around 15% on average) and shown in these specific use cases to have quite a large benefit. I'd just wanted that kind of abstract, logical analysis of how the Max might be more positively positioned for these use cases over, say, the original M1. (I know they mentioned in the article that they didn't have the M1 anymore and that the actual AMD 5900HS device is dead, which has severely impacted their testing here.) I come to Anandtech specifically for the more in-depth coverage that you don't get elsewhere, and I come for all the hardware articles irrespective of brand, because I'm interested in technology, not brand names, which is why I dislike articles that come across as biased (whilst it'll never be intentionally biased, we're all human at the end of the day and it's hard not to let the excitement of novel tech cloud our judgement).
richardnpaul - Wednesday, October 27, 2021 - link
Also, my comparison was AMD to AMD between generations, and how it might apply to increasing the cache sizes of the M1 and the positive improvement it might have on performance in situations using the GPU, such as gaming.
Ppietra - Wednesday, October 27, 2021 - link
You are so focused on a fringe case that you don’t stop to think that "maybe" there are other things happening besides "gluing" a CPU and GPU onto the same silicon and having them fight for memory bandwidth. The unified memory architecture, plus the CPU and GPU sharing data over the system cache, has an impact on memory bandwidth needs. Besides this, looking at the data that is provided, we seem to be far from saturating memory bandwidth on a regular basis.
It would be interesting though to actually see how applications behave when truly optimised for this hardware and not just ported with some compatibility abstraction layer in the middle. Affinity Photo would probably be the best example.
richardnpaul - Wednesday, October 27, 2021 - link
This is exactly what I wanted covering in the article. If the GPU and CPU are hitting the memory subsystem, they are going to be competing for cache hits. My point was that Zen 3 (desktop) showed a large positive correlation between doubling the cache (or unifying it into a single blob, in reality) and increased FPS in games, and that might also hold true for the increased cache on the M1 Pro and Max. Unfortunately, testing this chip is hampered by decisions completely unrelated to the hardware itself, and that also applies to certain use cases.
It'll be more interesting to see testing of the same games under Linux against an Nvidia/AMD/Intel based laptop, as then the only differences should be the ISA and immature drivers.
Ppietra - Wednesday, October 27, 2021 - link
"hitting the memory subsystem they are going to be competing for cache hits"CPU and GPU also have their own cache (CPU 24MB L2 total; GPU don’t know how much now) which is very substantial.
And I think you are not seeing the picture about the CPU and GPU not having to duplicate resources, working on the same data in an enormous 48MB system cache (when using native APIs, of course) before even needing to access RAM, reducing latency, etc. This can be very powerful. So no, I don’t assume that there will be any significant impact because of some fringe case, while ignoring the great benefits that it brings.
richardnpaul - Wednesday, October 27, 2021 - link
One person's fringe edge case is another person's primary use case. The 24/48MB is a shared cache between the CPU and GPU (and everything else that accesses main memory).
Ppietra - Wednesday, October 27, 2021 - link
No, it’s a fringe case, period! You don’t see laptop processors with these amounts of L2 cache and system cache anywhere, not even close, and yet for some reason you feel that it would be at a disadvantage, failing to acknowledge the advantages of sharing.
richardnpaul - Thursday, October 28, 2021 - link
What you call a fringe case I call 2.35m people. Okay, so it's probably only about 1.5 to 2% of Mac users; it's ~2.5% of Steam users. I know people who play games on Windows machines because the GPUs in their Macs aren't good enough – people who are frustrated at having to maintain a Windows machine just to play games. Those people will buy into an M1 Pro or Max just so they can be rid of the Windows system. It won't be their main concern – a web developer isn't going to buy an M1 Pro/Max for rendering and the like – but they will buy it so that they can dump the pain-in-the-backside Windows gaming machine. Valve don't maintain their macOS version of Steam for no good reason.