The iPad Air Review
by Anand Lal Shimpi on October 29, 2013 9:00 PM EST
An Update on Apple’s A7: It's Better Than I Thought
When I reviewed the iPhone 5s I didn’t have much time to do the sort of in-depth investigation into Cyclone (Apple’s 64-bit custom ARMv8 core) that I did with Swift (Apple’s custom ARMv7 core from A6) the year before. I had heard rumors that Cyclone was substantially wider than its predecessor, but I had no proof beyond hearsay, so I left it out of the article. Instead I surmised in the 5s review that the A7 was likely an evolved Swift core rather than a brand new design; after all, what sense would it make to design a new CPU core and then do it all over again for the next one? It turns out I was quite wrong.
Armed with a bit of custom code and a bunch of low level tests I think I have a far better idea of what Apple’s A7 and Cyclone cores look like now than I did a month ago. I’m still toying with the idea of doing a much deeper investigation into A7, but I wanted to share some of my findings here.
The first task is to understand the width of the machine. With Swift I got lucky in that Apple had left a bunch of public LLVM documentation uncensored, referring to Swift’s 3-wide design. It turns out that although the design might be capable of decoding, issuing and retiring up to three instructions per clock, in most cases it behaved like a 2-wide machine. Mix FP and integer code and you’re looking at a machine that’s more like 1.5 instructions wide. Obviously Swift did very well in the market, and its competitors at the time, including Qualcomm’s Krait 300, were similarly capable.
With Cyclone, Apple is in a completely different league. As far as I can tell, the peak issue width of Cyclone is 6 instructions. That’s at least 2x the width of Swift and Krait, and at best more than 3x the width depending on instruction mix. Limitations on co-issuing FP and integer math have also been lifted: you can run up to four integer adds and two FP adds in parallel. You can also perform up to two loads or stores per clock.
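To make those width numbers concrete, here is a rough throughput-bound sketch. It assumes a simplified model (my assumption, not a documented description of Cyclone's pipeline) in which each cycle is limited by the total issue width plus a per-class cap; the caps simply restate the observations above (6 total, 4 integer adds, 2 FP adds, 2 loads/stores), and the class names are hypothetical labels.

```python
from fractions import Fraction

# Hypothetical per-cycle caps restating the observations above; this is a
# simplified resource model, not a description of Cyclone's real ports.
CYCLONE_LIMITS = {"width": 6, "int": 4, "fp": 2, "mem": 2}

def min_cycles(mix, limits):
    """Lower bound on cycles to issue `mix` (class -> instruction count).

    The bound is the tightest of: total instructions / issue width, and
    each class's count / that class's per-cycle cap.
    """
    total = sum(mix.values())
    bounds = [Fraction(total, limits["width"])]
    for cls, count in mix.items():
        bounds.append(Fraction(count, limits[cls]))
    return max(bounds)

def peak_ipc(mix, limits):
    """Best-case instructions per clock for a steady stream of `mix`."""
    return Fraction(sum(mix.values())) / min_cycles(mix, limits)

print(peak_ipc({"int": 4, "fp": 2}, CYCLONE_LIMITS))            # 6
print(peak_ipc({"int": 8}, CYCLONE_LIMITS))                     # 4
print(peak_ipc({"int": 2, "fp": 2, "mem": 2}, CYCLONE_LIMITS))  # 6
```

Under this model a pure integer-add stream tops out at 4 instructions per clock, and only a mix that spreads work across integer, FP and memory operations reaches the full 6.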
I don’t yet have a good understanding of the number of execution ports and how they’re mapped, but Cyclone appears to be the widest ARM architecture we’ve ever seen at this point. I’m talking wider than Qualcomm’s Krait 400 and even ARM’s Cortex A15.
I did have some low level analysis in the 5s review, where I pointed out the significantly reduced memory latency and increased bandwidth to the A7. It turns out that I was missing a big part of the story back then as well…
A Large System Wide Cache
In our iPhone 5s review I pointed out that the A7 now featured more computational GPU power than the 4th generation iPad. For a device running at roughly 1/4 the resolution of the iPad, the A7’s GPU meant either that Apple had an application that needed tons of GPU performance or that it planned on using the A7 in other, higher resolution devices. I speculated it would be the latter, and it turns out that’s indeed the case. For the first time since the iPad 2, Apple once again shares common silicon across the iPhone 5s, iPad Air and iPad mini with Retina Display.
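The resolution gap between the phone and the tablet is easy to check. A minimal sketch, assuming the iPhone 5s panel is 1136 x 640 and the iPad's is 2048 x 1536:

```python
# Assumed native resolutions: iPhone 5s (1136 x 640) and the Retina iPads
# (2048 x 1536, the iPad Air figure quoted in this article).
IPHONE_5S = (1136, 640)
RETINA_IPAD = (2048, 1536)

def pixels(res):
    width, height = res
    return width * height

ratio = pixels(IPHONE_5S) / pixels(RETINA_IPAD)
print(f"{pixels(IPHONE_5S):,} vs {pixels(RETINA_IPAD):,} pixels")  # 727,040 vs 3,145,728 pixels
print(f"iPhone 5s / iPad = {ratio:.3f} (about 1/{1 / ratio:.1f})")  # iPhone 5s / iPad = 0.231 (about 1/4.3)
```

The iPad pushes a bit more than four times as many pixels per frame, which is the load the same GPU has to carry in the iPad Air.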
As Brian found out in his investigation after the iPad event last week, all three devices use the exact same silicon with the exact same internal model number: S5L8960X. There are no extra cores, no change in GPU configuration and, the biggest one, no increase in memory bandwidth.
Previously both the A5X and A6X featured a 128-bit wide memory interface, with half of it seemingly reserved for GPU use exclusively. The non-X parts by comparison only had a 64-bit wide memory interface. The assumption was that a move to such a high resolution display demanded a substantial increase in memory bandwidth. With the A7, Apple takes a step back in memory interface width. Is that enough to hamper the performance of the iPad Air with its 2048 x 1536 display?
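Peak DRAM bandwidth is just bus width in bytes times transfer rate, so halving the interface halves the ceiling at a given data rate. The sketch below works through that arithmetic. The LPDDR3-1600 and LPDDR2-1066 data rates are assumptions chosen for illustration (the text only gives the interface widths), and the 4-bytes-per-pixel, 60 fps frame pass is likewise a hypothetical workload.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9

# Data rates below are illustrative assumptions, not confirmed specs.
a7 = peak_bandwidth_gbs(64, 1600)    # 64-bit interface, assumed LPDDR3-1600
a6x = peak_bandwidth_gbs(128, 1066)  # 128-bit interface, assumed LPDDR2-1066

# Hypothetical workload: write one 32-bit-per-pixel 2048 x 1536 frame, 60 times a second.
frame_bytes = 2048 * 1536 * 4
frame_pass_gbs = frame_bytes * 60 / 1e9

print(f"A7 peak:  {a7:.1f} GB/s")                                     # A7 peak:  12.8 GB/s
print(f"A6X peak: {a6x:.1f} GB/s")                                    # A6X peak: 17.1 GB/s
print(f"One full-screen pass at 60 fps: {frame_pass_gbs:.2f} GB/s")  # One full-screen pass at 60 fps: 0.75 GB/s
```

A single pass is small next to either peak, but real rendering reads and writes each pixel many times per frame, which is why display resolution was assumed to drive the wider buses on the X parts.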
The numbers alone tell us the answer is no. In all available graphics benchmarks the iPad Air delivers better performance at its native resolution than the outgoing 4th generation iPad (as you'll soon see). Many of these benchmarks are bound more by GPU compute than by memory bandwidth, a side effect of the relative lack of memory bandwidth on modern day mobile platforms. Across the board, though, I couldn’t find a situation where anything was smoother on the iPad 4 than on the iPad Air.
There’s another part of this story, something I missed in my original A7 analysis. When Chipworks posted a shot of the A7 die, many of you correctly identified what appeared to be a 4MB SRAM on the die itself. It's highlighted on the right in the floorplan diagram below:
A7 Floorplan, Courtesy Chipworks
While I originally assumed that this SRAM might be reserved for use by the ISP, it turns out that it can do a lot more than that. If we look at memory latency (from the perspective of a single CPU core) vs. transfer size on A7, we notice a very interesting phenomenon between 1MB and 4MB:
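Latency-versus-transfer-size curves like this are typically produced with a pointer chase: walk a randomly permuted ring of indices whose footprint you sweep, so that every load depends on the previous one and cannot be overlapped. The Python below is only a sketch of that method, not the custom code used for this article. Interpreter overhead swamps cache effects in Python, so a real measurement would use the same structure in C or assembly with a controlled element stride; the function names are hypothetical.

```python
import random
import time

def sattolo_cycle(n, rng):
    """Random permutation forming a single cycle through all n slots
    (Sattolo's algorithm), so the chase never falls into a short loop."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding i is what forces one cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def chase_ns_per_step(perm, steps):
    """Follow idx -> perm[idx]; each step depends on the previous one,
    so time per step tracks load latency (plus interpreter overhead)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = perm[idx]
    elapsed = time.perf_counter_ns() - start
    return elapsed / steps, idx  # return idx so the chain's result is used

rng = random.Random(42)
for n in (1 << 10, 1 << 14, 1 << 18):
    perm = sattolo_cycle(n, rng)
    ns, _ = chase_ns_per_step(perm, 200_000)
    print(f"{n:>8} slots: {ns:.1f} ns/step")
```

Sweeping the footprint past each cache size is what produces the steps in a latency plot; a plateau between 1MB and 4MB is the signature of an extra cache level.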
That SRAM is indeed some sort of a cache before you get to main memory. It’s not the fastest thing in the world, but it’s appreciably quicker than going all the way out to main memory. Available bandwidth is also pretty good:
We’re only looking at bandwidth seen by a single CPU core, but even then we’re talking about 10GB/s. Lookups in this third-level cache don’t happen in parallel with main memory requests, so unfortunately the impact on worst-case memory latency is additive (a tradeoff of speed vs. power).
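The cost of that serial lookup can be expressed with ordinary average-memory-access-time arithmetic. A sketch with made-up latencies (30 ns for the SRAM, 100 ns for DRAM; these are hypothetical, not measurements from this article), comparing a serial lookup with a hypothetical parallel one:

```python
def avg_latency_ns(l3_hit_rate, l3_ns, dram_ns, serial=True):
    """Average latency for requests that have already missed the L2.

    serial=True: check the cache first, then go to DRAM, so a miss pays
    both latencies. serial=False: issue both lookups at once, so a miss
    pays only the DRAM latency.
    """
    miss_ns = l3_ns + dram_ns if serial else dram_ns
    return l3_hit_rate * l3_ns + (1 - l3_hit_rate) * miss_ns

# Hypothetical latencies for illustration only; not measured A7 figures.
for hit in (0.0, 0.5, 0.9):
    s = avg_latency_ns(hit, l3_ns=30, dram_ns=100)
    p = avg_latency_ns(hit, l3_ns=30, dram_ns=100, serial=False)
    print(f"hit rate {hit:.0%}: serial {s:.0f} ns, parallel {p:.0f} ns")
# hit rate 0%: serial 130 ns, parallel 100 ns
# hit rate 50%: serial 80 ns, parallel 65 ns
# hit rate 90%: serial 40 ns, parallel 37 ns
```

Hits get much cheaper either way, but in the worst case (every lookup missing) the serial design adds exactly the cache lookup time to each trip to DRAM; the parallel alternative avoids that at the cost of firing DRAM requests that a hit would have made unnecessary, which is where the power side of the tradeoff comes in.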
I don’t yet have the tools needed to measure the impact of this on-die memory on GPU accesses, but in the worst case scenario it’ll help free up more of the memory interface for use by the GPU. It’s more likely that some graphics requests are cached here as well, with intelligent allocation of bandwidth depending on what type of application you’re running.
That’s the other aspect of what makes A7 so very interesting. This is the first Apple SoC that’s able to deliver good amounts of memory bandwidth to all consumers. A single CPU core can use up to 8GB/s of bandwidth. I’m still vetting other SoCs, but so far I haven’t come across anyone in the ARM camp that can compete with what Apple has built here. Only Intel is competitive.
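Single-core bandwidth figures like these come from streaming tests. A rough Python sketch of the idea (not the tool used here, and the function name is hypothetical): time repeated bulk copies of buffers sized below, around and above the 4MB SRAM, and report bytes moved per second. It counts read plus write traffic, as STREAM's Copy kernel does; other tools count only one side, so numbers from different tools are not directly comparable. Because the slice assignment is a bulk memory copy, Python overhead matters less here than in a latency test, but the results are still only indicative.

```python
import time

def copy_bandwidth_gbs(size_bytes, repeats=10):
    """Rough single-threaded copy bandwidth in GB/s, counting bytes read
    plus bytes written (the STREAM Copy convention)."""
    src = bytearray(b"\x01") * size_bytes  # filled so every page is really backed
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up pass so first-touch page faults are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-length slice assignment copies the buffer in bulk
    elapsed = time.perf_counter() - start
    return 2 * size_bytes * repeats / elapsed / 1e9

for mb in (1, 4, 32):
    print(f"{mb:>3} MB buffer: {copy_bandwidth_gbs(mb << 20):.1f} GB/s")
```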
444 Comments
eanazag - Monday, November 4, 2013
The innovation that the Thunderbolt people are really waiting for is faster eMMC flash on the iPad. USB 3 or Thunderbolt is not going to help the fact that the flash storage is too slow to even make USB 2 sweat. I completely agree that sync and restore via iTunes is painfully slow. I would also argue that WiFi sync on 2-stream N is useless if the iPad still sports slow storage. If I had a request for USB 3, it would be based solely on higher power specs for charging or docking. I
mnbob1 - Friday, November 8, 2013
Apple has put a lot of emphasis on iCloud and backing up to iCloud. I have an iPad and an iPhone and haven't connected either to my computer for over a year. I back up to iCloud and use iTunes Match to access my music library, which also gives me ad-free iTunes Radio now. I store photos in iCloud because I take advantage of Photo Stream. My documents are backed up to Dropbox. Earlier this year I upgraded from an iPhone 4S to an iPhone 5. With iCloud all of my device settings were restored within a few minutes, and my apps were downloaded in the background so I could still use my phone while that was happening. With iTunes Match I was able to see my entire music library of over 7,000 songs and choose what I wanted to download to my phone when I wanted to. I was able to restore my photos quickly and access my documents from Dropbox quickly. The whole process took me less than an hour initially, since I don't bog my phone down with a lot of apps that I don't use and I only download music as I use it. I'm trying to figure out why you guys think you need to connect with Thunderbolt or USB 3.0 when the iPad Air also has WiFi with MIMO capabilities. Stop tethering your portable devices to the desktop. Apple isn't going to do Thunderbolt because it would exclude Windows PCs, and it isn't going to upgrade to USB 3.0 because the need for data going across that wire becomes less important and it becomes more of a charging port.
IUU - Sunday, November 17, 2013
I am glad you're feeling so comfortable having your data stored on other people's HDs. I suppose you feel comfortable storing your food in other people's refrigerators, and writing your diaries and personal notes in other people's diaries and notebooks. (Marx and Lenin would absolutely fall in love with you.)
And all this, despite the fact that your "entire music library" is laughably small compared to what average local storage could offer. Oh, I get it, you do this as a future-proof policy, because you somehow know storage won't improve in the future, despite the fact that the known laws of nature allow for much, much more than zettabytes to be stored locally.
Like the ignorant Chinese peasant thanking his lords for offering him 200 dollars instead of 100, you thank your cloud bosses for offering you 50 instead of 25GB. Sorry, but trying to convert the data network into a feudal-style traditional energy grid won't work, because it's against the ways of nature.
This energy grid is going to die soon as well, much to the dislike of the last remaining tyrants.
pojkeboy - Wednesday, October 30, 2013
Ha. I love this comment.
pdjblum - Wednesday, October 30, 2013
Despite the Snap 600, the Nexus 7.2 is still a wonderful device for a couple of hundred less than a mini with a reasonable amount of storage, not the pittance they offer in the base model. Not sure how he can recommend a mini at all when it is hundreds more than the Nexus 7.2? The Verge will do that because they shit crApples, but a so-called objective, highly intelligent reviewer should have a problem with that.
akdj - Wednesday, October 30, 2013
The Nexus 7.2 is a POS. I returned mine within 3 weeks. 475,000 optimized tablet apps in the App Store, maybe 15 in the Play Store. What a joke attempting to surf on the Nex7 in portrait. Decent performance, yeah... but without apps that aren't 'blown up' phone apps, it's a joke. With the mini, you're not just buying a quality built tablet (that is obviously more powerful than the Nex7), but buying into an extremely active and blossoming ecosystem... now with $50+ in 'free' productivity and creative apps optimized for the system... and phenomenal post-purchase support. Google is selling their tabs right @ the cost of the BOM. Why? They're in it for YOUR personal info... they're miners, data miners. Your information is what they make their money on, not the hardware.
While the Nex 7.2 may be a decent choice for some looking to save a bit of cash, if you've got the money, the iPad is THEE only way to buy into the current tablet market. Period.
pdjblum - Wednesday, October 30, 2013
crApple makes their money on ignorant, entitled, insecure people who want to pay extra to feel good about themselves.
Scannall - Wednesday, October 30, 2013
Bitter much? Fact is, I don't mind paying more for a quality product with great service and support. With the added bonus of having apps I actually use that have NO Android equivalents. Not to mention the 16:9 form factor sucks for my tablet usage.
I don't buy the cheapest car on the market either.
jopamo - Sunday, November 3, 2013
"crApple doesn't make any money from me, yet I am just as ignorant, entitled, and insecure as the people who I claim want to pay extra to feel good about themselves."
There. Fixed that for you. :)
akdj - Monday, November 4, 2013
It's YOUR ignorance that shows... using 'crApple' and for YOU to decide folks' insecurities? I'm thinkin' you might be a bit secure... either that or your Mom said "HELL No!"....."If you want one, get a job, save some money... and buy it yourself!"
Am I close? Certainly nailed YOUR insecurities... lol, always wonder about the ambiguity of the 'net and what these naysayers would actually have the Balls to say to an Apple owner in real life, face to face.
pdjblum----Silly, Silly Boy