Comments for Apple A8X’s GPU - GXA6850, Even Better Than I Thought

Apple A8X’s GPU - GXA6850, Even Better Than I Thought

by Ryan Smith on 11/11/2014 11:00 PM EST

Post Your Comment
Please log in or sign up to comment.

Comments Locked

114 Comments

Back to Article

dyc4ha - Tuesday, November 11, 2014 - link
Wouldn't it be better to name it GX6850A instead? less confusing and more consistent, just my 2c.
Ryan Smith - Wednesday, November 12, 2014 - link
Admittedly the naming is arbitrary. But since GX6850 doesn't technically exist, it's better to convey it's an Apple design than to make it sound like a variant of a non-existent Imagination design (think NCC-1701-A)
dyc4ha - Wednesday, November 12, 2014 - link
Thank you for your reply, I also like the enterprise reference ;)
Spunjji - Wednesday, November 12, 2014 - link
Doubleplus bonus geek points.
vFunct - Wednesday, November 12, 2014 - link
Do we even know that they're using PowerVR IP?

As far as I can tell the only thing that relates the two companies would be the tile-based deferred rendering. Apple could just as easily have created their own complete architecture for the that, perhaps by licensing Imagination Tech's patents or an architectural license, instead of a hard-macro license.

They did the same with the ARM-based Cyclone CPU cores, so why not with the GPU cores as well?
jameskatt - Saturday, November 15, 2014 - link
Since Apple OWNS 10% of Imagination Technologies, it can do what it wants with the IP including adding its own customizations.
feldspar - Tuesday, November 11, 2014 - link
Perhaps the extra CPU core, doubled memory, and two-cluster GPU design also are paving way for iPad to easily do split-screen apps? Just give each side app its own GPU cluster and off they go.
przemo_li - Wednesday, November 12, 2014 - link
Rigid distribution of hw is bad idea. Its better if both apps just request via some API some 3/2D tasks to be done. Fetch ready pixel buffers and submit those to the OS for final composition.

This way GPU driver can optimize resource allocation (which btw can be sub-cluster) including priority of those apps, and OS can update whole screen once.

Also works much better when one app is GPU heavy and other is not (like compute used in one app, while second is simple 2D app)
tipoo - Wednesday, November 12, 2014 - link
No point in fixed distribution, let each app use what it needs. It's like unified shaders vs fixed, unified is more efficient because the hardware does whatever it needs to, rather than leaving some unused.
hecnpo - Wednesday, June 17, 2015 - link
You're a visionary, boy! 8 months earlier you predicted that.
jjj - Tuesday, November 11, 2014 - link
Is it strong perf though? I mean it's a huge die and on 20nm and all it can do is this? Got to wonder where it will be when Maxwell goes 20nm and maybe others got new architectures soon..Sure they can afford to use much bigger SoCs than others since they make their own but it doesn't seem all that efficient.
In the end we measure mobile GPUs by perf in synthetic benchmarks and that has no real relevance since all wee need is for games to run well.Wish mobile benchmarking would get better tools already since GPU testing is rather misleading and we all look for best perf not good enough perf.
smartypnt4 - Wednesday, November 12, 2014 - link
This is what I was thinking... The GPU just clock incredibly low since it performs as well as would be expected from a 6-cluster implementation at the same clockspeeds as the 4-cluster in the A8. I'd wager they're getting awesome power consumption numbers out of this GPU if that's the case.
jjj - Wednesday, November 12, 2014 - link
They do tend to clock lower to save power but gaming battery life isn't great. Don't have heat testing maybe they save on that. Anyway, assuming the GPU die area is 2x the GPU in the iphone then on 28nm it would be over 70mm2. Ofc i assumed that the TK1 die area is about 80mm2 but we don't really know for sure and we don't know how big the GPU is either so maybe i should have been more cautious in assuming Nvidia's die size since we just don't have enough info on it.
testbug00 - Wednesday, November 12, 2014 - link
The Tegra K1 has a die area of around 120mm^2 for the A15 variant. No clue about the Denver variant.
easp - Wednesday, November 12, 2014 - link
Apple's priorities seem to be excellent balanced performance, battery life, and low weight. Those things are somewhat at odds with each other, so they have to make accommodations in other parts of the design. So, they spend silicon to get strong balanced performance with low power consumption. That sounds like a reasonable thing to me.
tipoo - Wednesday, November 12, 2014 - link
Yup. They're not the kind to double the GPU clocks for this performance, sacrificing battery life. More GPU logic allows a lower clock.
testbug00 - Wednesday, November 12, 2014 - link
what 20nm Maxwell chip is coming? Sites have finally realized that GM200 is going to be 28nm.

The Erista chip is likely going to be 28nm based on those facts. I would be quite happy to see it on 20nm though, especially if they made an updated version of the original SHIELD device-thingy.
tviceman - Wednesday, November 12, 2014 - link
It'd be heavily fool-hardy for Nvidia to release a 28nm SoC in 2015. I think the general sentiment going is that 20nm isn't viable or large dies, which SoC's are not (at least in comparison to mid-range and flagship GPU's). Mobile is currently much, much more competitive than anything else and since TK1 is shown to consume higher power when under max load than other devices in it's class, a shrink is needed for it to remain competitive in perf/w and maintain a sane die size.
testbug00 - Wednesday, November 12, 2014 - link
Given NVidia works on lowering A15/57 and Denver power consumption, with how Maxwell is sized, it could end up with a chip smaller than the K1 that uses noticably less power and is far faster.

I highly doubt there is any Maxwell chips designed for 20nm. Don't see why they would shrink the design just for the next Tegra processor.

Nvidia was the first company to publicly go after foundries for the huge cost increases that were happening in public (to my knowledge) and I don't think their stance has magically changed. Not to mention that 20nm power and performance characteristics aren't that large above 28nm.
aenews - Saturday, November 15, 2014 - link
What's wrong with GFXBench? 3DMark is definitely biased by resolution, even on off-screen benchmarks though. I've tested with Resolution Changer.
ltcommanderdata - Wednesday, November 12, 2014 - link
Another explanation for the GXA6850 is that though it may be "overweight" for the iPad Air 2, it would be well suited for reuse in the rumoured larger, higher resolution iPad Pro that could appear early next year.
vFunct - Wednesday, November 12, 2014 - link
or an ARM-based MacBook...
bill.rookard - Wednesday, November 12, 2014 - link
That's what I'm thinking too. It would be essentially the Apple 'Chromebook', running IOS, in a Macbook Air chassis, and would probably get a stupidly long runtime. More battery space, more thermal headroom in the chassis, and powerful enough for light work.

Yeah - it might not be the fastest when it comes to processing video, but if they tossed in some licenced Quicksync IP from Intel with the spare room they apparently have on-die...
jameskatt - Saturday, November 15, 2014 - link
It is too slow for a MacBook. I seriously doubt Macs will switch to ARM. Once ARM reaches desktop level workloads, they will have the same power requirements as Intel's desktop chips. It is hard to get around the laws of physics.
blackcrayon - Wednesday, November 12, 2014 - link
Or perfect for an Apple TV 4 gaming set top box.
jameskatt - Saturday, November 15, 2014 - link
Certainly, this would be a great reason for creating the GXA6850. I want my iPad Pro with 14" screen.
LJSteve - Friday, November 21, 2014 - link
I believe its also intended for their next-gen AppleTV; 3840x2160x30 = 249 Mpixels/sec vs 2048x1536x60 = 189 Mpixels/sec, so ~32% greater perf is needed than to cover just the iPad 6 (i.e., Air 2). And here I'm obviously assuming 60 fps is NOT practical with the A8X at 2160p.
CrazyElf - Wednesday, November 12, 2014 - link
I'm not much of an Apple fan, but I have to admit their SOC design team that they picked up from PA Semi is quite impressive.

They've been able to design some very power efficient CPUs, while using some of the fastest GPUs around that are licensed from PowerVR. It's quite an accomplishment.

I just wish there was a comparable SOC offered on Android.
kurahk7 - Wednesday, November 12, 2014 - link
Sadly, most people are ignorant of the fact of how advanced and powerful Apple socs are; they have faster single threaded performance than CPUs that are clocked at almost twice its frequency, that's some incredibly high ipc right there.
Daniel Egger - Wednesday, November 12, 2014 - link
Not just that. There're also surprisingly little (read: no known) hiccups in their very impressive designs and the associated processes especially compared to Apples' non-tablet/phone business. Maybe AMD and NVidia should take this as a cue how to do designs that not only perform well but also last...
vFunct - Wednesday, November 12, 2014 - link
It also helps that Apple doesn't design for "specs", but instead designs for systems level. The most efficient designs are designed at the systems level.

The 8-Core cpus are a good example of design-for-spec, meant to sell to low-information buyers that mistakenly believe 8-cores are faster. They don't know that no software uses 8 cores, and most software typically only use 1-2 cores, which makes all those extra cores useless.

Meanwhile, Apple never even announced the number of cores on the iPad! They do that because they're here to sell you an iPad, not an A8X chip.
Bob Todd - Wednesday, November 12, 2014 - link
True, but I believe they also didn't trumpet that fact (or others like the RAM) in order to not outshine the rest of their product stack too badly, especially the already seemingly half-assed attempt that is the new iPad Mini 3. While Apple customers rarely buy on specs alone, double the RAM and another (faster) core is going to go an awfully long way to making the iPad Air 2 considerably more future proof than the new iPad Mini 3. I really wish everything had been bumped to 2GB this generation, phones included. I prefer Android, but even I think the iPad Air 2 is a smart purchase.
Nogib - Wednesday, November 12, 2014 - link
Apple has the advantage of doing things in-house. Most Android manufacturers are at the mercy of 3rd party SoC suppliers that drag their feet. Seriously, I can't figure out why Qualcomm, nVidia, etc take so long between tapout and market. Apple cranks out their new designs fast while often SoCs from other companies seem to take an extra year to be available. Samsung may have their own Exynos designs but even those are behind the times and lackluster.
prosit - Wednesday, November 12, 2014 - link
I wonder where that 2MB L2 cache comes from?
2MB for 3 cores doesn't sound right.
0.6666 MB per core...
Ryan Smith - Wednesday, November 12, 2014 - link
Logically it's one shared block of L2 cache. Furthermore 2MB is a power-of-two number and makes everyone happier than 1.5MB.
prosit - Wednesday, November 12, 2014 - link
The L2 cache blocks are integrated in the space per core, look at the die.
Only the 4MB L3 cache is separate.
2MB makes no sense.
Ryan Smith - Wednesday, November 12, 2014 - link
We've tested this. It's definitely a 2MB unified L2 cache.

http://images.anandtech.com/doci/8666/A8X_Latency....
prosit - Wednesday, November 12, 2014 - link
Then there must be a 4th core, at least only the L2 cache part of it.
We really need a die shot soon!
DERSS - Wednesday, November 12, 2014 - link
Die shot already exists and there is no fourth core.
prosit - Wednesday, November 12, 2014 - link
The 4th core isn't there, but it's L2 cache part is.
Weird they didn't go for 4 full cores.
tipoo - Wednesday, November 12, 2014 - link
There is no reason for a (and there is no) fourth core just because of the cache amount.
iMacmatician - Wednesday, November 12, 2014 - link
Could the L2 cache be slightly smaller or larger than 2 MB so it divides by 3 but can't be easily distinguished from 2 MB in that test?
name99 - Wednesday, November 12, 2014 - link
Why does it have to BE divisible by 3?
Do you understand how the L2 cache works and the role it plays? There is nothing that say THESE transistors in the cache are dedicated to one CPU and not another.
name99 - Wednesday, November 12, 2014 - link
Oh for crying out loud.
This is human engineering design, it's not numerology. You can stick as many damn cache transistors as you like attached to as many CPU transistors as you like.

"asymmetric" (ie non-power of 2 ram/core) cache designs are common for L2 and L3. POWER has used them frequently (in that case with a power of 2 number of cores, but a non-power of 2 number of cache slices). There's, I believe, an nV GPU that also uses them literally because they ran out of space on the die and could only fit something like 21 of 24 planned cache slices.
lucam - Wednesday, November 12, 2014 - link
It seems that Apple A8X GPU and Tegra K1 use 2 different ways to reach similar performance. The first uses more cluster (cores) utilising a very low frequency in order to be power efficient as well. The latter, instead, use a ''simpler'' cores configuration but with aggressive frequency (at least in the Shield tablet) with a consequential trade off of higher consumption.
Engineering speaking Apple did an astonishing job placing a so complex chip in only 20nm and with so low power consumption though.
Dribble - Wednesday, November 12, 2014 - link
It's not really - the A8X is a significantly larger more expensive to manufacture chip then K1. If nvidia had gone for the same size/cost restrictions as A8X it would be a lot faster.
WaltFrench - Wednesday, November 12, 2014 - link
End of the day, these lustworthy chips are all about making money for the companies. Apple's SoC efforts sell iPads; nVidia's sell chips at probably well less than 10% of the iPad's price.

nVidia's main GPU business dwarfs their CPU business, so they can afford to prop up what has to be a much less profitable business, as long as they see it as a future business they can profit from. But they have had awful luck with design wins in the Xoom and Surface/RT, plus the horrible distraction of trying to get Flash working.

Anybody who can estimate the number of nVidia CPUs shipped, and the gross margins that eventually have to be plowed back into future enhancements, I'd love to see. Until then, I'm curious how much longer nVidia will continue to play the game, with all the volume in ultra-low-cost devices.
chizow - Wednesday, November 12, 2014 - link
Except Nvidia was able to match it in GPU perf without the benefit of 20nm and Apple was only able to beat Denver K1 CPU in multi-threaded by shoehorning an entire CPU core in there.

Don't get me wrong, A8X is impressive, but the more we find out about it, the less impressive it seems. A8X on 20nm has 50% more CPUs, 100% more GPUs compared just to match Nvidia's GPU perf and eek out a win in the CPU benches while still on 28nm.
Squirrel! - Wednesday, November 12, 2014 - link
Did you take into account power and heat, as well as sustained performance?
michael2k - Wednesday, November 12, 2014 - link
And it also means Apple can scale down to a phone by removing one CPU and one GPU. NVIDIA doesn't have that option.
ltcommanderdata - Wednesday, November 12, 2014 - link
As lucam was talking about there's a balance between transistors and frequency to achieve performance. Apple spends transistors on the problem whereas nVidia achieves performance by higher frequency. Apple may have 50% more CPU cores, but nVidia's CPU is clocked 53% higher (2.3 GHz vs 1.5 GHz). It's actually very interesting that two very different CPU architectures and design philosophies are able to achieve similar performance and great that nVidia is able to match Apple's 2nd gen 64-bit design on their first try.
Nogib - Wednesday, November 12, 2014 - link
I think nVidia had to clock it that high since they went with an in-order design rather than out-of-order.
testbug00 - Wednesday, November 12, 2014 - link
Nvidia's architecture is about 100/60% IPC of Apples in INT/FP based on benchmarks that I have seen.

I'm guessing that Apple can clock up if they want, while, Nvidia is likely near their max clockspeed.
jasonbayk - Wednesday, November 12, 2014 - link
They don't match. Not even close. I am not sure why the on-screen benchmark is provided in this article. A8X has to ship 1.4 as many pixels for the screen resolution compared to Shield's 1980x1080. Better compare off-screen performance on a same resolution.
deppman - Saturday, November 15, 2014 - link
The Shield tablet resolution is 1920x1200. The iPad pushes (2048×1536)÷(1920×1200) = 1.365 pixels for each pixel the shield tablet. The reason onscreen numbers are important is because *that's what people see*. The Shield tablet tries to optimize the balance of screen fidelity, speed, and battery life for the best user experience. I think they got it pretty close to perfect.
testbug00 - Wednesday, November 12, 2014 - link
clockspeeds? NVidia's chip is clocked in the 700-800MHz range. Apple's is probably 400-600Mhz.

How about power usage?

anyhow, it has been said time and time again that 20nm performance gains over 28nm are low. Not sure why this is a shocker.
lukarak - Thursday, November 13, 2014 - link
And NVIDIA was only able to beat it by running it at 800 MHz more, or, in other words that better describe it, MORE than 50%.
easp - Thursday, November 13, 2014 - link
It's kind of silly to pit these chips against each other, but it is even sillier to compare mobile SoCs without taking into consideration power consumption.

The new Tegra's direct competitors are other ARM SoCs, nearly all of which are 4-8 core CPUS. Nvidia seems to offer a chip that could trounce them on CPU and GPU performance. Using a brawny dual core CPU should help them in terms of power consumption. Using a smaller die will help them hold the line on price. Whether these all come together to result in some significant design wins remains to be seen.

The A8X really has no direct competition. Apple is going to ship it. If they had missed the schedule, for this fall's iPad, it wouldn't have been good, but it wouldn't have been the end of the world. The iPad would still be competitive with a higher clocked A8, and the 8x might have found a place in a shipping product eventually.

The a8x doesn't have to compete with other SoCs, just as the iPad doesn't compete with other tablet hardware. The a8x is one piece of apples offering, as is the rest of the iPad hardware, as is iOS, the App Store, and Apple's retail experience. Apple cares about average age selling price of the devices they sell, and average margins. If spending a bit more on the a8x's silicon is a good way to ensure that outcome, then the fact that NVidia went a different route to a different goal is not a particularly useful comparison.

More interesting perhaps: both companies are fabless. An Apple chip with a combo of in house CPU ip and 3rd party GPU IP is performance competitive with a chip from NVIDIA based on in house CPU and GPU ip. What does that say? In particular what does that say about the value of NVidia's GPU IP?
iwod - Wednesday, November 12, 2014 - link
It could also be the A8X was designed with a much higher resolution 12.x" iPad Pro in mind?
And next year it would a die shrink of A8X in to 16nm with LPDDR4 on to iPhone 6S. Given I think the Plus is Gfx and Memory constrain right now.
It could also pave the way for future Apple SoC Macbook Air.
I wonder if Apple could jump ahead with PowerVR 7 next year as well. They surprise everyone with ARM64, PowerVR7 could be another case.
iwod - Wednesday, November 12, 2014 - link
I also wonder how graphics at such high resolution isn't memory capacity constrain? The iPad and iPhone 6Plus is reaching console level graphics. And yet those PC and system have 8GB+ memory for them.
hammerd2 - Wednesday, November 12, 2014 - link
I'd imagine it's to do with the legendarily low memory bandwidth requirements for all PowerVR TBDR designs.
DanNeely - Wednesday, November 12, 2014 - link
They're approaching the levels of the gfx in the PS3/XB360. In terms of current hardware, the A8X's GPU is around the same size as a smaller Intel IGP.
timbob2000 - Wednesday, November 12, 2014 - link
Is it possible the extra 2 clusters are not activated in the iPad Air 2, and therefore not being used? They could be used for a larger iPad down the road.
kwrzesien - Wednesday, November 12, 2014 - link
That's an insightful observation...only the Pro will tell.
hahmed330 - Wednesday, November 12, 2014 - link
Tegra K1 does 384 32bit Flops per clock while GXA6850 does 512 32bit Flops per clock yet they perform equally. Architecture performance efficiency wise Tegra K1 gets the thumbs up. On the other hand power efficency wise GXA6850 is ahead for two reasons. Firstly ability to do 1024 16 bit Flops per clock and being based upon 20nm.
Kevin G - Wednesday, November 12, 2014 - link
@hahmed330

The per clock numbers are correct but you are ignoring each GPU's clock speed while doing the comparison. If the K1 was running its GPU at twice the clock speed, the A8X would then be the more efficient design. This is what makes the comparison difficult as the GPU clock of the A8X is currently unknown.
kron123456789 - Wednesday, November 12, 2014 - link
In that case it's strange that A8X has so little advantage over K1. Perhaps, it's clocked around 600МГц or something. Damn, it have four times more ROPs, two times more TMUs and 33,(3)% more FP32 ALUs and it's andvantage is 10-15%.
DERSS - Wednesday, November 12, 2014 - link
K1 clocks for both CPU and GPU are dramatically higher, so it is just different approach. A8X is big, but has low frequencies for better power consumption.
chizow - Wednesday, November 12, 2014 - link
Interesting piece Ryan, definitely appreciate the digging. A8X is still impressive, but this news makes it less impressive, imo. Apple needed to boost GPU by 100% and CPU by 50% fully leveraging 20nm to beat Nvidia's Denver Tegra K1. This bodes well for Nvidia's next SoC, Erista, which will have both Maxwell and 20nm.

I also really appreciated the comparison in die sizes and overall performance. It looks like Intel may actually be the most impressive in the bunch with their Haswell/GT3 chip at only 1.7Bn transistors? I mean they need higher frequencies (and TDP) to reach those performance levels, but its still in a completely different class in terms of performance (as Surface Pro 3 benches show). Broadwell-Y should also help close the gap some with Core M, although performance will most like be parked where it is now as a result.
GC2:CS - Wednesday, November 12, 2014 - link
I don't know how can be more advenced proces and a wider GPU interpreted as a negative, but it's the fail of nvida they don't want to pay for bigger chips/smaller process.

It completelly doesn't matter if nvidia can achieve similar performances at worse process and narrower GPU, the fact that Apple can even compete with an exclusivelly GPU company is rather embarasing to nvidia. You know, they haven't been doing their own chips 5 years ago.

Then it's silly to take just performance into consideration and say nvidia is better only because it's cheaper. You had to put power compustion into equation.
How much less power A8X consumes compared to tegra ? That's the question of the year.
chizow - Wednesday, November 12, 2014 - link
Are you being serious here? It would be as if AMD was hypothetically competing with or beating Intel's 22nm chips with their own 28nm chips that used 2/3rd the transistors. The fact they do use a shared fab bodes well for Nvidia because these same avenues for performance gain and power-savings are obviously going to be open to them as well.

And what sense would it make for Nvidia to pay the premium that Apple paid to TSMC for early/exclusive access to 20nm when Nvidia does not have nearly the readily available market for their SoC that Apple does? Sure Nvidia is a huge company that primarily makes GPU, but in this arena, they are a small fish compared against the likes of Apple, Samsung, Qualcomm. Apple alone generates some 70-75% of their revenues that number in the tens-of-billions on products that directly rely on their Ax-based SoCs, so of course they are going to spend top dollar to ensure they are on the leading edge of everything. The fact Nvidia is able to even keep up in this regard and even exceed Apple/Qualcomm with <$1Bn in operating budget per year for their Tegra unit is simply amazing, and certainly nothing to be ashamed of.

And what of power consumption? Again they are close enough to the point its really negligible. Nexus 9 has a smaller footprint, smaller battery, slightly better battery life vs. the Ipad Air 2, again taking into consideration Apple's own power-saving claims for A8/A8X this is another amazing accomplishment on the part of Nvidia.
lucam - Wednesday, November 12, 2014 - link
Where did you read about Nexus 9 battery life. I am still waiting for Anand full article.
As regards of Nvidia that can't have access to 20nm is just laughable. The soc is not 20nm because at this stage it can't simple as that. If it was Nvidia already started the fabrication of it.
chizow - Wednesday, November 12, 2014 - link
Nexus 9 Battery life is in AT's N9 preview, with iPad Air 2 results and you can see, the N9 edges the Air 2 out with a smaller battery to boot. The Air 2 does have a bigger screen, but you can see, the results are close enough to say battery life/power consumption concerns are negligible between the two.
http://images.anandtech.com/graphs/graph8670/68887...

I never said Nvidia wouldn't have access to 20nm eventually, just not in the timeframe slated for Tegra K1. Apple paid for early/exclusive access to it, plain and simple. There was a lot of speculation about this a few years ago and we have seen it come to fruition as Apple is the only SoC maker that is producing 20nm chips from TSMC this year.

http://hothardware.com/Reviews/GameChanger-TSMC-Ma...

At this point there's no reason for Nvidia to go with K1 on 20nm and grow their existing SoC, they'll undoubtedly wait for Erista with Maxwell GPU and increased/refined Denver CPU at this point if they bother to move to 20nm at all.
lucam - Wednesday, November 12, 2014 - link
Do you want really see the battery performance?
You will see the Ipad air 2 has longer life and the performance is still the same
http://gfxbench.com/compare.jsp?benchmark=gfx30&am...
Ipad Air 2 long long term: 50.7fps
Nexus 9 long term: 36.6fps
Needless to say those benchmarks are normalised to do a fair comparison.
As I said, Anandtech has to finish his article and you will see they will confirm what found in the gfx bench.
Nvidia can also have access to 20nm when they want, they only need to design a soc that it could fit it.
You then are losing one major point. If Tegra K1 was so efficient, why Nvidia didn't remove some core cpu/gpu to put inside a smartphone? Possibly because they could not reach those level of performance, resulting far behind (maybe) the A7 or Adreno 330.
kron123456789 - Wednesday, November 12, 2014 - link
No, that's because K1 is most simpler Kepler design(1 GPC with 1 SMX). I think they just couldn't remove some of CUDA cores and get it to work. But they can do it with Maxwell because 1 SMM has only 128 CUDA cores and i think they can even split it in two if they want to.
chizow - Wednesday, November 12, 2014 - link
Running a benchmark loop is typical usage pattern for most end-users? I think most users would go by a typical light/browser test to see what kind of battery life they get with these devices and as I said, they do show the two are very comparable.

Again, Nvidia will go to 20nm or smaller eventually, but that won't happen with Tegra K1, as the process was not available to them. Only Apple had access because they paid for the privilege. If 20nm was an option to Nvidia from the outset for Denver K1, you don't think they would have taken it?

And finally I'm not missing your major point, Tegra K1 would have no problems fitting in a smartphone given it's predecessor Tegra 4 was able to do so, and the K1 has shown to be more power efficient than that. The problem is the lack of integrated LTE which makes it a non-starter for most OEMs/integrators, especially given the fact GPU performance isn't the top driver for smartphone SoC metrics. I guess by the same token the amazing point you are missing is why the A8X isn't in a smartphone?
lucam - Thursday, November 13, 2014 - link
Look Chizow, I showed a benchmark where it was clearly shown the long life battery of Ipad Air 2 vs Nexus 9. Then I showed you also another one where the Ipad Air 2 sustains higher fps during time than Nexus, proving the fact it's more efficient.
What else you want me to show, if gfxbench is not enough? You want me to link a naked woman holding an Ipad Air 2 to convince you? What ever I say you find an excuse.
Then you said this no-sense idea that only Apple have access to 20nm. Next year when Apple will move to 16nm, you will same the same.
But the fact is now that the A8X performs better than K1 Denver in any sense, that's it.
Than we got some idiots around that keeps saying that Tegra K1 can be only 1 SMX, so that's why it can't go below that. So why Nvidia didn't improve the old Tegra 4 to put it inside a smartphone? Because they don't have a design for that, and no major vendors want Nvidia inside the smartphone, simple as this!!
Chizow the fact is, to date, there is not Tegra inside the smartphones despite your interesting assumptions; and the K1 is not efficient as A8X.
I wish to find you a link of a nice naked woman with an Ipad Air 2 though!!
deppman - Thursday, November 13, 2014 - link

"Look Chizow, I showed a benchmark where it was clearly shown the long life battery of Ipad Air 2 vs Nexus 9. Then I showed you also another one where the Ipad Air 2 sustains higher fps during time than Nexus, proving the fact it's more efficient"

Eh, but that's not quite right. Here is how the shield long-term performance graph looks: http://images.anandtech.com/doci/8329/Run2FPS.PNG

That's not like many others SoCs (I'm looking at you, Adreno) which throttle in one or two times. That over 100 sustained runs with constant performance. It's only when the Shield gets into battery saving mode when it drops FPS.

The bottom line is until the A8x no other SoC came even close to the K1 in GPU perf/W. And even at 20nm and a huge die, it's arguable if the A8x comes out ahead. 3D mark scores for the Shield Tablet, for example, are 33% higher than the iPad Air 2.
lucam - Friday, November 14, 2014 - link
Take a look at this link:
http://gfxbench.com/result.jsp?benchmark=gfx30&...

Shield at 1920x1104 =56.4fps
Ipad Air 2 at 2048x1536 =52.6fps
Google Nexus 9 at 2048x1440 =37.9fps
Xiaomi Mipad at 2048x1536 = 35.9fps

It's obvious that Shield run quicker only because of resolution; if you the other Tegra K1 devices the fps the performance goes down dramatically.
deppman - Saturday, November 15, 2014 - link
That's a link to long-term performance, which favors the devices with better thermal management, and does not support your argument of "only because of the resolution".

The Shield and iPad air obviously have better thermal engineering, with the aluminum chassis of the iPad clearly offering the best condition. And if you don't think heat is an issue, consider that the iPad mini 3 doesn't have the A8X.

This link http://gfxbench.com/compare.jsp?benchmark=gfx30&am... is a much more appropriate comparison of relative performance, and your argument is much less convincing, and the K1 has much higher render quality.
deppman - Thursday, November 13, 2014 - link
http://gfxbench.com/compare.jsp?benchmark=gfx30&am...

Notice how the Shield tablet - which uses a heat spreader similar to the iPad air 2 - wins in long term perf, and has much better image quality. Run time at max perf isn't great due to a relatively small battery and inefficiencies introduces when running "flat out", but Anandtech has shown that by capping the OS to 30 fps, that can be more than doubled.

If the N9 was built more like the Air 2, with a heat spreader and a conductive aluminum chassis, it would almost certainly not have throttling issues.
GC2:CS - Friday, November 14, 2014 - link
"That's not like many others SoCs (I'm looking at you, Adreno) which throttle in one or two times. That over 100 sustained runs with constant performance. It's only when the Shield gets into battery saving mode when it drops FPS"

The question is if it doesn't throttle because it runs cool, or becuse of combination of passive heatsink and very high allowed tempeartures ?

"The bottom line is until the A8x no other SoC came even close to the K1 in GPU perf/W. And even at 20nm and a huge die, it's arguable if the A8x comes out ahead. 3D mark scores for the Shield Tablet, for example, are 33% higher than the iPad Air 2."

Really ? What about the A7 ? It's got around 40% of k1 performance, but it could fit into a ultra-thin mobile phone and last almost as long as the shield tablet on full load in gfx bench. The power compustion of it is around 2,5 W. K1 is so power hungry no one dares to put it into a 7" tablet not even talking about phones.

"Notice how the Shield tablet - which uses a heat spreader similar to the iPad air 2 - wins in long term perf, and has much better image quality. Run time at max perf isn't great due to a relatively small battery and inefficiencies introduces when running "flat out", but Anandtech has shown that by capping the OS to 30 fps, that can be more than doubled."

The heatspreader in iPad Air 2 is a rectangle of metal put over it, the heatspreader in shield is a thick plate of magnesium spaning over entire tablet (putting the weight of that 8" tablet dangerously close to 9,7" iPad Air itself)
Then don't you think the shield tablet got a bit better performance, only because it has to power around 50% less pixels ? Run time at max load is terible, bacause tegra is a battery assasin, as you said it's really inneficient when it tries to run at A8X level of performance.

"If the N9 was built more like the Air 2, with a heat spreader and a conductive aluminum chassis, it would almost certainly not have throttling issues."

Not so much- such incredibly thin device drops the thearmal headroom dramatically a very thin aluminium plate is not as good heat conductor as it looks. Then tegra K1 is shipped at 5W TDP to OEM's which doesn't allow it to run at "flat out" for long.
deppman - Friday, November 14, 2014 - link

"... but it [the A7] could fit into a ultra-thin mobile phone and last almost as long as the shield tablet on full load in gfx bench." <- at a much lower resolution at (640x1136)/(1920x1200) = 0.315, or 31.5‰ while the Shield tablet cranks out 53.9/40.7 = 1.324 or 132.4% of the frame rate. So the 5s is doing 0.315/1.342=0.238 or 23.8% of the work over time as the Shield tablet. In other words, in the same feeble envelope, the K1 would likely destroy the A7 in both effeciency and performance.

"K1 is so power hungry no one dares to put it into a 7" tablet not even talking about phones." Except the 7" Google Tango tablet. And the phone argument is a red herring. The K1 isn't in phones because of Qualcomm modems and patents more than anything else.
deppman - Friday, November 14, 2014 - link
"Then don't you think the shield tablet got a bit better performance, only because it has to power around 50% less pixels?"

(1920×1200)÷(2048×1536) = 0.73242188, or 73%. I wouldn't call that "about half". Another misrepresentation.

I get it, your a fanboy that will doublethink your way into justifying your emotional predisposition. But the reality is the K1 is a pretty impressive chip that handily out lasts the A7 at the same perf, or provides 2.5x the GPU perf at the cost of lower battery life, and it actually competes fairly well with the A8x, while providing superior image quality - I'm guessing this is because the K1 renders full FP32 only?
lucam - Friday, November 14, 2014 - link
http://gfxbench.com/result.jsp?benchmark=gfx30&...
Buk Lau - Wednesday, November 12, 2014 - link
tegra k1 already has 1 SMX of CUDA cores and you can't go below that, that's just architecture of kepler. also what's the point of designing a 20nm SoC when apple almost monopolized all the supplies from TSMC? before saying what the tegra k1 can't do, at least know what it has and what it does first
lucam - Friday, November 14, 2014 - link
This must be the reason why there are so many smartphones with K1
deppman - Friday, November 14, 2014 - link
The phone argument is a red herring. You're not differentiating between correlation and causation. The K1 isn't in flagship phones because of Qualcomm modems, patents, and existing business relationships. And it isn't in low-end phones outside of the US patent zone because Rockchip and others have made that a commodity market.
lucam - Friday, November 14, 2014 - link
Absolutely, if it wasn't for Qualcomm and Rockchip Tegra K1 was already inside the phones! This is a good one!!
deppman - Friday, November 14, 2014 - link
Your response only emphasizes your apparent inability to distinguish correlation and causation.
Please explain why ALL flagship phones in the US have QC chips? There is no Intel, no NV, no AMD. Do you think their CDMA patents and licensing might have something to do with it? Maybe? Rockchip is just an example of a commodity player; another is MediaTech.
lucam - Friday, November 14, 2014 - link
Your response doesn't answer either, unless you justify the absence of Tegra in smartphone because of Qualcomm, Intel or something else. Look, what happened in meantime to Tegra 4i and evolution of them?
AMD in the phones? I have to tell the truth I miss that!!
deppman - Saturday, November 15, 2014 - link
"Your response doesn't answer either"

No, actually it does. If you read my first response, you will see clearly a reason provided for the near 100% monopoly of QC chips in flagship phones in the US. Only Apple employs a non-QC SoC, and even they use a QC modem.

The fact that no other SoCs are used doesn't prove much about said SoCs outside the fact that QCs strategy of using their CDMA and other patents to bar competition has been quite effective.

Consider that in counties the Shield tablet supports voice calls, but not in the US. Hmmm, I wonder why...
lucam - Saturday, November 15, 2014 - link
So you are clearly saying that because of Qualcomm, the K1 could be inside phones.
What about a Shield phone then? At the end of the day, Nvidia can still place his chips in their proprietary devices.
deppman - Saturday, November 15, 2014 - link
@lucam "So you are clearly saying that because of Qualcomm, the K1 could be inside phones.
What about a Shield phone then? At the end of the day, Nvidia can still place his chips in their proprietary devices."

The Shield tablet today probably could place voice calls in the US. This is speculation based on the fact that it can place calls in other countries and Icera modem capabilities - see https://store.blackphone.ch. But even if they did, they would have either a bevy of licensing fees to pay or a law suit on their hands. Certainly, they are going to pick their battles, and investing in a device with expensive FRAND licensing fees doesn't make sense. All flagship phone manufacturers in the US apparently agree.
lucam - Monday, November 17, 2014 - link
That is the reason for no Tegra 4i and disappearing of other Tegras in the mobiles. It's because of others. Still doesn't explain why there is no Shield phones just for the sake of.
testbug00 - Wednesday, November 12, 2014 - link
Nexus 9 has a 14% smaller screen area and gets half a percent better battery life with a 7% smaller battery.

IN WIFI TESTS. Where the main power usage is the screen not the SoC, and, the SoC is in a low idle state, with power gating being pretty equal for everyone. lowering nodes hasn't decreased the IDLE POWER to much recently from my understanding.

UNTIL you can break down the component power draw in this Wifi test for the iPad and Nexus 9 it is worthless. The only way to test the SoC power consumptions would be to measure the screen power draw and run a CPU+GPU intensive task that is locked to a framerate that both chips can achieve. Than, you measure power consumption. all radios off.

Even that does not count between the differences in screen size (granted, if the SoC is running high power the 14% larger area of Apple shouldn't matter more than 1-2% battery life)
HisDivineOrder - Wednesday, November 12, 2014 - link
Operation: Ananding Apple is a go. Operative codenamed A has already garnered useful intelligence on the inner workings of heretofore undiscoverable GPU technology in use by Apple. Operative reports that Apple is currently devising a new kind of GPU unseen by human eyes in their hidden base deep within a currently inactive volcano. They are harnessing the thermal energy to produce an endless supply of power to fuel the super computers they need to design and test said GPU technology.

A insists his cover is not blown even with the revelation of this exciting new GPU technology, but he has been invited to a party in his honor at another facility off-site. Apparently, the base he's going to be visiting is having, "Bring your Kid to Work" Day that day. In his last transmission, A remarked the AI that runs the base insists there will be cake. And that all his friends will be there. A is actually excited because he loves cake.

He is assured that the large canisters of what the labels describe as "Deadly Neurotoxin" he's seen being wheeled in the cargo hold of the plane he's in are for animal testing and have nothing to do with him.
creed3020 - Wednesday, November 12, 2014 - link
I'm surprised it took four pages before a comment like this... :)

I really appreciate the narrative. Long live Agent A?
darkich - Wednesday, November 12, 2014 - link
+111
testbug00 - Wednesday, November 12, 2014 - link
hey look, Apple designed their own GPU (kinda) wonder who said that would happen a long while ago.

On that note, that puts this GPU ahead of the GK20A GPU in terms of performance/frequency.
tviceman - Wednesday, November 12, 2014 - link
Performance/frequency is a worthless, made up, meaningless metric. The only two metrics are performance/watt and outright performance.
testbug00 - Wednesday, November 12, 2014 - link
pref/frequency matters. Although, it is best served for one companies product line that shares a base.
name99 - Wednesday, November 12, 2014 - link
"Apple designed their own GPU"
Not yet. This is basically the A5 (or even A4) of Apple's GPUs. We have yet to see the A6 (fully custom GPU) and A7 (throw every optimization idea at it GPU).

But it does follow the path many of us were suggesting a year ago. IMHO a large part of Apple's focus when they switch to their own GPU will be on creating a viable and performant HSA system. This doubtless won't be very sexy to the gaming crowd who are more interesting in counting number of FMAs than in new underlying capabilities, but it will provide Apple with a machine that can very easily and with extremely low overhead run parts on an app (which may have nothing to do with graphics) on the throughput engine aka GPU.
[Just like Swift was somewhat conservative, their first GPU may not be full HSA; the A7 version of the GPU may be the one where they're confident in what they're doing and can do the equivalent of throwing the rest of the ARM world an HSA curveball two years before they were prepared for it...]
tviceman - Wednesday, November 12, 2014 - link
I see I'm not the only one who was misled by the article title. The A8X is not more or less impressive, it just has a larger functioning GPU than originally thought.

I hope Erista comes within the first half of 2015 on 20nm; it should be an absolute beast in performance and hopefully... FINALLY... gets Tegra solid traction in mobile products. Shield is good (if not a bit heavy) while Nexus 9 looks about perfect in size but storage options area joke. I'd like to see a MiPad-like device in price and performance come state-side.
ruthan - Wednesday, November 12, 2014 - link
Ok, so fillrate is great, but this GPU bottleneck is probably elsewhere.. GPU is so quick how is quick its slowest part.
name99 - Wednesday, November 12, 2014 - link
"the iPad Air 2 is nowhere near twice as many pixels as the iPhone 6 Plus. "
Maybe not. But what about the semi-mythical iPad Pro...?
One suspects THAT is the real reason for the A8X.
darkich - Wednesday, November 12, 2014 - link
Now THAT'S the stuff I've been waiting for to read.
You've just redeemed yourself for the totally generic iPad Air 2 review.
tipoo - Wednesday, November 12, 2014 - link
Speaking of the K1 at the end of the article, I really want a deep dive into Denver. I'm wondering if the code morphing is the reason for the Nexus 9s inconsistant performance. In clean code, the architecture is beastly, beating A8X in single threaded performance. But as soon as you multitask or thow spaghetti code at it, it seems to choke, just like you'd expect from code morphing of old.
lindukids - Thursday, November 13, 2014 - link
业界良心，感动cry
lindukids - Thursday, November 13, 2014 - link
干脆叫GX6450 sli/cf算了233333333333
hpascoa - Thursday, November 13, 2014 - link
I would love to see an article comparing this new generation of mobile SoCs to the previous generation of console hardware. It seems to me that Apple's "GXA6850" and Nvidia's Tegra K1 are in a position that they could be compared to the Xbox 360's Xenon/Xenos and the PS3's Cell/RSX architectures. Even if the results would obviously have to be theoretical in nature, due to the impossibility of doing a real-world apples-to-apples comparison, I still think it would be an interesting piece to read. Maybe the current PC IGPs from Intel and AMD could also be thrown into the mix to help put things into perspective.
darkich - Thursday, November 13, 2014 - link
Digital foundry made some analysis and comparison of the Shield Tablet graphics, with Trine 2 ran on Shield and consoles.

I'd say that hardware wise the K1 and A8X are definitely on par with the consoles but it will take time for games to be ported properly.

With fully enabled Metal though, I'm sure we'll eventually see games on the iPad Air 2 that look better than anything we've seen on the last gen consoles.
kron123456789 - Thursday, November 13, 2014 - link
Don't forget the fact that games for X360 and PS3 are mostly running in 720p resolution (sometimes even lower) and display of iPad Air 2 has ~3.5 times more pixels than that.
wysiwyg3826 - Thursday, November 20, 2014 - link
So the A8X has the basically the same Geekbench 3 score as the Intel chip in the 2011 MacBook Air...not an apples-to-apples comparison but I wonder where Apple might put an A9X into next year...iPad Air 3 or something else?

Apple A8X’s GPU - GXA6850, Even Better Than I Thought

Post Your Comment

114 Comments

Back to Article

dyc4ha - Tuesday, November 11, 2014 - link

Ryan Smith - Wednesday, November 12, 2014 - link

dyc4ha - Wednesday, November 12, 2014 - link

Spunjji - Wednesday, November 12, 2014 - link

vFunct - Wednesday, November 12, 2014 - link

jameskatt - Saturday, November 15, 2014 - link

feldspar - Tuesday, November 11, 2014 - link

przemo_li - Wednesday, November 12, 2014 - link

tipoo - Wednesday, November 12, 2014 - link

hecnpo - Wednesday, June 17, 2015 - link

jjj - Tuesday, November 11, 2014 - link

smartypnt4 - Wednesday, November 12, 2014 - link

jjj - Wednesday, November 12, 2014 - link

testbug00 - Wednesday, November 12, 2014 - link

easp - Wednesday, November 12, 2014 - link

tipoo - Wednesday, November 12, 2014 - link

testbug00 - Wednesday, November 12, 2014 - link

tviceman - Wednesday, November 12, 2014 - link

testbug00 - Wednesday, November 12, 2014 - link

aenews - Saturday, November 15, 2014 - link

ltcommanderdata - Wednesday, November 12, 2014 - link

vFunct - Wednesday, November 12, 2014 - link

bill.rookard - Wednesday, November 12, 2014 - link

jameskatt - Saturday, November 15, 2014 - link

blackcrayon - Wednesday, November 12, 2014 - link

jameskatt - Saturday, November 15, 2014 - link

LJSteve - Friday, November 21, 2014 - link

CrazyElf - Wednesday, November 12, 2014 - link

kurahk7 - Wednesday, November 12, 2014 - link

Daniel Egger - Wednesday, November 12, 2014 - link

vFunct - Wednesday, November 12, 2014 - link

Bob Todd - Wednesday, November 12, 2014 - link

Nogib - Wednesday, November 12, 2014 - link

prosit - Wednesday, November 12, 2014 - link

Ryan Smith - Wednesday, November 12, 2014 - link

prosit - Wednesday, November 12, 2014 - link

Ryan Smith - Wednesday, November 12, 2014 - link

prosit - Wednesday, November 12, 2014 - link

DERSS - Wednesday, November 12, 2014 - link

prosit - Wednesday, November 12, 2014 - link

tipoo - Wednesday, November 12, 2014 - link

iMacmatician - Wednesday, November 12, 2014 - link

name99 - Wednesday, November 12, 2014 - link

name99 - Wednesday, November 12, 2014 - link

lucam - Wednesday, November 12, 2014 - link

Dribble - Wednesday, November 12, 2014 - link

WaltFrench - Wednesday, November 12, 2014 - link

chizow - Wednesday, November 12, 2014 - link

Squirrel! - Wednesday, November 12, 2014 - link

michael2k - Wednesday, November 12, 2014 - link

ltcommanderdata - Wednesday, November 12, 2014 - link

Nogib - Wednesday, November 12, 2014 - link

testbug00 - Wednesday, November 12, 2014 - link

jasonbayk - Wednesday, November 12, 2014 - link

deppman - Saturday, November 15, 2014 - link

testbug00 - Wednesday, November 12, 2014 - link

lukarak - Thursday, November 13, 2014 - link

easp - Thursday, November 13, 2014 - link

iwod - Wednesday, November 12, 2014 - link

iwod - Wednesday, November 12, 2014 - link

hammerd2 - Wednesday, November 12, 2014 - link

DanNeely - Wednesday, November 12, 2014 - link

timbob2000 - Wednesday, November 12, 2014 - link

kwrzesien - Wednesday, November 12, 2014 - link

hahmed330 - Wednesday, November 12, 2014 - link

Kevin G - Wednesday, November 12, 2014 - link

kron123456789 - Wednesday, November 12, 2014 - link

DERSS - Wednesday, November 12, 2014 - link

chizow - Wednesday, November 12, 2014 - link

GC2:CS - Wednesday, November 12, 2014 - link

chizow - Wednesday, November 12, 2014 - link

lucam - Wednesday, November 12, 2014 - link

chizow - Wednesday, November 12, 2014 - link

lucam - Wednesday, November 12, 2014 - link

kron123456789 - Wednesday, November 12, 2014 - link

chizow - Wednesday, November 12, 2014 - link