In the past few years, we’ve seen NVIDIA shift their mobile strategy dramatically. With Tegra 2 and 3, we saw multiple design wins in the smartphone space as well as the tablet space. These SoCs had GPUs that were quite unlike what we saw in NVIDIA’s desktop and laptop GPUs, with a reduced feature set and a unique architecture. With Tegra K1, however, we saw a distinct shift in NVIDIA’s SoC strategy, as the Tegra K1 was the first mobile SoC to achieve parity in GPU architecture with desktop GPUs. In the case of the Tegra K1, this meant a single Kepler SMX, which made for truly impressive GPU performance. In the time since, however, we’ve seen companies like Apple release new SoCs such as the A8X, which managed to deliver largely similar performance at lower power.

NVIDIA, of course, is never content to sit idle for long and let themselves be surpassed, which has led the company to develop their Tegra SoCs on a roughly yearly cadence. In NVIDIA’s development timeline, 2014 brought us the Tegra K1-32, the company’s first SoC to integrate a modern high-performance GPU architecture (Kepler), and later the Tegra K1-64, a K1 with NVIDIA’s in-house ARMv8 Denver CPU in place of the 32-bit ARM Cortex A15 CPU.


NVIDIA's GTC 2014 Tegra Roadmap

2014 also brought a revision to NVIDIA’s public Tegra roadmap: for 2015 NVIDIA would bump the previously announced Parker SoC and release a new part in its place, Erista. This schedule change came as quite a surprise, since up until this point NVIDIA had released their roadmaps roughly 2 years out, whereas Erista was put on the map less than a year before it was to be announced. More unusual still, NVIDIA offered up minimal details on Erista; it would be powered by a Maxwell GPU, with no additional information on the CPU or manufacturing process. Parker on the other hand – NVIDIA’s planned Denver + Maxwell + 16nm FinFET part – fell off the schedule, with NVIDIA not officially canceling it but also saying little else about it.

Now in 2015, with the launch of the Tegra X1, we can finally begin putting the picture together. Erista, as it turns out, is something of a rapid-release product for NVIDIA; what had been plans to produce a 16nm FinFET part in 2015 became plans to produce a 20nm part, with Erista to be that part. To pull Erista together NVIDIA went for a quick time-to-market approach in SoC design, pairing up a Maxwell GPU with ARM Cortex A57 & A53 CPUs, to be produced on TSMC’s 20nm SoC process.

CPU

We’ll kick off our look at Erista with its CPUs, where the CPU configuration is one of the bigger surprises. While Tegra K1 had a version with NVIDIA’s custom Denver core for the CPU, NVIDIA has elected to use ARM’s Cortex A57 and A53 in the Tegra X1. The A57 cluster has 2MB of L2 cache shared across its four cores, with 48KB/32KB L1 caches (I$+D$) per core. The A53 cluster has 512KB of L2 cache shared by all four cores and 32KB/32KB L1 caches (I$+D$) per core. NVIDIA representatives stated that this was done for time-to-market reasons.
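For quick reference, the CPU complex described above can be summed up in a small structure like the one below; the layout and field names are purely illustrative, not anything defined by NVIDIA or ARM.

```python
# Tegra X1 CPU complex as described above; field names are illustrative only.
tegra_x1_cpu_complex = {
    "big_cluster": {
        "core": "Cortex-A57",
        "cores": 4,
        "l1_icache_kb": 48,   # per core
        "l1_dcache_kb": 32,   # per core
        "l2_cache_kb": 2048,  # shared across the cluster
    },
    "little_cluster": {
        "core": "Cortex-A53",
        "cores": 4,
        "l1_icache_kb": 32,   # per core
        "l1_dcache_kb": 32,   # per core
        "l2_cache_kb": 512,   # shared across the cluster
    },
}
```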

However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling, which would expose all eight cores to userspace applications. It’s important to note that NVIDIA’s solution is cache coherent, so this system won't suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.
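As a practical illustration of the difference, the short sketch below (a hypothetical example, not NVIDIA software) reads Linux's sysfs CPU topology; on a cluster-migration design only the active cluster's cores are typically reported as online, whereas global task scheduling would normally leave all eight cores online and schedulable.

```python
# Hypothetical sketch: compare the CPUs the kernel knows about ("present")
# with the CPUs currently usable by userspace ("online"). Under cluster
# migration these sets typically differ; under global task scheduling all
# eight cores would normally be online.
def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,6' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

with open("/sys/devices/system/cpu/present") as f:
    present = parse_cpu_list(f.read())
with open("/sys/devices/system/cpu/online") as f:
    online = parse_cpu_list(f.read())

print("present:", sorted(present))
print("online: ", sorted(online))
```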

Although cluster migration is usually a detriment to power and performance (especially without a cache coherent interconnect), NVIDIA claims that Tegra X1 significantly outperforms Samsung System LSI’s Exynos 5433 in performance per watt, with 1.4x the performance at the same power or half the power at the same performance on the CPU. It's likely that this difference comes from optimization of the implemented logic and differences between Samsung's and TSMC's 20nm processes rather than differences in thread management. NVIDIA continues to use System EDP (Electrical Design Point) management to control throttling and turbo rather than ARM's IPA (Intelligent Power Allocation) drivers, and in general it seems that while the CPU architecture is ARM's reference design, everything else is distinctly NVIDIA in design.
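To put those numbers in perspective, the arithmetic behind the two forms of the claim works out as follows; the values are normalized and hypothetical, and the underlying claim is NVIDIA's rather than something we have independently measured.

```python
# Normalized, hypothetical values illustrating NVIDIA's CPU perf/W claim.
exynos_perf, exynos_power = 1.0, 1.0           # Exynos 5433 baseline

# Claim 1: 1.4x the performance at the same power
x1_perf_per_watt = (1.4 * exynos_perf) / exynos_power
print(x1_perf_per_watt)                         # 1.4x the baseline perf/W

# Claim 2: half the power at the same performance
x1_perf_per_watt = exynos_perf / (0.5 * exynos_power)
print(x1_perf_per_watt)                         # 2.0x the baseline perf/W
```

Note that the two operating points imply different perf/W advantages (1.4x versus 2x), which is expected given that power scales non-linearly with frequency and voltage.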

All of this of course raises the question of where Denver is, and what is to happen to it. The use of the ARM Cortex A57 and A53, as NVIDIA tells it, was a time-to-market decision: NVIDIA could bring an off-the-shelf Cortex-based SoC to market sooner than they could another Denver SoC. On the surface there is at least some evidence to support this, as NVIDIA would be able to license a 20nm-ready design from ARM rather than spend the time porting Denver over from TSMC’s 28nm process to the 20nm process. At the same time, however, integrating an ARM CPU core into an SoC is not an easy task – logic LEGO this is not – so it’s hard to say whether this actually saved NVIDIA much time.

In any case, much like the still-in-the-dark Parker design, NVIDIA is holding their cards close to their chest, hinting that Denver will still be in future-generation products. If Parker is still part of NVIDIA’s plans – and they are without a doubt working on some kind of 16nm FinFET Tegra SoC – then we may yet see Denver as part of the original Parker plan. Otherwise, while there’s no reason not to believe NVIDIA about the fate of future Denver SoCs, at this point it’s anyone’s guess when we might see it again.

Uncore

Outside of the CPU and GPU, NVIDIA has also dramatically improved the rest of Tegra X1 in comparison with Tegra K1. The memory interface moves from 64-bit wide LPDDR3 to 64-bit wide LPDDR4, which improves peak memory bandwidth from 14.9 GB/s to 25.6 GB/s and boosts memory power efficiency by around 40%. In addition, the maximum internal display resolution moves from 3200x2000 at 60 Hz to 3840x2160 at 60 Hz, with support for VESA Display Stream Compression. For external displays, we see a significant improvement with support for HDMI 2.0 and HDCP 2.2, which means that 4K60 output is supported, in contrast with the Tegra K1, which only supported 4K30.
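The bandwidth figures follow directly from the interface math; the transfer rates used below are the ones implied by the quoted numbers (effectively LPDDR3-1866 and LPDDR4-3200 on a 64-bit bus), which is our back-of-the-envelope assumption rather than a stated spec.

```python
# Peak bandwidth = bus width (bytes) x transfer rate (MT/s).
bus_bytes = 64 // 8          # 64-bit memory interface on both K1 and X1

lpddr3_mts = 1866            # implied by the 14.9 GB/s figure
lpddr4_mts = 3200            # implied by the 25.6 GB/s figure

print(bus_bytes * lpddr3_mts / 1000)   # ~14.9 GB/s (Tegra K1, LPDDR3)
print(bus_bytes * lpddr4_mts / 1000)   # 25.6 GB/s (Tegra X1, LPDDR4)
```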

The ISP is largely similar to the one in Tegra K1 in terms of feature set, but the JPEG encode and decode rate is now five times faster, going from 120 MP/s to 600 MP/s. For video encode and decode, we see support for 4K60 H.265 and VP9, with support for 10-bit color on H.265 decode. In addition to support for new codecs in the hardware blocks, H.264 and VP8 now support 4K60, a notable improvement over the 4K30 maximum of Tegra K1. The storage controller also now supports eMMC 5.1 for faster storage performance.

Comments

  • tipoo - Monday, January 5, 2015 - link

    Oh I read that wrong, you meant the games, not the play store. Still, games almost never crash on this either.
  • PC Perv - Monday, January 5, 2015 - link

    Why do you guys write what are essentially PR statements by NV as if they were facts you had independently validated yourselves? I suppose you guys did not have time to test any of these claims.

    So you end up writing contradictory paragraphs one after another. In the first, you say NVIDIA "embarked on a mobile first design for the first time." That statement in and of itself is not something one can prove or disprove, but in the very next paragraph you write,

    "By going mobile-first NVIDIA has been able to reap a few benefits.. their desktop GPUs has resulted chart-topping efficiency, and these benefits are meant to cascade down to Tegra as well." (??)

    I suggest you read that paragraph again. Maybe you missed something, or worse, the whole paragraph comes off as unintelligible.
  • ABR - Monday, January 5, 2015 - link

    Well the situation itself is confusing since NVIDIA might have designed Maxwell "mobile-first" but actually released it "desktop-first". Then came notebook chips and now we are finally seeing Tegra. So release-wise the power efficiency "cascades down", even though they presumably designed starting from the standpoint of doing well under smaller power envelopes.
  • PC Perv - Monday, January 5, 2015 - link

    But that is a tautology that is totally vacuous of meaning. One can say the opposite thing in the exact same way: "We went with desktop first, but released to mobile first, so that power efficiency we've learned "cascaded up" to the desktops."

    So the impression one gets from reading that explanation is that it does not matter whether it was mobile first or desktop first. It is a wordplay that is void of meaningful information. (but designed to sound like something, I guess)
  • Yojimbo - Monday, January 5, 2015 - link

    Isn't that standard reviewing practice? "Company X says they did Y in their design, and it shows in Z." The reviewer doesn't have to plant a mole in the organization and verify if NVIDIA really did Y like they said. This is a review, not an interrogation. If the results don't show in Z, then the reviewer will question the effectiveness of Y or maybe whether Y was really done as claimed. Yes, the logical flow of the statement you quoted is a bit weak, but I think it just has to do with perhaps poor writing and not from being some sort of shill, like you imply. The fact is that result Z, power-efficiency, is there in this case and it has been demonstrated on previously-released desktop products.

    As far as your statement that one could say the opposite thing and have the same meaning, I don't see it. Because going "mobile-first" means to focus on power-efficiency in the design of the architecture. It has nothing to do with the order of release of products. That is what the author means by "mobile-first," in any case. To say that NVIDIA was going "desktop-first" would presumably mean that raw performance, and not power-efficiency, was the primary design focus, and so the proper corresponding statement would be: "We went desktop-first, but released to mobile first, and the performance is meant to "cascade up" (is that a phrase? probably should be scale up, unless you live on a planet where the waterfalls fall upwards) to the desktops." There are two important notes here. Firstly, one could not assume that desktop-first design should result in increased mobile performance just because mobile-first design results in increased desktop efficiency. Secondly and more importantly, you replaced "is meant to" with "so". "So" implies a causation, which directly introduces the logical problem you are complaining about. The article says "is meant to," which implies that NVIDIA had aforethought in the design of the chip, with this release in mind, even though the desktop parts launched first. That pretty much describes the situation as NVIDIA tells it (And I don't see why you are so seemingly eager to disbelieve it. The claimed result, power-efficiency, is there, as I previously said.), and though maybe written confusingly, doesn't seem to have major logical flaws: "1. NVIDIA designed mobile-first, i.e., for power-efficiency. 2. We've seen evidence of this power-efficiency on previously-released desktop products. 3. NVIDIA always meant for this power-efficiency to similarly manifest itself in mobile products." The "cascade down" bit is just a color term.
  • Yojimbo - Monday, January 5, 2015 - link

    I just want to note that I don't think the logical flow of the originally-written statement is as weak as I conceded to in my first paragraph. In your paraphrase-quote you left out the main clause and instead included a subordinate clause and treated it as the main clause. The author is drawing a parallel and citing evidence at the same time as making a logical statement and does so in a way that is a little confusing, but I don't think it really has weak logical flow.
  • chizow - Monday, January 5, 2015 - link

    Anyone who is familiar with the convergence of Tegra and GeForce/Tesla roadmaps and design strategy understands what the author(s) meant to convey there.

    Originally, Nvidia's design was to build the biggest, fastest GPU they could with massive monolithic GPGPUs built primarily for intensive graphics and compute applications. This resulted in an untenable trend with increasingly bigger and hotter GPUs.

    After the undeniably big, hot Fermi arch, Nvidia placed an emphasis on efficiency with Kepler, but on the mobile side of things, they were still focusing on merging and implementing their desktop GPU arch with their mobile, which they did beginning with Tegra K1. The major breakthrough for Nvidia here was bringing mobile GPU arch in-line with their established desktop line.

    That has changed with Maxwell, where Nvidia has stated, they took a mobile-first design strategy for all of their GPU designs and modularized it to scale to higher performance levels, rather than vice-versa, and the results have been obvious on the desktop space. Since Maxwell is launching later in the mobile space, the authors are saying everyone expects the same benefits in terms of power saving from mobile Maxwell over mobile Kepler that we saw with desktop Maxwell parts over desktop Kepler parts (roughly 2x perf/w).

    There's really no tautology if you took the time to understand the development and philosophy behind the convergence of the two roadmaps.
  • Mondozai - Monday, January 5, 2015 - link

    No, it's not unintelligible, for reasons that other people have already explained. If you understand the difference between what it was developed for and what was released first, you understand the point. And apparently you don't.
  • OBLAMA2009 - Monday, January 5, 2015 - link

    man nvidia is such a joke
  • MasterTactician - Monday, January 5, 2015 - link

    512 GFLOPS... 8800GTX in a phone, anyone? Impressive.
