74 Comments

  • zorxd - Tuesday, September 20, 2011 - link

    It will barely catch up with the Mali 400, if at all. I expected more from Nvidia.
  • Draiko - Tuesday, September 20, 2011 - link

    That has yet to be seen buuut let me ask you something...

    Can you even use the extra performance of the Mali on anything right now? That SGS2 you probably have is already 5 months old and nobody has done anything with it except root, throw Chainfire3D on it, and play Tegra games that nVidia helped bring to Android.

    You have some more performance in benchmarks and extra video codec support. It's a newer SOC, so that's expected, but nobody seems to be doing anything with the added performance, and Samsung isn't encouraging anyone to start. Sammy is just appeasing the ROM dev crowd whose main focus is messing around with Android, not building new apps and games.

    If that doesn't change, SGS2 users are just going to be stuck jumping through a ton of hoops (rooting, chainfire3D, etc) to get the same software experience as the Tegra 2.

    Thanks but no thanks.
  • metafor - Tuesday, September 20, 2011 - link

    So essentially, you're arguing that Samsung needs to purposely fragment the Android market and use shady-ass "exclusivity" deals to gain consumers?
  • Draiko - Tuesday, September 20, 2011 - link

    Google, Apple, and Microsoft "fragmented" the mobile device market into separate platforms and app markets, and they use shady-ass exclusivity deals to gain consumers. The same thing goes on in the gaming market as well. It happens everywhere. What's your point?

    nVidia ponied up some cash to bring some awesome games to Android. I don't see the problem with them "fragmenting" it a little to fund showcasing these capabilities of the Android Platform and their hardware. They're giving devs incentive to produce more advanced software and raising the bar.

    I wish more companies would do that instead of putting out some kind of benchmark-rocking hardware that ends up doing nothing. Remember the Samsung hummingbird? They really put that to good use, didn't they?
  • metafor - Tuesday, September 20, 2011 - link

    My point is that it's not a good thing. Just because "others do it too" doesn't make it a good thing.

    The fact that you can defend this shows just how much of a shill you are. Seriously, you're trying to argue that it's a good thing that each manufacturer comes up with exclusive game titles so that we'll have 5+ different markets and games available only to certain devices.

    Having a common platform is what keeps hardware companies in constant competition with each other. Having "exclusivity" deals essentially boils down to "our hardware is not as good so we'll buy our way into consumer's pockets with some exclusive games".
  • Draiko - Wednesday, September 21, 2011 - link

    First off, I'm not a shill.

    Second, having a common platform is nice, but keeping that common platform and then going above and beyond it is even better. Not pushing the envelope fast enough makes the entire platform stand still.

    Third, I'd rather a company offer their excellent hardware and "buy" their way into my pocket with specialized apps and games than hollow benchmarks and wasted potential.

    Again, the Samsung Hummingbird... what has the common platform done with it? What made it worthwhile? Nothing. The SGX540 just collected dust and now the people that actually cared about having an SGX540 have moved on to more powerful SOCs.
  • SteelCity1981 - Wednesday, September 21, 2011 - link

    what a troll.
  • Lucian Armasu - Wednesday, September 21, 2011 - link

    You should eliminate comments like these, Anand. And also the ones that degrade the conversations. I don't want Anandtech to become another Engadget, comment-wise.
  • metafor - Thursday, September 22, 2011 - link

    Nothing? It played every game that was available at the time at faster framerates (and in many cases, was the only one to do it at acceptable frame rates).

    And I have a bit of a hard time taking you seriously when not more than a few months ago, you were the one on this very board hawking about how much faster Tegra 2 was *on benchmarks*.

    Now your new schtick is "hey, benchmarks don't matter, Tegra has software exclusives!"

    Never mind that as Chainfire proved, those exclusives have nothing to do with the technical capabilities of other chips like Exynos (it plays them just fine) but having to do with nVidia exercising Microsoft-like "deals".

    Tell me, what are your thoughts on LG's new "exclusives" with game titles? So that in order to get them, you have to get an LG phone? But hey, they're "incentivizing" devs, right?
  • ph00ny - Thursday, September 22, 2011 - link

    Have you seen the difference between standard OpenGL code and nVidia-"optimized" code? It's like using 2 lines of code vs. 1 line, and the end result is pretty much identical between the two platforms. What do you think Chainfire is doing with his app running as an intermediate driver? Yup. He's simply translating the nVidia-"optimized" calls back into standard OpenGL. What nVidia is doing here is simply creating another level of fragmentation in the Android environment.

    As for the Google, Apple, and Microsoft comment: they're the creators of individual OSes. Are you saying nVidia should make their own OS to "fragment"?
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    Makes sense. Without content to make use of the underlying hardware, all the "MY PH0NE PWNZ URZ" comments are just silly, good only for drawing pretty charts and graphs.

    If nvidia can push out compelling enough games/apps that make actual use of all 4 cores/better GPU at launch, the Exynos might still look like the champion, on paper. Samsung doesn't seem to care much about software and content (aside from their UI), TI is busy doing god alone knows what and only Qualcomm is beginning to understand that content is just as important as the silicon they throw out, although they still have quite some way to go.

    This is quite similar to what Intel is facing (as Anand pointed out) with the QuickSync technology. Excellent tech on paper, but with little to no freely available apps that make actual use of the tech, it's all just a big pile of useless.
  • Draiko - Tuesday, September 20, 2011 - link

    Unless devs are either excited or incentivized, they're going to build apps that run on the largest number of devices.

    nVidia is incentivizing. Other companies should do the same. There will always be a common library of Android apps that run on all devices.

    I'm sure that once the dev tools get more advanced and the platform matures, we'll see general apps and games that work on all devices but have abilities that are enabled only on certain hardware.
  • Death666Angel - Tuesday, September 20, 2011 - link

    Since most other competitors will be using 28nm technology and Cortex A15 for their quad cores (afair), it stands to reason that a quad core built on the 40nm technology with A9 innards will be quite the power hog. :-)

    I'm very interested to see how the next round of ARM refreshes goes.
  • Draiko - Tuesday, September 20, 2011 - link

    Ummm... those 28nm SOCs like Krait and OMAP5 won't be in products for a while. They're also pretty expensive to make so OEMs will shy away from using them at first.

    Tegra Kal-el products are going to be on store shelves as early as next month and after using a Tegra 2 (40nm dual-A9), I'm pretty sure the Kal-el won't be a power hog.
  • jjj - Tuesday, September 20, 2011 - link

    Dual core Krait at 1.5-1.7GHz is supposed to show up in devices early next year (according to Qualcomm anyway).
  • Draiko - Tuesday, September 20, 2011 - link

    Last I heard, they were scheduled to start sampling Krait in Q2 2011 and release devices around a year+ later. That puts Krait devices almost another year out at best. Tegra Wayne devices will be shipping by then.
  • jjj - Tuesday, September 20, 2011 - link

    your info is outdated
  • Draiko - Tuesday, September 20, 2011 - link

    No it isn't, they were sampling in volume back in June. That was on-schedule (June is part of Q2 last time I checked).

    A few hopeful bloggers were saying that Krait might hit early. We'll see.
  • jjj - Tuesday, September 20, 2011 - link

    28nm parts started sampling in Q2, that part is true.
    In the last month Qualcomm has said multiple times that phones will show up early next year, most recently at Qualcomm IQ in Istanbul (watch out, some sites wrongly reported that they'll have 2.5GHz quads). Now ofc this is what they expect, and as always things can go somewhat differently.
    As for Wayne, I wouldn't expect it in 2012.
  • Draiko - Tuesday, September 20, 2011 - link

    If nVidia doesn't show Wayne at CES 2012, I wouldn't expect it in 2012. Until we see or don't see Wayne, we can only make assumptions based on nVidia's roadmap in which they've clearly committed to a new Tegra every year.

    Early next year could mean anything before June and most likely points to an MWC showcase. Qualcomm is pushing Krait up because of increased competition. They even cancelled the MSM8672.

    I'll also remind you that Qualcomm's roadmap for the MSM8660 stated a Q3 2010 release but the first product was the Pantech Vega Racer which didn't hit until May, 2011. The US launched the first MSM8660 equipped device in June (Evo 3D).

    Using that schedule history, dual-core MSM8690-equipped products won't hit shelves until Q3 2012 and the quad-core Kraits (Q1 2013) won't hit stores until Q4 2013.
  • Draiko - Tuesday, September 20, 2011 - link

    PS - The MSM8660 was volume sampling in June, 2010.
  • jjj - Tuesday, September 20, 2011 - link

    You might be confusing commercial sampling dates or volume production with actual devices in stores. In any case, the fact is that Qualcomm has said early next year for devices on a number of occasions, including IR events, and there isn't anything more accurate than this at this time.
    If Nvidia had Wayne ready for devices hitting retail next year, there would be no need for Kal-El plus.

    PS: we could also see some surprises from Samsung and Apple (ofc what Apple does is less relevant here).
  • Draiko - Wednesday, September 21, 2011 - link

    Kal-el plus might be a lower-cost part for mid-range devices and other markets.

    Samsung won't pull out any surprises. They'll either stick with Mali 400 (possibly MP2 or MP4 configs) or jump directly to a Mali-T604.

    Apple is just going to keep using the SGX series GPUs, but I don't think they'll move to a "Rogue" GPU for a while longer.
  • Lucian Armasu - Wednesday, September 21, 2011 - link

    My guess is Kal-El+ is a mid-life kicker for Kal-El, to keep up with the new competition mid-year. So it will probably be a Kal-El upgraded to 2 GHz or even 2.5 GHz per core.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    Highly unlikely we'll see anything Wayne-related at CES 2012, especially since nvidia will be expecting a lot of Kal-El product announcements at CES, now that it's been pushed to Q4'11.

    It will be very interesting to see if nvidia is able to out-execute the competition in the SoC industry, much the same way they did in the GPU industry early on. Interesting times lie ahead in the mobile space!
  • Draiko - Wednesday, September 21, 2011 - link

    Kal-el tablets are due out next month.
  • Lucian Armasu - Wednesday, September 21, 2011 - link

    Kal-El wasn't shown at CES either, the way Tegra 2 was. It was shown at MWC. But it's still coming to tablets this year, and phones early next year. I figure Wayne would do the same. It's too bad Nvidia couldn't keep to their original schedule, though, or we could have seen both next-gen chips in tablets and phones by Christmas.

    But it probably has mostly to do with manufacturers. It's very possible they delayed Nvidia's schedule by 2 months, maybe to wait for Ice Cream Sandwich, or who knows.
  • metafor - Tuesday, September 20, 2011 - link

    Cost of the chip is primarily determined by die area, not how expensive the R&D for the process is.

    And 28nm Snapdragons will be in devices Q1 of 2012.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    Unlikely. But even if they do manage to push out 28nm in shipping devices by Q1, it's almost impossible that they'll be quad-core.
  • metafor - Tuesday, September 20, 2011 - link

    It's not quad-core, but why does it need to be? 2x A15-class > 4x A9 class any day of the week.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    You're wrong, on two fronts. With Qualcomm being an architecture licensee, the Krait is not a straight-up A15 implementation (much unlike OMAP5 and Wayne, which will be). Also, if we were to assume what you're saying to be true with Qualcomm shipping retail devices based on Krait in Q1'12, realistically they would have had to have started development on their Krait architecture at least 3 years ago (especially with Qualcomm claiming that it is a new design from the ground-up). Considering this fact, the Krait will most likely be another 'in-between' architecture, straddling the A9 and A15, with custom blocks and logic in there to ensure the architecture will ramp up in frequency and be die-shrink friendly, since it has to remain competitive with current high-end A9's (Kal-El, the rumored Samsung A9 QC) and future high-end A15's (OMAP5, Wayne etc.) which are 1.5-2 years out. This is completely ignoring potential 28nm yield issues.

    Secondly, the whole '2x A15-class > 4x A9-class' comment is so obviously flawed, it isn't even worth the time and effort to try and put forth reasonable arguments to counter it.
  • metafor - Tuesday, September 20, 2011 - link

    Which is why I said A15-class. Krait is not A15, but its performance -- at least as released in DMIPS numbers -- is on the level of A15, that is to say, much more than A9.

    "realistically they would have had to have started development on their Krait architecture at least 3 years ago (especially with Qualcomm claiming that it is a new design from the ground-up)"

    Scorpion finished development in early 2009 and retailed in devices in 2010. What do you suppose the CPU team has been doing since?

    "Secondly, the whole '2x A15-class > 4x A9-class' comment is so obviously flawed, it isn't even worth the time and effort to try and put forth reasonable arguments to counter it."

    Oh please. Don't tell me you're one of those people who think 4x Core = 4x Performance.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    Numbers such as DMIPS and MFLOPS do little more than help these companies position their products on paper and roadmaps. How do you know that the Krait performs on par with/beats A15's? There are no A15's around for you to compare.

    Scorpion finished development in early 2009? What? You clearly don't know what you're going on about here. Scorpion was available commercially in Q4 2008, implying that the 'design' was finalized much before that, most likely mid-late 2007. Assuming they started working on Krait development after that, it lends further credence to the possibility that it is a highly custom A9-based design. Given that Qualcomm has already openly stated they are planning on hitting up to 2.5GHz (!!) on Krait, it also seems very likely that it is banking on ramping up frequencies to compete with others, thereby not being a wider/higher IPC A15-based design. While it may have a leg up against Kal-El (or even Kal-El+), how it performs against pure/modified A15's (OMAP5's, future Samsung cores or Wayne) is anybody's guess.

    "Oh please. Don't tell me you're one of those people who think 4x Core = 4x Performance."

    ...says the person who makes a silly blanket comment in the first place!
  • metafor - Tuesday, September 20, 2011 - link

    "Numbers such as DMIPS and MFLOPS do little more than help these companies position their products on paper and roadmaps. How do you know that the Krait performs on par with/beats A15's? There are no A15's around for you to compare."

    ARM has provided preliminary numbers for A15, as has Qcom. Of course they are DMIPS, so obviously they don't represent the bulk of real workloads. However, that aside, both Krait and A15 fall into a wholly different class than A9.

    "Scorpion finished development in early 2009? What? You clearly don't know what you're going on about here. Scorpion was available commercially in Q4 2008"

    Not really. The first revision of 8x50 was announced in 2008 but it wasn't until the second revision that you could find it in a consumer product (LG in Korea, I believe).

    "Assuming they started working on Krait development after that, it lends further credence to the possibility that it is a highly custom A9-based design."

    Architectural licenses don't work like that. You don't "base" it on a Cortex design. ARM doesn't give you that kind of resource (documentation, engineers, etc.). You either use a stock core or make your own. Why would a team planning on releasing a chip in 2011/2012 aim only for A9-level performance?

    "Given that Qualcomm has already openly stated they are planning on hitting up to 2.5GHz (!!) on Krait, it also seems very likely that it is banking on ramping up frequencies to compete with others, thereby not being a wider/higher IPC A15-based design."

    A15 is also projected to hit upwards of 2-3GHz....

    Do you know anything about A15 at all? It almost doubles the pipeline length compared to A9 (8 vs. 15 stages, though 8 is the A9's load latency whereas 15 is the A15's integer-execute depth; the A15's load latency is ~17). It's a heavily pipelined design intended for high frequency. It also happens to be higher in IPC than the A9, but that came at a pretty heavy area (and likely power) cost.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    I don't know what 'revision' you're going on about (silicon spin perhaps?... in which case you're wrong anyway), but the first QSD's were up for sampling in the Q3-Q4 '08 period. The fact that the first consumer device (LG Expo) packing that particular chip shipped only a year later just goes to show why I think no Krait-based devices will ship in Q1'12.

    Oh but you do base it off of a template! While it is up to the licensee to decide what/how much they want to customize, it'd be silly of you to think they start from 0. That kind of investment would almost never be recouped in a 3-4 year cycle. Plus, given that they did start working on Krait in late '07, performance models and simulations can give you just that, models. Part refreshes are inevitable. Look at the number of Scorpion iterations!

    A15 is not projected to hit anywhere CLOSE to 3GHz. The template macro limit is 2.5GHz and even that is using the G process node for the deepest pipelined model. Don't expect anything more than 1.8-2 GHz standard A15 in mobile guise.

    Please stop saying stuff just for the sake of it or merely stating the obvious. Repeating the same thing again and again won't make it right!
  • metafor - Wednesday, September 21, 2011 - link

    Yes, silicon spin. You'd be surprised how often companies claim "sampling" before silicon's remotely ready. The reason Krait SoC's should be in production by Q1 is because they were "sampling" way back in January/February of 2011.

    And yes, ARM licenses do work like that. You either take a stock design or you take the ISA and start from scratch. Do you have any idea how much effort it takes to reverse-engineer from RTL? You do not get detailed documentation from ARM. Have you been an ARM licensee before? Done an ARM design? Well I have. You start from 0. You can make iterative improvements as you go (although it didn't happen with Scorpion, save for MP support). But at the end of the day, trying to make major modifications to an existing ARM design without access to the original designers of that core (and you don't get that) is far more effort for too little gain.

    http://www.arm.com/products/processors/cortex-a/co...

    "1.5GHz-2.5 GHz quad-core configurations"

    You'd be surprised what 28 HKMG can do for frequency. And yes, that is what Krait at 2.5GHz will be on.
  • z0mb13n3d - Wednesday, September 21, 2011 - link

    If you're saying that going back to do a die re-spin after announcing commercial sampling is common in the industry, you very clearly are confused. VERY confused. This is probably the worst thing that could happen to a company. The most recent example of this would be nvidia's Fermi.

    Making a custom design does not imply rebuilding every single block from scratch. With proven, stable macro libraries available, it would be a criminal waste of effort, time and money to build/design every block from scratch. Likewise, to "base something" does not mean to reverse engineer code. At least not outside of Verilog assignments in school.

    Finally, the link you posted proves 2 things now. One, you were clearly wrong about A15 hitting up to 3GHz. Two, that link only further proves what I've been saying all this while. To quote ARM's own implementation examples:

    "Smartphone and Mobile Computing: 1 GHz – 1.5 GHz single or dual-core configurations.....

    Digital Home Entertainment: 1 GHz - 2GHz..." with Home/web servers/wireless equipment estimated to hit up to 2.5GHz.

    Again, please try and understand this: THESE NUMBERS ARE TEMPLATES that tell potential licensees what the architecture is capable of in terms of theoretical maximums, assuming there are no process/leakage/regulation/routing/floor-planning issues. Given that a majority of existing A8/A9-based designs barely make it through more than 3/4 of a day with normal use, I can't even begin to imagine what an SoC running 4x A15's (or even 2, for that matter) at 2.5GHz would be used for. This is not even considering the power envelopes for the baseband, NAND, RAM, GPU etc.

    I've said this enough number of times. It really is up to you to see things for the way they are or just go on about...random stuff. Cheers!
  • metafor - Thursday, September 22, 2011 - link

    Lol, you say "commercial sampling" like it's some kind of magic. 8660 was "sampled" when? Oh yes, a year before commercial devices were out. 8x50 was "sampled" when? Oh yes, a year before devices were out. Tegra 2 was "sampled" when? Oh man. Let's go to Kal-el.

    "Making a custom design does not imply rebuilding every single block from scratch. With proven, stable macro libraries available, it would be a criminal waste of effort, time and money to build/design every block from scratch."

    It does mean building every single block. Including the circuit macros for the standard cells. Sure, you could use ARM's standard cell library, but they were made specifically for the micro-architecture of the ARM cores. You may need your Wallace tree to have faster compressors, your register files to have better access times or more ports or lower standby power. And you get no documentation for any of the individual modules. You get a behavior model of the whole CPU and Verilog; that's it. Think about how much effort it takes to go into a submodule and figure out "hey, what are the small details here that I don't know".

    Again, have you actually done an ARM design? You sound awfully sure of yourself.

    As for the rest of your ranting, all I've said is that A15 is targeting the same frequency at 28HKMG as Krait. Which is, as I've shown, true.
  • Lucian Armasu - Wednesday, September 21, 2011 - link

    A Cortex A15 is significantly larger than a Cortex A9, so by itself it should be a power hog compared to A9, but the 28nm process should eliminate part of that, plus whatever optimizations Nvidia and the others do to it by then. Plus, there isn't any quad core Cortex A15 chip coming to market, except perhaps Wayne/Tegra 4, which will *probably* have that.

    TI OMAP 5 is not a quad core Cortex A15. It only has 2 Cortex A15 cores at 2.5 GHz each, and it's coming late 2012. The other M4 cores are companion cores, but I believe they work differently from Nvidia's companion core. We'll probably see a quad core Krait chip from Qualcomm in the 2nd half of 2012 at 2.5 GHz, but I imagine each of those cores should be a bit weaker than a Cortex A15 core.
  • dagamer34 - Tuesday, September 20, 2011 - link

    Ehh??? You missed the most important info on the first slide: Windows Phone is on there! To date, only Qualcomm chips have been used in Windows Phone devices.
  • Anand Lal Shimpi - Tuesday, September 20, 2011 - link

    The slide just indicates that Windows Phone is a target market for Tegra, not that MS has agreed to ship NV hardware in Windows Phone. Right now it's still all Qualcomm.

    Take care,
    Anand
  • thefivetheory - Tuesday, September 20, 2011 - link

    Perhaps it'll be a weighted companion cube--er, core. Will there be cake?
  • cgramer - Tuesday, September 20, 2011 - link

    The cake is a lie! :-)
  • Ken g6 - Tuesday, September 20, 2011 - link

    Unfortunately, the weighted companion core will probably have to be fused off in an "emergency intelligence incinerator" before the chip can go to production.
  • cosmotic - Tuesday, September 20, 2011 - link

    It's pretty lame that the power-saving graph from nVidia starts at 20%.
  • IKeelU - Tuesday, September 20, 2011 - link

    Why? The power savings seems pretty good to me, considering Tegra2 was already smartphone-capable.
  • jjj - Tuesday, September 20, 2011 - link

    Good that they did more about power consumption; they needed that.
    Are you sure about the turbo-like feature? I can't remember seeing it mentioned in the whitepapers.
  • jalexoid - Tuesday, September 20, 2011 - link

    Did they get the idea for the companion core from OMAP5?
  • sarge78 - Tuesday, September 20, 2011 - link

    Standard ARM IP maybe? All the other multi-core designs have a low-power companion core too. (nVidia are using a full-on A9 core while the others are using an M3/A5 core, though.)
  • alyarb - Tuesday, September 20, 2011 - link

    I like that nVIDIA is having fun with CPU architectures, but phones have been fast enough for most users and workloads for about a year now. Power optimization should be the priority, while the growth of the CPU should be somewhat tied to the growth of the memory bus.

    My single core A8 has HKMG and runs for a full day on a charge under all manner of workloads. My next phone must also have HKMG and do better. I honestly thought Tegra 3 was going to be 28nm HKMG but now I'm totally put off.
  • polysick - Wednesday, September 21, 2011 - link

    Don't you think having the 5-core setup is more impressive than HKMG? The idea is you are only going to be running a single core most of the time anyway.
  • Wolfpup - Wednesday, September 21, 2011 - link

    A year ago we were using A8, and in fact that's still what's in Apple's stuff, and A8 isn't remotely fast enough. The dual-core A9 @ 1GHz seems to run single-threaded stuff pretty well, though it's still slower than I'd like, and we're still running very crippled OSes.
  • Gauner - Tuesday, September 20, 2011 - link

    I actually would like to see something similar in desktop architectures. I usually buy high-end quad cores for work (3D rendering and video editing/compression), but it has always seemed a little wasteful to use that as my main desktop computer; most of the time my PC is only doing simple tasks like IRC, web browsing, playing music/movies, ... and I think I'm not the only one that wastes energy with quad cores most of the time.

    An extra core with an Atom or low-end Llano would be perfect for that: you could let the 10W CPU work most of the time and only wake up the 80+W quad core when needed; over a year it should give some nice energy savings. And yes, I know that those 80-100W are only consumed under full load, but I doubt an i7 2600K will consume 10 or 15W or less with only simple tasks in the background.
  • Mike1111 - Tuesday, September 20, 2011 - link

    The Cortex-A9's companion core role sounds a lot like the dual-core Cortex-M4's in OMAP5 (in addition to the dual-core Cortex-A15). Just 9 months earlier :-)

    I'm wondering if Apple's gonna do something similar for the CPU in the A6, since spring 2012 seems to be too early for a full-blown quad-core Cortex-A15 and I can't see them going quad-core Cortex-A9 like Nvidia.
  • z0mb13n3d - Tuesday, September 20, 2011 - link

    The idea is similar, but the execution is different, very different. The M4's in the OMAP5 can run (or support) very specific tasks that have to do with the video decode, ISP etc, while the main A15's are running. They are not general purpose cores. As listed in the article above, processes such as background sync (email, FB etc.) or any 'general purpose' Android code execution will wake up the A15's in OMAP5.

    In this case however, the 'companion' (definitely could have come up with a better term for this) core in Kal-El is a full-blown general purpose low-power A9 core that is capable of running everything the other 4 A9 cores can, albeit significantly slower and only up to a certain utilization threshold (for obvious performance reasons). So in effect, unless utilization spikes, the 4 'main' cores would probably never wake up while the device is in standby.

    Although I'm not sure how accurate the numbers nvidia provided are (definitely seem optimistic), if it's true that the companion core does in fact have the MPE block, I can understand the HD video playback power savings claims. Doesn't Flash also use MPE to an extent?
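    A rough sketch of the kind of switching policy I mean (toy thresholds and clocks of my own, not anything nVidia has published):

        # Toy illustration of a vSMP-style core switch: stay on the low-power
        # companion core until load crosses a cutoff, then wake the main cores.
        COMPANION_MAX_UTIL = 0.70   # hypothetical cutoff; the real value isn't public
        COMPANION_MAX_MHZ = 500     # companion core is capped at a low clock

        def pick_cores(utilization, runnable_threads):
            """Return which cluster should be powered, given the current load."""
            if utilization <= COMPANION_MAX_UTIL and runnable_threads <= 1:
                return {"cluster": "companion", "cores": 1, "max_mhz": COMPANION_MAX_MHZ}
            # Load spiked or the work went multi-threaded: hand off to the main cores.
            cores = min(4, max(1, runnable_threads))
            return {"cluster": "main", "cores": cores, "max_mhz": 1500}

        # Standby with a background sync ticking along -> companion core only.
        print(pick_cores(utilization=0.15, runnable_threads=1))
        # A game spins up three worker threads -> main cluster, three cores.
        print(pick_cores(utilization=0.90, runnable_threads=3))

    The real logic obviously lives in the kernel/firmware and is more involved, but the point is that the main cluster stays dark until something actually needs it.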
  • ltcommanderdata - Tuesday, September 20, 2011 - link

    Given that everyone seemed to standardize on 512kB of L2 cache for the Cortex A8, which then carried over as 512kB of L2 cache per core for dual core Cortex A9 SoCs, are there any performance concerns now that 4 cores are having to share the same 1MB L2 cache?
  • polysick - Wednesday, September 21, 2011 - link

    I would think there shouldn't be. When only one of the GP cores is active, it would seem that the clock rate increases (if I read that correctly), which makes sense since with the smaller area you have to worry less about clock skew. This is AKA the 'turbo boost' mentioned. So if you have 4 cores active, I think you would have a slower clock rate, so maybe L2 cache congestion wouldn't be as much of an issue?
  • macs - Tuesday, September 20, 2011 - link

    Anand, I would love an article that summarizes what is expected to be available in the coming months/year (OMAP 5, Kal-El, Krait, Exynos quad core, A6)...
    We need more order in this SOC world because it has a lot of players!
  • Blaster1618 - Friday, September 23, 2011 - link

    I second that request. When I read your original article on Nvidia's road map I thought they were going to sweep the market, but since then it seems like TI and the other competitors have raised their SOC game 2-3 orders of magnitude. I'm lovin' it. Samsung information is always sketchy until they're months from release. I tried once to look through ARM's customer list and quickly got a headache. B-)
  • SniperWulf - Tuesday, September 20, 2011 - link

    I think it's cool that nvidia appears to be progressing nicely in the ARM world. But since they are still primarily a graphics company, I really expected the T3 (or whatever it will be called) to blow the doors off of Mali and the A5.
  • Stahn Aileron - Tuesday, September 20, 2011 - link

    Funny. This is similar to the thoughts I've had for the past year about why Intel hasn't tried integrating an Atom core (or two) into an i-series CPU for standby modes.

    This would've been the thing I think Intel would try to do for its ultrabook initiative. LP Atom core running while the system is idle or in standby, Core, uh, cores running for active workloads.

    I honestly expect one of these days to see an i3 or i5 series CPU with an Atom in it. Maybe 2+2 arrangement.
  • polysick - Wednesday, September 21, 2011 - link

    I wonder, with this dynamic allocation of cores going on here, will it scale well past 4 cores? On an 8-core machine, how often would you use all 8? What about 16? 512? It seems reminiscent of the problem instruction-level parallelism ran into when Intel first started to run out of things to do with more transistors. Having more than four ALUs proved pointless because you'd end up wasting the rest of your instruction word with no-ops most of the time. Without massively threaded programs, do you think you could really utilize more cores?
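    The textbook way to put a bound on that worry is Amdahl's law. With a made-up example where only 70% of a workload parallelizes, even infinite cores top out around 3.3x:

        \[ S(n) = \frac{1}{(1 - p) + p/n} \]
        % p = 0.7:  S(2) \approx 1.5,  S(4) \approx 2.1,  S(8) \approx 2.6,  S(\infty) = 1/0.3 \approx 3.3

    So past 4 cores the returns shrink fast unless programs really are massively threaded.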
  • BoyBawang - Wednesday, September 21, 2011 - link

    I guess the unused cores will serve as heat sink. So the more cores... the cooler.
  • polysick - Wednesday, September 21, 2011 - link

    I'd think that would just be a waste of transistors... not highly effective.
  • BoyBawang - Wednesday, September 21, 2011 - link

    But it's still power efficient in multitasking, like several little apps running at the same time distributed across 4 cores, each underclocked to 200MHz. That configuration is more power efficient than a single core clocked at 800MHz.
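    A back-of-envelope for why that can hold (the voltages below are illustrative, not Tegra's real operating points): dynamic CMOS power goes roughly as C*V^2*f, and the lower clock also lets you drop the voltage:

        \[ P_{dyn} \propto C V^{2} f \]
        \[ 4 \times 0.8^{2} \times 200 \approx 512 \qquad \text{vs.} \qquad 1 \times 1.1^{2} \times 800 \approx 968 \]

    Same aggregate 800MHz of work either way, but the four slow cores burn roughly half the dynamic power, provided the work really does spread across them.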
  • GullLars - Sunday, September 25, 2011 - link

    But that would require a reversed 4+1 configuration, with 4 LP-cores for multitasking and 1 G-core for those heavy single-thread workloads. Perhaps a 4LP + 2G might be more efficient?

    Also, regarding stuff noted earlier in this thread, would you consider buying an i(something)-#### with twice the cores for twice (or 3x?) the price, but with the restriction of half the allowed clock frequency (except for single and dual core turbo, which would be the same), giving you the same throughput potential at lower power?
  • Wolfpup - Wednesday, September 21, 2011 - link

    Okay, so Tegra 2 and the A5 are basically the same thing, save for two SGX 543s (what is that, basically 8 cores?) instead of 8 of Nvidia's cores. And supposedly the two SGX 543s are better.

    But then how do you stick FIVE A9s on there, and bump it to 12 GPU cores, and STILL have A5 be 30% larger? That MUST be a typo, right? I mean...the SGX 543s aren't THAT impressive, are they?
  • shiznit - Wednesday, September 21, 2011 - link

    Yes they are.

    And the A5 uses a different process from a different foundry.
  • Lucian Armasu - Thursday, September 22, 2011 - link

    If what Anand says is true, then the A5 should be 2x as large as Tegra 2, which means the GPU is a lot larger, and it's why it managed to beat all the other chips in the market on GPU benchmarks. This also makes me think that they will shrink the GPU part for the iPhone 5, so it won't be as powerful.
  • BoyBawang - Friday, September 23, 2011 - link

    If you read the whitepaper (b), Nvidia seems defensive that Kal-El's 4 processors, although they can be turned on/off individually, aren't asynchronous, meaning they always operate at the same frequency. In contrast, the cores in Qualcomm's dual-core CPUs can each operate at a different frequency depending on load.
  • felixyang - Tuesday, September 27, 2011 - link

    It's a limitation of the Cortex A9.
  • GullLars - Sunday, September 25, 2011 - link

    I feel it's worth pointing out they (nVidia) chose to start their chart at 20% instead of 0% in the "Power Savings on Kal-El due to vSMP" chart.
    This is at best a mistake, and likely intentionally dishonest.

    Since most people interpret the graphs visually even though the numbers are there, they will come away with the visual ratio of graph lengths rather than the numbers they represent. For the third, "HD video playback", that results in Kal-El looking like it uses ~25% of the power of Tegra 2 (a 40% bar on an axis that starts at 20% spans only (40-20)/(100-20) = 1/4 of the graph length) rather than the ~40% represented by the numbers.

    I have noticed this before in marketing slides from a couple of other companies (not CPU/GPU) who shall remain unnamed. Every time I notice this kind of "GRAPH TAMPERING" (or graphs without units on them, like the last one here), I lose some respect for that company. I usually also shoot off an email to the contact@company mail if they have the pictures on their website, though here it's posted by Anand, and not commented on.
    If any editors read this, in the future please at least note this below the relevant graphs.
  • ProDigit - Wednesday, September 28, 2011 - link

    Why have 5 cores with only 4 active?
    When 4 are active, they expect a multi-threaded app to be demanding nearly 100% of each core, so enabling the lesser core as well would only make things even faster.
  • felixyang - Saturday, October 8, 2011 - link

    The problem is cache snoop latency. If the companion core were enabled alongside them, the big cores would have to snoop the companion core's cache, which leads to higher cache latency for the big cores and maybe poor performance. On the other hand, I think it's also a limitation of their architecture: the big cores may have no way to snoop the companion core. Just a guess.
