39 Comments
The_Assimilator - Monday, December 12, 2016
*AMD recycling products intensifies*
Well, I guess they gotta do something with all those Fiji chips they produced and nobody wanted.
JoeyJoJo123 - Monday, December 12, 2016
To be fair, it's Fiji with HBM on the die, so it does at least that which the Polaris chips don't.
MLSCrow - Monday, December 12, 2016
Actually, lots of people wanted it, but it was a little too pricey. Aside from that, Google just bought a bunch of these, and Alibaba may have done the same, even prior to this announcement.
Demiurge - Monday, December 12, 2016
Nvidia does the same thing with Kepler cards, and previously did it with Fermi when Kepler was the norm... Also, I would encourage you to try a GT 710 or 730 if you think AMD recycling is bad.
evilspoons - Tuesday, December 13, 2016
I bought a half-height GT 730 when the GK208 version launched, but telling it apart from the GF108 version was idiotic. I can't believe they didn't call it a GT 735 or something. I had to read the core configuration on the side of the box, the clock speeds, and so on to avoid buying the Fermi part (I wanted the Kepler video decoder for my home theatre PC). Bleargh.
jjj - Monday, December 12, 2016
"Naples doesn’t have an official launch date"
Zen server is Q2.
doggface - Monday, December 12, 2016
Oh nelly. Vega looks to be very interesting...
ddriver - Monday, December 12, 2016
It looks like it won't be setting any efficiency records though. Adding the interconnect to maximize FP16 throughput guts efficiency as expected.
The result is that for FP32 we have 8.2 TFLOPS in a 175W budget at 28nm for Fiji, and 12.5 TFLOPS in a 300W budget at 14nm for Vega.
In other words, the process is scaled down twice, the TDP budget is increased almost twice, but performance gains are only 66% or so. That's fairly modest. I'd expect that, even if not mature yet, the process alone oughta result in a 40% boost at the very least, and the expanded TDP headroom another 50%, so close to 90% at the very least. But that's just the cost of maximizing FP16 throughput. For their own sake, I hope this Instinct will be a different chip overall rather than just a re-branding, cuz that would mean the workstation and compute workflows will needlessly suffer for the sake of a feature that is irrelevant in those fields.
Drumsticks - Monday, December 12, 2016
It's not too bad, I think. The P100 offers 18.7 TF of half precision performance at about 250W, so AMD in theory is ahead of Nvidia on the efficiency curve here, offering around 35% more FLOPs for 20% more power. Now, AMD TF != Nvidia TF, especially in gaming, but there's probably reason to expect that AMD could achieve better hardware efficiency in a compute environment than in a gaming one.
Yojimbo - Monday, December 12, 2016
I don't think it's correct to compare the efficiency of the MI25 with the P100. Rather, it should be compared efficiency-wise with the P40, as strong FP64 is not something that's been mentioned for the MI25 as far as I can see. Note that the P40 uses GDDR5 and not HBM2, which reduces its efficiency. I know the P40 doesn't have FP16 support, but I don't think the MI25 will really be competing much with the Pascal generation of Tesla cards, except after they are offered at a lower price once the Volta generation of cards is available. These Radeon cards are not just drop-in replacements for NVIDIA's hardware. Even assuming AMD can produce the MI25 in volume in Q2 2017, it will take a bit of testing and validation before people are willing to use it en masse in servers. Users also have to think about software and middleware considerations.
In any case, they seem to be claiming efficiency close to the P40, which is a bit surprising. What we do know is that AMD claimed strong efficiency with Polaris before it was released and they overstated their claims. For me, I am taking their claims with a grain of salt until the product is actually released.
CoD511 - Saturday, December 31, 2016
Well, what I find shocking is the P4, with a 5.5 TFLOP rating and 50W/75W versions, where that figure is the rated maximum power, not even using TDP to obfuscate the numbers. It's right near the output of the 1070 but the power numbers are just, what? If that's true, and it may well be considering they're available products, I wonder how they've got that set up to draw so little power yet output so much or process at such speed.
jjj - Monday, December 12, 2016
The math doesn't work like that at all.
Additionally, we don't know the die size, and the GPU die is not the only thing using power on a GPU AiB.
What we do know is that it gets more at 300W rated TDP than Nvidia's P100.
ddriver - Monday, December 12, 2016
"The math doesn't work like that at all." - neither do flamboyant statements devoid of substantiation. The numbers are exactly where I'd expect them to be based on rough estimates on the cost of implementing a more fine-grained execution engine.
RussianSensation - Monday, December 12, 2016
Almost everyone has gotten this wrong this generation. It's not 2 node jumps because the 14nm GloFo and 16nm TSMC are really "20nm equivalent nodes." The 14nm/16nm at GloFo and TSMC are more marketing than a true representation.
"Bottom line, lithographically, both 16nm and 14nm FinFET processes are still effectively offering a 20nm technology with double-patterning of lower-level metals and no triple or quad patterning."
https://www.semiwiki.com/forum/content/1789-16nm-f...
Intel's 14nm is far superior to the 14nm/16nm FinFET nodes offered by GloFo and TSMC at the moment.
abufrejoval - Monday, December 19, 2016
Which is *exactly* why I find the rumor that AMD is licensing its GPUs to Chipzilla to replace Intel iGPUs so scary: with that, Intel would be able to produce true HSA APUs with HBM2 and/or EDRAM which nobody else can match.
Intel has given away more than 50% of silicon real-estate for years for free to starve off Nvidia and AMD (isn't that illegal silicon dumping?) and now they could be ripping the crown jewels off a starving AMD to crash NVidia where Knights Landing failed.
AMD having on-par CPU technology now is only going to pack some punch when it's accompanied by a powerful GPU part in their APUs that Intel can't match and NVidia can't deliver.
If they license that to Intel, they are left with nothing to compete with.
Perhaps Intel lured AMD by offering their foundries for dGPU, which would allow ATI to make a temporary return. I can't see Intel feeding snakes at their fab-bosom (or producing "Zenselves").
At this point in the silicon end game, technology becomes a side show to politics and it's horribly fascinating to watch.
cheshirster - Sunday, January 8, 2017
If Apple is a customer everything is possible.
lobz - Tuesday, December 13, 2016
ddriver......do you have any idea what else is going on under the hood of that surprisingly big card? =}
There could be a lot of things accumulating that add up to <300W, which is still lower than the P100's 10.6 TF @ 300W =}
hoohoo - Wednesday, December 14, 2016
You're being hyperbolic.
24.00 W/TF for Vega.
21.34 W/TF for Fiji.
12% higher power use for Vega. That's not really gutted.
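For anyone who wants to check the W/TF figures above, here is a minimal Python sketch. It assumes the numbers quoted earlier in the thread (8.2 TFLOPS FP32 at 175W for Fiji, 12.5 TFLOPS at 300W for Vega), which are rated TDP and peak throughput rather than measurements. The helper name `watts_per_tflop` is purely illustrative.

```python
def watts_per_tflop(tdp_watts: float, tflops: float) -> float:
    """Rated board power divided by peak throughput (lower is better)."""
    return tdp_watts / tflops


# Figures as quoted in the thread: rated TDP and peak FP32, not measured values.
fiji = watts_per_tflop(175, 8.2)
vega = watts_per_tflop(300, 12.5)

# Relative increase in power needed per TFLOP going from Fiji to Vega.
increase_pct = (vega / fiji - 1) * 100

print(f"Fiji: {fiji:.2f} W/TF")
print(f"Vega: {vega:.2f} W/TF")
print(f"Vega needs {increase_pct:.0f}% more power per TF")
```

Since both inputs are spec-sheet maxima, the result only says something about rated efficiency, not about what either card draws under a real workload.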
Haawser - Monday, December 12, 2016
PCIe P100 = 18.7TF of 16bit in 250W
PCIe Vega = ~24TF of 16bit in 300W
In terms of perf/W Vega might get ~22% more perf for ~20% more power. So essentially they should be near as darnit the same. Except AMD will probably be cheaper, and because each card is more powerful, you'll be able to pack more compute into a given amount of rack space. Which is what the people who run multi-million $ HPC research machines will *really* be interested in, because that's kind of their job.
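The comparison above can be reproduced with a few lines of Python. This is only a sketch that takes the two quoted data points at face value (18.7 TF FP16 in 250W for the PCIe P100, roughly 24 TF in 300W for Vega). The ~24 TF figure is an estimate from the comment, and `pct_more` is a hypothetical helper name.

```python
def pct_more(new: float, old: float) -> float:
    """How much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


# Peak FP16 throughput (TFLOPS) and rated board power (W) as quoted above.
p100_tf, p100_w = 18.7, 250
vega_tf, vega_w = 24.0, 300  # ~24 TF is the commenter's estimate

perf_gain = pct_more(vega_tf, p100_tf)
power_gain = pct_more(vega_w, p100_w)
perf_per_watt_gain = pct_more(vega_tf / vega_w, p100_tf / p100_w)

print(f"Throughput: +{perf_gain:.1f}%")
print(f"Power:      +{power_gain:.1f}%")
print(f"Perf/W:     +{perf_per_watt_gain:.1f}%")
```

With these exact inputs the throughput gain comes out nearer 28% than 22%, for 20% more power, which is about 7% better perf/W. That is still close enough to support the "near enough the same" conclusion.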
Ktracho - Monday, December 12, 2016
Not all servers can handle providing 300 W to add-in cards, so even if they are announced as 300 W cards, they may be limited to something closer to 250 W in actual deployments.
TheinsanegamerN - Tuesday, December 13, 2016
The kind of servers these cards aim for should be able to handle the load. And odds are many will simply be buying new servers with new hardware, rather than buying new cards and putting them in old servers.
Michael Bay - Monday, December 12, 2016
I'd like to know if AMD is sharing a PR team with Seagate, or vice versa. Oh those product names.
Yojimbo - Monday, December 12, 2016
This is probably going to force NVIDIA to rethink the current strategic product segmentation they've implemented by withholding packed FP16 support from the Titan X cards. These announced products from AMD don't really compete with the P100, I think, but they are appealing for anyone thinking of training neural networks on the Titan X or scaled out servers using the Titan X. The Volta-based iteration of the Titan X may need to include FP16 support, which may then force that support onto the Volta-based P40 and P4 replacements, as well.
These AMD products are far too late to affect the Pascal generation cards, though. It takes a long time for a new product to be qualified for a large server and I'm guessing the middleware and framework support isn't really there either, and isn't likely to be up to snuff for a while.
p1esk - Monday, December 12, 2016
Amen, brother. This separation of training/inference to run on different hardware pissed me off. I hope Nvidia gets a little bit of competition.
On the other hand, maybe we will find a way to train networks with 8-bit precision. After all, it's highly unlikely our biological neurons/synapses are that precise.
Threska - Sunday, January 1, 2017
Analog is a different beast.
Holliday75 - Monday, December 12, 2016
My portfolio likes this news and I am thrilled to see the lack of comments asking how many display ports this card has.
BenSkywalker - Monday, December 12, 2016
What segment are they shooting for with these parts?
The MI6, which is implied to be an inference-targeted device, has one quarter the performance at three times the power draw of the P4 for INT8. This doesn't look bad, it looks like an embarrassment to the industry. ~8.5% of the performance per watt of parts that have been shipping for a while now, for a product we don't even have a launch date for?
The MI8 doesn't have enough memory to do any data-heavy workloads, and it is too big and way too power hungry for its performance for inference. What exactly is this part any good for?
MI25: without memory amounts, bandwidth and some useful performance numbers (TOPS), it's hard to gauge where this is going to fall. Maybe this could be useful as an entry level device if priced really cheap?
Their software stack, well, AMD has a justly earned reputation of being a third tier, at best, software development house. The only hope they have is relying on the community. Their problem is going to be that, despite the borderline vulgar levels of misinformation and propaganda to the contrary, this *IS* an established market with massive resources already devoted to it, and AMD is coming very late to the game and is going to try and woo resources that have been working with the competition for years already?
Comments on high levels of bandwidth for large scale deployment are kind of quaint. Why are you comparing the AMD solutions to those of Intel for high end usage? The high end for this market is using Power/nVidia with NVLink and measuring bandwidth in TB/s. The segment you are talking about is, at best, mid tier.
What's worse, from a useful information perspective, are your comments that AMD making their own CPUs is rare in this market. In terms of volume, the most popular use case for deep learning is going to be paired with ARM processors for the next decade at least, a market that has many players already, and nVidia is quickly pushing them out of the segment. The only real viable competition at this point seems likely to come from Intel and their upcoming Xeon Phi parts, which appear to be likely to ship roughly when AMD would be shipping these parts.
Pretty much, everyone that matters in deep learning makes CPUs.
p1esk - Monday, December 12, 2016
*Pretty much, everyone that matters in deep learning makes CPUs.*
Sorry, what?
BenSkywalker - Monday, December 12, 2016
Intel, nVidia, IBM and Qualcomm currently represent all of the major players. I know there are a bunch of FPGAs and DSPs on the drawing board, but out of actual shipping solutions, the players all make their own CPUs.
Obviously I'm talking about the hardware manufacturer side.
Yojimbo - Monday, December 12, 2016
It'll be interesting to see what Graphcore's offering is like.
Yojimbo - Monday, December 12, 2016
Well, AMD's biggest problem is the software stack. But that issue aside, only the MI25 looks promising to me. I'm not sure why we should be too confident in AMD's ability to get the ball rolling with machine learning when they've had HPC offerings all along and barely had success. Guess we gotta wait and see.
Dribble - Tuesday, December 13, 2016
The way AMD does best right now is bidding for custom hardware for specific customers, combined with their willingness to accept lower margins than the opposition, so they win the deal. They can then do something general purpose based off that and sell some more, but the core funding is done by the big customer. See the console deals, or the Apple GPU deal, for examples.
Because that customer knows exactly what they want and AMD are so cheap, the customer does most of the software and AMD just provides hardware. That, I suspect, will be the real aim here: provide Google/Amazon/someone big with some cheap custom hardware.
webdoctors - Tuesday, December 13, 2016
Considering how the AMD ARM server initiative crashed and burned, I think this is going to be a pretty rough uphill battle. It seems the company has all the internal knowledge to create an end-to-end solution with their own CPU/GPU/motherboard/interconnect for HPC, but somehow they fall drastically short against Intel and Nvidia whenever it's time to execute.
The prices are going to have to be very competitive to get a foothold in this market, but this is also a market that's not as price conscious as the consumer segment when you consider that bad software or tools can lead to man-months wasted (which is easily tens of thousands of dollars at Silicon Valley engineering salaries).
Looks like 2017 should be interesting.
TheinsanegamerN - Tuesday, December 13, 2016
It would help if they could deliver on time and on budget. They always seem to get products out months after they are supposed to.
IntoGraphics - Wednesday, December 14, 2016
That's typical of AMD.
Here I have a Gigabyte RX 480 8GB. And they haven't even got drivers for other Linux distros than Ubuntu and RHEL. (I'm on Arch Linux.) The drivers they have for Ubuntu and RHEL are buggy, and there is no Vulkan support and spotty OpenCL support.
The Open Source drivers I'm using give me all kinds of artifacts and glitches in Blender as soon as its window is displayed. The Blender UI is constantly corrupted. I uninstalled.
And now they are already busy with other cards.
I'm also waiting for Zen to be released. To compare with what Intel has on offer. But it's highly unlikely that I'm going to stick into the meat grinder again.
It's probably never ever AMD again.
The current Linux driver situation is just unforgivable.
The green camp introduced GTX 1060 1 month after RX 480 and they have Linux drivers for all Linux distros. Same old. Same old.
IntoGraphics - Wednesday, December 14, 2016
"I'm going to stick into the meat grinder again." should be "I'm not going to stick my d.ck in the meat grinder again.".
appsforsys - Saturday, December 17, 2016
Thank you for sharing your wonderful experience, I found it really very helpful and interesting.
These tips are to be kept in mind for sure while writing.
Good one!
IntoGraphics - Tuesday, January 3, 2017
I wish these incapable cu|\|ts would release stable, bug free and fast Linux drivers for Radeon RX for all Linux distros first. Motherfuckers, get your priorities right. Don't take money first and then ignore.
IntoGraphics - Wednesday, January 4, 2017
Radeon Itstinks.