23 Comments
Targon - Monday, August 12, 2019 - link
Outside of professional applications, GPUs are currently limited by GPU performance itself, and aren't being held back all that much by the existing HBM2 or GDDR6 standards. Sure, faster memory will bring some benefit, but given the difference in cost, that benefit may not be worth it. That's the big question, and I'd hope AMD and/or NVIDIA would show the performance difference of a given GPU with GDDR6 vs. HBM2 or HBM2E. If you get 5fps more with HBM2E but it raises the cost of the video card by $200, people won't go for it.
extide - Monday, August 12, 2019 - link
Well, actually a lot of recent AMD cards could really use more bandwidth. The Radeon VII has plenty, but it had to resort to 4 stacks of HBM2 to get it -- if you could get 920GB/sec from only 2 stacks, that would be really handy.
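For reference, that ~920GB/sec is just two stacks at the 3.6Gbps pin rate from the article. A back-of-the-envelope sketch (assuming the standard 1024-bit interface per HBM stack):

```python
# Back-of-the-envelope HBM bandwidth math (pin rate from the article;
# 1024 bits per stack is the standard HBM interface width).

def stack_bw_gbs(pin_rate_gbps, bus_bits=1024):
    """Peak bandwidth of one HBM stack, in GB/s."""
    return pin_rate_gbps * bus_bits / 8

hbm2e = stack_bw_gbs(3.6)                    # 460.8 GB/s per stack
print(f"2 HBM2E stacks: {2 * hbm2e:.1f} GB/s")                               # 921.6
print(f"Radeon VII, 4 stacks @ 2.0Gbps: {4 * stack_bw_gbs(2.0):.0f} GB/s")   # 1024
```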
guidryp - Monday, August 12, 2019 - link
It's not like the Radeon VII actually needs that much BW. It gets outperformed by the RTX 2080, which only has about half that (448GB/sec).
I doubt we will see HBM anywhere except the VERY high end of GPUs going forward, since it keeps prices high.
mdriftmeyer - Monday, August 12, 2019 - link
He's talking about compute number crunching, not gaming.
extide - Monday, August 12, 2019 - link
Radeon VII performance still scales with memory bandwidth -- so it's hard to say. Nvidia's arch is totally different, so you can't really compare it directly to them. Navi also seems to hit a memory bandwidth wall around 2100MHz as well.
rhysiam - Tuesday, August 13, 2019 - link
I'm with @extide here - AMD cards seriously struggle with memory bandwidth. Look at OCing reviews of the 5700 XT: it hardly scales at all with core frequency OCing, but scales healthily with memory clock (what little you can extract from it, anyway). I suspect a 5700 XT with 920GB/s from two stacks of this new HBM would be quite a bit faster.
JasonMZW20 - Tuesday, August 13, 2019 - link
Compute workloads are uncompressed, so they require a ton of memory bandwidth. This is why Vega 20 steamrolls Vega 10 in compute.
Graphics workloads, however, benefit from memory compression, so there's a diminishing return from memory OC in GPU-limited scenarios (higher resolutions).
What people are mistaking for a memory bandwidth limitation, though, is that during overclocking they tend to raise the GPU's power limit as well. Of course, letting the GPU throttle less on power will improve performance.
What needs to be done is a run at stock clocks and stock power limit, comparing 14Gbps vs. 15Gbps GDDR6 (Navi) or 1100-1250MHz HBM2 on Vega 20. If there are no real gains, there isn't a memory bandwidth limitation. If you see gains outside the margin of error but below 5%, I'd call it marginal. When you start seeing upwards of 10% more performance, as you do in APUs using faster DDR4 system memory, then it's safer to declare that memory bandwidth is a limiting factor.
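Those thresholds are easy to state precisely. A minimal sketch of the test being proposed (hypothetical helper; the fps values would come from the stock-clock, stock-power-limit runs described above):

```python
# Sketch of the bandwidth-limitation test described above. The fps
# inputs are assumed to come from two runs differing only in memory
# clock (e.g. 14Gbps vs 15Gbps GDDR6), at stock core clocks/power.

def classify_bw_limit(fps_base, fps_fast_mem, margin_of_error=0.02):
    gain = fps_fast_mem / fps_base - 1.0
    if gain <= margin_of_error:
        return "no memory bandwidth limitation"
    if gain < 0.05:
        return "marginal"
    if gain >= 0.10:
        return "memory bandwidth is a limiting factor"
    return "somewhat bandwidth sensitive"

print(classify_bw_limit(60.0, 61.8))  # 3% gain -> "marginal"
```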
JasonMZW20 - Tuesday, August 13, 2019 - link
1000-1250MHz on Vega 20*
Skeptical123 - Tuesday, August 13, 2019 - link
lol, what are you talking about, "It's not like the Radeon VII actually needs that much BW"? The only reason AMD went with HBM in the first place a few years ago was as a costly "last ditch" effort to get close to Nvidia cards' performance. Their GPUs couldn't cut it, so AMD had to give them an "edge", which was the HBM -- which, at the time, was widely reported to have killed their margins on the card. HBM is still not cheap, and while AMD has had a good run recently, there's still a reason why AMD is now using GDDR6.
DanNeely - Monday, August 12, 2019 - link
Except at the race-to-the-bottom end, it's been a really long time since we've had GPUs bottlenecked by memory IO in normal gaming use. Better texture compression, and more of total rendering time being spent on things that don't need more texture fetches, are helping a lot here.
Some compute uses can be bottlenecked, and more can benefit from the higher maximum capacity HBM allows; which is why NVidia eschews it in consumer cards while offering it in some workstation/compute cards. AMD has made more use of it consumer-side, but even they switched back to GDDR6 for the RX 5700 at 50% of the Radeon VII's bandwidth without problems.
Pushing higher per-stack speeds and capacities, letting cards get away with fewer stacks, might push costs down enough to make HBM more mainstream if GDDR ever does run out of room to grow; especially if the smaller stack count allows replacing big, expensive interposers with tiny bits of silicon, as with Intel's EMIB. That said, in the medium term I'm not really expecting to see it much, if at all, at the consumer end - and only to the extent that consumer halo products are rebadged workstation/compute designs.
Yojimbo - Monday, August 12, 2019 - link
Actually, memory bandwidth has been an issue for consumer GPUs for a while. Companies recognized it years ago, and NVIDIA especially invested significant resources to help mitigate the problem. If the memory bandwidth had been there on the roadmap to begin with, they would have used it and put those resources somewhere else. AMD has been bandwidth-limited in the meantime because they didn't have the same resources to spend.
zepi - Monday, August 12, 2019 - link
There might be benefits to be had in laptops for mid-range GPUs. Instead of 128-bit GDDR5/6 they could use one stack of HBM2(E) to save space and power, which might be interesting for vendors like Apple & Microsoft, who can charge very high prices for their top-end mobile devices.
Obviously it is not interesting for your $999 gaming laptop manufacturer, who would much rather grow the motherboard by a couple of cm² to fit a few more memory chips; but for Surface Book and MacBook Pro models going for $2000+, the increased energy efficiency and space savings (more battery) could be worth it.
quadrivial - Friday, August 16, 2019 - link
I actually think HBM solves a much more interesting problem than that. APU performance suffers a lot because CPUs need better latencies than GDDR6 provides, while GPUs need higher bandwidth than DDR4 can provide. HBM has more bandwidth, lower latencies, AND lower power.
A current stack of HBM supposedly costs around $80 per 4GB, or around $20/GB. If new stacks provide 2x the density, that should cut the price per GB by 50% or more. That would make 16GB around $160, or around $185 with a $25 substrate. AMD can provide their current desktop APUs for $100-150. Doubling the GPU from 10 CUs to 20 CUs would increase die size by roughly 30-50%.
For $400-450, I'd buy that chip.
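The arithmetic checks out; a quick sanity pass over those numbers (every price here is the comment's assumption, not a quoted figure):

```python
# Sanity check of the HBM APU cost estimate above; all prices are
# the comment's assumptions, not quotes.

per_gb_today = 80 / 4            # $80 per 4GB stack -> $20/GB
per_gb_2x    = per_gb_today / 2  # doubled density -> ~half the $/GB

hbm_16gb = 16 * per_gb_2x        # $160
total    = hbm_16gb + 25         # + $25 substrate -> $185
print(f"16GB HBM ~${hbm_16gb:.0f}; with substrate ~${total:.0f}")
```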
rocky12345 - Monday, August 12, 2019 - link
GPUs, at least on AMD's side, are a bit bandwidth-starved. On Vega 10 it's very clearly seen once you OC the HBM2 past stock and get a decent performance increase in most games. Navi 10 is somewhat starved as well, but not to the same extent as Vega 10. There have been videos on YT recently where they've looked into this on Navi and found that past a certain point overclocking the core has little effect on speed, while even the little bit of extra OC they got from the GDDR6 netted a decent amount of extra performance. It's too bad the GDDR6 on Navi has so little room left for any kind of extra OC. My own thinking is that it might have something to do with the GPU's memory controller or the board design; if it's just the way the boards are designed, maybe the custom third-party cards coming out will fix this problem.
On my own Vega 56, BIOS-flashed to 64, just going from 800MHz to 945MHz HBM2 nets a huge performance gain, and if you are lucky and the memory goes even higher, like mine does, 1150MHz sees another major step up in performance. It would be the same, and probably to some extent an even bigger gain, for the 64 at 1100-1150MHz HBM2.
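For scale, those HBM2 clocks translate to effective bandwidth as follows (a sketch assuming Vega 10's 2048-bit bus; HBM transfers twice per clock):

```python
# Bandwidth at the Vega 56/64 HBM2 clocks mentioned above
# (assumes Vega 10's 2048-bit bus; HBM is double data rate).

BUS_BITS = 2048

def vega_bw_gbs(mem_clock_mhz):
    data_rate_gbps = mem_clock_mhz * 2 / 1000  # 2 transfers per clock
    return data_rate_gbps * BUS_BITS / 8

for clk in (800, 945, 1150):
    print(f"{clk}MHz -> {vega_bw_gbs(clk):.0f} GB/s")
# 800MHz -> 410, 945MHz -> 484, 1150MHz -> 589
```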
ksec - Monday, August 12, 2019 - link
The reason why you are not seeing a memory limitation is that GPU performance scales pretty much linearly with transistor count, and you only get a transistor increase with a die shrink.
So for every die shrink, assuming the same die size and workload, you will need an increase in memory bandwidth. Not to mention we need faster memory so that less die space is used for I/O.
We have been stuck with 16/14nm for a long time. We now have a clear roadmap for 7nm, 5nm, and 3nm over the next 5 years, which means we need something like 3-5x more bandwidth. And as I mentioned, I/O doesn't shrink as well as logic, so at some point the cost of using HBM will be less than the cost of wasting die space on a GDDR controller (assuming HBM also drops in price).
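A rough feel for where 3-5x comes from (a sketch assuming bandwidth demand tracks transistor count and a ~1.5-1.7x density gain per node step; illustrative only, real nodes vary):

```python
# Rough projection behind the "3-5x more bandwidth" figure above.
# Assumes bandwidth demand scales with transistor count and each
# node step gives ~1.5-1.7x density (an assumption, not a spec).

baseline_bw = 448  # GB/s; a current 256-bit GDDR6 card (RTX 2080 / RX 5700 XT)

for per_node in (1.5, 1.7):
    mult = per_node ** 3  # 16/14nm -> 7nm -> 5nm -> 3nm
    print(f"{per_node}x/node over 3 shrinks: ~{mult:.1f}x "
          f"-> ~{baseline_bw * mult:.0f} GB/s")
# 1.5x/node -> ~3.4x (~1512 GB/s); 1.7x/node -> ~4.9x (~2201 GB/s)
```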
azfacea - Monday, August 12, 2019 - link
What you are saying is precisely 100% false. It's quite the other way around: Nvidia and Radeon cards are both memory-bottlenecked at the moment and benefit immensely from memory overclocks. Radeon was even more bandwidth-starved than Nvidia. If you had any clue what the Radeon VII is relative to the Vega 64, in terms of CU count, clocks, and memory bandwidth, you wouldn't have made this stupid comment.
With higher resolutions and refresh rates, even more bandwidth will be required.
extide - Monday, August 12, 2019 - link
Interestingly enough, the first GDDR5 GPU (the Radeon HD 4870) also used 3.6Gbps GDDR5.
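Same pin rate, eleven years apart; the interface width is what changed (a quick sketch; 256-bit is the HD 4870's standard bus width):

```python
# Same 3.6Gbps per pin, very different totals -- the width does the work.

def bw_gbs(pin_rate_gbps, bus_bits):
    return pin_rate_gbps * bus_bits / 8

print(f"HD 4870, 256-bit GDDR5 @ 3.6Gbps:  {bw_gbs(3.6, 256):.1f} GB/s")   # 115.2
print(f"HBM2E stack, 1024-bit @ 3.6Gbps:   {bw_gbs(3.6, 1024):.1f} GB/s")  # 460.8
```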
Dragonstongue - Monday, August 12, 2019 - link
It would be GREAT if, when they say HBM etc. will run at X speed, what we actually see on the GPU is at least that speed. AMD was a bit "worse" here, at least for Vega: due to power limits, the HBM was not running "spec speed", as pushing past that would violate its PCI-SIG specifications (i.e. go past this and you lose the license/kill computers).
Anyway, cool that they're making faster, lower-power HBM etc., but it doesn't matter if no one runs it at that speed/power, does it? ^.^
That being said, I'm pretty sure the "next" Vega, or whatever ends up using this new HBM, is going to be crazy quick (no reason it shouldn't be at least a chunk % "better").
systemBuilder33 - Monday, August 12, 2019 - link
HBM2 practically bankrupted AMD. No thanks!
darkswordsman17 - Monday, August 12, 2019 - link
Where are you getting that from? They sold a ridiculous number of Vega cards with HBM2. They lucked out a bit in that it was good for mining, but it's also been good for their pro uses (which is the market Vega was really targeted at, and Vega was developed with HBM in mind).
stadisticado - Monday, August 12, 2019 - link
People need to remember that HBM2 is a datacenter product. These things are hundreds of dollars a unit for 8-Hi stacks. People hoping for an affordable GPU with even two of these stacks integrated are in dreamland.
darkswordsman17 - Monday, August 12, 2019 - link
It's expensive, but it's not that expensive. Plus, if you used it as the system memory, it'd cut out the DRAM cost while bringing a big performance uptick. Pair 16GB of HBM2 with, say, 128GB of NAND, where the NAND could come close to DDR4 speeds with the HBM2 as its cache, and you'd get both a much larger memory space and much faster memory (speaking of the total system: think of the HBM2 as a huge CPU/GPU cache, with the NAND being the equivalent of DRAM, but larger and non-volatile to boot).
My point being, if you built a system around HBM, the cost would be less of an issue, and it would bring bigger benefits than using it for, say, just the GPU. It'd also enable more compact systems. But it would require developing new platform(s), although I think AMD should be looking at that given how slow OEMs have been to support their products. AMD could make reference designs and then sell complete boards to OEMs (who would put them in their own chassis and handle the support side of things).
For AMD, they could maybe take the I/O die and put it on an interposer with the HBM, then connect out to the CPU and GPU chips from there. I don't know if they could fab the HBM themselves too (maybe even integrate it right into the I/O die, so it might not need an interposer at all)?
darkswordsman17 - Monday, August 12, 2019 - link
This is a nice development.
I've posted this in the forums, but I wish AMD would develop a new platform that would be like a high-end PC version of the consoles. It'd have CPU chiplets, GPU chiplets, and replace system memory with HBM (which would function like a huge L3 cache). The HBM could also work as the buffer for NAND (they could put some amount onboard, which would mitigate any potential memory limitations - i.e. 16GB of HBM system memory for cost reasons, where the onboard NAND could have DRAM-like speeds but larger capacity; then they'd have PCIe 4.0-capable slots for SSD expansion for storage). It'd also let them easily do unified memory. It'd be good for gaming (letting them translate a lot of console development work to the PC), but I think it would really fly for workstations.
It'd also let them be less constrained by the socket (so, for instance, they could release APU systems with two CPU chiplets and a large GPU chip, without being limited, as they are by AM4, in both power and package size). I could see them doing something like 1-6 CPU chiplets, where they'd have small NUC-like boxes for single-CPU-chiplet APU setups, then a mainstream one that could be integrated into laptops, all-in-ones, and SFF systems (plus sold as things like Steamboxes and Windows versions of the same), then 3-6 chiplet ones for workstations (with options for multiple GPU chiplets). The number of HBM stacks would be based on the number of CPU chiplets (matching that number, where they could also go for different stack heights and different speeds).
It would also let them bypass OEMs (who have been slow to support AMD platforms, and who, when they do, often do silly things like use single-channel memory or cheap out in other ways). Heck, we still don't have Threadripper workstations (unless you build your own).