With much anticipation and more than a few leaks, NVIDIA this morning is announcing the next generation of video cards, the GeForce RTX 30 series. Based upon the gaming and graphics variant of NVIDIA’s Ampere architecture and built on an optimized version of Samsung’s 8nm process, NVIDIA is touting the new cards as delivering some of their greatest gains ever in gaming performance. All the while, the latest generation of GeForce will also be coming with some new features to further set the cards apart from and ahead of NVIDIA’s Turing-based RTX 20 series.

Out of the gate, NVIDIA is announcing the first three cards to make up the new RTX 30 series: the RTX 3090, RTX 3080, and RTX 3070. These cards are all launching within the next month and a half – albeit at slightly different times – with the RTX 3090 and RTX 3080 leading the charge. The two cards, in turn, will serve as the successors to NVIDIA’s GeForce RTX 2080 Ti and RTX 2080/2080S respectively, hitting new highs in graphics performance – and, in the case of the RTX 3090, new highs in price as well.

The first card out the door will be the GeForce RTX 3080. With NVIDIA touting upwards of 2x the performance of the RTX 2080, this card will go on sale on September 17th for $700. That will be followed up a week later by the even more powerful GeForce RTX 3090, which hits the shelves September 24th for $1500. Finally, the RTX 3070, which is being positioned as more of a traditional sweet spot card, will arrive next month at $499.

NVIDIA GeForce Specification Comparison

                             RTX 3090          RTX 3080          RTX 3070          RTX 2080 Ti
CUDA Cores                   10496             8704              5888              4352
Boost Clock                  1.7GHz            1.71GHz           1.73GHz           1545MHz
Memory Clock                 19.5Gbps GDDR6X   19Gbps GDDR6X     16Gbps GDDR6      14Gbps GDDR6
Memory Bus Width             384-bit           320-bit           256-bit           352-bit
VRAM                         24GB              10GB              8GB               11GB
Single Precision Perf.       35.7 TFLOPs       29.8 TFLOPs       20.4 TFLOPs       13.4 TFLOPs
Tensor Perf. (FP16)          143 TFLOPs        119 TFLOPs        82 TFLOPs         114 TFLOPs
Tensor Perf. (FP16-Sparse)   285 TFLOPs        238 TFLOPs        163 TFLOPs        114 TFLOPs
Ray Perf.                    69 TFLOPs         58 TFLOPs         40 TFLOPs         ?
TDP                          350W              320W              220W              250W
GPU                          GA102             GA102             GA104?            TU102
Transistor Count             28B               28B               ?                 18.6B
Architecture                 Ampere            Ampere            Ampere            Turing
Manufacturing Process        Samsung 8nm       Samsung 8nm       Samsung 8nm       TSMC 12nm "FFN"
Launch Date                  09/24/2020        09/17/2020        10/2020           09/20/2018
Launch Price                 $1499 (MSRP)      $699 (MSRP)       $499 (MSRP)       $999 (MSRP) /
                                                                                   $1199 (Founders)
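As a quick sanity check on the table, the quoted single precision figures line up with the usual peak-throughput formula: CUDA cores × 2 FMA ops per clock × boost clock. A minimal sketch, using the rounded clocks from the table (so the results are approximate):

```python
# Rough reconstruction of the quoted FP32 throughput figures.
# Peak FP32 TFLOPs ~= CUDA cores x 2 ops/clock (FMA) x boost clock.
cards = {
    "RTX 3090":    (10496, 1.70e9),
    "RTX 3080":    (8704,  1.71e9),
    "RTX 3070":    (5888,  1.73e9),
    "RTX 2080 Ti": (4352,  1.545e9),
}

for name, (cuda_cores, boost_hz) in cards.items():
    tflops = cuda_cores * 2 * boost_hz / 1e12
    print(f"{name}: ~{tflops:.1f} TFLOPs FP32")
```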

Ampere for Gaming: GA102

As is traditionally the case for NVIDIA, this morning’s public presentation was not an architectural deep dive. Though the purely virtual presentation was certainly a change of pace for a company that treats every video card launch like a party, NVIDIA stuck to their successful launch playbook: a lot of demonstrations, testimonials, and promotional videos, along with some high-level overviews of several of the technologies and engineering design decisions that went into making their latest generation of GPUs. The net result is that we have a decent idea of what’s in store for the RTX 30 series, but we’ll have to wait for NVIDIA to offer some deep-dive technical briefings to fill in the blanks and get to the heart of matters in true AnandTech style.

At a high level, Ampere and the GA102 GPU being used in these top-tier cards bring several major hardware advancements to NVIDIA’s lineup. The biggest of these is the move to smaller transistors, thanks to a customized version of Samsung's 8nm process. We only have limited information about this process – mostly because it hasn't been used in too many places – but at a high level it's Samsung's densest traditional, non-EUV process, derived from their earlier 10nm process. All told, NVIDIA has ended up as a bit of a latecomer in moving to smaller processes, but as the company has re-developed an affinity for shipping large GPUs first, they need higher wafer yields (fewer defects) to get those chips out the door.

In any case, for NVIDIA’s products Samsung's 8nm process is a full generational jump from their previous process, TSMC’s 12nm “FFN”, which itself was an optimized version of TSMC's 16nm process. So NVIDIA’s transistor densities have gone up significantly, leading to a 28B transistor chip in the case of GA102, which is reflected in the sheer number of CUDA cores and other hardware available. Whereas mid-generation architectures like Turing and Maxwell saw most of their gains at an architectural level, Ampere (like Pascal before it) benefits greatly from a proper jump in lithographic processes. The only hitch in all of this is that Dennard Scaling has died and isn’t coming back, so while NVIDIA can pack more transistors than ever into a chip, power consumption is creeping back up, which is reflected in the cards' TDPs.

NVIDIA hasn’t given us specific die sizes for GA102, but based on some photos we’re reasonably confident it’s over 500mm². That’s notably smaller than the ridiculously-sized 754mm² TU102, but it’s still a sizable chip, and among the largest produced at Samsung.

Moving on, let’s talk about the Ampere architecture itself. First introduced this spring as part of NVIDIA’s A100 accelerator, until now we’ve only seen Ampere from a compute-oriented perspective. GA100 lacked several graphics features so that NVIDIA could maximize the amount of die space allocated to compute, so while graphics-focused Ampere GPUs like GA102 are still members of the Ampere family, there are a significant number of differences between the two. Which is to say that NVIDIA was able to keep a lot under wraps about the gaming side of Ampere until now.

From a compute perspective, Ampere looked a fair bit like Volta before it, and much the same holds true on the graphics side relative to Turing. GA102 doesn’t introduce any exotic new functional blocks in the way Turing added RT cores and tensor cores, but the capabilities and relative sizes of those blocks have been tweaked. The most notable change here is that, like GA100, the gaming Ampere parts inherit updated and more powerful tensor cores, which NVIDIA calls their third-generation tensor cores. A single Ampere tensor core can provide double the tensor throughput of a Turing tensor core, with NVIDIA essentially consolidating what was 8 tensor cores per SM into 4. So per SM, tensor core performance is stable, and while this has some ramifications for how things work under the hood, for gaming Ampere parts you're looking at roughly just as many tensor ALUs per SM. Note that this is different from how Big Ampere (GA100) is configured; that part has 8 of the third-generation tensor cores per SM, doubling its per-SM throughput over its predecessor.
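To make the per-SM bookkeeping concrete, here is a minimal sketch of the consolidation described above, using an arbitrary unit of per-core dense FP16 tensor throughput. The ratios follow the description above; the absolute numbers are placeholders:

```python
# Per-SM tensor throughput, in arbitrary units of per-core dense FP16 throughput.
turing_per_sm        = 8 * 1   # 8 second-gen tensor cores per SM
gaming_ampere_per_sm = 4 * 2   # 4 third-gen cores, each ~2x as fast -> same per-SM rate
ga100_per_sm         = 8 * 2   # Big Ampere keeps 8 third-gen cores -> ~2x per-SM rate

print(turing_per_sm, gaming_ampere_per_sm, ga100_per_sm)  # 8 8 16

# Sparsity (not supported on Turing) can roughly double the Ampere figures for
# suitably pruned workloads, which is what NVIDIA's headline tensor numbers quote.
```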

Meanwhile NVIDIA has confirmed the tensor cores going into GA102 and other Ampere graphics GPUs also support sparsity for more performance, and in fact it's these figures that NVIDIA is quoting in today's presentation. So NVIDIA has not held back here in terms of tensor core features. But to a certain degree this does mean that the presentation was misleading – or at least not-apples-to-apples – as Turing didn't support sparsity. If you run "dense" arrays, Ampere is only a mild improvement over Turing.

Overall, this focus on tensor core performance underscores NVIDIA’s commitment to deep learning and AI performance, as the company sees deep learning as not just a driver of their datacenter business, but their gaming business as well. We only have to go as far as NVIDIA’s Deep Learning Super Sampling (DLSS) tech to see why; DLSS relies in part on the tensor cores to deliver as much performance as possible, and NVIDIA is still looking at more ways to put their tensor cores to good use.

The ray tracing (RT) cores have also been beefed up, though to what degree we’re not certain. Besides having more of them overall by virtue of GA102 having a larger number of SMs, the individual RT cores are said to be up to 2x faster, with NVIDIA presumably specifically quoting ray/triangle intersection performance. There are also some brief notes about RT core concurrency in NVIDIA's presentation slides, but the company didn't go into any real detail on the subject in the brief presentation, so we're waiting on technical briefings for more details.

Overall, faster RT cores are very good news for the gaming industry’s ray tracing ambitions, as ray tracing carried a heavy performance cost on the RTX 20 series cards. Now with that said, nothing NVIDIA does is going to completely eliminate that penalty – ray tracing is a lot of work, period – but more and rebalanced hardware can help bring that cost down.

Last but certainly not least, we have the matter of the shader cores. This is the area that's the most immediately important to gaming performance, and also the area where NVIDIA has said the least today. We know that the new RTX 30 series cards pack an incredible number of FP32 CUDA cores, and that this comes thanks to what NVIDIA is labeling as "2x FP32" in their SM configuration. As a result, even the second-tier RTX 3080 offers 29.8 TFLOPs of FP32 shader performance, more than double that of the last-gen RTX 2080 Ti. To put it succinctly, there is an incredible number of ALUs within these GPUs, and frankly a lot more than I would have expected given the transistor count.

Shading performance is not everything, of course, which is why NVIDIA's own performance claims for these cards aren't nearly as high as the gains in shading performance alone. But shaders certainly are a bottleneck much of the time, given the embarrassingly parallel nature of computer graphics, which is why throwing more hardware (in this case, more CUDA cores) at the problem is such an effective strategy.

The big question at this point is how these additional CUDA cores are organized, and what it means for the execution model within an SM. We're admittedly getting into more minute technical details here, but how easily Ampere can fill those additional cores is going to be a critical factor in how well it can extract all those teraFLOPs of performance. Is this driven by extracting additional instruction-level parallelism within a warp of threads? Running more warps concurrently? Etc.

On a final note, while we're waiting for more technical information on the new cards, it's noteworthy that none of NVIDIA's spec sheets or other materials mention any additional graphics features in the cards. To NVIDIA's credit, Turing was already well ahead of the curve, offering the features that would become the new DirectX 12 Ultimate/feature level 12_2 set over two years before any other vendor. So with Microsoft and the rest of the field just now catching up, there's no immediate higher feature set for NVIDIA to aspire to. Still, it's unusual to not see NVIDIA pull a new graphics feature or two out of its proverbial hat just to wow the crowds.

The Down-Low On I/O: PCI Express 4.0, SLI, and RTX IO

The introduction of Ampere within NVIDIA’s GeForce cards also brings Ampere’s improved I/O capabilities to the consumer market. And while nothing here is likely to be groundbreaking on its own – especially relative to the sheer amount of hardware NVIDIA is throwing at performance – everything here further helps to keep NVIDIA’s latest generation card well-fed.

Arguably the marquee feature on the I/O front is the inclusion of PCI-Express 4.0 support. This was introduced on NVIDIA’s A100 accelerators, so its inclusion here has been all but expected, but none the less it marks the first increase in NVIDIA’s PCIe bandwidth since the launch of the GTX 680 over 8 years ago. With a full PCIe 4.0 x16 slot, the RTX 30 series cards get just shy of 32GB/second of I/O bandwidth in each direction, double what the RTX 20 series cards had access to.
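For those curious where the “just shy of 32GB/second” figure comes from, it falls straight out of the PCIe 4.0 link parameters. A quick sketch:

```python
# PCIe 4.0 x16 usable bandwidth, per direction:
# 16 GT/s per lane, 128b/130b line coding, 16 lanes, 8 bits per byte.
transfer_rate = 16e9        # transfers per second, per lane
encoding      = 128 / 130   # 128b/130b encoding efficiency
lanes         = 16

gbytes_per_sec = transfer_rate * encoding * lanes / 8 / 1e9
print(f"~{gbytes_per_sec:.1f} GB/s per direction")  # ~31.5 GB/s, double PCIe 3.0 x16
```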

As for the performance impact from PCIe 4.0, we’re not expecting much of a difference at this time, as there’s been very little evidence that Turing cards have been limited by PCIe 3.0 speeds – even PCIe 3.0 x8 has proven to be sufficient in most cases. Ampere’s higher performance will undoubtedly drive up the need for more bandwidth, but not by much. Which is likely why even NVIDIA isn’t promoting PCIe 4.0 support terribly hard (though being second to AMD here could very well be a factor).

Meanwhile, it looks like SLI support will be sticking with us for at least one more generation. NVIDIA’s RTX 3090 card includes a single NVLink connector for SLI and other multi-GPU purposes. So multi-GPU rendering remains alive, if just barely. NVIDIA’s presentation today didn’t go into any further details on the feature, but it’s noteworthy that the Ampere architecture introduces NVLink 3, which, if NVIDIA is using it for the RTX 3090, means that the card will likely have twice the NVLink bandwidth of the RTX 2080 Ti, for 100GB/second in each direction.

Overall, I suspect the inclusion of an NVLink connector on the RTX 3090 is more a play for compute users, many of whom will be drooling over a fast consumer-grade card with 24GB of VRAM, given how important VRAM capacity is to more advanced deep learning models. Still, NVIDIA is never one to pass up an opportunity to upsell on the graphics front as well.

Finally, with the launch of the RTX 30 series, NVIDIA is also announcing a new suite of I/O features that they’re calling RTX IO. At a high level this appears to be NVIDIA’s implementation of Microsoft’s forthcoming DirectStorage API, which like on the Xbox Series X console where it’s first launching, allows for direct, asynchronous asset streaming from storage to the GPU. By bypassing the CPU for much of this work, DirectStorage (and by extension RTX IO) can improve both I/O latency and throughput to the GPU by letting the GPU more directly fetch the resources it needs.

The most significant innovation here, besides Microsoft providing a standardized API for the technology, is that Ampere GPUs are capable of directly decompressing assets. Game assets are frequently compressed for storage purposes – lest Flight Simulator 2020 take up even more SSD space – and currently decompressing those assets into something the GPU can use is the job of the CPU. Offloading this work not only frees the CPU up for other tasks, it ultimately removes a middleman entirely, which helps to improve asset streaming performance and game load times.
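To illustrate why cutting out the CPU middleman matters, here is a toy model of asset streaming time. Every throughput figure below is a made-up placeholder purely for illustration – this is not the DirectStorage or RTX IO API, just arithmetic showing how the bottleneck moves:

```python
# Toy model: time to get one compressed asset from SSD into usable form in VRAM.
# All rates are illustrative assumptions, not vendor figures.
asset_compressed_gb = 4.0
nvme_read_gbps      = 7.0    # hypothetical PCIe 4.0 NVMe sequential read, GB/s
cpu_decomp_gbps     = 1.5    # hypothetical CPU decompression rate, GB/s
gpu_decomp_gbps     = 14.0   # hypothetical GPU decompression rate, GB/s

# Traditional path: SSD -> system RAM -> CPU decompress -> copy to VRAM
cpu_path = asset_compressed_gb / nvme_read_gbps + asset_compressed_gb / cpu_decomp_gbps

# DirectStorage/RTX IO-style path: SSD -> VRAM, decompressed on the GPU
gpu_path = asset_compressed_gb / nvme_read_gbps + asset_compressed_gb / gpu_decomp_gbps

print(f"CPU-decompress path: ~{cpu_path:.2f} s")   # decompression dominates
print(f"GPU-decompress path: ~{gpu_path:.2f} s")   # storage read dominates
```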

Pragmatically speaking, we already know this technology is coming to the Xbox Series X and PlayStation 5, so this is largely Microsoft and NVIDIA keeping parity with the next-generation consoles. None the less, it does require some real hardware improvements on the GPU end of things to handle all of these I/O requests and to be able to efficiently decompress various types of assets.

Ampere Power Efficiency Improvements: 1.9x? Probably Not

Next to overall video card performance, NVIDIA’s second big technology pillar in their presentation was power efficiency. With power efficiency being a cornerstone of GPU design – graphics workloads are embarrassingly parallel, and GPU performance is ultimately capped by total power consumption – it’s a frequent focus across all GPU launches, and for the RTX 30 series launch NVIDIA made sure to give it some attention.

On the whole, NVIDIA is claiming that Ampere offers a 1.9x increase in power efficiency. For a full jump in manufacturing process nodes in the post-Dennard era, this is actually a bit of a surprising claim. It’s far from impossible, mind you, but it’s more than what NVIDIA got out of Turing or Pascal before it.

However digging into NVIDIA’s claims a bit more, this 1.9x claim increasingly looks exaggerated – or at least cherry-picked.

The immediate oddity here is that power efficiency is normally measured at a fixed level of power consumption, not a fixed level of performance. With the power consumption of a transistor increasing at roughly the cube of its voltage, a “wider” part like Ampere, with more functional blocks, can clock itself at a much lower frequency and still hit the same overall performance as Turing. In essence, NVIDIA’s graph is comparing Turing at its worst to Ampere at its best, asking “what would it be like if we downclocked Ampere to be as slow as Turing?” rather than “how much faster is Ampere than Turing under the same constraints?” In other words, NVIDIA’s graph is not presenting an apples-to-apples performance comparison at a specific power draw.

If you instead make a fixed-wattage comparison, Ampere doesn’t look quite as good in NVIDIA’s graph. Whereas Turing hits 60fps at 240W in this example, Ampere’s performance curve has it at roughly 90fps at the same wattage. To be sure, that’s still a sizable improvement, but it’s only a 50% improvement in performance-per-watt. Ultimately the exact improvement in power efficiency is going to depend on where in the graph you sample, but it’s clear that NVIDIA’s power efficiency improvements with Ampere, as defined by more conventional metrics, are not going to be the 90% that NVIDIA’s slide claims.

All of which is reflected in the TDP ratings of the new RTX 30 series cards. The RTX 3090 draws a whopping 350 watts of power, and even the RTX 3080 pulls 320W. If we take NVIDIA’s performance claims at their word – that RTX 3080 offers up to 100% more performance than RTX 2080 – then that comes with a 49% hike in power consumption, for an effective increase in performance-per-watt of just 34%. And the comparison for the RTX 3090 is even harsher, with NVIDIA claiming a 50% performance increase for a 25% increase in power consumption, for a net power efficiency gain of just 20%.
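For reference, the arithmetic behind the figures in the last two paragraphs looks like this. The fps values are read off NVIDIA's slide, and the RTX 2080's 215W reference TDP is assumed as the baseline (which is consistent with the 49% figure above); the RTX 3090 comparison just uses the ratios as stated:

```python
# Fixed-power reading of NVIDIA's efficiency curve (values from the slide):
turing_fps_240w = 60
ampere_fps_240w = 90
print(f"Iso-power gain: {ampere_fps_240w / turing_fps_240w - 1:.0%}")  # 50%

# TDP-based comparison, taking NVIDIA's performance claims at face value:
perf_ratio_3080  = 2.00          # "up to 2x" the RTX 2080
power_ratio_3080 = 320 / 215     # 320W TDP vs. the RTX 2080's assumed 215W reference TDP
print(f"RTX 3080 perf-per-watt gain: ~{perf_ratio_3080 / power_ratio_3080 - 1:.0%}")  # ~34%

perf_ratio_3090  = 1.50          # claimed 50% performance increase
power_ratio_3090 = 1.25          # for a 25% increase in power consumption
print(f"RTX 3090 perf-per-watt gain: ~{perf_ratio_3090 / power_ratio_3090 - 1:.0%}")  # ~20%
```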

Ultimately, it’s clear that a good chunk of NVIDIA’s performance gains for the Ampere generation are going to come from higher power consumption limits. With 28B transistors the cards are going to be fast, but it’s going to take more power than ever before to light them all up.

GDDR6X: Cooking With PAM

Outside of the core GPU architecture itself, GA102 also introduces support for another new memory type: GDDR6X. A Micron- and NVIDIA-developed evolution of GDDR6, GDDR6X is designed to allow for higher memory bus speeds (and thus more memory bandwidth) by using multi-level signaling on the memory bus. By employing this strategy, NVIDIA and Micron can continue to push the envelope on cost-effective discrete memory technologies, and thus continue to feed the beast that is NVIDIA’s latest generation of GPUs. This marks the third memory technology in as many generations for NVIDIA, having gone from GDDR5X to GDDR6 to GDDR6X.

Micron accidentally spilled the beans on the subject last month, when they posted some early technical documents on the technology. By employing 4-level Pulse Amplitude Modulation (PAM4), GDDR6X is able to transmit one of four different symbols per clock, in essence moving two bits per clock instead of the usual one. For the sake of brevity I won’t completely rehash that discussion here, but I’ll go over the highlights.

At a very high level, what PAM4 does versus NRZ (binary coding) is to take a page from the MLC NAND playbook, and double the number of electrical states a single cell (or in this case, transmission) will hold. Rather than traditional 0/1 high/low signaling, PAM4 uses 4 signal levels, so that a signal can encode for four possible two-bit patterns: 00/01/10/11. This allows PAM4 to carry twice as much data as NRZ without having to double the transmission bandwidth, which would have presented an even greater challenge.
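As a concrete illustration, here is a minimal sketch of how the same bit stream maps onto NRZ symbols versus PAM4 symbols. The level spacing and bit-to-level mapping are purely illustrative; real PAM4 links typically Gray-code the bit pairs to limit bit errors:

```python
# NRZ: two voltage levels, one bit per symbol.
# PAM4: four voltage levels, two bits per symbol (illustrative mapping below).
pam4_map = {"00": 0.00, "01": 0.33, "10": 0.67, "11": 1.00}

bits = "1101100100"
nrz_symbols  = [int(b) for b in bits]                           # one level per bit
pam4_symbols = [pam4_map[bits[i:i+2]] for i in range(0, len(bits), 2)]

print(f"NRZ : {len(nrz_symbols)} symbols -> {nrz_symbols}")     # 10 symbols
print(f"PAM4: {len(pam4_symbols)} symbols -> {pam4_symbols}")   # 5 symbols, same data
```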


NRZ vs. PAM4 (Base Diagram Courtesy Intel)

PAM4 in turn requires more complex memory controllers and memory devices to handle the multiple signal states, but it also backs off on the memory bus frequency, simplifying some other aspects. Perhaps most important for NVIDIA at this point is that it’s more power efficient, taking around 15% less power per bit of bandwidth. To be sure, total DRAM power consumption is still up, because those savings are more than offset by the bandwidth gains, but every joule saved on DRAM is another joule that can be dedicated to the GPU instead.

According to Micron’s documents, the company designed the first generation of their GDDR6X to go up to 21Gbps; however NVIDIA is keeping things a bit more conservative, stopping at 19.5Gbps for the RTX 3090 and 19Gbps for the RTX 3080. Even at those speeds, that’s still a 36%-39% increase in memory bandwidth over the previous generation of cards, assuming identically-sized memory buses. This kind of progress is the exception rather than the norm; historically speaking, we don’t typically see memory bandwidth gains quite this large over successive generations. But with many more SMs to feed, I can only imagine that NVIDIA’s product teams are glad to have it.
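Working the table's numbers through the usual formula (per-pin data rate × bus width ÷ 8) shows where those gains land in absolute terms:

```python
# Total memory bandwidth = per-pin data rate x bus width / 8, using the table's figures.
configs = {
    "RTX 3090 (19.5Gbps GDDR6X, 384-bit)": (19.5e9, 384),
    "RTX 3080 (19Gbps GDDR6X, 320-bit)":   (19.0e9, 320),
    "RTX 2080 Ti (14Gbps GDDR6, 352-bit)": (14.0e9, 352),
}
for name, (rate, width) in configs.items():
    print(f"{name}: {rate * width / 8 / 1e9:.0f} GB/s")

# The 36%-39% figure in the text is the per-pin gain over 14Gbps GDDR6.
print(f"Per-pin gain: {19/14 - 1:.0%} to {19.5/14 - 1:.0%}")
```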

GDDR6X does come with one immediate drawback, however: capacity. While Micron has plans for 16Gbit chips in the future, for now they’re only making 8Gbit chips. This is the same density as the memory chips on NVIDIA’s RTX 20 series cards, and their GTX 10 series cards for that matter. So there are no “free” memory capacity upgrades, at least for these initial cards. The RTX 3080 only gets 10GB of VRAM versus 8GB on the RTX 2080, and that’s by virtue of using a larger 320-bit memory bus (which is to say, 10 chips instead of 8). Meanwhile the RTX 3090 gets 24GB of VRAM, but only by using 12 pairs of chips in clamshell mode on a 384-bit memory bus, making for more than twice as many memory chips as on the RTX 2080 Ti.
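The capacities follow directly from the 32-bit-per-chip interface and the 8Gbit (1GB) chip density; a quick sketch:

```python
# Each GDDR6/GDDR6X chip has a 32-bit interface; clamshell mode pairs two chips per 32 bits.
def vram_config(bus_width_bits, gb_per_chip=1, clamshell=False):
    chips = (bus_width_bits // 32) * (2 if clamshell else 1)
    return chips, chips * gb_per_chip  # (chip count, capacity in GB)

print(vram_config(320))                  # RTX 3080:    (10 chips, 10 GB)
print(vram_config(384, clamshell=True))  # RTX 3090:    (24 chips, 24 GB)
print(vram_config(352))                  # RTX 2080 Ti: (11 chips, 11 GB)
```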

HDMI 2.1 & AV1 Are In, VirtualLink Is Out

On the display I/O front, Ampere and the new GeForce RTX 30 series cards make a couple of notable changes as well. The most important of these is that, at long last, HDMI 2.1 support has arrived. Already shipping in TVs (and set to ship in this year’s consoles), HDMI 2.1 brings a few features to the table, most notably support for much greater cable bandwidth. An HDMI 2.1 cable can carry up to 48Gbps of data – more than 2.6x as much as HDMI 2.0 – allowing for much higher display resolutions and refresh rates, such as 8K TVs or 4K displays running at upwards of 165Hz. This significant jump in bandwidth even puts HDMI ahead of DisplayPort, at least for now; DisplayPort 1.4 only offers around 66% of the bandwidth, and while DisplayPort 2.0 will eventually beat that, it would seem that Ampere is just a bit too early for that technology.
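For a rough sense of scale, here is a comparison of the raw link rates involved (before encoding overhead, which is why the ratios differ slightly from the effective-bandwidth figures above):

```python
# Raw display link rates, in Gbps.
hdmi_2_0 = 18.0   # HDMI 2.0
hdmi_2_1 = 48.0   # HDMI 2.1
dp_1_4   = 32.4   # DisplayPort 1.4 (HBR3 x 4 lanes)

print(f"HDMI 2.1 vs HDMI 2.0: {hdmi_2_1 / hdmi_2_0:.1f}x")        # ~2.7x
print(f"DP 1.4 as a share of HDMI 2.1: {dp_1_4 / hdmi_2_1:.0%}")  # ~68%
```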

With all of that said, I’m still waiting on confirmation from NVIDIA about whether they support a full 48Gbps signaling rate with their new GeForce cards. Some HDMI 2.1 TVs have been shipping with support for lower data rates, so it’s not inconceivable that NVIDIA may do the same here.

HDMI 2.1’s other marquee feature from a gaming standpoint is support for variable refresh rates over HDMI. However this feature is not exclusive to HDMI 2.1, and indeed has already been backported to NVIDIA’s RTX 20 cards, so while support for it is going to be more useful here with the greater cable bandwidth, it technically is not a new feature to NVIDIA’s cards.

Meanwhile VirtualLink ports, which were introduced on the RTX 20 series of cards, are on their way out. The industry’s attempt to build a port combining video, data, and power in a single cable for VR headsets has fizzled, and none of the big three headset manufacturers (Oculus, HTC, Valve) used the port. So you will not find it returning on the RTX 30 series cards.

Finally, while we’re on the subject of video, NVIDIA has also confirmed that the new Ampere GPUs include an updated version of their NVDEC video decode block. Bringing the block up to what NVIDIA calls Gen 5, the chipmaker has added decode support for the new AV1 video codec.

The up-and-coming royalty free codec is widely expected to become the de facto successor to H.264/AVC, as while HEVC has been on the market for a number of years (and is already supported in all recent GPUs), the madcap royalty situation around the codec has discouraged its adoption. By contrast, AV1 should deliver similar or slightly better quality than HEVC without royalties for its use in distribution, which makes it a lot more palatable to content vendors. The one downside to AV1 thus far is that it’s pretty CPU heavy, which makes hardware decode support important, even in high-end desktops, in order to avoid tying up the CPU and to ensure smooth, glitch-free playback.

NVIDIA hasn’t gone into too much detail here on what their AV1 support entails, but a separate blog post mentions 10-bit color support and 8K decoding, so it sounds like NVIDIA has its bases covered.

Meanwhile, there is no mention of further improvements to the company’s NVENC block. That was most recently revised for the Turing launch, improving the scope of NVIDIA’s HEVC encoding capabilities and overall HEVC & H.264 image quality. Otherwise, we’re still a generation too early for hardware AV1 encoding, as some of the unique properties of that codec are making hardware encoding a tougher nut to crack.

Comments

  • Spunjji - Monday, September 7, 2020 - link

    Honestly, this comment's pretty far off-base.

    To start, RDNA is pretty much even with Turing on power/performance - and yes, that's with a node advantage, so yes, it means that architecturally Turing is markedly more efficient than RDNA. That's not in dispute, but you're claiming that there was no efficiency improvement from Vega on 7nm to RDNA on 7nm which is just a lie - the 5700XT nearly matches Radeon VII's performance at ~100W less power.

    Then you compare the 980Ti to the 5600XT, which is a funny one, because the 980Ti also performs like the 1660Ti - Nvidia's competition for the 5600XT. In effect that's more just a comment on slow progress in the GPU industry up until now, but you frame it as a ding on AMD alone. You also quote the wrong power for the 5600XT - it's 150W board power by the spec, ~160W measured, so a little higher than the 1660Ti for a little more performance.

    You then launch off that total mischaracterisation to claim that AMD won't hit their targets for RDNA 2 - even though they hit their targets for RDNA and for every single Ryzen release. You're entitled to not believe them, that's fine, but you're not entitled to your own versions of the facts.

    Despite all that, I really don't disagree with your conclusion. AMD need to start competing properly instead of just going for the value proposition, else they won't be able to make the required margins to compete at the high-end. If they hit their 50% increase in PPW target, though, then their high-end card ought to compete with the 2080 but at a smaller die area / power draw. Whether they actually *do* remains to be seen.
  • Kangal - Monday, September 7, 2020 - link

    It seems like you completely disagreed with me, walked away, and landed in the same position. So not sure how far the base goes :\

    I never claimed that AMD won't hit their targets. They will. They determine what their targets are and when they'll hit them. It's easy. AMD is being more objective than Intel by setting targets in advance, and in general I believe they do try to hit them with real-world figures. It's just sort of easy to fudge the figures a little here/there to save face during a low point. Hence, I've learned to never listen to the underdog or the market leader or any company... but to wait for tests to be done by unbiased enthusiasts (ahem, Anandtech). It's the scientific way after all (peer review).

    I got the power draws from the reviews done here; the figures may not be exact, but that doesn't detract from the point I made. There wasn't slow progress in the GPU industry until, well, Vega64 happened. Nvidia made a decent leap with Pascal, and AMD was doing okay in the low-end/midrange but delayed "Big Polaris" several times. But after the GTX 1080, Nvidia saw little reason to push further. I won't even blame the cryptocurrency boom. This one is on AMD for not being competitive enough, and also on consumers for not buying AMD when they should.

    So I think I've done AMD justice in my analysis, and not a mischaracterisation. They need to do something about their debt/finances, and try to pump more money into their R&D. The architectural differences are definitely there, and I want to see them claw back more ground. I know they're stretched too thin. You've got supercomputers, servers, desktops, laptops, budget chipsets, embedded chips, iGPUs, dGPUs, gaming drivers, storage solutions etc etc. It's a lot. Frankly we're lucky that TSMC and Zen was a success, as was the console sales. Maybe I'm asking for too much, too soon...

    @tamalero
    Face it, the 8nm lithography from Samsung is impressive. I thought it would be in the middle of the 14nm-7nm nodes, but I was wrong; it is much, much closer to the 7nm node than the 14nm wafers. Sure, TSMC's mainstream 7nm from 2018 would've been better, but not by much. So they've done well, and as I said earlier, at least this gives us some runway to look forward to the next GPU cards (TSMC, Advanced +, 5nm, 3D stacking) due in like 2023. And according to rumours floating around, Nvidia is still buying some 7nm wafers from TSMC for certain silicon, but they couldn't get a tender for the yields and stock they wanted at a price they wanted. It seems Nvidia was too aggressive on prices, and TSMC doesn't care, since they make more money off smartphone SoCs anyway. Thank goodness for Samsung Foundry, who was ready to catch the fall, so to say. Hopefully we can see more competition in this market going forward (Intel Fabs and Global Foundries, I'm looking at you).
  • Spunjji - Tuesday, September 8, 2020 - link

    @Kangal - As I said, I thought the overall comment itself was off-base, but I agreed with your conclusion. There was never a point of total disagreement, just dissent on specifics.

    Anandtech do Total System Power for their power numbers here, which is why I went elsewhere to check board power numbers (and not the ones reported by the GPU itself).

    I don't think you can entirely blame a lack of competition from AMD for the slowdown. I'm going to give two reasons here:
    1) Nvidia just announced their biggest jump ahead in performance for a long time, and there hasn't really been any more or less competition from AMD of late than there was in the period leading up to Turing. I think AMD's relative market share might even be down on that period when Polaris was relatively fresh.
    2) There are self-evident reasons why Turing didn't move the performance bar forwards much: they were stuck on a similar manufacturing process, and they wanted to introduce RTX. That constrained the possibilities available to them - some of the die was consumed by expensive RT features, and they weren't getting any of that back from a shrink. Compare that to Pascal, where they benefited from a full node shrink and minor architectural changes.

    As for consumers, well, I agree in part but I feel like that's more complex too. Take the near-total lack of Polaris-based laptops - what was that about? Vega I can sort-of understand due to cost - although the one laptop with a Vega 56 in it actually competed pretty well with its 1070-based cousin - but given the prices Polaris cards sold for on desktop, it should have been no problem at all for even a fairly clunky Polaris-based solution to compete on price alone, if not performance / thermals. At some point it's not people's fault for not buying things if they just aren't *there*, and I genuinely don't know whose fault that is.
  • Sivar - Tuesday, September 8, 2020 - link

    Thank you for this example showing that we can disagree with someone, debate using facts, and yet remain respectful and open to the idea that the opponent's statement can have merits despite its flaws.
  • tamalero - Monday, September 7, 2020 - link

    "8nm impressive". According to who?
    Everyone else is saying that if Nvidia had swallowed its ego, they would have been on the much, much better 7nm from TSMC.

    8nm is not even at the density of 7nm.
  • eddman - Tuesday, September 1, 2020 - link

    They didn't copy anything. They've already had their own solution for a year.

    https://developer.nvidia.com/blog/gpudirect-storag...

    It can now be leveraged in Windows, since MS is adding the DirectStorage API to Windows.
  • inighthawki - Tuesday, September 1, 2020 - link

    Adding features like this to your hardware isn't something you do in a couple months after the hardware features of next gen consoles were revealed. This type of thing is something you start years in advance. It's pretty clear that there was some sort of collaboration between parties like MS/Sony/Game devs and the desire for hardware based decompression and loading for IO was expected to be a prominent feature of the next generation.
  • tipoo - Tuesday, September 1, 2020 - link

    You're suggesting that since the PS5 tech talk in March of this year, Nvidia architected a competing solution?

    These things take way longer than people seem to imagine. It looks like their views on where next gen games were going were quite similar.
  • Yojimbo - Tuesday, September 1, 2020 - link

    NVIDIA copied it so well they are coming out with it first...
  • Kakkoii - Wednesday, September 2, 2020 - link

    Nvidia has had this feature as part of NVLINK on their server side platforms for many years now... I've been eagerly anticipating its arrival on consumer GPUs.
