As someone who analyzes GPUs for a living, one of the more vexing things in my life has been NVIDIA’s Maxwell architecture. The company’s 28nm refresh offered a huge performance-per-watt increase for only a modest die size increase, essentially allowing NVIDIA to offer a full generation’s performance improvement without a corresponding manufacturing improvement. We’ve had architectural updates on the same node before, but never anything quite like Maxwell.

The vexing aspect to me has been that while NVIDIA shared some details about how they improved Maxwell’s efficiency over Kepler, they have never disclosed all of the major improvements under the hood. We know, for example, that Maxwell implemented a significantly altered SM structure that was easier to reach peak utilization on, and thanks to its partitioning wasted much less power on interconnects. We also know that NVIDIA significantly increased the L2 cache size and did a number of low-level (transistor level) optimizations to the design. But NVIDIA has also held back information – the technical advantages that are their secret sauce – so I’ve never had a complete picture of how Maxwell compares to Kepler.

For a while now, a number of people have suspected that one of the ingredients of that secret sauce was that NVIDIA had applied some mobile power efficiency technologies to Maxwell. It was, after all, NVIDIA's first mobile-first GPU architecture, and now we have some data to back that up. Friend of AnandTech and all around tech guru David Kanter of Real World Tech has gone digging through Maxwell/Pascal, and in an article & video published this morning, he outlines how he has uncovered very convincing evidence that NVIDIA implemented a tile based rendering system with Maxwell.

In short, by playing around with some DirectX code specifically designed to look at triangle rasterization, he has come up with some solid evidence that NVIDIA’s handling of triangles has significantly changed since Kepler, and that the company’s current approach is consistent with a tile based renderer.
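Real World Tech’s actual probe is DirectX code, but the inference it draws can be illustrated with a simple model. Below is a minimal Python sketch (entirely hypothetical: the screen size, tile size, and coverage test are made up for illustration) contrasting the per-pixel visit order a scanline-style immediate mode rasterizer would produce against the order a tiled rasterizer would produce. Kanter’s test effectively recovers this visit order on real hardware by having a shader tag each fragment with a value from a shared atomic counter.

```python
WIDTH, HEIGHT, TILE = 16, 16, 4

def covered(x, y):
    # Hypothetical coverage test: a right triangle filling half the screen.
    return x + y < WIDTH

def immediate_order():
    # Plain scanline sweep across the whole frame.
    order, counter = {}, 0
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if covered(x, y):
                order[(x, y)] = counter
                counter += 1
    return order

def tiled_order():
    # Finish each TILE x TILE block before moving on to the next one.
    order, counter = {}, 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            for y in range(ty, ty + TILE):
                for x in range(tx, tx + TILE):
                    if covered(x, y):
                        order[(x, y)] = counter
                        counter += 1
    return order

def show(order):
    for y in range(HEIGHT):
        print(" ".join(f"{order.get((x, y), -1):3d}" for x in range(WIDTH)))

show(immediate_order())   # sequence numbers sweep row by row
print()
show(tiled_order())       # sequence numbers cluster tile by tile
```

Printed side by side, the two orderings are trivially distinguishable, which is the crux of the test: the pattern the hardware produces tells you which kind of rasterizer is underneath, regardless of what the API claims.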


NVIDIA Maxwell Architecture Rasterization Tiling Pattern (Image Courtesy: Real World Tech)

Tile based rendering is something we’ve seen for some time in the mobile space, with both Imagination’s PowerVR and ARM’s Mali GPUs implementing it. The significance of tiling is that by splitting a scene up into tiles, the GPU can rasterize it piece by piece almost entirely on die, as opposed to the more memory (and power) intensive process of rasterizing the entire frame at once via immediate mode rendering. The trade-off with tiling, and why it’s a bit surprising to see it here, is that the PC legacy is immediate mode rendering, and this is still how most applications expect PC GPUs to work. So to implement tile based rasterization on Maxwell means that NVIDIA has found a practical way to overcome the method’s drawbacks and potential compatibility issues.
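To make the contrast concrete, here is a toy software model of the classic tile-based scheme; this is a sketch of the general technique, not NVIDIA’s actual implementation, whose details remain undisclosed. Triangles are first binned to the screen tiles their bounding boxes touch, then each tile is shaded in a small scratch buffer (standing in for on-chip memory) and flushed to the framebuffer exactly once.

```python
WIDTH, HEIGHT, TILE = 64, 64, 16

def edge(a, b, p):
    # Signed area: positive when p lies to the left of the edge a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covers(tri, x, y):
    # Sample at the pixel center; accept either winding order.
    a, b, c = tri
    p = (x + 0.5, y + 0.5)
    w = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)

def bin_triangles(tris):
    # Conservatively assign each triangle to every tile its bounding box overlaps.
    bins = {}
    for i, (a, b, c) in enumerate(tris):
        xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
        for ty in range(min(ys) // TILE, max(ys) // TILE + 1):
            for tx in range(min(xs) // TILE, max(xs) // TILE + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins

def render(tris, colors):
    framebuffer = [[0] * WIDTH for _ in range(HEIGHT)]
    for (tx, ty), tri_ids in sorted(bin_triangles(tris).items()):
        tile = [[0] * TILE for _ in range(TILE)]  # stand-in for on-chip storage
        for i in tri_ids:
            for y in range(TILE):
                for x in range(TILE):
                    if covers(tris[i], tx * TILE + x, ty * TILE + y):
                        tile[y][x] = colors[i]
        # One bulk write-out per tile: the memory traffic win of tiling.
        for y in range(TILE):
            for x in range(TILE):
                framebuffer[ty * TILE + y][tx * TILE + x] = tile[y][x]
    return framebuffer

tris = [((5, 5), (60, 10), (30, 55)), ((2, 50), (20, 62), (40, 40))]
fb = render(tris, [1, 2])
print(sum(v != 0 for row in fb for v in row), "pixels covered")
```

The bandwidth saving falls out of the last step: in this model every framebuffer pixel is written to memory once per tile pass, whereas an immediate mode renderer may read and write overlapping pixels in DRAM once per triangle.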

In any case, Real World Tech’s article goes into greater detail about what’s going on, so I won’t spoil it further. But with this information in hand, we now have a more complete picture of how Maxwell (and Pascal) work, and consequently how NVIDIA was able to improve over Kepler by so much. Finally, at this point in time Real World Tech believes that NVIDIA is the only PC GPU manufacturer to use tile based rasterization, which also helps to explain some of NVIDIA’s current advantages over Intel’s and AMD’s GPU architectures, and gives us an idea of what we may see them do in the future.

Source: Real World Tech

Comments

  • close - Monday, August 1, 2016 - link

    Yes, you will definitely get a prize for *LIKING* a specific product. A fanboy is a fanboy no matter what he cheers for ;). And usually fanboys on any side aren't that bright to begin with.
  • TessellatedGuy - Monday, August 1, 2016 - link

    Also, looking at how the 14nm "efficiency" worked out for AMD, I wouldn't be surprised if I was correct. Hell, a bigger die like Pascal is more efficient than Polaris.
  • godrilla - Monday, August 1, 2016 - link

    The Nano was pretty impressive, and it seems AMD will be 1st with an HBM 2.0 consumer card, plus performance per watt is increasing with age due to low-level APIs vs decreasing due to legendary generation graphics status!
  • hojnikb - Monday, August 1, 2016 - link

    If nvidia ever decided to underclock the shit out of something like titan x, they could just as easily reach some next level efficiency.
  • emn13 - Monday, August 1, 2016 - link

    Judging by forum reports, AMD's latest cards are at least as unreasonably clocked. Many people seem to be able to get large savings with almost no performance loss with just a little undervolting/underclocking.

    It's weird - their reputation as less efficient seems like something they'd be glad to lose, yet steps that are so trivial anyone can do them at home aren't taken.
  • looncraz - Monday, August 1, 2016 - link

    AMD's easiest fix for their power usage would be to step the memory frequency with the core frequency. This would be most noticeable while watching videos, as AMD's current drivers just ramp the memory frequency to max clocks any time there is a GPU load.

    I underclock my RAM to just 625MHz (it's stable at 550MHz, but I like a little margin) and save ~25-40W while watching YouTube videos. That means my R9 290 is only pulling around 20W to watch a video.
  • retrospooty - Monday, August 1, 2016 - link

    AMD/ATI and Nvidia have traded that top spot many times in the past 15 years. Nvidia has had it for the past few years, but that can always change back again (again).
  • sharath.naik - Monday, August 1, 2016 - link

    Not for the next few years; their Polaris GPUs turned out to be even less efficient.
  • Yojimbo - Monday, August 1, 2016 - link

    ATI had a bigger market share than NVIDIA just once, for about a year in 2H '04 and 1H '05. The peak was 56% ATI and 42% NVIDIA. AMD never has. Sometimes AMD/ATI has had the higher-performing top card, if that's what you mean, but the original comment was about the overall architecture. In the last two years NVIDIA has opened up a huge lead, with a peak of about 80% NVIDIA and 20% AMD, now down to 77% NVIDIA, 23% AMD.
  • StrangerGuy - Monday, August 1, 2016 - link

    How the Radeon 4xxx through 7xxx series hit a home run in perf/price yet got punished with less market share versus Nvidia over the same time period speaks volumes about how toxic the AMD brand is in the minds of consumers. ATI had much more favorable mindshare as a brand, and dropping it in favor of AMD was a stupendously dumb decision.
