Titan’s Compute Performance, Cont

With Rahul having covered the basics of Titan’s strong compute performance, let’s shift gears a bit and take a look at real-world usage.

On top of Rahul’s work with Titan, as part of our 2013 GPU benchmark suite we put together a larger number of compute benchmarks to try to cover real-world usage, including the old standards of gaming usage (Civilization V) and ray tracing (LuxMark), along with several new tests. Unfortunately that effort got cut short when we discovered that OpenCL support is currently broken in the press drivers, which prevents us from running several of our tests. We still have our CUDA and DirectCompute benchmarks to look at, but a full look at Titan’s compute performance on our 2013 GPU benchmark suite will have to wait for another day.

For their part, NVIDIA of course already has OpenCL working on GK110 with Tesla. The issue is that somewhere between that and bringing up GK110 for Titan by integrating it into NVIDIA’s mainline GeForce drivers – specifically the new R314 branch – OpenCL support was broken. We expect this will be fixed in short order, but it’s not something NVIDIA checked for ahead of the press launch of Titan, and it’s not something they could fix in time for today’s article.

Unfortunately this means that comparisons with Tahiti will be few and far between for now. Most significant cross-platform compute programs are OpenCL based rather than DirectCompute, so short of games and a couple other cases such as Ian’s C++ AMP benchmark, we don’t have too many cross-platform benchmarks to look at. With that out of the way, let’s dive into our condensed collection of compute benchmarks.

We’ll once more start with our DirectCompute game example, Civilization V, which uses DirectCompute to decompress textures on the fly. Civ V includes a sub-benchmark that exclusively tests the speed of its texture decompression algorithm by repeatedly decompressing the textures required for one of the game’s leader scenes. While DirectCompute is used in many games, this is one of the only games with a benchmark that can isolate the use of DirectCompute and its resulting performance.

Note that for 2013 we have changed the benchmark a bit, moving from using a single leader to using all of the leaders. As a result the reported numbers are higher, but they’re also not comparable with this benchmark’s results from our 2012 dataset.
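
To make the workload a bit more concrete, below is a rough CUDA sketch of what on-the-fly block decompression looks like in general. Civ V’s real shader is a DirectCompute (HLSL) kernel and Firaxis hasn’t published it, so treat this as an illustration of the technique (one thread per 4x4 BC1/DXT1 block, 4-color mode only) rather than the game’s actual code.

```cuda
// Illustrative only: one thread decodes one 4x4 BC1/DXT1 block into RGBA8
// texels. Assumes a little-endian load of the 8-byte block and ignores the
// 3-color/transparent mode (c0 <= c1) for brevity.
#include <cstdint>
#include <cuda_runtime.h>

__device__ uint32_t expand565(uint16_t c) {
    // RGB565 -> RGBA8888, alpha forced to opaque
    uint32_t r = ((c >> 11) & 0x1F) * 255 / 31;
    uint32_t g = ((c >> 5)  & 0x3F) * 255 / 63;
    uint32_t b = ( c        & 0x1F) * 255 / 31;
    return (255u << 24) | (b << 16) | (g << 8) | r;
}

__device__ uint32_t mix_rgba(uint32_t a, uint32_t b, uint32_t wa, uint32_t wb) {
    // Per-channel weighted average of two packed RGBA8 colors
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t ca = (a >> shift) & 0xFF;
        uint32_t cb = (b >> shift) & 0xFF;
        out |= (((ca * wa + cb * wb) / (wa + wb)) & 0xFF) << shift;
    }
    return out;
}

__global__ void decode_bc1(const uint64_t* blocks, uint32_t* texels,
                           int blocksPerRow, int numBlocks) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numBlocks) return;

    uint64_t blk     = blocks[i];
    uint32_t indices = (uint32_t)(blk >> 32);      // 2 bits per texel
    uint32_t pal[4];
    pal[0] = expand565((uint16_t)(blk & 0xFFFF));
    pal[1] = expand565((uint16_t)((blk >> 16) & 0xFFFF));
    pal[2] = mix_rgba(pal[0], pal[1], 2, 1);       // 2/3 c0 + 1/3 c1
    pal[3] = mix_rgba(pal[0], pal[1], 1, 2);       // 1/3 c0 + 2/3 c1

    int bx = i % blocksPerRow, by = i / blocksPerRow;
    int pitch = blocksPerRow * 4;                  // output width in texels
    for (int t = 0; t < 16; ++t) {
        int sel = (indices >> (2 * t)) & 0x3;
        texels[(by * 4 + (t >> 2)) * pitch + bx * 4 + (t & 3)] = pal[sel];
    }
}
```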

With Civilization V having launched in 2010, graphics cards have become significantly more powerful since then, far outpacing growth in the CPUs that feed them. As a result we’ve rather quickly drifted from being GPU bottlenecked to being CPU bottlenecked, as we see in both our Civ V game benchmarks and our DirectCompute benchmarks. For high-end GPUs the performance difference is rather minor; the gap between the GTX 680 and Titan, for example, is 45fps, or just under 10%. Still, it’s at least enough to get Titan past the 7970GE in this case.

Our second test is one of our new ones, utilizing Elcomsoft’s Advanced Office Password Recovery utility to take a look at GPU password generation. AOPR has separate CUDA and OpenCL kernels for NVIDIA and AMD cards respectively, so it doesn’t follow the same code path on every GPU, but it does use an optimized path for each GPU family it supports. Unfortunately we’re having trouble getting it to recognize AMD 7900 series cards in this build, so we only have CUDA cards for the time being.

Password generation and other forms of brute force crypto are an area where the GTX 680 is particularly weak, thanks to the various compute aspects that have been stripped out in the name of efficiency. As a result it ends up below even the GTX 580 in these benchmarks, never mind AMD’s GCN cards. But with Titan/GK110 offering NVIDIA’s full compute performance, it rips through this task. In fact it more than doubles the performance of both the GTX 680 and the GTX 580, indicating that the huge gains we’re seeing come not just from the additional functional units, but from architectural optimizations and new instructions that improve overall efficiency and reduce the number of cycles needed to complete work on a password.
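
As one concrete example of the kind of instruction-level change at play: GK110 adds a funnel shifter, exposed in CUDA as the __funnelshift_l/_r intrinsics on compute capability 3.5 and up, which collapses the 32-bit rotates that password hashes lean on into a single instruction. The sketch below illustrates that class of gain only; it is not AOPR’s actual kernel, and the "hash" here is a meaningless stand-in.

```cuda
// Minimal sketch: a rotate-left helper that compiles to one funnel-shift
// instruction on GK110 (sm_35+), versus two shifts plus an OR on older parts.
#include <cuda_runtime.h>

__device__ __forceinline__ unsigned int rotl32(unsigned int x, unsigned int n) {
#if __CUDA_ARCH__ >= 350
    // Funnel-shifting the 64-bit value x:x left by n and keeping the upper
    // 32 bits is exactly a rotate-left.
    return __funnelshift_l(x, x, n);
#else
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}

// Toy kernel purely to give rotl32 a caller; a real password kernel would run
// a full MD5/SHA-style round function over millions of candidates per launch.
__global__ void toy_mix(unsigned int* state, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int s = state[i];
    for (int k = 0; k < iters; ++k)
        s = rotl32(s ^ 0x5bd1e995u, 7) + (unsigned int)k;  // not a real hash
    state[i] = s;
}
```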

Altogether, at 33K passwords/second Titan is not just faster than the GTX 680, it’s faster than the GTX 690 and GTX 680 SLI, making this a test where one big GPU (and its full compute performance) is better than two smaller GPUs. It will be interesting to see where the 7970 GHz Edition and other Tahiti cards place in this test once we can get them up and running.

Our final test in our abbreviated compute benchmark suite is our very own Dr. Ian Cutress’s SystemCompute benchmark, which is a collection of several different fundamental compute algorithms. Rahul went into greater detail on this back in his look at Titan’s compute performance, but I wanted to go over it again quickly with the full lineup of cards we’ve tested.

Surprisingly, for all of its performance gains relative to GTX 680, Titan still falls notably behind the 7970GE here. Given Titan’s theoretical performance and the fundamental nature of this test we would have expected it to do better. But without additional cross-platform tests it’s hard to say whether this is something where AMD’s GCN architecture continues to shine over Kepler, or if perhaps it’s a weakness in NVIDIA’s current DirectCompute implementation for GK110. Time will tell on this one, but in the meantime this is the first solid sign that Tahiti may be more of a match for GK110 than it’s typically given credit for.
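
Since the individual SystemCompute kernels aren’t broken out here (see Rahul’s piece for that), the sketch below is simply a stand-in for the kind of fundamental primitive such suites exercise: a shared-memory parallel sum reduction. It’s written in CUDA for consistency with the other examples; the benchmark itself runs through DirectCompute.

```cuda
// A minimal sketch of one "fundamental compute algorithm": each block reduces
// a slice of the input to a single partial sum. Block size must be a power of
// two; launch with shared memory = blockDim.x * sizeof(float).
#include <cuda_runtime.h>

__global__ void block_sum(const float* in, float* partial, int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;

    // Grid-stride loop: each thread accumulates its own partial sum first.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += blockDim.x * gridDim.x)
        acc += in[i];
    sdata[tid] = acc;
    __syncthreads();

    // Tree reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}
```

A second, tiny launch (or a host-side add over the per-block partials) then folds the partial sums into the final result.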

Comments

  • chizow - Saturday, February 23, 2013 - link

    I haven't used this rebuttal in a long time, I reserve it for only the most deserving, but you, sir, are retarded.

    Everything you've written above is anti-progress; you've set Moore's law and semiconductor progress back 30 years with your asinine rants. If idiots like you were running the show, no one would own any electronic devices because we'd be paying $50,000 for toaster ovens.
  • CeriseCogburn - Tuesday, February 26, 2013 - link

    Yeah, that's a great counter, you idiot... as usual, when reality barely glints a tiny bit through your lying tin-foiled dunce cap, another sensationalistic pile of bunk is what you have.
    A great cover for a cornered doofus.
    When you finally face your immense error, you'll get over it.

  • hammer256 - Thursday, February 21, 2013 - link

    Not to sound like a broken record, but for us in scientific computing using CUDA, this is a godsend.
    The GTX 680 release was a big disappointment for compute, and I was worried that this was going to be the trend going forward with Nvidia: nerfed compute on consumer cards that focus on graphics, and compute-heavy professional cards for the HPC space.
    I was worried that the days of cheap compute were gone. Those days might still be numbered, but at least for this generation Titan is going to keep it going.
  • ronin22 - Thursday, February 21, 2013 - link

    +1
  • PCTC2 - Thursday, February 21, 2013 - link

    For all of you complaining about the $999 price tag: it's like the GTX 690 (or even the 8800 Ultra, for those who remember it). It's a flagship luxury card for those who can afford it.

    But that's beside the real point. This is a K20 without the price premium (and some of the valuable Tesla features). But for researchers on a budget, using homegrown GPGPU compute code that doesn't validate to run only on Tesla cards, these are a godsend. I mean, some professional programs will benefit from having a Tesla over a GTX card, but these days, researchers are trying to reach into HPC space without the price premium of true HPC enterprise hardware. The GTX Titan is a good middle point. For the price of a Quadro K5000 and a single Tesla K20c card, they can purchase 4 GTX Titans and still have some money to spare. They don't need SLI. They just need the raw compute power these cards are capable of. So as entry GPU Compute workstation cards, these cards hit the mark for those wanting to enter GPU compute on a budget. As a graphics card for your gaming machine, average gamers need not apply.
  • ronin22 - Thursday, February 21, 2013 - link

    "average gamers need not apply"

    If only people had read this before posting all this hate.

    Again, gamers, this card is not for you. Please get the cr*p out of here.
  • CeriseCogburn - Tuesday, February 26, 2013 - link

    You have to understand, the review sites themselves have pushed the blind-fps mentality for years now, not to mention insanely declared statistical percentages ripened with over-interpretation for the now contorted and controlled crybaby whiners. It's what they do every time; they feel it gives them the status of consumer advisor, Nader protege, fight-the-man activist, and knowledgeable enthusiast.

    Unfortunately that comes down to the ignorant demands we see here, twisted with as many lies and conspiracies as are needed to increase the personal faux outrage.
  • Dnwvf - Thursday, February 21, 2013 - link

    In absolute terms, this is the best non-Tesla compute card on the market.

    However, looking at FLOPS/$, you'd be better off buying 2 7970GHz Radeons, which would run around $60 less and give you more total FLOPS. Look at the compute scores - Titan is generally not 2x a single 7970. And in some of the compute scores, the 7970 wins.
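
    Back-of-the-envelope, using the published peak single-precision numbers and assuming a street price of roughly $470 per 7970 GHz Edition (the "$60 less" figure above): Titan is 2688 cores x 2 FLOPs x 0.837 GHz ≈ 4.5 TFLOPS for $999, or about 4.5 GFLOPS/$. Two 7970 GHz Editions are 2 x 2048 x 2 x 1.05 GHz ≈ 8.6 TFLOPS for about $940, or roughly 9.1 GFLOPS/$.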

    2 7970GHz cards (not even in CrossFire mode, you don't need that for OpenCL) will beat the crap out of Titan and cost less. They couldn't run AOPR on the AMD cards... but everybody knows from bitcoin that AMD cards rule over Nvidia for password hashing (just google bitcoin bit_align_int to see why).

    There's an article on Tom's Hardware where they put a bunch of Nvidia and AMD cards through a bunch of compute benchmarks, and when AMD isn't winning, the GTX 580 generally beats the 680... most likely due to its wider 384-bit bus. Titan is still a 384-bit bus... can't really compare on price because Phi costs an arm and a leg like Tesla, but you have to acknowledge that Phi is probably gonna rock out with its 512-bit bus.

    Gotta give Nvidia kudos for finally not crippling fp64, but at this price point, who cares? If you're looking to do compute and have a GPU budget of $2K, you could buy:

    An older Tesla
    2 Titans
    -or-
    Build a system with 2 7970GHz and 2 GTX 580.

    And the last system would be the best... compute on the AMD cards for certain algorithms, on the Nvidia cards for the others, and PCIe bandwidth issues aside, running multiple complex algorithms simultaneously will rock because you can enqueue and execute 4 OpenCL kernels simultaneously. You'd have to shop around for a while to find some 580's though.

    Gamers aren't gonna buy this card unless they're spending Daddy's money, and serious compute folk will realize quickly that if they buy a mobo that will fit 2 or 4 double-width cards, depending on GPU budget, they can get more FLOPS per dollar with a multiple-card setup (think of it as a micro-sized GPU compute cluster). Don't believe me? Google Jeremi Gosni oclhashcat.

    I'm not much for puns, but this card is gonna flop. (sorry)
  • DanNeely - Thursday, February 21, 2013 - link

    Has an ETA for when the rest of the Kepler refresh is due leaked out yet?
  • HisDivineOrder - Thursday, February 21, 2013 - link

    It's way out of my price range, first and foremost.

    Second, I think the pricing is a mistake, but I know where they are coming from. They're using the same Intel school of thought as with SB-E compared to IB. They price it out the wazoo and only the most luxury of the luxury gamers will buy it. It doesn't matter that the benchmarks show it's only modestly better than its competition down at the $400-500 range, and not the all-out destruction you might think it capable of.

    The cost will be so high it will be spoken of in whispers and with wary glances around, fearful that the Titan will appear and step on you. It'll be rare, and rare things are seen as legendary, just so long as they can make the case it's the fastest single GPU out there.

    And they can.

    So in short, it's like those people buying hexacore CPUs from Intel. You pay through the nose, you get little real gain and horrible performance per dollar, but it is more marketing than common sense.

    If nVidia truly wanted to use this product to serve all users, they would have priced it at $600-700 and moved a lot more. They don't want that. They're fine with the 670/680 being the high end for the majority of users. Those cards have to be cheap to make by now, and with AMD's delays/stalls/whatever, they can keep them the way they are or update them with a firmware update and perhaps a minor retooling of the fab design to give them GPU Boost 2.

    They've already set the stage for that, imho. If you read the way the articles about GPU Boost 2 are written (both of them), you can see nVidia is setting things up to introduce a slightly modified version of the 670 and 680 with "minor updates to the GPU design" and GPU Boost 2, giving them more headroom to improve consistency relative to the current designs.

    Which again would be stealing from Intel's playbook of supplementing SB-E with IB mainstream cores.

    The price is obscene, but the only people who should actually care are the ones who worship at the altar of AA. Start lowering that and suddenly even a 7950 is way ahead of what you need.
