It is confusing that sometimes you are benchmarking cores, and sometimes CPUs. The question here is "which is the fastest CPU, x86 or POWER8" - and then you should bench CPU vs CPU, not core vs core. That one core is faster than another core says nothing on its own; you also need to know how many cores there are. Maybe one CPU has 2 cores, and the other has 1.000 cores. So when you tell me which core is fastest, you give incomplete information: I still have to look up how many cores there are, and then I can conclude which CPU is fastest. Or can I? There are scaling issues; just because a benchmark runs well on one core does not mean it runs equally well on 18 cores. This means I cannot extrapolate from one core to the entire CPU. So I am still not sure which CPU is fastest when you only give me information about core performance. Next time, if you want to talk about which CPU is faster, please benchmark the entire CPU, not a core, as you are not talking about which core is faster.
Here are 20+ world records by the SPARC M7 CPU. It is typically 2-3x faster than POWER8 and Intel Xeon, all the way up to >10x faster. For instance, the M7 achieves 87% higher SAPS than the E5-2699 v3. https://blogs.oracle.com/BestPerf/
The big difference between POWER and SPARC vs x86 is scalability and RAS. When I say scalability, I am talking about scale-up business Enterprise servers with as many as 16 or even 32 sockets, running business software such as SAP or big databases that require one large single server. The SGI UV2000, which scales to 10.000s of cores, can only run scale-out HPC number-crunching workloads; in effect, it is a cluster. No customer has ever run an SGI UV2000 with enterprise business workloads such as SAP. There are no SAP benchmarks nor database benchmarks on the SGI UV2000, because these machines can only be used as clusters. The UV2000 is exclusively used for number-crunching HPC workloads, according to SGI. If you don't agree, I invite you to post SAP benchmarks with the SGI UV2000. You won't find any. The thing is, you cannot take a small cluster with 10.000 cores and replace a big 16- or 32-socket Unix server running SAP. Scale-out clusters cannot run SAP; only scale-up servers can. There are no scale-out clustered SAP benchmark results. All the highest SAP benchmarks are set by single large scale-up servers with 16 or 32 sockets. There are no 1.000-socket clustered servers on the SAP benchmark list.
x86 is low end, and has for decades stopped at a maximum of 8 sockets (when we talk about scale-up business servers). Just recently we have seen 16- and 32-socket scale-up business x86 servers come to market (HP Kraken and SGI UV300H), but they are brand new, so performance is quite bad. It takes a couple of generations until SGI and HP have learned and refined their designs enough to ramp up performance for scale-up servers. Also, Windows and Linux have only scaled to 8 sockets and not above, so they need a major rewrite to be able to handle 16 sockets and a few TB of RAM. AIX and Solaris have scaled to 32 sockets and above for decades, and were recently rewritten to handle tens of TB of RAM. There is no way Windows and Linux can handle that much RAM efficiently, as they have only scaled to 8 sockets until now. Unix servers scale way beyond 8 sockets, and perform very well doing so. x86 does not.
The other big difference, apart from scalability, is RAS. For instance, on SPARC and POWER you can hot-swap everything: motherboards, CPUs, RAM, etc. Just like mainframes. x86 cannot. Some SPARC CPUs can replay instructions if something went wrong. x86 cannot.
For x86 you typically use scale-out clusters: many cheap 1-2 socket small x86 servers in a huge cluster, just like Google or Facebook. When they crash, you just swap them out for another cheap server. Unix machines you typically use as a single scale-up server with 16, 32 or even 64 sockets (Fujitsu M10-4S) running business software such as SAP; they have the RAS, so they do not crash.
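To make the first paragraph's objection concrete - that per-core results cannot simply be multiplied out to whole-chip results - here is a toy Amdahl's-law sketch; every number in it is hypothetical and chosen only to illustrate the argument:

```python
# Toy illustration (all numbers hypothetical): why a per-core score
# cannot simply be multiplied by the core count to rank whole CPUs.
def chip_throughput(per_core_score: float, cores: int, parallel_fraction: float) -> float:
    """Amdahl's-law style estimate of whole-chip throughput.

    parallel_fraction is the share of the workload that actually
    scales across cores; the remainder stays serial.
    """
    speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
    return per_core_score * speedup

# A chip with slower cores but more of them can win or lose depending
# on how well the workload scales - which is exactly why a per-core
# result cannot be extrapolated to the whole CPU.
print(f"2 fast cores:  {chip_throughput(100, 2, 0.95):6.0f}")
print(f"18 slow cores: {chip_throughput(60, 18, 0.95):6.0f}")
```

With a 95% parallel workload the 18 slower cores win easily; drop the parallel fraction to 50% and the two fast cores come out ahead.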
Well, he started with "niiiiice". Could have been much worse. Hi zeeBomb, I am Johan, 43 years old and already 17 years active as a hardware editor. ;-)
The good old days! I remember so many of the great discussions/arguments we had. We had an Intel guy, an AMD guy, and Charlie Demerjian. Johan was there. Mike Magee would stop in. So would Chris Tom, and Kyle Bennett. It was an awesome collection of people, and the discussions were FULL of real, technical points. I always feel grateful when I think back to Ace's. It was quite a place.
It was an awesome community. I learned so much from everyone. I remember the days when we'd write pages arguing whether AMD's new 64 bit extension to x86 was truly 64 bit. The discussions could be heated, but they were seldom rude. I wish there were something similar today. :/
Like Ryan said, I have been working 11 years at Anand. In other words, it is great working at Anandtech. AT is one of the few tech sites out there that still values deep analysis and allows the editors to take the time to delve deep.
Dear Johan, nice article. Did you ever consider sparse system solving (with preconditioning) as a sensitive benchmark? It is a crucial stage of most scientific applications, and it is a bandwidth-limited operation with a high degree of parallelism. It would definitely be interesting to see how the POWER8 fares on such a test. If you are interested, I think I could provide a pointer to a simple benchmark (to be compiled). If you feel it may be interesting, just drop me an email.
Johan's been with Anandtech for more than a decade, and has been publishing on the subject since the late 90s.
But I very much second your "Niiiiice!," as reading his name always reminds me of the old days over at aceshardware, and I'm always looking forward to his insights!
Mate... Bite your tongue! Johan is THE man when it comes to datacenter-class hardware. Obviously he doesn't get the same exposure as the personal technology guys, but he is definitely one of the best reviewers out there (inside and outside AT).
IBM's L1 data cache has a 3-cycle access time, and is twice as large (64KB) as Intel's, and I think I remember it accounting for something like half the power consumption of the core.
Very nice to see tests of non-x86 hardware. It's interesting to see a test of the S822L when IBM just launched two even more price-competitive machines, designed and built by Wistron and Tyan as pure OpenPOWER machines: the S812LC and S822LC. These can't run AIX, and are substantially cheaper than the IBM-designed machines. They might lack some features, but they would probably fit nicely in this test. And they sport the single-chip 12-core version of the POWER8 processor (with cores disabled).
"The server is powered by two redundant high quality Emerson 1400W PSUs."
The sticker on the PSU is only 80+ (no color). Unless the hot-swap support comes with a substantial penalty (and if so, why?), this design looks to be well behind the state of the art. With data centers often being power/HVAC limited these days, using a relatively low-efficiency PSU in an otherwise very high-end system seems bizarre to me.
That's possible; it looks like there's something at the bottom of the logo. Google image search shows 80+ platinum as a lighter silver/gray than 80+ silver; white is only the original standard.
Oh yum! THIS is what I still love about AT: non-mainstream previews / reviews. REALLY looking forward to more like this. I only wish SGI still built workstation-level machines. :-(
Indeed, but it'd need a hefty change in direction at SGI to get back into workstations again, so very unlikely for the foreseeable future. They certainly have the required base tech (NUMALink6, MPI offload, etc.), namely lots of sockets/cores/RAM coupled with GPUs for really heavy tasks (big data, GIS, medical, etc.), ie. a theoretical scalable, shared-memory workstation. But the market isn't interested in advanced performance solutions like this atm, and the margin on standard 2/4-socket systems isn't worthwhile - it'd be much cheaper to buy a generic Dell or HP (plus, it's only above this no. of sockets that their own unique tech comes into play). Pity, as the equivalent of a UV 30/300 workstation would be sweet (if expensive), though for virtually all of the tasks discussed in this article, shared-memory tech isn't relevant anyway.
The notion of connectable, scalable, shared-memory workstations based on NV gfx, PCIe and newer multi-core MIPS CPUs was apparently brought up at SGI way back before the Rackable merger, but didn't go anywhere (not viable given the financial situation at the time). It's a neat concept, eg. imagine being able to connect two or more separate ordinary 2/4-socket XEON workstations together (each fitted with, say, a couple of M6000s) to form a single combined system with one OS instance and resource pool, allowing users to combine & split setups as required to match workloads. But it's a notion whose time has not yet come.
Of course, what's missing entirely is the notion of advanced but costly custom gfx, but again there's no market for that atm either, at least not publicly. Maybe behind the scenes NV makes custom stuff the way SGI used to for relevant customers (DoD, Lockheed, etc.), but SGI's products always had some kind of commercially available equivalent from which the custom builds were derived (IRx gfx), whereas atm there's no such thing as a Quadro with 30000 cores and 100GB RAM that costs $50K and slides into more than one PCIe slot which anyone can buy if they have the moolah. :D
Most of all though, even if the demand existed and the tech could be built, it'd never work unless SGI stopped using its pricing-is-secret reseller sales model. They should have adopted a direct sales setup long ago, order on the site, pricing configurator, etc., but that never happened, even though the lack of such an option killed a lot of sales. Less of an issue with the sort of products they sell atm, but a better sales model would be essential if they were to ever try to sell workstations again, and that'd need a huge PR/sales management clearout to be viable.
Pity IBM couldn't pay NV to make custom gfx, that'd be interesting, but then IBM quit the workstation market as well.
the international system dictates that , and . are the same thing, and as a separator you should use a space. In many countries in Europe, ' is also used. That's fine too as there is no ambiguity. Using . and , for anything that is not the decimal separator in international websites just creates confusion imho. I guess AT doesn't have a style book though.
Nice review, but Xeon is not 95% of the market. AMD is still just a bit above 5% on its own, so that figure deserves a grain of salt :) Not to mention the fact that competition is good for all of us. If reviewers continue like this, readers will think there is no competition.
I'm left wondering what a Steamroller-based 16+ core CPU would do here, considering multithreading is better than with previous models. Yes, the Xeons have a large single-threading lead, but more cores = good in the server world, not to mention that such a CPU would severely undercut the price of the competition.
AMD killed off both the Steamroller and Excavator chips early on, as the Bulldozer and Piledriver chips weren't competitive enough. More importantly, OEMs simply were not interested, even if those parts were upgrades based upon existing designs. Thus the great AMD server drought began: they have effectively left that market and are hoping for a return with Zen.
Also, I should point out that Seattle, AMD's first ARM-based Opteron, has yet to arrive. This was supposed to be out a year ago and keep AMD's server business going throughout 2015 during the wait for Zen and K12 in 2016. Well, K12 has already been delayed into 2017, and Seattle is nowhere to be found in commercial systems (there are a handful of Seattle developer boards).
That’s not always correct, though. You can have 5% of the market and 20% of the profits, for example, which would put you in a way better position than your competitors (because only a small increase in market share would pay big time).
I've been dealing with IBM Power based machines for 5 years now. Such experience has only given me a major disdain for AIX.
I do NOT advise it for anyone. It sucks to work on. There is a certain consistent, spartan logic to it, but it is difficult to learn, and learning materials are EXTREMELY expensive. I never liked the idea of paying $12,000 for a one week class that taught me barely a tenth of what I needed to know to run an AIX network. (My company paid for the class, but I could not get them to pay for the rest of them, for some reason.) This makes people who can support AIX extremely expensive to employ. Figure on paying twice the rate of a Windows admin in order to employ an AIX admin. Then there is the massive expense of maintenance agreements. Even the software only maintenance agreement, just to get patches for AIX, is $4000 per year per system. They may be competitive in cost up front, but they drain money like vampires to maintain.
Even the most modern IBM Power based machine takes 20-30 minutes to reboot or power up due to POST diagnostics. That alone is annoying enough to make me avoid AIX as much as I can.
No, because I don't earn twice as much. I'm not fully trained in AIX, so I have to muddle my way through dealing with the test machines we have. We don't use them for full production machines, just for testing software for our customers. (Which means I have to reinstall the OS on at least one of those machines about every month or so. That is a BIG pain in the behind due to the boot procedure. Where it takes a couple hours to reinstall Windows or Linux, it takes a full day to do it on an AIX machine.)
I'm trying to advise people to NOT use AIX. It's an awful operating system. I'm also advising people NOT use IBM Power based machines because they are extremely aggravating to work on. Overall, it costs much more to run IBM Power machines, even if they aren't running AIX, than it does to run x86 machines. The up front cost might look competitive, but the maintenance costs are huge. Running AIX on them makes it an order of magnitude more expensive.
I suggest reading the NIM A-Z handbook. It shouldn't take you more than 10 minutes to deploy an AIX system fully built and installed with software. As with Linux, it also shouldn't take more than about 10 minutes to install and fully deploy a server if you have any experience scripting installs.
The developerworks community inside IBM is possibly the best free resource you could hope for. Also the redbooks.ibm.com site.
Compared to most *NIX flavors, AIX is UNIX for dummies.
I've worked on AIX platforms extensively for about the same amount of time. First, most of these purchases go through a partner and yours must've sucked because we got great support from our IBM partner -- free training, access to experts, that sort of thing.
Second, I always love the complaining about the cost of the hardware, etc. If you're buying big iron Power servers, the maintenance cost should be near irrelevant. And again, your partner should take care to negotiate that into the deal for 3-5 years ensuring you have access to updates.
The other thing no one ever talks about is *why* you buy these servers. Why do they take so long to boot? Well, for the frame itself, it's a deep POST. But then, mine were never rebooted in 4 years, and that's through firmware upgrades (done online) and a couple of interface card swaps (also done online with no service disruption). Do that on x86. So reason #1 -- RAS, at the hardware level. Seriously, how often did you need to reboot the frame?
Reason #2 -- for large enterprises, you can do so much with these using relatively few cores that they lead to huge licensing savings on Oracle and IBM software. For us, it was over $1M a year, ongoing. And no, switching to other software was not an option. We could run an Oracle RAC on 4 cores of POWER7 (at the time) versus the 32 x86 cores it was on previously. That saves a lot of $.
The machine reviewed does not run AIX. It's Linux only. So the maintenance, etc. you mention isn't even relevant.
There are still things that are annoying, I suppose. AIX is steeped in legacy to some degree, and certainly not as easy to manage as a Linux box. But there are a lot of guides out there for free -- it took me about a month to be fully productive. And the support costs you pay for? Well, if I ran into a wall, I just opened a PMR. IBM was always helpful.
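The licensing arithmetic behind "Reason #2" above is easy to sketch. The core factors below (0.5 for x86, 1.0 for POWER7) match Oracle's public core factor table of that era; the list price and support percentage are assumptions for illustration, and the commenter's >$1M figure presumably covered far more than one cluster.

```python
# Back-of-envelope sketch of per-core licensing (illustrative only).
ORACLE_EE_LIST_PRICE = 47_500   # USD per processor license (assumed)
ANNUAL_SUPPORT_RATE = 0.22      # yearly support as a fraction of license cost (assumed)

def licenses_needed(cores: int, core_factor: float) -> float:
    return cores * core_factor

def yearly_support(cores: int, core_factor: float) -> float:
    return licenses_needed(cores, core_factor) * ORACLE_EE_LIST_PRICE * ANNUAL_SUPPORT_RATE

x86 = yearly_support(cores=32, core_factor=0.5)    # 16 licenses
power7 = yearly_support(cores=4, core_factor=1.0)  # 4 licenses
print(f"x86, 32 cores:    ${x86:,.0f}/year in support alone")
print(f"POWER7, 4 cores:  ${power7:,.0f}/year")
print(f"ongoing savings:  ${x86 - power7:,.0f}/year per cluster")
```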
I'm mostly working in Linux DevOps now, but I remember dreading having to use all the "classic" Unix machines at my first "real" job 12 years ago. We ran a few IRIX and AIX boxes which were ancient even then. Hell, even the first thing I did on my work MacBook was to replace the BSD userland with GNU wherever possible.
It's hard to find any information on them, and any learning materials are expensive and usually on dead trees. They pretty much want to sell training, consulting, etc. along with the often non-competitive hardware prices, since these companies don't actually WANT to sell hardware. They want to sell everything that surrounds it.
The problem with server chips is that it's about platform stability. IBM (and others) dropped off the face of the Earth, and as mentioned above Intel now has 95% of the market. This chip looks great, but will companies buy into it en masse? What if IBM makes another choice to drop off the face of the Earth again and your platform is dead-ended? I would have to think long and hard about going with them at this point.
POWER and System z are two different architectures. Case in point: POWER is a RISC design introduced in the '90s, whereas the System z mainframes can trace their roots to a CISC design from the 1960s (and it is still possible to run some of that 1960s code unmodified).
They do share a handful of common parts (think the CDIMMs) to cut down on support costs.
"The z10 processor was co-developed with and shares many design traits with the POWER6 processor, such as fabrication technology, logic design, execution unit, floating-point units, bus technology (GX bus) and pipeline design style, i.e., a high frequency, low latency, deep (14 stages in the z10), in-order pipeline." from the Wiki.
Yes, the z continues the CISC ISA from the 360 (well, sort of) rather than hardware RISC, but as Intel (amongst others) has demonstrated, a CISC ISA doesn't have to be implemented in hardware. In fact, the 360/30 (lowest tier) was wholly emulated, as was admitted back then. Today, we'd say "micro-instructions". All those billions of transistors could have been used to implement x86 in hardware, but Intel went with emulation - sorry, micro-ops.
What matters is the underlying fab tech. That's not going anywhere.
The GX bus in the mainframes was indeed shared by POWER chips as that enabled system level component sharing (think chipsets).
However, attributes like the execution unit and the pipeline depth are different between the POWER6 and z10. At a bird's eye view, they do look similar but the implementation is genuinely different.
Other features like SMT were introduced with the POWER5 but only the most recent z13 chip has 2 way SMT. Features like out-of-order execution, SMT, SIMD were once considered too exotic to validate in the mainframe market that needed absolute certainty in its hardware states. However, recent zArch chips have implemented these features, sometimes decades after being introduced in POWER.
The other thing is that IBM has been attempting to get more and more of the zArch instruction set executed in hardware with no microcode. Roughly 75% to 80% of instructions are handled by microcode (there is a bit of a range here, as some instructions use microcode only conditionally).
I believe that benchmark uses only about 8 threads, and not very efficiently either. Secondly, it is probably very well optimized for SSE/AVX. So you can imagine that the POWER8 will not be very good at it unless we manually optimize it for AltiVec/VSX. And that is beyond my skills :-)
I'm sure no one is still reading this as I'm posting over a month later, but...
I tested Handbrake/x264 on a bunch of cross-platform builds, including the Raspberry Pi 2. I found it would take 24 RPi2s to match a single i5-4670K. That was a gcc-compiled Handbrake on Raspbian vs the heavily optimized DL copy for Windows. Not too bad, really. Also, x264 seems to scale fairly well with the number of cores. Still, POWER8 unoptimized would be interesting, though not a fair test.
BTW, I'd encourage you to use a more standard Linux version than a 6-month-old experimental little-endian version of Ubuntu. The slides you show advertise support for Ubuntu 14.04 LTS, not 15.04. For something this new you may need the latest, but that is often not the case.
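The scaling comparison above reduces to simple division; a trivial sketch, with throughput numbers made up and picked only so the ratio matches the commenter's reported 24:1:

```python
# Hypothetical encode rates; substitute your own measured fps.
i5_4670k_fps = 48.0   # assumed x264 throughput on the i5-4670K
rpi2_fps = 2.0        # assumed throughput on one Raspberry Pi 2
print(f"RPi2 boards needed to match one i5-4670K: {i5_4670k_fps / rpi2_fps:.0f}")
```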
He never claimed the world revolved around him, he just made a true statement that may be worth consideration. Your response is unnecessarily hostile and annoying.
I would expand Jtaylor1986's statement: I believe that most if not all native English speaking populations use commas for thousands grouping in numbers. Since this site is written in English, it might be worthwhile to stick to conventions of native English speakers.
It's possible that there are many more non-native English speakers reading this site who would prefer dots instead of commas, but I doubt it. Only the site maintainers would know though.
Talking to numerous people around Europe about tech stuff, I can't think of any nation from which someone used the decimal point in their emails instead of a comma in this context. I'd assumed the comma was standard for thousands groupings. So which non-US countries do use the point instead? Anyone know?
Cool on the rest of the world part, but the period vs comma as delimiters in the world numeric system ARE backward. In language (universal, or nearly), a comma is used to denote a pause or minor break, and a period is used to denote the end of a complete thought or section. Applied to numerics, and you end up with the American way of doing it.
For future use: just use a space for thousands separation (that's how I do it on anything that isn't limited to a 7seg-style display), and confuse readers by mixing commas and periods for decimals :P
I like to use a full stop for the decimal point, an apostrophe for the thousands separator, and a comma for separating items in a list, and I don't start a sentence with a digit. One list of numbers may be: 3'500'000, 45.08, 12'500.8, 9'500. Second list: 45'000, 15'000, 25'000. We use apostrophes when we contract words like "don't", so why not use them for contracting numbers where we would otherwise write the words thousand, million, billion, etc.?
North America is on the majority side on this issue. Asia, in particular, is almost completely on the side of using a dot as the decimal separator and a comma to put breaks in long numbers.
Get with the program Europe. The world doesn't revolve around you!
Or better yet, recode this site to detect from which country you are visiting and use the appropriate thousands separator, the appropriate metric system, and even the right time format for the viewer. So many times I have been irritated when I needed to Google something like this just to convert it to local conventions - and this is easy to do natively in the site. (P.S. Sorry for the bad English, not a native speaker.)
The browser already sends a header with Accept-Language, which should be the preferred way for the web site to determine locale. For example, I live in Germany, so my browser will send en, en_UK, en_US, de and DE (this can be set somewhere in the settings). Now you can determine the language and other localisation based upon that, and there are tools where you can set the locale and then display dates, times, etc. based on that.
Many sites instead use Geolocation based on the IP address, which can be really awkward when you travel to a country where you can't read the language.
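A minimal sketch of the Accept-Language approach described above, assuming the relevant locales are installed on the server (locale availability is system-dependent, hence the fallback):

```python
import locale

def format_for_header(value: int, accept_language: str) -> str:
    """Group a number according to the first Accept-Language tag."""
    tag = accept_language.split(",")[0].strip().split(";")[0]  # e.g. "de-DE"
    try:
        locale.setlocale(locale.LC_NUMERIC, tag.replace("-", "_") + ".UTF-8")
    except locale.Error:
        locale.setlocale(locale.LC_NUMERIC, "C")  # fallback: no grouping
    return locale.format_string("%d", value, grouping=True)

print(format_for_header(1_400_000, "de-DE,de;q=0.9,en;q=0.8"))  # 1.400.000
print(format_for_header(1_400_000, "en-US,en;q=0.9"))           # 1,400,000
```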
Even within EU it's not consistent.. UK uses commas for thousands, dot for decimal (and that's why the US and most anglophone countries use the same setup), Germany, Netherlands and France on the other hand favour dots for thousands, commas for decimal, so you see it used there.. and in a great number of their former colonies where the language and culture has stuck.
Dates/times are even funnier; they're also extremely inconsistent and can be very misleading. Think for example of a date like 12/11/2015: in some countries it'll mean the 12th of November, while in others it will mean the 11th of December - and sometimes even the 2015th of November in 11 AD ;)
I remember reading on the GTA IV in-game "Internet" a travel guide to Europe that said the months here have 12 days but there are 30 months a year or something to that effect ;)
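The date ambiguity above is one line of code per convention - a small sketch:

```python
from datetime import datetime

# The same string parses to two different dates depending on which
# convention the reader (or the parser) assumes.
s = "12/11/2015"
print(datetime.strptime(s, "%d/%m/%Y").date())  # 2015-11-12 (day-first)
print(datetime.strptime(s, "%m/%d/%Y").date())  # 2015-12-11 (month-first)
```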
The summary should say something about software support: the raw power is here, but the software stack is very limited. POWER CPUs are good for old legacy apps - SAP, Oracle, etc. - but otherwise it's a dead end. I would like to see a comparison of IBM LPAR virtualization against a Xeon VMware solution, or an Oracle/MySQL benchmark on POWER vs. an Oracle benchmark on Xeon.
A Power server can't even run Crysis - or maybe with some QEMU magic...
I have to disagree with "only old legacy". One of the things I really want to tackle is running Apache Spark on POWER. Spark is one of the most exciting Big Data tools, and a very modern piece of software. IBM claims that the POWER8 is very good at it, and I want to check that.
Very interesting review! I've been a PowerPC fan for many years. I even bought a used PowerMac Quad G5 a few years ago to hack on FreeBSD for PowerPC with (much cheaper than the latest gear).
My only suggestion is that I would love to see you run the same benchmarks with big-endian Linux, since the entire stack is so much more mature for PPC than LE Linux, which as you mention wasn't even supported for many years.
Anyone running Java workloads in particular has no business using LE Linux when Java itself uses big-endian data formats, and IBM has 15+ years of tuning the JDK for big-endian Linux & AIX.
TL;DR is the biggest advantage of LE Linux is that it's easier to port large custom apps that were written for x86 and have too many byte ordering issues to fix. The original motivation to make the PowerPC architecture bi-endian instead of big-endian was the hope of a Windows NT port. When Apple went their own way with hardware, and IBM focused on servers, little-endian mode disappeared. It's good that POWER8 supports LE mode again, for customers who really need it, but it's far from the normal mode.
PS. I've been working on fixing bugs in Clang/LLVM for PowerPC arch (32/64-bit). FreeBSD recently switched from GCC 4.2.1 (final GPLv2 version) to Clang as the default system compiler on x86, but LLVM has some code gen bugs for PowerPC that I'm hoping to squash. For now, it doesn't work well enough for me to recommend trying to use Clang as an alternative to GCC for POWER8 benchmarking. Certainly not for little-endian mode.
BTW, the name of the instruction set architecture is still PowerPC, even though IBM's chips are the POWER series. The predecessor architecture was named POWER, so I always write PowerPC to avoid confusion when referring to the instruction set. The PowerPC 970MP chips in my Quad G5 (2 x dual-core) are a derivative of POWER4.
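To make the byte-order issue in this thread concrete, a minimal sketch: the same four bytes decode to different integers depending on the order assumed, which is the whole LE-vs-BE porting problem in miniature.

```python
import struct
import sys

raw = bytes([0x00, 0x00, 0x00, 0x01])
print(struct.unpack(">I", raw)[0])  # big-endian read: 1
print(struct.unpack("<I", raw)[0])  # little-endian read: 16777216
print(sys.byteorder)                # 'little' on x86 and ppc64le, 'big' on ppc64
```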
That would be incorrect, actually (since they changed it in 2006).
The ISA is currently named the Power ISA (previously the "PowerPC ISA"; the "ISA" part is quite important to distinguish the ISA from the hardware), with the current version being Power ISA 2.07 B.
Underneath each ISA there are a variety of designs that all have nice, different names, from POWER1-8, PowerPC (including the various variants used by Apple, like the G5/970), PowerPC AS, Cell, and most LSI controllers (mostly PowerPC 440 (Power ISA 2.03) based, afaik), etc.
I wish I had a what-if machine to see what IBM would be making had they stayed in the consumer space (well, discounting some consoles they're in - currently only the Wii U on an ancient PowerPC 750 modified for tri-core). And how chunky that PowerBook G5 would have been :P
Probably they'd've ended up making architectural tradeoffs that made their cores a lot more like Intel's. As it is, they can optimize their designs for environments where very high power draw isn't a problem, because the power cost is a small fraction of the TCO on one of their monster servers, and a relatively minor concern for consoles (ie. just dropping the core count gets them to desktop-CPU-level thermals, which are good enough). If they were still selling to Apple, they'd need to be well optimized for performance at only a few watts per core for laptops. Huge L1 caches and massive SMT would be gone, because they'd devour battery power to little benefit on systems that generally run at very low average CPU loads - versus a mega server or mainframe, where if you're not pushing enough work onto it to keep it at a high load level, you're doing it wrong.
While Apple's engineers were given the task of building a PowerBook G5, they knew it could never happen due to thermals and a very arcane chipset. Case in point: the PowerPC 970 could not boot itself; it needed a service processor to calibrate and initialize the frontside bus before the processor could take control. Justifiable for servers, but unnecessary for a consumer laptop.
The expected PowerBook G5s were supposed to be using PA Semi chips. Due to IBM not meeting Apple's goals, they switched to Intel, and the PA Semi deal fell through with it. However, those dealings with Apple did lead to Apple buying PA Semi out a few years later to help design custom ARM SoCs, and eventually the custom ARM cores used in the iPhone/iPad of today.
Would love to hear some thoughts on what possible problems could arise if we rerun our tests on BE Linux. Our best benchmarks are all based upon data stored on our x86 fileservers - so the data is probably stored in LE.
If all you do is just mount the network volume to use the data, then likely nothing at all. While binaries do have to be modified, the file systems themselves are written to store data in a single consistent manner. If you're wondering more if there would be some overhead in translating from LE to BE to work in memory, conceptually the answer is yes but I'd predict it be rather small and dwarfed by the time to transfer data over a network. I'd be curious to see the results.
Ultimately I'd be more concerned with kernel modules for various peripherals when switching between LE and BE versions. Considering that POWER has been BE for a few generations and you did your initial testing using LE, availability shouldn't be an issue: you've been using the version which should have had the most problems in this regard.
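A small sketch of the usual defensive fix when data written on x86 must be read on a BE host: declare the byte order of the on-disk format explicitly instead of relying on native order.

```python
import struct

record = struct.pack("<d", 3.14159)     # what an x86 (LE) fileserver would store
portable = struct.unpack("<d", record)  # "<d" reads LE explicitly: same on any host
native = struct.unpack("=d", record)    # "=d" uses native order: wrong on a BE host
print(portable[0], native[0])
```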
So basically POWER is somewhat competitive with Intel's WORST price/perf chips, which also happen to have the worst memory bandwidth per CPU. It seems nowhere close for the more reasonable $400-$650 Xeons like the D-1520/1540 or the E5-2620 and E5-2630. Sure, IBM has better memory bandwidth than the worst Intels, but if you want more memory bandwidth per $ or per core, then get the E5-2620.
It is definitely not an alternative for applications where performance/watt is important. As you mentioned, Intel offers a much better range of SKUs. But for transactional databases and data mining (traditional or unstructured), I see the POWER8 as a very potent challenger. When you are handling a few hundred gigabytes of data, you want your memory to be reliable. Intel will then steer you to the E7 range, and that is where the POWER8 can make a difference: filling the niche between the E5 and E7.
Especially if you're running software that doesn't scale out very well, these are very competitive. And nowadays even MySQL will scale up nicely to many, many cores.
"Less important, but still significant is the fact that IBM uses SAS disks, which increase the cost of the storage system, especially if you want lots of them."
The Dell servers I've used had SAS controllers, and every SAS controller I've dealt with supported using SATA drives. I'm pretty sure SATA compatibility is in the SAS specification. In fact, the Dell R730 quoted in this review supports SAS drives. There shouldn't be anything stopping you from using the same drives in both servers.
You are absolutely right about SATA drives being compatible with a SAS controller. However, afaik IBM gives you only the choice between their own rather expensive SAS drives and SSDs. And maybe I have overlooked it, but in general Dell lets you only choose between SATA drives and SSDs. This has been the trend for a while: SATA if you want to keep costs low, SSDs for everything else.
And you can mount a storage server made out of commodity hardware over a couple of lanes of 10Gbit Ethernet if you don't want to pay the exotic-hardware supplier's markup on disks.
I forgot to mention: VMX is better known as AltiVec (it's also called "Velocity Engine" by Apple). It's a very nice SIMD extension that was supported by Apple's G4 (Motorola/Freescale 7400/7450) and G5 (IBM PPC 970) Macs, as well as the PPC game consoles.
It would be interesting to compare the Linux VMX crypto acceleration to code written to use the newer native AES & other instructions. In x86 terms, it'd be like SSE-optimized AES vs. the AES-NI instructions.
I had a dual 450 MHz G4 system and AltiVec was quite amazing in iTunes when doing encoding. Between the second processor and the AltiVec putting things into ALAC was very fast (in comparison with other machines at the time like the G3 and the AMD machines I had).
Suggestions on how to do this? OpenSSL 1.0.2 will support the built-in crypto accelerator, but I am not sure how I would be able to see if the crypto code uses VMX.
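One low-tech approach, sketched below: time the EVP path normally and again with OpenSSL's PowerPC capability mask zeroed out, and see whether the numbers move. The OPENSSL_ppccap environment variable is the PPC analogue of OPENSSL_ia32cap; whether a particular 1.0.2 build honours it is an assumption worth verifying first.

```python
import os
import subprocess

def aes_speed(extra_env=None):
    """Run `openssl speed -evp aes-128-cbc` and return its summary line."""
    env = dict(os.environ, **(extra_env or {}))
    out = subprocess.run(["openssl", "speed", "-evp", "aes-128-cbc"],
                         capture_output=True, text=True, env=env)
    return out.stdout.strip().splitlines()[-1]

print("default caps: ", aes_speed())
print("caps masked:  ", aes_speed({"OPENSSL_ppccap": "0"}))  # assumption: build reads this
```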
On a scale of 100, the POWER software ecosystem managed to go from 10 to 20 - a 100% increase, but still very, very low. Will we see POWER CPUs / servers cheap enough to compete with the Xeon E3/E5, where most of the volume is? Competing only with the E7 means competing for just ~10% of the market.
Intel will be moving to a 14nm E7, and I don't see anyone making POWER CPUs at 14nm anytime soon.
Intel's DC business is growing, and it desperately needs competition, such as POWER to combat the E7, and AMD Zen from the bottom.
Nice review! It just confirms my question, however, of "What does IBM do?" Seriously, what do they do anymore? All I see are headlines for things that never come out as actual products. Their servers suck up too much power per unit of performance, they no longer have their own semiconductor foundries, their semiconductor research seems like a bunch of useless paper-tiger stuff, and their much-vaunted AI is better at playing Jeopardy than seemingly any real-world use.
Countdown to complete IBM bankruptcy/spinoff/selloff is closer than ever.
Since the dawn of computing, IBM has been in the business of providing solutions, rather than merely hardware. When you buy IBM you pay a huge amount of money, and what you get for that is support, with some hardware thrown in.
Obviously this only appeals to wealthy customers who don't have or don't want to have an internal support organization that can duplicate what IBM offers. It seems to me that the number of such customers is decreasing over time, but as long as the US government is around, IBM will have at least one customer.
Pretty fair and even-handed review; I don't agree with it all, and definitely feel there is room to learn and improve. Btw, full disclosure: I am a systems architect focusing on Power technology for a Business Partner.
MySQL is definitely relevant, but with the new Linux distros packaging MariaDB in place of MySQL, I would have liked to see an Intel vs Power comparison with this MySQL alternative. MariaDB just announced that v10.1 is delivering over 1M queries per second on POWER8. https://blog.mariadb.org/10-1-mio-qps/
Benchmarks are great; all vendors do them, and most people realize you should take them with a grain of salt. One benefit of Power servers when using PowerVM, the native firmware-based hypervisor, is that it delivers tremendous compute efficiency to VMs. On paper, things like TDP seem higher for Power vs Intel (especially the E5 v3 chips), but when Power servers deliver consolidation ratios of 2-4X (and greater) more VMs per core, the TCA & TCO get real interesting. One person commented that SAP on Power would blow out a budget. It does just the opposite, because you can run in a Tier-2 architecture, obtaining intra-server VM-to-VM efficiencies and compute efficiencies with fewer cores & servers, which impacts everything in the datacenter. Add in increased reliability & serviceability features, and you touch the servers less, which means your business is running longer.
Great feedback. We hope to get access to another POWER8(+) server and build further upon our existing knowledge. We have real-world experience with Spark, so it is definitely on the list. The blog you linked seems to have used specific Spark optimizations for POWER, but the x86 reference system looks a bit "neglected". A real independent test would be very valuable there. The interesting part of Spark is that a good benchmark would also be very relevant for the real world, as peak performance is one of the most important aspects of Spark - in contrast with databases, where maximum performance is only a very small part of the experience.
About MySQL: people have pointed out that the 5.7 version seems to scale a lot better, so that is, together with MariaDB, also on my "to test" list. Redis does not seem relevant for this kind of machine; it is single-threaded, and it is almost impossible to test 160 instances.
The virtualization part is indeed one of the most interesting, but it is a benchmarking nightmare. You have got to keep response times at more or less the same levels while loading the machine with more and more VMs. We did that kind of testing until 2 years ago on x86, but it was very time consuming, and we had a deep understanding of how vSphere worked. Building that kind of knowledge of PowerVM might be beyond our manpower and time :-).
Well, I think you should kick Franz Bourlet for not hooking you up with an IBM technical advocate who actually knew the technology. Such a person could have shown you the ropes and helped you understand the kit better. Again, Franz is a sales guy.
IMHO selecting Ubuntu as the Linux distro did not help you. It's new to the POWER platform and does not have the same robustness as, for example, SLES, which has been around for 10+ years on POWER.
The fact that you are getting better results with gcc-generated code than with XL C shows me that something is not right. And that the IBM JDK isn't working well is also an indicator that something is not right. IMHO selecting Ubuntu did not make things easier for you guys.
And for really optimized code, you need to install and use the high-performance math libraries for POWER (MASS), which are an add-on.
And AFAIR, having only 8 memory modules enables just half the memory bandwidth of the system.
So IMHO IBM didn't help you make their system look good.
But again that is what you get when you get rid of all the clever people :)
Compared to the Pentium 4, the MIPS R16K with loads of L2 cache was a bzip2 beast, outperforming Pentium 4s that ran at twice the clock speed and more. And compression workloads like that are part of what these server processors are built for.
Just curious, do you know of any comparative results anywhere for bzip2 on old MIPS vs. other CPUs? It's not something I've seen mentioned before, at least not with respect to SGIs, but perhaps I can run some tests the next time I obtain a quad-R16K/1GHz (16MB L2) Tezro. The best I have atm is only an R16K/900MHz (8MB L2) single-CPU Fuel and various configs of Tezro and Onyx350 from 4 to 16x 700MHz with 8MB L2. Just a pity SGI never got to employ multi-core MIPS (it was planned, but alas never happened).
Oddly, back when current, MIPS' real strength was fp. Over time it fell behind badly for general int, though for SGI's core markets that didn't really matter ("It's the bandwidth, stupid!" - famous quote from Mashey IIRC). MIPS could have caught up with MDMX and MIPS V ISA, especially with the initially intended merged Cray vector stuff, but again that all fell away once the design talent moved to Intel in 1996/7.
I guess you mean the T7 with the SPARC M7 inside, and not the T5. If so, then yes, the M7 looks quite capable, but unfortunately provides a horrible price/performance ratio. A POWER8 box starts at ~$6.5k, while the T7-1 starts at ~$40k. So on the SPARC front, we'll need to see if Oracle is going to change that with the Sonoma chip.
Thank you Johan for this amazingly well written and well researched article.
I have to agree with a few people here that question your choice of using LE Ubuntu to test. Traditionally people who use Linux on POWER use SUSE, and some use RHEL, but Ubuntu? Nothing against them, and I love apt, but it's just not a mature platform.
Try with something more representative, such as BE SLES, and you will find a vastly different level of ecosystem maturity.
But thanks again, and also thanks to AT for caring about such subjects and publishing these tests.
Thank you for taking the time to write up some constructive feedback. I have years of experience with Ubuntu and Linux, and I wanted to play it safe. Running benchmarks on "new" hardware with a new ISA (from my perspective) is pretty complex. C-ray and 7-zip are the only exceptions; most real server apps (NAMD, ElasticSearch, Spark) depend on many layers of software.
In theory the OS/distro matters more than the ISA for getting applications working. In practice, it might have been better to bet on the distro with the most maturity and adapt our scripts and installation procedures to SUSE.
But as soon as I get the chance, I'll try out BE SUSE or Red Hat on a POWER system.
Blinkenlights is just a mirror, and not the primary mirror either (that would be the vintagecomputers site).
Btw, it's a pity you didn't use the same image sizes & settings as used on the main c-ray site, because then I could have included the results on my page (ie. 'sphfract' at 800x600, 1024x768 with 8X oversampling, and 7500x3500), or did you just use the same settings that Phoronix employs?
Also, John Tsiombikas, the guy who wrote C-ray, told me some interesting things about the test and how it works (info included on the page), most especially that it is highly vulnerable to compiler optimisations, which can produce results that are even less realistic than real-life workloads. I'm glad though that you did at least use the sphfract test, since at a sensible resolution or with oversampling it easily pushes the test out of just L1 (the 'scene' test is much smaller). But yeah, overall, c-ray was never intended to be used as a benchmark; it's just taken off somehow, perhaps because the scanline method of threading makes it scale very well.
Hmm, I really must sort out the page formatting one of these days, and move the most complex test tables to the top. Never seem to find the time...
Thanks!!
Ian.
PS. I always obtained the best results by having more threads than the no. of cores/CPUs, or is this something which doesn't work with non-MIPS systems?
I did not know you used 7500x3500; my testing was inspired by what the rest of the benchmarking community (Phoronix, ServeTheHome) was using (obviously, 1024x768 is too small for current servers).
Most welcome! And I really should move the more complex tests to the top of the page...
Oh, my wording about threads was not what I'd intended. What I meant was, the no. of threads being larger than the supported no. of hardware threads. Thus, for a 12-core Power8 with 8 threads per core, try using 192 or 384 threads, instead of just the nominal 96 one might assume would make sense.
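For anyone who wants to try that sweep, a rough harness along those lines (the busy-loop stands in for a real workload such as a c-ray scanline; on Linux, os.cpu_count() reports logical CPUs, i.e. hardware threads - 96 on a 12-core SMT8 POWER8):

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def busy(n: int) -> int:
    """CPU-bound stand-in for one unit of rendering work."""
    acc = 0
    for i in range(n):
        acc += i * i
    return acc

if __name__ == "__main__":
    logical = os.cpu_count() or 1
    tasks = [200_000] * (8 * logical)  # fixed amount of work for every run
    for workers in (logical, 2 * logical, 4 * logical):
        start = time.perf_counter()
        with ProcessPoolExecutor(max_workers=workers) as pool:
            list(pool.map(busy, tasks))
        print(f"{workers:4d} workers: {time.perf_counter() - start:.2f}s")
```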
The Power scale-out boxes will save on your running and software costs, as you can reduce your software licensing and server footprint.
With the OpenPOWER Foundation, you now have companies such as Tyan and Wistron who also create their own POWER8 servers and sell them independently of IBM. If you have not looked at the OpenPOWER Foundation and the innovation it brings through community and collaboration, you're missing out big time!
It's funny how this article is trying to "sell" me the system, but I'm still not impressed. It costs more, delivers less performance, and uses more power at idle and load than the Intel system.
Having a lot of software that isn't really well ported is probably going to remain a problem for POWER8 for years to come, since so few people have access to these kinds of systems and the cost is prohibitive. The great thing with x86 and ARM is that you can use them at home/work pretty easily without shelling out a lot of money. On x86 you can be sure that if your software builds and runs locally, it will also run on your server.
1. I submit that the headline is misleading. Intel x86 does not compete with POWER at the high end. The POWER L & LC lines of servers are comparable to x86-based servers. IBM POWER is taking the battle to Intel's home turf.
2. The analysis leaves out the cost of software. Many organizations use commercial software that is priced per core. If POWER can do with 10 cores what Intel does with 18 cores, that means HUGE savings.
3. OpenPOWER is a huge move. I think the market will start seeing the results soon.
An excellent review as always, Johan. (Haha... to zeeBomb: it is my understanding that Johan doesn't post as often as he might otherwise like to because testing servers/enterprise computing solutions takes a LOT longer than testing/benching consumer-level systems. Some of the HPC applications that I run take hours to days for each run, and you're running those tests over and over again; before you know it, a month has gone by (or you've run out of time with the system), or you have to purposely cut testing short so that you can cover a variety of software.)
It's unfortunate that IBM never ported AIX to x86 (unlike Solaris). I think more people would try to get into it if the cost of entry (even just to learn) weren't so high. I've looked at getting an old POWER4 system before for that purpose, but by now those systems are so old and slow that it's like "what's the point?" I think that IBM is literally pricing themselves into extinction (along with their entire hardware/software ecosystem). Unfortunately for many businesses, AIX POWER servers still run their mainframe/backend, which means that if you want to get paid $100k+ out of college - go learn AIX on POWER. As the current generation of sysadmins ages and retires out, companies are going to have a hard time finding qualified people, and eventually they will have to pay top dollar just to attract people into the field. (Unless they decide to move everything over to the x86/Xeon/Linux world. But for some mainframes (like those at financial institutions), that's DEFINITELY easier said than done.)
Technically this is not true. IBM had a working version of AIX running on PS/2 systems as late as the 1.3 release. Unfortunately support was withdrawn and future releases of AIX were not compiled for x86 compatible processors. One can still find a copy of this release if one knows where to look. It's completely useless to anyone but a museum or curious hobbyist, but it's out there.
I was reading this article, and I found it interesting. Since I am a developer for the IBM XL compiler, the comparisons between GCC and XL were particularly interesting. I tried to reproduce the results you are seeing for the LZMA benchmark. My results were similar, but not exactly the same.
When I compared GCC 4.9.1 (I know, a slightly different version than you) to XL 13.1.2 (I assume this is the version you used), I saw XL consistently ahead of GCC, even when I used -O3 for both compilers.
I'm still interested in trying to reproduce your results, so I can see what XL can do better, and I have a couple of questions about areas that could be different.
1) What version of the XL compiler did you use? I assumed 13.1.2, but it is worth double-checking.
2) Which version of the 7-zip software did you use? I picked up p7zip 15.09.
3) Also, I noticed that when the POWER8 machine was running at full capacity (for me that was 192 threads on a 24-core machine), the results would fluctuate a bit. How many runs did you do for each configuration? Were the results stable?
4) Did you try XL at the less aggressive and more stable options like "-O3" or "-O3 -qhot"?
Other than the ridiculous price of CDIMMs, the power efficiency just doesn't look healthy. For data centers leasing out their hardware - Amazon AWS, Google App Engine, Azure, Rackspace, etc. - clients who pay for hardware yet fail to use their allocation significantly help the bottom line through reduced overheads. For others, high usage is a mandatory part of the ROI equation during the asset's operating life, so power consumption is a real cost. Even with our small cluster of 12 nodes, power efficiency is a real consideration, let alone for companies standardizing on IBM and utilising 100s or 1000s of nodes that are arguably less efficient.
Perhaps you could devise some sort of theoretical total cost of ownership breakdown for these articles. My biggest question after all of this is, which one gets the most work done with the lowest overheads. Don't get me wrong though, I commend you and AnandTech on the detail you already provide.
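A minimal sketch of such a TCO comparison (every number below is a placeholder; plug in the measured wall power, quoted purchase prices, and your local energy tariff):

```python
def three_year_tco(purchase_usd: float, avg_watts: float,
                   kwh_price: float = 0.10, pue: float = 1.5,
                   years: int = 3) -> float:
    """Purchase price plus energy cost, with cooling folded in via PUE."""
    hours = years * 365 * 24
    energy_usd = avg_watts / 1000.0 * hours * kwh_price * pue
    return purchase_usd + energy_usd

print(f"POWER8 S822L: ${three_year_tco(12_000, 500):,.0f}")  # placeholder figures
print(f"Xeon E5 box:  ${three_year_tco(10_000, 400):,.0f}")
```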
Der2 - Friday, November 6, 2015 - link
Life's good when you got power.

BlueBlazer - Friday, November 6, 2015 - link
Aye, the power bills will skyrocket.
zeeBomb - Friday, November 6, 2015 - link
New author? Niiiice!

Ryan Smith - Friday, November 6, 2015 - link
Ouch! Poor Johan. =( Johan is in fact the longest-serving AT editor. He's been here almost 11 years, just a bit longer than I have.

hans_ober - Friday, November 6, 2015 - link
@Johan you need to post this stuff more often, people are forgetting you :)
well he started with "niiiiice". Could have been much worse. Hi zeeBomb, I am Johan, 43 years old and already 17 years active as a hardware editor. ;-)JanSolo242 - Friday, November 6, 2015 - link
Reading Johan reminds me of the days of AcesHardware.com.joegee - Friday, November 6, 2015 - link
The good old days! I remember so many of the great discussions/arguments we had. We had an Intel guy, an AMD guy, and Charlie Demerjian. Johan was there. Mike Magee would stop in. So would Chris Tom, and Kyle Bennett. It was an awesome collection of people, and the discussions were FULL of real, technical points. I always feel grateful when I think back to Ace's. It was quite a place.JohanAnandtech - Saturday, November 7, 2015 - link
And many more: Paul Demone, Paul Hsieh (K7-architect), Gabriele svelto ... Great to see that people remember. :-)joegee - Thursday, November 19, 2015 - link
It was an awesome community. I learned so much from everyone. I remember the days when we'd write pages arguing whether AMD's new 64 bit extension to x86 was truly 64 bit. The discussions could be heated, but they were seldom rude. I wish there were something similar today. :/Kevin G - Saturday, November 7, 2015 - link
Aces brings back memories for me as well even though I mainly lurked there.A solid chunk of that group have moved over to RWT.
joegee - Thursday, November 19, 2015 - link
What is RWT?psychobriggsy - Friday, November 6, 2015 - link
Get back to Aces Hardware you!JohanAnandtech - Saturday, November 7, 2015 - link
Like Ryan said, I have been working 11 years at Anand. In other words, it is great working at Anandtech. AT is one of the few tech sites out there that still values deep analysis and allows the editors to take the time to delve deep.joegee - Friday, November 6, 2015 - link
And still writing as well as you ever did! Keep up the good work, Johan!rrossi - Saturday, November 7, 2015 - link
Dear Johan nice article. Did u ever consider sparse system solving (with preconditioning) as a sensitive benchmark? It is a crucial stage of most scientific applications and it is a bandwidth limited operation with a high degree of parallelism. It would be definitely interesting to see how the power 8 fares on such a test. If you are interested I think I could provide a pointer to a simple benchmark (to be compiled). If you feel it may be interesting just drop me an email.JohanAnandtech - Saturday, November 7, 2015 - link
Interested... mail me, I don't have your mail. See the author link on top of the article.Ian Cutress - Saturday, November 7, 2015 - link
I'd also like to be pointed to such a benchmark for workstation style tests on x86. Please email ian@anandtech.com with info :)MartinT - Friday, November 6, 2015 - link
Johan's been with Anandtech for more than a decade, and has been publishing on the subject since the late 90s. But I very much second your "Niiiiice!," as reading his name always reminds me of the old days over at aceshardware, and I'm always looking forward to his insights!
LemmingOverlord - Friday, November 6, 2015 - link
Mate... Bite your tongue! Johan is THE man when it comes to Datacenter-class hardware. Obviously he doesn't get the same exposure as the personal technology guys, but he is definitely one of the best reviewers out there (inside and outside AT).
joegee - Friday, November 6, 2015 - link
He's been doing class A work since Ace's Hardware (maybe before, I found him on Ace's though.) He is a cut above the rest.
nismotigerwvu - Friday, November 6, 2015 - link
Johan, I think you had a typo in the first sentence of the 3rd paragraph on page 1.
"After seeing the reader interestin POWER8 in that previous article..."
Nice read overall and if I hadn't just had my morning cup of coffee I would have missed it too.
Ryan Smith - Friday, November 6, 2015 - link
Good catch. Thanks!
Essence_of_War - Friday, November 6, 2015 - link
That performance per watt, it is REALLY hard to keep up with the Xeons there!
III-V - Friday, November 6, 2015 - link
IBM's L1 data cache has a 3-cycle access time, and is twice as large (64KB) as Intel's, and I think I remember it accounting for something like half the power consumption of the core.
Essence_of_War - Friday, November 6, 2015 - link
Whoa, neat bit of trivia!
JohanAnandtech - Saturday, November 7, 2015 - link
Interesting. Got a link/doc to back that up? I have not found such detailed architectural info.
Henriok - Friday, November 6, 2015 - link
Very nice to see tests of non-x86 hardware. It's interesting to see a test of the S822L when IBM just launched two even more price competitive machines, designed and built by Wistron and Tyan, as pure OpenPOWER machines: the S812LC and S822LC. These can't run AIX, and are substantially cheaper than the IBM designed machines. They might lack some features, but they would probably fit nicely in this test. And they are sporting the single chip 12 core version of the POWER8 processor (with cores disabled).
DanNeely - Friday, November 6, 2015 - link
"The server is powered by two redundant high quality Emerson 1400W PSUs."The sticker on the PSU is only 80+ (no color). Unless the hotswap support comes with a substantial penalty (if so why); this design looks to be well behind the state of the art. With data centers often being power/hvac limited these days, using a relatively low efficiency PSU in an otherwise very high end system seems bizarre to me.
hissatsu - Friday, November 6, 2015 - link
You might want to look more closely. Though it's a bit blurry, I'm almost certain that's the 80+ Platinum logo, which has no color.
DanNeely - Friday, November 6, 2015 - link
That's possible; it looks like there's something at the bottom of the logo. Google image search shows 80+ platinum as a lighter silver/gray than 80+ silver; white is only the original standard.
Shezal - Friday, November 6, 2015 - link
Just look up the part number. It's a Platinum :)
The12pAc - Thursday, November 19, 2015 - link
I have a S814, it's Platinum.
johnnycanadian - Friday, November 6, 2015 - link
Oh yum! THIS is what I still love about AT: non-mainstream previews / reviews. REALLY looking forward to more like this. I only wish SGI still built workstation-level machines. :-(
mapesdhs - Tuesday, November 10, 2015 - link
Indeed, but it'd need a hefty change in direction at SGI to get back into workstations again, so very unlikely for the foreseeable future. They certainly have the required base tech (NUMALink6, MPI offload, etc.), namely lots of sockets/cores/RAM coupled with GPUs for really heavy tasks (big data, GIS, medical, etc.), ie. a theoretical scalable, shared-memory workstation. But the market isn't interested in advanced performance solutions like this atm, and the margin on standard 2/4-socket systems isn't worthwhile, it'd be much cheaper to buy a generic Dell or HP (plus, it's only above this no. of sockets that their own unique tech comes into play). Pity, as the equivalent of a UV 30/300 workstation would be sweet (if expensive), though for virtually all of the tasks discussed in this article, shared memory tech isn't relevant anyway. The notion of connectable, scalable, shared memory workstations based on NV gfx, PCIe and newer multi-core MIPS CPUs was apparently brought up at SGI way back before the Rackable merger, but didn't go anywhere (not viable given the financial situation at the time). It's a neat concept, eg. imagine being able to connect two or more separate ordinary 2/4-socket XEON workstations together (each fitted with, say, a couple of M6000s) to form a single combined system with one OS instance and resources pool, allowing users to combine & split setups as required to match workloads, but it's a notion whose time has not yet come.
Of course, what's missing entirely is the notion of advanced but costly custom gfx, but again there's no market for that atm either, at least not publicly. Maybe behind the scenes NV makes custom stuff the way SGI used to for relevant customers (DoD, Lockheed, etc.), but SGI's products always had some kind of commercially available equivalent from which the custom builds were derived (IRx gfx), whereas atm there's no such thing as a Quadro with 30000 cores and 100GB RAM that costs $50K and slides into more than one PCIe slot which anyone can buy if they have the moolah. :D
Most of all though, even if the demand existed and the tech could be built, it'd never work unless SGI stopped using its pricing-is-secret reseller sales model. They should have adopted a direct sales setup long ago, order on the site, pricing configurator, etc., but that never happened, even though the lack of such an option killed a lot of sales. Less of an issue with the sort of products they sell atm, but a better sales model would be essential if they were to ever try to sell workstations again, and that'd need a huge PR/sales management clearout to be viable.
Pity IBM couldn't pay NV to make custom gfx, that'd be interesting, but then IBM quit the workstation market as well.
Ian.
mostlyharmless - Friday, November 6, 2015 - link
"There is definitely a market for such hugely expensive and robust server systems as high end RISC machines are good for about 50.000 servers. "Rounding error?
DanNeely - Friday, November 6, 2015 - link
50k clients would be my guess.
FunBunny2 - Friday, November 6, 2015 - link
(dot) versus (comma) most likely. Euro centric versus 'Murcan centric.
DanNeely - Friday, November 6, 2015 - link
If that was the case, a plain 50 would be much more appropriate.
extide - Friday, November 6, 2015 - link
No, he meant that in a lot of the European countries they use the dot as a comma, so it would be 50.000 to mean 50 thousand.
Murloc - Sunday, November 8, 2015 - link
the international system dictates that , and . are the same thing, and as a separator you should use a space. In many countries in Europe, ' is also used. That's fine too as there is no ambiguity.
Using . and , for anything that is not the decimal separator in international websites just creates confusion imho.
I guess AT doesn't have a style book though.
duploxxx - Friday, November 6, 2015 - link
Nice review. But Xeon is not 95% of the market. AMD is still just a bit above 5% on its own, so that deserves a bit of salt :) Not to mention the fact that competition is good for all of us. If reviewers continue like this, readers will come to think there is no competition.
silverblue - Friday, November 6, 2015 - link
I'm left wondering what a Steamroller-based 16+ core CPU would do here, considering multithreading is better than with previous models. Yes, the Xeons have a large single-threading lead, but more cores = good in the server world, not to mention that such a CPU would severely undercut the price of the competition.
Shame it isn't ever going to happen!
lmcd - Friday, November 6, 2015 - link
Or even an Excavator! It's a shame AMD didn't just keep Bulldozer developing internally until at least Piledriver, and iterate on Thuban.
Kevin G - Saturday, November 7, 2015 - link
AMD killed off both Steamroller and Excavator server chips early on as the Bulldozer and Piledriver chips weren't as competitive. More importantly, OEMs simply were not interested even if those parts were upgrades based upon existing designs. Thus the great AMD server drought began, as they effectively have left that market and are hoping for a return with Zen.
Also I should point out that Seattle, AMD's first ARM based Opteron, has yet to arrive. This was supposed to be out a year ago and keep AMD's server business going throughout 2015 during the wait for Zen and K12 in 2016. Well, K12 has already been delayed into 2017 and Seattle is nowhere to be found in commercial systems (there are a handful of Seattle developer boards).
JoeMonco - Saturday, November 7, 2015 - link
When you account for only 5% of the market while the other side commands 95%, you aren't really much of a credible competitor.
xype - Sunday, November 8, 2015 - link
That’s not always correct, though. You can have 5% of the market and 20% of the profits, for example, which would put you in a way better position than your competitors (because only a small increase in market share would pay big time).Murloc - Sunday, November 8, 2015 - link
that applies more to consumer products, e.g. Apple.
dgingeri - Friday, November 6, 2015 - link
I've been dealing with IBM Power based machines for 5 years now. Such experience has only given me a major disdain for AIX. I do NOT advise it for anyone. It sucks to work on. There is a certain consistent, spartan logic to it, but it is difficult to learn, and learning materials are EXTREMELY expensive. I never liked the idea of paying $12,000 for a one week class that taught me barely a tenth of what I needed to know to run an AIX network. (My company paid for the class, but I could not get them to pay for the rest of them, for some reason.) This makes people who can support AIX extremely expensive to employ. Figure on paying twice the rate of a Windows admin in order to employ an AIX admin. Then there is the massive expense of maintenance agreements. Even the software only maintenance agreement, just to get patches for AIX, is $4000 per year per system. They may be competitive in cost up front, but they drain money like vampires to maintain.
Even the most modern IBM Power based machine takes 20-30 minutes to reboot or power up due to POST diagnostics. That alone is annoying enough to make me avoid AIX as much as I can.
psychobriggsy - Friday, November 6, 2015 - link
So you are complaining that your job's selection of hardware has made you earn twice as much?
dgingeri - Friday, November 6, 2015 - link
No, because I don't earn twice as much. I'm not fully trained in AIX, so I have to muddle my way through dealing with the test machines we have. We don't use them for full production machines, just for testing software for our customers. (Which means I have to reinstall the OS on at least one of those machines about every month or so. That is a BIG pain in the behind due to the boot procedure. Where it takes a couple hours to reinstall Windows or Linux, it takes a full day to do it on an AIX machine.) I'm trying to advise people to NOT use AIX. It's an awful operating system. I'm also advising people NOT use IBM Power based machines because they are extremely aggravating to work on. Overall, it costs much more to run IBM Power machines, even if they aren't running AIX, than it does to run x86 machines. The up front cost might look competitive, but the maintenance costs are huge. Running AIX on them makes it an order of magnitude more expensive.
serpint - Friday, November 6, 2015 - link
I suggest reading the NIM A-Z handbook. It shouldn't take you more than 10 minutes to fully deploy an AIX system fully built and installed with software. As with Linux, it also shouldn't take more than about 10 minutes to install and fully deploy a server if you have any experience scripting installs. The developerworks community inside IBM is possibly the best free resource you could hope for. Also the redbooks.ibm.com site.
Compared to most *NIX flavors, AIX is UNIX for dummies.
agtcovert - Tuesday, November 10, 2015 - link
If you had a NIM server setup and were using LPARs, loading a functional image of AIX should take 10 minutes flat, on a 1G network. If you're loading AIX on a physical machine without using the virtualization, you're wasting the server.
agtcovert - Tuesday, November 10, 2015 - link
I've worked on AIX platforms extensively for about the same amount of time. First, most of these purchases go through a partner, and yours must've sucked because we got great support from our IBM partner -- free training, access to experts, that sort of thing. Second, I always love the complaining about the cost of the hardware, etc. If you're buying big iron Power servers, the maintenance cost should be near irrelevant. And again, your partner should take care to negotiate that into the deal for 3-5 years, ensuring you have access to updates.
The other thing no one ever talks about is *why* you buy these servers. Why do they take so long to boot? Well, for the frame itself, it's a deep POST. But then, mine were never rebooted in 4 years, and that includes firmware upgrades (done online) and a couple of interface card swaps (also done online with no service disruption). Do that on x86. So reason #1 -- RAS, at the hardware level. Seriously, how often did you need to reboot the frame?
Reason #2 -- for large enterprises, you can do so much with relatively few cores that these machines lead to huge licensing savings in Oracle and IBM software (see the worked example below). For us, it was over $1m a year ongoing. And no, switching to other software was not an option. We could run an Oracle RAC on 4 cores of Power 7 (at the time) versus the 32 x86 cores it was on previously. That saves a lot of $.
The machine reviewed does not run AIX. It's Linux only. So the maintenance, etc. you mention isn't even relevant.
There are still things that are annoying I suppose. AIX is steeped in legacy to some degree, and certainly not as easy to manage as a Linux box. But there are a lot of guides out there for free -- it took me about a month to be fully productive. And the support costs you pay for -- well, if I ran into a wall, I just opened a PMR. IBM was always helpful.
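[Ed: a back-of-envelope sketch of the licensing mechanism agtcovert describes, using Oracle's published processor core factors (1.0 for POWER7, 0.5 for x86) and an illustrative Enterprise Edition list price of roughly $47,500 per processor license; the figures in any real deal are negotiated. 32 x86 cores x 0.5 = 16 processor licenses, versus 4 POWER7 cores x 1.0 = 4 licenses. That is 12 fewer licenses, or roughly $570,000 less at list price, plus the ~22% annual support fee on licenses you never bought. The prices here are assumptions, but they show how a faster core can pay for itself in per-core-licensed software.]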
nils_ - Wednesday, November 11, 2015 - link
I'm mostly working in Linux Devops now, but I remember dreading using all the "classic" Unix machines at my first "real" job 12 years ago. We ran a few IRIX and AIX boxes which were ancient even then. Hell, even the first thing I did on my work Macbook was to replace the BSD userland with GNU wherever possible. It's hard to find any information on them and any learning materials are expensive and usually on dead trees. They pretty much want to sell training, consulting etc. along with the often non-competitive hardware prices, since these companies don't actually WANT to sell hardware. They want to sell everything that surrounds it.
retrospooty - Friday, November 6, 2015 - link
The problem with server chips is that it's about platform stability. IBM (and others) dropped off the face of the Earth and, as mentioned above, Intel now has 95% of the market. This chip looks great but will companies buy into it en masse? What if IBM makes another choice to drop off the face of the Earth again and your platform is dead ended? I would have to think long and hard about going with them at this point.
FunBunny2 - Friday, November 6, 2015 - link
Not likely. The mainframe z machines are built using POWER blocks.
Kevin G - Friday, November 6, 2015 - link
POWER and System Z are two different architectures. Case in point, POWER is a RISC design introduced in the 90's whereas the System Z mainframes can trace their roots to a CISC design from the 1960's (and it is still possible to run some of that 1960's code unmodified). They do share a handful of common parts (think the CDIMMs) to cut down on support costs.
plonk420 - Friday, November 6, 2015 - link
can you run an x264 benchmark on it?? x)
FunBunny2 - Friday, November 6, 2015 - link
"The z10 processor was co-developed with and shares many design traits with the POWER6 processor, such as fabrication technology, logic design, execution unit, floating-point units, bus technology (GX bus) and pipeline design style, i.e., a high frequency, low latency, deep (14 stages in the z10), in-order pipeline." from the Wiki.Yes, the z continues the CISC ISA from the 360 (well, sort of) rather than hardware RISC, but as Intel (amongst others) has demonstrated, CISC ISA doesn't have to be in hardware. In fact, the 360/30 (lowest tier) was wholly emulated, as was admitted then. Today, we'd say "micro-instructions". All those billions of transistors could have been used to implement X86 in hardware, but Intel went with emulation, sorry micro-ops.
What matters is the underlying fab tech. That's not going anywhere.
FunBunny2 - Friday, November 6, 2015 - link
^^ should have gone to KevinG!!
Kevin G - Saturday, November 7, 2015 - link
The GX bus in the mainframes was indeed shared by POWER chips as that enabled system level component sharing (think chipsets). However, attributes like the execution unit and the pipeline depth are different between the POWER6 and z10. At a bird's eye view, they do look similar but the implementation is genuinely different.
Other features like SMT were introduced with the POWER5 but only the most recent z13 chip has 2 way SMT. Features like out-of-order execution, SMT, SIMD were once considered too exotic to validate in the mainframe market that needed absolute certainty in its hardware states. However, recent zArch chips have implemented these features, sometimes decades after being introduced in POWER.
The other thing is that IBM has been attempting to get more and more of the zArch instruction set to be executed by hardware and not microcode. Roughly 75% to 80% of instructions are handled by microcode (there is a bit of a range here as some are conditional to use microcode).
JohanAnandtech - Saturday, November 7, 2015 - link
I believe that benchmark uses only about 8 threads, and not very efficiently either. Secondly, it is probably very well optimized for SSE/AVX. So you can imagine that the POWER8 will not be very good at it, unless we manually optimize it for Altivec/VSX. And that is beyond my skills :-)
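[Ed: to give a flavour of what "manually optimize it for Altivec/VSX" involves: hand-written SIMD kernels must be duplicated per instruction set. A minimal, hypothetical sketch in C intrinsics; real x264 kernels are far more involved, and this assumes 16-byte-aligned arrays and n divisible by 4:]

#if defined(__ALTIVEC__)
#include <altivec.h>
/* Add two float arrays four elements at a time using VMX/Altivec. */
void add4(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        vector float va = vec_ld(0, a + i);   /* 16-byte aligned load */
        vector float vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, dst + i);
    }
}
#elif defined(__SSE__)
#include <xmmintrin.h>
/* The same kernel, as it would be written for x86 SSE. */
void add4(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));
    }
}
#endif

[Ed: every such kernel in a codec has to be written, validated and tuned twice, which is why "just port the SSE code" is rarely trivial.]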
UrQuan3 - Monday, December 21, 2015 - link
I'm sure no one is still reading this as I'm posting over a month later, but... I tested Handbrake/x264 on a bunch of cross-platform builds including the Raspberry Pi 2. I found it would take 24 RPi2s to match a single i5-4670K. That was a gcc compiled Handbrake on Raspbian vs the heavily optimized DL copy for Windows. Not too bad really. Also, x264 seems to scale fairly well with the number of cores. Still, POWER8 unoptimized would be interesting, though not a fair test.
BTW, I'd encourage you to use a more standard Linux version than a 6-month experimental little-endian version of Ubuntu. The slides you show advertise support for Ubuntu 14.04 LTS, not 15.04. For something this new, you may need the latest, but that is often not the case.
stun - Friday, November 6, 2015 - link
@Johan You might want to fix "the platform" hyperlink at the bottom of page 4. It is invalid.
JohanAnandtech - Friday, November 6, 2015 - link
Thanks and fixed.
Ahkorishaan - Friday, November 6, 2015 - link
Couldn't read past the graphic on page 1. It's 2015 IBM, time to use a font that doesn't look like a toddler's handwriting.
xype - Sunday, November 8, 2015 - link
To be fair, it seems that the slide is meant for management types… :P
Jtaylor1986 - Friday, November 6, 2015 - link
Using decimals instead of commas to denote thousands is jarring to your North American readers.
Mondozai - Friday, November 6, 2015 - link
That's too bad. Over 90% of the world population exists outside of it and even if you look at the HPC market, the vast majority of that is, too. The world doesn't revolve around you. Get out of your bubble.
bji - Friday, November 6, 2015 - link
He never claimed the world revolved around him, he just made a true statement that may be worth consideration. Your response is unnecessarily hostile and annoying. I would expand Jtaylor1986's statement: I believe that most if not all native English speaking populations use commas for thousands grouping in numbers. Since this site is written in English, it might be worthwhile to stick to conventions of native English speakers.
It's possible that there are many more non-native English speakers reading this site who would prefer dots instead of commas, but I doubt it. Only the site maintainers would know though.
Jtaylor1986 - Friday, November 6, 2015 - link
You read my mind :)
mapesdhs - Tuesday, November 10, 2015 - link
Talking to numerous people around Europe about tech stuff, I can't think of any nation from which someone used the decimal point in their emails instead of a comma in this context. I'd assumed the comma was standard for thousands groupings. So which non-US countries do use the point instead? Anyone know?
lmcd - Friday, November 6, 2015 - link
Cool on the rest of the world part, but the period vs comma as delimiters in the world numeric system ARE backward. In language (universal, or nearly), a comma is used to denote a pause or minor break, and a period is used to denote the end of a complete thought or section. Apply that to numerics and you end up with the American way of doing it.
^my take
JohanAnandtech - Saturday, November 7, 2015 - link
Just for the record, this was not an attempt to nag the US people. Just the mighty force of habit.
ZeDestructor - Saturday, November 7, 2015 - link
For future use: just use a space for thousands separation (that's how I do it on anything that isn't limited to a 7seg-style display), and confuse readers by mixing commas and periods for decimals :P
tygrus - Sunday, November 8, 2015 - link
I like to use a full stop for the decimal point, an apostrophe for the thousands separator, a comma for separating items in a list, and I don't start a sentence with a digit. One list of numbers may be: 3'500'000, 45.08, 12'500.8, 9'500. Second list: 45'000, 15'000, 25'000. We use apostrophes when we contract words like don't, so why not use it for contracting numbers where we would otherwise have the words thousand, millions, billions etc.?
mapesdhs - Tuesday, November 10, 2015 - link
I have a headache in my eyeballs! :D
ws3 - Friday, November 6, 2015 - link
North America is on the majority side on this issue. Asia, in particular, is almost completely on the side of using a dot as the decimal separator and a comma to put breaks in long numbers. Get with the program, Europe. The world doesn't revolve around you!
JohanAnandtech - Saturday, November 7, 2015 - link
Ok, Europe can adopt the dot, but maybe the US can adopt the metric system like the rest of the world? :-)
bitaljus - Saturday, November 7, 2015 - link
Or better yet, recode this site to detect what country you are visiting from and use the appropriate thousands symbol, the appropriate metric system, and even the local time format for the viewer. So many times I have been irritated when I needed to Google something like this just to convert it to local conventions, and this is easy to do natively in the site. (P.S. Sorry for bad English, not native)
nils_ - Wednesday, November 11, 2015 - link
The browser already sends a header with Accept-Language, which should be the preferred way for the web site to determine locale. For example, I live in Germany so my browser will send en, en_UK, en_US, de and DE (this can be set somewhere in the settings). Now you can determine the language and other localisation based upon that, and there are tools where you can set the locale to then display dates, times etc. based on that. Many sites instead use Geolocation based on the IP address, which can be really awkward when you travel to a country where you can't read the language.
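[Ed: POSIX locales make the separator question a pure display concern; the ambiguity only arises when numbers are hard-coded as text. A small C illustration; the apostrophe printf flag is a POSIX extension (supported by glibc), and the locale names assume those locales are installed on the system:]

#include <stdio.h>
#include <locale.h>

int main(void)
{
    /* The ' flag asks printf for locale-aware digit grouping. */
    setlocale(LC_NUMERIC, "en_US.UTF-8");
    printf("%'d\n", 50000);   /* prints 50,000 */

    setlocale(LC_NUMERIC, "de_DE.UTF-8");
    printf("%'d\n", 50000);   /* prints 50.000 */
    return 0;
}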
ZeDestructor - Saturday, November 7, 2015 - link
Even within the EU it's not consistent: the UK uses commas for thousands and a dot for decimals (and that's why the US and most anglophone countries use the same setup), while Germany, the Netherlands and France on the other hand favour dots for thousands and commas for decimals, so you see it used there, and in a great number of their former colonies where the language and culture have stuck.
Dates / times are even funnier; that's also extremely inconsistent and can be very misleading. Think for example of a date like 12/11/2015: in some countries it'll mean the 12th of November while in others it will mean the 11th of December, and sometimes even the 2015th of November in 11 AD ;) I remember reading on the GTA IV in-game "Internet" a travel guide to Europe that said the months here have 12 days but there are 30 months a year, or something to that effect ;)
powchie - Friday, November 6, 2015 - link
Let Johan do more reviews. I've been reading his stuff since the late 90's and consider him the best, followed by Anand.
Ryan Smith - Saturday, November 7, 2015 - link
Trust me when I say that if I could whip Johan any harder and make him work any faster I would be doing just that. ;-)
juhatus - Saturday, November 7, 2015 - link
When the Balrog showed up for work he had everyone fired.
ruthan - Friday, November 6, 2015 - link
The summary should say something about software support: the raw power is here, but the software stack is very limited.
POWER CPUs are good for old legacy apps - SAP, Oracle, etc. - but otherwise it's a dead end.
I would like to see a comparison of IBM LPAR virtualization against a Xeon VMware solution, or an Oracle / MySQL benchmark on POWER vs. an Oracle benchmark on Xeon.
A POWER server can't even run Crysis, or maybe with some QEMU magic...
ruthan - Friday, November 6, 2015 - link
Pleas Ad edit button like other civilized sites, i was hurry when i wrote it.
Michael Bay - Saturday, November 7, 2015 - link
You are probably still hurry.
Or just not civilized enough.
JohanAnandtech - Saturday, November 7, 2015 - link
I have to disagree with "only old legacy". One of the things I really want to tackle is running Apache Spark on POWER. Spark is one of the most exciting Big Data tools; it is a very modern piece of software. IBM claims that the POWER8 is very good at it, and I want to check that.
Jake Hamby - Friday, November 6, 2015 - link
Very interesting review! I've been a PowerPC fan for many years. I even bought a used PowerMac Quad G5 a few years ago to hack on FreeBSD for PowerPC (much cheaper than the latest gear). My only suggestion is that I would love to see you run the same benchmarks with big-endian Linux, since the entire stack is so much more mature for PPC than LE Linux, which as you mention wasn't even supported for many years.
Anyone running Java workloads in particular has no business using LE Linux when Java itself uses big-endian data formats, and IBM has 15+ years of tuning the JDK for big-endian Linux & AIX.
TL;DR is the biggest advantage of LE Linux is that it's easier to port large custom apps that were written for x86 and have too many byte ordering issues to fix. The original motivation to make the PowerPC architecture bi-endian instead of big-endian was the hope of a Windows NT port. When Apple went their own way with hardware, and IBM focused on servers, little-endian mode disappeared. It's good that POWER8 supports LE mode again, for customers who really need it, but it's far from the normal mode.
PS. I've been working on fixing bugs in Clang/LLVM for PowerPC arch (32/64-bit). FreeBSD recently switched from GCC 4.2.1 (final GPLv2 version) to Clang as the default system compiler on x86, but LLVM has some code gen bugs for PowerPC that I'm hoping to squash. For now, it doesn't work well enough for me to recommend trying to use Clang as an alternative to GCC for POWER8 benchmarking. Certainly not for little-endian mode.
Jake Hamby - Friday, November 6, 2015 - link
BTW, the name of the instruction set architecture is still PowerPC, even though IBM's chips are the POWER series. The predecessor architecture was named POWER, so I always write PowerPC to avoid confusion when referring to the instruction set. The PowerPC 970MP chips in my Quad G5 (2 x dual-core) are a derivative of POWER4.
ZeDestructor - Saturday, November 7, 2015 - link
That would be incorrect actually (since it changed in 2006). The ISA is (currently) named the Power ISA (previously the "PowerPC ISA"; the "ISA" bit is quite important to distinguish hardware designs from the ISA), with the current version being Power ISA 2.07 B.
Underneath each ISA, there are a variety of designs that all have nice, different names, from POWER1-8, PowerPC (including the various variants used by Apple, like the G5/970), Power-PC-AS, Cell, most LSI controllers (mostly PowerPC 440 (Power ISA 2.03) based, afaik) etc.
Source: https://en.wikipedia.org/wiki/Power_Architecture
tipoo - Friday, November 6, 2015 - link
I wish I had a what-if machine to see what IBM would be making had they stayed in the consumer space (well, discounting some consoles they're in - currently only the Wii U on an ancient PowerPC 750 modified for tri-core). And how chunky that PowerBook G5 would have been :P
http://forums.macrumors.com/attachments/powerbook_...
DanNeely - Friday, November 6, 2015 - link
Probably they'd've ended up making architectural tradeoffs that made their cores a lot more like Intel's. As it is, they can optimize their designs for settings where very high power isn't a problem, because the power cost is a small fraction of the TCO on one of their monster servers, and a relatively minor concern for consoles (ie just dropping the core count gets them desktop CPU level thermals, which are good enough). If they were still selling to Apple, they'd need to be well optimized for performance at only a few watts/core for laptops. Huge L1 caches and massive SMT would be gone because they'd devour battery power to little benefit on systems that generally function at very low average CPU loads, versus a mega server or mainframe where, if you're not pushing enough work onto it to keep it at a high load level, you're doing it wrong.
Jake Hamby - Friday, November 6, 2015 - link
Yep. It feels a lot like Apple's own 64-bit ARM cores have approached the old G5 (PPC 970) from the other end of the power envelope.
Kevin G - Saturday, November 7, 2015 - link
While Apple's engineers were given the task of a PowerBook G5, they knew it could never happen due to thermals and a very arcane chipset. Case in point, the PowerPC 970 could not boot itself: it needed a service processor to calibrate and initialize the frontside bus for the processor before it could take control. Justifiable for servers but unnecessary for a consumer laptop. The expected PowerBook G5's were supposed to be using PA Semi chips. Due to IBM not meeting Apple's goals, they switched to Intel and the PA Semi deal fell through with it. However, their dealings with Apple did lead to Apple buying them out a few years later to help design custom ARM SoCs and eventually the custom ARM cores used in the iPhone/iPad of today.
JohanAnandtech - Saturday, November 7, 2015 - link
Would love to hear some thoughts on what possible problems could arise if we rerun our tests on BE Linux. Because our best benchmarks are all based upon data stored on our x86 fileservers - so it is probably stored in LE.
If all you do is just mount the network volume to use the data, then likely nothing at all. While binaries do have to be modified, the file systems themselves are written to store data in a single consistent manner. If you're wondering more whether there would be some overhead in translating from LE to BE to work in memory, conceptually the answer is yes, but I'd predict it to be rather small and dwarfed by the time to transfer data over a network. I'd be curious to see the results.
Ultimately I'd be more concerned with kernel modules for various peripherals when switching between LE and BE versions. Considering that POWER has been BE for a few generations and you did your initial testing using LE, availability shouldn't be an issue. You've been using the version which should have had the most problems in this regard.
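[Ed: Kevin G's point about on-disk data can be made concrete: portable code never reads raw integers and assumes host byte order, it fixes the byte order at the file-format level. A sketch using the endian.h helpers, which are a glibc/BSD extension rather than ISO C:]

#include <stdio.h>
#include <stdint.h>
#include <endian.h>   /* le32toh(): glibc/BSD extension */

/* Read a 32-bit value that an x86 fileserver wrote in little-endian
 * order. On a little-endian POWER8 system le32toh() is a no-op; on a
 * big-endian one it byte-swaps. Either way the result is correct. */
int read_le32(FILE *f, uint32_t *out)
{
    uint32_t raw;
    if (fread(&raw, sizeof raw, 1, f) != 1)
        return -1;
    *out = le32toh(raw);
    return 0;
}

[Ed: trouble mainly starts when data files were written by dumping in-memory structs; those embed the writer's byte order and padding, and need exactly this kind of explicit conversion when read on the other endianness.]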
spikebike - Friday, November 6, 2015 - link
So basically power is somewhat competitive with intel's WORST price/perf chips which also happen to have the worst memory bandwidth/CPU. Seems nowhere close for the more reasonable $400-$650 xeons like the D-1520/1540 or the E5-2620 and E5-2630. Sure IBM has better memory bandwidth than the worst intels, but if you want more memory bandwidth per $ or per core then get the E5-2620.
JohanAnandtech - Saturday, November 7, 2015 - link
It is definitely not an alternative for applications where performance/watt is important. As you mentioned, Intel offers a much better range of SKUs. But for transactional databases and data mining (traditional or unstructured), I see the POWER8 as a very potent challenger. When you are handling a few hundred gigabytes of data, you want your memory to be reliable. Intel will then steer you to the E7 range, and that is where the POWER8 can make a difference: filling the niche between E5 and E7.
nils_ - Wednesday, November 11, 2015 - link
Especially if you're running software that doesn't scale out very well, these are very competitive. And nowadays even MySQL will scale up nicely to many, many cores.
Gigaplex - Friday, November 6, 2015 - link
"Less important, but still significant is the fact that IBM uses SAS disks, which increase the cost of the storage system, especially if you want lots of them."The Dell servers I've used had SAS controllers, and every SAS controller I've dealt with supported using SATA drives. I'm pretty sure SATA compatibility is in the SAS specification. In fact, the Dell R730 quoted in this review supports SAS drives. There shouldn't be anything stopping you from using the same drives in both servers.
JohanAnandtech - Saturday, November 7, 2015 - link
You are absolutely right about SATA drives being compatible with a SAS controller. However, afaik IBM gives you only the choice between their own rather expensive SAS drives and SSDs. And maybe I have overlooked it, but in general Dell only lets you choose between SATA and SSDs. And this has been the trend for a while: SATA if you want to keep costs low, SSDs for everything else.
TomWomack - Sunday, November 8, 2015 - link
And mounting a storage server made out of commodity hardware over a couple of lanes of 10Gbit Ethernet if you don't want to pay the exotic-hardware-supplier's markup on disc.
Gunbuster - Friday, November 6, 2015 - link
SAP and IBM AIX servers... I guess if you want to blow out your entire IT budget in one easy decision...
Jake Hamby - Friday, November 6, 2015 - link
I forgot to mention: VMX is better known as AltiVec (it's also called "Velocity Engine" by Apple). It's a very nice SIMD extension that was supported by Apple's G4 (Motorola/Freescale 7400/7450) and G5 (IBM PPC 970) Macs, as well as the PPC game consoles. It would be interesting to compare the Linux VMX crypto acceleration to code written to use the newer native AES & other instructions. In x86 terms, it'd be like SSE-optimized AES vs. the AES-NI instructions.
Oxford Guy - Saturday, November 7, 2015 - link
I had a dual 450 MHz G4 system and AltiVec was quite amazing in iTunes when doing encoding. Between the second processor and the AltiVec, putting things into ALAC was very fast (in comparison with other machines at the time like the G3 and the AMD machines I had).
JohanAnandtech - Saturday, November 7, 2015 - link
Suggestions on how to do this? OpenSSL 1.0.2 will support the built-in crypto accelerator, but I am not sure how I would be able to see if the crypto code uses VMX.
SarahKerrigan - Monday, November 9, 2015 - link
Compile with -qreport in XL C/C++.
Oxford Guy - Saturday, November 7, 2015 - link
Typo on page 2:
The resuls are that Google is supporting the efforts and Rackspace has even build their own OpenPOWER server called "Barreleye".
Ryan Smith - Saturday, November 7, 2015 - link
Thanks.
iwod - Saturday, November 7, 2015 - link
On a scale of 100, the POWER software ecosystem has managed to scale from 10 to 20; that is a 100% increase, but still very, very low. Will we see a POWER CPU / server that is cheap enough to compete with the Xeon E3 / E5, where most of the volume is? Competing only with the E7 means competing for maybe 10% of the market. Intel will be moving to a 14nm E7, and I don't see anyone making a POWER CPU at 14nm anytime soon.
Intel's DC business is growing, and it desperately needs a competitor such as POWER to combat the E7, and AMD Zen from the bottom.
Frenetic Pony - Saturday, November 7, 2015 - link
Nice review! It just confirms my question, however, of "What does IBM do?" Seriously, what do they do anymore? All I see are headlines for things that never come out as actual products. Their servers suck up too much power, they don't have their own semiconductor foundries, their semiconductor research seems like a bunch of useless paper tiger stuff, their much vaunted AI is better at playing Jeopardy than seemingly any real world use. The countdown to complete IBM bankruptcy/spinoff/selloff is closer than ever.
ws3 - Saturday, November 7, 2015 - link
Since the dawn of computing, IBM has been in the business of providing solutions, rather than merely hardware. When you buy IBM you pay a huge amount of money, and what you get for that is support, with some hardware thrown in. Obviously this only appeals to wealthy customers who don't have or don't want to have an internal support organization that can duplicate what IBM offers. It seems to me that the number of such customers is decreasing over time, but as long as the US government is around, IBM will have at least one customer.
xype - Sunday, November 8, 2015 - link
They make 2-5 billion dollars of profit per quarter. "Countdown to complete IBM bankruptcy/spinoff/selloff is closer than ever." my ass.
PowerTrumps - Sunday, November 8, 2015 - link
Pretty fair and even handed review; don't agree with it all and definitely feel there is room to learn and improve. Btw, full disclosure, I am a System Architect focusing on Power technology for a Business Partner. With regard to compilers I would suggest IBM's SDK for Linux on Power & Advanced Tool Chain (ATC), which provide development tools and an open source optimized dev stack (ie gcc) for POWER8. Details at: https://www-304.ibm.com/webapp/set2/sas/f/lopdiags... and https://www.ibm.com/developerworks/community/wikis...
MySQL is definitely relevant, but with the new Linux distros packaging MariaDB in place of MySQL I would have liked to see an Intel vs Power comparison with this MySQL alternative. MariaDB just announced that v10.1 is delivering over 1M queries per second on POWER8. https://blog.mariadb.org/10-1-mio-qps/
A commenter asked about Spark with POWER8. This blog discusses how it performs vs Intel. https://www.ibm.com/developerworks/community/blogs...
In addition to the commercial benchmarks often quoted, such as SPEC, SAP and TPC, like this SAP HANA result with SUSE on POWER8, SAP BW-EML (ie HANA) shows tremendous scaling with POWER8: http://www.smartercomputingblog.com/power-systems/... Many of the ISVs have produced their own. I have seen results for PostgreSQL, STAC (http://financial.mcobject.com/press-release-novemb... ), Redis Labs, etc.
Benchmarks are great; all vendors do them and most people realize you should take them with a grain of salt. One benefit of Power servers when using PowerVM, the native firmware-based hypervisor, is that it delivers tremendous compute efficiency to VMs. On paper, things like TDP seem higher for Power vs Intel (especially E5 v3 chips), but when Power servers deliver consolidation ratios with 2-4X (and greater) more VMs per core, the TCA & TCO get real interesting. One person commented how SAP on Power would blow out a budget. It does just the opposite, because you can run in a Tier-2 architecture, obtaining intra-server VM-to-VM efficiencies and compute efficiencies with fewer cores & servers, which impacts everything in the datacenter. Add in increased reliability & serviceability features and you touch the servers less, which means your business is running longer.
And for more details on the open platform, or on the OpenPOWER derivatives using the "LC" designator, such as the S822LC (in contrast to the S822L that is the focus of this article), see http://www.smartercomputingblog.com/power-systems/... and http://businesspartnervoices.com/ibm-power-systems...
JohanAnandtech - Sunday, November 8, 2015 - link
Great feedback. We hope to get access to another POWER8(+) server and build further upon our existing knowledge. We have real world experience with Spark, so it is definitely on the list. The blog you linked seems to have used specific Spark optimizations for POWER, but the x86 reference system looks a bit "neglected". A real independent test would be very valuable there. The interesting part of Spark is that a good benchmark would be also very relevant for the real world, as peak performance is one of the most important aspects of Spark, in contrast with databases where maximum performance is only a very small part of the experience. About MySQL, people have pointed out that the 5.7 version seems to scale a lot better, so that, together with MariaDB, is also on my "to test" list. Redis does not seem relevant for this kind of machine; it is single-threaded, and it is almost impossible to test 160 instances.
The virtualization part is indeed one of the most interesting parts, but it is a benchmarking nightmare. You have to keep response times at more or less the same levels while loading the machine with more and more VMs. We did that kind of testing until 2 years ago on x86, but it was very time consuming and we had a deep understanding of how vSphere worked. Building that kind of knowledge on PowerVM might be beyond our manpower and time :-).
jesperfrimann - Monday, November 9, 2015 - link
Well, I think you should kick Franz Bourlet for not hooking you up with an IBM technical advocate who actually knew the technology. Such a person could have shown you the ropes and helped you understand the kit better. Again, Franz is a sales guy. IMHO selecting Ubuntu as the Linux distro did not help you. It's new to the POWER platform and does not have the same robustness as, for example, SLES, which has been around for 10+ years on POWER.
The fact that you are getting better results using gcc generated code rather than xlC shows me that something is not right.
And that the IBM JDK isn't working well is also an indicator that something is not right.
IMHO selecting Ubuntu did not make things easier for you guys.
And for really optimized code you need to install and use the high performance math libraries for POWER (MASS), which is an add-on math library.
And AFAIR having only 8 memory modules enables just half the memory bandwidth of the system.
So IMHO IBM didn't help you make their system look good.
But again that is what you get when you get rid of all the clever people :)
// Jesper
nils_ - Wednesday, November 11, 2015 - link
You can always rent a box at OVH; they offer a huge chunk of an OpenPower system, albeit virtualized, through Runlabs.
stefstef - Sunday, November 8, 2015 - link
Compared to the Pentium 4, the MIPS R16K with loads of L3 cache was a bzip2 beast, outperforming the Pentium 4 which ran at twice the clock speed and more. Despite that, the usage of zip programs is hardly what these server processors are built for.
mapesdhs - Tuesday, November 10, 2015 - link
Just curious, do you know of any comparative results anywhere for bzip2 on old MIPS vs. other CPUs? It's not something I've seen mentioned before, at least not with respect to SGIs, but perhaps I can run some tests the next time I obtain a quad-R16K/1GHz (16MB L2) Tezro. Best I have atm is only an R16K/900MHz (8MB L2) single-CPU Fuel and various configs of Tezro and Onyx350 from 4 to 16x 700MHz with 8MB L2. Just a pity SGI never got to employ multi-core MIPS (it was planned, but alas never happened). Oddly, back when current, MIPS' real strength was fp. Over time it fell behind badly for general int, though for SGI's core markets that didn't really matter ("It's the bandwidth, stupid!" - famous quote from Mashey IIRC). MIPS could have caught up with MDMX and MIPS V ISA, especially with the initially intended merged Cray vector stuff, but again that all fell away once the design talent moved to Intel in 1996/7.
Ian.
Freen the merciless - Sunday, November 8, 2015 - link
Heh! SPARC T5 eats Xeon and POWER for breakfast.
kgardas - Monday, November 9, 2015 - link
I guess you mean the T7 with SPARC M7 inside and not the T5. If so, then yes, the M7 looks quite capable, but unfortunately provides a horrible price/performance ratio. A POWER8 box starts at ~$6.5k while the T7-1 starts at ~$40k. So on the SPARC front we'll need to see if Oracle is going to change that with the Sonoma chip.
Michael Bay - Monday, November 9, 2015 - link
In parallel only.aryonoco - Tuesday, November 10, 2015 - link
Thank you Johan for this amazingly well written and well researched article. I have to agree with a few people here that question your choice of using LE Ubuntu to test. Traditionally people who use Linux on POWER use SUSE, and some use RHEL, but Ubuntu? Nothing against them, and I love apt, but it's just not a mature platform.
Try something more representative such as BE SLES and you will find a vastly different level of ecosystem maturity.
But thanks again, and also thanks to AT for caring about such subjects and publishing these tests.
JohanAnandtech - Wednesday, November 11, 2015 - link
Thank you for taking the time to write up some constructive feedback. I have years of experience with Ubuntu and Linux and I wanted to play it safe. Running benchmarks on "new" hardware with a new ISA (from my perspective) is pretty complex. C-ray and 7-zip are the only exceptions, but most real server apps (NAMD, ElasticSearch, Spark) depend on many layers of software. In theory the OS/distro matters more than the ISA for getting applications working. In practice, it might have been better to bet on the distro with the most maturity and adapt our scripts and installation procedures to Suse.
But as soon as I get the chance, I'll try out BE SUSE or Red Hat on a POWER system.
mapesdhs - Tuesday, November 10, 2015 - link
Johan, a minor point: please note my home page for C-ray is here:
http://www.sgidepot.co.uk/c-ray.html
Blinkenlights is just a mirror, and not the primary mirror either (that would be the vintagecomputers site).
Btw, it's a pity you didn't use the same image sizes & settings as used on the main c-ray site, because then I could have included the results on my page (ie. 'sphfract' at 800x600, 1024x768 with 8X oversampling, and 7500x3500), or did you just use the same settings that Phoronix employs?
Also, John Tsiombikas, the guy who wrote C-ray, told me some interesting things about the test and how it works (info included on the page), most especially that it is highly vulnerable to compiler optimisations which can produce results that are even less realistic than real life workloads. I'm glad though that you did at least use the sphfract test, since at a sensible resolution or with oversampling it easily pushes the test out of just L1 (the 'scene' test is much smaller). But yeah, overall, c-ray was never intended to be used as a benchmark, it's just taken off somehow, perhaps because the scanline method of threading makes it scale very well.
Hmm, I really must sort out the page formatting one of these days, and move the most complex test tables to the top. Never seem to find the time...
Thanks!!
Ian.
PS. I always obtained the best results by having more threads than the no. of cores/CPUs, or is this something which doesn't work with non-MIPS systems?
JohanAnandtech - Wednesday, November 11, 2015 - link
I did not know you used 7500x3500; my testing was inspired by what the rest of the benchmarking community (Phoronix, Serverthehome) was using (obviously, 1024x768 is too small for current servers).
http://www.anandtech.com/show/9567/the-power-8-rev...
This answers your question about threads, right?
JohanAnandtech - Wednesday, November 11, 2015 - link
Oh yes, changed the link. Thanks for the feedback!
mapesdhs - Thursday, November 12, 2015 - link
Most welcome! And I really should move the more complex tests to the top of the page... Oh, my wording about threads was not what I'd intended. What I meant was, the no. of threads being larger than the supported no. of hardware threads. Thus, for a 12-core Power8 with 8 threads per core, try using 192 or 384 threads, instead of just the nominal 96 one might assume would make sense.
Ian.
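[Ed: mapesdhs's suggestion is easy to parametrise. A small pthreads sketch in C that scales the worker count from the number of logical CPUs the OS reports, with a 2x oversubscription factor (his suggestion for SMT-heavy chips, not a general rule); compile with -pthread:]

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    /* a real harness would render a band of scanlines here */
    return NULL;
}

int main(void)
{
    /* e.g. 96 logical CPUs on a 12-core SMT8 POWER8; oversubscribing
     * 2x gives stalled threads a chance to hide memory latency. */
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    long nthreads = ncpu * 2;

    pthread_t tid[nthreads];
    for (long i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (long i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    printf("ran %ld threads on %ld logical CPUs\n", nthreads, ncpu);
    return 0;
}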
MB13 - Wednesday, November 11, 2015 - link
POWER8 is full of innovation and brings change! An S812LC only costs $6,595 from IBM's external website: http://www-03.ibm.com/systems/power/hardware/s812l...
The Power scale-out boxes will save on your running and software costs, as you can reduce your software licensing and server footprint.
With the OpenPOWER Foundation, you now have companies such as Tyan and Wistron who also create their own POWER8 servers and sell them independently of IBM. If you have not looked at the OpenPOWER Foundation and the innovation it brings through community and collaboration, you're missing out big time!
There is change! Don't get left behind!
MB13 - Wednesday, November 11, 2015 - link
and don't forget - POWER8 runs Little Endian and supports the latest versions of Red Hat, SUSE and Ubuntu! The OpenPOWER servers are Linux only!
Gasaraki88 - Wednesday, November 11, 2015 - link
It's funny how this article is trying to "sell" me the system but I'm still not impressed. It costs more, delivers less performance, and uses more power at idle and load than the Intel system.
nils_ - Thursday, November 12, 2015 - link
What I found the most off-putting is that you have to do a lot of work to get some things running with Linux. That's a big cost factor.
nils_ - Thursday, November 12, 2015 - link
Having a lot of software that isn't really well ported is probably going to remain a problem for Power8 for years to come since so few people have access to these kinds of systems and the cost is prohibitive. The great thing with x86 and ARM is that you can use it at home/work pretty easily without shelling out a lot of money. On x86 you can be sure if your software builds locally and runs locally it will also run on your server.
svj - Thursday, November 12, 2015 - link
Well written articles.
1. I submit that the headline is misleading. Intel x86 does not compete with POWER at the high end. The POWER L & LC lines of servers are comparable to x86 based servers. IBM POWER is taking the battle to Intel's home turf.
2. The analysis leaves out the cost of SW. Many organizations use commercial software which is priced per core. If POWER can do with 10 cores what Intel does with 18 cores, that means HUGE savings.
3. OpenPOWER is a huge move. I think the market will start seeing the results soon.
alpha754293 - Thursday, November 12, 2015 - link
An excellent review as always, Johan. (haha... to zeeBomb. It is my understanding that Johan doesn't post as often as he might otherwise like to because testing servers/enterprise computing solutions takes a LOT longer than testing/benching consumer-level systems. Some of the HPC applications that I run take hours to days for each run, so when you're running it, you're running those tests over and over again, and before you know it, a month has gone by (or you've run out of time with the system) or you have to purposely cut it short so that you can test a variety of software.) It's unfortunate that IBM never ported AIX to x86 (unlike Solaris.) I think that there would be more people trying to get into it if the cost of entry (even just to learn) weren't so high. I've looked at getting an old POWER4 system before for that purpose, but by then, the systems are so old and slow that it's like "what's the point?" I think that IBM is literally pricing themselves into extinction (along with their entire hardware/software ecosystem). Unfortunately for many businesses, AIX POWER servers still run their mainframe/backend, which means that if you want to get paid $100k+ outta college - go learn AIX on POWER. As the current generation of sysadmins starts to age and retire out, they're going to have a hard time finding qualified people; eventually they would have to pay top dollar just to attract people into the field. (Unless they decide to move everything over to the x86/Xeon/Linux world. But for some mainframes (like financial institutions), that's DEFINITELY easier said than done.)
usernametaken76 - Thursday, November 12, 2015 - link
Technically this is not true. IBM had a working version of AIX running on PS/2 systems as late as the 1.3 release. Unfortunately support was withdrawn and future releases of AIX were not compiled for x86 compatible processors. One can still find a copy of this release if one knows where to look. It's completely useless to anyone but a museum or curious hobbyist, but it's out there.
...>--click here-Steven Perron - Monday, November 23, 2015 - link
Hello Johan,
I was reading this article, and I found it interesting. Since I am a developer for the IBM XL compiler, the comparisons between GCC and XL were particularly interesting. I tried to reproduce the results you are seeing for the LZMA benchmark. My results were similar, but not exactly the same.
When I compared GCC 4.9.1 (I know, a slightly different version than yours) to XL 13.1.2 (I assume this is the version you used), I saw XL consistently ahead of GCC, even when I used -O3 for both compilers.
I'm still interested in trying to reproduce your results so I can see what XL can do better, and I have a couple of questions on areas that could be different.
1) What version of the XL compiler did you use? I assumed 13.1.2, but it is worth double checking.
2) Which version of the 7-zip software did you use? I picked up p7zip 15.09.
3) Also, I noticed when the Power 8 machine was running at full capacity (for me that was 192 threads on a 24 core machine), the results would fluctuate a bit. How many runs did you do for each configuration? Were the results stable?
4) Did you try XL at the less aggressive and more stable options like "-O3" or "-O3 -qhot"?
Thanks for your time.
Toyevo - Wednesday, November 25, 2015 - link
Other than the ridiculous price of CDIMMs, the power efficiency just doesn't look healthy. For data centers leasing out their hardware, like Amazon AWS, Google AppEngine, Azure, Rackspace, etc., clients who pay for hardware yet fail to use their allocation significantly help the bottom line of those companies through reduced overheads. For others, high usage is a mandatory part of the ROI equation during a machine's period as an operating asset, so power consumption is a real cost. Even with our small cluster of 12 nodes the power efficiency is a real consideration, let alone for companies standardizing on IBM and utilising 100s or 1000s of nodes that are arguably less efficient.
Perhaps you could devise some sort of theoretical total cost of ownership breakdown for these articles. My biggest question after all of this is: which one gets the most work done with the lowest overheads? Don't get me wrong though, I commend you and AnandTech on the detail you already provide.
AstroGuardian - Tuesday, December 8, 2015 - link
It's good to have someone challenging Intel, since AMD crap their pants on a regular basis.
dba - Monday, July 25, 2016 - link
Dear Johan:
Can you extrapolate how much faster the SPARC S7 will be in your cluster benchmarking if the 2 on-die InfiniBand ports are activated: 5, 10, 20%?
Thank you, dennis b.