Name: Real-world virtualization benchmarking: the best server CPUs compared
Item: Real-world virtualization benchmarking: the best server CPUs compared
Author: Johan De Gelas

Real-world virtualization benchmarking: the best server CPUs compared

by Johan De Gelas on 5/21/2009 3:00 AM EST

Posted in
IT Computing

Post Your Comment
Please log in or sign up to comment.

Comments Locked

66 Comments

Back to Article

binaryguru - Monday, June 1, 2009 - link
It seems to me, x86-based virutalization software is getting more and more complicated. Not only is x86 virtualization getting more complicated, it is getting more and more difficult to get reliable performance from it.

Let me explain my point.

The industry is clearly trying to do more with less hardware these days. Getting raw VM performance on commodity hardware is getting to a point where there is no predictable way to plan for an efficient VM environment.

Current VM technology is trying to simulate the flexibility and performance of mainframes. To me, this is clearly an impossible goal to achieve with the current or future x86 platform model.

All of the problems the industry is experiencing with VM consolidation does not exist on the mainframe. Running 4 'large' VMs for 'raw' performance. How about running 40 'large' VMs for 'raw' performance. Clearly, we all know that is impossible to achieve with current VM setups.

Now I'm not saying that virtuallization is a bad idea, it clearly is the ONLY solution for the future of computing. However, I think that the industry is going about it the wrong way. Server farms are becoming increasingly more difficult to manage, never mind the challenge of getting 100s of blade servers to play nice with each other while providing good processing throughput.

This problem has been solved about 20 years ago; and yet, here we are, struggling again with the "how can I get MORE from my technology investment" scenario.

In conclusion, I think we need to go back to utilizing huge monolithic computing designs; not computing clusters.
mikidutzaa2 - Friday, May 29, 2009 - link
Hello,

It would be useful (if possible) to have latency numbers/response times on the tests as well because rarely we are interested in throughput on our servers. What we usually care more is how long it takes the server to respond to user actions.

What is your opinion?
JohanAnandtech - Friday, May 29, 2009 - link
I agree. I admit it is easier for us or any benchmark person to use throughput as immediately comparable (X is 10% faster than Y) and you have only one datapoint. That is why almost

Responsetime however can only be understood by drawing curves relative to the current throughtput / User concurrency. So yes, we are taking this excellent suggestion into consideration. The trade off might that articles get harder to read :-).
mikidutzaa2 - Friday, May 29, 2009 - link
Looking forward to your new articles then, glad to hear :).

The articles don't necessarily have to be harder to read, you could put the detailed graphs on a separate page and maybe show only one response time for a "decent"/medium user concurrency.

Also, I would find interesting (if you have time) to have the same benchmarks with 2vcpu machines, I think this is a more common setup for virtualization. Very few people I think virtualize their most critical/highly used platforms - at least that's how we do it. We need virtualization for lightly used platforms (i.e. not very many users) but we are still very much interested in response time because the users perceive latency, not throughput.

So the important question is: if you have a virtual server (as opposed to a physical one) will the users notice? If so, by how much is it slower?

Thank you.
RobAm - Tuesday, May 26, 2009 - link
It's good to see some unbiased analysis with respect to virtualization. It's also especially interesting that your workloads (which look much more like real world apps my company runs as opposed to SPECjbb, vmark, vconsolidate) shows a much more competitive landscape than vmware and Intel portray. Also, doesn't vmware prohibit benchmarking without their permission. Did they give you permission? Has VMware called offering to re-educate you? :-)
Brovane - Tuesday, May 26, 2009 - link
I was hoping for a some benchmarks on the Xeon x7xxx CPU for the Quad Socket Intel boxes. We are currently have Dell R900's and we where looking at adding to our ESX cluster. We where debating between the R900 with Hex cores our Xeon x55xx series CPU's in the R710. I see the x55xx series where bench marked but nothing on the Xeon MP series unless I am missing that part of the article.
JohanAnandtech - Tuesday, May 26, 2009 - link
Expect a 24-core CPU comparison soon :-).
Brovane - Tuesday, May 26, 2009 - link
You also might want to a 12-core comparison also. We have found that with a 4-socket box that you usually run out of memory before you run out CPU power. With the R900 having 32-Dimm Sockets, the R900's we purchased last year have 64GB of RAM and just use 2x2.93Ghz CPU's we max memory before CPU easily in our environment. Since Vmware licensing and Data Center licensing is done per Socket we only populate 2 of the sockets with CPU's and this seems to do great for us. You basically double your licensing costs if you go with all 4 sockets occupied. Just a thought as to how sometimes virtualization is done in the real world. There is such a price premium for 8GB memory Dimm's it isn't worth it to put 256GB in one box with all 4 sockets occupied. The 4GB Dimm's did reach price parity this year so we were looking at going for 128GB of memory on our new R900's however Intel also released Hex-core so we still don't see much reason to occupy all 4 sockets.
yasbane - Tuesday, May 26, 2009 - link
I know positive feedback is always appreciated for the hard work put in but it seems very rare that we see any non-microsoft benchmarks for server stuff these days on Anandtech. Is there any particular reason for this...? I don't mean to carp but I recall the days when non-microsoft technologies actually got a mention on Anandtech. Sadly, we don't seem to see that anymore :(

Cheers
JohanAnandtech - Tuesday, May 26, 2009 - link
Yasbane, my first server testing articles (DB2, MySQL) were all pure Linux benches. However, we have moved on to a new kind of realworld benchmarks and it takes a while to master the new benchmarks we have introduced. Running Calling Circle and Dell DVD store posed more problems on Linux than on Windows: we have lower performance, a few weird error messages and so on. In our lab, about 50% of the servers are running linux (and odd machines is running OS-X and another Solaris :-) and we definitely would love to see some serious linux benchmarking again. But it will take time.

Xen benchmarks are happening as I write this BTW.
has407 - Sunday, May 24, 2009 - link
Thanks very much for the additional data points, and especially for providing details. Still digesting your data (thanks again!), but a few thoughts...

1. At the risk of being pedantic... Both VMark and vAplus scores are dimensionless. It would be better to avoid terms such as "faster" to describe them; IMHO, that has lead to distraction (or *cough* in some cases *cough* irrationality). Is a car that can move 5 people at 160KPH "faster" than a bus that can move 20 people at 80KPH? Maybe sticking to terms such as "throughput" or simply "performance" would be better.

2. While the geometric mean provides a nice single score, I hope you will continue to publish the detailed numbers that contribute to it (as done with VMark disclosures). The individual scores provide important clues as to whether a closer look is warranted, whether of the workload mix, the CPU, or the hypervisor.

For example, the sum of the workload (or arithmetic mean * 4) provides total overall throughput, which is an important indicator; in an ideal world that should match the geometric mean. A significant difference between those suggests a closer look is warranted.

E.g., Unless you have a workload mix that can soak up extra CPU cycles, the Xeon 5080 and to a lesser extent the Opteron 2222 don't look like good choices. For the 5080, the CPU-intensive OLAP VM contributes 60% to the result, whereas the others tend to be ~40-45%, and the difference between the geometric and arithmetic mean for the 5080 is 19%, whereas for the rest it's <14%.

3. Note what happens if you pull the CPU-intensive OLAP VM out of the picture. While I can't empirically test that, and I'm using a bit of a sledgehammer here... Eliminate it from the scoring and see what happens: the difference between the geometric and arithmetic mean drops to ~1% across the board.

Moreover, the ratio of the scores with and without the OLAP VM is quite constant, with a correlation > 0.999. The outliers again, but not by all that much, being the Xeon 5080 and the Opteron 2222, and to a lesser extent the Xeon L5350.

4. In short, I'm not sure what the addition of a CPU-intensive VM such as OLAP is adding to the picture, other than soaking up CPU cycles and some memory. A CPU-intensive VM is the easiest (or should be the easiest) for a hypervisor to handle, and appears to tell us little more than what idle time figures would tell us. In the case of the Xeon 5080 and Opteron 2222, it also appears to inflate their overall score (whether due to the processor or hypervisor, or more likely a combination of the two, is unclear).

5. That said, maybe it would be good to include a CPU-intensive VM in the mix, if for no other reason than to highlight those systems or hypervisors where that VM scores higher or lower than expected (e.g., the Xeon 5080 and Opteron 2222). However, I'd bet you can achieve the same result with a lot less work using a simpler synthetic CPU/memory-intensive test in the VM.

OTOH, maybe artificially driving CPU utilization towards 100% with such CPU-intensive VM's doesn't really tell us much more than we'd know without them--as IMHO my admittedly crude analysis suggests--and that vAplus might be a better indicator for those looking for clues as to appropriate workload allocation among virtualized systems, rather than those looking for a single magic number to quantify performance.
JohanAnandtech - Tuesday, May 26, 2009 - link
"However, I'd bet you can achieve the same result with a lot less work using a simpler synthetic CPU/memory-intensive test in the VM. "

That would eliminate the network traffic. While the "native" running database is not making the OS kernel sweat, the hypervisor does get some work from the network, and thus this VM influences the scores of the other VMs. It is not a gigantic effect but it is there. And remember, we want to keep control of what happens in our VMs. Once you start running synthetic benches, you have no idea what kind of instructions are run. SQL server is closed source too, but at least we know that the instructions which will be send to the CPU will be the same as in the real world.

We will of course continue to publish all the different scores so that our inquisitive readers can make their minds :-). Nothing worse than people who quickly gloss over the graphs and than start ranting ;-).

Thanks for the elaborate comment, although I am still not sure why you would remove the OLAP database. The fact that the 4 core machines (Dempsey, dualcore opteron) do not have a lot of cycles left for the other VMs illustrates what happens in an oversubscribed system where one VM demands a lot of CPU power.
has407 - Wednesday, May 27, 2009 - link
Johan -- My thought was not so much whether to get rid of the OLAP VM, than whether a simpler CPU-intensive VM would suffice, synthetic or otherwise. However, that's probably an academic question at this point, as you've already got it the mix. (And a question I probably spent too much time thinking-out-loud about in my post. :)

The other arguably more important questions are whether including CPU-intensive VM's (OLAP or synthetic) in order to drive CPU utilization to 100% easier--especially as it is 25% of the workload--provides significant additional information, and whether is more representative than the VMark approach.

That's a much harder question to answer, and far more difficult to model. Real-world benchmarks may be desirable and necessary, but they are not sufficient; a representative and real-world workload mix is also needed. What constitutes a "representative and real-world" mix is of course the Big Question.

I'll spare everyone more thinking-out-loud on that subject :), other than to say that benchmarks should help us understand how to characterize and model to more accurately predict performance. Without that we end up with lots of data (snapshots of workload X on hardware Y), but little better formal or rigorous understanding as to why. (One area where synthetic- or micro-benchmarks can help provide insight, as much as they might be derided. And one reason IMHO why what passes for most benchmarking today contributes more noise than signal. But that's another subject.)

In any case, it's good to have vApus to provide additional data points and as a counterpoint to VMark. Thanks again. Looking forward to the next round of data.
has407 - Monday, May 25, 2009 - link
Sorry, fourth column in table labeled "B:GM" (duplicate of third column label) should be "B:AM".
has407 - Monday, May 25, 2009 - link
p.s. here's the numbers on which that post was based, calculated using your raw data...

A - With OLAP VM
B - Without OLAP VM
GM - Geometric mean
AM - Arithmetic mean

- A:GM -- geometric mean of all four VM's * 4
- A:AM -- arithmetic mean of all four VM's * 4 (or the sum of the individual scores).
- B:GM -- geometric mean of the three VM's excluding the OLAP VM * 3.
- B:AM -- arithmetic mean of the three VM's excluding OLAP * 3 (or the sum of the individual scores excluding the OLAP VM).

A:GM A:AM B:GM B:GM A:GM/B:GM
2.03 2.14 1.28 1.29 1.58 Dual Opteron 8389 2.9
2.45 2.54 1.60 1.60 1.54 Dual Xeon X5570 2.93
2.08 2.21 1.29 1.29 1.61 Dual Xeon X5570 2.93 HT off
1.87 1.99 1.16 1.17 1.61 Dual Xeon E5450 3.0
1.68 1.81 1.02 1.02 1.65 Dual Xeon X5365 3.0
1.12 1.22 0.68 0.68 1.66 Dual Xeon L5350 1.86
0.59 0.78 0.30 0.31 1.96 Dual Xeon 5080 3.73
0.82 0.96 0.45 0.46 1.80 Dual Opteron 2222 3.0

Correlation( A:GM, B:GM ): 0.9993

Hope that helps explains my conclusions.
solori - Friday, May 22, 2009 - link
I'm glad to see Johan's team has gone beyond the "closed" VMmark standard with a Windows-based benchmark and I hope this leads to more sanity-checking of results down the line. However, the first step is verifying the process before you get to the results. Here's an example of where you're leaving some issues dangling:

"However, the web portal (MCS eFMS) will give the hypervisor a lot of work if Hardware Assisted Paging (RVI, NPT, EPT) is not available. If EPT or RVI is available, the TLBs (Translation Lookaside Buffer) of the CPUs will be stressed quite a bit, and TLB misses will be costly."

This implies RVI is defaulted for 32-bit VM's. VMware's default for 32-bit virtual machines is BT (binary-translation) and not RVI, even though VROOM! tests show a clear advantage for RVI over BT for most 32-bit workloads. While you effectively discuss the affects of disabling RVI in the 64-bit case, you're unclear about "forcing" RVI in the 32-bit case. Are you saying that AMD-v and RVI are enabled for the 32-bit workloads by default? VMware's guidance states otherwise:

"By default, ESX automatically runs 32bit VMs (Mail, File, and Standby) with BT, and runs 64bit VMS (Database, Web, and Java) with AMD-V + RVI."

- VROOM! Blog, http://blogs.vmware.com/performance/2009/03/perfor...">http://blogs.vmware.com/performance/200...uation-o...

This guidance is echoed in the latest VI3.5 Performance Guide Release:

"RVI is supported beginning with ESX 3.5 Update 1. By default, on AMD processors that support it ESX Update 1 uses RVI for virtual machines running 64-bit guest operating systems and does not use RVI for virtual machines running 32-bit guest operating systems.

Although RVI is disabled by default for virtual machines running 32-bit guest operating systems, enabling it for certain 32-bit operating systems may achieve performance benefits similar to those achieved for 64-bit operating systems. These 32-bit operating systems include Windows 2003 SP2, Windows Vista, and Linux.

When RVI is enabled for a virtual machine we recommend you also?when possible?configure that virtual machine?s guest operating system and applications to make use of large memory pages."

- Performance Best Practices and Benchmarking Guidelines, VMware, Inc. (page 18)

Your chart on page 9 further indicates "SVM + RVI" for 32-bit hosts, but there is no mention of steps you took to enable RVI. This process is best described by the Best Practices Guide:

"If desired, however, this can be changed using the VI Client by selecting the virtual machine to be configured, clicking Edit virtual machine settings, choosing the Options tab, selecting Virtualized MMU, and selecting the desired radio button. Force use of these features where available enables RVI, Forbid use of these features disables RVI, and Allow the host to determine automatically results in the default behavior described above."

- Performance Best Practices and Benchmarking Guidelines, VMware, Inc. (page 18)

So, which is it: 32-bit without RVI or undocumented changes to the VMM according to VMware guidance? If it is the former, the conclusions are misleading (as stated); if the latter, such modifications should be stated explicitly since they do not represent the "typical" or "default" configuration for 32-bit guests. This oversight does not invalidate the results of the test by any means, it simply makes them more difficult to interpret.

That said, a good effort! You may as well contrast 32-bit w & w/o RVI - those results might be interesting too. I know you guys probably worked VERY hard to get these results out, and I'd like to see more, despite what "tshen83" thinks :-)

Collin C. MacMillan -- http://solori.wordpress.com">http://solori.wordpress.com
JohanAnandtech - Sunday, May 24, 2009 - link
Hi Collin,

I was under the impression that ESX now choses RVI+SVM automatically, but that might have been ESX 4.0. I am going to check again on monday, but I am 99.9% sure we have enabled RVI in most tests (unless indicated otherwise) as it is a best performance practice for the Opterons.
alpha754293 - Friday, May 22, 2009 - link
Another excellent, thorough, well researched article.

Thanks! :o)
JohanAnandtech - Sunday, May 24, 2009 - link
You are most welcome. Thx for letting us know!
knutjb - Monday, May 25, 2009 - link
Thanks for presenting another point of view. When I read the original article showing the new Xeons so far ahead, I was skeptical. Rarely does a company produce a product that is such a huge leap, not only over their competitors, over their own products too. When there is only one primary benchmark the results can be skewed. Also, the wide variety of software combinations is eyepopping so it is very time consuming to create a resonable balance using real databaeses for a different, but valid benchmark.

Thanks for the hard work, I look forward to reading more on this subject.
Bandoleer - Thursday, May 21, 2009 - link
I have been running Vmware Virtual Infrastructure for 2 years now. While this article can be useful for someone looking for hardware upgrades or scaling of a virtual system, CPU and memory are hardly the bottlenecks in the real world. I'm sure there are some organizations that want to run 100+ vm's on "one" physical machine with 2 physical processors, but what are they really running????

The fact is, if you want VM flexability, you need central storage of all your VMDK's that are accessible by all hosts. There is where you find your bottlenecks, in the storage arena. FC or iSCSI, where are those benchmarks? Where's the TOE vs QLogic HBA? Considering 2 years ago, there was no QLogic HBA for blade servers, nor does Vmware support TOE.

However, it does appear i'll be able to do my own baseline/benching once vSphere ie VI4 materializes to see if its even worth sticking with vmware or making the move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware.
But it would really be nice to see central storage benchmarks, considering that is the single most expensive investment of a virtual system.
duploxxx - Friday, May 22, 2009 - link
perhaps before you would even consider to move from Vmware to HyperV check first in reality what huge functionality you will loose in stead of some small gains in HyperV.

ESX 3.5 does support Jumbo, iscsi offload adapters and no idea how you are going to gain 600% if iscsi is only about 15% slower then FC if you have decent network and dedicated iscsi box?????
Bandoleer - Friday, May 22, 2009 - link
"perhaps before you would even consider to move from Vmware to HyperV check first in reality what huge functionality you will loose in stead of some small gains in HyperV. "

what you are calling functionality here are the same features that will not work in ESX4.0 in order to gain direct hardware access for performance.
Bandoleer - Friday, May 22, 2009 - link
The reality is I lost around 500MBps storage throughput when I moved from Direct Attached Storage. Not because of our new central storage, but because of the limitations of the driver-less Linux iSCSI capability or the lack there of. Yes!! in ESX 3.5 vmware added Jumbo frame support as well as flow control support for iSCSI!! It was GREAT, except for the part that you can't run JUMBO frames + flow control, you have to pick one, flow control or JUMBO.

I said 2 years ago there was no such thing as iSCSI HBA's for blade servers. And that ESX does not support the TOE feature of Multifunction adapters (because that "functionality" requires a driver).

Functionality you lose by moving to hyperV? In my case, i call them useless features, which are second to performance and functionality.
JohanAnandtech - Friday, May 22, 2009 - link
I fully agree that in many cases the bottleneck is your shared storage. However, the article's title indicated "Server CPU", so it was clear from the start that this article would discuss CPU performance.

"move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware. "

Can you back that up with a link to somewhere? Because the 600% sounds like an MS Advertisement :-).
Bandoleer - Friday, May 22, 2009 - link
My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish.

I wasn't ranting at the article, its great for what it is, which is what the title represents. I was responding to this part of the article that accidentally came out as a rant because i'm so passionate about virtualization.

"What about ESX 4.0? What about the hypervisors of Xen/Citrix and Microsoft? What will happen once we test with 8 or 12 VMs? The tests are running while I am writing this. We'll be back with more. Until then, we look forward to reading your constructive criticism and feedback.

Sorry, i meant to be more constructive haha...
JohanAnandtech - Sunday, May 24, 2009 - link
"My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish. "

Yes, please do. Very interested in to reading what you found.

"I wasn't ranting at the article, its great for what it is, which is what the title represents. "

Thx. no problem...Just understand that these things takes time and cooperation of the large vendors. And getting the right $5000 storage hardware in lab is much harder than getting a $250 videocard. About 20 times harder :-).
Bandoleer - Sunday, May 24, 2009 - link
I haven't looked recently, but high performance tiered storage was anywhere from $40k - $80k each, just for the iSCSI versions, the FC versions are clearly absurd.
solori - Monday, May 25, 2009 - link
Look at ZFS-based storage solutions. ZFS enables hybrid storage pools and an elegant use of SSDs with commodity hardware. You can get it from Sun, Nexenta or by rolling-your-own with OpenSolaris:

http://solori.wordpress.com/2009/05/06/add-ssd-to-...">http://solori.wordpress.com/2009/05/06/add-ssd-to-...
pmonti80 - Friday, May 22, 2009 - link
Still it would be interesting to see those central storage benchmarks or at least knowing if you will/won't be doing them for whatever reason.
JohanAnandtech - Friday, May 22, 2009 - link
We are definitely interesting in doing this, but of course we like to do this well. I'll update as soon as I can.
pc007 - Thursday, May 21, 2009 - link
I agree, CPU & RAM usage are not really bottlenecks in my experience. Processes hamering slow disk and making everything else slower is the main concern.
SeanG - Friday, May 22, 2009 - link
There are 300 million people in this country and you're surprised that some of them are ignorant/jerks/crazy? We're all supposed to be ashamed because not everyone from this country is mentally stable? It's insulting to people like me who care about this country to hear you talk about being ashamed over something that is a problem with humanity in general and not only in the US.
lopri - Thursday, May 21, 2009 - link
It is said to see such a fascist persona in this comment section of such a fascinating article. I feel ashamed as one residing in the U.S.
JohanAnandtech - Friday, May 22, 2009 - link
Don't be. Tshen must be the first US citizen that I have encountered that hates Belgians :-). All other US people I have met so far were very friendly. In fact, I am very much astonished how hospitable US people are. Sometimes we have only spoken over the phone or via e-mail with each other and the minute I arrive in the US, we are having a meal and chatting about IT. When you arrive in Silicon valley, one can only be amazed about the enormous energy and entrepreneurship this valley breathes.
tshen83 - Friday, May 22, 2009 - link
You are a slave, Johan, whether you realize it or not. The people in Silicon Valley are "nice to you" because they are in the process of negotiating a purchased piece of publication from you.

You don't know anything about the Silicon Valley nor are you qualified to talk about it. If anything is true, Silicon Valley is in the toilet right now with bankruptcies everywhere. The state is broke, with Arnold Schwarzenegger begging for Federal bailouts. The last two big "entrepreneurships" coming out of Silicon Valley: Facebook and Twitter are both advertising scams without a viable business model.

I don't hate Belgians. I do hate retards like you whether you come from Belgium or not.

[BANNED]

[FROM JARRED: We are proponents of free speech, but repeated name calling and insults with little to no factual information to back up claims will not be tolerated. There was worse, and I'm leaving this text so you can see how it started.]
tshen83 - Thursday, May 21, 2009 - link
As what I have expected, Johan, your sorry ass came up with a benchmark that invalidates VMmark.

On page 9 "Nehalem vs Shanghai" http://it.anandtech.com/IT/showdoc.aspx?i=3567&...">http://it.anandtech.com/IT/showdoc.aspx?i=3567&...

Where is the Nehalem vs Shanghai benchmarks? All I see is a chart pumping Opteron 8389.

Let me dissect the situation for you [EDITED FOR VULGARITY].

The 100% performance per watt advantage witnessed by the Nehalem servers was the result of 3 factors: triple channel DDR3 IMC, HyperThreading, and Turbo Boost. The fact that Opterons can no longer compete because they lack the raw bandwidth and the "fake Hyper Threaded" cores that performs like a real core.

What would AMD do in this situation? Of course, invalidate an industrially accepted benchmark by substituting it with a "paid third party" benchmark that isn't available to the the public. I wonder what kind of "optimizations" were done?

You know what killed the GPU market? HardOCP.com. That's right, they invalidated the importance of 3DMark by doing game by game FPS analysis. The problem with this approach is that the third party game developers really don't have the energy or resources to make sure that each GPU architecture is properly optimized for. As long as the games run about 30fps on both Nvidia and ATI GPUs, they are happy. What results from this lackluster effort is that there is no Frames per Second differentiation on the GPU vendors, causing prices to free fall and the idiots choosing an architecturally inferior ATI GPU that gave a similar FPS performance.

Same methodology can be applied here. Since the Opterons lack raw memory bandwidth and core count visible to the OS, why not have a benchmark that isn't threaded well enough, and stress on high CPU utilization situations where memory bandwidth and core count matter less? That is what this new benchmark is doing, hiding Opteron architectural difficiencies.

The reason why VMmark stresses high number of VMs is to guage the hardware acceleration of VM switching. Having lesser number of VMs doing high CPU workload helps the worse performer(Opteron) by hiding and masquerading the performance difficiency. Nobody runs 100 VMs on one physical machine, but the VMmark does show you a superior hardware implementation. Nobody really prevents AMD from optimizing their CPUs for VMmark.

Let me be even more brutal with my assessment of your ethics, Johan. Why do you feel you are qualified to do what you do? The people who actually know about hardware are doing the CPU designs themselves in the United States, so the Americans would be the first to know about hardware. When the CPU samples are sent to Taiwan for motherboard design, the Asians would be the second batch of people dealing with hardware. By the time hardware news got to freaking Europe(Fudzilla, The Inq), the information usually was fudged up to the wazoos by Wall Street analysts. Consider yourself lucky that the SEC isn't probing you [EDITED FOR VULGARITY] because you reside in Belgium.

So Johan, my suggestion for you personally, is that you should consider the morality of your publications. In today's day and age, every word you ever say is recorded for eternity. Thirty years from now, do you want people to call you a [EDITED FOR VULGARITY] for pumping an inferior architecture by fudging benchmark results? Of course, I personally run the same risks. What I can guarantee you is that by June of next year, Johan, you [EDITED FOR VULGARITY] would be pumping Via instead.
JohanAnandtech - Friday, May 22, 2009 - link
As long as you are not able to discuss technical matters without personal attacks, I won't waste much time on you. Leave the personal attacks out of your comments, and I'll address every concern you have.

But for all other readers, I'll show how shallow your attacks are (but they probably figured that one out a long time ago).

"Why not have a benchmark that isn't threaded well enough"
Yes, Tshen. In your world, Oracle and MS SQL server have few threads. In the realworld however...

"http://it.anandtech.com/IT/showdoc.aspx?i=3567&...">http://it.anandtech.com/IT/showdoc.aspx?i=3567&...
Where is the Nehalem vs Shanghai benchmarks? All I see is a chart pumping Opteron 8389. "

All other readers have seen a chart that tries to show how the benchmark reacts to cache size and memory bandwidth. All other readers understand that we only have one Nehalem Xeon, and that is a little hard to show empirically how for example different cache sizes influence the benchmark results.

Lastly, as long as I publish AMD vs Intel comparisons, some people will call me and Anandtech biased. This article shows that Nehalem is between 50 to 80% faster in typical server apps.
http://it.anandtech.com/IT/showdoc.aspx?i=3536&...">http://it.anandtech.com/IT/showdoc.aspx?i=3536&...
For some people that meant that we were biased towards Intel. In your case, we are biased towards AMD if Intel does not win by a huge percentage. For the rest of the world, it just means that we like to make our benches as realworld as possible and we report what we find.
whatthehey - Friday, May 22, 2009 - link
At least we can be grateful he's kind enough to include his IQ in his user name. You know what's interesting? tshen83 isn't exactly a common user name, and he happens to troll elsewhere:
http://www.google.com/search?hl=en&q=tshen83">http://www.google.com/search?hl=en&q=tshen83

"Thirty years from now, do you want people to call you a fucking asshole...?" No need for you to wait 30 years, tshen; we'll be happy to call you a fucking asshole right now. As the saying goes: if the shoe fits....
JarredWalton - Thursday, May 21, 2009 - link
Frankly, you make me sad to be an American - as though just because someone is located in Belgium they are not qualified to do anything with hardware? Let's see, Belgium has higher average salaries than the US, so they surely have to be less qualified. And with you as a shining example we can certainly tell EVERYONE in the US is more qualified than in Europe.

To whit, your assertion that HardOCP - or any other site - "killed the GPU market" is absurd in the extreme. The GPU market has seen declining prices because of competition between ATI and NVIDIA, and because the consumer isn't interested in spending $500 every 6-12 months on a new GPU. However, ATI and NVIDIA are hardly dying... though ATI as part of AMD is in a serious bind right now if things don't improve. Thankfully, AMD has helped the CPU market reach a similar point, but with Core i7 we're going back to the old way of things.

Your linking to page nine of this article as though that's somehow proof of bias is even better. Johan shows that Nehalem isn't properly optimized for in ESX 3.5, while Shanghai doesn't have that problem. That's a potential 22% boost for AMD in that test, which we outright admit! Of course, as Johan then points out, there are a LOT of companies that aren't moving to ESX 4.0 for a while yet, so ESX 3.5 scores are more of a look at the current industry.

Again, we know that VMmark does provide one measurement of virtualization performance. Is it a "catch-all"? No more so than the vApus Mark I tests. They both show different aspects of how a server/CPU can perform in a virtualized environment. We haven't even looked at stuff like Linux yet, and you can rest assured that the performance of various CPUs with that environment are all over the place (due to optimizations or lack of optimizations). Anyhow, I expect Nehalem will stretch its legs more in 2-tile and 3-tile testing, even with our supposedly biased test suite.

Since you're so wise, let me ask you something: what would happen if a large benchmark became highly used as a standard measurement of performance in an industry where companies spend billions of dollars? Do you think, just maybe, that places like Intel, Dell, HP, Sun, etc. might do a bunch of extra optimizations targeted solely at improving those scores? No, that could never happen, especially not in the great USA where we alone are qualified to know how hardware works. Certainly NVIDIA and ATI never played any optimization games with 3DMark.

In short, the responses to your comments should give you a good idea of how reasoned your postings are. Cool your jets and learn to show respect and thoughtful posting. I don't know why you're so worried about people showing Intel in the best light possible, but you post on (practically) every Intel or AMD article pumping the joys of Intel, and lambasting AMD.

The fact is, many reviews of Nehalem show inflated benefits for the architecture relative to the real world. VMmark with ESX 4.0 definitely falls into that category - or do you think a range of 14.22@10 tiles with ESX 3.5 Update 4 to 24.24@17 tiles with ESX 4.0 is perfectly normal? I'm not sure anyone actually runs a real workload that mimics VMmark to the point where simply an update to ESX 4.0 would boost performance and virtualization potential by 70%.

Does Intel make the currently better CPU? Of course they do. Does that mean AMD isn't worth a look? Hardly. There are numerous reasons an architecture might perform better/worse. VMmark - or any benchmark - will at best show one facet of performance, and thus what we really need are numerous tests showing how systems truly perform.
tshen83 - Thursday, May 21, 2009 - link
Jarred:

Let's not fool each other. Johan's AMD bias is disgusting.

My assertion that HardOCP killed the GPU market is simply trying to show you the effect of invalidating industry standard benchmarks. Architecturally, Nvidia's GPU bigger monolithic cores are far more advanced than ATI's cores right now. In GPGPU applications, it is not even close. The problem with gaming FPS benchmark as I have said is that developers are typically happy once the FPS reaches parity. It does not show architectural superiority.

vApus? There are a ton of questions unanswered.
1. Who wrote the software?(I assume European)
2. Does the software scale linearly? And does the software scale on both AMD and Intel architecuture?
3. Why benchmark 4 Core Virtual machines when we know that VMware doesn't really scale that well themselves in SMP setup?
4. Seriously? Nieuws.be OLAP database? How many real world people run Nieuws.be?

I usually don't respond to Anandtech articles unless the article is disgustingly stupid. I also don't understand why you guys can't accept the fact that Nehalem is in fact 100% performance/watt improved vs the previous generation Xeon. It is backed by data from more than one industry standard benchmark.

Is AMD worth a look today? No, absolutely not. If you are still considering anything AMD today, you are an idiot. (The world is full of idiots) AMD's only chance is if they can release the G34 socket platform within a TDP range that is acceptable before they run out of cash.

Before you call me a troll, remind yourself this: usually the troll is smarter than the people he/she is trolling. So ask yourself this question: did Johan deserve the negative critism?
JarredWalton - Thursday, May 21, 2009 - link
You criticize every one of his articles, often because I'm not sure your reading comprehension is up to snuff. His "AMD bias" is not disgusting, though I'm quite sure your Intel bias is far worse than his AMD bias. The reason 3DMark has been largely invalidated is that it doesn't show realistic performance - though some of the latest versions scale similarly to some games, at best 3DMark measures 3DMark performance. Similarly, VMmark measures VMmark performance. Unless your workload is the same as VMmark, it doesn't really tell you much.

1 - Who wrote the software? According to the article, "vApus or Virtual Application Unique Stresstest is a stress test developed by Dieter Vandroemme, lead developer of the Sizing Server Lab at the University College of West-Flanders." His being European has nothing to do with anything at all, unless you're a racist, bigoted fool.

2 - 2-tile and 3-tile testing is in the works. It will take time.

3 - Perhaps because there are companies looking for exactly that sort of solution. I guess we should only test situations where VMware performs optimally?

4 - The source of the database is not so critical as the fact that it is a real-world database. Whether Johan uses a DB from Nieuws.be, AnandTech.com, Cnet.com, or some other source isn't particularly meaningful. It is a real setup used outside of benchmarking, and he had access to the site.

I usually don't respond to trolls unless they are disgustingly stupid as well. I don't understand why you can't accept the fact that Nehalem isn't a panacea that fixes all the world's woes. That is backed by the world around us which continues to have all sorts of problems, and a "greener" CPU isn't going to save the environment any more than unplugging millions of cell phone charges that each consume 0.5W of power or less.

AMD is certainly worth a *look* today. Will you actually end up purchasing AMD? That depends largely on your intended use. I have old Athlon 64/X2 systems that do everything that they need to do. For a small investment, you can build a much better AMD HTPC than Intel - mostly because the cheap Intel platform boards are garbage. I'd take a lesser CPU with a better motherboard any day over a top-end CPU with a crappy motherboard. If you want a system for less than $300, the motherboards alone would make me tend towards AMD.

Of course, that completely misses the point that this isn't even remotely related to that market. Servers are in another realm, and features and support are critical. If you have a choice between AMD quad socket and Intel dual socket, and the price is the same, you might want the AMD solution. If you have existing hardware that can be upgraded to Shanghai without changing anything other than the CPU, you might want AMD. If you're buying new, you'd want to look at as much data as possible.

Xeon X5570 still surpasses AMD in the initial tests by over 30%, which is not insignificant. If that extends to 50% or more in 2-tile and 3-tile setups, it's even more in Intel's favor. However, a 30% advantage is hardly out of line with the rest of the computing world. SYSmark 2007 shows the i7 965 beating the Phenom II 955 by 26.6%. Photoshop CS4 shows a 48.7% difference. DivX is 35.3%, xVid is 15.9% pass1 and 65.4% pass2, and WME9 is 25%. 3dsmax is 55.8%, CINEBENCH is 42%, and POV-ray is 65.3%.

Which of those tests is a best indication of true potential for Core i7? Well, ALL OF THEM ARE! What's the best virtualization performance metric out there? Or the best server benchmark out there? They're ALL important and useful. vApus is just one more item to look at, and it still shows a good lead for Intel.

Where is the 100% perf/watt boost compared to last generation? Well, it's in an application where i7 can stretch its eight threaded muscles. Compared to AMD, the performance/watt benefit for an entire system is more like 40% on servers. For QX9770, i7 965 is 32% more perf/watt in Cinebench, or 37.6% in Xvid. I doubt you can find a 100% increase in performance/watt without cherry-picking the benchmark and CPUs in question, but that's what you're already determined to do. That, my friend, is true bias - when you can't even admit that anything from the competition might be noteworthy, you are obviously wearing blinders.
Zstream - Thursday, May 21, 2009 - link
Umm based on your two rants this means you have ZERO knowledge working with virtual desktops/terminal servers/virtual applications.

I feel I need to make two corrections.

One: ATI's die size is roughly 75% of Nvidia's, how do you conclude that Nvidia is better? Well honestly you can not because if you scale the performance and had the same die size of Nvidia, then ATI would be killing them.

Second: Majority of enterprise's run AMD and Intel, in fact not till Neh. did Intel really come into the virtualization market.
tshen83 - Thursday, May 21, 2009 - link
"Umm based on your two rants this means you have ZERO knowledge working with virtual desktops/terminal servers/virtual applications. "

Really? Just how did you come up with this revelation?

"One: ATI's die size is roughly 75% of Nvidia's, how do you conclude that Nvidia is better? Well honestly you can not because if you scale the performance and had the same die size of Nvidia, then ATI would be killing them. "

You don't know shit about GPUs.

"Second: Majority of enterprise's run AMD and Intel, in fact not till Neh. did Intel really come into the virtualization market. "

True. That's what I am saying too, if you listened. I said, "no one should be considering AMD today because Nehalem is here".
Zstream - Thursday, May 21, 2009 - link
I came to that conclusion based on your incoherent rants.

Why would you say I do not know shit about GPU's? I provided you a fact, your illogical thinking does not change the matter. It comes down to die size and ATI wins performance/DIE. If you would like to argue that claim with then please do so.

Who would consider Neh in todays market? Very few, unless you are a self proclaimed millionaire who crazily spends or needing the extra performance boost in some applications like exchange.
Viditor - Thursday, May 21, 2009 - link
Guys, it's tshen...nobody over the age of 12 listens to his rants anyway, so don't feed the troll (or ban him if you can...).
leexgx - Thursday, May 21, 2009 - link
LOL nice rant

3dmark cant be used any more as its not an 3dmark any more its more like an 3d gpu/cpu mark the CPU can sway the total result

AMD cpus have been using dedicated bus that talks to each other cpu socket and has direct access to the ram, allso AMD does have V-amd as well on all amd64 am2 cpus as well as optrons an (baring sempron)
Makaveli - Thursday, May 21, 2009 - link
Ya what is the post all about.

HardOCP killed the GPU market? I don't know about you but I never bought a videocard because of its 3dmark score. It's one benchmark that both companies cater to but is of little importance. Hardocp review method has much more valuable data for me than one benchmark.

Let me ask you this when you are buying a car or anything of siginicant value. Do you not do your homework is one review being either positive or negative enough to drop your hard earned cash?

If so Bestbuy is that way!

As for the rest of your post the personal attacks and childish language cleary show your not even worth taking seriously. Sounds more like the ramblings of a Highschool child who is trying to get attention.

Good day to you sir,

Godspeed
Zstream - Thursday, May 21, 2009 - link
You have no idea what you are talking about. The benchmark software can be downloaded. It is not our fault you are to poor to pay for a product.

The rest I have to say "LOL".
DeepThought86 - Thursday, May 21, 2009 - link
Wow, just wow.
GotDiesel - Thursday, May 21, 2009 - link
"Yes, this article is long overdue, but the Sizing Server Lab proudly presents the AnandTech readers with our newest virtualization benchmark, vApus Mark I, which uses real-world applications in a Windows Server Consolidation scenario."

spoken with a mouth full of microsoft cock

where are the Linux reviews ?

not all of us VM with windows you know..
JohanAnandtech - Thursday, May 21, 2009 - link
A minimum form of politeness would be appreciated, but I am going to assume your were just dissapointed.

The problem is that right now the calling circle benchmark runs half as fast on Linux as it does on Windows. What is causing Oracle to run slower on Linux than on Windows is a mystery even to some of the experienced DBA we have spoken. We either have to replace that benchmark with an alternative (probably Sysbench) or find out what exactly happened.

When you construct a virtualized benchmark it is not enough just to throw in a few benchmarks and VMs, you really have to understand the benchmark thoroughly. There are enough halfbaken benchmarks already on the internet that look like a Swiss cheese because there are so many holes in the methodology.
JarredWalton - Thursday, May 21, 2009 - link
Page 4: vApus Mark I: the choices we made

"vApus mark I uses only Windows Guest OS VMs, but we are also preparing a mixed Linux and Windows scenario."

Building tests, verifying tests, running them on all the servers takes a lot of time. That's why the 2-tile and 3-tile results are not yet ready. I suppose Linux will have to wait for Mark II (or Mark I.1).
mino - Thursday, May 21, 2009 - link
What you did so far is great. No more words needed.

What I would like to see is vApus Mark I "small" where you make the tiles smaller, about 1/3 to 1/4 of your current tiles.
Tile structure shall remain simmilar for simplicity, they will just be smaller.

When you manage to have 2 different tile sizes, you shall be able to consider 1 big + 1 small tile as one "condensed" tile for general score.

Having 2 reference points will allow for evaluating "VM size scaling" situations.
JohanAnandtech - Sunday, May 24, 2009 - link
Can you elaborate a bit? What do you menan by "1/3 of my current tile?" . A tile = 4 VMs. are you talking about small mem footprint or number of VCPUs?

Are you saying we should test with a Tile with small VMs and then test afterwards with the large ones? How do you see such "VM scaling" evaluation?
mino - Monday, May 25, 2009 - link
Thanks for response.

1/3 I mean smaller VM's. Mostly from the load POW. Probably 1/3 load would go for 1/2 memory footprint.

The point being that currently the is only a single datapont with a specific load-size per tile/per VM.

By "VM scaling" I would like to see what effect woul smaller loads have on overal performance.

I suggest 1/3 or 1/4 the load to get a measurable difference while remaining within reasonable memory/VM scale.

In the end, if you get simmilar overal performance from 1/4 tiles, it may not make sense to include this in future.
Even then the information that your benchmark results can be safely extrapolated to smaller loads would be of a great value by itself.
mino - Monday, May 25, 2009 - link
Eh, that last text of mime looks like a nice gibberish...
Clarification nneded:

To be able to run more tiles/box smaller memory footprint is a must.
With smaller mem footprint, smaller DB's are a must.

The end results may not be directly comparable but shall be able to give some reference point, corectly interpreted

Please let me know if this makes sense to you.
There are multiple dimensions to this. I may be easily on the imaginery branch :)
ibb27 - Thursday, May 21, 2009 - link
Can we have a chance to see benchmarks for Sun Virtualbox which is Opensource?
winterspan - Tuesday, May 26, 2009 - link
This test is misleading because you are not using the latest version of VMware that supports Intel's EPT. Since AMD's version of this is supported in the older version, the test is not at all a fair representation of their respective performance.
Zstream - Thursday, May 21, 2009 - link
Can someone please perform a Win2008 RC2 Terminal Server benchmark? I have been looking everywhere and no one can provide that.

If I can take this benchmark and tell my boss this is how the servers will perform in a TS environment please let me know.
JohanAnandtech - Friday, May 22, 2009 - link
Most of the time, the number of sessions on TS are limited by the amount of memory. Can you give some insight in what you are running inside a session? If it is light on CPU or I/O resources, sizing will be based on the amount of memory per session only.
dragunover - Thursday, May 21, 2009 - link
would be interesting if this was done on desktop CPU's with price / performance ratios
jmke - Thursday, May 21, 2009 - link
nope, that would not be interesting at all. You don't want desktop motherboards, RAM or CPUs in your server room;
nor do you run ESX at home. So there's no point to test performance of desktop CPUs.
simtex - Thursday, May 21, 2009 - link
Why so harsh, virtualization will eventually become a part of desktops users everyday life.

Imagine, tabbing between different virtualization, like you do in your browser. You might have a secure virtualization for your webapplications, a fast virtualization for your games. Another for streaming music and maybe capturing television. All on one computer, which you seldom have to reboot because everything runs virtualized.
Azsen - Monday, May 25, 2009 - link
Why would you run all those applications on your desktop in VMs? Surely they would just be separate application processes running under the one OS.
flipmode - Thursday, May 21, 2009 - link
Speaking from the perspective of how the article can be the most valuable, it is definitely better off to stick to true server hardware for the time being.

For desktop users, it is a curiosity that "may eventually" impart some useful data. The tests are immediately valuable for servers and for current server hardware. They are merely of academic curiosity for desktop users on hardware that will be outdated by the time virtualization truly becomes a mainstream scenario on the desktop.

And I do not think he was being harsh, I think he was just being as brief as possible.

Real-world virtualization benchmarking: the best server CPUs compared

Post Your Comment

66 Comments

Back to Article

binaryguru - Monday, June 1, 2009 - link

mikidutzaa2 - Friday, May 29, 2009 - link

JohanAnandtech - Friday, May 29, 2009 - link

mikidutzaa2 - Friday, May 29, 2009 - link

RobAm - Tuesday, May 26, 2009 - link

Brovane - Tuesday, May 26, 2009 - link

JohanAnandtech - Tuesday, May 26, 2009 - link

Brovane - Tuesday, May 26, 2009 - link

yasbane - Tuesday, May 26, 2009 - link

JohanAnandtech - Tuesday, May 26, 2009 - link

has407 - Sunday, May 24, 2009 - link

JohanAnandtech - Tuesday, May 26, 2009 - link

has407 - Wednesday, May 27, 2009 - link

has407 - Monday, May 25, 2009 - link

has407 - Monday, May 25, 2009 - link

solori - Friday, May 22, 2009 - link

JohanAnandtech - Sunday, May 24, 2009 - link

alpha754293 - Friday, May 22, 2009 - link

JohanAnandtech - Sunday, May 24, 2009 - link

knutjb - Monday, May 25, 2009 - link

Bandoleer - Thursday, May 21, 2009 - link

duploxxx - Friday, May 22, 2009 - link

Bandoleer - Friday, May 22, 2009 - link

Bandoleer - Friday, May 22, 2009 - link

JohanAnandtech - Friday, May 22, 2009 - link

Bandoleer - Friday, May 22, 2009 - link

JohanAnandtech - Sunday, May 24, 2009 - link

Bandoleer - Sunday, May 24, 2009 - link

solori - Monday, May 25, 2009 - link

pmonti80 - Friday, May 22, 2009 - link

JohanAnandtech - Friday, May 22, 2009 - link

pc007 - Thursday, May 21, 2009 - link

SeanG - Friday, May 22, 2009 - link

lopri - Thursday, May 21, 2009 - link

JohanAnandtech - Friday, May 22, 2009 - link

tshen83 - Friday, May 22, 2009 - link

tshen83 - Thursday, May 21, 2009 - link

JohanAnandtech - Friday, May 22, 2009 - link

whatthehey - Friday, May 22, 2009 - link

JarredWalton - Thursday, May 21, 2009 - link

tshen83 - Thursday, May 21, 2009 - link

JarredWalton - Thursday, May 21, 2009 - link

Zstream - Thursday, May 21, 2009 - link

tshen83 - Thursday, May 21, 2009 - link

Zstream - Thursday, May 21, 2009 - link

Viditor - Thursday, May 21, 2009 - link

leexgx - Thursday, May 21, 2009 - link

Makaveli - Thursday, May 21, 2009 - link

Zstream - Thursday, May 21, 2009 - link

DeepThought86 - Thursday, May 21, 2009 - link

GotDiesel - Thursday, May 21, 2009 - link

JohanAnandtech - Thursday, May 21, 2009 - link

JarredWalton - Thursday, May 21, 2009 - link

mino - Thursday, May 21, 2009 - link

JohanAnandtech - Sunday, May 24, 2009 - link

mino - Monday, May 25, 2009 - link

mino - Monday, May 25, 2009 - link

ibb27 - Thursday, May 21, 2009 - link

winterspan - Tuesday, May 26, 2009 - link

Zstream - Thursday, May 21, 2009 - link

JohanAnandtech - Friday, May 22, 2009 - link

dragunover - Thursday, May 21, 2009 - link

jmke - Thursday, May 21, 2009 - link

simtex - Thursday, May 21, 2009 - link

Azsen - Monday, May 25, 2009 - link

flipmode - Thursday, May 21, 2009 - link

Log in

Don't have an account? Sign up now