AMD K8 E4 Stepping: SSE3 Performance
by Derek Wilson on February 17, 2005 12:05 AM EST - Posted in CPUs
Final Words
Finding good SSE3 benchmarks wasn't as easy as we would have liked. Other encoding suites react the same way that DivX and AutoGK do. This seems to indicate that the K8 architecture is simply resilient when it comes to unaligned 128-bit loads. In the case of Intel's NetBurst, the lddqu instruction may have more impact.
As far as physics and graphics go, the added instructions show potential in our synthetic test. For DCC, CAD, scientific, and other workstation software, the E4 stepping could offer a bit of a performance boost.
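To make the unaligned-load point concrete, here is a minimal sketch of how an encoder might fetch 16 bytes from an address that is not guaranteed to be aligned. It is not code from any of the applications we tested. The helper name load16_unaligned is hypothetical, and we assume a GCC- or Clang-style compiler that defines __SSE3__ and __SSE2__ when those instruction sets are enabled (for example with -msse3). The SSE3 path uses the lddqu intrinsic, the SSE2 path uses the ordinary unaligned load, and a plain memcpy covers everything else. All three paths return the same bytes, so any difference would show up only in speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 intrinsics, including _mm_lddqu_si128 */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_loadu_si128 */
#endif

/* Hypothetical helper: copy 16 bytes from a possibly unaligned source. */
static void load16_unaligned(uint8_t dst[16], const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE3__)
    /* SSE3 path: lddqu, intended for loads that may cross a cache line. */
    __m128i v = _mm_lddqu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#elif defined(__SSE2__)
    /* SSE2 path: movdqu, the standard unaligned 128-bit load. */
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, v);
#else
    /* Portable fallback for builds without SSE2. */
    memcpy(dst, src, 16);
#endif
}
```

Because the function's result does not depend on which path is compiled in, a program could switch between the SSE2 and SSE3 versions with a recompile and compare timings on each processor.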
In the consumer space, Athlon 64 may not see as much benefit from SSE3, especially since our encoding tests turned up so little performance impact. SSE3 can be used in games, but the impact of this will likely be minimal. As most games will likely remain graphics limited, improvements will have a hard time shining through. Of course, for those who like to use lower cost Athlon 64 processors in cheaper workstations, there could be some advantage.
When we take a look at the Opteron 252 in a workstation environment, we will be able to get a better view of what the total package has to offer. As our workstation tests will be in a DP environment, we'll be able to see how the higher bandwidth helps the Opteron shine.
We would like to have tested more applications in this report on SSE3 performance under the new AMD core. Of interest to us are LINPACK, FLOPS, STREAM, and various other tests that would require us to recompile them with proper SSE3 support. As the Intel compiler is designed to optimize for Intel processors, we haven't had a viable source for high quality SSE3 compilation. Hand optimizing these benchmarks for SSE3 on Opteron would take a little more time than this short investigation will allow. We may look into using GCC for this purpose in future tests. As for real world tests using SSE3, we haven't been able to find many suitable candidates beyond video encoders.
Current SSE3-optimized code paths will likely not show their strengths on Opteron/Athlon until the processors have been in developers' hands for a while. The Intel compiler is also head and shoulders above anything AMD has up its sleeve. But since SSE3 offers more choices for optimization and code simplification, compilers may have an easier time generating efficient code. Hand-optimized code is still important for tight loops in critical sections of performance-oriented code. In this case, more powerful and simpler options implemented in hardware will help programmers better optimize their own code.
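As a rough illustration of the code simplification mentioned above, consider summing an array of floats. This is a sketch under stated assumptions, not code from our synthetic test: the function name sum_floats is hypothetical, the array length is assumed to be a multiple of four, and the compiler is assumed to define __SSE3__ and __SSE__ when those instruction sets are enabled. With SSE3, the final reduction of the four lanes is two horizontal adds (haddps). Without it, the same reduction needs a shuffle sequence. We make no claim here about which path is faster on a given core.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 intrinsics, including _mm_hadd_ps */
#elif defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Hypothetical helper: sum n floats, where n is a multiple of 4. */
static float sum_floats(const float *x, size_t n)
{
    assert(x != NULL && n % 4 == 0);
#if defined(__SSE3__) || defined(__SSE__)
    /* Accumulate four partial sums, one per lane. */
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
#if defined(__SSE3__)
    /* SSE3: two horizontal adds leave the total in every lane. */
    acc = _mm_hadd_ps(acc, acc);   /* (a0+a1, a2+a3, a0+a1, a2+a3) */
    acc = _mm_hadd_ps(acc, acc);
#else
    /* Pre-SSE3: reduce the lanes with shuffles and adds. */
    __m128 hi = _mm_movehl_ps(acc, acc);            /* (a2, a3, a2, a3) */
    acc = _mm_add_ps(acc, hi);                      /* lanes 0,1: a0+a2, a1+a3 */
    __m128 odd = _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(1, 1, 1, 1));
    acc = _mm_add_ss(acc, odd);                     /* lane 0 holds the total */
#endif
    return _mm_cvtss_f32(acc);
#else
    /* Portable scalar fallback. */
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += x[i];
    return total;
#endif
}
```

The SSE3 branch is shorter and arguably easier for a compiler or a programmer to generate correctly, which is the kind of simplification the new instructions offer.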
48 Comments
Beenthere - Thursday, February 17, 2005
The only reason Intel created SSE3 was to have bogus benchmarks to fool naive consumers. There is no significant performance advantage in any application. When you're Intel and you can provide incentives for benchmarks to be written to your liking to show a fantasy performance advantage, and your product line is obsolete and your market share is dropping, you do whatever you can to deceive consumers and hacks. AMD included SSE3 so Intel couldn't use the bogus benchmarks for misleading marketing purposes.
This is no different than when MICROSUCKS paid to have benchmarks run that showed Win2000 to be faster than NT4 when in fact it is NOT in actual practice.
SOD, DD
Time for PC users to become a little more knowledgeable on the scams being used by dishonest companies to hawk inferior products.
Carfax - Thursday, February 17, 2005
Hey Derek. Could you test SSE2 performance as well?
As has been mentioned, the E stepping was rumored to possess a better SSE2 implementation.
iwodo - Thursday, February 17, 2005
I always thought the E core stepping was going to bring many new things to the table.
Improved memory controller that is supposed to be faster and have better compatibility.
Improved SSE2 core - More performance.
Better Cache Latency
SIO - Lower TDP...........
Where are all of these in the review? Or are they just rumors, or are they not available on Opteron?
DerekWilson - Thursday, February 17, 2005
Oh, but back on topic... I've had a lot of emails about AMD simply mapping SSE3 functionality to SSE2 (or even x87) hardware. This would be a very bad idea for AMD and doesn't look like what they are doing.
If we had seen AMD implement the entire SSE3 instruction set as essentially macros for SSE2, we would likely have seen a performance drop. There's not an easy way to just map some of the instructions, as optimal performance would require a recompile. We actually saw a performance gain in our synthetic benchmark that used some of the floating point instructions.
It is possible some instructions could be treated this way. For example, there's no reason that code using a standard method to load 16 bytes (from an address that may or may not be aligned) and code using lddqu should look any different to the hardware.
DerekWilson - Thursday, February 17, 2005
No one uses Opteron?
http://www.anandtech.com/IT/showdoc.aspx?i=2173
Also, if you need 4P or more, there's no reason to limit yourself by going with Intel's FSB implementation -- it really hurts the performance of the system:
http://www.anandtech.com/IT/showdoc.aspx?i=1982
xsilver - Thursday, February 17, 2005
old habit?
It's called perception lag -- when perception (of Intel being good) needs to catch up to reality .... oh, and also blame it on companies like Dell etc.
Brunnis - Thursday, February 17, 2005
bigpow: But then again, I wouldn't go with Opteron too.
Why not? Opteron is better than Xeon in many areas.
A large reason why many companies don't use much other than Intel products is probably old habit. That's just stupid, in my opinion, but everyone's different...
sandorski - Thursday, February 17, 2005
Bigpow: Opteron has gone from 0-10% marketshare in the server space. So it's not surprising that neither you nor anyone you know has them, but they are being used and last I heard they were still gaining marketshare.
Samadhi - Thursday, February 17, 2005
It has been written in a number of places that, as well as adding SSE3 units, the SSE2 units were to be improved in the latest chip revision.
Any chance we could get some SSE2 vs SSE2 results for the two processors tested in this article?
SkAiN - Thursday, February 17, 2005
Sorry for the blank post.
When I first began reading this article, I became excited, looking forward to seeing the benchmarks this "upgrade" was supposed to bring, especially in the area of encoding.
Then I saw the benchmarks.
Seriously, it looks as if AMD is getting the short end of the stick when it comes to the cross-licensing deal with Intel. Intel gets awesome new architecture, A64s get Intel's bogus hype...