Analyzing Falkor’s Microarchitecture: A Deep Dive into Qualcomm’s Centriq 2400 for Windows Server and Linuxby Ian Cutress on August 20, 2017 11:00 AM EST
- Posted in
- Enterprise CPUs
- Centriq 2400
Developing a custom microarchitecture is difficult. Even with all the standards in place and licensing an instruction set such as ARM, the actual development takes time and the right people to put together, then the infrastructure to deploy at scale.
In the mobile space, we’ve seen custom cores – most notably from Apple – deviating from the regular ARM design, but also Samsung and Qualcomm are playing in that space. Qualcomm however is going one further by developing a custom core for the server and enterprise market, focusing purely on typical enterprise workloads. The current commercial ARM success in the data center comes from companies such as Cavium, who use ARM architecture licenses in a custom SoC. By developing its own high-performance core, Qualcomm is hoping to offer something different in the data center, and they’ve lifted the lid on a good chunk of the core.
The Qualcomm Centriq 2400 SoC Family, with the Falkor CPU
Back in December 2016, Qualcomm announced that it has developed its own SoC for the data center, all the while also reveaing details such as the fact that it is a custom core and that Qualcomm will be involved in the Open Compute Project (and is based on the latest version of Microsoft’s Project Olympus). We knew that Qualcomm has been aiming for a 48 core design, using ARM’s instruction set, and is aiming for the data center and enterprise markets. The goal is to carry forward knowledge of the ARM instruction set and custom core design into markets that could potentially leverage it – it also helps that the data center market has a very interesting TAM (total addressable market, in USD) of which even a small slice could reap rewards. Back in December, they were beginning to sample cloud partners and potential future customers.
The first set of products to come out will be the Qualcomm Centriq 2400 family of SoCs. The top parts will feature 48 cores, and while today Qualcomm is ultimately communicating about said 48-core model, they have stated that the 2400-series will be a range of parts segregated by core count, performance, and power. The CPU cores, code named Falkor, will be ARMv8.0 compliant although with ARMv8.1 features, allowing software to potentially seamlessly transition from other ARM environments (or need a recompile). The Centriq 2400 family is set to be AArch64 only, without support for AArch32: Qualcomm states that this saves some power and die area, but that they primarily chose this route because the ecosystems they are targeting have already migrated to 64-bit. Qualcomm’s Chris Bergen, Senior Director of Product Management for the Centriq 2400, stated that the majority of new and upcoming companies have started off with 64-bit as their base in the data center, and not even considering 32-bit, which is a reason for the AArch64-only choice here.
The design team behind the Centriq, as explained to us, was partly formed from the custom core team from the mobile side. On the mobile side we have seen Qualcomm custom cores based on ARM’s instruction set in the form of Krait and Kryo, although this new Falkor design is not derived from either. Qualcomm states that Falkor is their 5th generation of custom CPU core design, and has been a complete ground up design specifically for the data center. The focus, we were told, was on high overall performance, high performance per watt, but also the ability to run at low power. To do this, the Centriq 2400 is set to be the first major data center design built on a 10nm process.
We already know that it will be fabbed on a 10nm process, and various media/analysts have postulated which foundry will be playing that role. Qualcomm currently has 10nm volume with Samsung through the Snapdragon 835, which is shipping in the millions. Samsung’s 10nm processes are more mature than the competition at this point, however Samsung does not have much experience with large silicon dies, tending to favor smaller SoCs due to the naturally higher yields and helping to keep fab production at a high level. The other alternative is TSMC, whose CLN10FF process was technically available for select customer orders later than Samsung, but is currently being used by Apple's A10X in the iPad Pro 2. TSMC also has experience with larger silicon, which would be of considerable benefit. Qualcomm is not announcing who their foundry partner is at this time unfortunately, although it would likely depend on relations, volume, pricing and performance.
Post Your CommentPlease log in or sign up to comment.
View All Comments
tipoo - Sunday, August 20, 2017 - linkBig ARM server CPUs will be interesting. The ISA is very sane and scalable, if the investment and demand was there it would have no issue getting to where large x86 cores are, the ISA was never the limit.
Then we can see if they can actually exceed them.
Kevin G - Sunday, August 20, 2017 - linkThis makes me wish that Apple would license their cores to 3rd parties. Recent Apple cores are getting very close to where x86 lies per clock and they've certainly exceeded x86 in performance/watt in the ultra mobile space (granted Intel's last round of ultra mobile chips was flat out cancelled, skewing such a comparison).
Considering Apple's work in ultra mobile, I find it believable that a higher performance per clock design in the server space is feasible for an ARM design. A company with enough resources just needs to do it.
iwod - Sunday, August 20, 2017 - linkIf the leaked numbers for A11 were true then Apple may have exceeded the performance / clock against Intel x86 as well.
While Apple are highly unlikely to ever license their Cores out, I wish they could use those Cores and make an Xserve Server Come back.
peevee - Monday, August 21, 2017 - linkXServe died because of their own OS. Nobody is interested in anything but Linux (and sometimes a little Windows).
But they could have sold it with Linux though.
Dr. Swag - Sunday, August 20, 2017 - linkApple never will though, since it's Apple we're talking about. They keep their tech to themselves to give themselves the advantage.
name99 - Sunday, August 20, 2017 - linkThe only benchmarks that exist are geekbench4 and the browser benchmarks against Apple laptop hardware. By THOSE benchmarks A9X matched Intel in IPC and A10X exceeds by around 15%.
This is clearly an area that draws out the crazies in full screaming mode because a lot of assumptions have to be made (for example the most realistic assumption is that the high-end Intel scores occur at the maximum turbo frequency, but the crazies will insist that, no, you have to normalize to the baseline intel frequency for that particular CPU). Or you get the insistence that the ONLY measurement that matters is against SPEC2006 compiled with icc, which runs into the issues that icc has MASSIVE effects on SPEC; and that no SPEC numbers in any form exist for the A10/A10X.
At the end of the day, it boils down to "what is your goal?" If your goal is an honest comparison of the two processor families, the best data available suggests the summary I gave. If your goal is "my CPU can beat up your CPU" then all the data in the world presumably won't change your mind, and the best data of all is non-existent data (like the certain claims as to how the A10X would or would not behave on SPEC2006).
Final point. It is not at all implausible, IMHO, that Apple have a plan, and have already started proceeding down it, for ARM in their data centers. After all, why not? It saves them money, it allows them to run at their pace not Intel's (eg install AI or compression or encryption accelerators as they need them) and provides better security (both security through obscurity and not having as large an attack surface as Intel).
But why would they talk about it? Apple says nothing ever, unless they have to. No way they would advertise to their competitors the extent to which they have comparative advantage through use of their own data warehouse chips (for at least some purposes).
zodiacfml - Monday, August 21, 2017 - linkNot sa fast. Apple's SoC's are huge in die size which is the reason for their performance. They are as big or bigger than Intel Core. The best part for comparison are the Core M parts. There is little or no business for Apple to do this. There are rumors using Apple SoC on a Macbook Air but that will make little sense as they will to need port OSX to ARM. Again, that is not a good idea as Macbook Pro nor the Mac Pros will continue with OSX .
cdillon - Monday, August 21, 2017 - linkApple has already ported OSX to ARM, and they call it "iOS". It's not going to be as big a deal as you think to get OSX as we know it to run in ARM. Not only that, but they already have experience with juggling two processor architectures (PPC and x86) at the same time in one OS.
extide - Monday, August 21, 2017 - linkAnd 68k to PPC, back in the day
name99 - Monday, August 21, 2017 - linkApple's SoCs are not huge, neither are their cores.
The iPhone SoC's tend to hover around 100 to 120mm^2, the iPad SoCs sometime reach 150, though the A10X is below 100.
The cores are a few mm^2. Eyeballing it, I'd say the entire CPU complex (2 large cores, two small cores, and L2) is about 12mm^2. This is substantially larger than ARM cores (four A73s+their L2 in the same process technology would fit in 8mm^2) but substantially smaller than Intel (an Intel core these days runs at around 8mm^2 in Intels 14nm).