Russia’s Elbrus 8CB Microarchitecture: 8-core VLIW on TSMC 28nm
by Dr. Ian Cutress on June 1, 2020 8:00 AM EST
All of the world’s major superpowers have a vested interest in building their own custom silicon processors. Doing so allows a superpower to wean itself off US-based processors, guarantee there are no supplemental backdoors, and, if needed, add its own. As we have seen with China, custom chip designs, x86-based joint ventures, or Arm derivatives seem to be the order of the day. So in comes Russia, with its custom Elbrus VLIW design that seems to have its roots in SPARC.
Russia has been creating processors called Elbrus for a number of years now. For those of us outside Russia, it has mostly been a big question mark as to what is actually under the hood. These chips are built for custom servers and office PCs, often at the direction of the Russian government and its requirements. We have had glimpses of the design thanks to documents from Russian supercomputing events; however, these are a few years old now. If you are not in Russia, you are unlikely to ever get your hands on one at any rate. However, a new programming guide for the latest Elbrus-8CB processor designs, listed online, recently came to our attention.
The latest Elbrus-8CB chip, as detailed in the new online programming guide published this week, is built on TSMC’s 28nm process and is a 333 mm² design featuring 8 cores at 1.5 GHz. According to the documents, peak throughput is 576 GFLOPs of single precision, and the chip offers four channels of DDR4-2400, good for 68.3 GB/s. The L1 and L2 caches are private, with a 64 kB L1-D cache, a 128 kB L1-I cache, and a 512 kB L2 cache. The L3 cache is shared between the cores, at 2 MB/core for a total of 16 MB. The processor also supports 4-way server multiprocessor combinations, although the documents do not say over what protocol or at what bandwidth.
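As a rough sanity check on the headline numbers, the arithmetic can be sketched in a few lines of Python. The core count, clock, GFLOPs, L3 size and 68.3 GB/s figure come from the paragraph above; the 64-bit (8-byte) channel width and the function names are assumptions made for illustration, not something the programming guide states here. Under that assumption, four channels of DDR4-2400 would nominally come to 76.8 GB/s, while the quoted 68.3 GB/s works out to roughly 2134 MT/s per channel. The sketch does not try to explain that difference.

```python
# Figures quoted in the article for the Elbrus-8CB.
CORES = 8
CLOCK_GHZ = 1.5
PEAK_SP_GFLOPS = 576
L3_MB_PER_CORE = 2
CHANNELS = 4
QUOTED_BANDWIDTH_GBS = 68.3

# Assumption (not from the article): each DDR4 channel is 64 bits = 8 bytes wide.
BYTES_PER_TRANSFER = 8


def flops_per_core_per_cycle(gflops, cores, clock_ghz):
    """Single-precision FLOPs each core must retire per cycle to hit the peak."""
    return gflops / (cores * clock_ghz)


def ddr_bandwidth_gbs(channels, mega_transfers_per_s,
                      bytes_per_transfer=BYTES_PER_TRANSFER):
    """Nominal peak bandwidth in GB/s (decimal) for the given transfer rate."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


def implied_transfer_rate(bandwidth_gbs, channels,
                          bytes_per_transfer=BYTES_PER_TRANSFER):
    """Per-channel MT/s that a quoted bandwidth figure would correspond to."""
    return bandwidth_gbs * 1000 / (channels * bytes_per_transfer)


print("SP FLOPs per core per cycle:",
      flops_per_core_per_cycle(PEAK_SP_GFLOPS, CORES, CLOCK_GHZ))
print("Total L3 (MB):", CORES * L3_MB_PER_CORE)
print("Nominal 4x DDR4-2400 (GB/s):", ddr_bandwidth_gbs(CHANNELS, 2400))
print("MT/s implied by quoted figure:",
      round(implied_transfer_rate(QUOTED_BANDWIDTH_GBS, CHANNELS)))
```

The 48 single-precision FLOPs per core per cycle that falls out of the first calculation is simply the quoted peak divided by cores and clock; how the six execution ports and vector width combine to reach it is not covered in this post.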
It is a compiler-focused design, much like some other complex chips, in that most of the optimizations happen at the compiler level. Judging by compiler-first designs in the past, that typically does not make for a successful product. Documents from 2015 state that a continuing goal of the Elbrus design is x86 and x86-64 binary translation with only a 20% overhead, allowing full support for x86 code as well as x86 operating systems, including Windows 7 (this may have been updated since 2015).
The core has six execution ports, with many ports being multi-capable. For example, four of the ports can be load ports, and two of the ports can be store ports, but all of them can do integer operations and most can do floating point operations. Four of the ports can do comparison operations, and those four ports can also do vector compute.
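To make the port description concrete, here is a toy Python sketch of how a compiler might check whether a set of independent operations fits into one six-port wide instruction. The article only gives counts per operation type, so the mapping of capabilities to specific port numbers below is hypothetical, as is reading "most" floating-point ports as four; the names `PORTS` and `pack_bundle` are made up for illustration and are not taken from the Elbrus documentation.

```python
# Hypothetical capability map. The counts follow the article (4 load, 2 store,
# 6 integer, 4 compare, the same 4 for vector); which port gets which
# capability, and "most" FP meaning four ports, are assumptions.
PORTS = {
    0: {"int", "load", "fp", "cmp", "vec"},
    1: {"int", "fp", "cmp", "vec"},
    2: {"int", "load", "store"},
    3: {"int", "load", "fp", "cmp", "vec"},
    4: {"int", "fp", "cmp", "vec"},
    5: {"int", "load", "store"},
}


def pack_bundle(ops):
    """Assign each operation kind in `ops` to a distinct port.

    Returns a dict {port: op} if all operations fit in one wide
    instruction, or None if they do not. Uses simple backtracking.
    """
    def assign(i, used):
        if i == len(ops):
            return {}
        for port, caps in PORTS.items():
            if port not in used and ops[i] in caps:
                rest = assign(i + 1, used | {port})
                if rest is not None:
                    return {port: ops[i], **rest}
        return None

    return assign(0, frozenset())


# Two loads, two stores and two FP ops fit into one bundle in this model.
print(pack_bundle(["load", "load", "store", "store", "fp", "fp"]))
# Four loads plus two stores do not: both store ports are also load ports.
print(pack_bundle(["load"] * 4 + ["store"] * 2))
```

In a VLIW design this kind of slot assignment is done ahead of time by the compiler rather than by hardware at run time, which is why the port capabilities matter so much to the toolchain.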
This short news post is not meant to be a complete breakdown of the Elbrus capabilities. We have joked internally about the frequency at which a Cortex X1 with x86 translation would match the capabilities of the 8-core Elbrus. However, users who want to get to grips with the design can open and read the documentation at the following address:
http://ftp.altlinux.org/pub/people/mike/elbrus/docs/elbrus_prog/html/index.html
The bigger question is how likely any of these state-funded processor development projects are to succeed at scale. State-funded groups should, theoretically, be the best funded; however, even with all the money in the world, engineers are still required to get things done. Even if there ends up being a new super-CPU for a given superpower, there will always be vested interests in an amount of security through obscurity, especially if the hardware is designed specifically to cater to state-secret levels of compute. There's also the added complication of the US government tightening its screws around TSMC and ASML to not accept orders from specific companies. Those boundaries could be expanded, depending on how good the products are or how threatened some of the nations involved feel.
Source: Blu (Twitter)
93 Comments
erinadreno - Monday, June 1, 2020 - link
The state funded groups had to put all money into chip development. However, to make something useful, paying developers and marketing are also important. None of ARM, AMD, Intel are state-funded, but they almost run everything. Kinda sad that true innovation will likely never happen.
PixyMisa - Monday, June 1, 2020 - link
What is this "true innovation"?
VLIW doesn't work very well for general-purpose workloads, which is why Intel abandoned it.
azfacea - Monday, June 1, 2020 - link
exactly intel/hp and others squandered tens of billions on VLIW and eventually came to accept their fate.
1. the compilers are impossible to write as static analysis does not permit the same level of out of order parallelism that a CPU based re-order/scheduler can deliver.
2. even the mediocre compiler performance you do get, is tied to uArch, and have to be thrown away next generation
if you need massive SIMD performance or something thats a diff story, but it doesnt work for general purpose CPU
FunBunny2 - Monday, June 1, 2020 - link
"if you need massive SIMD performance or something thats a diff story, but it doesnt work for general purpose CPU"
yeah, but... since the Big Story these days is Big Data (whether in flat files or RDBMS), which is (nearly?) by definition SIMD, there may/ought to be a significant market. may haps Intel/HP bailed too early?
Freeb!rd - Monday, June 1, 2020 - link
So you think AMD gave up too soon on VLIW4 & VLIW5?
https://www.tomshardware.com/reviews/a10-4600m-tri...
Maybe the cost and efficiency just wasn't there.
AlB80 - Tuesday, June 2, 2020 - link
There is no need to increase single thread performance for GPU. GPUs have massive thread parallelism and pipeline even without speculative execution is enough.
azfacea - Monday, June 1, 2020 - link
big data and AI are two diff things. VLIW is liability for general purpose computing, not an asset. VLIW can work if you are building a custom chip aimed at some AI or video transcoding application or something like that but good luck competing with AI start-ups from 28 nm, nevermind NV, AMD, ....
what russia needs more than anything else is semi manufacturing capacity where they have big deficit right now and possibly much bigger 2mrw. x86 is out of patent, and ARM licenses core IP on the cheap, RISC-V is out there, high performance CPU core designs are proliferating everywhere no need to redefine the general purpose ISA/architecture.
mode_13h - Monday, June 1, 2020 - link
> what russia needs more than anything else is semi manufacturing capacity where they have big deficit right now
Look at how much $$$ China has poured into this, and they're still nowhere close to TSMC or Samsung. How on Earth do you expect Russia ever to develop cutting edge fabrication in any kind of reasonable timescale?
FunBunny2 - Monday, June 1, 2020 - link
if the USofA can strongarm (he, he) ASML not to sell certain places, fabrication is moot.
azfacea - Monday, June 1, 2020 - link
thats precisely why its so important, isnt it? if they dont make their own semi, their fate is in uncle sam's hand. I am not a fan of putin, far be it for me to worry about russia's semi future. but if i am being honest, it should be quite scary for russia to be marching into the age of AI with non-existent or highly vulnerable semi industry.
today they can trade oil barrels for wafers, 2mrw oil may not be there, but tech will not be less important, it will be more important, especially if AI is going to do the thing its probably going to do