Alongside their EPYC server CPU updates, as part of today’s AMD Data Center event, the company is also offering an update on the status of their nearly-finished AMD Instinct MI300 accelerator family. The company’s next-generation HPC-class processors, which use both Zen 4 CPU cores and CDNA 3 GPU cores on a single package, have now become a multi-SKU family of XPUs.

Joining the previously announced 128GB MI300 APU, which is now being called the MI300A, AMD is also producing a pure GPU part using the same design. This chip, dubbed the MI300X, uses just CDNA 3 GPU tiles rather than the mix of CPU and GPU tiles found in the MI300A, making it a pure, high-performance GPU that gets paired with 192GB of HBM3 memory. Aimed squarely at the large language model market, the MI300X is designed for customers who need all the memory capacity they can get to run the largest of models.

First announced back in June of last year, and detailed in greater depth at CES 2023, the AMD Instinct MI300 is AMD’s big play into the AI and HPC market. The unique, server-grade APU packs both Zen 4 CPU cores and CDNA 3 GPU cores on to a single, chiplet-based chip. None of AMD’s competitors have (or will have) a combined CPU+GPU product like the MI300 series this year, so it gives AMD an interesting solution with a truly unified memory architecture, and plenty of bandwidth between the CPU and GPU tiles.

MI300 also includes on-chip memory via HBM3, using 8 stacks of the stuff. At the time of the CES reveal, the highest capacity HBM3 stacks were 16GB, yielding a chip design with a maximum local memory pool of 128GB. However, thanks to the recent introduction of 24GB HBM3 stacks, AMD is now going to be able to offer a version of the MI300 with 50% more memory – or 192GB. That extra capacity, along with the additional GPU chiplets found on the MI300X, is intended to make it a powerhouse for processing the largest and most complex LLMs.
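For a quick sanity check on that math, here is a minimal sketch of the capacity and bus-width arithmetic, assuming the standard 1024-bit interface per HBM3 stack (a general HBM3 figure, not something AMD has broken out for MI300):

```python
# Back-of-the-envelope math for the two MI300 memory configurations described above.
STACKS = 8
BITS_PER_STACK = 1024  # standard HBM3 interface width per stack (assumption, not an AMD figure)

for name, stack_gb in (("16GB stacks (MI300 as shown at CES)", 16),
                       ("24GB stacks (MI300X)", 24)):
    capacity_gb = STACKS * stack_gb        # total local memory pool
    bus_width = STACKS * BITS_PER_STACK    # aggregate memory bus width
    print(f"{name}: {capacity_gb} GB total over an {bus_width}-bit bus")

# 16GB stacks (MI300 as shown at CES): 128 GB total over an 8192-bit bus
# 24GB stacks (MI300X): 192 GB total over an 8192-bit bus
```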

Under the hood, MI300X is actually a slightly simpler chip than MI300A. AMD has replaced MI300A's trio of CPU chiplets with just two CDNA 3 GPU chiplets, resulting in a 12-chiplet design overall - 8 GPU chiplets and what appears to be another 4 IO memory chiplets. Otherwise, despite excising the CPU cores (and de-APUing the APU), the GPU-only MI300X looks a lot like the MI300A. And clearly, AMD is aiming to take advantage of the synergy in offering both an APU and a flagship GPU built from the same basic design.

Raw GPU performance aside (we don't have any hard numbers to speak of right now), a big part of AMD's story with the MI300X is going to be memory capacity. Just offering a 192GB chip on its own is a big deal, given that memory capacity is the constraining factor for the current generation of large language models (LLMs) for AI. As we’ve seen with recent developments from NVIDIA and others, AI customers are snapping up GPUs and other accelerators as quickly as they can get them, all the while demanding more memory to run even larger models. So being able to offer a massive, 192GB GPU that uses 8 channels of HBM3 memory is going to be a sizable advantage for AMD in the current market – at least, once MI300X starts shipping.
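To put 192GB in perspective, the sketch below does some rough, weights-only sizing for hypothetical model configurations; the parameter counts and precisions are illustrative assumptions rather than AMD figures, and real deployments also need memory for KV caches and activations:

```python
# Rough, weights-only memory footprint for a transformer LLM at different precisions.
# These numbers are a lower bound; KV cache and activations need additional room.
HBM_CAPACITY_GB = 192  # MI300X local memory pool

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    # Billions of parameters times bytes per parameter gives GB directly (the 1e9 cancels).
    return params_billion * bytes_per_param

for params in (70, 175):  # illustrative model sizes, not AMD-quoted workloads
    for precision, nbytes in (("FP16", 2), ("INT8", 1)):
        need = weights_gb(params, nbytes)
        verdict = "fits" if need <= HBM_CAPACITY_GB else "does not fit"
        print(f"{params}B parameters @ {precision}: ~{need:.0f} GB of weights -> "
              f"{verdict} in {HBM_CAPACITY_GB} GB")
```

On those assumptions, a 175B-parameter model does not fit in 192GB at FP16 but does at 8-bit, which is exactly the kind of capacity headroom argument AMD is making here.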

The MI300 family remains on track to ship at some point later this year. According to AMD, the 128GB MI300A APU is already sampling to customers now. Meanwhile the 192GB MI300X GPU will be sampling to customers in Q3 of this year.

It also goes without saying that, with this announcement, AMD has solidified that they're delivering a flexible XPU design at least 3 years before rival Intel. Whereas Intel scrapped their combined CPU+GPU Falcon Shores product in favor of a pure GPU Falcon Shores, AMD is now slated to offer a flexible CPU+GPU/GPU-only product as soon as the end of this year. In this timeframe, it will be going up against products such as NVIDIA's Grace Hopper superchip, which, although not a single-chip APU/XPU, comes very close by linking up NVIDIA's Grace CPU with a Hopper GPU via a high-bandwidth NVLink connection. So while we're waiting on further details on MI300X, it should make for a very interesting battle between the two GPU titans.

Overall, the pressure on AMD with regards to the MI300 family is significant. Demand for AI accelerators has been through the roof for much of the past year, and MI300 will be AMD’s first opportunity to make a significant play for the market. MI300 will not quite be a make-or-break product for the company, but besides getting the technical advantage of being the first to ship a single-chip server APU (and the bragging rights that come with it), it will also give them a fresh product to sell into a market that is buying up all the hardware it can get. In short, MI300 is expected to be AMD’s license to print money (à la NVIDIA’s H100), or so AMD’s eager investors hope.

AMD Infinity Architecture Platform

Alongside today’s 192GB MI300X news, AMD is also briefly announcing what they are calling the AMD Infinity Architecture Platform. This is an 8-way MI300X design, allowing for up to 8 of AMD’s top-end GPUs to be linked together to work on larger workloads.

As we’ve seen with NVIDIA’s 8-way HGX boards and Intel’s own x8 UBB for Ponte Vecchio, an 8-way processor configuration is currently the sweet spot for high-end servers. This is both for physical design reasons – room to place the chips and room to route cooling through them – and for topology reasons, as eight processors can still be linked together without putting too many hops between any two of them. If AMD is to go toe-to-toe with NVIDIA and capture part of the HPC GPU market, then this is one more area where they’re going to need to match NVIDIA’s hardware offerings.
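AMD hasn't yet detailed how the eight MI300X GPUs in the platform are actually wired together, so the sketch below is not a description of the Infinity Architecture Platform itself; it simply contrasts two generic topologies to show why link count and hop count both push vendors toward 8-way designs:

```python
# Illustrative topology math only; the actual MI300X link topology is unconfirmed.
from math import comb

def fully_connected(n: int):
    # Every pair of GPUs gets a direct link: n-choose-2 links, 1 hop worst case.
    return comb(n, 2), 1

def ring(n: int):
    # Each GPU links only to its two neighbours: n links, n//2 hops worst case.
    return n, n // 2

for n in (4, 8, 16):
    fc_links, fc_hops = fully_connected(n)
    r_links, r_hops = ring(n)
    print(f"{n} GPUs: fully connected = {fc_links} links / {fc_hops} hop max; "
          f"ring = {r_links} links / {r_hops} hops max")
```

On those assumptions, going from 8 to 16 fully connected GPUs more than quadruples the number of point-to-point links required, which is the kind of scaling wall that makes 8-way the practical ceiling for a single baseboard.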

AMD is calling the Infinity Architecture Platform an “industry-standard” design. According to AMD, they're using an OCP server platform as their base here; and while this implies that MI300X is using an OAM form factor, we're still waiting to get explicit confirmation of this.

Comments

  • lemurbutton - Tuesday, June 13, 2023 - link

    Congratulations to AMD. Its MI300X GPU will match the M2 Ultra in GPU memory capacity.
  • mdriftmeyer - Tuesday, June 13, 2023 - link

    It surpasses it by 96bps on an 8192-bit bus. In short, it stomps all over it.
  • hecksagon - Tuesday, June 13, 2023 - link

    MI300X has 5,200 GB/s compared to the M2 Ultra's 800 GB/s. It isn't even close.
  • Gm2502 - Tuesday, June 13, 2023 - link

    How to show you know nothing about computer architecture without saying you know nothing. Rates as one of the stupidest comments of the year...
  • name99 - Tuesday, June 13, 2023 - link

    It's not a completely stupid comment.
    At the time of the M2 Ultra reveal, a lot of people were mocking it as useless because it did not have the raw compute of high-end nV or AMD. The obvious rejoinder to that is that compute is not the whole story; there is real value in less compute coupled to more RAM.

    MI300X shows that this wasn't just copium. AMD ALSO believes there's real value associated with adding massive (i.e. more than current nV) amounts of RAM to a decent, even if not best in the world, amount of compute.

    You can squabble about the details: whether Apple doesn't have enough GPU compute, whether AMD's bandwidth is over-specced, whether the SLC on Apple's chips captures enough data reuse in the particular case of interest (training LLMs) to effectively match AMD's memory performance.
    But all of those are details; the single biggest point is that this is a validation of Apple's belief that raw compute à la nV is going in the wrong direction; that compute needs to be matched to a *large enough* pool of RAM for upcoming problems.
  • Gm2502 - Tuesday, June 13, 2023 - link

    What are you talking about? Literally you are spouting total rubbish. The M2 Ultra is a desktop/workstation-class APU, utilising DDR5 unified memory. HEDT or workstation machines like the Mac Pro normally require large amounts of physical system memory, plus generous VRAM for graphics. The previous Mac Pro could have 1.5TB of system RAM and 48GB of VRAM. The new M2 needed at least 192GB of RAM, with the vast majority of it going to system RAM, not VRAM. This is needed to handle CAD and other graphics/video editing requirements. The GPU horsepower sucks compared to dedicated GPUs because of limited die space, not because of some push by Apple to reduce processing in lieu of adding more RAM. Apple is betting they can optimise the software to accommodate the less powerful hardware, giving a similar experience for these very narrow use cases. The MI300X has its huge amount of dedicated high-bandwidth RAM (given the almost 7x bandwidth increase, the 192GB of RAM is massively more efficient and performant than the M2's) to allow full LLM model parameters to be stored directly in memory for processing. The computing power of this chip is a complete monster, and it is only limited by the 24GB HBM3 stacks; otherwise it would have even more RAM. You are literally trying to clutch at straws and draw a false equivalence between two massively different technologies utilised for completely different things, and my original comment now extends to you too.
  • lemurbutton - Tuesday, June 13, 2023 - link

    It uses LPDDR5X, not DDR5.
  • Gm2502 - Wednesday, June 14, 2023 - link

    Fair point, used DDR5 as a catch all, should have been more specific.
  • whatthe123 - Wednesday, June 14, 2023 - link

    lol what... have you been living under a rock? Large memory pools on AI systems have been standard for the better part of a decade. It's one of the reasons cpus are still used even though they are an order of magnitude slower than amd/nvidia gpus. good lord
  • MINIMAN10000 - Thursday, June 15, 2023 - link

    So the M2 Ultra not having enough GPU compute only really comes up in the context that they brought up. Training LLMs requires absurd amounts of compute; training simply isn't worth attempting on an M2 Ultra. However, as far as inference goes, it may not be the fastest, but it should be able to run future models with record-setting numbers of parameters that can't be run on anything even remotely close to the M2 Ultra in price. Assuming they get the software working (Apple is pretty good at that), the M2 Ultra will allow for some bleeding-edge testing.

    I just wanted to point out that when it comes to large LLMs (AI is pretty much the reason why you would use this much RAM), bandwidth is king. The faster you can move data, the faster you can run inference (talking to the AI). So ideally we should see some incredible results with this thing for something like Guanaco 65B and larger.
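As a rough way to quantify that point, the sketch below assumes batch-1 generation is limited purely by reading every weight once per token (a common first-order approximation, not a benchmark), and uses the bandwidth figures quoted earlier in the thread; the 65B FP16 model size is just an illustrative assumption:

```python
# First-order estimate of memory-bandwidth-limited, batch-1 token generation:
# if every weight must be read once per token, then tokens/s <= bandwidth / weight bytes.
def max_tokens_per_s(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    weight_gb = params_billion * bytes_per_param  # billions of params * bytes = GB
    return bandwidth_gb_s / weight_gb

# Bandwidth figures as quoted in the thread; model size is illustrative.
for device, bw in (("MI300X (~5,200 GB/s)", 5200), ("M2 Ultra (~800 GB/s)", 800)):
    rate = max_tokens_per_s(65, 2, bw)  # 65B parameters at FP16
    print(f"{device}: at most ~{rate:.0f} tokens/s for a 65B FP16 model")
```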
