An AnandTech Interview with Jim Keller: 'The Laziest Person at Tesla'
by Dr. Ian Cutress on June 17, 2021 12:20 PM EST - Posted in Interviews, AMD, Intel, Jim Keller, Tenstorrent
I've spoken about Jim Keller many times on AnandTech. In the world of semiconductor design, his name draws attention simply because of the number of large, successful projects he has worked on or led, which have created billions of dollars of revenue for those respective companies. His career spans DEC, AMD, SiByte, Broadcom, PA Semi, Apple, AMD (again), Tesla, Intel, and now he is at Tenstorrent as CTO, developing the next generation of scalable AI hardware. Jim's work ethic has often been described as 'enjoying a challenge', and over the years when I've spoken to him, he has always wanted to make sure that what he is doing is both a challenge and important to whoever he is working for. More recently that has meant working on the most exciting semiconductor direction of the day, whether high-performance compute, self-driving, or AI.
Jim Keller, CTO, Tenstorrent | Ian Cutress, AnandTech |
I recently interviewed Tenstorrent's CEO, Ljubisa Bajic, alongside Jim, discussing the next generation of AI semiconductors. Today we're publishing a transcript of a recent chat with Jim, now five months into his role at Tenstorrent, this time focused more on Jim the person than simply Jim the engineer.
Jim Keller: Work Experience
From | To | Company | Title | Important Product |
1980s | 1998 | DEC | Architect | Alpha |
1998 | 1999 | AMD | Lead Architect | K7, K8v1, HyperTransport |
1999 | 2000 | SiByte | Chief Architect | MIPS Networking |
2000 | 2004 | Broadcom | Chief Architect | MIPS Networking |
2004 | 2008 | P.A. Semi | VP Engineering | Low Power Mobile |
2008 | 2012 | Apple | VP Engineering | A4 / A5 Mobile |
8/2012 | 9/2015 | AMD | Corp VP and Chief Cores Architect | Skybridge / K12 (+ Zen) |
1/2016 | 4/2018 | Tesla | VP Autopilot Hardware Engineering | Fully Self-Driving (FSD) Chip |
4/2018 | 6/2020 | Intel | Senior VP Silicon Engineering | ? |
2021 | | Tenstorrent | President and CTO | TBD |
Topics Covered
- AMD, Zen, and Project Skybridge
- Managing 10000 People at Intel
- The Future with Tenstorrent
- Engineers and People Skills
- Arm vs x86 vs RISC-V
- Living a Life of Abstraction
- Thoughts on Moore's Law
- Engineering the Right Team
- Idols, Maturity, and the Human Experience
- Nature vs Nurture
- Pushing Everyone To Be The Best
- Security, Ethics, and Group Belief
- Chips Made by AI, and Beyond Silicon
AMD, Zen, and Project Skybridge
Ian Cutress: Most of the audience questions are focused on your time at AMD, so let’s start there. You worked at AMD on Zen, and on the Skybridge platform - AMD is now gaining market share with the Zen product line, and you've moved on to bigger and better things. But there has been a lot of confusion as to your exact role at AMD during that project. Some people believe you were integral in nailing down Zen’s design, and then the high-level microarchitecture of Zen 2 and Zen 3. Others believe that you put the people in place, signed off at a high level, and then went to focus on the Arm version of Skybridge, K12. Can you give us any clarity as to your role there, how deep you went with Zen versus K12, or your involvement in things like Infinity Fabric?
Jim Keller: Yeah, it was a complicated project, right? At AMD when I joined, they had Bulldozer and Jaguar, and they both had some charming features but they weren't successful in the market. The roadmaps weren't aggressive, they were falling behind Intel, and that's not a good thing to do if you're already behind - you'd better be catching up, not falling behind. So I took the role, and I was president of the CPU team, which I think was 500 people when I joined. Then over the next three years the SoC team, the Fabric team, and some IP teams joined my little gang. When I left, I was told it was 2400 people. So I was a VP with a staff. I had senior directors reporting to me, and the senior fellows, and my staff was 15 people. So I was hardly writing RTL!
That said we did a whole bunch of things. I'm a computer architect, I’m not really a manager. I wanted the management role, which was the biggest management role I'd had at the time. Up to that point I'd been the VP of a start-up, but that was 50 people, and we all got along - this was a fairly different play for me. I knew that the technical changes we had to make would involve getting people aligned to it. I didn't want to be the architect on the side arguing with the VP about why somebody could or couldn’t do the job, or why this was the right or wrong decision. I spoke to Mark Papermaster, I told him my theory, and he said ‘okay, we'll give it a try’, and it worked out pretty good.
With that I had direct authority as it were - but people don't really do what they're told to do, right? They do what they're inspired to do. So you have to lay out a plan, and part of it was finding out who were the right people to do these different things, and sometimes somebody is really good, but people get very invested in what they did last time, or they believe things can't be changed, and I would say my view was things were so bad that almost everything had to change. So I went in with that as a default. Does that make sense? Now, it wasn't that we didn't find a whole bunch of stuff that was good to use. But you had to prove that the old thing was good, as opposed to prove the new thing was good, so we changed that mindset.
Architecturally, I had a pretty good idea what I wanted to build and why. I found people inside the company, such as Mike Clark, Leslie Barnes, Jay Fleischman, and others. There are quite a few really great people who, once we described what we wanted to do, were like, ‘yeah, we want to do that’. Architecturally, I had some input. There were often decisions and analysis, and people have different opinions, so I was fairly hands-on doing that. But I wasn't doing block diagrams or writing RTL. We had multiple projects going on - there was Zen, there was the Arm cousin of that, the follow-on, and some new SoC methodology. But we did more than just CPU design - we did methodology design, IP refactoring, very large organizational changes. I was hands-on top to bottom with all that stuff, so it makes sense.
IC: A few people consider you 'The Father of Zen', do you think you’d subscribe to that position? Or should that go to somebody else?
JK: Perhaps one of the uncles. There were a lot of really great people on Zen. There was a methodology team that was worldwide, the SoC team was partly in Austin and partly in India, the floating-point cache was done in Colorado, the core execution front end was in Austin, the Arm front end was in Sunnyvale, and we had good technical leaders. I was in daily communication for a while with Suzanne Plummer and Steve Hale, who kind of built the front end of the Zen core, and the Colorado team. It was really good people. Mike Clark's a great architect, so we had a lot of fun, and success. Success has a lot of authors - failure has one. So that was a success. Then some teams stepped up - we moved Excavator to the Boston team, where they took over finishing the design and the physical stuff, and Harry Fair and his guys did a great job on that. So there were some fairly stressful organizational changes that we did, going through that. The team all came together, so I think there was a lot of camaraderie in it. So I won't claim to be the ‘father’ - I was brought in, you know, as the instigator and the chief nudge, but part architect, part transformational leader. That was fun.
IC: Is everything that you worked on at AMD now out, or is there still roadmap material to come from the ideas that you helped propagate?
JK: So when you build a new computer, and Zen was a new computer, there was already work underway. You build in basically a roadmap, so I was thinking about what we were going to do for five years, chip after chip. We did this at Apple too when we built the first big core at Apple - we built big bones [into the design]. When you make a computer faster, there are two ways to do it - you make the fundamental structure bigger, or you tweak features, and Zen had a big structure. Then there were obvious things to do for several generations to follow. They've been following through on that.
So at some point, they will have to do another big rewrite and change. I don't know if they've started that yet. The architectural performance improvements we had planned were fairly large, over a couple of years, and they seem to be doing a great job of executing to that. But I've been out of there for a while - four or five years now.
IC: Yeah, I think they said that Zen 3, the last one that just came out was a rewrite. So I think some people are thinking that was still under your direction.
JK: Yeah, it's hard to say. Even when we did Zen, we did a from-scratch design - a clean design at the top. But then when they built it, there was a whole bunch of pieces of RTL that came from Bulldozer and Jaguar, which were perfectly good to use. They just had to be modified and built into the new Zen structure. So hardware guys are super good at reusing code when it's good.
So when they say they did a big rewrite, they probably took some pieces and re-architected them at the top, but when they built the code, it wouldn't surprise me if somewhere between 20% and 80% of the code was the same stuff, or mildly modified, but that's pretty normal. The key is to get the structure right, and then reuse code as needed, as opposed to taking something that's complicated and trying to tweak it to get somewhere. So if they did a rewrite, they probably fixed the structure.
Managing 10000 People at Intel
IC: I know it’s still kind of fresh, so I’m not sure what kind of NDAs you are still under, but your work at Intel - was that more of a clean slate? Can you go into any detail about what you did there?
JK: I can’t talk too much, obviously. The role I had was Senior Vice President of Silicon Engineering Group, and the team was 10,000 people. They're doing so many different things, it's just amazing. It was something like 60 or 70 SoCs in flight at a time, literally from design to prototyping, debugging, and in production. So it was a fairly diverse group, and there my staff was vice presidents and senior fellows, so it was a big organizational thing.
I had thought I was going there to build a bunch of new technology, but I spent most of my time working with the team on both organizational and methodology transformation, like new CAD tools, new methodologies, new ways to build chips. A couple of years before I joined, they started what's called the SoC IP view of building chips, versus Intel's historic monolithic view. That, to be honest, wasn't going well, because they took the monolithic chips, the great client and server parts, and simply broke them into pieces. You can't just break them into pieces - you have to actually rebuild those pieces, and some of the methodology goes with it.
We found a bunch of people [internally] who were really excited about working on that, and I also spent a lot of time on IP quality, IP density, libraries, characterization, process technology. You name it, I was on it. My days were kind of wild - some days I’d have 14 different meetings in one day. It was just click, click, click, click, so many things going on.
IC: All those meetings, how did you get anything done?
JK: I don't get anything done technically! As senior vice president, the job is evaluation, setting direction, making judgment calls, or, let’s say, trying some organizational change or people change. That adds up after a while. The key thing about getting somewhere is to know where you are going, and then put an organization in place that knows how to do that - that takes a lot of work. So I didn't write much code, but I did send a lot of text messages.
IC: Now Intel has a new engineering-focused CEO in Pat Gelsinger. Would you ever consider going back if the right opportunity came up?
JK: I don't know. I have a really fun job now, and in a really explosive growth market. So I wish him the best. I think it was a good choice [for Pat as CEO], and I hope it's a good choice, but we'll see what happens. He definitely cares a lot about Intel, and he's had real success in the past. He’s definitely going to bring a lot more technical focus to the company. But I liked working with Bob Swan just fine, so we'll see what happens.
The Future with Tenstorrent
IC: You are now several companies on from AMD, at a company called Tenstorrent, with an old friend in Ljubisa Bajic. You’ve been jumping from company to company to company for basically your whole career. You’re always finding another project, another opportunity, another angle. Not to be too blunt, but is Tenstorrent going to be a forever home?
JK: First, I was at Digital (DEC) for 15 years, right! Now that was a different career because I was in the mid-range group where we built computers out of ECL - these were refrigerator-sized boxes. I was in the DEC Alpha team where we built little microprocessors, little teeny things, which at the time we thought were huge. These were 300 square millimeters at 50 watts, which blew everybody's mind.
So I was there for a while, and I went to AMD right during the internet rush, and we did a whole bunch of stuff in a couple of years. We started Opteron, HyperTransport, 2P servers - it was kind of a whirlwind of a place. But I got sucked up or caught up in the enthusiasm of the internet, and I went to SiByte, which got bought by Broadcom, and I was there for four years total. We delivered several generations of products.
I was then at P.A. Semi, and we delivered a great product, but they didn't really want to sell the product for some reason, or they thought they were going to sell it to Apple. I actually went to Apple, and then Apple bought P.A. Semi, and then I worked for that team, so you know I was between P.A. Semi and Apple. That was seven years, so I don't really feel like that was jumping around too much.
Then I jumped to AMD I guess, and that was fun for a while. Then I went to Tesla where we delivered Hardware 3 (Tesla Autopilot). So that was kind of phenomenal. From a standing start to driving a car in 18 months - I don't think that's ever been done before, and that product shipped really successfully. They built a million of them last year. Tesla and Intel were a different kind of a whirlwind, so you could say I jumped in and jumped out. I sure had a lot of fun.
So yeah, I've been around a little bit. I like to think I mostly get done what I set out to accomplish. My success rate there is pretty high in terms of delivering products that have lasting value. I'm not the guy to tweak things in production - it’s either a clean piece of paper or a complete disaster. Those seem to be the things I do best. It's good to know yourself - I'm not an operational manager. So Tenstorrent is more the clean piece of paper. The AI space is exploding. The company itself is already many years old, but we're building a new generation of parts and going to market and starting to sell stuff. I'm CTO and president, and I have a big stake in the company, both financially and also as a commitment to my friends there, so I plan on being here for a while.
IC: I think you said before that going beyond the sort of matrix, you end up with massive graph structures, especially for AI and ML, and the whole point about Tenstorrent, it’s a graph compiler and a graph compute engine, not just a simple matrix multiply.
JK: Going back to old math - and I'm not a mathematician, so mathematicians are going to cringe a little bit - there was scalar math, like A = B + C × D. When you had a small number of transistors, that's the math you could do. Then with more transistors you could say ‘I can do a vector of those’, applying the same equation across a whole vector in one step. Then we got more transistors, and we could do a matrix multiply. Then as we got more transistors, you wanted to take those big operations and break them up, because if you make your matrix multiplier too big, the power of just getting across the unit is a waste of energy.
So you find you want to build this optimal size block that’s not too small, like a thread in a GPU, but it's not too big, like covering the whole chip with one matrix multiplier. That would be a really dumb idea from a power perspective. So then you get this array of medium size processors, where medium is something like four TOPS. That is still hilarious to me, because I remember when that was a really big number. Once you break that up, now you have to take the big operations and map them to the array of processors, and AI looks like a graph of very big operations. It’s still a graph, and then the big operations are factored down into smaller graphs. Now you have to lay that out on a chip with lots of processors, and have the data flow around it.
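To make the mapping step a little more concrete, here is a minimal Python sketch, assuming a plain blocked (tiled) matrix multiply: the output matrix is split into tile × tile blocks, and each block is assigned to one core in a small grid. The function names, the tile size, and the wrap-around placement are hypothetical illustrations, not Tenstorrent's compiler or its actual mapping strategy.

```python
from collections import defaultdict


def matmul(a, b):
    """Reference matrix multiply on lists of lists (used to check the tiled version)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def tiled_matmul(a, b, tile, grid_rows, grid_cols):
    """Compute a @ b by splitting the output into tile x tile blocks.

    Each output block is assigned to one core of a hypothetical
    grid_rows x grid_cols array. Returns (result, placement), where
    placement maps (core_row, core_col) -> list of output-block origins.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    placement = defaultdict(list)
    for bi, i0 in enumerate(range(0, n, tile)):
        for bj, j0 in enumerate(range(0, m, tile)):
            # Simple wrap-around placement of output blocks onto cores.
            core = (bi % grid_rows, bj % grid_cols)
            placement[core].append((i0, j0))
            # The medium-sized job one core would run: accumulate this
            # output block over tile-wide slices of the shared dimension.
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        c[i][j] += sum(a[i][p] * b[p][j] for p in range(p0, min(p0 + tile, k)))
    return c, dict(placement)
```

For a 6 × 6 product with `tile=2` on a 2 × 2 grid, the nine output blocks are spread over four cores and the result matches `matmul`. A real spatial design would also try to keep data movement between neighboring cores low; this sketch only shows that one large operation can be factored into many medium-sized ones that together give the same answer.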
This is a very different kind of computing than running a vector or a matrix program. So we sometimes call it a scalar vector matrix. Raja used to call it spatial compute, which would probably be a better word.
IC: Alongside the Tensix cores, Tenstorrent is also adding vector engines to your cores for the next generation? How does that fit in?
JK: Remember the general-purpose CPUs that have vector engines on them - it turns out that when you're running AI programs, there is some general-purpose computing you just want to have. There are also some times in the graph where you want to run a C program on the result of an AI operation, and so having that compute be tightly coupled is nice. [By keeping] it on the same chip, the latency is super low, and the power to get back and forth is reasonable. So yeah, we're working on an interesting roadmap for that. That's a little computer architecture research area, like, what's the right mix of accelerated computing and general-purpose computing, and how are people using it. Then how do you build it in a way programmers can actually use it? That's the trick, which we're working on.
Engineers and People Skills
IC: If I go through your career, you’ve gone between high-performance computing and low-powered efficient computing. Now you’re in the world of AI acceleration. Has it ever got boring?
JK: No, and it's really weird! Well it's changed, and it's changed so much, but at some level it doesn't change at all. Computers at the bottom, they just add ones and zeros together. It's pretty easy. 011011100, it's not that complicated.
But I worked on the VAX 8800 where we built it out of gate arrays that had 200 OR gates in each chip. Like 200, right? Now at Tenstorrent, our little computers, we call them Tensix cores, are four trillion operations per second per core, and there are 100 of them in a chip. So the building block has shifted from 200 gates to four TOPS. That's kind of a wild transformation.
Then the tools are way better than they used to be. What you can do now - you can't build more complicated things unless the abstraction levels change and the tools change. There have been so many changes on that kind of stuff. When I was a kid, I used to think I had to do everything myself - and I worked like a maniac and coded all the time. Now I know how to work with people and organizations and listen. Stuff like that. People skills. I probably would have a pretty uneven scorecard on the people skills! I do have a few.
IC: Would you say that engineers need more people skills these days? Because everything is complex, everything has separate abstraction layers, and if you want to work between them you have to have the fundamentals down.
JK: Now here’s the fundamental truth, people aren't getting any smarter. So people can't continue to work across more and more things - that's just dumb. But you do have to build tools and organizations that support people's ability to do complicated things. The VAX 8800 team was 150 people. But the team that built the first or second processor at Apple, the first big custom core, was 150 people. Now, the CAD tools are unbelievably better, and we use 1000s of computers to do simulations, plus we have tools that could place and route 2 million gates versus 200. So something has changed radically, but the number of people an engineer might talk to in a given day didn't change at all. If you have an engineer talk to more than five people a day, they'll lose their mind. So, some things are really constant.
CPU Instruction Sets: Arm vs x86 vs RISC-V
IC: You’ve spoken about CPU instruction sets in the past, and one of the biggest requests for this interview I got was around your opinion about CPU instruction sets. Specifically questions came in about how we should deal with fundamental limits on them, how we pivot to better ones, and what your skin in the game is in terms of Arm versus x86 versus RISC-V. I think at one point, you said most compute happens on a couple of dozen op-codes. Am I remembering that correctly?
JK: [Arguing about instruction sets] is a very sad story. It's not even a couple of dozen [op-codes] - 80% of core execution is only six instructions - you know, load, store, add, subtract, compare and branch. With those you have pretty much covered it. If you're writing in Perl or something, maybe call and return are more important than compare and branch. But instruction sets only matter a little bit - you can lose 10%, or 20%, [of performance] because you're missing instructions.
For a while we thought variable-length instructions were really hard to decode. But we keep figuring out how to do that. You basically predict where all the instructions are in tables, and once you have good predictors, you can predict that stuff well enough. So fixed-length instructions seem really nice when you're building little baby computers, but if you're building a really big computer, the logic to predict or figure out where all the instructions are isn't dominating the die. So it doesn't matter that much.
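A toy illustration of "predicting where the instructions are in tables", assuming a made-up variable-length encoding in which the low two bits of an instruction's first byte give its length: the first time a fetch block is seen, a slow serial walk finds the instruction boundaries, and the result is stored in a table keyed by block address so later fetches of the same block can reuse it. The encoding, the `BoundaryTable` class, and the entry-offset check are hypothetical; real front ends use their own predecode and caching structures. The sketch ignores self-modifying code and instructions that straddle blocks.

```python
def insn_length(first_byte):
    """Hypothetical encoding: the low two bits of the first byte hold (length - 1)."""
    return (first_byte & 0x3) + 1


def serial_decode(block, start=0):
    """Slow path: walk the block one instruction at a time, recording start offsets."""
    starts, pos = [], start
    while pos < len(block):
        starts.append(pos)
        pos += insn_length(block[pos])
    return starts


class BoundaryTable:
    """Remembers instruction start offsets per fetch-block address (hypothetical structure)."""

    def __init__(self):
        self.table = {}
        self.hits = 0
        self.misses = 0

    def boundaries(self, addr, block, entry=0):
        """Return instruction starts at or after `entry` within the block at `addr`."""
        predicted = self.table.get(addr)
        # Stored boundaries are reusable only if the entry point is one of them;
        # decoding is deterministic from any true instruction start.
        if predicted is not None and entry in predicted:
            self.hits += 1
            return [s for s in predicted if s >= entry]
        # Miss, or a jump into the middle of a recorded instruction: decode serially.
        self.misses += 1
        starts = serial_decode(block, entry)
        self.table[addr] = starts
        return starts
```

With the block `bytes([0x01, 0x00, 0x03, 0, 0, 0, 0x02, 0, 0])`, the first lookup decodes serially and finds starts at offsets 0, 2, and 6; a later branch into offset 2 is served from the table without walking the bytes again.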
When RISC first came out, x86 was half microcode. So if you look at the die, half the chip is a ROM, or maybe a third or something. And the RISC guys could say that there is no ROM on a RISC chip, so we get more performance. But now the ROM is so small, you can't find it. Actually, the adder is so small, you can hardly find it. What limits computer performance today is predictability, and the two big ones are instruction/branch predictability, and data locality.
Now the new predictors are really good at that. They're big - the predictors are way bigger than the adder. That's where you get into the CPU versus GPU (or AI engine) debate. The GPU guys will say ‘look there's no branch predictor because we do everything in parallel’. So the chip has way more adders and subtractors, and that's true if that's the problem you have. But they're crap at running C programs.
GPUs were built to run shader programs on pixels, so if you're given 8 million pixels, and the big GPUs now have 6000 threads, you can cover all the pixels with each one of them running 1000 programs per frame. But it's sort of like an army of ants carrying around grains of sand, whereas big AI computers, they have really big matrix multipliers. They like a much smaller number of threads that do a lot more math because the problem is inherently big. Whereas the shader problem was that the problems were inherently small because there are so many pixels.
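The rough arithmetic behind those figures, assuming one shader invocation per pixel per frame and using the 8 million pixels and 6000 threads quoted above:

```latex
\frac{8 \times 10^{6}\ \text{pixels}}{6000\ \text{threads}} \approx 1333\ \text{shader invocations per thread per frame}
```

That is the same order of magnitude as the ‘1000 programs per frame’ in the answer: many tiny, independent jobs per thread, in contrast to an AI engine where a few threads each drive a large matrix operation.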
There are genuinely three different kinds of computers: CPUs, GPUs, and AI. NVIDIA is kind of doing the ‘inbetweener’ thing where they're using a GPU to run AI, and they're trying to enhance it. Some of that is obviously working pretty well, and some of it is obviously fairly complicated. What's interesting, and this happens a lot, is that general-purpose CPUs when they saw the vector performance of GPUs, added vector units. Sometimes that was great, because you only had a little bit of vector computing to do, but if you had a lot, a GPU might be a better solution.
IC: So going back to ISA question - many people were asking about what do you think about Arm versus x86? Which one has the legs, which one has the performance? Do you care much, if at all?
JK: I care a little. Here's what happened - so when x86 first came out, it was super simple and clean, right? Then at the time, there were multiple 8-bit architectures: x86, the 6800, the 6502. I programmed probably all of them way back in the day. Then x86, oddly enough, was the open version. They licensed that to seven different companies. Then that gave people opportunity, but Intel surprisingly licensed it. Then they went to 16 bits and 32 bits, and then they added virtual memory, virtualization, security, then 64 bits and more features. So what happens to an architecture as you add stuff is that you keep the old stuff so it's compatible.
So when Arm first came out, it was a clean 32-bit computer. Compared to x86, it just looked way simpler and easier to build. Then they added a 16-bit mode and the IT (if then) instruction, which is awful. Then [they added] a weird floating-point vector extension set with overlays in a register file, and then 64-bit, which partly cleaned it up. There was some special stuff for security and booting, and so it has only got more complicated.
Now RISC-V shows up and it's the shiny new cousin, right? Because there's no legacy. It's actually an open instruction set architecture, and people build it in universities where they don’t have time or interest to add too much junk, like some architectures have. So relatively speaking, just because of its pedigree, and age, it's early in the life cycle of complexity. It's a pretty good instruction set, they did a fine job. So if I wanted to build a computer really fast today, and I wanted it to go fast, RISC-V is the easiest one to choose. It’s the simplest one, it has got all the right features, it has got the right top eight instructions that you actually need to optimize for, and it doesn't have too much junk.
IC: So modern instruction sets have too much bloat, especially the old ones. Legacy baggage and such?
JK: Instruction sets that have been iterated on, and added to, have too much bloat. That's what always happens. As you keep adding things, the engineers struggle. You can have this really good design with 10 features, and so you add some features to it. The features all make it better, but they also make it more complicated. As you go along, every new feature gets harder to add, because the interaction between that feature and everything else gets terrible.
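One simple way to see why each new feature gets harder, assuming (as a simplification, not a claim made in the interview) that any pair of features can interact: with n features there are n(n-1)/2 pairwise interactions, so adding one more feature adds n new interactions to check. For the 10-feature design in the example:

```latex
I(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad I(n+1) - I(n) = n, \qquad I(10) = 45, \qquad I(11) = 55
```

The design effort per feature grows even though each feature on its own looks small, which is one version of the diminishing-returns curve described next.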
The marketing guys, and the old customers, will say ‘don't delete anything’, but in the meantime they are all playing with the new fresh thing that only does 70% of what the old one does, but it does it way better because it doesn't have all these problems. I've talked about diminishing return curves, and there's a bunch of reasons for diminishing returns, but one of them is the complexity of the interactions of things. They slow you down to the point where something simpler that did less would actually be faster. That has happened many times, and it's some result of complexity theory and you know, human nefariousness I think.
IC: So do you ever see a situation where x86 gets broken down and something just gets reinvented? Or will it just remain sort of legacy, with new things like RISC-V popping up to fill the void when needed?
JK: x86-64 was a fairly clean slate, but obviously it had to carry all the old baggage for this and that. They deprecated a lot of the old 16-bit modes. There's a whole bunch of gunk that disappeared, and sometimes if you're careful, you can say ‘I need to support this legacy, but it doesn't have to be performant, and I can isolate it from the rest’. You either emulate it or support it.
We used to build computers such that you had a front end, a fetch, a dispatch, an execute, a load store, an L2 cache. If you looked at the boundaries between them, you'd see 100 wires doing random things that were dependent on exactly what cycle or what phase of the clock it was. Now these interfaces tend to look less like instruction boundaries – if I send an instruction from here to there, now I have a protocol. So the computer inside doesn't look like a big mess of stuff connected together, it looks like eight computers hooked together that do different things. There’s a fetch computer and a dispatch computer, an execution computer, and a floating-point computer. If you do that properly, you can change the floating-point without touching anything else.
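As a software analogy only (hardware units exchange signals, not Python objects), the sketch below models units that interact solely through a message record: a dispatch unit routes each `Op` to whichever execution unit is registered for its kind. The class names and the two-unit split are hypothetical. Because `Dispatch` depends only on each unit's `execute` method, a redesigned floating-point unit with the same interface can be dropped in without touching the other units.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Op:
    """The whole interface between units is this message record."""
    kind: str   # which execution unit should handle it: "int" or "fp"
    a: float
    b: float


class IntUnit:
    """Hypothetical integer execution unit."""

    def execute(self, op):
        return int(op.a) + int(op.b)


class FpUnit:
    """Hypothetical floating-point unit; replaceable as long as execute() keeps its contract."""

    def execute(self, op):
        return float(op.a) + float(op.b)


class Dispatch:
    """Routes queued ops to execution units by message kind only."""

    def __init__(self, units):
        self.units = units      # maps kind -> unit object
        self.queue = deque()

    def send(self, op):
        self.queue.append(op)

    def run(self):
        """Drain the queue and return results in issue order."""
        results = []
        while self.queue:
            op = self.queue.popleft()
            results.append(self.units[op.kind].execute(op))
        return results
```

Sending `Op("int", 2, 3)` and `Op("fp", 0.5, 0.25)` to `Dispatch({"int": IntUnit(), "fp": FpUnit()})` yields `[5, 0.75]`; swapping `FpUnit` for another class with the same `execute` method needs no change to `Dispatch` or `IntUnit`.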
That's less of an instruction set thing - it’s more ‘what was your design principle when you built it’, and then how did you do it. The thing is, if you get to a problem, you could say ‘if I could just have these five wires between these two boxes, I could get rid of this problem’. But every time you do that, every time you violate the abstraction layer, you've created a problem for future Jim. I've done that so many times. If you solve it properly, it stays clean, but if you hack it a little bit, then that kills you over time.
Living a Life of Abstraction
IC: I've seen a number of talks where you speak about the concept of abstraction layers in not only a lot of aspects of engineering, but also life as well. This concept that you can independently upgrade different layers without affecting those above and below, and providing new platforms to build upon. At what point in your life did that kind of ethos click, and what happened to make it such a pervasive element of your personality?
JK: Pervasive element of my personality? That's pretty funny! I know I repeat it a lot, maybe I'm trying to convince myself.
Like, when we built EV6, Dirk Meyer was the other architect. We had a couple other strong people. We divided the design into a few pieces, we wrote a very simple performance model, got it, but when we built the thing, it was a relatively short pipe for an out-of-order machine, because we were still a little weak on predictors. There were a lot of interactions between things, and it was a difficult design we built. We also built it with the custom design methodology Digital had at the time. So we had 22 different flip-flops, and people could/would roll their own flip-flop. We frequently built large structures out of transistors. I remember somebody asked me what elements were in our library, and I said, both of them! N-devices and P-devices, right? Then I went to AMD, and K7 was built with a cell library.
Now, the engineers there were really good at laying down the cell libraries in a way they got good performance. They only had two flip-flops - a big one and a little one, and they had a clean cell library. They had an abstraction layer between the transistors and the designers. This was before the age of really good place-and-route tools, and that was way better.
Then on the interface that we built on EV6, which was later called the S2K bus, we listened to AMD. We originally had a lot of complicated transactions to do snoops, and loads, and stores, and reads, and writes, and all kinds of stuff. One day I explained how it worked to a friend of mine who was at Digital Research Lab - he listened to me and just shook his head. He said ‘Jim, that's not the way you do this’. He explained how virtual channels worked, and how you could have separate abstract channels of information. You get that right before you start encoding commands. The result of that educational seminar/ass-kicking was HyperTransport. It has a lot of the S2K protocol, but it was built in a much more abstract way. So I would say that my move from Digital to AMD was where we had the ideas of how to build high-performance computing, but the methodologies were integrated, so from transistor up to architecture it couldn't be the same person.
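A minimal sketch of the virtual-channel idea, assuming a credit-based link with one queue and one credit counter per channel: because each channel is flow-controlled independently, a request channel that has run out of receiver buffers does not stop responses from getting through on the same physical link. The channel names, the credit scheme, and the round-robin arbiter are illustrative assumptions, not the actual S2K or HyperTransport protocol definitions.

```python
from collections import deque


class Link:
    """One physical link carrying several independent virtual channels.

    Each channel has its own queue and its own credit count (free buffer
    slots at the receiver), so a channel that runs out of credits cannot
    block the others. Channel names and credit counts are illustrative.
    """

    def __init__(self, channels, credits_per_channel):
        self.order = list(channels)
        self.queues = {ch: deque() for ch in self.order}
        self.credits = {ch: credits_per_channel for ch in self.order}
        self.next_idx = 0  # round-robin pointer

    def enqueue(self, ch, msg):
        self.queues[ch].append(msg)

    def transmit_one(self):
        """Send one message from the next channel that has both data and a credit."""
        for step in range(len(self.order)):
            idx = (self.next_idx + step) % len(self.order)
            ch = self.order[idx]
            if self.queues[ch] and self.credits[ch] > 0:
                self.credits[ch] -= 1
                self.next_idx = (idx + 1) % len(self.order)
                return ch, self.queues[ch].popleft()
        return None  # nothing can be sent this cycle

    def return_credit(self, ch):
        """The receiver freed one buffer slot for channel ch."""
        self.credits[ch] += 1


if __name__ == "__main__":
    link = Link(["request", "response"], credits_per_channel=1)
    link.enqueue("request", "r1")
    link.enqueue("request", "r2")
    link.enqueue("response", "s1")
    print(link.transmit_one())  # ('request', 'r1'): uses the only request credit
    print(link.transmit_one())  # ('response', 's1'): responses still flow
    print(link.transmit_one())  # None: r2 waits for a request credit
    link.return_credit("request")
    print(link.transmit_one())  # ('request', 'r2')
```

The design choice being illustrated is that the channels are defined first and the commands are encoded on top of them, so deadlock-avoidance and flow control do not depend on the details of each transaction type.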
At AMD, there’s Mike Clark, the architects, the microarchitects, and the RTL people who write Verilog, but they literally translated to the gate libraries, to the gate people, and it was much more of a layered approach. K7 was quite a fast processor, and our first swing at K8, we kind of went backwards. My favorite circuit partner at the time - he and I could talk about big designs, and we saw this as transistors, but that's a complicated way to build computers. Since then, I've been more convinced that the abstraction layers were right. You don't overstep human capability - that's the biggest problem. If you want to build something bigger and more complicated, you better solve the abstraction layers, because people aren't getting smarter. If you put more than 100 people on it, it'll slow down, not speed up, and so you have to solve that problem.
IC: If you have more than 100 people, you need to split into two abstraction layers?
JK: Exactly. There are reasons for that, like human beings are really good at tracking. Your inner circle of friends is like 10-20 people, it's like a close family, and then there is this kind of 50 to 100, depending on how it's organized, that you can keep track of. But above that, you read everybody outside your group of 100 people as semi-strangers. So you have to have some different contracts about how you do it. Like when we built Zen, we had 200 people, with half the team on the front end and half the team on the back end. The interface between them was defined, and they didn't really have to talk to each other about the details behind the contract. That was important. Now they got along pretty well and they worked together, but they didn't constantly have to go back and forth across that boundary.
Thoughts on Moore's Law
IC: You've said on stage, and in interviews in the past, that you're not worried about Moore's Law. You’re not worried on the process node side, about the evolution of semiconductors, and it will eventually get worked out by someone, somewhere. Would you say your attitude towards Moore's law is apathetic?
JK: I’m super proactive. That’s not apathetic at all. Like, I know a lot of details about it. People conflate a few things, like when Intel's 10-nanometer slipped. People said that Moore's law is dead, but TSMC’s roadmap didn’t slip at all.
Some of that is because TSMC’s roadmap aligned to the EUV machine availability. So when they went from 16nm, to 10nm, to 7nm, they did something that TSMC has been really good at - doing these half steps. So they did 7nm without EUV, and then 7nm with EUV, then 5nm without, and 5nm+ with EUV, and they tweaked stuff. Then with the EUV machines, for a while people weren't sure if they were going to work. But now ASML’s market cap is twice that of Intel's (it’s actually about even now, on 21st June).
Then there's a funny thing I realized about the locus of innovation: we tend to think of TSMC, Samsung, and Intel as the process leaders. But a lot of the leadership is actually in the equipment manufacturers like ASML, and in materials. If you look at who is building the innovative stuff, and the EUV worldwide sales, the number is something like TSMC is going to buy like 150 EUV machines by 2023 or something like that. The numbers are phenomenal because even a few years ago not many people were even sure that EUV was going to work. But now there's X-ray lithography coming up, and again, you can say it's impossible, but bloody everything has been impossible! The fine print, this is what Richard Feynman said - he's kind of smart. He said ‘there's plenty of room at the bottom’, and I personally can count, and if you look at how many atoms are across transistors, there's a lot. If you look at how many atoms you actually need to make a junction, without too many quantum effects, there are only 10. So there is room there.
There's also this funny thing - there's a belief system when everybody believes technology is moving at this pace and the whole world is oriented towards it. But technology isn't one thing. There are people who figure out how to build transistors, like what the process designers do at like Intel, or TSMC, or Samsung. They use equipment which can do features, but then the features actually interact, and then there's a really interesting trade-off between, like, how should this be deposited and etched, how tall should it be, how wide, in what space. They are the craftsmen using the tools, so the tools have to be super sharp, and the craftsmen have to be super knowledgeable. That's a complicated play. There's lots of interaction and at some level, because the machines themselves are complicated, you have this little complexity combination where the machine manufacturers are doing different pieces, but they don't always coordinate perfectly, or they coordinate through the machine integration guys who designed the process, and that's complicated. It can slow things down. But it's not due to physics fundamentals - we're making good progress on physics fundamentals.
IC: In your scaled ML talk, the one that you have in Comic Sans, you had the printed X slide. About it you said that, because of the laws of physics, there are still several more steps to go in how you print the X with EUV. Also High-NA EUV is coming in a couple of years, but now you mention X-rays. What's the timeline for that? It's not even on my radar yet.
JK: Typically when a technology comes along, they use it for one thing. When EUV was first used in DRAMs, it was literally for one step, maybe two. So I'm trying to remember – perhaps 2023/2024? It's not that far away. That means they're already up and running, and people are playing with it. Then the wild thing is, when they went from optical light to EUV, it was about a 10x reduction in wavelength? So while they had crazy multi-patterning and interference kind of stuff with DUV - you saw those pictures - when it came to EUV, they could just print direct. But actually [as you go smaller] they can use the same tricks on EUV. So EUV is going to multi-patterning, I think in 3nm. Then there are so many tricks you can do with that. So yeah, the physics is really interesting. Then along with the physics, there's the optics stuff, and then there's the purity of the materials, which is super important, then temperature control, so things don't move around too much. Everywhere you look there are interesting physics problems, and so there's lots to do. There are hundreds of thousands of people working on it, and there’s more than enough innovation bandwidth.
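Keller's "10x" is a spoken estimate; the usual reference wavelengths are 193 nm for ArF immersion DUV and 13.5 nm for EUV, a ratio of roughly 14x. A quick way to see what that buys, and why High-NA EUV and multi-patterning still matter, is the Rayleigh resolution criterion, CD = k1 × λ / NA. The sketch below is illustrative only: k1 = 0.3 is an assumed process factor, and the numerical apertures (1.35 for immersion DUV, 0.33 for current EUV, 0.55 for High-NA EUV) are typical published values rather than figures from this interview.

```python
def rayleigh_half_pitch(wavelength_nm: float, na: float, k1: float = 0.3) -> float:
    """Approximate single-exposure resolution (nm) from the Rayleigh criterion.

    CD = k1 * wavelength / NA. The default k1 = 0.3 is an assumed,
    illustrative process factor; real processes vary.
    """
    return k1 * wavelength_nm / na


DUV_ARF_NM = 193.0  # ArF excimer laser wavelength used in immersion DUV
EUV_NM = 13.5       # EUV source wavelength

# (tool, wavelength in nm, numerical aperture); NA values are typical, not exact
TOOLS = [
    ("DUV immersion", DUV_ARF_NM, 1.35),
    ("EUV", EUV_NM, 0.33),
    ("High-NA EUV", EUV_NM, 0.55),
]

if __name__ == "__main__":
    print(f"DUV/EUV wavelength ratio: {DUV_ARF_NM / EUV_NM:.1f}x")
    for name, wavelength, na in TOOLS:
        cd = rayleigh_half_pitch(wavelength, na)
        print(f"{name:<14} NA={na:.2f}  single-exposure CD ~ {cd:.1f} nm")
```

Under these assumptions the single-exposure figure drops from about 43 nm for immersion DUV to about 12 nm for EUV and about 7 nm for High-NA EUV. Features finer than a tool's single-exposure limit are where the multi-patterning tricks Keller mentions come back in.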
Engineering the Right Team
IC: So pivoting to a popular question we’ve had. One of the things that we've noted you doing, as you go from company to company, is building a team. We've seen some people take engineers from a team they built at a previous company to the next company. Do you have any insights into how you build your teams? Have there been different approaches at the companies that you've worked for?
JK: The first thing you have to realize is whether you are building the team or finding one. There's a great museum in Florence, the Accademia, where David is, and in the front of the museum, there are these huge blocks of marble. 20 by 20 by 20. How they moved them, I don't know. The block of marble is sitting there, and Michelangelo could see this beautiful sculpture in it. It was already there, right? The problem was removing the excess marble.
So if you go into companies with 1000 employees, I guarantee you, there's a good team there. You don't have to hire anybody. When I was at AMD, I hardly hired anybody. We moved people around, we re-deployed people [elsewhere], but there were plenty of great people there. When I went to Tesla, we had to build the team from scratch, because there was nobody at Tesla that was building chips. I hired people that I knew, but then we hired a bunch of people that I didn't know at some point, and this is one of those interesting things.
I've seen leaders go from one company to another and they bring their 20 people, and then they start trying to reproduce what they had before. That's a bad idea, because although 20 people is enough to reproduce [what you had], it alienates what you want [in that new team]. When you build a new team, ideally, you get people you really like, either you just met them, or you worked with them, but you want some differences in approach and thinking because everybody gets into a local minimum. So the new team has this opportunity to make something new together. Some of that is because if you had ten really great teams all working really well, and then you made a new team with one person from each of those teams: that may well be better, because they will re-select the best ideas.
But every team has pluses and minuses, and so you have to think about if you're building the team or finding a team, and then what's the dynamic you're trying to create that gives it space for people to have new ideas. Or, if some people get stuck on one idea, they then work with new people and they’ll start doing this incredible thing, and you think they're great, even though they used to be not so great, so what happened? Well, they were carrying some idea around that wasn't great, and then they met somebody who challenged them or the environment forced them, and all of a sudden they're doing a great job. I've seen that happen so many times.
Ken Olsen at Digital (DEC) said there are no bad employees, there are just bad employee-job matches. When I was younger, I thought that was stupid. But as I've worked with more people, I've seen that happen so many bloody times that I've even fired people who went on to be really successful. All because they weren't doing a good job and they were stuck, emotionally, and they felt committed to something that wasn't working. The act of moving them to a different place freed them up. [Needless to say] I don't get a thank you. (laughs)
IC: So how much of that also comes down to company culture? I mean, when you're looking for the person for the right position, or whether you're hiring in for the new position, do you try and get something that goes against the company grain? Or goes with the company grain? Do you have any tactics here or are you just looking for someone with spark?
JK: If you're trying to do something really innovative, it's probably mostly going against [the grain]. If you have a project that's going really well, bringing in instigators is going to slow everybody down, because you're already doing well. You have to read the group in the environment. Then there are some people who are really good, and they're really flexible to go on this project, they fit in and just push, but on the next project, you can see they have been building their network and the team, and on the next project they’re ready to do a pivot and everybody's willing to work. Trust is a funny thing, right? You know, if somebody walks up and says to jump off this bridge but you'll be fine, you're likely to call bullshit - but if you had already been through a whole bunch of stuff with them, and they said ‘look, trust me, then jump - you're going to be fine; it's going to suck, but it's going to be fine’, you'll do it, right? Teams that trust each other are way more effective than ones that have to do everything with contracts, negotiation, and politics.
So that's probably one thing - if you're building or finding a team, and you start seeing people doing politics, which means manipulating the environment for their own benefit, they have got to go. Unless you're the boss! Then you have got to see if they deliver. Some people are very political, but they really think their political strength comes from delivering. But people randomly in an organization that are political just cause lots of stress.
IC: Do you recommend that early or mid-career engineers should bounce around regularly from project to project, just so they don’t get stuck in a hole? It sounds like that’s a common thing.
JK: You learn fastest when you're doing something new, and working for somebody that knows way more than you. So if you're relatively early in your career and you're not learning a lot or, you know, the people that you're working for aren't inspiring you, then yeah you should probably change. There are some careers where I've seen people bounce around three times because they're getting experience and they end up being good at nothing. They would have been better staying where they were, and really getting deep at something. So you know, creative tension - there's creative tension between those two ideas.
Idols, Maturity, and the Human Experience
IC: So that kind of leads into a good question, actually, because I wanted to ask about you and your mentors going through your early career. Who did you look up to for leadership or knowledge or skills? Is there anyone you idolize?
JK: Oh, yeah, lots of people. Well it started out with my parents. Like, I was really lucky. My father was an engineer, and my mom was super smart, kind of more verbally and linguistically. The weird thing was that when I grew up, I was sort of more like her, you know, thinking-wise, but I was dyslexic - I couldn't read. My father was an engineer, so I grew up thinking I was like him, but I was actually intellectually more like my mother. They were both smart people. Now they came out of the 50s, and my mom raised a family, so she didn't start her career as a therapist until later in life. But they were pretty interesting people.
Then, when I first started at Digital, I worked for a guy named Bob Stewart, who was a great computer architect. He did the PDP-11/44, PDP-11/70, VAX 780, VAX 8800, and the CI interconnect. Somebody said that every project that he had ever worked on earned a billion dollars, back when that was a huge number. So I worked for him and he was great, but there were half a dozen other really great computer architects there. I was at DEC and DEC had DEC Research Labs, and I got to meet guys like Butler Lampson and Chuck Thacker and Neil Wilhelm. Nancy Kronenberg was one of my mentors when I was a little kid, and she’s one of the chief people on the VMS operating system. So that was kind of lucky.
So did I idolize them? Well, they were both daunting and not, because I was a little bit of a, you know. I didn't quite realize who they were at the time. I was a little oblivious to what was going on. Like, my first week at Digital, we got trained on this drawing system called Valid, which was kind of before the Mentor Graphics era. So this guy walked in, and he was asking us questions and telling us about hierarchical design. I explained to him why that was partly a good idea and partly stupid, and so we had an hour-long debate about it, then he walked off. Somebody said that was Gordon Bell. I asked ‘Who's that? He’s the CTO of Digital? Really? Well he's wrong about half the stuff he just said - I hope I straightened him out.’ But you know, I think that's just some serotonin activation or something. That's more of a mental problem with me than a feature, I think!
IC: So would you say you’ve matured?
JK: Not a bit!
IC: Is that where the fun is?
JK: I mean, there's a whole bunch of stuff. When I was young, I'd get nervous when I gave a talk, and I realized I had to understand the people around me better. But you know, I wasn't always quite convinced. [At the time] I'd rather they just did the right thing or something. So there's a bunch of stuff that has changed. Now I'm really interested in what people think and why they think it, and I have a lot of experience with that. Every once in a while you can really help debug somebody, or get the group to work better. I don't mind giving public talks at all. I just decided that the energy I got from being nervous was fun. I still remember walking out on stage at Intel at some conference, like 2000 people. I felt like I should have been really nervous, but instead I was just really excited about it. So some of that kind of stuff changed, but that's partly conscious, and partly just practice. I still get excited around computer design and stuff. A friend of mine's wife asked what they put in the water, because all we ever do is talk about computers. It's really fun, you know. Changing the world. It's great.
IC: It sounds like you have spent a lot more time, in a way, studying the human experience. If you understand how people think, how people operate, that’s different compared to mouthing off at Gordon Bell for an hour.
JK: It's funny. People occasionally ask me like, or I tell people, that I read books. You learn a lot from books. Books are fun by the way - if you know how a book works. Somebody who lives 20 years, then passionately writes their best ideas (and there are lots of those books), and then you go on Amazon and find the best ones. It's hilarious, right? Like a really condensed experience in a book, written, and you can select the better books, like who knew, right? But I've been reading a lot of books for a long time.
It's hard to say, ‘read these four books, it'll change your life’. Sometimes a [single] book will change your life. But reading 1000 books will [certainly] change your life, that's for damn sure. There's so much human experience that's useful. Who knew Shakespeare would be really useful for engineering management, right? But like, what are all those stories - power politics, devious guys, the minions doing all the work and the occasional hero saving the day? How does that all play out? They're set 500 years ago, but they apply to corporate America every single day of the week. So if you don't know Shakespeare or Machiavelli, you don’t know nothing.
IC: I think I remember you saying that before you went into your big first management role, you read 20 books about management techniques, and how you ended up realizing that you'd read 19 more than anybody else.
JK: Yeah, pretty much. I actually contacted Venkat (Venkatesh) Rao, who's famous for the Ribbonfarm blog and a few other things, to figure [stuff] out. I really liked his thinking about organization from his blog, and he had a little thing at the bottom where it says to click here to buy him a cup of coffee, or book a consult, so I sent him an email. So we started yakking, and we spent a lot of time talking before I joined AMD. He said I should read these books and I did. I thought everybody who’s in a big management job did that, but nobody does. You know it was hilarious - like 19 is generous. I read 20 more management books than most managers have ever read. Or they read some superficial thing like Good to Great, which has some nice stories in it, but it's not that deep a book management-wise. You'd be better off reading Carl Jung than Good to Great if you want to understand management.
IC: Do you find yourself reading more fiction or nonfiction?
JK: As a kid, I read all the nonfiction books. Then my parents had a book club. I didn't really learn to read until I was in fourth grade, but somewhere around seventh or eighth grade, I had read all the books in the house. They had John Updike, and John Barth was one of my favorite authors when I was a kid. So there were a whole bunch of stories. Then Doris Lessing. Doris Lessing wrote a series of science fiction books that were also psychological inquiries, and I read that, and I just, I couldn't believe it. Every once in a while stuff like that kind of blows your mind. And it happened, obviously, at the right time. But now I read all kinds of stuff. I like history and anthropology and psychology, and mysticism, and there are so many different things. I’ve probably read fewer fiction books in the last 10 years. But when I was younger, I probably read mostly fiction.
IC: I did get a few particular comments from the audience in advance of this interview about comments you made when you were being interviewed by Lex Fridman. You said that you read two books a week. You’re also very adept at quoting from key engineers and futurists. I'm sure if you started tweeting what book you’re reading when you start a new one, you'll get a very large following. A sort of a passive Jim Keller book club!
JK: I would say I read two books a week. Now, I read a lot, but it tends to be blogs and all kinds of crazy stuff. I don't know - like doing Lex [Lex’s Podcast] is super fun, but I don't know that I have the attention span for social media to do anything like that. I'd forget about it for weeks at a time.
IC: How do you make sure that you're absorbing what you're reading, rather than having your brain diverting about some other problem that you might be worrying about?
JK: I don't really care about that. I know people that read books, and they are really worried if they're going to remember them. They spend all this time highlighting and analyzing. I read for interest, right? What I really remember is that people have to write 250-page books, because that's like a publisher rule. It doesn't matter if you have 50 pages of ideas, or 500, but you can tell pretty fast. I've read some really good books that are only 50 pages, because that's all they had. You can also read 50 pages, and you think, ‘wow, it's really great!’, but then the next 50 pages is the same shit. Then you realize it’s just been fleshed out – at that point I wish they'd just published a shorter book.
But that is what it is. If the ideas are interesting, that's good. I meditate regularly, and then I think about what I'm thinking about, which is sometimes related to what I'm reading. Then if it's interesting, it gets incorporated. But your brain is this kind of weird thing - you don't actually have access to all the ideas and thoughts and things you've read, but your personality seems to be well informed by it, and I trust that process. So I don't worry if I can't remember somebody's name [in a book], because their idea may have changed who I was, even if I don't remember what book it came from. I don't care about that stuff.
IC: As long as you have passively absorbed it at some level?
JK: Yeah. Well, there's a combination of passive and active. I told Lex that a lot of times when I'm working on problems, I prep my dreams for it. It's really useful. That's a fairly straightforward thing to do. Before you fall asleep, you call up in your mind what you're really working on and thinking about. In my personal experience, sometimes I really do work on that, and sometimes that's just a problem in the way of what I actually need to think about, and I'll dream about something else. I'll wake up, and one way or the other it was really interesting.
Nature vs Nurture
IC: So on the topic of time, here we are discussing personal health, study, meditation, and family, but also how you execute professionally. Are you one of these people who only needs four hours of sleep a night?
JK: Nah, I need like seven. Well, I added it up one day: my ideal day would have like 34 hours in it. Because I like to work out, spend time with my kids, I like to sleep and eat, and you know I like to work. I like to read too, so I don't know. Work is the weird one, because that can fill in lots more time than you want to spend on it. But I also really like working, so it's a challenge to kind of stamp it down.
IC: When there's a deadline, what gets pushed out of the way first? You've worked at companies where getting the product out, and time to market, has been a key element of what you're doing.
JK: For about the last six years, the key thing for me is that once I have too much to do, I find somebody that wants to do it more than me. I mostly work on unsolved problems. You know I was the laziest person at Tesla. Tesla had a culture of working 12 hours a day to make it look like you're working, and I worked, you know, 9 to 7, which was a lot of hours. But I also went running at lunch, and worked out. They had a weightlifting room. Deer Creek was right next to the big machine shop, so I would go down there for an hour to work out and to eat.
At AMD and Intel, they're big, big organizations, and I had a really good staff. So I'd find myself spending way too much time on presentations, or working on some particular thing. Then I'd find some people who wanted to work on it, so I’d give it to them and, you know, go on vacation.
IC: Or speaking to press people like me, and taking up your time! What is your feeling about doing these sorts of press interviews, and you know, more the sort of marketing and corporate and discussion? These aren't really necessarily related to actually pushing the envelope, it's just talk.
JK: It’s not just talk. I’ve worked on some really interesting stuff, so I like to talk about it. When I was at Intel, I realized it was one of the ways to influence the Intel engineers. Like everybody thought Moore's Law was dead, and I thought ‘holy crap, it's the Moore's Law company!’. It was really a drag if [as an engineer] your main thing was that [Moore’s Law is dead], because I thought it wasn't. So I talked to various people, then they amplified what I said and debated it, and it went back inside. You know, I actually reached more people inside of Intel by doing external talks. So that was useful to me, because I had a mission to build faster computers. That's what I like to do. So when I talk to people, they always bring all kinds of stuff up, like how the work we do impacts people. Guys like you think really hard about it, and you talk to each other. Then I talk to you, and you ask all these questions, and it's kind of stimulating. It's fun. If you can explain something really clearly, you probably know it. There are a lot of times you think you know it, and then you go to explain it, but you're stumbling all around. I did some public talks that were hard to do, like the talk actually seems simple, but to get to the simple part you have to get your ideas out and reorganize them and then throw out the BS. It's a useful thing to talk.
IC: Is it Feynman or Sagan that said ‘if you can’t explain the concept at a first-year college level, then you don’t really understand it’?
JK: Yeah, that sounds probably like Feynman. He did that really well, like with his lecture series on physics. It was quite interesting. Feynman’s problem was that he had such a brilliant intuition for the math, that his idea of simple was often not that simple! Like he just saw it, and you could tell. Like he could calculate some orbital geometry in five ‘simple’ steps, and he was so excited about how simple it was. But I think he was the only person in the room that thought it was simple.
IC: I presume he had the ability to visualize things in his head and manipulate them. I remember you saying at one point, that when it comes down to circuit-level design, that's the sort of thing you can do.
JK: Yeah. If I had one superpower, it's that I feel like I can visualize how a computer actually runs. So when I do performance modeling and stuff like that, I can see the whole thing in my head and I'm just writing the code down. It is a really useful skill, but you know, I was probably partly born with it, partly developed it, and partly it came out of my late adult diagnosis of dyslexia.
IC: I was going to ask how much of that is nature versus nurture?
JK: It's hard. There's this funny thing that with super-smart people, often things are so easy for them, that they can go a really long way without having to work hard. So I'm not that smart. So persistence, and what they call grit, is super useful, especially in computer design. When lots of stuff takes a lot of tweaking, you have to believe you can get there. But a lot of times, there's a whole bunch of subtle iterations to do, and practice with that actually really works. So yeah, everybody's a combination. But if you don't have any talent, it's pretty hard to get anywhere, but sometimes really talented people don't learn how to work, so they get stuck with just doing the things that are obvious, not the things that take that persistence through the mess.
IC: Also identifying that talent is critical as well, especially if you don’t know you have it?
JK: Yeah, but on the flip side, you may have enough talent, but you just haven't worked hard, and some people give up too soon. You’ve got to do something, something you're really interested in. When people are struggling, like if they want to be an engineer or in marketing or this or that, [ask yourself] what do you like? This is especially true for people who want to be engineers, but their parents or somebody wants them to be a manager. You're going to have a tough life, because you're not chasing your dream, you're chasing somebody else's. The odds that you will be excited about somebody else’s dream are low. So if you're not excited, you're not going to put the energy in, or learn. That's a tough loop in the end.
Pushing Everyone To Be The Best
IC: To what extent do you spend your time mentoring others, either inside organizations, or externally with previous coworkers or students? Do you ever envision yourself doing something on a more serious basis, like the ‘Jim Keller School of Semiconductor Design’?
JK: Nah. So it's funny because I'm mostly mission driven. Like, ‘we're going to build Zen!’, or ‘we're going to build Autopilot!’, and then there are people that work for me. Then as soon as they start working for me, I start figuring out who they are, and then some of them are fine, and some of them have big problems that need to be, let's say, dealt with one way or the other. So then I'll tell them what I want, sometimes I'll give them some pointed advice. Sometimes I'll do stuff, and you can tell some people are really good at learning by following. Then people later on are telling me that I was mentoring them, but I'm thinking, I thought I was kicking your ass? It's a funny experience.
There are quite a few people that said I impacted their life in some way, but for some of those, I went after them about their health or diet, because I thought they looked not energized by life. You can make really big improvements there. It's worth doing by the way. It was either that, or they were doing the wrong thing, and they were just not excited about it. [At that point] you can tell they should be doing something else. So they either have to figure out why they are not excited or get excited, and then a lot of people start fussing with themselves or with other people about their status or something. The best way to have status is to do something great, and then everybody thinks you're great. Having status by trying to claw your way up is terrible, because everybody thinks you're a climber, and sometimes they don’t have the competence or skill to make the right choice there. It mostly comes out of being mission driven.
I do care about people, at least I try to, and then I see the results. I mean, it's really gratifying to get a big complicated project done. You know where it was when you started, and then you know where it was when it was done, and then people when they work on successful things associate the leadership and the team they're working with as being part of that. So that's really great, but it doesn't always happen. I have a hard time doing quote ‘mentoring people’, because what's the mission? Like, somebody comes to you and says ‘I want to get better’. Well, better at what? Then if that's like wanting to be better at playing violin, well I'm not good at that.
Whereas when I say ‘hey, we're going to build the world's fastest autopilot chip’, then everybody working on it needs to get better at doing that. It turns out three-quarters of their problems are actually personal, not technical. So to get the autopilot chip, you have to go debug all that stuff, and there are all kinds of personal problems - health problems, parental childhood problems, partner problems, workplace problems, and career stall problems. The list is so bloody long, and we take them all seriously. As it turns out, everybody thinks their own problems are really important, right? You may not think their problems are important, but I tell you, they do, and they have a list. Ask anybody – what are your top five problems. They can probably tell you. Or even weirder, they give you the wrong five, because that happens too.
IC: But did they give you the five they think you want to hear rather than the actual five?
JK: Yeah. People also have no-fly zones, so their biggest problem may be something they don’t want to talk about. But if you help them solve that, then the project will go better, and then at some point, they'll appreciate you. Then they'll say you're a mentor, and you're thinking, kinda, I don’t know.
IC: So you mentioned about your project succeeding, and you know, people being proud of their products. Do you have a 'proudest moment' of your career, project, or accolade? Any specific moments in time?
JK: I have, and there's a whole bunch of them. I worked with Becky Loop at Intel, and we were debugging some quality things. It turns out there was a whole bunch of layers of stuff. We were going back and forth on how to analyze it, how to present it, and I was frustrated with the data and what was going on. One day she came up with this picture, and it was just perfect. I was really excited for her because she'd gotten to the bottom of it. We actually saw a line of sight to a fix. But that kind of stuff happens a lot.
IC: An epiphany?
JK: Yeah. Well sometimes working with a group of people, going into it is like a mess, but then it gets better. The Tesla Autopilot thing was wild, and Zen’s success has been fantastic. Everybody thought that the AMD team couldn't shoot straight, and I was very intrigued with the possibility of building a really great computer with the team that everybody thought was out of it. Like nobody thought AMD had a great CPU design team. But you know, the people who built Zen had 25 to 30 years of work history at AMD. That was insane.
IC: I mean Mike Clark and Leslie Barnes, they’ve been there for 25 to 30 years.
JK: Steve Hale, Suzanne Plummer.
IC: The Lifers?
JK: Yeah, they're kind of lifers, but they had done many great projects there. They all had good track records. But what did we do different? We set some really clear goals, and then we reorganized to hit the goals. We did some really thorough talent analysis of where we were, and there were a couple people that had really checked out because they were frustrated that they could never do the right thing. You know I listened to them - whoa Jesus, I love to listen to people.
We had this really fun meeting, and it was one of the best experiences of my life. Suzanne called me up and said that people on the Zen team don't believe they can do it. I said, ‘great - I'll drive to the airport, I’m in California, and I'll see you there tomorrow morning, eight o'clock. Make sure you have a big room with lots of whiteboards’. It was like 30 angry people ready to tell me all the reasons why it wouldn't work. So I just wrote all of the reasons down on a whiteboard, and we spent two days solving them. It was wild because it started with me defending against the gang, but people started to jump in. Whenever possible, when somebody would say ‘I know how we fix that’, I would give them the pen and they would get up on the board and explain it. It worked out really well. The thing was, their honesty was great. Here are all the problems that we don't know how to solve, and so we're putting them on the table. They didn't give you 2 reasons but hold back 10 and say ‘you solve those two’. There was none of that kind of bullshit. They were serious people that had real problems, and they'd been through projects where people said they could solve these problems, and they couldn't. So they were probably calling me out, but I’m just not a bullshitter. I told them some we can do, and some I don't know. But I remember, Mike Clark was there and he said we could solve all these problems. I walked out thinking our thing was pretty good, and people walked out of the room feeling okay, but two days later the problems all popped back up. So you know, how often do you have to go convince somebody? But that’s why they got through it. It wasn’t just me hectoring them from the sidelines; there were lots of people, in lots of parts of the team, who said they were willing to really put some energy into this, which is great.
IC: At some point I’d love to interview some of them, but AMD keeps them under lock and key from the likes of us.
JK: That’s probably smart!
IC: Is there somebody in your career that you consider like a silent hero, that hasn’t got enough credit for the work that they’ve done?
JK: A person?
IC: Yeah.
JK: Most engineers. There are so many of them, it’s unbelievable. You know engineers, they don’t really get it. Compared to lawyers that are making 800 bucks an hour in Silicon Valley, engineers so often want to be left alone and do their work and crank out stuff. There are so many of those people that are just bloody great. I've talked to people who say stuff like ‘this is my eighth-generation memory controller’, and they're just proud as hell because it works and there are no bugs in it, and the RTL is clean, and the commits are perfect. Engineers like that are all over the place, I really like that scenario.
IC: But they don’t self-promote, or the company doesn’t?
JK: Engineers are more introverted, and conscientious. The introverted tend not to be the people who self-promote.
IC: But aren’t you a little like me, you’ve learned how to be more extroverted as you’ve grown?
JK: Well, I decided I wanted to build bigger projects, and to do that, you have to pretend to be an extrovert, and you have to promote yourself, because there's a whole bunch of people who are decision-makers who don't do the work to find out who the best architect is. They're going to pick the person that everybody says is the best architect, or the loudest, or the most capable. So at some level, if you want to succeed above ‘principal engineer’, you have to understand how to work in the environment of people who play it. Some people are super good at that naturally, so they get pretty high in organizations without much talent, sometimes without much hard work. Then the group of people, Director and above, that you have to deal with have a way different skill set than most of the engineers. So if you want to be part of that gang, even if you're an engineer, you have to learn how that rolls. It's not that complicated. Read Shakespeare, Jung, a couple of books, Machiavelli, you know, you can learn a lot from that.
Security, Ethics, and Group Belief
IC: One of the future aspects of computing is security, and we've had a wave of side-channel vulnerabilities. This is a potential can of worms, attacking the tricks that we use to make fast computers. To what extent do you approach those security aspects when you're designing silicon these days? Do you find yourself proactive or reactive?
JK: So the market is sort of dictating needs. The funny thing about security first of all is you know it only has to be secure if somebody cares about it. For years, security in an operating system was virtual memory – one process couldn't look into another process's virtual memory. But the code underneath it in the operating system was so complicated that you could trick the operating system into doing something. So basically you started from security by correct software, but once you couldn't prove the software correct, they started putting additional hardware barriers in there. Now we're building computers where the operating system can't see the data the user has, and vice versa. So we're trying to put these extra boundaries in, but every time you do, you've made it a little more complicated.
At some level security worldwide is mostly security by obscurity, right? Nobody cares about you, in particular, because you're just one out of 7 billion people. Like somebody could crack your iPhone, but they mostly don't care about it. There's a funny arms race going on about this, but it's definitely kind of incremental. They discovered side-channel attacks, and they weren't that hard to fix. But there'll be some other things, and, you know, I'm not a security expert. The overhead of building security features is mostly low. The hard part is thinking that out and deciding what to do. Every once in a while somebody will say something like ‘this is secure, because the software does x’, and I always think, ‘yeah, just wait 10 minutes, and the software will get more complicated, which will introduce a gap in it’. So there need to be real hardware boundaries in there.
There are lots of computers that are secure, because they don't talk to anything. Like there are boatloads of places where the computers are usually behind a hard firewall, or literally disconnected from anything. So only physical attacks work, and then they have physical guards. So now, it's going to be interesting, but it's not super high in my thinking, I mostly follow what's going on, and then we'll just do the right thing. But I have no faith in security by software, let's say, because that always kind of grows to the point where it kind of violates its own premises. It's happened many times.
IC: So you've worked at Tesla, where you designed a product specifically for Tesla. You have also worked at companies that sell products for a wide array of uses. Beyond that sort of customer workload analysis, do you consider the myriad of possibilities of what the product you are building will be used for? Do you consider the ethics behind what it might be used for? Or are you just there to solve the problem of building the chip?
JK: The funny thing about general-purpose computing is it can really be used for anything. So the ethics is more if the net good is better than the net bad. For the most part I think the net good is better than the possible downsides. But people do have serious concerns about this. There's a big movement around ethics in AI, and to be honest, the AI capabilities have so far outstripped the thinking around those aspects. I don't know what to think about it.
What the current systems can do has already stripped us bare: they know what we think, and what we want, and what we're doing. Then the question is how many people have that, which is one reason to build lower-cost, programmable AI. We're talking to quite a large number of AI software startups that want AI hardware and computing in more people's hands, because then you have a little mutual standoff situation, as opposed to one winner take all. But the modern tech world has been sort of a winner take all. There are literally several dozen very large companies that have a competitive relationship with each other. So, that's kind of complicated. I think about it some, but I don't have anything you know, really good to say, besides, you know the net benefit so far has been a positive. Having technology in more people's hands rather than a concentrated few seems better, but we'll see how it plays out.
IC: You've worked for a number of big personalities. You know, Elon Musk, Steve Jobs to name two. It appears you still have a strong contact with Elon. Your presence at the Neuralink demo last year with Lex did not go unnoticed. What’s your relationship with Elon now, and was he the one to invite you?
JK: I was invited by somebody in the Neuralink team. I mean Elon, I would say I don’t have a lot of contact with him at the moment. I like the development team there, so I went over to talk to those guys. It was fun.
IC: So you don’t stay in touch with Elon?
JK: No, I haven’t talked to him recently, no.
IC: It was very much a professional, not a personal relationship when you worked for Tesla then?
JK: Yeah.
IC: Because I was going to ask about the fact that Elon is a big believer in Cryptocurrency. He regularly discusses it as it pertains to demands of computing and resources, for something that has no intrinsic value. Do you have any opinions as it comes to Cryptocurrency?
JK: Not much. Not really. I mean humans are really weird where they can put value in something like gold, or money, or cryptocurrency, and you know that's a shared belief contract. What it's based on, the best I can tell, hasn't mattered much. I mean the thing the crypto guys like is that it appears to be out of the hands of some central government. Whether that's true or not, I couldn't say. How that's going to impact stuff, I have no idea. But as a human, you know, group beliefs are really interesting, because when you're building things, if you don't have a group belief that makes sense then you're not going to get anything done. Group beliefs are super powerful, and they move currencies, politics, companies, technologies, philosophies, self-fulfillment. You name it. So that's a super interesting topic, but as for the details of Cryptocurrency, I don't care much about it, except as a manifestation of some kind of psychological phenomenon about group beliefs, which is actually interesting. But it seems to be more of a symptom, or a random example let's say.
Chips Made by AI, and Beyond Silicon
IC: In terms of processor design, currently with EDA tools there is some amount of automation in there. Advances in AI and Machine Learning are being expanded into processor design - do you ever envision a time where an AI model can design a purposeful multi-million device or chip that will be unfathomable to human engineers? Would that occur in our lifetime, do you think?
JK: Yeah, and it’s coming pretty fast. So already the complexity of a high-end AMD, Intel, or Apple chip is almost unfathomable to any one person. But if you actually go down into details today, you can mostly read the RTL or look at the cell libraries and say, ‘I know what they do’, right? But if you go look inside a neural network that's been trained and say, why is this weight 0.015843? Nobody knows.
IC: Isn’t that more data than design, though?
JK: Well, somebody told me this. Scientists, traditionally, do a bunch of observations and they go, ‘hey, when I drop a rock, it accelerates like this’. They then calculate how fast it accelerated and then they curve fit, and they realize ‘holy crap, there's this equation’. Physicists for years have come up with all these equations, and then when they got to relativity, they had to bend space, and with quantum mechanics, they had to introduce probability. But still, they are mostly understandable equations.
There's a phenomenon now where a machine learning thing can learn, and predict. Physics is some equation: you put in inputs, and the equation, or function, gives outputs, right? But if there's a black box there, where the AI network takes the inputs and the black box gives the outputs, then if you look in the box, you can't tell what it means. There's no equation. So now you could say that the design of the neurons is obvious, you know - the little processors, little four teraflop computers, but the design of the weights is not obvious. That's where the thing is. Now, let’s go use an AI computer to go build an AI calculator: what if you go look inside the AI calculator? You can't tell why it's getting a value, and you don't understand the weight. You don't understand the math or the circuits underneath them. That's possible. So now you have two levels of things you don't understand. But what result do you desire? That might still be designed in terms of the human experience.
Computer designers used to design things with transistors, and now we design things with high-level languages. So those AI things will be building blocks in the future. But it's pretty weird that there's going to be parts of science where the function is not intelligible. There used to be physics by explanation, such as Aristotle, over 2,000 years ago - he was wrong about a whole bunch of stuff. Then there was physics by equation, like Newton, Copernicus, and people like that. Stephen Wolfram says there’s now going to be physics by program. There are very few programs that you can write as one equation. Theorems are complicated, and he says, why isn’t physics like that? Well, with protein folding, in the computing world we now have programs made by AI, which have no intelligible equations or statements, so why isn’t physics going to do the same thing?
IC: It's going to be those abstraction layers, down to the transistor. Eventually, each of those layers will be replaced by AI, by some unintelligible black box.
JK: The thing that assembles the transistors will make things that we don’t even understand as devices. It’s like people have been staring at the brain for how many years, they still can't tell you exactly why the brain does anything.
IC: It’s 20 Watts of fat and salt.
JK: Yeah and they see chemicals go back and forth, and electrical signals move around, and, you know, they're finding more stuff, but, it's fairly sophisticated.
IC: I wanted to ask you about going beyond silicon. We've been working on silicon now for 50+ years, and the silicon paradigm has been continually optimized. Do you ever think about what’s going to happen beyond silicon, if we ever reach a theoretical limit within our lifetime? Or will anything get there, because it won’t have 50 years of catch-up optimization?
JK: Oh yeah. Computers started, you know, with Abacuses, right? Then mechanical relays. Then vacuum tubes, transistors, and integrated circuits. Now the way we build transistors, it's like a 12th generation transistor. They're amazing, and there's more to do. The optical guys have been actually making some progress, because they can direct light through polysilicon, and do some really interesting switching things. But that's sort of been 10 years away for 20 years. But they actually seem to be making progress.
It’s like the economics of biology. It’s 100 million times cheaper to make a complicated molecule than it is to make a transistor. The economics are amazing. Once you have something that can replicate proteins - I know a company that makes proteins for a living, and we did the math, and it was literally 100 million times less capital per molecule than we spent on transistors. So when you print transistors it’s something interesting because they're organized and connected in very sophisticated ways and in arrays. But our bodies are self-organizing - they get the proteins exactly where they need to be. So there's something amazing about that. There's so much room, as Feynman said, at the bottom, of how chemicals are made and organized, and how they’re convinced to go a certain way.
I was talking to some guys who were looking at doing a quantum computing startup, and they were using lasers to quiet down atoms, and hold them in 3D grids. It was super cool. So I think we've barely scratched the surface on what's possible. Physics is so complicated and apparently arbitrary that who the hell knows what we're going to build out of it. So yeah, I think about it. It could be that we need an AI kind of computation in order to organize the atoms in ways that take us to that next level. But the possibilities are so unbelievable, it's literally crazy. Yeah I think about that.
Many thanks to Jim Keller and his team for their time.
Many thanks also to Gavin Bonshor for assistance in transcription.
81 Comments
prophet001 - Thursday, June 17, 2021
Didn't read it all yet but the part about stop arguing about op-codes was pretty nice to hear. (Looks at apple fanboys)
name99 - Thursday, June 17, 2021
Better read the whole thing then. Because his comments (especially about the importance of abstraction layers) don't mean what you think they mean...
mode_13h - Friday, June 18, 2021
He never actually says that. His stance on ISAs is pretty clear, if you read to the end of that section. Ian tried pretty hard to nail it down.
> if I want to build a computer really fast today, and I want it to go fast, RISC-V
> is the easiest one to choose. It’s the simplest one, it has got all the right features,
> it has got the right top eight instructions that you actually need to optimize for,
> and it doesn't have too much junk.
> As you go along, every new feature added gets harder to do,
> because the interaction for that feature, and everything else, gets terrible.
> The marketing guys, and the old customers, will say ‘don't delete anything’,
> but in the meantime they are all playing with the new fresh thing that only
> does 70% of what the old one does, but it does it way better because it
> doesn't have all these problems. I've talked about diminishing return curves,
> and there's a bunch of reasons for diminishing returns, but one of them is
> the complexity of the interactions of things. They slow you down to the point
> where something simpler that did less would actually be faster.
GeoffreyA - Friday, June 18, 2021
I was glad to hear his statement about RISC-V. I just hope that if/when x86 goes down, AMD and Intel choose RISC-V. Windows is another problem, though; there isn't any RISC-V version as far as I know; and if they do release one, it'll take some time before x86-64 emulation is up and running.
mode_13h - Friday, June 18, 2021
We should recall that Tenstorrent is using some RISC V cores from SiFive, in an upcoming chip. That decision is kind of putting his money where his mouth is, although it could just mean that RISC-V was simply better in a single respect: TTM, PPA, PPW, or licensing costs, and "just good enough", in the others. He also cited SiFive's Chris Lattner, "one of the best compiler guys on the planet".
TTM = time-to-market
PPA = performance-per-area
PPW = performance-per-Watt
BTW, Intel is rumored to be trying to acquire SiFive for $2B. That could put an interesting wrinkle in Tenstorrent's long-term plans with them. But, by the time follow-on chips are being built, maybe Tenstorrent will have been acquired as well.
I should add that one advantage of ARM over RISC V is how strict ARM is in avoiding vendor-specific ISA extensions. That makes ARM code very portable, by comparison with RISC V. This isn't a problem for embedded uses of RISC V, but plays against its potential for success in general purpose (or should I say "total purpose" ;-) computing.
Ian Cutress - Friday, June 18, 2021
The RISC-V cores are a simple strip on one side of the chip to do mid-cycle compute. It's the Tensix cores that do most of the heavy lifting.
mode_13h - Friday, June 18, 2021
Yes, definitely worth noting. However, Jim was keen to point out the 512-bit vector unit of the SiFive X280 core they'll be using. So, the fact that they aren't as fundamental to the chip as the Tensix cores doesn't mean they're not plenty important.
For anyone interested in further details, check out Part 1 of the interview.
mode_13h - Friday, June 18, 2021
https://www.anandtech.com/show/16709/an-interview-...
mode_13h - Friday, June 18, 2021
^ Part 1
GeoffreyA - Saturday, June 19, 2021
mode_13h, it would be interesting to see a showdown between RISC-V and ARM, but no doubt the picture might be muddied by ARM having better, more mature implementations. Anyway, I'm sure if AMD built one, they'd cover the ground quickly.