Oh em gee ... what a contentless interview.
"We made it wider and deeper".
Gosh. Why didn't anyone think about doing that before?
He is no Jim Keller, and the mostly[1] automated transcript makes it read cringe, but it is not at all devoid of content.
Some examples of very interesting, non-obvious content:
* Even if store ports are kept fixed (2 in his example), adding store address generators (up to 4 in his example) actually improves performance, because it frees up load port dependencies.
* Within the same core, they use two different styles of load/store address contention mechanisms, which he describes as two tables, one with explicit "allows" and the other with explicit "denies" -- which of course end up converging (I understand this refers to two different encodings that vary in what is stored).
* Between cores, they have completely separate teams which reach different designs for things like this.
* It was interesting to me to discover how isolated the different core design teams are in their work (which makes sense).
* It was interesting to me to picture the load/store address contention subsystem, which must be quite complex and needs to be really fast.
And I'll stop listing there, but there is also the part about different types of workloads: gaming workloads being similar to DB workloads, and more similar to each other than to SPEC benchmarks, and so on.
Just go read the interview if you're interested in CPU design!
[1] mostly automated: at least the dialog name labels seem to be hand-edited, as one of them has a typo
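A toy sketch of the store address generator point above, under loud assumptions: a load is taken to wait until the address of every older store is known (no memory-dependence prediction), all store address operands are ready, and each unit handles one store per cycle. The function names and the 8-store burst are hypothetical; only the "2 store ports, up to 4 store address generators" figures come from the interview.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def cycles_until_addresses_known(num_stores, store_agus):
    """Cycles to generate addresses for a burst of stores (toy model)."""
    return ceil_div(num_stores, store_agus)

def cycles_until_committed(num_stores, store_ports):
    """Cycles to write the same burst into the data cache (toy model)."""
    return ceil_div(num_stores, store_ports)

STORE_PORTS = 2  # from the interview: two stores sustained into the data cache
BURST = 8        # hypothetical burst of older stores sitting ahead of a load

for agus in (2, 4):
    known = cycles_until_addresses_known(BURST, agus)
    done = cycles_until_committed(BURST, STORE_PORTS)
    print(f"{agus} store AGUs: addresses known after {known} cycles, "
          f"stores committed after {done} cycles")
```

In this model the commit time does not change, but a younger load learns sooner whether it conflicts with any of the older stores, which is one plausible reading of "frees up load port dependencies".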
I did the transcription, but not the dialogs and labels etc. So I can say with certainty that it wasn't automated :)
What made the transcription "cringe"? I'd like to believe it's accurate.
Oops, sorry about carelessly throwing the "cringe" label at that. Thanks for the transcript which allowed me to enjoy the content, which I did find very interesting.
I haven't watched the video so I am not sure how he actually talks, but what read cringe to me was things like the following paragraph:
"Stephen Robinson: Yeah. So let’s, let’s break it down into address generation versus execution. So, when you have three load execution ports, you need three load address generators. And so that’s there. On the store side, we have four store address generation units. But we only sustain two stores into the data cache."
Which reads weird. "let's" repeated twice, probably a stutter, could be transcribed just once. The "So" or "And so" the interviewee uses all the time at the start of sentences can also be removed for clearer and easier reading most of the time, without loss of meaning. Some sentences can almost be removed completely as they provide no actual information. The previous paragraph could be transcribed like this:
"Stephen Robinson: Let’s break it down into address generation versus execution. When you have three load execution ports, you need three load address generators. That’s there. On the store side, we have four store address generation units. But we only sustain two stores into the data cache."
I hesitate to remove "That's there." so I left it. But everything else I removed, it makes it clearer, and I think I'm not being unfaithful to the original. Removing the duplicate "let's" is a given as it's normal to stutter when speaking, but you don't really want to transcribe that unless the goal is to transcribe the talking imperfections we all have. And all the other things I removed, "Yeah", "So", "And so", are basically the same type of thing.
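The cleanup described above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the three rules in this comment (drop "Yeah.", strip a leading "So" / "And so", collapse a comma-separated stutter); the filler lists and the `tidy` name are assumptions, and the speaker label is left out of the input on purpose.

```python
import re

# Leading fillers to strip from the start of a sentence (assumed list).
LEADING_FILLER = re.compile(r"^(?:And so|So),?\s+")
# A word immediately repeated after a comma, e.g. "let’s, let’s".
STUTTER = re.compile(r"\b(\w+[’']\w+|\w+),\s+\1\b")
# Whole sentences that carry no content on their own (assumed list).
EMPTY_SENTENCES = {"Yeah."}

def tidy(text):
    """Apply the three cleanup rules sentence by sentence."""
    kept = []
    for s in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not s or s in EMPTY_SENTENCES:
            continue
        s = LEADING_FILLER.sub("", s)
        s = STUTTER.sub(r"\1", s)
        if s:
            # Re-capitalise, since the filler may have carried the capital.
            kept.append(s[0].upper() + s[1:])
    return " ".join(kept)

raw = ("Yeah. So let’s, let’s break it down into address generation versus "
       "execution. So, when you have three load execution ports, you need "
       "three load address generators. And so that’s there. On the store "
       "side, we have four store address generation units. But we only "
       "sustain two stores into the data cache.")
print(tidy(raw))
```

Whether "That’s there." should survive is still an editorial call that a rule list like this cannot make.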
I thought this was automated because it had so many of the meaningless go-to words and hesitations from the original. Now that you mention it, automated transcription would probably never have produced something this good. And otherwise we are talking about stylistic preference here, always subjective -- although I'd definitely prefer the style of transcription suggested here.
Thanks again. I read chips and cheese with interest, quite often, and enjoy it quite a lot. Keep up the good work. And sorry for the careless put-down.
You're right, the things you list do contain fresh information. Though the similarity between game logic and business logic is not a new observation ... and web browsers are in the same ballpark too. I think it's a code size vs data size thing. SPEC programs mostly have a relatively small amount of code, gcc being an obvious exception. And I guess Blender in SPEC 2017 FP.
Because that costs power and area.
And it still does.
And the last generation was wider and deeper than the one before it, also costing power and area.
The question that should be asked ... but which would never be answered ... is "What was it that you changed that REQUIRED and ALLOWED you to go wider and deeper?"
It's not a new process node every time.
There's no NEED to have a massive reorder buffer unless you can decode and dispatch that number of instructions in the time it takes for a load to arrive from whichever level of memory hierarchy you're optimising for. And there's no POINT if you're often going to get a misprediction in that number of instructions. Ok, so wider decode is one component of that. Is there a difference in memory latency as well? Wider decode past 3 or 4 instructions increasingly means that you can't just end your packet of decoded instructions at the first branch -- as you get wider you're increasingly going to have to both parse past a conditional branch, and then have to predict more than one branch in the same decode cycle. You'll also get into branches that jump to other instructions in the same decode group (either forward or backward).
There are all kinds of complications there, with no doubt interesting solutions, that go far beyond "we went wider and deeper".
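The NEED / POINT argument above can be put into back-of-the-envelope arithmetic. This is a rough sketch, not a model of any real core: the window is taken as dispatch width times load latency, branches are treated as independent, and every number (widths, latency, branch fraction, mispredict rates) is made up for illustration.

```python
def window_to_cover_latency(dispatch_width, load_latency_cycles):
    """Instructions dispatched while one load is outstanding."""
    return dispatch_width * load_latency_cycles

def p_mispredict_within(window, branch_fraction, mispredict_rate):
    """Chance that at least one branch in the window is mispredicted,
    treating branches as independent (a simplification)."""
    branches = window * branch_fraction
    return 1.0 - (1.0 - mispredict_rate) ** branches

# All numbers below are made-up illustrations, not figures for any real core.
for width, latency in ((4, 50), (8, 50)):
    window = window_to_cover_latency(width, latency)
    for rate in (0.01, 0.002):
        p = p_mispredict_within(window, branch_fraction=0.2,
                                mispredict_rate=rate)
        print(f"width={width} latency={latency} -> window={window}, "
              f"mispredict rate={rate}: P(at least one mispredict)={p:.2f}")
```

Doubling the width doubles the window needed to cover the same latency, and the chance of a mispredict somewhere in that window rises with it unless the predictor improves as well, which is why "wider and deeper" on its own explains so little.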
https://chatgpt.com/share/68ef6cc3-1c48-8013-a545-905af89fbc...
I asked ChatGPT to give a contentful summary of the interview, if anyone is interested. It seems to be more or less accurate, albeit surface level.
It gets the "why" but not the "how". Maybe someone here can prompt it further to speculate on the "how". I don't think I'll be able to verify its output well enough to do that.
I'm not sure what you expect to get out of this. How do you make a "contentful summary" of a contentless interview? Where do you get the content from?
By using general knowledge to write, e.g., what adding a store address unit accomplishes in the context of the rest of the interview. Did you even read the chat?
That doesn’t add useful content. It adds definitions. That’s just padding.
Only the interviewee can add content.
I’m also of the opinion “I asked ChatGPT for a summary” type comments are very low effort and don’t add to the discussion.
> don't add to the discussion
For sure, I'm against it as well, it's just that in this case the transcription provided in the article was so terse that it was more or less useless. LLMs are good at expanding it to make more sense as prose. If you open the link, that is what the prompt asks it to do as well. I'd argue that's useful and not just padding.
> Add content
Yes, I mentioned this in my original comment: "not the how", "surface level", etc.
Unfortunately our AI future involves many more people refusing to use their brains for more than a few seconds and depending on AI to generate summaries without knowing which parts are hallucinated, or even what the point is.
Or, they read the transcription, didn't have time to see the video interview, and used an LLM to augment it to make sense as prose as an aid to the casual reader. I know a fair bit about the topic at hand :) but not enough to be gung-ho about it on a tech forum frequented by legends.
If you actually went through the LLM output, found problems with it, and then commented this, it would be fine. Until then it's an unfounded accusation.
Well, isn't Intel mostly kept alive by capital injections from the US government and Nvidia nowadays? How much content did you expect from a straw puppet?
yeah strange sort of
Odd read, especially after that preamble: "The transcript has been edited for readability and conciseness."
Not a lot of novel information either.
[flagged]
I'm not going to thumb my nose at CPU design content from folks that aren't good at public speaking. They're almost entirely distinct skill sets.
Also, the Venn Diagram between (good public speech) and (good public speech which also looks good when transcribed) is probably pretty thin.
It's basically a transcript of a conversation, so obviously it's not going to read as edited prose.
my original point stands
Yeah, that's weird.