Related question: Can I buy a desktop Zen 5 CPU and something like an RX 7600 XT and some RAM and have a high shared memory bandwidth situation between the system memory and the GPU ala Strix Halo and Apple Silicon without spending a ton of money?
And get pretty reasonable local LLM performance on some of the larger models for hobbyist use?
Edit: I don’t have a good grasp on this but I’m thinking I can only do shared memory when I’m using an APU and not a discrete GPU. Is this correct?
I would love to try out one of the mini-PCs that ship this, but they seem to be made of either platinum (hugely overpriced in the EU) or unobtainium (no retailers carry them here, and getting something direct from China is dicey warranty-wise). ROCm 7 already looks to be working under most Linux distros, and using this as a workstation with a local LLM, or as a “home inference server” with Ollama and a few services, seems like a great solution.
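For anyone curious what the “home inference server” part looks like in practice, here is a minimal sketch that asks a local Ollama instance for a completion over its HTTP API; the port is Ollama's default, and the model name and prompt are placeholders:

    # Minimal sketch of the "home inference server" idea: ask a local Ollama
    # instance for a completion over its HTTP API. Assumes Ollama is running on
    # its default port (11434) and the model has already been pulled.
    import json
    import urllib.request

    payload = {
        "model": "llama3.1:8b",   # placeholder; any locally pulled model works
        "prompt": "Why does unified memory help local LLM inference?",
        "stream": False,          # one JSON object instead of a token stream
    }

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])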
Have you looked at Corsair's AI Workstation 300 Desktop PC? [1] It's 2000-2700 EUR depending on the model, and taking VAT into consideration that's comparable to the 1700-2300 USD pre-tax prices.
[1]: https://www.corsair.com/eu/en/c/ai-workstations
No, but it falls into the platinum side of the equation. I can rent a cloud GPU for a few hours a month and come out ahead.
I don't think there's any computer hardware that is more economical to buy and use for a couple of hours a month than to rent.
If the economics don't work out, perhaps this product is not for you and you're better off renting.
I ordered the Framework Desktop 395 128GB edition for just under 1900 EUR. With some extras I paid just over 2000 including shipping to the EU. Didn't feel overpriced to me.
I looked just now and it costs 2500 EUR without any storage.
Was it on sale or something?
ROCm is making great progress but I’ve had enough hiccups (desktop with an RX 9070 XT) that I’d still recommend those looking for AI capability to continue using an Nvidia or Apple solution for the time being.
Still, I think it’ll be broadly equivalent soon.
I think one of the best AI systems in terms of price/performance is still just to build a desktop with dual RTX 3090s (of course you’ll need a board that supports dual cards) and toss it in a closet.
It depends on what you are doing. A lot of people who want to do local inference want to do it with much larger models than can fit on an RTX 3090, and Strix Halo is such a hit because it gives you reasonable (not great, but good enough to not be outright painful) performance with 128GB of memory.
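To put rough numbers on that (illustrative figures only, ignoring KV cache and runtime overhead, so real usage is higher):

    # Back-of-envelope weight footprint at a given quantisation. The parameter
    # counts and bits-per-weight below are only illustrative.
    def weights_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for params, bits, note in [
        (8,   4.5, "fits easily on a 24GB card"),
        (70,  4.5, "~39GB: already too big for one 3090"),
        (120, 4.5, "~68GB: wants something like 128GB of unified memory"),
    ]:
        print(f"{params}B @ ~{bits} bits/weight: ~{weights_gb(params, bits):.0f} GB ({note})")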
Also, Vulkan is great, and much more stable. Plus it tends to work well on new, and even very old, graphics cards.
At this point Vulkan will just take over. AMD and Intel are fumbling ROCm and SYCL, whereas Vulkan already ships nearly everywhere.
> ROCm is making great progress
is the progress in the room with us?
Comparing this against mobile dGPUs and the (finally real) DGX Spark, this feels like a latent market segment that has not arrived at its final form. I don't know what delayed the DGX Spark so long, but it granted AMD a huge boon by allowing them to capture some market mindshare first.
Compared to Strix Halo, the advantage of a discrete GPU ("dGPU", mobile or not) is memory bandwidth. The disadvantage of a dGPU is power draw and memory capacity—if we set aside CUDA, which I grant is a HUGE thing to just "set aside".
If we mix in the small DGX Spark desktops, then those have an additional advantage in the dual 200Gb network ports that allow for RDMA across multiple boxes. One could get more from a small stack (2, 3 or 4) of those than from the same number of Strix Halo 395 boxes. However, as sexy as my homelab-brain finds a small stack of DGX Spark boxes with RDMA, I think that for professional use I would rather have a GPU server (or a Threadripper GPU workstation) than four DGX Spark boxes.
Because the DGX Spark isn't being sold in a laptop (AFAIK, CMIIW), that is another differentiator in favor of the Strix Halo. Once again, it points to this being a weird, emerging market segment, and I expect the next generation or two will iterate towards how these capabilities really ought to be packaged.
Next gen, AMD has the Medusa Halo with (reportedly) a 384-bit LPDDR6 bus. That should get you twice the memory of Strix Halo with 1.7 times the throughput using memory that's already announced, and even better modules are coming later.
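For a rough sense of where a factor like 1.7x could come from: peak DRAM bandwidth is just bus width (in bytes) times transfer rate. A quick sketch, using the known Strix Halo config and treating the LPDDR6 speed grade as the unknown:

    # Peak DRAM bandwidth = bus width in bytes * transfer rate. Strix Halo's
    # 256-bit LPDDR5X-8000 config is known; the Medusa Halo speed grade is not,
    # so the last line asks what rate a 384-bit bus would need to hit 1.7x.
    def peak_gbs(bus_bits: int, mtps: float) -> float:
        return bus_bits / 8 * mtps / 1000   # GB/s

    strix = peak_gbs(256, 8000)             # ~256 GB/s
    print(f"Strix Halo: {strix:.0f} GB/s")

    target = 1.7 * strix                    # ~435 GB/s
    needed_rate = target * 8 / 384 * 1000   # ~9070 MT/s effective
    print(f"1.7x needs a 384-bit bus at roughly {needed_rate:.0f} MT/s")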
I think with the success of Strix Halo as an inference platform, this market segment is here to stay.
I'm really excited and looking forward to this refresh. The APU spec leaks for the upcoming PS6 and Xbox have some clues as well. My wishlist: more memory bandwidth, more GPU/NPU cores, actual unified memory rather than designating a fixed chunk of RAM as VRAM, more PCIe lanes. Of course there could be more/new AMD packaging magic sprinkled in too.
FYI, it's not dual 200Gb: it's 1x 200Gb or 2x 100Gb.
How sure are you of that? :)
Everything I've seen says it's 2x 200GbE.
One of many examples: https://www.storagereview.com/review/nvidia-dgx-spark-review...
That review says "Allows for a maximum of 200G bandwidth" between the two ports.
“dGPU” usually means “discrete GPU”. Do you mean “iGPU” for “integrated GPU” instead?
Strix Halo is also being marketed for gaming but the performance profile is all wrong for that. The CPU is too fast and the iGPU still not strong enough.
I am sure it’s amazing at matmul though.
Yes, I intended to use the term "discrete GPU" before using "dGPU" as a shorthand for that exact reason (in the second paragraph). I now see that I edited the first paragraph to use "dGPU" without first defining it as such.
I also agree that they aren't for gaming (something I know little about). My comment was with respect to compute workloads, but I never specified that. Apologies.
I have one. Framework Desktop mainboard that I put into a larger ITX chassis and regular power supply.
It's fine for 1440p gaming. I don't use it for that, but it would not be a bother if that was all I had.
From what I've seen the gaming benchmarks are fantastic. It beats the mobile 5070 for some games and settings, or falls slightly behind on others, while being very far ahead of every other iGPU.
I have a laptop with an Nvidia GPU. Ruins battery life and makes it run very hot. I'd pay a lot for a powerful iGPU.
As a casual gamer I'm already okay with the RTX 3050 dGPU on my laptop. Reports put Strix Halo at RTX 4070 level, which is massive for an iGPU and certainly allows for 2K single-screen gaming. Hardcore gaming will always require a desktop with PCIe cards.
Strix Halo is nowhere near RTX 4070 (desktop at least, not familiar with laptop GPUs).
Maybe there's been some selective optimization and careful marketing but to even be in that ballpark for some games now means that more is coming.
https://www.techspot.com/news/106835-amd-ryzen-strix-halo-la...
This link is a terrible source. In one of the graphs a 4060 is faster than a 4070, which speaks to the quality of the testing.
In some power-constrained scenarios that sort of thing is often pretty reproducible.
Especially if the different SKUs have different power budgets. Laptop GPU naming and performance is a bit of a mess, as in the example shown (the 4060 on the Asus TUF Gaming A16 has a limit of 140 W GPU+CPU, while the 4070 on the Asus ProArt PX13 has 115 W GPU+CPU - and even that is a "custom" non-default mode, with 95 W being the actual out-of-the-box limit).
With wildly varying power profiles laptop graphics need to be compared by chassis (and the cooling/power supply that implies) as much as by GPU SKU.
That just proves the point about the source, right?
The DGX Spark seems to have one intended use case: local AI model development and testing. The Strix Halo is an amd64 chip with an iGPU; it can be used for any traditional PC workload and is a reasonable local-AI target device.
For me, the Strix Halo is the first nail in the coffin of discrete GPUs inside laptops for amd64. I think Nvidia knows this, which is why they're partnering with Intel to make an iGPU setup.
I think it's beyond that even - it's for local AI toolchain and model development and testing, or for those people who have a pre-existing Nvidia deployment infrastructure.
It feels like Nvidia spent a ton of money here on a piece of infrastructure (the big network pipes) that very few people will ever leverage, and that the rest of the system somewhat constrains anyway.
High Yield has a video that deep-dives into the 395 chip at the silicon level: https://youtu.be/maH6KZ0YkXU
The saddest part of this is the lack of availability: at this point there are two standard laptops using this chip, the Z13 being the only high-performance one. There's the Framework line as well, but it isn't available in many countries, and it targets a very specific audience.
And that's half a year after the first machines came to market.
I love the Z13, but it's clearly a niche machine, so I'm assuming they are having a really hard time manufacturing the chips? Is all the capacity getting eaten by Apple?
Granted, US pricing for the HP ZBook Ultra was astronomical, but within the EU it's on par with standard laptops, and to good effect. The only regret I have is ordering on release day and not waiting for the 128GB version; but battery life and performance have remained unmatched across all of the pretty large workloads I have thrown at it!
Outside of laptops, Beelink and co. are making NUCs with them which are relatively affordable!
I do agree; the scarcity has limited their ability to assess the growth opportunity.
I also have one with 64GB — best laptop I've ever used :-). I have the same regret of not waiting for the 128GB version to be available before buying.
Beelink, GMKtec, Minisforum, Corsair...
HP ZBook Ultra G1a is a great option and can be bought with up to 128GB RAM.
I picked up a Framework Desktop and am running it through its paces right now. So far, it's an impressive little box. I'm really hopeful that this continues to drive more and more enthusiast support and engagement. Getting strong Vulkan- or ROCm-supported infrastructure would be great for everyone.
I wonder if a higher TDP is possible with the Framework Desktop. It probably has much better cooling than these laptops with the same chip, and I wonder if the numbers are different.
I haven't tested the power draw, but I have the mainboard from Framework that I put into a larger ITX case for better cooling.
My main PC is a 7950X3D, which has the same core/thread count as the Strix unit, and the Strix benches within margin of error of the 7950X3D. Which is to say the performance is the same.
That you can get the same compute power in a laptop is crazy.
I read somewhere, but can't remember where, that a major reason those APUs aren't as efficient as the Apple ones is a conscious decision to share the architecture with Epyc and therefore accept worse efficiency at lower wattage as a tradeoff.
Can someone confirm/refute that?
In this review, Hardware Canucks tested [1] the M4 Pro (3nm, 2nd gen) and the 395+ (4nm) at 50W and found the performance to be somewhat comparable. The differences can be explained away by 3nm vs 4nm.
[1]: https://www.youtube.com/watch?v=v7HUud7IvAo
It isn’t comparable at all. In MT, maybe it is comparable, with the M4 Pro still winning. In ST, the M4 Pro is 3-4x ahead of Strix Halo in efficiency.
They are OK, but yeah, they do not have anything like the memory bandwidth of an M3 Ultra. But they also cost a lot less. I’m primarily looking to replace my older desktop, but I just have to make sure I can run an external GPU like the A6000 that I can borrow from work without having to spend a week fiddling with settings or parameters.
So potentially competitive with a 5070M for graphics? Sounds very nice, as long as price and power draw are reasonable.
Power draw is around 75W. It can be manually boosted, but will stay below 100W under all circumstances (from memory, as I was researching the Z13).
The chip itself should accept higher power draws, and ASUS usually isn't shy on feeding 130+W to a laptop, so the 75W figure was quite a surprise to me.
I love the concept of it and have been thinking about getting one. The only problem I see right now is that there's no way, as far as I can tell, to get an external dock to run an additional external GPU in the future.
I'm not sure what you mean? I'm running an eGPU with my Strix Point laptop via Thunderbolt.
I've also seen quite a few mini PCs with Oculink port and Strix Halo CPUs.
I was reading about people having problems getting external cards working if they had a lot of memory?
> In a Linux context I got some GPUs working and I can add some [external] GPU devices. Minisforum, when I reached out to them, said they don't officially support either, whether via Thunderbolt-compatible USB4, USB4 v2, or even the built-in PCIe slot. Yeah, it's not technically officially supported, and it's because of the resource allocation and the BAR space, and they need somebody on the BIOS team to understand that to fix it.
https://www.youtube.com/watch?v=TvNYpyA1ZGk
There are ways to manage BAR allocation better in Linux, or with UEFI pre-boot environments for Windows, as hobbyists have been doing for ages due to bad BIOS support: https://github.com/xCuri0/ReBarUEFI
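Not an authoritative fix, but as a quick way to see what the firmware actually allocated, here is a rough read-only sketch (assuming the standard Linux sysfs layout) that prints the BAR sizes of each display-class PCI device; a GPU whose large BAR couldn't be placed may show up with that BAR missing:

    # Rough, read-only sketch: print the BAR sizes of every display-class PCI
    # device using Linux sysfs. Assumes the usual /sys/bus/pci layout.
    from pathlib import Path

    def bar_sizes(dev: Path):
        sizes = []
        # Each line of 'resource' is "<start> <end> <flags>" in hex; a BAR the
        # firmware could not place shows up as all zeros.
        for i, line in enumerate((dev / "resource").read_text().splitlines()):
            start, end, _flags = (int(x, 16) for x in line.split())
            if end > start:
                sizes.append((i, end - start + 1))
        return sizes

    for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
        if not (dev / "class").read_text().startswith("0x03"):  # display controllers only
            continue
        pretty = [(i, f"{size >> 20} MiB") for i, size in bar_sizes(dev)]
        print(dev.name, pretty)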
Thanks for the pointer. I have been struggling to get either an OcuLink or USB4 PCIe tunnel to work with the Framework Desktop. Hopefully there are some clues here.
There are plenty of mini-PCs with USB4 and Oculink, and you can get an M.2 adapter (might be tricky to retrofit into a laptop though).
I was just thinking the other day that AMD can match Nvidia pound for pound on the raw hardware specs, and if they don’t just yet, they get pretty close. If AI is a bubble, then AMD should not catch up. If there isn’t a bubble, then there is no choice but to learn to use whatever is out there and AMD is truly set to be another trillion dollar company. The 10% stake OpenAI took is going to look like a Google buying YouTube moment in the long run.
And it’s worth noting, AMD has matched up with Nvidia hardware-wise for decades, plus or minus. They are an interesting company in that they took on both Nvidia and Intel, and are still continuing to do so.