[REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO

public inbox for stable@vger.kernel.org

* [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Mark Somerville @ 2026-03-20  9:28 UTC
  To: stable
  Cc: Mario Limonciello, regressions, Alex Deucher, Greg Kroah-Hartman,
	Christian König, Xinhui Pan

Hello maintainers!

I run Debian 13 stable (6.12 kernel) and have encountered a regression.

My machine has three GPUs, the iGPU that is part of my 7950X and two dGPUs - one NVIDIA 3090 and one AMD RX 6400. I use the iGPU for the host and only use the two dGPUs with virtual machines via VFIO with libvirt.

Although I have specified kernel parameters vfio_pci.ids for the GPUs, I have not blacklisted the amdgpu driver so that the host iGPU can operate.  Previously, starting a VM with the RX 6400 dGPU assigned to it (via VFIO) would work fine. However, doing this with more recent stable kernels causes the machine to hang immediately (and then, ultimately, reset after a while - ~30s). No errors are logged, at least as things are configured just now.
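You can confirm the binding described above (vfio-pci on both dGPUs, amdgpu only on the iGPU) by reading each device's `driver` symlink in sysfs. The sketch below assumes the standard `/sys/bus/pci/devices` layout. The helper names are hypothetical, and the sketch compares only the vendor:device part of each `vfio_pci.ids` entry.

```python
import os
from pathlib import Path

SYSFS_PCI = "/sys/bus/pci/devices"


def parse_vfio_ids(param):
    """Return the vendor:device pairs from a vfio_pci.ids value.

    Entries may carry extra :subvendor:subdevice:... fields; this sketch
    keeps only the first two (vendor:device), lowercased.
    """
    ids = set()
    for entry in param.split(","):
        entry = entry.strip().lower()
        if entry:
            ids.add(":".join(entry.split(":")[:2]))
    return ids


def pci_bindings(root=SYSFS_PCI):
    """Map each PCI address to (vendor:device, bound driver name or None)."""
    bindings = {}
    for dev in sorted(Path(root).iterdir()):
        vendor = (dev / "vendor").read_text().strip().lower().removeprefix("0x")
        device = (dev / "device").read_text().strip().lower().removeprefix("0x")
        link = dev / "driver"
        # "driver" is a symlink to .../drivers/<name>, present only when bound.
        driver = os.path.basename(os.readlink(link)) if link.is_symlink() else None
        bindings[dev.name] = (f"{vendor}:{device}", driver)
    return bindings


def misbound(vfio_param, root=SYSFS_PCI):
    """List (address, id, driver) for devices named in vfio_pci.ids that
    are not currently bound to vfio-pci."""
    wanted = parse_vfio_ids(vfio_param)
    return [
        (addr, pci_id, driver)
        for addr, (pci_id, driver) in pci_bindings(root).items()
        if pci_id in wanted and driver != "vfio-pci"
    ]
```

On a real host, call `misbound()` with the same string you pass as `vfio_pci.ids`. An empty list means every listed device is held by vfio-pci.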

I can reliably reproduce this crash, and a bisection revealed the commit that introduced the problem: 8140ac7c55e75093a01c6110a2c4025fe7177c57.

This is fixed in the mainline kernel: I have tested and verified that my RX 6400 works with VFIO under 7.0-rc4.

I *think* this is still present in the 6.12.y branch, but a second (currently ongoing) regression is preventing me from checking this on the latest and greatest 6.12 release right now.

Working:   6.12.63
Regressed: 6.12.69
Working:   7.0-rc4

#regzbot introduced: 8140ac7c55e75093a01c6110a2c4025fe7177c57
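A bisection like this one is normally run over commits with `git bisect`. To illustrate the halving idea, the sketch below runs the same search over the 6.12.y point releases between the known-good 6.12.63 and the known-bad 6.12.69. The `is_bad` predicate is a hypothetical stand-in for "boot this kernel and start the VM". Here it simply encodes the fact, noted later in this thread, that the offending backport first shipped in 6.12.67.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad() is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad (a single regression).
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


releases = [f"6.12.{n}" for n in range(63, 70)]  # 6.12.63 .. 6.12.69


def is_bad(version):
    # Hypothetical stand-in for a real boot-and-start-the-VM test.
    return int(version.split(".")[2]) >= 67


print(first_bad(releases, is_bad))  # prints "6.12.67"
```

Each step halves the remaining range, so seven releases need only two test boots here. `git bisect` applies the same idea to the thousands of commits between two tags.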

* Re: [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Thorsten Leemhuis @ 2026-03-20 11:42 UTC
  To: Mark Somerville, stable
  Cc: Mario Limonciello, regressions, Alex Deucher, Greg Kroah-Hartman,
	Christian König, Xinhui Pan, stable@vger.kernel.org,
	Sasha Levin

@greg/@sasha: I might be missing something, but it looks like a fix
that was backported missed two series where it's needed (see below for
details):

On 3/20/26 10:28, Mark Somerville wrote:
> 
> I run Debian 13 stable (6.12 kernel) and have encountered a regression.
> 
> My machine has three GPUs, the iGPU that is part of my 7950X and two dGPUs - one NVIDIA 3090 and one AMD RX 6400. I use the iGPU for the host and only use the two dGPUs with virtual machines via VFIO with libvirt.
> 
> Although I have specified kernel parameters vfio_pci.ids for the GPUs, I have not blacklisted the amdgpu driver so that the host iGPU can operate.  Previously, starting a VM with the RX 6400 dGPU assigned to it (via VFIO) would work fine. However, doing this with more recent stable kernels causes the machine to hang immediately (and then, ultimately, reset after a while - ~30s). No errors are logged, at least as things are configured just now.
> 
> I can reliably reproduce this crash and a bisection revealed the commit that introduced the problem: 8140ac7c55e75093a01c6110a2c4025fe7177c57.

That is 28695ca09d3264 ("drm/amd: Clean up kfd node on surprise
disconnect") [v6.19-rc6, v6.18.7, v6.12.67 (as 8140ac7c55e750), v6.6.122].

A fix for that is f7afda7fcd169a ("drm/amd: Fix hang on amdgpu unload by
using pci_dev_is_disconnected()") [v7.0-rc1, v6.18.17, v6.12.77].
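The version data above can be turned into a small lookup of which stable releases carry the regression. The sketch below assumes plain x.y.z stable version strings (no -rc handling) and uses only the releases listed in this message. The table and helper names are hypothetical. Series with no released fix are marked None, series not listed are treated as unaffected, and 6.19 counts as affected from its first release because the commit landed in v6.19-rc6.

```python
# From the version list above: first stable release containing
# 28695ca09d3264, and first containing the fix f7afda7fcd169a
# (None = no stable release with the fix listed yet).
BAD_IN = {(6, 6): 122, (6, 12): 67, (6, 18): 7, (6, 19): 0}
FIXED_IN = {(6, 6): None, (6, 12): 77, (6, 18): 17, (6, 19): None}


def parse(version):
    """Split "6.12.69" into ((6, 12), 69); "6.19" becomes ((6, 19), 0)."""
    major, minor, *rest = (int(p) for p in version.split("."))
    return (major, minor), (rest[0] if rest else 0)


def affected(version):
    """True if the release has the regressing commit but not its fix."""
    series, patch = parse(version)
    if series not in BAD_IN or patch < BAD_IN[series]:
        return False
    fixed = FIXED_IN[series]
    return fixed is None or patch < fixed
```

For example, `affected("6.12.69")` is true, matching the report, while `affected("6.12.63")` and `affected("6.12.77")` are false.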

@greg/@sasha: I'm wondering why it's not in 6.19.y and 6.6.y. The backport
initially failed to apply, but was later applied to 6.18.y and 6.12.y:

https://lore.kernel.org/all/?q=%22Fix+hang+on+amdgpu+unload+by+using+pci_dev_is_disconnected%22+%28f%3Agreg+OR+f%3Asasha%29

> This is fixed in the mainline kernel, I have tested and verified my RX 6400 is working with VFIO under 7.0-rc4.
>
> I *think* this is still present in the 6.12.y branch but a second (currently ongoing) regression is preventing me checking this on the latest and greatest 6.12 release right now.

v6.12.77 contains a fix for that commit, so there is a decent chance
that it will fix your problem, unless it's a different kind of problem.

Ciao, Thorsten

* Re: [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Mario Limonciello @ 2026-03-20 11:42 UTC
  To: Mark Somerville, stable
  Cc: regressions, Alex Deucher, Greg Kroah-Hartman,
	Christian König, Xinhui Pan



On 3/20/26 4:28 AM, Mark Somerville wrote:
> Hello maintainers!
> 
> I run Debian 13 stable (6.12 kernel) and have encountered a regression.
> 
> My machine has three GPUs, the iGPU that is part of my 7950X and two dGPUs - one NVIDIA 3090 and one AMD RX 6400. I use the iGPU for the host and only use the two dGPUs with virtual machines via VFIO with libvirt.
> 
> Although I have specified kernel parameters vfio_pci.ids for the GPUs, I have not blacklisted the amdgpu driver so that the host iGPU can operate.  Previously, starting a VM with the RX 6400 dGPU assigned to it (via VFIO) would work fine. However, doing this with more recent stable kernels causes the machine to hang immediately (and then, ultimately, reset after a while - ~30s). No errors are logged, at least as things are configured just now.
> 
> I can reliably reproduce this crash and a bisection revealed the commit that introduced the problem: 8140ac7c55e75093a01c6110a2c4025fe7177c57.
> 
> This is fixed in the mainline kernel, I have tested and verified my RX 6400 is working with VFIO under 7.0-rc4.
> 
> I *think* this is still present in the 6.12.y branch but a second (currently ongoing) regression is preventing me checking this on the latest and greatest 6.12 release right now.
> 
> Working:   6.12.63
> Regressed: 6.12.69
> Working:   7.0-rc4
> 
> #regzbot introduced: 8140ac7c55e75093a01c6110a2c4025fe7177c57

If you bisected to 8140ac7c55e75093a01c6110a2c4025fe7177c57, try adding 
f7afda7fcd169a9168695247d07ad94cf7b9798f.

* Re: [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Greg Kroah-Hartman @ 2026-03-20 12:34 UTC
  To: Thorsten Leemhuis
  Cc: Mark Somerville, stable, Mario Limonciello, regressions,
	Alex Deucher, Christian König, Xinhui Pan, Sasha Levin

On Fri, Mar 20, 2026 at 12:42:04PM +0100, Thorsten Leemhuis wrote:
> @greg/@sasha: I might be missing something, but looks like one patch
> that was backported missed two series where it's needed (see below for
> details):
> 
> On 3/20/26 10:28, Mark Somerville wrote:
> > 
> > I run Debian 13 stable (6.12 kernel) and have encountered a regression.
> > 
> > My machine has three GPUs, the iGPU that is part of my 7950X and two dGPUs - one NVIDIA 3090 and one AMD RX 6400. I use the iGPU for the host and only use the two dGPUs with virtual machines via VFIO with libvirt.
> > 
> > Although I have specified kernel parameters vfio_pci.ids for the GPUs, I have not blacklisted the amdgpu driver so that the host iGPU can operate.  Previously, starting a VM with the RX 6400 dGPU assigned to it (via VFIO) would work fine. However, doing this with more recent stable kernels causes the machine to hang immediately (and then, ultimately, reset after a while - ~30s). No errors are logged, at least as things are configured just now.
> > 
> > I can reliably reproduce this crash and a bisection revealed the commit that introduced the problem: 8140ac7c55e75093a01c6110a2c4025fe7177c57.
> 
> That is 28695ca09d3264 ("drm/amd: Clean up kfd node on surprise
> disconnect") [v6.19-rc6, v6.18.7, v6.12.67 (as 8140ac7c55e750), v6.6.122].
> 
> A fix for that f7afda7fcd169a ("drm/amd: Fix hang on amdgpu unload by
> using pci_dev_is_disconnected()") [v7.0-rc1, v6.18.17, v6.12.77].
> 
> @greg/@sasha: Wondering why it's not in 6.19.y and 6.6.y. It failed
> there first, but later was applied to 6.18.y and 6.12.y:
> 
> https://lore.kernel.org/all/?q=%22Fix+hang+on+amdgpu+unload+by+using+pci_dev_is_disconnected%22+%28f%3Agreg+OR+f%3Asasha%29

It's in the queue for 6.6.y, I've queued it up for 6.19.y now too.

thanks,

greg k-h

* Re: [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Mark Somerville @ 2026-03-20 13:33 UTC
  To: Mario Limonciello, stable
  Cc: regressions, Alex Deucher, Greg Kroah-Hartman,
	Christian König, Xinhui Pan

March 20, 2026 at 11:42 AM, "Mario Limonciello" <superm1@kernel.org> wrote:
> 
> If you bisected to 8140ac7c55e75093a01c6110a2c4025fe7177c57, try adding f7afda7fcd169a9168695247d07ad94cf7b9798f.

Ah, nice one! Just tried that and can confirm that 8140ac7c + f7afda7f resolves this problem for me.

Also great that it's already in a later release than I have been able to test!

Thanks a lot for the fast resolution and apologies for any noise.

* Re: [REGRESSION] Unable to pass AMD RX 6400 GPU via VFIO
From: Mario Limonciello (AMD) (kernel.org) @ 2026-03-20 13:51 UTC
  To: Mark Somerville, stable
  Cc: regressions, Alex Deucher, Greg Kroah-Hartman,
	Christian König, Xinhui Pan



On 3/20/2026 8:33 AM, Mark Somerville wrote:
> March 20, 2026 at 11:42 AM, "Mario Limonciello" <superm1@kernel.org> wrote:
>>
>> If you bisected to 8140ac7c55e75093a01c6110a2c4025fe7177c57, try adding f7afda7fcd169a9168695247d07ad94cf7b9798f.
> 
> Ah, nice one! Just tried that and can confirm that 8140ac7c + f7afda7f resolves this problem for me.
> 
> Also great that it's already in a later release than I have been able to test!
> 
> Thanks a lot for the fast resolution and apologies for any noise.

That's great!

Not noise at all, you found a problem and we have a solution that should 
be added so others don't hit it.

I think Sasha and Greg just need to pick it up then.
