linux-kernel.vger.kernel.org archive mirror
* [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
@ 2025-07-21  9:59 Ban ZuoXiang
  2025-07-22  1:30 ` Baolu Lu
  0 siblings, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-21  9:59 UTC (permalink / raw)
  To: baolu.lu; +Cc: iommu, linux-kernel, stable, bbaa

Hi all,

We've identified a regression affecting PCI passthrough / SR-IOV virtualization starting from Linux v6.12.35.

A user reported [1] that, beginning with this version, SR-IOV virtual functions fail to initialize properly inside the guest. The issue appears to be that some MMIO operations do not complete correctly in the guest.

> [    2.152320] i915 0000:07:00.0: [drm] *ERROR* GT0: GUC: mmio request 0x4509: failure 306/0  
> [    2.152327] i915 0000:07:00.0: [drm] *ERROR* GuC initialization failed (-ENXIO)  
> [    2.152330] i915 0000:07:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!  

Here is the |git bisect| log:

> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
> git bisect start 'HEAD' 'v6.12.33'
> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
> # good: [35f116a4658f787bea7e82fdd23e2e9789254f5e] drm/xe: Make xe_gt_freq part of the Documentation
> git bisect good 35f116a4658f787bea7e82fdd23e2e9789254f5e
> # good: [261f2a655b709e59a8d759ce9fa478778d9e84f4] crypto: qat - add shutdown handler to qat_c3xxx
> git bisect good 261f2a655b709e59a8d759ce9fa478778d9e84f4
> # good: [4d0686b53cc9342be3f8ce06336fd5ab0d206355] ata: ahci: Disallow LPM for Asus B550-F motherboard
> git bisect good 4d0686b53cc9342be3f8ce06336fd5ab0d206355
> # bad: [ce4ef0274cb66a4750000f33f2d316c0dbaf4515] KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
> git bisect bad ce4ef0274cb66a4750000f33f2d316c0dbaf4515
> # bad: [8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9] parisc/unaligned: Fix hex output to show 8 hex chars
> git bisect bad 8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9
> # good: [fed611bd8c7b76b070aa407d0c7558e20d9e1f68] f2fs: fix to do sanity check on ino and xnid
> git bisect good fed611bd8c7b76b070aa407d0c7558e20d9e1f68
> # good: [8a008c89e5e5c5332e4c0a33d707db9ddd529f8a] net/sched: fix use-after-free in taprio_dev_notifier
> git bisect good 8a008c89e5e5c5332e4c0a33d707db9ddd529f8a
> # bad: [3f2098f4fba7718eb2501207ca6e99d22427f25a] fbdev: Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var
> git bisect bad 3f2098f4fba7718eb2501207ca6e99d22427f25a
> # bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
> git bisect bad fb5873b779dd5858123c19bbd6959566771e2e83
> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
> git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>
> commit fb5873b779dd5858123c19bbd6959566771e2e83
> Author: Lu Baolu <baolu.lu@linux.intel.com>
> Date:   Tue May 20 15:58:49 2025 +0800
>
>     iommu/vt-d: Restore context entry setup order for aliased devices
>     
>     commit 320302baed05c6456164652541f23d2a96522c06 upstream.

This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. In addition, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].
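
For readers who want to pull the verdicts out of a log like the one above, here is a minimal Python sketch. It assumes the text is the output of `git bisect log`, possibly still carrying mail quote markers, and the helper name `summarize_bisect_log` is hypothetical. Only a few lines of the log are repeated as sample input.

```python
import re

# Comment lines that `git bisect log` writes for each verdict:
#   "# good: [<sha>] <subject>", "# bad: [<sha>] <subject>",
#   "# first bad commit: [<sha>] <subject>"
MARK_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]+)\] (.*)$")


def summarize_bisect_log(text):
    """Return (first_bad, marks) parsed from a bisect log.

    first_bad is (sha, subject) or None; marks is a list of
    (verdict, sha, subject) tuples in the order they appear.
    """
    first_bad = None
    marks = []
    for raw in text.splitlines():
        # Drop mail quote markers ("> ", ">> ") and trailing whitespace.
        line = raw.lstrip("> ").rstrip()
        match = MARK_RE.match(line)
        if not match:
            continue  # skips the replayable "git bisect ..." commands
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            marks.append((verdict, sha, subject))
    return first_bad, marks


# A few lines copied from the log in this message.
SAMPLE_LOG = """\
> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
> git bisect start 'HEAD' 'v6.12.33'
> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
> git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
"""

if __name__ == "__main__":
    first_bad, marks = summarize_bisect_log(SAMPLE_LOG)
    print("first bad commit:", first_bad)
    print("verdicts recorded:", len(marks))
```

The sketch only reads the comment lines; it does not replay the bisection or check that the commits exist in any tree.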


Best regards,

Ban ZuoXiang

[1]: https://github.com/strongtz/i915-sriov-dkms/issues/320

[2]: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com

[3]: https://github.com/strongtz/i915-sriov-dkms/issues/312 

 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-21  9:59 [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35 Ban ZuoXiang
@ 2025-07-22  1:30 ` Baolu Lu
  2025-07-22  2:40   ` bbaa
  2025-07-22 13:14   ` Ban ZuoXiang
  0 siblings, 2 replies; 7+ messages in thread
From: Baolu Lu @ 2025-07-22  1:30 UTC (permalink / raw)
  To: Ban ZuoXiang; +Cc: iommu, linux-kernel, stable, bbaa

On 7/21/25 17:59, Ban ZuoXiang wrote:
> Hi all,
> 
> We've identified a regression affecting PCI passthrough / SR-IOV virtualization starting from Linux v6.12.35.
> 
> A user reported [1] that, beginning with this version, SR-IOV virtual functions fail to initialize properly inside the guest. The issue appears to be that some MMIO operations do not complete correctly in the guest.
> 
>> [    2.152320] i915 0000:07:00.0: [drm] *ERROR* GT0: GUC: mmio request 0x4509: failure 306/0
>> [    2.152327] i915 0000:07:00.0: [drm] *ERROR* GuC initialization failed (-ENXIO)
>> [    2.152330] i915 0000:07:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
>
> Here is the |git bisect| log:
> 
>> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
>> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
>> git bisect start 'HEAD' 'v6.12.33'
>> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
>> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
>> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
>> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
>> # good: [35f116a4658f787bea7e82fdd23e2e9789254f5e] drm/xe: Make xe_gt_freq part of the Documentation
>> git bisect good 35f116a4658f787bea7e82fdd23e2e9789254f5e
>> # good: [261f2a655b709e59a8d759ce9fa478778d9e84f4] crypto: qat - add shutdown handler to qat_c3xxx
>> git bisect good 261f2a655b709e59a8d759ce9fa478778d9e84f4
>> # good: [4d0686b53cc9342be3f8ce06336fd5ab0d206355] ata: ahci: Disallow LPM for Asus B550-F motherboard
>> git bisect good 4d0686b53cc9342be3f8ce06336fd5ab0d206355
>> # bad: [ce4ef0274cb66a4750000f33f2d316c0dbaf4515] KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
>> git bisect bad ce4ef0274cb66a4750000f33f2d316c0dbaf4515
>> # bad: [8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9] parisc/unaligned: Fix hex output to show 8 hex chars
>> git bisect bad 8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9
>> # good: [fed611bd8c7b76b070aa407d0c7558e20d9e1f68] f2fs: fix to do sanity check on ino and xnid
>> git bisect good fed611bd8c7b76b070aa407d0c7558e20d9e1f68
>> # good: [8a008c89e5e5c5332e4c0a33d707db9ddd529f8a] net/sched: fix use-after-free in taprio_dev_notifier
>> git bisect good 8a008c89e5e5c5332e4c0a33d707db9ddd529f8a
>> # bad: [3f2098f4fba7718eb2501207ca6e99d22427f25a] fbdev: Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var
>> git bisect bad 3f2098f4fba7718eb2501207ca6e99d22427f25a
>> # bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>> git bisect bad fb5873b779dd5858123c19bbd6959566771e2e83
>> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
>> git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
>> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>>
>> commit fb5873b779dd5858123c19bbd6959566771e2e83
>> Author: Lu Baolu <baolu.lu@linux.intel.com>
>> Date:   Tue May 20 15:58:49 2025 +0800
>>
>>      iommu/vt-d: Restore context entry setup order for aliased devices
>>      
>>      commit 320302baed05c6456164652541f23d2a96522c06 upstream.
>
> This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. In addition, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].

Thanks for reporting. Can this issue be reproduced with the latest
mainline linux kernel? Can it work if you simply revert this commit?

Thanks,
baolu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-22  1:30 ` Baolu Lu
@ 2025-07-22  2:40   ` bbaa
  2025-07-22 13:14   ` Ban ZuoXiang
  1 sibling, 0 replies; 7+ messages in thread
From: bbaa @ 2025-07-22  2:40 UTC (permalink / raw)
  To: Baolu Lu; +Cc: iommu, linux-kernel, stable, bbaa

> Thanks for reporting. Can this issue be reproduced with the latest
> mainline linux kernel? Can it work if you simply revert this commit?
>
> Thanks,
> baolu

Simply reverting this commit resolves the issue.
Since Intel GPU SR-IOV currently depends on out-of-tree modules and is not yet compatible with the mainline kernel, I will test mainline later.
I can confirm that the v6.15 stable series, which also includes a backport of this commit, is not affected.

regards,
Ban Zuoxiang





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-22  1:30 ` Baolu Lu
  2025-07-22  2:40   ` bbaa
@ 2025-07-22 13:14   ` Ban ZuoXiang
  2025-07-22 13:19     ` Greg KH
  1 sibling, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-22 13:14 UTC (permalink / raw)
  To: Baolu Lu; +Cc: iommu, linux-kernel, stable, bbaa

> Thanks for reporting. Can this issue be reproduced with the latest
> mainline linux kernel? Can it work if you simply revert this commit?
>
> Thanks,
> baolu 

Hi, baolu

The issue cannot be reproduced on the latest mainline kernel (6.16.0-rc7-1-mainline).
The Ubuntu v6.14-series kernel, which also includes the commit, is not affected either.
I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?

Thanks,
Ban ZuoXiang




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-22 13:14   ` Ban ZuoXiang
@ 2025-07-22 13:19     ` Greg KH
  2025-07-23  9:22       ` Ban ZuoXiang
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2025-07-22 13:19 UTC (permalink / raw)
  To: Ban ZuoXiang; +Cc: Baolu Lu, iommu, linux-kernel, stable, bbaa

On Tue, Jul 22, 2025 at 09:14:08PM +0800, Ban ZuoXiang wrote:
> > Thanks for reporting. Can this issue be reproduced with the latest
> > mainline linux kernel? Can it work if you simply revert this commit?
> >
> > Thanks,
> > baolu 
> 
> Hi, baolu
> 
> The issue cannot be reproduced on the latest mainline kernel (6.16.0-rc7-1-mainline).
> The Ubuntu v6.14-series kernel, which also includes the commit, is not affected either.
> I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?

Nope!  We need your help as you are the one that can reproduce it :)

Are we missing a backport?  Did we get the backport incorrect?  Should
we just revert it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-22 13:19     ` Greg KH
@ 2025-07-23  9:22       ` Ban ZuoXiang
  2025-07-23  9:39         ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-23  9:22 UTC (permalink / raw)
  To: Greg KH; +Cc: Baolu Lu, iommu, linux-kernel, stable

> Nope!  We need your help as you are the one that can reproduce it :)
>
> Are we missing a backport?  Did we get the backport incorrect?  Should
> we just revert it?
>
> thanks,
> greg k-h

Hi, greg k-h

Original patch:

> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index bb871674d8acba..226e174577fff1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
>      else
>          ret = device_setup_pass_through(dev);
>  
> +    if (!ret)
> +        info->domain_attached = true;
> +
>      return ret;
>  }

Backport patch:

> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 157542c07aaafa..56e9f125cda9a0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
>              break;
>      }
>  
> +    if (!ret)
> +        info->domain_attached = true;
> +
>      return ret;
>  }

The last hunk of the original patch [1] was applied to the
|identity_domain_attach_dev| function, but the last hunk of the
backport patch [2] appears to have been mistakenly applied to the
|device_set_dirty_tracking| function instead.
I can confirm that moving that hunk from |device_set_dirty_tracking|
to |identity_domain_attach_dev| resolves the issue.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=320302baed05c6456164652541f23d2a96522c06
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb5873b779dd5858123c19bbd6959566771e2e83
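
A mismatch like this can be spotted mechanically by comparing the function context that git prints after the second `@@` of each hunk header. The following is a minimal Python sketch of that check, not a tool used in this thread. The helper names `hunk_functions` and `misplaced_hunks` are hypothetical, hunks are paired by position (which assumes both patches have the same hunks in the same order), and the context text is only git's heuristic guess at the enclosing function.

```python
import re

# Unified-diff hunk header; the capture group is the trailing function
# context that git appends after the second "@@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@ ?(.*)$")
# Identifier directly before the first opening parenthesis.
FUNC_RE = re.compile(r"([A-Za-z_]\w*)\s*\(")


def hunk_functions(patch_text):
    """Return the function name from each hunk header (None if absent)."""
    names = []
    for line in patch_text.splitlines():
        header = HUNK_RE.match(line)
        if header:
            func = FUNC_RE.search(header.group(1))
            names.append(func.group(1) if func else None)
    return names


def misplaced_hunks(upstream, backport):
    """List (index, upstream_func, backport_func) where contexts differ."""
    pairs = zip(hunk_functions(upstream), hunk_functions(backport))
    return [(i, u, b) for i, (u, b) in enumerate(pairs) if u != b]


# Last hunk of each patch as quoted in this message, quote markers
# removed and the wrapped header lines rejoined.
UPSTREAM = """\
@@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
+    if (!ret)
+        info->domain_attached = true;
"""

BACKPORT = """\
@@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
+    if (!ret)
+        info->domain_attached = true;
"""

if __name__ == "__main__":
    for index, up, back in misplaced_hunks(UPSTREAM, BACKPORT):
        print(f"hunk {index}: upstream={up} backport={back}")
```

A differing context is only a hint that a hunk deserves a manual look, since a backport can legitimately touch a renamed or restructured function.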

thanks,
Ban ZuoXiang






^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
  2025-07-23  9:22       ` Ban ZuoXiang
@ 2025-07-23  9:39         ` Greg KH
  0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2025-07-23  9:39 UTC (permalink / raw)
  To: Ban ZuoXiang; +Cc: Baolu Lu, iommu, linux-kernel, stable

On Wed, Jul 23, 2025 at 05:22:44PM +0800, Ban ZuoXiang wrote:
> > Nope!  We need your help as you are the one that can reproduce it :)
> >
> > Are we missing a backport?  Did we get the backport incorrect?  Should
> > we just revert it?
> >
> > thanks,
> > greg k-h
> 
> Hi, greg k-h
> 
> Original patch:
> 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index bb871674d8acba..226e174577fff1 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
> >      else
> >          ret = device_setup_pass_through(dev);
> >  
> > +    if (!ret)
> > +        info->domain_attached = true;
> > +
> >      return ret;
> >  }
>
> Backport patch:
>
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 157542c07aaafa..56e9f125cda9a0 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
> >              break;
> >      }
> >  
> > +    if (!ret)
> > +        info->domain_attached = true;
> > +
> >      return ret;
> >  }
> 
> The last hunk of the original patch [1] was applied to the
> |identity_domain_attach_dev| function, but the last hunk of the
> backport patch [2] appears to have been mistakenly applied to the
> |device_set_dirty_tracking| function instead.
> I can confirm that moving that hunk from |device_set_dirty_tracking|
> to |identity_domain_attach_dev| resolves the issue.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=320302baed05c6456164652541f23d2a96522c06
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb5873b779dd5858123c19bbd6959566771e2e83

Ah, nice work!

Can you send a patch that fixes this up properly please?  I'll be glad
to queue that up.

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread
