* [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
@ 2025-07-21 9:59 Ban ZuoXiang
2025-07-22 1:30 ` Baolu Lu
0 siblings, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-21 9:59 UTC
To: baolu.lu; +Cc: iommu, linux-kernel, stable, bbaa
Hi all,
We've identified a regression affecting PCI passthrough / SR-IOV virtualization starting from Linux v6.12.35.
A user reported [1] that, beginning with this version, SR-IOV virtual functions fail to initialize properly inside the guest. The issue appears to be that some MMIO operations do not complete correctly in the guest.
> [ 2.152320] i915 0000:07:00.0: [drm] *ERROR* GT0: GUC: mmio request 0x4509: failure 306/0
> [ 2.152327] i915 0000:07:00.0: [drm] *ERROR* GuC initialization failed (-ENXIO)
> [ 2.152330] i915 0000:07:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
Here is the |git bisect| log:
> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
> git bisect start 'HEAD' 'v6.12.33'
> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
> # good: [35f116a4658f787bea7e82fdd23e2e9789254f5e] drm/xe: Make xe_gt_freq part of the Documentation
> git bisect good 35f116a4658f787bea7e82fdd23e2e9789254f5e
> # good: [261f2a655b709e59a8d759ce9fa478778d9e84f4] crypto: qat - add shutdown handler to qat_c3xxx
> git bisect good 261f2a655b709e59a8d759ce9fa478778d9e84f4
> # good: [4d0686b53cc9342be3f8ce06336fd5ab0d206355] ata: ahci: Disallow LPM for Asus B550-F motherboard
> git bisect good 4d0686b53cc9342be3f8ce06336fd5ab0d206355
> # bad: [ce4ef0274cb66a4750000f33f2d316c0dbaf4515] KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
> git bisect bad ce4ef0274cb66a4750000f33f2d316c0dbaf4515
> # bad: [8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9] parisc/unaligned: Fix hex output to show 8 hex chars
> git bisect bad 8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9
> # good: [fed611bd8c7b76b070aa407d0c7558e20d9e1f68] f2fs: fix to do sanity check on ino and xnid
> git bisect good fed611bd8c7b76b070aa407d0c7558e20d9e1f68
> # good: [8a008c89e5e5c5332e4c0a33d707db9ddd529f8a] net/sched: fix use-after-free in taprio_dev_notifier
> git bisect good 8a008c89e5e5c5332e4c0a33d707db9ddd529f8a
> # bad: [3f2098f4fba7718eb2501207ca6e99d22427f25a] fbdev: Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var
> git bisect bad 3f2098f4fba7718eb2501207ca6e99d22427f25a
> # bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
> git bisect bad fb5873b779dd5858123c19bbd6959566771e2e83
> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
> git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>
> commit fb5873b779dd5858123c19bbd6959566771e2e83
> Author: Lu Baolu <baolu.lu@linux.intel.com>
> Date: Tue May 20 15:58:49 2025 +0800
>
> iommu/vt-d: Restore context entry setup order for aliased devices
>
> commit 320302baed05c6456164652541f23d2a96522c06 upstream.
This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. In addition, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].
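
A bisect log like the one above can be summarized mechanically. The following is a minimal sketch, assuming the log uses the standard `# good:` / `# bad:` / `# first bad commit:` comment lines that `git bisect log` emits; the helper name `parse_bisect_log` and the shortened `SAMPLE` text are illustrative only and are not part of the original report.

```python
import re

# Matches comment lines such as "# bad: [<sha>] <subject>" in a bisect log.
ENTRY = re.compile(
    r"^#\s*(good|bad|first bad commit):\s*\[([0-9a-f]{7,40})\]\s*(.*)$"
)


def parse_bisect_log(text):
    """Return (entries, first_bad) parsed from the text of a bisect log.

    entries is a list of (verdict, sha, subject) tuples in log order;
    first_bad is the (sha, subject) of the culprit, or None if absent.
    """
    entries, first_bad = [], None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quoting prefixes
        match = ENTRY.match(line)
        if not match:
            continue  # e.g. the "git bisect good <sha>" command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            first_bad = (sha, subject)
        else:
            entries.append((verdict, sha, subject))
    return entries, first_bad


# A shortened excerpt of the log quoted in this message (hypothetical subset).
SAMPLE = """\
> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
> git bisect start 'HEAD' 'v6.12.33'
> # bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
"""

entries, first_bad = parse_bisect_log(SAMPLE)
print(first_bad[0][:12], first_bad[1])
```

The sketch only reads the comment lines, so it works on a log pasted from a quoted mail as well as on raw `git bisect log` output.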
Best regards,
Ban ZuoXiang
[1]: https://github.com/strongtz/i915-sriov-dkms/issues/320
[2]: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
[3]: https://github.com/strongtz/i915-sriov-dkms/issues/312
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-21 9:59 [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35 Ban ZuoXiang
@ 2025-07-22 1:30 ` Baolu Lu
2025-07-22 2:40 ` bbaa
2025-07-22 13:14 ` Ban ZuoXiang
0 siblings, 2 replies; 7+ messages in thread
From: Baolu Lu @ 2025-07-22 1:30 UTC
To: Ban ZuoXiang; +Cc: iommu, linux-kernel, stable, bbaa
On 7/21/25 17:59, Ban ZuoXiang wrote:
> Hi all,
>
> We've identified a regression affecting PCI passthrough / SR-IOV virtualization starting from Linux v6.12.35.
>
> A user reported [1] that, beginning with this version, SR-IOV virtual functions fail to initialize properly inside the guest. The issue appears to be that some MMIO operations do not complete correctly in the guest.
>
>> [ 2.152320] i915 0000:07:00.0: [drm] *ERROR* GT0: GUC: mmio request 0x4509: failure 306/0
>> [ 2.152327] i915 0000:07:00.0: [drm] *ERROR* GuC initialization failed (-ENXIO)
>> [ 2.152330] i915 0000:07:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
> Here is the |git bisect| log:
>
>> # bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
>> # good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
>> git bisect start 'HEAD' 'v6.12.33'
>> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
>> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
>> # bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
>> git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
>> # good: [35f116a4658f787bea7e82fdd23e2e9789254f5e] drm/xe: Make xe_gt_freq part of the Documentation
>> git bisect good 35f116a4658f787bea7e82fdd23e2e9789254f5e
>> # good: [261f2a655b709e59a8d759ce9fa478778d9e84f4] crypto: qat - add shutdown handler to qat_c3xxx
>> git bisect good 261f2a655b709e59a8d759ce9fa478778d9e84f4
>> # good: [4d0686b53cc9342be3f8ce06336fd5ab0d206355] ata: ahci: Disallow LPM for Asus B550-F motherboard
>> git bisect good 4d0686b53cc9342be3f8ce06336fd5ab0d206355
>> # bad: [ce4ef0274cb66a4750000f33f2d316c0dbaf4515] KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
>> git bisect bad ce4ef0274cb66a4750000f33f2d316c0dbaf4515
>> # bad: [8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9] parisc/unaligned: Fix hex output to show 8 hex chars
>> git bisect bad 8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9
>> # good: [fed611bd8c7b76b070aa407d0c7558e20d9e1f68] f2fs: fix to do sanity check on ino and xnid
>> git bisect good fed611bd8c7b76b070aa407d0c7558e20d9e1f68
>> # good: [8a008c89e5e5c5332e4c0a33d707db9ddd529f8a] net/sched: fix use-after-free in taprio_dev_notifier
>> git bisect good 8a008c89e5e5c5332e4c0a33d707db9ddd529f8a
>> # bad: [3f2098f4fba7718eb2501207ca6e99d22427f25a] fbdev: Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var
>> git bisect bad 3f2098f4fba7718eb2501207ca6e99d22427f25a
>> # bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>> git bisect bad fb5873b779dd5858123c19bbd6959566771e2e83
>> # good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
>> git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
>> # first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
>>
>> commit fb5873b779dd5858123c19bbd6959566771e2e83
>> Author: Lu Baolu <baolu.lu@linux.intel.com>
>> Date: Tue May 20 15:58:49 2025 +0800
>>
>> iommu/vt-d: Restore context entry setup order for aliased devices
>>
>> commit 320302baed05c6456164652541f23d2a96522c06 upstream.
> This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. In addition, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].
Thanks for reporting. Can this issue be reproduced with the latest
mainline linux kernel? Can it work if you simply revert this commit?
Thanks,
baolu
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-22 1:30 ` Baolu Lu
@ 2025-07-22 2:40 ` bbaa
2025-07-22 13:14 ` Ban ZuoXiang
1 sibling, 0 replies; 7+ messages in thread
From: bbaa @ 2025-07-22 2:40 UTC
To: Baolu Lu; +Cc: iommu, linux-kernel, stable, bbaa
> Thanks for reporting. Can this issue be reproduced with the latest
> mainline linux kernel? Can it work if you simply revert this commit?
>
> Thanks,
> baolu
Simply reverting this commit can resolve the issue.
Since Intel GPU SR-IOV currently depends on out-of-tree modules and is not yet compatible with the mainline kernel, I will test it later.
I can confirm that the v6.15 stable series, which also includes a backport of this commit, is not affected.
regards,
Ban Zuoxiang
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-22 1:30 ` Baolu Lu
2025-07-22 2:40 ` bbaa
@ 2025-07-22 13:14 ` Ban ZuoXiang
2025-07-22 13:19 ` Greg KH
1 sibling, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-22 13:14 UTC
To: Baolu Lu; +Cc: iommu, linux-kernel, stable, bbaa
> Thanks for reporting. Can this issue be reproduced with the latest
> mainline linux kernel? Can it work if you simply revert this commit?
>
> Thanks,
> baolu
Hi, baolu
The issue cannot be reproduced on the latest mainline kernel (6.16.0-rc7-1-mainline).
The Ubuntu v6.14-series kernel, which also includes the commit, is not affected either.
I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?
Thanks,
Ban ZuoXiang
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-22 13:14 ` Ban ZuoXiang
@ 2025-07-22 13:19 ` Greg KH
2025-07-23 9:22 ` Ban ZuoXiang
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2025-07-22 13:19 UTC
To: Ban ZuoXiang; +Cc: Baolu Lu, iommu, linux-kernel, stable, bbaa
On Tue, Jul 22, 2025 at 09:14:08PM +0800, Ban ZuoXiang wrote:
> > Thanks for reporting. Can this issue be reproduced with the latest
> > mainline linux kernel? Can it work if you simply revert this commit?
> >
> > Thanks,
> > baolu
>
> Hi, baolu
>
> The issue cannot be reproduced on the latest mainline kernel (6.16.0-rc7-1-mainline).
> The Ubuntu v6.14-series kernel, which also includes the commit, is not affected either.
> I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?
Nope! We need your help as you are the one that can reproduce it :)
Are we missing a backport? Did we get the backport incorrect? Should
we just revert it?
thanks,
greg k-h
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-22 13:19 ` Greg KH
@ 2025-07-23 9:22 ` Ban ZuoXiang
2025-07-23 9:39 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Ban ZuoXiang @ 2025-07-23 9:22 UTC
To: Greg KH; +Cc: Baolu Lu, iommu, linux-kernel, stable
> Nope! We need your help as you are the one that can reproduce it :)
>
> Are we missing a backport? Did we get the backport incorrect? Should
> we just revert it?
>
> thanks,
> greg k-h
Hi, greg k-h
Original patch:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index bb871674d8acba..226e174577fff1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
>  	else
>  		ret = device_setup_pass_through(dev);
>
> +	if (!ret)
> +		info->domain_attached = true;
> +
>  	return ret;
>  }
Backport patch:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 157542c07aaafa..56e9f125cda9a0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
>  			break;
>  	}
>
> +	if (!ret)
> +		info->domain_attached = true;
> +
>  	return ret;
>  }
The last hunk of the original patch [1] was applied to the
|identity_domain_attach_dev| function,
but the last hunk of the backport patch [2] appears to have been
mistakenly applied to the |device_set_dirty_tracking| function.
I can confirm that moving that hunk from
|device_set_dirty_tracking| into |identity_domain_attach_dev| resolves the
issue.
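
This kind of misplaced hunk can be spotted by comparing the function context that git prints after the second `@@` of each hunk header. The following is a minimal sketch, assuming both patches list their hunks in the same order and that each hunk header sits on a single line (not wrapped by a mail client); the helper names `hunk_functions` and `misplaced_hunks` are hypothetical. The function context is only a heuristic, since git reports the nearest preceding function-like line, so a mismatch is a hint to inspect the hunk rather than proof of an error.

```python
import re

# A unified-diff hunk header: "@@ -a,b +c,d @@ <function context>".
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@\s*(.*)$")
# The first identifier directly followed by "(" is taken as the function name.
FUNC = re.compile(r"([A-Za-z_]\w*)\s*\(")


def hunk_functions(patch_text):
    """Return the function name from each hunk header, or None if absent."""
    names = []
    for line in patch_text.splitlines():
        header = HUNK.match(line)
        if header:
            func = FUNC.search(header.group(1))
            names.append(func.group(1) if func else None)
    return names


def misplaced_hunks(upstream, backport):
    """Pair hunks by position and report those whose function context differs."""
    pairs = zip(hunk_functions(upstream), hunk_functions(backport))
    return [(i, u, b) for i, (u, b) in enumerate(pairs) if u != b]


# Hunk headers taken from the two patches quoted in this message.
UPSTREAM = """\
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
"""
BACKPORT = """\
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
"""

for index, up, back in misplaced_hunks(UPSTREAM, BACKPORT):
    print(f"hunk {index}: upstream={up} backport={back}")
```

Pairing hunks by position is the simplest possible choice; a backport that drops or splits hunks would need a smarter alignment, for example matching on the added lines.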
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=320302baed05c6456164652541f23d2a96522c06
[2]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb5873b779dd5858123c19bbd6959566771e2e83
thanks,
Ban ZuoXiang
* Re: [REGRESSION][BISECTED] PCI Passthrough / SR-IOV Failure on Stable Kernel ≥ v6.12.35
2025-07-23 9:22 ` Ban ZuoXiang
@ 2025-07-23 9:39 ` Greg KH
0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2025-07-23 9:39 UTC
To: Ban ZuoXiang; +Cc: Baolu Lu, iommu, linux-kernel, stable
On Wed, Jul 23, 2025 at 05:22:44PM +0800, Ban ZuoXiang wrote:
> > Nope! We need your help as you are the one that can reproduce it :)
> >
> > Are we missing a backport? Did we get the backport incorrect? Should
> > we just revert it?
> >
> > thanks,
> > greg k-h
>
> Hi, greg k-h
>
> Original patch:
>
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index bb871674d8acba..226e174577fff1 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4298,6 +4306,9 @@ static int identity_domain_attach_dev(struct iommu_domain *domain, struct device
> >  	else
> >  		ret = device_setup_pass_through(dev);
> >
> > +	if (!ret)
> > +		info->domain_attached = true;
> > +
> >  	return ret;
> >  }
> Backport patch:
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 157542c07aaafa..56e9f125cda9a0 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4406,6 +4414,9 @@ static int device_set_dirty_tracking(struct list_head *devices, bool enable)
> >  			break;
> >  	}
> >
> > +	if (!ret)
> > +		info->domain_attached = true;
> > +
> >  	return ret;
> >  }
>
> The last hunk of the original patch [1] was applied to the
> |identity_domain_attach_dev| function,
> but the last hunk of the backport patch [2] appears to have been
> mistakenly applied to the |device_set_dirty_tracking| function.
> I can confirm that moving that hunk from
> |device_set_dirty_tracking| into |identity_domain_attach_dev| resolves the
> issue.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=320302baed05c6456164652541f23d2a96522c06
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb5873b779dd5858123c19bbd6959566771e2e83
Ah, nice work!
Can you send a patch that fixes this up properly please? I'll be glad
to queue that up.
greg k-h