* [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-07 15:31 UTC
Cc: regressions, stable

#regzbot introduced: v6.12.34..v6.12.35

After upgrading to kernel 6.12.35, vfio passthrough for my GPU stopped working in a Windows VM: the device shows up in Device Manager, but Windows reports that it did not start correctly. I compared lspci logs in the VM before and after the upgrade to 6.12.35, and here are the changes I noticed:

- the reported link speed for the passthrough GPU has changed from 2.5 to 16GT/s
- the passthrough GPU has lost its 'BusMaster' and MSI enable flags
- a latency measurement feature appeared

These entries also began appearing in dmesg within the VM when the host kernel is 6.12.35 or above:

[ 1.963177] nouveau 0000:01:00.0: sec2(gsp): mbox 1c503000 00000001
[ 1.963296] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
...
[ 1.964580] nouveau 0000:01:00.0: gsp: init failed, -5
[ 1.964641] nouveau 0000:01:00.0: init failed with -5
[ 1.964681] nouveau: drm:00000000:00000080: init failed with -5
[ 1.964721] nouveau 0000:01:00.0: drm: Device allocation failed: -5
[ 1.966318] nouveau 0000:01:00.0: probe with driver nouveau failed with error -5

6.12.34 worked fine, and the latest 6.12 LTS does not work either. I am using an Intel CPU and an NVIDIA GPU (for passthrough, and as my GPU on the Linux system).
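The before/after lspci comparison described in the report can be scripted. What follows is only a minimal sketch, assuming the output of `lspci -vv` for the single passthrough device was saved as text under each kernel; the function names (`extract_fields`, `diff_fields`) are hypothetical, and the patterns cover just the three fields the report mentions (link speed, BusMaster, MSI enable).

```python
import re

# Patterns assume `lspci -vv`-style text for ONE device (an assumption of
# this sketch); only the fields mentioned in the report are extracted.
PATTERNS = {
    "link_speed": re.compile(r"LnkSta:\s+Speed\s+([\d.]+GT/s)"),
    "bus_master": re.compile(r"BusMaster([+-])"),
    "msi_enable": re.compile(r"MSI: Enable([+-])"),
}


def extract_fields(lspci_text):
    """Return the first match for each field, or None when it is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(lspci_text)
        fields[name] = match.group(1) if match else None
    return fields


def diff_fields(before_text, after_text):
    """Map each field whose value changed to its (before, after) pair."""
    before = extract_fields(before_text)
    after = extract_fields(after_text)
    return {
        name: (before[name], after[name])
        for name in before
        if before[name] != after[name]
    }
```

Feeding it the two saved dumps, for example `diff_fields(open("good.txt").read(), open("bad.txt").read())` with hypothetical file names, returns only the fields that differ, which keeps a regression report short.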
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Greg KH @ 2025-08-07 15:52 UTC
To: cat; +Cc: regressions, stable

On Thu, Aug 07, 2025 at 03:31:17PM +0000, cat wrote:
> #regzbot introduced: v6.12.34..v6.12.35
>
> [...]
>
> 6.12.34 worked fine, and latest 6.12 LTS does not work either. I am using intel CPU and nvidia GPU (for passthrough, and as my GPU on linux system).

Can you use git bisect to find the offending commit?
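For readers unfamiliar with the request above: git bisect is essentially a binary search for the first bad commit between a known-good and a known-bad point (here v6.12.34 and v6.12.35). The sketch below illustrates only that search idea on a simplified linear history; it is not git's implementation, real histories are graphs rather than lists, and the name `first_bad` and the `is_bad` callback are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumptions of this sketch: `commits` is ordered oldest to newest,
    the last commit is bad, and once a commit is bad every later one is
    bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]
```

Each call to `is_bad` stands for one build, boot, and passthrough test, so the number of kernels to test grows only logarithmically with the number of commits in the range.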
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Harshit Mogalapalli @ 2025-08-07 18:31 UTC
To: Greg KH, cat; +Cc: regressions, stable

Hi,

On 07/08/25 21:22, Greg KH wrote:
> On Thu, Aug 07, 2025 at 03:31:17PM +0000, cat wrote:
>> #regzbot introduced: v6.12.34..v6.12.35
>>
>> [...]
>
> Can you use git bisect to find the offending commit?

Additional notes:

I looked at the log and am listing probably relevant commits, in case bisection is too costly:

68e58f579121 PCI: dwc: ep: Correct PBA offset in .set_msix() callback
523815857b1e PCI: cadence-ep: Correct PBA offset in .set_msix() callback

These two might be interesting ones to consider. Please ignore this note if bisection is already in progress, as these are pure guesses.

Thanks,
Harshit
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-08 4:40 UTC
To: Greg KH; +Cc: regressions, stable

I will perform bisection, yes.

On 8/7/25 3:52 PM, Greg KH wrote:
> On Thu, Aug 07, 2025 at 03:31:17PM +0000, cat wrote:
>> #regzbot introduced: v6.12.34..v6.12.35
>>
>> [...]
> Can you use git bisect to find the offending commit?
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-08 9:00 UTC
To: Greg KH, regressions, stable

fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
commit fb5873b779dd5858123c19bbd6959566771e2e83
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date: Tue May 20 15:58:49 2025 +0800

    iommu/vt-d: Restore context entry setup order for aliased devices

    commit 320302baed05c6456164652541f23d2a96522c06 upstream.

    Commit 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
    changed the context entry setup during domain attachment from a
    set-and-check policy to a clear-and-reset approach. This inadvertently
    introduced a regression affecting PCI aliased devices behind PCIe-to-PCI
    bridges.

    Specifically, keyboard and touchpad stopped working on several Apple
    Macbooks with below messages:

     kernel: platform pxa2xx-spi.3: Adding to iommu group 20
     kernel: input: Apple SPI Keyboard as /devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00

    Fix this by restoring the previous context setup order.

    Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
    Closes: https://lore.kernel.org/all/4dada48a-c5dd-4c30-9c85-5b03b0aa01f0@bfh.ch/
    Cc: stable@vger.kernel.org
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Yi Liu <yi.l.liu@intel.com>
    Link: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
    Link: https://lore.kernel.org/r/20250520075849.755012-2-baolu.lu@linux.intel.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/iommu/intel/iommu.c  | 11 +++++++++++
 drivers/iommu/intel/iommu.h  |  1 +
 drivers/iommu/intel/nested.c |  4 ++--
 3 files changed, 14 insertions(+), 2 deletions(-)

On 8/8/25 4:40 AM, cat wrote:
> I will perform bisection, yes.
>
> On 8/7/25 3:52 PM, Greg KH wrote:
>> [...]
>> Can you use git bisect to find the offending commit?
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Harshit Mogalapalli @ 2025-08-08 9:22 UTC
To: cat, Greg KH, regressions, stable

Hi,

On 08/08/25 14:30, cat wrote:
> fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
> commit fb5873b779dd5858123c19bbd6959566771e2e83
> Author: Lu Baolu <baolu.lu@linux.intel.com>
> Date: Tue May 20 15:58:49 2025 +0800
>
>     iommu/vt-d: Restore context entry setup order for aliased devices
>
>     commit 320302baed05c6456164652541f23d2a96522c06 upstream.
>
> [...]
>
>  drivers/iommu/intel/iommu.c | 11 +++++++++++
>  drivers/iommu/intel/iommu.h | 1 +
>  drivers/iommu/intel/nested.c | 4 ++--
>  3 files changed, 14 insertions(+), 2 deletions(-)
>

Looks like a duplicate of
https://lore.kernel.org/linux-iommu/721D44AF820A4FEB+722679cb-2226-4287-8835-9251ad69a1ac@bbaa.fun/

And the fix for that was
https://lore.kernel.org/all/468CF4B655888074+20250723120423.37924-1-bbaa@bbaa.fun/
which is present in 6.12.40, so maybe update to 6.12.40 and the issue will most likely be fixed.

Thanks,
Harshit
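Taken together, the thread points at a window of 6.12.y releases: the regression appears with v6.12.35, and the fix is said to be present in 6.12.40. The sketch below checks a kernel release string against that window. It assumes those two bounds from the thread are accurate, speaks only for the 6.12.y series, and uses hypothetical names (`parse_release`, `in_affected_window`).

```python
import re

FIRST_BAD = (6, 12, 35)    # first release reported broken in this thread
FIRST_FIXED = (6, 12, 40)  # release the thread says contains the fix


def parse_release(release):
    """Parse the leading 'X.Y.Z' of a release string such as '6.12.38-lts'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_affected_window(release):
    """True/False for 6.12.y releases; None for series the thread does not cover."""
    version = parse_release(release)
    if version[:2] != (6, 12):
        return None
    return FIRST_BAD <= version < FIRST_FIXED
```

On a running host, `in_affected_window(platform.release())` (after `import platform`) gives a quick hint whether updating is worth trying before further debugging; a `True` result does not prove that this particular bug is the cause.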