* [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-07 15:31 UTC
Cc: regressions, stable
#regzbot introduced: v6.12.34..v6.12.35
After upgrading to kernel 6.12.35, VFIO passthrough of my GPU stopped working in a Windows VM: the guest sees the device in Device Manager but reports that it did not start correctly. I compared the lspci output in the VM before and after the upgrade to 6.12.35 and noticed these changes:
- the reported link speed of the passthrough GPU changed from 2.5 GT/s to 16 GT/s
- the passthrough GPU lost its 'BusMaster' and MSI Enable flags
- a latency measurement feature appeared
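For readers correlating these lspci differences with PCI config space: the "BusMaster" flag on lspci's Control line is bit 2 of the PCI Command register (offset 0x04), MSI "Enable" is bit 0 of the MSI capability's Message Control register, and the link speed comes from the Current Link Speed field (bits 3:0) of the PCIe Link Status register. The Python sketch below decodes those fields from raw register values. The helper names and sample values are hypothetical illustrations, not taken from the report or from lspci's source, and the speed table assumes the common encoding (1 = 2.5 GT/s, 2 = 5.0 GT/s, and so on).

```python
# Hypothetical helpers that decode the config-space fields lspci summarizes as
# "Control: ... BusMaster+", "MSI: Enable+" and "LnkSta: Speed ...".

# PCI Command register (config offset 0x04) enable bits.
CMD_IO_SPACE = 1 << 0
CMD_MEM_SPACE = 1 << 1
CMD_BUS_MASTER = 1 << 2

# MSI capability, Message Control register: bit 0 is MSI Enable.
MSI_CTRL_ENABLE = 1 << 0

# PCIe Link Status, Current Link Speed field (bits 3:0), assuming the common
# encoding 1 = 2.5 GT/s, 2 = 5.0 GT/s, 3 = 8.0 GT/s, and so on.
LINK_SPEEDS_GTS = {1: "2.5", 2: "5.0", 3: "8.0", 4: "16.0", 5: "32.0", 6: "64.0"}


def decode_command(cmd: int) -> dict:
    """Return the enable flags lspci prints on its 'Control:' line."""
    return {
        "I/O": bool(cmd & CMD_IO_SPACE),
        "Mem": bool(cmd & CMD_MEM_SPACE),
        "BusMaster": bool(cmd & CMD_BUS_MASTER),
    }


def msi_enabled(msg_ctrl: int) -> bool:
    """True if the MSI Enable bit is set in the Message Control register."""
    return bool(msg_ctrl & MSI_CTRL_ENABLE)


def link_speed(lnksta: int) -> str:
    """Translate the Current Link Speed field into a GT/s string."""
    code = lnksta & 0xF
    speed = LINK_SPEEDS_GTS.get(code)
    return f"{speed} GT/s" if speed else f"unknown speed code {code}"


# Register values below are made up for illustration, not taken from the report.
print(decode_command(0x0407))  # {'I/O': True, 'Mem': True, 'BusMaster': True}
print(decode_command(0x0403))  # {'I/O': True, 'Mem': True, 'BusMaster': False}
print(msi_enabled(0x0081), msi_enabled(0x0080))  # True False
print(link_speed(0x0001), link_speed(0x0004))  # 2.5 GT/s 16.0 GT/s
```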
The following entries also began appearing in the VM's dmesg whenever the host kernel is 6.12.35 or later:
[ 1.963177] nouveau 0000:01:00.0: sec2(gsp): mbox 1c503000 00000001
[ 1.963296] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
...
[ 1.964580] nouveau 0000:01:00.0: gsp: init failed, -5
[ 1.964641] nouveau 0000:01:00.0: init failed with -5
[ 1.964681] nouveau: drm:00000000:00000080: init failed with -5
[ 1.964721] nouveau 0000:01:00.0: drm: Device allocation failed: -5
[ 1.966318] nouveau 0000:01:00.0: probe with driver nouveau failed with error -5
6.12.34 worked fine, and the latest 6.12 LTS release does not work either. I am using an Intel CPU and NVIDIA graphics (for passthrough and as the GPU on my Linux host).
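Every failure in the guest's nouveau log ends in -5. Kernel functions report errors as negative errno values, and errno 5 is EIO ("Input/output error") on Linux. A minimal Python sketch for decoding such return codes, using only the standard errno and os modules; the helper name is hypothetical:

```python
import errno
import os


def describe_kernel_error(ret: int) -> str:
    """Map a negative kernel return code (e.g. "failed: -5") to its errno name."""
    code = -ret  # kernel returns -errno; errno tables use the positive value
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{ret} = -{name} ({os.strerror(code)})"


print(describe_kernel_error(-5))  # -5 = -EIO (Input/output error) on Linux
```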
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Greg KH @ 2025-08-07 15:52 UTC
To: cat; +Cc: regressions, stable
On Thu, Aug 07, 2025 at 03:31:17PM +0000, cat wrote:
> 6.12.34 worked fine, and the latest 6.12 LTS release does not work either. I am using an Intel CPU and NVIDIA graphics (for passthrough and as the GPU on my Linux host).
Can you use git bisect to find the offending commit?
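git bisect answers this question by binary search: starting from a known-good point (v6.12.34) and a known-bad one (v6.12.35), it checks out a commit in the middle, the tester builds and boots that kernel and marks it with "git bisect good" or "git bisect bad", and the range halves each round until only the first bad commit remains, after roughly log2(N) test kernels. The Python sketch below models only that search on a linear list of commits. It assumes linear history and a reliable test that stays bad once the bug is introduced; it is an illustration, not git's implementation (which also handles merges and skipped commits), and the commit labels are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search an ordered, linear commit list for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1  # indices known good / known bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # in practice: build, boot, test the VM
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical linear history c0 (good) .. c20 (bad); the regression lands in c13.
commits = [f"c{i}" for i in range(21)]
print(first_bad(commits, lambda c: int(c[1:]) >= 13))  # c13
```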
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Harshit Mogalapalli @ 2025-08-07 18:31 UTC
To: Greg KH, cat; +Cc: regressions, stable
Hi,
On 07/08/25 21:22, Greg KH wrote:
> Can you use git bisect to find the offending commit?
Additional note: I looked at the log and am listing some probably relevant
commits, in case bisection is too costly:
68e58f579121 PCI: dwc: ep: Correct PBA offset in .set_msix() callback
523815857b1e PCI: cadence-ep: Correct PBA offset in .set_msix() callback
These two might be worth considering. Please ignore this note if
bisection is already in progress, as these are pure guesses.
Thanks,
Harshit
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-08 4:40 UTC
To: Greg KH; +Cc: regressions, stable
I will perform bisection, yes.
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: cat @ 2025-08-08 9:00 UTC
To: Greg KH, regressions, stable
fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
commit fb5873b779dd5858123c19bbd6959566771e2e83
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date: Tue May 20 15:58:49 2025 +0800
iommu/vt-d: Restore context entry setup order for aliased devices
commit 320302baed05c6456164652541f23d2a96522c06 upstream.
Commit 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
changed the context entry setup during domain attachment from a
set-and-check policy to a clear-and-reset approach. This inadvertently
introduced a regression affecting PCI aliased devices behind PCIe-to-PCI
bridges.
Specifically, keyboard and touchpad stopped working on several Apple
Macbooks with below messages:
kernel: platform pxa2xx-spi.3: Adding to iommu group 20
kernel: input: Apple SPI Keyboard as /devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
kernel: DMAR: DRHD: handling fault status reg 3
kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
kernel: DMAR: DRHD: handling fault status reg 3
kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
Fix this by restoring the previous context setup order.
Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
Closes: https://lore.kernel.org/all/4dada48a-c5dd-4c30-9c85-5b03b0aa01f0@bfh.ch/
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Link: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20250520075849.755012-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/iommu/intel/iommu.c | 11 +++++++++++
drivers/iommu/intel/iommu.h | 1 +
drivers/iommu/intel/nested.c | 4 ++--
3 files changed, 14 insertions(+), 2 deletions(-)
* Re: [REGRESSION] vfio gpu passthrough stopped working
From: Harshit Mogalapalli @ 2025-08-08 9:22 UTC
To: cat, Greg KH, regressions, stable
Hi,
On 08/08/25 14:30, cat wrote:
> fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
> iommu/vt-d: Restore context entry setup order for aliased devices
This looks like a duplicate of
https://lore.kernel.org/linux-iommu/721D44AF820A4FEB+722679cb-2226-4287-8835-9251ad69a1ac@bbaa.fun/
The fix for that report was
https://lore.kernel.org/all/468CF4B655888074+20250723120423.37924-1-bbaa@bbaa.fun/
which is included in 6.12.40, so updating to 6.12.40 will most likely
fix the issue.
Thanks,
Harshit