kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices
@ 2025-11-03  9:12 bugzilla-daemon
  2025-11-03  9:17 ` [Bug 220740] " bugzilla-daemon
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-03  9:12 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

            Bug ID: 220740
           Summary: Host crash when do PF passthrough to KVM guest with
                    some devices
           Product: Virtualization
           Version: unspecified
          Hardware: Intel
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: farrah.chen@intel.com
        Regression: No

Environment:

Host Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v6.18.0-rc4

Guest kernel: 6.17-rc7

QEMU: https://gitlab.com/qemu-project/qemu.git master 37ad0e48e9fd58b17

Bug detail description:

When doing PF passthrough to a KVM guest with some devices, the guest fails to
boot and the host crashes.

Not all devices trigger this issue. So far I have found that an Intel X710 NIC
(almost every time) and an Nvidia A10 GPU (randomly) can reproduce it; VF
passthrough cannot reproduce this issue.

Reproduce steps: 

Add "intel_iommu=on" to the host kernel cmdline to enable VT-d.
Check VT-d in dmesg:
[root@gnr ~]# dmesg|grep "Virtualization Technology"
[   27.313975] DMAR: Intel(R) Virtualization Technology for Directed I/O
Check BDF of X710
[root@gnr ~]# lspci|grep "X710"
b8:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
10GbE SFP+ (rev 01)
...
Bind X710 to vfio-pci driver
[root@gnr ~]# modprobe vfio-pci
[root@gnr ~]# echo 0000:b8:00.0 > /sys/bus/pci/devices/0000\:b8\:00.0/driver/unbind

[root@gnr ~]# lspci -n -s b8:00.0
b8:00.0 0200: 8086:1572 (rev 01)
[root@gnr ~]# echo 8086 1572 > /sys/bus/pci/drivers/vfio-pci/new_id
[root@gnr ~]# lspci -k -s b8:00.0
b8:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
10GbE SFP+ (rev 01)
        Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-2
        Kernel driver in use: vfio-pci
        Kernel modules: i40e

Boot guest with b8:00.0 assigned
/home/qemu/build/qemu-system-x86_64 \
    -name legacy,debug-threads=on \
    -accel kvm \
    -cpu host \
    -smp 16 \
    -m 16G \
    -drive file=/home/centos9.qcow2,if=none,id=virtio-disk0 \
    -device virtio-blk-pci,drive=virtio-disk0 \
    -vnc :1 \
    -monitor telnet:127.0.0.1:45455,nowait,server \
    -device vfio-pci,host=b8:00.0 \
    -serial stdio
Error log:

The VM failed to boot, with no output.
The host crashed with the following errors in the serial output.

gnr login: [  120.259677] i40e 0000:b8:00.0: i40e_ptp_stop: removed PHC on ens26f0np0
[  136.778544] vfio-pci 0000:b8:00.0: resetting
[  136.891303] vfio-pci 0000:b8:00.0: reset done
[  136.896389] vfio-pci 0000:b8:00.0: Masking broken INTx support
[  136.940637] vfio-pci 0000:b8:00.0: resetting
[  137.051298] vfio-pci 0000:b8:00.0: reset done
[IEH] error found at IEH(S:0x1 B:0xFE D:0x2 F:0x0) Sev: IEH CORRECT ERROR
[IEH] ErrorStatus 0x10, MaxBitIdx 0x1D
IEH CORRECT ERROR
[IEH] BitIdx 0x4, ShareIdx 0x0
[IEH] error device is (S:0x1 B:0xB7 D:0x0 F:0x4) BitIdx 0x4, ShareIdx 0x0
[IEH] error found at IEH(S:0x1 B:0xB7 D:0x0 F:0x4) Sev: IEH CORRECT ERROR
[IEH] ErrorStatus 0x4, MaxBitIdx 0x11
IEH CORRECT ERROR
[IEH] BitIdx 0x2, ShareIdx 0x0
[IEH] error device is (S:0x1 B:0xB7 D:0x2 F:0x0) BitIdx 0x2, ShareIdx 0x0
[Device Error] error on skt:0x1 Bus:0xB7 Device:0x2 func:0x0
PcieRootPortErrorHandler MailBox->PcieInitPar.SerrEmuTestEn = 0x0
PcieRootPortMultiErrorsHandler RP Error handler.
ERROR: C00000002:V03071008 I0 515DFD4E-2D7E-40D1-8C22-8AD3CD224325 7C7C9818
WHEA: Detected PCIe Error
 --Logging Corrected Error to WHEA
WHEA: Sending OS notification via SCI. Success
ERROR: C00000002:V03071008 I0 515DFD4E-2D7E-40D1-8C22-8AD3CD224325 7C7C9818
WHEA: Detected PCIe Error
 --Logging Corrected Error to WHEA
WHEA: Sending OS notification via SCI. Success
...

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
@ 2025-11-03  9:17 ` bugzilla-daemon
  2025-11-03 23:47 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-03  9:17 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

Chen, Fan (farrah.chen@intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Bisected commit-id|                            |2b938e3db335e3670475e31a722
                   |                            |c2bee34748c5a
                 CC|                            |farrah.chen@intel.com
         Regression|No                          |Yes

--- Comment #1 from Chen, Fan (farrah.chen@intel.com) ---
I reproduced this issue on SPR, GNR, SRF, CWF.
If we disable "PCIE Error Enabling" in BIOS, host will not crash.

After bisecting, the first bad commit is:
commit 2b938e3db335e3670475e31a722c2bee34748c5a (HEAD)
Author: Ramesh Thomas <ramesh.thomas@intel.com>
Date:   Tue Dec 10 05:19:37 2024 -0800

    vfio/pci: Enable iowrite64 and ioread64 for vfio pci

    Definitions of ioread64 and iowrite64 macros in asm/io.h called by vfio
    pci implementations are enclosed inside check for CONFIG_GENERIC_IOMAP.
    They don't get defined if CONFIG_GENERIC_IOMAP is defined. Include
    linux/io-64-nonatomic-lo-hi.h to define iowrite64 and ioread64 macros
    when they are not defined. io-64-nonatomic-lo-hi.h maps the macros to
    generic implementation in lib/iomap.c. The generic implementation does
    64 bit rw if readq/writeq is defined for the architecture, otherwise it
    would do 32 bit back to back rw.

    Note that there are two versions of the generic implementation that
    differs in the order the 32 bit words are written if 64 bit support is
    not present. This is not the little/big endian ordering, which is
    handled separately. This patch uses the lo followed by hi word ordering
    which is consistent with current back to back implementation in the
    vfio/pci code.

    Signed-off-by: Ramesh Thomas <ramesh.thomas@intel.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/20241210131938.303500-2-ramesh.thomas@intel.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 66b72c289284..a0595c745732 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -16,6 +16,7 @@
 #include <linux/io.h>
 #include <linux/vfio.h>
 #include <linux/vgaarb.h>
+#include <linux/io-64-nonatomic-lo-hi.h>

 #include "vfio_pci_priv.h"
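
The lo-followed-by-hi ordering described in the commit message can be illustrated with a small standalone model (hypothetical names, little-endian host assumed; this is not the kernel implementation):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the io-64-nonatomic-lo-hi.h fallback: when no native writeq
 * exists, a 64-bit write becomes two 32-bit writes, low dword first,
 * then high dword.  Endianness is handled separately, as the commit
 * message notes; this model assumes a little-endian host.
 */
static void emulated_write64_lo_hi(uint8_t *mmio, uint64_t val)
{
	uint32_t lo = (uint32_t)val;         /* low dword, written first */
	uint32_t hi = (uint32_t)(val >> 32); /* high dword, written second */

	memcpy(mmio, &lo, sizeof(lo));
	memcpy(mmio + sizeof(lo), &hi, sizeof(hi));
}

/* Read the 8 bytes back as a single 64-bit value. */
static uint64_t read64_back(const uint8_t *mmio)
{
	uint64_t v;

	memcpy(&v, mmio, sizeof(v));
	return v;
}
```

On a little-endian host the two-dword sequence reassembles to the original value; the difference from a native writeq is atomicity and the fact that the device sees two bus transactions instead of one.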


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
  2025-11-03  9:17 ` [Bug 220740] " bugzilla-daemon
@ 2025-11-03 23:47 ` bugzilla-daemon
  2025-11-04  5:48 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-03 23:47 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

Alex Williamson (alex.l.williamson@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alex.l.williamson@gmail.com
                   |                            |, ramesh.thomas@intel.com

--- Comment #2 from Alex Williamson (alex.l.williamson@gmail.com) ---
(In reply to Chen, Fan from comment #1)
> I reproduced this issue on SPR, GNR, SRF, CWF.

Were there platforms that did not reproduce?

> If we disable "PCIE Error Enabling" in BIOS, host will not crash.
> 
> After bisecting, the first bad commit is:
> commit 2b938e3db335e3670475e31a722c2bee34748c5a (HEAD)
...
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -16,6 +16,7 @@
>  #include <linux/io.h>
>  #include <linux/vfio.h>
>  #include <linux/vgaarb.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> 
>  #include "vfio_pci_priv.h"

Theoretically this would only define non-atomic ioread64 and iowrite64 support
on a host that doesn't already have native support for these.  Any 64-bit
x86_64 host should already define ioread/write64, so no change in behavior is
expected or intended.  Can you provide the kernel .config and compiler
information?


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
  2025-11-03  9:17 ` [Bug 220740] " bugzilla-daemon
  2025-11-03 23:47 ` bugzilla-daemon
@ 2025-11-04  5:48 ` bugzilla-daemon
  2025-11-04  5:53 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-04  5:48 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #3 from Chen, Fan (farrah.chen@intel.com) ---
Created attachment 308890
  --> https://bugzilla.kernel.org/attachment.cgi?id=308890&action=edit
Kconfig-vfio

The .config of my kernel is attached.


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
                   ` (2 preceding siblings ...)
  2025-11-04  5:48 ` bugzilla-daemon
@ 2025-11-04  5:53 ` bugzilla-daemon
  2025-11-05  0:03 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-04  5:53 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #4 from Chen, Fan (farrah.chen@intel.com) ---
(In reply to Alex Williamson from comment #2)
> (In reply to Chen, Fan from comment #1)
> > I reproduced this issue on SPR, GNR, SRF, CWF.
> 
> Were there platforms that did not reproduce?

The only systems where I failed to reproduce were those whose BIOS has no
"Error Injection Configuration" menu or where "PCIE Error Enabling" is
disabled. So I guess this issue is platform independent.

> 
> > If we disable "PCIE Error Enabling" in BIOS, host will not crash.
> > 
> > After bisecting, the first bad commit is:
> > commit 2b938e3db335e3670475e31a722c2bee34748c5a (HEAD)
> ...
> > --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> > +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> > @@ -16,6 +16,7 @@
> >  #include <linux/io.h>
> >  #include <linux/vfio.h>
> >  #include <linux/vgaarb.h>
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > 
> >  #include "vfio_pci_priv.h"
> 
> Theoretically this would only define non-atomic ioread64 and iowrite64
> support on a host that doesn't already have native support for these.  Any
> 64-bit x86_64 host should already define ioread/write64, so no change in
> behavior is expected or intended.  Can you provide the kernel .config and
> compiler information?

My host OS is CentOS Stream 9, so the compiler is from the CentOS 9 release:
gcc-11.5.0-11.el9.x86_64
glibc-2.34-234.el9.x86_64
glibc-2.34-234.el9.i686

And to make sure I didn't use a wrong .config, my .config is also from the
CentOS Stream 9 default kernel, attached.


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
                   ` (3 preceding siblings ...)
  2025-11-04  5:53 ` bugzilla-daemon
@ 2025-11-05  0:03 ` bugzilla-daemon
  2025-12-09  2:54   ` Tian, Kevin
  2025-11-05  4:06 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-05  0:03 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #5 from Alex Williamson (alex.l.williamson@gmail.com) ---
I have an X710, but not a system that can reproduce the issue.

Also, I need to correct my previous statement after untangling the headers.
This commit did introduce 8-byte access support for archs, including x86_64,
that don't otherwise define ioread/write64 support.  This access uses
readq/writeq, where previously we'd use pairs of readl/writel.  The expectation
is that we're more closely matching the access by the guest.

I'm curious how we're getting into this code for an X710 though, mine shows
BARs as:

03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
10GbE SFP+ (rev 01)
        Region 0: Memory at 380000000000 (64-bit, prefetchable) [size=8M]
        Region 3: Memory at 380001800000 (64-bit, prefetchable) [size=32K]

Those would typically be mapped directly into the KVM address space and not
fault through QEMU to trigger access through this code.  The MSI-X capability
lands in BAR3:

        Capabilities: [70] MSI-X: Enable- Count=129 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00001000

Ideally the device follows the PCIe recommendation not to place registers in
the same page as the vector and PBA tables; not doing so could cause this
access, though.  If it were such an access, QEMU could virtualize the MSI-X
tables on a different BAR with the option x-msix-relocation=bar5 (or bar2).

If QEMU were using x-no-mmap=on then we could expect this code would be used,
but that's not specified in the example.
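
The page-sharing concern can be expressed as a small check (a sketch with hypothetical helper names; 4 KiB host pages and the 16-byte MSI-X table entry size from the PCIe spec):

```c
#include <assert.h>
#include <stdint.h>

#define HOST_PAGE_SIZE	4096u
#define MSIX_ENTRY_SIZE	16u	/* bytes per MSI-X vector table entry */

/* Page index of a byte offset within the BAR. */
static uint32_t page_of(uint32_t bar_offset)
{
	return bar_offset / HOST_PAGE_SIZE;
}

/*
 * Would a device register at reg_off land in a page occupied by the
 * vector table (count entries starting at table_off)?  If yes, guest
 * accesses to that register must be trapped instead of mmap'd, which
 * is what routes them through the vfio_pci_rdwr.c path.
 */
static int reg_shares_table_page(uint32_t reg_off, uint32_t table_off,
				 uint32_t count)
{
	uint32_t table_last = table_off + count * MSIX_ENTRY_SIZE - 1;

	return page_of(reg_off) >= page_of(table_off) &&
	       page_of(reg_off) <= page_of(table_last);
}
```

With the X710 layout above (vector table at BAR3 offset 0, 129 entries), the table ends at offset 0x80f, so a hypothetical register at 0x900 would share the first page while the PBA at 0x1000 sits in the next page.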


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
                   ` (4 preceding siblings ...)
  2025-11-05  0:03 ` bugzilla-daemon
@ 2025-11-05  4:06 ` bugzilla-daemon
  2025-11-05  8:12 ` bugzilla-daemon
  2025-12-09  2:54 ` bugzilla-daemon
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-05  4:06 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #6 from Chen, Fan (farrah.chen@intel.com) ---
(In reply to Alex Williamson from comment #5)
> I have an X710, but not a system that can reproduce the issue.
> 
> Also I need to correct my previous statement after untangling the headers. 
> This commit did introduce 8-byte access support for archs including x86_64
> where they don't otherwise defined a ioread/write64 support.  This access
> uses readq/writeq, where previously we'd use pairs or readl/writel.  The
> expectation is that we're more closely matching the access by the guest.
> 
> I'm curious how we're getting into this code for an X710 though, mine shows
> BARs as:
> 
> 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
> 10GbE SFP+ (rev 01)
>         Region 0: Memory at 380000000000 (64-bit, prefetchable) [size=8M]
>         Region 3: Memory at 380001800000 (64-bit, prefetchable) [size=32K]
> 
> Those would typically be mapped directly into the KVM address space and not
> fault through QEMU to trigger access through this code.  The MSI-X
> capability lands in BAR3:
> 
>         Capabilities: [70] MSI-X: Enable- Count=129 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00001000
> 

Not sure if it is related, but on my systems, unlike yours, the MSI-X
capability shows "Enable+":
        Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00001000

And without this commit (reset to the previous commit), I can pass through the
X710 successfully, with the following log in host dmesg:
gnr-sp-2s-605 login: [  129.819630] i40e 0000:b8:00.0: i40e_ptp_stop: removed PHC on ens26f0np0
[  143.509906] vfio-pci 0000:b8:00.0: resetting
[  143.619051] vfio-pci 0000:b8:00.0: reset done
[  143.624135] vfio-pci 0000:b8:00.0: Masking broken INTx support
[  143.669167] vfio-pci 0000:b8:00.0: resetting
[  143.779059] vfio-pci 0000:b8:00.0: reset done
[  144.392971] vfio-pci 0000:b8:00.0: vfio_bar_restore: reset recovery - restoring BARs


> Ideally the device follows the PCIe recommendation not to place registers in
> the same page as the vector and pba tables, not doing so could cause this
> access though.  If it were such an access, QEMU could virtualize the MSI-X
> tables on a different BAR with the option x-msix-relocation=bar5 (or bar2).
> 
> If QEMU were using x-no-mmap=on then we could expect this code would be
> used, but that's not specified in the example.


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
                   ` (5 preceding siblings ...)
  2025-11-05  4:06 ` bugzilla-daemon
@ 2025-11-05  8:12 ` bugzilla-daemon
  2025-12-09  2:54 ` bugzilla-daemon
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-11-05  8:12 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #7 from Chen, Fan (farrah.chen@intel.com) ---
I also tried the option you mentioned, "x-msix-relocation=bar5"; the result is
the same, a host crash.

In addition, according to the host serial log in the description, the error
devices are b7:00.4 and b7:02.0, while the BDF of my X710 is b8:00.0.
I checked the PCIe topology of my system:
 +-[0000:b7]-+-00.0
 |           +-00.1
 |           +-00.2
 |           +-00.4
 |           \-02.0-[b8]--+-00.0
 |                        \-00.1

So the AER errors are reported on the bridge (b7:02.0) above the assigned
device (b8:00.0) and on a device (b7:00.4) that is a sibling of the bridge.
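
For cross-checking such logs, the IEH S/B/D/F tuples and lspci's bb:dd.f notation both reduce to the standard BDF packing (a sketch; pci_bdf is a hypothetical helper, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI BDF packing: bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return ((uint16_t)bus << 8) | ((uint16_t)(dev & 0x1f) << 3) |
	       (fn & 0x7);
}
```

So the log's B:0xB7 D:0x2 F:0x0 is the bridge b7:02.0 and B:0xB7 D:0x0 F:0x4 is its sibling b7:00.4, both distinct from the assigned device b8:00.0.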


* RE: [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-05  0:03 ` bugzilla-daemon
@ 2025-12-09  2:54   ` Tian, Kevin
  0 siblings, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2025-12-09  2:54 UTC (permalink / raw)
  To: bugzilla-daemon@kernel.org, kvm@vger.kernel.org

> From: bugzilla-daemon@kernel.org <bugzilla-daemon@kernel.org>
> Sent: Wednesday, November 5, 2025 8:03 AM
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=220740
> 
> --- Comment #5 from Alex Williamson (alex.l.williamson@gmail.com) ---
> I have an X710, but not a system that can reproduce the issue.
> 
> Also I need to correct my previous statement after untangling the headers.
> This commit did introduce 8-byte access support for archs including x86_64
> where they don't otherwise defined a ioread/write64 support.  This access
> uses
> readq/writeq, where previously we'd use pairs or readl/writel.  The
> expectation
> is that we're more closely matching the access by the guest.
> 
> I'm curious how we're getting into this code for an X710 though, mine shows
> BARs as:
> 
> 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
> 10GbE SFP+ (rev 01)
>         Region 0: Memory at 380000000000 (64-bit, prefetchable) [size=8M]
>         Region 3: Memory at 380001800000 (64-bit, prefetchable) [size=32K]
> 
> Those would typically be mapped directly into the KVM address space and
> not
> fault through QEMU to trigger access through this code.

We have verified that this problem is caused by 8-byte access to the ROM BAR:

    Expansion ROM at 93480000 [disabled] [size=512K]

Every qword access to that range triggers dozens of PCI AER related prints;
in total, the 64K reads from Qemu produce so many prints that the host
becomes unresponsive.

There is indeed no access to bar0/bar3 in this path.
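
The 64K figure is just the ROM size divided by the access width; a one-line sanity check (illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* How many qword (8-byte) reads it takes to cover a region. */
static uint32_t qword_reads(uint32_t region_bytes)
{
	return region_bytes / 8;
}
```

A 512 KiB expansion ROM therefore takes 65,536 qword reads, each one tripping an AER report.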

Disabling "PCIE Error Enabling" in the BIOS just suppresses the prints,
hiding the issue.

Updating to the latest X710 firmware didn't help, and we didn't find an
explicit erratum describing this dword-only limitation.

It is difficult to identify all the devices suffering from this issue, so a
safer/simpler approach is to universally disable 8-byte access to the ROM BAR,
e.g. as below:

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index e346392b72f6..9b39184f76b7 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -491,7 +491,7 @@ nvgrace_gpu_map_and_read(struct nvgrace_gpu_pci_core_device *nvdev,
 		ret = vfio_pci_core_do_io_rw(&nvdev->core_device, false,
 					     nvdev->resmem.ioaddr,
 					     buf, offset, mem_count,
-					     0, 0, false);
+					     0, 0, false, true);
 	}
 
 	return ret;
@@ -609,7 +609,7 @@ nvgrace_gpu_map_and_write(struct nvgrace_gpu_pci_core_device *nvdev,
 		ret = vfio_pci_core_do_io_rw(&nvdev->core_device, false,
 					     nvdev->resmem.ioaddr,
 					     (char __user *)buf, pos, mem_count,
-					     0, 0, true);
+					     0, 0, true, true);
 	}
 
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6192788c8ba3..3467151a632d 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -135,7 +135,7 @@ VFIO_IORDWR(64)
 ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 			       void __iomem *io, char __user *buf,
 			       loff_t off, size_t count, size_t x_start,
-			       size_t x_end, bool iswrite)
+			       size_t x_end, bool iswrite, bool allow_qword)
 {
 	ssize_t done = 0;
 	int ret;
@@ -150,7 +150,7 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
-		if (fillable >= 8 && !(off % 8)) {
+		if (allow_qword && fillable >= 8 && !(off % 8)) {
 			ret = vfio_pci_iordwr64(vdev, iswrite, test_mem,
 						io, buf, off, &filled);
 			if (ret)
@@ -234,6 +234,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	void __iomem *io;
 	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
+	bool allow_qword = true;
 
 	if (pci_resource_start(pdev, bar))
 		end = pci_resource_len(pdev, bar);
@@ -262,6 +263,15 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		if (!io)
 			return -ENOMEM;
 		x_end = end;
+
+		/*
+		 * Certain devices (e.g. Intel X710) don't support 8-byte access
+		 * to the ROM bar. Otherwise PCI AER errors might be triggered.
+		 *
+		 * Disable qword access to the ROM bar universally, which has been
+		 * working reliably for years before 8-byte access is enabled.
+		 */
+		allow_qword = false;
 	} else {
 		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
@@ -278,7 +288,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	}
 
 	done = vfio_pci_core_do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,
-				      count, x_start, x_end, iswrite);
+				      count, x_start, x_end, iswrite, allow_qword);
 
 	if (done >= 0)
 		*ppos += done;
@@ -352,7 +362,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	 * to the memory enable bit in the command register.
 	 */
 	done = vfio_pci_core_do_io_rw(vdev, false, iomem, buf, off, count,
-				      0, 0, iswrite);
+				      0, 0, iswrite, true);
 
 	vga_put(vdev->pdev, rsrc);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2..3a75b76eaed3 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -133,7 +133,7 @@ pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 			       void __iomem *io, char __user *buf,
 			       loff_t off, size_t count, size_t x_start,
-			       size_t x_end, bool iswrite);
+			       size_t x_end, bool iswrite, bool allow_qword);
 bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
 					 loff_t reg_start, size_t reg_cnt,
 					 loff_t *buf_offset,


* [Bug 220740] Host crash when do PF passthrough to KVM guest with some devices
  2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
                   ` (6 preceding siblings ...)
  2025-11-05  8:12 ` bugzilla-daemon
@ 2025-12-09  2:54 ` bugzilla-daemon
  7 siblings, 0 replies; 10+ messages in thread
From: bugzilla-daemon @ 2025-12-09  2:54 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=220740

--- Comment #8 from kevin.tian@intel.com ---
> From: bugzilla-daemon@kernel.org <bugzilla-daemon@kernel.org>
> Sent: Wednesday, November 5, 2025 8:03 AM
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=220740
> 
> --- Comment #5 from Alex Williamson (alex.l.williamson@gmail.com) ---
> I have an X710, but not a system that can reproduce the issue.
> 
> Also I need to correct my previous statement after untangling the headers.
> This commit did introduce 8-byte access support for archs including x86_64
> where they don't otherwise defined a ioread/write64 support.  This access
> uses
> readq/writeq, where previously we'd use pairs or readl/writel.  The
> expectation
> is that we're more closely matching the access by the guest.
> 
> I'm curious how we're getting into this code for an X710 though, mine shows
> BARs as:
> 
> 03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for
> 10GbE SFP+ (rev 01)
>         Region 0: Memory at 380000000000 (64-bit, prefetchable) [size=8M]
>         Region 3: Memory at 380001800000 (64-bit, prefetchable) [size=32K]
> 
> Those would typically be mapped directly into the KVM address space and
> not
> fault through QEMU to trigger access through this code.

We have verified that this problem is caused by 8-byte access to the ROM BAR:

    Expansion ROM at 93480000 [disabled] [size=512K]

Every qword access to that range triggers dozens of PCI AER related prints;
in total, the 64K reads from Qemu produce so many prints that the host
becomes unresponsive.

There is indeed no access to bar0/bar3 in this path.

Disabling "PCIE Error Enabling" in the BIOS just suppresses the prints,
hiding the issue.

Updating to the latest X710 firmware didn't help, and we didn't find an
explicit erratum describing this dword-only limitation.

It is difficult to identify all the devices suffering from this issue, so a
safer/simpler approach is to universally disable 8-byte access to the ROM BAR,
e.g. as below:

diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
index e346392b72f6..9b39184f76b7 100644
--- a/drivers/vfio/pci/nvgrace-gpu/main.c
+++ b/drivers/vfio/pci/nvgrace-gpu/main.c
@@ -491,7 +491,7 @@ nvgrace_gpu_map_and_read(struct nvgrace_gpu_pci_core_device *nvdev,
 		ret = vfio_pci_core_do_io_rw(&nvdev->core_device, false,
 					     nvdev->resmem.ioaddr,
 					     buf, offset, mem_count,
-					     0, 0, false);
+					     0, 0, false, true);
 	}
 
 	return ret;
@@ -609,7 +609,7 @@ nvgrace_gpu_map_and_write(struct nvgrace_gpu_pci_core_device *nvdev,
 		ret = vfio_pci_core_do_io_rw(&nvdev->core_device, false,
 					     nvdev->resmem.ioaddr,
 					     (char __user *)buf, pos, mem_count,
-					     0, 0, true);
+					     0, 0, true, true);
 	}
 
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 6192788c8ba3..3467151a632d 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -135,7 +135,7 @@ VFIO_IORDWR(64)
 ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 			       void __iomem *io, char __user *buf,
 			       loff_t off, size_t count, size_t x_start,
-			       size_t x_end, bool iswrite)
+			       size_t x_end, bool iswrite, bool allow_qword)
 {
 	ssize_t done = 0;
 	int ret;
@@ -150,7 +150,7 @@ ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 		else
 			fillable = 0;
 
-		if (fillable >= 8 && !(off % 8)) {
+		if (allow_qword && fillable >= 8 && !(off % 8)) {
 			ret = vfio_pci_iordwr64(vdev, iswrite, test_mem,
 						io, buf, off, &filled);
 			if (ret)
@@ -234,6 +234,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	void __iomem *io;
 	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
+	bool allow_qword = true;
 
 	if (pci_resource_start(pdev, bar))
 		end = pci_resource_len(pdev, bar);
@@ -262,6 +263,15 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 		if (!io)
 			return -ENOMEM;
 		x_end = end;
+
+		/*
+		 * Certain devices (e.g. Intel X710) don't support 8-byte access
+		 * to the ROM bar. Otherwise PCI AER errors might be triggered.
+		 *
+		 * Disable qword access to the ROM bar universally, which has been
+		 * working reliably for years before 8-byte access is enabled.
+		 */
+		allow_qword = false;
 	} else {
 		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
@@ -278,7 +288,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	}
 
 	done = vfio_pci_core_do_io_rw(vdev, res->flags & IORESOURCE_MEM, io, buf, pos,
-				      count, x_start, x_end, iswrite);
+				      count, x_start, x_end, iswrite, allow_qword);
 
 	if (done >= 0)
 		*ppos += done;
@@ -352,7 +362,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 	 * to the memory enable bit in the command register.
 	 */
 	done = vfio_pci_core_do_io_rw(vdev, false, iomem, buf, off, count,
-				      0, 0, iswrite);
+				      0, 0, iswrite, true);
 
 	vga_put(vdev->pdev, rsrc);
 
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2..3a75b76eaed3 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -133,7 +133,7 @@ pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 ssize_t vfio_pci_core_do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
 			       void __iomem *io, char __user *buf,
 			       loff_t off, size_t count, size_t x_start,
-			       size_t x_end, bool iswrite);
+			       size_t x_end, bool iswrite, bool allow_qword);
 bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
 					 loff_t reg_start, size_t reg_cnt,
 					 loff_t *buf_offset,

end of thread, other threads:[~2025-12-09  2:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-03  9:12 [Bug 220740] New: Host crash when do PF passthrough to KVM guest with some devices bugzilla-daemon
2025-11-03  9:17 ` [Bug 220740] " bugzilla-daemon
2025-11-03 23:47 ` bugzilla-daemon
2025-11-04  5:48 ` bugzilla-daemon
2025-11-04  5:53 ` bugzilla-daemon
2025-11-05  0:03 ` bugzilla-daemon
2025-12-09  2:54   ` Tian, Kevin
2025-11-05  4:06 ` bugzilla-daemon
2025-11-05  8:12 ` bugzilla-daemon
2025-12-09  2:54 ` bugzilla-daemon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).