kvm.vger.kernel.org archive mirror
* [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
@ 2024-07-06 16:30 bugzilla-daemon
  2024-07-06 17:19 ` [Bug 219010] " bugzilla-daemon
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-06 16:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

            Bug ID: 219010
           Summary: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash
                    because of "Collect hot-reset devices to local buffer"
           Product: Virtualization
           Version: unspecified
          Hardware: All
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: zaltys@natrix.lt
        Regression: No

One of my virtual machines using PCI device passthrough (VFIO) stopped working
on openSUSE Tumbleweed with kernel 6.9.7. QEMU 9.0.1 complains:

qemu-system-x86_64: vfio: hot reset info failed: No space left on device
qemu-system-x86_64: GLib: ../glib/gmem.c:177: failed to allocate
18446744068411217972 bytes

and then dumps core. The QEMU backtrace shows vfio_pci_get_pci_hot_reset_info()
as the last QEMU function called.
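The byte count in the GLib message is consistent with a negative device count wrapping around in a size computation. A minimal C sketch, assuming the caller computes sizeof(header) + count * sizeof(entry) with a signed int count, a 12-byte header and 8-byte entries (the layouts of struct vfio_pci_hot_reset_info and struct vfio_pci_dependent_device in the VFIO UAPI header). The helper names are hypothetical and this is not QEMU's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: a 12-byte header (argsz, flags, count) followed by
 * 8-byte dependent-device entries, as in the VFIO UAPI structs. */
#define HOT_RESET_HDR_SIZE   12u
#define HOT_RESET_ENTRY_SIZE 8u

/* Hypothetical helper: the buffer size a caller would request for
 * `count` entries. A negative int is converted to size_t before the
 * multiplication, so the result wraps modulo 2^64 on LP64 targets. */
static size_t hot_reset_buf_size(int count)
{
    return HOT_RESET_HDR_SIZE + (size_t)count * HOT_RESET_ENTRY_SIZE;
}

/* Hypothetical inverse: recover the signed count behind a given size.
 * The uint64_t -> int64_t conversion is implementation-defined; GCC and
 * Clang use two's complement, which makes the division exact here. */
static int64_t count_from_buf_size(uint64_t bytes)
{
    return (int64_t)(bytes - HOT_RESET_HDR_SIZE) / (int64_t)HOT_RESET_ENTRY_SIZE;
}
```

Under these assumptions, the reported 18446744068411217972 bytes corresponds to count = -662291707, i.e. a 32-bit count field of 3632675589 coming back from the kernel. That is an implausible number of devices, which points at a bogus count in the ioctl reply rather than at QEMU's own arithmetic.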

Reverting kernel 6.9.7 commit 9313244c26f3792daa86f3a18cc3bd5ad60310e0
(upstream f6944d4a0b87c16bc34ae589169e1ded3d4db08e), "vfio/pci: Collect
hot-reset devices to local buffer", fixes the problem. As I understand it, that
commit was backported to 6.9.7 from the 6.10 tree.

Upon more thorough analysis, I pinpointed the crash to one specific passed-through
device: the sound card of the Asus B650 Creator motherboard. The VM starts on
6.9.7 if I remove this sound card from it. I think the important bit is that this
card is one function of a multi-function device and does not report support for FLR:

15:00.0 | iommu group 28 | Phoenix PCIe Dummy Function <-- not passed to VM, no driver, reset method: pm bus
15:00.2 | iommu group 29 | Encryption controller (PSP/CCP) <-- ccp driver
15:00.3 | iommu group 30 | USB controller <-- xhci_hcd driver
15:00.4 | iommu group 31 | USB controller <-- xhci_hcd driver
15:00.6 | iommu group 32 | HD Audio Controller <-- sound card passed to VM
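Whether a function can use FLR, or has to fall back to a bus-level (hot) reset, can be checked from userspace. Recent kernels expose a reset_method attribute under /sys/bus/pci/devices/<BDF>/ that lists the supported methods separated by spaces (the listing above shows "pm bus" for 15:00.0). A small C sketch that tests such a line for a given method; both helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return true if the space-separated method list
 * (e.g. the contents of a reset_method sysfs file) contains `method`. */
static bool has_reset_method(const char *list, const char *method)
{
    size_t mlen = strlen(method);
    const char *p = list;

    while (*p) {
        /* Skip separators, including the trailing newline from sysfs. */
        while (*p == ' ' || *p == '\n')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\n')
            p++;
        /* Compare whole tokens only, so "flrx" does not match "flr". */
        if ((size_t)(p - start) == mlen && strncmp(start, method, mlen) == 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: read the first line of a reset_method file, e.g.
 * "/sys/bus/pci/devices/0000:15:00.6/reset_method". Returns -1 on error. */
static int read_reset_methods(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

For example, reading the file for 15:00.6 into a buffer and calling has_reset_method(buf, "flr") would be expected to return false for the devices described in this report.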

After reverting the above mentioned commit, qemu complains:

vfio: Cannot reset device 0000:15:00.6, depends on group 28 which is not owned

exactly as before 6.9.7, and the VM starts with the sound card passed through.
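The pre-6.9.7 message reflects a rule that a bus-level reset is only allowed when the user owns every IOMMU group whose devices the reset would affect. A simplified C sketch of such a check, using the groups from the listing above; the struct and function names are hypothetical and this is not QEMU's actual implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one device affected by a bus reset. */
struct affected_dev {
    const char *bdf;
    int group_id;
};

/* Return 1 if `group` appears in the list of groups the caller owns. */
static int group_is_owned(int group, const int *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_owned; i++)
        if (owned[i] == group)
            return 1;
    return 0;
}

/* Return the first affected group the caller does not own, or -1 if
 * every affected group is owned and the reset may proceed. */
static int first_unowned_group(const struct affected_dev *devs, size_t n,
                               const int *owned, size_t n_owned)
{
    for (size_t i = 0; i < n; i++)
        if (!group_is_owned(devs[i].group_id, owned, n_owned))
            return devs[i].group_id;
    return -1;
}
```

With only group 32 (the sound card) owned and the affected devices visited in bus order, this check stops at group 28, which matches the group named in the message.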

This might be an unsupported configuration, but QEMU crashing on 6.9.7 also
suggests that the kernel may be breaking userspace by handling (or mishandling)
this case differently, which is especially surprising in a stable point release.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
@ 2024-07-06 17:19 ` bugzilla-daemon
  2024-07-09 13:48   ` Yi Liu
  2024-07-09  8:44 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-06 17:19 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #1 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
Additional information: passing through an NVIDIA GPU and Samsung NVMe drives
works, but passing through a Fresco FL1100-based USB card does not. The Fresco
card is a single-function device, but like the sound card it does not report FLR.
Reverting "vfio/pci: Collect hot-reset devices to local buffer" allows passing
through every device mentioned.

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
  2024-07-06 17:19 ` [Bug 219010] " bugzilla-daemon
@ 2024-07-09  8:44 ` bugzilla-daemon
  2024-07-09 13:44 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-09  8:44 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

The Linux kernel's regression tracker (Thorsten Leemhuis) (regressions@leemhuis.info) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |regressions@leemhuis.info

--- Comment #2 from The Linux kernel's regression tracker (Thorsten Leemhuis) (regressions@leemhuis.info) ---
Does the problem happen with 6.10-rc6 or newer as well?

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
  2024-07-06 17:19 ` [Bug 219010] " bugzilla-daemon
  2024-07-09  8:44 ` bugzilla-daemon
@ 2024-07-09 13:44 ` bugzilla-daemon
  2024-07-09 14:24 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-09 13:44 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #3 from Liu, Yi L (yi.l.liu@intel.com) ---
On 2024/7/7 01:19, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219010
> 
> --- Comment #1 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
> Additional information: passing NVIDIA GPU, Samsung NVMEs works, passing
> Fresco
>   FL1100 based USB card does not work. Fresco card is single VF device, but
>   like
> that sound card it does not report FLR. Reverting "vfio/pci: Collect
> hot-reset
> devices to local buffer" allows to pass every mentioned device.
> 

It appears that the count variable is used without being initialized. This does
not happen with the other devices because they support FLR and therefore do not
trigger the hot-reset info path. Please try the patch below to see if it works.
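A simplified user-space model of the failure mode described here, assuming the number of affected devices is accumulated by a per-device callback that increments a caller-provided counter (all names below are hypothetical, not the kernel's). Reading an uninitialized local is undefined behavior, so the sketch simulates leftover stack contents with an explicit stale starting value instead:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked once per device found by the walk. */
typedef int (*dev_cb)(int dev, void *data);

/* Increment the counter passed through `data`. The result is only
 * meaningful if the caller initialized that counter. */
static int count_dev(int dev, void *data)
{
    (void)dev;
    (*(int *)data)++;
    return 0;
}

/* Hypothetical walker: call `cb` for each of `n` devices on a bus,
 * stopping early if the callback returns non-zero. */
static int walk_bus(const int *devs, size_t n, dev_cb cb, void *data)
{
    for (size_t i = 0; i < n; i++) {
        int ret = cb(devs[i], data);
        if (ret)
            return ret;
    }
    return 0;
}

/* Count devices starting from `initial`: 0 models the fixed code
 * (`int count = 0;`), any other value models stale stack garbage. */
static int count_affected(const int *devs, size_t n, int initial)
{
    int count = initial;
    walk_bus(devs, n, count_dev, &count);
    return count;
}
```

Starting from zero, the walk reports the five functions of bus 15 listed in the original report; starting from a stale value, the same walk returns a meaningless total, which in the scenario described above would be reported to userspace as the number of devices to allocate for.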


 From 93618efe933c4fa5ec453bddacdf1ca2ccbf3751 Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Tue, 9 Jul 2024 06:41:02 -0700
Subject: [PATCH] vfio/pci: Fix a regression

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
  drivers/vfio/pci/vfio_pci_core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 59af22f6f826..0a7bfdd08bc7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1260,7 +1260,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
        struct vfio_pci_hot_reset_info hdr;
        struct vfio_pci_fill_info fill = {};
        bool slot = false;
-       int ret, count;
+       int ret, count = 0;

        if (copy_from_user(&hdr, arg, minsz))
                return -EFAULT;

* Re: [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 17:19 ` [Bug 219010] " bugzilla-daemon
@ 2024-07-09 13:48   ` Yi Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Yi Liu @ 2024-07-09 13:48 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On 2024/7/7 01:19, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219010
> 
> --- Comment #1 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
> Additional information: passing NVIDIA GPU, Samsung NVMEs works, passing Fresco
>   FL1100 based USB card does not work. Fresco card is single VF device, but like
> that sound card it does not report FLR. Reverting "vfio/pci: Collect hot-reset
> devices to local buffer" allows to pass every mentioned device.
> 

It appears that the count variable is used without being initialized. This does
not happen with the other devices because they support FLR and therefore do not
trigger the hot-reset info path. Please try the patch below to see if it works.


 From 93618efe933c4fa5ec453bddacdf1ca2ccbf3751 Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Tue, 9 Jul 2024 06:41:02 -0700
Subject: [PATCH] vfio/pci: Fix a regression

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
  drivers/vfio/pci/vfio_pci_core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 59af22f6f826..0a7bfdd08bc7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1260,7 +1260,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
  	struct vfio_pci_hot_reset_info hdr;
  	struct vfio_pci_fill_info fill = {};
  	bool slot = false;
-	int ret, count;
+	int ret, count = 0;

  	if (copy_from_user(&hdr, arg, minsz))
  		return -EFAULT;
-- 
2.34.1


-- 
Regards,
Yi Liu

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (2 preceding siblings ...)
  2024-07-09 13:44 ` bugzilla-daemon
@ 2024-07-09 14:24 ` bugzilla-daemon
  2024-07-09 20:49 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-09 14:24 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

Beld Zhang (beldzhang@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |beldzhang@gmail.com

--- Comment #4 from Beld Zhang (beldzhang@gmail.com) ---
After manually modifying the source code:
testing passes, and the crash no longer occurs.

nv 3060ti on dell precision T7920
kernel 6.6.38
qemu 8.2.4

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (3 preceding siblings ...)
  2024-07-09 14:24 ` bugzilla-daemon
@ 2024-07-09 20:49 ` bugzilla-daemon
  2024-07-10  0:48   ` Yi Liu
  2024-07-10  0:44 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-09 20:49 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #5 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
(In reply to Liu, Yi L from comment #3)
> It appears that the count is used without init.. And it does not happen
> with other devices as they have FLR, hence does not trigger the hotreset
> info path. Please try below patch to see if it works.
> 

Patch fixes the problem on my system.

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (4 preceding siblings ...)
  2024-07-09 20:49 ` bugzilla-daemon
@ 2024-07-10  0:44 ` bugzilla-daemon
  2024-07-10  0:46 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-10  0:44 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #6 from Liu, Yi L (yi.l.liu@intel.com) ---
On 2024/7/10 04:49, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219010
> 
> --- Comment #5 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
> (In reply to Liu, Yi L from comment #3)
>> It appears that the count is used without init.. And it does not happen
>> with other devices as they have FLR, hence does not trigger the hotreset
>> info path. Please try below patch to see if it works.
>>
> 
> Patch fixes the problem on my system.
> 

Patch submitted to the mailing list. Thanks, and please let me know whether
it is OK to add your Reported-by and Tested-by tags.

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (5 preceding siblings ...)
  2024-07-10  0:44 ` bugzilla-daemon
@ 2024-07-10  0:46 ` bugzilla-daemon
  2024-07-10 15:47 ` bugzilla-daemon
  2024-07-20 18:30 ` bugzilla-daemon
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-10  0:46 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #7 from Liu, Yi L (yi.l.liu@intel.com) ---
On 2024/7/10 08:48, Yi Liu wrote:
> On 2024/7/10 04:49, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219010
>>
>> --- Comment #5 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
>> (In reply to Liu, Yi L from comment #3)
>>> It appears that the count is used without init.. And it does not happen
>>> with other devices as they have FLR, hence does not trigger the hotreset
>>> info path. Please try below patch to see if it works.
>>>
>>
>> Patch fixes the problem on my system.
>>
> 
> patch submitted to mailing list. Thanks, and feel free to let me know if
> it is proper to add your reported-by, and add your tested-by.
> 

forgot the link. :)

https://lore.kernel.org/kvm/20240710004150.319105-1-yi.l.liu@intel.com/T/#u

* Re: [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-09 20:49 ` bugzilla-daemon
@ 2024-07-10  0:48   ` Yi Liu
  2024-07-10  0:49     ` Yi Liu
  0 siblings, 1 reply; 13+ messages in thread
From: Yi Liu @ 2024-07-10  0:48 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On 2024/7/10 04:49, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219010
> 
> --- Comment #5 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
> (In reply to Liu, Yi L from comment #3)
>> It appears that the count is used without init.. And it does not happen
>> with other devices as they have FLR, hence does not trigger the hotreset
>> info path. Please try below patch to see if it works.
>>
> 
> Patch fixes the problem on my system.
> 

Patch submitted to the mailing list. Thanks, and please let me know whether
it is OK to add your Reported-by and Tested-by tags.

-- 
Regards,
Yi Liu

* Re: [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-10  0:48   ` Yi Liu
@ 2024-07-10  0:49     ` Yi Liu
  0 siblings, 0 replies; 13+ messages in thread
From: Yi Liu @ 2024-07-10  0:49 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On 2024/7/10 08:48, Yi Liu wrote:
> On 2024/7/10 04:49, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219010
>>
>> --- Comment #5 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
>> (In reply to Liu, Yi L from comment #3)
>>> It appears that the count is used without init.. And it does not happen
>>> with other devices as they have FLR, hence does not trigger the hotreset
>>> info path. Please try below patch to see if it works.
>>>
>>
>> Patch fixes the problem on my system.
>>
> 
> patch submitted to mailing list. Thanks, and feel free to let me know if
> it is proper to add your reported-by, and add your tested-by.
> 

forgot the link. :)

https://lore.kernel.org/kvm/20240710004150.319105-1-yi.l.liu@intel.com/T/#u

-- 
Regards,
Yi Liu

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (6 preceding siblings ...)
  2024-07-10  0:46 ` bugzilla-daemon
@ 2024-07-10 15:47 ` bugzilla-daemon
  2024-07-20 18:30 ` bugzilla-daemon
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-10 15:47 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

--- Comment #8 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
(In reply to Liu, Yi L from comment #6)

> patch submitted to mailing list. Thanks, and feel free to let me know if
> it is proper to add your reported-by, and add your tested-by.

It is ok to add me.

* [Bug 219010] [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer"
  2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
                   ` (7 preceding siblings ...)
  2024-07-10 15:47 ` bugzilla-daemon
@ 2024-07-20 18:30 ` bugzilla-daemon
  8 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2024-07-20 18:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=219010

Žilvinas Žaltiena (zaltys@natrix.lt) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #9 from Žilvinas Žaltiena (zaltys@natrix.lt) ---
Fixed in 6.9.10. Closing this.

end of thread

Thread overview: 13+ messages
2024-07-06 16:30 [Bug 219010] New: [REGRESSION][VFIO] kernel 6.9.7 causing qemu crash because of "Collect hot-reset devices to local buffer" bugzilla-daemon
2024-07-06 17:19 ` [Bug 219010] " bugzilla-daemon
2024-07-09 13:48   ` Yi Liu
2024-07-09  8:44 ` bugzilla-daemon
2024-07-09 13:44 ` bugzilla-daemon
2024-07-09 14:24 ` bugzilla-daemon
2024-07-09 20:49 ` bugzilla-daemon
2024-07-10  0:48   ` Yi Liu
2024-07-10  0:49     ` Yi Liu
2024-07-10  0:44 ` bugzilla-daemon
2024-07-10  0:46 ` bugzilla-daemon
2024-07-10 15:47 ` bugzilla-daemon
2024-07-20 18:30 ` bugzilla-daemon
