From: Anthony DeRossi <ajderossi@gmail.com>
To: kvm@vger.kernel.org
Cc: alex.williamson@redhat.com, cohuck@redhat.com, jgg@nvidia.com,
kevin.tian@intel.com, abhsahu@nvidia.com, yishaih@nvidia.com
Subject: [PATCH v6 3/3] vfio/pci: Check the device set open count on reset
Date: Wed, 9 Nov 2022 17:40:27 -0800 [thread overview]
Message-ID: <20221110014027.28780-4-ajderossi@gmail.com> (raw)
In-Reply-To: <20221110014027.28780-1-ajderossi@gmail.com>
vfio_pci_dev_set_needs_reset() inspects the open_count of every device
in the set to determine whether a reset is allowed. The current device
always has open_count == 1 within vfio_pci_core_disable(), effectively
disabling the reset logic. This field is also documented as private in
vfio_device, so it should not be used to determine whether other devices
in the set are open.
Checking for vfio_device_set_open_count() > 1 on the device set fixes
both issues.
After commit 2cd8b14aaa66 ("vfio/pci: Move to the device set
infrastructure"), failure to create a new file for a device would cause
the reset to be skipped due to open_count being decremented after
calling close_device() in the error path.
After commit eadd86f835c6 ("vfio: Remove calls to
vfio_group_add_container_user()"), releasing a device would always skip
the reset due to an ordering change in vfio_device_fops_release().
Failing to reset the device leaves it in an unknown state, potentially
causing errors when it is accessed later or bound to a different driver.
This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
assigned to a Windows guest. After shutting down the guest, unbinding
the device from vfio-pci, and binding the device to amdgpu:
[ 548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
[ 548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[ 548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[ 548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
[ 548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init
Fixes: 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")
Fixes: eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()")
Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
drivers/vfio/pci/vfio_pci_core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index badc9d828cac..e030c2120183 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2488,12 +2488,12 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
struct vfio_pci_core_device *cur;
bool needs_reset = false;
- list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
- /* No VFIO device in the set can have an open device FD */
- if (cur->vdev.open_count)
- return false;
+ /* No other VFIO device in the set can be open. */
+ if (vfio_device_set_open_count(dev_set) > 1)
+ return false;
+
+ list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
needs_reset |= cur->needs_reset;
- }
return needs_reset;
}
--
2.37.4
next prev parent reply other threads:[~2022-11-10 1:41 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-10 1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
2022-11-10 1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
2022-11-10 2:36 ` Yi Liu
2022-11-10 1:40 ` [PATCH v6 2/3] vfio: Export the device set open count Anthony DeRossi
2022-11-10 2:46 ` Yi Liu
2022-11-10 1:40 ` Anthony DeRossi [this message]
2022-11-10 3:03 ` [PATCH v6 3/3] vfio/pci: Check the device set open count on reset Yi Liu
2022-11-10 4:17 ` Alex Williamson
2022-11-10 4:33 ` Yi Liu
2022-11-10 20:16 ` [PATCH v6 0/3] " Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221110014027.28780-4-ajderossi@gmail.com \
--to=ajderossi@gmail.com \
--cc=abhsahu@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=cohuck@redhat.com \
--cc=jgg@nvidia.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=yishaih@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.