qemu-devel.nongnu.org archive mirror
* [Question] SR-IOV VF 'surprise removal' and vfio_reset behavior in pSeries
@ 2021-01-04 13:35 Daniel Henrique Barboza
  2021-01-04 17:55 ` Alex Williamson
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Henrique Barboza @ 2021-01-04 13:35 UTC
  To: qemu-devel@nongnu.org, qemu-ppc@nongnu.org

Hi,

This question came up while I was investigating a Libvirt bug [1], where a user removes
VFs from the host while Libvirt domains are using them, leaving Libvirt in an
inconsistent state. I'm trying to mitigate the effects of this in Libvirt (see [2] if curious),
but QEMU prints some messages to the terminal that, although they appear to be benign,
might be a symptom of something that should be fixed.

On a POWER9 server with a Mellanox MT28800 SR-IOV network card, I have the following IOMMU
groups, where the physical function is in Group 0 and the VFs are allocated from Group 12
onwards:

IOMMU Group 0 0000:01:00.0 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]
(...)
IOMMU Group 12 0000:01:00.2 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function] [15b3:1018]
IOMMU Group 13 0000:01:00.3 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5 Virtual Function] [15b3:1018]
(...)
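The group assignments above can be cross-checked from sysfs: on Linux, each PCI device that belongs to an IOMMU group has an `iommu_group` symlink under `/sys/bus/pci/devices/<address>/` pointing at `/sys/kernel/iommu_groups/<N>`. The sketch below resolves that link. The helper names `iommu_group_from_link()` and `iommu_group_of()` are hypothetical, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: parse the trailing group number of an iommu_group
 * link target such as "../../../kernel/iommu_groups/12".
 * Returns -1 if the target does not end in a non-negative integer. */
static int iommu_group_from_link(const char *target)
{
    const char *slash = strrchr(target, '/');
    const char *num = slash ? slash + 1 : target;
    char *end;
    long id;

    if (*num == '\0') {
        return -1;
    }
    id = strtol(num, &end, 10);
    if (*end != '\0' || id < 0 || id > INT_MAX) {
        return -1;
    }
    return (int)id;
}

/* Hypothetical helper: look up the IOMMU group of a host PCI device given
 * its address, e.g. "0000:01:00.2". Returns -1 on any failure. */
static int iommu_group_of(const char *bdf)
{
    char path[256];
    char target[4096];
    ssize_t len;

    assert(bdf != NULL);
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/iommu_group", bdf);
    /* readlink() does not NUL-terminate, so leave room for the terminator. */
    len = readlink(path, target, sizeof(target) - 1);
    if (len < 0) {
        return -1;  /* no such device, or the device has no IOMMU group */
    }
    target[len] = '\0';
    return iommu_group_from_link(target);
}
```

On the host described here, `iommu_group_of("0000:01:00.2")` would be expected to return 12 and `iommu_group_of("0000:01:00.0")` to return 0.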


Creating a guest with the Group 12 VF and then removing the VF from the host with

echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs


makes the guest remove the VF card, but QEMU logs a warning/error message:

"qemu-system-ppc64: vfio: Cannot reset device 0000:01:00.2, depends on group 0 which is not owned."


I found this message confusing: the VF occupies IOMMU Group 12, yet the message claims
that the reset isn't possible because Group 0 isn't owned by the QEMU process.
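For a reproducer that doesn't go through the shell, the removal trigger used above (`echo 0 > .../sriov_numvfs`) can also be issued from C. This is a minimal sketch: `write_sysfs_attr()` and `sriov_disable_all()` are hypothetical helper names, writing to `sriov_numvfs` requires root, and writing 0 removes every VF of the physical function, not only the one assigned to the guest.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Write 'value' to a sysfs attribute, like `echo value > path`.
 * Returns 0 on success or a negative errno value. stdio buffers the data,
 * so an error from the sysfs store may only surface at fclose(). */
static int write_sysfs_attr(const char *path, const char *value)
{
    FILE *f;
    int err = 0;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (f == NULL) {
        return -errno;
    }
    if (fputs(value, f) == EOF) {
        err = -errno;
    }
    if (fclose(f) == EOF && err == 0) {
        err = -errno;
    }
    return err;
}

/* Hypothetical helper: disable all VFs of the PF at 'pf_bdf'
 * (e.g. "0000:01:00.0"), mirroring the echo command. */
static int sriov_disable_all(const char *pf_bdf)
{
    char path[256];

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/sriov_numvfs", pf_bdf);
    return write_sysfs_attr(path, "0\n");
}
```

Calling `sriov_disable_all("0000:01:00.0")` as root should have the same effect as the shell command.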

Digging a bit further: the hot unplug is triggered by the card's power-off state, which goes through
pSeries internals and eventually reaches spapr_pci_unplug() in hw/ppc/spapr_pci.c. The function body contains:

-------
     /* some version guests do not wait for completion of a device
      * cleanup (generally done asynchronously by the kernel) before
      * signaling to QEMU that the device is safe, but instead sleep
      * for some 'safe' period of time. unfortunately on a busy host
      * this sleep isn't guaranteed to be long enough, resulting in
      * bad things like IRQ lines being left asserted during final
      * device removal. to deal with this we call reset just prior
      * to finalizing the device, which will put the device back into
      * an 'idle' state, as the device cleanup code expects.
      */
     pci_device_reset(PCI_DEVICE(plugged_dev));
-------
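The reasoning in that comment can be illustrated with a toy model. The types and functions below are hypothetical and are not QEMU's `PCIDevice` API: if a device were finalized without a reset, an interrupt line the guest left asserted would stay asserted, while resetting first returns the device to idle before removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a hotplugged PCI device; not QEMU's types. */
struct toy_pci_dev {
    bool irq_asserted;   /* guest-visible interrupt line state */
    bool realized;       /* device still present in the machine */
};

/* Model of a device reset: return the device to its idle state. */
static void toy_reset(struct toy_pci_dev *d)
{
    d->irq_asserted = false;
}

/* Model of the unplug order described in the comment: reset first,
 * then finalize, so no interrupt line is left asserted at removal. */
static void toy_unplug(struct toy_pci_dev *d)
{
    toy_reset(d);
    assert(!d->irq_asserted);
    d->realized = false;
}
```

In this model the reset is what guarantees the idle state; the question below is whether that guarantee matters for a VF the host is removing anyway.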

My first question is right at this point: do we need a PCI reset for a VF removal?  I'm not sure that
IRQ lines left asserted are a concern for a device that the kernel is about to remove.

Tracing the warning message back to its origin leads to vfio_pci_hot_reset() in hw/vfio/pci.c.
The VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl() returns all VFs of the card, plus
the physical function, in the vfio_pci_hot_reset_info struct. Further down, where the function checks
whether every IOMMU group required for the reset belongs to the process, the VF reset fails because QEMU
does not own Group 0:

-------
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
         error_report("vfio: hot reset info failed: %m");
         goto out_single;
     }

(...)

         QLIST_FOREACH(group, &vfio_group_list, next) {
             if (group->groupid == devices[i].group_id) {
                 break;
             }
         }

         if (!group) {
             if (!vdev->has_pm_reset) {
                 error_report("vfio: Cannot reset device %s, "
                              "depends on group %d which is not owned.",
                              vdev->vbasedev.name, devices[i].group_id);
             }
             ret = -EPERM;
             goto out;
         }
-------
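The excerpt elides how the `info` buffer is sized. With the Linux VFIO UAPI, a caller can issue `VFIO_DEVICE_GET_PCI_HOT_RESET_INFO` twice: first with `argsz` covering only the header, which fails with `ENOSPC` but fills in `count`, then again with room for `count` entries of `struct vfio_pci_dependent_device`. The standalone sketch below assumes `<linux/vfio.h>` is available; `get_hot_reset_info()` is a hypothetical name and this is not QEMU's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Hypothetical helper: fetch the list of devices affected by a hot reset of
 * the VFIO device open on 'fd'. On success returns 0 and stores a malloc'ed
 * buffer in *out (the caller frees it); on failure returns -errno. */
static int get_hot_reset_info(int fd, struct vfio_pci_hot_reset_info **out)
{
    struct vfio_pci_hot_reset_info *info, *tmp;
    size_t sz = sizeof(*info);
    int err;

    assert(out != NULL);
    info = calloc(1, sz);
    if (info == NULL) {
        return -ENOMEM;
    }
    info->argsz = sz;

    /* First call: header only. The kernel reports the number of dependent
     * devices in 'count' and fails with ENOSPC if they do not fit. */
    if (ioctl(fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info) && errno != ENOSPC) {
        err = -errno;
        free(info);
        return err;
    }

    /* Second call: grow the buffer to hold 'count' devices and ask again. */
    sz += info->count * sizeof(info->devices[0]);
    tmp = realloc(info, sz);
    if (tmp == NULL) {
        free(info);
        return -ENOMEM;
    }
    info = tmp;
    info->argsz = sz;

    if (ioctl(fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info)) {
        err = -errno;
        free(info);
        return err;
    }

    *out = info;
    return 0;
}
```

Each `devices[i].group_id` in the returned buffer is what the `QLIST_FOREACH` loop in the excerpt compares against QEMU's list of owned groups; per the error message, on this host the list includes Group 0.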

This message is unclear to me because the VF is in Group 12, yet the code apparently requires
ownership of every IOMMU group related to the card before it allows the reset.
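The ownership rule reduces to a few lines. This is a simplified model of the loop in the excerpt, with plain integer arrays standing in for QEMU's `vfio_group_list` and the `devices[]` array; `first_unowned_group()` is a hypothetical name. With the groups from the listing above (VF in Group 12, PF in Group 0), owning only Group 12 is not enough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if 'group' appears in the list of groups the process owns. */
static bool group_owned(int group, const int *owned, size_t n_owned)
{
    for (size_t i = 0; i < n_owned; i++) {
        if (owned[i] == group) {
            return true;
        }
    }
    return false;
}

/* Simplified model of the check: every group reported by the hot reset info
 * must be owned. Returns -1 if the reset may proceed, otherwise the first
 * group id that blocks it (the one the error message would name). */
static int first_unowned_group(const int *required, size_t n_required,
                               const int *owned, size_t n_owned)
{
    assert(required != NULL || n_required == 0);
    for (size_t i = 0; i < n_required; i++) {
        if (!group_owned(required[i], owned, n_owned)) {
            return required[i];
        }
    }
    return -1;
}
```

With `required = {0, 12, 13}` (abbreviated) and `owned = {12}`, it returns 0, the same group named in the QEMU message.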

My second question: is this intended?  If not, then something is misbehaving (perhaps the card driver,
mlx5_core) and reporting wrong information to that VFIO ioctl(). If this reset behavior is intended, then I
could add code to spapr_pci_unplug() that skips resetting the VF in this particular case to avoid the
error message (assuming that we really can live without a reset in this case).
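If the reset were skipped only for VFs, one host-side way to recognize a VF is the `physfn` symlink that Linux creates in the sysfs directory of each SR-IOV virtual function; physical functions and ordinary devices have none. A minimal sketch with hypothetical helper names; where such a check would live in QEMU is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if the sysfs device directory 'devdir' has a
 * 'physfn' entry (an SR-IOV VF), 0 if not, -1 if the path is too long. */
static int is_sriov_vf_at(const char *devdir)
{
    char path[512];
    int n;

    assert(devdir != NULL);
    n = snprintf(path, sizeof(path), "%s/physfn", devdir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return -1;
    }
    return access(path, F_OK) == 0 ? 1 : 0;
}

/* Convenience wrapper for a host PCI address such as "0000:01:00.2". */
static int is_sriov_vf(const char *bdf)
{
    char dir[256];

    snprintf(dir, sizeof(dir), "/sys/bus/pci/devices/%s", bdf);
    return is_sriov_vf_at(dir);
}
```

On the host above, `is_sriov_vf("0000:01:00.2")` would be expected to return 1 and `is_sriov_vf("0000:01:00.0")` to return 0.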


Thanks,


DHB


[1] https://gitlab.com/libvirt/libvirt/-/issues/72
[2] https://www.redhat.com/archives/libvir-list/2021-January/msg00028.html

