[PATCH v1 0/4] Support dynamic MSI-X allocation

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 0/4] Support dynamic MSI-X allocation
@ 2023-08-22  7:29 Jing Liu
  2023-08-22  7:29 ` [PATCH v1 1/4] vfio/pci: detect the support of " Jing Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Jing Liu @ 2023-08-22  7:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
	jing2.liu, jing2.liu

Changes since RFC v1:
- RFC v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg978637.html
- Revise the comments. (Alex)
- Report error of getting irq info and remove the trace of failure
  case. (Alex, Cédric)
- Only store dynamic allocation flag as a bool type and test
  accordingly. (Alex)
- Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
- Change the condition logic in vfio_msix_vector_do_use() that moving
  the defer_kvm_irq_routing test out and create a common place to update
  nr_vectors. (Alex)
- Consolidate the way of MSI-X enabling during device initialization and
  interrupt restoring that uses fd = -1 trick. Create a function doing
  that. (Alex)

Before kernel v6.5, dynamic allocation of MSI-X interrupts was not
supported. Qemu therefore when allocating a new interrupt, should first
release all previously allocated interrupts (including disable of MSI-X)
and re-allocate all interrupts that includes the new one.

The kernel series [1] adds the support of dynamic MSI-X allocation to
vfio-pci and uses the existing flag VFIO_IRQ_INFO_NORESIZE to guide user
space, that when dynamic MSI-X is supported the flag is cleared.

This series makes the behavior for VFIO PCI devices when dynamic MSI-X
allocation is supported. When guest unmasks an interrupt, Qemu can
directly allocate an interrupt on host for this and has nothing to do
with the previously allocated ones. Therefore, host only allocates
interrupts for those unmasked (enabled) interrupts inside guest when
dynamic MSI-X allocation is supported by device.

When guests enable MSI-X with all of the vectors masked, Qemu need match
the state to enable MSI-X with no vector enabled. During migration
restore, Qemu also need enable MSI-X first in dynamic allocation mode,
to avoid the guest unused vectors being allocated on host. To
consolidate them, we use vector 0 with an invalid fd to get MSI-X
enabled and create a common function for this. This is cleaner than
setting userspace triggering and immediately release.

Any feedback is appreciated.

Jing

[1] https://lwn.net/Articles/931679/

Jing Liu (4):
  vfio/pci: detect the support of dynamic MSI-X allocation
  vfio/pci: enable vector on dynamic MSI-X allocation
  vfio/pci: use an invalid fd to enable MSI-X
  vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation

 hw/vfio/pci.c        | 126 +++++++++++++++++++++++++++++++++----------
 hw/vfio/pci.h        |   1 +
 hw/vfio/trace-events |   2 +-
 3 files changed, 101 insertions(+), 28 deletions(-)

-- 
2.27.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v1 1/4] vfio/pci: detect the support of dynamic MSI-X allocation
  2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
@ 2023-08-22  7:29 ` Jing Liu
  2023-08-29 13:33   ` Cédric Le Goater
  2023-08-22  7:29 ` [PATCH v1 2/4] vfio/pci: enable vector on " Jing Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jing Liu @ 2023-08-22  7:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
	jing2.liu, jing2.liu

Kernel provides the guidance of dynamic MSI-X allocation support of
passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
guide user space.

Fetch the flags from host to determine if dynamic MSI-X allocation is
supported.

Originally-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
Changes since RFC v1:
- Filter the dynamic MSI-X allocation flag and store as a bool type.
  (Alex)
- Move the detection to vfio_msix_early_setup(). (Alex)
- Report error of getting irq info and remove the trace of failure
  case. (Alex, Cédric)
---
 hw/vfio/pci.c        | 15 +++++++++++++--
 hw/vfio/pci.h        |  1 +
 hw/vfio/trace-events |  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a205c6b1130f..8a3b34f3c196 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1493,7 +1493,9 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     uint8_t pos;
     uint16_t ctrl;
     uint32_t table, pba;
-    int fd = vdev->vbasedev.fd;
+    int ret, fd = vdev->vbasedev.fd;
+    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
+                                      .index = VFIO_PCI_MSIX_IRQ_INDEX };
     VFIOMSIXInfo *msix;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
@@ -1530,6 +1532,14 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
     msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
 
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
+        return;
+    }
+
+    msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
+
     /*
      * Test the size of the pba_offset variable and catch if it extends outside
      * of the specified BAR. If it is the case, we need to apply a hardware
@@ -1562,7 +1572,8 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     }
 
     trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
-                                msix->table_offset, msix->entries);
+                                msix->table_offset, msix->entries,
+                                msix->noresize);
     vdev->msix = msix;
 
     vfio_pci_fixup_msix_region(vdev);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index a2771b9ff3cc..0717574d79e9 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
     uint32_t table_offset;
     uint32_t pba_offset;
     unsigned long *pending;
+    bool noresize;
 } VFIOMSIXInfo;
 
 #define TYPE_VFIO_PCI "vfio-pci"
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ee7509e68e4f..6de5d9ba8e46 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -27,7 +27,7 @@ vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"
 vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) 0x%x"
 vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
 vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
-vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
+vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries, bool noresize) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d, noresize %d"
 vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
 vfio_check_pm_reset(const char *name) "%s Supports PM reset"
 vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 2/4] vfio/pci: enable vector on dynamic MSI-X allocation
  2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
  2023-08-22  7:29 ` [PATCH v1 1/4] vfio/pci: detect the support of " Jing Liu
@ 2023-08-22  7:29 ` Jing Liu
  2023-08-22  7:29 ` [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Jing Liu @ 2023-08-22  7:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
	jing2.liu, jing2.liu

The vector_use callback is used to enable vector that is unmasked in
guest. The kernel used to only support static MSI-X allocation. When
allocating a new interrupt using "static MSI-X allocation" kernels,
Qemu first disables all previously allocated vectors and then
re-allocates all including the new one. The nr_vectors of VFIOPCIDevice
indicates that all vectors from 0 to nr_vectors are allocated (and may
be enabled), which is used to to loop all the possibly used vectors
When, e.g., disabling MSI-X interrupts.

Extend the vector_use function to support dynamic MSI-X allocation when
host supports the capability. Qemu therefore can individually allocate
and enable a new interrupt without affecting others or causing interrupts
lost during runtime.

Utilize nr_vectors to calculate the upper bound of enabled vectors in
dynamic MSI-X allocation mode since looping all msix_entries_nr is not
efficient and unnecessary.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since RFC v1:
- Test vdev->msix->noresize to identify the allocation mode. (Alex)
- Move defer_kvm_irq_routing test out and update nr_vectors in a
  common place before vfio_enable_vectors(). (Alex)
- Revise the comments. (Alex)
---
 hw/vfio/pci.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8a3b34f3c196..31f36d68bb19 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -470,6 +470,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     VFIOPCIDevice *vdev = VFIO_PCI(pdev);
     VFIOMSIVector *vector;
     int ret;
+    int old_nr_vecs = vdev->nr_vectors;
 
     trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
 
@@ -512,33 +513,42 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
     }
 
     /*
-     * We don't want to have the host allocate all possible MSI vectors
-     * for a device if they're not in use, so we shutdown and incrementally
-     * increase them as needed.
+     * When dynamic allocation is not supported, we don't want to have the
+     * host allocate all possible MSI vectors for a device if they're not
+     * in use, so we shutdown and incrementally increase them as needed.
+     * nr_vectors represents the total number of vectors allocated.
+     *
+     * When dynamic allocation is supported, let the host only allocate
+     * and enable a vector when it is in use in guest. nr_vectors represents
+     * the upper bound of vectors being enabled (but not all of the ranges
+     * is allocated or enabled).
      */
     if (vdev->nr_vectors < nr + 1) {
         vdev->nr_vectors = nr + 1;
-        if (!vdev->defer_kvm_irq_routing) {
+    }
+
+    if (!vdev->defer_kvm_irq_routing) {
+        if (vdev->msix->noresize && (old_nr_vecs < nr + 1)) {
             vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
             ret = vfio_enable_vectors(vdev, true);
             if (ret) {
                 error_report("vfio: failed to enable vectors, %d", ret);
             }
-        }
-    } else {
-        Error *err = NULL;
-        int32_t fd;
-
-        if (vector->virq >= 0) {
-            fd = event_notifier_get_fd(&vector->kvm_interrupt);
         } else {
-            fd = event_notifier_get_fd(&vector->interrupt);
-        }
+            Error *err = NULL;
+            int32_t fd;
 
-        if (vfio_set_irq_signaling(&vdev->vbasedev,
-                                     VFIO_PCI_MSIX_IRQ_INDEX, nr,
-                                     VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
-            error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+            if (vector->virq >= 0) {
+                fd = event_notifier_get_fd(&vector->kvm_interrupt);
+            } else {
+                fd = event_notifier_get_fd(&vector->interrupt);
+            }
+
+            if (vfio_set_irq_signaling(&vdev->vbasedev,
+                                       VFIO_PCI_MSIX_IRQ_INDEX, nr,
+                                       VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+                error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+            }
         }
     }
 
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X
  2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
  2023-08-22  7:29 ` [PATCH v1 1/4] vfio/pci: detect the support of " Jing Liu
  2023-08-22  7:29 ` [PATCH v1 2/4] vfio/pci: enable vector on " Jing Liu
@ 2023-08-22  7:29 ` Jing Liu
  2023-08-29 14:04   ` Cédric Le Goater
  2023-08-22  7:29 ` [PATCH v1 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation Jing Liu
  2023-09-15  7:40 ` [PATCH v1 0/4] Support dynamic MSI-X allocation Liu, Jing2
  4 siblings, 1 reply; 14+ messages in thread
From: Jing Liu @ 2023-08-22  7:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
	jing2.liu, jing2.liu

Guests typically enable MSI-X with all of the vectors masked in the MSI-X
vector table. To match the guest state of device, Qemu enables MSI-X by
enabling vector 0 with userspace triggering and immediately release.
However the release function actually does not release it due to already
using userspace mode.

It is no need to enable triggering on host and rely on the mask bit to
avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough
to get MSI-X enabled.

After dynamic MSI-X allocation is supported, the interrupt restoring
also need use such way to enable MSI-X, therefore, create a function
for that.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
Changes since RFC v1:
- A new patch. Use an invalid fd to get MSI-X enabled instead of using
  userspace triggering. (Alex)
---
 hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 31f36d68bb19..e24c21241a0c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -369,6 +369,39 @@ static void vfio_msi_interrupt(void *opaque)
     notify(&vdev->pdev, nr);
 }
 
+/*
+ * Get MSI-X enabled, but no vector enabled, by setting vector 0 with an invalid
+ * fd to kernel.
+ */
+static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
+{
+    struct vfio_irq_set *irq_set;
+    int ret = 0, argsz;
+    int32_t *fd;
+
+    argsz = sizeof(*irq_set) + sizeof(*fd);
+
+    irq_set = g_malloc0(argsz);
+    irq_set->argsz = argsz;
+    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+                     VFIO_IRQ_SET_ACTION_TRIGGER;
+    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+    irq_set->start = 0;
+    irq_set->count = 1;
+    fd = (int32_t *)&irq_set->data;
+    *fd = -1;
+
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    if (ret) {
+        error_report("vfio: failed to enable MSI-X with vector 0 trick, %d",
+                     ret);
+    }
+
+    g_free(irq_set);
+
+    return ret;
+}
+
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 {
     struct vfio_irq_set *irq_set;
@@ -618,6 +651,8 @@ static void vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
 
 static void vfio_msix_enable(VFIOPCIDevice *vdev)
 {
+    int ret;
+
     vfio_disable_interrupts(vdev);
 
     vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
@@ -640,8 +675,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
     vfio_commit_kvm_msi_virq_batch(vdev);
 
     if (vdev->nr_vectors) {
-        int ret;
-
         ret = vfio_enable_vectors(vdev, true);
         if (ret) {
             error_report("vfio: failed to enable vectors, %d", ret);
@@ -655,13 +688,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
          * MSI-X capability, but leaves the vector table masked.  We therefore
          * can't rely on a vector_use callback (from request_irq() in the guest)
          * to switch the physical device into MSI-X mode because that may come a
-         * long time after pci_enable_msix().  This code enables vector 0 with
-         * triggering to userspace, then immediately release the vector, leaving
-         * the physical device with no vectors enabled, but MSI-X enabled, just
-         * like the guest view.
+         * long time after pci_enable_msix().  This code sets vector 0 with an
+         * invalid fd to make the physical device MSI-X enabled, but with no
+         * vectors enabled, just like the guest view.
          */
-        vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
-        vfio_msix_vector_release(&vdev->pdev, 0);
+        ret = vfio_enable_msix_no_vec(vdev);
+        if (ret) {
+            error_report("vfio: failed to enable MSI-X, %d", ret);
+        }
     }
 
     trace_vfio_msix_enable(vdev->vbasedev.name);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
  2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
                   ` (2 preceding siblings ...)
  2023-08-22  7:29 ` [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
@ 2023-08-22  7:29 ` Jing Liu
  2023-09-15  7:40 ` [PATCH v1 0/4] Support dynamic MSI-X allocation Liu, Jing2
  4 siblings, 0 replies; 14+ messages in thread
From: Jing Liu @ 2023-08-22  7:29 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
	jing2.liu, jing2.liu

During migration restoring, vfio_enable_vectors() is called to restore
enabling MSI-X interrupts for assigned devices. It sets the range from
0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in
guest. During the MSI-X enabling, all the vectors within the range are
allocated according to the VFIO_DEVICE_SET_IRQS ioctl.

When dynamic MSI-X allocation is supported, we only want the guest
unmasked vectors being allocated and enabled. Use vector 0 with an
invalid fd to get MSI-X enabled, after that, all the vectors can be
allocated in need.

Signed-off-by: Jing Liu <jing2.liu@intel.com>
---
Changes since RFC v1:
- Revise the comments. (Alex)
- Call the new helper function in previous patch to enable MSI-X. (Alex)
---
 hw/vfio/pci.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e24c21241a0c..3c37d036e727 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -408,6 +408,23 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
     int ret = 0, i, argsz;
     int32_t *fds;
 
+    /*
+     * If dynamic MSI-X allocation is supported, the vectors to be allocated
+     * and enabled can be scattered. Before kernel enabling MSI-X, setting
+     * nr_vectors causes all these vectors to be allocated on host.
+     *
+     * To keep allocation as needed, use vector 0 with an invalid fd to get
+     * MSI-X enabled first, then set vectors with a potentially sparse set of
+     * eventfds to enable interrupts only when enabled in guest.
+     */
+    if (msix && !vdev->msix->noresize) {
+        ret = vfio_enable_msix_no_vec(vdev);
+
+        if (ret) {
+            return ret;
+        }
+    }
+
     argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
 
     irq_set = g_malloc0(argsz);
-- 
2.27.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/4] vfio/pci: detect the support of dynamic MSI-X allocation
  2023-08-22  7:29 ` [PATCH v1 1/4] vfio/pci: detect the support of " Jing Liu
@ 2023-08-29 13:33   ` Cédric Le Goater
  2023-08-30  7:21     ` Liu, Jing2
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2023-08-29 13:33 UTC (permalink / raw)
  To: Jing Liu, qemu-devel
  Cc: alex.williamson, pbonzini, kevin.tian, reinette.chatre, jing2.liu

Hello Jing,

On 8/22/23 09:29, Jing Liu wrote:
> Kernel provides the guidance of dynamic MSI-X allocation support of
> passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
> guide user space.
> 
> Fetch the flags from host to determine if dynamic MSI-X allocation is
> supported.
> 
> Originally-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> ---
> Changes since RFC v1:
> - Filter the dynamic MSI-X allocation flag and store as a bool type.
>    (Alex)
> - Move the detection to vfio_msix_early_setup(). (Alex)
> - Report error of getting irq info and remove the trace of failure
>    case. (Alex, Cédric)
> ---
>   hw/vfio/pci.c        | 15 +++++++++++++--
>   hw/vfio/pci.h        |  1 +
>   hw/vfio/trace-events |  2 +-
>   3 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index a205c6b1130f..8a3b34f3c196 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1493,7 +1493,9 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       uint8_t pos;
>       uint16_t ctrl;
>       uint32_t table, pba;
> -    int fd = vdev->vbasedev.fd;
> +    int ret, fd = vdev->vbasedev.fd;
> +    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> +                                      .index = VFIO_PCI_MSIX_IRQ_INDEX };
>       VFIOMSIXInfo *msix;
>   
>       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
> @@ -1530,6 +1532,14 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
>       msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
>   
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "failed to get MSI-X irq info");

Missing :
             g_free(msix);


With this fixed,

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


  
> +        return;
> +    }
> +
> +    msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
> +
>       /*
>        * Test the size of the pba_offset variable and catch if it extends outside
>        * of the specified BAR. If it is the case, we need to apply a hardware
> @@ -1562,7 +1572,8 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       }
>   
>       trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
> -                                msix->table_offset, msix->entries);
> +                                msix->table_offset, msix->entries,
> +                                msix->noresize);
>       vdev->msix = msix;
>   
>       vfio_pci_fixup_msix_region(vdev);
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index a2771b9ff3cc..0717574d79e9 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
>       uint32_t table_offset;
>       uint32_t pba_offset;
>       unsigned long *pending;
> +    bool noresize;
>   } VFIOMSIXInfo;
>   
>   #define TYPE_VFIO_PCI "vfio-pci"
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index ee7509e68e4f..6de5d9ba8e46 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -27,7 +27,7 @@ vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"
>   vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) 0x%x"
>   vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
>   vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
> -vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
> +vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries, bool noresize) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d, noresize %d"
>   vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
>   vfio_check_pm_reset(const char *name) "%s Supports PM reset"
>   vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X
  2023-08-22  7:29 ` [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
@ 2023-08-29 14:04   ` Cédric Le Goater
  2023-08-30 10:03     ` Liu, Jing2
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2023-08-29 14:04 UTC (permalink / raw)
  To: Jing Liu, qemu-devel
  Cc: alex.williamson, pbonzini, kevin.tian, reinette.chatre, jing2.liu

On 8/22/23 09:29, Jing Liu wrote:
> Guests typically enable MSI-X with all of the vectors masked in the MSI-X
> vector table. To match the guest state of device, Qemu enables MSI-X by

QEMU is preferred to Qemu.

> enabling vector 0 with userspace triggering and immediately release.
> However the release function actually does not release it due to already
> using userspace mode.
> 
> It is no need to enable triggering on host and rely on the mask bit to
> avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough
> to get MSI-X enabled.
> 
> After dynamic MSI-X allocation is supported, the interrupt restoring
> also need use such way to enable MSI-X, therefore, create a function
> for that.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Jing Liu <jing2.liu@intel.com>
> ---
> Changes since RFC v1:
> - A new patch. Use an invalid fd to get MSI-X enabled instead of using
>    userspace triggering. (Alex)
> ---
>   hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 42 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 31f36d68bb19..e24c21241a0c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -369,6 +369,39 @@ static void vfio_msi_interrupt(void *opaque)
>       notify(&vdev->pdev, nr);
>   }
>   
> +/*
> + * Get MSI-X enabled, but no vector enabled, by setting vector 0 with an invalid
> + * fd to kernel.
> + */
> +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> +    struct vfio_irq_set *irq_set;

This could be a 'g_autofree' variable.

> +    int ret = 0, argsz;
> +    int32_t *fd;
> +
> +    argsz = sizeof(*irq_set) + sizeof(*fd);
> +
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    fd = (int32_t *)&irq_set->data;
> +    *fd = -1;
> +
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    if (ret) {
> +        error_report("vfio: failed to enable MSI-X with vector 0 trick, %d",
> +                     ret);

The above message seems redundant. I would simply return 'ret' and let
the caller report the error. Same as vfio_enable_vectors().

Thanks,

C.


> +    }
> +
> +    g_free(irq_set);
> +
> +    return ret;
> +}
> +
>   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>   {
>       struct vfio_irq_set *irq_set;
> @@ -618,6 +651,8 @@ static void vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
>   
>   static void vfio_msix_enable(VFIOPCIDevice *vdev)
>   {
> +    int ret;
> +
>       vfio_disable_interrupts(vdev);
>   
>       vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> @@ -640,8 +675,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>       vfio_commit_kvm_msi_virq_batch(vdev);
>   
>       if (vdev->nr_vectors) {
> -        int ret;
> -
>           ret = vfio_enable_vectors(vdev, true);
>           if (ret) {
>               error_report("vfio: failed to enable vectors, %d", ret);
> @@ -655,13 +688,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>            * MSI-X capability, but leaves the vector table masked.  We therefore
>            * can't rely on a vector_use callback (from request_irq() in the guest)
>            * to switch the physical device into MSI-X mode because that may come a
> -         * long time after pci_enable_msix().  This code enables vector 0 with
> -         * triggering to userspace, then immediately release the vector, leaving
> -         * the physical device with no vectors enabled, but MSI-X enabled, just
> -         * like the guest view.
> +         * long time after pci_enable_msix().  This code sets vector 0 with an
> +         * invalid fd to make the physical device MSI-X enabled, but with no
> +         * vectors enabled, just like the guest view.
>            */
> -        vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
> -        vfio_msix_vector_release(&vdev->pdev, 0);
> +        ret = vfio_enable_msix_no_vec(vdev);
> +        if (ret) {
> +            error_report("vfio: failed to enable MSI-X, %d", ret);
> +        }
>       }
>   
>       trace_vfio_msix_enable(vdev->vbasedev.name);



^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v1 1/4] vfio/pci: detect the support of dynamic MSI-X allocation
  2023-08-29 13:33   ` Cédric Le Goater
@ 2023-08-30  7:21     ` Liu, Jing2
  0 siblings, 0 replies; 14+ messages in thread
From: Liu, Jing2 @ 2023-08-30  7:21 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

Hi Cédric,

Thank you for your reviewing.

On 8/29/2023 9:33 PM, Cédric Le Goater wrote:
> Hello Jing,
> 
> On 8/22/23 09:29, Jing Liu wrote:
> > Kernel provides the guidance of dynamic MSI-X allocation support of
> > passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
> > guide user space.
> >
> > Fetch the flags from host to determine if dynamic MSI-X allocation is
> > supported.
> >
> > Originally-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > ---
> > Changes since RFC v1:
> > - Filter the dynamic MSI-X allocation flag and store as a bool type.
> >    (Alex)
> > - Move the detection to vfio_msix_early_setup(). (Alex)
> > - Report error of getting irq info and remove the trace of failure
> >    case. (Alex, Cédric)
> > ---
> >   hw/vfio/pci.c        | 15 +++++++++++++--
> >   hw/vfio/pci.h        |  1 +
> >   hw/vfio/trace-events |  2 +-
> >   3 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > a205c6b1130f..8a3b34f3c196 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1493,7 +1493,9 @@ static void vfio_msix_early_setup(VFIOPCIDevice
> *vdev, Error **errp)
> >       uint8_t pos;
> >       uint16_t ctrl;
> >       uint32_t table, pba;
> > -    int fd = vdev->vbasedev.fd;
> > +    int ret, fd = vdev->vbasedev.fd;
> > +    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> > +                                      .index =
> > + VFIO_PCI_MSIX_IRQ_INDEX };
> >       VFIOMSIXInfo *msix;
> >
> >       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX); @@
> > -1530,6 +1532,14 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev,
> Error **errp)
> >       msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> >       msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> >
> > +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > +    if (ret < 0) {
> > +        error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
> 
> Missing :
>              g_free(msix);
> 
Oh, yes. 

> 
> With this fixed,
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 
Thank you. I'll apply it in next version with the fix. 

Jing

> Thanks,
> 
> C.
> 
> 
> 
> > +        return;
> > +    }
> > +
> > +    msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
> > +
> >       /*
> >        * Test the size of the pba_offset variable and catch if it extends outside
> >        * of the specified BAR. If it is the case, we need to apply a
> > hardware @@ -1562,7 +1572,8 @@ static void
> vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
> >       }
> >
> >       trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
> > -                                msix->table_offset, msix->entries);
> > +                                msix->table_offset, msix->entries,
> > +                                msix->noresize);
> >       vdev->msix = msix;
> >
> >       vfio_pci_fixup_msix_region(vdev); diff --git a/hw/vfio/pci.h
> > b/hw/vfio/pci.h index a2771b9ff3cc..0717574d79e9 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
> >       uint32_t table_offset;
> >       uint32_t pba_offset;
> >       unsigned long *pending;
> > +    bool noresize;
> >   } VFIOMSIXInfo;
> >
> >   #define TYPE_VFIO_PCI "vfio-pci"
> > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index
> > ee7509e68e4f..6de5d9ba8e46 100644
> > --- a/hw/vfio/trace-events
> > +++ b/hw/vfio/trace-events
> > @@ -27,7 +27,7 @@ vfio_vga_read(uint64_t addr, int size, uint64_t data) "
> (0x%"PRIx64", %d) = 0x%"
> >   vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s,
> @0x%x, len=0x%x) 0x%x"
> >   vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s,
> @0x%x, 0x%x, len=0x%x)"
> >   vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
> > -vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int
> entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
> > +vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int
> entries, bool noresize) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x,
> entries %d, noresize %d"
> >   vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
> >   vfio_check_pm_reset(const char *name) "%s Supports PM reset"
> >   vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X
  2023-08-29 14:04   ` Cédric Le Goater
@ 2023-08-30 10:03     ` Liu, Jing2
  2023-08-30 10:48       ` Igor Mammedov
  0 siblings, 1 reply; 14+ messages in thread
From: Liu, Jing2 @ 2023-08-30 10:03 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

Hi Cédric，

On 8/29/2023 10:04 PM, Cédric Le Goater wrote:
> On 8/22/23 09:29, Jing Liu wrote:
> > Guests typically enable MSI-X with all of the vectors masked in the
> > MSI-X vector table. To match the guest state of device, Qemu enables
> > MSI-X by
> 
> QEMU is preferred to Qemu.
Got it. 

> 
> > enabling vector 0 with userspace triggering and immediately release.
> > However the release function actually does not release it due to
> > already using userspace mode.
> >
> > It is no need to enable triggering on host and rely on the mask bit to
> > avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough
> > to get MSI-X enabled.
> >
> > After dynamic MSI-X allocation is supported, the interrupt restoring
> > also need use such way to enable MSI-X, therefore, create a function
> > for that.
> >
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > ---
> > Changes since RFC v1:
> > - A new patch. Use an invalid fd to get MSI-X enabled instead of using
> >    userspace triggering. (Alex)
> > ---
> >   hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
> >   1 file changed, 42 insertions(+), 8 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 31f36d68bb19..e24c21241a0c 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -369,6 +369,39 @@ static void vfio_msi_interrupt(void *opaque)
> >       notify(&vdev->pdev, nr);
> >   }
> >
> > +/*
> > + * Get MSI-X enabled, but no vector enabled, by setting vector 0 with
> > +an invalid
> > + * fd to kernel.
> > + */
> > +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> > +    struct vfio_irq_set *irq_set;
> 
> This could be a 'g_autofree' variable.

Thanks for pointing this to me. I just realized QEMU doc recommends 
g_autofree/g_autoptr to do automatic memory deallocation.

I will replace to g_autofree style in next version.

I have a question for a specific example (not related to this patch), and
I failed to find an example in current QEMU code to understand it.
If one g_autofree structure includes a pointer that we also allocate
space for the inside pointer, could g_autofree release the inside space?

struct dummy1 {
    int data;
    struct *p;
}
struct p {
    char member[];
}
void func() {
    g_autofree struct *dummy1 = NULL;

    dummy1 = g_malloc();
    dummy1->p = g_malloc();
    ...
    return;	// is this correct or need g_free(p)?
}

> 
> > +    int ret = 0, argsz;
> > +    int32_t *fd;
> > +
> > +    argsz = sizeof(*irq_set) + sizeof(*fd);
> > +
> > +    irq_set = g_malloc0(argsz);
> > +    irq_set->argsz = argsz;
> > +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> > +    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > +    irq_set->start = 0;
> > +    irq_set->count = 1;
> > +    fd = (int32_t *)&irq_set->data;
> > +    *fd = -1;
> > +
> > +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > +    if (ret) {
> > +        error_report("vfio: failed to enable MSI-X with vector 0 trick, %d",
> > +                     ret);
> 
> The above message seems redundant. I would simply return 'ret' and let the
> caller report the error. Same as vfio_enable_vectors().

Understood. Will change.

Thanks,
Jing

> Thanks,
> 
> C.
> 
> 
> > +    }
> > +
> > +    g_free(irq_set);
> > +
> > +    return ret;
> > +}
> > +
> >   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> >   {
> >       struct vfio_irq_set *irq_set;
> > @@ -618,6 +651,8 @@ static void
> > vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
> >
> >   static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >   {
> > +    int ret;
> > +
> >       vfio_disable_interrupts(vdev);
> >
> >       vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> > @@ -640,8 +675,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >       vfio_commit_kvm_msi_virq_batch(vdev);
> >
> >       if (vdev->nr_vectors) {
> > -        int ret;
> > -
> >           ret = vfio_enable_vectors(vdev, true);
> >           if (ret) {
> >               error_report("vfio: failed to enable vectors, %d", ret);
> > @@ -655,13 +688,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >            * MSI-X capability, but leaves the vector table masked.  We therefore
> >            * can't rely on a vector_use callback (from request_irq() in the guest)
> >            * to switch the physical device into MSI-X mode because that may come
> a
> > -         * long time after pci_enable_msix().  This code enables vector 0 with
> > -         * triggering to userspace, then immediately release the vector, leaving
> > -         * the physical device with no vectors enabled, but MSI-X enabled, just
> > -         * like the guest view.
> > +         * long time after pci_enable_msix().  This code sets vector 0 with an
> > +         * invalid fd to make the physical device MSI-X enabled, but with no
> > +         * vectors enabled, just like the guest view.
> >            */
> > -        vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
> > -        vfio_msix_vector_release(&vdev->pdev, 0);
> > +        ret = vfio_enable_msix_no_vec(vdev);
> > +        if (ret) {
> > +            error_report("vfio: failed to enable MSI-X, %d", ret);
> > +        }
> >       }
> >
> >       trace_vfio_msix_enable(vdev->vbasedev.name);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X
  2023-08-30 10:03     ` Liu, Jing2
@ 2023-08-30 10:48       ` Igor Mammedov
  2023-09-04  7:37         ` Liu, Jing2
  0 siblings, 1 reply; 14+ messages in thread
From: Igor Mammedov @ 2023-08-30 10:48 UTC (permalink / raw)
  To: Liu, Jing2
  Cc: Cédric Le Goater, qemu-devel@nongnu.org,
	alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

On Wed, 30 Aug 2023 10:03:33 +0000
"Liu, Jing2" <jing2.liu@intel.com> wrote:

> Hi Cédric，
> 
> On 8/29/2023 10:04 PM, Cédric Le Goater wrote:
> > On 8/22/23 09:29, Jing Liu wrote:  
> > > Guests typically enable MSI-X with all of the vectors masked in the
> > > MSI-X vector table. To match the guest state of device, Qemu enables
> > > MSI-X by  
> > 
> > QEMU is preferred to Qemu.  
> Got it. 
> 
> >   
> > > enabling vector 0 with userspace triggering and immediately release.
> > > However the release function actually does not release it due to
> > > already using userspace mode.
> > >
> > > It is no need to enable triggering on host and rely on the mask bit to
> > > avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough
> > > to get MSI-X enabled.
> > >
> > > After dynamic MSI-X allocation is supported, the interrupt restoring
> > > also need use such way to enable MSI-X, therefore, create a function
> > > for that.
> > >
> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Jing Liu <jing2.liu@intel.com>
> > > ---
> > > Changes since RFC v1:
> > > - A new patch. Use an invalid fd to get MSI-X enabled instead of using
> > >    userspace triggering. (Alex)
> > > ---
> > >   hw/vfio/pci.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
> > >   1 file changed, 42 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > > 31f36d68bb19..e24c21241a0c 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -369,6 +369,39 @@ static void vfio_msi_interrupt(void *opaque)
> > >       notify(&vdev->pdev, nr);
> > >   }
> > >
> > > +/*
> > > + * Get MSI-X enabled, but no vector enabled, by setting vector 0 with
> > > +an invalid
> > > + * fd to kernel.
> > > + */
> > > +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> > > +    struct vfio_irq_set *irq_set;  
> > 
> > This could be a 'g_autofree' variable.  
> 
> Thanks for pointing this to me. I just realized QEMU doc recommends 
> g_autofree/g_autoptr to do automatic memory deallocation.
> 
> I will replace to g_autofree style in next version.
> 
> I have a question for a specific example (not related to this patch), and
> I failed to find an example in current QEMU code to understand it.
> If one g_autofree structure includes a pointer that we also allocate
> space for the inside pointer, could g_autofree release the inside space?

it might be too cumbersome for 1-off use
see following for how 'auto' works https://docs.gtk.org/glib/auto-cleanup.html
 
> struct dummy1 {
>     int data;
>     struct *p;
> }
> struct p {
>     char member[];
> }
> void func() {
>     g_autofree struct *dummy1 = NULL;
> 
>     dummy1 = g_malloc();
>     dummy1->p = g_malloc();
>     ...
>     return;	// is this correct or need g_free(p)?
> }
> 
> >   
> > > +    int ret = 0, argsz;
> > > +    int32_t *fd;
> > > +
> > > +    argsz = sizeof(*irq_set) + sizeof(*fd);
> > > +
> > > +    irq_set = g_malloc0(argsz);
> > > +    irq_set->argsz = argsz;
> > > +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > > +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> > > +    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > > +    irq_set->start = 0;
> > > +    irq_set->count = 1;
> > > +    fd = (int32_t *)&irq_set->data;
> > > +    *fd = -1;
> > > +
> > > +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > > +    if (ret) {
> > > +        error_report("vfio: failed to enable MSI-X with vector 0 trick, %d",
> > > +                     ret);  
> > 
> > The above message seems redundant. I would simply return 'ret' and let the
> > caller report the error. Same as vfio_enable_vectors().  
> 
> Understood. Will change.
> 
> Thanks,
> Jing
> 
> > Thanks,
> > 
> > C.
> > 
> >   
> > > +    }
> > > +
> > > +    g_free(irq_set);
> > > +
> > > +    return ret;
> > > +}
> > > +
> > >   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> > >   {
> > >       struct vfio_irq_set *irq_set;
> > > @@ -618,6 +651,8 @@ static void
> > > vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
> > >
> > >   static void vfio_msix_enable(VFIOPCIDevice *vdev)
> > >   {
> > > +    int ret;
> > > +
> > >       vfio_disable_interrupts(vdev);
> > >
> > >       vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> > > @@ -640,8 +675,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> > >       vfio_commit_kvm_msi_virq_batch(vdev);
> > >
> > >       if (vdev->nr_vectors) {
> > > -        int ret;
> > > -
> > >           ret = vfio_enable_vectors(vdev, true);
> > >           if (ret) {
> > >               error_report("vfio: failed to enable vectors, %d", ret);
> > > @@ -655,13 +688,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> > >            * MSI-X capability, but leaves the vector table masked.  We therefore
> > >            * can't rely on a vector_use callback (from request_irq() in the guest)
> > >            * to switch the physical device into MSI-X mode because that may come  
> > a  
> > > -         * long time after pci_enable_msix().  This code enables vector 0 with
> > > -         * triggering to userspace, then immediately release the vector, leaving
> > > -         * the physical device with no vectors enabled, but MSI-X enabled, just
> > > -         * like the guest view.
> > > +         * long time after pci_enable_msix().  This code sets vector 0 with an
> > > +         * invalid fd to make the physical device MSI-X enabled, but with no
> > > +         * vectors enabled, just like the guest view.
> > >            */
> > > -        vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
> > > -        vfio_msix_vector_release(&vdev->pdev, 0);
> > > +        ret = vfio_enable_msix_no_vec(vdev);
> > > +        if (ret) {
> > > +            error_report("vfio: failed to enable MSI-X, %d", ret);
> > > +        }
> > >       }
> > >
> > >       trace_vfio_msix_enable(vdev->vbasedev.name);  
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X
  2023-08-30 10:48       ` Igor Mammedov
@ 2023-09-04  7:37         ` Liu, Jing2
  0 siblings, 0 replies; 14+ messages in thread
From: Liu, Jing2 @ 2023-09-04  7:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Cédric Le Goater, qemu-devel@nongnu.org,
	alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

Hi Igor,

On Wed, August 30, 2023 6:49 PM, Igor Mammedov  wrote:
> 
> On Wed, 30 Aug 2023 10:03:33 +0000
> "Liu, Jing2" <jing2.liu@intel.com> wrote:
> 
...
> > > > +/*
> > > > + * Get MSI-X enabled, but no vector enabled, by setting vector 0
> > > > +with an invalid
> > > > + * fd to kernel.
> > > > + */
> > > > +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> > > > +    struct vfio_irq_set *irq_set;
> > >
> > > This could be a 'g_autofree' variable.
> >
> > Thanks for pointing this to me. I just realized QEMU doc recommends
> > g_autofree/g_autoptr to do automatic memory deallocation.
> >
> > I will replace to g_autofree style in next version.
> >
> > I have a question for a specific example (not related to this patch),
> > and I failed to find an example in current QEMU code to understand it.
> > If one g_autofree structure includes a pointer that we also allocate
> > space for the inside pointer, could g_autofree release the inside space?
> 
> it might be too cumbersome for 1-off use see following for how 'auto' works
> https://docs.gtk.org/glib/auto-cleanup.html

Thank you for the guide. It's now clear to me that for such type, there need define 
specific free function by macros and g_auto can help call it automatically. 

Jing

> 
> > struct dummy1 {
> >     int data;
> >     struct *p;
> > }
> > struct p {
> >     char member[];
> > }
> > void func() {
> >     g_autofree struct *dummy1 = NULL;
> >
> >     dummy1 = g_malloc();
> >     dummy1->p = g_malloc();
> >     ...
> >     return;	// is this correct or need g_free(p)?
> > }
> >
> > >
> > > > +    int ret = 0, argsz;
> > > > +    int32_t *fd;
> > > > +
> > > > +    argsz = sizeof(*irq_set) + sizeof(*fd);
> > > > +
> > > > +    irq_set = g_malloc0(argsz);
> > > > +    irq_set->argsz = argsz;
> > > > +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > > > +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> > > > +    irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > > > +    irq_set->start = 0;
> > > > +    irq_set->count = 1;
> > > > +    fd = (int32_t *)&irq_set->data;
> > > > +    *fd = -1;
> > > > +
> > > > +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > > > +    if (ret) {
> > > > +        error_report("vfio: failed to enable MSI-X with vector 0 trick, %d",
> > > > +                     ret);
> > >
> > > The above message seems redundant. I would simply return 'ret' and
> > > let the caller report the error. Same as vfio_enable_vectors().
> >
> > Understood. Will change.
> >
> > Thanks,
> > Jing
> >
> > > Thanks,
> > >
> > > C.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v1 0/4] Support dynamic MSI-X allocation
  2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
                   ` (3 preceding siblings ...)
  2023-08-22  7:29 ` [PATCH v1 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation Jing Liu
@ 2023-09-15  7:40 ` Liu, Jing2
  2023-09-15  7:42   ` Cédric Le Goater
  4 siblings, 1 reply; 14+ messages in thread
From: Liu, Jing2 @ 2023-09-15  7:40 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, clg@redhat.com, pbonzini@redhat.com,
	Tian, Kevin, Chatre, Reinette, jing2.liu@linux.intel.com,
	Liu, Jing2

Friendly ping to have your valuable inputs and comments. 
Thanks very much.

BRs,
Jing

> On 8/22/2023 3:29 PM, Jing Liu wrote:
> Changes since RFC v1:
> - RFC v1: https://www.mail-archive.com/qemu-
> devel@nongnu.org/msg978637.html
> - Revise the comments. (Alex)
> - Report error of getting irq info and remove the trace of failure
>   case. (Alex, Cédric)
> - Only store dynamic allocation flag as a bool type and test
>   accordingly. (Alex)
> - Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
> - Change the condition logic in vfio_msix_vector_do_use() that moving
>   the defer_kvm_irq_routing test out and create a common place to update
>   nr_vectors. (Alex)
> - Consolidate the way of MSI-X enabling during device initialization and
>   interrupt restoring that uses fd = -1 trick. Create a function doing
>   that. (Alex)
> 
> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not supported.
> Qemu therefore when allocating a new interrupt, should first release all
> previously allocated interrupts (including disable of MSI-X) and re-allocate all
> interrupts that includes the new one.
> 
> The kernel series [1] adds the support of dynamic MSI-X allocation to vfio-pci
> and uses the existing flag VFIO_IRQ_INFO_NORESIZE to guide user space, that
> when dynamic MSI-X is supported the flag is cleared.
> 
> This series makes the behavior for VFIO PCI devices when dynamic MSI-X
> allocation is supported. When guest unmasks an interrupt, Qemu can directly
> allocate an interrupt on host for this and has nothing to do with the previously
> allocated ones. Therefore, host only allocates interrupts for those unmasked
> (enabled) interrupts inside guest when dynamic MSI-X allocation is supported by
> device.
> 
> When guests enable MSI-X with all of the vectors masked, Qemu need match the
> state to enable MSI-X with no vector enabled. During migration restore, Qemu
> also need enable MSI-X first in dynamic allocation mode, to avoid the guest
> unused vectors being allocated on host. To consolidate them, we use vector 0
> with an invalid fd to get MSI-X enabled and create a common function for this.
> This is cleaner than setting userspace triggering and immediately release.
> 
> Any feedback is appreciated.
> 
> Jing
> 
> [1] https://lwn.net/Articles/931679/
> 
> Jing Liu (4):
>   vfio/pci: detect the support of dynamic MSI-X allocation
>   vfio/pci: enable vector on dynamic MSI-X allocation
>   vfio/pci: use an invalid fd to enable MSI-X
>   vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
> 
>  hw/vfio/pci.c        | 126 +++++++++++++++++++++++++++++++++----------
>  hw/vfio/pci.h        |   1 +
>  hw/vfio/trace-events |   2 +-
>  3 files changed, 101 insertions(+), 28 deletions(-)
> 
> --
> 2.27.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 0/4] Support dynamic MSI-X allocation
  2023-09-15  7:40 ` [PATCH v1 0/4] Support dynamic MSI-X allocation Liu, Jing2
@ 2023-09-15  7:42   ` Cédric Le Goater
  2023-09-15  8:03     ` Liu, Jing2
  0 siblings, 1 reply; 14+ messages in thread
From: Cédric Le Goater @ 2023-09-15  7:42 UTC (permalink / raw)
  To: Liu, Jing2, qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

On 9/15/23 09:40, Liu, Jing2 wrote:
> Friendly ping to have your valuable inputs and comments.
> Thanks very much.

I think that was done. We are waiting for the v2.

Thanks,

C.


> 
> BRs,
> Jing
> 
>> On 8/22/2023 3:29 PM, Jing Liu wrote:
>> Changes since RFC v1:
>> - RFC v1: https://www.mail-archive.com/qemu-
>> devel@nongnu.org/msg978637.html
>> - Revise the comments. (Alex)
>> - Report error of getting irq info and remove the trace of failure
>>    case. (Alex, Cédric)
>> - Only store dynamic allocation flag as a bool type and test
>>    accordingly. (Alex)
>> - Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
>> - Change the condition logic in vfio_msix_vector_do_use() that moving
>>    the defer_kvm_irq_routing test out and create a common place to update
>>    nr_vectors. (Alex)
>> - Consolidate the way of MSI-X enabling during device initialization and
>>    interrupt restoring that uses fd = -1 trick. Create a function doing
>>    that. (Alex)
>>
>> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not supported.
>> Qemu therefore when allocating a new interrupt, should first release all
>> previously allocated interrupts (including disable of MSI-X) and re-allocate all
>> interrupts that includes the new one.
>>
>> The kernel series [1] adds the support of dynamic MSI-X allocation to vfio-pci
>> and uses the existing flag VFIO_IRQ_INFO_NORESIZE to guide user space, that
>> when dynamic MSI-X is supported the flag is cleared.
>>
>> This series makes the behavior for VFIO PCI devices when dynamic MSI-X
>> allocation is supported. When guest unmasks an interrupt, Qemu can directly
>> allocate an interrupt on host for this and has nothing to do with the previously
>> allocated ones. Therefore, host only allocates interrupts for those unmasked
>> (enabled) interrupts inside guest when dynamic MSI-X allocation is supported by
>> device.
>>
>> When guests enable MSI-X with all of the vectors masked, Qemu need match the
>> state to enable MSI-X with no vector enabled. During migration restore, Qemu
>> also need enable MSI-X first in dynamic allocation mode, to avoid the guest
>> unused vectors being allocated on host. To consolidate them, we use vector 0
>> with an invalid fd to get MSI-X enabled and create a common function for this.
>> This is cleaner than setting userspace triggering and immediately release.
>>
>> Any feedback is appreciated.
>>
>> Jing
>>
>> [1] https://lwn.net/Articles/931679/
>>
>> Jing Liu (4):
>>    vfio/pci: detect the support of dynamic MSI-X allocation
>>    vfio/pci: enable vector on dynamic MSI-X allocation
>>    vfio/pci: use an invalid fd to enable MSI-X
>>    vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
>>
>>   hw/vfio/pci.c        | 126 +++++++++++++++++++++++++++++++++----------
>>   hw/vfio/pci.h        |   1 +
>>   hw/vfio/trace-events |   2 +-
>>   3 files changed, 101 insertions(+), 28 deletions(-)
>>
>> --
>> 2.27.0
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v1 0/4] Support dynamic MSI-X allocation
  2023-09-15  7:42   ` Cédric Le Goater
@ 2023-09-15  8:03     ` Liu, Jing2
  0 siblings, 0 replies; 14+ messages in thread
From: Liu, Jing2 @ 2023-09-15  8:03 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel@nongnu.org
  Cc: alex.williamson@redhat.com, pbonzini@redhat.com, Tian, Kevin,
	Chatre, Reinette, jing2.liu@linux.intel.com

Hi Cédric,
Thanks for this information. I'll send v2 later.

Jing

> On 9/15/2023 3:42 PM, Cédric Le Goater <clg@redhat.com> wrote:
> 
> On 9/15/23 09:40, Liu, Jing2 wrote:
> > Friendly ping to have your valuable inputs and comments.
> > Thanks very much.
> 
> I think that was done. We are waiting for the v2.
> 
> Thanks,
> 
> C.
> 
> 
> >
> > BRs,
> > Jing
> >
> >> On 8/22/2023 3:29 PM, Jing Liu wrote:
> >> Changes since RFC v1:
> >> - RFC v1: https://www.mail-archive.com/qemu-
> >> devel@nongnu.org/msg978637.html
> >> - Revise the comments. (Alex)
> >> - Report error of getting irq info and remove the trace of failure
> >>    case. (Alex, Cédric)
> >> - Only store dynamic allocation flag as a bool type and test
> >>    accordingly. (Alex)
> >> - Move dynamic allocation detection to vfio_msix_early_setup().
> >> (Alex)
> >> - Change the condition logic in vfio_msix_vector_do_use() that moving
> >>    the defer_kvm_irq_routing test out and create a common place to update
> >>    nr_vectors. (Alex)
> >> - Consolidate the way of MSI-X enabling during device initialization and
> >>    interrupt restoring that uses fd = -1 trick. Create a function doing
> >>    that. (Alex)
> >>
> >> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not supported.
> >> Qemu therefore when allocating a new interrupt, should first release
> >> all previously allocated interrupts (including disable of MSI-X) and
> >> re-allocate all interrupts that includes the new one.
> >>
> >> The kernel series [1] adds the support of dynamic MSI-X allocation to
> >> vfio-pci and uses the existing flag VFIO_IRQ_INFO_NORESIZE to guide
> >> user space, that when dynamic MSI-X is supported the flag is cleared.
> >>
> >> This series makes the behavior for VFIO PCI devices when dynamic
> >> MSI-X allocation is supported. When guest unmasks an interrupt, Qemu
> >> can directly allocate an interrupt on host for this and has nothing
> >> to do with the previously allocated ones. Therefore, host only
> >> allocates interrupts for those unmasked
> >> (enabled) interrupts inside guest when dynamic MSI-X allocation is
> >> supported by device.
> >>
> >> When guests enable MSI-X with all of the vectors masked, Qemu need
> >> match the state to enable MSI-X with no vector enabled. During
> >> migration restore, Qemu also need enable MSI-X first in dynamic
> >> allocation mode, to avoid the guest unused vectors being allocated on
> >> host. To consolidate them, we use vector 0 with an invalid fd to get MSI-X
> enabled and create a common function for this.
> >> This is cleaner than setting userspace triggering and immediately release.
> >>
> >> Any feedback is appreciated.
> >>
> >> Jing
> >>
> >> [1] https://lwn.net/Articles/931679/
> >>
> >> Jing Liu (4):
> >>    vfio/pci: detect the support of dynamic MSI-X allocation
> >>    vfio/pci: enable vector on dynamic MSI-X allocation
> >>    vfio/pci: use an invalid fd to enable MSI-X
> >>    vfio/pci: enable MSI-X in interrupt restoring on dynamic
> >> allocation
> >>
> >>   hw/vfio/pci.c        | 126 +++++++++++++++++++++++++++++++++----------
> >>   hw/vfio/pci.h        |   1 +
> >>   hw/vfio/trace-events |   2 +-
> >>   3 files changed, 101 insertions(+), 28 deletions(-)
> >>
> >> --
> >> 2.27.0
> >


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-15  8:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-22  7:29 [PATCH v1 0/4] Support dynamic MSI-X allocation Jing Liu
2023-08-22  7:29 ` [PATCH v1 1/4] vfio/pci: detect the support of " Jing Liu
2023-08-29 13:33   ` Cédric Le Goater
2023-08-30  7:21     ` Liu, Jing2
2023-08-22  7:29 ` [PATCH v1 2/4] vfio/pci: enable vector on " Jing Liu
2023-08-22  7:29 ` [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
2023-08-29 14:04   ` Cédric Le Goater
2023-08-30 10:03     ` Liu, Jing2
2023-08-30 10:48       ` Igor Mammedov
2023-09-04  7:37         ` Liu, Jing2
2023-08-22  7:29 ` [PATCH v1 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation Jing Liu
2023-09-15  7:40 ` [PATCH v1 0/4] Support dynamic MSI-X allocation Liu, Jing2
2023-09-15  7:42   ` Cédric Le Goater
2023-09-15  8:03     ` Liu, Jing2

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).