* [PATCH v3 1/4] vfio/pci: detect the support of dynamic MSI-X allocation
2023-09-26 2:14 [PATCH v3 0/4] Support dynamic MSI-X allocation Jing Liu
@ 2023-09-26 2:14 ` Jing Liu
2023-09-26 2:14 ` [PATCH v3 2/4] vfio/pci: enable vector on " Jing Liu
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jing Liu @ 2023-09-26 2:14 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
jing2.liu, jing2.liu
The kernel indicates that a passthrough device supports dynamic MSI-X
allocation by clearing the VFIO_IRQ_INFO_NORESIZE flag in the IRQ info
it reports to user space.
Fetch the flags from the host to determine whether dynamic MSI-X
allocation is supported.
Originally-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
Changes since v2:
- Apply Alex's Reviewed-by.
Changes since v1:
- Free msix when failed to get MSI-X irq info. (Cédric)
- Apply Cédric's Reviewed-by.
Changes since RFC v1:
- Filter the dynamic MSI-X allocation flag and store as a bool type.
(Alex)
- Move the detection to vfio_msix_early_setup(). (Alex)
- Report error of getting irq info and remove the trace of failure
case. (Alex, Cédric)
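A minimal sketch of the detection described above, assuming a local
stand-in for the flag so that it builds without <linux/vfio.h>. The
SKETCH_* constant and the sketch_* names are hypothetical and are not
part of the patch; the real code reads the flags through the
VFIO_DEVICE_GET_IRQ_INFO ioctl.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stand-in for VFIO_IRQ_INFO_NORESIZE from <linux/vfio.h>. */
#define SKETCH_IRQ_INFO_NORESIZE (1u << 3)

/* Hypothetical, trimmed-down counterpart of VFIOMSIXInfo. */
struct sketch_msix_info {
    bool noresize;
};

/* Keep only the NORESIZE bit of the reported IRQ flags, stored as a bool. */
static void sketch_store_noresize(struct sketch_msix_info *msix,
                                  uint32_t irq_flags)
{
    msix->noresize = !!(irq_flags & SKETCH_IRQ_INFO_NORESIZE);
}

/* Dynamic MSI-X allocation is available exactly when NORESIZE is clear. */
static bool sketch_dynamic_alloc_supported(const struct sketch_msix_info *msix)
{
    return !msix->noresize;
}
```

Storing the filtered bit as a bool means later checks read as
"noresize" versus "!noresize" without repeating the flag mask.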
---
hw/vfio/pci.c | 16 ++++++++++++++--
hw/vfio/pci.h | 1 +
hw/vfio/trace-events | 2 +-
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3b2ca3c24ca2..a94eef50e41e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1493,7 +1493,9 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
uint8_t pos;
uint16_t ctrl;
uint32_t table, pba;
- int fd = vdev->vbasedev.fd;
+ int ret, fd = vdev->vbasedev.fd;
+ struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
+ .index = VFIO_PCI_MSIX_IRQ_INDEX };
VFIOMSIXInfo *msix;
pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
@@ -1530,6 +1532,15 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ if (ret < 0) {
+ error_setg_errno(errp, errno, "failed to get MSI-X irq info");
+ g_free(msix);
+ return;
+ }
+
+ msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
+
/*
* Test the size of the pba_offset variable and catch if it extends outside
* of the specified BAR. If it is the case, we need to apply a hardware
@@ -1562,7 +1573,8 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
}
trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
- msix->table_offset, msix->entries);
+ msix->table_offset, msix->entries,
+ msix->noresize);
vdev->msix = msix;
vfio_pci_fixup_msix_region(vdev);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 2d836093a83d..0d89eb761ece 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
uint32_t table_offset;
uint32_t pba_offset;
unsigned long *pending;
+ bool noresize;
} VFIOMSIXInfo;
#define TYPE_VFIO_PCI "vfio-pci"
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index e64ca4a01961..0ba3c5a0e26b 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -27,7 +27,7 @@ vfio_vga_read(uint64_t addr, int size, uint64_t data) " (0x%"PRIx64", %d) = 0x%"
vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s, @0x%x, len=0x%x) 0x%x"
vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
-vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
+vfio_msix_early_setup(const char *name, int pos, int table_bar, int offset, int entries, bool noresize) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d, noresize %d"
vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
vfio_check_pm_reset(const char *name) "%s Supports PM reset"
vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
--
2.27.0
* [PATCH v3 2/4] vfio/pci: enable vector on dynamic MSI-X allocation
2023-09-26 2:14 [PATCH v3 0/4] Support dynamic MSI-X allocation Jing Liu
2023-09-26 2:14 ` [PATCH v3 1/4] vfio/pci: detect the support of " Jing Liu
@ 2023-09-26 2:14 ` Jing Liu
2023-09-26 2:14 ` [PATCH v3 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jing Liu @ 2023-09-26 2:14 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
jing2.liu, jing2.liu
The vector_use callback enables a vector once the guest unmasks it.
Kernels used to support only static MSI-X allocation. On such kernels,
to allocate a new interrupt QEMU first disables all previously
allocated vectors and then re-allocates all of them, including the new
one. The nr_vectors field of VFIOPCIDevice indicates that vectors 0 to
nr_vectors - 1 are allocated (and may be enabled), and is used to loop
over all possibly used vectors when, e.g., disabling MSI-X interrupts.
Extend the vector_use function to support dynamic MSI-X allocation when
the host supports it. QEMU can then allocate and enable a new interrupt
individually, without affecting other vectors or losing interrupts at
runtime.
Use nr_vectors as the upper bound of enabled vectors in dynamic MSI-X
allocation mode, since looping over all msix_entries_nr entries is
inefficient and unnecessary.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
Changes since v2:
- Use a bool type to test (vdev->nr_vectors < nr + 1). (Alex)
- Revise the comments. (Alex)
- Apply Alex's Reviewed-by.
Changes since v1:
- Revise Qemu to QEMU.
Changes since RFC v1:
- Test vdev->msix->noresize to identify the allocation mode. (Alex)
- Move defer_kvm_irq_routing test out and update nr_vectors in a
common place before vfio_enable_vectors(). (Alex)
- Revise the comments. (Alex)
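The decision made in vfio_msix_vector_do_use() can be modelled roughly
as below. This is a simplified sketch, not the patch itself: the
sketch_* types and the returned action enum are hypothetical, and the
real function performs the ioctls instead of returning an action.

```c
#include <stdbool.h>

/* Hypothetical subset of the device state consulted by the callback. */
struct sketch_dev {
    unsigned int nr_vectors;    /* upper bound of vectors in use */
    bool noresize;              /* host lacks dynamic MSI-X allocation */
    bool defer_kvm_irq_routing; /* setup is batched and committed later */
};

enum sketch_action {
    SKETCH_DEFERRED,    /* nothing sent to the host yet */
    SKETCH_REALLOC_ALL, /* disable MSI-X, then re-enable vectors 0..nr_vectors-1 */
    SKETCH_SET_SINGLE   /* set the trigger for vector nr only */
};

static enum sketch_action sketch_vector_use(struct sketch_dev *d,
                                            unsigned int nr)
{
    bool resizing = d->nr_vectors < nr + 1;

    /* nr_vectors is updated in one common place, whatever the mode. */
    if (resizing) {
        d->nr_vectors = nr + 1;
    }

    if (d->defer_kvm_irq_routing) {
        return SKETCH_DEFERRED;
    }

    /* Only a static-allocation host needs the full teardown when growing. */
    if (d->noresize && resizing) {
        return SKETCH_REALLOC_ALL;
    }

    return SKETCH_SET_SINGLE;
}
```

With noresize clear, growing past the current nr_vectors still ends in
the single-vector path, which is what keeps already enabled vectors
untouched.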
---
hw/vfio/pci.c | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a94eef50e41e..27a65302ea69 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -470,6 +470,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
VFIOMSIVector *vector;
int ret;
+ bool resizing = !!(vdev->nr_vectors < nr + 1);
trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
@@ -512,33 +513,42 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
}
/*
- * We don't want to have the host allocate all possible MSI vectors
- * for a device if they're not in use, so we shutdown and incrementally
- * increase them as needed.
+ * When dynamic allocation is not supported, we don't want to have the
+ * host allocate all possible MSI vectors for a device if they're not
+ * in use, so we shutdown and incrementally increase them as needed.
+ * nr_vectors represents the total number of vectors allocated.
+ *
+ * When dynamic allocation is supported, let the host only allocate
+ * and enable a vector when it is in use in guest. nr_vectors represents
+ * the upper bound of vectors being enabled (but not every vector in the
+ * range is allocated or enabled).
*/
- if (vdev->nr_vectors < nr + 1) {
+ if (resizing) {
vdev->nr_vectors = nr + 1;
- if (!vdev->defer_kvm_irq_routing) {
+ }
+
+ if (!vdev->defer_kvm_irq_routing) {
+ if (vdev->msix->noresize && resizing) {
vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
ret = vfio_enable_vectors(vdev, true);
if (ret) {
error_report("vfio: failed to enable vectors, %d", ret);
}
- }
- } else {
- Error *err = NULL;
- int32_t fd;
-
- if (vector->virq >= 0) {
- fd = event_notifier_get_fd(&vector->kvm_interrupt);
} else {
- fd = event_notifier_get_fd(&vector->interrupt);
- }
+ Error *err = NULL;
+ int32_t fd;
- if (vfio_set_irq_signaling(&vdev->vbasedev,
- VFIO_PCI_MSIX_IRQ_INDEX, nr,
- VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
- error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+ if (vector->virq >= 0) {
+ fd = event_notifier_get_fd(&vector->kvm_interrupt);
+ } else {
+ fd = event_notifier_get_fd(&vector->interrupt);
+ }
+
+ if (vfio_set_irq_signaling(&vdev->vbasedev,
+ VFIO_PCI_MSIX_IRQ_INDEX, nr,
+ VFIO_IRQ_SET_ACTION_TRIGGER, fd, &err)) {
+ error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+ }
}
}
--
2.27.0
* [PATCH v3 3/4] vfio/pci: use an invalid fd to enable MSI-X
2023-09-26 2:14 [PATCH v3 0/4] Support dynamic MSI-X allocation Jing Liu
2023-09-26 2:14 ` [PATCH v3 1/4] vfio/pci: detect the support of " Jing Liu
2023-09-26 2:14 ` [PATCH v3 2/4] vfio/pci: enable vector on " Jing Liu
@ 2023-09-26 2:14 ` Jing Liu
2023-09-26 2:14 ` [PATCH v3 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation Jing Liu
2023-09-26 6:24 ` [PATCH v3 0/4] Support dynamic MSI-X allocation Cédric Le Goater
4 siblings, 0 replies; 6+ messages in thread
From: Jing Liu @ 2023-09-26 2:14 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
jing2.liu, jing2.liu
Guests typically enable MSI-X with all of the vectors masked in the MSI-X
vector table. To match the guest state of the device, QEMU enables MSI-X
by enabling vector 0 with userspace triggering and immediately releasing
it. However, the release function does not actually release the vector,
because it is already in userspace mode.
There is no need to enable triggering on the host and rely on the mask
bit to avoid spurious interrupts. Using an invalid fd (i.e. fd = -1) is
enough to get MSI-X enabled.
Once dynamic MSI-X allocation is supported, interrupt restoring also
needs to enable MSI-X this way, so create a function for it.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
Changes since v2:
- Apply Cédric's Reviewed-by.
- Apply Alex's Reviewed-by.
Changes since v1:
- Revise Qemu to QEMU. (Cédric)
- Use g_autofree to automatically release. (Cédric)
- Just return 'ret' and let the caller of vfio_enable_msix_no_vec()
report the error. (Cédric)
Changes since RFC v1:
- A new patch. Use an invalid fd to get MSI-X enabled instead of using
userspace triggering. (Alex)
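The request built by vfio_enable_msix_no_vec() is a fixed header
followed by a single 32-bit fd set to -1. A rough sketch of that
layout, using a hypothetical local mirror of the header and stand-in
flag constants so that it builds without <linux/vfio.h> (all sketch_*
and SKETCH_* names are assumptions for illustration):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumptions: stand-ins for VFIO_IRQ_SET_DATA_EVENTFD / _ACTION_TRIGGER. */
#define SKETCH_SET_DATA_EVENTFD   (1u << 2)
#define SKETCH_SET_ACTION_TRIGGER (1u << 5)

/* Hypothetical mirror of the header that precedes the eventfd array. */
struct sketch_irq_set {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t start;
    uint32_t count;
    uint8_t  data[];
};

/*
 * Build a request for vector 0 carrying one invalid fd (-1).
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct sketch_irq_set *sketch_build_enable_no_vec(uint32_t msix_index)
{
    int32_t fd = -1;
    size_t argsz = sizeof(struct sketch_irq_set) + sizeof(fd);
    struct sketch_irq_set *set = calloc(1, argsz);

    if (!set) {
        return NULL;
    }

    set->argsz = (uint32_t)argsz;
    set->flags = SKETCH_SET_DATA_EVENTFD | SKETCH_SET_ACTION_TRIGGER;
    set->index = msix_index;
    set->start = 0;
    set->count = 1;
    memcpy(set->data, &fd, sizeof(fd)); /* payload: a single fd of -1 */

    return set;
}
```

In the patch this buffer is handed to the VFIO_DEVICE_SET_IRQS ioctl;
the sketch stops at building it so that it can run without a device.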
---
hw/vfio/pci.c | 44 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 27a65302ea69..bf676a49ae77 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -369,6 +369,33 @@ static void vfio_msi_interrupt(void *opaque)
notify(&vdev->pdev, nr);
}
+/*
+ * Get MSI-X enabled, but no vector enabled, by setting vector 0 with an invalid
+ * fd to kernel.
+ */
+static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
+{
+ g_autofree struct vfio_irq_set *irq_set = NULL;
+ int ret = 0, argsz;
+ int32_t *fd;
+
+ argsz = sizeof(*irq_set) + sizeof(*fd);
+
+ irq_set = g_malloc0(argsz);
+ irq_set->argsz = argsz;
+ irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
+ VFIO_IRQ_SET_ACTION_TRIGGER;
+ irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+ irq_set->start = 0;
+ irq_set->count = 1;
+ fd = (int32_t *)&irq_set->data;
+ *fd = -1;
+
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+ return ret;
+}
+
static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
{
struct vfio_irq_set *irq_set;
@@ -618,6 +645,8 @@ static void vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
static void vfio_msix_enable(VFIOPCIDevice *vdev)
{
+ int ret;
+
vfio_disable_interrupts(vdev);
vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
@@ -640,8 +669,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
vfio_commit_kvm_msi_virq_batch(vdev);
if (vdev->nr_vectors) {
- int ret;
-
ret = vfio_enable_vectors(vdev, true);
if (ret) {
error_report("vfio: failed to enable vectors, %d", ret);
@@ -655,13 +682,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
* MSI-X capability, but leaves the vector table masked. We therefore
* can't rely on a vector_use callback (from request_irq() in the guest)
* to switch the physical device into MSI-X mode because that may come a
- * long time after pci_enable_msix(). This code enables vector 0 with
- * triggering to userspace, then immediately release the vector, leaving
- * the physical device with no vectors enabled, but MSI-X enabled, just
- * like the guest view.
+ * long time after pci_enable_msix(). This code sets vector 0 with an
+ * invalid fd to make the physical device MSI-X enabled, but with no
+ * vectors enabled, just like the guest view.
*/
- vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
- vfio_msix_vector_release(&vdev->pdev, 0);
+ ret = vfio_enable_msix_no_vec(vdev);
+ if (ret) {
+ error_report("vfio: failed to enable MSI-X, %d", ret);
+ }
}
trace_vfio_msix_enable(vdev->vbasedev.name);
--
2.27.0
* [PATCH v3 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
2023-09-26 2:14 [PATCH v3 0/4] Support dynamic MSI-X allocation Jing Liu
` (2 preceding siblings ...)
2023-09-26 2:14 ` [PATCH v3 3/4] vfio/pci: use an invalid fd to enable MSI-X Jing Liu
@ 2023-09-26 2:14 ` Jing Liu
2023-09-26 6:24 ` [PATCH v3 0/4] Support dynamic MSI-X allocation Cédric Le Goater
4 siblings, 0 replies; 6+ messages in thread
From: Jing Liu @ 2023-09-26 2:14 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, pbonzini, kevin.tian, reinette.chatre,
jing2.liu, jing2.liu
During migration restore, vfio_enable_vectors() is called to re-enable
MSI-X interrupts for assigned devices. It passes the range from 0 to
nr_vectors to the kernel to enable MSI-X and the vectors unmasked in
the guest. While MSI-X is being enabled, all vectors within the range
are allocated by the VFIO_DEVICE_SET_IRQS ioctl.
When dynamic MSI-X allocation is supported, only the vectors unmasked
by the guest should be allocated and enabled. Use vector 0 with an
invalid fd to get MSI-X enabled first; after that, vectors can be
allocated as needed.
Signed-off-by: Jing Liu <jing2.liu@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
Changes since v2:
- Apply Cédric's Reviewed-by.
- Apply Alex's Reviewed-by.
Changes since v1:
- No change.
Changes since RFC v1:
- Revise the comments. (Alex)
- Call the new helper function in previous patch to enable MSI-X. (Alex)
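The two-step restore can be sketched as follows. The sketch_* names are
hypothetical, and filling unused slots with -1 is an assumption about
how the "potentially sparse set of eventfds" is expressed; the patch
itself only adds the pre-enable step in front of the existing code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vector state: whether the guest uses it, and its eventfd. */
struct sketch_vector {
    bool in_use;
    int32_t fd;
};

/*
 * Step 1: MSI-X must first be enabled through vector 0 with fd = -1 only
 * for MSI-X on a host that supports dynamic allocation (noresize clear).
 */
static bool sketch_needs_msix_pre_enable(bool msix, bool noresize)
{
    return msix && !noresize;
}

/*
 * Step 2: fill the eventfd array for vectors 0..nr-1, leaving -1 in the
 * slots of vectors the guest has not enabled.
 */
static void sketch_fill_fds(const struct sketch_vector *vectors, size_t nr,
                            int32_t *fds)
{
    for (size_t i = 0; i < nr; i++) {
        fds[i] = vectors[i].in_use ? vectors[i].fd : -1;
    }
}
```

On a static-allocation host the first step is skipped, so the existing
behaviour of allocating the whole range is unchanged.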
---
hw/vfio/pci.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bf676a49ae77..8a082af39e77 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -402,6 +402,23 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
int ret = 0, i, argsz;
int32_t *fds;
+ /*
+ * If dynamic MSI-X allocation is supported, the vectors to be allocated
+ * and enabled can be scattered. Before kernel enabling MSI-X, setting
+ * nr_vectors causes all these vectors to be allocated on host.
+ *
+ * To keep allocation as needed, use vector 0 with an invalid fd to get
+ * MSI-X enabled first, then set vectors with a potentially sparse set of
+ * eventfds to enable interrupts only when enabled in guest.
+ */
+ if (msix && !vdev->msix->noresize) {
+ ret = vfio_enable_msix_no_vec(vdev);
+
+ if (ret) {
+ return ret;
+ }
+ }
+
argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
irq_set = g_malloc0(argsz);
--
2.27.0
* Re: [PATCH v3 0/4] Support dynamic MSI-X allocation
2023-09-26 2:14 [PATCH v3 0/4] Support dynamic MSI-X allocation Jing Liu
` (3 preceding siblings ...)
2023-09-26 2:14 ` [PATCH v3 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation Jing Liu
@ 2023-09-26 6:24 ` Cédric Le Goater
4 siblings, 0 replies; 6+ messages in thread
From: Cédric Le Goater @ 2023-09-26 6:24 UTC (permalink / raw)
To: Jing Liu, qemu-devel
Cc: alex.williamson, pbonzini, kevin.tian, reinette.chatre, jing2.liu
On 9/26/23 04:14, Jing Liu wrote:
> Changes since v2:
> - v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg989852.html
> - Use a bool type to test (vdev->nr_vectors < nr + 1). (Alex)
> - Revise comments. (Alex)
> - Apply Cédric's and Alex's Reviewed-by.
>
> Changes since v1:
> - v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg982842.html
> - Revise Qemu to QEMU. (Cédric)
> - Add g_free when failure of getting MSI-X irq info. (Cédric)
> - Apply Cédric's Reviewed-by. (Cédric)
> - Use g_autofree to automatically release. (Cédric)
> - Remove the failure message in vfio_enable_msix_no_vec(). (Cédric)
>
> Changes since RFC v1:
> - RFC v1: https://www.mail-archive.com/qemu-devel@nongnu.org/msg978637.html
> - Revise the comments. (Alex)
> - Report error of getting irq info and remove the trace of failure
> case. (Alex, Cédric)
> - Only store dynamic allocation flag as a bool type and test
> accordingly. (Alex)
> - Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
> - Change the condition logic in vfio_msix_vector_do_use(): move the
> defer_kvm_irq_routing test out and create a common place to update
> nr_vectors. (Alex)
> - Consolidate MSI-X enabling during device initialization and
> interrupt restoring by using the fd = -1 trick. Create a function
> for that. (Alex)
>
> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not
> supported. Therefore, when allocating a new interrupt, QEMU first has
> to release all previously allocated interrupts (including disabling
> MSI-X) and then re-allocate all interrupts, including the new one.
>
> The kernel series [1] adds support for dynamic MSI-X allocation to
> vfio-pci and uses the existing flag VFIO_IRQ_INFO_NORESIZE to guide
> user space: when dynamic MSI-X is supported, the flag is cleared.
>
> This series changes the behavior of VFIO PCI devices when dynamic MSI-X
> allocation is supported. When the guest unmasks an interrupt, QEMU can
> directly allocate an interrupt on the host for it without touching the
> previously allocated ones. The host therefore allocates interrupts only
> for those unmasked (enabled) inside the guest when the device supports
> dynamic MSI-X allocation.
>
> When guests enable MSI-X with all of the vectors masked, QEMU needs to
> match that state by enabling MSI-X with no vector enabled. During
> migration restore, QEMU also needs to enable MSI-X first in dynamic
> allocation mode, to avoid vectors unused by the guest being allocated
> on the host. To consolidate the two cases, we use vector 0 with an
> invalid fd to get MSI-X enabled and create a common function for this.
> This is cleaner than setting up userspace triggering and immediately
> releasing it.
>
> Any feedback is appreciated.
>
> Jing
>
> [1] https://lwn.net/Articles/931679/
>
> Jing Liu (4):
> vfio/pci: detect the support of dynamic MSI-X allocation
> vfio/pci: enable vector on dynamic MSI-X allocation
> vfio/pci: use an invalid fd to enable MSI-X
> vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
>
> hw/vfio/pci.c | 123 +++++++++++++++++++++++++++++++++----------
> hw/vfio/pci.h | 1 +
> hw/vfio/trace-events | 2 +-
> 3 files changed, 97 insertions(+), 29 deletions(-)
>
Applied to vfio-next.
Thanks,
C.