* [PATCH V5 01/11] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 02/11] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
` (13 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
vfio_msi_disable() releases all previously allocated state
associated with each interrupt before disabling MSI/MSI-X.
vfio_msi_disable() iterates twice over the interrupt state:
first directly with a for loop to do virqfd cleanup, followed
by another for loop within vfio_msi_set_block() that removes
the interrupt handler and its associated state using
vfio_msi_set_vector_signal().
Simplify interrupt cleanup by iterating over allocated interrupts
once.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag.
Changes since V2:
- Improve accuracy of changelog.
drivers/vfio/pci/vfio_pci_intrs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index bffb0741518b..6a9c6a143cc3 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -426,10 +426,9 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
for (i = 0; i < vdev->num_ctx; i++) {
vfio_virqfd_disable(&vdev->ctx[i].unmask);
vfio_virqfd_disable(&vdev->ctx[i].mask);
+ vfio_msi_set_vector_signal(vdev, i, -1, msix);
}
- vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
-
cmd = vfio_pci_memory_lock_and_enable(vdev);
pci_free_irq_vectors(pdev);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 02/11] vfio/pci: Remove negative check on unsigned vector
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 01/11] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 03/11] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
` (12 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
User space provides the vector as an unsigned int that is checked
early for validity (vfio_set_irqs_validate_and_prepare()).
A later negative check of the provided vector is not necessary.
Remove the negative check and ensure the type used
for the vector is consistent as an unsigned int.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V2:
- Rework unwind loop within vfio_msi_set_block() that required j
to be an int. Rework results in both i and j used for vector, both
moved to be unsigned int. (Alex)
drivers/vfio/pci/vfio_pci_intrs.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6a9c6a143cc3..258de57ef956 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -317,14 +317,14 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
}
static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
- int vector, int fd, bool msix)
+ unsigned int vector, int fd, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
struct eventfd_ctx *trigger;
int irq, ret;
u16 cmd;
- if (vector < 0 || vector >= vdev->num_ctx)
+ if (vector >= vdev->num_ctx)
return -EINVAL;
irq = pci_irq_vector(pdev, vector);
@@ -399,7 +399,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
unsigned count, int32_t *fds, bool msix)
{
- int i, j, ret = 0;
+ unsigned int i, j;
+ int ret = 0;
if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
return -EINVAL;
@@ -410,8 +411,8 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
}
if (ret) {
- for (--j; j >= (int)start; j--)
- vfio_msi_set_vector_signal(vdev, j, -1, msix);
+ for (i = start; i < j; i++)
+ vfio_msi_set_vector_signal(vdev, i, -1, msix);
}
return ret;
@@ -420,7 +421,7 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
- int i;
+ unsigned int i;
u16 cmd;
for (i = 0; i < vdev->num_ctx; i++) {
@@ -542,7 +543,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
- int i;
+ unsigned int i;
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 03/11] vfio/pci: Prepare for dynamic interrupt context storage
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 01/11] vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 02/11] vfio/pci: Remove negative check on unsigned vector Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-17 2:12 ` Tian, Kevin
2023-05-11 15:44 ` [PATCH V5 04/11] vfio/pci: Move to single error path Reinette Chatre
` (11 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Interrupt context storage is statically allocated at the time
interrupts are allocated. Following allocation, the interrupt
context is managed by directly accessing the elements of the
array using the vector as index.
It is possible to allocate additional MSI-X vectors after
MSI-X has been enabled. Dynamic storage of interrupt context
is needed to support adding new MSI-X vectors after initial
allocation.
Replace direct access of array elements with pointers to the
array elements. Doing so reduces impact of moving to a new data
structure. Move interactions with the array to helpers to
mostly contain changes needed to transition to a dynamic
data structure.
No functional change intended.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V4:
- Non-existing interrupt context should be treated as kernel bug
with WARN_ON_ONCE(). (Kevin)
- Move interrupt context retrieval in vfio_pci_intx_mask() and
vfio_pci_intx_unmask_handler() to support scenario of mask/unmask
of INTx without INTx being enabled.
Changes since RFC V1:
- Improve accuracy of changelog.
drivers/vfio/pci/vfio_pci_intrs.c | 215 +++++++++++++++++++++---------
1 file changed, 149 insertions(+), 66 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 258de57ef956..6094679349d9 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -48,6 +48,31 @@ static bool is_irq_none(struct vfio_pci_core_device *vdev)
vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
}
+static
+struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
+ unsigned long index)
+{
+ if (index >= vdev->num_ctx)
+ return NULL;
+ return &vdev->ctx[index];
+}
+
+static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
+{
+ kfree(vdev->ctx);
+}
+
+static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
+ unsigned long num)
+{
+ vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
+ GFP_KERNEL_ACCOUNT);
+ if (!vdev->ctx)
+ return -ENOMEM;
+
+ return 0;
+}
+
/*
* INTx
*/
@@ -55,14 +80,21 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
{
struct vfio_pci_core_device *vdev = opaque;
- if (likely(is_intx(vdev) && !vdev->virq_disabled))
- eventfd_signal(vdev->ctx[0].trigger, 1);
+ if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
+ struct vfio_pci_irq_ctx *ctx;
+
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (WARN_ON_ONCE(!ctx))
+ return;
+ eventfd_signal(ctx->trigger, 1);
+ }
}
/* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
{
struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_irq_ctx *ctx;
unsigned long flags;
bool masked_changed = false;
@@ -77,7 +109,14 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
if (unlikely(!is_intx(vdev))) {
if (vdev->pci_2_3)
pci_intx(pdev, 0);
- } else if (!vdev->ctx[0].masked) {
+ goto out_unlock;
+ }
+
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (WARN_ON_ONCE(!ctx))
+ goto out_unlock;
+
+ if (!ctx->masked) {
/*
* Can't use check_and_mask here because we always want to
* mask, not just when something is pending.
@@ -87,10 +126,11 @@ bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
else
disable_irq_nosync(pdev->irq);
- vdev->ctx[0].masked = true;
+ ctx->masked = true;
masked_changed = true;
}
+out_unlock:
spin_unlock_irqrestore(&vdev->irqlock, flags);
return masked_changed;
}
@@ -105,6 +145,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
{
struct vfio_pci_core_device *vdev = opaque;
struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_irq_ctx *ctx;
unsigned long flags;
int ret = 0;
@@ -117,7 +158,14 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
if (unlikely(!is_intx(vdev))) {
if (vdev->pci_2_3)
pci_intx(pdev, 1);
- } else if (vdev->ctx[0].masked && !vdev->virq_disabled) {
+ goto out_unlock;
+ }
+
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (WARN_ON_ONCE(!ctx))
+ goto out_unlock;
+
+ if (ctx->masked && !vdev->virq_disabled) {
/*
* A pending interrupt here would immediately trigger,
* but we can avoid that overhead by just re-sending
@@ -129,9 +177,10 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
} else
enable_irq(pdev->irq);
- vdev->ctx[0].masked = (ret > 0);
+ ctx->masked = (ret > 0);
}
+out_unlock:
spin_unlock_irqrestore(&vdev->irqlock, flags);
return ret;
@@ -146,18 +195,23 @@ void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
{
struct vfio_pci_core_device *vdev = dev_id;
+ struct vfio_pci_irq_ctx *ctx;
unsigned long flags;
int ret = IRQ_NONE;
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (WARN_ON_ONCE(!ctx))
+ return ret;
+
spin_lock_irqsave(&vdev->irqlock, flags);
if (!vdev->pci_2_3) {
disable_irq_nosync(vdev->pdev->irq);
- vdev->ctx[0].masked = true;
+ ctx->masked = true;
ret = IRQ_HANDLED;
- } else if (!vdev->ctx[0].masked && /* may be shared */
+ } else if (!ctx->masked && /* may be shared */
pci_check_and_mask_intx(vdev->pdev)) {
- vdev->ctx[0].masked = true;
+ ctx->masked = true;
ret = IRQ_HANDLED;
}
@@ -171,15 +225,24 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
{
+ struct vfio_pci_irq_ctx *ctx;
+ int ret;
+
if (!is_irq_none(vdev))
return -EINVAL;
if (!vdev->pdev->irq)
return -ENODEV;
- vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL_ACCOUNT);
- if (!vdev->ctx)
- return -ENOMEM;
+ ret = vfio_irq_ctx_alloc_num(vdev, 1);
+ if (ret)
+ return ret;
+
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (!ctx) {
+ vfio_irq_ctx_free_all(vdev);
+ return -EINVAL;
+ }
vdev->num_ctx = 1;
@@ -189,9 +252,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
* here, non-PCI-2.3 devices will have to wait until the
* interrupt is enabled.
*/
- vdev->ctx[0].masked = vdev->virq_disabled;
+ ctx->masked = vdev->virq_disabled;
if (vdev->pci_2_3)
- pci_intx(vdev->pdev, !vdev->ctx[0].masked);
+ pci_intx(vdev->pdev, !ctx->masked);
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
@@ -202,41 +265,46 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
{
struct pci_dev *pdev = vdev->pdev;
unsigned long irqflags = IRQF_SHARED;
+ struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
unsigned long flags;
int ret;
- if (vdev->ctx[0].trigger) {
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ if (WARN_ON_ONCE(!ctx))
+ return -EINVAL;
+
+ if (ctx->trigger) {
free_irq(pdev->irq, vdev);
- kfree(vdev->ctx[0].name);
- eventfd_ctx_put(vdev->ctx[0].trigger);
- vdev->ctx[0].trigger = NULL;
+ kfree(ctx->name);
+ eventfd_ctx_put(ctx->trigger);
+ ctx->trigger = NULL;
}
if (fd < 0) /* Disable only */
return 0;
- vdev->ctx[0].name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
- pci_name(pdev));
- if (!vdev->ctx[0].name)
+ ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
+ pci_name(pdev));
+ if (!ctx->name)
return -ENOMEM;
trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
- kfree(vdev->ctx[0].name);
+ kfree(ctx->name);
return PTR_ERR(trigger);
}
- vdev->ctx[0].trigger = trigger;
+ ctx->trigger = trigger;
if (!vdev->pci_2_3)
irqflags = 0;
ret = request_irq(pdev->irq, vfio_intx_handler,
- irqflags, vdev->ctx[0].name, vdev);
+ irqflags, ctx->name, vdev);
if (ret) {
- vdev->ctx[0].trigger = NULL;
- kfree(vdev->ctx[0].name);
+ ctx->trigger = NULL;
+ kfree(ctx->name);
eventfd_ctx_put(trigger);
return ret;
}
@@ -246,7 +314,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
* disable_irq won't.
*/
spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && vdev->ctx[0].masked)
+ if (!vdev->pci_2_3 && ctx->masked)
disable_irq_nosync(pdev->irq);
spin_unlock_irqrestore(&vdev->irqlock, flags);
@@ -255,12 +323,18 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
- vfio_virqfd_disable(&vdev->ctx[0].unmask);
- vfio_virqfd_disable(&vdev->ctx[0].mask);
+ struct vfio_pci_irq_ctx *ctx;
+
+ ctx = vfio_irq_ctx_get(vdev, 0);
+ WARN_ON_ONCE(!ctx);
+ if (ctx) {
+ vfio_virqfd_disable(&ctx->unmask);
+ vfio_virqfd_disable(&ctx->mask);
+ }
vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
vdev->num_ctx = 0;
- kfree(vdev->ctx);
+ vfio_irq_ctx_free_all(vdev);
}
/*
@@ -284,10 +358,9 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
if (!is_irq_none(vdev))
return -EINVAL;
- vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx),
- GFP_KERNEL_ACCOUNT);
- if (!vdev->ctx)
- return -ENOMEM;
+ ret = vfio_irq_ctx_alloc_num(vdev, nvec);
+ if (ret)
+ return ret;
/* return the number of supported vectors if we can't get all: */
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -296,7 +369,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
if (ret > 0)
pci_free_irq_vectors(pdev);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
- kfree(vdev->ctx);
+ vfio_irq_ctx_free_all(vdev);
return ret;
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);
@@ -320,6 +393,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
int irq, ret;
u16 cmd;
@@ -327,33 +401,33 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
if (vector >= vdev->num_ctx)
return -EINVAL;
+ ctx = vfio_irq_ctx_get(vdev, vector);
+ if (!ctx)
+ return -EINVAL;
irq = pci_irq_vector(pdev, vector);
- if (vdev->ctx[vector].trigger) {
- irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
+ if (ctx->trigger) {
+ irq_bypass_unregister_producer(&ctx->producer);
cmd = vfio_pci_memory_lock_and_enable(vdev);
- free_irq(irq, vdev->ctx[vector].trigger);
+ free_irq(irq, ctx->trigger);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
-
- kfree(vdev->ctx[vector].name);
- eventfd_ctx_put(vdev->ctx[vector].trigger);
- vdev->ctx[vector].trigger = NULL;
+ kfree(ctx->name);
+ eventfd_ctx_put(ctx->trigger);
+ ctx->trigger = NULL;
}
if (fd < 0)
return 0;
- vdev->ctx[vector].name = kasprintf(GFP_KERNEL_ACCOUNT,
- "vfio-msi%s[%d](%s)",
- msix ? "x" : "", vector,
- pci_name(pdev));
- if (!vdev->ctx[vector].name)
+ ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
+ msix ? "x" : "", vector, pci_name(pdev));
+ if (!ctx->name)
return -ENOMEM;
trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
- kfree(vdev->ctx[vector].name);
+ kfree(ctx->name);
return PTR_ERR(trigger);
}
@@ -372,26 +446,25 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
pci_write_msi_msg(irq, &msg);
}
- ret = request_irq(irq, vfio_msihandler, 0,
- vdev->ctx[vector].name, trigger);
+ ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
if (ret) {
- kfree(vdev->ctx[vector].name);
+ kfree(ctx->name);
eventfd_ctx_put(trigger);
return ret;
}
- vdev->ctx[vector].producer.token = trigger;
- vdev->ctx[vector].producer.irq = irq;
- ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+ ctx->producer.token = trigger;
+ ctx->producer.irq = irq;
+ ret = irq_bypass_register_producer(&ctx->producer);
if (unlikely(ret)) {
dev_info(&pdev->dev,
"irq bypass producer (token %p) registration fails: %d\n",
- vdev->ctx[vector].producer.token, ret);
+ ctx->producer.token, ret);
- vdev->ctx[vector].producer.token = NULL;
+ ctx->producer.token = NULL;
}
- vdev->ctx[vector].trigger = trigger;
+ ctx->trigger = trigger;
return 0;
}
@@ -421,13 +494,17 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
+ struct vfio_pci_irq_ctx *ctx;
unsigned int i;
u16 cmd;
for (i = 0; i < vdev->num_ctx; i++) {
- vfio_virqfd_disable(&vdev->ctx[i].unmask);
- vfio_virqfd_disable(&vdev->ctx[i].mask);
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ ctx = vfio_irq_ctx_get(vdev, i);
+ if (ctx) {
+ vfio_virqfd_disable(&ctx->unmask);
+ vfio_virqfd_disable(&ctx->mask);
+ vfio_msi_set_vector_signal(vdev, i, -1, msix);
+ }
}
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -443,7 +520,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
vdev->irq_type = VFIO_PCI_NUM_IRQS;
vdev->num_ctx = 0;
- kfree(vdev->ctx);
+ vfio_irq_ctx_free_all(vdev);
}
/*
@@ -463,14 +540,18 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
if (unmask)
vfio_pci_intx_unmask(vdev);
} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+ struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
int32_t fd = *(int32_t *)data;
+
+ if (WARN_ON_ONCE(!ctx))
+ return -EINVAL;
if (fd >= 0)
return vfio_virqfd_enable((void *) vdev,
vfio_pci_intx_unmask_handler,
vfio_send_intx_eventfd, NULL,
- &vdev->ctx[0].unmask, fd);
+ &ctx->unmask, fd);
- vfio_virqfd_disable(&vdev->ctx[0].unmask);
+ vfio_virqfd_disable(&ctx->unmask);
}
return 0;
@@ -543,6 +624,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags, void *data)
{
+ struct vfio_pci_irq_ctx *ctx;
unsigned int i;
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
@@ -577,14 +659,15 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
return -EINVAL;
for (i = start; i < start + count; i++) {
- if (!vdev->ctx[i].trigger)
+ ctx = vfio_irq_ctx_get(vdev, i);
+ if (!ctx || !ctx->trigger)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
- eventfd_signal(vdev->ctx[i].trigger, 1);
+ eventfd_signal(ctx->trigger, 1);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
uint8_t *bools = data;
if (bools[i - start])
- eventfd_signal(vdev->ctx[i].trigger, 1);
+ eventfd_signal(ctx->trigger, 1);
}
}
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* RE: [PATCH V5 03/11] vfio/pci: Prepare for dynamic interrupt context storage
2023-05-11 15:44 ` [PATCH V5 03/11] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
@ 2023-05-17 2:12 ` Tian, Kevin
0 siblings, 0 replies; 25+ messages in thread
From: Tian, Kevin @ 2023-05-17 2:12 UTC (permalink / raw)
To: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, alex.williamson@redhat.com
Cc: tglx@linutronix.de, darwi@linutronix.de, kvm@vger.kernel.org,
Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org
> From: Chatre, Reinette <reinette.chatre@intel.com>
> Sent: Thursday, May 11, 2023 11:45 PM
>
> Interrupt context storage is statically allocated at the time
> interrupts are allocated. Following allocation, the interrupt
> context is managed by directly accessing the elements of the
> array using the vector as index.
>
> It is possible to allocate additional MSI-X vectors after
> MSI-X has been enabled. Dynamic storage of interrupt context
> is needed to support adding new MSI-X vectors after initial
> allocation.
>
> Replace direct access of array elements with pointers to the
> array elements. Doing so reduces impact of moving to a new data
> structure. Move interactions with the array to helpers to
> mostly contain changes needed to transition to a dynamic
> data structure.
>
> No functional change intended.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH V5 04/11] vfio/pci: Move to single error path
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (2 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 03/11] vfio/pci: Prepare for dynamic interrupt context storage Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 05/11] vfio/pci: Use xarray for interrupt context storage Reinette Chatre
` (10 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Enabling and disabling of an interrupt involves several steps
that can fail. Cleanup after failure is done when the error
is encountered, resulting in some repetitive code.
Support for dynamic contexts will introduce more steps during
interrupt enabling and disabling.
Transition to centralized exit path in preparation for dynamic
contexts to eliminate duplicate error handling code.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V2:
- Move patch to earlier in series in support of the change to
dynamic context management.
- Do not add the "ctx->name = NULL" in error path. It is not done
in baseline and will not be needed when transitioning to
dynamic context management.
- Update changelog to not make this change specific to dynamic
MSI-X.
Changes since RFC V1:
- Improve changelog.
drivers/vfio/pci/vfio_pci_intrs.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 6094679349d9..96396e1ad085 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -427,8 +427,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
- kfree(ctx->name);
- return PTR_ERR(trigger);
+ ret = PTR_ERR(trigger);
+ goto out_free_name;
}
/*
@@ -448,11 +448,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
- if (ret) {
- kfree(ctx->name);
- eventfd_ctx_put(trigger);
- return ret;
- }
+ if (ret)
+ goto out_put_eventfd_ctx;
ctx->producer.token = trigger;
ctx->producer.irq = irq;
@@ -467,6 +464,12 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
ctx->trigger = trigger;
return 0;
+
+out_put_eventfd_ctx:
+ eventfd_ctx_put(trigger);
+out_free_name:
+ kfree(ctx->name);
+ return ret;
}
static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 05/11] vfio/pci: Use xarray for interrupt context storage
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (3 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 04/11] vfio/pci: Move to single error path Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 06/11] vfio/pci: Remove interrupt context counter Reinette Chatre
` (9 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Interrupt context is statically allocated at the time interrupts
are allocated. Following allocation, the context is managed by
directly accessing the elements of the array using the vector
as index. The storage is released when interrupts are disabled.
It is possible to dynamically allocate a single MSI-X interrupt
after MSI-X is enabled. A dynamic storage for interrupt context
is needed to support this. Replace the interrupt context array with an
xarray (similar to what the core uses as store for MSI descriptors)
that can support the dynamic expansion while maintaining the
custom that uses the vector as index.
With a dynamic storage it is no longer required to pre-allocate
interrupt contexts at the time the interrupts are allocated.
MSI and MSI-X interrupt contexts are only used when interrupts are
enabled. Their allocation can thus be delayed until interrupt enabling.
Only enabled interrupts will have associated interrupt contexts.
Whether an interrupt has been allocated (a Linux irq number exists
for it) becomes the criteria for whether an interrupt can be enabled.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/lkml/20230404122444.59e36a99.alex.williamson@redhat.com/
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V2:
- Only allocate contexts as they are used, or "active". (Alex)
- Move vfio_irq_ctx_free() from later patch to prevent open-coding
the same within vfio_irq_ctx_free_all(). This evolved into
vfio_irq_ctx_free() used for dynamic context allocation and
vfio_irq_ctx_free_all() removed because of it. (Alex)
- With vfio_irq_ctx_alloc_num() removed, rename vfio_irq_ctx_alloc_single()
to vfio_irq_ctx_alloc().
Changes since RFC V1:
- Let vfio_irq_ctx_alloc_single() return pointer to allocated
context. (Alex)
- Transition INTx allocation to simplified vfio_irq_ctx_alloc_single().
- Improve accuracy of changelog.
drivers/vfio/pci/vfio_pci_core.c | 1 +
drivers/vfio/pci/vfio_pci_intrs.c | 91 ++++++++++++++++---------------
include/linux/vfio_pci_core.h | 2 +-
3 files changed, 48 insertions(+), 46 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..ae0e161c7fc9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2102,6 +2102,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
INIT_LIST_HEAD(&vdev->vma_list);
INIT_LIST_HEAD(&vdev->sriov_pfs_item);
init_rwsem(&vdev->memory_lock);
+ xa_init(&vdev->ctx);
return 0;
}
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 96396e1ad085..77957274027c 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -52,25 +52,33 @@ static
struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
unsigned long index)
{
- if (index >= vdev->num_ctx)
- return NULL;
- return &vdev->ctx[index];
+ return xa_load(&vdev->ctx, index);
}
-static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
+static void vfio_irq_ctx_free(struct vfio_pci_core_device *vdev,
+ struct vfio_pci_irq_ctx *ctx, unsigned long index)
{
- kfree(vdev->ctx);
+ xa_erase(&vdev->ctx, index);
+ kfree(ctx);
}
-static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
- unsigned long num)
+static struct vfio_pci_irq_ctx *
+vfio_irq_ctx_alloc(struct vfio_pci_core_device *vdev, unsigned long index)
{
- vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
- GFP_KERNEL_ACCOUNT);
- if (!vdev->ctx)
- return -ENOMEM;
+ struct vfio_pci_irq_ctx *ctx;
+ int ret;
- return 0;
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+ if (!ctx)
+ return NULL;
+
+ ret = xa_insert(&vdev->ctx, index, ctx, GFP_KERNEL_ACCOUNT);
+ if (ret) {
+ kfree(ctx);
+ return NULL;
+ }
+
+ return ctx;
}
/*
@@ -226,7 +234,6 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
{
struct vfio_pci_irq_ctx *ctx;
- int ret;
if (!is_irq_none(vdev))
return -EINVAL;
@@ -234,15 +241,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
if (!vdev->pdev->irq)
return -ENODEV;
- ret = vfio_irq_ctx_alloc_num(vdev, 1);
- if (ret)
- return ret;
-
- ctx = vfio_irq_ctx_get(vdev, 0);
- if (!ctx) {
- vfio_irq_ctx_free_all(vdev);
- return -EINVAL;
- }
+ ctx = vfio_irq_ctx_alloc(vdev, 0);
+ if (!ctx)
+ return -ENOMEM;
vdev->num_ctx = 1;
@@ -334,7 +335,7 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
vdev->num_ctx = 0;
- vfio_irq_ctx_free_all(vdev);
+ vfio_irq_ctx_free(vdev, ctx, 0);
}
/*
@@ -358,10 +359,6 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
if (!is_irq_none(vdev))
return -EINVAL;
- ret = vfio_irq_ctx_alloc_num(vdev, nvec);
- if (ret)
- return ret;
-
/* return the number of supported vectors if we can't get all: */
cmd = vfio_pci_memory_lock_and_enable(vdev);
ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
@@ -369,7 +366,6 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
if (ret > 0)
pci_free_irq_vectors(pdev);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
- vfio_irq_ctx_free_all(vdev);
return ret;
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);
@@ -401,12 +397,13 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
if (vector >= vdev->num_ctx)
return -EINVAL;
- ctx = vfio_irq_ctx_get(vdev, vector);
- if (!ctx)
- return -EINVAL;
irq = pci_irq_vector(pdev, vector);
+ if (irq < 0)
+ return -EINVAL;
- if (ctx->trigger) {
+ ctx = vfio_irq_ctx_get(vdev, vector);
+
+ if (ctx) {
irq_bypass_unregister_producer(&ctx->producer);
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -414,16 +411,22 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
vfio_pci_memory_unlock_and_restore(vdev, cmd);
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
- ctx->trigger = NULL;
+ vfio_irq_ctx_free(vdev, ctx, vector);
}
if (fd < 0)
return 0;
+ ctx = vfio_irq_ctx_alloc(vdev, vector);
+ if (!ctx)
+ return -ENOMEM;
+
ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
msix ? "x" : "", vector, pci_name(pdev));
- if (!ctx->name)
- return -ENOMEM;
+ if (!ctx->name) {
+ ret = -ENOMEM;
+ goto out_free_ctx;
+ }
trigger = eventfd_ctx_fdget(fd);
if (IS_ERR(trigger)) {
@@ -469,6 +472,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
eventfd_ctx_put(trigger);
out_free_name:
kfree(ctx->name);
+out_free_ctx:
+ vfio_irq_ctx_free(vdev, ctx, vector);
return ret;
}
@@ -498,16 +503,13 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
- unsigned int i;
+ unsigned long i;
u16 cmd;
- for (i = 0; i < vdev->num_ctx; i++) {
- ctx = vfio_irq_ctx_get(vdev, i);
- if (ctx) {
- vfio_virqfd_disable(&ctx->unmask);
- vfio_virqfd_disable(&ctx->mask);
- vfio_msi_set_vector_signal(vdev, i, -1, msix);
- }
+ xa_for_each(&vdev->ctx, i, ctx) {
+ vfio_virqfd_disable(&ctx->unmask);
+ vfio_virqfd_disable(&ctx->mask);
+ vfio_msi_set_vector_signal(vdev, i, -1, msix);
}
cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -523,7 +525,6 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
vdev->irq_type = VFIO_PCI_NUM_IRQS;
vdev->num_ctx = 0;
- vfio_irq_ctx_free_all(vdev);
}
/*
@@ -663,7 +664,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
for (i = start; i < start + count; i++) {
ctx = vfio_irq_ctx_get(vdev, i);
- if (!ctx || !ctx->trigger)
+ if (!ctx)
continue;
if (flags & VFIO_IRQ_SET_DATA_NONE) {
eventfd_signal(ctx->trigger, 1);
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 367fd79226a3..61d7873a3973 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -59,7 +59,7 @@ struct vfio_pci_core_device {
struct perm_bits *msi_perm;
spinlock_t irqlock;
struct mutex igate;
- struct vfio_pci_irq_ctx *ctx;
+ struct xarray ctx;
int num_ctx;
int irq_type;
int num_regions;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 06/11] vfio/pci: Remove interrupt context counter
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (4 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 05/11] vfio/pci: Use xarray for interrupt context storage Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 07/11] vfio/pci: Update stale comment Reinette Chatre
` (8 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
struct vfio_pci_core_device::num_ctx counts how many interrupt
contexts have been allocated. When all interrupt contexts are
allocated simultaneously num_ctx provides the upper bound of all
vectors that can be used as indices into the interrupt context
array.
With the upcoming support for dynamic MSI-X the number of
interrupt contexts does not necessarily span the range of allocated
interrupts. Consequently, num_ctx is no longer a trusted upper bound
for valid indices.
Stop using num_ctx to determine if a provided vector is valid. Use
the existence of allocated interrupt.
This changes behavior on the error path when user space provides
an invalid vector range. Behavior changes from early exit without
any modifications to possible modifications to valid vectors within
the invalid range. This is acceptable considering that an invalid
range is not a valid scenario, see link to discussion.
The checks that ensure that user space provides a range of vectors
that is valid for the device are untouched.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/lkml/20230316155646.07ae266f.alex.williamson@redhat.com/
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V2:
- Update changelog to reflect change in policy that existence of
allocated interrupt is validity check, not existence of context
(which is now dynamically allocated).
Changes since RFC V1:
- Remove vfio_irq_ctx_range_allocated(). (Alex and Kevin).
drivers/vfio/pci/vfio_pci_intrs.c | 13 +------------
include/linux/vfio_pci_core.h | 1 -
2 files changed, 1 insertion(+), 13 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 77957274027c..e40eca69a293 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -245,8 +245,6 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
if (!ctx)
return -ENOMEM;
- vdev->num_ctx = 1;
-
/*
* If the virtual interrupt is masked, restore it. Devices
* supporting DisINTx can be masked at the hardware level
@@ -334,7 +332,6 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
}
vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
- vdev->num_ctx = 0;
vfio_irq_ctx_free(vdev, ctx, 0);
}
@@ -370,7 +367,6 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
}
vfio_pci_memory_unlock_and_restore(vdev, cmd);
- vdev->num_ctx = nvec;
vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
VFIO_PCI_MSI_IRQ_INDEX;
@@ -394,9 +390,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
int irq, ret;
u16 cmd;
- if (vector >= vdev->num_ctx)
- return -EINVAL;
-
irq = pci_irq_vector(pdev, vector);
if (irq < 0)
return -EINVAL;
@@ -483,9 +476,6 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
unsigned int i, j;
int ret = 0;
- if (start >= vdev->num_ctx || start + count > vdev->num_ctx)
- return -EINVAL;
-
for (i = 0, j = start; i < count && !ret; i++, j++) {
int fd = fds ? fds[i] : -1;
ret = vfio_msi_set_vector_signal(vdev, j, fd, msix);
@@ -524,7 +514,6 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
pci_intx(pdev, 0);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
- vdev->num_ctx = 0;
}
/*
@@ -659,7 +648,7 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
return ret;
}
- if (!irq_is(vdev, index) || start + count > vdev->num_ctx)
+ if (!irq_is(vdev, index))
return -EINVAL;
for (i = start; i < start + count; i++) {
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 61d7873a3973..148fd1ae6c1c 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -60,7 +60,6 @@ struct vfio_pci_core_device {
spinlock_t irqlock;
struct mutex igate;
struct xarray ctx;
- int num_ctx;
int irq_type;
int num_regions;
struct vfio_pci_region *region;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 07/11] vfio/pci: Update stale comment
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (5 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 06/11] vfio/pci: Remove interrupt context counter Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-17 2:12 ` Tian, Kevin
2023-05-11 15:44 ` [PATCH V5 08/11] vfio/pci: Use bitfield for struct vfio_pci_core_device flags Reinette Chatre
` (7 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
In preparation for surrounding code change it is helpful to
ensure that existing comments are accurate.
Remove inaccurate comment about direct access and update
the rest of the comment to reflect the purpose of writing
the cached MSI message to the device.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/lkml/20230330164050.0069e2a5.alex.williamson@redhat.com/
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V4:
- Restore text about backdoor reset as example to indicate that
the restore may not be a severe problem but instead done in
collaboration with user space. (Kevin)
Changes since V2:
- New patch.
drivers/vfio/pci/vfio_pci_intrs.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index e40eca69a293..867327e159c1 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -428,11 +428,9 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
}
/*
- * The MSIx vector table resides in device memory which may be cleared
- * via backdoor resets. We don't allow direct access to the vector
- * table so even if a userspace driver attempts to save/restore around
- * such a reset it would be unsuccessful. To avoid this, restore the
- * cached value of the message prior to enabling.
+ * If the vector was previously allocated, refresh the on-device
+ * message data before enabling in case it had been cleared or
+ * corrupted (e.g. due to backdoor resets) since writing.
*/
cmd = vfio_pci_memory_lock_and_enable(vdev);
if (msix) {
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* RE: [PATCH V5 07/11] vfio/pci: Update stale comment
2023-05-11 15:44 ` [PATCH V5 07/11] vfio/pci: Update stale comment Reinette Chatre
@ 2023-05-17 2:12 ` Tian, Kevin
0 siblings, 0 replies; 25+ messages in thread
From: Tian, Kevin @ 2023-05-17 2:12 UTC (permalink / raw)
To: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, alex.williamson@redhat.com
Cc: tglx@linutronix.de, darwi@linutronix.de, kvm@vger.kernel.org,
Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org
> From: Chatre, Reinette <reinette.chatre@intel.com>
> Sent: Thursday, May 11, 2023 11:45 PM
>
> In preparation for surrounding code change it is helpful to
> ensure that existing comments are accurate.
>
> Remove inaccurate comment about direct access and update
> the rest of the comment to reflect the purpose of writing
> the cached MSI message to the device.
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Link:
> https://lore.kernel.org/lkml/20230330164050.0069e2a5.alex.williamson@re
> dhat.com/
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH V5 08/11] vfio/pci: Use bitfield for struct vfio_pci_core_device flags
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (6 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 07/11] vfio/pci: Update stale comment Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 09/11] vfio/pci: Probe and store ability to support dynamic MSI-X Reinette Chatre
` (6 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
struct vfio_pci_core_device contains eleven boolean flags.
Boolean flags clearly indicate their usage but space usage
starts to be a concern when there are many.
An upcoming change adds another boolean flag to
struct vfio_pci_core_device, thereby increasing the concern
that the boolean flags are consuming unnecessary space.
Transition the boolean flags to use bitfields. On a system that
uses one byte per boolean this reduces the space consumed
by existing flags from 11 bytes to 2 bytes with room for
a few more flags without increasing the structure's size.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V3:
- New patch. (Jason)
include/linux/vfio_pci_core.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index 148fd1ae6c1c..adb47e2914d7 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -68,17 +68,17 @@ struct vfio_pci_core_device {
u16 msix_size;
u32 msix_offset;
u32 rbar[7];
- bool pci_2_3;
- bool virq_disabled;
- bool reset_works;
- bool extended_caps;
- bool bardirty;
- bool has_vga;
- bool needs_reset;
- bool nointx;
- bool needs_pm_restore;
- bool pm_intx_masked;
- bool pm_runtime_engaged;
+ bool pci_2_3:1;
+ bool virq_disabled:1;
+ bool reset_works:1;
+ bool extended_caps:1;
+ bool bardirty:1;
+ bool has_vga:1;
+ bool needs_reset:1;
+ bool nointx:1;
+ bool needs_pm_restore:1;
+ bool pm_intx_masked:1;
+ bool pm_runtime_engaged:1;
struct pci_saved_state *pci_saved_state;
struct pci_saved_state *pm_save;
int ioeventfds_nr;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 09/11] vfio/pci: Probe and store ability to support dynamic MSI-X
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (7 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 08/11] vfio/pci: Use bitfield for struct vfio_pci_core_device flags Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-11 15:44 ` [PATCH V5 10/11] vfio/pci: Support " Reinette Chatre
` (5 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Not all MSI-X devices support dynamic MSI-X allocation. Whether
a device supports dynamic MSI-X should be queried using
pci_msix_can_alloc_dyn().
Instead of scattering code with pci_msix_can_alloc_dyn(),
probe this ability once and store it as a property of the
virtual device.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V3:
- Move field to improve structure layout. (Alex)
- Use bitfield. (Jason)
Changes since V2:
- New patch. (Alex)
drivers/vfio/pci/vfio_pci_core.c | 5 ++++-
include/linux/vfio_pci_core.h | 1 +
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ae0e161c7fc9..a3635a8e54c8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -530,8 +530,11 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
vdev->msix_bar = table & PCI_MSIX_TABLE_BIR;
vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET;
vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16;
- } else
+ vdev->has_dyn_msix = pci_msix_can_alloc_dyn(pdev);
+ } else {
vdev->msix_bar = 0xFF;
+ vdev->has_dyn_msix = false;
+ }
if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
vdev->has_vga = true;
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index adb47e2914d7..562e8754869d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -68,6 +68,7 @@ struct vfio_pci_core_device {
u16 msix_size;
u32 msix_offset;
u32 rbar[7];
+ bool has_dyn_msix:1;
bool pci_2_3:1;
bool virq_disabled:1;
bool reset_works:1;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH V5 10/11] vfio/pci: Support dynamic MSI-X
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (8 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 09/11] vfio/pci: Probe and store ability to support dynamic MSI-X Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-17 2:13 ` Tian, Kevin
2023-05-11 15:44 ` [PATCH V5 11/11] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
` (4 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be
allocated after MSI-X enabling.
Use dynamic MSI-X (if supported by the device) to allocate an interrupt
after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
the time a valid eventfd is assigned. This is different behavior from
a range provided during MSI-X enabling where interrupts are allocated
for the entire range whether a valid eventfd is provided for each
interrupt or not.
The PCI-MSIX API requires that some number of irqs are allocated for
an initial set of vectors when enabling MSI-X on the device. When
dynamic MSIX allocation is not supported, the vector table, and thus
the allocated irq set can only be resized by disabling and re-enabling
MSI-X with a different range. In that case the irq allocation is
essentially a cache for configuring vectors within the previously
allocated vector range. When dynamic MSI-X allocation is supported,
the API still requires some initial set of irqs to be allocated, but
also supports allocating and freeing specific irq vectors both
within and beyond the initially allocated range.
For consistency between modes, as well as to reduce latency and improve
reliability of allocations, and also simplicity, this implementation
only releases irqs via pci_free_irq_vectors() when either the interrupt
mode changes or the device is released.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/lkml/20230403211841.0e206b67.alex.williamson@redhat.com/
---
Changes since V4:
- Treat pci_irq_vector() returning '0' for Linux IRQ number as kernel
bug with a WARN. (Kevin)
- Move comment about missing vfio_msi_free_irq() to description of
vfio_msi_alloc_irq(). (Kevin)
- Add comment at vfio_msi_alloc_irq() call that indicates the allocation
is maintained until MSI-X disable to explain the absence of its
release in error path. (Kevin)
Changes since V3:
- Remove vfio_msi_free_irq(). (Alex)
- Rework changelog. (Alex)
Changes since V2:
- Move vfio_irq_ctx_free() to earlier in series to support
earlier usage. (Alex)
- Use consistent terms in changelog: MSI-x changed to MSI-X.
- Make dynamic interrupt context creation generic across all
MSI/MSI-X interrupts. This resulted in code moving to earlier
in series as part of xarray introduction patch. (Alex)
- Remove the local allow_dyn_alloc and direct calling of
pci_msix_can_alloc_dyn(), use the new vdev->has_dyn_msix
introduced earlier instead. (Alex)
- Stop tracking new allocations (remove "new_ctx"). (Alex)
- Introduce new wrapper that returns Linux interrupt number or
dynamically allocate a new interrupt. Wrapper can be used for
all interrupt cases. (Alex)
- Only free dynamic MSI-X interrupts on MSI-X teardown. (Alex)
Changes since RFC V1:
- Add pointer to interrupt context as function parameter to
vfio_irq_ctx_free(). (Alex)
- Initialize new_ctx to false. (Dan Carpenter)
- Only support dynamic allocation if device supports it. (Alex)
drivers/vfio/pci/vfio_pci_intrs.c | 47 +++++++++++++++++++++++++++----
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 867327e159c1..cbb4bcbfbf83 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -381,27 +381,55 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
return 0;
}
+/*
+ * vfio_msi_alloc_irq() returns the Linux IRQ number of an MSI or MSI-X device
+ * interrupt vector. If a Linux IRQ number is not available then a new
+ * interrupt is allocated if dynamic MSI-X is supported.
+ *
+ * Where is vfio_msi_free_irq()? Allocated interrupts are maintained,
+ * essentially forming a cache that subsequent allocations can draw from.
+ * Interrupts are freed using pci_free_irq_vectors() when MSI/MSI-X is
+ * disabled.
+ */
+static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
+ unsigned int vector, bool msix)
+{
+ struct pci_dev *pdev = vdev->pdev;
+ struct msi_map map;
+ int irq;
+ u16 cmd;
+
+ irq = pci_irq_vector(pdev, vector);
+ if (WARN_ON_ONCE(irq == 0))
+ return -EINVAL;
+ if (irq > 0 || !msix || !vdev->has_dyn_msix)
+ return irq;
+
+ cmd = vfio_pci_memory_lock_and_enable(vdev);
+ map = pci_msix_alloc_irq_at(pdev, vector, NULL);
+ vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
+ return map.index < 0 ? map.index : map.virq;
+}
+
static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
unsigned int vector, int fd, bool msix)
{
struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
struct eventfd_ctx *trigger;
- int irq, ret;
+ int irq = -EINVAL, ret;
u16 cmd;
- irq = pci_irq_vector(pdev, vector);
- if (irq < 0)
- return -EINVAL;
-
ctx = vfio_irq_ctx_get(vdev, vector);
if (ctx) {
irq_bypass_unregister_producer(&ctx->producer);
-
+ irq = pci_irq_vector(pdev, vector);
cmd = vfio_pci_memory_lock_and_enable(vdev);
free_irq(irq, ctx->trigger);
vfio_pci_memory_unlock_and_restore(vdev, cmd);
+ /* Interrupt stays allocated, will be freed at MSI-X disable. */
kfree(ctx->name);
eventfd_ctx_put(ctx->trigger);
vfio_irq_ctx_free(vdev, ctx, vector);
@@ -410,6 +438,13 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
if (fd < 0)
return 0;
+ if (irq == -EINVAL) {
+ /* Interrupt stays allocated, will be freed at MSI-X disable. */
+ irq = vfio_msi_alloc_irq(vdev, vector, msix);
+ if (irq < 0)
+ return irq;
+ }
+
ctx = vfio_irq_ctx_alloc(vdev, vector);
if (!ctx)
return -ENOMEM;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* RE: [PATCH V5 10/11] vfio/pci: Support dynamic MSI-X
2023-05-11 15:44 ` [PATCH V5 10/11] vfio/pci: Support " Reinette Chatre
@ 2023-05-17 2:13 ` Tian, Kevin
0 siblings, 0 replies; 25+ messages in thread
From: Tian, Kevin @ 2023-05-17 2:13 UTC (permalink / raw)
To: Chatre, Reinette, jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, alex.williamson@redhat.com
Cc: tglx@linutronix.de, darwi@linutronix.de, kvm@vger.kernel.org,
Jiang, Dave, Liu, Jing2, Raj, Ashok, Yu, Fenghua,
tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org
> From: Chatre, Reinette <reinette.chatre@intel.com>
> Sent: Thursday, May 11, 2023 11:45 PM
>
> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be
> allocated after MSI-X enabling.
>
> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
> the time a valid eventfd is assigned. This is different behavior from
> a range provided during MSI-X enabling where interrupts are allocated
> for the entire range whether a valid eventfd is provided for each
> interrupt or not.
>
> The PCI-MSIX API requires that some number of irqs are allocated for
> an initial set of vectors when enabling MSI-X on the device. When
> dynamic MSIX allocation is not supported, the vector table, and thus
> the allocated irq set can only be resized by disabling and re-enabling
> MSI-X with a different range. In that case the irq allocation is
> essentially a cache for configuring vectors within the previously
> allocated vector range. When dynamic MSI-X allocation is supported,
> the API still requires some initial set of irqs to be allocated, but
> also supports allocating and freeing specific irq vectors both
> within and beyond the initially allocated range.
>
> For consistency between modes, as well as to reduce latency and improve
> reliability of allocations, and also simplicity, this implementation
> only releases irqs via pci_free_irq_vectors() when either the interrupt
> mode changes or the device is released.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> Link:
> https://lore.kernel.org/lkml/20230403211841.0e206b67.alex.williamson@re
> dhat.com/
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH V5 11/11] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (9 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 10/11] vfio/pci: Support " Reinette Chatre
@ 2023-05-11 15:44 ` Reinette Chatre
2023-05-16 22:53 ` [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Alex Williamson
` (3 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-11 15:44 UTC (permalink / raw)
To: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian,
alex.williamson
Cc: tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE
to provide guidance to user space.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
Changes since V4:
- Add Kevin's Reviewed-by tag. (Kevin)
Changes since V3:
- Remove unnecessary test from condition. (Alex)
Changes since V2:
- Use new vdev->has_dyn_msix property instead of calling
pci_msix_can_alloc_dyn() directly. (Alex)
Changes since RFC V1:
- Only advertise VFIO_IRQ_INFO_NORESIZE for MSI-X devices that
can actually support dynamic allocation. (Alex)
drivers/vfio/pci/vfio_pci_core.c | 2 +-
include/uapi/linux/vfio.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a3635a8e54c8..ec7e662de033 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1114,7 +1114,7 @@ static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev,
if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
info.flags |=
(VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED);
- else
+ else if (info.index != VFIO_PCI_MSIX_IRQ_INDEX || !vdev->has_dyn_msix)
info.flags |= VFIO_IRQ_INFO_NORESIZE;
return copy_to_user(arg, &info, minsz) ? -EFAULT : 0;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..1a36134cae5c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -511,6 +511,9 @@ struct vfio_region_info_cap_nvlink2_lnkspd {
* then add and unmask vectors, it's up to userspace to make the decision
* whether to allocate the maximum supported number of vectors or tear
* down setup and incrementally increase the vectors as each is enabled.
+ * Absence of the NORESIZE flag indicates that vectors can be enabled
+ * and disabled dynamically without impacting other vectors within the
+ * index.
*/
struct vfio_irq_info {
__u32 argsz;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (10 preceding siblings ...)
2023-05-11 15:44 ` [PATCH V5 11/11] vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X Reinette Chatre
@ 2023-05-16 22:53 ` Alex Williamson
2023-05-17 2:14 ` Tian, Kevin
2023-05-17 14:25 ` Jason Gunthorpe
` (2 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2023-05-16 22:53 UTC (permalink / raw)
To: Reinette Chatre
Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
linux-kernel
On Thu, 11 May 2023 08:44:27 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:
> Changes since V4:
> - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/
> - Add Kevin's Reviewed-by tag as applicable.
> - Treat non-existing INTx interrupt context as kernel bug with WARN. This
> exposed an issue in the scenario where INTx mask/unmask may occur without
> INTx enabled. This is fixed by obtaining the interrupt context later
> (right before use) within impacted functions: vfio_pci_intx_mask() and
> vfio_pci_intx_unmask_handler(). (Kevin)
> - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
> bug via a WARN instead of ignoring this value. (Kevin)
> - Improve accuracy of comments. (Kevin)
> - Please refer to individual patches for local changes.
Looks good to me.
Kevin, do you want to add any additional reviews or check the changes
made based on your previous comments?
Thanks,
Alex
> Changes since V3:
> - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/
> - Be considerate about layout and size with changes to
> struct vfio_pci_core_device. Keep flags together and transition all to
> use bitfields. (Alex and Jason)
> - Do not free dynamically allocated interrupts on error path. (Alex)
> - Please refer to individual patches for localized changes.
>
> Changes since V2:
> - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/
> - During testing of V2 "kernel test robot" reported issues resulting from
> include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when
> CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can
> be found in the kernel (since v6.3-rc7) as
> commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()")
> - Biggest change is the transition to "active contexts" for both MSI and MSI-X.
> Interrupt contexts have always been allocated when the interrupts are
> allocated while they are only used while interrupts are
> enabled. In this series interrupt contexts are made dynamic, while doing
> so their allocation is moved to match how they are used: allocated when
> interrupts are enabled. Whether a Linux interrupt number exists determines
> whether an interrupt can be enabled.
> Previous policy (up to V2) that an allocated interrupt has an interrupt
> context no longer applies. Instead, an interrupt context has a
> handler/trigger, aka "active contexts". (Alex)
> - Re-ordered patches in support of "active contexts".
> - Only free interrupts on MSI-X teardown and otherwise use the
> allocated interrupts as a cache. (Alex)
> - Using unsigned int for the vector broke the unwind loop within
> vfio_msi_set_block(). (Alex)
> - Introduce new "has_dyn_msix" property of virtual device instead of
> querying support every time. (Alex)
> - Some smaller changes, please refer to individual patches.
>
> Changes since RFC V1:
> - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/
> - Improved changelogs.
> - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to
> allocated context. (Alex)
> - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain
> invalid error path behavior. (Alex and Kevin)
> - Add pointer to interrupt context as function parameter to
> vfio_irq_ctx_free(). (Alex)
> - Ensure variables are initialized. (Dan Carpenter)
> - Only support dynamic allocation if device supports it. (Alex)
>
> Qemu allocates interrupts incrementally at the time the guest unmasks an
> interrupt, for example each time a Linux guest runs request_irq().
>
> Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1].
> This prompted Qemu to, when allocating a new interrupt, first release all
> previously allocated interrupts (including disable of MSI-X) followed
> by re-allocation of all interrupts that includes the new interrupt.
> Please see [2] for a detailed discussion about this issue.
>
> Releasing and re-allocating interrupts may be acceptable if all
> interrupts are unmasked during device initialization. If unmasking of
> interrupts occur during runtime this may result in lost interrupts.
> For example, consider an accelerator device with multiple work queues,
> each work queue having a dedicated interrupt. A work queue can be
> enabled at any time with its associated interrupt unmasked while other
> work queues are already active. Having all interrupts released and MSI-X
> disabled to enable the new work queue will impact active work queues.
>
> This series builds on the recent interrupt sub-system core changes
> that added support for dynamic MSI-X allocation after initial MSI-X
> enabling.
>
> Add support for dynamic MSI-X allocation to vfio-pci. A flag
> indicating lack of support for dynamic allocation already exist:
> VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With
> support for dynamic MSI-X the flag is cleared for MSI-X when supported,
> enabling Qemu to modify its behavior.
>
> Any feedback is appreciated
>
> Reinette
>
> [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
> [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t
>
> Reinette Chatre (11):
> vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
> vfio/pci: Remove negative check on unsigned vector
> vfio/pci: Prepare for dynamic interrupt context storage
> vfio/pci: Move to single error path
> vfio/pci: Use xarray for interrupt context storage
> vfio/pci: Remove interrupt context counter
> vfio/pci: Update stale comment
> vfio/pci: Use bitfield for struct vfio_pci_core_device flags
> vfio/pci: Probe and store ability to support dynamic MSI-X
> vfio/pci: Support dynamic MSI-X
> vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
>
> drivers/vfio/pci/vfio_pci_core.c | 8 +-
> drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++----------
> include/linux/vfio_pci_core.h | 26 +--
> include/uapi/linux/vfio.h | 3 +
> 4 files changed, 229 insertions(+), 113 deletions(-)
>
>
> base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
^ permalink raw reply [flat|nested] 25+ messages in thread* RE: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-16 22:53 ` [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Alex Williamson
@ 2023-05-17 2:14 ` Tian, Kevin
2023-05-17 15:46 ` Reinette Chatre
0 siblings, 1 reply; 25+ messages in thread
From: Tian, Kevin @ 2023-05-17 2:14 UTC (permalink / raw)
To: Alex Williamson, Chatre, Reinette
Cc: jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, tglx@linutronix.de,
darwi@linutronix.de, kvm@vger.kernel.org, Jiang, Dave, Liu, Jing2,
Raj, Ashok, Yu, Fenghua, tom.zanussi@linux.intel.com,
linux-kernel@vger.kernel.org
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, May 17, 2023 6:53 AM
>
> On Thu, 11 May 2023 08:44:27 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>
> > Changes since V4:
> > - V4:
> https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com
> /
> > - Add Kevin's Reviewed-by tag as applicable.
> > - Treat non-existing INTx interrupt context as kernel bug with WARN. This
> > exposed an issue in the scenario where INTx mask/unmask may occur
> without
> > INTx enabled. This is fixed by obtaining the interrupt context later
> > (right before use) within impacted functions: vfio_pci_intx_mask() and
> > vfio_pci_intx_unmask_handler(). (Kevin)
> > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
> > bug via a WARN instead of ignoring this value. (Kevin)
> > - Improve accuracy of comments. (Kevin)
> > - Please refer to individual patches for local changes.
>
> Looks good to me.
>
> Kevin, do you want to add any additional reviews or check the changes
> made based on your previous comments?
>
Good to me too. I've given the remaining reviewed-by's.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-17 2:14 ` Tian, Kevin
@ 2023-05-17 15:46 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-17 15:46 UTC (permalink / raw)
To: Tian, Kevin, Alex Williamson
Cc: jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, tglx@linutronix.de,
darwi@linutronix.de, kvm@vger.kernel.org, Jiang, Dave, Liu, Jing2,
Raj, Ashok, Yu, Fenghua, tom.zanussi@linux.intel.com,
linux-kernel@vger.kernel.org
On 5/16/2023 7:14 PM, Tian, Kevin wrote:
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Wednesday, May 17, 2023 6:53 AM
>>
>> On Thu, 11 May 2023 08:44:27 -0700
>> Reinette Chatre <reinette.chatre@intel.com> wrote:
>>
>>> Changes since V4:
>>> - V4:
>> https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com
>> /
>>> - Add Kevin's Reviewed-by tag as applicable.
>>> - Treat non-existing INTx interrupt context as kernel bug with WARN. This
>>> exposed an issue in the scenario where INTx mask/unmask may occur
>> without
>>> INTx enabled. This is fixed by obtaining the interrupt context later
>>> (right before use) within impacted functions: vfio_pci_intx_mask() and
>>> vfio_pci_intx_unmask_handler(). (Kevin)
>>> - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
>>> bug via a WARN instead of ignoring this value. (Kevin)
>>> - Improve accuracy of comments. (Kevin)
>>> - Please refer to individual patches for local changes.
>>
>> Looks good to me.
>>
>> Kevin, do you want to add any additional reviews or check the changes
>> made based on your previous comments?
>>
>
> Good to me too. I've given the remaining reviewed-by's.
Thank you very much Alex and Kevin.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (11 preceding siblings ...)
2023-05-16 22:53 ` [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Alex Williamson
@ 2023-05-17 14:25 ` Jason Gunthorpe
2023-05-17 15:47 ` Reinette Chatre
2023-05-22 22:25 ` Thomas Gleixner
2023-05-23 22:43 ` Alex Williamson
14 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2023-05-17 14:25 UTC (permalink / raw)
To: Reinette Chatre
Cc: yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson,
tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, linux-kernel
On Thu, May 11, 2023 at 08:44:27AM -0700, Reinette Chatre wrote:
>
> Qemu allocates interrupts incrementally at the time the guest unmasks an
> interrupt, for example each time a Linux guest runs request_irq().
>
> Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1].
> This prompted Qemu to, when allocating a new interrupt, first release all
> previously allocated interrupts (including disable of MSI-X) followed
> by re-allocation of all interrupts that includes the new interrupt.
> Please see [2] for a detailed discussion about this issue.
>
> Releasing and re-allocating interrupts may be acceptable if all
> interrupts are unmasked during device initialization. If unmasking of
> interrupts occur during runtime this may result in lost interrupts.
> For example, consider an accelerator device with multiple work queues,
> each work queue having a dedicated interrupt. A work queue can be
> enabled at any time with its associated interrupt unmasked while other
> work queues are already active. Having all interrupts released and MSI-X
> disabled to enable the new work queue will impact active work queues.
>
> This series builds on the recent interrupt sub-system core changes
> that added support for dynamic MSI-X allocation after initial MSI-X
> enabling.
>
> Add support for dynamic MSI-X allocation to vfio-pci. A flag
> indicating lack of support for dynamic allocation already exist:
> VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With
> support for dynamic MSI-X the flag is cleared for MSI-X when supported,
> enabling Qemu to modify its behavior.
>
> Any feedback is appreciated
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-17 14:25 ` Jason Gunthorpe
@ 2023-05-17 15:47 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-17 15:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: yishaih, shameerali.kolothum.thodi, kevin.tian, alex.williamson,
tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, linux-kernel
On 5/17/2023 7:25 AM, Jason Gunthorpe wrote:
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>
Thank you very much Jason.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (12 preceding siblings ...)
2023-05-17 14:25 ` Jason Gunthorpe
@ 2023-05-22 22:25 ` Thomas Gleixner
2023-05-22 22:52 ` Reinette Chatre
2023-05-23 22:43 ` Alex Williamson
14 siblings, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2023-05-22 22:25 UTC (permalink / raw)
To: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
kevin.tian, alex.williamson
Cc: darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, reinette.chatre, linux-kernel
Reinette!
On Thu, May 11 2023 at 08:44, Reinette Chatre wrote:
> Changes since V4:
> - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/
> - Add Kevin's Reviewed-by tag as applicable.
> - Treat non-existing INTx interrupt context as kernel bug with WARN. This
> exposed an issue in the scenario where INTx mask/unmask may occur without
> INTx enabled. This is fixed by obtaining the interrupt context later
> (right before use) within impacted functions: vfio_pci_intx_mask() and
> vfio_pci_intx_unmask_handler(). (Kevin)
> - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
> bug via a WARN instead of ignoring this value. (Kevin)
> - Improve accuracy of comments. (Kevin)
> - Please refer to individual patches for local changes.
I only skimmed the series for obvious hickups vs. the PCI/MSI core (my
virt[io] foo is limited) and I did not find anything to complain about.
Aside of that I really like how this series is built up to restructure
and cleanup things first so that the new functionality falls in place
instead of the popular 'glue it in no matter what' approach.
That's a pleasure to read even for the virt[io] illiterate!
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Thanks,
tglx
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-22 22:25 ` Thomas Gleixner
@ 2023-05-22 22:52 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-22 22:52 UTC (permalink / raw)
To: Thomas Gleixner, jgg, yishaih, shameerali.kolothum.thodi,
kevin.tian, alex.williamson
Cc: darwi, kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu,
tom.zanussi, linux-kernel
On 5/22/2023 3:25 PM, Thomas Gleixner wrote:
>
> Acked-by: Thomas Gleixner <tglx@linutronix.de>
>
Thank you very much Thomas.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-11 15:44 [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts Reinette Chatre
` (13 preceding siblings ...)
2023-05-22 22:25 ` Thomas Gleixner
@ 2023-05-23 22:43 ` Alex Williamson
2023-05-24 2:43 ` YangHang Liu
14 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2023-05-23 22:43 UTC (permalink / raw)
To: Reinette Chatre
Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
linux-kernel
On Thu, 11 May 2023 08:44:27 -0700
Reinette Chatre <reinette.chatre@intel.com> wrote:
> Changes since V4:
> - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/
> - Add Kevin's Reviewed-by tag as applicable.
> - Treat non-existing INTx interrupt context as kernel bug with WARN. This
> exposed an issue in the scenario where INTx mask/unmask may occur without
> INTx enabled. This is fixed by obtaining the interrupt context later
> (right before use) within impacted functions: vfio_pci_intx_mask() and
> vfio_pci_intx_unmask_handler(). (Kevin)
> - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
> bug via a WARN instead of ignoring this value. (Kevin)
> - Improve accuracy of comments. (Kevin)
> - Please refer to individual patches for local changes.
>
> Changes since V3:
> - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/
> - Be considerate about layout and size with changes to
> struct vfio_pci_core_device. Keep flags together and transition all to
> use bitfields. (Alex and Jason)
> - Do not free dynamically allocated interrupts on error path. (Alex)
> - Please refer to individual patches for localized changes.
>
> Changes since V2:
> - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/
> - During testing of V2 "kernel test robot" reported issues resulting from
> include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when
> CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can
> be found in the kernel (since v6.3-rc7) as
> commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()")
> - Biggest change is the transition to "active contexts" for both MSI and MSI-X.
> Interrupt contexts have always been allocated when the interrupts are
> allocated while they are only used while interrupts are
> enabled. In this series interrupt contexts are made dynamic, while doing
> so their allocation is moved to match how they are used: allocated when
> interrupts are enabled. Whether a Linux interrupt number exists determines
> whether an interrupt can be enabled.
> Previous policy (up to V2) that an allocated interrupt has an interrupt
> context no longer applies. Instead, an interrupt context has a
> handler/trigger, aka "active contexts". (Alex)
> - Re-ordered patches in support of "active contexts".
> - Only free interrupts on MSI-X teardown and otherwise use the
> allocated interrupts as a cache. (Alex)
> - Using unsigned int for the vector broke the unwind loop within
> vfio_msi_set_block(). (Alex)
> - Introduce new "has_dyn_msix" property of virtual device instead of
> querying support every time. (Alex)
> - Some smaller changes, please refer to individual patches.
>
> Changes since RFC V1:
> - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/
> - Improved changelogs.
> - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to
> allocated context. (Alex)
> - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain
> invalid error path behavior. (Alex and Kevin)
> - Add pointer to interrupt context as function parameter to
> vfio_irq_ctx_free(). (Alex)
> - Ensure variables are initialized. (Dan Carpenter)
> - Only support dynamic allocation if device supports it. (Alex)
>
> Qemu allocates interrupts incrementally at the time the guest unmasks an
> interrupt, for example each time a Linux guest runs request_irq().
>
> Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1].
> This prompted Qemu to, when allocating a new interrupt, first release all
> previously allocated interrupts (including disable of MSI-X) followed
> by re-allocation of all interrupts that includes the new interrupt.
> Please see [2] for a detailed discussion about this issue.
>
> Releasing and re-allocating interrupts may be acceptable if all
> interrupts are unmasked during device initialization. If unmasking of
> interrupts occur during runtime this may result in lost interrupts.
> For example, consider an accelerator device with multiple work queues,
> each work queue having a dedicated interrupt. A work queue can be
> enabled at any time with its associated interrupt unmasked while other
> work queues are already active. Having all interrupts released and MSI-X
> disabled to enable the new work queue will impact active work queues.
>
> This series builds on the recent interrupt sub-system core changes
> that added support for dynamic MSI-X allocation after initial MSI-X
> enabling.
>
> Add support for dynamic MSI-X allocation to vfio-pci. A flag
> indicating lack of support for dynamic allocation already exist:
> VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With
> support for dynamic MSI-X the flag is cleared for MSI-X when supported,
> enabling Qemu to modify its behavior.
>
> Any feedback is appreciated
>
> Reinette
>
> [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
> [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t
>
> Reinette Chatre (11):
> vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
> vfio/pci: Remove negative check on unsigned vector
> vfio/pci: Prepare for dynamic interrupt context storage
> vfio/pci: Move to single error path
> vfio/pci: Use xarray for interrupt context storage
> vfio/pci: Remove interrupt context counter
> vfio/pci: Update stale comment
> vfio/pci: Use bitfield for struct vfio_pci_core_device flags
> vfio/pci: Probe and store ability to support dynamic MSI-X
> vfio/pci: Support dynamic MSI-X
> vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
>
> drivers/vfio/pci/vfio_pci_core.c | 8 +-
> drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++----------
> include/linux/vfio_pci_core.h | 26 +--
> include/uapi/linux/vfio.h | 3 +
> 4 files changed, 229 insertions(+), 113 deletions(-)
>
>
> base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
Applied to vfio next branch for v6.5. Thanks!
Alex
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-23 22:43 ` Alex Williamson
@ 2023-05-24 2:43 ` YangHang Liu
2023-05-24 14:38 ` Reinette Chatre
0 siblings, 1 reply; 25+ messages in thread
From: YangHang Liu @ 2023-05-24 2:43 UTC (permalink / raw)
To: Alex Williamson
Cc: Reinette Chatre, jgg, yishaih, shameerali.kolothum.thodi,
kevin.tian, tglx, darwi, kvm, dave.jiang, jing2.liu, ashok.raj,
fenghua.yu, tom.zanussi, linux-kernel
Running regression tests in the following test matrix:
i40e PF + INTx interrupt + RHEL9 guest -- PASS
bnx2x PF + MSI-X + RHEL9 guest -- PASS
iavf VF + MSI-X + Win2019 guest -- PASS
mlx5_core VF + MSI-X + Win2022 guest -- PASS
ixgbe PF + INTx + RHEL9 guest -- PASS
i40e PF + MSIX + Win2019 guest -- PASS
qede VF + MSIX + RHEL9 guest -- PASS
Tested-by: YangHang Liu <yanghliu@redhat.com>
On Wed, May 24, 2023 at 6:46 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Thu, 11 May 2023 08:44:27 -0700
> Reinette Chatre <reinette.chatre@intel.com> wrote:
>
> > Changes since V4:
> > - V4: https://lore.kernel.org/lkml/cover.1682615447.git.reinette.chatre@intel.com/
> > - Add Kevin's Reviewed-by tag as applicable.
> > - Treat non-existing INTx interrupt context as kernel bug with WARN. This
> > exposed an issue in the scenario where INTx mask/unmask may occur without
> > INTx enabled. This is fixed by obtaining the interrupt context later
> > (right before use) within impacted functions: vfio_pci_intx_mask() and
> > vfio_pci_intx_unmask_handler(). (Kevin)
> > - Treat pci_irq_vector() returning '0' for a MSI/MSI-X interrupt as a kernel
> > bug via a WARN instead of ignoring this value. (Kevin)
> > - Improve accuracy of comments. (Kevin)
> > - Please refer to individual patches for local changes.
> >
> > Changes since V3:
> > - V3: https://lore.kernel.org/lkml/cover.1681837892.git.reinette.chatre@intel.com/
> > - Be considerate about layout and size with changes to
> > struct vfio_pci_core_device. Keep flags together and transition all to
> > use bitfields. (Alex and Jason)
> > - Do not free dynamically allocated interrupts on error path. (Alex)
> > - Please refer to individual patches for localized changes.
> >
> > Changes since V2:
> > - V2: https://lore.kernel.org/lkml/cover.1680038771.git.reinette.chatre@intel.com/
> > - During testing of V2 "kernel test robot" reported issues resulting from
> > include/linux/pci.h missing a stub for pci_msix_can_alloc_dyn() when
> > CONFIG_PCI_MSI=n. A separate fix was sent to address this. The fix can
> > be found in the kernel (since v6.3-rc7) as
> > commit 195d8e5da3ac ("PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()")
> > - Biggest change is the transition to "active contexts" for both MSI and MSI-X.
> > Interrupt contexts have always been allocated when the interrupts are
> > allocated while they are only used while interrupts are
> > enabled. In this series interrupt contexts are made dynamic, while doing
> > so their allocation is moved to match how they are used: allocated when
> > interrupts are enabled. Whether a Linux interrupt number exists determines
> > whether an interrupt can be enabled.
> > Previous policy (up to V2) that an allocated interrupt has an interrupt
> > context no longer applies. Instead, an interrupt context has a
> > handler/trigger, aka "active contexts". (Alex)
> > - Re-ordered patches in support of "active contexts".
> > - Only free interrupts on MSI-X teardown and otherwise use the
> > allocated interrupts as a cache. (Alex)
> > - Using unsigned int for the vector broke the unwind loop within
> > vfio_msi_set_block(). (Alex)
> > - Introduce new "has_dyn_msix" property of virtual device instead of
> > querying support every time. (Alex)
> > - Some smaller changes, please refer to individual patches.
> >
> > Changes since RFC V1:
> > - RFC V1: https://lore.kernel.org/lkml/cover.1678911529.git.reinette.chatre@intel.com/
> > - Improved changelogs.
> > - Simplify interface so that vfio_irq_ctx_alloc_single() returns pointer to
> > allocated context. (Alex)
> > - Remove vfio_irq_ctx_range_allocated() and associated attempts to maintain
> > invalid error path behavior. (Alex and Kevin)
> > - Add pointer to interrupt context as function parameter to
> > vfio_irq_ctx_free(). (Alex)
> > - Ensure variables are initialized. (Dan Carpenter)
> > - Only support dynamic allocation if device supports it. (Alex)
> >
> > Qemu allocates interrupts incrementally at the time the guest unmasks an
> > interrupt, for example each time a Linux guest runs request_irq().
> >
> > Dynamic allocation of MSI-X interrupts was not possible until v6.2 [1].
> > This prompted Qemu to, when allocating a new interrupt, first release all
> > previously allocated interrupts (including disable of MSI-X) followed
> > by re-allocation of all interrupts that includes the new interrupt.
> > Please see [2] for a detailed discussion about this issue.
> >
> > Releasing and re-allocating interrupts may be acceptable if all
> > interrupts are unmasked during device initialization. If unmasking of
> > interrupts occur during runtime this may result in lost interrupts.
> > For example, consider an accelerator device with multiple work queues,
> > each work queue having a dedicated interrupt. A work queue can be
> > enabled at any time with its associated interrupt unmasked while other
> > work queues are already active. Having all interrupts released and MSI-X
> > disabled to enable the new work queue will impact active work queues.
> >
> > This series builds on the recent interrupt sub-system core changes
> > that added support for dynamic MSI-X allocation after initial MSI-X
> > enabling.
> >
> > Add support for dynamic MSI-X allocation to vfio-pci. A flag
> > indicating lack of support for dynamic allocation already exist:
> > VFIO_IRQ_INFO_NORESIZE and has always been set for MSI and MSI-X. With
> > support for dynamic MSI-X the flag is cleared for MSI-X when supported,
> > enabling Qemu to modify its behavior.
> >
> > Any feedback is appreciated
> >
> > Reinette
> >
> > [1] commit 34026364df8e ("PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X")
> > [2] https://lore.kernel.org/kvm/MWHPR11MB188603D0D809C1079F5817DC8C099@MWHPR11MB1886.namprd11.prod.outlook.com/#t
> >
> > Reinette Chatre (11):
> > vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
> > vfio/pci: Remove negative check on unsigned vector
> > vfio/pci: Prepare for dynamic interrupt context storage
> > vfio/pci: Move to single error path
> > vfio/pci: Use xarray for interrupt context storage
> > vfio/pci: Remove interrupt context counter
> > vfio/pci: Update stale comment
> > vfio/pci: Use bitfield for struct vfio_pci_core_device flags
> > vfio/pci: Probe and store ability to support dynamic MSI-X
> > vfio/pci: Support dynamic MSI-X
> > vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
> >
> > drivers/vfio/pci/vfio_pci_core.c | 8 +-
> > drivers/vfio/pci/vfio_pci_intrs.c | 305 ++++++++++++++++++++----------
> > include/linux/vfio_pci_core.h | 26 +--
> > include/uapi/linux/vfio.h | 3 +
> > 4 files changed, 229 insertions(+), 113 deletions(-)
> >
> >
> > base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
>
> Applied to vfio next branch for v6.5. Thanks!
>
> Alex
>
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: [PATCH V5 00/11] vfio/pci: Support dynamic allocation of MSI-X interrupts
2023-05-24 2:43 ` YangHang Liu
@ 2023-05-24 14:38 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2023-05-24 14:38 UTC (permalink / raw)
To: YangHang Liu, Alex Williamson
Cc: jgg, yishaih, shameerali.kolothum.thodi, kevin.tian, tglx, darwi,
kvm, dave.jiang, jing2.liu, ashok.raj, fenghua.yu, tom.zanussi,
linux-kernel
Hi YangHang,
On 5/23/2023 7:43 PM, YangHang Liu wrote:
> Running regression tests in the following test matrix:
>
> i40e PF + INTx interrupt + RHEL9 guest -- PASS
> bnx2x PF + MSI-X + RHEL9 guest -- PASS
> iavf VF + MSI-X + Win2019 guest -- PASS
> mlx5_core VF + MSI-X + Win2022 guest -- PASS
> ixgbe PF + INTx + RHEL9 guest -- PASS
> i40e PF + MSIX + Win2019 guest -- PASS
> qede VF + MSIX + RHEL9 guest -- PASS
>
> Tested-by: YangHang Liu <yanghliu@redhat.com>
>
Thank you very much for this thorough testing.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread