public inbox for stable@vger.kernel.org
* [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y
@ 2024-03-29 21:38 Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ" Alex Williamson
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable; +Cc: Alex Williamson, sashal, gregkh, eric.auger

This corrects the backport of commit fe9a7082684e ("vfio/pci: Disable
auto-enable of exclusive INTx IRQ"), choosing to adapt the fix to the
current tree, which uses an array of eventfd contexts, rather than
include a base patch converting that storage to an xarray, which is
found to be faulty in isolation.

I include the reverts here for completeness, but if the associated
commits have already been dropped due to the previous report [1], the
remainder of this series is still valid.

Largely this just adapts the mainline commits, where eventfd contexts
are stored in an xarray via the current upstream internal API, to the
eventfd context array used in this tree.
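
For reference, a rough sketch of the difference in per-vector context
access (not taken from any patch here; the xarray form is assumed from
the upstream conversion, the array form matches the code touched below):

	/* mainline: contexts stored in an xarray */
	struct vfio_pci_irq_ctx *ctx = xa_load(&vdev->ctx, vector);

	/* 6.1.y: fixed array allocated at enable time */
	struct vfio_pci_irq_ctx *ctx = &vdev->ctx[vector];
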
Thanks,

Alex

[1] https://lore.kernel.org/all/20240329110433.156ff56c.alex.williamson@redhat.com/

Alex Williamson (7):
  Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ"
  Revert "vfio/pci: Prepare for dynamic interrupt context storage"
  vfio/pci: Disable auto-enable of exclusive INTx IRQ
  vfio: Introduce interface to flush virqfd inject workqueue
  vfio/pci: Create persistent INTx handler
  vfio/platform: Create persistent IRQ handlers
  vfio/fsl-mc: Block calling interrupt handler without trigger

 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c    |   7 +-
 drivers/vfio/pci/vfio_pci_intrs.c         | 318 +++++++++-------------
 drivers/vfio/platform/vfio_platform_irq.c | 101 ++++---
 drivers/vfio/virqfd.c                     |  21 ++
 include/linux/vfio.h                      |   2 +
 5 files changed, 220 insertions(+), 229 deletions(-)

-- 
2.44.0



* [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ"
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-30  8:46   ` Greg KH
  2024-03-29 21:38 ` [PATCH 6.1.y 2/7] Revert "vfio/pci: Prepare for dynamic interrupt context storage" Alex Williamson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable; +Cc: Alex Williamson, sashal, gregkh, eric.auger

This reverts commit e31eb60c4288ecbb4cc447a8e2496ea758a3984e.

This is a bad backport; it pulls in an unnecessary dependency,
commit b8e81e269b3d ("vfio/pci: Prepare for dynamic interrupt
context storage"), which turns out to be broken on its own.

Revert and start over.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 0deb51c820d2..ef63ee441c36 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,15 +319,8 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 
 	ctx->trigger = trigger;
 
-	/*
-	 * Devices without DisINTx support require an exclusive interrupt,
-	 * IRQ masking is performed at the IRQ chip.  The masked status is
-	 * protected by vdev->irqlock. Setup the IRQ without auto-enable and
-	 * unmask as necessary below under lock.  DisINTx is unmodified by
-	 * the IRQ configuration and may therefore use auto-enable.
-	 */
 	if (!vdev->pci_2_3)
-		irqflags = IRQF_NO_AUTOEN;
+		irqflags = 0;
 
 	ret = request_irq(pdev->irq, vfio_intx_handler,
 			  irqflags, ctx->name, vdev);
@@ -338,9 +331,13 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 		return ret;
 	}
 
+	/*
+	 * INTx disable will stick across the new irq setup,
+	 * disable_irq won't.
+	 */
 	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && !ctx->masked)
-		enable_irq(pdev->irq);
+	if (!vdev->pci_2_3 && ctx->masked)
+		disable_irq_nosync(pdev->irq);
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 
 	return 0;
-- 
2.44.0



* [PATCH 6.1.y 2/7] Revert "vfio/pci: Prepare for dynamic interrupt context storage"
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ" Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-30  8:46   ` Greg KH
  2024-03-29 21:38 ` [PATCH 6.1.y 3/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ Alex Williamson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable; +Cc: Alex Williamson, sashal, gregkh, eric.auger

This reverts commit b8e81e269b3d97fe53cd9819aa4a29e1aaf26731.

This commit does not stand alone: vfio_intx_enable() adds a call
to vfio_irq_ctx_alloc_num() followed by vfio_irq_ctx_get(), and
only then sets vdev->num_ctx in the existing code.
vfio_irq_ctx_get() relies on a valid num_ctx value, therefore it
returns NULL and vfio_intx_enable() will always fail with -EINVAL.
This was later fixed in the upstream series without the breakage
being noticed.
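
For illustration, a condensed sketch of the ordering in the reverted
backport (taken from the code removed below, not a new change):

	ret = vfio_irq_ctx_alloc_num(vdev, 1);	/* allocates vdev->ctx */
	if (ret)
		return ret;

	ctx = vfio_irq_ctx_get(vdev, 0);	/* 0 >= num_ctx (still 0): NULL */
	if (!ctx) {
		vfio_irq_ctx_free_all(vdev);
		return -EINVAL;			/* always taken */
	}

	vdev->num_ctx = 1;			/* set too late */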

A better solution is to start over and adjust commit fe9a7082684e
("vfio/pci: Disable auto-enable of exclusive INTx IRQ") to remove
this dependency.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 215 +++++++++---------------------
 1 file changed, 66 insertions(+), 149 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index ef63ee441c36..8c8b04d85845 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -48,31 +48,6 @@ static bool is_irq_none(struct vfio_pci_core_device *vdev)
 		 vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX);
 }
 
-static
-struct vfio_pci_irq_ctx *vfio_irq_ctx_get(struct vfio_pci_core_device *vdev,
-					  unsigned long index)
-{
-	if (index >= vdev->num_ctx)
-		return NULL;
-	return &vdev->ctx[index];
-}
-
-static void vfio_irq_ctx_free_all(struct vfio_pci_core_device *vdev)
-{
-	kfree(vdev->ctx);
-}
-
-static int vfio_irq_ctx_alloc_num(struct vfio_pci_core_device *vdev,
-				  unsigned long num)
-{
-	vdev->ctx = kcalloc(num, sizeof(struct vfio_pci_irq_ctx),
-			    GFP_KERNEL_ACCOUNT);
-	if (!vdev->ctx)
-		return -ENOMEM;
-
-	return 0;
-}
-
 /*
  * INTx
  */
@@ -80,21 +55,14 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 {
 	struct vfio_pci_core_device *vdev = opaque;
 
-	if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
-		struct vfio_pci_irq_ctx *ctx;
-
-		ctx = vfio_irq_ctx_get(vdev, 0);
-		if (WARN_ON_ONCE(!ctx))
-			return;
-		eventfd_signal(ctx->trigger, 1);
-	}
+	if (likely(is_intx(vdev) && !vdev->virq_disabled))
+		eventfd_signal(vdev->ctx[0].trigger, 1);
 }
 
 /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
 static bool __vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	bool masked_changed = false;
 
@@ -111,14 +79,7 @@ static bool __vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 	if (unlikely(!is_intx(vdev))) {
 		if (vdev->pci_2_3)
 			pci_intx(pdev, 0);
-		goto out_unlock;
-	}
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (WARN_ON_ONCE(!ctx))
-		goto out_unlock;
-
-	if (!ctx->masked) {
+	} else if (!vdev->ctx[0].masked) {
 		/*
 		 * Can't use check_and_mask here because we always want to
 		 * mask, not just when something is pending.
@@ -128,11 +89,10 @@ static bool __vfio_pci_intx_mask(struct vfio_pci_core_device *vdev)
 		else
 			disable_irq_nosync(pdev->irq);
 
-		ctx->masked = true;
+		vdev->ctx[0].masked = true;
 		masked_changed = true;
 	}
 
-out_unlock:
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 	return masked_changed;
 }
@@ -158,7 +118,6 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 {
 	struct vfio_pci_core_device *vdev = opaque;
 	struct pci_dev *pdev = vdev->pdev;
-	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	int ret = 0;
 
@@ -171,14 +130,7 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 	if (unlikely(!is_intx(vdev))) {
 		if (vdev->pci_2_3)
 			pci_intx(pdev, 1);
-		goto out_unlock;
-	}
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (WARN_ON_ONCE(!ctx))
-		goto out_unlock;
-
-	if (ctx->masked && !vdev->virq_disabled) {
+	} else if (vdev->ctx[0].masked && !vdev->virq_disabled) {
 		/*
 		 * A pending interrupt here would immediately trigger,
 		 * but we can avoid that overhead by just re-sending
@@ -190,10 +142,9 @@ static int vfio_pci_intx_unmask_handler(void *opaque, void *unused)
 		} else
 			enable_irq(pdev->irq);
 
-		ctx->masked = (ret > 0);
+		vdev->ctx[0].masked = (ret > 0);
 	}
 
-out_unlock:
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 
 	return ret;
@@ -217,23 +168,18 @@ void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev)
 static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 {
 	struct vfio_pci_core_device *vdev = dev_id;
-	struct vfio_pci_irq_ctx *ctx;
 	unsigned long flags;
 	int ret = IRQ_NONE;
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (WARN_ON_ONCE(!ctx))
-		return ret;
-
 	spin_lock_irqsave(&vdev->irqlock, flags);
 
 	if (!vdev->pci_2_3) {
 		disable_irq_nosync(vdev->pdev->irq);
-		ctx->masked = true;
+		vdev->ctx[0].masked = true;
 		ret = IRQ_HANDLED;
-	} else if (!ctx->masked &&  /* may be shared */
+	} else if (!vdev->ctx[0].masked &&  /* may be shared */
 		   pci_check_and_mask_intx(vdev->pdev)) {
-		ctx->masked = true;
+		vdev->ctx[0].masked = true;
 		ret = IRQ_HANDLED;
 	}
 
@@ -247,24 +193,15 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 
 static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 {
-	struct vfio_pci_irq_ctx *ctx;
-	int ret;
-
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
 	if (!vdev->pdev->irq)
 		return -ENODEV;
 
-	ret = vfio_irq_ctx_alloc_num(vdev, 1);
-	if (ret)
-		return ret;
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (!ctx) {
-		vfio_irq_ctx_free_all(vdev);
-		return -EINVAL;
-	}
+	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL_ACCOUNT);
+	if (!vdev->ctx)
+		return -ENOMEM;
 
 	vdev->num_ctx = 1;
 
@@ -274,9 +211,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
 	 * here, non-PCI-2.3 devices will have to wait until the
 	 * interrupt is enabled.
 	 */
-	ctx->masked = vdev->virq_disabled;
+	vdev->ctx[0].masked = vdev->virq_disabled;
 	if (vdev->pci_2_3)
-		pci_intx(vdev->pdev, !ctx->masked);
+		pci_intx(vdev->pdev, !vdev->ctx[0].masked);
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
@@ -287,46 +224,41 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned long irqflags = IRQF_SHARED;
-	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
 	unsigned long flags;
 	int ret;
 
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	if (WARN_ON_ONCE(!ctx))
-		return -EINVAL;
-
-	if (ctx->trigger) {
+	if (vdev->ctx[0].trigger) {
 		free_irq(pdev->irq, vdev);
-		kfree(ctx->name);
-		eventfd_ctx_put(ctx->trigger);
-		ctx->trigger = NULL;
+		kfree(vdev->ctx[0].name);
+		eventfd_ctx_put(vdev->ctx[0].trigger);
+		vdev->ctx[0].trigger = NULL;
 	}
 
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
-			      pci_name(pdev));
-	if (!ctx->name)
+	vdev->ctx[0].name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
+				      pci_name(pdev));
+	if (!vdev->ctx[0].name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(ctx->name);
+		kfree(vdev->ctx[0].name);
 		return PTR_ERR(trigger);
 	}
 
-	ctx->trigger = trigger;
+	vdev->ctx[0].trigger = trigger;
 
 	if (!vdev->pci_2_3)
 		irqflags = 0;
 
 	ret = request_irq(pdev->irq, vfio_intx_handler,
-			  irqflags, ctx->name, vdev);
+			  irqflags, vdev->ctx[0].name, vdev);
 	if (ret) {
-		ctx->trigger = NULL;
-		kfree(ctx->name);
+		vdev->ctx[0].trigger = NULL;
+		kfree(vdev->ctx[0].name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
@@ -336,7 +268,7 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 	 * disable_irq won't.
 	 */
 	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && ctx->masked)
+	if (!vdev->pci_2_3 && vdev->ctx[0].masked)
 		disable_irq_nosync(pdev->irq);
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 
@@ -345,18 +277,12 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 
 static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 {
-	struct vfio_pci_irq_ctx *ctx;
-
-	ctx = vfio_irq_ctx_get(vdev, 0);
-	WARN_ON_ONCE(!ctx);
-	if (ctx) {
-		vfio_virqfd_disable(&ctx->unmask);
-		vfio_virqfd_disable(&ctx->mask);
-	}
+	vfio_virqfd_disable(&vdev->ctx[0].unmask);
+	vfio_virqfd_disable(&vdev->ctx[0].mask);
 	vfio_intx_set_signal(vdev, -1);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
-	vfio_irq_ctx_free_all(vdev);
+	kfree(vdev->ctx);
 }
 
 /*
@@ -380,9 +306,10 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
-	ret = vfio_irq_ctx_alloc_num(vdev, nvec);
-	if (ret)
-		return ret;
+	vdev->ctx = kcalloc(nvec, sizeof(struct vfio_pci_irq_ctx),
+			    GFP_KERNEL_ACCOUNT);
+	if (!vdev->ctx)
+		return -ENOMEM;
 
 	/* return the number of supported vectors if we can't get all: */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -391,7 +318,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
-		vfio_irq_ctx_free_all(vdev);
+		kfree(vdev->ctx);
 		return ret;
 	}
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
@@ -415,7 +342,6 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 				      unsigned int vector, int fd, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	struct vfio_pci_irq_ctx *ctx;
 	struct eventfd_ctx *trigger;
 	int irq, ret;
 	u16 cmd;
@@ -423,33 +349,33 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 	if (vector >= vdev->num_ctx)
 		return -EINVAL;
 
-	ctx = vfio_irq_ctx_get(vdev, vector);
-	if (!ctx)
-		return -EINVAL;
 	irq = pci_irq_vector(pdev, vector);
 
-	if (ctx->trigger) {
-		irq_bypass_unregister_producer(&ctx->producer);
+	if (vdev->ctx[vector].trigger) {
+		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
 
 		cmd = vfio_pci_memory_lock_and_enable(vdev);
-		free_irq(irq, ctx->trigger);
+		free_irq(irq, vdev->ctx[vector].trigger);
 		vfio_pci_memory_unlock_and_restore(vdev, cmd);
-		kfree(ctx->name);
-		eventfd_ctx_put(ctx->trigger);
-		ctx->trigger = NULL;
+
+		kfree(vdev->ctx[vector].name);
+		eventfd_ctx_put(vdev->ctx[vector].trigger);
+		vdev->ctx[vector].trigger = NULL;
 	}
 
 	if (fd < 0)
 		return 0;
 
-	ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-msi%s[%d](%s)",
-			      msix ? "x" : "", vector, pci_name(pdev));
-	if (!ctx->name)
+	vdev->ctx[vector].name = kasprintf(GFP_KERNEL_ACCOUNT,
+					   "vfio-msi%s[%d](%s)",
+					   msix ? "x" : "", vector,
+					   pci_name(pdev));
+	if (!vdev->ctx[vector].name)
 		return -ENOMEM;
 
 	trigger = eventfd_ctx_fdget(fd);
 	if (IS_ERR(trigger)) {
-		kfree(ctx->name);
+		kfree(vdev->ctx[vector].name);
 		return PTR_ERR(trigger);
 	}
 
@@ -468,25 +394,26 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
 		pci_write_msi_msg(irq, &msg);
 	}
 
-	ret = request_irq(irq, vfio_msihandler, 0, ctx->name, trigger);
+	ret = request_irq(irq, vfio_msihandler, 0,
+			  vdev->ctx[vector].name, trigger);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 	if (ret) {
-		kfree(ctx->name);
+		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(trigger);
 		return ret;
 	}
 
-	ctx->producer.token = trigger;
-	ctx->producer.irq = irq;
-	ret = irq_bypass_register_producer(&ctx->producer);
+	vdev->ctx[vector].producer.token = trigger;
+	vdev->ctx[vector].producer.irq = irq;
+	ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
 	if (unlikely(ret)) {
 		dev_info(&pdev->dev,
 		"irq bypass producer (token %p) registration fails: %d\n",
-		ctx->producer.token, ret);
+		vdev->ctx[vector].producer.token, ret);
 
-		ctx->producer.token = NULL;
+		vdev->ctx[vector].producer.token = NULL;
 	}
-	ctx->trigger = trigger;
+	vdev->ctx[vector].trigger = trigger;
 
 	return 0;
 }
@@ -516,17 +443,13 @@ static int vfio_msi_set_block(struct vfio_pci_core_device *vdev, unsigned start,
 static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 	u16 cmd;
 
 	for (i = 0; i < vdev->num_ctx; i++) {
-		ctx = vfio_irq_ctx_get(vdev, i);
-		if (ctx) {
-			vfio_virqfd_disable(&ctx->unmask);
-			vfio_virqfd_disable(&ctx->mask);
-			vfio_msi_set_vector_signal(vdev, i, -1, msix);
-		}
+		vfio_virqfd_disable(&vdev->ctx[i].unmask);
+		vfio_virqfd_disable(&vdev->ctx[i].mask);
+		vfio_msi_set_vector_signal(vdev, i, -1, msix);
 	}
 
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
@@ -542,7 +465,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
-	vfio_irq_ctx_free_all(vdev);
+	kfree(vdev->ctx);
 }
 
 /*
@@ -562,18 +485,14 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_core_device *vdev,
 		if (unmask)
 			__vfio_pci_intx_unmask(vdev);
 	} else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
-		struct vfio_pci_irq_ctx *ctx = vfio_irq_ctx_get(vdev, 0);
 		int32_t fd = *(int32_t *)data;
-
-		if (WARN_ON_ONCE(!ctx))
-			return -EINVAL;
 		if (fd >= 0)
 			return vfio_virqfd_enable((void *) vdev,
 						  vfio_pci_intx_unmask_handler,
 						  vfio_send_intx_eventfd, NULL,
-						  &ctx->unmask, fd);
+						  &vdev->ctx[0].unmask, fd);
 
-		vfio_virqfd_disable(&ctx->unmask);
+		vfio_virqfd_disable(&vdev->ctx[0].unmask);
 	}
 
 	return 0;
@@ -646,7 +565,6 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 				    unsigned index, unsigned start,
 				    unsigned count, uint32_t flags, void *data)
 {
-	struct vfio_pci_irq_ctx *ctx;
 	unsigned int i;
 	bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
 
@@ -681,15 +599,14 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 
 	for (i = start; i < start + count; i++) {
-		ctx = vfio_irq_ctx_get(vdev, i);
-		if (!ctx || !ctx->trigger)
+		if (!vdev->ctx[i].trigger)
 			continue;
 		if (flags & VFIO_IRQ_SET_DATA_NONE) {
-			eventfd_signal(ctx->trigger, 1);
+			eventfd_signal(vdev->ctx[i].trigger, 1);
 		} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 			uint8_t *bools = data;
 			if (bools[i - start])
-				eventfd_signal(ctx->trigger, 1);
+				eventfd_signal(vdev->ctx[i].trigger, 1);
 		}
 	}
 	return 0;
-- 
2.44.0



* [PATCH 6.1.y 3/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ" Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 2/7] Revert "vfio/pci: Prepare for dynamic interrupt context storage" Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 4/7] vfio: Introduce interface to flush virqfd inject workqueue Alex Williamson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable; +Cc: Alex Williamson, sashal, gregkh, eric.auger, Kevin Tian

[ Upstream commit fe9a7082684eb059b925c535682e68c34d487d43 ]

Currently for devices requiring masking at the irqchip for INTx, ie.
devices without DisINTx support, the IRQ is enabled in request_irq()
and subsequently disabled as necessary to align with the masked status
flag.  This presents a window where the interrupt could fire between
these events, resulting in the IRQ incrementing the disable depth twice.
This would be unrecoverable for a user since the masked flag prevents
nested enables through vfio.
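
Roughly, as a sketch of the window (not code from this patch):

	ret = request_irq(pdev->irq, vfio_intx_handler, 0, ...);
	/* IRQ is live here; if it fires now for a non-DisINTx device, the
	 * handler disables it at the IRQ chip and sets masked */
	spin_lock_irqsave(&vdev->irqlock, flags);
	if (!vdev->pci_2_3 && vdev->ctx[0].masked)
		disable_irq_nosync(pdev->irq);	/* second disable, depth now 2 */
	spin_unlock_irqrestore(&vdev->irqlock, flags);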

Instead, invert the logic using IRQF_NO_AUTOEN such that exclusive INTx
is never auto-enabled, then unmask as required.

Cc:  <stable@vger.kernel.org>
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240308230557.805580-2-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8c8b04d85845..4f0699215f12 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -251,8 +251,15 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 
 	vdev->ctx[0].trigger = trigger;
 
+	/*
+	 * Devices without DisINTx support require an exclusive interrupt,
+	 * IRQ masking is performed at the IRQ chip.  The masked status is
+	 * protected by vdev->irqlock. Setup the IRQ without auto-enable and
+	 * unmask as necessary below under lock.  DisINTx is unmodified by
+	 * the IRQ configuration and may therefore use auto-enable.
+	 */
 	if (!vdev->pci_2_3)
-		irqflags = 0;
+		irqflags = IRQF_NO_AUTOEN;
 
 	ret = request_irq(pdev->irq, vfio_intx_handler,
 			  irqflags, vdev->ctx[0].name, vdev);
@@ -263,13 +270,9 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
 		return ret;
 	}
 
-	/*
-	 * INTx disable will stick across the new irq setup,
-	 * disable_irq won't.
-	 */
 	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && vdev->ctx[0].masked)
-		disable_irq_nosync(pdev->irq);
+	if (!vdev->pci_2_3 && !vdev->ctx[0].masked)
+		enable_irq(pdev->irq);
 	spin_unlock_irqrestore(&vdev->irqlock, flags);
 
 	return 0;
-- 
2.44.0



* [PATCH 6.1.y 4/7] vfio: Introduce interface to flush virqfd inject workqueue
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
                   ` (2 preceding siblings ...)
  2024-03-29 21:38 ` [PATCH 6.1.y 3/7] vfio/pci: Disable auto-enable of exclusive INTx IRQ Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 5/7] vfio/pci: Create persistent INTx handler Alex Williamson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable
  Cc: Alex Williamson, sashal, gregkh, eric.auger, Kevin Tian,
	Reinette Chatre

[ Upstream commit b620ecbd17a03cacd06f014a5d3f3a11285ce053 ]

In order to synchronize changes that can affect the thread callback,
introduce an interface to force a flush of the inject workqueue.  The
irqfd pointer is only valid under spinlock, but the workqueue cannot
be flushed under spinlock.  Therefore the flush work for the irqfd is
queued under spinlock.  The vfio_irqfd_cleanup_wq workqueue is re-used
for queuing this work such that flushing the workqueue is also ordered
relative to shutdown.
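
A caller then uses it roughly as the vfio-pci INTx path does in a later
patch of this series, after swapping the trigger:

	WRITE_ONCE(vdev->ctx[0].trigger, trigger);
	if (old) {
		synchronize_irq(pdev->irq);
		vfio_virqfd_flush_thread(&vdev->ctx[0].unmask);
		eventfd_ctx_put(old);
	}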

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240308230557.805580-4-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/virqfd.c | 21 +++++++++++++++++++++
 include/linux/vfio.h  |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index a928c68df476..e06b32ddedce 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -104,6 +104,13 @@ static void virqfd_inject(struct work_struct *work)
 		virqfd->thread(virqfd->opaque, virqfd->data);
 }
 
+static void virqfd_flush_inject(struct work_struct *work)
+{
+	struct virqfd *virqfd = container_of(work, struct virqfd, flush_inject);
+
+	flush_work(&virqfd->inject);
+}
+
 int vfio_virqfd_enable(void *opaque,
 		       int (*handler)(void *, void *),
 		       void (*thread)(void *, void *),
@@ -127,6 +134,7 @@ int vfio_virqfd_enable(void *opaque,
 
 	INIT_WORK(&virqfd->shutdown, virqfd_shutdown);
 	INIT_WORK(&virqfd->inject, virqfd_inject);
+	INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
 
 	irqfd = fdget(fd);
 	if (!irqfd.file) {
@@ -217,6 +225,19 @@ void vfio_virqfd_disable(struct virqfd **pvirqfd)
 }
 EXPORT_SYMBOL_GPL(vfio_virqfd_disable);
 
+void vfio_virqfd_flush_thread(struct virqfd **pvirqfd)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&virqfd_lock, flags);
+	if (*pvirqfd && (*pvirqfd)->thread)
+		queue_work(vfio_irqfd_cleanup_wq, &(*pvirqfd)->flush_inject);
+	spin_unlock_irqrestore(&virqfd_lock, flags);
+
+	flush_workqueue(vfio_irqfd_cleanup_wq);
+}
+EXPORT_SYMBOL_GPL(vfio_virqfd_flush_thread);
+
 module_init(vfio_virqfd_init);
 module_exit(vfio_virqfd_exit);
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fdd393f70b19..5e7bf143cb22 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -268,6 +268,7 @@ struct virqfd {
 	wait_queue_entry_t		wait;
 	poll_table		pt;
 	struct work_struct	shutdown;
+	struct work_struct	flush_inject;
 	struct virqfd		**pvirqfd;
 };
 
@@ -275,5 +276,6 @@ int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
 		       void (*thread)(void *, void *), void *data,
 		       struct virqfd **pvirqfd, int fd);
 void vfio_virqfd_disable(struct virqfd **pvirqfd);
+void vfio_virqfd_flush_thread(struct virqfd **pvirqfd);
 
 #endif /* VFIO_H */
-- 
2.44.0



* [PATCH 6.1.y 5/7] vfio/pci: Create persistent INTx handler
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
                   ` (3 preceding siblings ...)
  2024-03-29 21:38 ` [PATCH 6.1.y 4/7] vfio: Introduce interface to flush virqfd inject workqueue Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 6/7] vfio/platform: Create persistent IRQ handlers Alex Williamson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable
  Cc: Alex Williamson, sashal, gregkh, eric.auger, Reinette Chatre,
	Kevin Tian

[ Upstream commit 18c198c96a815c962adc2b9b77909eec0be7df4d ]

A vulnerability exists where the eventfd for INTx signaling can be
deconfigured, which unregisters the IRQ handler but still allows
eventfds to be signaled with a NULL context through the SET_IRQS ioctl
or through unmask irqfd if the device interrupt is pending.

Ideally this could be solved with some additional locking; the igate
mutex serializes the ioctl and config space accesses, and the interrupt
handler is unregistered relative to the trigger, but the irqfd path
runs asynchronous to those.  The igate mutex cannot be acquired from the
atomic context of the eventfd wake function.  Disabling the irqfd
relative to the eventfd registration is potentially incompatible with
existing userspace.

As a result, the solution implemented here moves configuration of the
INTx interrupt handler to track the lifetime of the INTx context object
and irq_type configuration, rather than registration of a particular
trigger eventfd.  Synchronization is added between the ioctl path and
eventfd_signal() wrapper such that the eventfd trigger can be
dynamically updated relative to in-flight interrupts or irqfd callbacks.
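
Condensed from the diff below, the signaling wrapper reduces to:

	trigger = READ_ONCE(vdev->ctx[0].trigger);
	if (likely(trigger))
		eventfd_signal(trigger, 1);

with the ioctl path updating the pointer via WRITE_ONCE() and then
synchronizing the IRQ handler and irqfd thread before releasing the old
context.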

Cc:  <stable@vger.kernel.org>
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240308230557.805580-5-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 149 ++++++++++++++++--------------
 1 file changed, 82 insertions(+), 67 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 4f0699215f12..03246a59b553 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -55,8 +55,13 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
 {
 	struct vfio_pci_core_device *vdev = opaque;
 
-	if (likely(is_intx(vdev) && !vdev->virq_disabled))
-		eventfd_signal(vdev->ctx[0].trigger, 1);
+	if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
+		struct eventfd_ctx *trigger;
+
+		trigger = READ_ONCE(vdev->ctx[0].trigger);
+		if (likely(trigger))
+			eventfd_signal(trigger, 1);
+	}
 }
 
 /* Returns true if the INTx vfio_pci_irq_ctx.masked value is changed. */
@@ -191,98 +196,104 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
 	return ret;
 }
 
-static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
+static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
+			    struct eventfd_ctx *trigger)
 {
+	struct pci_dev *pdev = vdev->pdev;
+	unsigned long irqflags;
+	char *name;
+	int ret;
+
 	if (!is_irq_none(vdev))
 		return -EINVAL;
 
-	if (!vdev->pdev->irq)
+	if (!pdev->irq)
 		return -ENODEV;
 
+	name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
+	if (!name)
+		return -ENOMEM;
+
 	vdev->ctx = kzalloc(sizeof(struct vfio_pci_irq_ctx), GFP_KERNEL_ACCOUNT);
 	if (!vdev->ctx)
 		return -ENOMEM;
 
 	vdev->num_ctx = 1;
 
+	vdev->ctx[0].name = name;
+	vdev->ctx[0].trigger = trigger;
+
 	/*
-	 * If the virtual interrupt is masked, restore it.  Devices
-	 * supporting DisINTx can be masked at the hardware level
-	 * here, non-PCI-2.3 devices will have to wait until the
-	 * interrupt is enabled.
+	 * Fill the initial masked state based on virq_disabled.  After
+	 * enable, changing the DisINTx bit in vconfig directly changes INTx
+	 * masking.  igate prevents races during setup, once running masked
+	 * is protected via irqlock.
+	 *
+	 * Devices supporting DisINTx also reflect the current mask state in
+	 * the physical DisINTx bit, which is not affected during IRQ setup.
+	 *
+	 * Devices without DisINTx support require an exclusive interrupt.
+	 * IRQ masking is performed at the IRQ chip.  Again, igate protects
+	 * against races during setup and IRQ handlers and irqfds are not
+	 * yet active, therefore masked is stable and can be used to
+	 * conditionally auto-enable the IRQ.
+	 *
+	 * irq_type must be stable while the IRQ handler is registered,
+	 * therefore it must be set before request_irq().
 	 */
 	vdev->ctx[0].masked = vdev->virq_disabled;
-	if (vdev->pci_2_3)
-		pci_intx(vdev->pdev, !vdev->ctx[0].masked);
+	if (vdev->pci_2_3) {
+		pci_intx(pdev, !vdev->ctx[0].masked);
+		irqflags = IRQF_SHARED;
+	} else {
+		irqflags = vdev->ctx[0].masked ? IRQF_NO_AUTOEN : 0;
+	}
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
+	ret = request_irq(pdev->irq, vfio_intx_handler,
+			  irqflags, vdev->ctx[0].name, vdev);
+	if (ret) {
+		vdev->irq_type = VFIO_PCI_NUM_IRQS;
+		kfree(name);
+		vdev->num_ctx = 0;
+		kfree(vdev->ctx);
+		return ret;
+	}
+
 	return 0;
 }
 
-static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
+static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
+				struct eventfd_ctx *trigger)
 {
 	struct pci_dev *pdev = vdev->pdev;
-	unsigned long irqflags = IRQF_SHARED;
-	struct eventfd_ctx *trigger;
-	unsigned long flags;
-	int ret;
+	struct eventfd_ctx *old;
 
-	if (vdev->ctx[0].trigger) {
-		free_irq(pdev->irq, vdev);
-		kfree(vdev->ctx[0].name);
-		eventfd_ctx_put(vdev->ctx[0].trigger);
-		vdev->ctx[0].trigger = NULL;
-	}
-
-	if (fd < 0) /* Disable only */
-		return 0;
+	old = vdev->ctx[0].trigger;
 
-	vdev->ctx[0].name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
-				      pci_name(pdev));
-	if (!vdev->ctx[0].name)
-		return -ENOMEM;
+	WRITE_ONCE(vdev->ctx[0].trigger, trigger);
 
-	trigger = eventfd_ctx_fdget(fd);
-	if (IS_ERR(trigger)) {
-		kfree(vdev->ctx[0].name);
-		return PTR_ERR(trigger);
-	}
-
-	vdev->ctx[0].trigger = trigger;
-
-	/*
-	 * Devices without DisINTx support require an exclusive interrupt,
-	 * IRQ masking is performed at the IRQ chip.  The masked status is
-	 * protected by vdev->irqlock. Setup the IRQ without auto-enable and
-	 * unmask as necessary below under lock.  DisINTx is unmodified by
-	 * the IRQ configuration and may therefore use auto-enable.
-	 */
-	if (!vdev->pci_2_3)
-		irqflags = IRQF_NO_AUTOEN;
-
-	ret = request_irq(pdev->irq, vfio_intx_handler,
-			  irqflags, vdev->ctx[0].name, vdev);
-	if (ret) {
-		vdev->ctx[0].trigger = NULL;
-		kfree(vdev->ctx[0].name);
-		eventfd_ctx_put(trigger);
-		return ret;
+	/* Releasing an old ctx requires synchronizing in-flight users */
+	if (old) {
+		synchronize_irq(pdev->irq);
+		vfio_virqfd_flush_thread(&vdev->ctx[0].unmask);
+		eventfd_ctx_put(old);
 	}
 
-	spin_lock_irqsave(&vdev->irqlock, flags);
-	if (!vdev->pci_2_3 && !vdev->ctx[0].masked)
-		enable_irq(pdev->irq);
-	spin_unlock_irqrestore(&vdev->irqlock, flags);
-
 	return 0;
 }
 
 static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 {
+	struct pci_dev *pdev = vdev->pdev;
+
 	vfio_virqfd_disable(&vdev->ctx[0].unmask);
 	vfio_virqfd_disable(&vdev->ctx[0].mask);
-	vfio_intx_set_signal(vdev, -1);
+	free_irq(pdev->irq, vdev);
+	if (vdev->ctx[0].trigger)
+		eventfd_ctx_put(vdev->ctx[0].trigger);
+	kfree(vdev->ctx[0].name);
 	vdev->irq_type = VFIO_PCI_NUM_IRQS;
 	vdev->num_ctx = 0;
 	kfree(vdev->ctx);
@@ -534,19 +545,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
 		return -EINVAL;
 
 	if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+		struct eventfd_ctx *trigger = NULL;
 		int32_t fd = *(int32_t *)data;
 		int ret;
 
-		if (is_intx(vdev))
-			return vfio_intx_set_signal(vdev, fd);
+		if (fd >= 0) {
+			trigger = eventfd_ctx_fdget(fd);
+			if (IS_ERR(trigger))
+				return PTR_ERR(trigger);
+		}
 
-		ret = vfio_intx_enable(vdev);
-		if (ret)
-			return ret;
+		if (is_intx(vdev))
+			ret = vfio_intx_set_signal(vdev, trigger);
+		else
+			ret = vfio_intx_enable(vdev, trigger);
 
-		ret = vfio_intx_set_signal(vdev, fd);
-		if (ret)
-			vfio_intx_disable(vdev);
+		if (ret && trigger)
+			eventfd_ctx_put(trigger);
 
 		return ret;
 	}
-- 
2.44.0



* [PATCH 6.1.y 6/7] vfio/platform: Create persistent IRQ handlers
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
                   ` (4 preceding siblings ...)
  2024-03-29 21:38 ` [PATCH 6.1.y 5/7] vfio/pci: Create persistent INTx handler Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-29 21:38 ` [PATCH 6.1.y 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger Alex Williamson
  2024-03-30  8:54 ` [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Greg KH
  7 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable; +Cc: Alex Williamson, sashal, gregkh, eric.auger, Kevin Tian

[ Upstream commit 675daf435e9f8e5a5eab140a9864dfad6668b375 ]

The vfio-platform SET_IRQS ioctl currently allows loopback triggering of
an interrupt before a signaling eventfd has been configured by the user,
which thereby allows a NULL pointer dereference.

Rather than register the IRQ relative to a valid trigger, register all
IRQs in a disabled state in the device open path.  This allows mask
operations on the IRQ to nest within the overall enable state governed
by a valid eventfd signal.  This decouples @masked, protected by the
@locked spinlock from @trigger, protected via the @igate mutex.
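
Condensed from the diff below, the life cycle becomes roughly:

	/* open path: handler registered but left disabled */
	ret = request_irq(hwirq, handler, IRQF_NO_AUTOEN, irq->name, irq);

	/* SET_IRQS with a valid eventfd: enable under igate */
	irq->trigger = trigger;
	enable_irq(irq->hwirq);

	/* SET_IRQS with fd < 0: quiesce the handler before dropping trigger */
	disable_irq(irq->hwirq);
	eventfd_ctx_put(irq->trigger);
	irq->trigger = NULL;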

In doing so, it's guaranteed that changes to @trigger cannot race the
IRQ handlers because the IRQ handler is synchronously disabled before
modifying the trigger, and loopback triggering of the IRQ via ioctl is
safe due to serialization with trigger changes via igate.

For compatibility, request_irq() failures are maintained to be local to
the SET_IRQS ioctl rather than a fatal error in the open device path.
This allows, for example, a userspace driver with polling mode support
to continue to work regardless of moving the request_irq() call site.
This necessarily blocks all SET_IRQS access to the failed index.

Cc: Eric Auger <eric.auger@redhat.com>
Cc:  <stable@vger.kernel.org>
Fixes: 57f972e2b341 ("vfio/platform: trigger an interrupt via eventfd")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240308230557.805580-7-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/platform/vfio_platform_irq.c | 101 +++++++++++++++-------
 1 file changed, 68 insertions(+), 33 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index f2893f2fcaab..7f4341a8d718 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -136,6 +136,16 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
 	return 0;
 }
 
+/*
+ * The trigger eventfd is guaranteed valid in the interrupt path
+ * and protected by the igate mutex when triggered via ioctl.
+ */
+static void vfio_send_eventfd(struct vfio_platform_irq *irq_ctx)
+{
+	if (likely(irq_ctx->trigger))
+		eventfd_signal(irq_ctx->trigger, 1);
+}
+
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
 	struct vfio_platform_irq *irq_ctx = dev_id;
@@ -155,7 +165,7 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 	spin_unlock_irqrestore(&irq_ctx->lock, flags);
 
 	if (ret == IRQ_HANDLED)
-		eventfd_signal(irq_ctx->trigger, 1);
+		vfio_send_eventfd(irq_ctx);
 
 	return ret;
 }
@@ -164,22 +174,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
 {
 	struct vfio_platform_irq *irq_ctx = dev_id;
 
-	eventfd_signal(irq_ctx->trigger, 1);
+	vfio_send_eventfd(irq_ctx);
 
 	return IRQ_HANDLED;
 }
 
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
-			    int fd, irq_handler_t handler)
+			    int fd)
 {
 	struct vfio_platform_irq *irq = &vdev->irqs[index];
 	struct eventfd_ctx *trigger;
-	int ret;
 
 	if (irq->trigger) {
-		irq_clear_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-		free_irq(irq->hwirq, irq);
-		kfree(irq->name);
+		disable_irq(irq->hwirq);
 		eventfd_ctx_put(irq->trigger);
 		irq->trigger = NULL;
 	}
@@ -187,30 +194,20 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
 	if (fd < 0) /* Disable only */
 		return 0;
 
-	irq->name = kasprintf(GFP_KERNEL, "vfio-irq[%d](%s)",
-						irq->hwirq, vdev->name);
-	if (!irq->name)
-		return -ENOMEM;
-
 	trigger = eventfd_ctx_fdget(fd);
-	if (IS_ERR(trigger)) {
-		kfree(irq->name);
+	if (IS_ERR(trigger))
 		return PTR_ERR(trigger);
-	}
 
 	irq->trigger = trigger;
 
-	irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-	ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
-	if (ret) {
-		kfree(irq->name);
-		eventfd_ctx_put(trigger);
-		irq->trigger = NULL;
-		return ret;
-	}
-
-	if (!irq->masked)
-		enable_irq(irq->hwirq);
+	/*
+	 * irq->masked effectively provides nested disables within the overall
+	 * enable relative to trigger.  Specifically request_irq() is called
+	 * with NO_AUTOEN, therefore the IRQ is initially disabled.  The user
+	 * may only further disable the IRQ with a MASK operations because
+	 * irq->masked is initially false.
+	 */
+	enable_irq(irq->hwirq);
 
 	return 0;
 }
@@ -229,7 +226,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 		handler = vfio_irq_handler;
 
 	if (!count && (flags & VFIO_IRQ_SET_DATA_NONE))
-		return vfio_set_trigger(vdev, index, -1, handler);
+		return vfio_set_trigger(vdev, index, -1);
 
 	if (start != 0 || count != 1)
 		return -EINVAL;
@@ -237,7 +234,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
 	if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
 		int32_t fd = *(int32_t *)data;
 
-		return vfio_set_trigger(vdev, index, fd, handler);
+		return vfio_set_trigger(vdev, index, fd);
 	}
 
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
@@ -261,6 +258,14 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 		    unsigned start, unsigned count, uint32_t flags,
 		    void *data) = NULL;
 
+	/*
+	 * For compatibility, errors from request_irq() are local to the
+	 * SET_IRQS path and reflected in the name pointer.  This allows,
+	 * for example, polling mode fallback for an exclusive IRQ failure.
+	 */
+	if (IS_ERR(vdev->irqs[index].name))
+		return PTR_ERR(vdev->irqs[index].name);
+
 	switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
 	case VFIO_IRQ_SET_ACTION_MASK:
 		func = vfio_platform_set_irq_mask;
@@ -281,7 +286,7 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
 
 int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 {
-	int cnt = 0, i;
+	int cnt = 0, i, ret = 0;
 
 	while (vdev->get_irq(vdev, cnt) >= 0)
 		cnt++;
@@ -292,29 +297,54 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
 
 	for (i = 0; i < cnt; i++) {
 		int hwirq = vdev->get_irq(vdev, i);
+		irq_handler_t handler = vfio_irq_handler;
 
-		if (hwirq < 0)
+		if (hwirq < 0) {
+			ret = -EINVAL;
 			goto err;
+		}
 
 		spin_lock_init(&vdev->irqs[i].lock);
 
 		vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
 
-		if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK)
+		if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) {
 			vdev->irqs[i].flags |= VFIO_IRQ_INFO_MASKABLE
 						| VFIO_IRQ_INFO_AUTOMASKED;
+			handler = vfio_automasked_irq_handler;
+		}
 
 		vdev->irqs[i].count = 1;
 		vdev->irqs[i].hwirq = hwirq;
 		vdev->irqs[i].masked = false;
+		vdev->irqs[i].name = kasprintf(GFP_KERNEL,
+					       "vfio-irq[%d](%s)", hwirq,
+					       vdev->name);
+		if (!vdev->irqs[i].name) {
+			ret = -ENOMEM;
+			goto err;
+		}
+
+		ret = request_irq(hwirq, handler, IRQF_NO_AUTOEN,
+				  vdev->irqs[i].name, &vdev->irqs[i]);
+		if (ret) {
+			kfree(vdev->irqs[i].name);
+			vdev->irqs[i].name = ERR_PTR(ret);
+		}
 	}
 
 	vdev->num_irqs = cnt;
 
 	return 0;
 err:
+	for (--i; i >= 0; i--) {
+		if (!IS_ERR(vdev->irqs[i].name)) {
+			free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
+			kfree(vdev->irqs[i].name);
+		}
+	}
 	kfree(vdev->irqs);
-	return -EINVAL;
+	return ret;
 }
 
 void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
@@ -324,7 +354,12 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
 	for (i = 0; i < vdev->num_irqs; i++) {
 		vfio_virqfd_disable(&vdev->irqs[i].mask);
 		vfio_virqfd_disable(&vdev->irqs[i].unmask);
-		vfio_set_trigger(vdev, i, -1, NULL);
+		if (!IS_ERR(vdev->irqs[i].name)) {
+			free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
+			if (vdev->irqs[i].trigger)
+				eventfd_ctx_put(vdev->irqs[i].trigger);
+			kfree(vdev->irqs[i].name);
+		}
 	}
 
 	vdev->num_irqs = 0;
-- 
2.44.0



* [PATCH 6.1.y 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
                   ` (5 preceding siblings ...)
  2024-03-29 21:38 ` [PATCH 6.1.y 6/7] vfio/platform: Create persistent IRQ handlers Alex Williamson
@ 2024-03-29 21:38 ` Alex Williamson
  2024-03-30  8:54 ` [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Greg KH
  7 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2024-03-29 21:38 UTC (permalink / raw)
  To: stable
  Cc: Alex Williamson, sashal, gregkh, eric.auger, Diana Craciun,
	Kevin Tian

[ Upstream commit 7447d911af699a15f8d050dfcb7c680a86f87012 ]

The eventfd_ctx trigger pointer of the vfio_fsl_mc_irq object is
initially NULL and may become NULL if the user sets the trigger
eventfd to -1.  The interrupt handler itself is guaranteed that
trigger is always valid between request_irq() and free_irq(), but
the loopback testing mechanisms to invoke the handler function
need to test the trigger.  The triggering and setting ioctl paths
both make use of igate and are therefore mutually exclusive.

The vfio-fsl-mc driver does not make use of irqfds, nor does it
support any sort of masking operations, therefore unlike vfio-pci
and vfio-platform, the flow can remain essentially unchanged.
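
The loopback paths below then simply become (condensed from the diff):

	if (irq->trigger)
		eventfd_signal(irq->trigger, 1);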

Cc: Diana Craciun <diana.craciun@oss.nxp.com>
Cc:  <stable@vger.kernel.org>
Fixes: cc0ee20bd969 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20240308230557.805580-8-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
index 7b428eac3d3e..b125b6edf634 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
@@ -142,13 +142,14 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev,
 	irq = &vdev->mc_irqs[index];
 
 	if (flags & VFIO_IRQ_SET_DATA_NONE) {
-		vfio_fsl_mc_irq_handler(hwirq, irq);
+		if (irq->trigger)
+			eventfd_signal(irq->trigger, 1);
 
 	} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
 		u8 trigger = *(u8 *)data;
 
-		if (trigger)
-			vfio_fsl_mc_irq_handler(hwirq, irq);
+		if (trigger && irq->trigger)
+			eventfd_signal(irq->trigger, 1);
 	}
 
 	return 0;
-- 
2.44.0



* Re: [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ"
  2024-03-29 21:38 ` [PATCH 6.1.y 1/7] Revert "vfio/pci: Disable auto-enable of exclusive INTx IRQ" Alex Williamson
@ 2024-03-30  8:46   ` Greg KH
  0 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-03-30  8:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: stable, sashal, eric.auger

On Fri, Mar 29, 2024 at 03:38:48PM -0600, Alex Williamson wrote:
> This reverts commit e31eb60c4288ecbb4cc447a8e2496ea758a3984e.

That's a fake id, I've just dropped the patch from the queue now,
thanks.

greg k-h


* Re: [PATCH 6.1.y 2/7] Revert "vfio/pci: Prepare for dynamic interrupt context storage"
  2024-03-29 21:38 ` [PATCH 6.1.y 2/7] Revert "vfio/pci: Prepare for dynamic interrupt context storage" Alex Williamson
@ 2024-03-30  8:46   ` Greg KH
  0 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-03-30  8:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: stable, sashal, eric.auger

On Fri, Mar 29, 2024 at 03:38:49PM -0600, Alex Williamson wrote:
> This reverts commit b8e81e269b3d97fe53cd9819aa4a29e1aaf26731.

Again, fake id, I've dropped the change from the queue, thanks.

greg k-h


* Re: [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y
  2024-03-29 21:38 [PATCH 6.1.y 0/7] vfio: Interrupt eventfd hardening for 6.6.y Alex Williamson
                   ` (6 preceding siblings ...)
  2024-03-29 21:38 ` [PATCH 6.1.y 7/7] vfio/fsl-mc: Block calling interrupt handler without trigger Alex Williamson
@ 2024-03-30  8:54 ` Greg KH
  7 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2024-03-30  8:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: stable, sashal, eric.auger

On Fri, Mar 29, 2024 at 03:38:47PM -0600, Alex Williamson wrote:
> This corrects the backport of commit fe9a7082684e ("vfio/pci: Disable
> auto-enable of exclusive INTx IRQ"), choosing to adapt the fix to the
> current tree which uses an array of eventfd contexts rather than
> include a base patch for the conversion to xarray, which is found to
> be faulty in isolation.
> 
> I include the reverts here for completeness, but if the associated
> commits are otherwise already dropped due to previous report[1], the
> remainder of this series is still valid.

Remainder of the series is now queued up, thanks!

greg k-h

