linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing
@ 2025-05-11 16:35 Marc Zyngier
  2025-05-11 16:35 ` [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Marc Zyngier @ 2025-05-11 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

While reviewing (and debugging) the GICv5 code, it became obvious that
there was something pretty fishy about the way MSI allocation was
taking place in respect to the msi_prepare() callback.

There is a strong (though not 100% explicit) requirement that the
.msi_prepare() callback must take place exactly once during the lifetime
of an MSI domain, before any MSI is allocated. Sadly, that's not how the
core code behaves, leading to all sorts of odd bad issues with MBIGEN or
the GICv5 IWB.

The fix is both simple and surprisingly convoluted. The first course
of action would be to hoist the "prepare" action before we allocate
any MSI, adding new tracking for the msi_alloc_info_t structure.

But this unveils another issue that the core code has been papering
over, which is that there is no pendant to the .msi_prepare() callback
that would be called on irqdomain destruction, and that at least the
ITS code has been using some crap heuristics to work around it.

So this needs to be addressed first, and that's the point of the first
two patches, introducing and implementing the .msi_teardown() callback.

The last patch remove a gross hack in the ITS msi-parent code, which
was papering over the broken use of the "prepare" callback.

This has been tested on a GICv4 system with MBIGEN, patches on top of
6.15-rc5.

Marc Zyngier (4):
  genirq/msi: Add .msi_teardown() callback as the reverse of
    .msi_prepare()
  irqchip/gic-v3-its: Implement .msi_teardown() callback
  genirq/msi: Move prepare() call to per-device allocation
  irqchip/gic-v3-its: Use allocation size from the prepare call

 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 29 ++++-------
 drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
 include/linux/msi.h                         | 12 ++++-
 kernel/irq/msi.c                            | 51 +++++++++++++++++--
 4 files changed, 101 insertions(+), 47 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  2025-05-11 16:35 [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
@ 2025-05-11 16:35 ` Marc Zyngier
  2025-05-12 14:29   ` Thomas Gleixner
  2025-05-11 16:35 ` [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2025-05-11 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

While the MSI ops do have a .msi_prepare() callback that is
responsible for setting up the relevant (usually per-device)
allocation, we don't have a callback reversing this setup.

For this purpose, let's a .msi_teardown() callback. This is
reliying on the msi_domain_info structure having a non-NULL
alloc_data field.

Nobody is populating this field yet, so there is no change
in behaviour yet.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/msi.h | 10 ++++++++--
 kernel/irq/msi.c    | 12 ++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 86e42742fd0fb..0a44a2cba3105 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -420,6 +420,7 @@ struct msi_domain_info;
  * @msi_init:		Domain specific init function for MSI interrupts
  * @msi_free:		Domain specific function to free a MSI interrupts
  * @msi_prepare:	Prepare the allocation of the interrupts in the domain
+ * @msi_teardown:	Reverse the effects of @msi_prepare
  * @prepare_desc:	Optional function to prepare the allocated MSI descriptor
  *			in the domain
  * @set_desc:		Set the msi descriptor for an interrupt
@@ -435,8 +436,9 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare, @prepare_desc and @set_desc are callbacks used by the
- * msi_domain_alloc/free_irqs*() variants.
+ * @msi_check, @msi_prepare, @msi_teardown, @prepare_desc and
+ * @set_desc are callbacks used by the msi_domain_alloc/free_irqs*()
+ * variants.
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
  * default allocation/free functions (__msi_domain_alloc/free_irqs). This
@@ -458,6 +460,8 @@ struct msi_domain_ops {
 	int		(*msi_prepare)(struct irq_domain *domain,
 				       struct device *dev, int nvec,
 				       msi_alloc_info_t *arg);
+	void		(*msi_teardown)(struct irq_domain *domain,
+					msi_alloc_info_t *arg);
 	void		(*prepare_desc)(struct irq_domain *domain, msi_alloc_info_t *arg,
 					struct msi_desc *desc);
 	void		(*set_desc)(msi_alloc_info_t *arg,
@@ -486,6 +490,7 @@ struct msi_domain_ops {
  * @handler:		Optional: associated interrupt flow handler
  * @handler_data:	Optional: associated interrupt flow handler data
  * @handler_name:	Optional: associated interrupt flow handler name
+ * @alloc_data:		Optional: associated interrupt allocation data
  * @data:		Optional: domain specific data
  */
 struct msi_domain_info {
@@ -498,6 +503,7 @@ struct msi_domain_info {
 	irq_flow_handler_t		handler;
 	void				*handler_data;
 	const char			*handler_name;
+	msi_alloc_info_t		*alloc_data;
 	void				*data;
 };
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index c05ba7ca00faa..a65ccf19b15d9 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -796,6 +796,11 @@ static int msi_domain_ops_prepare(struct irq_domain *domain, struct device *dev,
 	return 0;
 }
 
+static void msi_domain_ops_teardown(struct irq_domain *domain,
+				    msi_alloc_info_t *arg)
+{
+}
+
 static void msi_domain_ops_set_desc(msi_alloc_info_t *arg,
 				    struct msi_desc *desc)
 {
@@ -821,6 +826,7 @@ static struct msi_domain_ops msi_domain_ops_default = {
 	.get_hwirq		= msi_domain_ops_get_hwirq,
 	.msi_init		= msi_domain_ops_init,
 	.msi_prepare		= msi_domain_ops_prepare,
+	.msi_teardown		= msi_domain_ops_teardown,
 	.set_desc		= msi_domain_ops_set_desc,
 };
 
@@ -842,6 +848,8 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info)
 		ops->msi_init = msi_domain_ops_default.msi_init;
 	if (ops->msi_prepare == NULL)
 		ops->msi_prepare = msi_domain_ops_default.msi_prepare;
+	if (ops->msi_teardown == NULL)
+		ops->msi_teardown = msi_domain_ops_default.msi_teardown;
 	if (ops->set_desc == NULL)
 		ops->set_desc = msi_domain_ops_default.set_desc;
 }
@@ -1088,6 +1096,10 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
 
 	dev->msi.data->__domains[domid].domain = NULL;
 	info = domain->host_data;
+
+	if (info->alloc_data)
+		info->ops->msi_teardown(domain, info->alloc_data);
+
 	if (irq_domain_is_msi_device(domain))
 		fwnode = domain->fwnode;
 	irq_domain_remove(domain);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-11 16:35 [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
  2025-05-11 16:35 ` [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
@ 2025-05-11 16:35 ` Marc Zyngier
  2025-05-12 14:34   ` Thomas Gleixner
  2025-05-12 16:30   ` Lorenzo Pieralisi
  2025-05-11 16:35 ` [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
  2025-05-11 16:35 ` [PATCH 4/4] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
  3 siblings, 2 replies; 15+ messages in thread
From: Marc Zyngier @ 2025-05-11 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

We currently nuke the structure representing an endpoint device
translating via an ITS on freeing the last LPI allocated for it.

That's an unfortunate state of affair, as it is pretty common for
a driver to allocate a single MSI, do something clever, teardown
this MSI, and reallocate a whole bunch of them. The nvme driver
does exactly that, amongst others.

What happens in that case is that the core code is buggy enough
to issue another .msi_prepare() call, even if it shouldn't.
This luckily cancels the above behaviour and hides the problem.

In order to fix the core code, let's start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect
is that the its_dev structure will not be freed and that the DID
will stay mapped. Not a big deal, and this will be solved in the
following patch.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
 drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index bdb04c8081480..76b94f55b00e2 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -159,6 +159,14 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 					  dev, nvec, info);
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+
+	msi_info = msi_get_domain_info(domain->parent);
+	msi_info->ops->msi_teardown(domain->parent, info);
+}
+
 static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
@@ -182,6 +190,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * %MSI_MAX_INDEX.
 		 */
 		info->ops->msi_prepare = its_pci_msi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	case DOMAIN_BUS_DEVICE_MSI:
 	case DOMAIN_BUS_WIRED_TO_MSI:
@@ -190,6 +199,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * size is also known at domain creation time.
 		 */
 		info->ops->msi_prepare = its_pmsi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	default:
 		/* Confused. How did the lib return true? */
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0115ad6c82593..3472b97477104 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	return err;
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+	struct its_device *its_dev;
+	struct its_node *its;
+	u32 dev_id;
+
+	dev_id = info->scratchpad[0].ul;
+
+	msi_info = msi_get_domain_info(domain);
+	its = msi_info->data;
+
+	guard(mutex)(&its->dev_alloc_lock);
+
+	its_dev = its_find_device(its, dev_id);
+
+	/* If the device is shared, keep everything around */
+	if (its_dev->shared)
+		return;
+
+	/* LPIs should have been already unmapped at this stage */
+	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map.lpi_map,
+				       its_dev->event_map.nr_lpis)))
+		return;
+
+	its_lpi_free(its_dev->event_map.lpi_map,
+		     its_dev->event_map.lpi_base,
+		     its_dev->event_map.nr_lpis);
+
+	/* Unmap device/itt, and get rid of the tracking */
+	its_send_mapd(its_dev, 0);
+	its_free_device(its_dev);
+}
+
 static struct msi_domain_ops its_msi_domain_ops = {
 	.msi_prepare	= its_msi_prepare,
+	.msi_teardown	= its_msi_teardown,
 };
 
 static int its_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -3722,7 +3757,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 {
 	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
 	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
-	struct its_node *its = its_dev->its;
 	int i;
 
 	bitmap_release_region(its_dev->event_map.lpi_map,
@@ -3736,26 +3770,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 		irq_domain_reset_irq_data(data);
 	}
 
-	mutex_lock(&its->dev_alloc_lock);
-
-	/*
-	 * If all interrupts have been freed, start mopping the
-	 * floor. This is conditioned on the device not being shared.
-	 */
-	if (!its_dev->shared &&
-	    bitmap_empty(its_dev->event_map.lpi_map,
-			 its_dev->event_map.nr_lpis)) {
-		its_lpi_free(its_dev->event_map.lpi_map,
-			     its_dev->event_map.lpi_base,
-			     its_dev->event_map.nr_lpis);
-
-		/* Unmap device/itt */
-		its_send_mapd(its_dev, 0);
-		its_free_device(its_dev);
-	}
-
-	mutex_unlock(&its->dev_alloc_lock);
-
 	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
 }
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation
  2025-05-11 16:35 [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
  2025-05-11 16:35 ` [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
  2025-05-11 16:35 ` [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
@ 2025-05-11 16:35 ` Marc Zyngier
  2025-05-12 14:24   ` Thomas Gleixner
  2025-05-11 16:35 ` [PATCH 4/4] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
  3 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2025-05-11 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

The current device MSI infrastructure is subtly broken, as it
will issue an .msi_prepare() callback into the MSI controller
driver every time it needs to allocate an MSI. That's pretty wrong,
as the contract between the MSI controller and the core code is that
.msi_prepare() is called exactly once per device.

This leads to some subtle breakage in said MSI controller drivers,
as it gives the impression that there are multiple endpoints sharing
a bus identifier (RID in PCI parlance, DID for GICv3+). It implies
that whatever allocation the ITS driver (for example) has done on
behalf of these devices cannot be undone, as there is no way to
track the shared state. This is particularly bad for wire-MSI devices,
for which .msi_prepare() is called for. each. input. line.

To address this issue, move the call to .msi_prepare() to take place
at the point of irq domain allocation, which is the only place that
makes sense. The msi_alloc_info_t structure is made part of the
msi_domain_template, so that its life-cycle is that of the domain
as well.

Finally, the msi_info::alloc_data field is made to point at this
allocation tracking structure, ensuring that it is carried around
the block.

This is all pretty straightforward, except for the non-device-MSI
leftovers, which still have to call .msi_prepare() at the old
spot. One day...

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/msi.h |  2 ++
 kernel/irq/msi.c    | 39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 0a44a2cba3105..68a8b2d03eba9 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -513,12 +513,14 @@ struct msi_domain_info {
  * @chip:	Interrupt chip for this domain
  * @ops:	MSI domain ops
  * @info:	MSI domain info data
+ * @arg:	MSI domain allocation data (arch specific)
  */
 struct msi_domain_template {
 	char			name[48];
 	struct irq_chip		chip;
 	struct msi_domain_ops	ops;
 	struct msi_domain_info	info;
+	msi_alloc_info_t	arg;
 };
 
 /*
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index a65ccf19b15d9..b8dc3289958c6 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -59,7 +59,8 @@ struct msi_ctrl {
 static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
 static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
 static inline int msi_sysfs_create_group(struct device *dev);
-
+static int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
+				   int nvec, msi_alloc_info_t *arg);
 
 /**
  * msi_alloc_desc - Allocate an initialized msi_desc
@@ -1025,6 +1026,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 	bundle->info.ops = &bundle->ops;
 	bundle->info.data = domain_data;
 	bundle->info.chip_data = chip_data;
+	bundle->info.alloc_data = &bundle->arg;
 
 	pops = parent->msi_parent_ops;
 	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
@@ -1053,21 +1055,28 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 	msi_lock_descs(dev);
 
 	if (WARN_ON_ONCE(msi_get_device_domain(dev, domid)))
-		goto fail;
+		goto fail_unlock;
 
 	if (!pops->init_dev_msi_info(dev, parent, parent, &bundle->info))
-		goto fail;
+		goto fail_unlock;
 
 	domain = __msi_create_irq_domain(fwnode, &bundle->info, IRQ_DOMAIN_FLAG_MSI_DEVICE, parent);
 	if (!domain)
-		goto fail;
+		goto fail_unlock;
 
 	domain->dev = dev;
 	dev->msi.data->__domains[domid].domain = domain;
+
+	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->arg))
+		goto fail;
+
 	msi_unlock_descs(dev);
 	return true;
 
 fail:
+	dev->msi.data->__domains[domid].domain = NULL;
+	irq_domain_remove(domain);
+fail_unlock:
 	msi_unlock_descs(dev);
 free_fwnode:
 	irq_domain_free_fwnode(fwnalloced);
@@ -1250,6 +1259,26 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 	return 0;
 }
 
+static int __populate_alloc_info(struct irq_domain *domain, struct device *dev,
+				 unsigned int nirqs, msi_alloc_info_t *arg)
+{
+	struct msi_domain_info *info = domain->host_data;
+	int ret = 0;
+
+	/*
+	 * If the caller has provided a template alloc info, use that. Once
+	 * all users of msi_create_irq_domain() have been eliminated, this
+	 * should be the only source of allocation information, and the
+	 * prepare call below should be finally removed.
+	 */
+	if (info->alloc_data)
+		*arg = *info->alloc_data;
+	else
+		ret = msi_domain_prepare_irqs(domain, dev, nirqs, arg);
+
+	return ret;
+}
+
 static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {
@@ -1262,7 +1291,7 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 	unsigned long idx;
 	int i, ret, virq;
 
-	ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg);
+	ret = __populate_alloc_info(domain, dev, ctrl->nirqs, &arg);
 	if (ret)
 		return ret;
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-11 16:35 [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
                   ` (2 preceding siblings ...)
  2025-05-11 16:35 ` [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
@ 2025-05-11 16:35 ` Marc Zyngier
  3 siblings, 0 replies; 15+ messages in thread
From: Marc Zyngier @ 2025-05-11 16:35 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

Now that .msi_prepare() gets called at the right time and not
with semi-random parameters, remove the ugly hack that tried
to fix up the number of allocated vectors.

It is now correct by construction.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index 76b94f55b00e2..2df75a758c104 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -67,17 +67,6 @@ static int its_pci_msi_prepare(struct irq_domain *domain, struct device *dev,
 	/* ITS specific DeviceID, as the core ITS ignores dev. */
 	info->scratchpad[0].ul = pci_msi_domain_get_msi_rid(domain->parent, pdev);
 
-	/*
-	 * @domain->msi_domain_info->hwsize contains the size of the
-	 * MSI[-X] domain, but vector allocation happens one by one. This
-	 * needs some thought when MSI comes into play as the size of MSI
-	 * might be unknown at domain creation time and therefore set to
-	 * MSI_MAX_INDEX.
-	 */
-	msi_info = msi_get_domain_info(domain);
-	if (msi_info->hwsize > nvec)
-		nvec = msi_info->hwsize;
-
 	/*
 	 * Always allocate a power of 2, and special case device 0 for
 	 * broken systems where the DevID is not wired (and all devices
@@ -143,14 +132,6 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 	/* ITS specific DeviceID, as the core ITS ignores dev. */
 	info->scratchpad[0].ul = dev_id;
 
-	/*
-	 * @domain->msi_domain_info->hwsize contains the size of the device
-	 * domain, but vector allocation happens one by one.
-	 */
-	msi_info = msi_get_domain_info(domain);
-	if (msi_info->hwsize > nvec)
-		nvec = msi_info->hwsize;
-
 	/* Allocate at least 32 MSIs, and always as a power of 2 */
 	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation
  2025-05-11 16:35 ` [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
@ 2025-05-12 14:24   ` Thomas Gleixner
  2025-05-12 15:55     ` Marc Zyngier
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2025-05-12 14:24 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel
  Cc: Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> The current device MSI infrastructure is subtly broken, as it
> will issue an .msi_prepare() callback into the MSI controller
> driver every time it needs to allocate an MSI. That's pretty wrong,
> as the contract between the MSI controller and the core code is that
> .msi_prepare() is called exactly once per device.

That contract is nowhere written in stone.

There are some MSI controller which get confused about that, but that's
a problem of said controllers

> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 0a44a2cba3105..68a8b2d03eba9 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -513,12 +513,14 @@ struct msi_domain_info {
>   * @chip:	Interrupt chip for this domain
>   * @ops:	MSI domain ops
>   * @info:	MSI domain info data
> + * @arg:	MSI domain allocation data (arch specific)

arg is a horrible name. Can this please be alloc_info or such?

> @@ -1025,6 +1026,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>  	bundle->info.ops = &bundle->ops;
>  	bundle->info.data = domain_data;
>  	bundle->info.chip_data = chip_data;
> +	bundle->info.alloc_data = &bundle->arg;
>  
>  	pops = parent->msi_parent_ops;
>  	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
> @@ -1053,21 +1055,28 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>  	msi_lock_descs(dev);

Please work against tip irq/msi which carries the guard() replacement
for msi_lock_descs(). This patch heavily conflicts with the queued
changes.

> +static int __populate_alloc_info(struct irq_domain *domain, struct device *dev,
> +				 unsigned int nirqs, msi_alloc_info_t *arg)
> +{

Why does this need double underscores?

> +	struct msi_domain_info *info = domain->host_data;
> +	int ret = 0;
> +
> +	/*
> +	 * If the caller has provided a template alloc info, use that. Once
> +	 * all users of msi_create_irq_domain() have been eliminated, this
> +	 * should be the only source of allocation information, and the
> +	 * prepare call below should be finally removed.

That's only a matter of decades :)

> +	 */
> +	if (info->alloc_data)
> +		*arg = *info->alloc_data;
> +	else
> +		ret = msi_domain_prepare_irqs(domain, dev, nirqs, arg);
> +
> +	return ret;

	if (!info->alloc_data)
        	return msi_domain_prepare_irqs(domain, dev, nirqs, arg);

	*arg = *info->alloc_data;
        return 0;

perhaps?

Thanks,

        tglx

         


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  2025-05-11 16:35 ` [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
@ 2025-05-12 14:29   ` Thomas Gleixner
  2025-05-12 15:57     ` Marc Zyngier
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2025-05-12 14:29 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel
  Cc: Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:

> While the MSI ops do have a .msi_prepare() callback that is
> responsible for setting up the relevant (usually per-device)
> allocation, we don't have a callback reversing this setup.

..., there is no callback reversing ...

> For this purpose, let's a .msi_teardown() callback. This is

'let's a ...' is not a sentence. Just say: add a .... calback.

> reliying on the msi_domain_info structure having a non-NULL

  ^^^^^ spell check is your friend.

> alloc_data field.
>
> Nobody is populating this field yet, so there is no change

No driver is ..

>  
> +static void msi_domain_ops_teardown(struct irq_domain *domain,
> +				    msi_alloc_info_t *arg)

No line break required.

> +{
> +}
> +
>  static void msi_domain_ops_set_desc(msi_alloc_info_t *arg,
>  				    struct msi_desc *desc)
>  {
> @@ -821,6 +826,7 @@ static struct msi_domain_ops msi_domain_ops_default = {
>  	.get_hwirq		= msi_domain_ops_get_hwirq,
>  	.msi_init		= msi_domain_ops_init,
>  	.msi_prepare		= msi_domain_ops_prepare,
> +	.msi_teardown		= msi_domain_ops_teardown,
>  	.set_desc		= msi_domain_ops_set_desc,
>  };
>  
> @@ -842,6 +848,8 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info)
>  		ops->msi_init = msi_domain_ops_default.msi_init;
>  	if (ops->msi_prepare == NULL)
>  		ops->msi_prepare = msi_domain_ops_default.msi_prepare;
> +	if (ops->msi_teardown == NULL)
> +		ops->msi_teardown = msi_domain_ops_default.msi_teardown;
>  	if (ops->set_desc == NULL)
>  		ops->set_desc = msi_domain_ops_default.set_desc;
>  }
> @@ -1088,6 +1096,10 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
>  
>  	dev->msi.data->__domains[domid].domain = NULL;
>  	info = domain->host_data;
> +
> +	if (info->alloc_data)
> +		info->ops->msi_teardown(domain, info->alloc_data);

Hmm, that's weird.

Why not call it unconditionally. The empty teardown() default callback
does not care about @arg being NULL. No?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-11 16:35 ` [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
@ 2025-05-12 14:34   ` Thomas Gleixner
  2025-05-12 16:09     ` Marc Zyngier
  2025-05-12 16:30   ` Lorenzo Pieralisi
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2025-05-12 14:34 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel
  Cc: Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> We currently nuke the structure representing an endpoint device

How is we? We means nothing as you know :)

> translating via an ITS on freeing the last LPI allocated for it.
>
> That's an unfortunate state of affair, as it is pretty common for
> a driver to allocate a single MSI, do something clever, teardown
> this MSI, and reallocate a whole bunch of them. The nvme driver
> does exactly that, amongst others.
>
> What happens in that case is that the core code is buggy enough
> to issue another .msi_prepare() call, even if it shouldn't.
> This luckily cancels the above behaviour and hides the problem.
>
> In order to fix the core code, let's start by implementing the new

s/let's//

I really have to understand why everyone is so fond of "let's". It only
makes limited sense when the patch is proposed, but in a change log it
does not make sense at all.

> .msi_teardown() callback. Nothing calls it yet, so a side effect
> is that the its_dev structure will not be freed and that the DID
> will stay mapped. Not a big deal, and this will be solved in the
> following patch.

Now I see why you added this incomprehensible condition into the core
code. Bah.

Why don't you add this callback once you changed the prepare muck, which
introduces info::alloc_data?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation
  2025-05-12 14:24   ` Thomas Gleixner
@ 2025-05-12 15:55     ` Marc Zyngier
  0 siblings, 0 replies; 15+ messages in thread
From: Marc Zyngier @ 2025-05-12 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, 12 May 2025 15:24:39 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> > The current device MSI infrastructure is subtly broken, as it
> > will issue an .msi_prepare() callback into the MSI controller
> > driver every time it needs to allocate an MSI. That's pretty wrong,
> > as the contract between the MSI controller and the core code is that
> > .msi_prepare() is called exactly once per device.
> 
> That contract is nowhere written in stone.

It was *definitely* there the first place, and a baked in assumption
since the ITS code was merged. You're welcome to come up with a new
scheme, but the way the HW works requires this prepare phase to take
place once per device.

If we can't have that, maybe we should consider reverting the GICv3/v4
code back to the pre-6.10 scheme that doesn't suffer from this issue.

> There are some MSI controller which get confused about that, but that's
> a problem of said controllers

No. It's an infrastructure problem. This model worked before for a
whole class of HW, until it was mutated into something else.

> 
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index 0a44a2cba3105..68a8b2d03eba9 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -513,12 +513,14 @@ struct msi_domain_info {
> >   * @chip:	Interrupt chip for this domain
> >   * @ops:	MSI domain ops
> >   * @info:	MSI domain info data
> > + * @arg:	MSI domain allocation data (arch specific)
> 
> arg is a horrible name. Can this please be alloc_info or such?

Because that's the name every single function that takes it as a
parameter uses? But sure, whatever name you want.

> 
> > @@ -1025,6 +1026,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
> >  	bundle->info.ops = &bundle->ops;
> >  	bundle->info.data = domain_data;
> >  	bundle->info.chip_data = chip_data;
> > +	bundle->info.alloc_data = &bundle->arg;
> >  
> >  	pops = parent->msi_parent_ops;
> >  	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
> > @@ -1053,21 +1055,28 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
> >  	msi_lock_descs(dev);
> 
> Please work against tip irq/msi which carries the guard() replacement
> for msi_lock_descs(). This patch heavily conflicts with the queued
> changes.
> 
> > +static int __populate_alloc_info(struct irq_domain *domain, struct device *dev,
> > +				 unsigned int nirqs, msi_alloc_info_t *arg)
> > +{
> 
> Why does this need double underscores?

Because it doesn't look that out of place in this file?

> 
> > +	struct msi_domain_info *info = domain->host_data;
> > +	int ret = 0;
> > +
> > +	/*
> > +	 * If the caller has provided a template alloc info, use that. Once
> > +	 * all users of msi_create_irq_domain() have been eliminated, this
> > +	 * should be the only source of allocation information, and the
> > +	 * prepare call below should be finally removed.
> 
> That's only a matter of decades :)
>
> > +	 */
> > +	if (info->alloc_data)
> > +		*arg = *info->alloc_data;
> > +	else
> > +		ret = msi_domain_prepare_irqs(domain, dev, nirqs, arg);
> > +
> > +	return ret;
> 
> 	if (!info->alloc_data)
>         	return msi_domain_prepare_irqs(domain, dev, nirqs, arg);
> 
> 	*arg = *info->alloc_data;
>         return 0;
> 
> perhaps?

Sure.

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  2025-05-12 14:29   ` Thomas Gleixner
@ 2025-05-12 15:57     ` Marc Zyngier
  2025-05-12 18:46       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2025-05-12 15:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, 12 May 2025 15:29:39 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> 
> > While the MSI ops do have a .msi_prepare() callback that is
> > responsible for setting up the relevant (usually per-device)
> > allocation, we don't have a callback reversing this setup.
> 
> ..., there is no callback reversing ...
> 
> > For this purpose, let's a .msi_teardown() callback. This is
> 
> 'let's a ...' is not a sentence. Just say: add a .... calback.
>
> > reliying on the msi_domain_info structure having a non-NULL
> 
>   ^^^^^ spell check is your friend.

I rely on humans for that. But maybe I should ask someone to put these
recommendations into one of these AI bots, and generate the stuff
automatically. It will be devoid of any actual reasoning, but at least
it will have the "officially sanctioned" verbiage.

> 
> > alloc_data field.
> >
> > Nobody is populating this field yet, so there is no change
> 
> No driver is ..

No. There is definitely no driver populating this, nor there will ever
be. That's 100% MSI infrastructure.

> 
> >  
> > +static void msi_domain_ops_teardown(struct irq_domain *domain,
> > +				    msi_alloc_info_t *arg)
> 
> No line break required.

You mean...

> 
> > +{
> > +}
> > +
> >  static void msi_domain_ops_set_desc(msi_alloc_info_t *arg,
> >  				    struct msi_desc *desc)

... not like this?

> >  {
> > @@ -821,6 +826,7 @@ static struct msi_domain_ops msi_domain_ops_default = {
> >  	.get_hwirq		= msi_domain_ops_get_hwirq,
> >  	.msi_init		= msi_domain_ops_init,
> >  	.msi_prepare		= msi_domain_ops_prepare,
> > +	.msi_teardown		= msi_domain_ops_teardown,
> >  	.set_desc		= msi_domain_ops_set_desc,
> >  };
> >  
> > @@ -842,6 +848,8 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info)
> >  		ops->msi_init = msi_domain_ops_default.msi_init;
> >  	if (ops->msi_prepare == NULL)
> >  		ops->msi_prepare = msi_domain_ops_default.msi_prepare;
> > +	if (ops->msi_teardown == NULL)
> > +		ops->msi_teardown = msi_domain_ops_default.msi_teardown;
> >  	if (ops->set_desc == NULL)
> >  		ops->set_desc = msi_domain_ops_default.set_desc;
> >  }
> > @@ -1088,6 +1096,10 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
> >  
> >  	dev->msi.data->__domains[domid].domain = NULL;
> >  	info = domain->host_data;
> > +
> > +	if (info->alloc_data)
> > +		info->ops->msi_teardown(domain, info->alloc_data);
> 
> Hmm, that's weird.
> 
> Why not call it unconditionally. The empty teardown() default callback
> does not care about @arg being NULL. No?

Because at this point, nothing populates that pointer. It is only
after patch 3 that this pointer is valid. After patch 2, we get a
non-default callback, which should never be presented with a NULL
allocation context.

And since I value keeping things bisectable, it has to happen in this
order.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-12 14:34   ` Thomas Gleixner
@ 2025-05-12 16:09     ` Marc Zyngier
  2025-05-12 18:47       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2025-05-12 16:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, 12 May 2025 15:34:59 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Sun, May 11 2025 at 17:35, Marc Zyngier wrote:
> > We currently nuke the structure representing an endpoint device
> 
> How is we? We means nothing as you know :)

We means a lot to *me*.

> 
> > translating via an ITS on freeing the last LPI allocated for it.
> >
> > That's an unfortunate state of affair, as it is pretty common for
> > a driver to allocate a single MSI, do something clever, teardown
> > this MSI, and reallocate a whole bunch of them. The nvme driver
> > does exactly that, amongst others.
> >
> > What happens in that case is that the core code is buggy enough
> > to issue another .msi_prepare() call, even if it shouldn't.
> > This luckily cancels the above behaviour and hides the problem.
> >
> > In order to fix the core code, let's start by implementing the new
> 
> s/let's//
> 
> I really have to understand why everyone is so fond of "let's". It only
> makes limited sense when the patch is proposed, but in a change log it
> does not make sense at all.
> 
> > .msi_teardown() callback. Nothing calls it yet, so a side effect
> > is that the its_dev structure will not be freed and that the DID
> > will stay mapped. Not a big deal, and this will be solved in the
> > following patch.
> 
> Now I see why you added this incomprehensible condition into the core
> code. Bah.
> 
> Why don't you add this callback once you changed the prepare muck, which
> introduces info::alloc_data?

Changing prepare first breaks nvme and igbxe, amongst other things,
for the reasons explained just above, so you need to implement
teardown first, plug the MSI controller into it, and only then you can
switch prepare.

What I can do is:

(1) introduce teardown without the call site in msi_remove_device_irq_domain()

(2) add its implementation in the ITS driver

(3) move the prepare crap

(4) add the teardown call to msi_remove_device_irq_domain(), without
    the check on info->alloc_data

With that order, everything will keep working, with a temporary leak
of ITTs in the ITS driver between patches (2) and (4).

Is that something you can live with?

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-11 16:35 ` [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
  2025-05-12 14:34   ` Thomas Gleixner
@ 2025-05-12 16:30   ` Lorenzo Pieralisi
  2025-05-12 17:00     ` Marc Zyngier
  1 sibling, 1 reply; 15+ messages in thread
From: Lorenzo Pieralisi @ 2025-05-12 16:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner, Sascha Bischoff,
	Timothy Hayes

On Sun, May 11, 2025 at 05:35:18PM +0100, Marc Zyngier wrote:
> We currently nuke the structure representing an endpoint device
> translating via an ITS on freeing the last LPI allocated for it.
> 
> That's an unfortunate state of affair, as it is pretty common for
> a driver to allocate a single MSI, do something clever, teardown
> this MSI, and reallocate a whole bunch of them. The nvme driver
> does exactly that, amongst others.
> 
> What happens in that case is that the core code is buggy enough
> to issue another .msi_prepare() call, even if it shouldn't.
> This luckily cancels the above behaviour and hides the problem.
> 
> In order to fix the core code, let's start by implementing the new
> .msi_teardown() callback. Nothing calls it yet, so a side effect
> is that the its_dev structure will not be freed and that the DID
> will stay mapped. Not a big deal, and this will be solved in the
> following patch.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
>  drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
>  2 files changed, 45 insertions(+), 21 deletions(-)

First off, thanks a lot for putting this together, it makes an awful
lot of sense to me.

> index 0115ad6c82593..3472b97477104 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
>  	return err;
>  }
>  
> +static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> +{
> +	struct msi_domain_info *msi_info;
> +	struct its_device *its_dev;
> +	struct its_node *its;
> +	u32 dev_id;
> +
> +	dev_id = info->scratchpad[0].ul;

I have just managed to get to a keyboard :), I don't think the dev_id
makes it to this point, we overwrite it with the its_dev pointer in
its_msi_prepare() (could use second scratchpad for the pointer maybe ?).

I was bitten by this while removing the old IWB code into the new one
(unrelated to this code but that's how I noticed scratchpad is a union).

Ignore me if I am mistaken, just reading the code, have not tested it
(but I am about to do it for v5).

Thanks,
Lorenzo

> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	its_dev = its_find_device(its, dev_id);
> +
> +	/* If the device is shared, keep everything around */
> +	if (its_dev->shared)
> +		return;
> +
> +	/* LPIs should have been already unmapped at this stage */
> +	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map.lpi_map,
> +				       its_dev->event_map.nr_lpis)))
> +		return;
> +
> +	its_lpi_free(its_dev->event_map.lpi_map,
> +		     its_dev->event_map.lpi_base,
> +		     its_dev->event_map.nr_lpis);
> +
> +	/* Unmap device/itt, and get rid of the tracking */
> +	its_send_mapd(its_dev, 0);
> +	its_free_device(its_dev);
> +}
> +
>  static struct msi_domain_ops its_msi_domain_ops = {
>  	.msi_prepare	= its_msi_prepare,
> +	.msi_teardown	= its_msi_teardown,
>  };
>  
>  static int its_irq_gic_domain_alloc(struct irq_domain *domain,
> @@ -3722,7 +3757,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  {
>  	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
>  	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> -	struct its_node *its = its_dev->its;
>  	int i;
>  
>  	bitmap_release_region(its_dev->event_map.lpi_map,
> @@ -3736,26 +3770,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>  		irq_domain_reset_irq_data(data);
>  	}
>  
> -	mutex_lock(&its->dev_alloc_lock);
> -
> -	/*
> -	 * If all interrupts have been freed, start mopping the
> -	 * floor. This is conditioned on the device not being shared.
> -	 */
> -	if (!its_dev->shared &&
> -	    bitmap_empty(its_dev->event_map.lpi_map,
> -			 its_dev->event_map.nr_lpis)) {
> -		its_lpi_free(its_dev->event_map.lpi_map,
> -			     its_dev->event_map.lpi_base,
> -			     its_dev->event_map.nr_lpis);
> -
> -		/* Unmap device/itt */
> -		its_send_mapd(its_dev, 0);
> -		its_free_device(its_dev);
> -	}
> -
> -	mutex_unlock(&its->dev_alloc_lock);
> -
>  	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
>  }
>  
> -- 
> 2.39.2
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-12 16:30   ` Lorenzo Pieralisi
@ 2025-05-12 17:00     ` Marc Zyngier
  0 siblings, 0 replies; 15+ messages in thread
From: Marc Zyngier @ 2025-05-12 17:00 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner, Sascha Bischoff,
	Timothy Hayes

On Mon, 12 May 2025 17:30:58 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> 
> On Sun, May 11, 2025 at 05:35:18PM +0100, Marc Zyngier wrote:
> > We currently nuke the structure representing an endpoint device
> > translating via an ITS on freeing the last LPI allocated for it.
> > 
> > That's an unfortunate state of affair, as it is pretty common for
> > a driver to allocate a single MSI, do something clever, teardown
> > this MSI, and reallocate a whole bunch of them. The nvme driver
> > does exactly that, amongst others.
> > 
> > What happens in that case is that the core code is buggy enough
> > to issue another .msi_prepare() call, even if it shouldn't.
> > This luckily cancels the above behaviour and hides the problem.
> > 
> > In order to fix the core code, let's start by implementing the new
> > .msi_teardown() callback. Nothing calls it yet, so a side effect
> > is that the its_dev structure will not be freed and that the DID
> > will stay mapped. Not a big deal, and this will be solved in the
> > following patch.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 ++++
> >  drivers/irqchip/irq-gic-v3-its.c            | 56 +++++++++++++--------
> >  2 files changed, 45 insertions(+), 21 deletions(-)
> 
> First off, thanks a lot for putting this together, it makes an awful
> lot of sense to me.
> 
> > index 0115ad6c82593..3472b97477104 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -3620,8 +3620,43 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
> >  	return err;
> >  }
> >  
> > +static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> > +{
> > +	struct msi_domain_info *msi_info;
> > +	struct its_device *its_dev;
> > +	struct its_node *its;
> > +	u32 dev_id;
> > +
> > +	dev_id = info->scratchpad[0].ul;
> 
> I have just managed to get to a keyboard :), I don't think the dev_id
> makes it to this point, we overwrite it with the its_dev pointer in
> its_msi_prepare() (could use second scratchpad for the pointer maybe ?).
> 
> I was bitten by this while removing the old IWB code into the new one
> (unrelated to this code but that's how I noticed scratchpad is a union).

Gah, this is missing the fixup I had on top and that I didn't squash:

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d8c4d3b8256f3..7e0e7f0160936 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3622,19 +3622,9 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 
 static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
 {
-	struct msi_domain_info *msi_info;
-	struct its_device *its_dev;
-	struct its_node *its;
-	u32 dev_id;
-
-	dev_id = info->scratchpad[0].ul;
-
-	msi_info = msi_get_domain_info(domain);
-	its = msi_info->data;
-
-	guard(mutex)(&its->dev_alloc_lock);
+	struct its_device *its_dev = info->scratchpad[0].ptr;
 
-	its_dev = its_find_device(its, dev_id);
+	guard(mutex)(&its_dev->its->dev_alloc_lock);
 
 	/* If the device is shared, keep everything around */
 	if (its_dev->shared)


So no need for another scratchpad entry -- the device structure has
everything we need. I'll repost things with tglx's comments addressed
and this fix correctly squashed in.

Sorry for the noise,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  2025-05-12 15:57     ` Marc Zyngier
@ 2025-05-12 18:46       ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2025-05-12 18:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, May 12 2025 at 16:57, Marc Zyngier wrote:
> On Mon, 12 May 2025 15:29:39 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> > +static void msi_domain_ops_teardown(struct irq_domain *domain,
>> > +				    msi_alloc_info_t *arg)
>> 
>> No line break required.
>
> You mean...
>
>> 
>> > +{
>> > +}
>> > +
>> >  static void msi_domain_ops_set_desc(msi_alloc_info_t *arg,
>> >  				    struct msi_desc *desc)
>
> ... not like this?

Right. That was written before we went to 100 characters line length.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-12 16:09     ` Marc Zyngier
@ 2025-05-12 18:47       ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2025-05-12 18:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, May 12 2025 at 17:09, Marc Zyngier wrote:
> On Mon, 12 May 2025 15:34:59 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Changing prepare first breaks nvme and igbxe, amongst other things,
> for the reasons explained just above, so you need to implement
> teardown first, plug the MSI controller into it, and only then you can
> switch prepare.
>
> What I can do is:
>
> (1) introduce teardown without the call site in msi_remove_device_irq_domain()
>
> (2) add its implementation in the ITS driver
>
> (3) move the prepare crap
>
> (4) add the teardown call to msi_remove_device_irq_domain(), without
>     the check on info->alloc_data
>
> With that order, everything will keep working, with a temporary leak
> of ITTs in the ITS driver between patches (2) and (4).
>
> Is that something you can live with?

Sounds good.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-05-12 19:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-11 16:35 [PATCH 0/4] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
2025-05-11 16:35 ` [PATCH 1/4] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
2025-05-12 14:29   ` Thomas Gleixner
2025-05-12 15:57     ` Marc Zyngier
2025-05-12 18:46       ` Thomas Gleixner
2025-05-11 16:35 ` [PATCH 2/4] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
2025-05-12 14:34   ` Thomas Gleixner
2025-05-12 16:09     ` Marc Zyngier
2025-05-12 18:47       ` Thomas Gleixner
2025-05-12 16:30   ` Lorenzo Pieralisi
2025-05-12 17:00     ` Marc Zyngier
2025-05-11 16:35 ` [PATCH 3/4] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
2025-05-12 14:24   ` Thomas Gleixner
2025-05-12 15:55     ` Marc Zyngier
2025-05-11 16:35 ` [PATCH 4/4] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).