linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing
@ 2025-05-13 16:31 Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 1/5] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

While reviewing (and debugging) the GICv5 code, it became obvious that there
was something pretty fishy about the way MSI allocation was taking place in
respect to the msi_prepare() callback.

There is a strong (though not 100% explicit) requirement that the
.msi_prepare() callback must take place exactly once during the lifetime of
an MSI domain, before any MSI is allocated. Sadly, that's not how the core
code behaves, leading to all sorts of "interesting" issues with MBIGEN or
the GICv5 IWB.

The fix is both simple and surprisingly convoluted. The first course of
action would be to hoist the "prepare" action before we allocate any MSI,
adding new tracking for the msi_alloc_info_t structure.

But this unveils another issue that the core code has been papering over,
which is that there is no pendant to the .msi_prepare() callback that would
be called on irqdomain destruction, and that at least the ITS code has been
using some crap heuristics to work around it.

So this needs to be addressed first, and that's the point of the first two
patches, introducing and implementing the .msi_teardown() callback.  The
.msi_prepare() call is then moved is the specific case of device-MSI, and
the .msi_teardown() callback wired up.

The last patch remove a gross hack in the ITS msi-parent code, which
was papering over the broken use of the "prepare" callback.

This has been tested on a GICv4 system with MBIGEN, patches on top of
tip/irq/msi (a1d8a83093675).

* From v1 [1]:

  - Fixed blunder in the ITS .msi_teardown()

  - Fixed spelling, grammar, and other use of unclear terms

  - Fixed line splitting, disgraceful underscores

  - Split the calling of .msi_teardown() into its own patch

  - Rebased on top of tip/irq/msi

[1] https://lore.kernel.org/r/20250511163520.1307654-1-maz@kernel.org

Marc Zyngier (5):
  genirq/msi: Add .msi_teardown() callback as the reverse of
    .msi_prepare()
  irqchip/gic-v3-its: Implement .msi_teardown() callback
  genirq/msi: Move prepare() call to per-device allocation
  genirq/msi: Engage the .msi_teardown() callback on domain removal
  irqchip/gic-v3-its: Use allocation size from the prepare call

 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 29 +++++--------
 drivers/irqchip/irq-gic-v3-its.c            | 46 +++++++++++----------
 include/linux/msi.h                         | 12 +++++-
 kernel/irq/msi.c                            | 45 ++++++++++++++++++--
 4 files changed, 86 insertions(+), 46 deletions(-)

-- 
2.39.2



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/5] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare()
  2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
@ 2025-05-13 16:31 ` Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 2/5] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

While the MSI ops do have a .msi_prepare() callback that is
responsible for setting up the relevant (usually per-device)
allocation, there is no callback reversing this setup.

For this purpose, add .msi_teardown() callback.

In order to avoid breaking the ITS driver that suffers from related
issues, do not call the call back just yet.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/msi.h | 10 ++++++++--
 kernel/irq/msi.c    |  8 ++++++++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8c0ec9fc05a39..63c23003ec9b7 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -423,6 +423,7 @@ struct msi_domain_info;
  * @msi_init:		Domain specific init function for MSI interrupts
  * @msi_free:		Domain specific function to free a MSI interrupts
  * @msi_prepare:	Prepare the allocation of the interrupts in the domain
+ * @msi_teardown:	Reverse the effects of @msi_prepare
  * @prepare_desc:	Optional function to prepare the allocated MSI descriptor
  *			in the domain
  * @set_desc:		Set the msi descriptor for an interrupt
@@ -438,8 +439,9 @@ struct msi_domain_info;
  * @get_hwirq, @msi_init and @msi_free are callbacks used by the underlying
  * irqdomain.
  *
- * @msi_check, @msi_prepare, @prepare_desc and @set_desc are callbacks used by the
- * msi_domain_alloc/free_irqs*() variants.
+ * @msi_check, @msi_prepare, @msi_teardown, @prepare_desc and
+ * @set_desc are callbacks used by the msi_domain_alloc/free_irqs*()
+ * variants.
  *
  * @domain_alloc_irqs, @domain_free_irqs can be used to override the
  * default allocation/free functions (__msi_domain_alloc/free_irqs). This
@@ -461,6 +463,8 @@ struct msi_domain_ops {
 	int		(*msi_prepare)(struct irq_domain *domain,
 				       struct device *dev, int nvec,
 				       msi_alloc_info_t *arg);
+	void		(*msi_teardown)(struct irq_domain *domain,
+					msi_alloc_info_t *arg);
 	void		(*prepare_desc)(struct irq_domain *domain, msi_alloc_info_t *arg,
 					struct msi_desc *desc);
 	void		(*set_desc)(msi_alloc_info_t *arg,
@@ -489,6 +493,7 @@ struct msi_domain_ops {
  * @handler:		Optional: associated interrupt flow handler
  * @handler_data:	Optional: associated interrupt flow handler data
  * @handler_name:	Optional: associated interrupt flow handler name
+ * @alloc_data:		Optional: associated interrupt allocation data
  * @data:		Optional: domain specific data
  */
 struct msi_domain_info {
@@ -501,6 +506,7 @@ struct msi_domain_info {
 	irq_flow_handler_t		handler;
 	void				*handler_data;
 	const char			*handler_name;
+	msi_alloc_info_t		*alloc_data;
 	void				*data;
 };
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 2b3ef2333dbc8..a3b34c3c599be 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -795,6 +795,11 @@ static int msi_domain_ops_prepare(struct irq_domain *domain, struct device *dev,
 	return 0;
 }
 
+static void msi_domain_ops_teardown(struct irq_domain *domain,
+				    msi_alloc_info_t *arg)
+{
+}
+
 static void msi_domain_ops_set_desc(msi_alloc_info_t *arg,
 				    struct msi_desc *desc)
 {
@@ -820,6 +825,7 @@ static struct msi_domain_ops msi_domain_ops_default = {
 	.get_hwirq		= msi_domain_ops_get_hwirq,
 	.msi_init		= msi_domain_ops_init,
 	.msi_prepare		= msi_domain_ops_prepare,
+	.msi_teardown		= msi_domain_ops_teardown,
 	.set_desc		= msi_domain_ops_set_desc,
 };
 
@@ -841,6 +847,8 @@ static void msi_domain_update_dom_ops(struct msi_domain_info *info)
 		ops->msi_init = msi_domain_ops_default.msi_init;
 	if (ops->msi_prepare == NULL)
 		ops->msi_prepare = msi_domain_ops_default.msi_prepare;
+	if (ops->msi_teardown == NULL)
+		ops->msi_teardown = msi_domain_ops_default.msi_teardown;
 	if (ops->set_desc == NULL)
 		ops->set_desc = msi_domain_ops_default.set_desc;
 }
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/5] irqchip/gic-v3-its: Implement .msi_teardown() callback
  2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 1/5] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
@ 2025-05-13 16:31 ` Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

The ITS driver currently nukes the structure representing
an endpoint device translating via an ITS on freeing the last LPI
allocated for it.

That's an unfortunate state of affair, as it is pretty common for
a driver to allocate a single MSI, do something clever, teardown
this MSI, and reallocate a whole bunch of them. The nvme driver
does exactly that, amongst others.

What happens in that case is that the core code is buggy enough
to issue another .msi_prepare() call, even if it shouldn't.
This luckily cancels the above behaviour and hides the problem.

In order to fix the core code, start by implementing the new
.msi_teardown() callback. Nothing calls it yet, so a side effect
is that the its_dev structure will not be freed and that the DID
will stay mapped. Not a big deal, and this will be solved in
following patches.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 10 +++++
 drivers/irqchip/irq-gic-v3-its.c            | 46 +++++++++++----------
 kernel/irq/msi.c                            |  3 +-
 3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index 68f9ba4085ce5..958736622fa57 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -167,6 +167,14 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 					  dev, nvec, info);
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct msi_domain_info *msi_info;
+
+	msi_info = msi_get_domain_info(domain->parent);
+	msi_info->ops->msi_teardown(domain->parent, info);
+}
+
 static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
@@ -190,6 +198,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * %MSI_MAX_INDEX.
 		 */
 		info->ops->msi_prepare = its_pci_msi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	case DOMAIN_BUS_DEVICE_MSI:
 	case DOMAIN_BUS_WIRED_TO_MSI:
@@ -198,6 +207,7 @@ static bool its_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 		 * size is also known at domain creation time.
 		 */
 		info->ops->msi_prepare = its_pmsi_prepare;
+		info->ops->msi_teardown = its_msi_teardown;
 		break;
 	default:
 		/* Confused. How did the lib return true? */
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fd6e7c170d37e..a77f11e23ad6c 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3620,8 +3620,33 @@ static int its_msi_prepare(struct irq_domain *domain, struct device *dev,
 	return err;
 }
 
+static void its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
+{
+	struct its_device *its_dev = info->scratchpad[0].ptr;
+
+	guard(mutex)(&its_dev->its->dev_alloc_lock);
+
+	/* If the device is shared, keep everything around */
+	if (its_dev->shared)
+		return;
+
+	/* LPIs should have been already unmapped at this stage */
+	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map.lpi_map,
+				       its_dev->event_map.nr_lpis)))
+		return;
+
+	its_lpi_free(its_dev->event_map.lpi_map,
+		     its_dev->event_map.lpi_base,
+		     its_dev->event_map.nr_lpis);
+
+	/* Unmap device/itt, and get rid of the tracking */
+	its_send_mapd(its_dev, 0);
+	its_free_device(its_dev);
+}
+
 static struct msi_domain_ops its_msi_domain_ops = {
 	.msi_prepare	= its_msi_prepare,
+	.msi_teardown	= its_msi_teardown,
 };
 
 static int its_irq_gic_domain_alloc(struct irq_domain *domain,
@@ -3722,7 +3747,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 {
 	struct irq_data *d = irq_domain_get_irq_data(domain, virq);
 	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
-	struct its_node *its = its_dev->its;
 	int i;
 
 	bitmap_release_region(its_dev->event_map.lpi_map,
@@ -3736,26 +3760,6 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 		irq_domain_reset_irq_data(data);
 	}
 
-	mutex_lock(&its->dev_alloc_lock);
-
-	/*
-	 * If all interrupts have been freed, start mopping the
-	 * floor. This is conditioned on the device not being shared.
-	 */
-	if (!its_dev->shared &&
-	    bitmap_empty(its_dev->event_map.lpi_map,
-			 its_dev->event_map.nr_lpis)) {
-		its_lpi_free(its_dev->event_map.lpi_map,
-			     its_dev->event_map.lpi_base,
-			     its_dev->event_map.nr_lpis);
-
-		/* Unmap device/itt */
-		its_send_mapd(its_dev, 0);
-		its_free_device(its_dev);
-	}
-
-	mutex_unlock(&its->dev_alloc_lock);
-
 	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
 }
 
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index a3b34c3c599be..31378a2535fb9 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -795,8 +795,7 @@ static int msi_domain_ops_prepare(struct irq_domain *domain, struct device *dev,
 	return 0;
 }
 
-static void msi_domain_ops_teardown(struct irq_domain *domain,
-				    msi_alloc_info_t *arg)
+static void msi_domain_ops_teardown(struct irq_domain *domain, msi_alloc_info_t *arg)
 {
 }
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 1/5] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 2/5] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
@ 2025-05-13 16:31 ` Marc Zyngier
  2025-06-03  8:22   ` Zenghui Yu
  2025-05-13 16:31 ` [PATCH v2 4/5] genirq/msi: Engage the .msi_teardown() callback on domain removal Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
  4 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

The current device MSI infrastructure is subtly broken, as it
will issue an .msi_prepare() callback into the MSI controller
driver every time it needs to allocate an MSI. That's pretty wrong,
as the contract (or unwarranted assumption, depending who you ask)
between the MSI controller and the core code is that .msi_prepare()
is called exactly once per device.

This leads to some subtle breakage in said MSI controller drivers,
as it gives the impression that there are multiple endpoints sharing
a bus identifier (RID in PCI parlance, DID for GICv3+). It implies
that whatever allocation the ITS driver (for example) has done on
behalf of these devices cannot be undone, as there is no way to
track the shared state. This is particularly bad for wire-MSI devices,
for which .msi_prepare() is called for. each. input. line.

To address this issue, move the call to .msi_prepare() to take place
at the point of irq domain allocation, which is the only place that
makes sense. The msi_alloc_info_t structure is made part of the
msi_domain_template, so that its life-cycle is that of the domain
as well.

Finally, the msi_info::alloc_data field is made to point at this
allocation tracking structure, ensuring that it is carried around
the block.

This is all pretty straightforward, except for the non-device-MSI
leftovers, which still have to call .msi_prepare() at the old
spot. One day...

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/linux/msi.h |  2 ++
 kernel/irq/msi.c    | 35 +++++++++++++++++++++++++++++++----
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 63c23003ec9b7..ba1c77a829a1c 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -516,12 +516,14 @@ struct msi_domain_info {
  * @chip:	Interrupt chip for this domain
  * @ops:	MSI domain ops
  * @info:	MSI domain info data
+ * @alloc_info:	MSI domain allocation data (arch specific)
  */
 struct msi_domain_template {
 	char			name[48];
 	struct irq_chip		chip;
 	struct msi_domain_ops	ops;
 	struct msi_domain_info	info;
+	msi_alloc_info_t	alloc_info;
 };
 
 /*
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 31378a2535fb9..07eb857efd15e 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -59,7 +59,8 @@ struct msi_ctrl {
 static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
 static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
 static inline int msi_sysfs_create_group(struct device *dev);
-
+static int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
+				   int nvec, msi_alloc_info_t *arg);
 
 /**
  * msi_alloc_desc - Allocate an initialized msi_desc
@@ -1023,6 +1024,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 	bundle->info.ops = &bundle->ops;
 	bundle->info.data = domain_data;
 	bundle->info.chip_data = chip_data;
+	bundle->info.alloc_data = &bundle->alloc_info;
 
 	pops = parent->msi_parent_ops;
 	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
@@ -1061,11 +1063,18 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
 	if (!domain)
 		return false;
 
+	domain->dev = dev;
+	dev->msi.data->__domains[domid].domain = domain;
+
+	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
+		dev->msi.data->__domains[domid].domain = NULL;
+		irq_domain_remove(domain);
+		return false;
+	}
+
 	/* @bundle and @fwnode_alloced are now in use. Prevent cleanup */
 	retain_and_null_ptr(bundle);
 	retain_and_null_ptr(fwnode_alloced);
-	domain->dev = dev;
-	dev->msi.data->__domains[domid].domain = domain;
 	return true;
 }
 
@@ -1232,6 +1241,24 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
 	return 0;
 }
 
+static int populate_alloc_info(struct irq_domain *domain, struct device *dev,
+			       unsigned int nirqs, msi_alloc_info_t *arg)
+{
+	struct msi_domain_info *info = domain->host_data;
+
+	/*
+	 * If the caller has provided a template alloc info, use that. Once
+	 * all users of msi_create_irq_domain() have been eliminated, this
+	 * should be the only source of allocation information, and the
+	 * prepare call below should be finally removed.
+	 */
+	if (!info->alloc_data)
+		return msi_domain_prepare_irqs(domain, dev, nirqs, arg);
+
+	*arg = *info->alloc_data;
+	return 0;
+}
+
 static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain,
 				   struct msi_ctrl *ctrl)
 {
@@ -1244,7 +1271,7 @@ static int __msi_domain_alloc_irqs(struct device *dev, struct irq_domain *domain
 	unsigned long idx;
 	int i, ret, virq;
 
-	ret = msi_domain_prepare_irqs(domain, dev, ctrl->nirqs, &arg);
+	ret = populate_alloc_info(domain, dev, ctrl->nirqs, &arg);
 	if (ret)
 		return ret;
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/5] genirq/msi: Engage the .msi_teardown() callback on domain removal
  2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
                   ` (2 preceding siblings ...)
  2025-05-13 16:31 ` [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
@ 2025-05-13 16:31 ` Marc Zyngier
  2025-05-13 16:31 ` [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
  4 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

Kindly inform the MSI driver that we are tearing down the domain,
providing the allocation context previously populated on domain
creation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 kernel/irq/msi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 07eb857efd15e..8f3d9f5ff1eb4 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1096,6 +1096,9 @@ void msi_remove_device_irq_domain(struct device *dev, unsigned int domid)
 
 	dev->msi.data->__domains[domid].domain = NULL;
 	info = domain->host_data;
+
+	info->ops->msi_teardown(domain, info->alloc_data);
+
 	if (irq_domain_is_msi_device(domain))
 		fwnode = domain->fwnode;
 	irq_domain_remove(domain);
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
                   ` (3 preceding siblings ...)
  2025-05-13 16:31 ` [PATCH v2 4/5] genirq/msi: Engage the .msi_teardown() callback on domain removal Marc Zyngier
@ 2025-05-13 16:31 ` Marc Zyngier
  2025-05-18 18:53   ` Thomas Gleixner
  4 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-05-13 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: Thomas Gleixner, Lorenzo Pieralisi, Sascha Bischoff,
	Timothy Hayes

Now that .msi_prepare() gets called at the right time and not
with semi-random parameters, remove the ugly hack that tried
to fix up the number of allocated vectors.

It is now correct by construction.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/irqchip/irq-gic-v3-its-msi-parent.c | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index 958736622fa57..6a5f64f120d4a 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -67,17 +67,6 @@ static int its_pci_msi_prepare(struct irq_domain *domain, struct device *dev,
 	/* ITS specific DeviceID, as the core ITS ignores dev. */
 	info->scratchpad[0].ul = pci_msi_domain_get_msi_rid(domain->parent, pdev);
 
-	/*
-	 * @domain->msi_domain_info->hwsize contains the size of the
-	 * MSI[-X] domain, but vector allocation happens one by one. This
-	 * needs some thought when MSI comes into play as the size of MSI
-	 * might be unknown at domain creation time and therefore set to
-	 * MSI_MAX_INDEX.
-	 */
-	msi_info = msi_get_domain_info(domain);
-	if (msi_info->hwsize > nvec)
-		nvec = msi_info->hwsize;
-
 	/*
 	 * Always allocate a power of 2, and special case device 0 for
 	 * broken systems where the DevID is not wired (and all devices
@@ -151,14 +140,6 @@ static int its_pmsi_prepare(struct irq_domain *domain, struct device *dev,
 	/* ITS specific DeviceID, as the core ITS ignores dev. */
 	info->scratchpad[0].ul = dev_id;
 
-	/*
-	 * @domain->msi_domain_info->hwsize contains the size of the device
-	 * domain, but vector allocation happens one by one.
-	 */
-	msi_info = msi_get_domain_info(domain);
-	if (msi_info->hwsize > nvec)
-		nvec = msi_info->hwsize;
-
 	/* Allocate at least 32 MSIs, and always as a power of 2 */
 	nvec = max_t(int, 32, roundup_pow_of_two(nvec));
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-13 16:31 ` [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
@ 2025-05-18 18:53   ` Thomas Gleixner
  2025-05-19 10:15     ` Marc Zyngier
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2025-05-18 18:53 UTC (permalink / raw)
  To: Marc Zyngier, linux-kernel, linux-arm-kernel
  Cc: Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

On Tue, May 13 2025 at 17:31, Marc Zyngier wrote:

> Now that .msi_prepare() gets called at the right time and not
> with semi-random parameters, remove the ugly hack that tried
> to fix up the number of allocated vectors.
>
> It is now correct by construction.

FWIW, while looking at something related, it occured to me that with
this change you can enable MSI_FLAG_PCI_MSIX_ALLOC_DYN now on GIC ITS.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-18 18:53   ` Thomas Gleixner
@ 2025-05-19 10:15     ` Marc Zyngier
  2025-05-19 12:16       ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-05-19 10:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Sun, 18 May 2025 19:53:42 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, May 13 2025 at 17:31, Marc Zyngier wrote:
> 
> > Now that .msi_prepare() gets called at the right time and not
> > with semi-random parameters, remove the ugly hack that tried
> > to fix up the number of allocated vectors.
> >
> > It is now correct by construction.
> 
> FWIW, while looking at something related, it occured to me that with
> this change you can enable MSI_FLAG_PCI_MSIX_ALLOC_DYN now on GIC ITS.

Maybe. It is rather unclear to me what this "dynamic allocation"
actually provides in terms of guarantees to the endpoint driver.

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-19 10:15     ` Marc Zyngier
@ 2025-05-19 12:16       ` Thomas Gleixner
  2025-05-19 14:28         ` Marc Zyngier
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Gleixner @ 2025-05-19 12:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, May 19 2025 at 11:15, Marc Zyngier wrote:
> On Sun, 18 May 2025 19:53:42 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>> On Tue, May 13 2025 at 17:31, Marc Zyngier wrote:
>> 
>> > Now that .msi_prepare() gets called at the right time and not
>> > with semi-random parameters, remove the ugly hack that tried
>> > to fix up the number of allocated vectors.
>> >
>> > It is now correct by construction.
>> 
>> FWIW, while looking at something related, it occured to me that with
>> this change you can enable MSI_FLAG_PCI_MSIX_ALLOC_DYN now on GIC ITS.
>
> Maybe. It is rather unclear to me what this "dynamic allocation"
> actually provides in terms of guarantees to the endpoint driver.

It allows the driver to avoid allocating a gazillion of interrupts
upfront during initialization. Instead it can allocate them on demand,
when e.g. a queue is initialized. Of course that means that such an
allocation can fail, but so can request_irq() and other things. I'm not
sure what you mean with guarantees here.

Thanks

        tglx


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-19 12:16       ` Thomas Gleixner
@ 2025-05-19 14:28         ` Marc Zyngier
  2025-05-21 15:29           ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-05-19 14:28 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, 19 May 2025 13:16:58 +0100,
Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Mon, May 19 2025 at 11:15, Marc Zyngier wrote:
> > On Sun, 18 May 2025 19:53:42 +0100,
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> 
> >> On Tue, May 13 2025 at 17:31, Marc Zyngier wrote:
> >> 
> >> > Now that .msi_prepare() gets called at the right time and not
> >> > with semi-random parameters, remove the ugly hack that tried
> >> > to fix up the number of allocated vectors.
> >> >
> >> > It is now correct by construction.
> >> 
> >> FWIW, while looking at something related, it occured to me that with
> >> this change you can enable MSI_FLAG_PCI_MSIX_ALLOC_DYN now on GIC ITS.
> >
> > Maybe. It is rather unclear to me what this "dynamic allocation"
> > actually provides in terms of guarantees to the endpoint driver.
> 
> It allows the driver to avoid allocating a gazillion of interrupts
> upfront during initialization. Instead it can allocate them on demand,
> when e.g. a queue is initialized. Of course that means that such an
> allocation can fail, but so can request_irq() and other things. I'm not
> sure what you mean with guarantees here.

What is the endpoint driver allowed to expect in terms of continuity
of allocation in the IRQ space? If this is solely limited to MSI-X,
then the answer probably is "none whatsoever", and the driver should
only manage the MSI descriptor index.

Can any other MSI-like mechanism end-up with multiple allocations and
require extra alignment/contiguity guarantees in the hwirq space, more
or less similar to what MultiMSI requires? Because that'd be much
harder to provide.

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call
  2025-05-19 14:28         ` Marc Zyngier
@ 2025-05-21 15:29           ` Thomas Gleixner
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2025-05-21 15:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Lorenzo Pieralisi,
	Sascha Bischoff, Timothy Hayes

On Mon, May 19 2025 at 15:28, Marc Zyngier wrote:
> On Mon, 19 May 2025 13:16:58 +0100,
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> > Maybe. It is rather unclear to me what this "dynamic allocation"
>> > actually provides in terms of guarantees to the endpoint driver.
>> 
>> It allows the driver to avoid allocating a gazillion of interrupts
>> upfront during initialization. Instead it can allocate them on demand,
>> when e.g. a queue is initialized. Of course that means that such an
>> allocation can fail, but so can request_irq() and other things. I'm not
>> sure what you mean with guarantees here.
>
> What is the endpoint driver allowed to expect in terms of continuity
> of allocation in the IRQ space? If this is solely limited to MSI-X,
> then the answer probably is "none whatsoever", and the driver should
> only manage the MSI descriptor index.
>
> Can any other MSI-like mechanism end-up with multiple allocations and
> require extra alignment/contiguity guarantees in the hwirq space, more
> or less similar to what MultiMSI requires? Because that'd be much
> harder to provide.

It's only relevant to MSI-X today. That's the only facility, which
actually provides an interface _if_ the underlying parent supports it.

static const struct msi_domain_template pci_msix_template = {
       ....
	.info = {
		.flags			= MSI_COMMON_FLAGS | MSI_FLAG_PCI_MSIX |
					  MSI_FLAG_PCI_MSIX_ALLOC_DYN,
		.bus_token		= DOMAIN_BUS_PCI_DEVICE_MSIX,
	},
};

That's the device domain template, which requests the functionality and
the core then checks whether the parent domain supports it. If so the
functionality is enabled.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-05-13 16:31 ` [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
@ 2025-06-03  8:22   ` Zenghui Yu
  2025-06-03  9:35     ` Lorenzo Pieralisi
  2025-06-03 12:50     ` Marc Zyngier
  0 siblings, 2 replies; 22+ messages in thread
From: Zenghui Yu @ 2025-06-03  8:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

Hi Marc,

On 2025/5/14 0:31, Marc Zyngier wrote:
> The current device MSI infrastructure is subtly broken, as it
> will issue an .msi_prepare() callback into the MSI controller
> driver every time it needs to allocate an MSI. That's pretty wrong,
> as the contract (or unwarranted assumption, depending who you ask)
> between the MSI controller and the core code is that .msi_prepare()
> is called exactly once per device.
> 
> This leads to some subtle breakage in said MSI controller drivers,
> as it gives the impression that there are multiple endpoints sharing
> a bus identifier (RID in PCI parlance, DID for GICv3+). It implies
> that whatever allocation the ITS driver (for example) has done on
> behalf of these devices cannot be undone, as there is no way to
> track the shared state. This is particularly bad for wire-MSI devices,
> for which .msi_prepare() is called for. each. input. line.
> 
> To address this issue, move the call to .msi_prepare() to take place
> at the point of irq domain allocation, which is the only place that
> makes sense. The msi_alloc_info_t structure is made part of the
> msi_domain_template, so that its life-cycle is that of the domain
> as well.
> 
> Finally, the msi_info::alloc_data field is made to point at this
> allocation tracking structure, ensuring that it is carried around
> the block.
> 
> This is all pretty straightforward, except for the non-device-MSI
> leftovers, which still have to call .msi_prepare() at the old
> spot. One day...
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  include/linux/msi.h |  2 ++
>  kernel/irq/msi.c    | 35 +++++++++++++++++++++++++++++++----
>  2 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 63c23003ec9b7..ba1c77a829a1c 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -516,12 +516,14 @@ struct msi_domain_info {
>   * @chip:	Interrupt chip for this domain
>   * @ops:	MSI domain ops
>   * @info:	MSI domain info data
> + * @alloc_info:	MSI domain allocation data (arch specific)
>   */
>  struct msi_domain_template {
>  	char			name[48];
>  	struct irq_chip		chip;
>  	struct msi_domain_ops	ops;
>  	struct msi_domain_info	info;
> +	msi_alloc_info_t	alloc_info;
>  };
>  
>  /*
> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> index 31378a2535fb9..07eb857efd15e 100644
> --- a/kernel/irq/msi.c
> +++ b/kernel/irq/msi.c
> @@ -59,7 +59,8 @@ struct msi_ctrl {
>  static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
>  static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
>  static inline int msi_sysfs_create_group(struct device *dev);
> -
> +static int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
> +				   int nvec, msi_alloc_info_t *arg);
>  
>  /**
>   * msi_alloc_desc - Allocate an initialized msi_desc
> @@ -1023,6 +1024,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>  	bundle->info.ops = &bundle->ops;
>  	bundle->info.data = domain_data;
>  	bundle->info.chip_data = chip_data;
> +	bundle->info.alloc_data = &bundle->alloc_info;
>  
>  	pops = parent->msi_parent_ops;
>  	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
> @@ -1061,11 +1063,18 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
>  	if (!domain)
>  		return false;
>  
> +	domain->dev = dev;
> +	dev->msi.data->__domains[domid].domain = domain;
> +
> +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {

Does it work for MSI? hwsize is 1 in the MSI case, without taking
pci_msi_vec_count() into account.

bool pci_setup_msi_device_domain(struct pci_dev *pdev)
{
	[...]

	return pci_create_device_domain(pdev, &pci_msi_template, 1);

Thanks,
Zenghui


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03  8:22   ` Zenghui Yu
@ 2025-06-03  9:35     ` Lorenzo Pieralisi
  2025-06-03 13:09       ` Marc Zyngier
  2025-06-03 12:50     ` Marc Zyngier
  1 sibling, 1 reply; 22+ messages in thread
From: Lorenzo Pieralisi @ 2025-06-03  9:35 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: Marc Zyngier, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, Jun 03, 2025 at 04:22:47PM +0800, Zenghui Yu wrote:
> Hi Marc,
> 
> On 2025/5/14 0:31, Marc Zyngier wrote:
> > The current device MSI infrastructure is subtly broken, as it
> > will issue an .msi_prepare() callback into the MSI controller
> > driver every time it needs to allocate an MSI. That's pretty wrong,
> > as the contract (or unwarranted assumption, depending who you ask)
> > between the MSI controller and the core code is that .msi_prepare()
> > is called exactly once per device.
> > 
> > This leads to some subtle breakage in said MSI controller drivers,
> > as it gives the impression that there are multiple endpoints sharing
> > a bus identifier (RID in PCI parlance, DID for GICv3+). It implies
> > that whatever allocation the ITS driver (for example) has done on
> > behalf of these devices cannot be undone, as there is no way to
> > track the shared state. This is particularly bad for wire-MSI devices,
> > for which .msi_prepare() is called for. each. input. line.
> > 
> > To address this issue, move the call to .msi_prepare() to take place
> > at the point of irq domain allocation, which is the only place that
> > makes sense. The msi_alloc_info_t structure is made part of the
> > msi_domain_template, so that its life-cycle is that of the domain
> > as well.
> > 
> > Finally, the msi_info::alloc_data field is made to point at this
> > allocation tracking structure, ensuring that it is carried around
> > the block.
> > 
> > This is all pretty straightforward, except for the non-device-MSI
> > leftovers, which still have to call .msi_prepare() at the old
> > spot. One day...
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  include/linux/msi.h |  2 ++
> >  kernel/irq/msi.c    | 35 +++++++++++++++++++++++++++++++----
> >  2 files changed, 33 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index 63c23003ec9b7..ba1c77a829a1c 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -516,12 +516,14 @@ struct msi_domain_info {
> >   * @chip:	Interrupt chip for this domain
> >   * @ops:	MSI domain ops
> >   * @info:	MSI domain info data
> > + * @alloc_info:	MSI domain allocation data (arch specific)
> >   */
> >  struct msi_domain_template {
> >  	char			name[48];
> >  	struct irq_chip		chip;
> >  	struct msi_domain_ops	ops;
> >  	struct msi_domain_info	info;
> > +	msi_alloc_info_t	alloc_info;
> >  };
> >  
> >  /*
> > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
> > index 31378a2535fb9..07eb857efd15e 100644
> > --- a/kernel/irq/msi.c
> > +++ b/kernel/irq/msi.c
> > @@ -59,7 +59,8 @@ struct msi_ctrl {
> >  static void msi_domain_free_locked(struct device *dev, struct msi_ctrl *ctrl);
> >  static unsigned int msi_domain_get_hwsize(struct device *dev, unsigned int domid);
> >  static inline int msi_sysfs_create_group(struct device *dev);
> > -
> > +static int msi_domain_prepare_irqs(struct irq_domain *domain, struct device *dev,
> > +				   int nvec, msi_alloc_info_t *arg);
> >  
> >  /**
> >   * msi_alloc_desc - Allocate an initialized msi_desc
> > @@ -1023,6 +1024,7 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
> >  	bundle->info.ops = &bundle->ops;
> >  	bundle->info.data = domain_data;
> >  	bundle->info.chip_data = chip_data;
> > +	bundle->info.alloc_data = &bundle->alloc_info;
> >  
> >  	pops = parent->msi_parent_ops;
> >  	snprintf(bundle->name, sizeof(bundle->name), "%s%s-%s",
> > @@ -1061,11 +1063,18 @@ bool msi_create_device_irq_domain(struct device *dev, unsigned int domid,
> >  	if (!domain)
> >  		return false;
> >  
> > +	domain->dev = dev;
> > +	dev->msi.data->__domains[domid].domain = domain;
> > +
> > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> 
> Does it work for MSI?

This means that it does not work for MSI for you as it stands, right ?

If you spotted an issue, thanks for that, report it fully please.

> hwsize is 1 in the MSI case, without taking pci_msi_vec_count() into account.
> 
> bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> {
> 	[...]
> 
> 	return pci_create_device_domain(pdev, &pci_msi_template, 1);

I had a stab at it with GICv5 models and an MSI capable device and this indeed
calls the ITS msi_prepare() callback with 1 as vector count, so we size
the device tables wrongly.

The question is why pci_create_device_domain() is called here with
hwsize == 1. Probably, before this series, the ITS MSI parent code was
fixing the size up so we did not notice, I need to check.

Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03  8:22   ` Zenghui Yu
  2025-06-03  9:35     ` Lorenzo Pieralisi
@ 2025-06-03 12:50     ` Marc Zyngier
  2025-06-03 13:07       ` Zenghui Yu
  2025-06-03 13:29       ` Lorenzo Pieralisi
  1 sibling, 2 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-06-03 12:50 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

Hi Zenghui,

On Tue, 03 Jun 2025 09:22:47 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> > +	domain->dev = dev;
> > +	dev->msi.data->__domains[domid].domain = domain;
> > +
> > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> 
> Does it work for MSI? hwsize is 1 in the MSI case, without taking
> pci_msi_vec_count() into account.
>
> bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> {
> 	[...]
> 
> 	return pci_create_device_domain(pdev, &pci_msi_template, 1);

Well spotted.

This looks like a PCI bug ignoring Multi-MSI. Can you give the
following a go and let people know whether that fixes your issue?

Thanks,

	M.

diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index d7ba8795d60f..89677a21d525 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -287,7 +287,7 @@ static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_doma
  *	- The device is removed
  *	- MSI is disabled and a MSI-X domain is created
  */
-bool pci_setup_msi_device_domain(struct pci_dev *pdev)
+bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize)
 {
 	if (WARN_ON_ONCE(pdev->msix_enabled))
 		return false;
@@ -297,7 +297,7 @@ bool pci_setup_msi_device_domain(struct pci_dev *pdev)
 	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
 		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
 
-	return pci_create_device_domain(pdev, &pci_msi_template, 1);
+	return pci_create_device_domain(pdev, &pci_msi_template, hwsize);
 }
 
 /**
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 8b8848788618..81891701840a 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -449,7 +449,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (rc)
 		return rc;
 
-	if (!pci_setup_msi_device_domain(dev))
+	if (!pci_setup_msi_device_domain(dev, nvec))
 		return -ENODEV;
 
 	for (;;) {
diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index ee53cf079f4e..3ab898af88a7 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -107,7 +107,7 @@ enum support_mode {
 };
 
 bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
-bool pci_setup_msi_device_domain(struct pci_dev *pdev);
+bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize);
 bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
 
 /* Legacy (!IRQDOMAIN) fallbacks */

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 12:50     ` Marc Zyngier
@ 2025-06-03 13:07       ` Zenghui Yu
  2025-06-03 13:27         ` Marc Zyngier
  2025-06-03 13:29       ` Lorenzo Pieralisi
  1 sibling, 1 reply; 22+ messages in thread
From: Zenghui Yu @ 2025-06-03 13:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

Hi Marc,

On 2025/6/3 20:50, Marc Zyngier wrote:
> Hi Zenghui,
> 
> On Tue, 03 Jun 2025 09:22:47 +0100,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> >
> > > +	domain->dev = dev;
> > > +	dev->msi.data->__domains[domid].domain = domain;
> > > +
> > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> >
> > Does it work for MSI? hwsize is 1 in the MSI case, without taking
> > pci_msi_vec_count() into account.
> >
> > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > {
> > 	[...]
> >
> > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> 
> Well spotted.
> 
> This looks like a PCI bug ignoring Multi-MSI. Can you give the
> following a go and let people know whether that fixes your issue?

I hit this problem on Kunpeng920 with some HiSilicon SAS (Serial
Attached SCSI controller) on it. These controllers are MSI-capable and
didn't work after this commit.

# lspci -v -s 74:02.0
74:02.0 Serial Attached SCSI controller: Huawei Technologies Co., Ltd.
HiSilicon SAS 3.0 HBA (rev 21)
	Flags: bus master, fast devsel, latency 0, IRQ 42, NUMA node 0, IOMMU
group 27
	Memory at a2000000 (32-bit, non-prefetchable) [size=32K]
	Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [80] MSI: Enable+ Count=32/32 Maskable+ 64bit+
	Capabilities: [b0] Power Management version 3
	Kernel driver in use: hisi_sas_v3_hw

> diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> index d7ba8795d60f..89677a21d525 100644
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -287,7 +287,7 @@ static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_doma
>   *	- The device is removed
>   *	- MSI is disabled and a MSI-X domain is created
>   */
> -bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> +bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize)
>  {
>  	if (WARN_ON_ONCE(pdev->msix_enabled))
>  		return false;
> @@ -297,7 +297,7 @@ bool pci_setup_msi_device_domain(struct pci_dev *pdev)
>  	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
>  		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
>  
> -	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> +	return pci_create_device_domain(pdev, &pci_msi_template, hwsize);
>  }
>  
>  /**
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 8b8848788618..81891701840a 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -449,7 +449,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	if (rc)
>  		return rc;
>  
> -	if (!pci_setup_msi_device_domain(dev))
> +	if (!pci_setup_msi_device_domain(dev, nvec))
>  		return -ENODEV;
>  
>  	for (;;) {
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index ee53cf079f4e..3ab898af88a7 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -107,7 +107,7 @@ enum support_mode {
>  };
>  
>  bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
> -bool pci_setup_msi_device_domain(struct pci_dev *pdev);
> +bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize);
>  bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
>  
>  /* Legacy (!IRQDOMAIN) fallbacks */

I have the exact same diff to get my box to work again ;-)

Tested-by: Zenghui Yu <yuzenghui@huawei.com>

Thanks for your fix!

Zenghui


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03  9:35     ` Lorenzo Pieralisi
@ 2025-06-03 13:09       ` Marc Zyngier
  2025-06-03 14:37         ` Lorenzo Pieralisi
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-06-03 13:09 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Zenghui Yu, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, 03 Jun 2025 10:35:51 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> 
> On Tue, Jun 03, 2025 at 04:22:47PM +0800, Zenghui Yu wrote:
> > > +	domain->dev = dev;
> > > +	dev->msi.data->__domains[domid].domain = domain;
> > > +
> > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > 
> > Does it work for MSI?
> 
> This means that it does not work for MSI for you as it stands, right ?
> 
> If you spotted an issue, thanks for that, report it fully please.

Honestly, you're barking up the wrong tree. Zenghui points us to a
glaring bug in the core code, with detailed information on what could
go wrong, as well as what is wrong in the code. It doesn't get better
than that.

The usual level of bug reports is "its b0rken", sometimes followed by
a trace with lots of hex and no information. Spot the difference?

> 
> > hwsize is 1 in the MSI case, without taking pci_msi_vec_count() into account.
> > 
> > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > {
> > 	[...]
> > 
> > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> 
> I had a stab at it with GICv5 models and an MSI capable device and this indeed
> calls the ITS msi_prepare() callback with 1 as vector count, so we size
> the device tables wrongly.

Not wrongly. Exactly as instructed.

> 
> The question is why pci_create_device_domain() is called here with
> hwsize == 1. Probably, before this series, the ITS MSI parent code was
> fixing the size up so we did not notice, I need to check.

The GICv3 ITS code would upgrade the vector count to the next power of
two (one bit of EID space -> 2 MSIs), but with the device domain
squarely set to 1, the endpoint driver would never get more. It is
prepared to fail gracefully though, hence nothing really breaks.

I don't think this patch makes anything regress though. Commit
15c72f824b327 seems to be the offending one. If Zenghui confirms that
the hack I posted separately works for him, I'll follow up with a
"real" patch.

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 13:07       ` Zenghui Yu
@ 2025-06-03 13:27         ` Marc Zyngier
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-06-03 13:27 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Lorenzo Pieralisi, Sascha Bischoff, Timothy Hayes

On Tue, 03 Jun 2025 14:07:04 +0100,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2025/6/3 20:50, Marc Zyngier wrote:
> > Hi Zenghui,
> > 
> > On Tue, 03 Jun 2025 09:22:47 +0100,
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > >
> > > > +	domain->dev = dev;
> > > > +	dev->msi.data->__domains[domid].domain = domain;
> > > > +
> > > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > >
> > > Does it work for MSI? hwsize is 1 in the MSI case, without taking
> > > pci_msi_vec_count() into account.
> > >
> > > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > > {
> > > 	[...]
> > >
> > > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> > 
> > Well spotted.
> > 
> > This looks like a PCI bug ignoring Multi-MSI. Can you give the
> > following a go and let people know whether that fixes your issue?
> 
> I hit this problem on Kunpeng920 with some HiSilicon SAS (Serial
> Attached SCSI controller) on it. These controllers are MSI-capable and
> didn't work after this commit.

That's interesting. It means that the sizing of the irqdomain to 1
wasn't getting in the way, even if that was obviously wrong. The funny
thing is that the msi_desc would still be OK, as we only have one for
MSI, no matter how many vectors are used.

>
> I have the exact same diff to get my box to work again ;-)

Ah ;-)

>
> Tested-by: Zenghui Yu <yuzenghui@huawei.com>
> 
> Thanks for your fix!

Thank you!

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 12:50     ` Marc Zyngier
  2025-06-03 13:07       ` Zenghui Yu
@ 2025-06-03 13:29       ` Lorenzo Pieralisi
  2025-06-03 14:03         ` Marc Zyngier
  1 sibling, 1 reply; 22+ messages in thread
From: Lorenzo Pieralisi @ 2025-06-03 13:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Zenghui Yu, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, Jun 03, 2025 at 01:50:47PM +0100, Marc Zyngier wrote:
> Hi Zenghui,
> 
> On Tue, 03 Jun 2025 09:22:47 +0100,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> > 
> > > +	domain->dev = dev;
> > > +	dev->msi.data->__domains[domid].domain = domain;
> > > +
> > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > 
> > Does it work for MSI? hwsize is 1 in the MSI case, without taking
> > pci_msi_vec_count() into account.
> >
> > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > {
> > 	[...]
> > 
> > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> 
> Well spotted.
> 
> This looks like a PCI bug ignoring Multi-MSI. Can you give the
> following a go and let people know whether that fixes your issue?
> 
> Thanks,
> 
> 	M.
> 
> diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> index d7ba8795d60f..89677a21d525 100644
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -287,7 +287,7 @@ static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_doma
>   *	- The device is removed
>   *	- MSI is disabled and a MSI-X domain is created
>   */
> -bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> +bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize)
>  {
>  	if (WARN_ON_ONCE(pdev->msix_enabled))
>  		return false;
> @@ -297,7 +297,7 @@ bool pci_setup_msi_device_domain(struct pci_dev *pdev)
>  	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
>  		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
>  
> -	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> +	return pci_create_device_domain(pdev, &pci_msi_template, hwsize);
>  }
>  
>  /**
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 8b8848788618..81891701840a 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -449,7 +449,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	if (rc)
>  		return rc;
>  
> -	if (!pci_setup_msi_device_domain(dev))
> +	if (!pci_setup_msi_device_domain(dev, nvec))

If pci_msi_vec_count(dev) > maxvec we would cap nvec and size the
domain with the capped value.

In __pci_enable_msix_range() we are sizing the device according to
pci_msix_vec_count(dev) regardless of maxvec, if I read the code correctly.

While fixing it it would be good to make them consistent unless there is
a reason why they should not.

Lorenzo

>  		return -ENODEV;
>  
>  	for (;;) {
> diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
> index ee53cf079f4e..3ab898af88a7 100644
> --- a/drivers/pci/msi/msi.h
> +++ b/drivers/pci/msi/msi.h
> @@ -107,7 +107,7 @@ enum support_mode {
>  };
>  
>  bool pci_msi_domain_supports(struct pci_dev *dev, unsigned int feature_mask, enum support_mode mode);
> -bool pci_setup_msi_device_domain(struct pci_dev *pdev);
> +bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize);
>  bool pci_setup_msix_device_domain(struct pci_dev *pdev, unsigned int hwsize);
>  
>  /* Legacy (!IRQDOMAIN) fallbacks */
> 
> -- 
> Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 13:29       ` Lorenzo Pieralisi
@ 2025-06-03 14:03         ` Marc Zyngier
  2025-06-03 14:08           ` Marc Zyngier
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2025-06-03 14:03 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Zenghui Yu, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, 03 Jun 2025 14:29:58 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> 
> On Tue, Jun 03, 2025 at 01:50:47PM +0100, Marc Zyngier wrote:
> > Hi Zenghui,
> > 
> > On Tue, 03 Jun 2025 09:22:47 +0100,
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > > 
> > > > +	domain->dev = dev;
> > > > +	dev->msi.data->__domains[domid].domain = domain;
> > > > +
> > > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > > 
> > > Does it work for MSI? hwsize is 1 in the MSI case, without taking
> > > pci_msi_vec_count() into account.
> > >
> > > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > > {
> > > 	[...]
> > > 
> > > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> > 
> > Well spotted.
> > 
> > This looks like a PCI bug ignoring Multi-MSI. Can you give the
> > following a go and let people know whether that fixes your issue?
> > 
> > Thanks,
> > 
> > 	M.
> > 
> > diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> > index d7ba8795d60f..89677a21d525 100644
> > --- a/drivers/pci/msi/irqdomain.c
> > +++ b/drivers/pci/msi/irqdomain.c
> > @@ -287,7 +287,7 @@ static bool pci_create_device_domain(struct pci_dev *pdev, const struct msi_doma
> >   *	- The device is removed
> >   *	- MSI is disabled and a MSI-X domain is created
> >   */
> > -bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > +bool pci_setup_msi_device_domain(struct pci_dev *pdev, unsigned int hwsize)
> >  {
> >  	if (WARN_ON_ONCE(pdev->msix_enabled))
> >  		return false;
> > @@ -297,7 +297,7 @@ bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> >  	if (pci_match_device_domain(pdev, DOMAIN_BUS_PCI_DEVICE_MSIX))
> >  		msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
> >  
> > -	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> > +	return pci_create_device_domain(pdev, &pci_msi_template, hwsize);
> >  }
> >  
> >  /**
> > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > index 8b8848788618..81891701840a 100644
> > --- a/drivers/pci/msi/msi.c
> > +++ b/drivers/pci/msi/msi.c
> > @@ -449,7 +449,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> >  	if (rc)
> >  		return rc;
> >  
> > -	if (!pci_setup_msi_device_domain(dev))
> > +	if (!pci_setup_msi_device_domain(dev, nvec))
> 
> If pci_msi_vec_count(dev) > maxvec we would cap nvec and size the
> domain with the capped value.
> 
> In __pci_enable_msix_range() we are sizing the device according to
> pci_msix_vec_count(dev) regardless of maxvec, if I read the code correctly.
> 
> While fixing it it would be good to make them consistent unless there is
> a reason why they should not.

This is indeed odd, but that'd be a separate fix. Something like:

diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 2090eef64b14..6ede55a7c5e6 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -439,9 +439,6 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (nvec < minvec)
 		return -ENOSPC;
 
-	if (nvec > maxvec)
-		nvec = maxvec;
-
 	rc = pci_setup_msi_context(dev);
 	if (rc)
 		return rc;
@@ -449,6 +446,9 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (!pci_setup_msi_device_domain(dev, nvec))
 		return -ENODEV;
 
+	if (nvec > maxvec)
+		nvec = maxvec;
+
 	for (;;) {
 		if (affd) {
 			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 14:03         ` Marc Zyngier
@ 2025-06-03 14:08           ` Marc Zyngier
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2025-06-03 14:08 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Zenghui Yu, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, 03 Jun 2025 15:03:26 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Tue, 03 Jun 2025 14:29:58 +0100,
> Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> > 
> > > +	if (!pci_setup_msi_device_domain(dev, nvec))
> > 
> > If pci_msi_vec_count(dev) > maxvec we would cap nvec and size the
> > domain with the capped value.
> > 
> > In __pci_enable_msix_range() we are sizing the device according to
> > pci_msix_vec_count(dev) regardless of maxvec, if I read the code correctly.
> > 
> > While fixing it it would be good to make them consistent unless there is
> > a reason why they should not.
> 
> This is indeed odd, but that'd be a separate fix. Something like:
> 
> diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> index 2090eef64b14..6ede55a7c5e6 100644
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -439,9 +439,6 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	if (nvec < minvec)
>  		return -ENOSPC;
>  
> -	if (nvec > maxvec)
> -		nvec = maxvec;
> -
>  	rc = pci_setup_msi_context(dev);
>  	if (rc)
>  		return rc;
> @@ -449,6 +446,9 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
>  	if (!pci_setup_msi_device_domain(dev, nvec))
>  		return -ENODEV;
>  
> +	if (nvec > maxvec)
> +		nvec = maxvec;
> +
>  	for (;;) {
>  		if (affd) {
>  			nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
> 

Actually, let's lump the two together. There is no reason to wait.

	M.

-- 
Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 13:09       ` Marc Zyngier
@ 2025-06-03 14:37         ` Lorenzo Pieralisi
  2025-06-04  1:52           ` Zenghui Yu
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Pieralisi @ 2025-06-03 14:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Zenghui Yu, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On Tue, Jun 03, 2025 at 02:09:50PM +0100, Marc Zyngier wrote:
> On Tue, 03 Jun 2025 10:35:51 +0100,
> Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> > 
> > On Tue, Jun 03, 2025 at 04:22:47PM +0800, Zenghui Yu wrote:
> > > > +	domain->dev = dev;
> > > > +	dev->msi.data->__domains[domid].domain = domain;
> > > > +
> > > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > > 
> > > Does it work for MSI?
> > 
> > This means that it does not work for MSI for you as it stands, right ?
> > 
> > If you spotted an issue, thanks for that, report it fully please.
> 
> Honestly, you're barking up the wrong tree. Zenghui points us to a
> glaring bug in the core code, with detailed information on what could
> go wrong, as well as what is wrong in the code. It doesn't get better
> than that.
> 
> The usual level of bug reports is "its b0rken", sometimes followed by
> a trace with lots of hex and no information. Spot the difference?

Agreed, thanks again Zenghui for reporting it and forgive me if the
message sounded a bit patronizing, I did not mean it.

Lorenzo

> > > hwsize is 1 in the MSI case, without taking pci_msi_vec_count() into account.
> > > 
> > > bool pci_setup_msi_device_domain(struct pci_dev *pdev)
> > > {
> > > 	[...]
> > > 
> > > 	return pci_create_device_domain(pdev, &pci_msi_template, 1);
> > 
> > I had a stab at it with GICv5 models and an MSI capable device and this indeed
> > calls the ITS msi_prepare() callback with 1 as vector count, so we size
> > the device tables wrongly.
> 
> Not wrongly. Exactly as instructed.
> 
> > 
> > The question is why pci_create_device_domain() is called here with
> > hwsize == 1. Probably, before this series, the ITS MSI parent code was
> > fixing the size up so we did not notice, I need to check.
> 
> The GICv3 ITS code would upgrade the vector count to the next power of
> two (one bit of EID space -> 2 MSIs), but with the device domain
> squarely set to 1, the endpoint driver would never get more. It is
> prepared to fail gracefully though, hence nothing really breaks.
> 
> I don't think this patch makes anything regress though. Commit
> 15c72f824b327 seems to be the offending one. If Zenghui confirms that
> the hack I posted separately works for him, I'll follow up with a
> "real" patch.
> 
> 	M.
> 
> -- 
> Jazz isn't dead. It just smells funny.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation
  2025-06-03 14:37         ` Lorenzo Pieralisi
@ 2025-06-04  1:52           ` Zenghui Yu
  0 siblings, 0 replies; 22+ messages in thread
From: Zenghui Yu @ 2025-06-04  1:52 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Marc Zyngier, linux-kernel, linux-arm-kernel, Thomas Gleixner,
	Sascha Bischoff, Timothy Hayes

On 2025/6/3 22:37, Lorenzo Pieralisi wrote:
> On Tue, Jun 03, 2025 at 02:09:50PM +0100, Marc Zyngier wrote:
> > On Tue, 03 Jun 2025 10:35:51 +0100,
> > Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
> > >
> > > On Tue, Jun 03, 2025 at 04:22:47PM +0800, Zenghui Yu wrote:
> > > > > +	domain->dev = dev;
> > > > > +	dev->msi.data->__domains[domid].domain = domain;
> > > > > +
> > > > > +	if (msi_domain_prepare_irqs(domain, dev, hwsize, &bundle->alloc_info)) {
> > > >
> > > > Does it work for MSI?
> > >
> > > This means that it does not work for MSI for you as it stands, right ?
> > >
> > > If you spotted an issue, thanks for that, report it fully please.
> >
> > Honestly, you're barking up the wrong tree. Zenghui points us to a
> > glaring bug in the core code, with detailed information on what could
> > go wrong, as well as what is wrong in the code. It doesn't get better
> > than that.
> >
> > The usual level of bug reports is "its b0rken", sometimes followed by
> > a trace with lots of hex and no information. Spot the difference?
> 
> Agreed, thanks again Zenghui for reporting it and forgive me if the
> message sounded a bit patronizing, I did not mean it.

Never mind :-)

Zenghui


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-06-04  3:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-13 16:31 [PATCH v2 0/5] genirq/msi: Fix device MSI prepare/alloc sequencing Marc Zyngier
2025-05-13 16:31 ` [PATCH v2 1/5] genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() Marc Zyngier
2025-05-13 16:31 ` [PATCH v2 2/5] irqchip/gic-v3-its: Implement .msi_teardown() callback Marc Zyngier
2025-05-13 16:31 ` [PATCH v2 3/5] genirq/msi: Move prepare() call to per-device allocation Marc Zyngier
2025-06-03  8:22   ` Zenghui Yu
2025-06-03  9:35     ` Lorenzo Pieralisi
2025-06-03 13:09       ` Marc Zyngier
2025-06-03 14:37         ` Lorenzo Pieralisi
2025-06-04  1:52           ` Zenghui Yu
2025-06-03 12:50     ` Marc Zyngier
2025-06-03 13:07       ` Zenghui Yu
2025-06-03 13:27         ` Marc Zyngier
2025-06-03 13:29       ` Lorenzo Pieralisi
2025-06-03 14:03         ` Marc Zyngier
2025-06-03 14:08           ` Marc Zyngier
2025-05-13 16:31 ` [PATCH v2 4/5] genirq/msi: Engage the .msi_teardown() callback on domain removal Marc Zyngier
2025-05-13 16:31 ` [PATCH v2 5/5] irqchip/gic-v3-its: Use allocation size from the prepare call Marc Zyngier
2025-05-18 18:53   ` Thomas Gleixner
2025-05-19 10:15     ` Marc Zyngier
2025-05-19 12:16       ` Thomas Gleixner
2025-05-19 14:28         ` Marc Zyngier
2025-05-21 15:29           ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).