linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain
@ 2025-08-11  9:28 Nam Cao
  2025-08-11  9:28 ` [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers Nam Cao
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Nam Cao @ 2025-08-11  9:28 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Marc Zyngier, Gautam Menghani,
	linuxppc-dev, linux-kernel
  Cc: Nam Cao

The initial implementation of PCI/MSI interrupt domains in the hierarchical
interrupt domain model used a shortcut by providing a global PCI/MSI
domain.

This works because the PCI/MSI[X] hardware is standardized and uniform, but
it violates the basic design principle of hierarchical interrupt domains:
Each hardware block involved in the interrupt delivery chain should have a
separate interrupt domain.

For PCI/MSI[X], the interrupt controller is per PCI device and not a global
made-up entity.

Unsurprisingly, the shortcut turned out to have downsides as it does not
allow dynamic allocation of interrupt vectors after initialization and it
prevents supporting IMS on PCI. For further details, see:

https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de/

The solution is implementing per device MSI domains, this means the
entities which provide global PCI/MSI domain so far have to implement MSI
parent domain functionality instead.

This series:

   - Untangle XIVE driver from Powernv and Pseries drivers

   - Convert the Powernv and Pseries drivers to implement MSI parent domain
     functionality

v2: Fix up boot issue with NVMe

Nam Cao (3):
  powerpc/xive: Untangle xive from child interrupt controller drivers
  powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain()
  powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()

 arch/powerpc/include/asm/pci-bridge.h     |   2 -
 arch/powerpc/include/asm/xive.h           |   1 -
 arch/powerpc/platforms/powernv/Kconfig    |   1 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  96 ++++++----------
 arch/powerpc/platforms/pseries/Kconfig    |   1 +
 arch/powerpc/platforms/pseries/msi.c      | 132 ++++++++--------------
 arch/powerpc/sysdev/xive/common.c         |  63 +++++------
 7 files changed, 117 insertions(+), 179 deletions(-)

-- 
2.39.5



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers
  2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
@ 2025-08-11  9:28 ` Nam Cao
  2025-10-07  3:24   ` Ritesh Harjani
  2025-08-11  9:28 ` [PATCH v2 2/3] powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain() Nam Cao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Nam Cao @ 2025-08-11  9:28 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Marc Zyngier, Gautam Menghani,
	linuxppc-dev, linux-kernel
  Cc: Nam Cao

xive-specific data is stored in handler_data. This creates a mess, as xive
has to rely on child interrupt controller drivers to clean up this data, as
was done by 9a014f45688 ("powerpc/pseries/pci: Add a msi_free() handler to
clear XIVE data").

Instead, store xive-specific data in chip_data and untangle the child
drivers.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2: no change
---
 arch/powerpc/include/asm/xive.h           |  1 -
 arch/powerpc/platforms/powernv/pci-ioda.c | 21 +-------
 arch/powerpc/platforms/pseries/msi.c      | 18 +------
 arch/powerpc/sysdev/xive/common.c         | 63 +++++++++++------------
 4 files changed, 33 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
index 92930b0b5d0e..efb0f5effcc6 100644
--- a/arch/powerpc/include/asm/xive.h
+++ b/arch/powerpc/include/asm/xive.h
@@ -111,7 +111,6 @@ void xive_native_free_vp_block(u32 vp_base);
 int xive_native_populate_irq_data(u32 hw_irq,
 				  struct xive_irq_data *data);
 void xive_cleanup_irq_data(struct xive_irq_data *xd);
-void xive_irq_free_data(unsigned int virq);
 void xive_native_free_irq(u32 irq);
 int xive_native_configure_irq(u32 hw_irq, u32 target, u8 prio, u32 sw_irq);
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8ccf2c9b98a..fb37098d4d58 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -37,7 +37,6 @@
 #include <asm/firmware.h>
 #include <asm/pnv-pci.h>
 #include <asm/mmzone.h>
-#include <asm/xive.h>
 
 #include "powernv.h"
 #include "pci.h"
@@ -1707,23 +1706,6 @@ static int __pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 	return 0;
 }
 
-/*
- * The msi_free() op is called before irq_domain_free_irqs_top() when
- * the handler data is still available. Use that to clear the XIVE
- * controller.
- */
-static void pnv_msi_ops_msi_free(struct irq_domain *domain,
-				 struct msi_domain_info *info,
-				 unsigned int irq)
-{
-	if (xive_enabled())
-		xive_irq_free_data(irq);
-}
-
-static struct msi_domain_ops pnv_pci_msi_domain_ops = {
-	.msi_free	= pnv_msi_ops_msi_free,
-};
-
 static void pnv_msi_shutdown(struct irq_data *d)
 {
 	d = d->parent_data;
@@ -1754,7 +1736,6 @@ static struct irq_chip pnv_pci_msi_irq_chip = {
 static struct msi_domain_info pnv_msi_domain_info = {
 	.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
 		  MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX),
-	.ops   = &pnv_pci_msi_domain_ops,
 	.chip  = &pnv_pci_msi_irq_chip,
 };
 
@@ -1870,7 +1851,7 @@ static void pnv_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 		 virq, d->hwirq, nr_irqs);
 
 	msi_bitmap_free_hwirqs(&phb->msi_bmp, d->hwirq, nr_irqs);
-	/* XIVE domain is cleared through ->msi_free() */
+	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
 }
 
 static const struct irq_domain_ops pnv_irq_domain_ops = {
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index ee1c8c6898a3..10712477938e 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -15,7 +15,6 @@
 #include <asm/hw_irq.h>
 #include <asm/ppc-pci.h>
 #include <asm/machdep.h>
-#include <asm/xive.h>
 
 #include "pseries.h"
 
@@ -436,19 +435,6 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
 	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }
 
-/*
- * ->msi_free() is called before irq_domain_free_irqs_top() when the
- * handler data is still available. Use that to clear the XIVE
- * controller data.
- */
-static void pseries_msi_ops_msi_free(struct irq_domain *domain,
-				     struct msi_domain_info *info,
-				     unsigned int irq)
-{
-	if (xive_enabled())
-		xive_irq_free_data(irq);
-}
-
 /*
  * RTAS can not disable one MSI at a time. It's all or nothing. Do it
  * at the end after all IRQs have been freed.
@@ -463,7 +449,6 @@ static void pseries_msi_post_free(struct irq_domain *domain, struct device *dev)
 
 static struct msi_domain_ops pseries_pci_msi_domain_ops = {
 	.msi_prepare	= pseries_msi_ops_prepare,
-	.msi_free	= pseries_msi_ops_msi_free,
 	.msi_post_free	= pseries_msi_post_free,
 };
 
@@ -604,8 +589,7 @@ static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq
 	struct pci_controller *phb = irq_data_get_irq_chip_data(d);
 
 	pr_debug("%s bridge %pOF %d #%d\n", __func__, phb->dn, virq, nr_irqs);
-
-	/* XIVE domain data is cleared through ->msi_free() */
+	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
 }
 
 static const struct irq_domain_ops pseries_irq_domain_ops = {
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index f10592405024..625361a15424 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -317,7 +317,7 @@ int xmon_xive_get_irq_config(u32 hw_irq, struct irq_data *d)
 	if (d) {
 		char buffer[128];
 
-		xive_irq_data_dump(irq_data_get_irq_handler_data(d),
+		xive_irq_data_dump(irq_data_get_irq_chip_data(d),
 				   buffer, sizeof(buffer));
 		xmon_printf("%s", buffer);
 	}
@@ -437,7 +437,7 @@ static void xive_do_source_eoi(struct xive_irq_data *xd)
 /* irq_chip eoi callback, called with irq descriptor lock held */
 static void xive_irq_eoi(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	struct xive_cpu *xc = __this_cpu_read(xive_cpu);
 
 	DBG_VERBOSE("eoi_irq: irq=%d [0x%lx] pending=%02x\n",
@@ -595,7 +595,7 @@ static int xive_pick_irq_target(struct irq_data *d,
 				const struct cpumask *affinity)
 {
 	static unsigned int fuzz;
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	cpumask_var_t mask;
 	int cpu = -1;
 
@@ -628,7 +628,7 @@ static int xive_pick_irq_target(struct irq_data *d,
 
 static unsigned int xive_irq_startup(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
 	int target, rc;
 
@@ -673,7 +673,7 @@ static unsigned int xive_irq_startup(struct irq_data *d)
 /* called with irq descriptor lock held */
 static void xive_irq_shutdown(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
 
 	pr_debug("%s: irq %d [0x%x] data @%p\n", __func__, d->irq, hw_irq, d);
@@ -698,7 +698,7 @@ static void xive_irq_shutdown(struct irq_data *d)
 
 static void xive_irq_unmask(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 
 	pr_debug("%s: irq %d data @%p\n", __func__, d->irq, xd);
 
@@ -707,7 +707,7 @@ static void xive_irq_unmask(struct irq_data *d)
 
 static void xive_irq_mask(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 
 	pr_debug("%s: irq %d data @%p\n", __func__, d->irq, xd);
 
@@ -718,7 +718,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 				 const struct cpumask *cpumask,
 				 bool force)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
 	u32 target, old_target;
 	int rc = 0;
@@ -776,7 +776,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 
 static int xive_irq_set_type(struct irq_data *d, unsigned int flow_type)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 
 	/*
 	 * We only support these. This has really no effect other than setting
@@ -815,7 +815,7 @@ static int xive_irq_set_type(struct irq_data *d, unsigned int flow_type)
 
 static int xive_irq_retrigger(struct irq_data *d)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 
 	/* This should be only for MSIs */
 	if (WARN_ON(xd->flags & XIVE_IRQ_FLAG_LSI))
@@ -837,7 +837,7 @@ static int xive_irq_retrigger(struct irq_data *d)
  */
 static int xive_irq_set_vcpu_affinity(struct irq_data *d, void *state)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);
 	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
 	int rc;
 	u8 pq;
@@ -951,7 +951,7 @@ static int xive_irq_set_vcpu_affinity(struct irq_data *d, void *state)
 static int xive_get_irqchip_state(struct irq_data *data,
 				  enum irqchip_irq_state which, bool *state)
 {
-	struct xive_irq_data *xd = irq_data_get_irq_handler_data(data);
+	struct xive_irq_data *xd = irq_data_get_irq_chip_data(data);
 	u8 pq;
 
 	switch (which) {
@@ -1011,21 +1011,20 @@ void xive_cleanup_irq_data(struct xive_irq_data *xd)
 }
 EXPORT_SYMBOL_GPL(xive_cleanup_irq_data);
 
-static int xive_irq_alloc_data(unsigned int virq, irq_hw_number_t hw)
+static struct xive_irq_data *xive_irq_alloc_data(unsigned int virq, irq_hw_number_t hw)
 {
 	struct xive_irq_data *xd;
 	int rc;
 
 	xd = kzalloc(sizeof(struct xive_irq_data), GFP_KERNEL);
 	if (!xd)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 	rc = xive_ops->populate_irq_data(hw, xd);
 	if (rc) {
 		kfree(xd);
-		return rc;
+		return ERR_PTR(rc);
 	}
 	xd->target = XIVE_INVALID_TARGET;
-	irq_set_handler_data(virq, xd);
 
 	/*
 	 * Turn OFF by default the interrupt being mapped. A side
@@ -1036,20 +1035,19 @@ static int xive_irq_alloc_data(unsigned int virq, irq_hw_number_t hw)
 	 */
 	xive_esb_read(xd, XIVE_ESB_SET_PQ_01);
 
-	return 0;
+	return xd;
 }
 
-void xive_irq_free_data(unsigned int virq)
+static void xive_irq_free_data(unsigned int virq)
 {
-	struct xive_irq_data *xd = irq_get_handler_data(virq);
+	struct xive_irq_data *xd = irq_get_chip_data(virq);
 
 	if (!xd)
 		return;
-	irq_set_handler_data(virq, NULL);
+	irq_set_chip_data(virq, NULL);
 	xive_cleanup_irq_data(xd);
 	kfree(xd);
 }
-EXPORT_SYMBOL_GPL(xive_irq_free_data);
 
 #ifdef CONFIG_SMP
 
@@ -1286,7 +1284,7 @@ void __init xive_smp_probe(void)
 static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
 			       irq_hw_number_t hw)
 {
-	int rc;
+	struct xive_irq_data *xd;
 
 	/*
 	 * Mark interrupts as edge sensitive by default so that resend
@@ -1294,11 +1292,12 @@ static int xive_irq_domain_map(struct irq_domain *h, unsigned int virq,
 	 */
 	irq_clear_status_flags(virq, IRQ_LEVEL);
 
-	rc = xive_irq_alloc_data(virq, hw);
-	if (rc)
-		return rc;
+	xd = xive_irq_alloc_data(virq, hw);
+	if (IS_ERR(xd))
+		return PTR_ERR(xd);
 
 	irq_set_chip_and_handler(virq, &xive_irq_chip, handle_fasteoi_irq);
+	irq_set_chip_data(virq, xd);
 
 	return 0;
 }
@@ -1366,7 +1365,7 @@ static void xive_irq_domain_debug_show(struct seq_file *m, struct irq_domain *d,
 	seq_printf(m, "%*sXIVE:\n", ind, "");
 	ind++;
 
-	xd = irq_data_get_irq_handler_data(irqd);
+	xd = irq_data_get_irq_chip_data(irqd);
 	if (!xd) {
 		seq_printf(m, "%*snot assigned\n", ind, "");
 		return;
@@ -1403,6 +1402,7 @@ static int xive_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 				 unsigned int nr_irqs, void *arg)
 {
 	struct irq_fwspec *fwspec = arg;
+	struct xive_irq_data *xd;
 	irq_hw_number_t hwirq;
 	unsigned int type = IRQ_TYPE_NONE;
 	int i, rc;
@@ -1423,12 +1423,11 @@ static int xive_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 		irq_clear_status_flags(virq, IRQ_LEVEL);
 
 		/* allocates and sets handler data */
-		rc = xive_irq_alloc_data(virq + i, hwirq + i);
-		if (rc)
-			return rc;
+		xd = xive_irq_alloc_data(virq + i, hwirq + i);
+		if (IS_ERR(xd))
+			return PTR_ERR(xd);
 
-		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i,
-					      &xive_irq_chip, domain->host_data);
+		irq_domain_set_hwirq_and_chip(domain, virq + i, hwirq + i, &xive_irq_chip, xd);
 		irq_set_handler(virq + i, handle_fasteoi_irq);
 	}
 
@@ -1764,7 +1763,7 @@ static void xive_debug_show_irq(struct seq_file *m, struct irq_data *d)
 	seq_printf(m, "IRQ 0x%08x : target=0x%x prio=%02x lirq=0x%x ",
 		   hw_irq, target, prio, lirq);
 
-	xive_irq_data_dump(irq_data_get_irq_handler_data(d), buffer, sizeof(buffer));
+	xive_irq_data_dump(irq_data_get_irq_chip_data(d), buffer, sizeof(buffer));
 	seq_puts(m, buffer);
 	seq_puts(m, "\n");
 }
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain()
  2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
  2025-08-11  9:28 ` [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers Nam Cao
@ 2025-08-11  9:28 ` Nam Cao
  2025-08-11  9:28 ` [PATCH v2 3/3] powerpc/pseries/msi: Switch to msi_create_parent_irq_domain() Nam Cao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Nam Cao @ 2025-08-11  9:28 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Marc Zyngier, Gautam Menghani,
	linuxppc-dev, linux-kernel
  Cc: Nam Cao

Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2: no change
---
 arch/powerpc/platforms/powernv/Kconfig    |  1 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 75 ++++++++++-------------
 2 files changed, 35 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 95d7ba73d43d..b5ad7c173ef0 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -9,6 +9,7 @@ config PPC_POWERNV
 	select PPC_P7_NAP
 	select FORCE_PCI
 	select PCI_MSI
+	select IRQ_MSI_LIB
 	select EPAPR_BOOT
 	select PPC_INDIRECT_PIO
 	select PPC_UDBG_16550
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index fb37098d4d58..c2a932c6e304 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -15,6 +15,7 @@
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/irq.h>
+#include <linux/irqchip/irq-msi-lib.h>
 #include <linux/io.h>
 #include <linux/msi.h>
 #include <linux/iommu.h>
@@ -1713,30 +1714,33 @@ static void pnv_msi_shutdown(struct irq_data *d)
 		d->chip->irq_shutdown(d);
 }
 
-static void pnv_msi_mask(struct irq_data *d)
+static bool pnv_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				  struct irq_domain *real_parent, struct msi_domain_info *info)
 {
-	pci_msi_mask_irq(d);
-	irq_chip_mask_parent(d);
-}
+	struct irq_chip *chip = info->chip;
 
-static void pnv_msi_unmask(struct irq_data *d)
-{
-	pci_msi_unmask_irq(d);
-	irq_chip_unmask_parent(d);
-}
+	if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+		return false;
 
-static struct irq_chip pnv_pci_msi_irq_chip = {
-	.name		= "PNV-PCI-MSI",
-	.irq_shutdown	= pnv_msi_shutdown,
-	.irq_mask	= pnv_msi_mask,
-	.irq_unmask	= pnv_msi_unmask,
-	.irq_eoi	= irq_chip_eoi_parent,
-};
+	chip->irq_shutdown = pnv_msi_shutdown;
+	return true;
+}
 
-static struct msi_domain_info pnv_msi_domain_info = {
-	.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-		  MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX),
-	.chip  = &pnv_pci_msi_irq_chip,
+#define PNV_PCI_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS		| \
+				    MSI_FLAG_USE_DEF_CHIP_OPS		| \
+				    MSI_FLAG_PCI_MSI_MASK_PARENT)
+#define PNV_PCI_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK		| \
+				     MSI_FLAG_PCI_MSIX			| \
+				     MSI_FLAG_MULTI_PCI_MSI)
+
+static const struct msi_parent_ops pnv_msi_parent_ops = {
+	.required_flags		= PNV_PCI_MSI_FLAGS_REQUIRED,
+	.supported_flags	= PNV_PCI_MSI_FLAGS_SUPPORTED,
+	.chip_flags		= MSI_CHIP_FLAG_SET_EOI,
+	.bus_select_token	= DOMAIN_BUS_NEXUS,
+	.bus_select_mask	= MATCH_PCI_MSI,
+	.prefix			= "PNV-",
+	.init_dev_msi_info	= pnv_init_dev_msi_info,
 };
 
 static void pnv_msi_compose_msg(struct irq_data *d, struct msi_msg *msg)
@@ -1855,37 +1859,26 @@ static void pnv_irq_domain_free(struct irq_domain *domain, unsigned int virq,
 }
 
 static const struct irq_domain_ops pnv_irq_domain_ops = {
+	.select	= msi_lib_irq_domain_select,
 	.alloc  = pnv_irq_domain_alloc,
 	.free   = pnv_irq_domain_free,
 };
 
 static int __init pnv_msi_allocate_domains(struct pci_controller *hose, unsigned int count)
 {
-	struct pnv_phb *phb = hose->private_data;
 	struct irq_domain *parent = irq_get_default_domain();
-
-	hose->fwnode = irq_domain_alloc_named_id_fwnode("PNV-MSI", phb->opal_id);
-	if (!hose->fwnode)
-		return -ENOMEM;
-
-	hose->dev_domain = irq_domain_create_hierarchy(parent, 0, count,
-						       hose->fwnode,
-						       &pnv_irq_domain_ops, hose);
+	struct irq_domain_info info = {
+		.fwnode		= of_fwnode_handle(hose->dn),
+		.ops		= &pnv_irq_domain_ops,
+		.host_data	= hose,
+		.size		= count,
+		.parent		= parent,
+	};
+
+	hose->dev_domain = msi_create_parent_irq_domain(&info, &pnv_msi_parent_ops);
 	if (!hose->dev_domain) {
-		pr_err("PCI: failed to create IRQ domain bridge %pOF (domain %d)\n",
-		       hose->dn, hose->global_number);
-		irq_domain_free_fwnode(hose->fwnode);
-		return -ENOMEM;
-	}
-
-	hose->msi_domain = pci_msi_create_irq_domain(of_fwnode_handle(hose->dn),
-						     &pnv_msi_domain_info,
-						     hose->dev_domain);
-	if (!hose->msi_domain) {
 		pr_err("PCI: failed to create MSI IRQ domain bridge %pOF (domain %d)\n",
 		       hose->dn, hose->global_number);
-		irq_domain_free_fwnode(hose->fwnode);
-		irq_domain_remove(hose->dev_domain);
 		return -ENOMEM;
 	}
 
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/3] powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()
  2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
  2025-08-11  9:28 ` [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers Nam Cao
  2025-08-11  9:28 ` [PATCH v2 2/3] powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain() Nam Cao
@ 2025-08-11  9:28 ` Nam Cao
  2025-09-03 14:07 ` [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Thomas Gleixner
  2025-09-12  3:56 ` Madhavan Srinivasan
  4 siblings, 0 replies; 10+ messages in thread
From: Nam Cao @ 2025-08-11  9:28 UTC (permalink / raw)
  To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Marc Zyngier, Gautam Menghani,
	linuxppc-dev, linux-kernel
  Cc: Nam Cao

Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2: Use msi_teardown() instead of msi_post_free() to fix boot issue with
    NVMe
---
 arch/powerpc/include/asm/pci-bridge.h  |   2 -
 arch/powerpc/platforms/pseries/Kconfig |   1 +
 arch/powerpc/platforms/pseries/msi.c   | 114 +++++++++++--------------
 3 files changed, 49 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 2aa3a091ef20..1dae53130782 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -133,8 +133,6 @@ struct pci_controller {
 
 	/* IRQ domain hierarchy */
 	struct irq_domain	*dev_domain;
-	struct irq_domain	*msi_domain;
-	struct fwnode_handle	*fwnode;
 
 	/* iommu_ops support */
 	struct iommu_device	iommu;
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index fa3c2fff082a..3e042218d6cd 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -7,6 +7,7 @@ config PPC_PSERIES
 	select OF_DYNAMIC
 	select FORCE_PCI
 	select PCI_MSI
+	select IRQ_MSI_LIB
 	select GENERIC_ALLOCATOR
 	select PPC_XICS
 	select PPC_XIVE_SPAPR
diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 10712477938e..7da142dd5baa 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -7,6 +7,7 @@
 #include <linux/crash_dump.h>
 #include <linux/device.h>
 #include <linux/irq.h>
+#include <linux/irqchip/irq-msi-lib.h>
 #include <linux/irqdomain.h>
 #include <linux/msi.h>
 #include <linux/seq_file.h>
@@ -429,8 +430,9 @@ static int rtas_prepare_msi_irqs(struct pci_dev *pdev, int nvec_in, int type,
 static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev,
 				   int nvec, msi_alloc_info_t *arg)
 {
+	struct msi_domain_info *info = domain->host_data;
 	struct pci_dev *pdev = to_pci_dev(dev);
-	int type = pdev->msix_enabled ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+	int type = (info->flags & MSI_FLAG_PCI_MSIX) ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
 
 	return rtas_prepare_msi_irqs(pdev, nvec, type, arg);
 }
@@ -439,19 +441,14 @@ static int pseries_msi_ops_prepare(struct irq_domain *domain, struct device *dev
  * RTAS can not disable one MSI at a time. It's all or nothing. Do it
  * at the end after all IRQs have been freed.
  */
-static void pseries_msi_post_free(struct irq_domain *domain, struct device *dev)
+static void pseries_msi_ops_teardown(struct irq_domain *domain, msi_alloc_info_t *arg)
 {
-	if (WARN_ON_ONCE(!dev_is_pci(dev)))
-		return;
+	struct msi_desc *desc = arg->desc;
+	struct pci_dev *pdev = msi_desc_to_pci_dev(desc);
 
-	rtas_disable_msi(to_pci_dev(dev));
+	rtas_disable_msi(pdev);
 }
 
-static struct msi_domain_ops pseries_pci_msi_domain_ops = {
-	.msi_prepare	= pseries_msi_ops_prepare,
-	.msi_post_free	= pseries_msi_post_free,
-};
-
 static void pseries_msi_shutdown(struct irq_data *d)
 {
 	d = d->parent_data;
@@ -459,18 +456,6 @@ static void pseries_msi_shutdown(struct irq_data *d)
 		d->chip->irq_shutdown(d);
 }
 
-static void pseries_msi_mask(struct irq_data *d)
-{
-	pci_msi_mask_irq(d);
-	irq_chip_mask_parent(d);
-}
-
-static void pseries_msi_unmask(struct irq_data *d)
-{
-	pci_msi_unmask_irq(d);
-	irq_chip_unmask_parent(d);
-}
-
 static void pseries_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct msi_desc *entry = irq_data_get_msi_desc(data);
@@ -485,27 +470,39 @@ static void pseries_msi_write_msg(struct irq_data *data, struct msi_msg *msg)
 	entry->msg = *msg;
 }
 
-static struct irq_chip pseries_pci_msi_irq_chip = {
-	.name		= "pSeries-PCI-MSI",
-	.irq_shutdown	= pseries_msi_shutdown,
-	.irq_mask	= pseries_msi_mask,
-	.irq_unmask	= pseries_msi_unmask,
-	.irq_eoi	= irq_chip_eoi_parent,
-	.irq_write_msi_msg	= pseries_msi_write_msg,
-};
+static bool pseries_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				      struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	struct irq_chip *chip = info->chip;
 
+	if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+		return false;
 
-/*
- * Set MSI_FLAG_MSIX_CONTIGUOUS as there is no way to express to
- * firmware to request a discontiguous or non-zero based range of
- * MSI-X entries. Core code will reject such setup attempts.
- */
-static struct msi_domain_info pseries_msi_domain_info = {
-	.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
-		  MSI_FLAG_MULTI_PCI_MSI  | MSI_FLAG_PCI_MSIX |
-		  MSI_FLAG_MSIX_CONTIGUOUS),
-	.ops   = &pseries_pci_msi_domain_ops,
-	.chip  = &pseries_pci_msi_irq_chip,
+	chip->irq_shutdown = pseries_msi_shutdown;
+	chip->irq_write_msi_msg	= pseries_msi_write_msg;
+
+	info->ops->msi_prepare = pseries_msi_ops_prepare;
+	info->ops->msi_teardown = pseries_msi_ops_teardown;
+
+	return true;
+}
+
+#define PSERIES_PCI_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS	| \
+					MSI_FLAG_USE_DEF_CHIP_OPS	| \
+					MSI_FLAG_PCI_MSI_MASK_PARENT)
+#define PSERIES_PCI_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK		| \
+					 MSI_FLAG_PCI_MSIX		| \
+					 MSI_FLAG_MSIX_CONTIGUOUS	| \
+					 MSI_FLAG_MULTI_PCI_MSI)
+
+static const struct msi_parent_ops pseries_msi_parent_ops = {
+	.required_flags		= PSERIES_PCI_MSI_FLAGS_REQUIRED,
+	.supported_flags	= PSERIES_PCI_MSI_FLAGS_SUPPORTED,
+	.chip_flags		= MSI_CHIP_FLAG_SET_EOI,
+	.bus_select_token	= DOMAIN_BUS_NEXUS,
+	.bus_select_mask	= MATCH_PCI_MSI,
+	.prefix			= "pSeries-",
+	.init_dev_msi_info	= pseries_init_dev_msi_info,
 };
 
 static void pseries_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
@@ -593,6 +590,7 @@ static void pseries_irq_domain_free(struct irq_domain *domain, unsigned int virq
 }
 
 static const struct irq_domain_ops pseries_irq_domain_ops = {
+	.select	= msi_lib_irq_domain_select,
 	.alloc  = pseries_irq_domain_alloc,
 	.free   = pseries_irq_domain_free,
 };
@@ -601,30 +599,18 @@ static int __pseries_msi_allocate_domains(struct pci_controller *phb,
 					  unsigned int count)
 {
 	struct irq_domain *parent = irq_get_default_domain();
-
-	phb->fwnode = irq_domain_alloc_named_id_fwnode("pSeries-MSI",
-						       phb->global_number);
-	if (!phb->fwnode)
-		return -ENOMEM;
-
-	phb->dev_domain = irq_domain_create_hierarchy(parent, 0, count,
-						      phb->fwnode,
-						      &pseries_irq_domain_ops, phb);
+	struct irq_domain_info info = {
+		.fwnode		= of_fwnode_handle(phb->dn),
+		.ops		= &pseries_irq_domain_ops,
+		.host_data	= phb,
+		.size		= count,
+		.parent		= parent,
+	};
+
+	phb->dev_domain = msi_create_parent_irq_domain(&info, &pseries_msi_parent_ops);
 	if (!phb->dev_domain) {
-		pr_err("PCI: failed to create IRQ domain bridge %pOF (domain %d)\n",
-		       phb->dn, phb->global_number);
-		irq_domain_free_fwnode(phb->fwnode);
-		return -ENOMEM;
-	}
-
-	phb->msi_domain = pci_msi_create_irq_domain(of_fwnode_handle(phb->dn),
-						    &pseries_msi_domain_info,
-						    phb->dev_domain);
-	if (!phb->msi_domain) {
 		pr_err("PCI: failed to create MSI IRQ domain bridge %pOF (domain %d)\n",
 		       phb->dn, phb->global_number);
-		irq_domain_free_fwnode(phb->fwnode);
-		irq_domain_remove(phb->dev_domain);
 		return -ENOMEM;
 	}
 
@@ -646,12 +632,8 @@ int pseries_msi_allocate_domains(struct pci_controller *phb)
 
 void pseries_msi_free_domains(struct pci_controller *phb)
 {
-	if (phb->msi_domain)
-		irq_domain_remove(phb->msi_domain);
 	if (phb->dev_domain)
 		irq_domain_remove(phb->dev_domain);
-	if (phb->fwnode)
-		irq_domain_free_fwnode(phb->fwnode);
 }
 
 static void rtas_msi_pci_irq_fixup(struct pci_dev *pdev)
-- 
2.39.5



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain
  2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
                   ` (2 preceding siblings ...)
  2025-08-11  9:28 ` [PATCH v2 3/3] powerpc/pseries/msi: Switch to msi_create_parent_irq_domain() Nam Cao
@ 2025-09-03 14:07 ` Thomas Gleixner
  2025-09-06 11:48   ` Madhavan Srinivasan
  2025-09-12  3:56 ` Madhavan Srinivasan
  4 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2025-09-03 14:07 UTC (permalink / raw)
  To: Nam Cao, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Marc Zyngier, Gautam Menghani, linuxppc-dev,
	linux-kernel
  Cc: Nam Cao

On Mon, Aug 11 2025 at 11:28, Nam Cao wrote:

> The initial implementation of PCI/MSI interrupt domains in the hierarchical
> interrupt domain model used a shortcut by providing a global PCI/MSI
> domain.
>
> This works because the PCI/MSI[X] hardware is standardized and uniform, but
> it violates the basic design principle of hierarchical interrupt domains:
> Each hardware block involved in the interrupt delivery chain should have a
> separate interrupt domain.
>
> For PCI/MSI[X], the interrupt controller is per PCI device and not a global
> made-up entity.
>
> Unsurprisingly, the shortcut turned out to have downsides as it does not
> allow dynamic allocation of interrupt vectors after initialization and it
> prevents supporting IMS on PCI. For further details, see:
>
> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de/
>
> The solution is implementing per device MSI domains, this means the
> entities which provide global PCI/MSI domain so far have to implement MSI
> parent domain functionality instead.
>
> This series:
>
>    - Untangle XIVE driver from Powernv and Pseries drivers
>
>    - Convert the Powernv and Pseries drivers to implement MSI parent domain
>      functionality

Polite reminder to the PPC folks. Can we please get this moving so we
can finally cleanup the pci_msi_create_irq_domain() leftovers?

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain
  2025-09-03 14:07 ` [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Thomas Gleixner
@ 2025-09-06 11:48   ` Madhavan Srinivasan
  0 siblings, 0 replies; 10+ messages in thread
From: Madhavan Srinivasan @ 2025-09-06 11:48 UTC (permalink / raw)
  To: Thomas Gleixner, Nam Cao, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Marc Zyngier, Gautam Menghani, linuxppc-dev,
	linux-kernel



On 9/3/25 7:37 PM, Thomas Gleixner wrote:
> On Mon, Aug 11 2025 at 11:28, Nam Cao wrote:
> 
>> The initial implementation of PCI/MSI interrupt domains in the hierarchical
>> interrupt domain model used a shortcut by providing a global PCI/MSI
>> domain.
>>
>> This works because the PCI/MSI[X] hardware is standardized and uniform, but
>> it violates the basic design principle of hierarchical interrupt domains:
>> Each hardware block involved in the interrupt delivery chain should have a
>> separate interrupt domain.
>>
>> For PCI/MSI[X], the interrupt controller is per PCI device and not a global
>> made-up entity.
>>
>> Unsurprisingly, the shortcut turned out to have downsides as it does not
>> allow dynamic allocation of interrupt vectors after initialization and it
>> prevents supporting IMS on PCI. For further details, see:
>>
>> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de/
>>
>> The solution is implementing per device MSI domains, this means the
>> entities which provide global PCI/MSI domain so far have to implement MSI
>> parent domain functionality instead.
>>
>> This series:
>>
>>    - Untangle XIVE driver from Powernv and Pseries drivers
>>
>>    - Convert the Powernv and Pseries drivers to implement MSI parent domain
>>      functionality
> 
> Polite reminder to the PPC folks. Can we please get this moving so we
> can finally cleanup the pci_msi_create_irq_domain() leftovers?

Yes, sorry for the delay,
Will add this to my next after the test

Maddy

> 
> Thanks,
> 
>         tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain
  2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
                   ` (3 preceding siblings ...)
  2025-09-03 14:07 ` [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Thomas Gleixner
@ 2025-09-12  3:56 ` Madhavan Srinivasan
  4 siblings, 0 replies; 10+ messages in thread
From: Madhavan Srinivasan @ 2025-09-12  3:56 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Thomas Gleixner, Marc Zyngier, Gautam Menghani, linuxppc-dev,
	linux-kernel, Nam Cao

On Mon, 11 Aug 2025 11:28:53 +0200, Nam Cao wrote:
> The initial implementation of PCI/MSI interrupt domains in the hierarchical
> interrupt domain model used a shortcut by providing a global PCI/MSI
> domain.
> 
> This works because the PCI/MSI[X] hardware is standardized and uniform, but
> it violates the basic design principle of hierarchical interrupt domains:
> Each hardware block involved in the interrupt delivery chain should have a
> separate interrupt domain.
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/xive: Untangle xive from child interrupt controller drivers
      https://git.kernel.org/powerpc/c/cc0cc23babc979e399f34f53e4bccf702a389558
[2/3] powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain()
      https://git.kernel.org/powerpc/c/f0ac60e6e311062f1a452d93376055787db4b070
[3/3] powerpc/pseries/msi: Switch to msi_create_parent_irq_domain()
      https://git.kernel.org/powerpc/c/daaa574aba6f9c683408b58a7ab2dc775ece2f98

Thanks


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers
  2025-08-11  9:28 ` [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers Nam Cao
@ 2025-10-07  3:24   ` Ritesh Harjani
  2025-10-07  5:24     ` Nam Cao
  0 siblings, 1 reply; 10+ messages in thread
From: Ritesh Harjani @ 2025-10-07  3:24 UTC (permalink / raw)
  To: Nam Cao, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Thomas Gleixner, Marc Zyngier, Gautam Menghani,
	linuxppc-dev, linux-kernel
  Cc: Nam Cao

Nam Cao <namcao@linutronix.de> writes:

> xive-specific data is stored in handler_data. This creates a mess, as xive
> has to rely on child interrupt controller drivers to clean up this data, as
> was done by 9a014f45688 ("powerpc/pseries/pci: Add a msi_free() handler to
> clear XIVE data").
>
> Instead, store xive-specific data in chip_data and untangle the child
> drivers.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
> v2: no change
> ---
>  arch/powerpc/include/asm/xive.h           |  1 -
>  arch/powerpc/platforms/powernv/pci-ioda.c | 21 +-------
>  arch/powerpc/platforms/pseries/msi.c      | 18 +------
>  arch/powerpc/sysdev/xive/common.c         | 63 +++++++++++------------
>  4 files changed, 33 insertions(+), 70 deletions(-)


Hi Nam, 

I am facing kernel crash on host when trying to run kvm pseries guest on
powernv host. Looking it a bit more closely, I see that we are missing
conversion of xxx_irq_handler_data()) to xxx_irq_chip_data() at few other
places, including in powerpc KVM code. 

<Crash signature>

BUG: Kernel NULL pointer dereference on read at 0x00000010
Faulting instruction address: 0xc0000000001c0704
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=2048 NUMA PowerNV
CPU: 103 UID: 0 PID: 2742 Comm: qemu-system-ppc Not tainted 6.17.0-01737-g50c19e20ed2e #1 NONE
<...>
NIP:  c0000000001c0704 LR: c0000000001c06f8 CTR: 0000000000000000
REGS: c0000000627476a0 TRAP: 0300   Not tainted  (6.17.0-01737-g50c19e20ed2e)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 88044488  XER: 00000036
CFAR: c0000000002c20d8 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 
<...>
NIP [c0000000001c0704] kvmppc_xive_attach_escalation+0x174/0x240
LR [c0000000001c06f8] kvmppc_xive_attach_escalation+0x168/0x240
Call Trace:
  kvmppc_xive_attach_escalation+0x168/0x240 (unreliable)
  kvmppc_xive_connect_vcpu+0x2a0/0x4c0
  kvm_arch_vcpu_ioctl+0x354/0x470
  kvm_vcpu_ioctl+0x488/0x9a0
  sys_ioctl+0x4ec/0x1030
  system_call_exception+0x104/0x2b0
  system_call_vectored_common+0x15c/0x2ec



Here is the diff which fixed this.. 

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 1302b5ac5672..c029c6cc82ef 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -917,7 +917,7 @@ int kvmppc_xive_attach_escalation(struct kvm_vcpu *vcpu, u8 prio,
         */
        if (single_escalation) {
                struct irq_data *d = irq_get_irq_data(xc->esc_virq[prio]);
-               struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+               struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);

                xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
                vcpu->arch.xive_esc_raddr = xd->eoi_page;
@@ -1612,7 +1612,7 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,

        /* Grab info about irq */
        state->pt_number = hw_irq;
-       state->pt_data = irq_data_get_irq_handler_data(host_data);
+       state->pt_data = irq_data_get_irq_chip_data(host_data);

        /*
         * Configure the IRQ to match the existing configuration of
@@ -1788,7 +1788,7 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 void xive_cleanup_single_escalation(struct kvm_vcpu *vcpu, int irq)
 {
        struct irq_data *d = irq_get_irq_data(irq);
-       struct xive_irq_data *xd = irq_data_get_irq_handler_data(d);
+       struct xive_irq_data *xd = irq_data_get_irq_chip_data(d);

        /*
         * This slightly odd sequence gives the right result
@@ -2829,7 +2829,7 @@ int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
                if (xc->esc_virq[i]) {
                        struct irq_data *d = irq_get_irq_data(xc->esc_virq[i]);
                        struct xive_irq_data *xd =
-                               irq_data_get_irq_handler_data(d);
+                               irq_data_get_irq_chip_data(d);
                        u64 pq = xive_vm_esb_load(xd, XIVE_ESB_GET);

                        seq_printf(m, "    ESC %d %c%c EOI @%llx",


... However grepping for "handler_data" in arch/powerpc I see there is
atleast one more place where we may still need the fix.. There are few
more places which grep returned - but I am not sure if they all really need
the fix. But I guess VAS should be fixed i.e :

arch/powerpc/platforms/powernv/vas.c:   xd = irq_get_handler_data(vinst->virq);


Would you like to submit an official patch for converting these other places too?


Thanks!
-ritesh



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers
  2025-10-07  3:24   ` Ritesh Harjani
@ 2025-10-07  5:24     ` Nam Cao
  2025-10-07 15:08       ` Nam Cao
  0 siblings, 1 reply; 10+ messages in thread
From: Nam Cao @ 2025-10-07  5:24 UTC (permalink / raw)
  To: Ritesh Harjani, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Marc Zyngier,
	Gautam Menghani, linuxppc-dev, linux-kernel

Hi Ritesh,

Ritesh Harjani (IBM) <ritesh.list@gmail.com> writes:
> I am facing kernel crash on host when trying to run kvm pseries guest on
> powernv host. Looking it a bit more closely, I see that we are missing
> conversion of xxx_irq_handler_data()) to xxx_irq_chip_data() at few other
> places, including in powerpc KVM code. 
[snip]
> Here is the diff which fixed this.. 
[snip]
> ... However grepping for "handler_data" in arch/powerpc I see there is
> atleast one more place where we may still need the fix.. There are few
> more places which grep returned - but I am not sure if they all really need
> the fix. But I guess VAS should be fixed i.e :
>
> arch/powerpc/platforms/powernv/vas.c:   xd = irq_get_handler_data(vinst->virq);
>
> Would you like to submit an official patch for converting these other places too?

Thanks for the report. I didn't expect struct xive_irq_data to be used
in multiple files while making that patch, sorry about that!

Something like your patch should do the job. However, my gut feeling is
that multiple files shouldn't share a single irq struct this way. Let me
stare at it...

Nam


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers
  2025-10-07  5:24     ` Nam Cao
@ 2025-10-07 15:08       ` Nam Cao
  0 siblings, 0 replies; 10+ messages in thread
From: Nam Cao @ 2025-10-07 15:08 UTC (permalink / raw)
  To: Ritesh Harjani, Madhavan Srinivasan, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, Thomas Gleixner, Marc Zyngier,
	Gautam Menghani, linuxppc-dev, linux-kernel

Nam Cao <namcao@linutronix.de> writes:

> Hi Ritesh,
>
> Ritesh Harjani (IBM) <ritesh.list@gmail.com> writes:
>> I am facing kernel crash on host when trying to run kvm pseries guest on
>> powernv host. Looking it a bit more closely, I see that we are missing
>> conversion of xxx_irq_handler_data()) to xxx_irq_chip_data() at few other
>> places, including in powerpc KVM code. 
> [snip]
>> Here is the diff which fixed this.. 
> [snip]
>> ... However grepping for "handler_data" in arch/powerpc I see there is
>> atleast one more place where we may still need the fix.. There are few
>> more places which grep returned - but I am not sure if they all really need
>> the fix. But I guess VAS should be fixed i.e :
>>
>> arch/powerpc/platforms/powernv/vas.c:   xd = irq_get_handler_data(vinst->virq);
>>
>> Would you like to submit an official patch for converting these other places too?
>
> Thanks for the report. I didn't expect struct xive_irq_data to be used
> in multiple files while making that patch, sorry about that!
>
> Something like your patch should do the job. However, my gut feeling is
> that multiple files shouldn't share a single irq struct this way. Let me
> stare at it...

I *think* we can do a cleanup. But I don't think it would be trivial
enough for 6.18. Let's do as you suggested for now to get KVM functional
again.

Nam


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-10-07 15:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-11  9:28 [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Nam Cao
2025-08-11  9:28 ` [PATCH v2 1/3] powerpc/xive: Untangle xive from child interrupt controller drivers Nam Cao
2025-10-07  3:24   ` Ritesh Harjani
2025-10-07  5:24     ` Nam Cao
2025-10-07 15:08       ` Nam Cao
2025-08-11  9:28 ` [PATCH v2 2/3] powerpc/powernv/pci: Switch to use msi_create_parent_irq_domain() Nam Cao
2025-08-11  9:28 ` [PATCH v2 3/3] powerpc/pseries/msi: Switch to msi_create_parent_irq_domain() Nam Cao
2025-09-03 14:07 ` [PATCH v2 0/3] powerpc: Cleanup and convert to MSI parent domain Thomas Gleixner
2025-09-06 11:48   ` Madhavan Srinivasan
2025-09-12  3:56 ` Madhavan Srinivasan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).