* [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver
@ 2025-07-08 17:33 Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt Marc Zyngier
` (13 more replies)
0 siblings, 14 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Having recently dipped into the xgene-msi driver to convert it to the
MSI-parent concept, I realised that some of it was slightly sub-par
(read: downright broken).
The driver plays horrible tricks behind the core code's back, lacks
proper affinity management, is terribly over-designed for no good
reason, and, despite what MAINTAINERS says, is completely unmaintained.
This series is an attempt to fix most of the issues, and effectively
results more or less in a full rewrite of the driver, removing a lot
of cruft and fixing the interactions with the PCI host driver in the
process (there really isn't any reason to rely on initcall ordering
anymore).
I've stopped short of repainting the MAINTAINERS file, but given how
reactive Toan Le has been, maybe that's on the cards. Patches on top
of -rc3, tested on a Mustang board. I'd really like this to hit 6.17!
* From v1:
- Killed CPUHP_PCI_XGENE_DEAD altogether
- Added a couple of definitions and helpers to make the hwirq/frame/group
mapping a bit less awkward
- More comments about the weird and wonderful behaviour of MSInIRx
registers
- Collected RB from tglx, with thanks
Marc Zyngier (13):
genirq: Teach handle_simple_irq() to resend an in-progress interrupt
PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet
PCI: xgene: Drop useless conditional compilation
PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN
PCI: xgene-msi: Make per-CPU interrupt setup robust
PCI: xgene-msi: Drop superfluous fields from xgene_msi structure
PCI: xgene-msi: Use device-managed memory allocations
PCI: xgene-msi: Get rid of intermediate tracking structure
PCI: xgene-msi: Sanitise MSI allocation and affinity setting
PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
PCI: xgene-msi: Probe as a standard platform driver
PCI: xgene-msi: Restructure handler setup/teardown
cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD
drivers/pci/controller/pci-xgene-msi.c | 428 +++++++++----------------
drivers/pci/controller/pci-xgene.c | 33 +-
include/linux/cpuhotplug.h | 1 -
kernel/irq/chip.c | 8 +-
4 files changed, 187 insertions(+), 283 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 02/13] PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet Marc Zyngier
` (12 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
It appears that the defect outlined in 9c15eeb5362c4 ("genirq: Allow
fasteoi handler to resend interrupts on concurrent handling") also
affects some other less stellar MSI controllers, this time using
the handle_simple_irq() flow.
Teach this flow about irqd_needs_resend_when_in_progress(). Given
the invasive nature of this workaround, only this flow is updated.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
kernel/irq/chip.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2b274007e8bab..6e789035919f7 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -559,7 +559,13 @@ void handle_simple_irq(struct irq_desc *desc)
{
guard(raw_spinlock)(&desc->lock);
- if (!irq_can_handle(desc))
+ if (!irq_can_handle_pm(desc)) {
+ if (irqd_needs_resend_when_in_progress(&desc->irq_data))
+ desc->istate |= IRQS_PENDING;
+ return;
+ }
+
+ if (!irq_can_handle_actions(desc))
return;
kstat_incr_irqs_this_cpu(desc);
--
2.39.2
* [PATCH v2 02/13] PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 03/13] PCI: xgene: Drop useless conditional compilation Marc Zyngier
` (11 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
As preparatory work to make the XGene MSI driver probe less of
a sorry hack, make the PCI driver check for the availability of
the MSI parent domain, and defer probing otherwise.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index 1e2ebbfa36d19..f26cb58f814ec 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -12,6 +12,7 @@
#include <linux/jiffies.h>
#include <linux/memblock.h>
#include <linux/init.h>
+#include <linux/irqdomain.h>
#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/of_pci.h>
@@ -594,6 +595,24 @@ static struct pci_ops xgene_pcie_ops = {
.write = pci_generic_config_write32,
};
+static bool xgene_check_pcie_msi_ready(void)
+{
+ struct device_node *np;
+ struct irq_domain *d;
+
+ if (!IS_ENABLED(CONFIG_PCI_XGENE_MSI))
+ return true;
+
+ np = of_find_compatible_node(NULL, NULL, "apm,xgene1-msi");
+ if (!np)
+ return true;
+
+ d = irq_find_matching_host(np, DOMAIN_BUS_PCI_MSI);
+ of_node_put(np);
+
+ return d && irq_domain_is_msi_parent(d);
+}
+
static int xgene_pcie_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
@@ -602,6 +621,10 @@ static int xgene_pcie_probe(struct platform_device *pdev)
struct pci_host_bridge *bridge;
int ret;
+ if (!xgene_check_pcie_msi_ready())
+ return dev_err_probe(&pdev->dev, -EPROBE_DEFER,
+ "MSI driver not ready\n");
+
bridge = devm_pci_alloc_host_bridge(dev, sizeof(*port));
if (!bridge)
return -ENOMEM;
--
2.39.2
* [PATCH v2 03/13] PCI: xgene: Drop useless conditional compilation
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 02/13] PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 04/13] PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN Marc Zyngier
` (10 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
pci-xgene.c only gets compiled if CONFIG_PCI_XGENE is selected.
It is therefore pointless to check for CONFIG_PCI_XGENE inside
the driver.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index f26cb58f814ec..a848f98203ae4 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -58,7 +58,6 @@
#define XGENE_PCIE_IP_VER_1 1
#define XGENE_PCIE_IP_VER_2 2
-#if defined(CONFIG_PCI_XGENE) || (defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS))
struct xgene_pcie {
struct device_node *node;
struct device *dev;
@@ -189,7 +188,6 @@ static int xgene_pcie_config_read32(struct pci_bus *bus, unsigned int devfn,
return PCIBIOS_SUCCESSFUL;
}
-#endif
#if defined(CONFIG_ACPI) && defined(CONFIG_PCI_QUIRKS)
static int xgene_get_csr_resource(struct acpi_device *adev,
@@ -280,7 +278,6 @@ const struct pci_ecam_ops xgene_v2_pcie_ecam_ops = {
};
#endif
-#if defined(CONFIG_PCI_XGENE)
static u64 xgene_pcie_set_ib_mask(struct xgene_pcie *port, u32 addr,
u32 flags, u64 size)
{
@@ -670,4 +667,3 @@ static struct platform_driver xgene_pcie_driver = {
.probe = xgene_pcie_probe,
};
builtin_platform_driver(xgene_pcie_driver);
-#endif
--
2.39.2
* [PATCH v2 04/13] PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (2 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 03/13] PCI: xgene: Drop useless conditional compilation Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 05/13] PCI: xgene-msi: Make per-CPU interrupt setup robust Marc Zyngier
` (9 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
XGENE_PCIE_IP_VER_UNKN is only referred to when probing for the
original XGene PCIe implementation, and gets immediately overridden
if the device has the "apm,xgene-pcie" compatible string.
Given that the only way to get there is by finding this very string in
the DT, it is obvious that we will always overwrite the version with
XGENE_PCIE_IP_VER_1.
Drop the whole thing.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene.c b/drivers/pci/controller/pci-xgene.c
index a848f98203ae4..b95afa35201d0 100644
--- a/drivers/pci/controller/pci-xgene.c
+++ b/drivers/pci/controller/pci-xgene.c
@@ -54,7 +54,6 @@
#define XGENE_V1_PCI_EXP_CAP 0x40
/* PCIe IP version */
-#define XGENE_PCIE_IP_VER_UNKN 0
#define XGENE_PCIE_IP_VER_1 1
#define XGENE_PCIE_IP_VER_2 2
@@ -630,10 +629,7 @@ static int xgene_pcie_probe(struct platform_device *pdev)
port->node = of_node_get(dn);
port->dev = dev;
-
- port->version = XGENE_PCIE_IP_VER_UNKN;
- if (of_device_is_compatible(port->node, "apm,xgene-pcie"))
- port->version = XGENE_PCIE_IP_VER_1;
+ port->version = XGENE_PCIE_IP_VER_1;
ret = xgene_pcie_map_reg(port, pdev);
if (ret)
--
2.39.2
* [PATCH v2 05/13] PCI: xgene-msi: Make per-CPU interrupt setup robust
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (3 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 04/13] PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure Marc Zyngier
` (8 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
The way the per-CPU interrupts are dealt with in the XGene MSI
driver isn't great:
- the affinity is set after the interrupt is enabled
- nothing prevents userspace from moving the interrupt around
- the affinity setting code pointlessly allocates memory
- the driver checks for conditions that cannot possibly happen
Address all of this in one go, resulting in slightly simpler setup
code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 29 ++++++--------------------
1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index b05ec8b0bb93f..5b69286689177 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -355,40 +355,26 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
{
struct xgene_msi *msi = &xgene_msi_ctrl;
struct xgene_msi_group *msi_group;
- cpumask_var_t mask;
int i;
int err;
for (i = cpu; i < NR_HW_IRQS; i += msi->num_cpus) {
msi_group = &msi->msi_groups[i];
- if (!msi_group->gic_irq)
- continue;
-
- irq_set_chained_handler_and_data(msi_group->gic_irq,
- xgene_msi_isr, msi_group);
/*
* Statically allocate MSI GIC IRQs to each CPU core.
* With 8-core X-Gene v1, 2 MSI GIC IRQs are allocated
* to each core.
*/
- if (alloc_cpumask_var(&mask, GFP_KERNEL)) {
- cpumask_clear(mask);
- cpumask_set_cpu(cpu, mask);
- err = irq_set_affinity(msi_group->gic_irq, mask);
- if (err)
- pr_err("failed to set affinity for GIC IRQ");
- free_cpumask_var(mask);
- } else {
- pr_err("failed to alloc CPU mask for affinity\n");
- err = -EINVAL;
- }
-
+ irq_set_status_flags(msi_group->gic_irq, IRQ_NO_BALANCING);
+ err = irq_set_affinity(msi_group->gic_irq, cpumask_of(cpu));
if (err) {
- irq_set_chained_handler_and_data(msi_group->gic_irq,
- NULL, NULL);
+ pr_err("failed to set affinity for GIC IRQ");
return err;
}
+
+ irq_set_chained_handler_and_data(msi_group->gic_irq,
+ xgene_msi_isr, msi_group);
}
return 0;
@@ -402,9 +388,6 @@ static int xgene_msi_hwirq_free(unsigned int cpu)
for (i = cpu; i < NR_HW_IRQS; i += msi->num_cpus) {
msi_group = &msi->msi_groups[i];
- if (!msi_group->gic_irq)
- continue;
-
irq_set_chained_handler_and_data(msi_group->gic_irq, NULL,
NULL);
}
--
2.39.2
* [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (4 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 05/13] PCI: xgene-msi: Make per-CPU interrupt setup robust Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-17 11:24 ` Markus Elfring
2025-07-08 17:33 ` [PATCH v2 07/13] PCI: xgene-msi: Use device-managed memory allocations Marc Zyngier
` (7 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
The xgene_msi structure remembers both the of_node of the device
and the number of CPUs, both of which are perfectly useless.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index 5b69286689177..50a817920cfd9 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -31,14 +31,12 @@ struct xgene_msi_group {
};
struct xgene_msi {
- struct device_node *node;
struct irq_domain *inner_domain;
u64 msi_addr;
void __iomem *msi_regs;
unsigned long *bitmap;
struct mutex bitmap_lock;
struct xgene_msi_group *msi_groups;
- int num_cpus;
};
/* Global data */
@@ -147,7 +145,7 @@ static void xgene_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
*/
static int hwirq_to_cpu(unsigned long hwirq)
{
- return (hwirq % xgene_msi_ctrl.num_cpus);
+ return (hwirq % num_possible_cpus());
}
static unsigned long hwirq_to_canonical_hwirq(unsigned long hwirq)
@@ -186,9 +184,9 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
mutex_lock(&msi->bitmap_lock);
msi_irq = bitmap_find_next_zero_area(msi->bitmap, NR_MSI_VEC, 0,
- msi->num_cpus, 0);
+ num_possible_cpus(), 0);
if (msi_irq < NR_MSI_VEC)
- bitmap_set(msi->bitmap, msi_irq, msi->num_cpus);
+ bitmap_set(msi->bitmap, msi_irq, num_possible_cpus());
else
msi_irq = -ENOSPC;
@@ -214,7 +212,7 @@ static void xgene_irq_domain_free(struct irq_domain *domain,
mutex_lock(&msi->bitmap_lock);
hwirq = hwirq_to_canonical_hwirq(d->hwirq);
- bitmap_clear(msi->bitmap, hwirq, msi->num_cpus);
+ bitmap_clear(msi->bitmap, hwirq, num_possible_cpus());
mutex_unlock(&msi->bitmap_lock);
@@ -235,10 +233,11 @@ static const struct msi_parent_ops xgene_msi_parent_ops = {
.init_dev_msi_info = msi_lib_init_dev_msi_info,
};
-static int xgene_allocate_domains(struct xgene_msi *msi)
+static int xgene_allocate_domains(struct device_node *node,
+ struct xgene_msi *msi)
{
struct irq_domain_info info = {
- .fwnode = of_fwnode_handle(msi->node),
+ .fwnode = of_fwnode_handle(node),
.ops = &xgene_msi_domain_ops,
.size = NR_MSI_VEC,
.host_data = msi,
@@ -358,7 +357,7 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
int i;
int err;
- for (i = cpu; i < NR_HW_IRQS; i += msi->num_cpus) {
+ for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus()) {
msi_group = &msi->msi_groups[i];
/*
@@ -386,7 +385,7 @@ static int xgene_msi_hwirq_free(unsigned int cpu)
struct xgene_msi_group *msi_group;
int i;
- for (i = cpu; i < NR_HW_IRQS; i += msi->num_cpus) {
+ for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus()) {
msi_group = &msi->msi_groups[i];
irq_set_chained_handler_and_data(msi_group->gic_irq, NULL,
NULL);
@@ -417,8 +416,6 @@ static int xgene_msi_probe(struct platform_device *pdev)
goto error;
}
xgene_msi->msi_addr = res->start;
- xgene_msi->node = pdev->dev.of_node;
- xgene_msi->num_cpus = num_possible_cpus();
rc = xgene_msi_init_allocator(xgene_msi);
if (rc) {
@@ -426,7 +423,7 @@ static int xgene_msi_probe(struct platform_device *pdev)
goto error;
}
- rc = xgene_allocate_domains(xgene_msi);
+ rc = xgene_allocate_domains(dev_of_node(&pdev->dev), xgene_msi);
if (rc) {
dev_err(&pdev->dev, "Failed to allocate MSI domain\n");
goto error;
--
2.39.2
* [PATCH v2 07/13] PCI: xgene-msi: Use device-managed memory allocations
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (5 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 08/13] PCI: xgene-msi: Get rid of intermediate tracking structure Marc Zyngier
` (6 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Since the MSI driver is probed as a platform device, there is no
reason not to use device-managed allocations. This includes the
top-level bookkeeping structure, which is better allocated
dynamically than kept static.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 37 +++++++++++++-------------
1 file changed, 19 insertions(+), 18 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index 50a817920cfd9..8b6724fe8d71c 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -40,7 +40,7 @@ struct xgene_msi {
};
/* Global data */
-static struct xgene_msi xgene_msi_ctrl;
+static struct xgene_msi *xgene_msi_ctrl;
/*
* X-Gene v1 has 16 groups of MSI termination registers MSInIRx, where
@@ -253,18 +253,18 @@ static void xgene_free_domains(struct xgene_msi *msi)
irq_domain_remove(msi->inner_domain);
}
-static int xgene_msi_init_allocator(struct xgene_msi *xgene_msi)
+static int xgene_msi_init_allocator(struct device *dev)
{
- xgene_msi->bitmap = bitmap_zalloc(NR_MSI_VEC, GFP_KERNEL);
- if (!xgene_msi->bitmap)
+ xgene_msi_ctrl->bitmap = devm_bitmap_zalloc(dev, NR_MSI_VEC, GFP_KERNEL);
+ if (!xgene_msi_ctrl->bitmap)
return -ENOMEM;
- mutex_init(&xgene_msi->bitmap_lock);
+ mutex_init(&xgene_msi_ctrl->bitmap_lock);
- xgene_msi->msi_groups = kcalloc(NR_HW_IRQS,
- sizeof(struct xgene_msi_group),
- GFP_KERNEL);
- if (!xgene_msi->msi_groups)
+ xgene_msi_ctrl->msi_groups = devm_kcalloc(dev, NR_HW_IRQS,
+ sizeof(struct xgene_msi_group),
+ GFP_KERNEL);
+ if (!xgene_msi_ctrl->msi_groups)
return -ENOMEM;
return 0;
@@ -273,15 +273,14 @@ static int xgene_msi_init_allocator(struct xgene_msi *xgene_msi)
static void xgene_msi_isr(struct irq_desc *desc)
{
struct irq_chip *chip = irq_desc_get_chip(desc);
+ struct xgene_msi *xgene_msi = xgene_msi_ctrl;
struct xgene_msi_group *msi_groups;
- struct xgene_msi *xgene_msi;
int msir_index, msir_val, hw_irq, ret;
u32 intr_index, grp_select, msi_grp;
chained_irq_enter(chip, desc);
msi_groups = irq_desc_get_handler_data(desc);
- xgene_msi = msi_groups->msi;
msi_grp = msi_groups->msi_grp;
/*
@@ -344,15 +343,12 @@ static void xgene_msi_remove(struct platform_device *pdev)
kfree(msi->msi_groups);
- bitmap_free(msi->bitmap);
- msi->bitmap = NULL;
-
xgene_free_domains(msi);
}
static int xgene_msi_hwirq_alloc(unsigned int cpu)
{
- struct xgene_msi *msi = &xgene_msi_ctrl;
+ struct xgene_msi *msi = xgene_msi_ctrl;
struct xgene_msi_group *msi_group;
int i;
int err;
@@ -381,7 +377,7 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
static int xgene_msi_hwirq_free(unsigned int cpu)
{
- struct xgene_msi *msi = &xgene_msi_ctrl;
+ struct xgene_msi *msi = xgene_msi_ctrl;
struct xgene_msi_group *msi_group;
int i;
@@ -406,7 +402,12 @@ static int xgene_msi_probe(struct platform_device *pdev)
int virt_msir;
u32 msi_val, msi_idx;
- xgene_msi = &xgene_msi_ctrl;
+ xgene_msi_ctrl = devm_kzalloc(&pdev->dev, sizeof(*xgene_msi_ctrl),
+ GFP_KERNEL);
+ if (!xgene_msi_ctrl)
+ return -ENOMEM;
+
+ xgene_msi = xgene_msi_ctrl;
platform_set_drvdata(pdev, xgene_msi);
@@ -417,7 +418,7 @@ static int xgene_msi_probe(struct platform_device *pdev)
}
xgene_msi->msi_addr = res->start;
- rc = xgene_msi_init_allocator(xgene_msi);
+ rc = xgene_msi_init_allocator(&pdev->dev);
if (rc) {
dev_err(&pdev->dev, "Error allocating MSI bitmap\n");
goto error;
--
2.39.2
* [PATCH v2 08/13] PCI: xgene-msi: Get rid of intermediate tracking structure
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (6 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 07/13] PCI: xgene-msi: Use device-managed memory allocations Marc Zyngier
@ 2025-07-08 17:33 ` Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting Marc Zyngier
` (5 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:33 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
The xgene-msi driver uses an odd construct in the form of an
intermediate tracking structure, evidently designed to deal with
multiple instances of the MSI widget. However, the existing HW
only has one set, and it is obvious that there won't be new HW
coming down that particular line.
Simplify the driver by using a bit of pointer arithmetic instead,
directly tracking the interrupt and avoiding extra memory allocation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 60 ++++++++------------------
1 file changed, 18 insertions(+), 42 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index 8b6724fe8d71c..cef0488749e1d 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -24,19 +24,13 @@
#define NR_HW_IRQS 16
#define NR_MSI_VEC (IDX_PER_GROUP * IRQS_PER_IDX * NR_HW_IRQS)
-struct xgene_msi_group {
- struct xgene_msi *msi;
- int gic_irq;
- u32 msi_grp;
-};
-
struct xgene_msi {
struct irq_domain *inner_domain;
u64 msi_addr;
void __iomem *msi_regs;
unsigned long *bitmap;
struct mutex bitmap_lock;
- struct xgene_msi_group *msi_groups;
+ unsigned int gic_irq[NR_HW_IRQS];
};
/* Global data */
@@ -261,27 +255,20 @@ static int xgene_msi_init_allocator(struct device *dev)
mutex_init(&xgene_msi_ctrl->bitmap_lock);
- xgene_msi_ctrl->msi_groups = devm_kcalloc(dev, NR_HW_IRQS,
- sizeof(struct xgene_msi_group),
- GFP_KERNEL);
- if (!xgene_msi_ctrl->msi_groups)
- return -ENOMEM;
-
return 0;
}
static void xgene_msi_isr(struct irq_desc *desc)
{
+ unsigned int *irqp = irq_desc_get_handler_data(desc);
struct irq_chip *chip = irq_desc_get_chip(desc);
struct xgene_msi *xgene_msi = xgene_msi_ctrl;
- struct xgene_msi_group *msi_groups;
int msir_index, msir_val, hw_irq, ret;
u32 intr_index, grp_select, msi_grp;
chained_irq_enter(chip, desc);
- msi_groups = irq_desc_get_handler_data(desc);
- msi_grp = msi_groups->msi_grp;
+ msi_grp = irqp - xgene_msi->gic_irq;
/*
* MSIINTn (n is 0..F) indicates if there is a pending MSI interrupt
@@ -341,35 +328,31 @@ static void xgene_msi_remove(struct platform_device *pdev)
cpuhp_remove_state(pci_xgene_online);
cpuhp_remove_state(CPUHP_PCI_XGENE_DEAD);
- kfree(msi->msi_groups);
-
xgene_free_domains(msi);
}
static int xgene_msi_hwirq_alloc(unsigned int cpu)
{
- struct xgene_msi *msi = xgene_msi_ctrl;
- struct xgene_msi_group *msi_group;
int i;
int err;
for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus()) {
- msi_group = &msi->msi_groups[i];
+ unsigned int irq = xgene_msi_ctrl->gic_irq[i];
/*
* Statically allocate MSI GIC IRQs to each CPU core.
* With 8-core X-Gene v1, 2 MSI GIC IRQs are allocated
* to each core.
*/
- irq_set_status_flags(msi_group->gic_irq, IRQ_NO_BALANCING);
- err = irq_set_affinity(msi_group->gic_irq, cpumask_of(cpu));
+ irq_set_status_flags(irq, IRQ_NO_BALANCING);
+ err = irq_set_affinity(irq, cpumask_of(cpu));
if (err) {
pr_err("failed to set affinity for GIC IRQ");
return err;
}
- irq_set_chained_handler_and_data(msi_group->gic_irq,
- xgene_msi_isr, msi_group);
+ irq_set_chained_handler_and_data(irq, xgene_msi_isr,
+ &xgene_msi_ctrl->gic_irq[i]);
}
return 0;
@@ -378,14 +361,11 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
static int xgene_msi_hwirq_free(unsigned int cpu)
{
struct xgene_msi *msi = xgene_msi_ctrl;
- struct xgene_msi_group *msi_group;
int i;
- for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus()) {
- msi_group = &msi->msi_groups[i];
- irq_set_chained_handler_and_data(msi_group->gic_irq, NULL,
- NULL);
- }
+ for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus())
+ irq_set_chained_handler_and_data(msi->gic_irq[i], NULL, NULL);
+
return 0;
}
@@ -397,10 +377,9 @@ static const struct of_device_id xgene_msi_match_table[] = {
static int xgene_msi_probe(struct platform_device *pdev)
{
struct resource *res;
- int rc, irq_index;
struct xgene_msi *xgene_msi;
- int virt_msir;
u32 msi_val, msi_idx;
+ int rc;
xgene_msi_ctrl = devm_kzalloc(&pdev->dev, sizeof(*xgene_msi_ctrl),
GFP_KERNEL);
@@ -430,15 +409,12 @@ static int xgene_msi_probe(struct platform_device *pdev)
goto error;
}
- for (irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
- virt_msir = platform_get_irq(pdev, irq_index);
- if (virt_msir < 0) {
- rc = virt_msir;
+ for (int irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
+ rc = platform_get_irq(pdev, irq_index);
+ if (rc < 0)
goto error;
- }
- xgene_msi->msi_groups[irq_index].gic_irq = virt_msir;
- xgene_msi->msi_groups[irq_index].msi_grp = irq_index;
- xgene_msi->msi_groups[irq_index].msi = xgene_msi;
+
+ xgene_msi->gic_irq[irq_index] = rc;
}
/*
@@ -446,7 +422,7 @@ static int xgene_msi_probe(struct platform_device *pdev)
* interrupt handlers, read all of them to clear spurious
* interrupts that may occur before the driver is probed.
*/
- for (irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
+ for (int irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
for (msi_idx = 0; msi_idx < IDX_PER_GROUP; msi_idx++)
xgene_msi_ir_read(xgene_msi, irq_index, msi_idx);
--
2.39.2
* [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (7 preceding siblings ...)
2025-07-08 17:33 ` [PATCH v2 08/13] PCI: xgene-msi: Get rid of intermediate tracking structure Marc Zyngier
@ 2025-07-08 17:34 ` Marc Zyngier
2025-07-11 9:55 ` Lorenzo Pieralisi
2025-07-08 17:34 ` [PATCH v2 10/13] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU Marc Zyngier
` (4 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:34 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Plugging a device that doesn't use managed affinity on an XGene-1
machine results in messages such as:
genirq: irq_chip PCI-MSIX-0000:01:00.0 did not update eff. affinity mask of irq 39
As it turns out, the driver was never updated to populate the effective
affinity on irq_set_affinity() call, and the core code is prickly about
that.
But upon further investigation, it appears that the driver keeps repainting
the hwirq field of the irq_data structure as a way to track the affinity
of the MSI, something that is very much frowned upon as it breaks the
fundamentals of an IRQ domain (an array indexed by hwirq).
Fixing this results more or less in a rewrite of the driver:
- Define how a hwirq and a cpu affinity map onto the MSI termination
registers
- Allocate a single entry in the bitmap per MSI instead of *8*
- Correctly track CPU affinity
- Fix the documentation so that it actually means something (to me)
- Use standard bitmap iterators
- and plenty of other cleanups
With this, the driver behaves correctly on my vintage Mustang board.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 222 +++++++++++--------------
1 file changed, 93 insertions(+), 129 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index cef0488749e1d..b9f364da87f2a 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -6,6 +6,7 @@
* Author: Tanmay Inamdar <tinamdar@apm.com>
* Duc Dang <dhdang@apm.com>
*/
+#include <linux/bitfield.h>
#include <linux/cpu.h>
#include <linux/interrupt.h>
#include <linux/irqdomain.h>
@@ -22,7 +23,15 @@
#define IDX_PER_GROUP 8
#define IRQS_PER_IDX 16
#define NR_HW_IRQS 16
-#define NR_MSI_VEC (IDX_PER_GROUP * IRQS_PER_IDX * NR_HW_IRQS)
+#define NR_MSI_BITS (IDX_PER_GROUP * IRQS_PER_IDX * NR_HW_IRQS)
+#define NR_MSI_VEC (NR_MSI_BITS / num_possible_cpus())
+
+#define MSI_GROUP_MASK GENMASK(22, 19)
+#define MSI_INDEX_MASK GENMASK(18, 16)
+#define MSI_INTR_MASK GENMASK(19, 16)
+
+#define MSInRx_HWIRQ_MASK GENMASK(6, 4)
+#define DATA_HWIRQ_MASK GENMASK(3, 0)
struct xgene_msi {
struct irq_domain *inner_domain;
@@ -37,8 +46,26 @@ struct xgene_msi {
static struct xgene_msi *xgene_msi_ctrl;
/*
- * X-Gene v1 has 16 groups of MSI termination registers MSInIRx, where
- * n is group number (0..F), x is index of registers in each group (0..7)
+ * X-Gene v1 has 16 frames of MSI termination registers MSInIRx, where n is
+ * frame number (0..15), x is index of registers in each frame (0..7). Each
+ * 32b register is at the beginning of a 64kB region, each frame occupying
+ * 512kB (and the whole thing 8MB of PA space).
+ *
+ * Each register supports 16 MSI vectors (0..15) to generate interrupts. A
+ * write to the MSInIRx from the PCI side generates an interrupt. A read
+ * from the MSInIRx on the CPU side returns a bitmap of the pending MSIs in
+ * the lower 16 bits. A side effect of this read is that all pending
+ * interrupts are acknowledged and cleared.
+ *
+ * Additionally, each MSI termination frame has 1 MSIINTn register (n is
+ * 0..15) to indicate the MSI pending status caused by any of its 8
+ * termination registers, reported as a bitmap in the lower 8 bits. Each 32b
+ * register is at the beginning of a 64kB region (and overall occupying an
+ * extra 1MB).
+ *
+ * There is one GIC IRQ assigned for each MSI termination frame, 16 in
+ * total.
+ *
* The register layout is as follows:
* MSI0IR0 base_addr
* MSI0IR1 base_addr + 0x10000
@@ -59,107 +86,74 @@ static struct xgene_msi *xgene_msi_ctrl;
* MSIINT1 base_addr + 0x810000
* ... ...
* MSIINTF base_addr + 0x8F0000
- *
- * Each index register supports 16 MSI vectors (0..15) to generate interrupt.
- * There are total 16 GIC IRQs assigned for these 16 groups of MSI termination
- * registers.
- *
- * Each MSI termination group has 1 MSIINTn register (n is 0..15) to indicate
- * the MSI pending status caused by 1 of its 8 index registers.
*/
/* MSInIRx read helper */
-static u32 xgene_msi_ir_read(struct xgene_msi *msi,
- u32 msi_grp, u32 msir_idx)
+static u32 xgene_msi_ir_read(struct xgene_msi *msi, u32 msi_grp, u32 msir_idx)
{
return readl_relaxed(msi->msi_regs + MSI_IR0 +
- (msi_grp << 19) + (msir_idx << 16));
+ (FIELD_PREP(MSI_GROUP_MASK, msi_grp) |
+ FIELD_PREP(MSI_INDEX_MASK, msir_idx)));
}
/* MSIINTn read helper */
static u32 xgene_msi_int_read(struct xgene_msi *msi, u32 msi_grp)
{
- return readl_relaxed(msi->msi_regs + MSI_INT0 + (msi_grp << 16));
+ return readl_relaxed(msi->msi_regs + MSI_INT0 +
+ FIELD_PREP(MSI_INTR_MASK, msi_grp));
}
/*
- * With 2048 MSI vectors supported, the MSI message can be constructed using
- * following scheme:
- * - Divide into 8 256-vector groups
- * Group 0: 0-255
- * Group 1: 256-511
- * Group 2: 512-767
- * ...
- * Group 7: 1792-2047
- * - Each 256-vector group is divided into 16 16-vector groups
- * As an example: 16 16-vector groups for 256-vector group 0-255 is
- * Group 0: 0-15
- * Group 1: 16-32
- * ...
- * Group 15: 240-255
- * - The termination address of MSI vector in 256-vector group n and 16-vector
- * group x is the address of MSIxIRn
- * - The data for MSI vector in 16-vector group x is x
+ * In order to allow an MSI to be moved from one CPU to another without
+ * having to repaint both the address and the data (which cannot be done
+ * atomically), we statically partition the MSI frames between CPUs. Given
+ * that XGene-1 has 8 CPUs, each CPU gets two frames assigned to it.
+ *
+ * We adopt the convention that when an MSI is moved, it is configured to
+ * target the same register number in the congruent frame assigned to the
+ * new target CPU. This reserves a given MSI across all CPUs, and reduces
+ * the MSI capacity from 2048 to 256.
+ *
+ * Effectively, this amounts to:
+ * - hwirq[7]::cpu[2:0] is the target frame number (n in MSInIRx)
+ * - hwirq[6:4] is the register index in any given frame (x in MSInIRx)
+ * - hwirq[3:0] is the MSI data
*/
-static u32 hwirq_to_reg_set(unsigned long hwirq)
+static irq_hw_number_t compute_hwirq(u8 frame, u8 index, u8 data)
{
- return (hwirq / (NR_HW_IRQS * IRQS_PER_IDX));
-}
-
-static u32 hwirq_to_group(unsigned long hwirq)
-{
- return (hwirq % NR_HW_IRQS);
-}
-
-static u32 hwirq_to_msi_data(unsigned long hwirq)
-{
- return ((hwirq / NR_HW_IRQS) % IRQS_PER_IDX);
+ return (FIELD_PREP(BIT(7), FIELD_GET(BIT(3), frame)) |
+ FIELD_PREP(MSInRx_HWIRQ_MASK, index) |
+ FIELD_PREP(DATA_HWIRQ_MASK, data));
}
static void xgene_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
struct xgene_msi *msi = irq_data_get_irq_chip_data(data);
- u32 reg_set = hwirq_to_reg_set(data->hwirq);
- u32 group = hwirq_to_group(data->hwirq);
- u64 target_addr = msi->msi_addr + (((8 * group) + reg_set) << 16);
+ u64 target_addr;
+ u32 frame, msir;
+ int cpu;
- msg->address_hi = upper_32_bits(target_addr);
- msg->address_lo = lower_32_bits(target_addr);
- msg->data = hwirq_to_msi_data(data->hwirq);
-}
+ cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
+ msir = FIELD_GET(GENMASK(6, 4), data->hwirq);
+ frame = FIELD_PREP(BIT(3), FIELD_GET(BIT(7), data->hwirq)) | cpu;
-/*
- * X-Gene v1 only has 16 MSI GIC IRQs for 2048 MSI vectors. To maintain
- * the expected behaviour of .set_affinity for each MSI interrupt, the 16
- * MSI GIC IRQs are statically allocated to 8 X-Gene v1 cores (2 GIC IRQs
- * for each core). The MSI vector is moved from 1 MSI GIC IRQ to another
- * MSI GIC IRQ to steer its MSI interrupt to correct X-Gene v1 core. As a
- * consequence, the total MSI vectors that X-Gene v1 supports will be
- * reduced to 256 (2048/8) vectors.
- */
-static int hwirq_to_cpu(unsigned long hwirq)
-{
- return (hwirq % num_possible_cpus());
-}
+ target_addr = msi->msi_addr;
+ target_addr += (FIELD_PREP(MSI_GROUP_MASK, frame) |
+ FIELD_PREP(MSI_INTR_MASK, msir));
-static unsigned long hwirq_to_canonical_hwirq(unsigned long hwirq)
-{
- return (hwirq - hwirq_to_cpu(hwirq));
+ msg->address_hi = upper_32_bits(target_addr);
+ msg->address_lo = lower_32_bits(target_addr);
+ msg->data = FIELD_GET(DATA_HWIRQ_MASK, data->hwirq);
}
static int xgene_msi_set_affinity(struct irq_data *irqdata,
const struct cpumask *mask, bool force)
{
int target_cpu = cpumask_first(mask);
- int curr_cpu;
-
- curr_cpu = hwirq_to_cpu(irqdata->hwirq);
- if (curr_cpu == target_cpu)
- return IRQ_SET_MASK_OK_DONE;
- /* Update MSI number to target the new CPU */
- irqdata->hwirq = hwirq_to_canonical_hwirq(irqdata->hwirq) + target_cpu;
+ irq_data_update_effective_affinity(irqdata, cpumask_of(target_cpu));
+ /* Force the core code to regenerate the message */
return IRQ_SET_MASK_OK;
}
@@ -173,23 +167,20 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
unsigned int nr_irqs, void *args)
{
struct xgene_msi *msi = domain->host_data;
- int msi_irq;
+ irq_hw_number_t hwirq;
mutex_lock(&msi->bitmap_lock);
- msi_irq = bitmap_find_next_zero_area(msi->bitmap, NR_MSI_VEC, 0,
- num_possible_cpus(), 0);
- if (msi_irq < NR_MSI_VEC)
- bitmap_set(msi->bitmap, msi_irq, num_possible_cpus());
- else
- msi_irq = -ENOSPC;
+ hwirq = find_first_zero_bit(msi->bitmap, NR_MSI_VEC);
+ if (hwirq < NR_MSI_VEC)
+ set_bit(hwirq, msi->bitmap);
mutex_unlock(&msi->bitmap_lock);
- if (msi_irq < 0)
- return msi_irq;
+ if (hwirq >= NR_MSI_VEC)
+ return -ENOSPC;
- irq_domain_set_info(domain, virq, msi_irq,
+ irq_domain_set_info(domain, virq, hwirq,
&xgene_msi_bottom_irq_chip, domain->host_data,
handle_simple_irq, NULL, NULL);
@@ -201,12 +192,10 @@ static void xgene_irq_domain_free(struct irq_domain *domain,
{
struct irq_data *d = irq_domain_get_irq_data(domain, virq);
struct xgene_msi *msi = irq_data_get_irq_chip_data(d);
- u32 hwirq;
mutex_lock(&msi->bitmap_lock);
- hwirq = hwirq_to_canonical_hwirq(d->hwirq);
- bitmap_clear(msi->bitmap, hwirq, num_possible_cpus());
+ clear_bit(d->hwirq, msi->bitmap);
mutex_unlock(&msi->bitmap_lock);
@@ -263,55 +252,30 @@ static void xgene_msi_isr(struct irq_desc *desc)
unsigned int *irqp = irq_desc_get_handler_data(desc);
struct irq_chip *chip = irq_desc_get_chip(desc);
struct xgene_msi *xgene_msi = xgene_msi_ctrl;
- int msir_index, msir_val, hw_irq, ret;
- u32 intr_index, grp_select, msi_grp;
+ unsigned long grp_pending;
+ int msir_idx;
+ u32 msi_grp;
chained_irq_enter(chip, desc);
msi_grp = irqp - xgene_msi->gic_irq;
- /*
- * MSIINTn (n is 0..F) indicates if there is a pending MSI interrupt
- * If bit x of this register is set (x is 0..7), one or more interrupts
- * corresponding to MSInIRx is set.
- */
- grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
- while (grp_select) {
- msir_index = ffs(grp_select) - 1;
- /*
- * Calculate MSInIRx address to read to check for interrupts
- * (refer to termination address and data assignment
- * described in xgene_compose_msi_msg() )
- */
- msir_val = xgene_msi_ir_read(xgene_msi, msi_grp, msir_index);
- while (msir_val) {
- intr_index = ffs(msir_val) - 1;
- /*
- * Calculate MSI vector number (refer to the termination
- * address and data assignment described in
- * xgene_compose_msi_msg function)
- */
- hw_irq = (((msir_index * IRQS_PER_IDX) + intr_index) *
- NR_HW_IRQS) + msi_grp;
- /*
- * As we have multiple hw_irq that maps to single MSI,
- * always look up the virq using the hw_irq as seen from
- * CPU0
- */
- hw_irq = hwirq_to_canonical_hwirq(hw_irq);
- ret = generic_handle_domain_irq(xgene_msi->inner_domain, hw_irq);
+ grp_pending = xgene_msi_int_read(xgene_msi, msi_grp);
+
+ for_each_set_bit(msir_idx, &grp_pending, IDX_PER_GROUP) {
+ unsigned long msir;
+ int intr_idx;
+
+ msir = xgene_msi_ir_read(xgene_msi, msi_grp, msir_idx);
+
+ for_each_set_bit(intr_idx, &msir, IRQS_PER_IDX) {
+ irq_hw_number_t hwirq;
+ int ret;
+
+ hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
+ ret = generic_handle_domain_irq(xgene_msi->inner_domain,
+ hwirq);
WARN_ON_ONCE(ret);
- msir_val &= ~(1 << intr_index);
- }
- grp_select &= ~(1 << msir_index);
-
- if (!grp_select) {
- /*
- * We handled all interrupts happened in this group,
- * resample this group MSI_INTx register in case
- * something else has been made pending in the meantime
- */
- grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
}
}
--
2.39.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 10/13] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (8 preceding siblings ...)
2025-07-08 17:34 ` [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting Marc Zyngier
@ 2025-07-08 17:34 ` Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver Marc Zyngier
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:34 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Since changing the affinity of an MSI really is about changing
the target address, and since it isn't possible to mask an individual
MSI, it is entirely possible for an interrupt to race with itself,
usually resulting in a lost interrupt.
Paper over the design blunder by informing the core code of this
sad state of affairs.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index b9f364da87f2a..a190c25c8df52 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -183,6 +183,7 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
irq_domain_set_info(domain, virq, hwirq,
&xgene_msi_bottom_irq_chip, domain->host_data,
handle_simple_irq, NULL, NULL);
+ irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
return 0;
}
--
2.39.2
* [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (9 preceding siblings ...)
2025-07-08 17:34 ` [PATCH v2 10/13] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU Marc Zyngier
@ 2025-07-08 17:34 ` Marc Zyngier
2025-07-17 11:45 ` Markus Elfring
2025-07-08 17:34 ` [PATCH v2 12/13] PCI: xgene-msi: Restructure handler setup/teardown Marc Zyngier
` (2 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:34 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Now that we have made the dependency between the PCI driver and
the MSI driver explicit, there is no need to use subsys_initcall()
as a probing hook, and we can rely on builtin_platform_driver()
instead.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index a190c25c8df52..243c7721c8799 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -429,9 +429,4 @@ static struct platform_driver xgene_msi_driver = {
.probe = xgene_msi_probe,
.remove = xgene_msi_remove,
};
-
-static int __init xgene_pcie_msi_init(void)
-{
- return platform_driver_register(&xgene_msi_driver);
-}
-subsys_initcall(xgene_pcie_msi_init);
+builtin_platform_driver(xgene_msi_driver);
--
2.39.2
* [PATCH v2 12/13] PCI: xgene-msi: Restructure handler setup/teardown
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (10 preceding siblings ...)
2025-07-08 17:34 ` [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver Marc Zyngier
@ 2025-07-08 17:34 ` Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD Marc Zyngier
2025-07-17 9:52 ` [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Lorenzo Pieralisi
13 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:34 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Another utterly pointless aspect of the xgene-msi driver is that
it is built around CPU hotplug. Which is quite amusing since this
is one of the few arm64 platforms that, by construction, cannot
do CPU hotplug in a supported way (no EL3, no PSCI, no luck).
Drop the CPU hotplug nonsense and just set up the IRQs and handlers
in a less overdesigned way, grouping things more logically in the
process.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
drivers/pci/controller/pci-xgene-msi.c | 107 +++++++++----------------
1 file changed, 37 insertions(+), 70 deletions(-)
diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index 243c7721c8799..dd8b119ab90e0 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -231,12 +231,6 @@ static int xgene_allocate_domains(struct device_node *node,
return msi->inner_domain ? 0 : -ENOMEM;
}
-static void xgene_free_domains(struct xgene_msi *msi)
-{
- if (msi->inner_domain)
- irq_domain_remove(msi->inner_domain);
-}
-
static int xgene_msi_init_allocator(struct device *dev)
{
xgene_msi_ctrl->bitmap = devm_bitmap_zalloc(dev, NR_MSI_VEC, GFP_KERNEL);
@@ -283,26 +277,48 @@ static void xgene_msi_isr(struct irq_desc *desc)
chained_irq_exit(chip, desc);
}
-static enum cpuhp_state pci_xgene_online;
-
static void xgene_msi_remove(struct platform_device *pdev)
{
- struct xgene_msi *msi = platform_get_drvdata(pdev);
-
- if (pci_xgene_online)
- cpuhp_remove_state(pci_xgene_online);
- cpuhp_remove_state(CPUHP_PCI_XGENE_DEAD);
+ for (int i = 0; i < NR_HW_IRQS; i++) {
+ unsigned int irq = xgene_msi_ctrl->gic_irq[i];
+ if (!irq)
+ continue;
+ irq_set_chained_handler_and_data(irq, NULL, NULL);
+ }
- xgene_free_domains(msi);
+ if (xgene_msi_ctrl->inner_domain)
+ irq_domain_remove(xgene_msi_ctrl->inner_domain);
}
-static int xgene_msi_hwirq_alloc(unsigned int cpu)
+static int xgene_msi_handler_setup(struct platform_device *pdev)
{
+ struct xgene_msi *xgene_msi = xgene_msi_ctrl;
int i;
- int err;
- for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus()) {
- unsigned int irq = xgene_msi_ctrl->gic_irq[i];
+ for (i = 0; i < NR_HW_IRQS; i++) {
+ u32 msi_val;
+ int irq, err;
+
+ /*
+ * MSInIRx registers are read-to-clear; before registering
+ * interrupt handlers, read all of them to clear spurious
+ * interrupts that may occur before the driver is probed.
+ */
+ for (int msi_idx = 0; msi_idx < IDX_PER_GROUP; msi_idx++)
+ xgene_msi_ir_read(xgene_msi, i, msi_idx);
+
+ /* Read MSIINTn to confirm */
+ msi_val = xgene_msi_int_read(xgene_msi, i);
+ if (msi_val) {
+ dev_err(&pdev->dev, "Failed to clear spurious IRQ\n");
+ return -EINVAL;
+ }
+
+ irq = platform_get_irq(pdev, i);
+ if (irq < 0)
+ return irq;
+
+ xgene_msi->gic_irq[i] = irq;
/*
* Statically allocate MSI GIC IRQs to each CPU core.
@@ -310,7 +326,7 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
* to each core.
*/
irq_set_status_flags(irq, IRQ_NO_BALANCING);
- err = irq_set_affinity(irq, cpumask_of(cpu));
+ err = irq_set_affinity(irq, cpumask_of(i % num_possible_cpus()));
if (err) {
pr_err("failed to set affinity for GIC IRQ");
return err;
@@ -323,17 +339,6 @@ static int xgene_msi_hwirq_alloc(unsigned int cpu)
return 0;
}
-static int xgene_msi_hwirq_free(unsigned int cpu)
-{
- struct xgene_msi *msi = xgene_msi_ctrl;
- int i;
-
- for (i = cpu; i < NR_HW_IRQS; i += num_possible_cpus())
- irq_set_chained_handler_and_data(msi->gic_irq[i], NULL, NULL);
-
- return 0;
-}
-
static const struct of_device_id xgene_msi_match_table[] = {
{.compatible = "apm,xgene1-msi"},
{},
@@ -343,7 +348,6 @@ static int xgene_msi_probe(struct platform_device *pdev)
{
struct resource *res;
struct xgene_msi *xgene_msi;
- u32 msi_val, msi_idx;
int rc;
xgene_msi_ctrl = devm_kzalloc(&pdev->dev, sizeof(*xgene_msi_ctrl),
@@ -353,8 +357,6 @@ static int xgene_msi_probe(struct platform_device *pdev)
xgene_msi = xgene_msi_ctrl;
- platform_set_drvdata(pdev, xgene_msi);
-
xgene_msi->msi_regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(xgene_msi->msi_regs)) {
rc = PTR_ERR(xgene_msi->msi_regs);
@@ -374,48 +376,13 @@ static int xgene_msi_probe(struct platform_device *pdev)
goto error;
}
- for (int irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
- rc = platform_get_irq(pdev, irq_index);
- if (rc < 0)
- goto error;
-
- xgene_msi->gic_irq[irq_index] = rc;
- }
-
- /*
- * MSInIRx registers are read-to-clear; before registering
- * interrupt handlers, read all of them to clear spurious
- * interrupts that may occur before the driver is probed.
- */
- for (int irq_index = 0; irq_index < NR_HW_IRQS; irq_index++) {
- for (msi_idx = 0; msi_idx < IDX_PER_GROUP; msi_idx++)
- xgene_msi_ir_read(xgene_msi, irq_index, msi_idx);
-
- /* Read MSIINTn to confirm */
- msi_val = xgene_msi_int_read(xgene_msi, irq_index);
- if (msi_val) {
- dev_err(&pdev->dev, "Failed to clear spurious IRQ\n");
- rc = -EINVAL;
- goto error;
- }
- }
-
- rc = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "pci/xgene:online",
- xgene_msi_hwirq_alloc, NULL);
- if (rc < 0)
- goto err_cpuhp;
- pci_xgene_online = rc;
- rc = cpuhp_setup_state(CPUHP_PCI_XGENE_DEAD, "pci/xgene:dead", NULL,
- xgene_msi_hwirq_free);
+ rc = xgene_msi_handler_setup(pdev);
if (rc)
- goto err_cpuhp;
+ goto error;
dev_info(&pdev->dev, "APM X-Gene PCIe MSI driver loaded\n");
return 0;
-
-err_cpuhp:
- dev_err(&pdev->dev, "failed to add CPU MSI notifier\n");
error:
xgene_msi_remove(pdev);
return rc;
--
2.39.2
* [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (11 preceding siblings ...)
2025-07-08 17:34 ` [PATCH v2 12/13] PCI: xgene-msi: Restructure handler setup/teardown Marc Zyngier
@ 2025-07-08 17:34 ` Marc Zyngier
2025-07-11 13:15 ` Lorenzo Pieralisi
2025-07-17 9:52 ` [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Lorenzo Pieralisi
13 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2025-07-08 17:34 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel
Cc: Toan Le, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
Now that the XGene MSI driver has been mostly rewritten and doesn't
use the CPU hotplug infrastructure, CPUHP_PCI_XGENE_DEAD is unused.
Remove it to reduce the size of cpuhp_hp_states[].
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
include/linux/cpuhotplug.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index df366ee15456b..eaca70eb6136b 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -90,7 +90,6 @@ enum cpuhp_state {
CPUHP_RADIX_DEAD,
CPUHP_PAGE_ALLOC,
CPUHP_NET_DEV_DEAD,
- CPUHP_PCI_XGENE_DEAD,
CPUHP_IOMMU_IOVA_DEAD,
CPUHP_AP_ARM_CACHE_B15_RAC_DEAD,
CPUHP_PADATA_DEAD,
--
2.39.2
* Re: [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
2025-07-08 17:34 ` [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting Marc Zyngier
@ 2025-07-11 9:55 ` Lorenzo Pieralisi
2025-07-11 10:11 ` Lorenzo Pieralisi
2025-07-11 10:50 ` Marc Zyngier
0 siblings, 2 replies; 23+ messages in thread
From: Lorenzo Pieralisi @ 2025-07-11 9:55 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-pci, linux-arm-kernel, linux-kernel, Toan Le,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas, Thomas Gleixner
On Tue, Jul 08, 2025 at 06:34:00PM +0100, Marc Zyngier wrote:
> Plugging a device that doesn't use managed affinity on an XGene-1
> machine results in messages such as:
>
> genirq: irq_chip PCI-MSIX-0000:01:00.0 did not update eff. affinity mask of irq 39
>
> As it turns out, the driver was never updated to populate the effective
> affinity on irq_set_affinity() call, and the core code is prickly about
> that.
>
> But upon further investigation, it appears that the driver keeps repainting
> the hwirq field of the irq_data structure as a way to track the affinity
> of the MSI, something that is very much frowned upon as it breaks the
> fundamentals of an IRQ domain (an array indexed by hwirq).
>
> Fixing this results more or less in a rewrite of the driver:
>
> - Define how a hwirq and a cpu affinity map onto the MSI termination
> registers
>
> - Allocate a single entry in the bitmap per MSI instead of *8*
>
> - Correctly track CPU affinity
>
> - Fix the documentation so that it actually means something (to me)
>
> - Use standard bitmap iterators
>
> - and plenty of other cleanups
>
> With this, the driver behaves correctly on my vintage Mustang board.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> drivers/pci/controller/pci-xgene-msi.c | 222 +++++++++++--------------
> 1 file changed, 93 insertions(+), 129 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
> index cef0488749e1d..b9f364da87f2a 100644
> --- a/drivers/pci/controller/pci-xgene-msi.c
> +++ b/drivers/pci/controller/pci-xgene-msi.c
> @@ -6,6 +6,7 @@
> * Author: Tanmay Inamdar <tinamdar@apm.com>
> * Duc Dang <dhdang@apm.com>
> */
> +#include <linux/bitfield.h>
> #include <linux/cpu.h>
> #include <linux/interrupt.h>
> #include <linux/irqdomain.h>
> @@ -22,7 +23,15 @@
> #define IDX_PER_GROUP 8
> #define IRQS_PER_IDX 16
> #define NR_HW_IRQS 16
> -#define NR_MSI_VEC (IDX_PER_GROUP * IRQS_PER_IDX * NR_HW_IRQS)
> +#define NR_MSI_BITS (IDX_PER_GROUP * IRQS_PER_IDX * NR_HW_IRQS)
> +#define NR_MSI_VEC (NR_MSI_BITS / num_possible_cpus())
> +
> +#define MSI_GROUP_MASK GENMASK(22, 19)
> +#define MSI_INDEX_MASK GENMASK(18, 16)
> +#define MSI_INTR_MASK GENMASK(19, 16)
> +
> +#define MSInRx_HWIRQ_MASK GENMASK(6, 4)
> +#define DATA_HWIRQ_MASK GENMASK(3, 0)
>
> struct xgene_msi {
> struct irq_domain *inner_domain;
> @@ -37,8 +46,26 @@ struct xgene_msi {
> static struct xgene_msi *xgene_msi_ctrl;
>
> /*
> - * X-Gene v1 has 16 groups of MSI termination registers MSInIRx, where
> - * n is group number (0..F), x is index of registers in each group (0..7)
> + * X-Gene v1 has 16 frames of MSI termination registers MSInIRx, where n is
> + * frame number (0..15), x is index of registers in each frame (0..7). Each
> + * 32b register is at the beginning of a 64kB region, each frame occupying
> + * 512kB (and the whole thing 8MB of PA space).
> + *
> + * Each register supports 16 MSI vectors (0..15) to generate interrupts. A
> + * write to the MSInIRx from the PCI side generates an interrupt. A read
> + * from the MSInRx on the CPU side returns a bitmap of the pending MSIs in
> + * the lower 16 bits. A side effect of this read is that all pending
> + * interrupts are acknowledged and cleared).
> + *
> + * Additionally, each MSI termination frame has 1 MSIINTn register (n is
> + * 0..15) to indicate the MSI pending status caused by any of its 8
> + * termination registers, reported as a bitmap in the lower 8 bits. Each 32b
> + * register is at the beginning of a 64kB region (and overall occupying an
> + * extra 1MB).
> + *
> + * There is one GIC IRQ assigned for each MSI termination frame, 16 in
> + * total.
> + *
> * The register layout is as follows:
> * MSI0IR0 base_addr
> * MSI0IR1 base_addr + 0x10000
> @@ -59,107 +86,74 @@ static struct xgene_msi *xgene_msi_ctrl;
> * MSIINT1 base_addr + 0x810000
> * ... ...
> * MSIINTF base_addr + 0x8F0000
> - *
> - * Each index register supports 16 MSI vectors (0..15) to generate interrupt.
> - * There are total 16 GIC IRQs assigned for these 16 groups of MSI termination
> - * registers.
> - *
> - * Each MSI termination group has 1 MSIINTn register (n is 0..15) to indicate
> - * the MSI pending status caused by 1 of its 8 index registers.
> */
>
> /* MSInIRx read helper */
> -static u32 xgene_msi_ir_read(struct xgene_msi *msi,
> - u32 msi_grp, u32 msir_idx)
> +static u32 xgene_msi_ir_read(struct xgene_msi *msi, u32 msi_grp, u32 msir_idx)
> {
> return readl_relaxed(msi->msi_regs + MSI_IR0 +
> - (msi_grp << 19) + (msir_idx << 16));
> + (FIELD_PREP(MSI_GROUP_MASK, msi_grp) |
> + FIELD_PREP(MSI_INDEX_MASK, msir_idx)));
> }
>
> /* MSIINTn read helper */
> static u32 xgene_msi_int_read(struct xgene_msi *msi, u32 msi_grp)
> {
> - return readl_relaxed(msi->msi_regs + MSI_INT0 + (msi_grp << 16));
> + return readl_relaxed(msi->msi_regs + MSI_INT0 +
> + FIELD_PREP(MSI_INTR_MASK, msi_grp));
> }
>
> /*
> - * With 2048 MSI vectors supported, the MSI message can be constructed using
> - * following scheme:
> - * - Divide into 8 256-vector groups
> - * Group 0: 0-255
> - * Group 1: 256-511
> - * Group 2: 512-767
> - * ...
> - * Group 7: 1792-2047
> - * - Each 256-vector group is divided into 16 16-vector groups
> - * As an example: 16 16-vector groups for 256-vector group 0-255 is
> - * Group 0: 0-15
> - * Group 1: 16-32
> - * ...
> - * Group 15: 240-255
> - * - The termination address of MSI vector in 256-vector group n and 16-vector
> - * group x is the address of MSIxIRn
> - * - The data for MSI vector in 16-vector group x is x
> + * In order to allow an MSI to be moved from one CPU to another without
> + * having to repaint both the address and the data (which cannot be done
> + * atomically), we statically partitions the MSI frames between CPUs. Given
> + * that XGene-1 has 8 CPUs, each CPU gets two frames assigned to it
> + *
> + * We adopt the convention that when an MSI is moved, it is configured to
> + * target the same register number in the congruent frame assigned to the
> + * new target CPU. This reserves a given MSI across all CPUs, and reduces
> + * the MSI capacity from 2048 to 256.
> + *
> + * Effectively, this amounts to:
> + * - hwirq[7]::cpu[2:0] is the target frame number (n in MSInIRx)
> + * - hwirq[6:4] is the register index in any given frame (x in MSInIRx)
> + * - hwirq[3:0] is the MSI data
> */
> -static u32 hwirq_to_reg_set(unsigned long hwirq)
> +static irq_hw_number_t compute_hwirq(u8 frame, u8 index, u8 data)
> {
> - return (hwirq / (NR_HW_IRQS * IRQS_PER_IDX));
> -}
> -
> -static u32 hwirq_to_group(unsigned long hwirq)
> -{
> - return (hwirq % NR_HW_IRQS);
> -}
> -
> -static u32 hwirq_to_msi_data(unsigned long hwirq)
> -{
> - return ((hwirq / NR_HW_IRQS) % IRQS_PER_IDX);
> + return (FIELD_PREP(BIT(7), FIELD_GET(BIT(3), frame)) |
> + FIELD_PREP(MSInRx_HWIRQ_MASK, index) |
> + FIELD_PREP(DATA_HWIRQ_MASK, data));
> }
>
> static void xgene_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> {
> struct xgene_msi *msi = irq_data_get_irq_chip_data(data);
> - u32 reg_set = hwirq_to_reg_set(data->hwirq);
> - u32 group = hwirq_to_group(data->hwirq);
> - u64 target_addr = msi->msi_addr + (((8 * group) + reg_set) << 16);
> + u64 target_addr;
> + u32 frame, msir;
> + int cpu;
>
> - msg->address_hi = upper_32_bits(target_addr);
> - msg->address_lo = lower_32_bits(target_addr);
> - msg->data = hwirq_to_msi_data(data->hwirq);
> -}
> + cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> + msir = FIELD_GET(GENMASK(6, 4), data->hwirq);
We could use MSInRx_HWIRQ_MASK; I can update it.
More importantly, what code would set data->hwirq[6:4] (and
data->hwirq[7:7] below) ?
> + frame = FIELD_PREP(BIT(3), FIELD_GET(BIT(7), data->hwirq)) | cpu;
>
> -/*
> - * X-Gene v1 only has 16 MSI GIC IRQs for 2048 MSI vectors. To maintain
> - * the expected behaviour of .set_affinity for each MSI interrupt, the 16
> - * MSI GIC IRQs are statically allocated to 8 X-Gene v1 cores (2 GIC IRQs
> - * for each core). The MSI vector is moved from 1 MSI GIC IRQ to another
> - * MSI GIC IRQ to steer its MSI interrupt to correct X-Gene v1 core. As a
> - * consequence, the total MSI vectors that X-Gene v1 supports will be
> - * reduced to 256 (2048/8) vectors.
> - */
> -static int hwirq_to_cpu(unsigned long hwirq)
> -{
> - return (hwirq % num_possible_cpus());
> -}
> + target_addr = msi->msi_addr;
> + target_addr += (FIELD_PREP(MSI_GROUP_MASK, frame) |
> + FIELD_PREP(MSI_INTR_MASK, msir));
>
> -static unsigned long hwirq_to_canonical_hwirq(unsigned long hwirq)
> -{
> - return (hwirq - hwirq_to_cpu(hwirq));
> + msg->address_hi = upper_32_bits(target_addr);
> + msg->address_lo = lower_32_bits(target_addr);
> + msg->data = FIELD_GET(DATA_HWIRQ_MASK, data->hwirq);
> }
>
> static int xgene_msi_set_affinity(struct irq_data *irqdata,
> const struct cpumask *mask, bool force)
> {
> int target_cpu = cpumask_first(mask);
> - int curr_cpu;
> -
> - curr_cpu = hwirq_to_cpu(irqdata->hwirq);
> - if (curr_cpu == target_cpu)
> - return IRQ_SET_MASK_OK_DONE;
>
> - /* Update MSI number to target the new CPU */
> - irqdata->hwirq = hwirq_to_canonical_hwirq(irqdata->hwirq) + target_cpu;
> + irq_data_update_effective_affinity(irqdata, cpumask_of(target_cpu));
>
> + /* Force the core code to regenerate the message */
> return IRQ_SET_MASK_OK;
> }
>
> @@ -173,23 +167,20 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
> unsigned int nr_irqs, void *args)
> {
> struct xgene_msi *msi = domain->host_data;
> - int msi_irq;
> + irq_hw_number_t hwirq;
>
> mutex_lock(&msi->bitmap_lock);
>
> - msi_irq = bitmap_find_next_zero_area(msi->bitmap, NR_MSI_VEC, 0,
> - num_possible_cpus(), 0);
> - if (msi_irq < NR_MSI_VEC)
> - bitmap_set(msi->bitmap, msi_irq, num_possible_cpus());
> - else
> - msi_irq = -ENOSPC;
> + hwirq = find_first_zero_bit(msi->bitmap, NR_MSI_VEC);
> + if (hwirq < NR_MSI_VEC)
> + set_bit(hwirq, msi->bitmap);
>
> mutex_unlock(&msi->bitmap_lock);
>
> - if (msi_irq < 0)
> - return msi_irq;
> + if (hwirq >= NR_MSI_VEC)
> + return -ENOSPC;
>
> - irq_domain_set_info(domain, virq, msi_irq,
> + irq_domain_set_info(domain, virq, hwirq,
> &xgene_msi_bottom_irq_chip, domain->host_data,
> handle_simple_irq, NULL, NULL);
This is something I don't get. We alloc an MSI, set a bit in the bitmap
and the hwirq to that value, when we handle the IRQ below in
xgene_msi_isr()
hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
ret = generic_handle_domain_irq(xgene_msi->inner_domain, hwirq);
imagining that we changed the affinity for the IRQ so that the computed
HWIRQ does not have zeros in bits[7:4], how would the domain HWIRQ
matching work ?
Actually, how would an IRQ fire causing the hwirq[7:4] bits to be != 0
in the first place ?
Forgive me if I am missing something obvious; the *current* MSI handling
is very hard to grok, so it is quite possible I misunderstood it entirely.
Thanks,
Lorenzo
> @@ -201,12 +192,10 @@ static void xgene_irq_domain_free(struct irq_domain *domain,
> {
> struct irq_data *d = irq_domain_get_irq_data(domain, virq);
> struct xgene_msi *msi = irq_data_get_irq_chip_data(d);
> - u32 hwirq;
>
> mutex_lock(&msi->bitmap_lock);
>
> - hwirq = hwirq_to_canonical_hwirq(d->hwirq);
> - bitmap_clear(msi->bitmap, hwirq, num_possible_cpus());
> + clear_bit(d->hwirq, msi->bitmap);
>
> mutex_unlock(&msi->bitmap_lock);
>
> @@ -263,55 +252,30 @@ static void xgene_msi_isr(struct irq_desc *desc)
> unsigned int *irqp = irq_desc_get_handler_data(desc);
> struct irq_chip *chip = irq_desc_get_chip(desc);
> struct xgene_msi *xgene_msi = xgene_msi_ctrl;
> - int msir_index, msir_val, hw_irq, ret;
> - u32 intr_index, grp_select, msi_grp;
> + unsigned long grp_pending;
> + int msir_idx;
> + u32 msi_grp;
>
> chained_irq_enter(chip, desc);
>
> msi_grp = irqp - xgene_msi->gic_irq;
>
> - /*
> - * MSIINTn (n is 0..F) indicates if there is a pending MSI interrupt
> - * If bit x of this register is set (x is 0..7), one or more interrupts
> - * corresponding to MSInIRx is set.
> - */
> - grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
> - while (grp_select) {
> - msir_index = ffs(grp_select) - 1;
> - /*
> - * Calculate MSInIRx address to read to check for interrupts
> - * (refer to termination address and data assignment
> - * described in xgene_compose_msi_msg() )
> - */
> - msir_val = xgene_msi_ir_read(xgene_msi, msi_grp, msir_index);
> - while (msir_val) {
> - intr_index = ffs(msir_val) - 1;
> - /*
> - * Calculate MSI vector number (refer to the termination
> - * address and data assignment described in
> - * xgene_compose_msi_msg function)
> - */
> - hw_irq = (((msir_index * IRQS_PER_IDX) + intr_index) *
> - NR_HW_IRQS) + msi_grp;
> - /*
> - * As we have multiple hw_irq that maps to single MSI,
> - * always look up the virq using the hw_irq as seen from
> - * CPU0
> - */
> - hw_irq = hwirq_to_canonical_hwirq(hw_irq);
> - ret = generic_handle_domain_irq(xgene_msi->inner_domain, hw_irq);
> + grp_pending = xgene_msi_int_read(xgene_msi, msi_grp);
> +
> + for_each_set_bit(msir_idx, &grp_pending, IDX_PER_GROUP) {
> + unsigned long msir;
> + int intr_idx;
> +
> + msir = xgene_msi_ir_read(xgene_msi, msi_grp, msir_idx);
> +
> + for_each_set_bit(intr_idx, &msir, IRQS_PER_IDX) {
> + irq_hw_number_t hwirq;
> + int ret;
> +
> + hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
> + ret = generic_handle_domain_irq(xgene_msi->inner_domain,
> + hwirq);
> WARN_ON_ONCE(ret);
> - msir_val &= ~(1 << intr_index);
> - }
> - grp_select &= ~(1 << msir_index);
> -
> - if (!grp_select) {
> - /*
> - * We handled all interrupts happened in this group,
> - * resample this group MSI_INTx register in case
> - * something else has been made pending in the meantime
> - */
> - grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
> }
> }
>
> --
> 2.39.2
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
2025-07-11 9:55 ` Lorenzo Pieralisi
@ 2025-07-11 10:11 ` Lorenzo Pieralisi
2025-07-11 10:51 ` Marc Zyngier
2025-07-11 10:50 ` Marc Zyngier
1 sibling, 1 reply; 23+ messages in thread
From: Lorenzo Pieralisi @ 2025-07-11 10:11 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-pci, linux-arm-kernel, linux-kernel, Toan Le,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas, Thomas Gleixner
On Fri, Jul 11, 2025 at 11:55:17AM +0200, Lorenzo Pieralisi wrote:
> On Tue, Jul 08, 2025 at 06:34:00PM +0100, Marc Zyngier wrote:
[...]
> We could use MSInRx_HWIRQ_MASK, I can update it.
>
> More importantly, what code would set data->hwirq[6:4] (and
> data->hwirq[7:7] below) ?
Forget it. It is the hwirq allocation itself that sets those bits:
with 256 HWIRQs, you use the effective CPU affinity to steer the frame.
It makes sense now.
I can update the code to use the mask above and merge it.
Sorry for the noise,
Lorenzo
> > + frame = FIELD_PREP(BIT(3), FIELD_GET(BIT(7), data->hwirq)) | cpu;
> >
> > -/*
> > - * X-Gene v1 only has 16 MSI GIC IRQs for 2048 MSI vectors. To maintain
> > - * the expected behaviour of .set_affinity for each MSI interrupt, the 16
> > - * MSI GIC IRQs are statically allocated to 8 X-Gene v1 cores (2 GIC IRQs
> > - * for each core). The MSI vector is moved from 1 MSI GIC IRQ to another
> > - * MSI GIC IRQ to steer its MSI interrupt to correct X-Gene v1 core. As a
> > - * consequence, the total MSI vectors that X-Gene v1 supports will be
> > - * reduced to 256 (2048/8) vectors.
> > - */
> > -static int hwirq_to_cpu(unsigned long hwirq)
> > -{
> > - return (hwirq % num_possible_cpus());
> > -}
> > + target_addr = msi->msi_addr;
> > + target_addr += (FIELD_PREP(MSI_GROUP_MASK, frame) |
> > + FIELD_PREP(MSI_INTR_MASK, msir));
> >
> > -static unsigned long hwirq_to_canonical_hwirq(unsigned long hwirq)
> > -{
> > - return (hwirq - hwirq_to_cpu(hwirq));
> > + msg->address_hi = upper_32_bits(target_addr);
> > + msg->address_lo = lower_32_bits(target_addr);
> > + msg->data = FIELD_GET(DATA_HWIRQ_MASK, data->hwirq);
> > }
> >
> > static int xgene_msi_set_affinity(struct irq_data *irqdata,
> > const struct cpumask *mask, bool force)
> > {
> > int target_cpu = cpumask_first(mask);
> > - int curr_cpu;
> > -
> > - curr_cpu = hwirq_to_cpu(irqdata->hwirq);
> > - if (curr_cpu == target_cpu)
> > - return IRQ_SET_MASK_OK_DONE;
> >
> > - /* Update MSI number to target the new CPU */
> > - irqdata->hwirq = hwirq_to_canonical_hwirq(irqdata->hwirq) + target_cpu;
> > + irq_data_update_effective_affinity(irqdata, cpumask_of(target_cpu));
> >
> > + /* Force the core code to regenerate the message */
> > return IRQ_SET_MASK_OK;
> > }
> >
> > @@ -173,23 +167,20 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
> > unsigned int nr_irqs, void *args)
> > {
> > struct xgene_msi *msi = domain->host_data;
> > - int msi_irq;
> > + irq_hw_number_t hwirq;
> >
> > mutex_lock(&msi->bitmap_lock);
> >
> > - msi_irq = bitmap_find_next_zero_area(msi->bitmap, NR_MSI_VEC, 0,
> > - num_possible_cpus(), 0);
> > - if (msi_irq < NR_MSI_VEC)
> > - bitmap_set(msi->bitmap, msi_irq, num_possible_cpus());
> > - else
> > - msi_irq = -ENOSPC;
> > + hwirq = find_first_zero_bit(msi->bitmap, NR_MSI_VEC);
> > + if (hwirq < NR_MSI_VEC)
> > + set_bit(hwirq, msi->bitmap);
> >
> > mutex_unlock(&msi->bitmap_lock);
> >
> > - if (msi_irq < 0)
> > - return msi_irq;
> > + if (hwirq >= NR_MSI_VEC)
> > + return -ENOSPC;
> >
> > - irq_domain_set_info(domain, virq, msi_irq,
> > + irq_domain_set_info(domain, virq, hwirq,
> > &xgene_msi_bottom_irq_chip, domain->host_data,
> > handle_simple_irq, NULL, NULL);
>
> This is something I don't get. We alloc an MSI, set a bit in the bitmap
> and the hwirq to that value, when we handle the IRQ below in
>
> xgene_msi_isr()
>
> hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
> ret = generic_handle_domain_irq(xgene_msi->inner_domain, hwirq);
>
> imagining that we changed the affinity for the IRQ so that the computed
> HWIRQ does not have zeros in bits[7:4], how would the domain HWIRQ
> matching work ?
>
> Actually, how would an IRQ fire causing the hwirq[7:4] bits to be != 0
> in the first place ?
>
> Forgive me if I am missing something obvious, the *current* MSI handling
> is very hard to grok, it is certain I misunderstood it entirely.
>
> Thanks,
> Lorenzo
>
> > @@ -201,12 +192,10 @@ static void xgene_irq_domain_free(struct irq_domain *domain,
> > {
> > struct irq_data *d = irq_domain_get_irq_data(domain, virq);
> > struct xgene_msi *msi = irq_data_get_irq_chip_data(d);
> > - u32 hwirq;
> >
> > mutex_lock(&msi->bitmap_lock);
> >
> > - hwirq = hwirq_to_canonical_hwirq(d->hwirq);
> > - bitmap_clear(msi->bitmap, hwirq, num_possible_cpus());
> > + clear_bit(d->hwirq, msi->bitmap);
> >
> > mutex_unlock(&msi->bitmap_lock);
> >
> > @@ -263,55 +252,30 @@ static void xgene_msi_isr(struct irq_desc *desc)
> > unsigned int *irqp = irq_desc_get_handler_data(desc);
> > struct irq_chip *chip = irq_desc_get_chip(desc);
> > struct xgene_msi *xgene_msi = xgene_msi_ctrl;
> > - int msir_index, msir_val, hw_irq, ret;
> > - u32 intr_index, grp_select, msi_grp;
> > + unsigned long grp_pending;
> > + int msir_idx;
> > + u32 msi_grp;
> >
> > chained_irq_enter(chip, desc);
> >
> > msi_grp = irqp - xgene_msi->gic_irq;
> >
> > - /*
> > - * MSIINTn (n is 0..F) indicates if there is a pending MSI interrupt
> > - * If bit x of this register is set (x is 0..7), one or more interrupts
> > - * corresponding to MSInIRx is set.
> > - */
> > - grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
> > - while (grp_select) {
> > - msir_index = ffs(grp_select) - 1;
> > - /*
> > - * Calculate MSInIRx address to read to check for interrupts
> > - * (refer to termination address and data assignment
> > - * described in xgene_compose_msi_msg() )
> > - */
> > - msir_val = xgene_msi_ir_read(xgene_msi, msi_grp, msir_index);
> > - while (msir_val) {
> > - intr_index = ffs(msir_val) - 1;
> > - /*
> > - * Calculate MSI vector number (refer to the termination
> > - * address and data assignment described in
> > - * xgene_compose_msi_msg function)
> > - */
> > - hw_irq = (((msir_index * IRQS_PER_IDX) + intr_index) *
> > - NR_HW_IRQS) + msi_grp;
> > - /*
> > - * As we have multiple hw_irq that maps to single MSI,
> > - * always look up the virq using the hw_irq as seen from
> > - * CPU0
> > - */
> > - hw_irq = hwirq_to_canonical_hwirq(hw_irq);
> > - ret = generic_handle_domain_irq(xgene_msi->inner_domain, hw_irq);
> > + grp_pending = xgene_msi_int_read(xgene_msi, msi_grp);
> > +
> > + for_each_set_bit(msir_idx, &grp_pending, IDX_PER_GROUP) {
> > + unsigned long msir;
> > + int intr_idx;
> > +
> > + msir = xgene_msi_ir_read(xgene_msi, msi_grp, msir_idx);
> > +
> > + for_each_set_bit(intr_idx, &msir, IRQS_PER_IDX) {
> > + irq_hw_number_t hwirq;
> > + int ret;
> > +
> > + hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
> > + ret = generic_handle_domain_irq(xgene_msi->inner_domain,
> > + hwirq);
> > WARN_ON_ONCE(ret);
> > - msir_val &= ~(1 << intr_index);
> > - }
> > - grp_select &= ~(1 << msir_index);
> > -
> > - if (!grp_select) {
> > - /*
> > - * We handled all interrupts happened in this group,
> > - * resample this group MSI_INTx register in case
> > - * something else has been made pending in the meantime
> > - */
> > - grp_select = xgene_msi_int_read(xgene_msi, msi_grp);
> > }
> > }
> >
> > --
> > 2.39.2
> >
* Re: [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
2025-07-11 9:55 ` Lorenzo Pieralisi
2025-07-11 10:11 ` Lorenzo Pieralisi
@ 2025-07-11 10:50 ` Marc Zyngier
1 sibling, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-11 10:50 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: linux-pci, linux-arm-kernel, linux-kernel, Toan Le,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas, Thomas Gleixner
On Fri, 11 Jul 2025 10:55:17 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> On Tue, Jul 08, 2025 at 06:34:00PM +0100, Marc Zyngier wrote:
> > static void xgene_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> > {
> > struct xgene_msi *msi = irq_data_get_irq_chip_data(data);
> > - u32 reg_set = hwirq_to_reg_set(data->hwirq);
> > - u32 group = hwirq_to_group(data->hwirq);
> > - u64 target_addr = msi->msi_addr + (((8 * group) + reg_set) << 16);
> > + u64 target_addr;
> > + u32 frame, msir;
> > + int cpu;
> >
> > - msg->address_hi = upper_32_bits(target_addr);
> > - msg->address_lo = lower_32_bits(target_addr);
> > - msg->data = hwirq_to_msi_data(data->hwirq);
> > -}
> > + cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
> > + msir = FIELD_GET(GENMASK(6, 4), data->hwirq);
>
> We could use MSInRx_HWIRQ_MASK, I can update it.
Yes, I appear to have missed that one. It'd be good if you could fix
it up.
> More importantly, what code would set data->hwirq[6:4] (and
> data->hwirq[7:7] below) ?
That's obviously driven by the hwirq allocation, which is guaranteed
to happen in the 0:255 range.
[...]
> > @@ -173,23 +167,20 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
> > unsigned int nr_irqs, void *args)
> > {
> > struct xgene_msi *msi = domain->host_data;
> > - int msi_irq;
> > + irq_hw_number_t hwirq;
> >
> > mutex_lock(&msi->bitmap_lock);
> >
> > - msi_irq = bitmap_find_next_zero_area(msi->bitmap, NR_MSI_VEC, 0,
> > - num_possible_cpus(), 0);
> > - if (msi_irq < NR_MSI_VEC)
> > - bitmap_set(msi->bitmap, msi_irq, num_possible_cpus());
> > - else
> > - msi_irq = -ENOSPC;
> > + hwirq = find_first_zero_bit(msi->bitmap, NR_MSI_VEC);
> > + if (hwirq < NR_MSI_VEC)
> > + set_bit(hwirq, msi->bitmap);
> >
> > mutex_unlock(&msi->bitmap_lock);
> >
> > - if (msi_irq < 0)
> > - return msi_irq;
> > + if (hwirq >= NR_MSI_VEC)
> > + return -ENOSPC;
> >
> > - irq_domain_set_info(domain, virq, msi_irq,
> > + irq_domain_set_info(domain, virq, hwirq,
> > &xgene_msi_bottom_irq_chip, domain->host_data,
> > handle_simple_irq, NULL, NULL);
>
> This is something I don't get. We alloc an MSI, set a bit in the bitmap
> and the hwirq to that value, when we handle the IRQ below in
>
> xgene_msi_isr()
>
> hwirq = compute_hwirq(msi_grp, msir_idx, intr_idx);
> ret = generic_handle_domain_irq(xgene_msi->inner_domain, hwirq);
>
> imagining that we changed the affinity for the IRQ so that the computed
> HWIRQ does not have zeros in bits[7:4], how would the domain HWIRQ
> matching work ?
No. The whole point of this series is that hwirq is now *constant*, no
matter what the affinity says.
> Actually, how would an IRQ fire causing the hwirq[7:4] bits to be != 0
> in the first place ?
hwirq[7:4] is a function of what IRQ fired (giving you a register
frame), and what register was accessed. That's what allows you to
*rebuild* hwirq, as the HW doesn't give it to you on a plate (only the
bottom 4 bits are more or less given as a bitmap).
> Forgive me if I am missing something obvious, the *current* MSI handling
> is very hard to grok, it is certain I misunderstood it entirely.
Ignore the current code, it does things that are completely forbidden
by law in any civilised country (although I find it difficult to name
a single civilised country these days).
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
2025-07-11 10:11 ` Lorenzo Pieralisi
@ 2025-07-11 10:51 ` Marc Zyngier
0 siblings, 0 replies; 23+ messages in thread
From: Marc Zyngier @ 2025-07-11 10:51 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: linux-pci, linux-arm-kernel, linux-kernel, Toan Le,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas, Thomas Gleixner
On Fri, 11 Jul 2025 11:11:02 +0100,
Lorenzo Pieralisi <lpieralisi@kernel.org> wrote:
>
> On Fri, Jul 11, 2025 at 11:55:17AM +0200, Lorenzo Pieralisi wrote:
> > On Tue, Jul 08, 2025 at 06:34:00PM +0100, Marc Zyngier wrote:
>
> [...]
>
> > We could use MSInRx_HWIRQ_MASK, I can update it.
> >
> > More importantly, what code would set data->hwirq[6:4] (and
> > data->hwirq[7:7] below) ?
>
> Forget it. It is the hwirq allocation itself that sets those bits:
> with 256 HWIRQs, you use the effective CPU affinity to steer the frame.
> It makes sense now.
Exactly.
>
> I can update the code to use the mask above and merge it.
>
> Sorry for the noise,
No worries, and thanks for looking at it!
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD
2025-07-08 17:34 ` [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD Marc Zyngier
@ 2025-07-11 13:15 ` Lorenzo Pieralisi
0 siblings, 0 replies; 23+ messages in thread
From: Lorenzo Pieralisi @ 2025-07-11 13:15 UTC (permalink / raw)
To: Marc Zyngier, Thomas Gleixner
Cc: linux-pci, linux-arm-kernel, linux-kernel, Toan Le,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas
On Tue, Jul 08, 2025 at 06:34:04PM +0100, Marc Zyngier wrote:
> Now that the XGene MSI driver has been mostly rewritten and doesn't
> use the CPU hotplug infrastructure, CPUHP_PCI_XGENE_DEAD is unused.
>
> Remove it to reduce the size of cpuhp_hp_states[].
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> include/linux/cpuhotplug.h | 1 -
> 1 file changed, 1 deletion(-)
Hi Thomas,
I would like to upstream this patch through the PCI tree; please let me know if that's OK.
Thanks,
Lorenzo
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index df366ee15456b..eaca70eb6136b 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -90,7 +90,6 @@ enum cpuhp_state {
> CPUHP_RADIX_DEAD,
> CPUHP_PAGE_ALLOC,
> CPUHP_NET_DEV_DEAD,
> - CPUHP_PCI_XGENE_DEAD,
> CPUHP_IOMMU_IOVA_DEAD,
> CPUHP_AP_ARM_CACHE_B15_RAC_DEAD,
> CPUHP_PADATA_DEAD,
> --
> 2.39.2
>
* Re: [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
` (12 preceding siblings ...)
2025-07-08 17:34 ` [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD Marc Zyngier
@ 2025-07-17 9:52 ` Lorenzo Pieralisi
13 siblings, 0 replies; 23+ messages in thread
From: Lorenzo Pieralisi @ 2025-07-17 9:52 UTC (permalink / raw)
To: linux-pci, linux-arm-kernel, linux-kernel, Marc Zyngier
Cc: Lorenzo Pieralisi, Toan Le, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Thomas Gleixner
On Tue, 08 Jul 2025 18:33:51 +0100, Marc Zyngier wrote:
> Having recently dipped into the xgene-msi driver to bring it to use
> the MSI-parent concept, I have realised that some of it was slightly
> sub-par (read: downright broken).
>
> The driver is playing horrible tricks behind the core code, missing
> proper affinity management, is terribly over-designed for no good
> reason, and despite what MAINTAINERS says, completely unmaintained.
>
> [...]
Applied to controller/xgene, thanks!
[01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt
https://git.kernel.org/pci/pci/c/fad9efd72b5f
[02/13] PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet
https://git.kernel.org/pci/pci/c/b3ffac51b6a7
[03/13] PCI: xgene: Drop useless conditional compilation
https://git.kernel.org/pci/pci/c/1ded8cc14884
[04/13] PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN
https://git.kernel.org/pci/pci/c/14a347069f71
[05/13] PCI: xgene-msi: Make per-CPU interrupt setup robust
https://git.kernel.org/pci/pci/c/84cb4108bf6f
[06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure
https://git.kernel.org/pci/pci/c/61aeded55e5b
[07/13] PCI: xgene-msi: Use device-managed memory allocations
https://git.kernel.org/pci/pci/c/a782a50689d6
[08/13] PCI: xgene-msi: Get rid of intermediate tracking structure
https://git.kernel.org/pci/pci/c/9a7ca398f2d4
[09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting
https://git.kernel.org/pci/pci/c/7112647f6d19
[10/13] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
https://git.kernel.org/pci/pci/c/b43fa1b0691b
[11/13] PCI: xgene-msi: Probe as a standard platform driver
https://git.kernel.org/pci/pci/c/a435be2c3318
[12/13] PCI: xgene-msi: Restructure handler setup/teardown
https://git.kernel.org/pci/pci/c/f7d447633452
[13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD
https://git.kernel.org/pci/pci/c/8db22d697c52
Thanks,
Lorenzo
* Re: [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure
2025-07-08 17:33 ` [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure Marc Zyngier
@ 2025-07-17 11:24 ` Markus Elfring
0 siblings, 0 replies; 23+ messages in thread
From: Markus Elfring @ 2025-07-17 11:24 UTC (permalink / raw)
To: Marc Zyngier, linux-pci, linux-arm-kernel
Cc: LKML, Bjorn Helgaas, Krzysztof Wilczyński, Lorenzo Pieralisi,
Manivannan Sadhasivam, Rob Herring, Thomas Gleixner, Toan Le
…
> +++ b/drivers/pci/controller/pci-xgene-msi.c
…
> @@ -214,7 +212,7 @@ static void xgene_irq_domain_free(struct irq_domain *domain,
> mutex_lock(&msi->bitmap_lock);
>
> hwirq = hwirq_to_canonical_hwirq(d->hwirq);
> - bitmap_clear(msi->bitmap, hwirq, msi->num_cpus);
> + bitmap_clear(msi->bitmap, hwirq, num_possible_cpus());
>
> mutex_unlock(&msi->bitmap_lock);
…
Under which circumstances would you become interested to apply a statement
like “guard(mutex)(&msi->bitmap_lock);”?
https://elixir.bootlin.com/linux/v6.16-rc6/source/include/linux/mutex.h#L225
Regards,
Markus
* Re: [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver
2025-07-08 17:34 ` [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver Marc Zyngier
@ 2025-07-17 11:45 ` Markus Elfring
2025-07-17 13:18 ` Lorenzo Pieralisi
0 siblings, 1 reply; 23+ messages in thread
From: Markus Elfring @ 2025-07-17 11:45 UTC (permalink / raw)
To: Marc Zyngier, linux-pci, linux-arm-kernel
Cc: LKML, Bjorn Helgaas, Krzysztof Wilczyński, Lorenzo Pieralisi,
Manivannan Sadhasivam, Rob Herring, Thomas Gleixner, Toan Le
> Now that we have made the dependedncy between the PCI driver and
…
dependency?
Regards,
Markus
* Re: [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver
2025-07-17 11:45 ` Markus Elfring
@ 2025-07-17 13:18 ` Lorenzo Pieralisi
0 siblings, 0 replies; 23+ messages in thread
From: Lorenzo Pieralisi @ 2025-07-17 13:18 UTC (permalink / raw)
To: Markus Elfring
Cc: Marc Zyngier, linux-pci, linux-arm-kernel, LKML, Bjorn Helgaas,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Thomas Gleixner, Toan Le
On Thu, Jul 17, 2025 at 01:45:12PM +0200, Markus Elfring wrote:
> > Now that we have made the dependedncy between the PCI driver and
> …
>
> dependency?
https://git.kernel.org/pci/pci/c/a435be2c3318
Thanks,
Lorenzo
end of thread, other threads:[~2025-07-17 13:18 UTC | newest]
Thread overview: 23+ messages
2025-07-08 17:33 [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 01/13] genirq: Teach handle_simple_irq() to resend an in-progress interrupt Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 02/13] PCI: xgene: Defer probing if the MSI widget driver hasn't probed yet Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 03/13] PCI: xgene: Drop useless conditional compilation Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 04/13] PCI: xgene: Drop XGENE_PCIE_IP_VER_UNKN Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 05/13] PCI: xgene-msi: Make per-CPU interrupt setup robust Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 06/13] PCI: xgene-msi: Drop superfluous fields from xgene_msi structure Marc Zyngier
2025-07-17 11:24 ` Markus Elfring
2025-07-08 17:33 ` [PATCH v2 07/13] PCI: xgene-msi: Use device-managed memory allocations Marc Zyngier
2025-07-08 17:33 ` [PATCH v2 08/13] PCI: xgene-msi: Get rid of intermediate tracking structure Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 09/13] PCI: xgene-msi: Sanitise MSI allocation and affinity setting Marc Zyngier
2025-07-11 9:55 ` Lorenzo Pieralisi
2025-07-11 10:11 ` Lorenzo Pieralisi
2025-07-11 10:51 ` Marc Zyngier
2025-07-11 10:50 ` Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 10/13] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 11/13] PCI: xgene-msi: Probe as a standard platform driver Marc Zyngier
2025-07-17 11:45 ` Markus Elfring
2025-07-17 13:18 ` Lorenzo Pieralisi
2025-07-08 17:34 ` [PATCH v2 12/13] PCI: xgene-msi: Restructure handler setup/teardown Marc Zyngier
2025-07-08 17:34 ` [PATCH v2 13/13] cpu/hotplug: Remove unused cpuhp_state CPUHP_PCI_XGENE_DEAD Marc Zyngier
2025-07-11 13:15 ` Lorenzo Pieralisi
2025-07-17 9:52 ` [PATCH v2 00/13] PCI: xgene: Fix and simplify the MSI driver Lorenzo Pieralisi