* [PATCH v2 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API
@ 2025-11-17 8:59 Tobias Schumacher
2025-11-17 8:59 ` [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
2025-11-17 8:59 ` [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
0 siblings, 2 replies; 8+ messages in thread
From: Tobias Schumacher @ 2025-11-17 8:59 UTC (permalink / raw)
To: Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Niklas Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner
Cc: linux-kernel, linux-s390, Tobias Schumacher
This patch series reworks the PCIe interrupt handling on s390 by
migrating it to use a proper MSI parent domain. Introducing a dedicated
MSI domain hierarchy aligns s390 PCIe support with the generic Linux IRQ
domain model. Currently s390 is one of the last architectures still using
the legacy API.
The migration splits the existing code in the legacy functions
arch_setup_msi_irqs() and arch_teardown_msi_irqs() into different
callbacks of the newly created MSI parent domain:
- zpci_msi_prepare(): prepares the allocation of per-device MSI IRQs.
It is called once for each device before individual IRQs are
allocated and sets up, for example, the adapter aisb and aibv.
- zpci_msi_teardown(): reverts the effects of zpci_msi_prepare() and is
called after all MSI IRQs are freed.
- zpci_msi_domain_alloc(): the allocation function for interrupts
- zpci_msi_domain_free(): reverts the effects of zpci_msi_domain_alloc()
- zpci_compose_msi_msg(): create the MSI message to be written into the
corresponding PCI config space.
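As a rough illustration of what zpci_compose_msi_msg() produces, the
following user-space sketch mirrors the address and data split visible in
patch 2. The struct and function names (msi_msg_model, compose_msg) are
made up for this example, and the 16-bit CPU address width is an assumption
derived from the 0xff0000ff mask, which clears bits 8 to 23.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's struct msi_msg. */
struct msi_msg_model {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Model of the message layout: for directed delivery the target CPU
 * address is placed in bits 8..23 of the low address word; for floating
 * delivery the low word of msi_addr is used unchanged. The data word
 * carries the low 32 bits of the hwirq, which is the MSI index.
 */
static struct msi_msg_model compose_msg(uint64_t msi_addr, uint64_t hwirq,
					int directed, uint16_t cpu_addr)
{
	struct msi_msg_model msg;

	if (directed) {
		msg.address_lo = (uint32_t)(msi_addr & 0xff0000ff);
		msg.address_lo |= (uint32_t)cpu_addr << 8;
	} else {
		msg.address_lo = (uint32_t)(msi_addr & 0xffffffff);
	}
	msg.address_hi = (uint32_t)(msi_addr >> 32);
	msg.data = (uint32_t)(hwirq & 0xffffffff);
	return msg;
}
```

In this model the function ID in the upper half of the hwirq never reaches
the device; only the MSI index is written as message data.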
* Patch 1 fixes an inconsistency in the irqdomain API. Internally, hw
irqs are represented by an unsigned long int (irq_hw_number_t) while
the external API in some cases takes an unsigned int as parameter.
This must be fixed to allow for the hwirq encoding used for s390.
* Patch 2 implements IRQ domains for s390 PCI
Since patch 1 changes common APIs, some build tests were done for x86_64
and arm64.
Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
---
Changes in v2:
- fix directed interrupt setup and handling
- add flag MSI_FLAG_NO_AFFINITY in case of floating interrupts
- style adjustments according to review comments
- Link to v1: https://lore.kernel.org/r/20251112-implement-msi-domain-v1-0-103dd123de14@linux.ibm.com
---
Tobias Schumacher (2):
genirq: Change hwirq parameter to irq_hw_number_t
s390/pci: Migrate s390 IRQ logic to IRQ domain API
arch/s390/Kconfig | 1 +
arch/s390/include/asm/pci.h | 1 +
arch/s390/pci/pci_bus.c | 1 +
arch/s390/pci/pci_irq.c | 335 +++++++++++++++++++++++++++-----------------
include/linux/irqdesc.h | 6 +-
kernel/irq/irqdesc.c | 6 +-
6 files changed, 214 insertions(+), 136 deletions(-)
---
base-commit: 882489402b556fa6d916cba86051276fcc9a8953
change-id: 20251104-implement-msi-domain-dc1ea014580e
Best regards,
--
Tobias Schumacher <ts@linux.ibm.com>
* [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t
2025-11-17 8:59 [PATCH v2 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Tobias Schumacher
@ 2025-11-17 8:59 ` Tobias Schumacher
2025-11-17 11:26 ` Thomas Gleixner
2025-11-17 15:56 ` Niklas Schnelle
2025-11-17 8:59 ` [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
1 sibling, 2 replies; 8+ messages in thread
From: Tobias Schumacher @ 2025-11-17 8:59 UTC (permalink / raw)
To: Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Niklas Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner
Cc: linux-kernel, linux-s390, Tobias Schumacher
The irqdomain implementation internally represents hardware IRQs as
irq_hw_number_t, which is defined as unsigned long int. Passing an
irq_hw_number_t to the generic_handle_domain_*() functions, which
currently take an unsigned int hwirq, can truncate the value and lose
information. Change the hwirq parameter to irq_hw_number_t to support
the full range of hwirqs.
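The loss of information can be reproduced in user space with a minimal
sketch. It assumes a 64-bit unsigned long and a 32-bit unsigned int, as on
s390x; hwirq_t and both helper functions are stand-ins invented for this
example, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for irq_hw_number_t, fixed at 64 bits for this sketch. */
typedef uint64_t hwirq_t;

/* Models the old prototypes: the parameter is only an unsigned int. */
static hwirq_t pass_through_narrow(unsigned int hwirq)
{
	return hwirq;
}

/* Models the new prototypes: the full-width type is kept. */
static hwirq_t pass_through_wide(hwirq_t hwirq)
{
	return hwirq;
}
```

With a hwirq that has a value in its upper 32 bits, the narrow variant
returns only the low 32 bits, so two hwirqs that differ only in the upper
half become indistinguishable. The wide variant returns the value
unchanged.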
Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
---
include/linux/irqdesc.h | 6 +++---
kernel/irq/irqdesc.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index fd091c35d5721eee37a2fd3d5526559671d5048d..03b63aea73bb21ae1456910afa534d60f9cfa94d 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -183,9 +183,9 @@ int generic_handle_irq_safe(unsigned int irq);
* and handle the result interrupt number. Return -EINVAL if
* conversion failed.
*/
-int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq);
-int generic_handle_domain_irq_safe(struct irq_domain *domain, unsigned int hwirq);
-int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq);
+int generic_handle_domain_irq(struct irq_domain *domain, irq_hw_number_t hwirq);
+int generic_handle_domain_irq_safe(struct irq_domain *domain, irq_hw_number_t hwirq);
+int generic_handle_domain_nmi(struct irq_domain *domain, irq_hw_number_t hwirq);
#endif
/* Test to see if a driver has successfully requested an irq */
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index db714d3014b5f7b62403ea04b80331ec6b1dc642..0cd3198496bc0766c81c353c3ff80ea184793d6a 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -720,7 +720,7 @@ EXPORT_SYMBOL_GPL(generic_handle_irq_safe);
* This function must be called from an IRQ context with irq regs
* initialized.
*/
-int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq)
+int generic_handle_domain_irq(struct irq_domain *domain, irq_hw_number_t hwirq)
{
return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
}
@@ -738,7 +738,7 @@ EXPORT_SYMBOL_GPL(generic_handle_domain_irq);
* context). If the interrupt is marked as 'enforce IRQ-context only' then
* the function must be invoked from hard interrupt context.
*/
-int generic_handle_domain_irq_safe(struct irq_domain *domain, unsigned int hwirq)
+int generic_handle_domain_irq_safe(struct irq_domain *domain, irq_hw_number_t hwirq)
{
unsigned long flags;
int ret;
@@ -761,7 +761,7 @@ EXPORT_SYMBOL_GPL(generic_handle_domain_irq_safe);
* This function must be called from an NMI context with irq regs
* initialized.
**/
-int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq)
+int generic_handle_domain_nmi(struct irq_domain *domain, irq_hw_number_t hwirq)
{
WARN_ON_ONCE(!in_nmi());
return handle_irq_desc(irq_resolve_mapping(domain, hwirq));
--
2.48.1
* [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
2025-11-17 8:59 [PATCH v2 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Tobias Schumacher
2025-11-17 8:59 ` [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
@ 2025-11-17 8:59 ` Tobias Schumacher
2025-11-17 17:20 ` Niklas Schnelle
2025-11-17 22:46 ` Farhan Ali
1 sibling, 2 replies; 8+ messages in thread
From: Tobias Schumacher @ 2025-11-17 8:59 UTC (permalink / raw)
To: Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Niklas Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner
Cc: linux-kernel, linux-s390, Tobias Schumacher
s390 is one of the last architectures using the legacy API for setup and
teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
to the MSI parent domain API. For details, see:
https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
In detail, create an MSI parent domain for zpci which is used by
all PCI devices. When a PCI device sets up MSI or MSI-X IRQs, the
library creates a per-device IRQ domain for this device, which is
used by the device for allocating and freeing IRQs.
The per-device domain delegates this allocation and freeing to the
parent-domain. In the end, the corresponding callbacks of the parent
domain are responsible for allocating and freeing the IRQs.
The allocation is split into two parts:
- zpci_msi_prepare() is called once for each device and allocates the
required resources. On s390, each PCI function has its own airq
vector and a summary bit, which must be configured once per function.
This is done in prepare().
- zpci_msi_domain_alloc() can be called multiple times to allocate one
or more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
number in the kernel and the hardware IRQ number.
Freeing is split into two counterparts:
- zpci_msi_domain_free() reverts the effects of
zpci_msi_domain_alloc() and
- zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
called once when all IRQs are freed before a device is removed.
Since the parent domain in the end allocates the IRQs, the hwirq
encoding must be unambiguous for all IRQs of all devices. This is
achieved by encoding the hwirq using the PCI function id and the MSI
index.
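The encoding can be sketched in user space as follows. encode_hwirq() and
decode_msi_index() follow zpci_encode_hwirq() and
zpci_decode_hwirq_msi_index() from this patch; decode_fid() is a
hypothetical helper that the patch does not contain, and hwirq_t is a
64-bit stand-in for irq_hw_number_t.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for irq_hw_number_t, fixed at 64 bits for this sketch. */
typedef uint64_t hwirq_t;

/* Function ID in the upper 32 bits, MSI index in the low 16 bits. */
static inline hwirq_t encode_hwirq(uint32_t fid, uint16_t msi_index)
{
	return ((hwirq_t)fid << 32) | msi_index;
}

/* Recover the MSI index from an encoded hwirq. */
static inline uint16_t decode_msi_index(hwirq_t hwirq)
{
	return hwirq & 0xFFFF;
}

/* Hypothetical counterpart, shown only to illustrate the layout. */
static inline uint32_t decode_fid(hwirq_t hwirq)
{
	return (uint32_t)(hwirq >> 32);
}
```

Two functions that use the same MSI index get different hwirqs because
their function IDs differ, so the mapping in the shared parent domain
stays unambiguous.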
Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
---
arch/s390/Kconfig | 1 +
arch/s390/include/asm/pci.h | 1 +
arch/s390/pci/pci_bus.c | 1 +
arch/s390/pci/pci_irq.c | 335 +++++++++++++++++++++++++++-----------------
4 files changed, 208 insertions(+), 130 deletions(-)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index df22b10d91415e1ed183cc8add9ad0ac4293c50e..48cd6a12bd04dfe4dd61ecc79d3401ba685c51bb 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -251,6 +251,7 @@ config S390
select HOTPLUG_SMT
select IOMMU_HELPER if PCI
select IOMMU_SUPPORT if PCI
+ select IRQ_MSI_LIB if PCI
select KASAN_VMALLOC if KASAN
select LOCK_MM_AND_FIND_VMA
select MMU_GATHER_MERGE_VMAS
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index a32f465ecf73a5cc3408a312d94ec888d62848cc..462e87bdb7acdfa4e7df0f9ca8e82c269e1f98aa 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -310,6 +310,7 @@ int zpci_dma_exit_device(struct zpci_dev *zdev);
/* IRQ */
int __init zpci_irq_init(void);
void __init zpci_irq_exit(void);
+void zpci_set_msi_parent_domain(struct zpci_bus *zbus);
/* FMB */
int zpci_fmb_enable_device(struct zpci_dev *);
diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
index be8c697fea0cc755cfdb4fb0a9e3b95183bec0dc..2be33cfb8970409db4fcb75ea73543f49b583a5c 100644
--- a/arch/s390/pci/pci_bus.c
+++ b/arch/s390/pci/pci_bus.c
@@ -210,6 +210,7 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
}
zbus->bus = bus;
+ zpci_set_msi_parent_domain(zbus);
return 0;
}
diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..f2c2fc23d5693e211bcbfd94e8f2fe25dc71a4e2 100644
--- a/arch/s390/pci/pci_irq.c
+++ b/arch/s390/pci/pci_irq.c
@@ -7,6 +7,7 @@
#include <linux/kernel_stat.h>
#include <linux/pci.h>
#include <linux/msi.h>
+#include <linux/irqchip/irq-msi-lib.h>
#include <linux/smp.h>
#include <asm/isc.h>
@@ -29,6 +30,8 @@ static struct airq_iv *zpci_sbv;
*/
static struct airq_iv **zpci_ibv;
+static struct irq_domain *zpci_msi_parent_domain;
+
/* Modify PCI: Register floating adapter interruptions */
static int zpci_set_airq(struct zpci_dev *zdev)
{
@@ -110,43 +113,42 @@ static int zpci_set_irq(struct zpci_dev *zdev)
return rc;
}
-/* Clear adapter interruptions */
-static int zpci_clear_irq(struct zpci_dev *zdev)
+static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *dest,
+ bool force)
{
- int rc;
-
- if (irq_delivery == DIRECTED)
- rc = zpci_clear_directed_irq(zdev);
- else
- rc = zpci_clear_airq(zdev);
-
- return rc;
+ irq_data_update_affinity(data, dest);
+ return IRQ_SET_MASK_OK;
}
-static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *dest,
- bool force)
+static void zpci_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
{
- struct msi_desc *entry = irq_data_get_msi_desc(data);
- struct msi_msg msg = entry->msg;
- int cpu_addr = smp_cpu_get_cpu_address(cpumask_first(dest));
+ struct msi_desc *desc = irq_data_get_msi_desc(data);
+ struct zpci_dev *zdev = to_zpci_dev(desc->dev);
- msg.address_lo &= 0xff0000ff;
- msg.address_lo |= (cpu_addr << 8);
- pci_write_msi_msg(data->irq, &msg);
+ if (irq_delivery == DIRECTED) {
+ int cpu = cpumask_first(irq_data_get_affinity_mask(data));
- return IRQ_SET_MASK_OK;
+ msg->address_lo = zdev->msi_addr & 0xff0000ff;
+ msg->address_lo |= (smp_cpu_get_cpu_address(cpu) << 8);
+ } else {
+ msg->address_lo = zdev->msi_addr & 0xffffffff;
+ }
+ msg->address_hi = zdev->msi_addr >> 32;
+ msg->data = data->hwirq & 0xffffffff;
}
static struct irq_chip zpci_irq_chip = {
.name = "PCI-MSI",
.irq_unmask = pci_msi_unmask_irq,
.irq_mask = pci_msi_mask_irq,
+ .irq_compose_msi_msg = zpci_compose_msi_msg
};
static void zpci_handle_cpu_local_irq(bool rescan)
{
struct airq_iv *dibv = zpci_ibv[smp_processor_id()];
union zpci_sic_iib iib = {{0}};
+ irq_hw_number_t hwirq;
unsigned long bit;
int irqs_on = 0;
@@ -164,7 +166,8 @@ static void zpci_handle_cpu_local_irq(bool rescan)
continue;
}
inc_irq_stat(IRQIO_MSI);
- generic_handle_irq(airq_iv_get_data(dibv, bit));
+ hwirq = airq_iv_get_ptr(dibv, bit);
+ generic_handle_domain_irq(zpci_msi_parent_domain, hwirq);
}
}
@@ -229,6 +232,7 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
struct tpi_info *tpi_info)
{
union zpci_sic_iib iib = {{0}};
+ irq_hw_number_t hwirq;
unsigned long si, ai;
struct airq_iv *aibv;
int irqs_on = 0;
@@ -256,7 +260,9 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
break;
inc_irq_stat(IRQIO_MSI);
airq_iv_lock(aibv, ai);
- generic_handle_irq(airq_iv_get_data(aibv, ai));
+
+ hwirq = airq_iv_get_ptr(aibv, ai);
+ generic_handle_domain_irq(zpci_msi_parent_domain, hwirq);
airq_iv_unlock(aibv, ai);
}
}
@@ -278,7 +284,7 @@ static int __alloc_airq(struct zpci_dev *zdev, int msi_vecs,
zdev->aisb = *bit;
/* Create adapter interrupt vector */
- zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK, NULL);
+ zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_PTR | AIRQ_IV_BITLOCK, NULL);
if (!zdev->aibv)
return -ENOMEM;
@@ -290,146 +296,203 @@ static int __alloc_airq(struct zpci_dev *zdev, int msi_vecs,
return 0;
}
-int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
+static struct airq_struct zpci_airq = {
+ .handler = zpci_floating_irq_handler,
+ .isc = PCI_ISC,
+};
+
+/*
+ * Encode the hwirq number for the parent domain. The encoding must be unique
+ * for each IRQ of each device in the parent domain, so it uses the fid to
+ * identify the device and the msi_index to identify the IRQ within that device.
+ */
+static inline irq_hw_number_t zpci_encode_hwirq(u32 fid, u16 msi_index)
{
- unsigned int hwirq, msi_vecs, irqs_per_msi, i, cpu;
- struct zpci_dev *zdev = to_zpci(pdev);
- struct msi_desc *msi;
- struct msi_msg msg;
- unsigned long bit;
- int cpu_addr;
- int rc, irq;
+ return ((irq_hw_number_t)fid << 32) | msi_index;
+}
- zdev->aisb = -1UL;
- zdev->msi_first_bit = -1U;
+static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
+{
+ return irq & 0xFFFF;
+}
+
+static int zpci_msi_prepare(struct irq_domain *domain,
+ struct device *dev, int nvec,
+ msi_alloc_info_t *info)
+{
+ struct zpci_dev *zdev = to_zpci_dev(dev);
+ struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long bit;
+ int msi_vecs, rc;
msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
- if (msi_vecs < nvec) {
- pr_info("%s requested %d irqs, allocate system limit of %d",
+ if (msi_vecs < nvec)
+ pr_info("%s requested %d IRQs, allocate system limit of %d",
pci_name(pdev), nvec, zdev->max_msi);
- }
rc = __alloc_airq(zdev, msi_vecs, &bit);
- if (rc < 0)
+ if (rc) {
+ pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
return rc;
+ }
- /*
- * Request MSI interrupts:
- * When using MSI, nvec_used interrupt sources and their irq
- * descriptors are controlled through one msi descriptor.
- * Thus the outer loop over msi descriptors shall run only once,
- * while two inner loops iterate over the interrupt vectors.
- * When using MSI-X, each interrupt vector/irq descriptor
- * is bound to exactly one msi descriptor (nvec_used is one).
- * So the inner loops are executed once, while the outer iterates
- * over the MSI-X descriptors.
- */
- hwirq = bit;
- msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
- if (hwirq - bit >= msi_vecs)
- break;
- irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
- irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
- (irq_delivery == DIRECTED) ?
- msi->affinity : NULL);
- if (irq < 0)
- return -ENOMEM;
-
- for (i = 0; i < irqs_per_msi; i++) {
- rc = irq_set_msi_desc_off(irq, i, msi);
- if (rc)
- return rc;
- irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
- handle_percpu_irq);
- }
-
- msg.data = hwirq - bit;
+ zdev->msi_first_bit = bit;
+ zdev->msi_nr_irqs = msi_vecs;
+ rc = zpci_set_irq(zdev);
+ if (rc) {
+ pr_err("Registering adapter IRQs for %s failed\n",
+ pci_name(pdev));
if (irq_delivery == DIRECTED) {
- if (msi->affinity)
- cpu = cpumask_first(&msi->affinity->mask);
- else
- cpu = 0;
- cpu_addr = smp_cpu_get_cpu_address(cpu);
-
- msg.address_lo = zdev->msi_addr & 0xff0000ff;
- msg.address_lo |= (cpu_addr << 8);
-
- for_each_possible_cpu(cpu) {
- for (i = 0; i < irqs_per_msi; i++)
- airq_iv_set_data(zpci_ibv[cpu],
- hwirq + i, irq + i);
- }
+ airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
} else {
- msg.address_lo = zdev->msi_addr & 0xffffffff;
- for (i = 0; i < irqs_per_msi; i++)
- airq_iv_set_data(zdev->aibv, hwirq + i, irq + i);
+ zpci_clear_airq(zdev);
+ airq_iv_release(zdev->aibv);
+ zdev->aibv = NULL;
+ airq_iv_free_bit(zpci_sbv, zdev->aisb);
+ zdev->aisb = -1UL;
}
- msg.address_hi = zdev->msi_addr >> 32;
- pci_write_msi_msg(irq, &msg);
- hwirq += irqs_per_msi;
+ zdev->msi_first_bit = -1U;
+ return rc;
}
- zdev->msi_first_bit = bit;
- zdev->msi_nr_irqs = hwirq - bit;
+ return 0;
+}
- rc = zpci_set_irq(zdev);
- if (rc)
- return rc;
+static void zpci_msi_teardown_directed(struct zpci_dev *zdev)
+{
+ zpci_clear_directed_irq(zdev);
+ airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->max_msi);
+ zdev->msi_first_bit = -1U;
+}
- return (zdev->msi_nr_irqs == nvec) ? 0 : zdev->msi_nr_irqs;
+static void zpci_msi_teardown_floating(struct zpci_dev *zdev)
+{
+ zpci_clear_airq(zdev);
+ airq_iv_release(zdev->aibv);
+ zdev->aibv = NULL;
+ airq_iv_free_bit(zpci_sbv, zdev->aisb);
+ zdev->aisb = -1UL;
+ zdev->msi_first_bit = -1U;
}
-void arch_teardown_msi_irqs(struct pci_dev *pdev)
+static void zpci_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *arg)
{
- struct zpci_dev *zdev = to_zpci(pdev);
- struct msi_desc *msi;
- unsigned int i;
- int rc;
+ struct zpci_dev *zdev = to_zpci_dev(domain->dev);
- /* Disable interrupts */
- rc = zpci_clear_irq(zdev);
- if (rc)
- return;
+ if (irq_delivery == DIRECTED)
+ zpci_msi_teardown_directed(zdev);
+ else
+ zpci_msi_teardown_floating(zdev);
+}
- /* Release MSI interrupts */
- msi_for_each_desc(msi, &pdev->dev, MSI_DESC_ASSOCIATED) {
- for (i = 0; i < msi->nvec_used; i++) {
- irq_set_msi_desc(msi->irq + i, NULL);
- irq_free_desc(msi->irq + i);
+static int zpci_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs, void *args)
+{
+ struct msi_desc *desc = ((msi_alloc_info_t *)args)->desc;
+ struct zpci_dev *zdev = to_zpci_dev(desc->dev);
+ irq_hw_number_t hwirq;
+ unsigned long bit;
+ unsigned int cpu;
+ int i;
+
+ bit = zdev->msi_first_bit + desc->msi_index;
+ hwirq = zpci_encode_hwirq(zdev->fid, desc->msi_index);
+
+ if (desc->msi_index + nr_irqs > zdev->max_msi)
+ return -EINVAL;
+
+ for (i = 0; i < nr_irqs; i++) {
+ irq_domain_set_info(domain, virq + i, hwirq + i,
+ &zpci_irq_chip, zdev,
+ handle_percpu_irq, NULL, NULL);
+
+ if (irq_delivery == DIRECTED) {
+ for_each_possible_cpu(cpu) {
+ airq_iv_set_ptr(zpci_ibv[cpu],
+ bit + i, hwirq + i);
+ }
+
+ } else {
+ airq_iv_set_ptr(zdev->aibv, bit + i, hwirq + i);
}
- msi->msg.address_lo = 0;
- msi->msg.address_hi = 0;
- msi->msg.data = 0;
- msi->irq = 0;
}
- if (zdev->aisb != -1UL) {
- zpci_ibv[zdev->aisb] = NULL;
- airq_iv_free_bit(zpci_sbv, zdev->aisb);
- zdev->aisb = -1UL;
- }
- if (zdev->aibv) {
- airq_iv_release(zdev->aibv);
- zdev->aibv = NULL;
- }
+ return 0;
+}
- if ((irq_delivery == DIRECTED) && zdev->msi_first_bit != -1U)
- airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->msi_nr_irqs);
+static void zpci_msi_domain_free(struct irq_domain *domain, unsigned int virq,
+ unsigned int nr_irqs)
+{
+ irq_hw_number_t hwirq;
+ struct irq_data *d;
+ u16 msi_index;
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ d = irq_domain_get_irq_data(domain, virq + i);
+ hwirq = d->hwirq;
+ msi_index = zpci_decode_hwirq_msi_index(hwirq);
+ irq_domain_reset_irq_data(d);
+ }
}
-bool arch_restore_msi_irqs(struct pci_dev *pdev)
+static const struct irq_domain_ops zpci_msi_domain_ops = {
+ .alloc = zpci_msi_domain_alloc,
+ .free = zpci_msi_domain_free
+};
+
+static bool zpci_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+ struct irq_domain *real_parent,
+ struct msi_domain_info *info)
{
- struct zpci_dev *zdev = to_zpci(pdev);
+ if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+ return false;
+
+ info->ops->msi_prepare = zpci_msi_prepare;
+ info->ops->msi_teardown = zpci_msi_teardown;
- zpci_set_irq(zdev);
return true;
}
-static struct airq_struct zpci_airq = {
- .handler = zpci_floating_irq_handler,
- .isc = PCI_ISC,
+static struct msi_parent_ops zpci_msi_parent_ops = {
+ .supported_flags = MSI_GENERIC_FLAGS_MASK |
+ MSI_FLAG_PCI_MSIX |
+ MSI_FLAG_MULTI_PCI_MSI,
+ .required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
+ MSI_FLAG_USE_DEF_CHIP_OPS |
+ MSI_FLAG_PCI_MSI_MASK_PARENT,
+ .init_dev_msi_info = zpci_init_dev_msi_info
};
+static int __init zpci_create_parent_msi_domain(void)
+{
+ struct irq_domain_info info = {
+ .fwnode = irq_domain_alloc_named_fwnode("zpci_msi"),
+ .ops = &zpci_msi_domain_ops
+ };
+
+ if (!info.fwnode) {
+ pr_err("Failed to allocate fwnode for MSI IRQ domain\n");
+ return -ENOMEM;
+ }
+
+ if (irq_delivery == FLOATING)
+ zpci_msi_parent_ops.required_flags |= MSI_FLAG_NO_AFFINITY;
+ zpci_msi_parent_domain = msi_create_parent_irq_domain(&info, &zpci_msi_parent_ops);
+ if (!zpci_msi_parent_domain) {
+ irq_domain_free_fwnode(info.fwnode);
+ pr_err("Failed to create MSI IRQ domain\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+void zpci_set_msi_parent_domain(struct zpci_bus *zbus)
+{
+ dev_set_msi_domain(&zbus->bus->dev, zpci_msi_parent_domain);
+}
+
static void __init cpu_enable_directed_irq(void *unused)
{
union zpci_sic_iib iib = {{0}};
@@ -466,7 +529,7 @@ static int __init zpci_directed_irq_init(void)
* is only done on the first vector.
*/
zpci_ibv[cpu] = airq_iv_create(cache_line_size() * BITS_PER_BYTE,
- AIRQ_IV_DATA |
+ AIRQ_IV_PTR |
AIRQ_IV_CACHELINE |
(!cpu ? AIRQ_IV_ALLOC : 0), NULL);
if (!zpci_ibv[cpu])
@@ -511,6 +574,11 @@ int __init zpci_irq_init(void)
rc = register_adapter_interrupt(&zpci_airq);
if (rc)
goto out;
+
+ zpci_create_parent_msi_domain();
+ if (!zpci_msi_parent_domain)
+ goto out_airq;
+
/* Set summary to 1 to be called every time for the ISC. */
*zpci_airq.lsi_ptr = 1;
@@ -524,7 +592,7 @@ int __init zpci_irq_init(void)
}
if (rc)
- goto out_airq;
+ goto out_msi_domain;
/*
* Enable floating IRQs (with suppression after one IRQ). When using
@@ -533,6 +601,8 @@ int __init zpci_irq_init(void)
zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib);
return 0;
+out_msi_domain:
+ irq_domain_remove(zpci_msi_parent_domain);
out_airq:
unregister_adapter_interrupt(&zpci_airq);
out:
@@ -541,6 +611,7 @@ int __init zpci_irq_init(void)
void __init zpci_irq_exit(void)
{
+ struct fwnode_handle *fn;
unsigned int cpu;
if (irq_delivery == DIRECTED) {
@@ -549,6 +620,10 @@ void __init zpci_irq_exit(void)
}
}
kfree(zpci_ibv);
+ fn = zpci_msi_parent_domain->fwnode;
+ irq_domain_remove(zpci_msi_parent_domain);
+ irq_domain_free_fwnode(fn);
+
if (zpci_sbv)
airq_iv_release(zpci_sbv);
unregister_adapter_interrupt(&zpci_airq);
--
2.48.1
* Re: [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t
2025-11-17 8:59 ` [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
@ 2025-11-17 11:26 ` Thomas Gleixner
2025-11-17 15:56 ` Niklas Schnelle
1 sibling, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2025-11-17 11:26 UTC (permalink / raw)
To: Tobias Schumacher, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Niklas Schnelle, Gerald Schaefer, Gerd Bayer, Halil Pasic,
Matthew Rosato
Cc: linux-kernel, linux-s390, Tobias Schumacher
On Mon, Nov 17 2025 at 09:59, Tobias Schumacher wrote:
> The irqdomain implementation internally represents hardware IRQs as
> irq_hw_number_t, which is defined as unsigned long int. When providing
> an irq_hw_number_t to the generic_handle_domain() functions that expect
> an unsigned int hwirq, this can lead to a loss of information. Change
> the hwirq parameter to irq_hw_number_t to support the full range of
> hwirqs.
>
> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* Re: [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t
2025-11-17 8:59 ` [PATCH v2 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
2025-11-17 11:26 ` Thomas Gleixner
@ 2025-11-17 15:56 ` Niklas Schnelle
1 sibling, 0 replies; 8+ messages in thread
From: Niklas Schnelle @ 2025-11-17 15:56 UTC (permalink / raw)
To: Tobias Schumacher, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner
Cc: linux-kernel, linux-s390
On Mon, 2025-11-17 at 09:59 +0100, Tobias Schumacher wrote:
> The irqdomain implementation internally represents hardware IRQs as
> irq_hw_number_t, which is defined as unsigned long int. When providing
> an irq_hw_number_t to the generic_handle_domain() functions that expect
> an unsigned int hwirq, this can lead to a loss of information. Change
> the hwirq parameter to irq_hw_number_t to support the full range of
> hwirqs.
>
> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
> ---
> include/linux/irqdesc.h | 6 +++---
> kernel/irq/irqdesc.c | 6 +++---
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
>
Looks good and this also made it very clean to combine function IDs
with the MSI index. Thanks!
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
* Re: [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
2025-11-17 8:59 ` [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
@ 2025-11-17 17:20 ` Niklas Schnelle
2025-11-17 22:46 ` Farhan Ali
1 sibling, 0 replies; 8+ messages in thread
From: Niklas Schnelle @ 2025-11-17 17:20 UTC (permalink / raw)
To: Tobias Schumacher, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner
Cc: linux-kernel, linux-s390
On Mon, 2025-11-17 at 09:59 +0100, Tobias Schumacher wrote:
> s390 is one of the last architectures using the legacy API for setup and
> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
> to the MSI parent domain API. For details, see:
>
> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>
> In detail, create an MSI parent domain for zpci which is used by
> all PCI devices. When a PCI device sets up MSI or MSI-X IRQs, the
> library creates a per-device IRQ domain for this device, which is
> used by the device for allocating and freeing IRQs.
>
> The per-device domain delegates this allocation and freeing to the
> parent-domain. In the end, the corresponding callbacks of the parent
> domain are responsible for allocating and freeing the IRQs.
>
> The allocation is split into two parts:
> - zpci_msi_prepare() is called once for each device and allocates the
> required resources. On s390, each PCI function has its own airq
> vector and a summary bit, which must be configured once per function.
> This is done in prepare().
> - zpci_msi_alloc() can be called multiple times for allocating one or
> more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
> number in the kernel and the hardware IRQ number.
>
> Freeing is split into two counterparts:
> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
> called once when all IRQs are freed before a device is removed.
>
> Since the parent domain in the end allocates the IRQs, the hwirq
> encoding must be unambiguous for all IRQs of all devices. This is
> achieved by encoding the hwirq using the PCI function id and the MSI
> index.
>
> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
> ---
> arch/s390/Kconfig | 1 +
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci_bus.c | 1 +
> arch/s390/pci/pci_irq.c | 335 +++++++++++++++++++++++++++-----------------
> 4 files changed, 208 insertions(+), 130 deletions(-)
>
--- snip ---
> +
> +static int zpci_msi_prepare(struct irq_domain *domain,
> + struct device *dev, int nvec,
> + msi_alloc_info_t *info)
> +{
> + struct zpci_dev *zdev = to_zpci_dev(dev);
> + struct pci_dev *pdev = to_pci_dev(dev);
> + unsigned long bit;
> + int msi_vecs, rc;
>
> msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
> - if (msi_vecs < nvec) {
> - pr_info("%s requested %d irqs, allocate system limit of %d",
> + if (msi_vecs < nvec)
> + pr_info("%s requested %d IRQs, allocate system limit of %d",
> pci_name(pdev), nvec, zdev->max_msi);
This is already wrong in the existing code but the above pr_info()
misses a "\n" at the end.
> - }
>
--- snip ---
> +static int zpci_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> + unsigned int nr_irqs, void *args)
> +{
> + struct msi_desc *desc = ((msi_alloc_info_t *)args)->desc;
> + struct zpci_dev *zdev = to_zpci_dev(desc->dev);
> + irq_hw_number_t hwirq;
> + unsigned long bit;
> + unsigned int cpu;
> + int i;
> +
> + bit = zdev->msi_first_bit + desc->msi_index;
> + hwirq = zpci_encode_hwirq(zdev->fid, desc->msi_index);
> +
> + if (desc->msi_index + nr_irqs > zdev->max_msi)
> + return -EINVAL;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + &zpci_irq_chip, zdev,
> + handle_percpu_irq, NULL, NULL);
> +
> + if (irq_delivery == DIRECTED) {
> + for_each_possible_cpu(cpu) {
> + airq_iv_set_ptr(zpci_ibv[cpu],
> + bit + i, hwirq + i);
> + }
> +
The above closing brace seems to be indented wrong. I have no idea why
checkpatch.pl --strict doesn't catch this (I tried). It also doesn't
complain when I remove one tab so let's do that. While at it also drop
the empty line here.
> + } else {
> + airq_iv_set_ptr(zdev->aibv, bit + i, hwirq + i);
> }
> - msi->msg.address_lo = 0;
> - msi->msg.address_hi = 0;
> - msi->msg.data = 0;
> - msi->irq = 0;
> }
>
> - if (zdev->aisb != -1UL) {
> - zpci_ibv[zdev->aisb] = NULL;
> - airq_iv_free_bit(zpci_sbv, zdev->aisb);
> - zdev->aisb = -1UL;
> - }
> - if (zdev->aibv) {
> - airq_iv_release(zdev->aibv);
> - zdev->aibv = NULL;
> - }
> + return 0;
> +}
--- snip ---
Apart from the two style issues this now works well with directed IRQs
and overall is a nice cleanup. Thanks a lot!
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
* Re: [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
2025-11-17 8:59 ` [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
2025-11-17 17:20 ` Niklas Schnelle
@ 2025-11-17 22:46 ` Farhan Ali
[not found] ` <c5823dcc4bfb96a632f159f79af43d98@imap.linux.ibm.com>
1 sibling, 1 reply; 8+ messages in thread
From: Farhan Ali @ 2025-11-17 22:46 UTC (permalink / raw)
To: Tobias Schumacher, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Niklas Schnelle, Gerald Schaefer, Gerd Bayer, Halil Pasic,
Matthew Rosato, Thomas Gleixner
Cc: linux-kernel, linux-s390
On 11/17/2025 12:59 AM, Tobias Schumacher wrote:
> s390 is one of the last architectures using the legacy API for setup and
> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
> to the MSI parent domain API. For details, see:
>
> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>
> In detail, create an MSI parent domain for zpci which is used by
> all PCI devices. When a PCI device sets up MSI or MSI-X IRQs, the
> library creates a per-device IRQ domain for this device, which is
> used by the device for allocating and freeing IRQs.
>
> The per-device domain delegates this allocation and freeing to the
> parent-domain. In the end, the corresponding callbacks of the parent
> domain are responsible for allocating and freeing the IRQs.
>
> The allocation is split into two parts:
> - zpci_msi_prepare() is called once for each device and allocates the
> required resources. On s390, each PCI function has its own airq
> vector and a summary bit, which must be configured once per function.
> This is done in prepare().
> - zpci_msi_alloc() can be called multiple times for allocating one or
> more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
> number in the kernel and the hardware IRQ number.
>
> Freeing is split into two counterparts:
> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
> called once when all IRQs are freed before a device is removed.
>
> Since the parent domain in the end allocates the IRQs, the hwirq
> encoding must be unambiguous for all IRQs of all devices. This is
> achieved by encoding the hwirq using the PCI function id and the MSI
> index.
>
> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
> ---
> arch/s390/Kconfig | 1 +
> arch/s390/include/asm/pci.h | 1 +
> arch/s390/pci/pci_bus.c | 1 +
> arch/s390/pci/pci_irq.c | 335 +++++++++++++++++++++++++++-----------------
> 4 files changed, 208 insertions(+), 130 deletions(-)
>
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index df22b10d91415e1ed183cc8add9ad0ac4293c50e..48cd6a12bd04dfe4dd61ecc79d3401ba685c51bb 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -251,6 +251,7 @@ config S390
> select HOTPLUG_SMT
> select IOMMU_HELPER if PCI
> select IOMMU_SUPPORT if PCI
> + select IRQ_MSI_LIB if PCI
> select KASAN_VMALLOC if KASAN
> select LOCK_MM_AND_FIND_VMA
> select MMU_GATHER_MERGE_VMAS
> diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
> index a32f465ecf73a5cc3408a312d94ec888d62848cc..462e87bdb7acdfa4e7df0f9ca8e82c269e1f98aa 100644
> --- a/arch/s390/include/asm/pci.h
> +++ b/arch/s390/include/asm/pci.h
> @@ -310,6 +310,7 @@ int zpci_dma_exit_device(struct zpci_dev *zdev);
> /* IRQ */
> int __init zpci_irq_init(void);
> void __init zpci_irq_exit(void);
> +void zpci_set_msi_parent_domain(struct zpci_bus *zbus);
>
> /* FMB */
> int zpci_fmb_enable_device(struct zpci_dev *);
> diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> index be8c697fea0cc755cfdb4fb0a9e3b95183bec0dc..2be33cfb8970409db4fcb75ea73543f49b583a5c 100644
> --- a/arch/s390/pci/pci_bus.c
> +++ b/arch/s390/pci/pci_bus.c
> @@ -210,6 +210,7 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
> }
>
> zbus->bus = bus;
> + zpci_set_msi_parent_domain(zbus);
Why are we calling zpci_set_msi_parent_domain() per root device
instead of per zpci device?
>
> return 0;
> }
> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
> index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..f2c2fc23d5693e211bcbfd94e8f2fe25dc71a4e2 100644
> --- a/arch/s390/pci/pci_irq.c
> +++ b/arch/s390/pci/pci_irq.c
> @@ -7,6 +7,7 @@
> #include <linux/kernel_stat.h>
> #include <linux/pci.h>
> #include <linux/msi.h>
> +#include <linux/irqchip/irq-msi-lib.h>
> #include <linux/smp.h>
>
> #include <asm/isc.h>
> @@ -29,6 +30,8 @@ static struct airq_iv *zpci_sbv;
> */
> static struct airq_iv **zpci_ibv;
>
> +static struct irq_domain *zpci_msi_parent_domain;
> +
> /* Modify PCI: Register floating adapter interruptions */
> static int zpci_set_airq(struct zpci_dev *zdev)
> {
> @@ -110,43 +113,42 @@ static int zpci_set_irq(struct zpci_dev *zdev)
> return rc;
> }
>
> -/* Clear adapter interruptions */
> -static int zpci_clear_irq(struct zpci_dev *zdev)
> +static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *dest,
> + bool force)
> {
> - int rc;
> -
> - if (irq_delivery == DIRECTED)
> - rc = zpci_clear_directed_irq(zdev);
> - else
> - rc = zpci_clear_airq(zdev);
> -
> - return rc;
> + irq_data_update_affinity(data, dest);
> + return IRQ_SET_MASK_OK;
> }
>
> -static int zpci_set_irq_affinity(struct irq_data *data, const struct cpumask *dest,
> - bool force)
> +static void zpci_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> {
> - struct msi_desc *entry = irq_data_get_msi_desc(data);
> - struct msi_msg msg = entry->msg;
> - int cpu_addr = smp_cpu_get_cpu_address(cpumask_first(dest));
> + struct msi_desc *desc = irq_data_get_msi_desc(data);
> + struct zpci_dev *zdev = to_zpci_dev(desc->dev);
>
> - msg.address_lo &= 0xff0000ff;
> - msg.address_lo |= (cpu_addr << 8);
> - pci_write_msi_msg(data->irq, &msg);
> + if (irq_delivery == DIRECTED) {
> + int cpu = cpumask_first(irq_data_get_affinity_mask(data));
>
> - return IRQ_SET_MASK_OK;
> + msg->address_lo = zdev->msi_addr & 0xff0000ff;
> + msg->address_lo |= (smp_cpu_get_cpu_address(cpu) << 8);
> + } else {
> + msg->address_lo = zdev->msi_addr & 0xffffffff;
> + }
> + msg->address_hi = zdev->msi_addr >> 32;
> + msg->data = data->hwirq & 0xffffffff;
> }
>
> static struct irq_chip zpci_irq_chip = {
> .name = "PCI-MSI",
> .irq_unmask = pci_msi_unmask_irq,
> .irq_mask = pci_msi_mask_irq,
> + .irq_compose_msi_msg = zpci_compose_msi_msg
> };
>
> static void zpci_handle_cpu_local_irq(bool rescan)
> {
> struct airq_iv *dibv = zpci_ibv[smp_processor_id()];
> union zpci_sic_iib iib = {{0}};
> + irq_hw_number_t hwirq;
> unsigned long bit;
> int irqs_on = 0;
>
> @@ -164,7 +166,8 @@ static void zpci_handle_cpu_local_irq(bool rescan)
> continue;
> }
> inc_irq_stat(IRQIO_MSI);
> - generic_handle_irq(airq_iv_get_data(dibv, bit));
> + hwirq = airq_iv_get_ptr(dibv, bit);
> + generic_handle_domain_irq(zpci_msi_parent_domain, hwirq);
> }
> }
>
> @@ -229,6 +232,7 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
> struct tpi_info *tpi_info)
> {
> union zpci_sic_iib iib = {{0}};
> + irq_hw_number_t hwirq;
> unsigned long si, ai;
> struct airq_iv *aibv;
> int irqs_on = 0;
> @@ -256,7 +260,9 @@ static void zpci_floating_irq_handler(struct airq_struct *airq,
> break;
> inc_irq_stat(IRQIO_MSI);
> airq_iv_lock(aibv, ai);
> - generic_handle_irq(airq_iv_get_data(aibv, ai));
> +
> + hwirq = airq_iv_get_ptr(aibv, ai);
> + generic_handle_domain_irq(zpci_msi_parent_domain, hwirq);
> airq_iv_unlock(aibv, ai);
> }
> }
> @@ -278,7 +284,7 @@ static int __alloc_airq(struct zpci_dev *zdev, int msi_vecs,
> zdev->aisb = *bit;
>
> /* Create adapter interrupt vector */
> - zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA | AIRQ_IV_BITLOCK, NULL);
> + zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_PTR | AIRQ_IV_BITLOCK, NULL);
> if (!zdev->aibv)
> return -ENOMEM;
>
> @@ -290,146 +296,203 @@ static int __alloc_airq(struct zpci_dev *zdev, int msi_vecs,
> return 0;
> }
>
> -int arch_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
> +static struct airq_struct zpci_airq = {
> + .handler = zpci_floating_irq_handler,
> + .isc = PCI_ISC,
> +};
> +
> +/*
> + * Encode the hwirq number for the parent domain. The encoding must be unique
> + * for each IRQ of each device in the parent domain, so it uses the fid to
> + * identify the device and the msi_index to identify the IRQ within that device.
> + */
> +static inline irq_hw_number_t zpci_encode_hwirq(u32 fid, u16 msi_index)
> {
> - unsigned int hwirq, msi_vecs, irqs_per_msi, i, cpu;
> - struct zpci_dev *zdev = to_zpci(pdev);
> - struct msi_desc *msi;
> - struct msi_msg msg;
> - unsigned long bit;
> - int cpu_addr;
> - int rc, irq;
> + return ((irq_hw_number_t)fid << 32) | msi_index;
> +}
>
> - zdev->aisb = -1UL;
> - zdev->msi_first_bit = -1U;
> +static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
> +{
> + return irq & 0xFFFF;
> +}
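To make the encoding concrete, here is a small user-space sketch. It
assumes a 64-bit irq_hw_number_t (modelled as uint64_t) and mirrors the
two helpers above. decode_fid() is a hypothetical helper that is not
part of the patch; it is only there to show the round trip.

```c
#include <stdint.h>

/* Mirrors zpci_encode_hwirq(): fid in the upper 32 bits, index below. */
static inline uint64_t encode_hwirq(uint32_t fid, uint16_t msi_index)
{
	return ((uint64_t)fid << 32) | msi_index;
}

/* Mirrors zpci_decode_hwirq_msi_index(): the low 16 bits. */
static inline uint16_t decode_msi_index(uint64_t hwirq)
{
	return hwirq & 0xFFFF;
}

/* Hypothetical helper, not in the patch: recover the function id. */
static inline uint32_t decode_fid(uint64_t hwirq)
{
	return hwirq >> 32;
}
```

Because the index sits in the low bits, hwirq + i in the alloc path is
the same as encoding msi_index + i, as long as the index does not
overflow 16 bits.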
> +
> +static int zpci_msi_prepare(struct irq_domain *domain,
> + struct device *dev, int nvec,
> + msi_alloc_info_t *info)
> +{
> + struct zpci_dev *zdev = to_zpci_dev(dev);
> + struct pci_dev *pdev = to_pci_dev(dev);
> + unsigned long bit;
> + int msi_vecs, rc;
>
> msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
> - if (msi_vecs < nvec) {
> - pr_info("%s requested %d irqs, allocate system limit of %d",
> + if (msi_vecs < nvec)
> + pr_info("%s requested %d IRQs, allocate system limit of %d",
> pci_name(pdev), nvec, zdev->max_msi);
> - }
>
> rc = __alloc_airq(zdev, msi_vecs, &bit);
> - if (rc < 0)
> + if (rc) {
> + pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
> return rc;
> + }
>
> - /*
> - * Request MSI interrupts:
> - * When using MSI, nvec_used interrupt sources and their irq
> - * descriptors are controlled through one msi descriptor.
> - * Thus the outer loop over msi descriptors shall run only once,
> - * while two inner loops iterate over the interrupt vectors.
> - * When using MSI-X, each interrupt vector/irq descriptor
> - * is bound to exactly one msi descriptor (nvec_used is one).
> - * So the inner loops are executed once, while the outer iterates
> - * over the MSI-X descriptors.
> - */
> - hwirq = bit;
> - msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
> - if (hwirq - bit >= msi_vecs)
> - break;
> - irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
> - irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
> - (irq_delivery == DIRECTED) ?
> - msi->affinity : NULL);
> - if (irq < 0)
> - return -ENOMEM;
> -
> - for (i = 0; i < irqs_per_msi; i++) {
> - rc = irq_set_msi_desc_off(irq, i, msi);
> - if (rc)
> - return rc;
> - irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
> - handle_percpu_irq);
> - }
> -
> - msg.data = hwirq - bit;
> + zdev->msi_first_bit = bit;
> + zdev->msi_nr_irqs = msi_vecs;
> + rc = zpci_set_irq(zdev);
> + if (rc) {
> + pr_err("Registering adapter IRQs for %s failed\n",
> + pci_name(pdev));
> if (irq_delivery == DIRECTED) {
> - if (msi->affinity)
> - cpu = cpumask_first(&msi->affinity->mask);
> - else
> - cpu = 0;
> - cpu_addr = smp_cpu_get_cpu_address(cpu);
> -
> - msg.address_lo = zdev->msi_addr & 0xff0000ff;
> - msg.address_lo |= (cpu_addr << 8);
> -
> - for_each_possible_cpu(cpu) {
> - for (i = 0; i < irqs_per_msi; i++)
> - airq_iv_set_data(zpci_ibv[cpu],
> - hwirq + i, irq + i);
> - }
> + airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
> } else {
> - msg.address_lo = zdev->msi_addr & 0xffffffff;
> - for (i = 0; i < irqs_per_msi; i++)
> - airq_iv_set_data(zdev->aibv, hwirq + i, irq + i);
> + zpci_clear_airq(zdev);
> + airq_iv_release(zdev->aibv);
> + zdev->aibv = NULL;
> + airq_iv_free_bit(zpci_sbv, zdev->aisb);
> + zdev->aisb = -1UL;
> }
> - msg.address_hi = zdev->msi_addr >> 32;
> - pci_write_msi_msg(irq, &msg);
> - hwirq += irqs_per_msi;
> + zdev->msi_first_bit = -1U;
> + return rc;
> }
>
> - zdev->msi_first_bit = bit;
> - zdev->msi_nr_irqs = hwirq - bit;
> + return 0;
> +}
>
> - rc = zpci_set_irq(zdev);
> - if (rc)
> - return rc;
> +static void zpci_msi_teardown_directed(struct zpci_dev *zdev)
> +{
> + zpci_clear_directed_irq(zdev);
> + airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->max_msi);
> + zdev->msi_first_bit = -1U;
> +}
>
> - return (zdev->msi_nr_irqs == nvec) ? 0 : zdev->msi_nr_irqs;
> +static void zpci_msi_teardown_floating(struct zpci_dev *zdev)
> +{
> + zpci_clear_airq(zdev);
> + airq_iv_release(zdev->aibv);
> + zdev->aibv = NULL;
> + airq_iv_free_bit(zpci_sbv, zdev->aisb);
> + zdev->aisb = -1UL;
> + zdev->msi_first_bit = -1U;
> }
>
> -void arch_teardown_msi_irqs(struct pci_dev *pdev)
> +static void zpci_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *arg)
> {
> - struct zpci_dev *zdev = to_zpci(pdev);
> - struct msi_desc *msi;
> - unsigned int i;
> - int rc;
> + struct zpci_dev *zdev = to_zpci_dev(domain->dev);
>
> - /* Disable interrupts */
> - rc = zpci_clear_irq(zdev);
> - if (rc)
> - return;
> + if (irq_delivery == DIRECTED)
> + zpci_msi_teardown_directed(zdev);
> + else
> + zpci_msi_teardown_floating(zdev);
> +}
>
> - /* Release MSI interrupts */
> - msi_for_each_desc(msi, &pdev->dev, MSI_DESC_ASSOCIATED) {
> - for (i = 0; i < msi->nvec_used; i++) {
> - irq_set_msi_desc(msi->irq + i, NULL);
> - irq_free_desc(msi->irq + i);
> +static int zpci_msi_domain_alloc(struct irq_domain *domain, unsigned int virq,
> + unsigned int nr_irqs, void *args)
> +{
> + struct msi_desc *desc = ((msi_alloc_info_t *)args)->desc;
> + struct zpci_dev *zdev = to_zpci_dev(desc->dev);
> + irq_hw_number_t hwirq;
> + unsigned long bit;
> + unsigned int cpu;
> + int i;
> +
> + bit = zdev->msi_first_bit + desc->msi_index;
> + hwirq = zpci_encode_hwirq(zdev->fid, desc->msi_index);
> +
> + if (desc->msi_index + nr_irqs > zdev->max_msi)
> + return -EINVAL;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + irq_domain_set_info(domain, virq + i, hwirq + i,
> + &zpci_irq_chip, zdev,
> + handle_percpu_irq, NULL, NULL);
> +
> + if (irq_delivery == DIRECTED) {
> + for_each_possible_cpu(cpu) {
> + airq_iv_set_ptr(zpci_ibv[cpu],
> + bit + i, hwirq + i);
> + }
> +
> + } else {
> + airq_iv_set_ptr(zdev->aibv, bit + i, hwirq + i);
> }
> - msi->msg.address_lo = 0;
> - msi->msg.address_hi = 0;
> - msi->msg.data = 0;
> - msi->irq = 0;
> }
>
> - if (zdev->aisb != -1UL) {
> - zpci_ibv[zdev->aisb] = NULL;
> - airq_iv_free_bit(zpci_sbv, zdev->aisb);
> - zdev->aisb = -1UL;
> - }
> - if (zdev->aibv) {
> - airq_iv_release(zdev->aibv);
> - zdev->aibv = NULL;
> - }
> + return 0;
> +}
>
> - if ((irq_delivery == DIRECTED) && zdev->msi_first_bit != -1U)
> - airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, zdev->msi_nr_irqs);
> +static void zpci_msi_domain_free(struct irq_domain *domain, unsigned int virq,
> + unsigned int nr_irqs)
> +{
> + irq_hw_number_t hwirq;
> + struct irq_data *d;
> + u16 msi_index;
> + int i;
> +
> + for (i = 0; i < nr_irqs; i++) {
> + d = irq_domain_get_irq_data(domain, virq + i);
> + hwirq = d->hwirq;
> + msi_index = zpci_decode_hwirq_msi_index(hwirq);
> + irq_domain_reset_irq_data(d);
> + }
> }
>
> -bool arch_restore_msi_irqs(struct pci_dev *pdev)
Why are we removing arch_restore_msi_irqs()? It is called in the path
that restores the MSI/MSI-X state of the device
(pci_restore_msi_state()). It looks like the irqdomain infrastructure
will not set up the airq for the device in the restore path.
> +static const struct irq_domain_ops zpci_msi_domain_ops = {
> + .alloc = zpci_msi_domain_alloc,
> + .free = zpci_msi_domain_free
> +};
> +
> +static bool zpci_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> + struct irq_domain *real_parent,
> + struct msi_domain_info *info)
> {
> - struct zpci_dev *zdev = to_zpci(pdev);
> + if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
> + return false;
> +
> + info->ops->msi_prepare = zpci_msi_prepare;
> + info->ops->msi_teardown = zpci_msi_teardown;
>
> - zpci_set_irq(zdev);
> return true;
> }
>
> -static struct airq_struct zpci_airq = {
> - .handler = zpci_floating_irq_handler,
> - .isc = PCI_ISC,
> +static struct msi_parent_ops zpci_msi_parent_ops = {
> + .supported_flags = MSI_GENERIC_FLAGS_MASK |
> + MSI_FLAG_PCI_MSIX |
> + MSI_FLAG_MULTI_PCI_MSI,
> + .required_flags = MSI_FLAG_USE_DEF_DOM_OPS |
> + MSI_FLAG_USE_DEF_CHIP_OPS |
> + MSI_FLAG_PCI_MSI_MASK_PARENT,
> + .init_dev_msi_info = zpci_init_dev_msi_info
> };
>
> +static int __init zpci_create_parent_msi_domain(void)
> +{
> + struct irq_domain_info info = {
> + .fwnode = irq_domain_alloc_named_fwnode("zpci_msi"),
> + .ops = &zpci_msi_domain_ops
> + };
> +
> + if (!info.fwnode) {
> + pr_err("Failed to allocate fwnode for MSI IRQ domain\n");
> + return -ENOMEM;
> + }
> +
> + if (irq_delivery == FLOATING)
> + zpci_msi_parent_ops.required_flags |= MSI_FLAG_NO_AFFINITY;
> + zpci_msi_parent_domain = msi_create_parent_irq_domain(&info, &zpci_msi_parent_ops);
> + if (!zpci_msi_parent_domain) {
> + irq_domain_free_fwnode(info.fwnode);
> + pr_err("Failed to create MSI IRQ domain\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +void zpci_set_msi_parent_domain(struct zpci_bus *zbus)
> +{
> + dev_set_msi_domain(&zbus->bus->dev, zpci_msi_parent_domain);
> +}
> +
> static void __init cpu_enable_directed_irq(void *unused)
> {
> union zpci_sic_iib iib = {{0}};
> @@ -466,7 +529,7 @@ static int __init zpci_directed_irq_init(void)
> * is only done on the first vector.
> */
> zpci_ibv[cpu] = airq_iv_create(cache_line_size() * BITS_PER_BYTE,
> - AIRQ_IV_DATA |
> + AIRQ_IV_PTR |
> AIRQ_IV_CACHELINE |
> (!cpu ? AIRQ_IV_ALLOC : 0), NULL);
> if (!zpci_ibv[cpu])
> @@ -511,6 +574,11 @@ int __init zpci_irq_init(void)
> rc = register_adapter_interrupt(&zpci_airq);
> if (rc)
> goto out;
> +
> + zpci_create_parent_msi_domain();
> + if (!zpci_msi_parent_domain)
> + goto out_airq;
> +
> /* Set summary to 1 to be called every time for the ISC. */
> *zpci_airq.lsi_ptr = 1;
>
> @@ -524,7 +592,7 @@ int __init zpci_irq_init(void)
> }
>
> if (rc)
> - goto out_airq;
> + goto out_msi_domain;
>
> /*
> * Enable floating IRQs (with suppression after one IRQ). When using
> @@ -533,6 +601,8 @@ int __init zpci_irq_init(void)
> zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, PCI_ISC, &iib);
>
> return 0;
> +out_msi_domain:
> + irq_domain_remove(zpci_msi_parent_domain);
Shouldn't this also call irq_domain_free_fwnode()?
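A user-space mock of the ordering I have in mind, following what the
zpci_irq_exit() hunk further down already does. All names here are
made up for the example (mock_domain, mock_fwnode, ...), and it assumes
that removing the domain frees the domain but leaves the fwnode to the
caller, which is what that hunk suggests.

```c
#include <stdlib.h>

/* Mock types standing in for struct fwnode_handle and struct irq_domain. */
struct mock_fwnode { int unused; };
struct mock_domain { struct mock_fwnode *fwnode; };

static int mock_fwnodes_freed;

static struct mock_domain *mock_domain_create(void)
{
	struct mock_domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->fwnode = malloc(sizeof(*d->fwnode));
	if (!d->fwnode) {
		free(d);
		return NULL;
	}
	return d;
}

/* Stand-in for irq_domain_remove(): frees the domain, not the fwnode. */
static void mock_domain_remove(struct mock_domain *d)
{
	free(d);
}

/* Stand-in for irq_domain_free_fwnode(). */
static void mock_free_fwnode(struct mock_fwnode *fn)
{
	free(fn);
	mock_fwnodes_freed++;
}

/* Error-path cleanup: save the fwnode before the domain goes away. */
static void mock_cleanup(struct mock_domain *d)
{
	struct mock_fwnode *fn = d->fwnode;

	mock_domain_remove(d);
	mock_free_fwnode(fn);
}
```

The point is only that the fwnode pointer has to be read before the
domain is removed, since the domain memory is gone afterwards.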
Thanks
Farhan
> out_airq:
> unregister_adapter_interrupt(&zpci_airq);
> out:
> @@ -541,6 +611,7 @@ int __init zpci_irq_init(void)
>
> void __init zpci_irq_exit(void)
> {
> + struct fwnode_handle *fn;
> unsigned int cpu;
>
> if (irq_delivery == DIRECTED) {
> @@ -549,6 +620,10 @@ void __init zpci_irq_exit(void)
> }
> }
> kfree(zpci_ibv);
> + fn = zpci_msi_parent_domain->fwnode;
> + irq_domain_remove(zpci_msi_parent_domain);
> + irq_domain_free_fwnode(fn);
> +
> if (zpci_sbv)
> airq_iv_release(zpci_sbv);
> unregister_adapter_interrupt(&zpci_airq);
>
* Re: [PATCH v2 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
[not found] ` <c5823dcc4bfb96a632f159f79af43d98@imap.linux.ibm.com>
@ 2025-11-18 10:15 ` Niklas Schnelle
0 siblings, 0 replies; 8+ messages in thread
From: Niklas Schnelle @ 2025-11-18 10:15 UTC (permalink / raw)
To: Tobias Schumacher, Farhan Ali
Cc: Tobias Schumacher, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Gerald Schaefer, Gerd Bayer, Halil Pasic, Matthew Rosato,
Thomas Gleixner, linux-kernel, linux-s390
On Tue, 2025-11-18 at 08:14 +0100, Tobias Schumacher wrote:
> On 2025-11-17 23:46, Farhan Ali wrote:
> > On 11/17/2025 12:59 AM, Tobias Schumacher wrote:
> --- snip ---
>
> > > diff --git a/arch/s390/pci/pci_bus.c b/arch/s390/pci/pci_bus.c
> > > index be8c697fea0cc755cfdb4fb0a9e3b95183bec0dc..2be33cfb8970409db4fcb75ea73543f49b583a5c 100644
> > > --- a/arch/s390/pci/pci_bus.c
> > > +++ b/arch/s390/pci/pci_bus.c
> > > @@ -210,6 +210,7 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
> > > }
> > > zbus->bus = bus;
> > > + zpci_set_msi_parent_domain(zbus);
> >
> > Why are we calling zpci_set_msi_parent_domain() per root device
> > instead of per zpci device?
>
> On other architectures the parent domain is set once for the root bus.
> During bus scanning, pci_device_add() then sets the domain in the device
> by calling pci_set_msi_domain(). By adding the parent domain to the root
> busses, we are using the same mechanisms and are as close as possible to
> other architectures.
>
I didn't think about this enough, but reading your argument I wonder if
it points to having one parent domain per PCI domain rather than the
single global one you have at the moment. In the current code we are
limited to one struct zpci_bus per PCI domain, with struct
zpci_bus::bus being a root bus and the zpci_bus also carrying the
domain number, so I think a parent domain per zpci_bus would more
closely match the parent domain per PCI domain model.
As for Farhan's comment: as I understand it, the common code still
creates child MSI domains for the individual zpci devices. So either a
global parent domain as in this version or a per zpci bus domain will
work; I think we just want to keep things as similar to non-virtual PCI
topologies as possible.
Thanks,
Niklas