From: "Tobias Schumacher" <ts@linux.ibm.com>
To: "Gerd Bayer" <gbayer@linux.ibm.com>,
"Tobias Schumacher" <ts@linux.ibm.com>,
"Heiko Carstens" <hca@linux.ibm.com>,
"Vasily Gorbik" <gor@linux.ibm.com>,
"Alexander Gordeev" <agordeev@linux.ibm.com>,
"Christian Borntraeger" <borntraeger@linux.ibm.com>,
"Sven Schnelle" <svens@linux.ibm.com>,
"Niklas Schnelle" <schnelle@linux.ibm.com>,
"Gerald Schaefer" <gerald.schaefer@linux.ibm.com>,
"Halil Pasic" <pasic@linux.ibm.com>,
"Matthew Rosato" <mjrosato@linux.ibm.com>,
"Thomas Gleixner" <tglx@linutronix.de>
Cc: <linux-kernel@vger.kernel.org>, <linux-s390@vger.kernel.org>,
"Farhan Ali" <alifm@linux.ibm.com>
Subject: Re: [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
Date: Fri, 21 Nov 2025 15:49:48 +0100 [thread overview]
Message-ID: <DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com> (raw)
In-Reply-To: <626c1d010ff720c8c2beb7bdd36b0565850a6ab3.camel@linux.ibm.com>
On Fri Nov 21, 2025 at 2:27 PM CET, Gerd Bayer wrote:
> Hi Tobias,
>
> sorry for being late with my comments...
>
> On Fri, 2025-11-21 at 06:32 +0100, Tobias Schumacher wrote:
>> s390 is one of the last architectures using the legacy API for setup and
>> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
>> to the MSI parent domain API. For details, see:
>>
>> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>>
>> In detail, create an MSI parent domain for each PCI domain. When a PCI
>> device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ
>> domain for this device, which is used by the device for allocating and
>> freeing IRQs.
>>
>> The per-device domain delegates this allocation and freeing to the
>> parent-domain. In the end, the corresponding callbacks of the parent
>> domain are responsible for allocating and freeing the IRQs.
>>
>> The allocation is split into two parts:
>> - zpci_msi_prepare() is called once for each device and allocates the
>> required resources. On s390, each PCI function has its own airq
>> vector and a summary bit, which must be configured once per function.
>> This is done in prepare().
>> - zpci_msi_alloc() can be called multiple times for allocating one or
>> more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
>> number in the kernel and the hardware IRQ number.
>>
>> Freeing is split into two counterparts:
>> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
>> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
>> called once when all IRQs are freed before a device is removed.
>>
>> Since the parent domain in the end allocates the IRQs, the hwirq
>> encoding must be unambiguous for all IRQs of all devices. This is
>> achieved by
>>
>> encoding the hwirq using the PCI function id and the MSI
>> index.
>
> This is no longer true with the per-PCI-domain irq domains! But you
> encode the hwirq with the devfn within the PCI domain, instead.
Correct, will fix that.
>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
>> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
>> ---
>> arch/s390/Kconfig | 1 +
>> arch/s390/include/asm/pci.h | 4 +
>> arch/s390/pci/pci_bus.c | 21 ++-
>> arch/s390/pci/pci_irq.c | 333 +++++++++++++++++++++++++++-----------------
>> 4 files changed, 227 insertions(+), 132 deletions(-)
>>
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index 778ce20d34046cad84dd4ef57cab5a662e5796d9..46ab67d607f0db7f5f108106172699a5eebfc8c8 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -251,6 +251,7 @@ config S390
>> select HOTPLUG_SMT
>> select IOMMU_HELPER if PCI
>> select IOMMU_SUPPORT if PCI
>> + select IRQ_MSI_LIB if PCI
>
> Nit: There's precedence for both versions (above and below!) but I
> personally would prefer to indent the pre-condition "if PCI" so it
> stands out. Maybe that's a generic clean-up task for this arch-specific
> file...
Since at least the 'if PCI' preconditions are indented in this file,
I'll also do it for this line.
>> @@ -189,7 +190,7 @@ static bool zpci_bus_is_multifunction_root(struct zpci_dev *zdev)
>> static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, struct pci_ops *ops)
>> {
>> struct pci_bus *bus;
>> - int domain;
>> + int domain, rc;
>>
>> domain = zpci_alloc_domain((u16)fr->uid);
>> if (domain < 0)
>> @@ -199,19 +200,28 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
>> zbus->multifunction = zpci_bus_is_multifunction_root(fr);
>> zbus->max_bus_speed = fr->max_bus_speed;
>>
>> + rc = zpci_create_parent_msi_domain(zbus);
>> + if (rc)
>> + goto out_free_domain;
>> +
>
> If you shortened this to use the call to
> zpci_create_parent_msi_domain() as predicate for "if" you could do
> without the additional rc.
Will do.
>> /*
>> * Note that the zbus->resources are taken over and zbus->resources
>> * is empty after a successful call
>> */
>> bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, ops, zbus, &zbus->resources);
>> - if (!bus) {
>> - zpci_free_domain(zbus->domain_nr);
>> - return -EFAULT;
>> - }
>> + if (!bus)
>> + goto out_remove_msi_domain;
>
> Or do you want to set rc to -EFAULT here, and return the "original" rc
> in the error exits?
As Heiko mentioned, -EFAULT shouldn't be returned anyways I'd change it
to -ENOMEM for all error cases.
--- snip ---
>> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
>> index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..b0be21ab56995e81f54339fc77167f5930182542 100644
>> --- a/arch/s390/pci/pci_irq.c
>> +++ b/arch/s390/pci/pci_irq.c
>> @@ -7,6 +7,7 @@
>> #include <linux/kernel_stat.h>
>> #include <linux/pci.h>
>> #include <linux/msi.h>
>> +#include <linux/irqchip/irq-msi-lib.h>
>> #include <linux/smp.h>
>>
>> #include <asm/isc.h>
>> @@ -110,43 +111,41 @@ static int zpci_set_irq(struct zpci_dev *zdev)
>> return rc;
>> }
>>
>> -/* Clear adapter interruptions */
>> -static int zpci_clear_irq(struct zpci_dev *zdev)
>
>
> Any specific reason, why you removed zpci_clear_irq() indirecting to
> airq vs. directed_irq - but kept zpci_set_irq()? Just saying this is
> imbalanced now.
I removed it because it was only required in zpci_msi_teardown(), which
already has the distinction between directed and floating IRQs. So
having a separate zpci_clear_irq() seemed redundant.
--- snip ---
>> +static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
>> +{
>> + return irq & GENMASK_U16(15, 0);
>
>
> Please don't use GENMASK_U16. It is harder to read than any explicit
> constant like 0x00FF, especially since its parameters contradict the
> architecture's endianess.
> But then, is this called anywhere?
Right, it's not used anymore, I'll remove it completely.
--- snip ---
>> +static int zpci_msi_prepare(struct irq_domain *domain,
>> + struct device *dev, int nvec,
>> + msi_alloc_info_t *info)
>> +{
>> + struct zpci_dev *zdev = to_zpci_dev(dev);
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> + unsigned long bit;
>> + int msi_vecs, rc;
>>
>> msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
>> - if (msi_vecs < nvec) {
>> - pr_info("%s requested %d irqs, allocate system limit of %d",
>> + if (msi_vecs < nvec)
>> + pr_info("%s requested %d IRQs, allocate system limit of %d\n",
>> pci_name(pdev), nvec, zdev->max_msi);
>> - }
>>
>> rc = __alloc_airq(zdev, msi_vecs, &bit);
>> - if (rc < 0)
>> + if (rc) {
>> + pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
>> return rc;
>> + }
>>
>> - /*
>> - * Request MSI interrupts:
>> - * When using MSI, nvec_used interrupt sources and their irq
>> - * descriptors are controlled through one msi descriptor.
>> - * Thus the outer loop over msi descriptors shall run only once,
>> - * while two inner loops iterate over the interrupt vectors.
>> - * When using MSI-X, each interrupt vector/irq descriptor
>> - * is bound to exactly one msi descriptor (nvec_used is one).
>> - * So the inner loops are executed once, while the outer iterates
>> - * over the MSI-X descriptors.
>> - */
>> - hwirq = bit;
>> - msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
>> - if (hwirq - bit >= msi_vecs)
>> - break;
>> - irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
>> - irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
>> - (irq_delivery == DIRECTED) ?
>> - msi->affinity : NULL);
>> - if (irq < 0)
>> - return -ENOMEM;
>> -
>> - for (i = 0; i < irqs_per_msi; i++) {
>> - rc = irq_set_msi_desc_off(irq, i, msi);
>> - if (rc)
>> - return rc;
>> - irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
>> - handle_percpu_irq);
>> + zdev->msi_first_bit = bit;
>> + zdev->msi_nr_irqs = msi_vecs;
>> + rc = zpci_set_irq(zdev);
>> + if (rc) {
>> + pr_err("Registering adapter IRQs for %s failed\n",
>> + pci_name(pdev));
>> + if (irq_delivery == DIRECTED) {
>> + airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
>> + } else {
>> + zpci_clear_airq(zdev);
>> + airq_iv_release(zdev->aibv);
>> + zdev->aibv = NULL;
>> + airq_iv_free_bit(zpci_sbv, zdev->aisb);
>> + zdev->aisb = -1UL;
>
> These two failure clean-ups look a lot like
> zpci_msi_teardown_directed/_floating() below. Could these be called
> instead of duplicating the code?
Yes they are similar, only that zpci_msi_teardown_directed/_floating()
also call zpci_clear_directed_irq()/zpci_clear_airq(), respectively.
Considering your other comment above, it might be cleaner to add
zpci_clear_irq() again, call this directly from zpci_msi_teardown() and
then use zpci_msi_teardown_directed/_floating() here as suggested.
Thanks,
Tobias
next prev parent reply other threads:[~2025-11-21 14:49 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-21 5:32 [PATCH v5 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Tobias Schumacher
2025-11-21 5:32 ` [PATCH v5 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
2025-11-21 5:32 ` [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
2025-11-21 13:18 ` Heiko Carstens
2025-11-21 13:27 ` Gerd Bayer
2025-11-21 14:03 ` Heiko Carstens
2025-11-21 14:19 ` Tobias Schumacher
2025-11-21 14:49 ` Tobias Schumacher [this message]
2025-11-21 11:03 ` [PATCH v5 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Niklas Schnelle
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com \
--to=ts@linux.ibm.com \
--cc=agordeev@linux.ibm.com \
--cc=alifm@linux.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=gbayer@linux.ibm.com \
--cc=gerald.schaefer@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjrosato@linux.ibm.com \
--cc=pasic@linux.ibm.com \
--cc=schnelle@linux.ibm.com \
--cc=svens@linux.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.