public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Tobias Schumacher" <ts@linux.ibm.com>
To: "Gerd Bayer" <gbayer@linux.ibm.com>,
	"Tobias Schumacher" <ts@linux.ibm.com>,
	"Heiko Carstens" <hca@linux.ibm.com>,
	"Vasily Gorbik" <gor@linux.ibm.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@linux.ibm.com>,
	"Sven Schnelle" <svens@linux.ibm.com>,
	"Niklas Schnelle" <schnelle@linux.ibm.com>,
	"Gerald Schaefer" <gerald.schaefer@linux.ibm.com>,
	"Halil Pasic" <pasic@linux.ibm.com>,
	"Matthew Rosato" <mjrosato@linux.ibm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>
Cc: <linux-kernel@vger.kernel.org>, <linux-s390@vger.kernel.org>,
	"Farhan Ali" <alifm@linux.ibm.com>
Subject: Re: [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API
Date: Fri, 21 Nov 2025 15:49:48 +0100	[thread overview]
Message-ID: <DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com> (raw)
In-Reply-To: <626c1d010ff720c8c2beb7bdd36b0565850a6ab3.camel@linux.ibm.com>

On Fri Nov 21, 2025 at 2:27 PM CET, Gerd Bayer wrote:
> Hi Tobias,
>
> sorry for being late with my comments...
>
> On Fri, 2025-11-21 at 06:32 +0100, Tobias Schumacher wrote:
>> s390 is one of the last architectures using the legacy API for setup and
>> teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
>> to the MSI parent domain API. For details, see:
>> 
>> https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
>> 
>> In detail, create an MSI parent domain for each PCI domain. When a PCI
>> device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ
>> domain for this device, which is used by the device for allocating and
>> freeing IRQs.
>> 
>> The per-device domain delegates this allocation and freeing to the
>> parent-domain. In the end, the corresponding callbacks of the parent
>> domain are responsible for allocating and freeing the IRQs.
>> 
>> The allocation is split into two parts:
>> - zpci_msi_prepare() is called once for each device and allocates the
>>   required resources. On s390, each PCI function has its own airq
>>   vector and a summary bit, which must be configured once per function.
>>   This is done in prepare().
>> - zpci_msi_alloc() can be called multiple times for allocating one or
>>   more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
>>   number in the kernel and the hardware IRQ number.
>> 
>> Freeing is split into two counterparts:
>> - zpci_msi_free() reverts the effects of zpci_msi_alloc() and
>> - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
>>   called once when all IRQs are freed before a device is removed.
>> 
>> Since the parent domain in the end allocates the IRQs, the hwirq
>> encoding must be unambiguous for all IRQs of all devices. This is
>> achieved by 
>> 
>> encoding the hwirq using the PCI function id and the MSI
>> index.
>
> This is no longer true with the per-PCI-domain irq domains! But you
> encode the hwirq with the devfn within the PCI domain, instead.

Correct, will fix that.

>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
>> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
>> ---
>>  arch/s390/Kconfig           |   1 +
>>  arch/s390/include/asm/pci.h |   4 +
>>  arch/s390/pci/pci_bus.c     |  21 ++-
>>  arch/s390/pci/pci_irq.c     | 333 +++++++++++++++++++++++++++-----------------
>>  4 files changed, 227 insertions(+), 132 deletions(-)
>> 
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index 778ce20d34046cad84dd4ef57cab5a662e5796d9..46ab67d607f0db7f5f108106172699a5eebfc8c8 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -251,6 +251,7 @@ config S390
>>  	select HOTPLUG_SMT
>>  	select IOMMU_HELPER		if PCI
>>  	select IOMMU_SUPPORT		if PCI
>> +	select IRQ_MSI_LIB if PCI
>
> Nit: There's precedence for both versions (above and below!) but I
> personally would prefer to indent the pre-condition "if PCI" so it
> stands out. Maybe that's a generic clean-up task for this arch-specific
> file...

Since at least the 'if PCI' preconditions are indented in this file,
I'll also do it for this line. 

>> @@ -189,7 +190,7 @@ static bool zpci_bus_is_multifunction_root(struct zpci_dev *zdev)
>>  static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, struct pci_ops *ops)
>>  {
>>  	struct pci_bus *bus;
>> -	int domain;
>> +	int domain, rc;
>>  
>>  	domain = zpci_alloc_domain((u16)fr->uid);
>>  	if (domain < 0)
>> @@ -199,19 +200,28 @@ static int zpci_bus_create_pci_bus(struct zpci_bus *zbus, struct zpci_dev *fr, s
>>  	zbus->multifunction = zpci_bus_is_multifunction_root(fr);
>>  	zbus->max_bus_speed = fr->max_bus_speed;
>>  
>> +	rc = zpci_create_parent_msi_domain(zbus);
>> +	if (rc)
>> +		goto out_free_domain;
>> +
>
> If you shortened this to use the call to
> zpci_create_parent_msi_domain() as predicate for "if" you could do
> without the additional rc.

Will do.

>>  	/*
>>  	 * Note that the zbus->resources are taken over and zbus->resources
>>  	 * is empty after a successful call
>>  	 */
>>  	bus = pci_create_root_bus(NULL, ZPCI_BUS_NR, ops, zbus, &zbus->resources);
>> -	if (!bus) {
>> -		zpci_free_domain(zbus->domain_nr);
>> -		return -EFAULT;
>> -	}
>> +	if (!bus)
>> +		goto out_remove_msi_domain;
>
> Or do you want to set rc to -EFAULT here, and return the "original" rc
> in the error exits?

As Heiko mentioned, -EFAULT shouldn't be returned anyways I'd change it
to -ENOMEM for all error cases.

--- snip ---

>> diff --git a/arch/s390/pci/pci_irq.c b/arch/s390/pci/pci_irq.c
>> index e73be96ce5fe6473fc193d65b8f0ff635d6a98ba..b0be21ab56995e81f54339fc77167f5930182542 100644
>> --- a/arch/s390/pci/pci_irq.c
>> +++ b/arch/s390/pci/pci_irq.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/kernel_stat.h>
>>  #include <linux/pci.h>
>>  #include <linux/msi.h>
>> +#include <linux/irqchip/irq-msi-lib.h>
>>  #include <linux/smp.h>
>>  
>>  #include <asm/isc.h>
>> @@ -110,43 +111,41 @@ static int zpci_set_irq(struct zpci_dev *zdev)
>>  	return rc;
>>  }
>>  
>> -/* Clear adapter interruptions */
>> -static int zpci_clear_irq(struct zpci_dev *zdev)
>
>
> Any specific reason, why you removed zpci_clear_irq() indirecting to
> airq vs. directed_irq - but kept zpci_set_irq()? Just saying this is
> imbalanced now.

I removed it because it was only required in zpci_msi_teardown(), which
already has the distinction between directed and floating IRQs. So
having a separate zpci_clear_irq() seemed redundant. 

--- snip ---

>> +static inline u16 zpci_decode_hwirq_msi_index(irq_hw_number_t irq)
>> +{
>> +	return irq & GENMASK_U16(15, 0);
>
>
> Please don't use GENMASK_U16. It is harder to read than any explicit
> constant like 0x00FF, especially since its parameters contradict the
> architecture's endianess.
> But then, is this called anywhere?

Right, it's not used anymore, I'll remove it completely.

--- snip ---

>> +static int zpci_msi_prepare(struct irq_domain *domain,
>> +			    struct device *dev, int nvec,
>> +			    msi_alloc_info_t *info)
>> +{
>> +	struct zpci_dev *zdev = to_zpci_dev(dev);
>> +	struct pci_dev *pdev = to_pci_dev(dev);
>> +	unsigned long bit;
>> +	int msi_vecs, rc;
>>  
>>  	msi_vecs = min_t(unsigned int, nvec, zdev->max_msi);
>> -	if (msi_vecs < nvec) {
>> -		pr_info("%s requested %d irqs, allocate system limit of %d",
>> +	if (msi_vecs < nvec)
>> +		pr_info("%s requested %d IRQs, allocate system limit of %d\n",
>>  			pci_name(pdev), nvec, zdev->max_msi);
>> -	}
>>  
>>  	rc = __alloc_airq(zdev, msi_vecs, &bit);
>> -	if (rc < 0)
>> +	if (rc) {
>> +		pr_err("Allocating adapter IRQs for %s failed\n", pci_name(pdev));
>>  		return rc;
>> +	}
>>  
>> -	/*
>> -	 * Request MSI interrupts:
>> -	 * When using MSI, nvec_used interrupt sources and their irq
>> -	 * descriptors are controlled through one msi descriptor.
>> -	 * Thus the outer loop over msi descriptors shall run only once,
>> -	 * while two inner loops iterate over the interrupt vectors.
>> -	 * When using MSI-X, each interrupt vector/irq descriptor
>> -	 * is bound to exactly one msi descriptor (nvec_used is one).
>> -	 * So the inner loops are executed once, while the outer iterates
>> -	 * over the MSI-X descriptors.
>> -	 */
>> -	hwirq = bit;
>> -	msi_for_each_desc(msi, &pdev->dev, MSI_DESC_NOTASSOCIATED) {
>> -		if (hwirq - bit >= msi_vecs)
>> -			break;
>> -		irqs_per_msi = min_t(unsigned int, msi_vecs, msi->nvec_used);
>> -		irq = __irq_alloc_descs(-1, 0, irqs_per_msi, 0, THIS_MODULE,
>> -					(irq_delivery == DIRECTED) ?
>> -					msi->affinity : NULL);
>> -		if (irq < 0)
>> -			return -ENOMEM;
>> -
>> -		for (i = 0; i < irqs_per_msi; i++) {
>> -			rc = irq_set_msi_desc_off(irq, i, msi);
>> -			if (rc)
>> -				return rc;
>> -			irq_set_chip_and_handler(irq + i, &zpci_irq_chip,
>> -						 handle_percpu_irq);
>> +	zdev->msi_first_bit = bit;
>> +	zdev->msi_nr_irqs = msi_vecs;
>> +	rc = zpci_set_irq(zdev);
>> +	if (rc) {
>> +		pr_err("Registering adapter IRQs for %s failed\n",
>> +		       pci_name(pdev));
>> +		if (irq_delivery == DIRECTED) {
>> +			airq_iv_free(zpci_ibv[0], zdev->msi_first_bit, msi_vecs);
>> +		} else {
>> +			zpci_clear_airq(zdev);
>> +			airq_iv_release(zdev->aibv);
>> +			zdev->aibv = NULL;
>> +			airq_iv_free_bit(zpci_sbv, zdev->aisb);
>> +			zdev->aisb = -1UL;
>
> These two failure clean-ups look a lot like
> zpci_msi_teardown_directed/_floating() below. Could these be called
> instead of duplicating the code?

Yes they are similar, only that zpci_msi_teardown_directed/_floating()
also call zpci_clear_directed_irq()/zpci_clear_airq(), respectively.
Considering your other comment above, it might be cleaner to add
zpci_clear_irq() again, call this directly from zpci_msi_teardown() and
then use zpci_msi_teardown_directed/_floating() here as suggested.

Thanks,
Tobias

  parent reply	other threads:[~2025-11-21 14:49 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-21  5:32 [PATCH v5 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Tobias Schumacher
2025-11-21  5:32 ` [PATCH v5 1/2] genirq: Change hwirq parameter to irq_hw_number_t Tobias Schumacher
2025-11-21  5:32 ` [PATCH v5 2/2] s390/pci: Migrate s390 IRQ logic to IRQ domain API Tobias Schumacher
2025-11-21 13:18   ` Heiko Carstens
2025-11-21 13:27   ` Gerd Bayer
2025-11-21 14:03     ` Heiko Carstens
2025-11-21 14:19       ` Tobias Schumacher
2025-11-21 14:49     ` Tobias Schumacher [this message]
2025-11-21 11:03 ` [PATCH v5 0/2] genirq: s390/pci: Migrate MSI interrupts to irqdomain API Niklas Schnelle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DEEGFVBPI57E.1QW7C1D4B2ER4@linux.ibm.com \
    --to=ts@linux.ibm.com \
    --cc=agordeev@linux.ibm.com \
    --cc=alifm@linux.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=gbayer@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    --cc=schnelle@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox