From: Philipp Stanner <phasta@mailbox.org>
To: Shawn Lin <shawn.lin@rock-chips.com>,
Bjorn Helgaas <bhelgaas@google.com>
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>,
Jonathan Derrick <jonathan.derrick@linux.dev>,
Kurt Schwemmer <kurt.schwemmer@microsemi.com>,
Logan Gunthorpe <logang@deltatee.com>,
Philipp Stanner <phasta@kernel.org>,
linux-pci@vger.kernel.org
Subject: Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
Date: Wed, 22 Apr 2026 09:32:48 +0200
Message-ID: <2c227e17a573c8c674c027c9b3014c2ec9ebfec7.camel@mailbox.org>
In-Reply-To: <1776825522-6390-2-git-send-email-shawn.lin@rock-chips.com>
On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote:
> The PCI/MSI subsystem suffers from a long-standing design issue where
> the implicit, automatic management of IRQ vectors by the devres
nit:
The *automatic* handling is not actually the issue. All devres
interfaces (pcim_) are automatic by definition.
The issue is the "sometimes managed, sometimes not" nature of some pci_
interfaces. I came to call those "hybrid functions".
> framework conflicts with explicit driver cleanup, leading to ambiguous
> ownership and potential resource management bugs.
>
> Historically, pcim_enable_device() not only manages standard PCI
> resources (BARs) via devres
>
See cover-letter. As I said, that's not the case anymore. It was the
case once.
I think you could just briefly state "Historically, some functions
exhibited this hybrid behavior, the last of which by now are …"
If you want you could maybe also Link: one of my patch series for
the wider history.
> but also implicitly sets the is_msi_managed
> flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
> as a cleanup action, creating a hybrid ownership model.
nit:
s/ownership/responsibility
>
> This ambiguity causes problems when drivers follow a common but
> hazardous pattern:
> 1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
> 2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
> 3. Also calling pci_free_irq_vectors() in error paths or remove routines
>
> In this scenario, the devres framework may attempt to free the IRQ
> vectors a second time upon device release, leading to double-free
> issues. Analysis of the tree shows this hazardous pattern exists
> in multiple drivers, while 35 other drivers correctly rely solely
> on the implicit cleanup.
See cover-letter.
I would leave that to Bjorn's preference, but I think that info is
useful in the cover letter, where it helps reviewers understand
what's going on; it's not useful in the commit message. Your
description of the problematic API is justification enough IMO.
>
> To resolve this ambiguity, introduce explicit managed APIs for IRQ
> vector allocation:
> - pcim_alloc_irq_vectors()
> - pcim_alloc_irq_vectors_affinity()
>
> These functions are the devres-managed counterparts to
> pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
> devres-managed IRQ vectors should use these new APIs instead.
>
> The long-term goal is to convert all drivers which use pcim_enable_device()
> as well as pci_alloc_irq_vectors[_affinity]() to use these managed
nit:
s/as well as/in combination with
> functions, and eventually remove the problematic hybrid management
> pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
> This patch lays the foundation by introducing the APIs.
But in general this reads very nicely and is mostly spot-on :)
>
> Suggested-by: Philipp Stanner <phasta@kernel.org>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>
> ---
>
> Changes in v3:
> - Rework the commit message and function doc (Philipp)
> - Remove setting is_msi_managed flag from new APIs (Philipp)
>
> Changes in v2:
> - Rebase
> - Introduce patches only for PCI subsystem to convert the API
>
> drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 22 ++++++++++++++++++++++
> 2 files changed, 69 insertions(+)
>
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index c18559b..90b67f5 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>
> /**
> + * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
> + * @dev: the PCI device to operate on
> + * @min_vecs: minimum number of vectors required (must be >= 1)
> + * @max_vecs: maximum (desired) number of vectors
> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> + *
> + * This is a device resource managed version of pci_alloc_irq_vectors().
> + * Interrupt vectors are automatically freed on driver detach by the
> + * devres. Drivers do not need to explicitly call pci_free_irq_vectors().
s/the devres/devres
s/do not need/must not <- You stated doing so would be a bug, right?
> + *
> + * Returns number of vectors allocated (which might be smaller than
> + * @max_vecs) on success, or a negative error code on failure.
> + */
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags)
> +{
> + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> + flags, NULL);
> +}
> +EXPORT_SYMBOL(pcim_alloc_irq_vectors);
> +
> +/**
> + * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity()
> + * @dev: the PCI device to operate on
> + * @min_vecs: minimum number of vectors required (must be >= 1)
> + * @max_vecs: maximum (desired) number of vectors
> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> + * @affd: optional description of the affinity requirements
> + *
> + * This is a device resource managed version of pci_alloc_irq_vectors_affinity().
> + * Interrupt vectors are automatically freed on driver detach by the devres
> + * machinery. Drivers which use pcim_enable_device() as well as this function,
> + * do not need to explicitly call pci_free_irq_vectors() currently.
> + *
> + * Returns number of vectors allocated (which might be smaller than
> + * @max_vecs) on success, or a negative error code on failure.
> + */
> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags,
> + struct irq_affinity *affd)
> +{
> + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> + flags, affd);
Hmmm, OK – I am not thaaaat deep in the topic anymore, so correct me if
I am wrong. But to me it seems that we might have a problem.
Your intention seems to be to rely on the hybrid devres behavior here,
is that right?
Because pci_alloc_irq_vectors_affinity() runs into
__pci_enable_msi(x)_range(), which runs into pci_setup_msi_context(),
which sets up the devres callbacks.
And this would now mean that your new function here *also* is hybrid,
because it relies on pcim_enable_device() setting the flag, right?
IOW, when someone called pci_enable_device(), your pcim_ function here
would NOT be a managed devres function. Right?
That would obviously not fly. The current (and IMO desirable) design
goal for devres PCI is that pcim_ functions are ALWAYS managed (hence
the 'm' = managed), and pci_ functions are NEVER managed.
I am sorry if / that I overlooked this when first glancing at your
previous versions.
IIRC for this reason I had to provide a completely unmanaged backend,
and to do so I actually had to *copy* existing functions from pci.c, so
for many months PCI contained two versions of the same functions.
The reason was that the hybrid behavior was buried so deep in the
subsystem.
For reference, here is where I had to reimplement some functions to be
used by my new pcim_ functions:
https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/
And here I could finally remove them later:
https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/
I hope this helps. And again sorry for potentially causing you
unnecessary work :(
Regards
Philipp
> +}
> +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity);
> +
> +/**
> * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
> * @dev: the PCI device to operate on
> * @nr: device-relative interrupt vector index (0-based); has different
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2c44545..3716c67 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> unsigned int max_vecs, unsigned int flags,
> struct irq_affinity *affd);
>
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags);
> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags,
> + struct irq_affinity *affd);
> +
> bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
> struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
> const struct irq_affinity_desc *affdesc);
> @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> flags, NULL);
> }
>
> +static inline int
> +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags,
> + struct irq_affinity *aff_desc)
> +{
> + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> + flags, aff_desc);
> +}
> +static inline int
> +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> + unsigned int max_vecs, unsigned int flags)
> +{
> + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> + flags, NULL);
> +}
> +
> static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
> { return false; }
> static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,