* [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation
@ 2026-04-22 2:38 Shawn Lin
2026-04-22 2:38 ` [RESEND PATCH v3 1/3] PCI/MSI: " Shawn Lin
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Shawn Lin @ 2026-04-22 2:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe,
Philipp Stanner, linux-pci, Shawn Lin
There is a long-standing design issue in the PCI/MSI subsystem where the
implicit, automatic management of IRQ vectors by the devres framework
conflicts with explicit driver cleanup, creating ambiguity and potential
resource management bugs.
Historically, pcim_enable_device() not only manages standard PCI resources
(BARs) via devres but also implicitly triggers automatic IRQ vector management
when pci_alloc_irq_vectors[_affinity]() is called: because pcim_enable_device()
sets the is_managed flag, pcim_msi_release() is registered as a cleanup action.
This creates an ambiguous ownership model. Many drivers follow a pattern of:
1. Calling pci_alloc_irq_vectors() to allocate interrupts.
2. Also calling pci_free_irq_vectors() in their error paths or remove routines.
When such a driver also uses pcim_enable_device(), the devres framework may
attempt to free the IRQ vectors a second time upon device release, leading to
a double-free. Analysis of the tree shows this hazardous pattern exists in
multiple drivers, while 35 other drivers correctly rely solely on the implicit cleanup.
This series introduces new managed APIs: pcim_alloc_irq_vectors() and
pcim_alloc_irq_vectors_affinity(). Drivers that wish to have devres-managed IRQ
vectors should use these functions. They currently behave the same as the
non-devres-managed versions. In the short term, the series converts two drivers within the
PCI subsystem to use the new APIs. The long-term goal is to convert all other
drivers which wish to use these managed functions, and finally to remove the
problematic hybrid management pattern from pcim_enable_device() and
pcim_setup_msi_release() entirely.
Changes in v3:
- Rework the commit message and function doc (Philipp)
- Remove setting is_msi_managed flag from new APIs (Philipp)
Changes in v2:
- Rebase
- Introduce patches only for PCI subsystem to convert the API
Shawn Lin (3):
PCI/MSI: Add Devres managed IRQ vectors allocation
PCI: switchtec: Replace pci_alloc_irq_vectors() with
pcim_alloc_irq_vectors()
PCI: vmd: Replace pci_alloc_irq_vectors() with
pcim_alloc_irq_vectors()
drivers/pci/controller/vmd.c | 4 ++--
drivers/pci/msi/api.c | 47 ++++++++++++++++++++++++++++++++++++++++++
drivers/pci/switch/switchtec.c | 6 +++---
include/linux/pci.h | 22 ++++++++++++++++++++
4 files changed, 74 insertions(+), 5 deletions(-)
--
2.7.4
^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
  2026-04-22  2:38 [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Shawn Lin
@ 2026-04-22  2:38 ` Shawn Lin
  2026-04-22  7:32   ` Philipp Stanner
  2026-04-22  2:38 ` [RESEND PATCH v3 2/3] PCI: switchtec: Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() Shawn Lin
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Shawn Lin @ 2026-04-22  2:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe,
	Philipp Stanner, linux-pci, Shawn Lin

The PCI/MSI subsystem suffers from a long-standing design issue where
the implicit, automatic management of IRQ vectors by the devres
framework conflicts with explicit driver cleanup, leading to ambiguous
ownership and potential resource management bugs.

Historically, pcim_enable_device() not only manages standard PCI
resources (BARs) via devres but also implicitly sets the is_msi_managed
flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
as a cleanup action, creating a hybrid ownership model.

This ambiguity causes problems when drivers follow a common but
hazardous pattern:
1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
3. Also calling pci_free_irq_vectors() in error paths or remove routines

In this scenario, the devres framework may attempt to free the IRQ
vectors a second time upon device release, leading to double-free
issues. Analysis of the tree shows this hazardous pattern exists
in multiple drivers, while 35 other drivers correctly rely solely
on the implicit cleanup.

To resolve this ambiguity, introduce explicit managed APIs for IRQ
vector allocation:
- pcim_alloc_irq_vectors()
- pcim_alloc_irq_vectors_affinity()

These functions are the devres-managed counterparts to
pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
devres-managed IRQ vectors should use these new APIs instead.

The long-term goal is to convert all drivers which use pcim_enable_device()
as well as pci_alloc_irq_vectors[_affinity]() to use these managed
functions, and eventually remove the problematic hybrid management
pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
This patch lays the foundation by introducing the APIs.

Suggested-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>

---

Changes in v3:
- Rework the commit message and function doc (Philipp)
- Remove setting is_msi_managed flag from new APIs (Philipp)

Changes in v2:
- Rebase
- Introduce patches only for PCI subsystem to convert the API

 drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h   | 22 ++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index c18559b..90b67f5 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 
 /**
+ * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
+ * @dev: the PCI device to operate on
+ * @min_vecs: minimum number of vectors required (must be >= 1)
+ * @max_vecs: maximum (desired) number of vectors
+ * @flags: flags for this allocation, see pci_alloc_irq_vectors()
+ *
+ * This is a device resource managed version of pci_alloc_irq_vectors().
+ * Interrupt vectors are automatically freed on driver detach by the
+ * devres. Drivers do not need to explicitly call pci_free_irq_vectors().
+ *
+ * Returns number of vectors allocated (which might be smaller than
+ * @max_vecs) on success, or a negative error code on failure.
+ */
+int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+			   unsigned int max_vecs, unsigned int flags)
+{
+	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
+					       flags, NULL);
+}
+EXPORT_SYMBOL(pcim_alloc_irq_vectors);
+
+/**
+ * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity()
+ * @dev: the PCI device to operate on
+ * @min_vecs: minimum number of vectors required (must be >= 1)
+ * @max_vecs: maximum (desired) number of vectors
+ * @flags: flags for this allocation, see pci_alloc_irq_vectors()
+ * @affd: optional description of the affinity requirements
+ *
+ * This is a device resource managed version of pci_alloc_irq_vectors_affinity().
+ * Interrupt vectors are automatically freed on driver detach by the devres
+ * machinery. Drivers which use pcim_enable_device() as well as this function,
+ * do not need to explicitly call pci_free_irq_vectors() currently.
+ *
+ * Returns number of vectors allocated (which might be smaller than
+ * @max_vecs) on success, or a negative error code on failure.
+ */
+int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+				    unsigned int max_vecs, unsigned int flags,
+				    struct irq_affinity *affd)
+{
+	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
+					      flags, affd);
+}
+EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity);
+
+/**
  * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
  * @dev: the PCI device to operate on
  * @nr: device-relative interrupt vector index (0-based); has different
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2c44545..3716c67 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd);
 
+int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+			   unsigned int max_vecs, unsigned int flags);
+int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+				    unsigned int max_vecs, unsigned int flags,
+				    struct irq_affinity *affd);
+
 bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
 struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
 				     const struct irq_affinity_desc *affdesc);
@@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 					      flags, NULL);
 }
 
+static inline int
+pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+				unsigned int max_vecs, unsigned int flags,
+				struct irq_affinity *aff_desc)
+{
+	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
+					      flags, aff_desc);
+}
+static inline int
+pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+		       unsigned int max_vecs, unsigned int flags)
+{
+	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
+					       flags, NULL);
+}
+
 static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
 { return false; }
 static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
--
2.7.4
* Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
  2026-04-22  2:38 ` [RESEND PATCH v3 1/3] PCI/MSI: " Shawn Lin
@ 2026-04-22  7:32   ` Philipp Stanner
  2026-04-22  8:36     ` Shawn Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Philipp Stanner @ 2026-04-22  7:32 UTC (permalink / raw)
  To: Shawn Lin, Bjorn Helgaas
  Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe,
	Philipp Stanner, linux-pci

On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote:
> The PCI/MSI subsystem suffers from a long-standing design issue where
> the implicit, automatic management of IRQ vectors by the devres

nit:
The *automatic* handling is not actually the issue. All devres
interfaces (pcim_) are automatic by definition.

The issue is the "sometimes managed, sometimes not" nature of some pci_
interfaces. I came to call those "hybrid functions".

> framework conflicts with explicit driver cleanup, leading to ambiguous
> ownership and potential resource management bugs.
>
> Historically, pcim_enable_device() not only manages standard PCI
> resources (BARs) via devres
>

See cover-letter. As I said, that's not the case anymore. It was the
case once.

I think you could just briefly state "Historically, some functions
exhibited this hybrid behavior, the last of which by now are …"

If you want you could maybe also Link: one of my patch series's for
the wider history.

> but also implicitly sets the is_msi_managed
> flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
> as a cleanup action, creating a hybrid ownership model.

nit:
s/ownership/responsibility

>
> This ambiguity causes problems when drivers follow a common but
> hazardous pattern:
> 1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
> 2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
> 3. Also calling pci_free_irq_vectors() in error paths or remove routines
>
> In this scenario, the devres framework may attempt to free the IRQ
> vectors a second time upon device release, leading to double-free
> issues. Analysis of the tree shows this hazardous pattern exists
> in multiple drivers, while 35 other drivers correctly rely solely
> on the implicit cleanup.

See cover-letter.

I would leave that to Bjorn's preference, but I think that info is
useful for the cover letter so that reviewers get what's going on, but
it's not useful in the commit message. Your description of the
problematic API is justification enough IMO.

>
> To resolve this ambiguity, introduce explicit managed APIs for IRQ
> vector allocation:
> - pcim_alloc_irq_vectors()
> - pcim_alloc_irq_vectors_affinity()
>
> These functions are the devres-managed counterparts to
> pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
> devres-managed IRQ vectors should use these new APIs instead.
>
> The long-term goal is to convert all drivers which use pcim_enable_device()
> as well as pci_alloc_irq_vectors[_affinity]() to use these managed

nit:
s/as well as/in combination with

> functions, and eventually remove the problematic hybrid management
> pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
> This patch lays the foundation by introducing the APIs.

But in general reads very nice and is mostly spot-on :)

>
> Suggested-by: Philipp Stanner <phasta@kernel.org>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>
> ---
>
> Changes in v3:
> - Rework the commit message and function doc (Philipp)
> - Remove setting is_msi_managed flag from new APIs (Philipp)
>
> Changes in v2:
> - Rebase
> - Introduce patches only for PCI subsystem to convert the API
>
>  drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h   | 22 ++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
>
> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> index c18559b..90b67f5 100644
> --- a/drivers/pci/msi/api.c
> +++ b/drivers/pci/msi/api.c
> @@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>
>  /**
> + * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
> + * @dev: the PCI device to operate on
> + * @min_vecs: minimum number of vectors required (must be >= 1)
> + * @max_vecs: maximum (desired) number of vectors
> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> + *
> + * This is a device resource managed version of pci_alloc_irq_vectors().
> + * Interrupt vectors are automatically freed on driver detach by the
> + * devres. Drivers do not need to explicitly call pci_free_irq_vectors().

s/the devres/devres

s/do not need/must not <- You stated doing so would be a bug, right?

> + *
> + * Returns number of vectors allocated (which might be smaller than
> + * @max_vecs) on success, or a negative error code on failure.
> + */
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +			   unsigned int max_vecs, unsigned int flags)
> +{
> +	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +					       flags, NULL);
> +}
> +EXPORT_SYMBOL(pcim_alloc_irq_vectors);
> +
> +/**
> + * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity()
> + * @dev: the PCI device to operate on
> + * @min_vecs: minimum number of vectors required (must be >= 1)
> + * @max_vecs: maximum (desired) number of vectors
> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> + * @affd: optional description of the affinity requirements
> + *
> + * This is a device resource managed version of pci_alloc_irq_vectors_affinity().
> + * Interrupt vectors are automatically freed on driver detach by the devres
> + * machinery. Drivers which use pcim_enable_device() as well as this function,
> + * do not need to explicitly call pci_free_irq_vectors() currently.
> + *
> + * Returns number of vectors allocated (which might be smaller than
> + * @max_vecs) on success, or a negative error code on failure.
> + */
> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> +				    unsigned int max_vecs, unsigned int flags,
> +				    struct irq_affinity *affd)
> +{
> +	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +					      flags, affd);

Hmmm, OK – I am not thaaaat deep in the topic anymore, so correct me if
I am wrong. But to me it seems that we might have a problem.

Your intention seems to be to rely on hybrid devres here, isn't it?

Because pci_alloc_irq_vectors_affinity() runs into
__pci_enable_msi(x)_range(), which runs into pci_setup_msi_context(),
which sets up the devres callbacks.

And this would now mean that your new function here *also* is hybrid,
because it relies on pcim_enable_device() setting the flag, right?

IOW, when someone called pci_enable_device(), your pcim_ function here
would NOT be a managed devres function. Right?

That would obviously not fly. The current (and IMO) desirable design
goal for devres PCI is that pcim_ functions are ALWAYS managed (hence
the 'm' = managed), and pci_ functions are NEVER managed.

I am sorry if / that I overlooked that while first glimpsing at your
previous versions.

IIRC for this reason I had to provide a completely unmanaged backend,
and to do so I actually had to *copy* existing functions from pci.c, so
for many months PCI contained two versions of the same functions.

The reason was that the hybrid behavior was buried so deep in the
subsystem.

See here:

Here I had to reimplement some functions to be used by my new pcim_
functions:
https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/

And here I could finally remove them later:
https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/

I hope this helps. And again sorry for potentially causing you
unnecessary work :(

Regards
Philipp

> +}
> +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity);
> +
> +/**
>  * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
>  * @dev: the PCI device to operate on
>  * @nr: device-relative interrupt vector index (0-based); has different
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2c44545..3716c67 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>  				   unsigned int max_vecs, unsigned int flags,
>  				   struct irq_affinity *affd);
>
> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +			   unsigned int max_vecs, unsigned int flags);
> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> +				    unsigned int max_vecs, unsigned int flags,
> +				    struct irq_affinity *affd);
> +
>  bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
>  struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
>  				     const struct irq_affinity_desc *affdesc);
> @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>  					      flags, NULL);
>  }
>
> +static inline int
> +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> +				unsigned int max_vecs, unsigned int flags,
> +				struct irq_affinity *aff_desc)
> +{
> +	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +					      flags, aff_desc);
> +}
> +static inline int
> +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> +		       unsigned int max_vecs, unsigned int flags)
> +{
> +	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> +					       flags, NULL);
> +}
> +
>  static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
> { return false; }
>  static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
* Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
  2026-04-22  7:32   ` Philipp Stanner
@ 2026-04-22  8:36     ` Shawn Lin
  2026-04-22 13:07       ` Philipp Stanner
  0 siblings, 1 reply; 9+ messages in thread
From: Shawn Lin @ 2026-04-22  8:36 UTC (permalink / raw)
  To: phasta, Bjorn Helgaas
  Cc: shawn.lin, Nirmal Patel, Jonathan Derrick, Kurt Schwemmer,
	Logan Gunthorpe, linux-pci

Hi Philipp

On Wed, 2026/04/22 15:32, Philipp Stanner wrote:
> On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote:
>> The PCI/MSI subsystem suffers from a long-standing design issue where
>> the implicit, automatic management of IRQ vectors by the devres
>
> nit:
> The *automatic* handling is not actually the issue. All devres
> interfaces (pcim_) are automatic by definition.
>
> The issue is the "sometimes managed, sometimes not" nature of some pci_
> interfaces. I came to call those "hybrid functions".
>
>> framework conflicts with explicit driver cleanup, leading to ambiguous
>> ownership and potential resource management bugs.
>>
>> Historically, pcim_enable_device() not only manages standard PCI
>> resources (BARs) via devres
>>
>
> See cover-letter. As I said, that's not the case anymore. It was the
> case once.
>
> I think you could just briefly state "Historically, some functions
> exhibited this hybrid behavior, the last of which by now are …"
>
> If you want you could maybe also Link: one of my patch series's for
> the wider history.
>
>> but also implicitly sets the is_msi_managed
>> flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
>> as a cleanup action, creating a hybrid ownership model.
>
> nit:
> s/ownership/responsibility
>
>>
>> This ambiguity causes problems when drivers follow a common but
>> hazardous pattern:
>> 1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
>> 2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
>> 3. Also calling pci_free_irq_vectors() in error paths or remove routines
>>
>> In this scenario, the devres framework may attempt to free the IRQ
>> vectors a second time upon device release, leading to double-free
>> issues. Analysis of the tree shows this hazardous pattern exists
>> in multiple drivers, while 35 other drivers correctly rely solely
>> on the implicit cleanup.
>
> See cover-letter.
>
> I would leave that to Bjorn's preference, but I think that info is
> useful for the cover letter so that reviewers get what's going on, but
> it's not useful in the commit message. Your description of the
> problematic API is justification enough IMO.
>
>>
>> To resolve this ambiguity, introduce explicit managed APIs for IRQ
>> vector allocation:
>> - pcim_alloc_irq_vectors()
>> - pcim_alloc_irq_vectors_affinity()
>>
>> These functions are the devres-managed counterparts to
>> pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
>> devres-managed IRQ vectors should use these new APIs instead.
>>
>> The long-term goal is to convert all drivers which use pcim_enable_device()
>> as well as pci_alloc_irq_vectors[_affinity]() to use these managed
>
> nit:
> s/as well as/in combination with
>
>> functions, and eventually remove the problematic hybrid management
>> pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
>> This patch lays the foundation by introducing the APIs.
>
> But in general reads very nice and is mostly spot-on :)
>
>>
>> Suggested-by: Philipp Stanner <phasta@kernel.org>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>
>> ---
>>
>> Changes in v3:
>> - Rework the commit message and function doc (Philipp)
>> - Remove setting is_msi_managed flag from new APIs (Philipp)
>>
>> Changes in v2:
>> - Rebase
>> - Introduce patches only for PCI subsystem to convert the API
>>
>>  drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/pci.h   | 22 ++++++++++++++++++++++
>>  2 files changed, 69 insertions(+)
>>
>> diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
>> index c18559b..90b67f5 100644
>> --- a/drivers/pci/msi/api.c
>> +++ b/drivers/pci/msi/api.c
>> @@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>>  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
>>
>>  /**
>> + * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
>> + * @dev: the PCI device to operate on
>> + * @min_vecs: minimum number of vectors required (must be >= 1)
>> + * @max_vecs: maximum (desired) number of vectors
>> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
>> + *
>> + * This is a device resource managed version of pci_alloc_irq_vectors().
>> + * Interrupt vectors are automatically freed on driver detach by the
>> + * devres. Drivers do not need to explicitly call pci_free_irq_vectors().
>
> s/the devres/devres
>
> s/do not need/must not <- You stated doing so would be a bug, right?
>
>> + *
>> + * Returns number of vectors allocated (which might be smaller than
>> + * @max_vecs) on success, or a negative error code on failure.
>> + */
>> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>> +			   unsigned int max_vecs, unsigned int flags)
>> +{
>> +	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
>> +					       flags, NULL);
>> +}
>> +EXPORT_SYMBOL(pcim_alloc_irq_vectors);
>> +
>> +/**
>> + * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity()
>> + * @dev: the PCI device to operate on
>> + * @min_vecs: minimum number of vectors required (must be >= 1)
>> + * @max_vecs: maximum (desired) number of vectors
>> + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
>> + * @affd: optional description of the affinity requirements
>> + *
>> + * This is a device resource managed version of pci_alloc_irq_vectors_affinity().
>> + * Interrupt vectors are automatically freed on driver detach by the devres
>> + * machinery. Drivers which use pcim_enable_device() as well as this function,
>> + * do not need to explicitly call pci_free_irq_vectors() currently.
>> + *
>> + * Returns number of vectors allocated (which might be smaller than
>> + * @max_vecs) on success, or a negative error code on failure.
>> + */
>> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>> +				    unsigned int max_vecs, unsigned int flags,
>> +				    struct irq_affinity *affd)
>> +{
>> +	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
>> +					      flags, affd);
>
> Hmmm, OK – I am not thaaaat deep in the topic anymore, so correct me if
> I am wrong. But to me it seems that we might have a problem.
>
> Your intention seems to be to rely on hybrid devres here, isn't it?
>

Yes, it is.

> Because pci_alloc_irq_vectors_affinity() runs into
> __pci_enable_msi(x)_range(), which runs into pci_setup_msi_context(),
> which sets up the devres callbacks.
>
> And this would now mean that your new function here *also* is hybrid,
> because it relies on pcim_enable_device() setting the flag, right?
>
> IOW, when someone called pci_enable_device(), your pcim_ function here
> would NOT be a managed devres function. Right?
>
> That would obviously not fly. The current (and IMO) desirable design
> goal for devres PCI is that pcim_ functions are ALWAYS managed (hence
> the 'm' = managed), and pci_ functions are NEVER managed.
>
> I am sorry if / that I overlooked that while first glimpsing at your
> previous versions.
>

Don't worry.

> IIRC for this reason I had to provide a completely unmanaged backend,
> and to do so I actually had to *copy* existing functions from pci.c, so
> for many months PCI contained two versions of the same functions.
>

I was trying to avoid copying the existing functions. There are some
combinations here:

1. pcim_enable_device() + pci_alloc -> correct
2. pcim_enable_device() + pci_alloc + pci_free -> bug
3. pci_enable_device() + pci_alloc + pci_free -> correct

To solve the 2nd case, I need to remove the hybrid mode from
pcim_enable_device(). So what I had in mind is:

1. keep pcim_alloc_irq_vectors() relying on pcim_enable_device(), still
   the hybrid mode as you pointed out.
2. convert pcim_enable_device() + pci_alloc_irq_vectors() users to
   pcim_enable_device() + pcim_alloc_irq_vectors(), which is the first
   case above. Meanwhile add documentation or a code check to prevent
   new pci_enable_device() + pcim_alloc_irq_vectors() users.
3. after conversions, we still have three combinations:
   - pcim_enable_device() + pcim_alloc() -> correct
   - pcim_enable_device() + pci_alloc + pci_free -> bug
   - pci_enable_device() + pci_alloc + pci_free -> correct
4. Then remove pcim_setup_msi_release() from pci_setup_msi_context(),
   and add pcim_setup_msi_release() to my new APIs. This way:
   - pcim_enable_device() + pcim_alloc() -> correct
   - pcim_enable_device() + pci_alloc + pci_free -> correct
   - pci_enable_device() + pci_alloc + pci_free -> correct
5. remove the limitation on pci_enable_device() + pcim_alloc mentioned
   in the 2nd step above, so we could allow this 4th pattern.

Finally all things work fine. Please let me know if this approach works
for you?

> The reason was that the hybrid behavior was buried so deep in the
> subsystem.
>
> See here:
>
> Here I had to reimplement some functions to be used by my new pcim_
> functions:
> https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/
>
>
> And here I could finally remove them later:
> https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/
>
>
> I hope this helps. And again sorry for potentially causing you
> unnecessary work :(
>
>
> Regards
> Philipp
>
>
>> +}
>> +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity);
>> +
>> +/**
>>  * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
>>  * @dev: the PCI device to operate on
>>  * @nr: device-relative interrupt vector index (0-based); has different
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 2c44545..3716c67 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>>  				   unsigned int max_vecs, unsigned int flags,
>>  				   struct irq_affinity *affd);
>>
>> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>> +			   unsigned int max_vecs, unsigned int flags);
>> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>> +				    unsigned int max_vecs, unsigned int flags,
>> +				    struct irq_affinity *affd);
>> +
>>  bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
>>  struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
>>  				     const struct irq_affinity_desc *affdesc);
>> @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>  					      flags, NULL);
>>  }
>>
>> +static inline int
>> +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>> +				unsigned int max_vecs, unsigned int flags,
>> +				struct irq_affinity *aff_desc)
>> +{
>> +	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
>> +					      flags, aff_desc);
>> +}
>> +static inline int
>> +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>> +		       unsigned int max_vecs, unsigned int flags)
>> +{
>> +	return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
>> +					       flags, NULL);
>> +}
>> +
>>  static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
>> { return false; }
>>  static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
* Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
  2026-04-22  8:36     ` Shawn Lin
@ 2026-04-22 13:07       ` Philipp Stanner
  2026-04-24  7:23         ` Shawn Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Philipp Stanner @ 2026-04-22 13:07 UTC (permalink / raw)
  To: Shawn Lin, phasta, Bjorn Helgaas
  Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe,
	linux-pci

On Wed, 2026-04-22 at 16:36 +0800, Shawn Lin wrote:
> Hi Philipp
> 
> On Wed, 2026/04/22 15:32, Philipp Stanner wrote:
> > On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote:
> > > The PCI/MSI subsystem suffers from a long-standing design issue where
> > > the implicit, automatic management of IRQ vectors by the devres
> > 
> > nit:
> > The *automatic* handling is not actually the issue. All devres
> > interfaces (pcim_) are automatic by definition.
> > 
> > The issue is the "sometimes managed, sometimes not" nature of some pci_
> > interfaces. I came to call those "hybrid functions".
> > 
> > > framework conflicts with explicit driver cleanup, leading to ambiguous
> > > ownership and potential resource management bugs.
> > > 
> > > Historically, pcim_enable_device() not only manages standard PCI
> > > resources (BARs) via devres
> > > 
> > 
> > See cover-letter. As I said, that's not the case anymore. It was the
> > case once.
> > 
> > I think you could just briefly state "Historically, some functions
> > exhibited this hybrid behavior, the last of which by now are …"
> > 
> > If you want you could maybe also Link: one of my patch series's for
> > the wider history.
> > 
> > > but also implicitly sets the is_msi_managed
> > > flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
> > > as a cleanup action, creating a hybrid ownership model.
> > 
> > nit:
> > s/ownership/responsibility
> > 
> > > 
> > > This ambiguity causes problems when drivers follow a common but
> > > hazardous pattern:
> > > 1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
> > > 2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
> > > 3. Also calling pci_free_irq_vectors() in error paths or remove routines
> > > 
> > > In this scenario, the devres framework may attempt to free the IRQ
> > > vectors a second time upon device release, leading to double-free
> > > issues. Analysis of the tree shows this hazardous pattern exists
> > > in multiple drivers, while 35 other drivers correctly rely solely
> > > on the implicit cleanup.
> > 
> > See cover-letter.
> > 
> > I would leave that to Bjorn's preference, but I think that info is
> > useful for the cover letter so that reviewers get what's going on, but
> > it's not useful in the commit message. Your description of the
> > problematic API is justification enough IMO.
> > 
> > > 
> > > To resolve this ambiguity, introduce explicit managed APIs for IRQ
> > > vector allocation:
> > > - pcim_alloc_irq_vectors()
> > > - pcim_alloc_irq_vectors_affinity()
> > > 
> > > These functions are the devres-managed counterparts to
> > > pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
> > > devres-managed IRQ vectors should use these new APIs instead.
> > > 
> > > The long-term goal is to convert all drivers which use pcim_enable_device()
> > > as well as pci_alloc_irq_vectors[_affinity]() to use these managed
> > 
> > nit:
> > s/as well as/in combination with
> > 
> > > functions, and eventually remove the problematic hybrid management
> > > pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
> > > This patch lays the foundation by introducing the APIs.
> > 
> > But in general reads very nice and is mostly spot-on :)
> > 
> > > 
> > > Suggested-by: Philipp Stanner <phasta@kernel.org>
> > > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> > > 
> > > ---
> > > 
> > > Changes in v3:
> > > - Rework the commit message and function doc (Philipp)
> > > - Remove setting is_msi_managed flag from new APIs (Philipp)
> > > 
> > > Changes in v2:
> > > - Rebase
> > > - Introduce patches only for PCI subsystem to convert the API
> > > 
> > >  drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/linux/pci.h   | 22 ++++++++++++++++++++++
> > >  2 files changed, 69 insertions(+)
> > > 
> > > diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> > > index c18559b..90b67f5 100644
> > > --- a/drivers/pci/msi/api.c
> > > +++ b/drivers/pci/msi/api.c
> > > @@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > >  EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
> > > 
> > >  /**
> > > + * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
> > > + * @dev: the PCI device to operate on
> > > + * @min_vecs: minimum number of vectors required (must be >= 1)
> > > + * @max_vecs: maximum (desired) number of vectors
> > > + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> > > + *
> > > + * This is a device resource managed version of pci_alloc_irq_vectors().
> > > + * Interrupt vectors are automatically freed on driver detach by the
> > > + * devres. Drivers do not need to explicitly call pci_free_irq_vectors().
> > 
> > s/the devres/devres
> > 
> > s/do not need/must not <- You stated doing so would be a bug, right?
> > 
> > > + *
> > > + * Returns number of vectors allocated (which might be smaller than
> > > + * @max_vecs) on success, or a negative error code on failure.
> > > + */ > > > +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags) > > > +{ > > > + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, > > > + flags, NULL); > > > +} > > > +EXPORT_SYMBOL(pcim_alloc_irq_vectors); > > > + > > > +/** > > > + * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity() > > > + * @dev: the PCI device to operate on > > > + * @min_vecs: minimum number of vectors required (must be >= 1) > > > + * @max_vecs: maximum (desired) number of vectors > > > + * @flags: flags for this allocation, see pci_alloc_irq_vectors() > > > + * @affd: optional description of the affinity requirements > > > + * > > > + * This is a device resource managed version of pci_alloc_irq_vectors_affinity(). > > > + * Interrupt vectors are automatically freed on driver detach by the devres > > > + * machinery. Drivers which use pcim_enable_device() as well as this function, > > > + * do not need to explicitly call pci_free_irq_vectors() currently. > > > + * > > > + * Returns number of vectors allocated (which might be smaller than > > > + * @max_vecs) on success, or a negative error code on failure. > > > + */ > > > +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags, > > > + struct irq_affinity *affd) > > > +{ > > > + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, > > > + flags, affd); > > > > Hmmm, OK – I am not thaaaat deep in the topic anymore, so correct me if > > I am wrong. But to me it seems that we might have a problem. > > > > Your intention seems to be to rely on hybrid devres here, isn't it? > > > > Yes, it's. > > > Because pci_alloc_irq_vectors_affinity() runs into > > __pci_enable_msi(x)_range(), which runs into pci_setup_msi_context(), > > which sets up the devres callbacks. 
> > > > And this would now mean that your new function here *also* is hybrid, > > because it relies on pcim_enable_device() setting the flag, right? > > > > IOW, when someone called pci_enable_device(), your pcim_ function here > > would NOT be a managed devres function. Right? > > > > That would obviously not fly. The current (and IMO) desirable design > > goal for devres PCI is that pcim_ functions are ALWAYS managed (hence > > the 'm' = managed, and pci_ functions are NEVER managed). > > > > I am sorry if / that I overlooked that while first glimpsing at your > > previous versions. > > > > Don't worry. > > > > > IIRC for this reason I had to provide a completely unmanaged backend, > > and to do so I actually had to *copy* existing functions from pci.c, so > > for many months PCI contained two versions of the same functions. > > > > I was trying to avoid copying the existing functions. > > There are some combinations here: > 1. pcim_enable_device() + pci_alloc -> correct > 2. pcim_enable_device() + pci_alloc + pci_free -> bug > 3. pci_enable_device() + pci_alloc + pci_free -> correct > > To solve the 2nd case, I need to remove the hybrid mode from > pcim_enable_device(). So what I had in mind is: > > 1. keep pcim_alloc_irq_vectors() relying on pcim_enable_device(), still > the hybrid mode as you pointed out. > 2. convert pcim_enable_device() + pci_alloc_irq_vectors() users to > pcim_enable_device() + pcim_alloc_irq_vectors(), which is the first > case above. Meanwhile, add documentation or a code check to prevent new > pci_enable_device() + pcim_alloc_irq_vectors() users from appearing. > > 3. after conversions, we still have three combinations: > - pcim_enable_device() + pcim_alloc() -> correct > - pcim_enable_device() + pci_alloc + pci_free -> bug > - pci_enable_device() + pci_alloc + pci_free -> correct > > 4. Then remove pcim_setup_msi_release() from pci_setup_msi_context(). > And add pcim_setup_msi_release() to my new APIs.
This way, > - pcim_enable_device() + pcim_alloc() -> correct > - pcim_enable_device() + pci_alloc + pci_free -> correct > - pci_enable_device() + pci_alloc + pci_free -> correct > > 5. remove the limitation of using pci_enable_device() + pcim_alloc, as > mentioned in step 2 above, so we could allow this 4th pattern. > Finally all things work fine. > > Please let me know if this approach works for you. The biggest issue I see is that you add another hybrid function, even if it's temporary. pcim_alloc_irq_vectors() will sometimes be managed and sometimes not – until you have removed all users of the aforementioned functions, and finally performed step №4 from above. Note that even in the past, with the oldest, most broken version of the PCI devres API, pcim_ functions were *always* managed, completely independently from pci(m)_enable_device(). What pcim_enable_device() did was implicitly switch some pci_ functions, such as pci_request_regions(), into managed mode. The solution presented / intended by you would rely on: 1. New users of pcim_ being aware (through documentation for example) that this one pcim_ function is special and not always managed (for now). 2. New users being nice and really using pcim_enable_device(). 3. You being able to fully see it all through to the end, i.e., you not being switched to a different project, switching jobs and opening a bar in Hawaii etc. etc. My experience has, unfortunately, been that people often don't read documentation, especially if they are already used to interfaces like that working a certain way. How many drivers are there to be ported, do you have a list and maybe a time horizon? I took almost a year to remove my helper functions again. That's not a hard "NACK" from my side, it's just I wouldn't do it like that.
I suppose it is for the maintainers to pick one of the two solution pathways: a) have another hybrid function temporarily or b) have duplicated code temporarily (where "temporarily" means until the contributor finishes the job :) BTW, please keep in mind that in my opinion, once PCI/MSI is solved, I believe that pcim_enable_device() should do nothing anymore but disable the device on driver detach, without any other side effects. Regards P. > > > > The reason was that the hybrid behavior was burried so deep in the > > subsystem. > > > > See here: > > > > Here I had to reimplement some functions to be used by my new pcim_ > > functions: > > https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/ > > > > > > And here I could finally remove them later: > > https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/ > > > > > > I hope this helps. And again sorry for potentially causing you > > unnecessary work :( > > > > > > Regards > > Philipp > > > > > > > +} > > > +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity); > > > + > > > +/** > > > * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector > > > * @dev: the PCI device to operate on > > > * @nr: device-relative interrupt vector index (0-based); has different > > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > > index 2c44545..3716c67 100644 > > > --- a/include/linux/pci.h > > > +++ b/include/linux/pci.h > > > @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > > unsigned int max_vecs, unsigned int flags, > > > struct irq_affinity *affd); > > > > > > +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags); > > > +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags, > > > + struct irq_affinity *affd); > > > + > > > bool pci_msix_can_alloc_dyn(struct pci_dev 
*dev); > > > struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, > > > const struct irq_affinity_desc *affdesc); > > > @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, > > > flags, NULL); > > > } > > > > > > +static inline int > > > +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags, > > > + struct irq_affinity *aff_desc) > > > +{ > > > + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, > > > + flags, aff_desc); > > > +} > > > +static inline int > > > +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, > > > + unsigned int max_vecs, unsigned int flags) > > > +{ > > > + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, > > > + flags, NULL); > > > +} > > > + > > > static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev) > > > { return false; } > > > static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation 2026-04-22 13:07 ` Philipp Stanner @ 2026-04-24 7:23 ` Shawn Lin 0 siblings, 0 replies; 9+ messages in thread From: Shawn Lin @ 2026-04-24 7:23 UTC (permalink / raw) To: phasta, Bjorn Helgaas Cc: shawn.lin, Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe, linux-pci On Wed, 2026/04/22 21:07, Philipp Stanner wrote: > On Wed, 2026-04-22 at 16:36 +0800, Shawn Lin wrote: >> Hi Philipp >> >> On Wed, 2026/04/22 15:32, Philipp Stanner wrote: >>> On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote: >>>> The PCI/MSI subsystem suffers from a long-standing design issue where ... > > The biggest issue I see is that you add another hybrid function, even > if it's temporary. pcim_alloc_irq_vectors() will sometimes be managed > and sometimes not – until you have removed all users of the > aforementioned functions, and finally performed step №4 from above. > > Note that even in the past, with the oldest, most broken version of the > PCI devres API, pcim_ functions were *always* managed, completely > independently from pci(m)_enable_device(). > > What pcim_enable_device() did was implicitly switch some pci_ > functions, such as pci_request_regions(), into managed mode. > > The solution presented / intended by you would rely on: > 1. New users of pcim_ being aware (through documentation for > example) that this one pcim_ function is special and not always > managed (for now). Agreed. That could be enforced by a code check instead of documentation, though. > 2. New users being nice and really using pcim_enable_device(). > 3. You being able to fully see it all through to the end, i.e., you > not being switched to a different project, switching jobs and > opening a bar in Hawaii etc. etc. > I've been contributing for over 10 years purely out of personal interest; work like this series is outside my job, and no one has ever paid me for it. So it shouldn't be an issue at all for me.
:)) > My experience has, unfortunately, been that people often don't read > documentation, especially if they are already used to interfaces like > that working a certain way. +1 > How many drivers are there to be ported, do you have a list and maybe a > time horizon? I took almost a year to remove my helper functions again. > I will add it to my v4. > That's not a hard "NACK" from my side, it's just I wouldn't do it like > that. I suppose it is for the maintainers to pick one of the two > solution pathways: > > a) have another hybrid function temporarily or > b) have duplicated code temporarily > > (where "temporarily" means until the contributor finishes the job :) > I fully understand your concern. The more straightforward way is to adopt option b, for sure. I will send out my v4 soon. Thanks for your suggestions, really appreciate it! > > BTW, please keep in mind that in my opinion, once PCI/MSI is solved, I > believe that pcim_enable_device() should do nothing anymore but disable > the device on driver detach, without any other side effects. > > > Regards > P. > >> >> >>> The reason was that the hybrid behavior was buried so deep in the >>> subsystem. >>> >>> See here: >>> >>> Here I had to reimplement some functions to be used by my new pcim_ >>> functions: >>> https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/ >>> >>> >>> And here I could finally remove them later: >>> https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/ >>> >>> >>> I hope this helps.
And again sorry for potentially causing you >>> unnecessary work :( >>> >>> >>> Regards >>> Philipp >>> >>> >>>> +} >>>> +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity); >>>> + >>>> +/** >>>> * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector >>>> * @dev: the PCI device to operate on >>>> * @nr: device-relative interrupt vector index (0-based); has different >>>> diff --git a/include/linux/pci.h b/include/linux/pci.h >>>> index 2c44545..3716c67 100644 >>>> --- a/include/linux/pci.h >>>> +++ b/include/linux/pci.h >>>> @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, >>>> unsigned int max_vecs, unsigned int flags, >>>> struct irq_affinity *affd); >>>> >>>> +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, >>>> + unsigned int max_vecs, unsigned int flags); >>>> +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, >>>> + unsigned int max_vecs, unsigned int flags, >>>> + struct irq_affinity *affd); >>>> + >>>> bool pci_msix_can_alloc_dyn(struct pci_dev *dev); >>>> struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, >>>> const struct irq_affinity_desc *affdesc); >>>> @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, >>>> flags, NULL); >>>> } >>>> >>>> +static inline int >>>> +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, >>>> + unsigned int max_vecs, unsigned int flags, >>>> + struct irq_affinity *aff_desc) >>>> +{ >>>> + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, >>>> + flags, aff_desc); >>>> +} >>>> +static inline int >>>> +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, >>>> + unsigned int max_vecs, unsigned int flags) >>>> +{ >>>> + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, >>>> + flags, NULL); >>>> +} >>>> + >>>> static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev) >>>> { return 
false; } >>>> static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, >>> >>> > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [RESEND PATCH v3 2/3] PCI: switchtec: Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() 2026-04-22 2:38 [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 1/3] PCI/MSI: " Shawn Lin @ 2026-04-22 2:38 ` Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 3/3] PCI: vmd: " Shawn Lin 2026-04-22 7:11 ` [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Philipp Stanner 3 siblings, 0 replies; 9+ messages in thread From: Shawn Lin @ 2026-04-22 2:38 UTC (permalink / raw) To: Bjorn Helgaas Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe, Philipp Stanner, linux-pci, Shawn Lin Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() to explicitly request devres-managed interrupt vectors. This makes the driver's intention clear and avoids the ambiguous implicit management previously provided by pcim_enable_device(). The change prepares the driver for the eventual removal of the hybrid IRQ management pattern from pcim_enable_device(), ensuring consistent resource management through devres. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> --- Changes in v3: None Changes in v2: None drivers/pci/switch/switchtec.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c index 93ebec9..ed42661 100644 --- a/drivers/pci/switch/switchtec.c +++ b/drivers/pci/switch/switchtec.c @@ -1493,9 +1493,9 @@ static int switchtec_init_isr(struct switchtec_dev *stdev) if (nirqs < 4) nirqs = 4; - nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, nirqs, - PCI_IRQ_MSIX | PCI_IRQ_MSI | - PCI_IRQ_VIRTUAL); + nvecs = pcim_alloc_irq_vectors(stdev->pdev, 1, nirqs, + PCI_IRQ_MSIX | PCI_IRQ_MSI | + PCI_IRQ_VIRTUAL); if (nvecs < 0) return nvecs; -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RESEND PATCH v3 3/3] PCI: vmd: Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() 2026-04-22 2:38 [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 1/3] PCI/MSI: " Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 2/3] PCI: switchtec: Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() Shawn Lin @ 2026-04-22 2:38 ` Shawn Lin 2026-04-22 7:11 ` [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Philipp Stanner 3 siblings, 0 replies; 9+ messages in thread From: Shawn Lin @ 2026-04-22 2:38 UTC (permalink / raw) To: Bjorn Helgaas Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe, Philipp Stanner, linux-pci, Shawn Lin Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() to explicitly request devres-managed interrupt vectors. This makes the driver's intention clear and avoids the ambiguous implicit management previously provided by pcim_enable_device(). The change prepares the driver for the eventual removal of the hybrid IRQ management pattern from pcim_enable_device(), ensuring consistent resource management through devres. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> --- Changes in v3: None Changes in v2: None drivers/pci/controller/vmd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index d4ae250..d8de63a 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -684,8 +684,8 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd) if (vmd->msix_count < 0) return -ENODEV; - vmd->msix_count = pci_alloc_irq_vectors(dev, vmd->first_vec + 1, - vmd->msix_count, PCI_IRQ_MSIX); + vmd->msix_count = pcim_alloc_irq_vectors(dev, vmd->first_vec + 1, + vmd->msix_count, PCI_IRQ_MSIX); if (vmd->msix_count < 0) return vmd->msix_count; -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation 2026-04-22 2:38 [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Shawn Lin ` (2 preceding siblings ...) 2026-04-22 2:38 ` [RESEND PATCH v3 3/3] PCI: vmd: " Shawn Lin @ 2026-04-22 7:11 ` Philipp Stanner 3 siblings, 0 replies; 9+ messages in thread From: Philipp Stanner @ 2026-04-22 7:11 UTC (permalink / raw) To: Shawn Lin, Bjorn Helgaas Cc: Nirmal Patel, Jonathan Derrick, Kurt Schwemmer, Logan Gunthorpe, Philipp Stanner, linux-pci Hi, I did receive your first send-out of this v3. In case of doubt it's possible to check on lore.kernel.org whether it was actually sent out. On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote: > > There is a long-standing design issue in the PCI/MSI subsystem where the > implicit, automatic management of IRQ vectors by the devres framework > conflicts with explicit driver cleanup, creating ambiguity and potential > resource management bugs. > > Historically, pcim_enable_device() not only manages standard PCI resources > (BARs) via devres > JFYI, it doesn't do that anymore, since I fully removed that. What is left though is the weird "plural functions", pcim_iomap_regions(). I've done some work to also remove them, but it's a lot of work, and especially the ATA subsystem is relatively difficult to port. > but also implicitly triggers automatic IRQ vector management > when calling pci_alloc_irq_vectors[_affinity], because pcim_enable_device() > sets is_managed flag, thus pcim_msi_release() will register a cleanup action. > > This creates an ambiguous ownership model. Many drivers follow a pattern of: > 1. Calling pci_alloc_irq_vectors() to allocate interrupts. > 2. Also calling pci_free_irq_vectors() in their error paths or remove routines. > > When such a driver also uses pcim_enable_device(), the devres framework may > attempt to free the IRQ vectors a second time upon device release, leading to > a double-free. 
Analysis of the tree shows this hazardous pattern exists widely, > while 35 other drivers correctly rely solely on the implicit cleanup. Also, just as an idea: for your removal strategy it's probably a good idea to carry a list of the problematic drivers along, if that's not too much work. Once created, you could continuously update it on your machine, to keep track and later to provide evidence when you finally remove the API. That "35 other drivers" do it *correctly* isn't really that useful to know; what's important is to know who might explode. I did it this way: https://lore.kernel.org/linux-pci/20250519112959.25487-2-phasta@kernel.org/ That then helps the maintainer to fearlessly merge your final removal, building trust that the drivers are safe. > > This series introduces new managed APIs: pcim_alloc_irq_vectors() and > pcim_alloc_irq_vectors_affinity(). Drivers that wish to have devres-managed IRQ > vectors should use these functions. They are currently the same as the non-devres > managed versions. In the short term, the series converts two drivers within the > PCI subsystem to use the new APIs. The long-term goal is to convert all other > drivers which wish to use these managed functions, and finally to remove the > problematic hybrid management pattern from pcim_enable_device() and > pcim_setup_msi_release() entirely. 10/10 P.
> > > Changes in v3: > - Rework the commit message and function doc (Philipp) > - Remove setting is_msi_managed flag from new APIs (Philipp) > > Changes in v2: > - Rebase > - Introduce patches only for PCI subsystem to convert the API > > Shawn Lin (3): > PCI/MSI: Add Devres managed IRQ vectors allocation > PCI: switchtec: Replace pci_alloc_irq_vectors() with > pcim_alloc_irq_vectors() > PCI: vmd: Replace pci_alloc_irq_vectors() with > pcim_alloc_irq_vectors() > > drivers/pci/controller/vmd.c | 4 ++-- > drivers/pci/msi/api.c | 47 ++++++++++++++++++++++++++++++++++++++++++ > drivers/pci/switch/switchtec.c | 6 +++--- > include/linux/pci.h | 22 ++++++++++++++++++++ > 4 files changed, 74 insertions(+), 5 deletions(-) > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-24 7:23 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-22 2:38 [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 1/3] PCI/MSI: " Shawn Lin 2026-04-22 7:32 ` Philipp Stanner 2026-04-22 8:36 ` Shawn Lin 2026-04-22 13:07 ` Philipp Stanner 2026-04-24 7:23 ` Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 2/3] PCI: switchtec: Replace pci_alloc_irq_vectors() with pcim_alloc_irq_vectors() Shawn Lin 2026-04-22 2:38 ` [RESEND PATCH v3 3/3] PCI: vmd: " Shawn Lin 2026-04-22 7:11 ` [RESEND PATCH v3 0/3] Add Devres managed IRQ vectors allocation Philipp Stanner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox