From: Philipp Stanner <phasta@mailbox.org>
To: Shawn Lin <shawn.lin@rock-chips.com>,
phasta@kernel.org, Bjorn Helgaas <bhelgaas@google.com>
Cc: Nirmal Patel <nirmal.patel@linux.intel.com>,
Jonathan Derrick <jonathan.derrick@linux.dev>,
Kurt Schwemmer <kurt.schwemmer@microsemi.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-pci@vger.kernel.org
Subject: Re: [RESEND PATCH v3 1/3] PCI/MSI: Add Devres managed IRQ vectors allocation
Date: Wed, 22 Apr 2026 15:07:32 +0200
Message-ID: <42e31ebc86a34001e98aecc7197e5e5a8fd5bb03.camel@mailbox.org>
In-Reply-To: <f912358d-925b-1e25-2852-1f265d09032a@rock-chips.com>
On Wed, 2026-04-22 at 16:36 +0800, Shawn Lin wrote:
> Hi Philipp
>
> > On Wednesday, 2026/04/22 15:32, Philipp Stanner wrote:
> > On Wed, 2026-04-22 at 10:38 +0800, Shawn Lin wrote:
> > > The PCI/MSI subsystem suffers from a long-standing design issue where
> > > the implicit, automatic management of IRQ vectors by the devres
> >
> > nit:
> > The *automatic* handling is not actually the issue. All devres
> > interfaces (pcim_) are automatic by definition.
> >
> > The issue is the "sometimes managed, sometimes not" nature of some pci_
> > interfaces. I came to call those "hybrid functions".
> >
> > > framework conflicts with explicit driver cleanup, leading to ambiguous
> > > ownership and potential resource management bugs.
> > >
> > > Historically, pcim_enable_device() not only manages standard PCI
> > > resources (BARs) via devres
> > >
> >
> > See cover-letter. As I said, that's not the case anymore. It was the
> > case once.
> >
> > I think you could just briefly state "Historically, some functions
> > exhibited this hybrid behavior, the last of which by now are …"
> >
> > If you want, you could maybe also Link: one of my patch series for
> > the wider history.
> >
> > > but also implicitly sets the is_msi_managed
> > > flag if calling pci_alloc_irq_vectors*(). This registers pcim_msi_release()
> > > as a cleanup action, creating a hybrid ownership model.
> >
> > nit:
> > s/ownership/responsibility
> >
> > >
> > > This ambiguity causes problems when drivers follow a common but
> > > hazardous pattern:
> > > 1. Using pcim_enable_device() (which implicitly marks IRQs as managed)
> > > 2. Explicitly calling pci_alloc_irq_vectors() for IRQ allocation
> > > 3. Also calling pci_free_irq_vectors() in error paths or remove routines
> > >
> > > In this scenario, the devres framework may attempt to free the IRQ
> > > vectors a second time upon device release, leading to double-free
> > > issues. Analysis of the tree shows this hazardous pattern exists
> > > in multiple drivers, while 35 other drivers correctly rely solely
> > > on the implicit cleanup.
> >
> > See cover-letter.
> >
> > I would leave that to Bjorn's preference, but I think that info is
> > useful for the cover letter so that reviewers get what's going on, but
> > it's not useful in the commit message. Your description of the
> > problematic API is justification enough IMO.
> >
> > >
> > > To resolve this ambiguity, introduce explicit managed APIs for IRQ
> > > vector allocation:
> > > - pcim_alloc_irq_vectors()
> > > - pcim_alloc_irq_vectors_affinity()
> > >
> > > These functions are the devres-managed counterparts to
> > > pci_alloc_irq_vectors[_affinity](). Drivers that wish to have
> > > devres-managed IRQ vectors should use these new APIs instead.
> > >
> > > The long-term goal is to convert all drivers which use pcim_enable_device()
> > > as well as pci_alloc_irq_vectors[_affinity]() to use these managed
> >
> > nit:
> > s/as well as/in combination with
> >
> > > functions, and eventually remove the problematic hybrid management
> > > pattern from pcim_enable_device() and pcim_setup_msi_release() entirely.
> > > This patch lays the foundation by introducing the APIs.
> >
> > But in general reads very nice and is mostly spot-on :)
> >
> > >
> > > Suggested-by: Philipp Stanner <phasta@kernel.org>
> > > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> > >
> > > ---
> > >
> > > Changes in v3:
> > > - Rework the commit message and function doc (Philipp)
> > > - Remove setting is_msi_managed flag from new APIs (Philipp)
> > >
> > > Changes in v2:
> > > - Rebase
> > > - Introduce patches only for PCI subsystem to convert the API
> > >
> > > drivers/pci/msi/api.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/pci.h | 22 ++++++++++++++++++++++
> > > 2 files changed, 69 insertions(+)
> > >
> > > diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
> > > index c18559b..90b67f5 100644
> > > --- a/drivers/pci/msi/api.c
> > > +++ b/drivers/pci/msi/api.c
> > > @@ -297,6 +297,53 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > > EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
> > >
> > > /**
> > > + * pcim_alloc_irq_vectors - devres managed pci_alloc_irq_vectors()
> > > + * @dev: the PCI device to operate on
> > > + * @min_vecs: minimum number of vectors required (must be >= 1)
> > > + * @max_vecs: maximum (desired) number of vectors
> > > + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> > > + *
> > > + * This is a device resource managed version of pci_alloc_irq_vectors().
> > > + * Interrupt vectors are automatically freed on driver detach by the
> > > + * devres. Drivers do not need to explicitly call pci_free_irq_vectors().
> >
> > s/the devres/devres
> >
> > s/do not need/must not <- You stated doing so would be a bug, right?
> >
> > > + *
> > > + * Returns number of vectors allocated (which might be smaller than
> > > + * @max_vecs) on success, or a negative error code on failure.
> > > + */
> > > +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags)
> > > +{
> > > + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> > > + flags, NULL);
> > > +}
> > > +EXPORT_SYMBOL(pcim_alloc_irq_vectors);
> > > +
> > > +/**
> > > + * pcim_alloc_irq_vectors_affinity - devres managed pci_alloc_irq_vectors_affinity()
> > > + * @dev: the PCI device to operate on
> > > + * @min_vecs: minimum number of vectors required (must be >= 1)
> > > + * @max_vecs: maximum (desired) number of vectors
> > > + * @flags: flags for this allocation, see pci_alloc_irq_vectors()
> > > + * @affd: optional description of the affinity requirements
> > > + *
> > > + * This is a device resource managed version of pci_alloc_irq_vectors_affinity().
> > > + * Interrupt vectors are automatically freed on driver detach by the devres
> > > + * machinery. Drivers which use pcim_enable_device() as well as this function,
> > > + * do not need to explicitly call pci_free_irq_vectors() currently.
> > > + *
> > > + * Returns number of vectors allocated (which might be smaller than
> > > + * @max_vecs) on success, or a negative error code on failure.
> > > + */
> > > +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags,
> > > + struct irq_affinity *affd)
> > > +{
> > > + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> > > + flags, affd);
> >
> > Hmmm, OK – I am not thaaaat deep in the topic anymore, so correct me if
> > I am wrong. But to me it seems that we might have a problem.
> >
> > Your intention seems to be to rely on hybrid devres here, isn't it?
> >
>
> Yes, it is.
>
> > Because pci_alloc_irq_vectors_affinity() runs into
> > __pci_enable_msi(x)_range(), which runs into pci_setup_msi_context(),
> > which sets up the devres callbacks.
> >
> > And this would now mean that your new function here *also* is hybrid,
> > because it relies on pcim_enable_device() setting the flag, right?
> >
> > IOW, when someone called pci_enable_device(), your pcim_ function here
> > would NOT be a managed devres function. Right?
> >
> > That would obviously not fly. The current (and IMO) desirable design
> > goal for devres PCI is that pcim_ functions are ALWAYS managed (hence
> > the 'm' = managed, and pci_ functions are NEVER managed).
> >
> > I am sorry if / that I overlooked that while first glimpsing at your
> > previous versions.
> >
>
> Don't worry.
>
> >
> > IIRC for this reason I had to provide a completely unmanaged backend,
> > and to do so I actually had to *copy* existing functions from pci.c, so
> > for many months PCI contained two versions of the same functions.
> >
>
> I was trying to avoid copying the existing functions.
>
> There are some combinations here:
> 1. pcim_enable_device() + pci_alloc -> correct
> 2. pcim_enable_device() + pci_alloc + pci_free -> bug
> 3. pci_enable_device() + pci_alloc + pci_free -> correct
>
> To solve the 2nd case, I need to remove the hybrid mode from
> pcim_enable_device(). So what I had in mind is:
>
> 1. keep pcim_alloc_irq_vectors() rely on pcim_enable_device(), still
> the hybrid mode as you pointed out.
> 2. convert pcim_enable_device() + pci_alloc_irq_vectors() users to
> pcim_enable_device() + pcim_alloc_irq_vectors(), which is the first
> above case. Meanwhile add documentation or code check to prevent new
> pci_enable_device() + pcim_alloc_irq_vectors() user from using it.
>
> 3. after conversions, we still have three combinations:
> - pcim_enable_device() + pcim_alloc() -> correct
> - pcim_enable_device() + pci_alloc + pci_free -> bug
> - pci_enable_device() + pci_alloc + pci_free -> correct
>
> 4. Then remove pcim_setup_msi_release() from pci_setup_msi_context().
> And add pcim_setup_msi_release() to my new APIs. This way,
> - pcim_enable_device() + pcim_alloc() -> correct
> - pcim_enable_device() + pci_alloc + pci_free -> correct
> - pci_enable_device() + pci_alloc + pci_free -> correct
>
> 5. remove the limitation of using pci_enable_device() + pcim_alloc, as
> mentioned in the 2nd above step. so we could allow this 4th pattern.
> Finally all things work fine.
>
> Please let me know if this approach works for you.
The biggest issue I see is that you add another hybrid function, even
if it's temporary. pcim_alloc_irq_vectors() will sometimes be managed
and sometimes not – until you have removed all users of the
aforementioned functions, and finally performed step №4 from above.
Note that even in the past, with the oldest, most broken version of the
PCI devres API, pcim_ functions were *always* managed, completely
independently from pci(m)_enable_device().
What pcim_enable_device() did was implicitly switch some pci_
functions, such as pci_request_regions(), into managed mode.
The solution you present / intend would rely on:
1. New users of pcim_ being aware (through documentation for
example) that this one pcim_ function is special and not always
managed (for now).
2. New users being nice and really using pcim_enable_device().
3. You being able to fully see it all through to the end, i.e., you
not being switched to a different project, switching jobs and
opening a bar in Hawaii etc. etc.
My experience has, unfortunately, been that people often don't read
documentation, especially if they are already used to interfaces like
that working a certain way.
How many drivers are there to be ported; do you have a list and maybe a
time horizon? It took me almost a year to remove my helper functions again.
That's not a hard "NACK" from my side, it's just I wouldn't do it like
that. I suppose it is for the maintainers to pick one of the two
solution pathways:
a) have another hybrid function temporarily or
b) have duplicated code temporarily
(where "temporarily" means until the contributor finishes the job :)
BTW, please keep in mind that in my opinion, once PCI/MSI is solved, I
believe that pcim_enable_device() should do nothing anymore but disable
the device on driver detach, without any other side effects.
Regards
P.
>
>
> > The reason was that the hybrid behavior was buried so deep in the
> > subsystem.
> >
> > See here:
> >
> > Here I had to reimplement some functions to be used by my new pcim_
> > functions:
> > https://lore.kernel.org/linux-pci/20240610093149.20640-4-pstanner@redhat.com/
> >
> >
> > And here I could finally remove them later:
> > https://lore.kernel.org/linux-pci/20250519112959.25487-7-phasta@kernel.org/
> >
> >
> > I hope this helps. And again sorry for potentially causing you
> > unnecessary work :(
> >
> >
> > Regards
> > Philipp
> >
> >
> > > +}
> > > +EXPORT_SYMBOL(pcim_alloc_irq_vectors_affinity);
> > > +
> > > +/**
> > > * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
> > > * @dev: the PCI device to operate on
> > > * @nr: device-relative interrupt vector index (0-based); has different
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 2c44545..3716c67 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -1770,6 +1770,12 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > > unsigned int max_vecs, unsigned int flags,
> > > struct irq_affinity *affd);
> > >
> > > +int pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags);
> > > +int pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags,
> > > + struct irq_affinity *affd);
> > > +
> > > bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
> > > struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
> > > const struct irq_affinity_desc *affdesc);
> > > @@ -1812,6 +1818,22 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> > > flags, NULL);
> > > }
> > >
> > > +static inline int
> > > +pcim_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags,
> > > + struct irq_affinity *aff_desc)
> > > +{
> > > + return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> > > + flags, aff_desc);
> > > +}
> > > +static inline int
> > > +pcim_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> > > + unsigned int max_vecs, unsigned int flags)
> > > +{
> > > + return pcim_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
> > > + flags, NULL);
> > > +}
> > > +
> > > static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
> > > { return false; }
> > > static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
> >
> >