Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Philipp Stanner <pstanner@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI: Fix devres regression in pci_intx()
Date: Fri, 06 Sep 2024 08:45:29 +0200	[thread overview]
Message-ID: <0726caa78b940c2f097b094c732a2c008ab0cbfe.camel@redhat.com> (raw)
In-Reply-To: <2072aac8-cdab-40e3-806c-399d38e683f9@kernel.org>

On Fri, 2024-09-06 at 09:37 +0900, Damien Le Moal wrote:
> On 9/5/24 16:13, Philipp Stanner wrote:
> > On Thu, 2024-09-05 at 09:33 +0900, Damien Le Moal wrote:
> > > On 2024/09/05 6:10, Alex Williamson wrote:
> > > > On Wed, 4 Sep 2024 23:24:53 +0300
> > > > Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
> > > > 
> > > > > Wed, Sep 04, 2024 at 12:07:21PM -0600, Alex Williamson
> > > > > kirjoitti:
> > > > > > On Wed, 04 Sep 2024 15:37:25 +0200
> > > > > > Philipp Stanner <pstanner@redhat.com> wrote:  
> > > > > > > On Wed, 2024-09-04 at 17:25 +0900, Damien Le Moal wrote: 
> > > > > 
> > > > > ...
> > > > > 
> > > > > > > If vfio-pci can get rid of pci_intx() alltogether, that
> > > > > > > might
> > > > > > > be a good
> > > > > > > thing. As far as I understood Andy Shevchenko, pci_intx()
> > > > > > > is
> > > > > > > outdated.
> > > > > > > There's only a hand full of users anyways.  
> > > > > > 
> > > > > > What's the alternative?  
> > > > > 
> > > > > From API perspective the pci_alloc_irq_vectors() & Co should
> > > > > be
> > > > > used.
> > > > 
> > > > We can't replace a device level INTx control with a vector
> > > > allocation
> > > > function.
> > > >  
> > > > > > vfio-pci has a potentially unique requirement
> > > > > > here, we don't know how to handle the device interrupt, we
> > > > > > only
> > > > > > forward
> > > > > > it to the userspace driver.  As a level triggered
> > > > > > interrupt,
> > > > > > INTx will
> > > > > > continue to assert until that userspace driver handles the
> > > > > > device.
> > > > > > That's obviously unacceptable from a host perspective, so
> > > > > > INTx
> > > > > > is
> > > > > > masked at the device via pci_intx() where available, or at
> > > > > > the
> > > > > > interrupt controller otherwise.  The API with the userspace
> > > > > > driver
> > > > > > requires that driver to unmask the interrupt, again
> > > > > > resulting
> > > > > > in a call
> > > > > > to pci_intx() or unmasking the interrupt controller, in
> > > > > > order
> > > > > > to receive
> > > > > > further interrupts from the device.  Thanks,  
> > > > > 
> > > > > I briefly read the discussion and if I understand it
> > > > > correctly
> > > > > the problem here
> > > > > is in the flow: when the above mentioned API is being called.
> > > > > Hence it's design
> > > > > (or architectural) level of issue and changing call from
> > > > > foo() to
> > > > > bar() won't
> > > > > magically make problem go away. But I might be mistaken.
> > > > 
> > > > Certainly from a vector allocation standpoint we can change to
> > > > whatever
> > > > is preferred, but the direct INTx manipulation functions are a
> > > > different thing entirely and afaik there's nothing else that
> > > > can
> > > > replace them at a low level, nor can we just get rid of our
> > > > calls
> > > > to
> > > > pci_intx().  Thanks,
> > > 
> > > But can these calls be moved out of the spinlock context ? If
> > > not,
> > > then we need
> > > to clarify that pci_intx() can be called from any context, which
> > > will
> > > require
> > > changing to a GFP_ATOMIC for the resource allocation, even if the
> > > use
> > > case
> > > cannot trigger the allocation. This is needed to ensure the
> > > correctness of the
> > > pci_intx() function use.
> > 
> > We could do that I guess. As I keep saying, it's not intended to
> > have
> > pci_intx() allocate _permanently_. This is a temporary situation.
> > pci_intx() should have neither devres nor allocation.
> > 
> > > Frankly, I am surprised that the might sleep splat you
> > > got was not already reported before (fuzzying, static analyzers
> > > might
> > > eventually
> > > catch that though).
> > 
> > It's a super rare situation:
> >  * pci_intx() has very few callers
> >  * It only allocates if pcim_enable_device() instead of
> >    pci_enable_device() ran.
> >  * It only allocates when it's called for the FIRST TIME
> >  * All of the above is only a problem while you hold a lock
> > 
> > > 
> > > The other solution would be a version of pci_intx() that has a
> > > gfp
> > > flags
> > > argument to allow callers to use the right gfp flags for the call
> > > context.
> > 
> > I don't think that's a good idea. As I said, I want to clean up all
> > that in the mid term.
> > 
> > As a matter of fact, there is already __pcim_intx() in pci/devres.c
> > as
> > a pure unmanaged pci_intx() as a means to split and then cleanup
> > the
> > APIs.
> 
> Yeah. That naming is in fact confusing. __pcim_intx() should really
> be named
> pci_intx()...
> 
> > One path towards getting the hybrid behavior out of pci_intx()
> > could be
> > to rename __pcim_intx() to pci_intx_unmanaged() and port everyone
> > who
> > uses pci_enable_device() + pci_intx() to that version. That would
> > be
> > better than to have a third version with a gfp_t argument.
> 
> Sounds good. But ideally, all users that rely on the managed variant
> should be
> converted to use pcim_intx() and pci_intx() changed to not call in
> devres. But
> that may be too much code churn (I have not checked).

Exactly that is the plan.

I looked into that yesterday and my idea is to publish __pcim_intx()
under the name pci_intx_unmanaged(), then port everyone who doesn't
rely on managed behavior to that function and then port everyone who
does rely on it to pcim_intx() (as you suggest).
Afterwards you can remove pci_intx_unmanaged() again and also remove
the hybrid behavior from pci_intx().

It's doable on not that much code. Getting it merged might be
politically difficult, though.

The only thing I'm really a bit afraid of is the pci_intx() user in
pci/msi/ – MSI calls that through yet another implicit devres call,
pcim_msi_release().
I *suspect* that pci_intx() in MSI can just be made
pci_intx_unmanaged(), but that should be checked carefully. The doubly
implicit devres magic caused some trouble in the past already, as we
had discussed here [1].


P.

[1] https://lore.kernel.org/all/ee44ea7ac760e73edad3f20b30b4d2fff66c1a85.camel@redhat.com/


  reply	other threads:[~2024-09-06  6:45 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25 12:07 [PATCH] PCI: Fix devres regression in pci_intx() Philipp Stanner
2024-07-25 14:26 ` Christoph Hellwig
2024-07-25 15:21   ` Philipp Stanner
2024-07-25 15:47     ` Christoph Hellwig
2024-07-25 21:00 ` Bjorn Helgaas
2024-07-26  0:19 ` Damien Le Moal
2024-07-26 18:43   ` pstanner
2024-07-26 18:59     ` Bjorn Helgaas
2024-07-29 11:29     ` Damien Le Moal
2024-07-29 15:45       ` Philipp Stanner
2024-09-03 15:44 ` Alex Williamson
2024-09-04  7:06   ` Philipp Stanner
2024-09-04  8:25     ` Damien Le Moal
2024-09-04 13:37       ` Philipp Stanner
2024-09-04 18:07         ` Alex Williamson
2024-09-04 20:24           ` Andy Shevchenko
2024-09-04 21:10             ` Alex Williamson
2024-09-05  0:33               ` Damien Le Moal
2024-09-05  1:56                 ` Alex Williamson
2024-09-05  7:13                 ` Philipp Stanner
2024-09-06  0:37                   ` Damien Le Moal
2024-09-06  6:45                     ` Philipp Stanner [this message]
2024-09-04 12:57     ` Alex Williamson
2024-09-04 13:29       ` Philipp Stanner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0726caa78b940c2f097b094c732a2c008ab0cbfe.camel@redhat.com \
    --to=pstanner@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=andy.shevchenko@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=dlemoal@kernel.org \
    --cc=kwilczynski@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox