All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: "Christian König" <christian.koenig@amd.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Anatoli Antonovitch <anatoli.antonovitch@amd.com>,
	Keith Busch <kbusch@kernel.org>,
	Alex Deucher <alexander.deucher@amd.com>,
	Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: hotplug: Allow marking devices as disconnected during bind/unbind
Date: Fri, 20 Jan 2023 18:52:10 +0100	[thread overview]
Message-ID: <20230120175210.GA18331@wunner.de> (raw)
In-Reply-To: <4b6fc160-5380-d451-6aca-f3a9d636736f@amd.com>

On Fri, Jan 20, 2023 at 10:41:13AM +0100, Christian König wrote:
> Am 20.01.23 um 10:19 schrieb Lukas Wunner:
> > On surprise removal, pciehp_unconfigure_device() and acpiphp's
> > trim_stale_devices() call pci_dev_set_disconnected() to mark removed
> > devices as permanently offline.  Thereby, the PCI core and drivers know
> > to skip device accesses.
> > 
> > However pci_dev_set_disconnected() takes the device_lock and thus waits
> > for a concurrent driver bind or unbind to complete.  As a result, the
> > driver's ->probe and ->remove hooks have no chance to learn that the
> > device is gone.
> 
> Who is reading dev->error_state in this situation and especially do we have
> the necessary read barrier in place?
> 
> Alternatively if this is just opportunistically we should document that
> somehow.

dev->error_state is read by pci_dev_is_disconnected() and by various
drivers.  None of them has a specific read barrier AFAICS or is using
the device_lock to protect read access.

For pci_dev_is_disconnected(), the read is indeed opportunistic.
It's an optimization that prevents the kernel from performing
a lot of non-posted requests to a removed device and waiting
for them to time out.  Or worse, having them be received by a
replacement device.


> > That doesn't make any sense, so drop the device_lock and instead use
> > atomic xchg() and cmpxchg() operations to update the device state.
> 
> You use xchg() instead of WRITE_ONCE() for the memory barrier here?

xchg() implies a full memory barrier to ensure that the new state is
immediately visible to all CPUs.  WRITE_ONCE() would be weaker,
it's just a compiler barrier.

The desire to immediately make the state change visible is why we
absolutely do not want locking here.


> > As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which
> > occurs on surprise removal with AER concurrently performing a bus reset.
> 
> Well this byproduct is probably the main fix in this patch. I'm wondering
> why lockdep didn't complained about that more drastically in our testing.

Well I guess I could rephrase the commit message if you feel I'm burying
the lede.

Thanks,

Lukas

  reply	other threads:[~2023-01-20 17:53 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  9:19 [PATCH] PCI: hotplug: Allow marking devices as disconnected during bind/unbind Lukas Wunner
2023-01-20  9:41 ` Christian König
2023-01-20 17:52   ` Lukas Wunner [this message]
2023-02-15 21:13 ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230120175210.GA18331@wunner.de \
    --to=lukas@wunner.de \
    --cc=alexander.deucher@amd.com \
    --cc=anatoli.antonovitch@amd.com \
    --cc=andrey.grodzovsky@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=helgaas@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.