From: Bjorn Helgaas <helgaas@kernel.org>
To: Lukas Wunner <lukas@wunner.de>
Cc: Anatoli Antonovitch <anatoli.antonovitch@amd.com>,
Keith Busch <kbusch@kernel.org>,
Alex Deucher <alexander.deucher@amd.com>,
Christian Koenig <christian.koenig@amd.com>,
Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] PCI: hotplug: Allow marking devices as disconnected during bind/unbind
Date: Wed, 15 Feb 2023 15:13:54 -0600 [thread overview]
Message-ID: <20230215211354.GA3216458@bhelgaas> (raw)
In-Reply-To: <3dc88ea82bdc0e37d9000e413d5ebce481cbd629.1674205689.git.lukas@wunner.de>
On Fri, Jan 20, 2023 at 10:19:02AM +0100, Lukas Wunner wrote:
> On surprise removal, pciehp_unconfigure_device() and acpiphp's
> trim_stale_devices() call pci_dev_set_disconnected() to mark removed
> devices as permanently offline. Thereby, the PCI core and drivers know
> to skip device accesses.
>
> However pci_dev_set_disconnected() takes the device_lock and thus waits
> for a concurrent driver bind or unbind to complete. As a result, the
> driver's ->probe and ->remove hooks have no chance to learn that the
> device is gone.
>
> That doesn't make any sense, so drop the device_lock and instead use
> atomic xchg() and cmpxchg() operations to update the device state.
>
> As a byproduct, an AB-BA deadlock reported by Anatoli is fixed which
> occurs on surprise removal with AER concurrently performing a bus reset.
>
> AER bus reset:
>
> INFO: task irq/26-aerdrv:95 blocked for more than 120 seconds.
> Tainted: G W 6.2.0-rc3-custom-norework-jan11+
> schedule
> rwsem_down_write_slowpath
> down_write_nested
> pciehp_reset_slot # acquires reset_lock
> pci_reset_hotplug_slot
> pci_slot_reset # acquires device_lock
> pci_bus_error_reset
> aer_root_reset
> pcie_do_recovery
> aer_process_err_devices
> aer_isr
>
> pciehp surprise removal:
>
> INFO: task irq/26-pciehp:96 blocked for more than 120 seconds.
> Tainted: G W 6.2.0-rc3-custom-norework-jan11+
> schedule_preempt_disabled
> __mutex_lock
> mutex_lock_nested
> pci_dev_set_disconnected # acquires device_lock
> pci_walk_bus
> pciehp_unconfigure_device
> pciehp_disable_slot
> pciehp_handle_presence_or_link_change
> pciehp_ist # acquires reset_lock
>
> Fixes: a6bd101b8f84 ("PCI: Unify device inaccessible")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
> Reported-by: Anatoli Antonovitch <anatoli.antonovitch@amd.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v4.20+
> Cc: Keith Busch <kbusch@kernel.org>
Applied to pci/hotplug for v6.3, thanks!
The xchg()/cmpxchg() is a little subtle just because we don't see it
every day, but a really nice simplification and explanation of the
state change in addition to the locking improvement. Thank you!
> ---
> drivers/pci/pci.h | 43 +++++++++++++------------------------------
> 1 file changed, 13 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 9ed3b55..5d5a44a 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -310,53 +310,36 @@ struct pci_sriov {
> * @dev: PCI device to set new error_state
> * @new: the state we want dev to be in
> *
> - * Must be called with device_lock held.
> + * If the device is experiencing perm_failure, it has to remain in that state.
> + * Any other transition is allowed.
> *
> * Returns true if state has been changed to the requested state.
> */
> static inline bool pci_dev_set_io_state(struct pci_dev *dev,
> pci_channel_state_t new)
> {
> - bool changed = false;
> + pci_channel_state_t old;
>
> - device_lock_assert(&dev->dev);
> switch (new) {
> case pci_channel_io_perm_failure:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - case pci_channel_io_perm_failure:
> - changed = true;
> - break;
> - }
> - break;
> + xchg(&dev->error_state, pci_channel_io_perm_failure);
> + return true;
> case pci_channel_io_frozen:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> + old = cmpxchg(&dev->error_state, pci_channel_io_normal,
> + pci_channel_io_frozen);
> + return old != pci_channel_io_perm_failure;
> case pci_channel_io_normal:
> - switch (dev->error_state) {
> - case pci_channel_io_frozen:
> - case pci_channel_io_normal:
> - changed = true;
> - break;
> - }
> - break;
> + old = cmpxchg(&dev->error_state, pci_channel_io_frozen,
> + pci_channel_io_normal);
> + return old != pci_channel_io_perm_failure;
> + default:
> + return false;
> }
> - if (changed)
> - dev->error_state = new;
> - return changed;
> }
>
> static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
> {
> - device_lock(&dev->dev);
> pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
> - device_unlock(&dev->dev);
>
> return 0;
> }
> --
> 2.39.1
>
prev parent reply other threads:[~2023-02-15 21:14 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 9:19 [PATCH] PCI: hotplug: Allow marking devices as disconnected during bind/unbind Lukas Wunner
2023-01-20 9:41 ` Christian König
2023-01-20 17:52 ` Lukas Wunner
2023-02-15 21:13 ` Bjorn Helgaas [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230215211354.GA3216458@bhelgaas \
--to=helgaas@kernel.org \
--cc=alexander.deucher@amd.com \
--cc=anatoli.antonovitch@amd.com \
--cc=andrey.grodzovsky@amd.com \
--cc=christian.koenig@amd.com \
--cc=kbusch@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).