From: Ian May <ian.may@canonical.com>
To: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>, Lukas Wunner <lukas@wunner.de>
Subject: Re: [PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
Date: Tue, 8 Sep 2020 11:33:28 -0500 [thread overview]
Message-ID: <20200908163328.GA2703021@uws> (raw)
In-Reply-To: <908047f7699d9de9ec2efd6b79aa752d73dab4b6.1595329748.git.lukas@wunner.de>
Hi All,
Just wanted to touch base on the status of this patch. It resolves a deadlock I am seeing between AER and Hotplug when a controller is reset. I have a machine that can consistently recreate this deadlock, so if there is any further investigation needed I'd be willing to help in any way I can.
Thanks,
Ian
On 2020-07-21 13:17:50 , Lukas Wunner wrote:
> Back in 2013, commits
>
> 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method")
> 608c388122c7 ("PCI: Add slot reset option to pci_dev_reset()")
>
> introduced the callback pciehp_reset_slot() to the PCIe hotplug driver
> and amended __pci_dev_reset() (today __pci_reset_function_locked()) to
> invoke it when resetting a hotplug port's child. The callback performs
> a Secondary Bus Reset and ensures that an ensuing link or presence flap
> is ignored by pciehp.
>
> However the commits did not perform any locking, in particular:
>
> * No precautions were taken to prevent concurrent execution of the new
> callback with pciehp's IRQ handler or a sysfs request to bring the
> slot up or down. These code paths may see flapping link or presence
> bits during a slot reset.
>
> * pciehp is not prevented from unbinding while the new callback accesses
> its struct controller. Commit 608c388122c7 did take a reference on
> pciehp's module, but that's not sufficient. It only keeps pciehp's
> code in memory, but doesn't prevent unbinding.
>
> * In pci_dev_reset_slot_function(), commit 608c388122c7 iterates over
> the devices on a bus without holding pci_bus_sem.
>
> In 2018, commit
>
> 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
>
> sought to address the first of these locking issues: It introduced a
> reset_lock which serializes a slot reset with other parts of pciehp.
>
> But Michael Haeuptle reports that deadlocks now occur on simultaneous
> hot-removal and reset of vfio devices because pciehp acquires the
> reset_lock and the device_lock in a different order than
> pci_try_reset_function():
>
> pciehp_ist() # down_read(reset_lock)
> pciehp_handle_presence_or_link_change()
> pciehp_disable_slot()
> __pciehp_disable_slot()
> remove_board()
> pciehp_unconfigure_device()
> pci_stop_and_remove_bus_device()
> pci_stop_bus_device()
> pci_stop_dev()
> device_release_driver()
> device_release_driver_internal()
> __device_driver_lock() # device_lock()
>
> SYS_munmap()
> vfio_device_fops_release()
> vfio_pci_release()
> vfio_pci_disable()
> pci_try_reset_function() # device_lock()
> __pci_reset_function_locked()
> pci_reset_hotplug_slot()
> pciehp_reset_slot() # down_write(reset_lock)
>
> Ian May reports the same deadlock on simultaneous hot-removal and AER
> reset:
>
> aer_recover_work_func()
> pcie_do_recovery()
> aer_root_reset()
> pci_bus_error_reset()
> pci_slot_reset()
> pci_slot_lock() # device_lock()
> pci_reset_hotplug_slot()
> pciehp_reset_slot() # down_write(reset_lock)
>
> Fix by pushing the reset_lock out of pciehp's struct controller and into
> struct pci_slot such that it can be taken by the PCI core before taking
> the device lock.
>
> There's a catch though: Some drivers call __pci_reset_function_locked()
> directly and the function expects that all necessary locks, including
> the reset_lock, have been acquired by the caller. There are callers
> which already hold the device_lock, so they can't acquire the reset_lock
> without re-introducing the AB-BA deadlock:
>
> * drivers/net/ethernet/cavium/liquidio/lio_main.c: octeon_pci_flr()
> * drivers/xen/xen-pciback/pci_stub.c: pcistub_device_release()
> * drivers/xen/xen-pciback/pci_stub.c: pcistub_init_device() (if called
> from pcistub_seize())
>
> In the case of octeon_pci_flr(), the device is reset on driver unbind,
> which is why the device_lock is already held. A possible solution might
> be to add a flag in struct pci_dev with which drivers tell the PCI core
> that the device is handed back in an unclean state and needs a reset.
> The PCI core would then perform the reset on behalf of the driver after
> it has unbound and before any new driver is bound.
>
> As for xen, this patch (which was never applied) explains that a reset
> is performed on bind, unbind and on un-assigning a device from a guest:
>
> https://lore.kernel.org/patchwork/patch/848180/
>
> The unbind code path could be solved by the same solution as for octeon
> and it may also be possible to make it work on bind, though it's unclear
> why a reset on bind is at all necessary. The un-assigning code path is
> fixed by the present commit I think.
>
> For now, the three functions do not acquire the reset_lock. I'm
> inserting a lockdep_assert_held_write() so that a lockdep splat is shown
> as a reminder that liquidio and xen require fixing.
>
> Fixes: 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
> Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM
> Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@canonical.com
> Reported-and-tested-by: Michael Haeuptle <michael.haeuptle@hpe.com>
> Reported-and-tested-by: Ian May <ian.may@canonical.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v4.19+
> Cc: Alex Williamson <alex.williamson@redhat.com>
> ---
> drivers/pci/hotplug/pciehp.h | 5 -----
> drivers/pci/hotplug/pciehp_core.c | 4 ++--
> drivers/pci/hotplug/pciehp_hpc.c | 12 ++++++------
> drivers/pci/pci.c | 17 +++++++++++++++++
> drivers/pci/slot.c | 2 ++
> drivers/vfio/pci/vfio_pci.c | 19 +++++++++++++------
> drivers/xen/xen-pciback/passthrough.c | 14 ++++++++++++--
> drivers/xen/xen-pciback/pci_stub.c | 6 ++++++
> drivers/xen/xen-pciback/vpci.c | 14 ++++++++++++--
> include/linux/pci.h | 8 +++++++-
> 10 files changed, 77 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index 4fd200d..676e579 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -20,7 +20,6 @@
> #include <linux/pci_hotplug.h>
> #include <linux/delay.h>
> #include <linux/mutex.h>
> -#include <linux/rwsem.h>
> #include <linux/workqueue.h>
>
> #include "../pcie/portdrv.h"
> @@ -69,9 +68,6 @@
> * @button_work: work item to turn the slot on or off after 5 seconds
> * in response to an Attention Button press
> * @hotplug_slot: structure registered with the PCI hotplug core
> - * @reset_lock: prevents access to the Data Link Layer Link Active bit in the
> - * Link Status register and to the Presence Detect State bit in the Slot
> - * Status register during a slot reset which may cause them to flap
> * @ist_running: flag to keep user request waiting while IRQ thread is running
> * @request_result: result of last user request submitted to the IRQ thread
> * @requester: wait queue to wake up on completion of user request,
> @@ -102,7 +98,6 @@ struct controller {
> struct delayed_work button_work;
>
> struct hotplug_slot hotplug_slot; /* hotplug core interface */
> - struct rw_semaphore reset_lock;
> unsigned int ist_running;
> int request_result;
> wait_queue_head_t requester;
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index bf779f2..cdb241b 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -165,7 +165,7 @@ static void pciehp_check_presence(struct controller *ctrl)
> {
> int occupied;
>
> - down_read(&ctrl->reset_lock);
> + down_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
> mutex_lock(&ctrl->state_lock);
>
> occupied = pciehp_card_present_or_link_active(ctrl);
> @@ -176,7 +176,7 @@ static void pciehp_check_presence(struct controller *ctrl)
> pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
>
> mutex_unlock(&ctrl->state_lock);
> - up_read(&ctrl->reset_lock);
> + up_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
> }
>
> static int pciehp_probe(struct pcie_device *dev)
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 53433b3..a1c9072 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -706,13 +706,17 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
> /*
> * Disable requests have higher priority than Presence Detect Changed
> * or Data Link Layer State Changed events.
> + *
> + * A slot reset may cause flaps of the Presence Detect State bit in the
> + * Slot Status register and the Data Link Layer Link Active bit in the
> + * Link Status register. Prevent by holding the reset lock.
> */
> - down_read(&ctrl->reset_lock);
> + down_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
> if (events & DISABLE_SLOT)
> pciehp_handle_disable_request(ctrl);
> else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
> pciehp_handle_presence_or_link_change(ctrl, events);
> - up_read(&ctrl->reset_lock);
> + up_read(&ctrl->hotplug_slot.pci_slot->reset_lock);
>
> ret = IRQ_HANDLED;
> out:
> @@ -841,8 +845,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
> if (probe)
> return 0;
>
> - down_write(&ctrl->reset_lock);
> -
> if (!ATTN_BUTTN(ctrl)) {
> ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
> stat_mask |= PCI_EXP_SLTSTA_PDC;
> @@ -861,7 +863,6 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
> ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
> pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
>
> - up_write(&ctrl->reset_lock);
> return rc;
> }
>
> @@ -925,7 +926,6 @@ struct controller *pcie_init(struct pcie_device *dev)
> ctrl->slot_cap = slot_cap;
> mutex_init(&ctrl->ctrl_lock);
> mutex_init(&ctrl->state_lock);
> - init_rwsem(&ctrl->reset_lock);
> init_waitqueue_head(&ctrl->requester);
> init_waitqueue_head(&ctrl->queue);
> INIT_DELAYED_WORK(&ctrl->button_work, pciehp_queue_pushbutton_work);
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 45c51af..455da72 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4902,6 +4902,8 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, int probe)
> if (!hotplug || !try_module_get(hotplug->owner))
> return rc;
>
> + lockdep_assert_held_write(&hotplug->pci_slot->reset_lock);
> +
> if (hotplug->ops->reset_slot)
> rc = hotplug->ops->reset_slot(hotplug, probe);
>
> @@ -5110,6 +5112,8 @@ int pci_reset_function(struct pci_dev *dev)
> if (!dev->reset_fn)
> return -ENOTTY;
>
> + if (dev->slot)
> + down_write(&dev->slot->reset_lock);
> pci_dev_lock(dev);
> pci_dev_save_and_disable(dev);
>
> @@ -5117,6 +5121,8 @@ int pci_reset_function(struct pci_dev *dev)
>
> pci_dev_restore(dev);
> pci_dev_unlock(dev);
> + if (dev->slot)
> + up_write(&dev->slot->reset_lock);
>
> return rc;
> }
> @@ -5169,6 +5175,9 @@ int pci_try_reset_function(struct pci_dev *dev)
> if (!dev->reset_fn)
> return -ENOTTY;
>
> + if (dev->slot && !down_write_trylock(&dev->slot->reset_lock))
> + return -EAGAIN;
> +
> if (!pci_dev_trylock(dev))
> return -EAGAIN;
>
> @@ -5176,6 +5185,8 @@ int pci_try_reset_function(struct pci_dev *dev)
> rc = __pci_reset_function_locked(dev);
> pci_dev_restore(dev);
> pci_dev_unlock(dev);
> + if (dev->slot)
> + up_write(&dev->slot->reset_lock);
>
> return rc;
> }
> @@ -5274,6 +5285,7 @@ static void pci_slot_lock(struct pci_slot *slot)
> {
> struct pci_dev *dev;
>
> + down_write(&slot->reset_lock);
> list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> if (!dev->slot || dev->slot != slot)
> continue;
> @@ -5295,6 +5307,7 @@ static void pci_slot_unlock(struct pci_slot *slot)
> pci_bus_unlock(dev->subordinate);
> pci_dev_unlock(dev);
> }
> + up_write(&slot->reset_lock);
> }
>
> /* Return 1 on successful lock, 0 on contention */
> @@ -5302,6 +5315,9 @@ static int pci_slot_trylock(struct pci_slot *slot)
> {
> struct pci_dev *dev;
>
> + if (!down_write_trylock(&slot->reset_lock))
> + return 0;
> +
> list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> if (!dev->slot || dev->slot != slot)
> continue;
> @@ -5325,6 +5341,7 @@ static int pci_slot_trylock(struct pci_slot *slot)
> pci_bus_unlock(dev->subordinate);
> pci_dev_unlock(dev);
> }
> + up_write(&slot->reset_lock);
> return 0;
> }
>
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index cc386ef..e8e7d09 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -279,6 +279,8 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
> INIT_LIST_HEAD(&slot->list);
> list_add(&slot->list, &parent->slots);
>
> + init_rwsem(&slot->reset_lock);
> +
> down_read(&pci_bus_sem);
> list_for_each_entry(dev, &parent->devices, bus_list)
> if (PCI_SLOT(dev->devfn) == slot_nr)
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index f634c81..260650e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -454,13 +454,20 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> * We can not use the "try" reset interface here, which will
> * overwrite the previously restored configuration information.
> */
> - if (vdev->reset_works && pci_cfg_access_trylock(pdev)) {
> - if (device_trylock(&pdev->dev)) {
> - if (!__pci_reset_function_locked(pdev))
> - vdev->needs_reset = false;
> - device_unlock(&pdev->dev);
> + if (vdev->reset_works) {
> + if (!pdev->slot ||
> + down_write_trylock(&pdev->slot->reset_lock)) {
> + if (pci_cfg_access_trylock(pdev)) {
> + if (device_trylock(&pdev->dev)) {
> + if (!__pci_reset_function_locked(pdev))
> + vdev->needs_reset = false;
> + device_unlock(&pdev->dev);
> + }
> + pci_cfg_access_unlock(pdev);
> + }
> + if (pdev->slot)
> + up_write(&pdev->slot->reset_lock);
> }
> - pci_cfg_access_unlock(pdev);
> }
>
> pci_restore_state(pdev);
> diff --git a/drivers/xen/xen-pciback/passthrough.c b/drivers/xen/xen-pciback/passthrough.c
> index 66e9b81..98a9ec8 100644
> --- a/drivers/xen/xen-pciback/passthrough.c
> +++ b/drivers/xen/xen-pciback/passthrough.c
> @@ -89,11 +89,17 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
> mutex_unlock(&dev_data->lock);
>
> if (found_dev) {
> - if (lock)
> + if (lock) {
> + if (found_dev->slot)
> + down_write(&found_dev->slot->reset_lock);
> device_lock(&found_dev->dev);
> + }
> pcistub_put_pci_dev(found_dev);
> - if (lock)
> + if (lock) {
> device_unlock(&found_dev->dev);
> + if (found_dev->slot)
> + up_write(&found_dev->slot->reset_lock);
> + }
> }
> }
>
> @@ -164,9 +170,13 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
> list_for_each_entry_safe(dev_entry, t, &dev_data->dev_list, list) {
> struct pci_dev *dev = dev_entry->dev;
> list_del(&dev_entry->list);
> + if (dev->slot)
> + down_write(&dev->slot->reset_lock);
> device_lock(&dev->dev);
> pcistub_put_pci_dev(dev);
> device_unlock(&dev->dev);
> + if (dev->slot)
> + up_write(&dev->slot->reset_lock);
> kfree(dev_entry);
> }
>
> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> index e876c3d..91779a2 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -463,6 +463,9 @@ static int __init pcistub_init_devices_late(void)
>
> spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>
> + if (psdev->dev->slot)
> + down_write(&psdev->dev->slot->reset_lock);
> + device_lock(&psdev->dev->dev);
> err = pcistub_init_device(psdev->dev);
> if (err) {
> dev_err(&psdev->dev->dev,
> @@ -470,6 +473,9 @@ static int __init pcistub_init_devices_late(void)
> kfree(psdev);
> psdev = NULL;
> }
> + device_unlock(&psdev->dev->dev);
> + if (psdev->dev->slot)
> + up_write(&psdev->dev->slot->reset_lock);
>
> spin_lock_irqsave(&pcistub_devices_lock, flags);
>
> diff --git a/drivers/xen/xen-pciback/vpci.c b/drivers/xen/xen-pciback/vpci.c
> index 5447b5a..d157b1d 100644
> --- a/drivers/xen/xen-pciback/vpci.c
> +++ b/drivers/xen/xen-pciback/vpci.c
> @@ -171,11 +171,17 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev,
> mutex_unlock(&vpci_dev->lock);
>
> if (found_dev) {
> - if (lock)
> + if (lock) {
> + if (found_dev->slot)
> + down_write(&found_dev->slot->reset_lock);
> device_lock(&found_dev->dev);
> + }
> pcistub_put_pci_dev(found_dev);
> - if (lock)
> + if (lock) {
> device_unlock(&found_dev->dev);
> + if (found_dev->slot)
> + up_write(&found_dev->slot->reset_lock);
> + }
> }
> }
>
> @@ -216,9 +222,13 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev)
> list) {
> struct pci_dev *dev = e->dev;
> list_del(&e->list);
> + if (dev->slot)
> + down_write(&dev->slot->reset_lock);
> device_lock(&dev->dev);
> pcistub_put_pci_dev(dev);
> device_unlock(&dev->dev);
> + if (dev->slot)
> + up_write(&dev->slot->reset_lock);
> kfree(e);
> }
> }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2a2d00e..12869bd 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/resource_ext.h>
> +#include <linux/rwsem.h>
> #include <uapi/linux/pci.h>
>
> #include <linux/pci_ids.h>
> @@ -65,11 +66,16 @@
> /* return bus from PCI devid = ((u16)bus_number) << 8) | devfn */
> #define PCI_BUS_NUM(x) (((x) >> 8) & 0xff)
>
> -/* pci_slot represents a physical slot */
> +/**
> + * struct pci_slot - represents a physical slot
> + * @reset_lock: held for writing during a slot reset; acquire for reading to
> + * protect access to register bits which may flap upon a reset
> + */
> struct pci_slot {
> struct pci_bus *bus; /* Bus this slot is on */
> struct list_head list; /* Node in list of slots */
> struct hotplug_slot *hotplug; /* Hotplug info (move here) */
> + struct rw_semaphore reset_lock;
> unsigned char number; /* PCI_SLOT(pci_dev->devfn) */
> struct kobject kobj;
> };
> --
> 2.27.0
>
prev parent reply other threads:[~2020-09-08 16:34 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-21 11:17 [PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock Lukas Wunner
[not found] ` <20200723091305.GJ19262@shao2-debian>
2020-07-23 9:51 ` [PCI] 3233e41d3e: WARNING:at_drivers/pci/pci.c:#pci_reset_hotplug_slot Lukas Wunner
2020-09-09 20:40 ` Bjorn Helgaas
2020-09-08 16:33 ` Ian May [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200908163328.GA2703021@uws \
--to=ian.may@canonical.com \
--cc=bhelgaas@google.com \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox