* [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
@ 2023-05-09 10:41 Lukas Wunner
2023-05-09 10:41 ` [PATCH 5.4.y 2/2] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock Lukas Wunner
2023-05-15 12:40 ` [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Greg KH
0 siblings, 2 replies; 3+ messages in thread
From: Lukas Wunner @ 2023-05-09 10:41 UTC (permalink / raw)
To: stable
Cc: Anatoli.Antonovitch, alex.williamson, amichon, andrey2805,
ashok.raj, bhelgaas, dstein, ian.may, michael.haeuptle,
mika.westerberg, rahul.kumar1, sathyanarayanan.kuppuswamy,
wangxiongfeng2, zhangjialin11
From: Hans de Goede <hdegoede@redhat.com>
commit 085a9f43433f30cbe8a1ade62d9d7827c3217f4d upstream.
Use down_read_nested() and down_write_nested() when taking the
ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
the path to the PCI root bus as lock subclass parameter.
This fixes the following false-positive lockdep report when unplugging a
Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
============================================
WARNING: possible recursive locking detected
5.16.0-rc2+ #621 Not tainted
--------------------------------------------
irq/124-pciehp/86 is trying to acquire lock:
ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
but task is already holding lock:
ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ctrl->reset_lock);
lock(&ctrl->reset_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by irq/124-pciehp/86:
#0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
#1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
#2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
stack backtrace:
CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
Call Trace:
<TASK>
dump_stack_lvl+0x59/0x73
__lock_acquire.cold+0xc5/0x2c6
lock_acquire+0xb5/0x2b0
down_read+0x3e/0x50
pciehp_check_presence+0x23/0x80
pciehp_runtime_resume+0x5c/0xa0
device_for_each_child+0x45/0x70
pcie_port_device_runtime_resume+0x20/0x30
pci_pm_runtime_resume+0xa7/0xc0
__rpm_callback+0x41/0x110
rpm_callback+0x59/0x70
rpm_resume+0x512/0x7b0
__pm_runtime_resume+0x4a/0x90
__device_release_driver+0x28/0x240
device_release_driver+0x26/0x40
pci_stop_bus_device+0x68/0x90
pci_stop_bus_device+0x2c/0x90
pci_stop_and_remove_bus_device+0xe/0x20
pciehp_unconfigure_device+0x6c/0x110
pciehp_disable_slot+0x5b/0xe0
pciehp_handle_presence_or_link_change+0xc3/0x2f0
pciehp_ist+0x179/0x180
This lockdep warning is triggered because with Thunderbolt, hotplug ports
are nested. When removing multiple devices in a daisy-chain, each hotplug
port's reset_lock may be acquired recursively. It's never the same lock, so
the lockdep splat is a false positive.
Because locks at the same hierarchy level are never acquired recursively, a
per-level lockdep class is sufficient to fix the lockdep warning.
The choice to use one lockdep subclass per pcie-hotplug controller in the
path to the root-bus was made to conserve class keys because their number
is limited and the complexity grows quadratically with number of keys
according to Documentation/locking/lockdep-design.rst.
Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
Reported-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
[lukas: backport to v5.4-stable]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
drivers/pci/hotplug/pciehp.h | 3 +++
drivers/pci/hotplug/pciehp_core.c | 2 +-
drivers/pci/hotplug/pciehp_hpc.c | 19 +++++++++++++++++--
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index aa61d4c219d7..79a713f5dbf4 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -72,6 +72,8 @@ extern int pciehp_poll_time;
* @reset_lock: prevents access to the Data Link Layer Link Active bit in the
* Link Status register and to the Presence Detect State bit in the Slot
* Status register during a slot reset which may cause them to flap
+ * @depth: Number of additional hotplug ports in the path to the root bus,
+ * used as lock subclass for @reset_lock
* @ist_running: flag to keep user request waiting while IRQ thread is running
* @request_result: result of last user request submitted to the IRQ thread
* @requester: wait queue to wake up on completion of user request,
@@ -102,6 +104,7 @@ struct controller {
struct hotplug_slot hotplug_slot; /* hotplug core interface */
struct rw_semaphore reset_lock;
+ unsigned int depth;
unsigned int ist_running;
int request_result;
wait_queue_head_t requester;
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 312cc45c44c7..f5255045b149 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -165,7 +165,7 @@ static void pciehp_check_presence(struct controller *ctrl)
{
int occupied;
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
mutex_lock(&ctrl->state_lock);
occupied = pciehp_card_present_or_link_active(ctrl);
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 13f3bc239c66..651664fe4058 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -674,7 +674,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
* Disable requests have higher priority than Presence Detect Changed
* or Data Link Layer State Changed events.
*/
- down_read(&ctrl->reset_lock);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
if (events & DISABLE_SLOT)
pciehp_handle_disable_request(ctrl);
else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
@@ -808,7 +808,7 @@ int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
if (probe)
return 0;
- down_write(&ctrl->reset_lock);
+ down_write_nested(&ctrl->reset_lock, ctrl->depth);
if (!ATTN_BUTTN(ctrl)) {
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
@@ -864,6 +864,20 @@ static inline void dbg_ctrl(struct controller *ctrl)
#define FLAG(x, y) (((x) & (y)) ? '+' : '-')
+static inline int pcie_hotplug_depth(struct pci_dev *dev)
+{
+ struct pci_bus *bus = dev->bus;
+ int depth = 0;
+
+ while (bus->parent) {
+ bus = bus->parent;
+ if (bus->self && bus->self->is_hotplug_bridge)
+ depth++;
+ }
+
+ return depth;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -877,6 +891,7 @@ struct controller *pcie_init(struct pcie_device *dev)
return NULL;
ctrl->pcie = dev;
+ ctrl->depth = pcie_hotplug_depth(dev->port);
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
if (pdev->hotplug_user_indicators)
--
2.39.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 5.4.y 2/2] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
2023-05-09 10:41 [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Lukas Wunner
@ 2023-05-09 10:41 ` Lukas Wunner
2023-05-15 12:40 ` [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Greg KH
1 sibling, 0 replies; 3+ messages in thread
From: Lukas Wunner @ 2023-05-09 10:41 UTC (permalink / raw)
To: stable
Cc: Anatoli.Antonovitch, alex.williamson, amichon, andrey2805,
ashok.raj, bhelgaas, dstein, ian.may, michael.haeuptle,
mika.westerberg, rahul.kumar1, sathyanarayanan.kuppuswamy,
wangxiongfeng2, zhangjialin11
commit f5eff5591b8f9c5effd25c92c758a127765f74c1 upstream.
In 2013, commits
2e35afaefe64 ("PCI: pciehp: Add reset_slot() method")
608c388122c7 ("PCI: Add slot reset option to pci_dev_reset()")
amended PCIe hotplug to mask Presence Detect Changed events during a
Secondary Bus Reset. The reset thus no longer causes gratuitous slot
bringdown and bringup.
However the commits neglected to serialize reset with code paths reading
slot registers. For instance, a slot bringup due to an earlier hotplug
event may see the Presence Detect State bit cleared during a concurrent
Secondary Bus Reset.
In 2018, commit
5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
retrofitted the missing locking. It introduced a reset_lock which
serializes a Secondary Bus Reset with other parts of pciehp.
Unfortunately the locking turns out to be overzealous: reset_lock is
held for the entire enumeration and de-enumeration of hotplugged devices,
including driver binding and unbinding.
Driver binding and unbinding acquires device_lock while the reset_lock
of the ancestral hotplug port is held. A concurrent Secondary Bus Reset
acquires the ancestral reset_lock while already holding the device_lock.
The asymmetric locking order in the two code paths can lead to AB-BA
deadlocks.
Michael Haeuptle reports such deadlocks on simultaneous hot-removal and
vfio release (the latter implies a Secondary Bus Reset):
pciehp_ist() # down_read(reset_lock)
pciehp_handle_presence_or_link_change()
pciehp_disable_slot()
__pciehp_disable_slot()
remove_board()
pciehp_unconfigure_device()
pci_stop_and_remove_bus_device()
pci_stop_bus_device()
pci_stop_dev()
device_release_driver()
device_release_driver_internal()
__device_driver_lock() # device_lock()
SYS_munmap()
vfio_device_fops_release()
vfio_device_group_close()
vfio_device_close()
vfio_device_last_close()
vfio_pci_core_close_device()
vfio_pci_core_disable() # device_lock()
__pci_reset_function_locked()
pci_reset_bus_function()
pci_dev_reset_slot_function()
pci_reset_hotplug_slot()
pciehp_reset_slot() # down_write(reset_lock)
Ian May reports the same deadlock on simultaneous hot-removal and an
AER-induced Secondary Bus Reset:
aer_recover_work_func()
pcie_do_recovery()
aer_root_reset()
pci_bus_error_reset()
pci_slot_reset()
pci_slot_lock() # device_lock()
pci_reset_hotplug_slot()
pciehp_reset_slot() # down_write(reset_lock)
Fix by releasing the reset_lock during driver binding and unbinding,
thereby splitting and shrinking the critical section.
Driver binding and unbinding is protected by the device_lock() and thus
serialized with a Secondary Bus Reset. There's no need to additionally
protect it with the reset_lock. However, pciehp does not bind and
unbind devices directly, but rather invokes PCI core functions which
also perform certain enumeration and de-enumeration steps.
The reset_lock's purpose is to protect slot registers, not enumeration
and de-enumeration of hotplugged devices. That would arguably be the
job of the PCI core, not the PCIe hotplug driver. After all, an
AER-induced Secondary Bus Reset may as well happen during boot-time
enumeration of the PCI hierarchy and there's no locking to prevent that
either.
Exempting *de-enumeration* from the reset_lock is relatively harmless:
A concurrent Secondary Bus Reset may foil config space accesses such as
PME interrupt disablement. But if the device is physically gone, those
accesses are pointless anyway. If the device is physically present and
only logically removed through an Attention Button press or the sysfs
"power" attribute, PME interrupts as well as DMA cannot come through
because pciehp_unconfigure_device() disables INTx and Bus Master bits.
That's still protected by the reset_lock in the present commit.
Exempting *enumeration* from the reset_lock also has limited impact:
The exempted call to pci_bus_add_device() may perform device accesses
through pcibios_bus_add_device() and pci_fixup_device() which are now
no longer protected from a concurrent Secondary Bus Reset. Otherwise
there should be no impact.
In essence, the present commit seeks to fix the AB-BA deadlocks while
still retaining a best-effort reset protection for enumeration and
de-enumeration of hotplugged devices -- until a general solution is
implemented in the PCI core.
Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM
Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@canonical.com
Link: https://lore.kernel.org/linux-pci/ce878dab-c0c4-5bd0-a725-9805a075682d@amd.com
Link: https://lore.kernel.org/linux-pci/ed831249-384a-6d35-0831-70af191e9bce@huawei.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: 5b3f7b7d062b ("PCI: pciehp: Avoid slot access during reset")
Link: https://lore.kernel.org/r/fef2b2e9edf245c049a8c5b94743c0f74ff5008a.1681191902.git.lukas@wunner.de
Reported-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reported-by: Ian May <ian.may@canonical.com>
Reported-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reported-by: Rahul Kumar <rahul.kumar1@amd.com>
Reported-by: Jialin Zhang <zhangjialin11@huawei.com>
Tested-by: Anatoli Antonovitch <Anatoli.Antonovitch@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Dan Stein <dstein@hpe.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Alex Michon <amichon@kalrayinc.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
---
drivers/pci/hotplug/pciehp_pci.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..ad12515a4a12 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -63,7 +63,14 @@ int pciehp_configure_device(struct controller *ctrl)
pci_assign_unassigned_bridge_resources(bridge);
pcie_bus_configure_settings(parent);
+
+ /*
+ * Release reset_lock during driver binding
+ * to avoid AB-BA deadlock with device_lock.
+ */
+ up_read(&ctrl->reset_lock);
pci_bus_add_devices(parent);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
out:
pci_unlock_rescan_remove();
@@ -104,7 +111,15 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
bus_list) {
pci_dev_get(dev);
+
+ /*
+ * Release reset_lock during driver unbinding
+ * to avoid AB-BA deadlock with device_lock.
+ */
+ up_read(&ctrl->reset_lock);
pci_stop_and_remove_bus_device(dev);
+ down_read_nested(&ctrl->reset_lock, ctrl->depth);
+
/*
* Ensure that no new Requests will be generated from
* the device.
--
2.39.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
2023-05-09 10:41 [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Lukas Wunner
2023-05-09 10:41 ` [PATCH 5.4.y 2/2] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock Lukas Wunner
@ 2023-05-15 12:40 ` Greg KH
1 sibling, 0 replies; 3+ messages in thread
From: Greg KH @ 2023-05-15 12:40 UTC (permalink / raw)
To: Lukas Wunner
Cc: stable, Anatoli.Antonovitch, alex.williamson, amichon, andrey2805,
ashok.raj, bhelgaas, dstein, ian.may, michael.haeuptle,
mika.westerberg, rahul.kumar1, sathyanarayanan.kuppuswamy,
wangxiongfeng2, zhangjialin11
On Tue, May 09, 2023 at 12:41:09PM +0200, Lukas Wunner wrote:
> From: Hans de Goede <hdegoede@redhat.com>
>
> commit 085a9f43433f30cbe8a1ade62d9d7827c3217f4d upstream.
All now queued up, thanks.
greg k-h
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-05-15 12:42 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-09 10:41 [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Lukas Wunner
2023-05-09 10:41 ` [PATCH 5.4.y 2/2] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock Lukas Wunner
2023-05-15 12:40 ` [PATCH 5.4.y 1/2] PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox