* [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
[not found] <76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de>
@ 2024-01-02 23:25 ` Bjorn Helgaas
2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
2024-01-08 8:39 ` Johan Hovold
0 siblings, 2 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2024-01-02 23:25 UTC (permalink / raw)
To: Michael Schaller, Kai-Heng Feng
Cc: linux-pci, linux-kernel, regressions, Maciej W . Rozycki,
Ajay Agarwal, Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman,
Heiner Kallweit, Johan Hovold, Bjorn Helgaas, stable
From: Bjorn Helgaas <bhelgaas@google.com>
This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
Michael reported that when attempting to resume from suspend to RAM on ASUS
mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
with no output, followed by a reboot.
Workarounds include:
- Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
- Booting with "pcie_aspm=off"
- Booting with "pcie_aspm.policy=performance"
- "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
before suspending
- Connecting a USB flash drive
Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
Reported-by: Michael Schaller <michael@5challer.de>
Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>
---
drivers/pci/pci.c | 6 ++++++
drivers/pci/pci.h | 2 ++
drivers/pci/pcie/aspm.c | 19 +++++++++++++++++++
3 files changed, 27 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 55bc3576a985..bdbf8a94b4d0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1335,6 +1335,9 @@ static int pci_set_full_power_state(struct pci_dev *dev)
pci_restore_bars(dev);
}
+ if (dev->bus->self)
+ pcie_aspm_pm_state_change(dev->bus->self);
+
return 0;
}
@@ -1429,6 +1432,9 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
pci_power_name(dev->current_state),
pci_power_name(state));
+ if (dev->bus->self)
+ pcie_aspm_pm_state_change(dev->bus->self);
+
return 0;
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5ecbcf041179..f43873049d52 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -569,10 +569,12 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt);
#ifdef CONFIG_PCIEASPM
void pcie_aspm_init_link_state(struct pci_dev *pdev);
void pcie_aspm_exit_link_state(struct pci_dev *pdev);
+void pcie_aspm_pm_state_change(struct pci_dev *pdev);
void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
#else
static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
+static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
#endif
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 50b04ae5c394..8715e951c491 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1008,6 +1008,25 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
up_read(&pci_bus_sem);
}
+/* @pdev: the root port or switch downstream port */
+void pcie_aspm_pm_state_change(struct pci_dev *pdev)
+{
+ struct pcie_link_state *link = pdev->link_state;
+
+ if (aspm_disabled || !link)
+ return;
+ /*
+ * Devices changed PM state, we should recheck if latency
+ * meets all functions' requirement
+ */
+ down_read(&pci_bus_sem);
+ mutex_lock(&aspm_lock);
+ pcie_update_aspm_capable(link->root);
+ pcie_config_aspm_path(link);
+ mutex_unlock(&aspm_lock);
+ up_read(&pci_bus_sem);
+}
+
void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
{
struct pcie_link_state *link = pdev->link_state;
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
@ 2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
2024-01-03 0:12 ` Bjorn Helgaas
2024-01-08 8:39 ` Johan Hovold
1 sibling, 1 reply; 10+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2024-01-02 23:33 UTC (permalink / raw)
To: Bjorn Helgaas, Michael Schaller, Kai-Heng Feng
Cc: linux-pci, linux-kernel, regressions, Maciej W . Rozycki,
Ajay Agarwal, Greg Kroah-Hartman, Heiner Kallweit, Johan Hovold,
Bjorn Helgaas, stable
On 1/2/2024 3:25 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
>
> Michael reported that when attempting to resume from suspend to RAM on ASUS
> mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> with no output, followed by a reboot.
>
> Workarounds include:
>
> - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> - Booting with "pcie_aspm=off"
> - Booting with "pcie_aspm.policy=performance"
> - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> before suspending
> - Connecting a USB flash drive
>
Did you find the root cause? Is this issue specific to that particular
device? If yes, can we do a quirk?
> Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> Reported-by: Michael Schaller <michael@5challer.de>
> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> drivers/pci/pci.c | 6 ++++++
> drivers/pci/pci.h | 2 ++
> drivers/pci/pcie/aspm.c | 19 +++++++++++++++++++
> 3 files changed, 27 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 55bc3576a985..bdbf8a94b4d0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1335,6 +1335,9 @@ static int pci_set_full_power_state(struct pci_dev *dev)
> pci_restore_bars(dev);
> }
>
> + if (dev->bus->self)
> + pcie_aspm_pm_state_change(dev->bus->self);
> +
> return 0;
> }
>
> @@ -1429,6 +1432,9 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
> pci_power_name(dev->current_state),
> pci_power_name(state));
>
> + if (dev->bus->self)
> + pcie_aspm_pm_state_change(dev->bus->self);
> +
> return 0;
> }
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 5ecbcf041179..f43873049d52 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -569,10 +569,12 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt);
> #ifdef CONFIG_PCIEASPM
> void pcie_aspm_init_link_state(struct pci_dev *pdev);
> void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> +void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> #else
> static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> +static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> #endif
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 50b04ae5c394..8715e951c491 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1008,6 +1008,25 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
> up_read(&pci_bus_sem);
> }
>
> +/* @pdev: the root port or switch downstream port */
> +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> +{
> + struct pcie_link_state *link = pdev->link_state;
> +
> + if (aspm_disabled || !link)
> + return;
> + /*
> + * Devices changed PM state, we should recheck if latency
> + * meets all functions' requirement
> + */
> + down_read(&pci_bus_sem);
> + mutex_lock(&aspm_lock);
> + pcie_update_aspm_capable(link->root);
> + pcie_config_aspm_path(link);
> + mutex_unlock(&aspm_lock);
> + up_read(&pci_bus_sem);
> +}
> +
> void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
> {
> struct pcie_link_state *link = pdev->link_state;
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
@ 2024-01-03 0:12 ` Bjorn Helgaas
0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2024-01-03 0:12 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal, Greg Kroah-Hartman,
Heiner Kallweit, Johan Hovold, Bjorn Helgaas, stable
On Tue, Jan 02, 2024 at 03:33:51PM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 1/2/2024 3:25 PM, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> >
> > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> >
> > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > with no output, followed by a reboot.
> >
> > Workarounds include:
> >
> > - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > - Booting with "pcie_aspm=off"
> > - Booting with "pcie_aspm.policy=performance"
> > - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> > before suspending
> > - Connecting a USB flash drive
>
> Did you find the root cause? Is this issue specific to that particular
> device? If yes, can we do a quirk?
Unfortunately we don't know the root cause yet. Without knowing the
root cause, I don't think we can make a good quirk.
Bjorn
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
@ 2024-01-08 8:39 ` Johan Hovold
2024-01-22 10:53 ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
1 sibling, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2024-01-08 8:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable
Hi Bjorn,
On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
>
> Michael reported that when attempting to resume from suspend to RAM on ASUS
> mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> with no output, followed by a reboot.
>
> Workarounds include:
>
> - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> - Booting with "pcie_aspm=off"
> - Booting with "pcie_aspm.policy=performance"
> - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> before suspending
> - Connecting a USB flash drive
>
> Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> Reported-by: Michael Schaller <michael@5challer.de>
> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: <stable@vger.kernel.org>
> ---
> +/* @pdev: the root port or switch downstream port */
> +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> +{
> + struct pcie_link_state *link = pdev->link_state;
> +
> + if (aspm_disabled || !link)
> + return;
> + /*
> + * Devices changed PM state, we should recheck if latency
> + * meets all functions' requirement
> + */
> + down_read(&pci_bus_sem);
> + mutex_lock(&aspm_lock);
> + pcie_update_aspm_capable(link->root);
> + pcie_config_aspm_path(link);
> + mutex_unlock(&aspm_lock);
> + up_read(&pci_bus_sem);
> +}
This function is now restored in 6.7 final and is called in paths which
already hold the pci_bus_sem as reported by lockdep (see splat below).
This can potentially lead to a deadlock and specifically prevents using
lockdep on Qualcomm platforms.
Not sure if you want to propagate whether the bus semaphore is held to
pcie_aspm_pm_state_change() or if there was some alternative to
restoring this function which should be explored instead.
Johan
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
pcieport 0002:00:00.0: PME: Signaling with IRQ 197
but task is already holding lock:
ffffacfa78ced000
pcieport 0002:00:00.0: AER: enabled with IRQ 197
(pci_bus_sem
nvme nvme0: pci function 0002:01:00.0
){++++}-{3:3}
nvme 0002:01:00.0: enabling device (0000 -> 0002)
, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by kworker/u16:5/90:
#0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
#1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
#2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
#3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
stack backtrace:
CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
Workqueue: events_unbound async_run_entry_fn
Call trace:
dump_backtrace+0x9c/0x11c
show_stack+0x18/0x24
dump_stack_lvl+0x60/0xac
dump_stack+0x18/0x24
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
^ permalink raw reply [flat|nested] 10+ messages in thread
* PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-08 8:39 ` Johan Hovold
@ 2024-01-22 10:53 ` Johan Hovold
2024-01-22 18:26 ` Bjorn Helgaas
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2024-01-22 10:53 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
Hi Bjorn,
I never got a reply to this one so resending with updated Subject in
case it got buried in your inbox.
On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote:
> On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> > From: Bjorn Helgaas <bhelgaas@google.com>
> >
> > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> >
> > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > with no output, followed by a reboot.
> >
> > Workarounds include:
> >
> > - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > - Booting with "pcie_aspm=off"
> > - Booting with "pcie_aspm.policy=performance"
> > - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> > before suspending
> > - Connecting a USB flash drive
> >
> > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > Reported-by: Michael Schaller <michael@5challer.de>
> > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: <stable@vger.kernel.org>
> > ---
>
> > +/* @pdev: the root port or switch downstream port */
> > +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> > +{
> > + struct pcie_link_state *link = pdev->link_state;
> > +
> > + if (aspm_disabled || !link)
> > + return;
> > + /*
> > + * Devices changed PM state, we should recheck if latency
> > + * meets all functions' requirement
> > + */
> > + down_read(&pci_bus_sem);
> > + mutex_lock(&aspm_lock);
> > + pcie_update_aspm_capable(link->root);
> > + pcie_config_aspm_path(link);
> > + mutex_unlock(&aspm_lock);
> > + up_read(&pci_bus_sem);
> > +}
>
> This function is now restored in 6.7 final and is called in paths which
> already hold the pci_bus_sem as reported by lockdep (see splat below).
>
> This can potentially lead to a deadlock and specifically prevents using
> lockdep on Qualcomm platforms.
>
> Not sure if you want to propagate whether the bus semaphore is held to
> pcie_aspm_pm_state_change() or if there was some alternative to
> restoring this function which should be explored instead.
So to summarise, this patch, which is now commit
f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
introduced a regression in 6.7-final for Qualcomm platforms (and some
Intel platforms) similar to the one recently fixed by commit
f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM").
Johan
#regzbot introduced: f93e71aea6c6
> ============================================
> WARNING: possible recursive locking detected
> 6.7.0 #40 Not tainted
> --------------------------------------------
> kworker/u16:5/90 is trying to acquire lock:
> ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
> pcieport 0002:00:00.0: PME: Signaling with IRQ 197
>
> but task is already holding lock:
> ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(pci_bus_sem);
> lock(pci_bus_sem);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by kworker/u16:5/90:
> #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
> #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
>
> stack backtrace:
> CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
> Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
> Workqueue: events_unbound async_run_entry_fn
> Call trace:
> dump_backtrace+0x9c/0x11c
> show_stack+0x18/0x24
> dump_stack_lvl+0x60/0xac
> dump_stack+0x18/0x24
> print_deadlock_bug+0x25c/0x348
> __lock_acquire+0x10a4/0x2064
> lock_acquire+0x1e8/0x318
> down_read+0x60/0x184
> pcie_aspm_pm_state_change+0x58/0xdc
> pci_set_full_power_state+0xa8/0x114
> pci_set_power_state+0xc4/0x120
> qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
> pci_walk_bus+0x64/0xbc
> qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-22 10:53 ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
@ 2024-01-22 18:26 ` Bjorn Helgaas
2024-01-23 17:25 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2024-01-22 18:26 UTC (permalink / raw)
To: Johan Hovold
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> Hi Bjorn,
>
> I never got a reply to this one so resending with updated Subject in
> case it got buried in your inbox.
I did see it but decided it was better to fix the problem with resume
causing an unintended reboot, even though fixing that meant breaking
lockdep again, since I don't think we have user reports of the
potential deadlock lockdep finds.
08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
start at fixing other problems and also improving the ASPM style, so I
hope somebody steps up to fix both it and the lockdep issue. I
haven't looked at it enough to have a preference for *how* to fix it.
Bjorn
> On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote:
>
> > On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > >
> > > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> > >
> > > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > > with no output, followed by a reboot.
> > >
> > > Workarounds include:
> > >
> > > - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > > - Booting with "pcie_aspm=off"
> > > - Booting with "pcie_aspm.policy=performance"
> > > - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> > > before suspending
> > > - Connecting a USB flash drive
> > >
> > > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > > Reported-by: Michael Schaller <michael@5challer.de>
> > > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: <stable@vger.kernel.org>
> > > ---
> >
> > > +/* @pdev: the root port or switch downstream port */
> > > +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> > > +{
> > > + struct pcie_link_state *link = pdev->link_state;
> > > +
> > > + if (aspm_disabled || !link)
> > > + return;
> > > + /*
> > > + * Devices changed PM state, we should recheck if latency
> > > + * meets all functions' requirement
> > > + */
> > > + down_read(&pci_bus_sem);
> > > + mutex_lock(&aspm_lock);
> > > + pcie_update_aspm_capable(link->root);
> > > + pcie_config_aspm_path(link);
> > > + mutex_unlock(&aspm_lock);
> > > + up_read(&pci_bus_sem);
> > > +}
> >
> > This function is now restored in 6.7 final and is called in paths which
> > already hold the pci_bus_sem as reported by lockdep (see splat below).
> >
> > This can potentially lead to a deadlock and specifically prevents using
> > lockdep on Qualcomm platforms.
> >
> > Not sure if you want to propagate whether the bus semaphore is held to
> > pcie_aspm_pm_state_change() or if there was some alternative to
> > restoring this function which should be explored instead.
>
> So to summarise, this patch, which is now commit
>
> f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
>
> introduced a regression in 6.7-final for Qualcomm platforms (and some
> Intel platforms) similar to the one recently fixed by commit
>
> f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM").
>
> Johan
>
>
> #regzbot introduced: f93e71aea6c6
>
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.7.0 #40 Not tainted
> > --------------------------------------------
> > kworker/u16:5/90 is trying to acquire lock:
> > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
> > pcieport 0002:00:00.0: PME: Signaling with IRQ 197
> >
> > but task is already holding lock:
> > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(pci_bus_sem);
> > lock(pci_bus_sem);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 4 locks held by kworker/u16:5/90:
> > #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> > #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> > #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
> > #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >
> > stack backtrace:
> > CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
> > Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
> > Workqueue: events_unbound async_run_entry_fn
> > Call trace:
> > dump_backtrace+0x9c/0x11c
> > show_stack+0x18/0x24
> > dump_stack_lvl+0x60/0xac
> > dump_stack+0x18/0x24
> > print_deadlock_bug+0x25c/0x348
> > __lock_acquire+0x10a4/0x2064
> > lock_acquire+0x1e8/0x318
> > down_read+0x60/0x184
> > pcie_aspm_pm_state_change+0x58/0xdc
> > pci_set_full_power_state+0xa8/0x114
> > pci_set_power_state+0xc4/0x120
> > qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
> > pci_walk_bus+0x64/0xbc
> > qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-22 18:26 ` Bjorn Helgaas
@ 2024-01-23 17:25 ` Johan Hovold
2024-01-23 22:36 ` Bjorn Helgaas
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2024-01-23 17:25 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> > I never got a reply to this one so resending with updated Subject in
> > case it got buried in your inbox.
>
> I did see it but decided it was better to fix the problem with resume
> causing an unintended reboot, even though fixing that meant breaking
> lockdep again, since I don't think we have user reports of the
> potential deadlock lockdep finds.
That may be because I fixed the previous regression in 6.7-rc1 before
any users had a chance to hit the deadlock on Qualcomm platforms.
I can easily trigger a deadlock on the X13s by instrumenting 6.7-final
with a delay to increase the race window.
And any user hitting this occasionally is likely not going to be able to
track it down to this lock inversion (unless they have lockdep enabled).
> 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> start at fixing other problems and also improving the ASPM style, so I
> hope somebody steps up to fix both it and the lockdep issue. I
> haven't looked at it enough to have a preference for *how* to fix it.
Ok, but since you were the one introducing the locking regression in
6.7-final shouldn't you look into fixing it?
Especially if there were alternatives to restoring the offending commit
which would solve the underlying issue for the resume failure without
breaking other platforms.
I don't want to spend more time on this if the offending commit could
simply be reverted.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-23 17:25 ` Johan Hovold
@ 2024-01-23 22:36 ` Bjorn Helgaas
2024-01-24 8:16 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2024-01-23 22:36 UTC (permalink / raw)
To: Johan Hovold
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
On Tue, Jan 23, 2024 at 06:25:52PM +0100, Johan Hovold wrote:
> On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> > On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> > > I never got a reply to this one so resending with updated Subject in
> > > case it got buried in your inbox.
> >
> > I did see it but decided it was better to fix the problem with resume
> > causing an unintended reboot, even though fixing that meant breaking
> > lockdep again, since I don't think we have user reports of the
> > potential deadlock lockdep finds.
>
> That may be because I fixed the previous regression in 6.7-rc1 before
> any users had a chance to hit the deadlock on Qualcomm platforms.
>
> I can easily trigger a deadlock on the X13s by instrumenting 6.7-final
> with a delay to increase the race window.
>
> And any user hitting this occasionally is likely not going to be able to
> track it down to this lock inversion (unless they have lockdep enabled).
I agree, it's a problem we need to fix.
> > 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> > start at fixing other problems and also improving the ASPM style, so I
> > hope somebody steps up to fix both it and the lockdep issue. I
> > haven't looked at it enough to have a preference for *how* to fix it.
>
> Ok, but since you were the one introducing the locking regression in
> 6.7-final shouldn't you look into fixing it?
>
> Especially if there were alternatives to restoring the offending commit
> which would solve the underlying issue for the resume failure without
> breaking other platforms.
Did somebody propose an alternate patch? If so, I missed it, but we
could look at it now.
> I don't want to spend more time on this if the offending commit could
> simply be reverted.
I don't quite follow. By simply reverting, do you mean to revert
f93e71aea6c6 ("Revert "PCI/ASPM: Remove
pcie_aspm_pm_state_change()"")? IIUC that would break Michael's
machine again.
Bjorn
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-23 22:36 ` Bjorn Helgaas
@ 2024-01-24 8:16 ` Johan Hovold
2024-01-30 10:07 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2024-01-24 8:16 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
On Tue, Jan 23, 2024 at 04:36:48PM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 23, 2024 at 06:25:52PM +0100, Johan Hovold wrote:
> > On Mon, Jan 22, 2024 at 12:26:15PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> > > 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
> > > start at fixing other problems and also improving the ASPM style, so I
> > > hope somebody steps up to fix both it and the lockdep issue. I
> > > haven't looked at it enough to have a preference for *how* to fix it.
> >
> > Ok, but since you were the one introducing the locking regression in
> > 6.7-final shouldn't you look into fixing it?
> >
> > Especially if there were alternatives to restoring the offending commit
> > which would solve the underlying issue for the resume failure without
> > breaking other platforms.
>
> Did somebody propose an alternate patch? If so, I missed it, but we
> could look at it now.
I've only skimmed the discussion leading up to the revert, but I got the
impression that other alternatives were looked at as it was still not
clear what the underlying issue actually was.
As Michael and Thorsten pointed out before the revert, it may have been
better not to do a last minute revert of a 16 month old commit which
risks introducing regressions (and brought back another sysfs issue
IIUC) before fully understanding what is really going on here.
> > I don't want to spend more time on this if the offending commit could
> > simply be reverted.
>
> I don't quite follow. By simply reverting, do you mean to revert
> f93e71aea6c6 ("Revert "PCI/ASPM: Remove
> pcie_aspm_pm_state_change()"")? IIUC that would break Michael's
> machine again.
Right, at least until that issue is fully understood and alternative
fixes have been considered.
If that's not an option, we need to rework core to pass a flag through
more than one layer to indicate whether pcie_aspm_pm_state_change()
should take the bus semaphore or not. I'd rather not do that if it can
be avoided.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
2024-01-24 8:16 ` Johan Hovold
@ 2024-01-30 10:07 ` Johan Hovold
0 siblings, 0 replies; 10+ messages in thread
From: Johan Hovold @ 2024-01-30 10:07 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Michael Schaller, Kai-Heng Feng, linux-pci, linux-kernel,
regressions, Maciej W . Rozycki, Ajay Agarwal,
Kuppuswamy Sathyanarayanan, Greg Kroah-Hartman, Heiner Kallweit,
Johan Hovold, Bjorn Helgaas, stable, regressions
On Wed, Jan 24, 2024 at 09:16:38AM +0100, Johan Hovold wrote:
> On Tue, Jan 23, 2024 at 04:36:48PM -0600, Bjorn Helgaas wrote:
> > I don't quite follow. By simply reverting, do you mean to revert
> > f93e71aea6c6 ("Revert "PCI/ASPM: Remove
> > pcie_aspm_pm_state_change()"")? IIUC that would break Michael's
> > machine again.
>
> Right, at least until that issue is fully understood and alternative
> fixes have been considered.
>
> If that's not an option, we need to rework core to pass a flag through
> more than one layer to indicate whether pcie_aspm_pm_state_change()
> should take the bus semaphore or not. I'd rather not do that if it can
> be avoided.
As a revert appears unlikely to happen, let's fix the regression by
adding a new helper pci_set_power_state_locked() that can be called
with the bus lock held:
https://lore.kernel.org/lkml/20240130100243.11011-1-johan+linaro@kernel.org/
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-01-30 10:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de>
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
2024-01-03 0:12 ` Bjorn Helgaas
2024-01-08 8:39 ` Johan Hovold
2024-01-22 10:53 ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
2024-01-22 18:26 ` Bjorn Helgaas
2024-01-23 17:25 ` Johan Hovold
2024-01-23 22:36 ` Bjorn Helgaas
2024-01-24 8:16 ` Johan Hovold
2024-01-30 10:07 ` Johan Hovold
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).