From: Bjorn Helgaas <helgaas@kernel.org>
To: Johan Hovold <johan@kernel.org>
Cc: Michael Schaller <michael@5challer.de>,
Kai-Heng Feng <kai.heng.feng@canonical.com>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
regressions@lists.linux.dev,
"Maciej W . Rozycki" <macro@orcam.me.uk>,
Ajay Agarwal <ajayagarwal@google.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Heiner Kallweit <hkallweit1@gmail.com>,
Johan Hovold <johan+linaro@kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
stable@vger.kernel.org, regressions@leemhuis.info
Subject: Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
Date: Mon, 22 Jan 2024 12:26:15 -0600 [thread overview]
Message-ID: <20240122182615.GA277100@bhelgaas> (raw)
In-Reply-To: <Za5JLxRC-K20sIfG@hovoldconsulting.com>
On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> Hi Bjorn,
>
> I never got a reply to this one so resending with updated Subject in
> case it got buried in your inbox.
I did see it but decided it was better to fix the problem with resume
causing an unintended reboot, even though fixing that meant breaking
lockdep again, since I don't think we have user reports of the
potential deadlock lockdep finds.
08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
start at fixing other problems and also improving the ASPM style, so I
hope somebody steps up to fix both it and the lockdep issue. I
haven't looked at it enough to have a preference for *how* to fix it.
Bjorn
> On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote:
>
> > On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > >
> > > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> > >
> > > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > > with no output, followed by a reboot.
> > >
> > > Workarounds include:
> > >
> > > - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > > - Booting with "pcie_aspm=off"
> > > - Booting with "pcie_aspm.policy=performance"
> > > - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> > > before suspending
> > > - Connecting a USB flash drive
> > >
> > > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > > Reported-by: Michael Schaller <michael@5challer.de>
> > > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: <stable@vger.kernel.org>
> > > ---
> >
> > > +/* @pdev: the root port or switch downstream port */
> > > +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> > > +{
> > > + struct pcie_link_state *link = pdev->link_state;
> > > +
> > > + if (aspm_disabled || !link)
> > > + return;
> > > + /*
> > > + * Devices changed PM state, we should recheck if latency
> > > + * meets all functions' requirement
> > > + */
> > > + down_read(&pci_bus_sem);
> > > + mutex_lock(&aspm_lock);
> > > + pcie_update_aspm_capable(link->root);
> > > + pcie_config_aspm_path(link);
> > > + mutex_unlock(&aspm_lock);
> > > + up_read(&pci_bus_sem);
> > > +}
> >
> > This function is now restored in 6.7 final and is called in paths which
> > already hold the pci_bus_sem as reported by lockdep (see splat below).
> >
> > This can potentially lead to a deadlock and specifically prevents using
> > lockdep on Qualcomm platforms.
> >
> > Not sure if you want to propagate whether the bus semaphore is held to
> > pcie_aspm_pm_state_change() or if there was some alternative to
> > restoring this function which should be explored instead.
>
> So to summarise, this patch, which is now commit
>
> f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
>
> introduced a regression in 6.7-final for Qualcomm platforms (and some
> Intel platforms) similar to the one recently fixed by commit
>
> f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM").
>
> Johan
>
>
> #regzbot introduced: f93e71aea6c6
>
> > ============================================
> > WARNING: possible recursive locking detected
> > 6.7.0 #40 Not tainted
> > --------------------------------------------
> > kworker/u16:5/90 is trying to acquire lock:
> > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
> > pcieport 0002:00:00.0: PME: Signaling with IRQ 197
> >
> > but task is already holding lock:
> > ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(pci_bus_sem);
> > lock(pci_bus_sem);
> >
> > *** DEADLOCK ***
> >
> > May be due to missing lock nesting notation
> >
> > 4 locks held by kworker/u16:5/90:
> > #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> > #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> > #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
> > #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >
> > stack backtrace:
> > CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
> > Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
> > Workqueue: events_unbound async_run_entry_fn
> > Call trace:
> > dump_backtrace+0x9c/0x11c
> > show_stack+0x18/0x24
> > dump_stack_lvl+0x60/0xac
> > dump_stack+0x18/0x24
> > print_deadlock_bug+0x25c/0x348
> > __lock_acquire+0x10a4/0x2064
> > lock_acquire+0x1e8/0x318
> > down_read+0x60/0x184
> > pcie_aspm_pm_state_change+0x58/0xdc
> > pci_set_full_power_state+0xa8/0x114
> > pci_set_power_state+0xc4/0x120
> > qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
> > pci_walk_bus+0x64/0xbc
> > qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
next prev parent reply other threads:[~2024-01-22 18:26 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-25 18:29 [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found) Michael Schaller
2023-12-29 0:26 ` Bjorn Helgaas
2023-12-29 10:31 ` Michael Schaller
2024-01-01 18:13 ` Bjorn Helgaas
2024-01-01 18:57 ` Michael Schaller
2024-01-01 22:15 ` Bjorn Helgaas
2024-01-02 13:50 ` Michael Schaller
2024-01-03 8:21 ` Linux regression tracking (Thorsten Leemhuis)
2024-01-05 3:25 ` Kai-Heng Feng
2024-01-05 11:18 ` Michael Schaller
2024-01-05 15:51 ` Bjorn Helgaas
2024-01-10 3:43 ` Kai-Heng Feng
2024-01-10 12:39 ` Michael Schaller
2024-03-07 6:51 ` Kai-Heng Feng
2024-03-08 15:49 ` michael
2024-03-08 16:40 ` Bjorn Helgaas
2024-01-03 15:41 ` Ilpo Järvinen
2024-01-05 3:14 ` Kai-Heng Feng
2024-01-05 10:29 ` Ilpo Järvinen
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33 ` Kuppuswamy Sathyanarayanan
2024-01-03 0:12 ` Bjorn Helgaas
2024-01-08 8:39 ` Johan Hovold
2024-01-22 10:53 ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
2024-01-22 18:26 ` Bjorn Helgaas [this message]
2024-01-23 17:25 ` Johan Hovold
2024-01-23 22:36 ` Bjorn Helgaas
2024-01-24 8:16 ` Johan Hovold
2024-01-30 10:07 ` Johan Hovold
2024-02-09 12:45 ` PCI/ASPM locking regression in 6.7-final Linux regression tracking #update (Thorsten Leemhuis)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240122182615.GA277100@bhelgaas \
--to=helgaas@kernel.org \
--cc=ajayagarwal@google.com \
--cc=bhelgaas@google.com \
--cc=gregkh@linuxfoundation.org \
--cc=hkallweit1@gmail.com \
--cc=johan+linaro@kernel.org \
--cc=johan@kernel.org \
--cc=kai.heng.feng@canonical.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=macro@orcam.me.uk \
--cc=michael@5challer.de \
--cc=regressions@leemhuis.info \
--cc=regressions@lists.linux.dev \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox