All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Johan Hovold <johan@kernel.org>
Cc: Michael Schaller <michael@5challer.de>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	regressions@lists.linux.dev,
	"Maciej W . Rozycki" <macro@orcam.me.uk>,
	Ajay Agarwal <ajayagarwal@google.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Johan Hovold <johan+linaro@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	stable@vger.kernel.org, regressions@leemhuis.info
Subject: Re: PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()")
Date: Mon, 22 Jan 2024 12:26:15 -0600	[thread overview]
Message-ID: <20240122182615.GA277100@bhelgaas> (raw)
In-Reply-To: <Za5JLxRC-K20sIfG@hovoldconsulting.com>

On Mon, Jan 22, 2024 at 11:53:35AM +0100, Johan Hovold wrote:
> Hi Bjorn,
> 
> I never got a reply to this one so resending with updated Subject in
> case it got buried in your inbox.

I did see it but decided it was better to fix the problem with resume
causing an unintended reboot, even though fixing that meant breaking
lockdep again, since I don't think we have user reports of the
potential deadlock lockdep finds.

08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") was a
start at fixing other problems and also improving the ASPM style, so I
hope somebody steps up to fix both it and the lockdep issue.  I
haven't looked at it enough to have a preference for *how* to fix it.

Bjorn

> On Mon, Jan 08, 2024 at 09:39:07AM +0100, Johan Hovold wrote:
>  
> > On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> > > 
> > > This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> > > 
> > > Michael reported that when attempting to resume from suspend to RAM on ASUS
> > > mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> > > ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> > > with no output, followed by a reboot.
> > > 
> > > Workarounds include:
> > > 
> > >   - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > >   - Booting with "pcie_aspm=off"
> > >   - Booting with "pcie_aspm.policy=performance"
> > >   - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
> > >     before suspending
> > >   - Connecting a USB flash drive
> > > 
> > > Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > > Reported-by: Michael Schaller <michael@5challer.de>
> > > Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: <stable@vger.kernel.org>
> > > ---
> >  
> > > +/* @pdev: the root port or switch downstream port */
> > > +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> > > +{
> > > +	struct pcie_link_state *link = pdev->link_state;
> > > +
> > > +	if (aspm_disabled || !link)
> > > +		return;
> > > +	/*
> > > +	 * Devices changed PM state, we should recheck if latency
> > > +	 * meets all functions' requirement
> > > +	 */
> > > +	down_read(&pci_bus_sem);
> > > +	mutex_lock(&aspm_lock);
> > > +	pcie_update_aspm_capable(link->root);
> > > +	pcie_config_aspm_path(link);
> > > +	mutex_unlock(&aspm_lock);
> > > +	up_read(&pci_bus_sem);
> > > +}
> > 
> > This function is now restored in 6.7 final and is called in paths which
> > already hold the pci_bus_sem as reported by lockdep (see splat below).
> > 
> > This can potentially lead to a deadlock and specifically prevents using
> > lockdep on Qualcomm platforms.
> > 
> > Not sure if you want to propagate whether the bus semaphore is held to
> > pcie_aspm_pm_state_change() or if there was some alternative to
> > restoring this function which should be explored instead.
> 
> So to summarise, this patch, which is now commit
> 
> 	f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
> 
> introduced a regression in 6.7-final for Qualcomm platforms (and some
> Intel platforms) similar to the one recently fixed by commit
> 
> 	f352ce999260 ("PCI: qcom: Fix potential deadlock when enabling ASPM").
> 
> Johan
> 
> 
> #regzbot introduced: f93e71aea6c6
> 
> >    ============================================
> >    WARNING: possible recursive locking detected
> >    6.7.0 #40 Not tainted
> >    --------------------------------------------
> >    kworker/u16:5/90 is trying to acquire lock:
> >    ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
> >    pcieport 0002:00:00.0: PME: Signaling with IRQ 197
> >    
> >                but task is already holding lock:
> >    ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >    
> >                other info that might help us debug this:
> >     Possible unsafe locking scenario:
> > 
> >           CPU0
> >           ----
> >      lock(pci_bus_sem);
> >      lock(pci_bus_sem);
> >    
> >                 *** DEADLOCK ***
> > 
> >     May be due to missing lock nesting notation
> > 
> >    4 locks held by kworker/u16:5/90:
> >     #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> >     #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
> >     #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
> >     #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
> >    
> >                stack backtrace:
> >    CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
> >    Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
> >    Workqueue: events_unbound async_run_entry_fn
> >    Call trace:
> >     dump_backtrace+0x9c/0x11c
> >     show_stack+0x18/0x24
> >     dump_stack_lvl+0x60/0xac
> >     dump_stack+0x18/0x24
> >     print_deadlock_bug+0x25c/0x348
> >     __lock_acquire+0x10a4/0x2064
> >     lock_acquire+0x1e8/0x318
> >     down_read+0x60/0x184
> >     pcie_aspm_pm_state_change+0x58/0xdc
> >     pci_set_full_power_state+0xa8/0x114
> >     pci_set_power_state+0xc4/0x120
> >     qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
> >     pci_walk_bus+0x64/0xbc
> >     qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]

  reply	other threads:[~2024-01-22 18:26 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-25 18:29 [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found) Michael Schaller
2023-12-29  0:26 ` Bjorn Helgaas
2023-12-29 10:31   ` Michael Schaller
2024-01-01 18:13 ` Bjorn Helgaas
2024-01-01 18:57   ` Michael Schaller
2024-01-01 22:15     ` Bjorn Helgaas
2024-01-02 13:50       ` Michael Schaller
2024-01-03  8:21         ` Linux regression tracking (Thorsten Leemhuis)
2024-01-05  3:25     ` Kai-Heng Feng
2024-01-05 11:18       ` Michael Schaller
2024-01-05 15:51         ` Bjorn Helgaas
2024-01-10  3:43           ` Kai-Heng Feng
2024-01-10 12:39             ` Michael Schaller
2024-03-07  6:51               ` Kai-Heng Feng
2024-03-08 15:49                 ` michael
2024-03-08 16:40                 ` Bjorn Helgaas
2024-01-03 15:41   ` Ilpo Järvinen
2024-01-05  3:14     ` Kai-Heng Feng
2024-01-05 10:29       ` Ilpo Järvinen
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33   ` Kuppuswamy Sathyanarayanan
2024-01-03  0:12     ` Bjorn Helgaas
2024-01-08  8:39   ` Johan Hovold
2024-01-22 10:53     ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
2024-01-22 18:26       ` Bjorn Helgaas [this message]
2024-01-23 17:25         ` Johan Hovold
2024-01-23 22:36           ` Bjorn Helgaas
2024-01-24  8:16             ` Johan Hovold
2024-01-30 10:07               ` Johan Hovold
2024-02-09 12:45       ` PCI/ASPM locking regression in 6.7-final Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240122182615.GA277100@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=ajayagarwal@google.com \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hkallweit1@gmail.com \
    --cc=johan+linaro@kernel.org \
    --cc=johan@kernel.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=macro@orcam.me.uk \
    --cc=michael@5challer.de \
    --cc=regressions@leemhuis.info \
    --cc=regressions@lists.linux.dev \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.