Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Johan Hovold <johan@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Michael Schaller <michael@5challer.de>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	regressions@lists.linux.dev,
	"Maciej W . Rozycki" <macro@orcam.me.uk>,
	Ajay Agarwal <ajayagarwal@google.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Johan Hovold <johan+linaro@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
Date: Mon, 8 Jan 2024 09:39:07 +0100	[thread overview]
Message-ID: <ZZu0qx2cmn7IwTyQ@hovoldconsulting.com> (raw)
In-Reply-To: <20240102232550.1751655-1-helgaas@kernel.org>

Hi Bjorn,

On Tue, Jan 02, 2024 at 05:25:50PM -0600, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
> 
> Michael reported that when attempting to resume from suspend to RAM on ASUS
> mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
> ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
> with no output, followed by a reboot.
> 
> Workarounds include:
> 
>   - Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
>   - Booting with "pcie_aspm=off"
>   - Booting with "pcie_aspm.policy=performance"
>   - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
>     before suspending
>   - Connecting a USB flash drive
> 
> Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> Reported-by: Michael Schaller <michael@5challer.de>
> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Cc: <stable@vger.kernel.org>
> ---
 
> +/* @pdev: the root port or switch downstream port */
> +void pcie_aspm_pm_state_change(struct pci_dev *pdev)
> +{
> +	struct pcie_link_state *link = pdev->link_state;
> +
> +	if (aspm_disabled || !link)
> +		return;
> +	/*
> +	 * Devices changed PM state, we should recheck if latency
> +	 * meets all functions' requirement
> +	 */
> +	down_read(&pci_bus_sem);
> +	mutex_lock(&aspm_lock);
> +	pcie_update_aspm_capable(link->root);
> +	pcie_config_aspm_path(link);
> +	mutex_unlock(&aspm_lock);
> +	up_read(&pci_bus_sem);
> +}

This function is now restored in 6.7 final and is called in paths which
already hold the pci_bus_sem as reported by lockdep (see splat below).

This can potentially lead to a deadlock and specifically prevents using
lockdep on Qualcomm platforms.

Not sure if you want to propagate whether the bus semaphore is held to
pcie_aspm_pm_state_change() or if there was some alternative to
restoring this function which should be explored instead.

Johan


   ============================================
   WARNING: possible recursive locking detected
   6.7.0 #40 Not tainted
   --------------------------------------------
   kworker/u16:5/90 is trying to acquire lock:
   ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
   pcieport 0002:00:00.0: PME: Signaling with IRQ 197
   
               but task is already holding lock:
   ffffacfa78ced000
   pcieport 0002:00:00.0: AER: enabled with IRQ 197
    (pci_bus_sem
   nvme nvme0: pci function 0002:01:00.0
   ){++++}-{3:3}
   nvme 0002:01:00.0: enabling device (0000 -> 0002)
   , at: pci_walk_bus+0x34/0xbc
   
               other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(pci_bus_sem);
     lock(pci_bus_sem);
   
                *** DEADLOCK ***

    May be due to missing lock nesting notation

   4 locks held by kworker/u16:5/90:
    #0: ffff06c5c0008d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
    #1: ffff800081c0bdd0 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x53c
    #2: ffff06c5c0b7d0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach_async_helper+0x3c/0xf4
    #3: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
   
               stack backtrace:
   CPU: 1 PID: 90 Comm: kworker/u16:5 Not tainted 6.7.0 #40
   Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
   Workqueue: events_unbound async_run_entry_fn
   Call trace:
    dump_backtrace+0x9c/0x11c
    show_stack+0x18/0x24
    dump_stack_lvl+0x60/0xac
    dump_stack+0x18/0x24
    print_deadlock_bug+0x25c/0x348
    __lock_acquire+0x10a4/0x2064
    lock_acquire+0x1e8/0x318
    down_read+0x60/0x184
    pcie_aspm_pm_state_change+0x58/0xdc
    pci_set_full_power_state+0xa8/0x114
    pci_set_power_state+0xc4/0x120
    qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
    pci_walk_bus+0x64/0xbc
    qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]

  parent reply	other threads:[~2024-01-08  8:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-25 18:29 [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found) Michael Schaller
2023-12-29  0:26 ` Bjorn Helgaas
2023-12-29 10:31   ` Michael Schaller
2024-01-01 18:13 ` Bjorn Helgaas
2024-01-01 18:57   ` Michael Schaller
2024-01-01 22:15     ` Bjorn Helgaas
2024-01-02 13:50       ` Michael Schaller
2024-01-03  8:21         ` Linux regression tracking (Thorsten Leemhuis)
2024-01-05  3:25     ` Kai-Heng Feng
2024-01-05 11:18       ` Michael Schaller
2024-01-05 15:51         ` Bjorn Helgaas
2024-01-10  3:43           ` Kai-Heng Feng
2024-01-10 12:39             ` Michael Schaller
2024-03-07  6:51               ` Kai-Heng Feng
2024-03-08 15:49                 ` michael
2024-03-08 16:40                 ` Bjorn Helgaas
2024-01-03 15:41   ` Ilpo Järvinen
2024-01-05  3:14     ` Kai-Heng Feng
2024-01-05 10:29       ` Ilpo Järvinen
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33   ` Kuppuswamy Sathyanarayanan
2024-01-03  0:12     ` Bjorn Helgaas
2024-01-08  8:39   ` Johan Hovold [this message]
2024-01-22 10:53     ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
2024-01-22 18:26       ` Bjorn Helgaas
2024-01-23 17:25         ` Johan Hovold
2024-01-23 22:36           ` Bjorn Helgaas
2024-01-24  8:16             ` Johan Hovold
2024-01-30 10:07               ` Johan Hovold
2024-02-09 12:45       ` PCI/ASPM locking regression in 6.7-final Linux regression tracking #update (Thorsten Leemhuis)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZu0qx2cmn7IwTyQ@hovoldconsulting.com \
    --to=johan@kernel.org \
    --cc=ajayagarwal@google.com \
    --cc=bhelgaas@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=hkallweit1@gmail.com \
    --cc=johan+linaro@kernel.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=macro@orcam.me.uk \
    --cc=michael@5challer.de \
    --cc=regressions@lists.linux.dev \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox