Linux PCI subsystem development
From: Bjorn Helgaas <helgaas@kernel.org>
To: Michael Schaller <michael@5challer.de>
Cc: bhelgaas@google.com, kai.heng.feng@canonical.com,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	regressions@lists.linux.dev, macro@orcam.me.uk,
	ajayagarwal@google.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	gregkh@linuxfoundation.org, hkallweit1@gmail.com,
	michael.a.bottini@linux.intel.com, johan+linaro@kernel.org,
	"David E. Box" <david.e.box@linux.intel.com>
Subject: Re: [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found)
Date: Thu, 28 Dec 2023 18:26:23 -0600
Message-ID: <20231229002623.GA1560896@bhelgaas>
In-Reply-To: <76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de>

[+cc David (more details at
https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de)]

Hi Michael, thank you very much for debugging and reporting this!
Sorry for the major inconvenience.

On Mon, Dec 25, 2023 at 07:29:02PM +0100, Michael Schaller wrote:
> Issue:
> On resume from suspend to RAM there is no output for about 12 seconds, then
> a blinking cursor is briefly visible in the upper left corner on an
> otherwise black screen, followed by a reboot.
> 
> Setup:
> * Machine: ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1)
> * Firmware: 0508 (latest; also tested previous 0505)
> * OS: Ubuntu 23.10 (except kernel)
> * Kernel: 6.6.8 (also tested 6.7-rc7; config attached)
> 
> Debugging summary:
> * Kernel 5.10.205 isn’t affected.
> * Bisect identified commit 08d0cc5f34265d1a1e3031f319f594bd1970976c as
> cause.

#regzbot introduced: 08d0cc5f3426^

> * PCI device 0000:03:00.0 (Intel 8265 Wifi) causes resume issues as long as
> ASPM is enabled (default).
> * The commit message indicates that a quirk could be written to mitigate the
> issue but I don’t know how to write such a quirk.
> 
> Confirmed workarounds:
> * Connect a USB flash drive (no clue why; maybe this causes a delay that
> lets the resume succeed)
> * Revert commit 08d0cc5f34265d1a1e3031f319f594bd1970976c (commit seemed
> intentional; a quirk seems to be the preferred solution)
> * pcie_aspm=off
> * pcie_aspm.policy=performance
> * echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
> 
> Debugging details:
> * The resume trigger (power button, keyboard, mouse) doesn’t seem to make
> any difference.
> * Double checked that the kernel is configured to *not* reboot on panic.
> * Double checked that there still isn't any kernel output without quiet and
> splash.
> * The issue doesn’t happen if a USB flash drive is connected. The content of
> the flash drive doesn’t appear to matter. The USB port doesn’t appear to
> matter.
> * No information in any logs after the reboot. I suspect the resume from
> suspend to RAM isn’t getting far enough for logs to be written.
> * Kernel 5.10.205 isn’t affected. Kernel 5.15.145, 6.6.8 and 6.7-rc7 are
> affected.
> * A kernel bisect has revealed the following commit as cause:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08d0cc5f34265d1a1e3031f319f594bd1970976c
> * The commit was part of kernel 5.20 and has been backported to 5.15.
> * The commit mentions that a device-specific quirk could be added in case of
> new issues.
> * According to sysfs and lspci only device 0000:03:00.0 (Intel 8265 Wifi)
> has ASPM enabled by default.
> * Disabling ASPM for device 0000:03:00.0 lets the resume from suspend to RAM
> succeed.
> * Enabling ASPM for all devices except 0000:03:00.0 lets the resume from
> suspend to RAM succeed.
> * This would indicate that a quirk is missing for the device 0000:03:00.0
> (Intel 8265 Wifi) but I have no clue how to write such a quirk or how to get
> the specifics for such a quirk.
> * I still have no clue how a USB flash drive plays into all this. Maybe some
> kind of a timing issue where the connected USB flash drive delays something
> long enough so that the resume succeeds. Maybe the code removed by commit
> 08d0cc5f34265d1a1e3031f319f594bd1970976c caused a similar delay. ¯\_(ツ)_/¯
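
The per-device experiments described above (ASPM off for 0000:03:00.0
only, then ASPM on for everything except 0000:03:00.0) can be scripted.
The following is a minimal sketch, not a tested tool: the helper names
set_aspm() and set_aspm_except() are hypothetical, the attribute list is
taken from the sysfs listing in this report, and the sysfs_root
parameter exists only so the code can be tried against a scratch
directory. Writing to the real files needs root.

```python
from pathlib import Path

# Link attributes that appear in the sysfs listing in this report.
ASPM_ATTRS = ("l1_aspm", "l1_1_aspm", "l1_2_aspm")


def set_aspm(bdf, enabled, attrs=ASPM_ATTRS, sysfs_root="/sys/bus/pci/devices"):
    """Write 1 or 0 to each existing link/<attr> file of one PCI device.

    Returns the names of the attributes that were actually written.
    """
    link_dir = Path(sysfs_root) / bdf / "link"
    written = []
    for attr in attrs:
        path = link_dir / attr
        if not path.exists():
            continue  # the device does not expose this attribute
        path.write_text("1" if enabled else "0")
        written.append(attr)
    return written


def set_aspm_except(skip_bdf, enabled, sysfs_root="/sys/bus/pci/devices"):
    """Apply set_aspm() to every device with a link/ directory, except one."""
    touched = {}
    for dev in sorted(Path(sysfs_root).iterdir()):
        if dev.name == skip_bdf or not (dev / "link").is_dir():
            continue
        touched[dev.name] = set_aspm(dev.name, enabled, sysfs_root=sysfs_root)
    return touched
```

For example, set_aspm("0000:03:00.0", False, attrs=("l1_aspm",)) is
intended to have the same effect as the "echo 0 | sudo tee
.../link/l1_aspm" workaround, and set_aspm_except("0000:03:00.0", True)
corresponds to the "all devices except 0000:03:00.0" test.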

We have some known issues with saving and restoring ASPM state on
suspend/resume, in particular with ASPM L1 Substates, which are
enabled on this device.

David Box has a patch in the works that should fix one of those
issues:
https://lore.kernel.org/r/20231221011250.191599-1-david.e.box@linux.intel.com

It's not merged yet, but it's possible it might fix or at least be
related to this.  If you try it out, please let us know what happens.

> 03:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
> 	Capabilities: [40] Express (v2) Endpoint, MSI 00
> 		LnkCap:	Port #6, Speed 2.5GT/s, Width x1, ASPM L1, Exit Latency L1 <8us
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> 			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> 	Capabilities: [14c v1] Latency Tolerance Reporting
> 		Max snoop latency: 1048576ns
> 		Max no snoop latency: 1048576ns
> 	Capabilities: [154 v1] L1 PM Substates
> 		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> 			  PortCommonModeRestoreTime=30us PortTPowerOnTime=18us
> 		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> 			   T_CommonMode=0us LTR1.2_Threshold=186368ns
> 		L1SubCtl2: T_PwrOn=150us
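
When comparing dumps like the one above before and after a resume
attempt, it can help to reduce them to the few ASPM fields of interest.
This is a rough sketch under stated assumptions: parse_aspm() is a
hypothetical helper, and the regular expressions are written only
against the lspci -vv text quoted in this message, so other lspci
versions may format these lines differently.

```python
import re


def parse_aspm(lspci_text):
    """Pull a few ASPM-related fields out of `lspci -vv` text for one device."""
    # Tolerate mail-quoted input by dropping a leading ">" from each line.
    text = "\n".join(re.sub(r"^>\s?", "", ln) for ln in lspci_text.splitlines())
    info = {"lnkctl_aspm": None, "l1ss_enabled": [], "t_pwron_us": None}

    # "LnkCtl: ASPM L1 Enabled; ..." or "LnkCtl: ASPM Disabled; ..."
    m = re.search(r"LnkCtl:\s+ASPM (.+?)(?: Enabled)?;", text)
    if m:
        info["lnkctl_aspm"] = m.group(1)

    # Flags on the L1SubCtl1 line that end in "+" are the enabled substates.
    m = re.search(r"L1SubCtl1:\s+(.*)", text)
    if m:
        info["l1ss_enabled"] = [f[:-1] for f in m.group(1).split() if f.endswith("+")]

    # T_PwrOn as programmed in L1SubCtl2, in microseconds.
    m = re.search(r"L1SubCtl2:\s+T_PwrOn=(\d+)us", text)
    if m:
        info["t_pwron_us"] = int(m.group(1))
    return info
```

Running the same function on a dump taken after a successful resume
(for instance with the USB flash drive attached) would show whether any
of these fields changed across suspend.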

> $ grep -F '' /sys/bus/pci/devices/*/link/*pm
> /sys/bus/pci/devices/0000:03:00.0/link/clkpm:1
> /sys/bus/pci/devices/0000:03:00.0/link/l1_1_aspm:1
> /sys/bus/pci/devices/0000:03:00.0/link/l1_1_pcipm:1
> /sys/bus/pci/devices/0000:03:00.0/link/l1_2_aspm:1
> /sys/bus/pci/devices/0000:03:00.0/link/l1_2_pcipm:1
> /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm:1
> /sys/bus/pci/devices/0000:04:00.0/link/clkpm:0
> ...
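
The grep output above can be turned into a per-device table to spot
which devices have any ASPM state enabled. This is a minimal sketch;
parse_link_attrs() and aspm_enabled() are hypothetical names, and the
only input format assumed is the "path:value" lines shown in this
report.

```python
from collections import defaultdict


def parse_link_attrs(grep_output):
    """Map PCI address -> {attribute: int} from "path:value" lines."""
    devices = defaultdict(dict)
    for line in grep_output.splitlines():
        if line.startswith(">"):
            line = line[1:]  # drop a mail-quote marker
        line = line.strip()
        # The path itself contains colons (0000:03:00.0), so split on
        # the last colon only.
        path, sep, value = line.rpartition(":")
        if not sep or not value.isdigit():
            continue  # skips the command line, "..." and other noise
        parts = path.split("/")
        if len(parts) < 3 or parts[-2] != "link":
            continue
        devices[parts[-3]][parts[-1]] = int(value)
    return dict(devices)


def aspm_enabled(devices):
    """Return the addresses that have at least one *_aspm attribute set."""
    return sorted(
        bdf
        for bdf, attrs in devices.items()
        if any(v for k, v in attrs.items() if k.endswith("_aspm"))
    )
```

Applied to the listing above, aspm_enabled() is meant to single out the
devices worth testing one at a time, which matches how 0000:03:00.0 was
isolated by hand.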

Thread overview: 30+ messages
2023-12-25 18:29 [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found) Michael Schaller
2023-12-29  0:26 ` Bjorn Helgaas [this message]
2023-12-29 10:31   ` Michael Schaller
2024-01-01 18:13 ` Bjorn Helgaas
2024-01-01 18:57   ` Michael Schaller
2024-01-01 22:15     ` Bjorn Helgaas
2024-01-02 13:50       ` Michael Schaller
2024-01-03  8:21         ` Linux regression tracking (Thorsten Leemhuis)
2024-01-05  3:25     ` Kai-Heng Feng
2024-01-05 11:18       ` Michael Schaller
2024-01-05 15:51         ` Bjorn Helgaas
2024-01-10  3:43           ` Kai-Heng Feng
2024-01-10 12:39             ` Michael Schaller
2024-03-07  6:51               ` Kai-Heng Feng
2024-03-08 15:49                 ` michael
2024-03-08 16:40                 ` Bjorn Helgaas
2024-01-03 15:41   ` Ilpo Järvinen
2024-01-05  3:14     ` Kai-Heng Feng
2024-01-05 10:29       ` Ilpo Järvinen
2024-01-02 23:25 ` [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()" Bjorn Helgaas
2024-01-02 23:33   ` Kuppuswamy Sathyanarayanan
2024-01-03  0:12     ` Bjorn Helgaas
2024-01-08  8:39   ` Johan Hovold
2024-01-22 10:53     ` PCI/ASPM locking regression in 6.7-final (was: Re: [PATCH] Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()") Johan Hovold
2024-01-22 18:26       ` Bjorn Helgaas
2024-01-23 17:25         ` Johan Hovold
2024-01-23 22:36           ` Bjorn Helgaas
2024-01-24  8:16             ` Johan Hovold
2024-01-30 10:07               ` Johan Hovold
2024-02-09 12:45       ` PCI/ASPM locking regression in 6.7-final Linux regression tracking #update (Thorsten Leemhuis)
