From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Michael Schaller <michael@5challer.de>,
bhelgaas@google.com, kai.heng.feng@canonical.com,
linux-pci@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
regressions@lists.linux.dev, macro@orcam.me.uk,
ajayagarwal@google.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
hkallweit1@gmail.com, michael.a.bottini@linux.intel.com,
johan+linaro@kernel.org
Subject: Re: [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found)
Date: Wed, 3 Jan 2024 17:41:10 +0200 (EET)
Message-ID: <d7e7b133-d373-e850-1f5f-deee8aa86958@linux.intel.com>
In-Reply-To: <20240101181348.GA1684058@bhelgaas>
On Mon, 1 Jan 2024, Bjorn Helgaas wrote:
> On Mon, Dec 25, 2023 at 07:29:02PM +0100, Michael Schaller wrote:
> > Issue:
> > On resume from suspend to RAM there is no output for about 12 seconds, then
> > shortly a blinking cursor is visible in the upper left corner on an
> > otherwise black screen which is followed by a reboot.
> >
> > Setup:
> > * Machine: ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1)
> > * Firmware: 0508 (latest; also tested previous 0505)
> > * OS: Ubuntu 23.10 (except kernel)
> > * Kernel: 6.6.8 (also tested 6.7-rc7; config attached)
> >
> > Debugging summary:
> > * Kernel 5.10.205 isn’t affected.
> > * Bisect identified commit 08d0cc5f34265d1a1e3031f319f594bd1970976c as
> > cause.
> > * PCI device 0000:03:00.0 (Intel 8265 Wifi) causes resume issues as long as
> > ASPM is enabled (default).
> > * The commit message indicates that a quirk could be written to mitigate the
> > issue but I don’t know how to write such a quirk.
> >
> > Confirmed workarounds:
> > * Connect a USB flash drive (no clue why; maybe this causes a delay that
> > lets the resume succeed)
> > * Revert commit 08d0cc5f34265d1a1e3031f319f594bd1970976c (commit seemed
> > intentional; a quirk seems to be the preferred solution)
> > * pcie_aspm=off
> > * pcie_aspm.policy=performance
> > * echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
> >
> > Debugging details:
> > * The resume trigger (power button, keyboard, mouse) doesn’t seem to make
> > any difference.
> > * Double checked that the kernel is configured to *not* reboot on panic.
> > * Double checked that there still isn't any kernel output without quiet and
> > splash.
> > * The issue doesn’t happen if a USB flash drive is connected. The content of
> > the flash drive doesn’t appear to matter. The USB port doesn’t appear to
> > matter.
> > * No information in any logs after the reboot. I suspect the resume from
> > suspend to RAM isn’t getting far enough as that logs could be written.
> > * Kernel 5.10.205 isn’t affected. Kernel 5.15.145, 6.6.8 and 6.7-rc7 are
> > affected.
> > * A kernel bisect has revealed the following commit as cause:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08d0cc5f34265d1a1e3031f319f594bd1970976c
> > * The commit was part of kernel 5.20 and has been backported to 5.15.
> > * The commit mentions that a device-specific quirk could be added in case of
> > new issues.
> > * According to sysfs and lspci only device 0000:03:00.0 (Intel 8265 Wifi)
> > has ASPM enabled by default.
> > * Disabling ASPM for device 0000:03:00.0 lets the resume from suspend to RAM
> > succeed.
> > * Enabling ASPM for all devices except 0000:03:00.0 lets the resume from
> > suspend to RAM succeed.
> > * This would indicate that a quirk is missing for the device 0000:03:00.0
> > (Intel 8265 Wifi) but I have no clue how to write such a quirk or how to get
> > the specifics for such a quirk.
> > * I still have no clue how a USB flash drive plays into all this. Maybe some
> > kind of a timing issue where the connected USB flash drive delays something
> > long enough so that the resume succeeds. Maybe the code removed by commit
> > 08d0cc5f34265d1a1e3031f319f594bd1970976c caused a similar delay. ¯\_(ツ)_/¯
>
> Hmmm. 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> appeared in v6.0, released Oct 2, 2022, so it's been there a while.
>
> But I think the best option is to revert it until this issue is
> resolved. Per the commit log, 08d0cc5f3426 solved two problems:
>
> 1) ASPM config changes done via sysfs are lost if the device power
> state is changed, e.g., typically set to D3hot in .suspend() and
> D0 in .resume().
>
> 2) If L1SS is restored during system resume, that restored state
> would be overwritten.
>
> Problem 2) relates to a patch that is currently reverted (a7152be79b62
> ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
> suspend/resume""), so I don't think reverting 08d0cc5f3426 will make
> this problem worse.
>
> Reverting 08d0cc5f3426 will make 1) a problem again. But my guess is
> ASPM changes via sysfs are fairly unusual and the device probably
> remains functional even though it may use more power because the ASPM
> configuration was lost.
>
> So unless somebody has a counter-argument, I plan to queue a revert of
> 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") for
> v6.7.
Hi,

I cannot understand how 1) even occurs. AFAICT, nothing that
pcie_aspm_pm_state_change() calls into overwrites link->aspm_disable, which
is the variable storing user input from sysfs. So how are the changes made
via sysfs lost?
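
To make the reasoning concrete, here is a minimal userspace sketch of the
behaviour I would expect. It is a toy model, not the kernel code: every
toy_* name is hypothetical, and the only things taken from aspm.c are the
idea of a per-link aspm_disable mask and the assumption that a power state
change merely reprograms the link from the policy without writing that mask.

```c
#include <stdint.h>

/* Toy ASPM state bits, loosely modeled on the kernel's state flags. */
#define TOY_ASPM_L0S 0x1u
#define TOY_ASPM_L1  0x2u
#define TOY_ASPM_ALL (TOY_ASPM_L0S | TOY_ASPM_L1)

/* Hypothetical, heavily reduced stand-in for the per-link ASPM state. */
struct toy_link {
	uint32_t aspm_capable; /* states the link can support */
	uint32_t aspm_disable; /* states the user disabled via sysfs */
	uint32_t aspm_enabled; /* states currently programmed */
};

/* Program the link: policy request, limited by capability and user disables. */
static void toy_config_link(struct toy_link *link, uint32_t policy_state)
{
	link->aspm_enabled =
		policy_state & link->aspm_capable & ~link->aspm_disable;
}

/* Models a sysfs write such as "echo 0 > .../link/l1_aspm". */
static void toy_sysfs_disable(struct toy_link *link, uint32_t state,
			      uint32_t policy_state)
{
	link->aspm_disable |= state;
	toy_config_link(link, policy_state);
}

/* Models the power state change hook: reconfigure, never touch aspm_disable. */
static void toy_pm_state_change(struct toy_link *link, uint32_t policy_state)
{
	toy_config_link(link, policy_state);
}

/* Returns 1 if L1 is enabled again after a simulated suspend/resume cycle. */
static int toy_l1_enabled_after_resume(void)
{
	struct toy_link link = { .aspm_capable = TOY_ASPM_ALL };

	toy_config_link(&link, TOY_ASPM_ALL);                /* boot: L0s + L1 */
	toy_sysfs_disable(&link, TOY_ASPM_L1, TOY_ASPM_ALL); /* user disables L1 */
	link.aspm_enabled = 0;                               /* state lost in D3hot */
	toy_pm_state_change(&link, TOY_ASPM_ALL);            /* back to D0 */

	return (link.aspm_enabled & TOY_ASPM_L1) != 0;
}
```

In this model the user's L1 disable survives the power state change because
the hook only recomputes aspm_enabled. If the real code loses the sysfs
setting, something outside this simplified picture must be involved.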
--
i.