From: Kalle Valo <kvalo@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
x86@kernel.org, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
Jeff Johnson <quic_jjohnson@quicinc.com>
Subject: Re: [regression] suspend stress test stalls within 30 minutes
Date: Fri, 17 May 2024 21:37:44 +0300 [thread overview]
Message-ID: <871q60ffnr.fsf@kernel.org> (raw)
In-Reply-To: <35086bb6-ee11-4ac6-b8ba-5fab20065b54@intel.com> (Dave Hansen's message of "Fri, 17 May 2024 10:22:32 -0700")
Dave Hansen <dave.hansen@intel.com> writes:
> On 5/17/24 10:15, Kalle Valo wrote:
>> Borislav Petkov <bp@alien8.de> writes:
>>> There might be some #GP or so in the logs in case we've managed to f*ck
>>> up microcode application which emulates that IBRS MSR bit and the
>>> actual toggling or so when suspending...
>> So the weird part is that when the bug happens (ie. suspend stalls) I
>> can access the box normally using ssh and I don't see anything special
>> in dmesg. Below is a full copy of dmesg output after the suspend
>> stalled. Do note that I copied this dmesg before I updated microcode so
>> it will still show the old microcode version.
>>
>> Let me know if you need more info.
>
> Kalle, could you remind us what we're seeing here? Does this show 30
> working rtcwake tests followed by a failure at "rtcwake test 31" where
> the system failed to suspend?
Correct. So basically what I do is that I start the nuc box, ssh into it
and run:
sudo su
for i in {1..400}; do echo "rtcwake test $i" > /dev/kmsg; rtcwake -m mem -s 10; sleep 10; done
Here's the start of first loop:
[ 54.945105] rtcwake test 1
[ 55.162603] PM: suspend entry (deep)
[ 55.168875] Filesystems sync: 0.006 seconds
[ 55.182427] Freezing user space processes
[ 55.191498] Freezing user space processes completed (elapsed 0.008 seconds)
[ 55.191711] OOM killer disabled.
[ 55.191805] Freezing remaining freezable tasks
[ 55.193507] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 55.194056] printk: Suspending console(s) (use no_console_suspend to debug)
[ 55.244962] e1000e: EEE TX LPI TIMER: 00000011
Now I leave the box to run it's test. I come back later to see that the
for loop has stalled and the box is not going into suspend gain. I ssh
into the machine and see this in dmesg:
[ 449.061525] rtcwake test 31
[ 449.176854] PM: suspend entry (deep)
[ 449.179072] Filesystems sync: 0.002 seconds
[ 632.961545] loop0: detected capacity change from 0 to 8
[ 637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left
[ 738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left
[ 932.951032] loop0: detected capacity change from 0 to 8
[ 1232.962610] loop0: detected capacity change from 0 to 8
The system tried go into suspend but after the "Filesystems sync:"
message nothing happened for 10 minutes. I assume loop0 messages are
from some Ubuntu daemon, maybe snapd or similar. I have seen them
always, they are not specific to this issue.
And now comes the really strange part: if I run 'shutdown -h now' the
suspend continues apparently normally. Afterwards I checked from
/var/log/kern.log and didn't see any errors:
May 17 13:34:38 nuc2 kernel: [ 449.176854] PM: suspend entry (deep)
May 17 13:34:38 nuc2 kernel: [ 449.179072] Filesystems sync: 0.002 seconds
May 17 13:37:42 nuc2 kernel: [ 632.961545] loop0: detected capacity change from 0 to 8
May 17 13:37:46 nuc2 kernel: [ 637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left
May 17 13:39:28 nuc2 kernel: [ 738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left
May 17 13:42:42 nuc2 kernel: [ 932.951032] loop0: detected capacity change from 0 to 8
May 17 13:47:42 nuc2 kernel: [ 1232.962610] loop0: detected capacity change from 0 to 8
May 17 13:52:45 nuc2 kernel: [ 1527.307800] Freezing user space processes
May 17 13:52:45 nuc2 kernel: [ 1527.334585] Freezing user space processes completed (elapsed 0.024 seconds)
May 17 13:52:45 nuc2 kernel: [ 1527.336094] OOM killer disabled.
May 17 13:52:45 nuc2 kernel: [ 1527.337562] Freezing remaining freezable tasks
May 17 13:52:45 nuc2 kernel: [ 1527.340324] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
May 17 13:52:45 nuc2 kernel: [ 1527.342596] printk: Suspending console(s) (use no_console_suspend to debug)
May 17 13:52:45 nuc2 kernel: [ 1527.380121] e1000e: EEE TX LPI TIMER: 00000011
May 17 13:52:45 nuc2 kernel: [ 1527.474981] ACPI: EC: interrupt blocked
May 17 13:52:45 nuc2 kernel: [ 1527.540696] ACPI: PM: Preparing to enter system sleep state S3
May 17 13:52:45 nuc2 kernel: [ 1527.567302] ACPI: EC: event blocked
May 17 13:52:45 nuc2 kernel: [ 1527.567307] ACPI: EC: EC stopped
May 17 13:52:45 nuc2 kernel: [ 1527.567311] ACPI: PM: Saving platform NVS memory
May 17 13:52:45 nuc2 kernel: [ 1527.567412] Disabling non-boot CPUs ...
While writing this email I found another way to continue the suspend
after a stall: terminate rtcwake with CTRL-C in the ssh session running
the for loop. That explains why 'sudo shutdown -h now' makes the suspend
go forward, it most likely kills the stalled rtcwake process.
next prev parent reply other threads:[~2024-05-17 18:37 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-11 18:22 [regression] suspend stress test stalls within 30 minutes Kalle Valo
2024-05-11 18:48 ` Borislav Petkov
2024-05-11 18:49 ` Borislav Petkov
2024-05-11 20:26 ` Kalle Valo
2024-05-13 19:58 ` Kalle Valo
2024-05-14 13:17 ` Kalle Valo
2024-05-14 16:05 ` Borislav Petkov
2024-05-14 17:36 ` Pawan Gupta
2024-05-17 17:15 ` Kalle Valo
2024-05-17 17:22 ` Dave Hansen
2024-05-17 18:37 ` Kalle Valo [this message]
2024-05-17 18:48 ` Dave Hansen
2024-05-17 18:58 ` Kalle Valo
2024-05-17 19:08 ` Rafael J. Wysocki
2024-05-17 19:00 ` Rafael J. Wysocki
2024-05-22 1:52 ` Len Brown
2024-05-17 17:26 ` Borislav Petkov
2024-05-17 18:22 ` Kalle Valo
2024-05-14 16:10 ` Dave Hansen
2024-05-15 7:22 ` Pawan Gupta
2024-05-15 7:44 ` Borislav Petkov
2024-05-15 16:27 ` Pawan Gupta
2024-05-15 16:47 ` Kalle Valo
2024-05-16 7:03 ` Pawan Gupta
2024-05-16 14:25 ` Pawan Gupta
2024-05-16 14:32 ` Dave Hansen
2024-05-16 15:41 ` Pawan Gupta
2024-05-17 17:41 ` Kalle Valo
2024-05-17 18:31 ` Pawan Gupta
2024-05-17 17:23 ` Kalle Valo
2024-05-17 17:19 ` Kalle Valo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871q60ffnr.fsf@kernel.org \
--to=kvalo@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=quic_jjohnson@quicinc.com \
--cc=rafael@kernel.org \
--cc=regressions@lists.linux.dev \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).