From: Kalle Valo <kvalo@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>,
	 Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	 "Rafael J. Wysocki" <rafael@kernel.org>,
	x86@kernel.org,  linux-pm@vger.kernel.org,
	 linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
	 Jeff Johnson <quic_jjohnson@quicinc.com>
Subject: Re: [regression] suspend stress test stalls within 30 minutes
Date: Fri, 17 May 2024 21:37:44 +0300
Message-ID: <871q60ffnr.fsf@kernel.org>
In-Reply-To: <35086bb6-ee11-4ac6-b8ba-5fab20065b54@intel.com> (Dave Hansen's message of "Fri, 17 May 2024 10:22:32 -0700")

Dave Hansen <dave.hansen@intel.com> writes:

> On 5/17/24 10:15, Kalle Valo wrote:
>> Borislav Petkov <bp@alien8.de> writes:
>>> There might be some #GP or so in the logs in case we've managed to f*ck
>>> up microcode application which emulates that IBRS MSR bit and the
>>> actual toggling or so when suspending...
>> So the weird part is that when the bug happens (ie. suspend stalls) I
>> can access the box normally using ssh and I don't see anything special
>> in dmesg. Below is a full copy of dmesg output after the suspend
>> stalled. Do note that I copied this dmesg before I updated microcode so
>> it will still show the old microcode version.
>> 
>> Let me know if you need more info.
>
> Kalle, could you remind us what we're seeing here?  Does this show 30
> working rtcwake tests followed by a failure at "rtcwake test 31" where
> the system failed to suspend?

Correct. Basically, I boot the NUC box, ssh into it and run:

sudo su
for i in {1..400}; do echo "rtcwake test $i" > /dev/kmsg; rtcwake -m mem -s 10; sleep 10; done

Here's the start of the first loop:

[   54.945105] rtcwake test 1
[   55.162603] PM: suspend entry (deep)
[   55.168875] Filesystems sync: 0.006 seconds
[   55.182427] Freezing user space processes
[   55.191498] Freezing user space processes completed (elapsed 0.008 seconds)
[   55.191711] OOM killer disabled.
[   55.191805] Freezing remaining freezable tasks
[   55.193507] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   55.194056] printk: Suspending console(s) (use no_console_suspend to debug)
[   55.244962] e1000e: EEE TX LPI TIMER: 00000011

Now I leave the box to run its test. I come back later to see that the
for loop has stalled and the box is not going into suspend again. I ssh
into the machine and see this in dmesg:

[  449.061525] rtcwake test 31
[  449.176854] PM: suspend entry (deep)
[  449.179072] Filesystems sync: 0.002 seconds
[  632.961545] loop0: detected capacity change from 0 to 8
[  637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left
[  738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left
[  932.951032] loop0: detected capacity change from 0 to 8
[ 1232.962610] loop0: detected capacity change from 0 to 8

The system tried to go into suspend, but after the "Filesystems sync:"
message nothing happened for 10 minutes. I assume the loop0 messages are
from some Ubuntu daemon, maybe snapd or similar. I have always seen
them; they are not specific to this issue.

And now comes the really strange part: if I run 'shutdown -h now', the
suspend apparently continues normally. Afterwards I checked
/var/log/kern.log and didn't see any errors:

May 17 13:34:38 nuc2 kernel: [  449.176854] PM: suspend entry (deep)
May 17 13:34:38 nuc2 kernel: [  449.179072] Filesystems sync: 0.002 seconds
May 17 13:37:42 nuc2 kernel: [  632.961545] loop0: detected capacity change from 0 to 8
May 17 13:37:46 nuc2 kernel: [  637.003835] gpu-manager (6735) used greatest stack depth: 23808 bytes left
May 17 13:39:28 nuc2 kernel: [  738.799026] kworker/dying (87) used greatest stack depth: 23488 bytes left
May 17 13:42:42 nuc2 kernel: [  932.951032] loop0: detected capacity change from 0 to 8
May 17 13:47:42 nuc2 kernel: [ 1232.962610] loop0: detected capacity change from 0 to 8
May 17 13:52:45 nuc2 kernel: [ 1527.307800] Freezing user space processes
May 17 13:52:45 nuc2 kernel: [ 1527.334585] Freezing user space processes completed (elapsed 0.024 seconds)
May 17 13:52:45 nuc2 kernel: [ 1527.336094] OOM killer disabled.
May 17 13:52:45 nuc2 kernel: [ 1527.337562] Freezing remaining freezable tasks
May 17 13:52:45 nuc2 kernel: [ 1527.340324] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
May 17 13:52:45 nuc2 kernel: [ 1527.342596] printk: Suspending console(s) (use no_console_suspend to debug)
May 17 13:52:45 nuc2 kernel: [ 1527.380121] e1000e: EEE TX LPI TIMER: 00000011
May 17 13:52:45 nuc2 kernel: [ 1527.474981] ACPI: EC: interrupt blocked
May 17 13:52:45 nuc2 kernel: [ 1527.540696] ACPI: PM: Preparing to enter system sleep state S3
May 17 13:52:45 nuc2 kernel: [ 1527.567302] ACPI: EC: event blocked
May 17 13:52:45 nuc2 kernel: [ 1527.567307] ACPI: EC: EC stopped
May 17 13:52:45 nuc2 kernel: [ 1527.567311] ACPI: PM: Saving platform NVS memory
May 17 13:52:45 nuc2 kernel: [ 1527.567412] Disabling non-boot CPUs ...
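
The gap between "Filesystems sync:" and the next "Freezing user space
processes" line is the signature of the stall in the log above. A minimal
sketch for spotting it automatically in dmesg or kern.log output follows.
The function name suspend_gaps and the 60-second default threshold are
assumptions made for illustration, not part of any existing tool.

```shell
#!/bin/sh
# suspend_gaps is a hypothetical helper: it reads kernel log lines on stdin
# and prints one line per suspend attempt whose delay between
# "Filesystems sync:" and "Freezing user space processes" exceeds the limit.
# $1 = limit in seconds (assumed default: 60).
suspend_gaps() {
    awk -v limit="${1:-60}" '
        # Return the [seconds.micros] kernel timestamp of a line as a number,
        # or -1 if the line has no such timestamp.
        function ts(line,   s) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/)) {
                s = substr(line, RSTART + 1, RLENGTH - 2)
                return s + 0
            }
            return -1
        }
        # Remember when the sync finished for the current suspend attempt.
        /Filesystems sync:/ { start = ts($0); pending = 1; next }
        # The freezer starting marks the end of the interval of interest.
        /Freezing user space processes *$/ && pending {
            gap = ts($0) - start
            if (gap > limit)
                printf "stall: %d s between sync and freeze\n", gap
            pending = 0
        }
    '
}
```

It could be used as "dmesg | suspend_gaps 60" or
"suspend_gaps 60 < /var/log/kern.log". An attempt that never reaches the
freezer at all produces no output, so this only reports stalls that
eventually continued.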

While writing this email I found another way to continue the suspend
after a stall: terminate rtcwake with CTRL-C in the ssh session running
the for loop. That explains why 'shutdown -h now' makes the suspend
go forward: it most likely kills the stalled rtcwake process.
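
Since interrupting rtcwake unblocks the suspend, the test loop could flag
a stalled cycle by itself. The following is only a sketch under stated
assumptions: run_cycle is a hypothetical wrapper, the 120-second deadline
is an arbitrary choice, and it is untested whether SIGINT delivered by
timeout(1) affects a stalled rtcwake the same way as CTRL-C from the
terminal does.

```shell
#!/bin/sh
# run_cycle is a hypothetical wrapper: run one command under a deadline and
# report whether it finished on its own or had to be interrupted.
# $1 = deadline in seconds, remaining arguments = command to run.
run_cycle() {
    deadline=$1
    shift
    # timeout(1) from GNU coreutils sends the given signal when the deadline
    # expires and then exits with status 124.
    timeout -s INT "$deadline" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "stalled"
    else
        echo "finished rc=$rc"
    fi
    return "$rc"
}

# Possible use in the stress loop (as root; stops at the first stall or error):
#   for i in $(seq 1 400); do
#       echo "rtcwake test $i" > /dev/kmsg
#       run_cycle 120 rtcwake -m mem -s 10 || break
#       sleep 10
#   done
```

Stopping at the first stalled cycle keeps the relevant dmesg lines at the
end of the log instead of burying them under later iterations.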
