From: Kalle Valo <kvalo@kernel.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
	Jeff Johnson <quic_jjohnson@quicinc.com>
Subject: [regression] suspend stress test stalls within 30 minutes
Date: Sat, 11 May 2024 21:22:43 +0300
Message-ID: <87o79cjjik.fsf@kernel.org>

Hi,

I have a weird problem with suspend. Somewhere around v6.9-rc4 (I'm not sure
exactly when) I started seeing our ath11k Wi-Fi driver suspend tests
fail randomly. I have been investigating this for some time and it now
looks like it's somehow related to the CPU_MITIGATIONS Kconfig option and
has nothing to do with wireless.

The simplified test case I have is to run suspend and resume in a loop
like this (Wi-Fi modules are not loaded):

for i in {1..400}; do echo "rtcwake test $i" > /dev/kmsg; rtcwake -m mem -s 10; sleep 10; done

If CPU_MITIGATIONS is enabled I usually see suspend stalling within 30
minutes. If I disable CPU_MITIGATIONS using menuconfig I don't see the bug.
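
For longer runs the one-liner above could be expanded into a small
script along these lines. This is only a sketch and not the test that
was actually run: STALL_LIMIT, the cycle_stalled helper and the --run
switch are made-up names for illustration, and the 120 second threshold
is an assumption.

```shell
#!/bin/sh
# Sketch only. STALL_LIMIT, cycle_stalled and the --run switch are
# illustrative assumptions, not part of the original test.

CYCLES=${CYCLES:-400}            # same count as the one-liner
WAKE_SECS=${WAKE_SECS:-10}       # rtcwake -s value and the pause between cycles
STALL_LIMIT=${STALL_LIMIT:-120}  # assumed threshold (seconds) for a stalled cycle

# Succeeds when a cycle from $1 to $2 (epoch seconds) took longer than $3 seconds.
cycle_stalled() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

run_test() {
    i=1
    while [ "$i" -le "$CYCLES" ]; do
        echo "rtcwake test $i" > /dev/kmsg
        start=$(date +%s)
        rtcwake -m mem -s "$WAKE_SECS"
        end=$(date +%s)
        # rtcwake blocks while suspend is stalled, so this line can only be
        # logged after the stalled cycle finally completes.
        if cycle_stalled "$start" "$end" "$STALL_LIMIT"; then
            echo "rtcwake test $i: cycle took $((end - start)) seconds" > /dev/kmsg
        fi
        sleep "$WAKE_SECS"
        i=$((i + 1))
    done
}

# Needs root (writes to /dev/kmsg and suspends the machine), so nothing
# runs unless explicitly asked for.
if [ "${1:-}" = "--run" ]; then
    run_test
fi
```

Run as root with --run as the only argument. A stalled cycle then leaves
an extra "cycle took N seconds" line in the log next to its "rtcwake
test N" marker, which makes the failing iteration easier to find
afterwards.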

When the bug happens, I see this in kernel.log and suspend stalls:

[  361.716546] PM: suspend entry (deep)
[  361.722558] Filesystems sync: 0.005 seconds
[  624.222721] kworker/dying (2519) used greatest stack depth: 22240 bytes left
[  633.897857] loop0: detected capacity change from 0 to 8

And if I don't do anything for several minutes nothing happens. What is
really strange is that once I run 'sudo shutdown -h now' then suspend
somehow immediately unstalls and continues with suspend, like this:

[  847.631147] Freezing user space processes
[  847.649590] Freezing user space processes completed (elapsed 0.016 seconds)
[  847.650710] OOM killer disabled.
[  847.651799] Freezing remaining freezable tasks
[  847.654618] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  847.663757] printk: Suspending console(s) (use no_console_suspend to debug)
[  847.710060] e1000e: EEE TX LPI TIMER: 00000011
[  847.852370] ACPI: EC: interrupt blocked
[  847.899416] ACPI: PM: Preparing to enter system sleep state S3
[  847.933433] ACPI: EC: event blocked
[  847.933437] ACPI: EC: EC stopped
[  847.933441] ACPI: PM: Saving platform NVS memory
[  847.933817] Disabling non-boot CPUs ...

And now the system goes into suspend state as it should. If I then press
the power button on the device, the system resumes and after that
shuts down (as expected, because I ran the shutdown command). This
behaviour is consistent: I see it every time the suspend bug happens.

The test setup is a several-year-old Intel NUC x86 system, more info
below.

Any recommendations on how I should debug this further? I tried to bisect
this earlier but that failed, most likely because I hadn't yet realised
that this is related to CPU_MITIGATIONS and might have messed up the
.config settings during the bisect.

Kalle

DMI: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021

Ubuntu 20.04.6 LTS (GNU/Linux 6.9.0-rc7+ x86_64)

systemd 245.4-4ubuntu3.23 running in system mode. (+PAM +AUDIT +SELINUX
+IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS
+ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2
default-hierarchy=hybrid)

I verified that I see this on the latest commit from Linus' tree:

cf87f46fd34d Merge tag 'drm-fixes-2024-05-11' of https://gitlab.freedesktop.org/drm/kernel

Here's the diff between broken and working .config:

$ diffconfig broken.config works.config 
-CALL_PADDING y
-CALL_THUNKS y
-CALL_THUNKS_DEBUG n
-HAVE_CALL_THUNKS y
-MITIGATION_CALL_DEPTH_TRACKING y
-MITIGATION_GDS_FORCE y
-MITIGATION_IBPB_ENTRY y
-MITIGATION_IBRS_ENTRY y
-MITIGATION_PAGE_TABLE_ISOLATION y
-MITIGATION_RETHUNK y
-MITIGATION_RETPOLINE y
-MITIGATION_RFDS y
-MITIGATION_SLS y
-MITIGATION_SPECTRE_BHI y
-MITIGATION_SRSO y
-MITIGATION_UNRET_ENTRY y
-PREFIX_SYMBOLS y
 CPU_MITIGATIONS y -> n
