* Regression: system freeze on resume from suspend introduced by printk per-console suspended state
@ 2025-12-21 22:42 ysard_git
2025-12-23 6:20 ` John Ogness
0 siblings, 1 reply; 16+ messages in thread
From: ysard_git @ 2025-12-21 22:42 UTC (permalink / raw)
To: linux-kernel; +Cc: john.ogness, pmladek, senozhatsky
Hello,
I would like to report a suspend/resume regression which I bisected down
to a specific printk commit.
Summary
=======
On an ASUS laptop, resuming from suspend leads to a complete system freeze:
black screen, no keyboard response. The system must be powered off forcibly.
This regression is reproducible and was introduced by the following commit:
9e70a5e109a4a ("printk: Add per-console suspended state")
Hardware / environment
======================
- Laptop: G75VX ASUS
- CPU Arch: x86_64
- BIOS Information
Version: G75VX.206
Release Date: 02/27/2013
- GPU: NVIDIA GeForce GTX 670MX (standalone board only, no integrated chip, no optimus)
Driver Version: 470.256.02
- Suspend mode: suspend-to-RAM (S3)
- Distribution: Debian Testing derivative (but the regression is reproduced with mainline kernel builds)
- Last working kernel from Debian:
Linux version 6.5.0-5-amd64 (debian-kernel@lists.debian.org)
(gcc-13 (Debian 13.2.0-7) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41)
#1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1 (2023-11-29)
- Kernel config: based on defconfig / Debian config
- Taint flag: yes, only Nvidia proprietary drivers are loaded and compiled via DKMS
Symptoms
========
- On resume from suspend to RAM
- Screen remains black
- No visible kernel panic (no logs)
- System is completely frozen (no VT switch, no Caps lock nor Num lock responses)
- Power cycle required
Regression details
==================
Suspend/resume works correctly with kernels prior to this commit.
After this commit, resume consistently results in a hard freeze.
I performed a git bisect on the upstream kernel repository. The result is:
9e70a5e109a4a23367810de09be826c52d27ee2f is the first bad commit
Commit details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e70a5e109a4a23367810de09be826c52d27ee2f
Reverting this commit on a later kernel (v6.7 0dd3ee31125508cd67f7e7172247f05b7fd1753a)
restores correct suspend/resume behavior on this machine.
The problem is still present on stable v6.18.2.
Additional notes
================
By playing around with the tests in /sys/power/, I can say that the problem only
occurs during a real S3 and not a simulated one.
No problem on resume:
$ echo core > /sys/power/pm_test
$ echo deep > /sys/power/mem_sleep
$ systemctl suspend
Freeze on resume:
$ echo none > /sys/power/pm_test
$ echo deep > /sys/power/mem_sleep
$ systemctl suspend
The issue appears hardware-dependent and may be related to the interaction
between printk console usability and resume ordering. The system likely
blocks silently during resume before the graphics stack becomes usable.
If additional debugging, logs, or testing are needed, I am available to
provide them.
Thank you for your time and for any guidance on how to proceed.
Best regards.
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2025-12-21 22:42 Regression: system freeze on resume from suspend introduced by printk per-console suspended state ysard_git @ 2025-12-23 6:20 ` John Ogness [not found] ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12> 0 siblings, 1 reply; 16+ messages in thread From: John Ogness @ 2025-12-23 6:20 UTC (permalink / raw) To: ysard_git, linux-kernel; +Cc: pmladek, senozhatsky Hi, On 2025-12-21, ysard_git@gmx.fr wrote: > I would like to report a suspend/resume regression which I bisected down > to a specific printk commit. Thanks for the report and your efforts to isolate the commit! Comments below... > Summary > ======= > > On an ASUS laptop, resuming from suspend leads to a complete system freeze: > black screen, no keyboard response. The system must be powered off forcibly. > > This regression is reproducible and was introduced by the following commit: > > 9e70a5e109a4a ("printk: Add per-console suspended state") > > Hardware / environment > ====================== > > - Laptop: G75VX ASUS > - CPU Arch: x86_64 > - BIOS Information > Version: G75VX.206 > Release Date: 02/27/2013 > - GPU: NVIDIA GeForce GTX 670MX (standalone board only, no integrated chip, no optimus) > Driver Version: 470.256.02 > - Suspend mode: suspend-to-RAM (S3) > - Distribution: Debian Testing derivative (but the regression is reproduced with mainline kernel builds) > - Last working kernel from Debian: > Linux version 6.5.0-5-amd64 (debian-kernel@lists.debian.org) > (gcc-13 (Debian 13.2.0-7) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41) > #1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1 (2023-11-29) > - Kernel config: based on defconfig / Debian config > - Taint flag: yes, only Nvidia proprietary drivers are loaded and compiled via DKMS > > Symptoms > ======== > > - On resume from suspend to RAM > - Screen remains black > - No visible kernel panic (no logs) > - System is completely frozen (no VT switch, no Caps lock nor Num lock responses) > - Power cycle required > > Regression details > ================== > > Suspend/resume works correctly with kernels prior to this commit. > After this commit, resume consistently results in a hard freeze. > > I performed a git bisect on the upstream kernel repository. The result is: > > 9e70a5e109a4a23367810de09be826c52d27ee2f is the first bad commit > > Commit details: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e70a5e109a4a23367810de09be826c52d27ee2f > > Reverting this commit on a later kernel (v6.7 0dd3ee31125508cd67f7e7172247f05b7fd1753a) > restores correct suspend/resume behavior on this machine. Interesting. This commit removes an ugly console_lock hack to supress console flushing during the suspend/resume window. > The problem is still present on stable v6.18.2. It is great that you can reproduce it with 6.18.2. Are you also able to test 6.19-rc1 and reproduce it? It would be nice to attack the problem from the latest code first. > Additional notes > ================ > > By playing around with the tests in /sys/power/, I can say that the problem only > occurs during a real S3 and not a simulated one. > > No problem on resume: > $ echo core > /sys/power/pm_test > $ echo deep > /sys/power/mem_sleep > $ systemctl suspend > > Freeze on resume: > $ echo none > /sys/power/pm_test > $ echo deep > /sys/power/mem_sleep > $ systemctl suspend > > The issue appears hardware-dependent and may be related to the interaction > between printk console usability and resume ordering. The system likely > blocks silently during resume before the graphics stack becomes usable. > > If additional debugging, logs, or testing are needed, I am available to > provide them. It would be helpful to know if you are doing anything special (like using "no_console_suspend") or have multiple consoles. Could you send the output of: $ cat /proc/consoles and $ cat /proc/cmdline Once we have this information, we can start to dig deeper. John Ogness ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12>]
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state [not found] ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12> @ 2026-01-08 0:05 ` pv 2026-01-08 9:43 ` John Ogness 0 siblings, 1 reply; 16+ messages in thread From: pv @ 2026-01-08 0:05 UTC (permalink / raw) To: ysard_git; +Cc: John Ogness, linux-kernel, pmladek, senozhatsky Hi, I am resending the message from December 23 because lkml considered it as spam. First, thank you for your response. > It is great that you can reproduce it with 6.18.2. Are you also able to > test 6.19-rc1 and reproduce it? It would be nice to attack the problem > from the latest code first. The code in 6.18.2 has changed enough that I don't dare change it as I did in 6.7. After compiling and installing 6.19-rc1 (8f0b4cce4481fb22653697cced8d0d04027cb1e8), I can confirm that the freeze is still present. > It would be helpful to know if you are doing anything special (like > using "no_console_suspend") or have multiple consoles. Could you send > the output of: > > $ cat /proc/consoles > > and > > $ cat /proc/cmdline > > Once we have this information, we can start to dig deeper. $ cat /proc/consoles tty0 -WU (EC p ) 4:2 $ cat /proc/cmdline BOOT_IMAGE=/boot/vmlinuz-6.5.0-5-amd64 root=/dev/mapper/vg0-root ro cryptdevice=UUID=397b50bf-8142-4456-8af2-bf39dcc8f732:cryptdisk:allow-discards mitigations=off quiet splash It's similar for 6.19.0-rc1. $ cat /sys/module/printk/parameters/console_suspend Y Thank you for your time. Best regards. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-08 0:05 ` pv @ 2026-01-08 9:43 ` John Ogness 2026-01-23 7:44 ` ysard 0 siblings, 1 reply; 16+ messages in thread From: John Ogness @ 2026-01-08 9:43 UTC (permalink / raw) To: pv, ysard_git; +Cc: linux-kernel, pmladek, senozhatsky [-- Attachment #1: Type: text/plain, Size: 1709 bytes --] On 2026-01-08, pv <pierre.vignet@caramail.fr> wrote: > The code in 6.18.2 has changed enough that I don't dare change it as I did in 6.7. > After compiling and installing 6.19-rc1 (8f0b4cce4481fb22653697cced8d0d04027cb1e8), > I can confirm that the freeze is still present. Thanks for confirmation. Would you be willing to try a debugging patch? I have attached a patch (based on 6.19-rc4). It should restore the old console_lock behavior during suspend/resume. Assuming this works for you, it also adds some debugging information so that we can figure out who is locking the console. Please compile with CONFIG_PRINTK_CALLER=y. After suspend/resume, locate the debug lines with: # dmesg | grep printk It will probably look something like this: # dmesg | grep printk [ 12.085727][ T843] printk: Suspending console(s) (use no_console_suspend to debug) [ 12.236199][ T843] printk: console_suspend [ 17.581734][ T67] printk: console_lock [ 17.583711][ T67] printk: console_unlock [ 17.644819][ T843] printk: console_lock [ 17.644823][ T843] printk: console_unlock [ 17.644827][ T843] printk: console_resume If you have CONFIG_PRINTK_CALLER enabled (as I do here), it will show the tasks locking the console. In my case it is PID 67 and 843. It would be nice to know who these are. For me it is: # cat /proc/67/comm kworker/2:1-mm_percpu_wq # cat /proc/843/comm sh So in summary, I would like to know: 1. Does suspend/resume work on 6.19-rc4 with this patch applied? 2. Output of "dmesg | grep printk" (after suspend/resume) 3. Which tasks were locking the console between console_suspend and console_resume? Your help in debugging this is greatly appreciated. John Ogness [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-printk-Debug-new-vs.-old-suspend-resume-behavior.patch --] [-- Type: text/x-diff, Size: 3581 bytes --] From f3bb7d49b59b61a30f2fefc2b241d3f535b2afd2 Mon Sep 17 00:00:00 2001 From: John Ogness <john.ogness@linutronix.de> Date: Thu, 8 Jan 2026 09:31:51 +0000 Subject: [PATCH] printk: Debug new vs. old suspend/resume behavior This is just for debugging. It should restore the old console_lock behavior for suspend/resume and also adds some debugging information. Please compile with CONFIG_PRINTK_CALLER=y so that we can see which tasks are locking/unlocking the console during suspend/resume. --- kernel/printk/printk.c | 61 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1d765ad242b8..fd69cab4368e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip) */ static int console_locked; +static int console_suspended; + +int vprintk_store(int facility, int level, + const struct dev_printk_info *dev_info, + const char *fmt, va_list args); + +/* Helper function to store-only. */ +static void printk_store(const char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args); + va_end(args); +} + /* * Array of consoles built from command line options (console=) */ @@ -2841,6 +2857,15 @@ void console_lock(void) msleep(1000); down_console_sem(); + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * Keep console locked, but do not touch + * @console_locked or @console_may_schedule. + * (Although they will both be 1 here anyway.) + */ + return; + } console_locked = 1; console_may_schedule = 1; } @@ -2861,6 +2886,15 @@ int console_trylock(void) return 0; if (down_trylock_console_sem()) return 0; + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * The lock was acquired, but unlock directly and report + * failure. Here console_locked=1 and console_may_schedule=1. + */ + up_console_sem(); + return 0; + } console_locked = 1; console_may_schedule = 0; return 1; @@ -3354,6 +3388,16 @@ void console_unlock(void) { struct console_flush_type ft; + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * Simply unlock directly. + * Here console_locked=1 and console_may_schedule=1. + */ + up_console_sem(); + return; + } + printk_get_console_flush_type(&ft); if (ft.legacy_direct) __console_flush_and_unlock(); @@ -3559,6 +3603,7 @@ struct tty_driver *console_device(int *index) void console_suspend(struct console *console) { __pr_flush(console, 1000, true); + console_list_lock(); console_srcu_write_flags(console, console->flags & ~CON_ENABLED); console_list_unlock(); @@ -3570,6 +3615,12 @@ void console_suspend(struct console *console) * using the port. */ synchronize_srcu(&console_srcu); + + console_lock(); + console_suspended = 1; + printk_store(KERN_INFO "printk: %s\n", __func__); + /* Unlock directly (i.e. without clearing @console_locked). */ + up_console_sem(); } EXPORT_SYMBOL(console_suspend); @@ -3597,6 +3648,16 @@ void console_resume(struct console *console) defer_console_output(); __pr_flush(console, 1000, true); + + down_console_sem(); + printk_store(KERN_INFO "printk: %s\n", __func__); + console_suspended = 0; + /* + * Perform a regular unlock. + * Here console_locked=1 and console_may_schedule=1. + * @console_unlocked will be cleared. + */ + console_unlock(); } EXPORT_SYMBOL(console_resume); -- 2.30.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-08 9:43 ` John Ogness @ 2026-01-23 7:44 ` ysard 2026-01-23 12:19 ` Petr Mladek 0 siblings, 1 reply; 16+ messages in thread From: ysard @ 2026-01-23 7:44 UTC (permalink / raw) To: John Ogness; +Cc: linux-kernel, pmladek, senozhatsky Good evening, thank you for your reply and the patch. Summary ====== The patch does not seem to have any effect on the problem, *but* I have found a way to temporarily fix the freeze by disabling the `nvidia-suspend` service. Additional info for diagnostics =============================== $ cat /proc/driver/nvidia/version NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.256.02 Thu May 2 14:37:44 UTC 2024 GCC version: gcc version 12.5.0 (Debian 12.5.0-6) $ nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2019 NVIDIA Corporation Built on Wed_Oct_23_19:24:38_PDT_2019 Cuda compilation tools, release 10.2, V10.2.89 Procedure requested =================== > I have attached a patch (based on 6.19-rc4). It should restore the old > console_lock behavior during suspend/resume. Assuming this works for > you, it also adds some debugging information so that we can figure out > who is locking the console. I applied the patch. The behavior is the same as before (no resume). $ uname -r 6.19.0-rc4-dirty $ dmesg | grep printk [ 0.030102] [ T0] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes [ 0.077779] [ T0] printk: legacy console [tty0] enabled [ 152.678589] [ T1349] printk: Suspending console(s) (use no_console_suspend to debug) ... no resume Temporary solution ================== I had the idea of restarting in recovery mode (rescue.target) to run the test. The `systemctl suspend` command is not available in this mode, which forced me to use the `pm-suspend` command, which allows for proper sleep and resume across all kernel versions that I have been able to test previously. Systemd triggers a number of services before actually going into sleep mode, including a call to nvidia-suspend.service, which I disabled ("because it's always nvidia"). The following command restores normal operation of `systemctl suspend`, including on the first non-functional commit found by the bisect (9e70a5e109a4a23367810de09be826c52d27ee2f). $ systemctl disable nvidia-suspend.service This service calls a script `/usr/bin/nvidia-sleep.sh` that seems to play with vt consoles and expects that they are still usable (`chvt 63` ?): #!/bin/bash if [ ! -f /proc/driver/nvidia/suspend ]; then exit 0 fi RUN_DIR="/var/run/nvidia-sleep" XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number PATH="/bin:/usr/bin" case "$1" in suspend|hibernate) mkdir -p "${RUN_DIR}" fgconsole > "${XORG_VT_FILE}" chvt 63 if [[ $? -ne 0 ]]; then exit $? fi echo "$1" > /proc/driver/nvidia/suspend exit $? ;; resume) echo "$1" > /proc/driver/nvidia/suspend # # Check if Xorg was determined to be running at the time # of suspend, and whether its VT was recorded. If so, # attempt to switch back to this VT. # if [[ -f "${XORG_VT_FILE}" ]]; then XORG_PID=$(cat "${XORG_VT_FILE}") rm "${XORG_VT_FILE}" chvt "${XORG_PID}" fi exit 0 ;; *) exit 1 esac Conclusion ========== kernel nvidia-suspend (systemd 259~rc1-1) result < 9e70a5e109a4 enabled ok < 9e70a5e109a4 disabled ok >= 9e70a5e109a4 enabled freeze >= 9e70a5e109a4 disabled ok - Reactivating this service causes the freeze to reappear in a reproducible pattern. - The `pm-suspend` command has never stopped working. It seems that this is a two-sided problem? If the kernel is not the issue, I apologize and am sorry for wasting your time; I should have thought about the layers added by systemd sooner. Best regards. Extra ===== During my tests with 6.19.0-rc1 and 6.19.0-rc4, I noticed that resuming a sleep test that used to work now fails (it worked in 6.18.2), but I think this is unrelated and is due to another issue. I am noting this for historical purposes. $ echo core > /sys/power/pm_test $ echo deep > /sys/power/mem_sleep Both commands `pm-suspend` or `systemctl suspend` have the same effect: - Trigger suspend (`kernel: PM: suspend entry (deep)` in dmesg); - No response when pressing the power button to wake up; - Force shutdown by holding down the power button; - The computer shuts down but the motherboard indicates a state similar to sleep mode (LED flashing); - Pressing the power button starts the computer (fans + HDD spin up) for a fraction of a second (<1s) then the machine shuts down; - Pressing the power button starts the machine normally (not a resume from sleep mode). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-23 7:44 ` ysard @ 2026-01-23 12:19 ` Petr Mladek 2026-01-24 1:22 ` ysard 0 siblings, 1 reply; 16+ messages in thread From: Petr Mladek @ 2026-01-23 12:19 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Fri 2026-01-23 08:44:39, ysard wrote: > Good evening, thank you for your reply and the patch. > > > Summary > ====== > > The patch does not seem to have any effect on the problem, *but* I have found a > way to temporarily fix the freeze by disabling the `nvidia-suspend` service. Great catch! > Additional info for diagnostics > =============================== > > $ cat /proc/driver/nvidia/version > NVRM version: NVIDIA UNIX x86_64 Kernel Module 470.256.02 Thu May 2 14:37:44 UTC 2024 > GCC version: gcc version 12.5.0 (Debian 12.5.0-6) > > $ nvcc --version > nvcc: NVIDIA (R) Cuda compiler driver > Copyright (c) 2005-2019 NVIDIA Corporation > Built on Wed_Oct_23_19:24:38_PDT_2019 > Cuda compilation tools, release 10.2, V10.2.89 > > > Procedure requested > =================== > > > I have attached a patch (based on 6.19-rc4). It should restore the old > > console_lock behavior during suspend/resume. Assuming this works for > > you, it also adds some debugging information so that we can figure out > > who is locking the console. > > I applied the patch. The behavior is the same as before (no resume). > > $ uname -r > 6.19.0-rc4-dirty > > $ dmesg | grep printk > [ 0.030102] [ T0] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes > [ 0.077779] [ T0] printk: legacy console [tty0] enabled > [ 152.678589] [ T1349] printk: Suspending console(s) (use no_console_suspend to debug) > ... > no resume > > > Temporary solution > ================== > > I had the idea of restarting in recovery mode (rescue.target) to run the test. > The `systemctl suspend` command is not available in this mode, which forced me > to use the `pm-suspend` command, which allows for proper sleep and resume across > all kernel versions that I have been able to test previously. > > Systemd triggers a number of services before actually going into sleep mode, > including a call to nvidia-suspend.service, which I disabled > ("because it's always nvidia"). > > The following command restores normal operation of `systemctl suspend`, > including on the first non-functional commit found by the bisect > (9e70a5e109a4a23367810de09be826c52d27ee2f). > > $ systemctl disable nvidia-suspend.service > > This service calls a script `/usr/bin/nvidia-sleep.sh` that seems to play with > vt consoles and expects that they are still usable (`chvt 63` ?): I did run chvt with strace and it does something like: openat(AT_FDCWD, "/dev/tty0", O_RDWR) = 3 ioctl(3, TCGETS2, {c_iflag=IGNBRK|IGNPAR, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|, c_cflag=B38400|CS8|CREAD, c_lflag=, ...}) = 0 ioctl(3, KDGKBTYPE, [KB_101]) = 0 rt_sigaction(SIGALRM, {sa_handler=0x556bcf0418e0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f3e60a42910}, NULL, 8) = 0 timer_create(CLOCK_REALTIME, {sigev_value={sival_int=1668762552, sival_ptr=0x7ffc63774bb8}, sigev_signo=SIGALRM, sigev_notify=SIGEV_SIGNAL}, [0]) = 0 timer_settime(0, 0, {it_interval={tv_sec=1, tv_nsec=0}, it_value={tv_sec=1, tv_nsec=0}}, NULL) = 0 ioctl(3, VT_ACTIVATE, 0x3f) = 0 ioctl(3, VT_WAITACTIVE, 0x3f) = 0 And vt_ioctl(,,VT_ACTIVATE) calls vc_allocate(arg) under console_lock()... The commit 9e70a5e109a4a233 ("printk: Add per-console suspended state") does some changes in console_lock(). It newly sets: console_locked = 1; console_may_schedule = 1; I can't see how this might cause the freeze. Well, for example, it would allow console_conditional_schedule() to get asleep. And it might be harder to obtain the lock when the current owner is sleeping. But it would have an effect only in CONFIG_PREEMPT_VOLUNTARY kernel. The process might get scheduled even without cond_resched() with other CONFIG_PREEMPT modes. Also I would expect that the userspace waits until the services finish the job before suspending the kernel. > #!/bin/bash > > if [ ! -f /proc/driver/nvidia/suspend ]; then > exit 0 > fi > > RUN_DIR="/var/run/nvidia-sleep" > XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number > > PATH="/bin:/usr/bin" > > case "$1" in > suspend|hibernate) > mkdir -p "${RUN_DIR}" > fgconsole > "${XORG_VT_FILE}" > chvt 63 > if [[ $? -ne 0 ]]; then > exit $? > fi > echo "$1" > /proc/driver/nvidia/suspend > exit $? > ;; > resume) > echo "$1" > /proc/driver/nvidia/suspend > # > # Check if Xorg was determined to be running at the time > # of suspend, and whether its VT was recorded. If so, > # attempt to switch back to this VT. > # > if [[ -f "${XORG_VT_FILE}" ]]; then > XORG_PID=$(cat "${XORG_VT_FILE}") > rm "${XORG_VT_FILE}" > chvt "${XORG_PID}" > fi > exit 0 I just wonder. Could you please try to comment out the various commands here and bisect whether the problem is with "fgconsole", "chvt", or "echo XXX >/proc/driver/nvidia/suspend" commands. I mean to try to disable the counter parts in the suspend/resume code paths and try whether the freeze is still reproducible? > ;; > *) > exit 1 > esac > > > Conclusion > ========== > > kernel nvidia-suspend (systemd 259~rc1-1) result > < 9e70a5e109a4 enabled ok > < 9e70a5e109a4 disabled ok > >= 9e70a5e109a4 enabled freeze > >= 9e70a5e109a4 disabled ok > > - Reactivating this service causes the freeze to reappear in a reproducible pattern. > - The `pm-suspend` command has never stopped working. > > It seems that this is a two-sided problem? > If the kernel is not the issue, I apologize and am sorry for wasting your time; > I should have thought about the layers added by systemd sooner. There is no need to apologize. It seems to be somehow affected by the kernel commit. And it would be great to understand what is going on. > Extra > ===== > > During my tests with 6.19.0-rc1 and 6.19.0-rc4, I noticed that resuming a sleep > test that used to work now fails (it worked in 6.18.2), but I think this is > unrelated and is due to another issue. I am noting this for historical purposes. > > $ echo core > /sys/power/pm_test > $ echo deep > /sys/power/mem_sleep > > Both commands `pm-suspend` or `systemctl suspend` have the same effect: > > - Trigger suspend (`kernel: PM: suspend entry (deep)` in dmesg); > - No response when pressing the power button to wake up; > - Force shutdown by holding down the power button; > - The computer shuts down but the motherboard indicates a state similar to > sleep mode (LED flashing); > - Pressing the power button starts the computer (fans + HDD spin up) for a > fraction of a second (<1s) then the machine shuts down; > - Pressing the power button starts the machine normally > (not a resume from sleep mode). This would deserve a separate thread. It would be great if you could bisect it to the problematic commit. Best Regards, Petr ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-23 12:19 ` Petr Mladek @ 2026-01-24 1:22 ` ysard 2026-01-28 14:00 ` Petr Mladek 0 siblings, 1 reply; 16+ messages in thread From: ysard @ 2026-01-24 1:22 UTC (permalink / raw) To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote: > Also I would expect that the userspace waits until the services > finish the job before suspending the kernel. It does: janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep. janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions... janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0 janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0 janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully. janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions. janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend... janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent. janv. 24 00:33:45 systemd[1]: user@1000.service: Unit now frozen-by-parent. janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent. janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen. janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'. janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'... janv. 24 00:33:45 kernel: PM: suspend entry (deep) > > #!/bin/bash > > > > if [ ! -f /proc/driver/nvidia/suspend ]; then > > exit 0 > > fi > > > > RUN_DIR="/var/run/nvidia-sleep" > > XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number > > > > PATH="/bin:/usr/bin" > > > > case "$1" in > > suspend|hibernate) > > mkdir -p "${RUN_DIR}" > > fgconsole > "${XORG_VT_FILE}" > > chvt 63 > > if [[ $? -ne 0 ]]; then > > exit $? > > fi > > echo "$1" > /proc/driver/nvidia/suspend > > exit $? > > ;; > > resume) > > echo "$1" > /proc/driver/nvidia/suspend > > # > > # Check if Xorg was determined to be running at the time > > # of suspend, and whether its VT was recorded. If so, > > # attempt to switch back to this VT. > > # > > if [[ -f "${XORG_VT_FILE}" ]]; then > > XORG_PID=$(cat "${XORG_VT_FILE}") > > rm "${XORG_VT_FILE}" > > chvt "${XORG_PID}" > > fi > > exit 0 > > I just wonder. Could you please try to comment out the various > commands here and bisect whether the problem is with > "fgconsole", "chvt", or "echo XXX >/proc/driver/nvidia/suspend" > commands. > > I mean to try to disable the counter parts in the suspend/resume > code paths and try whether the freeze is still reproducible? Yes I have a reproducible pattern here. With the service disabled. The service `nvidia-resume.service` (which basically calls the script with the 'resume' argument) is expected to start if the resume is completed, but the system does not reach this stage during the freeze. No freeze: $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && systemctl suspend" Freeze: $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && echo suspend >/proc/driver/nvidia/suspend \ && systemctl suspend" So the problem is related to this command: $ echo suspend >/proc/driver/nvidia/suspend Note that without the systemctl order this command suspends and wakes up the gpu correctly: $ sudo sh -c " chvt 63 \ && echo suspend >/proc/driver/nvidia/suspend; \ sleep 4; \ echo resume >/proc/driver/nvidia/suspend; \ chvt 2" Best regards. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-24 1:22 ` ysard @ 2026-01-28 14:00 ` Petr Mladek 2026-01-28 15:25 ` John Ogness 2026-01-29 9:34 ` ysard 0 siblings, 2 replies; 16+ messages in thread From: Petr Mladek @ 2026-01-28 14:00 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Sat 2026-01-24 02:22:41, ysard wrote: > On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote: > > Also I would expect that the userspace waits until the services > > finish the job before suspending the kernel. > > It does: > > janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep. > janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions... > janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service > janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service > janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0 > janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0 > janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully. > janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions. > janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend... > janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent. > janv. 24 00:33:45 systemd[1]: user@1000.service: Unit now frozen-by-parent. > janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent. > janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen. > janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'. > janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'... > janv. 24 00:33:45 kernel: PM: suspend entry (deep) OK. > Yes I have a reproducible pattern here. With the service disabled. > The service `nvidia-resume.service` (which basically calls the script > with the 'resume' argument) is expected to start if the resume is > completed, but the system does not reach this stage during the freeze. > > No freeze: > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && systemctl suspend" > > Freeze: > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend \ > && systemctl suspend" > > So the problem is related to this command: > $ echo suspend >/proc/driver/nvidia/suspend > > Note that without the systemctl order this command suspends and wakes up the gpu correctly: > $ sudo sh -c " > chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend; \ > sleep 4; \ > echo resume >/proc/driver/nvidia/suspend; \ > chvt 2" Interesting. It looks like the nvidia suspend does something which breaks the system suspend. But the driver is able to revert it... To be honest, I do not have any theory which could explain this. But I have found a bug in John's debug patch from https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/ The patch tried to restore the original behavior on current mainline. But console_suspend()/cosnole_resume() function have been renamed recently to console_suspend_all()/console_resume_all(). The original names were used for console-specific suspend/resume variants, see https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/ Also the debug patch did not revert synchronize_srcu(). I guess that this was intentional. But I would rather revert it as well because it is a potentially blocking operation. Could you please test it with this fixed version of the debug patch? If the patch helps, by chance, then please try to uncomment the synchronize_srcu() calls and check if it still works. I wonder if they make in difference. From a36b57cbcb239e7e5af4fb8278690cd4965d6fc0 Mon Sep 17 00:00:00 2001 From: John Ogness <john.ogness@linutronix.de> Date: Thu, 8 Jan 2026 10:49:24 +0106 Subject: [DEBUG v2] printk: Debug new vs. old suspend/resume behavior This is just for debugging. It should restore the old console_lock behavior for suspend/resume and also adds some debugging information. Please compile with CONFIG_PRINTK_CALLER=y so that we can see which tasks are locking/unlocking the console during suspend/resume. Changes against v1: - Set/Clear the global "console_suspended" variable in console_suspend_all()/console_restore_all() instead of console_suspend()/console_resume(). The functions have been renamed recently, see https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/ - Do not call synchronize_srcu() in the suspend/resume functions. They are another potentially blocking operation added by the problematic commit 9e70a5e109a4a2336 ("printk: Add per-console suspended state"). Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 64 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 62 insertions(+), 2 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1d765ad242b8..23fddc4006d3 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip) */ static int console_locked; +static int console_suspended; + +int vprintk_store(int facility, int level, + const struct dev_printk_info *dev_info, + const char *fmt, va_list args); + +/* Helper function to store-only. */ +static void printk_store(const char *fmt, ...) +{ + va_list args; + + va_start(args, fmt); + vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args); + va_end(args); +} + /* * Array of consoles built from command line options (console=) */ @@ -2748,6 +2764,12 @@ void console_suspend_all(void) if (!console_suspend_enabled) return; + console_lock(); + console_suspended = 1; + printk_store(KERN_INFO "printk: %s\n", __func__); + /* Unlock directly (i.e. without clearing @console_locked). */ + up_console_sem(); + console_list_lock(); for_each_console(con) console_srcu_write_flags(con, con->flags | CON_SUSPENDED); @@ -2759,7 +2781,7 @@ void console_suspend_all(void) * is guaranteed that all printing has stopped when this function * completes. */ - synchronize_srcu(&console_srcu); +// synchronize_srcu(&console_srcu); } void console_resume_all(void) @@ -2785,7 +2807,17 @@ void console_resume_all(void) * contexts must be able to see they are no longer suspended so * that they are guaranteed to wake up and resume printing. */ - synchronize_srcu(&console_srcu); +// synchronize_srcu(&console_srcu); + + down_console_sem(); + printk_store(KERN_INFO "printk: %s\n", __func__); + console_suspended = 0; + /* + * Perform a regular unlock. + * Here console_locked=1 and console_may_schedule=1. + * @console_unlocked will be cleared. + */ + console_unlock(); } printk_get_console_flush_type(&ft); @@ -2841,6 +2873,15 @@ void console_lock(void) msleep(1000); down_console_sem(); + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * Keep console locked, but do not touch + * @console_locked or @console_may_schedule. + * (Although they will both be 1 here anyway.) + */ + return; + } console_locked = 1; console_may_schedule = 1; } @@ -2861,6 +2902,15 @@ int console_trylock(void) return 0; if (down_trylock_console_sem()) return 0; + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * The lock was acquired, but unlock directly and report + * failure. Here console_locked=1 and console_may_schedule=1. + */ + up_console_sem(); + return 0; + } console_locked = 1; console_may_schedule = 0; return 1; @@ -3354,6 +3404,16 @@ void console_unlock(void) { struct console_flush_type ft; + if (console_suspended) { + printk_store(KERN_INFO "printk: %s\n", __func__); + /* + * Simply unlock directly. + * Here console_locked=1 and console_may_schedule=1. + */ + up_console_sem(); + return; + } + printk_get_console_flush_type(&ft); if (ft.legacy_direct) __console_flush_and_unlock(); -- 2.52.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-28 14:00 ` Petr Mladek @ 2026-01-28 15:25 ` John Ogness 2026-01-29 9:34 ` ysard 1 sibling, 0 replies; 16+ messages in thread From: John Ogness @ 2026-01-28 15:25 UTC (permalink / raw) To: Petr Mladek, ysard; +Cc: linux-kernel, senozhatsky On 2026-01-28, Petr Mladek <pmladek@suse.com> wrote: > But I have found a bug in John's debug patch from > https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/ > > The patch tried to restore the original behavior on current mainline. > But console_suspend()/cosnole_resume() function have been renamed recently > to console_suspend_all()/console_resume_all(). The original > names were used for console-specific suspend/resume variants, > see > https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/ > > Also the debug patch did not revert synchronize_srcu(). I guess that > this was intentional. I was also concerned about synchronize_srcu() being the culprit, which is why I left it in. I was really surprised the test patch still had problems. > But I would rather revert it as well because > it is a potentially blocking operation. > > Could you please test it with this fixed version of the debug patch? > > If the patch helps, by chance, then please try to uncomment > the synchronize_srcu() calls and check if it still works. Also, if the patch still has the problem, it would be nice to see the dmesg output with the patch applied when you do only the nvidia suspend/resume and avoid systemctl. > I wonder if they make in difference. Thanks Petr for the new patch version. I am curious what comes of it. John ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-28 14:00 ` Petr Mladek 2026-01-28 15:25 ` John Ogness @ 2026-01-29 9:34 ` ysard 2026-01-30 15:56 ` Petr Mladek 1 sibling, 1 reply; 16+ messages in thread From: ysard @ 2026-01-29 9:34 UTC (permalink / raw) To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky Summary ======= The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines! With synchronize_srcu commented (as is) ======================================= No freeze (expected): $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && systemctl suspend" Logs: [ 378.634960] [ T2805] printk: Suspending console(s) (use no_console_suspend to debug) [ 378.634961] [ T2805] printk: console_suspend_all [ 378.669076] [ T2831] printk: console_trylock [ 378.669385] [ T271] printk: console_trylock [ 378.677075] [ T2826] printk: console_trylock [ 378.677094] [ T2832] printk: console_trylock [ 378.677597] [ T269] printk: console_trylock [ 378.678842] [ T273] printk: console_trylock [ 378.820962] [ T100] printk: console_trylock [ 379.498727] [ T2805] printk: console_trylock [ 379.537391] [ T2805] printk: console_trylock [ 379.650475] [ T2805] printk: console_trylock [ 379.650477] [ T2805] printk: console_trylock [ 379.650478] [ T2805] printk: console_trylock [ 379.650618] [ T2805] printk: console_trylock [ 379.652295] [ T2805] printk: console_trylock [ 379.654336] [ T2805] printk: console_trylock [ 379.656435] [ T2805] printk: console_trylock [ 379.658727] [ T2805] printk: console_trylock [ 379.660623] [ T2805] printk: console_trylock [ 379.662429] [ T2805] printk: console_trylock [ 379.664265] [ T2805] printk: console_trylock [ 379.666130] [ T2805] printk: console_trylock [ 379.666154] [ T2805] printk: console_trylock [ 379.666156] [ T2805] printk: console_trylock [ 379.666607] [ T2805] printk: console_trylock [ 379.666654] [ T2805] printk: console_trylock [ 379.670212] [ T2805] printk: console_trylock [ 379.670243] [ T2805] printk: console_trylock [ 379.673828] [ T2805] printk: console_trylock [ 379.673863] [ T2805] printk: console_trylock [ 379.677518] [ T2805] printk: console_trylock [ 379.677547] [ T2805] printk: console_trylock [ 379.680645] [ T2805] printk: console_trylock [ 379.680675] [ T2805] printk: console_trylock [ 379.683811] [ T2805] printk: console_trylock [ 379.683836] [ T2805] printk: console_trylock [ 379.686960] [ T2805] printk: console_trylock [ 379.686984] [ T2805] printk: console_trylock [ 379.690141] [ T2805] printk: console_trylock [ 379.693604] [ T2805] printk: console_trylock [ 379.753022] [ T2805] printk: console_trylock [ 379.754384] [ T2805] printk: console_trylock [ 379.754386] [ T2831] printk: console_trylock [ 379.754392] [ T2831] printk: console_trylock [ 379.754395] [ T2831] printk: console_trylock [ 379.754421] [ T2825] printk: console_trylock [ 379.754466] [ T2829] printk: console_trylock [ 379.754563] [ T2826] printk: console_trylock [ 379.766769] [ T11] printk: console_lock [ 379.766772] [ T11] printk: console_unlock [ 380.030746] [ T2826] printk: console_trylock [ 380.030749] [ T2825] printk: console_trylock [ 380.072183] [ T11] printk: console_trylock [ 380.081075] [ T271] printk: console_trylock [ 380.081100] [ T273] printk: console_trylock [ 380.081734] [ T273] printk: console_trylock [ 380.082280] [ T271] printk: console_trylock [ 380.083257] [ T271] printk: console_trylock [ 380.083767] [ T273] printk: console_trylock [ 380.084464] [ T271] printk: console_trylock [ 380.086088] [ T273] printk: console_trylock [ 380.091983] [ T334] printk: console_trylock [ 380.094504] [ T271] printk: console_trylock [ 380.095494] [ T271] printk: console_trylock [ 380.096698] [ T271] printk: console_trylock [ 380.098628] [ T11] printk: console_trylock [ 380.099849] [ T84] printk: console_trylock [ 380.104195] [ T271] printk: console_trylock [ 380.104493] [ T273] printk: console_trylock [ 380.106533] [ T273] printk: console_trylock [ 380.108797] [ T273] printk: console_trylock [ 380.122616] [ T273] printk: console_trylock [ 380.158723] [ T2831] printk: console_trylock [ 380.470807] [ T2825] printk: console_trylock [ 380.652605] [ T2805] printk: console_resume_all Proc comm values: 100: kworker/5:1-mm_percpu_wq 11: kworker/0:1-events 269: scsi_eh_0 271: scsi_eh_1 273: scsi_eh_2 2805: not exists or not readable 2825: not exists or not readable 2826: not exists or not readable 2829: not exists or not readable 2831: kworker/u32:17-flush-253:1 2832: not exists or not readable 334: not exists or not readable 76: kauditd 84: kworker/2:1-events --- Freeze (as before): $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && echo suspend >/proc/driver/nvidia/suspend \ && systemctl suspend" With synchronize_srcu uncommented ================================= No freeze (expected): $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && systemctl suspend" Logs: [ 97.006391] [ T2643] printk: Suspending console(s) (use no_console_suspend to debug) [ 97.006393] [ T2643] printk: console_suspend_all [ 97.043073] [ T2654] printk: console_trylock [ 97.043290] [ T2655] printk: console_trylock [ 97.043486] [ T270] printk: console_trylock [ 97.044873] [ T272] printk: console_trylock [ 97.067023] [ T2656] printk: console_trylock [ 97.067403] [ T268] printk: console_trylock [ 97.138911] [ T9] printk: console_trylock [ 97.853561] [ T2643] printk: console_trylock [ 97.891353] [ T2643] printk: console_trylock [ 98.003694] [ T2643] printk: console_trylock [ 98.003696] [ T2643] printk: console_trylock [ 98.003697] [ T2643] printk: console_trylock [ 98.003838] [ T2643] printk: console_trylock [ 98.005787] [ T2643] printk: console_trylock [ 98.008142] [ T2643] printk: console_trylock [ 98.010180] [ T2643] printk: console_trylock [ 98.012412] [ T2643] printk: console_trylock [ 98.014319] [ T2643] printk: console_trylock [ 98.016306] [ T2643] printk: console_trylock [ 98.018004] [ T2643] printk: console_trylock [ 98.019610] [ T2643] printk: console_trylock [ 98.019634] [ T2643] printk: console_trylock [ 98.019636] [ T2643] printk: console_trylock [ 98.020082] [ T2643] printk: console_trylock [ 98.020130] [ T2643] printk: console_trylock [ 98.023697] [ T2643] printk: console_trylock [ 98.023730] [ T2643] printk: console_trylock [ 98.027334] [ T2643] printk: console_trylock [ 98.027369] [ T2643] printk: console_trylock [ 98.031044] [ T2643] printk: console_trylock [ 98.031074] [ T2643] printk: console_trylock [ 98.034191] [ T2643] printk: console_trylock [ 98.034221] [ T2643] printk: console_trylock [ 98.037379] [ T2643] printk: console_trylock [ 98.037405] [ T2643] printk: console_trylock [ 98.040576] [ T2643] printk: console_trylock [ 98.040599] [ T2643] printk: console_trylock [ 98.043786] [ T2643] printk: console_trylock [ 98.047294] [ T2643] printk: console_trylock [ 98.107363] [ T2643] printk: console_trylock [ 98.108733] [ T2643] printk: console_trylock [ 98.108748] [ T64] printk: console_trylock [ 98.108751] [ T70] printk: console_trylock [ 98.108753] [ T70] printk: console_trylock [ 98.108862] [ T2660] printk: console_trylock [ 98.108869] [ T2665] printk: console_trylock [ 98.124749] [ T84] printk: console_lock [ 98.124752] [ T84] printk: console_unlock [ 98.384581] [ T64] printk: console_trylock [ 98.384591] [ T2665] printk: console_trylock [ 98.432411] [ T84] printk: console_trylock [ 98.442955] [ T272] printk: console_trylock [ 98.442980] [ T270] printk: console_trylock [ 98.443315] [ T272] printk: console_trylock [ 98.444400] [ T270] printk: console_trylock [ 98.445386] [ T270] printk: console_trylock [ 98.445776] [ T272] printk: console_trylock [ 98.446591] [ T270] printk: console_trylock [ 98.448456] [ T272] printk: console_trylock [ 98.454112] [ T88] printk: console_trylock [ 98.456644] [ T270] printk: console_trylock [ 98.457620] [ T270] printk: console_trylock [ 98.458827] [ T270] printk: console_trylock [ 98.460353] [ T84] printk: console_trylock [ 98.464803] [ T9] printk: console_trylock [ 98.466327] [ T270] printk: console_trylock [ 98.470287] [ T272] printk: console_trylock [ 98.472759] [ T272] printk: console_trylock [ 98.475445] [ T272] printk: console_trylock [ 98.491793] [ T272] printk: console_trylock [ 98.512594] [ T70] printk: console_trylock [ 98.824640] [ T64] printk: console_trylock [ 99.006658] [ T2643] printk: console_resume_all Proc comm values: 2643: not exists or not readable 2654: kworker/u32:12-async 2655: kworker/u32:13-async 2656: kworker/u32:14-async 2660: kworker/u32:18-events_unbound 2665: kworker/u32:23-pm 268: scsi_eh_0 270: scsi_eh_1 272: scsi_eh_2 64: kworker/u32:1-events_unbound 70: kworker/u32:7-pm 76: kauditd 84: kworker/1:1-events 88: kworker/2:1-cgroup_free 9: kworker/0:0-events --- Works now!: $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && echo suspend >/proc/driver/nvidia/suspend \ && systemctl suspend" Logs: [ 338.901995] [ T3134] printk: Suspending console(s) (use no_console_suspend to debug) [ 338.901997] [ T3134] printk: console_suspend_all [ 338.932763] [ T2672] printk: console_trylock [ 338.948664] [ T2659] printk: console_trylock [ 338.948685] [ T2671] printk: console_trylock [ 338.950716] [ T272] printk: console_trylock [ 338.950747] [ T270] printk: console_trylock [ 338.982194] [ T3134] printk: console_trylock [ 339.020910] [ T3134] printk: console_trylock [ 339.044613] [ T158] printk: console_trylock [ 339.132842] [ T3134] printk: console_trylock [ 339.132843] [ T3134] printk: console_trylock [ 339.132844] [ T3134] printk: console_trylock [ 339.132980] [ T3134] printk: console_trylock [ 339.134648] [ T3134] printk: console_trylock [ 339.136741] [ T3134] printk: console_trylock [ 339.138806] [ T3134] printk: console_trylock [ 339.141014] [ T3134] printk: console_trylock [ 339.142875] [ T3134] printk: console_trylock [ 339.144627] [ T3134] printk: console_trylock [ 339.146490] [ T3134] printk: console_trylock [ 339.148175] [ T3134] printk: console_trylock [ 339.148200] [ T3134] printk: console_trylock [ 339.148201] [ T3134] printk: console_trylock [ 339.148657] [ T3134] printk: console_trylock [ 339.148706] [ T3134] printk: console_trylock [ 339.152285] [ T3134] printk: console_trylock [ 339.152318] [ T3134] printk: console_trylock [ 339.155931] [ T3134] printk: console_trylock [ 339.155964] [ T3134] printk: console_trylock [ 339.159760] [ T3134] printk: console_trylock [ 339.159792] [ T3134] printk: console_trylock [ 339.162961] [ T3134] printk: console_trylock [ 339.162987] [ T3134] printk: console_trylock [ 339.166055] [ T3134] printk: console_trylock [ 339.166079] [ T3134] printk: console_trylock [ 339.169188] [ T3134] printk: console_trylock [ 339.169214] [ T3134] printk: console_trylock [ 339.172501] [ T3134] printk: console_trylock [ 339.176003] [ T3134] printk: console_trylock [ 339.225601] [ T3134] printk: console_trylock [ 339.226950] [ T66] printk: console_trylock [ 339.226954] [ T66] printk: console_trylock [ 339.226957] [ T66] printk: console_trylock [ 339.226964] [ T65] printk: console_trylock [ 339.226969] [ T3134] printk: console_trylock [ 339.227037] [ T2671] printk: console_trylock [ 339.227045] [ T2661] printk: console_trylock [ 339.254208] [ T9] printk: console_lock [ 339.254214] [ T9] printk: console_unlock [ 339.502296] [ T2673] printk: console_trylock [ 339.506297] [ T67] printk: console_trylock [ 339.561605] [ T272] printk: console_trylock [ 339.561632] [ T270] printk: console_trylock [ 339.562072] [ T272] printk: console_trylock [ 339.563175] [ T270] printk: console_trylock [ 339.564203] [ T270] printk: console_trylock [ 339.564703] [ T272] printk: console_trylock [ 339.565094] [ T272] printk: console_trylock [ 339.565341] [ T270] printk: console_trylock [ 339.570554] [ T1127] printk: console_trylock [ 339.571705] [ T272] printk: console_trylock [ 339.573017] [ T310] printk: console_trylock [ 339.574203] [ T272] printk: console_trylock [ 339.575888] [ T270] printk: console_trylock [ 339.575899] [ T272] printk: console_trylock [ 339.580570] [ T272] printk: console_trylock [ 339.582271] [ T270] printk: console_trylock [ 339.583508] [ T270] printk: console_trylock [ 339.586181] [ T9] printk: console_trylock [ 339.591087] [ T270] printk: console_trylock [ 339.614444] [ T9] printk: console_trylock [ 339.634304] [ T3149] printk: console_trylock [ 339.942244] [ T2673] printk: console_trylock [ 340.123772] [ T3134] printk: console_resume_all Proc comm values: 1127: kworker/3:2-cgroup_offline 158: kworker/5:1-mm_percpu_wq 2659: kworker/u32:17-async 2661: kworker/u32:19-async 2671: kworker/u32:29-async 2672: kworker/u32:30-async 2673: kworker/u32:31-kvfree_rcu_reclaim 270: scsi_eh_1 272: scsi_eh_2 310: kworker/2:2-events 3134: not exists or not readable 3149: kworker/u32:38-flush-253:3 65: kworker/u32:2-async 66: kworker/u32:3-async 67: kworker/u32:4-async 9: kworker/0:0-events --- On 2026-01-28, John Ogness wrote: > Also, if the patch still has the problem, it would be nice to see the > dmesg output with the patch applied when you do only the nvidia > suspend/resume and avoid systemctl. Unfortunately, there is no printk return under these conditions. When searching for the names of the console_suspend/console_resume and console_lock()/console_unlock() functions in the kms module sources, I only found two uses of the latter, to which I added logs. These functions are only called very early during startup and then never again; Maybe this will help. [ 4.967951] [ T248] NV_API_CALL: os_disable_console_access [ 5.164823] [ T248] NV_API_CALL: os_enable_console_access [ 6.724895] [ T248] NV_API_CALL: os_disable_console_access [ 6.724988] [ T248] NV_API_CALL: os_enable_console_access common/inc/nv-linux.h: /* * Early 2.6 kernels have acquire_console_sem, but from 2.6.38+ it was * renamed to console_lock. */ #if defined(NV_ACQUIRE_CONSOLE_SEM_PRESENT) #define NV_ACQUIRE_CONSOLE_SEM() acquire_console_sem() #define NV_RELEASE_CONSOLE_SEM() release_console_sem() #elif defined(NV_CONSOLE_LOCK_PRESENT) #define NV_ACQUIRE_CONSOLE_SEM() console_lock() #define NV_RELEASE_CONSOLE_SEM() console_unlock() #else #error "console lock api unrecognized!." #endif nvidia/os-interface.c: NV_STATUS NV_API_CALL os_disable_console_access(void) { nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_disable_console_access\n"); NV_ACQUIRE_CONSOLE_SEM(); return NV_OK; } NV_STATUS NV_API_CALL os_enable_console_access(void) { nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_enable_console_access\n"); NV_RELEASE_CONSOLE_SEM(); return NV_OK; } Cheers ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-29 9:34 ` ysard @ 2026-01-30 15:56 ` Petr Mladek 2026-01-30 16:28 ` Petr Mladek 0 siblings, 1 reply; 16+ messages in thread From: Petr Mladek @ 2026-01-30 15:56 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Thu 2026-01-29 10:34:53, ysard wrote: > Summary > ======= > > The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines! It is a pretty interesting information. It is good to know that the synchronize_scru() is needed and has a positive effect. > With synchronize_srcu commented (as is) > ======================================= > > No freeze (expected): > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && systemctl suspend" > > Logs: > [ 378.634960] [ T2805] printk: Suspending console(s) (use no_console_suspend to debug) > [ 378.634961] [ T2805] printk: console_suspend_all > [ 378.669076] [ T2831] printk: console_trylock > [ 378.669385] [ T271] printk: console_trylock > [ 378.677075] [ T2826] printk: console_trylock > [ 378.677094] [ T2832] printk: console_trylock > [ 378.677597] [ T269] printk: console_trylock > [ 378.678842] [ T273] printk: console_trylock > [ 378.820962] [ T100] printk: console_trylock > [ 379.498727] [ T2805] printk: console_trylock > [ 379.537391] [ T2805] printk: console_trylock > [ 379.650475] [ T2805] printk: console_trylock > [ 379.650477] [ T2805] printk: console_trylock > [ 379.650478] [ T2805] printk: console_trylock > [ 379.650618] [ T2805] printk: console_trylock > [ 379.652295] [ T2805] printk: console_trylock > [ 379.654336] [ T2805] printk: console_trylock > [ 379.656435] [ T2805] printk: console_trylock > [ 379.658727] [ T2805] printk: console_trylock > [ 379.660623] [ T2805] printk: console_trylock > [ 379.662429] [ T2805] printk: console_trylock > [ 379.664265] [ T2805] printk: console_trylock > [ 379.666130] [ T2805] printk: console_trylock > [ 379.666154] [ T2805] printk: console_trylock > [ 379.666156] [ T2805] printk: console_trylock > [ 379.666607] [ T2805] printk: console_trylock > [ 379.666654] [ T2805] printk: console_trylock > [ 379.670212] [ T2805] printk: console_trylock > [ 379.670243] [ T2805] printk: console_trylock > [ 379.673828] [ T2805] printk: console_trylock > [ 379.673863] [ T2805] printk: console_trylock > [ 379.677518] [ T2805] printk: console_trylock > [ 379.677547] [ T2805] printk: console_trylock > [ 379.680645] [ T2805] printk: console_trylock > [ 379.680675] [ T2805] printk: console_trylock > [ 379.683811] [ T2805] printk: console_trylock > [ 379.683836] [ T2805] printk: console_trylock > [ 379.686960] [ T2805] printk: console_trylock > [ 379.686984] [ T2805] printk: console_trylock > [ 379.690141] [ T2805] printk: console_trylock > [ 379.693604] [ T2805] printk: console_trylock > [ 379.753022] [ T2805] printk: console_trylock > [ 379.754384] [ T2805] printk: console_trylock > [ 379.754386] [ T2831] printk: console_trylock > [ 379.754392] [ T2831] printk: console_trylock > [ 379.754395] [ T2831] printk: console_trylock > [ 379.754421] [ T2825] printk: console_trylock > [ 379.754466] [ T2829] printk: console_trylock > [ 379.754563] [ T2826] printk: console_trylock > [ 379.766769] [ T11] printk: console_lock > [ 379.766772] [ T11] printk: console_unlock > [ 380.030746] [ T2826] printk: console_trylock > [ 380.030749] [ T2825] printk: console_trylock > [ 380.072183] [ T11] printk: console_trylock > [ 380.081075] [ T271] printk: console_trylock > [ 380.081100] [ T273] printk: console_trylock > [ 380.081734] [ T273] printk: console_trylock > [ 380.082280] [ T271] printk: console_trylock > [ 380.083257] [ T271] printk: console_trylock > [ 380.083767] [ T273] printk: console_trylock > [ 380.084464] [ T271] printk: console_trylock > [ 380.086088] [ T273] printk: console_trylock > [ 380.091983] [ T334] printk: console_trylock > [ 380.094504] [ T271] printk: console_trylock > [ 380.095494] [ T271] printk: console_trylock > [ 380.096698] [ T271] printk: console_trylock > [ 380.098628] [ T11] printk: console_trylock > [ 380.099849] [ T84] printk: console_trylock > [ 380.104195] [ T271] printk: console_trylock > [ 380.104493] [ T273] printk: console_trylock > [ 380.106533] [ T273] printk: console_trylock > [ 380.108797] [ T273] printk: console_trylock > [ 380.122616] [ T273] printk: console_trylock > [ 380.158723] [ T2831] printk: console_trylock > [ 380.470807] [ T2825] printk: console_trylock > [ 380.652605] [ T2805] printk: console_resume_all > > Proc comm values: > 100: kworker/5:1-mm_percpu_wq > 11: kworker/0:1-events > 269: scsi_eh_0 > 271: scsi_eh_1 > 273: scsi_eh_2 > 2805: not exists or not readable > 2825: not exists or not readable > 2826: not exists or not readable > 2829: not exists or not readable > 2831: kworker/u32:17-flush-253:1 > 2832: not exists or not readable > 334: not exists or not readable > 76: kauditd > 84: kworker/2:1-events > > --- > > Freeze (as before): > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend \ > && systemctl suspend" Note that the disabled synchronize_srcu() might cause another problems (races) which are not related to the nvidia driver. It is possible that it failed from other reasons this time. Anyway, it is great to know that the console_srcu() could and actually should stay. > With synchronize_srcu uncommented > ================================= > > No freeze (expected): > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && systemctl suspend" > > Logs: > [ 97.006391] [ T2643] printk: Suspending console(s) (use no_console_suspend to debug) > [ 97.006393] [ T2643] printk: console_suspend_all > [ 97.043073] [ T2654] printk: console_trylock > [ 97.043290] [ T2655] printk: console_trylock > [ 97.043486] [ T270] printk: console_trylock > [ 97.044873] [ T272] printk: console_trylock > [ 97.067023] [ T2656] printk: console_trylock > [ 97.067403] [ T268] printk: console_trylock > [ 97.138911] [ T9] printk: console_trylock > [ 97.853561] [ T2643] printk: console_trylock > [ 97.891353] [ T2643] printk: console_trylock > [ 98.003694] [ T2643] printk: console_trylock > [ 98.003696] [ T2643] printk: console_trylock > [ 98.003697] [ T2643] printk: console_trylock > [ 98.003838] [ T2643] printk: console_trylock > [ 98.005787] [ T2643] printk: console_trylock > [ 98.008142] [ T2643] printk: console_trylock > [ 98.010180] [ T2643] printk: console_trylock > [ 98.012412] [ T2643] printk: console_trylock > [ 98.014319] [ T2643] printk: console_trylock > [ 98.016306] [ T2643] printk: console_trylock > [ 98.018004] [ T2643] printk: console_trylock > [ 98.019610] [ T2643] printk: console_trylock > [ 98.019634] [ T2643] printk: console_trylock > [ 98.019636] [ T2643] printk: console_trylock > [ 98.020082] [ T2643] printk: console_trylock > [ 98.020130] [ T2643] printk: console_trylock > [ 98.023697] [ T2643] printk: console_trylock > [ 98.023730] [ T2643] printk: console_trylock > [ 98.027334] [ T2643] printk: console_trylock > [ 98.027369] [ T2643] printk: console_trylock > [ 98.031044] [ T2643] printk: console_trylock > [ 98.031074] [ T2643] printk: console_trylock > [ 98.034191] [ T2643] printk: console_trylock > [ 98.034221] [ T2643] printk: console_trylock > [ 98.037379] [ T2643] printk: console_trylock > [ 98.037405] [ T2643] printk: console_trylock > [ 98.040576] [ T2643] printk: console_trylock > [ 98.040599] [ T2643] printk: console_trylock > [ 98.043786] [ T2643] printk: console_trylock > [ 98.047294] [ T2643] printk: console_trylock > [ 98.107363] [ T2643] printk: console_trylock > [ 98.108733] [ T2643] printk: console_trylock > [ 98.108748] [ T64] printk: console_trylock > [ 98.108751] [ T70] printk: console_trylock > [ 98.108753] [ T70] printk: console_trylock > [ 98.108862] [ T2660] printk: console_trylock > [ 98.108869] [ T2665] printk: console_trylock > [ 98.124749] [ T84] printk: console_lock > [ 98.124752] [ T84] printk: console_unlock > [ 98.384581] [ T64] printk: console_trylock > [ 98.384591] [ T2665] printk: console_trylock > [ 98.432411] [ T84] printk: console_trylock > [ 98.442955] [ T272] printk: console_trylock > [ 98.442980] [ T270] printk: console_trylock > [ 98.443315] [ T272] printk: console_trylock > [ 98.444400] [ T270] printk: console_trylock > [ 98.445386] [ T270] printk: console_trylock > [ 98.445776] [ T272] printk: console_trylock > [ 98.446591] [ T270] printk: console_trylock > [ 98.448456] [ T272] printk: console_trylock > [ 98.454112] [ T88] printk: console_trylock > [ 98.456644] [ T270] printk: console_trylock > [ 98.457620] [ T270] printk: console_trylock > [ 98.458827] [ T270] printk: console_trylock > [ 98.460353] [ T84] printk: console_trylock > [ 98.464803] [ T9] printk: console_trylock > [ 98.466327] [ T270] printk: console_trylock > [ 98.470287] [ T272] printk: console_trylock > [ 98.472759] [ T272] printk: console_trylock > [ 98.475445] [ T272] printk: console_trylock > [ 98.491793] [ T272] printk: console_trylock > [ 98.512594] [ T70] printk: console_trylock > [ 98.824640] [ T64] printk: console_trylock > [ 99.006658] [ T2643] printk: console_resume_all > > Proc comm values: > 2643: not exists or not readable > 2654: kworker/u32:12-async > 2655: kworker/u32:13-async > 2656: kworker/u32:14-async > 2660: kworker/u32:18-events_unbound > 2665: kworker/u32:23-pm > 268: scsi_eh_0 > 270: scsi_eh_1 > 272: scsi_eh_2 > 64: kworker/u32:1-events_unbound > 70: kworker/u32:7-pm > 76: kauditd > 84: kworker/1:1-events > 88: kworker/2:1-cgroup_free > 9: kworker/0:0-events > > --- > > Works now!: > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend \ > && systemctl suspend" > > Logs: > [ 338.901995] [ T3134] printk: Suspending console(s) (use no_console_suspend to debug) > [ 338.901997] [ T3134] printk: console_suspend_all > [ 338.932763] [ T2672] printk: console_trylock > [ 338.948664] [ T2659] printk: console_trylock > [ 338.948685] [ T2671] printk: console_trylock > [ 338.950716] [ T272] printk: console_trylock > [ 338.950747] [ T270] printk: console_trylock > [ 338.982194] [ T3134] printk: console_trylock > [ 339.020910] [ T3134] printk: console_trylock > [ 339.044613] [ T158] printk: console_trylock > [ 339.132842] [ T3134] printk: console_trylock > [ 339.132843] [ T3134] printk: console_trylock > [ 339.132844] [ T3134] printk: console_trylock > [ 339.132980] [ T3134] printk: console_trylock > [ 339.134648] [ T3134] printk: console_trylock > [ 339.136741] [ T3134] printk: console_trylock > [ 339.138806] [ T3134] printk: console_trylock > [ 339.141014] [ T3134] printk: console_trylock > [ 339.142875] [ T3134] printk: console_trylock > [ 339.144627] [ T3134] printk: console_trylock > [ 339.146490] [ T3134] printk: console_trylock > [ 339.148175] [ T3134] printk: console_trylock > [ 339.148200] [ T3134] printk: console_trylock > [ 339.148201] [ T3134] printk: console_trylock > [ 339.148657] [ T3134] printk: console_trylock > [ 339.148706] [ T3134] printk: console_trylock > [ 339.152285] [ T3134] printk: console_trylock > [ 339.152318] [ T3134] printk: console_trylock > [ 339.155931] [ T3134] printk: console_trylock > [ 339.155964] [ T3134] printk: console_trylock > [ 339.159760] [ T3134] printk: console_trylock > [ 339.159792] [ T3134] printk: console_trylock > [ 339.162961] [ T3134] printk: console_trylock > [ 339.162987] [ T3134] printk: console_trylock > [ 339.166055] [ T3134] printk: console_trylock > [ 339.166079] [ T3134] printk: console_trylock > [ 339.169188] [ T3134] printk: console_trylock > [ 339.169214] [ T3134] printk: console_trylock > [ 339.172501] [ T3134] printk: console_trylock > [ 339.176003] [ T3134] printk: console_trylock > [ 339.225601] [ T3134] printk: console_trylock > [ 339.226950] [ T66] printk: console_trylock > [ 339.226954] [ T66] printk: console_trylock > [ 339.226957] [ T66] printk: console_trylock > [ 339.226964] [ T65] printk: console_trylock > [ 339.226969] [ T3134] printk: console_trylock > [ 339.227037] [ T2671] printk: console_trylock > [ 339.227045] [ T2661] printk: console_trylock > [ 339.254208] [ T9] printk: console_lock > [ 339.254214] [ T9] printk: console_unlock > [ 339.502296] [ T2673] printk: console_trylock > [ 339.506297] [ T67] printk: console_trylock > [ 339.561605] [ T272] printk: console_trylock > [ 339.561632] [ T270] printk: console_trylock > [ 339.562072] [ T272] printk: console_trylock > [ 339.563175] [ T270] printk: console_trylock > [ 339.564203] [ T270] printk: console_trylock > [ 339.564703] [ T272] printk: console_trylock > [ 339.565094] [ T272] printk: console_trylock > [ 339.565341] [ T270] printk: console_trylock > [ 339.570554] [ T1127] printk: console_trylock > [ 339.571705] [ T272] printk: console_trylock > [ 339.573017] [ T310] printk: console_trylock > [ 339.574203] [ T272] printk: console_trylock > [ 339.575888] [ T270] printk: console_trylock > [ 339.575899] [ T272] printk: console_trylock > [ 339.580570] [ T272] printk: console_trylock > [ 339.582271] [ T270] printk: console_trylock > [ 339.583508] [ T270] printk: console_trylock > [ 339.586181] [ T9] printk: console_trylock > [ 339.591087] [ T270] printk: console_trylock > [ 339.614444] [ T9] printk: console_trylock > [ 339.634304] [ T3149] printk: console_trylock > [ 339.942244] [ T2673] printk: console_trylock > [ 340.123772] [ T3134] printk: console_resume_all > > Proc comm values: > 1127: kworker/3:2-cgroup_offline > 158: kworker/5:1-mm_percpu_wq > 2659: kworker/u32:17-async > 2661: kworker/u32:19-async > 2671: kworker/u32:29-async > 2672: kworker/u32:30-async > 2673: kworker/u32:31-kvfree_rcu_reclaim > 270: scsi_eh_1 > 272: scsi_eh_2 > 310: kworker/2:2-events > 3134: not exists or not readable > 3149: kworker/u32:38-flush-253:3 > 65: kworker/u32:2-async > 66: kworker/u32:3-async > 67: kworker/u32:4-async > 9: kworker/0:0-events It is hard to know what is going there. I guess that many console_trylock() calls are from printk(). But they might also be from tty or from the nvidia driver code. I have tried to create a patch which would print backtraces of the callers. The output might be interesting. I am going to send it in a separate mail. > On 2026-01-28, John Ogness wrote: > > Also, if the patch still has the problem, it would be nice to see the > > dmesg output with the patch applied when you do only the nvidia > > suspend/resume and avoid systemctl. > > Unfortunately, there is no printk return under these conditions. > > When searching for the names of the console_suspend/console_resume and > console_lock()/console_unlock() functions in the kms module sources, > I only found two uses of the latter, to which I added logs. > These functions are only called very early during startup and then never again; > Maybe this will help. > > [ 4.967951] [ T248] NV_API_CALL: os_disable_console_access > [ 5.164823] [ T248] NV_API_CALL: os_enable_console_access > [ 6.724895] [ T248] NV_API_CALL: os_disable_console_access > [ 6.724988] [ T248] NV_API_CALL: os_enable_console_access > > > common/inc/nv-linux.h: > > /* > * Early 2.6 kernels have acquire_console_sem, but from 2.6.38+ it was > * renamed to console_lock. > */ > #if defined(NV_ACQUIRE_CONSOLE_SEM_PRESENT) > #define NV_ACQUIRE_CONSOLE_SEM() acquire_console_sem() > #define NV_RELEASE_CONSOLE_SEM() release_console_sem() > #elif defined(NV_CONSOLE_LOCK_PRESENT) > #define NV_ACQUIRE_CONSOLE_SEM() console_lock() > #define NV_RELEASE_CONSOLE_SEM() console_unlock() > #else > #error "console lock api unrecognized!." > #endif > > nvidia/os-interface.c: > > NV_STATUS NV_API_CALL os_disable_console_access(void) > { > nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_disable_console_access\n"); > NV_ACQUIRE_CONSOLE_SEM(); > return NV_OK; > } > > NV_STATUS NV_API_CALL os_enable_console_access(void) > { > nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_enable_console_access\n"); > NV_RELEASE_CONSOLE_SEM(); > return NV_OK; > } Good to know. So the nvidia driver synchronizes some operations using console_lock() as well. And it might be affected by the modified behavior. For example, before the commit 9e70a5e109a4a2336 ("printk: Add per-console suspended state") during suspend: + console_trylock() never succeeded + console_lock() set neither cosnole_locked, nor console_may_schedule. ... Best Regards, Petr PS: I am going to write some more ideas into another mail with a new debug patch. I am not sure if I would send it today. I have to leave in 30 minutes or so... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-30 15:56 ` Petr Mladek @ 2026-01-30 16:28 ` Petr Mladek 2026-01-31 22:22 ` ysard 0 siblings, 1 reply; 16+ messages in thread From: Petr Mladek @ 2026-01-30 16:28 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Fri 2026-01-30 16:56:56, Petr Mladek wrote: > On Thu 2026-01-29 10:34:53, ysard wrote: > > Summary > > ======= > > > > The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines! > > > Works now!: > > $ sudo sh -c " > > mkdir -p /var/run/nvidia-sleep \ > > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > > && chvt 63 \ > > && echo suspend >/proc/driver/nvidia/suspend \ > > && systemctl suspend" > > > > Logs: > > [ 338.901995] [ T3134] printk: Suspending console(s) (use no_console_suspend to debug) > > [ 338.901997] [ T3134] printk: console_suspend_all > > [ 338.932763] [ T2672] printk: console_trylock > > [ 338.948664] [ T2659] printk: console_trylock > > [ 338.948685] [ T2671] printk: console_trylock > > [ 338.950716] [ T272] printk: console_trylock > > [ 338.950747] [ T270] printk: console_trylock > > [ 338.982194] [ T3134] printk: console_trylock > > [ 339.020910] [ T3134] printk: console_trylock > > [ 339.044613] [ T158] printk: console_trylock > > [ 339.132842] [ T3134] printk: console_trylock [...] > > [ 339.614444] [ T9] printk: console_trylock > > [ 339.634304] [ T3149] printk: console_trylock > > [ 339.942244] [ T2673] printk: console_trylock > > [ 340.123772] [ T3134] printk: console_resume_all > > > > Proc comm values: > > 1127: kworker/3:2-cgroup_offline > > 158: kworker/5:1-mm_percpu_wq > > 2659: kworker/u32:17-async > > 2661: kworker/u32:19-async > > 2671: kworker/u32:29-async > > 2672: kworker/u32:30-async > > 2673: kworker/u32:31-kvfree_rcu_reclaim > > 270: scsi_eh_1 > > 272: scsi_eh_2 > > 310: kworker/2:2-events > > 3134: not exists or not readable > > 3149: kworker/u32:38-flush-253:3 > > 65: kworker/u32:2-async > > 66: kworker/u32:3-async > > 67: kworker/u32:4-async > > 9: kworker/0:0-events > > It is hard to know what is going there. I guess that many > console_trylock() calls are from printk(). But they might also > be from tty or from the nvidia driver code. > > I have tried to create a patch which would print backtraces > of the callers. The output might be interesting. I am going > to send it in a separate mail. Please, find the patch below. It can be applied on top of the preovius one which reverted the problematic commit, see https://lore.kernel.org/all/aXoWiJhcOaGGlcmk@pathway.suse.cz/ Important: You still need to explicitely uncomment the sychronize_srcu() calls. It would be nice to provide three logs with this patch: 1. console_lock() API called by echo suspend >/proc/driver/nvidia/suspend 2. console_lock() API called by the suspend test No freeze (expected): $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && systemctl suspend" 3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.: $ sudo sh -c " mkdir -p /var/run/nvidia-sleep \ && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ && chvt 63 \ && echo suspend >/proc/driver/nvidia/suspend \ && systemctl suspend" Feel free to do your own tests. But please do not spend too much time on it. It would be nice to find the culprit. Especially I would like to know whether the problem is in the core kernel code (printk, suspend, tty) or in the nvidia driver. But we might also just need to restore the original behavior of the console lock API during suspend. I mean the short cuts. They are a kind of optimization anyway. But as I said, it would be nice to understand the problem. There might be a race or infinite loop somewhere and the original console_lock API behavior just hides the problem. OK, here is the patch which printed backtraces of console_lock API callers around the suspend for me. From ab9e67fc50425eb6574c016c5809865a085c4b44 Mon Sep 17 00:00:00 2001 From: Petr Mladek <pmladek@suse.com> Date: Fri, 30 Jan 2026 16:57:25 +0100 Subject: [PATCH] printk: debug: Allow to print backtraces of console_lock API callers. This patch allows to print backtraces of console_lock API callers. The logging must be explicitely enabled by: echo 1 >/sys/module/printk/parameters/debug_console_lock I suggest to increase the internal log buffer size, for example, use on the command line: log_buf_len=1M The logging might easily go wild because the console API is called by printk() and tty code. I have explicitely disabled the logging in: + vprintk_emit() + fb_flashcursor() , see no_debug_console_lock_enter()/exit() calls. Otherwise, there was an endless loop of printed entries. Note that I run the tests in a virtualized test system (QEMU) via ssh. Some more tty API might need to get disabled when doing the tests using the tty console. But it might hide the code called during suspend. So, doing the tests via ssh sounds better to me. Also the number of printed backtraces is limited to "100". It helps to keep the system running when printing the report triggers more callers of the API and causes an infinite loop. Signed-off-by: Petr Mladek <pmladek@suse.com> --- drivers/video/fbdev/core/fbcon.c | 9 +++- include/linux/printk.h | 2 + kernel/printk/printk.c | 91 ++++++++++++++++++++++++++++++-- 3 files changed, 98 insertions(+), 4 deletions(-) diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c index 7be9e865325d..1d5d7af516ed 100644 --- a/drivers/video/fbdev/core/fbcon.c +++ b/drivers/video/fbdev/core/fbcon.c @@ -397,14 +397,19 @@ static void fb_flashcursor(struct work_struct *work) struct vc_data *vc = NULL; int c; bool enable; + int debug; int ret; + debug = no_debug_console_lock_enter(); + /* FIXME: we should sort out the unbind locking instead */ /* instead we just fail to flash the cursor if we can't get * the lock instead of blocking fbcon deinit */ ret = console_trylock(); - if (ret == 0) + if (ret == 0) { + no_debug_console_lock_exit(debug); return; + } /* protected by console_lock */ info = par->info; @@ -416,6 +421,7 @@ static void fb_flashcursor(struct work_struct *work) fbcon_info_from_console(vc->vc_num) != info || vc->vc_deccm != 1) { console_unlock(); + no_debug_console_lock_exit(debug); return; } @@ -425,6 +431,7 @@ static void fb_flashcursor(struct work_struct *work) get_fg_color(vc, info, c), get_bg_color(vc, info, c)); console_unlock(); + no_debug_console_lock_exit(debug); queue_delayed_work(system_power_efficient_wq, &par->cursor_work, par->cur_blink_jiffies); diff --git a/include/linux/printk.h b/include/linux/printk.h index 45c663124c9b..f266b20b7584 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -201,6 +201,8 @@ void __init setup_log_buf(int early); __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...); void dump_stack_print_info(const char *log_lvl); void show_regs_print_info(const char *log_lvl); +int no_debug_console_lock_enter(void); +void no_debug_console_lock_exit(int debug); extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold; extern asmlinkage void dump_stack(void) __cold; void printk_trigger_flush(void); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 6e0ad926f409..c54c70d7a074 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2383,11 +2383,52 @@ MODULE_PARM_DESC(debug_non_panic_cpus, "allow messages from non-panic CPUs in panic()"); #endif +int debug_console_lock; +module_param(debug_console_lock, int, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(debug_console_lock, + "Print some caller info when console_lock() API is called."); + +static DEFINE_PER_CPU(int, no_debug_console_lock_nesting); + +int no_debug_console_lock_enter(void) +{ + int debug; + + debug = READ_ONCE(debug_console_lock); + if (debug) { + if (this_cpu_read(no_debug_console_lock_nesting) > 256) { + debug_console_lock = 0; + pr_err("Overflow of no_debug_console_lock_nesting! Disabled console_lock debuginng.\n"); + return 0; + } + + migrate_disable(); + this_cpu_inc(no_debug_console_lock_nesting); + } + + return debug; +} + +void no_debug_console_lock_exit(int debug) +{ + if (debug) { + if (this_cpu_read(no_debug_console_lock_nesting) <= 0) { + debug_console_lock = 0; + pr_err("Underflow of no_debug_console_lock_nesting! Disabled console lock debuginng.\n"); + migrate_enable(); + return; + } + this_cpu_dec(no_debug_console_lock_nesting); + migrate_enable(); + } +} + asmlinkage int vprintk_emit(int facility, int level, const struct dev_printk_info *dev_info, const char *fmt, va_list args) { struct console_flush_type ft; + int debug_caller; int printed_len; /* Suppress unimportant messages after panic happens */ @@ -2404,6 +2445,8 @@ asmlinkage int vprintk_emit(int facility, int level, !panic_triggering_all_cpu_backtrace) return 0; + debug_caller = no_debug_console_lock_enter(); + printk_get_console_flush_type(&ft); /* If called from the scheduler, we can not call up(). */ @@ -2448,6 +2491,8 @@ asmlinkage int vprintk_emit(int facility, int level, else if (!console_irqwork_blocked) wake_up_klogd(); + no_debug_console_lock_exit(debug_caller); + return printed_len; } EXPORT_SYMBOL(vprintk_emit); @@ -2856,6 +2901,41 @@ static int console_cpu_notify(unsigned int cpu) return 0; } +static void print_console_lock_caller(const char *func_name) +{ + static int reports = 0; + char tcomm[64]; + u8 nested; + + if (!debug_console_lock) + return; + + nested = raw_cpu_read(no_debug_console_lock_nesting); + if (nested) + return; + + if (!func_name) { + pr_err_once("%s: called without func_name\n", __func__); + return; + } + + if (reports++ > 100) { + debug_console_lock = 0; + pr_warn("Too many printed console lock callers. Disabled the debugging.\n"); + } + + if (current->flags & PF_WQ_WORKER) + wq_worker_comm(tcomm, sizeof(tcomm), current); + else if (current->flags & PF_KTHREAD) + get_kthread_comm(tcomm, sizeof(tcomm), current); + else + get_task_comm(tcomm, current); + + pr_info("console lock API call [%d]: %s() by pid=%d, comm=%s\n", + reports, func_name, current->pid, tcomm); + show_stack(current, NULL, KERN_INFO); +} + /** * console_lock - block the console subsystem from printing * @@ -2868,13 +2948,14 @@ void console_lock(void) { might_sleep(); + print_console_lock_caller(__func__); + /* On panic, the console_lock must be left to the panic cpu. */ while (panic_on_other_cpu()) msleep(1000); down_console_sem(); if (console_suspended) { - printk_store(KERN_INFO "printk: %s\n", __func__); /* * Keep console locked, but do not touch * @console_locked or @console_may_schedule. @@ -2897,13 +2978,14 @@ EXPORT_SYMBOL(console_lock); */ int console_trylock(void) { + print_console_lock_caller(__func__); + /* On panic, the console_lock must be left to the panic cpu. */ if (panic_on_other_cpu()) return 0; if (down_trylock_console_sem()) return 0; if (console_suspended) { - printk_store(KERN_INFO "printk: %s\n", __func__); /* * The lock was acquired, but unlock directly and report * failure. Here console_locked=1 and console_may_schedule=1. @@ -2919,6 +3001,8 @@ EXPORT_SYMBOL(console_trylock); int is_console_locked(void) { +// /* This might produce too many entries. */ +// print_console_lock_caller(__func__); return console_locked; } EXPORT_SYMBOL(is_console_locked); @@ -3404,8 +3488,9 @@ void console_unlock(void) { struct console_flush_type ft; + print_console_lock_caller(__func__); + if (console_suspended) { - printk_store(KERN_INFO "printk: %s\n", __func__); /* * Simply unlock directly. * Here console_locked=1 and console_may_schedule=1. -- 2.52.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-30 16:28 ` Petr Mladek @ 2026-01-31 22:22 ` ysard 2026-02-02 11:02 ` Petr Mladek 0 siblings, 1 reply; 16+ messages in thread From: ysard @ 2026-01-31 22:22 UTC (permalink / raw) To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky Here are the logs, debug_console_lock is disabled 4 seconds after each test to avoid consuming the 100 reports (kworker events are frequent here). > It would be nice to provide three logs with this patch: > > 1. console_lock() API called by > echo suspend >/proc/driver/nvidia/suspend [ 65.417500] [ T88] printk: console lock API call [1]: console_lock() by pid=88, comm=kworker/4:1+events [ 65.417581] [ T88] printk: console lock API call [2]: console_unlock() by pid=88, comm=kworker/4:1+events [ 74.984380] [ T88] printk: console lock API call [3]: console_lock() by pid=88, comm=kworker/4:1+events [ 74.984450] [ T88] printk: console lock API call [4]: console_unlock() by pid=88, comm=kworker/4:1+events [ 75.134088] [ T88] printk: console lock API call [5]: console_lock() by pid=88, comm=kworker/4:1+events [ 75.134289] [ T88] printk: console lock API call [6]: console_unlock() by pid=88, comm=kworker/4:1+events > 2. console_lock() API called by the suspend test > > No freeze (expected): > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && systemctl suspend" [ 325.592510] [ T3023] printk: console lock API call [25]: console_lock() by pid=3023, comm=chvt [ 325.592594] [ T3023] printk: console lock API call [26]: console_unlock() by pid=3023, comm=chvt [ 325.592656] [ T200] printk: console lock API call [27]: console_lock() by pid=200, comm=kworker/7:2+events [ 325.592698] [ T200] printk: console lock API call [28]: console_unlock() by pid=200, comm=kworker/7:2+events [ 325.709430] [ T11] printk: console lock API call [29]: console_lock() by pid=11, comm=kworker/0:1+events [ 325.709483] [ T11] printk: console lock API call [30]: console_unlock() by pid=11, comm=kworker/0:1+events [ 326.592523] [ T3023] printk: console lock API call [31]: console_lock() by pid=3023, comm=chvt [ 326.592620] [ T3023] printk: console lock API call [32]: console_unlock() by pid=3023, comm=chvt [ 326.592709] [ T200] printk: console lock API call [33]: console_lock() by pid=200, comm=kworker/7:2+events [ 326.592757] [ T200] printk: console lock API call [34]: console_unlock() by pid=200, comm=kworker/7:2+events [ 327.151095] [ T1428] printk: console lock API call [35]: console_lock() by pid=1428, comm=Xorg [ 327.152983] [ T1428] printk: console lock API call [36]: console_unlock() by pid=1428, comm=Xorg [ 327.204795] [ T3050] printk: console lock API call [37]: console_lock() by pid=3050, comm=9 [ 327.204939] [ T3050] printk: console lock API call [38]: console_unlock() by pid=3050, comm=9 [ 327.205051] [ T3050] printk: console lock API call [39]: console_lock() by pid=3050, comm=9 [ 327.205158] [ T3050] printk: console lock API call [40]: console_unlock() by pid=3050, comm=9 [ 327.205335] [ T3050] printk: console lock API call [41]: console_lock() by pid=3050, comm=9 [ 327.205398] [ T3050] printk: console lock API call [42]: console_unlock() by pid=3050, comm=9 [ 327.517379] [ T3065] printk: console lock API call [43]: console_lock() by pid=3065, comm=9 [ 327.517483] [ T3065] printk: console lock API call [44]: console_unlock() by pid=3065, comm=9 [ 327.517567] [ T3065] printk: console lock API call [45]: console_lock() by pid=3065, comm=9 [ 327.517644] [ T3065] printk: console lock API call [46]: console_unlock() by pid=3065, comm=9 [ 327.517805] [ T3065] printk: console lock API call [47]: console_lock() by pid=3065, comm=9 [ 327.517890] [ T3065] printk: console lock API call [48]: console_unlock() by pid=3065, comm=9 [ 327.558752] [ T3073] printk: console lock API call [49]: console_lock() by pid=3073, comm=9 [ 327.558877] [ T3073] printk: console lock API call [50]: console_unlock() by pid=3073, comm=9 [ 327.558963] [ T3073] printk: console lock API call [51]: console_lock() by pid=3073, comm=9 [ 327.559043] [ T3073] printk: console lock API call [52]: console_unlock() by pid=3073, comm=9 [ 327.559197] [ T3073] printk: console lock API call [53]: console_lock() by pid=3073, comm=9 [ 327.559247] [ T3073] printk: console lock API call [54]: console_unlock() by pid=3073, comm=9 [ 327.585818] [ T3073] printk: console lock API call [55]: console_lock() by pid=3073, comm=systemd-sleep [ 327.585937] [ T3073] printk: console lock API call [56]: console_unlock() by pid=3073, comm=systemd-sleep [ 327.586127] [ T95] printk: console lock API call [57]: console_lock() by pid=95, comm=kworker/6:1+events [ 327.586168] [ T95] printk: console lock API call [58]: console_unlock() by pid=95, comm=kworker/6:1+events [ 327.591067] [ T3073] printk: Suspending console(s) (use no_console_suspend to debug) [ 327.591069] [ T3073] printk: console lock API call [59]: console_lock() by pid=3073, comm=systemd-sleep [ 327.591084] [ T3073] ? _printk+0x54/0x60 [ 327.591163] [ T3073] printk: console lock API call [60]: console_unlock() by pid=3073, comm=systemd-sleep [ 327.591175] [ T3073] ? _printk+0x54/0x60 [ 327.591242] [ T3073] printk: console lock API call [61]: console_lock() by pid=3073, comm=systemd-sleep [ 327.591253] [ T3073] ? _printk+0x54/0x60 [ 327.591322] [ T3073] printk: console lock API call [62]: console_unlock() by pid=3073, comm=systemd-sleep [ 327.591332] [ T3073] ? _printk+0x54/0x60 [ 327.591396] [ T3073] printk: console lock API call [63]: console_lock() by pid=3073, comm=systemd-sleep [ 327.591470] [ T3073] printk: console_suspend_all [ 328.652581] [ T89] printk: console lock API call [64]: console_lock() by pid=89, comm=kworker/1:1+events [ 328.652639] [ T89] printk: console lock API call [65]: console_unlock() by pid=89, comm=kworker/1:1+events [ 330.313166] [ T3073] printk: console_resume_all [ 330.313173] [ T3073] printk: console lock API call [66]: console_unlock() by pid=3073, comm=systemd-sleep [ 330.313319] [ T3073] printk: console lock API call [67]: console_lock() by pid=3073, comm=systemd-sleep [ 330.313425] [ T3073] printk: console lock API call [68]: console_unlock() by pid=3073, comm=systemd-sleep [ 330.313521] [ T3073] printk: console lock API call [69]: console_lock() by pid=3073, comm=systemd-sleep [ 330.313621] [ T3073] printk: console lock API call [70]: console_unlock() by pid=3073, comm=systemd-sleep [ 331.198632] [ T3073] printk: console lock API call [71]: console_lock() by pid=3073, comm=systemd-sleep [ 331.198909] [ T3073] printk: console lock API call [72]: console_unlock() by pid=3073, comm=systemd-sleep [ 331.199418] [ T89] printk: console lock API call [73]: console_lock() by pid=89, comm=kworker/1:1+events [ 331.199564] [ T89] printk: console lock API call [74]: console_unlock() by pid=89, comm=kworker/1:1+events [ 331.222975] [ T3150] printk: console lock API call [75]: console_lock() by pid=3150, comm=chvt [ 331.223105] [ T3150] printk: console lock API call [76]: console_unlock() by pid=3150, comm=chvt [ 331.223221] [ T3150] printk: console lock API call [77]: console_lock() by pid=3150, comm=chvt [ 331.223265] [ T3150] printk: console lock API call [78]: console_unlock() by pid=3150, comm=chvt [ 331.223309] [ T88] printk: console lock API call [79]: console_lock() by pid=88, comm=kworker/7:1+events [ 331.223366] [ T88] printk: console lock API call [80]: console_unlock() by pid=88, comm=kworker/7:1+events [ 331.223394] [ T1428] printk: console lock API call [81]: console_lock() by pid=1428, comm=Xorg [ 331.223491] [ T1428] printk: console lock API call [82]: console_unlock() by pid=1428, comm=Xorg [ 331.223526] [ T3150] printk: console lock API call [83]: console_lock() by pid=3150, comm=chvt [ 331.223605] [ T3150] printk: console lock API call [84]: console_unlock() by pid=3150, comm=chvt [ 331.230047] [ T1164] printk: console lock API call [85]: console_lock() by pid=1164, comm=systemd-logind [ 331.230191] [ T1164] printk: console lock API call [86]: console_unlock() by pid=1164, comm=systemd-logind [ 331.230333] [ T1164] printk: console lock API call [87]: console_lock() by pid=1164, comm=systemd-logind [ 331.230416] [ T1164] printk: console lock API call [88]: console_unlock() by pid=1164, comm=systemd-logind [ 331.230495] [ T1164] printk: console lock API call [89]: console_lock() by pid=1164, comm=systemd-logind [ 331.230579] [ T1164] printk: console lock API call [90]: console_unlock() by pid=1164, comm=systemd-logind > 3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.: > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend \ > && systemctl suspend" [ 91.535688] [ T2576] printk: console lock API call [1]: console_lock() by pid=2576, comm=chvt [ 91.536179] [ T2576] printk: console lock API call [2]: console_unlock() by pid=2576, comm=chvt [ 91.536254] [ T281] printk: console lock API call [3]: console_lock() by pid=281, comm=kworker/3:2+events [ 91.536309] [ T281] printk: console lock API call [4]: console_unlock() by pid=281, comm=kworker/3:2+events [ 91.547297] [ T2580] printk: console lock API call [5]: console_lock() by pid=2580, comm=cached_setup_te [ 91.547433] [ T2580] printk: console lock API call [6]: console_unlock() by pid=2580, comm=cached_setup_te [ 91.547567] [ T2580] printk: console lock API call [7]: console_lock() by pid=2580, comm=cached_setup_te [ 91.547653] [ T2580] printk: console lock API call [8]: console_unlock() by pid=2580, comm=cached_setup_te [ 91.547729] [ T2580] printk: console lock API call [9]: console_lock() by pid=2580, comm=cached_setup_te [ 91.547802] [ T2580] printk: console lock API call [10]: console_unlock() by pid=2580, comm=cached_setup_te [ 91.547871] [ T2580] printk: console lock API call [11]: console_lock() by pid=2580, comm=cached_setup_te [ 91.547945] [ T2580] printk: console lock API call [12]: console_unlock() by pid=2580, comm=cached_setup_te [ 91.548022] [ T2580] printk: console lock API call [13]: console_lock() by pid=2580, comm=cached_setup_te [ 91.548104] [ T2580] printk: console lock API call [14]: console_unlock() by pid=2580, comm=cached_setup_te [ 91.559988] [ T900] printk: console lock API call [15]: console_lock() by pid=900, comm=kworker/4:2+events [ 91.560049] [ T900] printk: console lock API call [16]: console_unlock() by pid=900, comm=kworker/4:2+events [ 92.535710] [ T2576] printk: console lock API call [17]: console_lock() by pid=2576, comm=chvt [ 92.535796] [ T2576] printk: console lock API call [18]: console_unlock() by pid=2576, comm=chvt [ 92.535857] [ T281] printk: console lock API call [19]: console_lock() by pid=281, comm=kworker/3:2+events [ 92.535901] [ T281] printk: console lock API call [20]: console_unlock() by pid=281, comm=kworker/3:2+events [ 93.115410] [ T1426] printk: console lock API call [21]: console_lock() by pid=1426, comm=Xorg [ 93.123140] [ T1426] printk: console lock API call [22]: console_unlock() by pid=1426, comm=Xorg [ 94.008771] [ T2604] printk: console lock API call [23]: console_lock() by pid=2604, comm=9 [ 94.008866] [ T2604] printk: console lock API call [24]: console_unlock() by pid=2604, comm=9 [ 94.008940] [ T2604] printk: console lock API call [25]: console_lock() by pid=2604, comm=9 [ 94.009007] [ T2604] printk: console lock API call [26]: console_unlock() by pid=2604, comm=9 [ 94.009149] [ T2604] printk: console lock API call [27]: console_lock() by pid=2604, comm=9 [ 94.009210] [ T2604] printk: console lock API call [28]: console_unlock() by pid=2604, comm=9 [ 94.296534] [ T2619] printk: console lock API call [29]: console_lock() by pid=2619, comm=9 [ 94.296640] [ T2619] printk: console lock API call [30]: console_unlock() by pid=2619, comm=9 [ 94.296724] [ T2619] printk: console lock API call [31]: console_lock() by pid=2619, comm=9 [ 94.296801] [ T2619] printk: console lock API call [32]: console_unlock() by pid=2619, comm=9 [ 94.296955] [ T2619] printk: console lock API call [33]: console_lock() by pid=2619, comm=9 [ 94.297033] [ T2619] printk: console lock API call [34]: console_unlock() by pid=2619, comm=9 [ 94.346111] [ T76] kauditd_printk_skb: 2 callbacks suppressed [ 94.346957] [ C2] printk: console lock API call [35]: console_trylock() by pid=0, comm=swapper/2 [ 94.347059] [ C2] printk: console lock API call [36]: console_unlock() by pid=0, comm=swapper/2 [ 94.360483] [ T2628] printk: console lock API call [37]: console_lock() by pid=2628, comm=9 [ 94.360603] [ T2628] printk: console lock API call [38]: console_unlock() by pid=2628, comm=9 [ 94.360697] [ T2628] printk: console lock API call [39]: console_lock() by pid=2628, comm=9 [ 94.360786] [ T2628] printk: console lock API call [40]: console_unlock() by pid=2628, comm=9 [ 94.360939] [ T2628] printk: console lock API call [41]: console_lock() by pid=2628, comm=9 [ 94.361007] [ T2628] printk: console lock API call [42]: console_unlock() by pid=2628, comm=9 [ 94.417313] [ T2628] printk: console lock API call [43]: console_lock() by pid=2628, comm=systemd-sleep [ 94.417405] [ T2628] printk: console lock API call [44]: console_unlock() by pid=2628, comm=systemd-sleep [ 94.417562] [ T97] printk: console lock API call [45]: console_lock() by pid=97, comm=kworker/2:1+events [ 94.417601] [ T97] printk: console lock API call [46]: console_unlock() by pid=97, comm=kworker/2:1+events [ 94.420174] [ T2628] printk: Suspending console(s) (use no_console_suspend to debug) [ 94.420176] [ T2628] printk: console lock API call [47]: console_lock() by pid=2628, comm=systemd-sleep [ 94.420191] [ T2628] ? _printk+0x54/0x60 [ 94.420250] [ T2628] printk: console lock API call [48]: console_unlock() by pid=2628, comm=systemd-sleep [ 94.420262] [ T2628] ? _printk+0x54/0x60 [ 94.420310] [ T2628] printk: console lock API call [49]: console_lock() by pid=2628, comm=systemd-sleep [ 94.420321] [ T2628] ? _printk+0x54/0x60 [ 94.420371] [ T2628] printk: console lock API call [50]: console_unlock() by pid=2628, comm=systemd-sleep [ 94.420381] [ T2628] ? _printk+0x54/0x60 [ 94.420427] [ T2628] printk: console lock API call [51]: console_lock() by pid=2628, comm=systemd-sleep [ 94.420482] [ T2628] printk: console_suspend_all [ 95.220250] [ T97] printk: console lock API call [52]: console_lock() by pid=97, comm=kworker/2:1+events [ 95.220331] [ T97] printk: console lock API call [53]: console_unlock() by pid=97, comm=kworker/2:1+events [ 96.879223] [ T2628] printk: console_resume_all [ 96.879230] [ T2628] printk: console lock API call [54]: console_unlock() by pid=2628, comm=systemd-sleep [ 96.879385] [ T2628] printk: console lock API call [55]: console_lock() by pid=2628, comm=systemd-sleep [ 96.879543] [ T2628] printk: console lock API call [56]: console_unlock() by pid=2628, comm=systemd-sleep [ 96.879686] [ T2628] printk: console lock API call [57]: console_lock() by pid=2628, comm=systemd-sleep [ 96.879836] [ T2628] printk: console lock API call [58]: console_unlock() by pid=2628, comm=systemd-sleep [ 97.789766] [ T2628] printk: console lock API call [59]: console_lock() by pid=2628, comm=systemd-sleep [ 97.789850] [ T2628] printk: console lock API call [60]: console_unlock() by pid=2628, comm=systemd-sleep [ 97.789987] [ T293] printk: console lock API call [61]: console_lock() by pid=293, comm=kworker/2:2+events [ 97.790045] [ T293] printk: console lock API call [62]: console_unlock() by pid=293, comm=kworker/2:2+events [ 98.464634] [ T2863] printk: console lock API call [63]: console_lock() by pid=2863, comm=chvt [ 98.464756] [ T2863] printk: console lock API call [64]: console_unlock() by pid=2863, comm=chvt [ 98.464870] [ T2863] printk: console lock API call [65]: console_lock() by pid=2863, comm=chvt [ 98.464953] [ T2863] printk: console lock API call [66]: console_unlock() by pid=2863, comm=chvt [ 98.465031] [ T281] printk: console lock API call [67]: console_lock() by pid=281, comm=kworker/3:2+events [ 98.465105] [ T281] printk: console lock API call [68]: console_unlock() by pid=281, comm=kworker/3:2+events [ 98.465152] [ T1426] printk: console lock API call [69]: console_lock() by pid=1426, comm=Xorg [ 98.465244] [ T2863] printk: console lock API call [70]: console_lock() by pid=2863, comm=chvt [ 98.465284] [ T1426] printk: console lock API call [71]: console_unlock() by pid=1426, comm=Xorg [ 98.465388] [ T2863] printk: console lock API call [72]: console_unlock() by pid=2863, comm=chvt Cheers ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-01-31 22:22 ` ysard @ 2026-02-02 11:02 ` Petr Mladek 2026-02-03 1:32 ` ysard 0 siblings, 1 reply; 16+ messages in thread From: Petr Mladek @ 2026-02-02 11:02 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Sat 2026-01-31 23:22:09, ysard wrote: > Here are the logs, debug_console_lock is disabled 4 seconds after each test > to avoid consuming the 100 reports (kworker events are frequent here). > > > It would be nice to provide three logs with this patch: > > > > 1. console_lock() API called by > > echo suspend >/proc/driver/nvidia/suspend > > [ 65.417500] [ T88] printk: console lock API call [1]: console_lock() by pid=88, comm=kworker/4:1+events > [ 65.417581] [ T88] printk: console lock API call [2]: console_unlock() by pid=88, comm=kworker/4:1+events > [ 74.984380] [ T88] printk: console lock API call [3]: console_lock() by pid=88, comm=kworker/4:1+events > [ 74.984450] [ T88] printk: console lock API call [4]: console_unlock() by pid=88, comm=kworker/4:1+events > [ 75.134088] [ T88] printk: console lock API call [5]: console_lock() by pid=88, comm=kworker/4:1+events > [ 75.134289] [ T88] printk: console lock API call [6]: console_unlock() by pid=88, comm=kworker/4:1+events Could you please provide the full log, including the backtraces? I am curious about the callers of the console_lock()... Best Regards, Petr ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-02-02 11:02 ` Petr Mladek @ 2026-02-03 1:32 ` ysard 2026-02-03 14:11 ` Petr Mladek 0 siblings, 1 reply; 16+ messages in thread From: ysard @ 2026-02-03 1:32 UTC (permalink / raw) To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky On 2026-02-02 12:02:07, Petr Mladek wrote: > Could you please provide the full log, including the backtraces? > I am curious about the callers of the console_lock()... Sorry, I had been on the same grep since the beginning. Here are the complete logs. > 1. console_lock() API called by > echo suspend >/proc/driver/nvidia/suspend https://pastebin.com/6gv0bF1c > 2. console_lock() API called by the suspend test > > No freeze (expected): > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && systemctl suspend" https://pastebin.com/10JBnnRx > 3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.: > $ sudo sh -c " > mkdir -p /var/run/nvidia-sleep \ > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > && chvt 63 \ > && echo suspend >/proc/driver/nvidia/suspend \ > && systemctl suspend" https://pastebin.com/nqks8h7Z Cheers ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state 2026-02-03 1:32 ` ysard @ 2026-02-03 14:11 ` Petr Mladek 0 siblings, 0 replies; 16+ messages in thread From: Petr Mladek @ 2026-02-03 14:11 UTC (permalink / raw) To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky On Tue 2026-02-03 02:32:46, ysard wrote: > On 2026-02-02 12:02:07, Petr Mladek wrote: > > Could you please provide the full log, including the backtraces? > > I am curious about the callers of the console_lock()... > > Sorry, I had been on the same grep since the beginning. > Here are the complete logs. > > > > 1. console_lock() API called by > > echo suspend >/proc/driver/nvidia/suspend > > https://pastebin.com/6gv0bF1c > > > 2. console_lock() API called by the suspend test > > > > No freeze (expected): > > $ sudo sh -c " > > mkdir -p /var/run/nvidia-sleep \ > > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > > && chvt 63 \ > > && systemctl suspend" > > https://pastebin.com/10JBnnRx > > > 3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.: > > $ sudo sh -c " > > mkdir -p /var/run/nvidia-sleep \ > > && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \ > > && chvt 63 \ > > && echo suspend >/proc/driver/nvidia/suspend \ > > && systemctl suspend" > > https://pastebin.com/nqks8h7Z Thanks a lot for the complete logs. Honestly, I do not see any obvious ansver on our guestions. 1. Backtraces with nvidia functions =================================== I have hoped that the logs would include some backtraces where a function from the nvidia driver would try to get console_lock(). But I can't find any. Especially the 1st log from "echo suspend >/proc/driver/nvidia/suspend" does not include any nvidia code. Note that there are functions from the nvidia driver in the 3rd log. But they are all with '?', for example: 2026-01-31T23:01:51.032779+01:00 kernel: [ T1426] printk: console lock API call [21]: console_lock() by pid=1426, comm=Xorg 2026-01-31T23:01:51.032793+01:00 kernel: [ T1426] Call Trace: 2026-01-31T23:01:51.032794+01:00 kernel: [ T1426] <TASK> 2026-01-31T23:01:51.032796+01:00 kernel: [ T1426] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:51.032797+01:00 kernel: [ T1426] console_lock+0x1a/0x60 2026-01-31T23:01:51.032798+01:00 kernel: [ T1426] vt_ioctl+0xe6d/0x1430 2026-01-31T23:01:51.032799+01:00 kernel: [ T1426] tty_ioctl+0x261/0x8d0 2026-01-31T23:01:51.032799+01:00 kernel: [ T1426] ? _nv023604rm+0x9/0x50 [nvidia] 2026-01-31T23:01:51.032800+01:00 kernel: [ T1426] ? _nv023631rm+0x28/0xf0 [nvidia] 2026-01-31T23:01:51.032801+01:00 kernel: [ T1426] __x64_sys_ioctl+0x96/0xe0 2026-01-31T23:01:51.032812+01:00 kernel: [ T1426] do_syscall_64+0x85/0x640 2026-01-31T23:01:51.032814+01:00 kernel: [ T1426] ? _nv011691rm+0xbe/0x100 [nvidia] 2026-01-31T23:01:51.032814+01:00 kernel: [ T1426] ? _nv037812rm+0x11a/0x160 [nvidia] It means that the addresses were found on stack but they were not part of the current call chain. The adresses might be on stack because they were passed as parameters or so. But more likely they are there as a garbage (return adresses) from previous stack users. 2. console_(try)lock() callers with suspended consoles ====================================================== The regression has gone after we restored the original behavior of console_(try/un)lock() API after console_suspend_all(). I would expect that a console_(try/un)lock() API caller would get blocked or do an out-of-memory access or so. But there seems to be only one caller in the 3rd log. Let me explain. The are 4 console_lock()/console_unlock() calls from the pr_flush() in console_suspend_all(). For example, the 4th one is: 2026-01-31T23:01:57.758969+01:00 kernel: [ T2628] printk: console lock API call [50]: console_unlock() by pid=2628, comm=systemd-sleep 2026-01-31T23:01:57.758971+01:00 kernel: [ T2628] Call Trace: 2026-01-31T23:01:57.758972+01:00 kernel: [ T2628] <TASK> 2026-01-31T23:01:57.758973+01:00 kernel: [ T2628] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:57.758975+01:00 kernel: [ T2628] console_unlock+0x22/0xf0 2026-01-31T23:01:57.758976+01:00 kernel: [ T2628] ? down+0x1e/0x70 2026-01-31T23:01:57.758978+01:00 kernel: [ T2628] __pr_flush+0x11e/0x280 2026-01-31T23:01:57.758979+01:00 kernel: [ T2628] ? _printk+0x54/0x60 2026-01-31T23:01:57.758980+01:00 kernel: [ T2628] console_suspend_all+0x27/0xc0 2026-01-31T23:01:57.758981+01:00 kernel: [ T2628] suspend_devices_and_enter+0x11b/0xa20 2026-01-31T23:01:57.758983+01:00 kernel: [ T2628] pm_suspend+0x19b/0x620 2026-01-31T23:01:57.758984+01:00 kernel: [ T2628] state_store+0x6c/0xd0 Then there is one more console_lock() call, right before printing the message into the log: 2026-01-31T23:01:57.759223+01:00 kernel: [ T2628] printk: console lock API call [51]: console_lock() by pid=2628, comm=systemd-sleep 2026-01-31T23:01:57.759225+01:00 kernel: [ T2628] Call Trace: 2026-01-31T23:01:57.759227+01:00 kernel: [ T2628] <TASK> 2026-01-31T23:01:57.759229+01:00 kernel: [ T2628] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:57.759230+01:00 kernel: [ T2628] console_lock+0x1a/0x60 2026-01-31T23:01:57.759232+01:00 kernel: [ T2628] console_suspend_all+0x41/0xc0 2026-01-31T23:01:57.759233+01:00 kernel: [ T2628] suspend_devices_and_enter+0x11b/0xa20 2026-01-31T23:01:57.759248+01:00 kernel: [ T2628] pm_suspend+0x19b/0x620 2026-01-31T23:01:57.759251+01:00 kernel: [ T2628] state_store+0x6c/0xd0 2026-01-31T23:01:57.759253+01:00 kernel: [ T2628] kernfs_fop_write_iter+0x130/0x210 2026-01-31T23:01:57.759255+01:00 kernel: [ T2628] vfs_write+0x274/0x430 2026-01-31T23:01:57.759256+01:00 kernel: [ T2628] ksys_write+0x5e/0xe0 2026-01-31T23:01:57.759258+01:00 kernel: [ T2628] do_syscall_64+0x85/0x640 2026-01-31T23:01:57.759259+01:00 kernel: [ T2628] ? ___pte_offset_map+0x1b/0x110 2026-01-31T23:01:57.759260+01:00 kernel: [ T2628] ? __handle_mm_fault+0xbdb/0x1020 2026-01-31T23:01:57.759261+01:00 kernel: [ T2628] ? count_memcg_events+0xdd/0x1a0 2026-01-31T23:01:57.759263+01:00 kernel: [ T2628] ? handle_mm_fault+0xbf/0x300 2026-01-31T23:01:57.759264+01:00 kernel: [ T2628] ? do_user_addr_fault+0x2c1/0x880 2026-01-31T23:01:57.759266+01:00 kernel: [ T2628] ? irqentry_exit+0x7b/0x6b0 2026-01-31T23:01:57.759267+01:00 kernel: [ T2628] entry_SYSCALL_64_after_hwframe+0x76/0x7e 2026-01-31T23:01:57.759269+01:00 kernel: [ T2628] RIP: 0033:0x7f961aa5e687 2026-01-31T23:01:57.759270+01:00 kernel: [ T2628] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff 2026-01-31T23:01:57.759272+01:00 kernel: [ T2628] RSP: 002b:00007ffde2cb0040 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 2026-01-31T23:01:57.759273+01:00 kernel: [ T2628] RAX: ffffffffffffffda RBX: 00007f961b0f8c80 RCX: 00007f961aa5e687 2026-01-31T23:01:57.759275+01:00 kernel: [ T2628] RDX: 0000000000000004 RSI: 000055993c9fee10 RDI: 0000000000000007 2026-01-31T23:01:57.759276+01:00 kernel: [ T2628] RBP: 000055993c9fee10 R08: 0000000000000000 R09: 0000000000000000 2026-01-31T23:01:57.759278+01:00 kernel: [ T2628] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004 2026-01-31T23:01:57.759279+01:00 kernel: [ T2628] R13: 000055993c9f62a0 R14: 00007f961abb4e80 R15: 00007ffde2cb01b0 2026-01-31T23:01:57.759280+01:00 kernel: [ T2628] </TASK> 2026-01-31T23:01:57.759282+01:00 kernel: [ T2628] printk: console_suspend_all Now, here is the critical section where the console_lock() API behaves differently. But there there is no caller until the system gets suspended: 2026-01-31T23:01:57.759283+01:00 kernel: [ T2641] sd 0:0:0:0: [sdc] Synchronizing SCSI cache 2026-01-31T23:01:57.759285+01:00 kernel: [ T269] ata1.00: Entering standby power mode 2026-01-31T23:01:57.759286+01:00 kernel: [ T2638] sd 1:0:0:0: [sda] Synchronizing SCSI cache 2026-01-31T23:01:57.759288+01:00 kernel: [ T2639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache 2026-01-31T23:01:57.759289+01:00 kernel: [ T271] ata2.00: Entering standby power mode 2026-01-31T23:01:57.759290+01:00 kernel: [ T273] ata3.00: Entering standby power mode 2026-01-31T23:01:57.759291+01:00 kernel: [ T2628] ACPI: EC: interrupt blocked 2026-01-31T23:01:57.759293+01:00 kernel: [ T2628] ACPI: PM: Preparing to enter system sleep state S3 2026-01-31T23:01:57.759294+01:00 kernel: [ T2628] ACPI: EC: event blocked 2026-01-31T23:01:57.759296+01:00 kernel: [ T2628] ACPI: EC: EC stopped 2026-01-31T23:01:57.759297+01:00 kernel: [ T2628] ACPI: PM: Saving platform NVS memory 2026-01-31T23:01:57.759299+01:00 kernel: [ T2628] Disabling non-boot CPUs ... 2026-01-31T23:01:57.759300+01:00 kernel: [ T2628] smpboot: CPU 7 is now offline 2026-01-31T23:01:57.759301+01:00 kernel: [ T2628] smpboot: CPU 6 is now offline 2026-01-31T23:01:57.759303+01:00 kernel: [ T2628] smpboot: CPU 5 is now offline 2026-01-31T23:01:57.759304+01:00 kernel: [ T2628] smpboot: CPU 4 is now offline 2026-01-31T23:01:57.759306+01:00 kernel: [ T2628] smpboot: CPU 3 is now offline 2026-01-31T23:01:57.759307+01:00 kernel: [ T2628] smpboot: CPU 2 is now offline 2026-01-31T23:01:57.759309+01:00 kernel: [ T2628] smpboot: CPU 1 is now offline Then the resume starts: 2026-01-31T23:01:57.759310+01:00 kernel: [ T2628] ACPI: PM: Low-level resume complete 2026-01-31T23:01:57.759312+01:00 kernel: [ T2628] ACPI: EC: EC started 2026-01-31T23:01:57.759313+01:00 kernel: [ T2628] ACPI: PM: Restoring platform NVS memory 2026-01-31T23:01:57.759314+01:00 kernel: [ T2628] Enabling non-boot CPUs ... 2026-01-31T23:01:57.759316+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 1 APIC 0x2 2026-01-31T23:01:57.759317+01:00 kernel: [ T2628] CPU1 is up 2026-01-31T23:01:57.759319+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 2 APIC 0x4 2026-01-31T23:01:57.759320+01:00 kernel: [ T2628] CPU2 is up 2026-01-31T23:01:57.759321+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 3 APIC 0x6 2026-01-31T23:01:57.759323+01:00 kernel: [ T2628] CPU3 is up 2026-01-31T23:01:57.759324+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 4 APIC 0x1 2026-01-31T23:01:57.759325+01:00 kernel: [ T2628] CPU4 is up 2026-01-31T23:01:57.759326+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 5 APIC 0x3 2026-01-31T23:01:57.759328+01:00 kernel: [ T2628] CPU5 is up 2026-01-31T23:01:57.759329+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 6 APIC 0x5 2026-01-31T23:01:57.759330+01:00 kernel: [ T2628] CPU6 is up 2026-01-31T23:01:57.759332+01:00 kernel: [ T2628] smpboot: Booting Node 0 Processor 7 APIC 0x7 2026-01-31T23:01:57.759333+01:00 kernel: [ T2628] CPU7 is up 2026-01-31T23:01:57.759335+01:00 kernel: [ T2628] ACPI: PM: Waking up from system sleep state S3 2026-01-31T23:01:57.759336+01:00 kernel: [ T2628] ACPI: EC: interrupt unblocked 2026-01-31T23:01:57.759337+01:00 kernel: [ T70] usb usb1: root hub lost power or was reset 2026-01-31T23:01:57.759339+01:00 kernel: [ T2650] xhci_hcd 0000:00:14.0: xHC error in resume, USBSTS 0x411, Reinit 2026-01-31T23:01:57.759340+01:00 kernel: [ T2650] usb usb3: root hub lost power or was reset 2026-01-31T23:01:57.759342+01:00 kernel: [ T2650] usb usb4: root hub lost power or was reset 2026-01-31T23:01:57.759343+01:00 kernel: [ T2628] ACPI: EC: event unblocked 2026-01-31T23:01:57.759344+01:00 kernel: [ T2655] usb usb2: root hub lost power or was reset and finally, there is a console_lock API caller: 2026-01-31T23:01:57.759347+01:00 kernel: [ T97] printk: console lock API call [52]: console_lock() by pid=97, comm=kworker/2:1+events 2026-01-31T23:01:57.759349+01:00 kernel: [ T97] Call Trace: 2026-01-31T23:01:57.759350+01:00 kernel: [ T97] <TASK> 2026-01-31T23:01:57.759351+01:00 kernel: [ T97] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:57.759353+01:00 kernel: [ T97] console_lock+0x1a/0x60 2026-01-31T23:01:57.759354+01:00 kernel: [ T97] console_callback+0x14/0x1e0 2026-01-31T23:01:57.759355+01:00 kernel: [ T97] process_one_work+0x19b/0x3c0 2026-01-31T23:01:57.759357+01:00 kernel: [ T97] worker_thread+0x256/0x370 2026-01-31T23:01:57.759358+01:00 kernel: [ T97] ? __pfx_worker_thread+0x10/0x10 2026-01-31T23:01:57.759360+01:00 kernel: [ T97] kthread+0xef/0x230 2026-01-31T23:01:57.759361+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759362+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759363+01:00 kernel: [ T97] ret_from_fork+0x20b/0x260 2026-01-31T23:01:57.759365+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759366+01:00 kernel: [ T97] ret_from_fork_asm+0x1a/0x30 2026-01-31T23:01:57.759367+01:00 kernel: [ T97] </TASK> 2026-01-31T23:01:57.759369+01:00 kernel: [ T97] printk: console lock API call [53]: console_unlock() by pid=97, comm=kworker/2:1+events 2026-01-31T23:01:57.759370+01:00 kernel: [ T97] Call Trace: 2026-01-31T23:01:57.759371+01:00 kernel: [ T97] <TASK> 2026-01-31T23:01:57.759373+01:00 kernel: [ T97] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:57.759374+01:00 kernel: [ T97] console_unlock+0x22/0xf0 2026-01-31T23:01:57.759376+01:00 kernel: [ T97] process_one_work+0x19b/0x3c0 2026-01-31T23:01:57.759377+01:00 kernel: [ T97] worker_thread+0x256/0x370 2026-01-31T23:01:57.759378+01:00 kernel: [ T97] ? __pfx_worker_thread+0x10/0x10 2026-01-31T23:01:57.759380+01:00 kernel: [ T97] kthread+0xef/0x230 2026-01-31T23:01:57.759381+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759382+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759384+01:00 kernel: [ T97] ret_from_fork+0x20b/0x260 2026-01-31T23:01:57.759385+01:00 kernel: [ T97] ? __pfx_kthread+0x10/0x10 2026-01-31T23:01:57.759386+01:00 kernel: [ T97] ret_from_fork_asm+0x1a/0x30 2026-01-31T23:01:57.759388+01:00 kernel: [ T97] </TASK> It seems to be from the console_callback() from the generic drivers/tty/vt/vt.c code. Sigh, I am not sure if it does anything which might be affected by the state of the nvidia driver. Mabye, we should add more people into Cc who are more familiar with the VT code and graphical drivers. Anyway, this was the last and only call before console_resume_all() was called: 2026-01-31T23:01:57.759389+01:00 kernel: [ T2635] usb 1-1: reset high-speed USB device number 2 using ehci-pci 2026-01-31T23:01:57.759390+01:00 kernel: [ T2652] usb 2-1: reset high-speed USB device number 2 using ehci-pci 2026-01-31T23:01:57.759392+01:00 kernel: [ T97] psmouse serio4: synaptics: queried max coordinates: x [..5660], y [..4746] 2026-01-31T23:01:57.759393+01:00 kernel: [ T273] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) 2026-01-31T23:01:57.759394+01:00 kernel: [ T271] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) 2026-01-31T23:01:57.759396+01:00 kernel: [ T273] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out 2026-01-31T23:01:57.759397+01:00 kernel: [ T271] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out 2026-01-31T23:01:57.759398+01:00 kernel: [ T271] ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out 2026-01-31T23:01:57.759581+01:00 kernel: [ T273] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out 2026-01-31T23:01:57.759584+01:00 kernel: [ T273] ata3.00: supports DRM functions and may not be fully accessible 2026-01-31T23:01:57.759586+01:00 kernel: [ T271] ata2.00: supports DRM functions and may not be fully accessible 2026-01-31T23:01:57.759588+01:00 kernel: [ T88] sd 2:0:0:0: [sdb] Starting disk 2026-01-31T23:01:57.759589+01:00 kernel: [ T273] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out 2026-01-31T23:01:57.759591+01:00 kernel: [ T273] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out 2026-01-31T23:01:57.759593+01:00 kernel: [ T273] ata3.00: supports DRM functions and may not be fully accessible 2026-01-31T23:01:57.759594+01:00 kernel: [ T179] sd 1:0:0:0: [sda] Starting disk 2026-01-31T23:01:57.759596+01:00 kernel: [ T273] ata3.00: configured for UDMA/133 2026-01-31T23:01:57.759597+01:00 kernel: [ T271] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out 2026-01-31T23:01:57.759599+01:00 kernel: [ T271] ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out 2026-01-31T23:01:57.759600+01:00 kernel: [ T97] psmouse serio4: synaptics: queried min coordinates: x [1282..], y [1106..] 2026-01-31T23:01:57.759602+01:00 kernel: [ T271] ata2.00: supports DRM functions and may not be fully accessible 2026-01-31T23:01:57.759603+01:00 kernel: [ T271] ata2.00: configured for UDMA/133 2026-01-31T23:01:57.759605+01:00 kernel: [ T2667] usb 3-4: reset low-speed USB device number 3 using xhci_hcd 2026-01-31T23:01:57.759606+01:00 kernel: [ T2635] usb 1-1.3: reset high-speed USB device number 3 using ehci-pci 2026-01-31T23:01:57.759608+01:00 kernel: [ T2665] usb 3-1: reset high-speed USB device number 2 using xhci_hcd 2026-01-31T23:01:57.759609+01:00 kernel: [ T2628] printk: console_resume_all 2026-01-31T23:01:57.759610+01:00 kernel: [ T2628] printk: console lock API call [54]: console_unlock() by pid=2628, comm=systemd-sleep 2026-01-31T23:01:57.759612+01:00 kernel: [ T2628] Call Trace: 2026-01-31T23:01:57.759613+01:00 kernel: [ T2628] <TASK> 2026-01-31T23:01:57.759614+01:00 kernel: [ T2628] print_console_lock_caller+0x154/0x220 2026-01-31T23:01:57.759616+01:00 kernel: [ T2628] console_unlock+0x22/0xf0 2026-01-31T23:01:57.759617+01:00 kernel: [ T2628] console_resume_all+0xe4/0x100 2026-01-31T23:01:57.759619+01:00 kernel: [ T2628] suspend_devices_and_enter+0x340/0xa20 2026-01-31T23:01:57.759620+01:00 kernel: [ T2628] pm_suspend+0x19b/0x620 2026-01-31T23:01:57.759621+01:00 kernel: [ T2628] state_store+0x6c/0xd0 So, the problem might be in the console_callback() callbacks. But it seems that most code paths are not called. Otherwise, WARN_CONSOLE_UNLOCKED() would warned. Note that the reverted console_lock() does not set "console_locked" when suspended. Or we might miss some calls. For example, console_trylock() calls in vprintk_emit() and fb_flashcursor() are not logged. But again a lot of vt code is guarded by WARN_CONSOLE_UNLOCKED(). And these parts were not called almost for sure. Hmm, I am not sure how to move forward. One more idea is to try shuffling the code in console_trylock()/ console_lock()/console_unlock() and see when the regression will be visible again. I mean to restore the "broken" behavior step by step: 1. set console_locked = 1 even when console_suspended 2. set console_may_schedule = 1 even when console_suspended 3. call __console_flush_and_unlock() even when console_suspended I wonder which particular change would break the suspend. It might be another clue. Any other ideas are welcome. Best Regards, Petr ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-02-03 14:11 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-21 22:42 Regression: system freeze on resume from suspend introduced by printk per-console suspended state ysard_git
2025-12-23 6:20 ` John Ogness
[not found] ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12>
2026-01-08 0:05 ` pv
2026-01-08 9:43 ` John Ogness
2026-01-23 7:44 ` ysard
2026-01-23 12:19 ` Petr Mladek
2026-01-24 1:22 ` ysard
2026-01-28 14:00 ` Petr Mladek
2026-01-28 15:25 ` John Ogness
2026-01-29 9:34 ` ysard
2026-01-30 15:56 ` Petr Mladek
2026-01-30 16:28 ` Petr Mladek
2026-01-31 22:22 ` ysard
2026-02-02 11:02 ` Petr Mladek
2026-02-03 1:32 ` ysard
2026-02-03 14:11 ` Petr Mladek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox