public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Regression: system freeze on resume from suspend introduced by printk per-console suspended state
@ 2025-12-21 22:42 ysard_git
  2025-12-23  6:20 ` John Ogness
  0 siblings, 1 reply; 16+ messages in thread
From: ysard_git @ 2025-12-21 22:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: john.ogness, pmladek, senozhatsky

Hello,

I would like to report a suspend/resume regression which I bisected down
to a specific printk commit.

Summary
=======

On an ASUS laptop, resuming from suspend leads to a complete system freeze:
black screen, no keyboard response. The system must be powered off forcibly.

This regression is reproducible and was introduced by the following commit:

9e70a5e109a4a ("printk: Add per-console suspended state")

Hardware / environment
======================

- Laptop: G75VX ASUS
- CPU Arch: x86_64
- BIOS Information
    Version: G75VX.206
    Release Date: 02/27/2013
- GPU: NVIDIA GeForce GTX 670MX (standalone board only, no integrated chip, no optimus)
    Driver Version: 470.256.02
- Suspend mode: suspend-to-RAM (S3)
- Distribution: Debian Testing derivative (but the regression is reproduced with mainline kernel builds)
- Last working kernel from Debian:
    Linux version 6.5.0-5-amd64 (debian-kernel@lists.debian.org)
    (gcc-13 (Debian 13.2.0-7) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41)
    #1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1 (2023-11-29)
- Kernel config: based on defconfig / Debian config
- Taint flag: yes, only Nvidia proprietary drivers are loaded and compiled via DKMS

Symptoms
========

- On resume from suspend to RAM
- Screen remains black
- No visible kernel panic (no logs)
- System is completely frozen (no VT switch, no Caps lock nor Num lock responses)
- Power cycle required

Regression details
==================

Suspend/resume works correctly with kernels prior to this commit.
After this commit, resume consistently results in a hard freeze.

I performed a git bisect on the upstream kernel repository. The result is:

9e70a5e109a4a23367810de09be826c52d27ee2f is the first bad commit

Commit details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e70a5e109a4a23367810de09be826c52d27ee2f

Reverting this commit on a later kernel (v6.7 0dd3ee31125508cd67f7e7172247f05b7fd1753a)
restores correct suspend/resume behavior on this machine.

The problem is still present on stable v6.18.2.

Additional notes
================

By playing around with the tests in /sys/power/, I can say that the problem only
occurs during a real S3 and not a simulated one.

No problem on resume:
$ echo core > /sys/power/pm_test
$ echo deep > /sys/power/mem_sleep
$ systemctl suspend

Freeze on resume:
$ echo none > /sys/power/pm_test
$ echo deep > /sys/power/mem_sleep
$ systemctl suspend

The issue appears hardware-dependent and may be related to the interaction
between printk console usability and resume ordering. The system likely
blocks silently during resume before the graphics stack becomes usable.

If additional debugging, logs, or testing are needed, I am available to
provide them.

Thank you for your time and for any guidance on how to proceed.

Best regards.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2025-12-21 22:42 Regression: system freeze on resume from suspend introduced by printk per-console suspended state ysard_git
@ 2025-12-23  6:20 ` John Ogness
       [not found]   ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12>
  0 siblings, 1 reply; 16+ messages in thread
From: John Ogness @ 2025-12-23  6:20 UTC (permalink / raw)
  To: ysard_git, linux-kernel; +Cc: pmladek, senozhatsky

Hi,

On 2025-12-21, ysard_git@gmx.fr wrote:
> I would like to report a suspend/resume regression which I bisected down
> to a specific printk commit.

Thanks for the report and your efforts to isolate the commit! Comments
below...

> Summary
> =======
>
> On an ASUS laptop, resuming from suspend leads to a complete system freeze:
> black screen, no keyboard response. The system must be powered off forcibly.
>
> This regression is reproducible and was introduced by the following commit:
>
> 9e70a5e109a4a ("printk: Add per-console suspended state")
>
> Hardware / environment
> ======================
>
> - Laptop: G75VX ASUS
> - CPU Arch: x86_64
> - BIOS Information
>     Version: G75VX.206
>     Release Date: 02/27/2013
> - GPU: NVIDIA GeForce GTX 670MX (standalone board only, no integrated chip, no optimus)
>     Driver Version: 470.256.02
> - Suspend mode: suspend-to-RAM (S3)
> - Distribution: Debian Testing derivative (but the regression is reproduced with mainline kernel builds)
> - Last working kernel from Debian:
>     Linux version 6.5.0-5-amd64 (debian-kernel@lists.debian.org)
>     (gcc-13 (Debian 13.2.0-7) 13.2.0, GNU ld (GNU Binutils for Debian) 2.41)
>     #1 SMP PREEMPT_DYNAMIC Debian 6.5.13-1 (2023-11-29)
> - Kernel config: based on defconfig / Debian config
> - Taint flag: yes, only Nvidia proprietary drivers are loaded and compiled via DKMS
>
> Symptoms
> ========
>
> - On resume from suspend to RAM
> - Screen remains black
> - No visible kernel panic (no logs)
> - System is completely frozen (no VT switch, no Caps lock nor Num lock responses)
> - Power cycle required
>
> Regression details
> ==================
>
> Suspend/resume works correctly with kernels prior to this commit.
> After this commit, resume consistently results in a hard freeze.
>
> I performed a git bisect on the upstream kernel repository. The result is:
>
> 9e70a5e109a4a23367810de09be826c52d27ee2f is the first bad commit
>
> Commit details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e70a5e109a4a23367810de09be826c52d27ee2f
>
> Reverting this commit on a later kernel (v6.7 0dd3ee31125508cd67f7e7172247f05b7fd1753a)
> restores correct suspend/resume behavior on this machine.

Interesting. This commit removes an ugly console_lock hack to supress
console flushing during the suspend/resume window.

> The problem is still present on stable v6.18.2.

It is great that you can reproduce it with 6.18.2. Are you also able to
test 6.19-rc1 and reproduce it? It would be nice to attack the problem
from the latest code first.

> Additional notes
> ================
>
> By playing around with the tests in /sys/power/, I can say that the problem only
> occurs during a real S3 and not a simulated one.
>
> No problem on resume:
> $ echo core > /sys/power/pm_test
> $ echo deep > /sys/power/mem_sleep
> $ systemctl suspend
>
> Freeze on resume:
> $ echo none > /sys/power/pm_test
> $ echo deep > /sys/power/mem_sleep
> $ systemctl suspend
>
> The issue appears hardware-dependent and may be related to the interaction
> between printk console usability and resume ordering. The system likely
> blocks silently during resume before the graphics stack becomes usable.
>
> If additional debugging, logs, or testing are needed, I am available to
> provide them.

It would be helpful to know if you are doing anything special (like
using "no_console_suspend") or have multiple consoles. Could you send
the output of:

$ cat /proc/consoles

   and 

$ cat /proc/cmdline

Once we have this information, we can start to dig deeper.

John Ogness

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
       [not found]   ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12>
@ 2026-01-08  0:05     ` pv
  2026-01-08  9:43       ` John Ogness
  0 siblings, 1 reply; 16+ messages in thread
From: pv @ 2026-01-08  0:05 UTC (permalink / raw)
  To: ysard_git; +Cc: John Ogness, linux-kernel, pmladek, senozhatsky

Hi, I am resending the message from December 23 because lkml considered it as spam.

First, thank you for your response.

> It is great that you can reproduce it with 6.18.2. Are you also able to
> test 6.19-rc1 and reproduce it? It would be nice to attack the problem
> from the latest code first.

The code in 6.18.2 has changed enough that I don't dare change it as I did in 6.7.
After compiling and installing 6.19-rc1 (8f0b4cce4481fb22653697cced8d0d04027cb1e8),
I can confirm that the freeze is still present.

> It would be helpful to know if you are doing anything special (like
> using "no_console_suspend") or have multiple consoles. Could you send
> the output of:
>
> $ cat /proc/consoles
>
>    and
>
> $ cat /proc/cmdline
>
> Once we have this information, we can start to dig deeper.

$ cat /proc/consoles
tty0                 -WU (EC p  )    4:2

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-5-amd64 root=/dev/mapper/vg0-root ro cryptdevice=UUID=397b50bf-8142-4456-8af2-bf39dcc8f732:cryptdisk:allow-discards mitigations=off quiet splash

It's similar for 6.19.0-rc1.

$ cat /sys/module/printk/parameters/console_suspend
Y

Thank you for your time.
Best regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-08  0:05     ` pv
@ 2026-01-08  9:43       ` John Ogness
  2026-01-23  7:44         ` ysard
  0 siblings, 1 reply; 16+ messages in thread
From: John Ogness @ 2026-01-08  9:43 UTC (permalink / raw)
  To: pv, ysard_git; +Cc: linux-kernel, pmladek, senozhatsky

[-- Attachment #1: Type: text/plain, Size: 1709 bytes --]

On 2026-01-08, pv <pierre.vignet@caramail.fr> wrote:
> The code in 6.18.2 has changed enough that I don't dare change it as I did in 6.7.
> After compiling and installing 6.19-rc1 (8f0b4cce4481fb22653697cced8d0d04027cb1e8),
> I can confirm that the freeze is still present.

Thanks for confirmation. Would you be willing to try a debugging patch?

I have attached a patch (based on 6.19-rc4). It should restore the old
console_lock behavior during suspend/resume. Assuming this works for
you, it also adds some debugging information so that we can figure out
who is locking the console.

Please compile with CONFIG_PRINTK_CALLER=y.

After suspend/resume, locate the debug lines with:

# dmesg | grep printk

It will probably look something like this:

# dmesg | grep printk
[   12.085727][  T843] printk: Suspending console(s) (use no_console_suspend to debug)
[   12.236199][  T843] printk: console_suspend
[   17.581734][   T67] printk: console_lock
[   17.583711][   T67] printk: console_unlock
[   17.644819][  T843] printk: console_lock
[   17.644823][  T843] printk: console_unlock
[   17.644827][  T843] printk: console_resume

If you have CONFIG_PRINTK_CALLER enabled (as I do here), it will show
the tasks locking the console. In my case it is PID 67 and 843. It would
be nice to know who these are. For me it is:

# cat /proc/67/comm
kworker/2:1-mm_percpu_wq

# cat /proc/843/comm
sh

So in summary, I would like to know:

1. Does suspend/resume work on 6.19-rc4 with this patch applied?

2. Output of "dmesg | grep printk" (after suspend/resume)

3. Which tasks were locking the console between console_suspend and
console_resume?

Your help in debugging this is greatly appreciated.

John Ogness


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-printk-Debug-new-vs.-old-suspend-resume-behavior.patch --]
[-- Type: text/x-diff, Size: 3581 bytes --]

From f3bb7d49b59b61a30f2fefc2b241d3f535b2afd2 Mon Sep 17 00:00:00 2001
From: John Ogness <john.ogness@linutronix.de>
Date: Thu, 8 Jan 2026 09:31:51 +0000
Subject: [PATCH] printk: Debug new vs. old suspend/resume behavior

This is just for debugging. It should restore the old console_lock
behavior for suspend/resume and also adds some debugging information.
Please compile with CONFIG_PRINTK_CALLER=y so that we can see which
tasks are locking/unlocking the console during suspend/resume.
---
 kernel/printk/printk.c | 61 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..fd69cab4368e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip)
  */
 static int console_locked;
 
+static int console_suspended;
+
+int vprintk_store(int facility, int level,
+		  const struct dev_printk_info *dev_info,
+		  const char *fmt, va_list args);
+
+/* Helper function to store-only. */
+static void printk_store(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
+	va_end(args);
+}
+
 /*
  *	Array of consoles built from command line options (console=)
  */
@@ -2841,6 +2857,15 @@ void console_lock(void)
 		msleep(1000);
 
 	down_console_sem();
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Keep console locked, but do not touch
+		 * @console_locked or @console_may_schedule.
+		 * (Although they will both be 1 here anyway.)
+		 */
+		return;
+	}
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2861,6 +2886,15 @@ int console_trylock(void)
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * The lock was acquired, but unlock directly and report
+		 * failure. Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return 0;
+	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -3354,6 +3388,16 @@ void console_unlock(void)
 {
 	struct console_flush_type ft;
 
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Simply unlock directly.
+		 * Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return;
+	}
+
 	printk_get_console_flush_type(&ft);
 	if (ft.legacy_direct)
 		__console_flush_and_unlock();
@@ -3559,6 +3603,7 @@ struct tty_driver *console_device(int *index)
 void console_suspend(struct console *console)
 {
 	__pr_flush(console, 1000, true);
+
 	console_list_lock();
 	console_srcu_write_flags(console, console->flags & ~CON_ENABLED);
 	console_list_unlock();
@@ -3570,6 +3615,12 @@ void console_suspend(struct console *console)
 	 * using the port.
 	 */
 	synchronize_srcu(&console_srcu);
+
+	console_lock();
+	console_suspended = 1;
+	printk_store(KERN_INFO "printk: %s\n", __func__);
+	/* Unlock directly (i.e. without clearing @console_locked). */
+	up_console_sem();
 }
 EXPORT_SYMBOL(console_suspend);
 
@@ -3597,6 +3648,16 @@ void console_resume(struct console *console)
 		defer_console_output();
 
 	__pr_flush(console, 1000, true);
+
+	down_console_sem();
+	printk_store(KERN_INFO "printk: %s\n", __func__);
+	console_suspended = 0;
+	/*
+	 * Perform a regular unlock.
+	 * Here console_locked=1 and console_may_schedule=1.
+	 * @console_unlocked will be cleared.
+	 */
+	console_unlock();
 }
 EXPORT_SYMBOL(console_resume);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-08  9:43       ` John Ogness
@ 2026-01-23  7:44         ` ysard
  2026-01-23 12:19           ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: ysard @ 2026-01-23  7:44 UTC (permalink / raw)
  To: John Ogness; +Cc: linux-kernel, pmladek, senozhatsky

Good evening, thank you for your reply and the patch.


Summary
======

The patch does not seem to have any effect on the problem, *but* I have found a
way to temporarily fix the freeze by disabling the `nvidia-suspend` service.


Additional info for diagnostics
===============================

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.256.02  Thu May  2 14:37:44 UTC 2024
GCC version:  gcc version 12.5.0 (Debian 12.5.0-6)

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89


Procedure requested
===================

> I have attached a patch (based on 6.19-rc4). It should restore the old
> console_lock behavior during suspend/resume. Assuming this works for
> you, it also adds some debugging information so that we can figure out
> who is locking the console.

I applied the patch. The behavior is the same as before (no resume).

$ uname -r
6.19.0-rc4-dirty

$ dmesg | grep printk
[    0.030102] [      T0] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes
[    0.077779] [      T0] printk: legacy console [tty0] enabled
[  152.678589] [   T1349] printk: Suspending console(s) (use no_console_suspend to debug)
...
no resume


Temporary solution
==================

I had the idea of restarting in recovery mode (rescue.target) to run the test.
The `systemctl suspend` command is not available in this mode, which forced me
to use the `pm-suspend` command, which allows for proper sleep and resume across
all kernel versions that I have been able to test previously.

Systemd triggers a number of services before actually going into sleep mode,
including a call to nvidia-suspend.service, which I disabled
("because it's always nvidia").

The following command restores normal operation of `systemctl suspend`,
including on the first non-functional commit found by the bisect
(9e70a5e109a4a23367810de09be826c52d27ee2f).

$ systemctl disable nvidia-suspend.service

This service calls a script `/usr/bin/nvidia-sleep.sh` that seems to play with
vt consoles and expects that they are still usable (`chvt 63` ?):

    #!/bin/bash

    if [ ! -f /proc/driver/nvidia/suspend ]; then
        exit 0
    fi

    RUN_DIR="/var/run/nvidia-sleep"
    XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number

    PATH="/bin:/usr/bin"

    case "$1" in
        suspend|hibernate)
            mkdir -p "${RUN_DIR}"
            fgconsole > "${XORG_VT_FILE}"
            chvt 63
            if [[ $? -ne 0 ]]; then
                exit $?
            fi
            echo "$1" > /proc/driver/nvidia/suspend
            exit $?
            ;;
        resume)
            echo "$1" > /proc/driver/nvidia/suspend
            #
            # Check if Xorg was determined to be running at the time
            # of suspend, and whether its VT was recorded.  If so,
            # attempt to switch back to this VT.
            #
            if [[ -f "${XORG_VT_FILE}" ]]; then
                XORG_PID=$(cat "${XORG_VT_FILE}")
                rm "${XORG_VT_FILE}"
                chvt "${XORG_PID}"
            fi
            exit 0
            ;;
        *)
            exit 1
    esac


Conclusion
==========

kernel              nvidia-suspend (systemd 259~rc1-1)  result
<  9e70a5e109a4     enabled                             ok
<  9e70a5e109a4     disabled                            ok
>= 9e70a5e109a4     enabled                             freeze
>= 9e70a5e109a4     disabled                            ok

- Reactivating this service causes the freeze to reappear in a reproducible pattern.
- The `pm-suspend` command has never stopped working.

It seems that this is a two-sided problem?
If the kernel is not the issue, I apologize and am sorry for wasting your time;
I should have thought about the layers added by systemd sooner.

Best regards.


Extra
=====

During my tests with 6.19.0-rc1 and 6.19.0-rc4, I noticed that resuming a sleep
test that used to work now fails (it worked in 6.18.2), but I think this is
unrelated and is due to another issue. I am noting this for historical purposes.

$ echo core > /sys/power/pm_test
$ echo deep > /sys/power/mem_sleep

Both commands `pm-suspend` or `systemctl suspend` have the same effect:

- Trigger suspend (`kernel: PM: suspend entry (deep)` in dmesg);
- No response when pressing the power button to wake up;
- Force shutdown by holding down the power button;
- The computer shuts down but the motherboard indicates a state similar to
 sleep mode (LED flashing);
- Pressing the power button starts the computer (fans + HDD spin up) for a
 fraction of a second (<1s) then the machine shuts down;
- Pressing the power button starts the machine normally
 (not a resume from sleep mode).


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-23  7:44         ` ysard
@ 2026-01-23 12:19           ` Petr Mladek
  2026-01-24  1:22             ` ysard
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-01-23 12:19 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Fri 2026-01-23 08:44:39, ysard wrote:
> Good evening, thank you for your reply and the patch.
> 
> 
> Summary
> ======
> 
> The patch does not seem to have any effect on the problem, *but* I have found a
> way to temporarily fix the freeze by disabling the `nvidia-suspend` service.

Great catch!

> Additional info for diagnostics
> ===============================
> 
> $ cat /proc/driver/nvidia/version
> NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.256.02  Thu May  2 14:37:44 UTC 2024
> GCC version:  gcc version 12.5.0 (Debian 12.5.0-6)
> 
> $ nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2019 NVIDIA Corporation
> Built on Wed_Oct_23_19:24:38_PDT_2019
> Cuda compilation tools, release 10.2, V10.2.89
> 
> 
> Procedure requested
> ===================
> 
> > I have attached a patch (based on 6.19-rc4). It should restore the old
> > console_lock behavior during suspend/resume. Assuming this works for
> > you, it also adds some debugging information so that we can figure out
> > who is locking the console.
> 
> I applied the patch. The behavior is the same as before (no resume).
> 
> $ uname -r
> 6.19.0-rc4-dirty
> 
> $ dmesg | grep printk
> [    0.030102] [      T0] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes
> [    0.077779] [      T0] printk: legacy console [tty0] enabled
> [  152.678589] [   T1349] printk: Suspending console(s) (use no_console_suspend to debug)
> ...
> no resume
> 
> 
> Temporary solution
> ==================
> 
> I had the idea of restarting in recovery mode (rescue.target) to run the test.
> The `systemctl suspend` command is not available in this mode, which forced me
> to use the `pm-suspend` command, which allows for proper sleep and resume across
> all kernel versions that I have been able to test previously.
> 
> Systemd triggers a number of services before actually going into sleep mode,
> including a call to nvidia-suspend.service, which I disabled
> ("because it's always nvidia").
> 
> The following command restores normal operation of `systemctl suspend`,
> including on the first non-functional commit found by the bisect
> (9e70a5e109a4a23367810de09be826c52d27ee2f).
> 
> $ systemctl disable nvidia-suspend.service
> 
> This service calls a script `/usr/bin/nvidia-sleep.sh` that seems to play with
> vt consoles and expects that they are still usable (`chvt 63` ?):

I did run chvt with strace and it does something like:

openat(AT_FDCWD, "/dev/tty0", O_RDWR)   = 3
ioctl(3, TCGETS2, {c_iflag=IGNBRK|IGNPAR, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|, c_cflag=B38400|CS8|CREAD, c_lflag=, ...}) = 0
ioctl(3, KDGKBTYPE, [KB_101])           = 0
rt_sigaction(SIGALRM, {sa_handler=0x556bcf0418e0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f3e60a42910}, NULL, 8) = 0
timer_create(CLOCK_REALTIME, {sigev_value={sival_int=1668762552, sival_ptr=0x7ffc63774bb8}, sigev_signo=SIGALRM, sigev_notify=SIGEV_SIGNAL}, [0]) = 0
timer_settime(0, 0, {it_interval={tv_sec=1, tv_nsec=0}, it_value={tv_sec=1, tv_nsec=0}}, NULL) = 0
ioctl(3, VT_ACTIVATE, 0x3f)             = 0
ioctl(3, VT_WAITACTIVE, 0x3f)           = 0

And vt_ioctl(,,VT_ACTIVATE) calls vc_allocate(arg) under
console_lock()...

The commit 9e70a5e109a4a233 ("printk: Add per-console suspended
state") does some changes in console_lock(). It newly sets:

	console_locked = 1;
	console_may_schedule = 1;

I can't see how this might cause the freeze.

Well, for example, it would allow console_conditional_schedule()
to get asleep. And it might be harder to obtain the lock when
the current owner is sleeping.

But it would have an effect only in CONFIG_PREEMPT_VOLUNTARY kernel.
The process might get scheduled even without cond_resched() with
other CONFIG_PREEMPT modes.

Also I would expect that the userspace waits until the services
finish the job before suspending the kernel.

>     #!/bin/bash
> 
>     if [ ! -f /proc/driver/nvidia/suspend ]; then
>         exit 0
>     fi
> 
>     RUN_DIR="/var/run/nvidia-sleep"
>     XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number
> 
>     PATH="/bin:/usr/bin"
> 
>     case "$1" in
>         suspend|hibernate)
>             mkdir -p "${RUN_DIR}"
>             fgconsole > "${XORG_VT_FILE}"
>             chvt 63
>             if [[ $? -ne 0 ]]; then
>                 exit $?
>             fi
>             echo "$1" > /proc/driver/nvidia/suspend
>             exit $?
>             ;;
>         resume)
>             echo "$1" > /proc/driver/nvidia/suspend
>             #
>             # Check if Xorg was determined to be running at the time
>             # of suspend, and whether its VT was recorded.  If so,
>             # attempt to switch back to this VT.
>             #
>             if [[ -f "${XORG_VT_FILE}" ]]; then
>                 XORG_PID=$(cat "${XORG_VT_FILE}")
>                 rm "${XORG_VT_FILE}"
>                 chvt "${XORG_PID}"
>             fi
>             exit 0

I just wonder. Could you please try to comment out the various
commands here and bisect whether the problem is with
"fgconsole", "chvt", or "echo XXX >/proc/driver/nvidia/suspend"
commands.

I mean to try to disable the counter parts in the suspend/resume
code paths and try whether the freeze is still reproducible?

>             ;;
>         *)
>             exit 1
>     esac
> 
> 
> Conclusion
> ==========
> 
> kernel              nvidia-suspend (systemd 259~rc1-1)  result
> <  9e70a5e109a4     enabled                             ok
> <  9e70a5e109a4     disabled                            ok
> >= 9e70a5e109a4     enabled                             freeze
> >= 9e70a5e109a4     disabled                            ok
> 
> - Reactivating this service causes the freeze to reappear in a reproducible pattern.
> - The `pm-suspend` command has never stopped working.
> 
> It seems that this is a two-sided problem?
> If the kernel is not the issue, I apologize and am sorry for wasting your time;
> I should have thought about the layers added by systemd sooner.

There is no need to apologize. It seems to be somehow affected
by the kernel commit. And it would be great to understand what
is going on.

> Extra
> =====
> 
> During my tests with 6.19.0-rc1 and 6.19.0-rc4, I noticed that resuming a sleep
> test that used to work now fails (it worked in 6.18.2), but I think this is
> unrelated and is due to another issue. I am noting this for historical purposes.
> 
> $ echo core > /sys/power/pm_test
> $ echo deep > /sys/power/mem_sleep
> 
> Both commands `pm-suspend` or `systemctl suspend` have the same effect:
> 
> - Trigger suspend (`kernel: PM: suspend entry (deep)` in dmesg);
> - No response when pressing the power button to wake up;
> - Force shutdown by holding down the power button;
> - The computer shuts down but the motherboard indicates a state similar to
>  sleep mode (LED flashing);
> - Pressing the power button starts the computer (fans + HDD spin up) for a
>  fraction of a second (<1s) then the machine shuts down;
> - Pressing the power button starts the machine normally
>  (not a resume from sleep mode).

This would deserve a separate thread. It would be great if you could
bisect it to the problematic commit.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-23 12:19           ` Petr Mladek
@ 2026-01-24  1:22             ` ysard
  2026-01-28 14:00               ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: ysard @ 2026-01-24  1:22 UTC (permalink / raw)
  To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky

On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote:
> Also I would expect that the userspace waits until the services
> finish the job before suspending the kernel.

It does:

    janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep.
    janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
    janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service
    janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service
    janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
    janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
    janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully.
    janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
    janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend...
    janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent.
    janv. 24 00:33:45 systemd[1]: user@1000.service: Unit now frozen-by-parent.
    janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent.
    janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen.
    janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'.
    janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'...
    janv. 24 00:33:45 kernel: PM: suspend entry (deep)


> >     #!/bin/bash
> >
> >     if [ ! -f /proc/driver/nvidia/suspend ]; then
> >         exit 0
> >     fi
> >
> >     RUN_DIR="/var/run/nvidia-sleep"
> >     XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number
> >
> >     PATH="/bin:/usr/bin"
> >
> >     case "$1" in
> >         suspend|hibernate)
> >             mkdir -p "${RUN_DIR}"
> >             fgconsole > "${XORG_VT_FILE}"
> >             chvt 63
> >             if [[ $? -ne 0 ]]; then
> >                 exit $?
> >             fi
> >             echo "$1" > /proc/driver/nvidia/suspend
> >             exit $?
> >             ;;
> >         resume)
> >             echo "$1" > /proc/driver/nvidia/suspend
> >             #
> >             # Check if Xorg was determined to be running at the time
> >             # of suspend, and whether its VT was recorded.  If so,
> >             # attempt to switch back to this VT.
> >             #
> >             if [[ -f "${XORG_VT_FILE}" ]]; then
> >                 XORG_PID=$(cat "${XORG_VT_FILE}")
> >                 rm "${XORG_VT_FILE}"
> >                 chvt "${XORG_PID}"
> >             fi
> >             exit 0
>
> I just wonder. Could you please try to comment out the various
> commands here and bisect whether the problem is with
> "fgconsole", "chvt", or "echo XXX >/proc/driver/nvidia/suspend"
> commands.
>
> I mean to try to disable the counter parts in the suspend/resume
> code paths and try whether the freeze is still reproducible?

Yes I have a reproducible pattern here. With the service disabled.
The service `nvidia-resume.service` (which basically calls the script
with the 'resume' argument) is expected to start if the resume is 
completed, but the system does not reach this stage during the freeze.

No freeze:
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && systemctl suspend"

Freeze:
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && echo suspend >/proc/driver/nvidia/suspend \
    && systemctl suspend"

So the problem is related to this command:
    $ echo suspend >/proc/driver/nvidia/suspend

Note that without the systemctl order this command suspends and wakes up the gpu correctly:
    $ sudo sh -c "
    chvt 63 \
    && echo suspend >/proc/driver/nvidia/suspend; \
    sleep 4; \
    echo resume >/proc/driver/nvidia/suspend; \
    chvt 2"


Best regards.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-24  1:22             ` ysard
@ 2026-01-28 14:00               ` Petr Mladek
  2026-01-28 15:25                 ` John Ogness
  2026-01-29  9:34                 ` ysard
  0 siblings, 2 replies; 16+ messages in thread
From: Petr Mladek @ 2026-01-28 14:00 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Sat 2026-01-24 02:22:41, ysard wrote:
> On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote:
> > Also I would expect that the userspace waits until the services
> > finish the job before suspending the kernel.
> 
> It does:
> 
>     janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep.
>     janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
>     janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service
>     janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service
>     janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
>     janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
>     janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully.
>     janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
>     janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend...
>     janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user@1000.service: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen.
>     janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'.
>     janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'...
>     janv. 24 00:33:45 kernel: PM: suspend entry (deep)

OK.

> Yes I have a reproducible pattern here. With the service disabled.
> The service `nvidia-resume.service` (which basically calls the script
> with the 'resume' argument) is expected to start if the resume is 
> completed, but the system does not reach this stage during the freeze.
> 
> No freeze:
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && systemctl suspend"
> 
> Freeze:
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend \
>     && systemctl suspend"
> 
> So the problem is related to this command:
>     $ echo suspend >/proc/driver/nvidia/suspend
> 
> Note that without the systemctl order this command suspends and wakes up the gpu correctly:
>     $ sudo sh -c "
>     chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend; \
>     sleep 4; \
>     echo resume >/proc/driver/nvidia/suspend; \
>     chvt 2"

Interesting. It looks like the nvidia suspend does something which
breaks the system suspend. But the driver is able to revert it...

To be honest, I do not have any theory which could explain this.

But I have found a bug in John's debug patch from
https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/

The patch tried to restore the original behavior on current mainline.
But console_suspend()/cosnole_resume() function have been renamed recently
to console_suspend_all()/console_resume_all(). The original
names were used for console-specific suspend/resume variants,
see
https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/

Also the debug patch did not revert synchronize_srcu(). I guess that
this was intentional. But I would rather revert it as well because
it is a potentially blocking operation.

Could you please test it with this fixed version of the debug patch?

If the patch helps, by chance, then please try to uncomment
the synchronize_srcu() calls and check if it still works.
I wonder if they make in difference.

From a36b57cbcb239e7e5af4fb8278690cd4965d6fc0 Mon Sep 17 00:00:00 2001
From: John Ogness <john.ogness@linutronix.de>
Date: Thu, 8 Jan 2026 10:49:24 +0106
Subject: [DEBUG v2] printk: Debug new vs. old suspend/resume behavior

This is just for debugging. It should restore the old console_lock
behavior for suspend/resume and also adds some debugging information.
Please compile with CONFIG_PRINTK_CALLER=y so that we can see which
tasks are locking/unlocking the console during suspend/resume.

Changes against v1:

- Set/Clear the global "console_suspended" variable in
  console_suspend_all()/console_restore_all() instead of
  console_suspend()/console_resume().

  The functions have been renamed recently, see
  https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/

- Do not call synchronize_srcu() in the suspend/resume functions.
  They are another potentially blocking operation added by
  the problematic commit 9e70a5e109a4a2336 ("printk: Add per-console
  suspended state").

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 64 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..23fddc4006d3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip)
  */
 static int console_locked;
 
+static int console_suspended;
+
+int vprintk_store(int facility, int level,
+		  const struct dev_printk_info *dev_info,
+		  const char *fmt, va_list args);
+
+/* Helper function to store-only. */
+static void printk_store(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
+	va_end(args);
+}
+
 /*
  *	Array of consoles built from command line options (console=)
  */
@@ -2748,6 +2764,12 @@ void console_suspend_all(void)
 	if (!console_suspend_enabled)
 		return;
 
+	console_lock();
+	console_suspended = 1;
+	printk_store(KERN_INFO "printk: %s\n", __func__);
+	/* Unlock directly (i.e. without clearing @console_locked). */
+	up_console_sem();
+
 	console_list_lock();
 	for_each_console(con)
 		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
@@ -2759,7 +2781,7 @@ void console_suspend_all(void)
 	 * is guaranteed that all printing has stopped when this function
 	 * completes.
 	 */
-	synchronize_srcu(&console_srcu);
+//	synchronize_srcu(&console_srcu);
 }
 
 void console_resume_all(void)
@@ -2785,7 +2807,17 @@ void console_resume_all(void)
 		 * contexts must be able to see they are no longer suspended so
 		 * that they are guaranteed to wake up and resume printing.
 		 */
-		synchronize_srcu(&console_srcu);
+//		synchronize_srcu(&console_srcu);
+
+		down_console_sem();
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		console_suspended = 0;
+		/*
+		 * Perform a regular unlock.
+		 * Here console_locked=1 and console_may_schedule=1.
+		 * @console_unlocked will be cleared.
+		 */
+		console_unlock();
 	}
 
 	printk_get_console_flush_type(&ft);
@@ -2841,6 +2873,15 @@ void console_lock(void)
 		msleep(1000);
 
 	down_console_sem();
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Keep console locked, but do not touch
+		 * @console_locked or @console_may_schedule.
+		 * (Although they will both be 1 here anyway.)
+		 */
+		return;
+	}
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2861,6 +2902,15 @@ int console_trylock(void)
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * The lock was acquired, but unlock directly and report
+		 * failure. Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return 0;
+	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -3354,6 +3404,16 @@ void console_unlock(void)
 {
 	struct console_flush_type ft;
 
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Simply unlock directly.
+		 * Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return;
+	}
+
 	printk_get_console_flush_type(&ft);
 	if (ft.legacy_direct)
 		__console_flush_and_unlock();
-- 
2.52.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-28 14:00               ` Petr Mladek
@ 2026-01-28 15:25                 ` John Ogness
  2026-01-29  9:34                 ` ysard
  1 sibling, 0 replies; 16+ messages in thread
From: John Ogness @ 2026-01-28 15:25 UTC (permalink / raw)
  To: Petr Mladek, ysard; +Cc: linux-kernel, senozhatsky

On 2026-01-28, Petr Mladek <pmladek@suse.com> wrote:
> But I have found a bug in John's debug patch from
> https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/
>
> The patch tried to restore the original behavior on current mainline.
> But console_suspend()/cosnole_resume() function have been renamed recently
> to console_suspend_all()/console_resume_all(). The original
> names were used for console-specific suspend/resume variants,
> see
> https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/
>
> Also the debug patch did not revert synchronize_srcu(). I guess that
> this was intentional.

I was also concerned about synchronize_srcu() being the culprit, which
is why I left it in. I was really surprised the test patch still had
problems.

> But I would rather revert it as well because
> it is a potentially blocking operation.
>
> Could you please test it with this fixed version of the debug patch?
>
> If the patch helps, by chance, then please try to uncomment
> the synchronize_srcu() calls and check if it still works.

Also, if the patch still has the problem, it would be nice to see the
dmesg output with the patch applied when you do only the nvidia
suspend/resume and avoid systemctl.

> I wonder if they make in difference.

Thanks Petr for the new patch version. I am curious what comes of it.

John

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-28 14:00               ` Petr Mladek
  2026-01-28 15:25                 ` John Ogness
@ 2026-01-29  9:34                 ` ysard
  2026-01-30 15:56                   ` Petr Mladek
  1 sibling, 1 reply; 16+ messages in thread
From: ysard @ 2026-01-29  9:34 UTC (permalink / raw)
  To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky

Summary
=======

The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines!

With synchronize_srcu commented (as is)
=======================================

No freeze (expected):
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && systemctl suspend"

Logs:
    [  378.634960] [   T2805] printk: Suspending console(s) (use no_console_suspend to debug)
    [  378.634961] [   T2805] printk: console_suspend_all
    [  378.669076] [   T2831] printk: console_trylock
    [  378.669385] [    T271] printk: console_trylock
    [  378.677075] [   T2826] printk: console_trylock
    [  378.677094] [   T2832] printk: console_trylock
    [  378.677597] [    T269] printk: console_trylock
    [  378.678842] [    T273] printk: console_trylock
    [  378.820962] [    T100] printk: console_trylock
    [  379.498727] [   T2805] printk: console_trylock
    [  379.537391] [   T2805] printk: console_trylock
    [  379.650475] [   T2805] printk: console_trylock
    [  379.650477] [   T2805] printk: console_trylock
    [  379.650478] [   T2805] printk: console_trylock
    [  379.650618] [   T2805] printk: console_trylock
    [  379.652295] [   T2805] printk: console_trylock
    [  379.654336] [   T2805] printk: console_trylock
    [  379.656435] [   T2805] printk: console_trylock
    [  379.658727] [   T2805] printk: console_trylock
    [  379.660623] [   T2805] printk: console_trylock
    [  379.662429] [   T2805] printk: console_trylock
    [  379.664265] [   T2805] printk: console_trylock
    [  379.666130] [   T2805] printk: console_trylock
    [  379.666154] [   T2805] printk: console_trylock
    [  379.666156] [   T2805] printk: console_trylock
    [  379.666607] [   T2805] printk: console_trylock
    [  379.666654] [   T2805] printk: console_trylock
    [  379.670212] [   T2805] printk: console_trylock
    [  379.670243] [   T2805] printk: console_trylock
    [  379.673828] [   T2805] printk: console_trylock
    [  379.673863] [   T2805] printk: console_trylock
    [  379.677518] [   T2805] printk: console_trylock
    [  379.677547] [   T2805] printk: console_trylock
    [  379.680645] [   T2805] printk: console_trylock
    [  379.680675] [   T2805] printk: console_trylock
    [  379.683811] [   T2805] printk: console_trylock
    [  379.683836] [   T2805] printk: console_trylock
    [  379.686960] [   T2805] printk: console_trylock
    [  379.686984] [   T2805] printk: console_trylock
    [  379.690141] [   T2805] printk: console_trylock
    [  379.693604] [   T2805] printk: console_trylock
    [  379.753022] [   T2805] printk: console_trylock
    [  379.754384] [   T2805] printk: console_trylock
    [  379.754386] [   T2831] printk: console_trylock
    [  379.754392] [   T2831] printk: console_trylock
    [  379.754395] [   T2831] printk: console_trylock
    [  379.754421] [   T2825] printk: console_trylock
    [  379.754466] [   T2829] printk: console_trylock
    [  379.754563] [   T2826] printk: console_trylock
    [  379.766769] [     T11] printk: console_lock
    [  379.766772] [     T11] printk: console_unlock
    [  380.030746] [   T2826] printk: console_trylock
    [  380.030749] [   T2825] printk: console_trylock
    [  380.072183] [     T11] printk: console_trylock
    [  380.081075] [    T271] printk: console_trylock
    [  380.081100] [    T273] printk: console_trylock
    [  380.081734] [    T273] printk: console_trylock
    [  380.082280] [    T271] printk: console_trylock
    [  380.083257] [    T271] printk: console_trylock
    [  380.083767] [    T273] printk: console_trylock
    [  380.084464] [    T271] printk: console_trylock
    [  380.086088] [    T273] printk: console_trylock
    [  380.091983] [    T334] printk: console_trylock
    [  380.094504] [    T271] printk: console_trylock
    [  380.095494] [    T271] printk: console_trylock
    [  380.096698] [    T271] printk: console_trylock
    [  380.098628] [     T11] printk: console_trylock
    [  380.099849] [     T84] printk: console_trylock
    [  380.104195] [    T271] printk: console_trylock
    [  380.104493] [    T273] printk: console_trylock
    [  380.106533] [    T273] printk: console_trylock
    [  380.108797] [    T273] printk: console_trylock
    [  380.122616] [    T273] printk: console_trylock
    [  380.158723] [   T2831] printk: console_trylock
    [  380.470807] [   T2825] printk: console_trylock
    [  380.652605] [   T2805] printk: console_resume_all

Proc comm values:
    100: kworker/5:1-mm_percpu_wq
    11: kworker/0:1-events
    269: scsi_eh_0
    271: scsi_eh_1
    273: scsi_eh_2
    2805: not exists or not readable
    2825: not exists or not readable
    2826: not exists or not readable
    2829: not exists or not readable
    2831: kworker/u32:17-flush-253:1
    2832: not exists or not readable
    334: not exists or not readable
    76: kauditd
    84: kworker/2:1-events

---

Freeze (as before):
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && echo suspend >/proc/driver/nvidia/suspend \
    && systemctl suspend"


With synchronize_srcu uncommented
=================================

No freeze (expected):
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && systemctl suspend"

Logs:
    [   97.006391] [   T2643] printk: Suspending console(s) (use no_console_suspend to debug)
    [   97.006393] [   T2643] printk: console_suspend_all
    [   97.043073] [   T2654] printk: console_trylock
    [   97.043290] [   T2655] printk: console_trylock
    [   97.043486] [    T270] printk: console_trylock
    [   97.044873] [    T272] printk: console_trylock
    [   97.067023] [   T2656] printk: console_trylock
    [   97.067403] [    T268] printk: console_trylock
    [   97.138911] [      T9] printk: console_trylock
    [   97.853561] [   T2643] printk: console_trylock
    [   97.891353] [   T2643] printk: console_trylock
    [   98.003694] [   T2643] printk: console_trylock
    [   98.003696] [   T2643] printk: console_trylock
    [   98.003697] [   T2643] printk: console_trylock
    [   98.003838] [   T2643] printk: console_trylock
    [   98.005787] [   T2643] printk: console_trylock
    [   98.008142] [   T2643] printk: console_trylock
    [   98.010180] [   T2643] printk: console_trylock
    [   98.012412] [   T2643] printk: console_trylock
    [   98.014319] [   T2643] printk: console_trylock
    [   98.016306] [   T2643] printk: console_trylock
    [   98.018004] [   T2643] printk: console_trylock
    [   98.019610] [   T2643] printk: console_trylock
    [   98.019634] [   T2643] printk: console_trylock
    [   98.019636] [   T2643] printk: console_trylock
    [   98.020082] [   T2643] printk: console_trylock
    [   98.020130] [   T2643] printk: console_trylock
    [   98.023697] [   T2643] printk: console_trylock
    [   98.023730] [   T2643] printk: console_trylock
    [   98.027334] [   T2643] printk: console_trylock
    [   98.027369] [   T2643] printk: console_trylock
    [   98.031044] [   T2643] printk: console_trylock
    [   98.031074] [   T2643] printk: console_trylock
    [   98.034191] [   T2643] printk: console_trylock
    [   98.034221] [   T2643] printk: console_trylock
    [   98.037379] [   T2643] printk: console_trylock
    [   98.037405] [   T2643] printk: console_trylock
    [   98.040576] [   T2643] printk: console_trylock
    [   98.040599] [   T2643] printk: console_trylock
    [   98.043786] [   T2643] printk: console_trylock
    [   98.047294] [   T2643] printk: console_trylock
    [   98.107363] [   T2643] printk: console_trylock
    [   98.108733] [   T2643] printk: console_trylock
    [   98.108748] [     T64] printk: console_trylock
    [   98.108751] [     T70] printk: console_trylock
    [   98.108753] [     T70] printk: console_trylock
    [   98.108862] [   T2660] printk: console_trylock
    [   98.108869] [   T2665] printk: console_trylock
    [   98.124749] [     T84] printk: console_lock
    [   98.124752] [     T84] printk: console_unlock
    [   98.384581] [     T64] printk: console_trylock
    [   98.384591] [   T2665] printk: console_trylock
    [   98.432411] [     T84] printk: console_trylock
    [   98.442955] [    T272] printk: console_trylock
    [   98.442980] [    T270] printk: console_trylock
    [   98.443315] [    T272] printk: console_trylock
    [   98.444400] [    T270] printk: console_trylock
    [   98.445386] [    T270] printk: console_trylock
    [   98.445776] [    T272] printk: console_trylock
    [   98.446591] [    T270] printk: console_trylock
    [   98.448456] [    T272] printk: console_trylock
    [   98.454112] [     T88] printk: console_trylock
    [   98.456644] [    T270] printk: console_trylock
    [   98.457620] [    T270] printk: console_trylock
    [   98.458827] [    T270] printk: console_trylock
    [   98.460353] [     T84] printk: console_trylock
    [   98.464803] [      T9] printk: console_trylock
    [   98.466327] [    T270] printk: console_trylock
    [   98.470287] [    T272] printk: console_trylock
    [   98.472759] [    T272] printk: console_trylock
    [   98.475445] [    T272] printk: console_trylock
    [   98.491793] [    T272] printk: console_trylock
    [   98.512594] [     T70] printk: console_trylock
    [   98.824640] [     T64] printk: console_trylock
    [   99.006658] [   T2643] printk: console_resume_all

Proc comm values:
    2643: not exists or not readable
    2654: kworker/u32:12-async
    2655: kworker/u32:13-async
    2656: kworker/u32:14-async
    2660: kworker/u32:18-events_unbound
    2665: kworker/u32:23-pm
    268: scsi_eh_0
    270: scsi_eh_1
    272: scsi_eh_2
    64: kworker/u32:1-events_unbound
    70: kworker/u32:7-pm
    76: kauditd
    84: kworker/1:1-events
    88: kworker/2:1-cgroup_free
    9: kworker/0:0-events

---

Works now!:
    $ sudo sh -c "
    mkdir -p /var/run/nvidia-sleep \
    && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
    && chvt 63 \
    && echo suspend >/proc/driver/nvidia/suspend \
    && systemctl suspend"

Logs:
    [  338.901995] [   T3134] printk: Suspending console(s) (use no_console_suspend to debug)
    [  338.901997] [   T3134] printk: console_suspend_all
    [  338.932763] [   T2672] printk: console_trylock
    [  338.948664] [   T2659] printk: console_trylock
    [  338.948685] [   T2671] printk: console_trylock
    [  338.950716] [    T272] printk: console_trylock
    [  338.950747] [    T270] printk: console_trylock
    [  338.982194] [   T3134] printk: console_trylock
    [  339.020910] [   T3134] printk: console_trylock
    [  339.044613] [    T158] printk: console_trylock
    [  339.132842] [   T3134] printk: console_trylock
    [  339.132843] [   T3134] printk: console_trylock
    [  339.132844] [   T3134] printk: console_trylock
    [  339.132980] [   T3134] printk: console_trylock
    [  339.134648] [   T3134] printk: console_trylock
    [  339.136741] [   T3134] printk: console_trylock
    [  339.138806] [   T3134] printk: console_trylock
    [  339.141014] [   T3134] printk: console_trylock
    [  339.142875] [   T3134] printk: console_trylock
    [  339.144627] [   T3134] printk: console_trylock
    [  339.146490] [   T3134] printk: console_trylock
    [  339.148175] [   T3134] printk: console_trylock
    [  339.148200] [   T3134] printk: console_trylock
    [  339.148201] [   T3134] printk: console_trylock
    [  339.148657] [   T3134] printk: console_trylock
    [  339.148706] [   T3134] printk: console_trylock
    [  339.152285] [   T3134] printk: console_trylock
    [  339.152318] [   T3134] printk: console_trylock
    [  339.155931] [   T3134] printk: console_trylock
    [  339.155964] [   T3134] printk: console_trylock
    [  339.159760] [   T3134] printk: console_trylock
    [  339.159792] [   T3134] printk: console_trylock
    [  339.162961] [   T3134] printk: console_trylock
    [  339.162987] [   T3134] printk: console_trylock
    [  339.166055] [   T3134] printk: console_trylock
    [  339.166079] [   T3134] printk: console_trylock
    [  339.169188] [   T3134] printk: console_trylock
    [  339.169214] [   T3134] printk: console_trylock
    [  339.172501] [   T3134] printk: console_trylock
    [  339.176003] [   T3134] printk: console_trylock
    [  339.225601] [   T3134] printk: console_trylock
    [  339.226950] [     T66] printk: console_trylock
    [  339.226954] [     T66] printk: console_trylock
    [  339.226957] [     T66] printk: console_trylock
    [  339.226964] [     T65] printk: console_trylock
    [  339.226969] [   T3134] printk: console_trylock
    [  339.227037] [   T2671] printk: console_trylock
    [  339.227045] [   T2661] printk: console_trylock
    [  339.254208] [      T9] printk: console_lock
    [  339.254214] [      T9] printk: console_unlock
    [  339.502296] [   T2673] printk: console_trylock
    [  339.506297] [     T67] printk: console_trylock
    [  339.561605] [    T272] printk: console_trylock
    [  339.561632] [    T270] printk: console_trylock
    [  339.562072] [    T272] printk: console_trylock
    [  339.563175] [    T270] printk: console_trylock
    [  339.564203] [    T270] printk: console_trylock
    [  339.564703] [    T272] printk: console_trylock
    [  339.565094] [    T272] printk: console_trylock
    [  339.565341] [    T270] printk: console_trylock
    [  339.570554] [   T1127] printk: console_trylock
    [  339.571705] [    T272] printk: console_trylock
    [  339.573017] [    T310] printk: console_trylock
    [  339.574203] [    T272] printk: console_trylock
    [  339.575888] [    T270] printk: console_trylock
    [  339.575899] [    T272] printk: console_trylock
    [  339.580570] [    T272] printk: console_trylock
    [  339.582271] [    T270] printk: console_trylock
    [  339.583508] [    T270] printk: console_trylock
    [  339.586181] [      T9] printk: console_trylock
    [  339.591087] [    T270] printk: console_trylock
    [  339.614444] [      T9] printk: console_trylock
    [  339.634304] [   T3149] printk: console_trylock
    [  339.942244] [   T2673] printk: console_trylock
    [  340.123772] [   T3134] printk: console_resume_all

Proc comm values:
    1127: kworker/3:2-cgroup_offline
    158: kworker/5:1-mm_percpu_wq
    2659: kworker/u32:17-async
    2661: kworker/u32:19-async
    2671: kworker/u32:29-async
    2672: kworker/u32:30-async
    2673: kworker/u32:31-kvfree_rcu_reclaim
    270: scsi_eh_1
    272: scsi_eh_2
    310: kworker/2:2-events
    3134: not exists or not readable
    3149: kworker/u32:38-flush-253:3
    65: kworker/u32:2-async
    66: kworker/u32:3-async
    67: kworker/u32:4-async
    9: kworker/0:0-events

---

On 2026-01-28, John Ogness wrote:
> Also, if the patch still has the problem, it would be nice to see the
> dmesg output with the patch applied when you do only the nvidia
> suspend/resume and avoid systemctl.

Unfortunately, there is no printk return under these conditions.

When searching for the names of the console_suspend/console_resume and
console_lock()/console_unlock() functions in the kms module sources,
I only found two uses of the latter, to which I added logs.
These functions are only called very early during startup and then never again;
Maybe this will help.

[    4.967951] [    T248] NV_API_CALL: os_disable_console_access
[    5.164823] [    T248] NV_API_CALL: os_enable_console_access
[    6.724895] [    T248] NV_API_CALL: os_disable_console_access
[    6.724988] [    T248] NV_API_CALL: os_enable_console_access


common/inc/nv-linux.h:

    /*
     * Early 2.6 kernels have acquire_console_sem, but from 2.6.38+ it was
     * renamed to console_lock.
     */
    #if defined(NV_ACQUIRE_CONSOLE_SEM_PRESENT)
    #define NV_ACQUIRE_CONSOLE_SEM() acquire_console_sem()
    #define NV_RELEASE_CONSOLE_SEM() release_console_sem()
    #elif defined(NV_CONSOLE_LOCK_PRESENT)
    #define NV_ACQUIRE_CONSOLE_SEM() console_lock()
    #define NV_RELEASE_CONSOLE_SEM() console_unlock()
    #else
    #error "console lock api unrecognized!."
    #endif


nvidia/os-interface.c:

    NV_STATUS NV_API_CALL os_disable_console_access(void)
    {
        nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_disable_console_access\n");
        NV_ACQUIRE_CONSOLE_SEM();
        return NV_OK;
    }

    NV_STATUS NV_API_CALL os_enable_console_access(void)
    {
        nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_enable_console_access\n");
        NV_RELEASE_CONSOLE_SEM();
        return NV_OK;
    }


Cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-29  9:34                 ` ysard
@ 2026-01-30 15:56                   ` Petr Mladek
  2026-01-30 16:28                     ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-01-30 15:56 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Thu 2026-01-29 10:34:53, ysard wrote:
> Summary
> =======
> 
> The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines!

It is a pretty interesting information. It is good to know that
the synchronize_scru() is needed and has a positive effect.

> With synchronize_srcu commented (as is)
> =======================================
> 
> No freeze (expected):
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && systemctl suspend"
> 
> Logs:
>     [  378.634960] [   T2805] printk: Suspending console(s) (use no_console_suspend to debug)
>     [  378.634961] [   T2805] printk: console_suspend_all
>     [  378.669076] [   T2831] printk: console_trylock
>     [  378.669385] [    T271] printk: console_trylock
>     [  378.677075] [   T2826] printk: console_trylock
>     [  378.677094] [   T2832] printk: console_trylock
>     [  378.677597] [    T269] printk: console_trylock
>     [  378.678842] [    T273] printk: console_trylock
>     [  378.820962] [    T100] printk: console_trylock
>     [  379.498727] [   T2805] printk: console_trylock
>     [  379.537391] [   T2805] printk: console_trylock
>     [  379.650475] [   T2805] printk: console_trylock
>     [  379.650477] [   T2805] printk: console_trylock
>     [  379.650478] [   T2805] printk: console_trylock
>     [  379.650618] [   T2805] printk: console_trylock
>     [  379.652295] [   T2805] printk: console_trylock
>     [  379.654336] [   T2805] printk: console_trylock
>     [  379.656435] [   T2805] printk: console_trylock
>     [  379.658727] [   T2805] printk: console_trylock
>     [  379.660623] [   T2805] printk: console_trylock
>     [  379.662429] [   T2805] printk: console_trylock
>     [  379.664265] [   T2805] printk: console_trylock
>     [  379.666130] [   T2805] printk: console_trylock
>     [  379.666154] [   T2805] printk: console_trylock
>     [  379.666156] [   T2805] printk: console_trylock
>     [  379.666607] [   T2805] printk: console_trylock
>     [  379.666654] [   T2805] printk: console_trylock
>     [  379.670212] [   T2805] printk: console_trylock
>     [  379.670243] [   T2805] printk: console_trylock
>     [  379.673828] [   T2805] printk: console_trylock
>     [  379.673863] [   T2805] printk: console_trylock
>     [  379.677518] [   T2805] printk: console_trylock
>     [  379.677547] [   T2805] printk: console_trylock
>     [  379.680645] [   T2805] printk: console_trylock
>     [  379.680675] [   T2805] printk: console_trylock
>     [  379.683811] [   T2805] printk: console_trylock
>     [  379.683836] [   T2805] printk: console_trylock
>     [  379.686960] [   T2805] printk: console_trylock
>     [  379.686984] [   T2805] printk: console_trylock
>     [  379.690141] [   T2805] printk: console_trylock
>     [  379.693604] [   T2805] printk: console_trylock
>     [  379.753022] [   T2805] printk: console_trylock
>     [  379.754384] [   T2805] printk: console_trylock
>     [  379.754386] [   T2831] printk: console_trylock
>     [  379.754392] [   T2831] printk: console_trylock
>     [  379.754395] [   T2831] printk: console_trylock
>     [  379.754421] [   T2825] printk: console_trylock
>     [  379.754466] [   T2829] printk: console_trylock
>     [  379.754563] [   T2826] printk: console_trylock
>     [  379.766769] [     T11] printk: console_lock
>     [  379.766772] [     T11] printk: console_unlock
>     [  380.030746] [   T2826] printk: console_trylock
>     [  380.030749] [   T2825] printk: console_trylock
>     [  380.072183] [     T11] printk: console_trylock
>     [  380.081075] [    T271] printk: console_trylock
>     [  380.081100] [    T273] printk: console_trylock
>     [  380.081734] [    T273] printk: console_trylock
>     [  380.082280] [    T271] printk: console_trylock
>     [  380.083257] [    T271] printk: console_trylock
>     [  380.083767] [    T273] printk: console_trylock
>     [  380.084464] [    T271] printk: console_trylock
>     [  380.086088] [    T273] printk: console_trylock
>     [  380.091983] [    T334] printk: console_trylock
>     [  380.094504] [    T271] printk: console_trylock
>     [  380.095494] [    T271] printk: console_trylock
>     [  380.096698] [    T271] printk: console_trylock
>     [  380.098628] [     T11] printk: console_trylock
>     [  380.099849] [     T84] printk: console_trylock
>     [  380.104195] [    T271] printk: console_trylock
>     [  380.104493] [    T273] printk: console_trylock
>     [  380.106533] [    T273] printk: console_trylock
>     [  380.108797] [    T273] printk: console_trylock
>     [  380.122616] [    T273] printk: console_trylock
>     [  380.158723] [   T2831] printk: console_trylock
>     [  380.470807] [   T2825] printk: console_trylock
>     [  380.652605] [   T2805] printk: console_resume_all
> 
> Proc comm values:
>     100: kworker/5:1-mm_percpu_wq
>     11: kworker/0:1-events
>     269: scsi_eh_0
>     271: scsi_eh_1
>     273: scsi_eh_2
>     2805: not exists or not readable
>     2825: not exists or not readable
>     2826: not exists or not readable
>     2829: not exists or not readable
>     2831: kworker/u32:17-flush-253:1
>     2832: not exists or not readable
>     334: not exists or not readable
>     76: kauditd
>     84: kworker/2:1-events
> 
> ---
> 
> Freeze (as before):
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend \
>     && systemctl suspend"

Note that the disabled synchronize_srcu() might cause another
problems (races) which are not related to the nvidia driver.
It is possible that it failed from other reasons this time.

Anyway, it is great to know that the console_srcu()
could and actually should stay.

> With synchronize_srcu uncommented
> =================================
> 
> No freeze (expected):
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && systemctl suspend"
> 
> Logs:
>     [   97.006391] [   T2643] printk: Suspending console(s) (use no_console_suspend to debug)
>     [   97.006393] [   T2643] printk: console_suspend_all
>     [   97.043073] [   T2654] printk: console_trylock
>     [   97.043290] [   T2655] printk: console_trylock
>     [   97.043486] [    T270] printk: console_trylock
>     [   97.044873] [    T272] printk: console_trylock
>     [   97.067023] [   T2656] printk: console_trylock
>     [   97.067403] [    T268] printk: console_trylock
>     [   97.138911] [      T9] printk: console_trylock
>     [   97.853561] [   T2643] printk: console_trylock
>     [   97.891353] [   T2643] printk: console_trylock
>     [   98.003694] [   T2643] printk: console_trylock
>     [   98.003696] [   T2643] printk: console_trylock
>     [   98.003697] [   T2643] printk: console_trylock
>     [   98.003838] [   T2643] printk: console_trylock
>     [   98.005787] [   T2643] printk: console_trylock
>     [   98.008142] [   T2643] printk: console_trylock
>     [   98.010180] [   T2643] printk: console_trylock
>     [   98.012412] [   T2643] printk: console_trylock
>     [   98.014319] [   T2643] printk: console_trylock
>     [   98.016306] [   T2643] printk: console_trylock
>     [   98.018004] [   T2643] printk: console_trylock
>     [   98.019610] [   T2643] printk: console_trylock
>     [   98.019634] [   T2643] printk: console_trylock
>     [   98.019636] [   T2643] printk: console_trylock
>     [   98.020082] [   T2643] printk: console_trylock
>     [   98.020130] [   T2643] printk: console_trylock
>     [   98.023697] [   T2643] printk: console_trylock
>     [   98.023730] [   T2643] printk: console_trylock
>     [   98.027334] [   T2643] printk: console_trylock
>     [   98.027369] [   T2643] printk: console_trylock
>     [   98.031044] [   T2643] printk: console_trylock
>     [   98.031074] [   T2643] printk: console_trylock
>     [   98.034191] [   T2643] printk: console_trylock
>     [   98.034221] [   T2643] printk: console_trylock
>     [   98.037379] [   T2643] printk: console_trylock
>     [   98.037405] [   T2643] printk: console_trylock
>     [   98.040576] [   T2643] printk: console_trylock
>     [   98.040599] [   T2643] printk: console_trylock
>     [   98.043786] [   T2643] printk: console_trylock
>     [   98.047294] [   T2643] printk: console_trylock
>     [   98.107363] [   T2643] printk: console_trylock
>     [   98.108733] [   T2643] printk: console_trylock
>     [   98.108748] [     T64] printk: console_trylock
>     [   98.108751] [     T70] printk: console_trylock
>     [   98.108753] [     T70] printk: console_trylock
>     [   98.108862] [   T2660] printk: console_trylock
>     [   98.108869] [   T2665] printk: console_trylock
>     [   98.124749] [     T84] printk: console_lock
>     [   98.124752] [     T84] printk: console_unlock
>     [   98.384581] [     T64] printk: console_trylock
>     [   98.384591] [   T2665] printk: console_trylock
>     [   98.432411] [     T84] printk: console_trylock
>     [   98.442955] [    T272] printk: console_trylock
>     [   98.442980] [    T270] printk: console_trylock
>     [   98.443315] [    T272] printk: console_trylock
>     [   98.444400] [    T270] printk: console_trylock
>     [   98.445386] [    T270] printk: console_trylock
>     [   98.445776] [    T272] printk: console_trylock
>     [   98.446591] [    T270] printk: console_trylock
>     [   98.448456] [    T272] printk: console_trylock
>     [   98.454112] [     T88] printk: console_trylock
>     [   98.456644] [    T270] printk: console_trylock
>     [   98.457620] [    T270] printk: console_trylock
>     [   98.458827] [    T270] printk: console_trylock
>     [   98.460353] [     T84] printk: console_trylock
>     [   98.464803] [      T9] printk: console_trylock
>     [   98.466327] [    T270] printk: console_trylock
>     [   98.470287] [    T272] printk: console_trylock
>     [   98.472759] [    T272] printk: console_trylock
>     [   98.475445] [    T272] printk: console_trylock
>     [   98.491793] [    T272] printk: console_trylock
>     [   98.512594] [     T70] printk: console_trylock
>     [   98.824640] [     T64] printk: console_trylock
>     [   99.006658] [   T2643] printk: console_resume_all
> 
> Proc comm values:
>     2643: not exists or not readable
>     2654: kworker/u32:12-async
>     2655: kworker/u32:13-async
>     2656: kworker/u32:14-async
>     2660: kworker/u32:18-events_unbound
>     2665: kworker/u32:23-pm
>     268: scsi_eh_0
>     270: scsi_eh_1
>     272: scsi_eh_2
>     64: kworker/u32:1-events_unbound
>     70: kworker/u32:7-pm
>     76: kauditd
>     84: kworker/1:1-events
>     88: kworker/2:1-cgroup_free
>     9: kworker/0:0-events
> 
> ---
> 
> Works now!:
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend \
>     && systemctl suspend"
> 
> Logs:
>     [  338.901995] [   T3134] printk: Suspending console(s) (use no_console_suspend to debug)
>     [  338.901997] [   T3134] printk: console_suspend_all
>     [  338.932763] [   T2672] printk: console_trylock
>     [  338.948664] [   T2659] printk: console_trylock
>     [  338.948685] [   T2671] printk: console_trylock
>     [  338.950716] [    T272] printk: console_trylock
>     [  338.950747] [    T270] printk: console_trylock
>     [  338.982194] [   T3134] printk: console_trylock
>     [  339.020910] [   T3134] printk: console_trylock
>     [  339.044613] [    T158] printk: console_trylock
>     [  339.132842] [   T3134] printk: console_trylock
>     [  339.132843] [   T3134] printk: console_trylock
>     [  339.132844] [   T3134] printk: console_trylock
>     [  339.132980] [   T3134] printk: console_trylock
>     [  339.134648] [   T3134] printk: console_trylock
>     [  339.136741] [   T3134] printk: console_trylock
>     [  339.138806] [   T3134] printk: console_trylock
>     [  339.141014] [   T3134] printk: console_trylock
>     [  339.142875] [   T3134] printk: console_trylock
>     [  339.144627] [   T3134] printk: console_trylock
>     [  339.146490] [   T3134] printk: console_trylock
>     [  339.148175] [   T3134] printk: console_trylock
>     [  339.148200] [   T3134] printk: console_trylock
>     [  339.148201] [   T3134] printk: console_trylock
>     [  339.148657] [   T3134] printk: console_trylock
>     [  339.148706] [   T3134] printk: console_trylock
>     [  339.152285] [   T3134] printk: console_trylock
>     [  339.152318] [   T3134] printk: console_trylock
>     [  339.155931] [   T3134] printk: console_trylock
>     [  339.155964] [   T3134] printk: console_trylock
>     [  339.159760] [   T3134] printk: console_trylock
>     [  339.159792] [   T3134] printk: console_trylock
>     [  339.162961] [   T3134] printk: console_trylock
>     [  339.162987] [   T3134] printk: console_trylock
>     [  339.166055] [   T3134] printk: console_trylock
>     [  339.166079] [   T3134] printk: console_trylock
>     [  339.169188] [   T3134] printk: console_trylock
>     [  339.169214] [   T3134] printk: console_trylock
>     [  339.172501] [   T3134] printk: console_trylock
>     [  339.176003] [   T3134] printk: console_trylock
>     [  339.225601] [   T3134] printk: console_trylock
>     [  339.226950] [     T66] printk: console_trylock
>     [  339.226954] [     T66] printk: console_trylock
>     [  339.226957] [     T66] printk: console_trylock
>     [  339.226964] [     T65] printk: console_trylock
>     [  339.226969] [   T3134] printk: console_trylock
>     [  339.227037] [   T2671] printk: console_trylock
>     [  339.227045] [   T2661] printk: console_trylock
>     [  339.254208] [      T9] printk: console_lock
>     [  339.254214] [      T9] printk: console_unlock
>     [  339.502296] [   T2673] printk: console_trylock
>     [  339.506297] [     T67] printk: console_trylock
>     [  339.561605] [    T272] printk: console_trylock
>     [  339.561632] [    T270] printk: console_trylock
>     [  339.562072] [    T272] printk: console_trylock
>     [  339.563175] [    T270] printk: console_trylock
>     [  339.564203] [    T270] printk: console_trylock
>     [  339.564703] [    T272] printk: console_trylock
>     [  339.565094] [    T272] printk: console_trylock
>     [  339.565341] [    T270] printk: console_trylock
>     [  339.570554] [   T1127] printk: console_trylock
>     [  339.571705] [    T272] printk: console_trylock
>     [  339.573017] [    T310] printk: console_trylock
>     [  339.574203] [    T272] printk: console_trylock
>     [  339.575888] [    T270] printk: console_trylock
>     [  339.575899] [    T272] printk: console_trylock
>     [  339.580570] [    T272] printk: console_trylock
>     [  339.582271] [    T270] printk: console_trylock
>     [  339.583508] [    T270] printk: console_trylock
>     [  339.586181] [      T9] printk: console_trylock
>     [  339.591087] [    T270] printk: console_trylock
>     [  339.614444] [      T9] printk: console_trylock
>     [  339.634304] [   T3149] printk: console_trylock
>     [  339.942244] [   T2673] printk: console_trylock
>     [  340.123772] [   T3134] printk: console_resume_all
> 
> Proc comm values:
>     1127: kworker/3:2-cgroup_offline
>     158: kworker/5:1-mm_percpu_wq
>     2659: kworker/u32:17-async
>     2661: kworker/u32:19-async
>     2671: kworker/u32:29-async
>     2672: kworker/u32:30-async
>     2673: kworker/u32:31-kvfree_rcu_reclaim
>     270: scsi_eh_1
>     272: scsi_eh_2
>     310: kworker/2:2-events
>     3134: not exists or not readable
>     3149: kworker/u32:38-flush-253:3
>     65: kworker/u32:2-async
>     66: kworker/u32:3-async
>     67: kworker/u32:4-async
>     9: kworker/0:0-events

It is hard to know what is going there. I guess that many
console_trylock() calls are from printk(). But they might also
be from tty or from the nvidia driver code.

I have tried to create a patch which would print backtraces
of the callers. The output might be interesting. I am going
to send it in a separate mail.


> On 2026-01-28, John Ogness wrote:
> > Also, if the patch still has the problem, it would be nice to see the
> > dmesg output with the patch applied when you do only the nvidia
> > suspend/resume and avoid systemctl.
> 
> Unfortunately, there is no printk return under these conditions.
> 
> When searching for the names of the console_suspend/console_resume and
> console_lock()/console_unlock() functions in the kms module sources,
> I only found two uses of the latter, to which I added logs.
> These functions are only called very early during startup and then never again;
> Maybe this will help.
> 
> [    4.967951] [    T248] NV_API_CALL: os_disable_console_access
> [    5.164823] [    T248] NV_API_CALL: os_enable_console_access
> [    6.724895] [    T248] NV_API_CALL: os_disable_console_access
> [    6.724988] [    T248] NV_API_CALL: os_enable_console_access
> 
> 
> common/inc/nv-linux.h:
> 
>     /*
>      * Early 2.6 kernels have acquire_console_sem, but from 2.6.38+ it was
>      * renamed to console_lock.
>      */
>     #if defined(NV_ACQUIRE_CONSOLE_SEM_PRESENT)
>     #define NV_ACQUIRE_CONSOLE_SEM() acquire_console_sem()
>     #define NV_RELEASE_CONSOLE_SEM() release_console_sem()
>     #elif defined(NV_CONSOLE_LOCK_PRESENT)
>     #define NV_ACQUIRE_CONSOLE_SEM() console_lock()
>     #define NV_RELEASE_CONSOLE_SEM() console_unlock()
>     #else
>     #error "console lock api unrecognized!."
>     #endif
> 
> nvidia/os-interface.c:
> 
>     NV_STATUS NV_API_CALL os_disable_console_access(void)
>     {
>         nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_disable_console_access\n");
>         NV_ACQUIRE_CONSOLE_SEM();
>         return NV_OK;
>     }
> 
>     NV_STATUS NV_API_CALL os_enable_console_access(void)
>     {
>         nv_printf(NV_DBG_ERRORS, "NV_API_CALL: os_enable_console_access\n");
>         NV_RELEASE_CONSOLE_SEM();
>         return NV_OK;
>     }

Good to know. So the nvidia driver synchronizes some operations
using console_lock() as well. And it might be affected by
the modified behavior. For example, before the commit
9e70a5e109a4a2336 ("printk: Add per-console suspended state")
during suspend:

   + console_trylock() never succeeded
   + console_lock() set neither cosnole_locked, nor
   console_may_schedule.

...

Best Regards,
Petr

PS: I am going to write some more ideas into another mail with
    a new debug patch. I am not sure if I would send it today.
    I have to leave in 30 minutes or so...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-30 15:56                   ` Petr Mladek
@ 2026-01-30 16:28                     ` Petr Mladek
  2026-01-31 22:22                       ` ysard
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-01-30 16:28 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Fri 2026-01-30 16:56:56, Petr Mladek wrote:
> On Thu 2026-01-29 10:34:53, ysard wrote:
> > Summary
> > =======
> > 
> > The patch works only when I *uncomment* the 2 `synchronize_srcu(&console_srcu);` lines!
> 
> > Works now!:
> >     $ sudo sh -c "
> >     mkdir -p /var/run/nvidia-sleep \
> >     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> >     && chvt 63 \
> >     && echo suspend >/proc/driver/nvidia/suspend \
> >     && systemctl suspend"
> > 
> > Logs:
> >     [  338.901995] [   T3134] printk: Suspending console(s) (use no_console_suspend to debug)
> >     [  338.901997] [   T3134] printk: console_suspend_all
> >     [  338.932763] [   T2672] printk: console_trylock
> >     [  338.948664] [   T2659] printk: console_trylock
> >     [  338.948685] [   T2671] printk: console_trylock
> >     [  338.950716] [    T272] printk: console_trylock
> >     [  338.950747] [    T270] printk: console_trylock
> >     [  338.982194] [   T3134] printk: console_trylock
> >     [  339.020910] [   T3134] printk: console_trylock
> >     [  339.044613] [    T158] printk: console_trylock
> >     [  339.132842] [   T3134] printk: console_trylock
[...]
> >     [  339.614444] [      T9] printk: console_trylock
> >     [  339.634304] [   T3149] printk: console_trylock
> >     [  339.942244] [   T2673] printk: console_trylock
> >     [  340.123772] [   T3134] printk: console_resume_all
> > 
> > Proc comm values:
> >     1127: kworker/3:2-cgroup_offline
> >     158: kworker/5:1-mm_percpu_wq
> >     2659: kworker/u32:17-async
> >     2661: kworker/u32:19-async
> >     2671: kworker/u32:29-async
> >     2672: kworker/u32:30-async
> >     2673: kworker/u32:31-kvfree_rcu_reclaim
> >     270: scsi_eh_1
> >     272: scsi_eh_2
> >     310: kworker/2:2-events
> >     3134: not exists or not readable
> >     3149: kworker/u32:38-flush-253:3
> >     65: kworker/u32:2-async
> >     66: kworker/u32:3-async
> >     67: kworker/u32:4-async
> >     9: kworker/0:0-events
> 
> It is hard to know what is going there. I guess that many
> console_trylock() calls are from printk(). But they might also
> be from tty or from the nvidia driver code.
> 
> I have tried to create a patch which would print backtraces
> of the callers. The output might be interesting. I am going
> to send it in a separate mail.

Please, find the patch below. It can be applied on top of
the preovius one which reverted the problematic commit,
see https://lore.kernel.org/all/aXoWiJhcOaGGlcmk@pathway.suse.cz/

Important: You still need to explicitely uncomment the
   sychronize_srcu() calls.

It would be nice to provide three logs with this patch:

   1. console_lock() API called by
       echo suspend >/proc/driver/nvidia/suspend

   2. console_lock() API called by the suspend test

      No freeze (expected):
      $ sudo sh -c "
      mkdir -p /var/run/nvidia-sleep \
      && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
      && chvt 63 \
      && systemctl suspend"


   3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.:
      $ sudo sh -c "
      mkdir -p /var/run/nvidia-sleep \
      && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
      && chvt 63 \
      && echo suspend >/proc/driver/nvidia/suspend \
      && systemctl suspend"


Feel free to do your own tests. But please do not spend too much
time on it.

It would be nice to find the culprit. Especially I would like to
know whether the problem is in the core kernel code (printk,
suspend, tty) or in the nvidia driver.

But we might also just need to restore the original behavior
of the console lock API during suspend. I mean the short cuts.
They are a kind of optimization anyway.

But as I said, it would be nice to understand the problem. There
might be a race or infinite loop somewhere and the original
console_lock API behavior just hides the problem.

OK, here is the patch which printed backtraces of console_lock
API callers around the suspend for me.


From ab9e67fc50425eb6574c016c5809865a085c4b44 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 30 Jan 2026 16:57:25 +0100
Subject: [PATCH] printk: debug: Allow to print backtraces of console_lock API
 callers.

This patch allows to print backtraces of console_lock API callers.
The logging must be explicitely enabled by:

   echo 1 >/sys/module/printk/parameters/debug_console_lock

I suggest to increase the internal log buffer size, for example,
use on the command line:

   log_buf_len=1M

The logging might easily go wild because the console API is
called by printk() and tty code. I have explicitely disabled
the logging in:

   + vprintk_emit()
   + fb_flashcursor()

, see no_debug_console_lock_enter()/exit() calls. Otherwise,
there was an endless loop of printed entries.

Note that I run the tests in a virtualized test system (QEMU)
via ssh. Some more tty API might need to get disabled when
doing the tests using the tty console. But it might hide
the code called during suspend. So, doing the tests via
ssh sounds better to me.

Also the number of printed backtraces is limited to "100".
It helps to keep the system running when printing the report
triggers more callers of the API and causes an infinite loop.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 drivers/video/fbdev/core/fbcon.c |  9 +++-
 include/linux/printk.h           |  2 +
 kernel/printk/printk.c           | 91 ++++++++++++++++++++++++++++++--
 3 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c
index 7be9e865325d..1d5d7af516ed 100644
--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -397,14 +397,19 @@ static void fb_flashcursor(struct work_struct *work)
 	struct vc_data *vc = NULL;
 	int c;
 	bool enable;
+	int debug;
 	int ret;
 
+	debug = no_debug_console_lock_enter();
+
 	/* FIXME: we should sort out the unbind locking instead */
 	/* instead we just fail to flash the cursor if we can't get
 	 * the lock instead of blocking fbcon deinit */
 	ret = console_trylock();
-	if (ret == 0)
+	if (ret == 0) {
+		no_debug_console_lock_exit(debug);
 		return;
+	}
 
 	/* protected by console_lock */
 	info = par->info;
@@ -416,6 +421,7 @@ static void fb_flashcursor(struct work_struct *work)
 	    fbcon_info_from_console(vc->vc_num) != info ||
 	    vc->vc_deccm != 1) {
 		console_unlock();
+		no_debug_console_lock_exit(debug);
 		return;
 	}
 
@@ -425,6 +431,7 @@ static void fb_flashcursor(struct work_struct *work)
 			    get_fg_color(vc, info, c),
 			    get_bg_color(vc, info, c));
 	console_unlock();
+	no_debug_console_lock_exit(debug);
 
 	queue_delayed_work(system_power_efficient_wq, &par->cursor_work,
 			   par->cur_blink_jiffies);
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 45c663124c9b..f266b20b7584 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -201,6 +201,8 @@ void __init setup_log_buf(int early);
 __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+int no_debug_console_lock_enter(void);
+void no_debug_console_lock_exit(int debug);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
 void printk_trigger_flush(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e0ad926f409..c54c70d7a074 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2383,11 +2383,52 @@ MODULE_PARM_DESC(debug_non_panic_cpus,
 		 "allow messages from non-panic CPUs in panic()");
 #endif
 
+int debug_console_lock;
+module_param(debug_console_lock, int, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(debug_console_lock,
+		 "Print some caller info when console_lock() API is called.");
+
+static DEFINE_PER_CPU(int, no_debug_console_lock_nesting);
+
+int no_debug_console_lock_enter(void)
+{
+	int debug;
+
+	debug = READ_ONCE(debug_console_lock);
+	if (debug) {
+		if (this_cpu_read(no_debug_console_lock_nesting) > 256) {
+			debug_console_lock = 0;
+			pr_err("Overflow of no_debug_console_lock_nesting! Disabled console_lock debuginng.\n");
+			return 0;
+		}
+
+		migrate_disable();
+		this_cpu_inc(no_debug_console_lock_nesting);
+	}
+
+	return debug;
+}
+
+void no_debug_console_lock_exit(int debug)
+{
+	if (debug) {
+		if (this_cpu_read(no_debug_console_lock_nesting) <= 0) {
+			debug_console_lock = 0;
+			pr_err("Underflow of no_debug_console_lock_nesting! Disabled console lock debuginng.\n");
+			migrate_enable();
+			return;
+		}
+		this_cpu_dec(no_debug_console_lock_nesting);
+		migrate_enable();
+	}
+}
+
 asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
 {
 	struct console_flush_type ft;
+	int debug_caller;
 	int printed_len;
 
 	/* Suppress unimportant messages after panic happens */
@@ -2404,6 +2445,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 	    !panic_triggering_all_cpu_backtrace)
 		return 0;
 
+	debug_caller = no_debug_console_lock_enter();
+
 	printk_get_console_flush_type(&ft);
 
 	/* If called from the scheduler, we can not call up(). */
@@ -2448,6 +2491,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 	else if (!console_irqwork_blocked)
 		wake_up_klogd();
 
+	no_debug_console_lock_exit(debug_caller);
+
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2856,6 +2901,41 @@ static int console_cpu_notify(unsigned int cpu)
 	return 0;
 }
 
+static void print_console_lock_caller(const char *func_name)
+{
+	static int reports = 0;
+	char tcomm[64];
+	u8 nested;
+
+	if (!debug_console_lock)
+		return;
+
+	nested = raw_cpu_read(no_debug_console_lock_nesting);
+	if (nested)
+		return;
+
+	if (!func_name) {
+		pr_err_once("%s: called without func_name\n", __func__);
+		return;
+	}
+
+	if (reports++ > 100) {
+		debug_console_lock = 0;
+		pr_warn("Too many printed console lock callers. Disabled the debugging.\n");
+	}
+
+	if (current->flags & PF_WQ_WORKER)
+		wq_worker_comm(tcomm, sizeof(tcomm), current);
+	else if (current->flags & PF_KTHREAD)
+		get_kthread_comm(tcomm, sizeof(tcomm), current);
+	else
+		get_task_comm(tcomm, current);
+
+	pr_info("console lock API call [%d]: %s() by pid=%d, comm=%s\n",
+		reports, func_name, current->pid, tcomm);
+	show_stack(current, NULL, KERN_INFO);
+}
+
 /**
  * console_lock - block the console subsystem from printing
  *
@@ -2868,13 +2948,14 @@ void console_lock(void)
 {
 	might_sleep();
 
+	print_console_lock_caller(__func__);
+
 	/* On panic, the console_lock must be left to the panic cpu. */
 	while (panic_on_other_cpu())
 		msleep(1000);
 
 	down_console_sem();
 	if (console_suspended) {
-		printk_store(KERN_INFO "printk: %s\n", __func__);
 		/*
 		 * Keep console locked, but do not touch
 		 * @console_locked or @console_may_schedule.
@@ -2897,13 +2978,14 @@ EXPORT_SYMBOL(console_lock);
  */
 int console_trylock(void)
 {
+	print_console_lock_caller(__func__);
+
 	/* On panic, the console_lock must be left to the panic cpu. */
 	if (panic_on_other_cpu())
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
 	if (console_suspended) {
-		printk_store(KERN_INFO "printk: %s\n", __func__);
 		/*
 		 * The lock was acquired, but unlock directly and report
 		 * failure. Here console_locked=1 and console_may_schedule=1.
@@ -2919,6 +3001,8 @@ EXPORT_SYMBOL(console_trylock);
 
 int is_console_locked(void)
 {
+//	/* This might produce too many entries. */
+//	print_console_lock_caller(__func__);
 	return console_locked;
 }
 EXPORT_SYMBOL(is_console_locked);
@@ -3404,8 +3488,9 @@ void console_unlock(void)
 {
 	struct console_flush_type ft;
 
+	print_console_lock_caller(__func__);
+
 	if (console_suspended) {
-		printk_store(KERN_INFO "printk: %s\n", __func__);
 		/*
 		 * Simply unlock directly.
 		 * Here console_locked=1 and console_may_schedule=1.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-30 16:28                     ` Petr Mladek
@ 2026-01-31 22:22                       ` ysard
  2026-02-02 11:02                         ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: ysard @ 2026-01-31 22:22 UTC (permalink / raw)
  To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky

Here are the logs, debug_console_lock is disabled 4 seconds after each test
to avoid consuming the 100 reports (kworker events are frequent here).

> It would be nice to provide three logs with this patch:
>
>    1. console_lock() API called by
>        echo suspend >/proc/driver/nvidia/suspend

[   65.417500] [     T88] printk: console lock API call [1]: console_lock() by pid=88, comm=kworker/4:1+events
[   65.417581] [     T88] printk: console lock API call [2]: console_unlock() by pid=88, comm=kworker/4:1+events
[   74.984380] [     T88] printk: console lock API call [3]: console_lock() by pid=88, comm=kworker/4:1+events
[   74.984450] [     T88] printk: console lock API call [4]: console_unlock() by pid=88, comm=kworker/4:1+events
[   75.134088] [     T88] printk: console lock API call [5]: console_lock() by pid=88, comm=kworker/4:1+events
[   75.134289] [     T88] printk: console lock API call [6]: console_unlock() by pid=88, comm=kworker/4:1+events

>    2. console_lock() API called by the suspend test
>
>       No freeze (expected):
>       $ sudo sh -c "
>       mkdir -p /var/run/nvidia-sleep \
>       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>       && chvt 63 \
>       && systemctl suspend"

[  325.592510] [   T3023] printk: console lock API call [25]: console_lock() by pid=3023, comm=chvt
[  325.592594] [   T3023] printk: console lock API call [26]: console_unlock() by pid=3023, comm=chvt
[  325.592656] [    T200] printk: console lock API call [27]: console_lock() by pid=200, comm=kworker/7:2+events
[  325.592698] [    T200] printk: console lock API call [28]: console_unlock() by pid=200, comm=kworker/7:2+events
[  325.709430] [     T11] printk: console lock API call [29]: console_lock() by pid=11, comm=kworker/0:1+events
[  325.709483] [     T11] printk: console lock API call [30]: console_unlock() by pid=11, comm=kworker/0:1+events
[  326.592523] [   T3023] printk: console lock API call [31]: console_lock() by pid=3023, comm=chvt
[  326.592620] [   T3023] printk: console lock API call [32]: console_unlock() by pid=3023, comm=chvt
[  326.592709] [    T200] printk: console lock API call [33]: console_lock() by pid=200, comm=kworker/7:2+events
[  326.592757] [    T200] printk: console lock API call [34]: console_unlock() by pid=200, comm=kworker/7:2+events
[  327.151095] [   T1428] printk: console lock API call [35]: console_lock() by pid=1428, comm=Xorg
[  327.152983] [   T1428] printk: console lock API call [36]: console_unlock() by pid=1428, comm=Xorg
[  327.204795] [   T3050] printk: console lock API call [37]: console_lock() by pid=3050, comm=9
[  327.204939] [   T3050] printk: console lock API call [38]: console_unlock() by pid=3050, comm=9
[  327.205051] [   T3050] printk: console lock API call [39]: console_lock() by pid=3050, comm=9
[  327.205158] [   T3050] printk: console lock API call [40]: console_unlock() by pid=3050, comm=9
[  327.205335] [   T3050] printk: console lock API call [41]: console_lock() by pid=3050, comm=9
[  327.205398] [   T3050] printk: console lock API call [42]: console_unlock() by pid=3050, comm=9
[  327.517379] [   T3065] printk: console lock API call [43]: console_lock() by pid=3065, comm=9
[  327.517483] [   T3065] printk: console lock API call [44]: console_unlock() by pid=3065, comm=9
[  327.517567] [   T3065] printk: console lock API call [45]: console_lock() by pid=3065, comm=9
[  327.517644] [   T3065] printk: console lock API call [46]: console_unlock() by pid=3065, comm=9
[  327.517805] [   T3065] printk: console lock API call [47]: console_lock() by pid=3065, comm=9
[  327.517890] [   T3065] printk: console lock API call [48]: console_unlock() by pid=3065, comm=9
[  327.558752] [   T3073] printk: console lock API call [49]: console_lock() by pid=3073, comm=9
[  327.558877] [   T3073] printk: console lock API call [50]: console_unlock() by pid=3073, comm=9
[  327.558963] [   T3073] printk: console lock API call [51]: console_lock() by pid=3073, comm=9
[  327.559043] [   T3073] printk: console lock API call [52]: console_unlock() by pid=3073, comm=9
[  327.559197] [   T3073] printk: console lock API call [53]: console_lock() by pid=3073, comm=9
[  327.559247] [   T3073] printk: console lock API call [54]: console_unlock() by pid=3073, comm=9
[  327.585818] [   T3073] printk: console lock API call [55]: console_lock() by pid=3073, comm=systemd-sleep
[  327.585937] [   T3073] printk: console lock API call [56]: console_unlock() by pid=3073, comm=systemd-sleep
[  327.586127] [     T95] printk: console lock API call [57]: console_lock() by pid=95, comm=kworker/6:1+events
[  327.586168] [     T95] printk: console lock API call [58]: console_unlock() by pid=95, comm=kworker/6:1+events
[  327.591067] [   T3073] printk: Suspending console(s) (use no_console_suspend to debug)
[  327.591069] [   T3073] printk: console lock API call [59]: console_lock() by pid=3073, comm=systemd-sleep
[  327.591084] [   T3073]  ? _printk+0x54/0x60
[  327.591163] [   T3073] printk: console lock API call [60]: console_unlock() by pid=3073, comm=systemd-sleep
[  327.591175] [   T3073]  ? _printk+0x54/0x60
[  327.591242] [   T3073] printk: console lock API call [61]: console_lock() by pid=3073, comm=systemd-sleep
[  327.591253] [   T3073]  ? _printk+0x54/0x60
[  327.591322] [   T3073] printk: console lock API call [62]: console_unlock() by pid=3073, comm=systemd-sleep
[  327.591332] [   T3073]  ? _printk+0x54/0x60
[  327.591396] [   T3073] printk: console lock API call [63]: console_lock() by pid=3073, comm=systemd-sleep
[  327.591470] [   T3073] printk: console_suspend_all
[  328.652581] [     T89] printk: console lock API call [64]: console_lock() by pid=89, comm=kworker/1:1+events
[  328.652639] [     T89] printk: console lock API call [65]: console_unlock() by pid=89, comm=kworker/1:1+events
[  330.313166] [   T3073] printk: console_resume_all
[  330.313173] [   T3073] printk: console lock API call [66]: console_unlock() by pid=3073, comm=systemd-sleep
[  330.313319] [   T3073] printk: console lock API call [67]: console_lock() by pid=3073, comm=systemd-sleep
[  330.313425] [   T3073] printk: console lock API call [68]: console_unlock() by pid=3073, comm=systemd-sleep
[  330.313521] [   T3073] printk: console lock API call [69]: console_lock() by pid=3073, comm=systemd-sleep
[  330.313621] [   T3073] printk: console lock API call [70]: console_unlock() by pid=3073, comm=systemd-sleep
[  331.198632] [   T3073] printk: console lock API call [71]: console_lock() by pid=3073, comm=systemd-sleep
[  331.198909] [   T3073] printk: console lock API call [72]: console_unlock() by pid=3073, comm=systemd-sleep
[  331.199418] [     T89] printk: console lock API call [73]: console_lock() by pid=89, comm=kworker/1:1+events
[  331.199564] [     T89] printk: console lock API call [74]: console_unlock() by pid=89, comm=kworker/1:1+events
[  331.222975] [   T3150] printk: console lock API call [75]: console_lock() by pid=3150, comm=chvt
[  331.223105] [   T3150] printk: console lock API call [76]: console_unlock() by pid=3150, comm=chvt
[  331.223221] [   T3150] printk: console lock API call [77]: console_lock() by pid=3150, comm=chvt
[  331.223265] [   T3150] printk: console lock API call [78]: console_unlock() by pid=3150, comm=chvt
[  331.223309] [     T88] printk: console lock API call [79]: console_lock() by pid=88, comm=kworker/7:1+events
[  331.223366] [     T88] printk: console lock API call [80]: console_unlock() by pid=88, comm=kworker/7:1+events
[  331.223394] [   T1428] printk: console lock API call [81]: console_lock() by pid=1428, comm=Xorg
[  331.223491] [   T1428] printk: console lock API call [82]: console_unlock() by pid=1428, comm=Xorg
[  331.223526] [   T3150] printk: console lock API call [83]: console_lock() by pid=3150, comm=chvt
[  331.223605] [   T3150] printk: console lock API call [84]: console_unlock() by pid=3150, comm=chvt
[  331.230047] [   T1164] printk: console lock API call [85]: console_lock() by pid=1164, comm=systemd-logind
[  331.230191] [   T1164] printk: console lock API call [86]: console_unlock() by pid=1164, comm=systemd-logind
[  331.230333] [   T1164] printk: console lock API call [87]: console_lock() by pid=1164, comm=systemd-logind
[  331.230416] [   T1164] printk: console lock API call [88]: console_unlock() by pid=1164, comm=systemd-logind
[  331.230495] [   T1164] printk: console lock API call [89]: console_lock() by pid=1164, comm=systemd-logind
[  331.230579] [   T1164] printk: console lock API call [90]: console_unlock() by pid=1164, comm=systemd-logind


>    3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.:
>       $ sudo sh -c "
>       mkdir -p /var/run/nvidia-sleep \
>       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>       && chvt 63 \
>       && echo suspend >/proc/driver/nvidia/suspend \
>       && systemctl suspend"

[   91.535688] [   T2576] printk: console lock API call [1]: console_lock() by pid=2576, comm=chvt
[   91.536179] [   T2576] printk: console lock API call [2]: console_unlock() by pid=2576, comm=chvt
[   91.536254] [    T281] printk: console lock API call [3]: console_lock() by pid=281, comm=kworker/3:2+events
[   91.536309] [    T281] printk: console lock API call [4]: console_unlock() by pid=281, comm=kworker/3:2+events
[   91.547297] [   T2580] printk: console lock API call [5]: console_lock() by pid=2580, comm=cached_setup_te
[   91.547433] [   T2580] printk: console lock API call [6]: console_unlock() by pid=2580, comm=cached_setup_te
[   91.547567] [   T2580] printk: console lock API call [7]: console_lock() by pid=2580, comm=cached_setup_te
[   91.547653] [   T2580] printk: console lock API call [8]: console_unlock() by pid=2580, comm=cached_setup_te
[   91.547729] [   T2580] printk: console lock API call [9]: console_lock() by pid=2580, comm=cached_setup_te
[   91.547802] [   T2580] printk: console lock API call [10]: console_unlock() by pid=2580, comm=cached_setup_te
[   91.547871] [   T2580] printk: console lock API call [11]: console_lock() by pid=2580, comm=cached_setup_te
[   91.547945] [   T2580] printk: console lock API call [12]: console_unlock() by pid=2580, comm=cached_setup_te
[   91.548022] [   T2580] printk: console lock API call [13]: console_lock() by pid=2580, comm=cached_setup_te
[   91.548104] [   T2580] printk: console lock API call [14]: console_unlock() by pid=2580, comm=cached_setup_te
[   91.559988] [    T900] printk: console lock API call [15]: console_lock() by pid=900, comm=kworker/4:2+events
[   91.560049] [    T900] printk: console lock API call [16]: console_unlock() by pid=900, comm=kworker/4:2+events
[   92.535710] [   T2576] printk: console lock API call [17]: console_lock() by pid=2576, comm=chvt
[   92.535796] [   T2576] printk: console lock API call [18]: console_unlock() by pid=2576, comm=chvt
[   92.535857] [    T281] printk: console lock API call [19]: console_lock() by pid=281, comm=kworker/3:2+events
[   92.535901] [    T281] printk: console lock API call [20]: console_unlock() by pid=281, comm=kworker/3:2+events
[   93.115410] [   T1426] printk: console lock API call [21]: console_lock() by pid=1426, comm=Xorg
[   93.123140] [   T1426] printk: console lock API call [22]: console_unlock() by pid=1426, comm=Xorg
[   94.008771] [   T2604] printk: console lock API call [23]: console_lock() by pid=2604, comm=9
[   94.008866] [   T2604] printk: console lock API call [24]: console_unlock() by pid=2604, comm=9
[   94.008940] [   T2604] printk: console lock API call [25]: console_lock() by pid=2604, comm=9
[   94.009007] [   T2604] printk: console lock API call [26]: console_unlock() by pid=2604, comm=9
[   94.009149] [   T2604] printk: console lock API call [27]: console_lock() by pid=2604, comm=9
[   94.009210] [   T2604] printk: console lock API call [28]: console_unlock() by pid=2604, comm=9
[   94.296534] [   T2619] printk: console lock API call [29]: console_lock() by pid=2619, comm=9
[   94.296640] [   T2619] printk: console lock API call [30]: console_unlock() by pid=2619, comm=9
[   94.296724] [   T2619] printk: console lock API call [31]: console_lock() by pid=2619, comm=9
[   94.296801] [   T2619] printk: console lock API call [32]: console_unlock() by pid=2619, comm=9
[   94.296955] [   T2619] printk: console lock API call [33]: console_lock() by pid=2619, comm=9
[   94.297033] [   T2619] printk: console lock API call [34]: console_unlock() by pid=2619, comm=9
[   94.346111] [     T76] kauditd_printk_skb: 2 callbacks suppressed
[   94.346957] [      C2] printk: console lock API call [35]: console_trylock() by pid=0, comm=swapper/2
[   94.347059] [      C2] printk: console lock API call [36]: console_unlock() by pid=0, comm=swapper/2
[   94.360483] [   T2628] printk: console lock API call [37]: console_lock() by pid=2628, comm=9
[   94.360603] [   T2628] printk: console lock API call [38]: console_unlock() by pid=2628, comm=9
[   94.360697] [   T2628] printk: console lock API call [39]: console_lock() by pid=2628, comm=9
[   94.360786] [   T2628] printk: console lock API call [40]: console_unlock() by pid=2628, comm=9
[   94.360939] [   T2628] printk: console lock API call [41]: console_lock() by pid=2628, comm=9
[   94.361007] [   T2628] printk: console lock API call [42]: console_unlock() by pid=2628, comm=9
[   94.417313] [   T2628] printk: console lock API call [43]: console_lock() by pid=2628, comm=systemd-sleep
[   94.417405] [   T2628] printk: console lock API call [44]: console_unlock() by pid=2628, comm=systemd-sleep
[   94.417562] [     T97] printk: console lock API call [45]: console_lock() by pid=97, comm=kworker/2:1+events
[   94.417601] [     T97] printk: console lock API call [46]: console_unlock() by pid=97, comm=kworker/2:1+events
[   94.420174] [   T2628] printk: Suspending console(s) (use no_console_suspend to debug)
[   94.420176] [   T2628] printk: console lock API call [47]: console_lock() by pid=2628, comm=systemd-sleep
[   94.420191] [   T2628]  ? _printk+0x54/0x60
[   94.420250] [   T2628] printk: console lock API call [48]: console_unlock() by pid=2628, comm=systemd-sleep
[   94.420262] [   T2628]  ? _printk+0x54/0x60
[   94.420310] [   T2628] printk: console lock API call [49]: console_lock() by pid=2628, comm=systemd-sleep
[   94.420321] [   T2628]  ? _printk+0x54/0x60
[   94.420371] [   T2628] printk: console lock API call [50]: console_unlock() by pid=2628, comm=systemd-sleep
[   94.420381] [   T2628]  ? _printk+0x54/0x60
[   94.420427] [   T2628] printk: console lock API call [51]: console_lock() by pid=2628, comm=systemd-sleep
[   94.420482] [   T2628] printk: console_suspend_all
[   95.220250] [     T97] printk: console lock API call [52]: console_lock() by pid=97, comm=kworker/2:1+events
[   95.220331] [     T97] printk: console lock API call [53]: console_unlock() by pid=97, comm=kworker/2:1+events
[   96.879223] [   T2628] printk: console_resume_all
[   96.879230] [   T2628] printk: console lock API call [54]: console_unlock() by pid=2628, comm=systemd-sleep
[   96.879385] [   T2628] printk: console lock API call [55]: console_lock() by pid=2628, comm=systemd-sleep
[   96.879543] [   T2628] printk: console lock API call [56]: console_unlock() by pid=2628, comm=systemd-sleep
[   96.879686] [   T2628] printk: console lock API call [57]: console_lock() by pid=2628, comm=systemd-sleep
[   96.879836] [   T2628] printk: console lock API call [58]: console_unlock() by pid=2628, comm=systemd-sleep
[   97.789766] [   T2628] printk: console lock API call [59]: console_lock() by pid=2628, comm=systemd-sleep
[   97.789850] [   T2628] printk: console lock API call [60]: console_unlock() by pid=2628, comm=systemd-sleep
[   97.789987] [    T293] printk: console lock API call [61]: console_lock() by pid=293, comm=kworker/2:2+events
[   97.790045] [    T293] printk: console lock API call [62]: console_unlock() by pid=293, comm=kworker/2:2+events
[   98.464634] [   T2863] printk: console lock API call [63]: console_lock() by pid=2863, comm=chvt
[   98.464756] [   T2863] printk: console lock API call [64]: console_unlock() by pid=2863, comm=chvt
[   98.464870] [   T2863] printk: console lock API call [65]: console_lock() by pid=2863, comm=chvt
[   98.464953] [   T2863] printk: console lock API call [66]: console_unlock() by pid=2863, comm=chvt
[   98.465031] [    T281] printk: console lock API call [67]: console_lock() by pid=281, comm=kworker/3:2+events
[   98.465105] [    T281] printk: console lock API call [68]: console_unlock() by pid=281, comm=kworker/3:2+events
[   98.465152] [   T1426] printk: console lock API call [69]: console_lock() by pid=1426, comm=Xorg
[   98.465244] [   T2863] printk: console lock API call [70]: console_lock() by pid=2863, comm=chvt
[   98.465284] [   T1426] printk: console lock API call [71]: console_unlock() by pid=1426, comm=Xorg
[   98.465388] [   T2863] printk: console lock API call [72]: console_unlock() by pid=2863, comm=chvt

Cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-01-31 22:22                       ` ysard
@ 2026-02-02 11:02                         ` Petr Mladek
  2026-02-03  1:32                           ` ysard
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2026-02-02 11:02 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Sat 2026-01-31 23:22:09, ysard wrote:
> Here are the logs, debug_console_lock is disabled 4 seconds after each test
> to avoid consuming the 100 reports (kworker events are frequent here).
>
> > It would be nice to provide three logs with this patch:
> >
> >    1. console_lock() API called by
> >        echo suspend >/proc/driver/nvidia/suspend
> 
> [   65.417500] [     T88] printk: console lock API call [1]: console_lock() by pid=88, comm=kworker/4:1+events
> [   65.417581] [     T88] printk: console lock API call [2]: console_unlock() by pid=88, comm=kworker/4:1+events
> [   74.984380] [     T88] printk: console lock API call [3]: console_lock() by pid=88, comm=kworker/4:1+events
> [   74.984450] [     T88] printk: console lock API call [4]: console_unlock() by pid=88, comm=kworker/4:1+events
> [   75.134088] [     T88] printk: console lock API call [5]: console_lock() by pid=88, comm=kworker/4:1+events
> [   75.134289] [     T88] printk: console lock API call [6]: console_unlock() by pid=88, comm=kworker/4:1+events

Could you please provide the full log, including the backtraces?
I am curious about the callers of the console_lock()...

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-02-02 11:02                         ` Petr Mladek
@ 2026-02-03  1:32                           ` ysard
  2026-02-03 14:11                             ` Petr Mladek
  0 siblings, 1 reply; 16+ messages in thread
From: ysard @ 2026-02-03  1:32 UTC (permalink / raw)
  To: Petr Mladek; +Cc: John Ogness, linux-kernel, senozhatsky

On 2026-02-02 12:02:07, Petr Mladek wrote:
> Could you please provide the full log, including the backtraces?
> I am curious about the callers of the console_lock()...

Sorry, I had been on the same grep since the beginning.
Here are the complete logs.


>    1. console_lock() API called by
>        echo suspend >/proc/driver/nvidia/suspend

https://pastebin.com/6gv0bF1c

>    2. console_lock() API called by the suspend test
>
>       No freeze (expected):
>       $ sudo sh -c "
>       mkdir -p /var/run/nvidia-sleep \
>       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>       && chvt 63 \
>       && systemctl suspend"

https://pastebin.com/10JBnnRx

>    3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.:
>       $ sudo sh -c "
>       mkdir -p /var/run/nvidia-sleep \
>       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>       && chvt 63 \
>       && echo suspend >/proc/driver/nvidia/suspend \
>       && systemctl suspend"

https://pastebin.com/nqks8h7Z

Cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state
  2026-02-03  1:32                           ` ysard
@ 2026-02-03 14:11                             ` Petr Mladek
  0 siblings, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2026-02-03 14:11 UTC (permalink / raw)
  To: ysard; +Cc: John Ogness, linux-kernel, senozhatsky

On Tue 2026-02-03 02:32:46, ysard wrote:
> On 2026-02-02 12:02:07, Petr Mladek wrote:
> > Could you please provide the full log, including the backtraces?
> > I am curious about the callers of the console_lock()...
> 
> Sorry, I had been on the same grep since the beginning.
> Here are the complete logs.
> 
> 
> >    1. console_lock() API called by
> >        echo suspend >/proc/driver/nvidia/suspend
> 
> https://pastebin.com/6gv0bF1c
> 
> >    2. console_lock() API called by the suspend test
> >
> >       No freeze (expected):
> >       $ sudo sh -c "
> >       mkdir -p /var/run/nvidia-sleep \
> >       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> >       && chvt 63 \
> >       && systemctl suspend"
> 
> https://pastebin.com/10JBnnRx
> 
> >    3. No freeze with the 1st patch and uncommented synchronize_srcu() calls.:
> >       $ sudo sh -c "
> >       mkdir -p /var/run/nvidia-sleep \
> >       && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> >       && chvt 63 \
> >       && echo suspend >/proc/driver/nvidia/suspend \
> >       && systemctl suspend"
> 
> https://pastebin.com/nqks8h7Z

Thanks a lot for the complete logs. Honestly, I do not see any obvious
ansver on our guestions.

1. Backtraces with nvidia functions
===================================

I have hoped that the logs would include some backtraces where a
function from the nvidia driver would try to get console_lock().
But I can't find any.

Especially the 1st log from
"echo suspend >/proc/driver/nvidia/suspend"
does not include any nvidia code.

Note that there are functions from the nvidia driver in the 3rd log.
But they are all with '?', for example:

2026-01-31T23:01:51.032779+01:00  kernel: [ T1426] printk: console lock API call [21]: console_lock() by pid=1426, comm=Xorg
2026-01-31T23:01:51.032793+01:00  kernel: [ T1426] Call Trace:
2026-01-31T23:01:51.032794+01:00  kernel: [ T1426]  <TASK>
2026-01-31T23:01:51.032796+01:00  kernel: [ T1426]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:51.032797+01:00  kernel: [ T1426]  console_lock+0x1a/0x60
2026-01-31T23:01:51.032798+01:00  kernel: [ T1426]  vt_ioctl+0xe6d/0x1430
2026-01-31T23:01:51.032799+01:00  kernel: [ T1426]  tty_ioctl+0x261/0x8d0
2026-01-31T23:01:51.032799+01:00  kernel: [ T1426]  ? _nv023604rm+0x9/0x50 [nvidia]
2026-01-31T23:01:51.032800+01:00  kernel: [ T1426]  ? _nv023631rm+0x28/0xf0 [nvidia]
2026-01-31T23:01:51.032801+01:00  kernel: [ T1426]  __x64_sys_ioctl+0x96/0xe0
2026-01-31T23:01:51.032812+01:00  kernel: [ T1426]  do_syscall_64+0x85/0x640
2026-01-31T23:01:51.032814+01:00  kernel: [ T1426]  ? _nv011691rm+0xbe/0x100 [nvidia]
2026-01-31T23:01:51.032814+01:00  kernel: [ T1426]  ? _nv037812rm+0x11a/0x160 [nvidia]

It means that the addresses were found on stack but they were not part
of the current call chain.

The adresses might be on stack because they were passed as parameters
or so. But more likely they are there as a garbage (return adresses)
from previous stack users.


2. console_(try)lock() callers with suspended consoles
======================================================

The regression has gone after we restored the original behavior of
console_(try/un)lock() API after console_suspend_all().

I would expect that a console_(try/un)lock() API caller would get
blocked or do an out-of-memory access or so. But there seems to
be only one caller in the 3rd log.

Let me explain.

The are 4 console_lock()/console_unlock() calls from the pr_flush()
in console_suspend_all(). For example, the 4th one is:

2026-01-31T23:01:57.758969+01:00  kernel: [ T2628] printk: console lock API call [50]: console_unlock() by pid=2628, comm=systemd-sleep
2026-01-31T23:01:57.758971+01:00  kernel: [ T2628] Call Trace:
2026-01-31T23:01:57.758972+01:00  kernel: [ T2628]  <TASK>
2026-01-31T23:01:57.758973+01:00  kernel: [ T2628]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:57.758975+01:00  kernel: [ T2628]  console_unlock+0x22/0xf0
2026-01-31T23:01:57.758976+01:00  kernel: [ T2628]  ? down+0x1e/0x70
2026-01-31T23:01:57.758978+01:00  kernel: [ T2628]  __pr_flush+0x11e/0x280
2026-01-31T23:01:57.758979+01:00  kernel: [ T2628]  ? _printk+0x54/0x60
2026-01-31T23:01:57.758980+01:00  kernel: [ T2628]  console_suspend_all+0x27/0xc0
2026-01-31T23:01:57.758981+01:00  kernel: [ T2628]  suspend_devices_and_enter+0x11b/0xa20
2026-01-31T23:01:57.758983+01:00  kernel: [ T2628]  pm_suspend+0x19b/0x620
2026-01-31T23:01:57.758984+01:00  kernel: [ T2628]  state_store+0x6c/0xd0

Then there is one more console_lock() call, right before printing the
message into the log:

2026-01-31T23:01:57.759223+01:00  kernel: [ T2628] printk: console lock API call [51]: console_lock() by pid=2628, comm=systemd-sleep
2026-01-31T23:01:57.759225+01:00  kernel: [ T2628] Call Trace:
2026-01-31T23:01:57.759227+01:00  kernel: [ T2628]  <TASK>
2026-01-31T23:01:57.759229+01:00  kernel: [ T2628]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:57.759230+01:00  kernel: [ T2628]  console_lock+0x1a/0x60
2026-01-31T23:01:57.759232+01:00  kernel: [ T2628]  console_suspend_all+0x41/0xc0
2026-01-31T23:01:57.759233+01:00  kernel: [ T2628]  suspend_devices_and_enter+0x11b/0xa20
2026-01-31T23:01:57.759248+01:00  kernel: [ T2628]  pm_suspend+0x19b/0x620
2026-01-31T23:01:57.759251+01:00  kernel: [ T2628]  state_store+0x6c/0xd0
2026-01-31T23:01:57.759253+01:00  kernel: [ T2628]  kernfs_fop_write_iter+0x130/0x210
2026-01-31T23:01:57.759255+01:00  kernel: [ T2628]  vfs_write+0x274/0x430
2026-01-31T23:01:57.759256+01:00  kernel: [ T2628]  ksys_write+0x5e/0xe0
2026-01-31T23:01:57.759258+01:00  kernel: [ T2628]  do_syscall_64+0x85/0x640
2026-01-31T23:01:57.759259+01:00  kernel: [ T2628]  ? ___pte_offset_map+0x1b/0x110
2026-01-31T23:01:57.759260+01:00  kernel: [ T2628]  ? __handle_mm_fault+0xbdb/0x1020
2026-01-31T23:01:57.759261+01:00  kernel: [ T2628]  ? count_memcg_events+0xdd/0x1a0
2026-01-31T23:01:57.759263+01:00  kernel: [ T2628]  ? handle_mm_fault+0xbf/0x300
2026-01-31T23:01:57.759264+01:00  kernel: [ T2628]  ? do_user_addr_fault+0x2c1/0x880
2026-01-31T23:01:57.759266+01:00  kernel: [ T2628]  ? irqentry_exit+0x7b/0x6b0
2026-01-31T23:01:57.759267+01:00  kernel: [ T2628]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
2026-01-31T23:01:57.759269+01:00  kernel: [ T2628] RIP: 0033:0x7f961aa5e687
2026-01-31T23:01:57.759270+01:00  kernel: [ T2628] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
2026-01-31T23:01:57.759272+01:00  kernel: [ T2628] RSP: 002b:00007ffde2cb0040 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
2026-01-31T23:01:57.759273+01:00  kernel: [ T2628] RAX: ffffffffffffffda RBX: 00007f961b0f8c80 RCX: 00007f961aa5e687
2026-01-31T23:01:57.759275+01:00  kernel: [ T2628] RDX: 0000000000000004 RSI: 000055993c9fee10 RDI: 0000000000000007
2026-01-31T23:01:57.759276+01:00  kernel: [ T2628] RBP: 000055993c9fee10 R08: 0000000000000000 R09: 0000000000000000
2026-01-31T23:01:57.759278+01:00  kernel: [ T2628] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004
2026-01-31T23:01:57.759279+01:00  kernel: [ T2628] R13: 000055993c9f62a0 R14: 00007f961abb4e80 R15: 00007ffde2cb01b0
2026-01-31T23:01:57.759280+01:00  kernel: [ T2628]  </TASK>
2026-01-31T23:01:57.759282+01:00  kernel: [ T2628] printk: console_suspend_all


Now, here is the critical section where the console_lock() API behaves
differently. But there there is no caller until the system gets
suspended:

2026-01-31T23:01:57.759283+01:00  kernel: [ T2641] sd 0:0:0:0: [sdc] Synchronizing SCSI cache
2026-01-31T23:01:57.759285+01:00  kernel: [  T269] ata1.00: Entering standby power mode
2026-01-31T23:01:57.759286+01:00  kernel: [ T2638] sd 1:0:0:0: [sda] Synchronizing SCSI cache
2026-01-31T23:01:57.759288+01:00  kernel: [ T2639] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
2026-01-31T23:01:57.759289+01:00  kernel: [  T271] ata2.00: Entering standby power mode
2026-01-31T23:01:57.759290+01:00  kernel: [  T273] ata3.00: Entering standby power mode
2026-01-31T23:01:57.759291+01:00  kernel: [ T2628] ACPI: EC: interrupt blocked
2026-01-31T23:01:57.759293+01:00  kernel: [ T2628] ACPI: PM: Preparing to enter system sleep state S3
2026-01-31T23:01:57.759294+01:00  kernel: [ T2628] ACPI: EC: event blocked
2026-01-31T23:01:57.759296+01:00  kernel: [ T2628] ACPI: EC: EC stopped
2026-01-31T23:01:57.759297+01:00  kernel: [ T2628] ACPI: PM: Saving platform NVS memory
2026-01-31T23:01:57.759299+01:00  kernel: [ T2628] Disabling non-boot CPUs ...
2026-01-31T23:01:57.759300+01:00  kernel: [ T2628] smpboot: CPU 7 is now offline
2026-01-31T23:01:57.759301+01:00  kernel: [ T2628] smpboot: CPU 6 is now offline
2026-01-31T23:01:57.759303+01:00  kernel: [ T2628] smpboot: CPU 5 is now offline
2026-01-31T23:01:57.759304+01:00  kernel: [ T2628] smpboot: CPU 4 is now offline
2026-01-31T23:01:57.759306+01:00  kernel: [ T2628] smpboot: CPU 3 is now offline
2026-01-31T23:01:57.759307+01:00  kernel: [ T2628] smpboot: CPU 2 is now offline
2026-01-31T23:01:57.759309+01:00  kernel: [ T2628] smpboot: CPU 1 is now offline

Then the resume starts:

2026-01-31T23:01:57.759310+01:00  kernel: [ T2628] ACPI: PM: Low-level resume complete
2026-01-31T23:01:57.759312+01:00  kernel: [ T2628] ACPI: EC: EC started
2026-01-31T23:01:57.759313+01:00  kernel: [ T2628] ACPI: PM: Restoring platform NVS memory
2026-01-31T23:01:57.759314+01:00  kernel: [ T2628] Enabling non-boot CPUs ...
2026-01-31T23:01:57.759316+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 1 APIC 0x2
2026-01-31T23:01:57.759317+01:00  kernel: [ T2628] CPU1 is up
2026-01-31T23:01:57.759319+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 2 APIC 0x4
2026-01-31T23:01:57.759320+01:00  kernel: [ T2628] CPU2 is up
2026-01-31T23:01:57.759321+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 3 APIC 0x6
2026-01-31T23:01:57.759323+01:00  kernel: [ T2628] CPU3 is up
2026-01-31T23:01:57.759324+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 4 APIC 0x1
2026-01-31T23:01:57.759325+01:00  kernel: [ T2628] CPU4 is up
2026-01-31T23:01:57.759326+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 5 APIC 0x3
2026-01-31T23:01:57.759328+01:00  kernel: [ T2628] CPU5 is up
2026-01-31T23:01:57.759329+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 6 APIC 0x5
2026-01-31T23:01:57.759330+01:00  kernel: [ T2628] CPU6 is up
2026-01-31T23:01:57.759332+01:00  kernel: [ T2628] smpboot: Booting Node 0 Processor 7 APIC 0x7
2026-01-31T23:01:57.759333+01:00  kernel: [ T2628] CPU7 is up
2026-01-31T23:01:57.759335+01:00  kernel: [ T2628] ACPI: PM: Waking up from system sleep state S3
2026-01-31T23:01:57.759336+01:00  kernel: [ T2628] ACPI: EC: interrupt unblocked
2026-01-31T23:01:57.759337+01:00  kernel: [   T70] usb usb1: root hub lost power or was reset
2026-01-31T23:01:57.759339+01:00  kernel: [ T2650] xhci_hcd 0000:00:14.0: xHC error in resume, USBSTS 0x411, Reinit
2026-01-31T23:01:57.759340+01:00  kernel: [ T2650] usb usb3: root hub lost power or was reset
2026-01-31T23:01:57.759342+01:00  kernel: [ T2650] usb usb4: root hub lost power or was reset
2026-01-31T23:01:57.759343+01:00  kernel: [ T2628] ACPI: EC: event unblocked
2026-01-31T23:01:57.759344+01:00  kernel: [ T2655] usb usb2: root hub lost power or was reset

and finally, there is a console_lock API caller:

2026-01-31T23:01:57.759347+01:00  kernel: [   T97] printk: console lock API call [52]: console_lock() by pid=97, comm=kworker/2:1+events
2026-01-31T23:01:57.759349+01:00  kernel: [   T97] Call Trace:
2026-01-31T23:01:57.759350+01:00  kernel: [   T97]  <TASK>
2026-01-31T23:01:57.759351+01:00  kernel: [   T97]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:57.759353+01:00  kernel: [   T97]  console_lock+0x1a/0x60
2026-01-31T23:01:57.759354+01:00  kernel: [   T97]  console_callback+0x14/0x1e0
2026-01-31T23:01:57.759355+01:00  kernel: [   T97]  process_one_work+0x19b/0x3c0
2026-01-31T23:01:57.759357+01:00  kernel: [   T97]  worker_thread+0x256/0x370
2026-01-31T23:01:57.759358+01:00  kernel: [   T97]  ? __pfx_worker_thread+0x10/0x10
2026-01-31T23:01:57.759360+01:00  kernel: [   T97]  kthread+0xef/0x230
2026-01-31T23:01:57.759361+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759362+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759363+01:00  kernel: [   T97]  ret_from_fork+0x20b/0x260
2026-01-31T23:01:57.759365+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759366+01:00  kernel: [   T97]  ret_from_fork_asm+0x1a/0x30
2026-01-31T23:01:57.759367+01:00  kernel: [   T97]  </TASK>
2026-01-31T23:01:57.759369+01:00  kernel: [   T97] printk: console lock API call [53]: console_unlock() by pid=97, comm=kworker/2:1+events
2026-01-31T23:01:57.759370+01:00  kernel: [   T97] Call Trace:
2026-01-31T23:01:57.759371+01:00  kernel: [   T97]  <TASK>
2026-01-31T23:01:57.759373+01:00  kernel: [   T97]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:57.759374+01:00  kernel: [   T97]  console_unlock+0x22/0xf0
2026-01-31T23:01:57.759376+01:00  kernel: [   T97]  process_one_work+0x19b/0x3c0
2026-01-31T23:01:57.759377+01:00  kernel: [   T97]  worker_thread+0x256/0x370
2026-01-31T23:01:57.759378+01:00  kernel: [   T97]  ? __pfx_worker_thread+0x10/0x10
2026-01-31T23:01:57.759380+01:00  kernel: [   T97]  kthread+0xef/0x230
2026-01-31T23:01:57.759381+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759382+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759384+01:00  kernel: [   T97]  ret_from_fork+0x20b/0x260
2026-01-31T23:01:57.759385+01:00  kernel: [   T97]  ? __pfx_kthread+0x10/0x10
2026-01-31T23:01:57.759386+01:00  kernel: [   T97]  ret_from_fork_asm+0x1a/0x30
2026-01-31T23:01:57.759388+01:00  kernel: [   T97]  </TASK>


It seems to be from the console_callback() from the generic
drivers/tty/vt/vt.c code.

Sigh, I am not sure if it does anything which might be affected by
the state of the nvidia driver. Mabye, we should add more people
into Cc who are more familiar with the VT code and graphical
drivers.

Anyway, this was the last and only call before console_resume_all()
was called:

2026-01-31T23:01:57.759389+01:00  kernel: [ T2635] usb 1-1: reset high-speed USB device number 2 using ehci-pci
2026-01-31T23:01:57.759390+01:00  kernel: [ T2652] usb 2-1: reset high-speed USB device number 2 using ehci-pci
2026-01-31T23:01:57.759392+01:00  kernel: [   T97] psmouse serio4: synaptics: queried max coordinates: x [..5660], y [..4746]
2026-01-31T23:01:57.759393+01:00  kernel: [  T273] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
2026-01-31T23:01:57.759394+01:00  kernel: [  T271] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
2026-01-31T23:01:57.759396+01:00  kernel: [  T273] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
2026-01-31T23:01:57.759397+01:00  kernel: [  T271] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
2026-01-31T23:01:57.759398+01:00  kernel: [  T271] ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
2026-01-31T23:01:57.759581+01:00  kernel: [  T273] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
2026-01-31T23:01:57.759584+01:00  kernel: [  T273] ata3.00: supports DRM functions and may not be fully accessible
2026-01-31T23:01:57.759586+01:00  kernel: [  T271] ata2.00: supports DRM functions and may not be fully accessible
2026-01-31T23:01:57.759588+01:00  kernel: [   T88] sd 2:0:0:0: [sdb] Starting disk
2026-01-31T23:01:57.759589+01:00  kernel: [  T273] ata3.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
2026-01-31T23:01:57.759591+01:00  kernel: [  T273] ata3.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
2026-01-31T23:01:57.759593+01:00  kernel: [  T273] ata3.00: supports DRM functions and may not be fully accessible
2026-01-31T23:01:57.759594+01:00  kernel: [  T179] sd 1:0:0:0: [sda] Starting disk
2026-01-31T23:01:57.759596+01:00  kernel: [  T273] ata3.00: configured for UDMA/133
2026-01-31T23:01:57.759597+01:00  kernel: [  T271] ata2.00: ACPI cmd f5/00:00:00:00:00:a0(SECURITY FREEZE LOCK) filtered out
2026-01-31T23:01:57.759599+01:00  kernel: [  T271] ata2.00: ACPI cmd ef/10:03:00:00:00:a0(SET FEATURES) filtered out
2026-01-31T23:01:57.759600+01:00  kernel: [   T97] psmouse serio4: synaptics: queried min coordinates: x [1282..], y [1106..]
2026-01-31T23:01:57.759602+01:00  kernel: [  T271] ata2.00: supports DRM functions and may not be fully accessible
2026-01-31T23:01:57.759603+01:00  kernel: [  T271] ata2.00: configured for UDMA/133
2026-01-31T23:01:57.759605+01:00  kernel: [ T2667] usb 3-4: reset low-speed USB device number 3 using xhci_hcd
2026-01-31T23:01:57.759606+01:00  kernel: [ T2635] usb 1-1.3: reset high-speed USB device number 3 using ehci-pci
2026-01-31T23:01:57.759608+01:00  kernel: [ T2665] usb 3-1: reset high-speed USB device number 2 using xhci_hcd
2026-01-31T23:01:57.759609+01:00  kernel: [ T2628] printk: console_resume_all
2026-01-31T23:01:57.759610+01:00  kernel: [ T2628] printk: console lock API call [54]: console_unlock() by pid=2628, comm=systemd-sleep
2026-01-31T23:01:57.759612+01:00  kernel: [ T2628] Call Trace:
2026-01-31T23:01:57.759613+01:00  kernel: [ T2628]  <TASK>
2026-01-31T23:01:57.759614+01:00  kernel: [ T2628]  print_console_lock_caller+0x154/0x220
2026-01-31T23:01:57.759616+01:00  kernel: [ T2628]  console_unlock+0x22/0xf0
2026-01-31T23:01:57.759617+01:00  kernel: [ T2628]  console_resume_all+0xe4/0x100
2026-01-31T23:01:57.759619+01:00  kernel: [ T2628]  suspend_devices_and_enter+0x340/0xa20
2026-01-31T23:01:57.759620+01:00  kernel: [ T2628]  pm_suspend+0x19b/0x620
2026-01-31T23:01:57.759621+01:00  kernel: [ T2628]  state_store+0x6c/0xd0

So, the problem might be in the console_callback() callbacks. But it
seems that most code paths are not called. Otherwise,
WARN_CONSOLE_UNLOCKED() would warned. Note that the reverted
console_lock() does not set "console_locked" when suspended.

Or we might miss some calls. For example, console_trylock() calls
in vprintk_emit() and fb_flashcursor() are not logged. But again
a lot of vt code is guarded by WARN_CONSOLE_UNLOCKED(). And these
parts were not called almost for sure.


Hmm, I am not sure how to move forward.

One more idea is to try shuffling the code in console_trylock()/
console_lock()/console_unlock() and see when the regression
will be visible again. I mean to restore the "broken" behavior
step by step:

    1. set console_locked = 1 even when console_suspended
    2. set console_may_schedule = 1 even when console_suspended
    3. call __console_flush_and_unlock() even when console_suspended

I wonder which particular change would break the suspend.

It might be another clue.

Any other ideas are welcome.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-02-03 14:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-21 22:42 Regression: system freeze on resume from suspend introduced by printk per-console suspended state ysard_git
2025-12-23  6:20 ` John Ogness
     [not found]   ` <trinity-43147d5d-a8ea-47c1-9f83-b578c346b387-1766479103562@3c-app-mailcom-bs12>
2026-01-08  0:05     ` pv
2026-01-08  9:43       ` John Ogness
2026-01-23  7:44         ` ysard
2026-01-23 12:19           ` Petr Mladek
2026-01-24  1:22             ` ysard
2026-01-28 14:00               ` Petr Mladek
2026-01-28 15:25                 ` John Ogness
2026-01-29  9:34                 ` ysard
2026-01-30 15:56                   ` Petr Mladek
2026-01-30 16:28                     ` Petr Mladek
2026-01-31 22:22                       ` ysard
2026-02-02 11:02                         ` Petr Mladek
2026-02-03  1:32                           ` ysard
2026-02-03 14:11                             ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox