public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH printk v3 0/7] various cleanups
@ 2023-07-17 19:46 John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 1/7] kdb: Do not assume write() callback available John Ogness
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Jason Wessel, Daniel Thompson, Douglas Anderson, Aaron Tomlin,
	Greg Kroah-Hartman

Hi,

This is v3 of a series providing some cleanup in preparation
for the threaded/atomic console work. v2 is here [0]. This
series provides useful cleanups independent of the
threaded/atomic work.

Changes since v2:

- "NMI safety" patch split into 3 patches: console_unblank,
  keep non-panic CPUs out, do not lock console on panic flush.

- console_unblank() aborts if there are no consoles implementing
  an unblank() callback. This check is performed before taking
  the console lock.

- Commit messages and comments updated to mention the details
  discussed in the v2 feedback.

John Ogness

[0] https://lore.kernel.org/lkml/20230710134524.25232-1-john.ogness@linutronix.de

John Ogness (7):
  kdb: Do not assume write() callback available
  printk: Reduce console_unblank() usage in unsafe scenarios
  printk: Keep non-panic-CPUs out of console lock
  printk: Do not take console lock for console_flush_on_panic()
  printk: Consolidate console deferred printing
  printk: Add per-console suspended state
  printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic()

 include/linux/console.h     |   3 +
 kernel/debug/kdb/kdb_io.c   |   2 +
 kernel/printk/internal.h    |   2 +
 kernel/printk/printk.c      | 213 +++++++++++++++++++++++++-----------
 kernel/printk/printk_safe.c |   9 +-
 5 files changed, 156 insertions(+), 73 deletions(-)


base-commit: 7ec85f3e089aa423a69559bf4555b6218b5a2ef7
-- 
2.30.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH printk v3 1/7] kdb: Do not assume write() callback available
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios John Ogness
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Jason Wessel, Daniel Thompson, Douglas Anderson, Aaron Tomlin

It is allowed for consoles to not provide a write() callback. For
example ttynull does this.

Check if a write() callback is available before using it.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 kernel/debug/kdb/kdb_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 5c7e9ba7cd6b..e9139dfc1f0a 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -576,6 +576,8 @@ static void kdb_msg_write(const char *msg, int msg_len)
 			continue;
 		if (c == dbg_io_ops->cons)
 			continue;
+		if (!c->write)
+			continue;
 		/*
 		 * Set oops_in_progress to encourage the console drivers to
 		 * disregard their internal spin locks: in the current calling
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 1/7] kdb: Do not assume write() callback available John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-18  3:39   ` Sergey Senozhatsky
  2023-07-18 10:21   ` Petr Mladek
  2023-07-17 19:46 ` [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock John Ogness
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

A semaphore is not NMI-safe, even when using down_trylock(). Both
down_trylock() and up() are using internal spinlocks and up()
might even call wake_up_process().

In the panic() code path it gets even worse because the internal
spinlocks of the semaphore may have been taken by a CPU that has
been stopped.

To reduce the risk of deadlocks caused by the console semaphore in
the panic path, make the following changes:

- First check if any consoles have implemented the unblank()
  callback. If not, then there is no reason to take the console
  semaphore anyway. (This check is also useful for the non-panic
  path since the locking/unlocking of the console lock can be
  quite expensive due to console printing.)

- If the panic path is in NMI context, bail out without attempting
  to take the console semaphore or calling any unblank() callbacks.
  Bailing out is acceptable because console_unblank() would already
  bail out if the console semaphore is contended. The alternative of
  ignoring the console semaphore and calling the unblank() callbacks
  anyway is a bad idea because these callbacks are also not NMI-safe.

If consoles with unblank() callbacks exist and console_unblank() is
called from a non-NMI panic context, it will still attempt a
down_trylock(). This could still result in a deadlock if one of the
stopped CPUs is holding the semaphore internal spinlock. But this
is a risk that the kernel has been (and continues to be) willing
to take.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9644f6e5bf15..7aa9dbee12e8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3043,9 +3043,27 @@ EXPORT_SYMBOL(console_conditional_schedule);
 
 void console_unblank(void)
 {
+	bool found_unblank = false;
 	struct console *c;
 	int cookie;
 
+	/*
+	 * First check if there are any consoles implementing the unblank()
+	 * callback. If not, there is no reason to continue and take the
+	 * console lock, which in particular can be dangerous if
+	 * @oops_in_progress is set.
+	 */
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(c) {
+		if ((console_srcu_read_flags(c) & CON_ENABLED) && c->unblank) {
+			found_unblank = true;
+			break;
+		}
+	}
+	console_srcu_read_unlock(cookie);
+	if (!found_unblank)
+		return;
+
 	/*
 	 * Stop console printing because the unblank() callback may
 	 * assume the console is not within its write() callback.
@@ -3054,6 +3072,16 @@ void console_unblank(void)
 	 * In that case, attempt a trylock as best-effort.
 	 */
 	if (oops_in_progress) {
+		/* Semaphores are not NMI-safe. */
+		if (in_nmi())
+			return;
+
+		/*
+		 * Attempting to trylock the console lock can deadlock
+		 * if another CPU was stopped while modifying the
+		 * semaphore. "Hope and pray" that this is not the
+		 * current situation.
+		 */
 		if (down_trylock_console_sem() != 0)
 			return;
 	} else
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 1/7] kdb: Do not assume write() callback available John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-18  3:17   ` Sergey Senozhatsky
  2023-07-18 10:23   ` Petr Mladek
  2023-07-17 19:46 ` [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic() John Ogness
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

When in a panic situation, non-panic CPUs should avoid holding the
console lock so as not to contend with the panic CPU. This is already
implemented with abandon_console_lock_in_panic(), which is checked
after each printed line. However, non-panic CPUs should also avoid
trying to acquire the console lock during a panic.

Modify console_trylock() to fail and console_lock() to block() when
called from a non-panic CPU during a panic.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 45 ++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7aa9dbee12e8..7219991885e6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2583,6 +2583,25 @@ static int console_cpu_notify(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * Return true when this CPU should unlock console_sem without pushing all
+ * messages to the console. This reduces the chance that the console is
+ * locked when the panic CPU tries to use it.
+ */
+static bool abandon_console_lock_in_panic(void)
+{
+	if (!panic_in_progress())
+		return false;
+
+	/*
+	 * We can use raw_smp_processor_id() here because it is impossible for
+	 * the task to be migrated to the panic_cpu, or away from it. If
+	 * panic_cpu has already been set, and we're not currently executing on
+	 * that CPU, then we never will be.
+	 */
+	return atomic_read(&panic_cpu) != raw_smp_processor_id();
+}
+
 /**
  * console_lock - block the console subsystem from printing
  *
@@ -2595,6 +2614,10 @@ void console_lock(void)
 {
 	might_sleep();
 
+	/* On panic, the console_lock must be left to the panic cpu. */
+	while (abandon_console_lock_in_panic())
+		msleep(1000);
+
 	down_console_sem();
 	if (console_suspended)
 		return;
@@ -2613,6 +2636,9 @@ EXPORT_SYMBOL(console_lock);
  */
 int console_trylock(void)
 {
+	/* On panic, the console_lock must be left to the panic cpu. */
+	if (abandon_console_lock_in_panic())
+		return 0;
 	if (down_trylock_console_sem())
 		return 0;
 	if (console_suspended) {
@@ -2631,25 +2657,6 @@ int is_console_locked(void)
 }
 EXPORT_SYMBOL(is_console_locked);
 
-/*
- * Return true when this CPU should unlock console_sem without pushing all
- * messages to the console. This reduces the chance that the console is
- * locked when the panic CPU tries to use it.
- */
-static bool abandon_console_lock_in_panic(void)
-{
-	if (!panic_in_progress())
-		return false;
-
-	/*
-	 * We can use raw_smp_processor_id() here because it is impossible for
-	 * the task to be migrated to the panic_cpu, or away from it. If
-	 * panic_cpu has already been set, and we're not currently executing on
-	 * that CPU, then we never will be.
-	 */
-	return atomic_read(&panic_cpu) != raw_smp_processor_id();
-}
-
 /*
  * Check if the given console is currently capable and allowed to print
  * records.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic()
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
                   ` (2 preceding siblings ...)
  2023-07-17 19:46 ` [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-18  3:18   ` Sergey Senozhatsky
  2023-07-18 10:28   ` Petr Mladek
  2023-07-17 19:46 ` [PATCH printk v3 5/7] printk: Consolidate console deferred printing John Ogness
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Currently console_flush_on_panic() will attempt to acquire the
console lock when flushing the buffer on panic. If it fails to
acquire the lock, it continues anyway because this is the last
chance to get any pending records printed.

The reason why the console lock was attempted at all was to
prevent any other CPUs from acquiring the console lock for
printing while the panic CPU was printing. But as of the
previous commit, non-panic CPUs will no longer attempt to
acquire the console lock in a panic situation. Therefore it is
no longer strictly necessary for a panic CPU to acquire the
console lock.

Avoiding taking the console lock when flushing in panic has
the additional benefit of avoiding possible deadlocks due to
semaphore usage in NMI context (semaphores are not NMI-safe)
and avoiding possible deadlocks if another CPU accesses the
semaphore and is stopped while holding one of the semaphore's
internal spinlocks.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7219991885e6..51445e8ea730 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3118,14 +3118,24 @@ void console_unblank(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	bool handover;
+	u64 next_seq;
+
 	/*
-	 * If someone else is holding the console lock, trylock will fail
-	 * and may_schedule may be set.  Ignore and proceed to unlock so
-	 * that messages are flushed out.  As this can be called from any
-	 * context and we don't want to get preempted while flushing,
-	 * ensure may_schedule is cleared.
+	 * Ignore the console lock and flush out the messages. Attempting a
+	 * trylock would not be useful because:
+	 *
+	 *   - if it is contended, it must be ignored anyway
+	 *   - console_lock() and console_trylock() block and fail
+	 *     respectively in panic for non-panic CPUs
+	 *   - semaphores are not NMI-safe
+	 */
+
+	/*
+	 * If another context is holding the console lock,
+	 * @console_may_schedule might be set. Clear it so that
+	 * this context does not call cond_resched() while flushing.
 	 */
-	console_trylock();
 	console_may_schedule = 0;
 
 	if (mode == CONSOLE_REPLAY_ALL) {
@@ -3138,15 +3148,15 @@ void console_flush_on_panic(enum con_flush_mode mode)
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(c) {
 			/*
-			 * If the above console_trylock() failed, this is an
-			 * unsynchronized assignment. But in that case, the
+			 * This is an unsynchronized assignment, but the
 			 * kernel is in "hope and pray" mode anyway.
 			 */
 			c->seq = seq;
 		}
 		console_srcu_read_unlock(cookie);
 	}
-	console_unlock();
+
+	console_flush_all(false, &next_seq, &handover);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 5/7] printk: Consolidate console deferred printing
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
                   ` (3 preceding siblings ...)
  2023-07-17 19:46 ` [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic() John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 6/7] printk: Add per-console suspended state John Ogness
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Printing to consoles can be deferred for several reasons:

- explicitly with printk_deferred()
- printk() in NMI context
- recursive printk() calls

The current implementation is not consistent. For printk_deferred(),
irq work is scheduled twice. For NMI und recursive, panic CPU
suppression and caller delays are not properly enforced.

Correct these inconsistencies by consolidating the deferred printing
code so that vprintk_deferred() is the top-level function for
deferred printing and vprintk_emit() will perform whichever irq_work
queueing is appropriate.

Also add kerneldoc for wake_up_klogd() and defer_console_output() to
clarify their differences and appropriate usage.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c      | 35 ++++++++++++++++++++++++++++-------
 kernel/printk/printk_safe.c |  9 ++-------
 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 51445e8ea730..6e853a1441a7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2306,7 +2306,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 		preempt_enable();
 	}
 
-	wake_up_klogd();
+	if (in_sched)
+		defer_console_output();
+	else
+		wake_up_klogd();
+
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -3841,11 +3845,33 @@ static void __wake_up_klogd(int val)
 	preempt_enable();
 }
 
+/**
+ * wake_up_klogd - Wake kernel logging daemon
+ *
+ * Use this function when new records have been added to the ringbuffer
+ * and the console printing of those records has already occurred or is
+ * known to be handled by some other context. This function will only
+ * wake the logging daemon.
+ *
+ * Context: Any context.
+ */
 void wake_up_klogd(void)
 {
 	__wake_up_klogd(PRINTK_PENDING_WAKEUP);
 }
 
+/**
+ * defer_console_output - Wake kernel logging daemon and trigger
+ *	console printing in a deferred context
+ *
+ * Use this function when new records have been added to the ringbuffer,
+ * this context is responsible for console printing those records, but
+ * the current context is not allowed to perform the console printing.
+ * Trigger an irq_work context to perform the console printing. This
+ * function also wakes the logging daemon.
+ *
+ * Context: Any context.
+ */
 void defer_console_output(void)
 {
 	/*
@@ -3862,12 +3888,7 @@ void printk_trigger_flush(void)
 
 int vprintk_deferred(const char *fmt, va_list args)
 {
-	int r;
-
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, fmt, args);
-	defer_console_output();
-
-	return r;
+	return vprintk_emit(0, LOGLEVEL_SCHED, NULL, fmt, args);
 }
 
 int _printk_deferred(const char *fmt, ...)
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index ef0f9a2044da..6d10927a07d8 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -38,13 +38,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
 	 * Use the main logbuf even in NMI. But avoid calling console
 	 * drivers that might have their own locks.
 	 */
-	if (this_cpu_read(printk_context) || in_nmi()) {
-		int len;
-
-		len = vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
-		defer_console_output();
-		return len;
-	}
+	if (this_cpu_read(printk_context) || in_nmi())
+		return vprintk_deferred(fmt, args);
 
 	/* No obstacles. */
 	return vprintk_default(fmt, args);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 6/7] printk: Add per-console suspended state
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
                   ` (4 preceding siblings ...)
  2023-07-17 19:46 ` [PATCH printk v3 5/7] printk: Consolidate console deferred printing John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-17 19:46 ` [PATCH printk v3 7/7] printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() John Ogness
  2023-07-18 10:44 ` [PATCH printk v3 0/7] various cleanups Petr Mladek
  7 siblings, 0 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

Currently the global @console_suspended is used to determine if
consoles are in a suspended state. Its primary purpose is to allow
usage of the console_lock when suspended without causing console
printing. It is synchronized by the console_lock.

Rather than relying on the console_lock to determine suspended
state, make it an official per-console state that is set within
console->flags. This allows the state to be queried via SRCU.

Remove @console_suspended. Console printing will still be avoided
when suspended because console_is_usable() returns false when
the new suspended flag is set for that console.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/console.h |  3 ++
 kernel/printk/printk.c  | 74 ++++++++++++++++++++++++-----------------
 2 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index d3195664baa5..7de11c763eb3 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -154,6 +154,8 @@ static inline int con_debug_leave(void)
  *			receiving the printk spam for obvious reasons.
  * @CON_EXTENDED:	The console supports the extended output format of
  *			/dev/kmesg which requires a larger output buffer.
+ * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
+ *			printing callbacks must not be called.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -163,6 +165,7 @@ enum cons_flags {
 	CON_ANYTIME		= BIT(4),
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
+	CON_SUSPENDED		= BIT(7),
 };
 
 /**
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6e853a1441a7..efe577477913 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,7 +86,7 @@ EXPORT_SYMBOL(oops_in_progress);
 static DEFINE_MUTEX(console_mutex);
 
 /*
- * console_sem protects updates to console->seq and console_suspended,
+ * console_sem protects updates to console->seq
  * and also provides serialization for console printing.
  */
 static DEFINE_SEMAPHORE(console_sem);
@@ -359,7 +359,7 @@ static bool panic_in_progress(void)
  * paths in the console code where we end up in places I want
  * locked without the console semaphore held).
  */
-static int console_locked, console_suspended;
+static int console_locked;
 
 /*
  *	Array of consoles built from command line options (console=)
@@ -2549,22 +2549,46 @@ MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
  */
 void suspend_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
 	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
 	pr_flush(1000, true);
-	console_lock();
-	console_suspended = 1;
-	up_console_sem();
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see that they are suspended so that it
+	 * is guaranteed that all printing has stopped when this function
+	 * completes.
+	 */
+	synchronize_srcu(&console_srcu);
 }
 
 void resume_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
-	down_console_sem();
-	console_suspended = 0;
-	console_unlock();
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see they are no longer suspended so
+	 * that they are guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	pr_flush(1000, true);
 }
 
@@ -2623,8 +2647,6 @@ void console_lock(void)
 		msleep(1000);
 
 	down_console_sem();
-	if (console_suspended)
-		return;
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2645,10 +2667,6 @@ int console_trylock(void)
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
-	if (console_suspended) {
-		up_console_sem();
-		return 0;
-	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -2674,6 +2692,9 @@ static inline bool console_is_usable(struct console *con)
 	if (!(flags & CON_ENABLED))
 		return false;
 
+	if ((flags & CON_SUSPENDED))
+		return false;
+
 	if (!con->write)
 		return false;
 
@@ -2992,11 +3013,6 @@ void console_unlock(void)
 	bool flushed;
 	u64 next_seq;
 
-	if (console_suspended) {
-		up_console_sem();
-		return;
-	}
-
 	/*
 	 * Console drivers are called with interrupts disabled, so
 	 * @console_may_schedule should be cleared before; however, we may
@@ -3726,8 +3742,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 
 		/*
 		 * Hold the console_lock to guarantee safe access to
-		 * console->seq and to prevent changes to @console_suspended
-		 * until all consoles have been processed.
+		 * console->seq.
 		 */
 		console_lock();
 
@@ -3735,6 +3750,11 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 		for_each_console_srcu(c) {
 			if (con && con != c)
 				continue;
+			/*
+			 * If consoles are not usable, it cannot be expected
+			 * that they make forward progress, so only increment
+			 * @diff for usable consoles.
+			 */
 			if (!console_is_usable(c))
 				continue;
 			printk_seq = c->seq;
@@ -3743,18 +3763,12 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 		}
 		console_srcu_read_unlock(cookie);
 
-		/*
-		 * If consoles are suspended, it cannot be expected that they
-		 * make forward progress, so timeout immediately. @diff is
-		 * still used to return a valid flush status.
-		 */
-		if (console_suspended)
-			remaining = 0;
-		else if (diff != last_diff && reset_on_progress)
+		if (diff != last_diff && reset_on_progress)
 			remaining = timeout_ms;
 
 		console_unlock();
 
+		/* Note: @diff is 0 if there are no usable consoles. */
 		if (diff == 0 || remaining == 0)
 			break;
 
@@ -3788,7 +3802,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
  * printer has been seen to make some forward progress.
  *
  * Context: Process context. May sleep while acquiring console lock.
- * Return: true if all enabled printers are caught up.
+ * Return: true if all usable printers are caught up.
  */
 static bool pr_flush(int timeout_ms, bool reset_on_progress)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH printk v3 7/7] printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic()
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
                   ` (5 preceding siblings ...)
  2023-07-17 19:46 ` [PATCH printk v3 6/7] printk: Add per-console suspended state John Ogness
@ 2023-07-17 19:46 ` John Ogness
  2023-07-18 10:44 ` [PATCH printk v3 0/7] various cleanups Petr Mladek
  7 siblings, 0 replies; 16+ messages in thread
From: John Ogness @ 2023-07-17 19:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Currently abandon_console_lock_in_panic() is only used to determine if
the current CPU should immediately release the console lock because
another CPU is in panic. However, later this function will be used by
the CPU to immediately release other resources in this situation.

Rename the function to other_cpu_in_panic(), which is a better
description and does not assume it is related to the console lock.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/internal.h |  2 ++
 kernel/printk/printk.c   | 15 ++++++++-------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a17704136f1..7d4979d5c3ce 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -103,3 +103,5 @@ struct printk_message {
 	u64			seq;
 	unsigned long		dropped;
 };
+
+bool other_cpu_in_panic(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index efe577477913..8787d3a72114 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2612,11 +2612,12 @@ static int console_cpu_notify(unsigned int cpu)
 }
 
 /*
- * Return true when this CPU should unlock console_sem without pushing all
- * messages to the console. This reduces the chance that the console is
- * locked when the panic CPU tries to use it.
+ * Return true if a panic is in progress on a remote CPU.
+ *
+ * On true, the local CPU should immediately release any printing resources
+ * that may be needed by the panic CPU.
  */
-static bool abandon_console_lock_in_panic(void)
+bool other_cpu_in_panic(void)
 {
 	if (!panic_in_progress())
 		return false;
@@ -2643,7 +2644,7 @@ void console_lock(void)
 	might_sleep();
 
 	/* On panic, the console_lock must be left to the panic cpu. */
-	while (abandon_console_lock_in_panic())
+	while (other_cpu_in_panic())
 		msleep(1000);
 
 	down_console_sem();
@@ -2663,7 +2664,7 @@ EXPORT_SYMBOL(console_lock);
 int console_trylock(void)
 {
 	/* On panic, the console_lock must be left to the panic cpu. */
-	if (abandon_console_lock_in_panic())
+	if (other_cpu_in_panic())
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
@@ -2978,7 +2979,7 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 			any_progress = true;
 
 			/* Allow panic_cpu to take over the consoles safely. */
-			if (abandon_console_lock_in_panic())
+			if (other_cpu_in_panic())
 				goto abandon;
 
 			if (do_cond_resched)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock
  2023-07-17 19:46 ` [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock John Ogness
@ 2023-07-18  3:17   ` Sergey Senozhatsky
  2023-07-18 10:23   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2023-07-18  3:17 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On (23/07/17 21:52), John Ogness wrote:
> 
> When in a panic situation, non-panic CPUs should avoid holding the
> console lock so as not to contend with the panic CPU. This is already
> implemented with abandon_console_lock_in_panic(), which is checked
> after each printed line. However, non-panic CPUs should also avoid
> trying to acquire the console lock during a panic.
> 
> Modify console_trylock() to fail and console_lock() to block() when
> called from a non-panic CPU during a panic.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic()
  2023-07-17 19:46 ` [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic() John Ogness
@ 2023-07-18  3:18   ` Sergey Senozhatsky
  2023-07-18 10:28   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2023-07-18  3:18 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On (23/07/17 21:52), John Ogness wrote:
> 
> Currently console_flush_on_panic() will attempt to acquire the
> console lock when flushing the buffer on panic. If it fails to
> acquire the lock, it continues anyway because this is the last
> chance to get any pending records printed.
> 
> The reason why the console lock was attempted at all was to
> prevent any other CPUs from acquiring the console lock for
> printing while the panic CPU was printing. But as of the
> previous commit, non-panic CPUs will no longer attempt to
> acquire the console lock in a panic situation. Therefore it is
> no longer strictly necessary for a panic CPU to acquire the
> console lock.
> 
> Avoiding taking the console lock when flushing in panic has
> the additional benefit of avoiding possible deadlocks due to
> semaphore usage in NMI context (semaphores are not NMI-safe)
> and avoiding possible deadlocks if another CPU accesses the
> semaphore and is stopped while holding one of the semaphore's
> internal spinlocks.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios
  2023-07-17 19:46 ` [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios John Ogness
@ 2023-07-18  3:39   ` Sergey Senozhatsky
  2023-07-18 10:21   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2023-07-18  3:39 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On (23/07/17 21:52), John Ogness wrote:
> 
> A semaphore is not NMI-safe, even when using down_trylock(). Both
> down_trylock() and up() are using internal spinlocks and up()
> might even call wake_up_process().
> 
> In the panic() code path it gets even worse because the internal
> spinlocks of the semaphore may have been taken by a CPU that has
> been stopped.
> 
> To reduce the risk of deadlocks caused by the console semaphore in
> the panic path, make the following changes:
> 
> - First check if any consoles have implemented the unblank()
>   callback. If not, then there is no reason to take the console
>   semaphore anyway. (This check is also useful for the non-panic
>   path since the locking/unlocking of the console lock can be
>   quite expensive due to console printing.)
> 
> - If the panic path is in NMI context, bail out without attempting
>   to take the console semaphore or calling any unblank() callbacks.
>   Bailing out is acceptable because console_unblank() would already
>   bail out if the console semaphore is contended. The alternative of
>   ignoring the console semaphore and calling the unblank() callbacks
>   anyway is a bad idea because these callbacks are also not NMI-safe.
> 
> If consoles with unblank() callbacks exist and console_unblank() is
> called from a non-NMI panic context, it will still attempt a
> down_trylock(). This could still result in a deadlock if one of the
> stopped CPUs is holding the semaphore internal spinlock. But this
> is a risk that the kernel has been (and continues to be) willing
> to take.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios
  2023-07-17 19:46 ` [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios John Ogness
  2023-07-18  3:39   ` Sergey Senozhatsky
@ 2023-07-18 10:21   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-18 10:21 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2023-07-17 21:52:02, John Ogness wrote:
> A semaphore is not NMI-safe, even when using down_trylock(). Both
> down_trylock() and up() are using internal spinlocks and up()
> might even call wake_up_process().
> 
> In the panic() code path it gets even worse because the internal
> spinlocks of the semaphore may have been taken by a CPU that has
> been stopped.
> 
> To reduce the risk of deadlocks caused by the console semaphore in
> the panic path, make the following changes:
> 
> - First check if any consoles have implemented the unblank()
>   callback. If not, then there is no reason to take the console
>   semaphore anyway. (This check is also useful for the non-panic
>   path since the locking/unlocking of the console lock can be
>   quite expensive due to console printing.)
> 
> - If the panic path is in NMI context, bail out without attempting
>   to take the console semaphore or calling any unblank() callbacks.
>   Bailing out is acceptable because console_unblank() would already
>   bail out if the console semaphore is contended. The alternative of
>   ignoring the console semaphore and calling the unblank() callbacks
>   anyway is a bad idea because these callbacks are also not NMI-safe.
> 
> If consoles with unblank() callbacks exist and console_unblank() is
> called from a non-NMI panic context, it will still attempt a
> down_trylock(). This could still result in a deadlock if one of the
> stopped CPUs is holding the semaphore internal spinlock. But this
> is a risk that the kernel has been (and continues to be) willing
> to take.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock
  2023-07-17 19:46 ` [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock John Ogness
  2023-07-18  3:17   ` Sergey Senozhatsky
@ 2023-07-18 10:23   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-18 10:23 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2023-07-17 21:52:03, John Ogness wrote:
> When in a panic situation, non-panic CPUs should avoid holding the
> console lock so as not to contend with the panic CPU. This is already
> implemented with abandon_console_lock_in_panic(), which is checked
> after each printed line. However, non-panic CPUs should also avoid
> trying to acquire the console lock during a panic.
> 
> Modify console_trylock() to fail and console_lock() to block() when
> called from a non-panic CPU during a panic.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic()
  2023-07-17 19:46 ` [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic() John Ogness
  2023-07-18  3:18   ` Sergey Senozhatsky
@ 2023-07-18 10:28   ` Petr Mladek
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-18 10:28 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2023-07-17 21:52:04, John Ogness wrote:
> Currently console_flush_on_panic() will attempt to acquire the
> console lock when flushing the buffer on panic. If it fails to
> acquire the lock, it continues anyway because this is the last
> chance to get any pending records printed.
> 
> The reason why the console lock was attempted at all was to
> prevent any other CPUs from acquiring the console lock for
> printing while the panic CPU was printing. But as of the
> previous commit, non-panic CPUs will no longer attempt to
> acquire the console lock in a panic situation. Therefore it is
> no longer strictly necessary for a panic CPU to acquire the
> console lock.
> 
> Avoiding taking the console lock when flushing in panic has
> the additional benefit of avoiding possible deadlocks due to
> semaphore usage in NMI context (semaphores are not NMI-safe)
> and avoiding possible deadlocks if another CPU accesses the
> semaphore and is stopped while holding one of the semaphore's
> internal spinlocks.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 0/7] various cleanups
  2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
                   ` (6 preceding siblings ...)
  2023-07-17 19:46 ` [PATCH printk v3 7/7] printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() John Ogness
@ 2023-07-18 10:44 ` Petr Mladek
  2023-07-21  9:17   ` Petr Mladek
  7 siblings, 1 reply; 16+ messages in thread
From: Petr Mladek @ 2023-07-18 10:44 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Jason Wessel, Daniel Thompson, Douglas Anderson, Aaron Tomlin,
	Greg Kroah-Hartman

On Mon 2023-07-17 21:52:00, John Ogness wrote:
> Hi,
> 
> This is v3 of a series providing some cleanup in preparation
> for the threaded/atomic console work. v2 is here [0]. This
> series provides useful cleanups independent of the
> threaded/atomic work.
> 
> Changes since v2:
> 
> - "NMI safety" patch split into 3 patches: console_unblank,
>   keep non-panic CPUs out, do not lock console on panic flush.
> 
> - console_unblank() aborts if there are no consoles implementing
>   an unblank() callback. This check is performed before taking
>   the console lock.
> 
> - Commit messages and comments updated to mention the details
>   discussed in the v2 feedback.

The patchset looks ready for linux-next. I am going to push it
into printk/linux.git, branch rework/misc-cleanups.

I'll wait a day or two. I think that there is only very small
chance that anyone else would do a review.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH printk v3 0/7] various cleanups
  2023-07-18 10:44 ` [PATCH printk v3 0/7] various cleanups Petr Mladek
@ 2023-07-21  9:17   ` Petr Mladek
  0 siblings, 0 replies; 16+ messages in thread
From: Petr Mladek @ 2023-07-21  9:17 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Jason Wessel, Daniel Thompson, Douglas Anderson, Aaron Tomlin,
	Greg Kroah-Hartman

On Tue 2023-07-18 12:44:35, Petr Mladek wrote:
> On Mon 2023-07-17 21:52:00, John Ogness wrote:
> > Hi,
> > 
> > This is v3 of a series providing some cleanup in preparation
> > for the threaded/atomic console work. v2 is here [0]. This
> > series provides useful cleanups independent of the
> > threaded/atomic work.
> > 
> > Changes since v2:
> > 
> > - "NMI safety" patch split into 3 patches: console_unblank,
> >   keep non-panic CPUs out, do not lock console on panic flush.
> > 
> > - console_unblank() aborts if there are no consoles implementing
> >   an unblank() callback. This check is performed before taking
> >   the console lock.
> > 
> > - Commit messages and comments updated to mention the details
> >   discussed in the v2 feedback.
> 
> The patchset looks ready for linux-next. I am going to push it
> into printk/linux.git, branch rework/misc-cleanups.
> 
> I'll wait a day or two. I think that there is only very small
> chance that anyone else would do a review.

JFYI, the patchset has been comitted into pritnk/linux.git
branch rework/misc-cleanups.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-07-21  9:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-17 19:46 [PATCH printk v3 0/7] various cleanups John Ogness
2023-07-17 19:46 ` [PATCH printk v3 1/7] kdb: Do not assume write() callback available John Ogness
2023-07-17 19:46 ` [PATCH printk v3 2/7] printk: Reduce console_unblank() usage in unsafe scenarios John Ogness
2023-07-18  3:39   ` Sergey Senozhatsky
2023-07-18 10:21   ` Petr Mladek
2023-07-17 19:46 ` [PATCH printk v3 3/7] printk: Keep non-panic-CPUs out of console lock John Ogness
2023-07-18  3:17   ` Sergey Senozhatsky
2023-07-18 10:23   ` Petr Mladek
2023-07-17 19:46 ` [PATCH printk v3 4/7] printk: Do not take console lock for console_flush_on_panic() John Ogness
2023-07-18  3:18   ` Sergey Senozhatsky
2023-07-18 10:28   ` Petr Mladek
2023-07-17 19:46 ` [PATCH printk v3 5/7] printk: Consolidate console deferred printing John Ogness
2023-07-17 19:46 ` [PATCH printk v3 6/7] printk: Add per-console suspended state John Ogness
2023-07-17 19:46 ` [PATCH printk v3 7/7] printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() John Ogness
2023-07-18 10:44 ` [PATCH printk v3 0/7] various cleanups Petr Mladek
2023-07-21  9:17   ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox