* [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock()
@ 2017-11-02 13:51 Tejun Heo
2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
2017-11-07 4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky
0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 13:51 UTC (permalink / raw)
To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt; +Cc: linux-kernel, kernel-team
We mark for waking up klogd whenever we see a new message sequence in
the main loop. However, the actual wakeup is always at the end of the
function and we can easily test for the wakeup condition when we do
the final should-we-repeat check.
Move the wake_klogd condition check out of the main loop. This avoids
doing the same thing repeatedly and groups similar checks into a
common place.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
kernel/printk/printk.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2184,10 +2184,6 @@ again:
printk_safe_enter_irqsave(flags);
raw_spin_lock(&logbuf_lock);
- if (seen_seq != log_next_seq) {
- wake_klogd = true;
- seen_seq = log_next_seq;
- }
if (console_seq < log_first_seq) {
len = sprintf(text, "** %u printk messages dropped ** ",
@@ -2248,12 +2244,17 @@ skip:
up_console_sem();
/*
- * Someone could have filled up the buffer again, so re-check if there's
- * something to flush. In case we cannot trylock the console_sem again,
- * there's a new owner and the console_unlock() from them will do the
- * flush, no worries.
+ * Check whether userland needs notification. Also, someone could
+ * have filled up the buffer again, so re-check if there's
+ * something to flush. In case we cannot trylock the console_sem
+ * again, there's a new owner and the console_unlock() from them
+ * will do the flush, no worries.
*/
raw_spin_lock(&logbuf_lock);
+ if (seen_seq != log_next_seq) {
+ wake_klogd = true;
+ seen_seq = log_next_seq;
+ }
retry = console_seq != log_next_seq;
raw_spin_unlock(&logbuf_lock);
printk_safe_exit_irqrestore(flags);
* [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
@ 2017-11-02 13:52 ` Tejun Heo
  2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
  2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
  2017-11-07  4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky
  1 sibling, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 13:52 UTC (permalink / raw)
To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt
Cc: linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

When printk flushing isn't contended, whoever creates messages gets to
flush them, which is great both for fairness and for keeping log
delivery synchronous.  However, when console drivers can't keep up with
the rate of new messages, which happens a lot, who ends up with the
flushing duty is determined by timing, and everyone else's messages
become asynchronous.

Unfortunately, the arbitrarily chosen flusher can easily be someone
calling printk from a non-sleepable and/or non-preemptible context,
such as while holding a spinlock.  This means that the flusher, who can
be stuck with the flushing duty for a long time, can't yield or be
preempted, pegging the CPU.  This can lead to RCU stall warnings, hung
task warnings, work_on_cpu() stalls and all sorts of other failures,
some of which may generate more printk messages warning about the
condition, further pegging the flusher and grinding the whole system to
a halt.

Even if preemptible, it can be really harmful to peg a random task with
the flushing duty for a very long time.  There can easily be locking or
other dependency chains where such pegging leads to more messages,
leading to a meltdown of the system.
The problem can be solved by introducing a dedicated async flush worker
without harming any of the synchronicity guarantees.  This patch updates
console_unlock() so that each non-flusher caller is responsible only up
to the message that it sees on the first iteration, which is guaranteed
to include the message it printed, if it printed any.  If more messages
came in while flushing was in progress, which means that those messages
are already async, it schedules the dedicated flusher, which keeps
flushing until empty.

The following repro makes the system completely unusable without this
patch.  After the patch, the system stays completely functional no
matter how long the console stays saturated.

  #include <linux/module.h>
  #include <linux/delay.h>
  #include <linux/sched.h>
  #include <linux/mutex.h>
  #include <linux/workqueue.h>

  static DEFINE_MUTEX(test_mutex);
  static bool stop_testing;

  static void atomic_printk_workfn(struct work_struct *work)
  {
	while (!READ_ONCE(stop_testing)) {
		msleep(100);
		mutex_lock(&test_mutex);
		mutex_unlock(&test_mutex);
		preempt_disable();
		printk("XXX ATOMIC\n");
		preempt_enable();
		cond_resched();
	}
  }
  static DECLARE_WORK(atomic_printk_work, atomic_printk_workfn);

  static void hog_printk_workfn(struct work_struct *work)
  {
	while (!READ_ONCE(stop_testing)) {
		mutex_lock(&test_mutex);
		printk("XXX HOG\n");
		mutex_unlock(&test_mutex);
		cond_resched();
	}
  }
  static DECLARE_WORK(hog_printk_work, hog_printk_workfn);

  static int __init test_init(void)
  {
	queue_work_on(0, system_wq, &atomic_printk_work);
	msleep(100);
	queue_work_on(1, system_wq, &hog_printk_work);
	return 0;
  }

  static void __exit test_exit(void)
  {
	WRITE_ONCE(stop_testing, true);
	flush_work(&atomic_printk_work);
	flush_work(&hog_printk_work);
  }

  module_init(test_init);
  module_exit(test_exit);

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/printk.c | 66 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 8 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -436,6 +437,12 @@ static char __log_buf[__LOG_BUF_LEN] __a
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/* async flush */
+static struct kthread_worker *console_async_worker;
+
+static void console_async_workfn(struct kthread_work *work);
+static DEFINE_KTHREAD_WORK(console_async_work, console_async_workfn);
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -2092,6 +2099,17 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+static void console_async_workfn(struct kthread_work *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static bool current_is_console_async(void)
+{
+	return console_async_worker && console_async_worker->task == current;
+}
+
 /*
  * Check if we have any console that is capable of printing while cpu is
  * booting or shutting down. Requires console_sem.
@@ -2141,7 +2159,8 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool do_cond_resched, request_async;
+	u64 target_seq;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2165,6 +2184,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	target_seq = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2185,6 +2205,18 @@ again:
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 
+		/*
+		 * This function can be called from any context and we
+		 * don't wanna get live-locked by others' messages.  Unless
+		 * we're the async worker, flush up to whatever is visible
+		 * on the first iteration, which is guaranteed to include
+		 * the message this task printed if it did.  If there are
+		 * more messages to be printed, we'll punt to the async
+		 * worker.
+		 */
+		if (!target_seq || current_is_console_async())
+			target_seq = log_next_seq;
+
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
@@ -2196,7 +2228,7 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq >= target_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2246,21 +2278,33 @@ skip:
 	/*
 	 * Check whether userland needs notification. Also, someone could
 	 * have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem
-	 * again, there's a new owner and the console_unlock() from them
-	 * will do the flush, no worries.
+	 * something to flush.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	if (seen_seq != log_next_seq) {
 		wake_klogd = true;
 		seen_seq = log_next_seq;
 	}
-	retry = console_seq != log_next_seq;
+	request_async = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (request_async) {
+		if (console_async_worker) {
+			/* async is online, punt */
+			kthread_queue_work(console_async_worker,
+					   &console_async_work);
+		} else {
+			/*
+			 * We're responsible for flushing.  In case we
+			 * cannot trylock the console_sem again, there's a
+			 * new owner and the console_unlock() from them
+			 * will do the flush, no worries.
+			 */
+			if (console_trylock())
+				goto again;
+		}
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
@@ -2647,6 +2691,12 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
+	console_async_worker = kthread_create_worker(0 , "console_async");
+	if (IS_ERR(console_async_worker)) {
+		pr_err("console: failed to initialize async flusher\n");
+		console_async_worker = NULL;
+	}
+
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;
* [PATCH TRIVIAL UPDATE 2/2] printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
@ 2017-11-02 14:51 ` Tejun Heo
  2017-11-04  4:24 ` [PATCH 2/2] Subject: " Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-02 14:51 UTC (permalink / raw)
To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt
Cc: linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

When printk flushing isn't contended, whoever creates messages gets to
flush them, which is great both for fairness and for keeping log
delivery synchronous.  However, when console drivers can't keep up with
the rate of new messages, which happens a lot, who ends up with the
flushing duty is determined by timing, and everyone else's messages
become asynchronous.

Unfortunately, the arbitrarily chosen flusher can easily be someone
calling printk from a non-sleepable and/or non-preemptible context,
such as while holding a spinlock.  This means that the flusher, who can
be stuck with the flushing duty for a long time, can't yield or be
preempted, pegging the CPU.  This can lead to RCU stall warnings, hung
task warnings, work_on_cpu() stalls and all sorts of other failures,
some of which may generate more printk messages warning about the
condition, further pegging the flusher and grinding the whole system to
a halt.

Even if preemptible, it can be really harmful to peg a random task with
the flushing duty for a very long time.  There can easily be locking or
other dependency chains where such pegging leads to more messages,
leading to a meltdown of the system.

The problem can be solved by introducing a dedicated async flush worker
without harming any of the synchronicity guarantees.  This patch updates
console_unlock() so that each non-flusher caller is responsible only up
to the message that it sees on the first iteration, which is guaranteed
to include the message it printed, if it printed any.  If more messages
came in while flushing was in progress, which means that those messages
are already async, it schedules the dedicated flusher, which keeps
flushing until empty.

The following repro makes the system completely unusable without this
patch.  After the patch, the system stays completely functional no
matter how long the console stays saturated.

  #include <linux/module.h>
  #include <linux/delay.h>
  #include <linux/sched.h>
  #include <linux/mutex.h>
  #include <linux/workqueue.h>

  static DEFINE_MUTEX(test_mutex);
  static bool stop_testing;

  static void atomic_printk_workfn(struct work_struct *work)
  {
	while (!READ_ONCE(stop_testing)) {
		msleep(100);
		mutex_lock(&test_mutex);
		mutex_unlock(&test_mutex);
		preempt_disable();
		printk("XXX ATOMIC\n");
		preempt_enable();
		cond_resched();
	}
  }
  static DECLARE_WORK(atomic_printk_work, atomic_printk_workfn);

  static void hog_printk_workfn(struct work_struct *work)
  {
	while (!READ_ONCE(stop_testing)) {
		mutex_lock(&test_mutex);
		printk("XXX HOG\n");
		mutex_unlock(&test_mutex);
		cond_resched();
	}
  }
  static DECLARE_WORK(hog_printk_work, hog_printk_workfn);

  static int __init test_init(void)
  {
	queue_work_on(0, system_wq, &atomic_printk_work);
	msleep(100);
	queue_work_on(1, system_wq, &hog_printk_work);
	return 0;
  }

  static void __exit test_exit(void)
  {
	WRITE_ONCE(stop_testing, true);
	flush_work(&atomic_printk_work);
	flush_work(&hog_printk_work);
  }

  module_init(test_init);
  module_exit(test_exit);

v2: checkpatch style fix and comment added to console_async_workfn().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/printk/printk.c | 70 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 8 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -436,6 +437,12 @@ static char __log_buf[__LOG_BUF_LEN] __a
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/* async flush */
+static struct kthread_worker *console_async_worker;
+
+static void console_async_workfn(struct kthread_work *work);
+static DEFINE_KTHREAD_WORK(console_async_work, console_async_workfn);
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -2092,6 +2099,21 @@ int is_console_locked(void)
 	return console_locked;
 }
 
+static void console_async_workfn(struct kthread_work *work)
+{
+	console_lock();
+	/*
+	 * The following unlock combined with the true return from
+	 * current_is_console_async() keeps flushing log_buf until drained.
+	 */
+	console_unlock();
+}
+
+static bool current_is_console_async(void)
+{
+	return console_async_worker && console_async_worker->task == current;
+}
+
 /*
  * Check if we have any console that is capable of printing while cpu is
  * booting or shutting down. Requires console_sem.
@@ -2141,7 +2163,8 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool do_cond_resched, retry;
+	bool do_cond_resched, request_async;
+	u64 target_seq;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2165,6 +2188,7 @@ void console_unlock(void)
 	do_cond_resched = console_may_schedule;
 again:
 	console_may_schedule = 0;
+	target_seq = 0;
 
 	/*
 	 * We released the console_sem lock, so we need to recheck if
@@ -2185,6 +2209,18 @@ again:
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
 
+		/*
+		 * This function can be called from any context and we
+		 * don't wanna get live-locked by others' messages.  Unless
+		 * we're the async worker, flush up to whatever is visible
+		 * on the first iteration, which is guaranteed to include
+		 * the message this task printed if it did.  If there are
+		 * more messages to be printed, we'll punt to the async
+		 * worker.
+		 */
+		if (!target_seq || current_is_console_async())
+			target_seq = log_next_seq;
+
 		if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
@@ -2196,7 +2232,7 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq >= target_seq)
 			break;
 
 		msg = log_from_idx(console_idx);
@@ -2246,21 +2282,33 @@ skip:
 	/*
 	 * Check whether userland needs notification. Also, someone could
 	 * have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem
-	 * again, there's a new owner and the console_unlock() from them
-	 * will do the flush, no worries.
+	 * something to flush.
 	 */
 	raw_spin_lock(&logbuf_lock);
 	if (seen_seq != log_next_seq) {
 		wake_klogd = true;
 		seen_seq = log_next_seq;
 	}
-	retry = console_seq != log_next_seq;
+	request_async = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (retry && console_trylock())
-		goto again;
+	if (request_async) {
+		if (console_async_worker) {
+			/* async is online, punt */
+			kthread_queue_work(console_async_worker,
+					   &console_async_work);
+		} else {
+			/*
+			 * We're responsible for flushing.  In case we
+			 * cannot trylock the console_sem again, there's a
+			 * new owner and the console_unlock() from them
+			 * will do the flush, no worries.
+			 */
+			if (console_trylock())
+				goto again;
+		}
+	}
 
 	if (wake_klogd)
 		wake_up_klogd();
@@ -2647,6 +2695,12 @@ static int __init printk_late_init(void)
 	struct console *con;
 	int ret;
 
+	console_async_worker = kthread_create_worker(0, "console_async");
+	if (IS_ERR(console_async_worker)) {
+		pr_err("console: failed to initialize async flusher\n");
+		console_async_worker = NULL;
+	}
+
 	for_each_console(con) {
 		if (!(con->flags & CON_BOOT))
 			continue;
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
  2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
@ 2017-11-04  4:24 ` Sergey Senozhatsky
  2017-11-07  0:22   ` Tejun Heo
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-04 4:24 UTC (permalink / raw)
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, linux-kernel,
    kernel-team, Linus Torvalds, Andrew Morton

On (11/02/17 06:52), Tejun Heo wrote:
>
> When printk flushing isn't contended, whoever creates messages get to
> flush them, which is great in both fairness and keeping log delivery
> synchronous.  However, when console drivers can't keep up with the
> rate of new messages, which happens a lot, who ends up with the
> flushing duty is determined by timing and everyone else's messages
> become asynchronous.

Hello Tejun,

thanks for the patch set. we are currently looking at another approach:
lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home

would you be interested in taking a look?

there are some concerns, like a huge number of printk-s happening while
console_sem is locked. e.g. console_lock()/console_unlock() on one of
the CPUs, or console_lock(); printk(); ... printk(); console_unlock();

the problem of "the last printk()", which will take over and do the
flush.

	CPU0				CPU1 ~ CPU99
	console_lock();
					printk(); ... printk();
	console_unlock();
						IRQ on CPU2
						 printk()
						  // take over console_sem
						  console_unlock()

	-ss
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-04  4:24 ` [PATCH 2/2] Subject: " Sergey Senozhatsky
@ 2017-11-07  0:22 ` Tejun Heo
  2017-11-07  2:04   ` Sergey Senozhatsky
  2017-11-08 16:20   ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-07 0:22 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Steven Rostedt, linux-kernel, kernel-team,
    Linus Torvalds, Andrew Morton

Hello,

On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> thanks for the patch set. we are currently looking at another approach:
> lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
>
> would you be interested in taking a look?

Hmm... It took some tweaking, but the code at the end locks up the
machine with Steven's patch applied, and it's not that contrived a
case (e.g. out-of-memory messages from packet tx/rx paths while OOM is
in progress).

> there are some concerns, like a huge number of printk-s happening while
> console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> CPUs, or console_lock(); printk(); ... printk(); console_unlock();

Unless we make all messages fully synchronous, I don't think there's a
good solution for that, and I don't think we wanna make everything
fully synchronous.  The information value of messages goes down pretty
fast in a message deluge, and as long as we stay synchronous in the
beginning, information loss usually isn't the problem.  The current
code can lose a lot of messages, but the way it loses messages usually
doesn't lead to loss of important information.  The proposed patch
doesn't materially change when and what we lose.  It just moves the
flush infinite loop into a safe context.

Thanks.
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 1000; i++)
			printk("%-80s\n", "XXX TIMER");

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++)
			printk("%-80s\n", "XXX PREEMPT");
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);
	return 0;
}

static void __exit test_exit(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  0:22 ` Tejun Heo
@ 2017-11-07  2:04 ` Sergey Senozhatsky
  2017-11-07 13:23   ` Tejun Heo
  2017-11-08 16:20   ` Steven Rostedt
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-07 2:04 UTC (permalink / raw)
To: Tejun Heo
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
    kernel-team, Linus Torvalds, Andrew Morton

Hi Tejun,

On (11/06/17 16:22), Tejun Heo wrote:
> Hello,
>
> On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> > thanks for the patch set. we are currently looking at another approach:
> > lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
> >
> > would you be interested in taking a look?
>
> Hmm... It took some tweaking but the code at the end locks up the
> machine w/ Steven's patch applied and it's not that contrived a case
> (e.g. out of memory messages from packet tx/rx paths while OOM is in
> progress).

thanks!

just to make sure. there is a typo in Steven's patch:

	while (!READ_ONCE(console_waiter))

should be

	while (READ_ONCE(console_waiter))

is this the "tweaking" you are talking about?

> > there are some concerns, like a huge number of printk-s happening while
> > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
>
> Unless we make all messages fully synchronous, I don't think there's a
> good solution for that and I don't think we wanna make everything
> fully synchronous.

this is where it becomes complicated. offloading logic is not binary,
unfortunately. we normally want to offload; but not always. things
like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
regardless of the consequences. some of the sysrq printouts even do
touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). the current
printk-kthread patch set tries to consider those cases and to avoid
any offloading.
	-ss
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  2:04 ` Sergey Senozhatsky
@ 2017-11-07 13:23 ` Tejun Heo
  2017-11-08  5:29   ` Sergey Senozhatsky
  2017-11-08 16:22   ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-07 13:23 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
    kernel-team, Linus Torvalds, Andrew Morton

Hello, Sergey.

On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> just to make sure. there is a typo in Steven's patch:
>
>	while (!READ_ONCE(console_waiter))
>
> should be
>
>	while (READ_ONCE(console_waiter))
>
> is this the "tweaking" you are talking about?

Oh, I was talking about tweaking the repro, but I'm not sure the above
would change anything.  The problem that the repro demonstrates is a
message deluge involving a non-sleepable flusher plus a local irq (or
other atomic context) message producer.

In the above case, none of the involved contexts can serve as the
flusher for a long time without messing up the system.  If you wanna
allow printks to be async without falling into these lockups, you
gotta introduce an independent safe context to flush from.

> > > there are some concerns, like a huge number of printk-s happening while
> > > console_sem is locked. e.g. console_lock()/console_unlock() on one of the
> > > CPUs, or console_lock(); printk(); ... printk(); console_unlock();
> >
> > Unless we make all messages fully synchronous, I don't think there's a
> > good solution for that and I don't think we wanna make everything
> > fully synchronous.
>
> this is where it becomes complicated. offloading logic is not binary,
> unfortunately. we normally want to offload; but not always. things
> like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
> regardless the consequences. some of sysrq prints out even do
> touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
> printk-kthread patch set tries to consider those cases and to avoid
> any offloading.

Yeah, sure, selectively opting out of asynchronous operation is a
different (solvable) issue.  Also, just to be clear, the proposed
patch doesn't make any of these worse in any meaningful way - e.g. we
could end up trapping a nice-20 task pinned to an overloaded CPU in
the flusher role.

The following is a completely untested patch to show how we can put
the console in full sync mode, just the general idea.  I'm a bit
skeptical we really wanna do this given that we already (with or
without the patch) stay sync for most of these events due to the way
we go async, but, yeah, if we wanna do that, we can do that.

Thanks.

---
 kernel/printk/printk.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -438,6 +438,7 @@ static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
 /* async flush */
+static bool console_sync_mode;
 static struct kthread_worker *console_async_worker;
 
 static void console_async_workfn(struct kthread_work *work);
@@ -2051,7 +2052,8 @@ void console_lock(void)
 {
 	might_sleep();
 
-	down_console_sem();
+	down_console_sem();	// make it return w/ irq disabled on locked
+				// and don't reenable till unlock if sync mode
 	if (console_suspended)
 		return;
 	console_locked = 1;
@@ -2069,8 +2071,22 @@ EXPORT_SYMBOL(console_lock);
  */
 int console_trylock(void)
 {
-	if (down_trylock_console_sem())
+	unsigned long flags;
+	bool locked;
+
+	printk_safe_enter_irqsave(flags);
+	do {
+		// we can make this nest safe if necessary by remembering the holding cpu
+		locked = down_trylock_console_sem();
+		if (locked)
+			break;
+		cpu_relax();
+	} while (!locked && READ_ONCE(console_sync_mode));
+	printk_safe_exit_irqrestore(flags);
+
+	if (!locked)
 		return 0;
+
 	if (console_suspended) {
 		up_console_sem();
 		return 0;
@@ -2087,7 +2103,8 @@ int console_trylock(void)
	 * rcu_preempt_depth(), otherwise RCU read sections modify
	 * preempt_count().
 	 */
-	console_may_schedule = !oops_in_progress &&
+	console_may_schedule = !console_sync_mode &&
+			       !oops_in_progress &&
 			       preemptible() &&
 			       !rcu_preempt_depth();
 	return 1;
@@ -2218,7 +2235,8 @@ again:
 	 * more messages to be printed, we'll punt to the async
 	 * worker.
 	 */
-	if (!target_seq || current_is_console_async())
+	if (!target_seq || current_is_console_async() ||
+	    !console_sync_mode)
 		target_seq = log_next_seq;
 
 	if (console_seq < log_first_seq) {
@@ -2293,7 +2311,7 @@ skip:
 	raw_spin_unlock(&logbuf_lock);
 	printk_safe_exit_irqrestore(flags);
 
-	if (request_async) {
+	if (request_async && !console_sync_mode) {
 		if (console_async_worker) {
 			/* async is online, punt */
 			kthread_queue_work(console_async_worker,
@@ -2413,6 +2431,18 @@ void console_start(struct console *conso
 }
 EXPORT_SYMBOL(console_start);
 
+void console_enter_sync_mode(void)
+{
+	console_lock();
+	WRITE_ONCE(console_sync_mode, true);
+	console_unlock();
+}
+
+void console_leave_sync_mode(void)
+{
+	WRITE_ONCE(console_sync_mode, false);
+}
+
 static int __read_mostly keep_bootcon;
 
 static int __init keep_bootcon_setup(char *str)
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07 13:23 ` Tejun Heo
@ 2017-11-08  5:29 ` Sergey Senozhatsky
  2017-11-08 14:33   ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-08 5:29 UTC (permalink / raw)
To: Tejun Heo
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
    Steven Rostedt, linux-kernel, kernel-team, Linus Torvalds,
    Andrew Morton

Hello Tejun,

On (11/07/17 05:23), Tejun Heo wrote:
> Hello, Sergey.
>
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> >
> >	while (!READ_ONCE(console_waiter))
> >
> > should be
> >
> >	while (READ_ONCE(console_waiter))
> >
> > is this the "tweaking" you are talking about?
>
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything.  The problem that the repro demonstrates is a
> message deluge involving a non-sleepable flusher plus a local irq (or
> other atomic context) message producer.
>
> In the above case, none of the involved contexts can serve as the
> flusher for a long time without messing up the system.  If you wanna
> allow printks to be async without falling into these lockups, you
> gotta introduce an independent safe context to flush from.

we are in agreement. I Cc-ed you to another thread, let's merge
discussions.

> > this is where it becomes complicated. offloading logic is not binary,
> > unfortunately. we normally want to offload; but not always. things
> > like sysrq or late PM warnings, or kexec, etc. want to stay fully sync,
> > regardless the consequences. some of sysrq prints out even do
> > touch_nmi_watchdog() and touch_all_softlockup_watchdogs(). current
> > printk-kthread patch set tries to consider those cases and to avoid
> > any offloading.
>
> Yeah, sure, selectively opting out of asynchronous operation is a
> different (solvable) issue.  Also, just to be clear, the proposed
> patch doesn't make any of these worse in any meaningful way - e.g. we
> could end up trapping a nice 20 task pinned to an overloaded CPU in
> the flusher role.
>
> The following is a completely untested patch to show how we can put
> the console in full sync mode, just the general idea.  I'm a bit
> skeptical we really wanna do this given that we already (with or
> without the patch) stay sync for most of these events due to the way
> we go async, but, yeah, if we wanna do that, we can do that.

we've been going in a slightly different direction in printk-kthread.

we keep printk sync by default [as opposed to the previous "immediately
offload" approach]. people asked for it, some people demanded it. we
offload to printk-kthread only when we detect that this particular
task on this particular CPU has been doing printing (without
rescheduling) for 1/2 of the watchdog threshold value. IOW, if we see
that we are heading towards the lockup limit, then we offload.
otherwise - we let it loop in console_unlock().

	-ss
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-08  5:29             ` Sergey Senozhatsky
@ 2017-11-08 14:33               ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-08 14:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel,
	kernel-team, Linus Torvalds, Andrew Morton

Hello,

On Wed, Nov 08, 2017 at 02:29:35PM +0900, Sergey Senozhatsky wrote:
> > The following is a completely untested patch to show how we can put
> > the console in full sync mode, just the general idea.  I'm a bit
> > skeptical we really wanna do this given that we already (with or
> > without the patch) stay sync for most of these events due to the way
> > we go async, but, yeah, if we wanna do that, we can do that.
>
> we've been going in a slightly different direction in printk-kthread.
>
> we keep printk sync by default [as opposed to the previous "immediately
> offload" approach]. people asked for it, some people demanded it. we
> offload to printk-kthread only when we detect that this particular task
> on this particular CPU has been doing printing (without rescheduling)
> for 1/2 of the watchdog threshold value. IOW, if we see that we are
> heading towards the lockup limit, then we offload. otherwise - we let
> it loop in console_unlock().

FWIW, I'm still a bit skeptical whether that's the right direction.
People asking for it doesn't indicate that the flipside isn't worse.
Hmm... I think my impression is mostly from the fact that throughout
the years I can count only a few times when loss of printk messages
was an actual issue.  The system getting slowed down / locking up from
a printk message deluge was *far* more prevalent.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07 13:23           ` Tejun Heo
  2017-11-08  5:29             ` Sergey Senozhatsky
@ 2017-11-08 16:22             ` Steven Rostedt
  2017-11-08 16:28               ` Tejun Heo
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2017-11-08 16:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

On Tue, 7 Nov 2017 05:23:50 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello, Sergey.
>
> On Tue, Nov 07, 2017 at 11:04:34AM +0900, Sergey Senozhatsky wrote:
> > just to make sure. there is a typo in Steven's patch:
> >
> >	while (!READ_ONCE(console_waiter))
> >
> > should be
> >
> >	while (READ_ONCE(console_waiter))
> >
> > is this the "tweaking" you are talking about?
>
> Oh, I was talking about tweaking the repro, but I'm not sure the above
> would change anything.

No, it would change an awful lot.

It would cause two printers to access the consoles at the same time,
which would lead to unpredictable behavior.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-08 16:22             ` Steven Rostedt
@ 2017-11-08 16:28               ` Tejun Heo
  0 siblings, 0 replies; 13+ messages in thread
From: Tejun Heo @ 2017-11-08 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	linux-kernel, kernel-team, Linus Torvalds, Andrew Morton

Hello, Steven.

On Wed, Nov 08, 2017 at 11:22:29AM -0500, Steven Rostedt wrote:
> > Oh, I was talking about tweaking the repro, but I'm not sure the above
> > would change anything.
>
> No, it would change an awful lot.
>
> It would cause two printers to access the consoles at the same time,
> which would lead to unpredictable behavior.

Sure, I'll give it a shot later, but in the repro there literally is no
context which is safe, so I kinda have a difficult time imagining how
that will make things materially better.  Neither of the producers can
get scheduled out whether or not they switch back and forth between
them, so RCU stalls seem rather unavoidable without introducing an
independent context.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush
  2017-11-07  0:22         ` Tejun Heo
  2017-11-07  2:04           ` Sergey Senozhatsky
@ 2017-11-08 16:20           ` Steven Rostedt
  1 sibling, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-11-08 16:20 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Sergey Senozhatsky, Petr Mladek, linux-kernel, kernel-team,
	Linus Torvalds, Andrew Morton

On Mon, 6 Nov 2017 16:22:54 -0800
Tejun Heo <tj@kernel.org> wrote:

> Hello,
>
> On Sat, Nov 04, 2017 at 01:24:08PM +0900, Sergey Senozhatsky wrote:
> > thanks for the patch set. we are currently looking at another approach:
> > lkml.kernel.org/r/20171102134515.6eef16de@gandalf.local.home
> >
> > would you be interested in taking a look?
>
> Hmm... It took some tweaking but the code at the end locks up the
> machine w/ Steven's patch applied and it's not that contrived a case
> (e.g. out of memory messages from packet tx/rx paths while OOM is in
> progress).

Did you apply it with or without the

	while (READ_ONCE(console_waiter))

fix? Because that is crucial.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock()
  2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
  2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
@ 2017-11-07  4:40 ` Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2017-11-07  4:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, linux-kernel,
	kernel-team

On (11/02/17 06:51), Tejun Heo wrote:
> We mark for waking up klogd whenever we see a new message sequence in
> the main loop.  However, the actual wakeup is always at the end of the
> function and we can easily test for the wakeup condition when we do
> the final should-we-repeat check.
>
> Move the wake_klogd condition check out of the main loop.  This avoids
> doing the same thing repeatedly and groups similar checks into a
> common place.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>

looks good to me. FWIW,

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-11-08 16:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2017-11-02 13:51 [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Tejun Heo
2017-11-02 13:52 ` [PATCH 2/2] Subject: printk: Don't trap random context in infinite log_buf flush Tejun Heo
2017-11-02 14:51   ` [PATCH TRIVIAL UPDATE 2/2] " Tejun Heo
2017-11-04  4:24   ` [PATCH 2/2] Subject: " Sergey Senozhatsky
2017-11-07  0:22     ` Tejun Heo
2017-11-07  2:04       ` Sergey Senozhatsky
2017-11-07 13:23         ` Tejun Heo
2017-11-08  5:29           ` Sergey Senozhatsky
2017-11-08 14:33             ` Tejun Heo
2017-11-08 16:22           ` Steven Rostedt
2017-11-08 16:28             ` Tejun Heo
2017-11-08 16:20     ` Steven Rostedt
2017-11-07  4:40 ` [PATCH 1/2] printk: Relocate wake_klogd check close to the end of console_unlock() Sergey Senozhatsky