From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751088AbdE2J3L (ORCPT );
	Mon, 29 May 2017 05:29:11 -0400
Received: from mx2.suse.de ([195.135.220.15]:34886 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1750787AbdE2J3J (ORCPT );
	Mon, 29 May 2017 05:29:09 -0400
Date: Mon, 29 May 2017 11:29:06 +0200
From: Petr Mladek
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, "Rafael J . Wysocki", Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
Message-ID: <20170529092906.GD21894@pathway.suse.cz>
References: <20170509082859.854-1-sergey.senozhatsky@gmail.com>
	<20170509082859.854-3-sergey.senozhatsky@gmail.com>
	<20170510055935.GA1966@jagdpanzerIV.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170510055935.GA1966@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 2017-05-10 14:59:35, Sergey Senozhatsky wrote:
> This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
> sysfs param, which sets the limit on the number of lines a process can
> print from console_unlock(). Value 0 corresponds to the current behavior
> (no limitation). The printing offload happens from the console_unlock()
> function and, briefly, looks as follows: as soon as a process prints more
> than `atomic_print_limit' lines, it attempts to offload printing to
> another process.
> Since nothing guarantees that there will be another process sleeping
> on the console_sem or calling printk() on another CPU simultaneously,
> the patch also introduces an auxiliary kernel thread - printk_kthread,
> the main purpose of which is to take over the printing duty. The
> workflow thus turns into: as soon as a process prints more than
> `atomic_print_limit' lines, it wakes up printk_kthread and unlocks the
> console_sem. So in the best case at this point there will be at least
> one process trying to lock the console_sem: printk_kthread. (There can
> also be a process that was sleeping on the console_sem and that was
> woken up by console semaphore up(); and concurrent printk() invocations
> from other CPUs.) But in the worst case there won't be any processes
> ready to take over the printing duty: it may take printk_kthread some
> time to become running; or printk_kthread may even never become running
> (a misbehaving scheduler, or some other critical condition). That's why
> after we wake_up() printk_kthread we can't immediately leave the
> printing loop; we must ensure that the console_sem has a new owner
> before we do so. Therefore, `atomic_print_limit' is a soft limit, not a
> hard one: we let the task overrun `atomic_print_limit'. But, at the
> same time, the console_unlock() printing loop behaves differently for
> tasks that have exceeded `atomic_print_limit': after every printed
> logbuf entry (call_console_drivers()) such a process wakes up
> printk_kthread, unlocks the console_sem and attempts to
> console_trylock() a bit later (if there are any pending messages in the
> logbuf, of course). In the best-case scenario either printk_kthread or
> some other task will lock the console_sem, so the current printing task
> will see a failed console_trylock(), which indicates a successful
> printing offload.
> In the worst case, however, current will successfully console_trylock(),
> which indicates that offloading did not take place and we can't return
> from console_unlock(); so the printing task will print one more line
> from the logbuf and attempt to offload printing once again; and it will
> continue doing so until another process locks the console_sem or until
> there are no more pending messages in the logbuf. So if everything goes
> wrong - we can't wake up printk_kthread and there are no other
> processes sleeping on the console_sem or trying to down() it - then we
> will have the existing console_unlock() behavior: print all pending
> messages in one shot.

Please, try to avoid such long paragraphs ;-)

It looks fine when you read it for the first time. But it becomes
problematic when you try to go back and re-read some detail.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2cb7f4753b76..a113f684066c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2155,6 +2216,85 @@ static inline int can_use_console(void)
>  	return cpu_online(raw_smp_processor_id()) || have_callable_console();
>  }
>  
> +/*
> + * Under heavy printing load or with a slow serial console (or both)
> + * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
> + * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
> + * number of lines a process can print from console_unlock().
> + *
> + * There is one more reason to do offloading - there might be other processes
> + * (including user space tasks) sleeping on the console_sem in uninterruptible
> + * state; and keeping the console_sem locked for long time may have a negative
> + * impact on them.
> + *
> + * This function must be called from 'printk_safe' context under
> + * console_sem lock.
> + */
> +static inline bool console_offload_printing(void)
> +{
> +	static struct task_struct *printing_task;
> +	static unsigned long long lines_printed;
> +	static unsigned long saved_csw;
> +
> +	if (!printk_offloading_enabled())
> +		return false;
> +
> +	if (system_state != SYSTEM_RUNNING || oops_in_progress)
> +		return false;

Wow, I really like this test. I was not aware of this variable.

Well, I would allow the offload also during boot. We have reports
where softlockups happened during boot. The offloading can easily be
disabled via the command line, but not the other way around.

> +	/* A new task - reset the counters. */
> +	if (printing_task != current) {
> +		lines_printed = 0;
> +		printing_task = current;
> +		saved_csw = current->nvcsw + current->nivcsw;
> +		return false;
> +	}
> +
> +	lines_printed++;
> +	if (lines_printed < atomic_print_limit)
> +		return false;
> +
> +	if (current == printk_kthread) {
> +		/*
> +		 * Reset the `lines_printed' counter, just in case if
> +		 * printk_kthread is the only process left that would
> +		 * down() console_sem and call console_unlock().
> +		 */
> +		lines_printed = 0;
> +		return true;
> +	}
> +
> +	/*
> +	 * A trivial emergency enforcement - give up on printk_kthread if
> +	 * we can't wake it up. This assumes that `atomic_print_limit' is
> +	 * reasonably and sufficiently large.
> +	 */
> +	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
> +	    saved_csw == (current->nvcsw + current->nivcsw)) {
> +		printk_enforce_emergency = true;
> +		pr_crit("Declaring printk emergency mode.\n");
> +		return false;
> +	}

This is an interesting way to detect that the system is in an emergency
state. The question is how reliable it is.

It might work only if the woken printk kthread is scheduled within a
predictable time period. You try to achieve this by setting real-time
priority (4th patch of this patchset). The question is how this would
work:

First, the real-time priority is questionable on its own.
Logging is important, but real-time priority is dangerous. Any "flood"
of messages will starve all other processes with normal priority. It is
better than a softlockup, but it might cause problems as well.

Second, it might be pretty hard to tune the prediction of the emergency
situation. The above formula might depend on:

  + the speed of the active consoles
  + the length of the printed lines
  + the number and priorities of other real-time tasks on the system,
    and the related average time before the printk kthread is scheduled
  + the HZ size; the lower the HZ, the more messages might be handled
    in one jiffy
  + the current CPU speed, ...

Third, if the console_trylock() below fails, it is not guaranteed that
the other waiter will really continue printing. The console_trylock()
would fail because someone else called console_lock(), failed to get
it, and went to sleep. There is no guarantee that it will be woken once
again.

I have discussed this with my colleagues. They think that we are lost
anyway if scheduling does not work. It is because many console drivers
use sleeping locks and depend on scheduling.

They suggested that I write down what we really want to achieve first.
I thought that I knew it.
But it actually might be interesting to write it down:

Ideal printk:

  + show and store all messages to know what's going on

Current printk does its best effort to:

  + store "all" messages into logbuf_log
  + show the important ones on the console
  + allow to see/store them in userspace
  + allow to find them in a crashdump

Current printk has limitations:

  + storing the messages:
      + limited buffer size => losing messages
      + possible deadlocks => temporarily passing messages via
	alternative buffers in NMI, recursion

  + userspace access:
      + on demand => more behind reality than it might be

  + console output:
      + slow => async =>
	  + might be far behind
	  + losing messages
	  + showing only the ones with a high log level
	  + possible softlockup
	  + not guaranteed when dying
      + possible deadlock => explicitly async in sched, NMI

Now, this patchset is trying to avoid the softlockup caused by console
output by making it even more async. This makes the other problems of
the async output more visible. The patch tries to reduce the negative
impact by detecting when that impact becomes visible and switching back
to the sync mode.

The fact is that we need to handle the emergency situation well. This
is when the system is in trouble and the messages might be crucial to
debug the problem and fix it.

The question is whether we are able to detect the emergency situation
reliably, and whether the solution does not bring other problems. If
the detection has too many false positives, it will not help much with
the original problem (softlockup). If it has too many false negatives,
it will increase the negative impact of the limits. Also, adding a
kthread with real-time priority might have strange side effects of its
own.

Let me look at it from the other side. What are the emergency
situations?

  + oops/panic() - easy to detect
  + suspend, shutdown, halt - rather easy to detect as well
  + flood of messages => console far behind logbuf_log
  + sudden death without triggering panic()

Did I miss anything?
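The "console far behind logbuf_log" item can be made measurable with
the sequence counters that printk already maintains. A minimal
user-space sketch (the function name and the one-half threshold are my
assumptions; only the counter names mirror the kernel's):

```c
#include <stdbool.h>

/*
 * Hypothetical flood check: the console is "flooded" when the number
 * of not-yet-handled messages exceeds half of all messages currently
 * stored in the buffer. The one-half threshold is an arbitrary example.
 */
static bool console_flood_detected(unsigned long long log_next_seq,
				   unsigned long long console_seq,
				   unsigned long long log_first_seq)
{
	/* Messages stored but not yet pushed to the console. */
	unsigned long long lag = log_next_seq - console_seq;
	/* Messages currently kept in the buffer. */
	unsigned long long buffered = log_next_seq - log_first_seq;

	if (buffered == 0)
		return false;

	return lag * 2 > buffered;
}
```

A real implementation would of course read the counters under the
proper lock and tune the threshold; the point is only that the check is
cheap and needs no extra state.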
Anyway, we could detect the first three situations rather well:

  + oops/panic() by a check of oops_in_progress

  + suspend, shutdown, ... by the check of system_state and the
    notifiers in the 5th patch

  + I would suggest detecting a flood of messages by the difference
    between log_next_seq and console_seq. We could take into account
    the number of messages in the buffer (log_next_seq - log_first_seq).

  + I do not see a reliable way to detect the sudden death.

Please note that we do not really need that questionable trick with
console_unlock(), console_trylock(), counting handled messages over the
limit, and the realtime kthread.

In addition, I think that a semi-sync-async mode might better handle
the sudden death. What is it? I have already talked about something
similar last time. Normal printk() should do something like:

  printk()
    vprintk_emit()
      log_store()
      if (!deferred)
	console_trylock()
	console_unlock()
	  while (num_lines < atomic_print_limit)
	    handle message
	  up_console_sem()
	  if (pending_messages)
	    wakeup(printk_kthread)

and printk_kthread should do:

  while () {
    if (!pending_messages)
      sleep()

    console_lock()
    preempt_disable()
    console_unlock()
      while (pending_messages && !need_resched)
	handle message
      up_console_sem()
    preempt_enable()
  }

In other words:

  + printk() should try to get the console and handle a few lines
    synchronously. It will handle its own message in most cases.

  + printk_kthread is only a fallback. It is woken when there are
    unhandled messages. It handles as many messages as possible in its
    assigned time slot. But it releases console_sem before it goes to
    sleep.
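To make the division of labour in that pseudocode concrete, here is a
small single-threaded user-space model (all names, the integer
"logbuf", and the limit value are illustrative assumptions; the real
loop would of course run under console_sem and real wakeups):

```c
/*
 * Toy model of the semi-sync-async mode: the printk() path handles a
 * bounded number of lines itself and kicks a fallback only when a
 * backlog remains. Everything here is hypothetical user-space code.
 */
enum { ATOMIC_PRINT_LIMIT = 3 };

static int pending;          /* messages waiting in the "logbuf" */
static int kthread_wakeups;  /* how often the fallback was kicked */

/* printk() path: store one message, drain at most the atomic limit. */
static void sim_printk(void)
{
	int num_lines = 0;

	pending++;                      /* log_store() */
	while (pending > 0 && num_lines < ATOMIC_PRINT_LIMIT) {
		pending--;              /* handle message */
		num_lines++;
	}
	if (pending > 0)
		kthread_wakeups++;      /* wakeup(printk_kthread) */
}

/* printk_kthread: drain the backlog, as if running until need_resched. */
static void sim_printk_kthread(void)
{
	while (pending > 0)
		pending--;              /* handle message */
}
```

With no backlog, sim_printk() handles its own message synchronously and
never touches the fallback; only when a backlog exists (e.g. messages
stored by other CPUs) does it stop at the limit and wake the kthread,
which matches the "kthread is only a fallback" point above.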
The advantages:

  + All messages are handled synchronously if there is no flood of
    messages.

  + If there is a flood of messages, the kthread works only as a
    fallback; processes that produce the messages are involved in
    handling the console (natural throttling); the messages appear on
    the console even when printk_kthread is sleeping; but the
    softlockup is prevented.

  => It should work pretty well in all situations, including a flood
     of messages and the sudden death.

What do you think, please? Did I make a mistake in the logic? Did I
miss some important cause of an emergency situation?

I am sorry for the long mail. It took me a long time to sort out the
ideas. I hope that it makes sense.

Best Regards,
Petr