From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751088AbdE2J3L (ORCPT );
	Mon, 29 May 2017 05:29:11 -0400
Received: from mx2.suse.de ([195.135.220.15]:34886 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1750787AbdE2J3J (ORCPT );
	Mon, 29 May 2017 05:29:09 -0400
Date: Mon, 29 May 2017 11:29:06 +0200
From: Petr Mladek
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
	Peter Zijlstra, "Rafael J . Wysocki", Eric Biederman,
	Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
	Tetsuo Handa, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
Message-ID: <20170529092906.GD21894@pathway.suse.cz>
References: <20170509082859.854-1-sergey.senozhatsky@gmail.com>
	<20170509082859.854-3-sergey.senozhatsky@gmail.com>
	<20170510055935.GA1966@jagdpanzerIV.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170510055935.GA1966@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 2017-05-10 14:59:35, Sergey Senozhatsky wrote:
> This patch introduces a '/sys/module/printk/parameters/atomic_print_limit'
> sysfs param, which sets the limit on the number of lines a process can
> print from console_unlock(). Value 0 corresponds to the current behavior
> (no limitation). The printing offload happens from the console_unlock()
> function and, briefly, looks as follows: as soon as a process prints more
> than `atomic_print_limit' lines, it attempts to offload printing to
> another process.
> Since nothing guarantees that there will be another process sleeping
> on the console_sem or calling printk() on another CPU simultaneously,
> the patch also introduces an auxiliary kernel thread - printk_kthread,
> the main purpose of which is to take over the printing duty. The
> workflow thus turns into: as soon as a process prints more than
> `atomic_print_limit' lines, it wakes up printk_kthread and unlocks the
> console_sem. So in the best case at this point there will be at least
> one process trying to lock the console_sem: printk_kthread. (There can
> also be a process that was sleeping on the console_sem and that was
> woken up by console semaphore up(); and concurrent printk() invocations
> from other CPUs.) But in the worst case there won't be any processes
> ready to take over the printing duty: it may take printk_kthread some
> time to become running; or printk_kthread may even never become running
> (a misbehaving scheduler, or some other critical condition). That's why
> after we wake_up() printk_kthread we can't immediately leave the
> printing loop; we must ensure that the console_sem has a new owner
> before we do so. Therefore, `atomic_print_limit' is a soft limit, not a
> hard one: we let the task overrun `atomic_print_limit'. But, at the
> same time, the console_unlock() printing loop behaves differently for
> tasks that have exceeded `atomic_print_limit': after every printed
> logbuf entry (call_console_drivers()) such a process wakes up
> printk_kthread, unlocks the console_sem and attempts to
> console_trylock() a bit later (if there are any pending messages in the
> logbuf, of course). In the best-case scenario either printk_kthread or
> some other task will lock the console_sem, so the current printing task
> will see a failed console_trylock(), which indicates a successful
> printing offload.
> In the worst case, however, current will successfully console_trylock(),
> which indicates that offloading did not take place and we can't return
> from console_unlock(); so the printing task will print one more line
> from the logbuf and attempt to offload printing once again; and it will
> continue doing so until another process locks the console_sem or until
> there are no more pending messages in the logbuf. So if everything goes
> wrong - we can't wake up printk_kthread and there are no other
> processes sleeping on the console_sem or trying to down() it - then we
> will have the existing console_unlock() behavior: print all pending
> messages in one shot.

Please, try to avoid such long paragraphs ;-)

It looks fine when you read it for the first time. But it becomes
problematic when you try to go back and re-read some detail.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2cb7f4753b76..a113f684066c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2155,6 +2216,85 @@ static inline int can_use_console(void)
>  	return cpu_online(raw_smp_processor_id()) || have_callable_console();
>  }
>  
> +/*
> + * Under heavy printing load or with a slow serial console (or both)
> + * console_unlock() can stall CPUs, which can result in soft/hard-lockups,
> + * lost interrupts, RCU stalls, etc. Therefore we attempt to limit the
> + * number of lines a process can print from console_unlock().
> + *
> + * There is one more reason to do offloading - there might be other processes
> + * (including user space tasks) sleeping on the console_sem in uninterruptible
> + * state; and keeping the console_sem locked for long time may have a negative
> + * impact on them.
> + *
> + * This function must be called from 'printk_safe' context under
> + * console_sem lock.
> + */
> +static inline bool console_offload_printing(void)
> +{
> +	static struct task_struct *printing_task;
> +	static unsigned long long lines_printed;
> +	static unsigned long saved_csw;
> +
> +	if (!printk_offloading_enabled())
> +		return false;
> +
> +	if (system_state != SYSTEM_RUNNING || oops_in_progress)
> +		return false;

Wow, I really like this test. I was not aware of this variable.

Well, I would allow the offload also during boot. We have reports
where softlockups happened during boot. The offloading can easily be
disabled via the command line, but not the other way around.

> +	/* A new task - reset the counters. */
> +	if (printing_task != current) {
> +		lines_printed = 0;
> +		printing_task = current;
> +		saved_csw = current->nvcsw + current->nivcsw;
> +		return false;
> +	}
> +
> +	lines_printed++;
> +	if (lines_printed < atomic_print_limit)
> +		return false;
> +
> +	if (current == printk_kthread) {
> +		/*
> +		 * Reset the `lines_printed' counter, just in case if
> +		 * printk_kthread is the only process left that would
> +		 * down() console_sem and call console_unlock().
> +		 */
> +		lines_printed = 0;
> +		return true;
> +	}
> +
> +	/*
> +	 * A trivial emergency enforcement - give up on printk_kthread if
> +	 * we can't wake it up. This assumes that `atomic_print_limit' is
> +	 * reasonably and sufficiently large.
> +	 */
> +	if (lines_printed > 10 * (unsigned long long)atomic_print_limit &&
> +	    saved_csw == (current->nvcsw + current->nivcsw)) {
> +		printk_enforce_emergency = true;
> +		pr_crit("Declaring printk emergency mode.\n");
> +		return false;
> +	}

This is an interesting way to detect that the system is in an emergency
state. The question is how reliable it is.

It might work only if the woken printk kthread is scheduled within a
predictable time period. You try to achieve this by setting real-time
priority (4th patch of this patchset). The question is how this would
work:

First, the real-time priority is questionable on its own.
Logging is important, but real-time priority is dangerous. Any "flood"
of messages will starve all other processes with normal priority. It is
better than a softlockup, but it might cause problems as well.

Second, it might be pretty hard to tune the prediction of the emergency
situation. The above formula might depend on:

  + the speed of the active consoles
  + the length of the printed lines
  + the number and priorities of other real-time tasks on the system,
    and the related average time before the printk kthread is scheduled
  + the HZ size; the lower the HZ, the more messages might be handled
    in one jiffy
  + the current CPU speed, ...

Third, if the console_trylock() below fails, it is not guaranteed that
the other waiter will really continue printing. The console_trylock()
would fail because someone else called console_lock(), failed to get
it, and went to sleep. There is no guarantee that it will be woken once
again.

I have discussed this with my colleagues. They think that we are lost
anyway if scheduling does not work. It is because many console drivers
use sleeping locks and depend on scheduling.

They suggested that I write down what we really want to achieve first.
I thought that I knew it.
But it actually might be interesting to write it down:

Ideal printk:

  + show and store all messages to know what's going on

Current printk does its best effort to:

  + store "all" messages into logbuf_log
  + show the important ones on the console
  + allow to see/store them in userspace
  + allow to find them in a crashdump

Current printk has limitations:

  + storing the messages:
      + limited buffer size => losing messages
      + possible deadlocks => temporarily passing messages via
	alternative buffers in NMI, recursion

  + userspace access:
      + on demand => more behind reality than it might be

  + console output:
      + slow => async =>
	  + might be far behind
	  + losing messages
	  + showing only the ones with a high log level
	  + possible softlockup
	  + not guaranteed when dying
      + possible deadlock => explicitly async in sched, NMI

Now, this patchset is trying to avoid the softlockup caused by console
output by making it even more async. This makes the other problems of
the async output more visible. The patch tries to reduce the negative
impact by detecting when that impact becomes visible and switching back
to the sync mode.

The fact is that we need to handle the emergency situation well. This
is when the system is in trouble and the messages might be crucial to
debug the problem and fix it.

The question is whether we are able to detect the emergency situation
reliably, and whether the solution does not bring other problems. If
the detection has too many false positives, it will not help much with
the original problem (softlockup). If it has too many false negatives,
it will increase the negative impact of the limits. Also, adding a
kthread with real-time priority might have strange side effects of its
own.

Let me look at it from the other side. What are the emergency
situations?

  + oops/panic() - easy to detect
  + suspend, shutdown, halt - rather easy to detect as well
  + flood of messages => console far behind logbuf_log
  + sudden death without triggering panic()

Did I miss anything?
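The "console far behind logbuf_log" item can be made measurable with
the sequence counters that printk already maintains. A minimal
user-space sketch (the function name and the one-half threshold are my
assumptions; only the counter names mirror the kernel's):

```c
#include <stdbool.h>

/*
 * Hypothetical flood check: the console is "flooded" when the number
 * of not-yet-handled messages exceeds half of all messages currently
 * stored in the buffer. The one-half threshold is an arbitrary example.
 */
static bool console_flood_detected(unsigned long long log_next_seq,
				   unsigned long long console_seq,
				   unsigned long long log_first_seq)
{
	/* Messages stored but not yet pushed to the console. */
	unsigned long long lag = log_next_seq - console_seq;
	/* Messages currently kept in the buffer. */
	unsigned long long buffered = log_next_seq - log_first_seq;

	if (buffered == 0)
		return false;

	return lag * 2 > buffered;
}
```

A real implementation would of course read the counters under the
proper lock and tune the threshold; the point is only that the check is
cheap and needs no extra state.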
Anyway, we could detect the first three situations rather well:

  + oops/panic() by a check of oops_in_progress

  + suspend, shutdown, ... by the check of system_state and the
    notifiers in the 5th patch

  + I would suggest detecting a flood of messages by the difference
    between log_next_seq and console_seq. We could take into account
    the number of messages in the buffer (log_next_seq - log_first_seq).

  + I do not see a reliable way to detect the sudden death.

Please note that we do not really need that questionable trick with
console_unlock(), console_trylock(), counting handled messages over the
limit, and the realtime kthread.

In addition, I think that a semi-sync-async mode might better handle
the sudden death. What is it? I have already talked about something
similar last time. Normal printk() should do something like:

  printk()
    vprintk_emit()
      log_store()
      if (!deferred)
	console_trylock()
	console_unlock()
	  while (num_lines < atomic_print_limit)
	    handle message
	  up_console_sem()
	  if (pending_messages)
	    wakeup(printk_kthread)

and printk_kthread should do:

  while () {
    if (!pending_messages)
      sleep()

    console_lock()
    preempt_disable()
    console_unlock()
      while (pending_messages && !need_resched)
	handle message
      up_console_sem()
    preempt_enable()
  }

In other words:

  + printk() should try to get the console and handle a few lines
    synchronously. It will handle its own message in most cases.

  + printk_kthread is only a fallback. It is woken when there are
    unhandled messages. It handles as many messages as possible in its
    assigned time slot. But it releases console_sem before it goes to
    sleep.
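To make the division of labour in that pseudocode concrete, here is a
small single-threaded user-space model (all names, the integer
"logbuf", and the limit value are illustrative assumptions; the real
loop would of course run under console_sem and real wakeups):

```c
/*
 * Toy model of the semi-sync-async mode: the printk() path handles a
 * bounded number of lines itself and kicks a fallback only when a
 * backlog remains. Everything here is hypothetical user-space code.
 */
enum { ATOMIC_PRINT_LIMIT = 3 };

static int pending;          /* messages waiting in the "logbuf" */
static int kthread_wakeups;  /* how often the fallback was kicked */

/* printk() path: store one message, drain at most the atomic limit. */
static void sim_printk(void)
{
	int num_lines = 0;

	pending++;                      /* log_store() */
	while (pending > 0 && num_lines < ATOMIC_PRINT_LIMIT) {
		pending--;              /* handle message */
		num_lines++;
	}
	if (pending > 0)
		kthread_wakeups++;      /* wakeup(printk_kthread) */
}

/* printk_kthread: drain the backlog, as if running until need_resched. */
static void sim_printk_kthread(void)
{
	while (pending > 0)
		pending--;              /* handle message */
}
```

With no backlog, sim_printk() handles its own message synchronously and
never touches the fallback; only when a backlog exists (e.g. messages
stored by other CPUs) does it stop at the limit and wake the kthread,
which matches the "kthread is only a fallback" point above.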
The advantages:

  + All messages are handled synchronously if there is no flood of
    messages.

  + If there is a flood of messages, the kthread works only as a
    fallback; processes that produce the messages are involved in
    handling the console (natural throttling); the messages appear on
    the console even when printk_kthread is sleeping; but the
    softlockup is prevented.

  => It should work pretty well in all situations, including a flood
     of messages and the sudden death.

What do you think, please? Did I make a mistake in the logic? Did I
miss some important cause of an emergency situation?

I am sorry for the long mail. It took me a long time to sort out the
ideas. I hope that it makes sense.

Best Regards,
Petr