linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>, Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rafael Wysocki <rjw@rjwysocki.net>, Pavel Machek <pavel@ucw.cz>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-kernel@vger.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Date: Thu, 28 Dec 2017 15:48:58 +0900	[thread overview]
Message-ID: <20171228064858.GA892@jagdpanzerIV> (raw)
In-Reply-To: <20171221231932.27727fab@vmware.local.home>


Hello,

On (12/21/17 23:19), Steven Rostedt wrote:
[..]
> > I wrote this before but this isn't a theoretical problem.  We see
> > these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> > pressure is high and OOM triggers and printk starts printing out OOM
> > warnings; then, a network packet comes in which triggers allocations
> > in the network layer, which fails due to memory pressure, which then
> > generates memory allocation failure messages which then generates
> > netconsole packets which then tries to allocate more memory and so on.
> 
> It doesn't matter if preemption is enabled or not. The hand off should
> happen either way.

preemption does matter. it matters so much, that it makes the first
thing your patch depends on to be questionable - if CPUA stuck
   in console_unlock() printing other CPU's messages, it's
   because there are currently printk()-s happening on CPUB-CPUZ.

which is not always true. with preemption enabled CPUA can stuck in
console_unlock() because printk()-s from CPUB-CPUZ could have happened
seconds or even minutes ago. I have demonstrated it with the traces; it
didn't convenience you. OK, I sat down and went trough Tetsuo's reports
starting from 2016:

	https://marc.info/?l=linux-kernel&m=151375384500555

and Tetsuo has reported several times that printing CPU can sleep for
minutes with console_sem being locked; while other CPUs are happy to
printk()->log_store() as much as they want. and that's why I believe
that that "The hand off should happen either way" is dangerous, especially
when we hand off from a preemptible context to an atomic context. you
don't think that this is a problem, because your patch requires logbuf to
be small enough for any atomic context (irq, under spin_lock, etc) to be
able to print out all the pending messages to all registered consoles [we
print to consoles sequentially] within watchdog_threshold seconds (10
seconds by default). who does adjust the size of the logbuf in a way that
console_unlock() can print the entire logbuf under 10 seconds?


> > It's just that there's no one else to give that flushing duty too, so
> > the pingpoinging that your patch implements can't really help
> > anything.
> > 
> > You argue that it isn't made worse by your patch, which may be true
> > but your patch doesn't solve actual problems and is most likely
> > unnecessary complication which gets in the way for the actual
> > solution.  It's a weird argument to push an approach which is
> > fundamentally broken.  Let's please stop that.
> 
> BULLSHIT!
> 
> It's not a complex solution, and coming from the cgroup and workqueue
> maintainer, that's a pretty ironic comment.
> 
> It has already been proven that it can solve problems:
> 
>   http://lkml.kernel.org/r/20171219143147.GB15210@dhcp22.suse.cz

and Tetsuo also said that his test does not cover all the scenarios;
and he has sent us all a pretty clear warning:

 : Therefore, while it is true that any approach would survive my
 : environment, it is dangerous to assume that any approach is safe
 : for my customer's enterprise servers.

https://marc.info/?l=linux-kernel&m=151377161705573


when it comes to lockups, printk() design is so flexible, that one can
justify nearly every patch no matter how many regressions it introduces,
by simply saying:
	"but the same lockup scenario could have happened even before my
	 patch. it's just printk() was designed this way. you were lucky
	 enough not to hit the problem before; now you are less lucky".

been there, done that. it's a trap.

> You don't think handing off printks to an offloaded thread isn't more
> complex nor can it cause even more issues (like more likely to lose
> relevant information on kernel crashes)?

printk_kthread *used* to be way too independent. basically what we had
before was

	bad_function()
	 printk()
	  vprintk_emit()
	  {
	    if (oops_in_progress)
	       can_offload_to_printk = false;

	    if (can_offload_to_printk)
	       wake_up(printk_kthread);
	    else if (console_trylock())
	       console_unlock();
	  }


the bad things:

- first, we do not take into account the fact that printk_kthread can
  already be active.

- second, we do not take into account the fact that printk_kthread can
  be way-way behind - a huge gap between `console_seq' and `log_next_seq'.

- third, we do not take into account the fact that printk_kthread can
  be preempted under console_sem.

so, IOW, the CPU which was in trouble declared the "oh, we should not
offload to printk kthread" emergency only for itself, basically. the CPU
which printk_kthread was running on or sleeping on [preemption] did not
care, neither did printk_kthread.

what we have now:

- printk_kthread cannot be preempted under console_sem. if it acquires
  the console_sem it stays RUNNING, printing the messages.

- printk_kthread does check for emergency on other CPUs. right before
  it starts printing the next pending logbuf line (every line).
  IOW, instead of that:

  console_unlock()
  {
      again:
           for (;;) {
              call_console_drivers();
           }

           up();

           if (more_pending_messages && down_trylock())
               goto again;
  }

  it now does this:

  console_unlock()
  {
          preempt_disable();
     again:
          for (;;) {
              if (something_is_not_right)
                 break;
              if (printing_for_too_long)
                 break;
              call_console_drivers();
          }

          up();
          preempt_enable();

          if (!something_is_not_right && !printing_for_too_long &&
                           more_pending_messages && down_trylock())
              goto again;
  }



so if CPUA declares emergency (oops_in_progress, etc) then printk_kthread,
if it's running, will get out of CPUA's way as soon as possible. so I
don't think we are still *more likely* to lose relevant information on
kernel crashes because of printk_kthread.

and I'm actually thinking about returning back the old vprintk_emit()
behavior

       vprintk_emit()
       {
+         preempt_disable();
         if (console_trylock())
             console_unlock();
+         preempt_enable();
       }

and letting console_unlock() to offload when needed. preemption in
vprintk_emit() is no good.

	-ss

  reply	other threads:[~2017-12-28  6:48 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
2017-12-14 14:39   ` Sergey Senozhatsky
2017-12-15 15:55     ` Steven Rostedt
2017-12-14 15:25   ` Tejun Heo
2017-12-14 17:55     ` Steven Rostedt
2017-12-14 18:11       ` Tejun Heo
2017-12-14 18:21         ` Steven Rostedt
2017-12-22  0:09           ` Tejun Heo
2017-12-22  4:19             ` Steven Rostedt
2017-12-28  6:48               ` Sergey Senozhatsky [this message]
2017-12-28 10:07                 ` Sergey Senozhatsky
2017-12-29 13:59                   ` Tetsuo Handa
2017-12-31  1:44                     ` Sergey Senozhatsky
2018-01-09 20:06               ` Tejun Heo
2018-01-09 22:08                 ` Tetsuo Handa
2018-01-09 22:17                   ` Tejun Heo
2018-01-11 11:14                     ` Tetsuo Handa
2018-01-09 22:08                 ` Steven Rostedt
2018-01-09 22:17                   ` Tejun Heo
2018-01-09 22:47                     ` Steven Rostedt
2018-01-09 22:53                       ` Tejun Heo
2018-01-10  7:18                         ` Steven Rostedt
2018-01-10 14:04                           ` Tejun Heo
2017-12-15  2:10         ` Sergey Senozhatsky
2017-12-15  3:18           ` Steven Rostedt
2017-12-15  5:06             ` Sergey Senozhatsky
2017-12-15  6:52               ` Sergey Senozhatsky
2017-12-15 15:39                 ` Steven Rostedt
2017-12-15  8:31               ` Petr Mladek
2017-12-15  8:42                 ` Sergey Senozhatsky
2017-12-15  9:08                   ` Petr Mladek
2017-12-15 15:47                     ` Steven Rostedt
2017-12-18  9:36                     ` Sergey Senozhatsky
2017-12-18 10:36                       ` Sergey Senozhatsky
2017-12-18 12:35                         ` Sergey Senozhatsky
2017-12-18 13:51                         ` Petr Mladek
2017-12-18 13:31                       ` Petr Mladek
2017-12-18 13:39                         ` Sergey Senozhatsky
2017-12-18 14:13                           ` Petr Mladek
2017-12-18 17:46                             ` Steven Rostedt
2017-12-19  1:03                               ` Sergey Senozhatsky
2017-12-19  1:08                                 ` Steven Rostedt
2017-12-19  1:24                                   ` Sergey Senozhatsky
2017-12-19  2:03                                     ` Steven Rostedt
2017-12-19  2:46                                       ` Sergey Senozhatsky
2017-12-19  3:38                                         ` Steven Rostedt
2017-12-19  4:58                                           ` Sergey Senozhatsky
2017-12-19 14:40                                             ` Steven Rostedt
2017-12-20  7:46                                               ` Sergey Senozhatsky
2017-12-19 14:31                                     ` Michal Hocko
2017-12-20  7:10                                       ` Sergey Senozhatsky
2017-12-20 12:06                                         ` Tetsuo Handa
2017-12-21  6:52                                           ` Sergey Senozhatsky
2017-12-19  4:36                               ` Sergey Senozhatsky
2017-12-18 14:10                         ` Petr Mladek
2017-12-19  1:09                           ` Sergey Senozhatsky
2017-12-15 15:42                 ` Steven Rostedt
2017-12-15 15:19               ` Steven Rostedt
2017-12-19  0:52                 ` Sergey Senozhatsky
2017-12-19  1:03                   ` Steven Rostedt
2018-01-05  2:54 ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171228064858.GA892@jagdpanzerIV \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).