public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Tejun Heo <tj@kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Petr Mladek <pmladek@suse.com>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rafael Wysocki <rjw@rjwysocki.net>, Pavel Machek <pavel@ucw.cz>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Date: Tue, 19 Dec 2017 09:52:48 +0900	[thread overview]
Message-ID: <20171219005248.GA8892@jagdpanzerIV> (raw)
In-Reply-To: <20171215101957.5d4a1004@gandalf.local.home>

Hello Steven,

I couldn't reply sooner.


On (12/15/17 10:19), Steven Rostedt wrote:
> > On (12/14/17 22:18), Steven Rostedt wrote:
> > > > Steven, your approach works ONLY when we have the following preconditions:
> > > > 
> > > >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> > > >     etc) context
> > > > 
> > > >         what does guarantee that? what happens if there is NO non-atomic
> > > >         CPU or that non-atomic simplky missses the console_owner != false
> > > >         point? we are going to conclude
> > > > 
> > > >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > > > 
> > > > 
> > > >         what if that non-atomic CPU does not call printk(), but instead
> > > >         it does console_lock()/console_unlock()? why there is no handoff?
> 
> The case here, you are talking about a CPU doing console_lock() from a
> non printk() case. Which is what I was asking about how often this
> happens.

I'd say often enough. but the point I was trying to make is that we can
have non-atomic CPUs which can do the print out, instead of "sharing the
load" between atomic CPUs.

> As for why there's no handoff. Does the non printk()
> console_lock/unlock ever happen from a critical location? I don't think
> it does (but I haven't checked). Then it is the perfect candidate to do
> all the printing.

that's right. that is the point I was trying making. we can have better
candidates to do all the printing.

[..]
> > deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> > happening at the same moment + NMI backtraces from all the CPUs (more
> > than 3 cpus) that follows the lockups, over not-so-fast serial console.
> > exactly the bug report I received two days ago. so which one of the CPUs
> > here is a good candidate to successfully emit all of the pending logbuf
> > entries? none. all of them either have local IRQs disabled, or dump_stack()
> > from either backtrace IPI or backtrace NMI (depending on the configuration).
> > 
> 
> Is the above showing an issue of console_lock happening in the non
> printk() case?
>
> > do we periodically do console_lock() on a running system? yes, we do.
> > add to console_unlock()
> 
> Right, and the non printk() console_lock() should be fine to do all
> printing when it unlocks.

that's right.

> > this argument is really may be applied against your patch as well. I
> > really don't want us to have this type of "technical" discussion.
> 
> Sure, but my patch fixes the unfair approach that printk currently does.

I did tests yesterday, traces are available. I can't conclude that
the patch fixes the unfairness of printk().


> > printk() is a tool for developers. but developers can't use.
> > 
> > 
> > > >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> > > >     console_owner is set.  
> > > 
> > > I haven't looked at the latest code, but my last patch didn't care
> > > about "atomic" and "non-atomic"  
> > 
> > I know. and I think it is sort of a problem.
> 
> Please show me the case that it is. And don't explain where it is.
> Please apply the patch and have the problem occur and show it to me.
> That's all that I'm asking for.

I did some tests yesterday. I posted analysis and traces.

[..]
> No, because it is unrealistic. For example:

right.

> +static void test_noirq_console_unlock(void)
> +{
> +       unsigned long flags;
> +       unsigned long num_messages = 0;
> +
> +       pr_err("=== TEST %s\n", __func__);
> +
> +       num_messages = 0;
> +       console_lock();
> +       while (num_messages++ < max_num_messages)
> +               pr_info("=== %s Append message %lu out of %lu\n",
> +                               __func__,
> +                               num_messages,
> +                               max_num_messages);
> +
> +       local_irq_save(flags);
> +       console_unlock();
> 
> Where in the kernel do we do this?

the funny thing is that we _are going to start doing this_ with
the console_owner hand off enabled.

consider the following case

we have console_lock() from non-atomic context. console_sem owner is
getting preempted, under console_sem. which is totally possible and
happens a lot. in the mean time we have OOM, which can print a lot of
info. by the time console_sem returns back to TASK_RUNNING logbuf
contains some number of pending messages [lets say 10 seconds worth
of printing]. console owner goes to console_unlock(). accidentally
we have printk from IRQ on CPUz. console_owner hands over printing
duty to CPUz. so now we have to print 10 seconds worth of OOM messages
from irq.



CPU0                        CPU1 ~ CPUx                     CPUz

console_lock

 << preempted >>


   OOM                    OOM printouts, lots
                          of OOM traces, etc.

                          OOM end [progress done].

 << back to RUNNING >>

  console_unlock()
    
    for (;;)
      sets console_owner
      call_console_drivers()				  IRQ
                                                         printk
							  sees console_owner
							  sets console_waiter

      clears console_owner
      sees console_waier
      handoff
                                                          for (;;) {
							     call_console_drivers()
							     ??? lockup
							  }
							  up()

this is how we down() from non-atomic and up() from atomic [if we make
it to up(). we might end up in NMI panic]. this scenario is totally possible,
isn't it? the optimistic expectation here is that some other printk() from
non-atomic CPU will jump in and take over printing from atomic CPUz. but I
don't see why we are counting on it.

	-ss

  reply	other threads:[~2017-12-19  0:52 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
2017-12-14 14:39   ` Sergey Senozhatsky
2017-12-15 15:55     ` Steven Rostedt
2017-12-14 15:25   ` Tejun Heo
2017-12-14 17:55     ` Steven Rostedt
2017-12-14 18:11       ` Tejun Heo
2017-12-14 18:21         ` Steven Rostedt
2017-12-22  0:09           ` Tejun Heo
2017-12-22  4:19             ` Steven Rostedt
2017-12-28  6:48               ` Sergey Senozhatsky
2017-12-28 10:07                 ` Sergey Senozhatsky
2017-12-29 13:59                   ` Tetsuo Handa
2017-12-31  1:44                     ` Sergey Senozhatsky
2018-01-09 20:06               ` Tejun Heo
2018-01-09 22:08                 ` Tetsuo Handa
2018-01-09 22:17                   ` Tejun Heo
2018-01-11 11:14                     ` Tetsuo Handa
2018-01-09 22:08                 ` Steven Rostedt
2018-01-09 22:17                   ` Tejun Heo
2018-01-09 22:47                     ` Steven Rostedt
2018-01-09 22:53                       ` Tejun Heo
2018-01-10  7:18                         ` Steven Rostedt
2018-01-10 14:04                           ` Tejun Heo
2017-12-15  2:10         ` Sergey Senozhatsky
2017-12-15  3:18           ` Steven Rostedt
2017-12-15  5:06             ` Sergey Senozhatsky
2017-12-15  6:52               ` Sergey Senozhatsky
2017-12-15 15:39                 ` Steven Rostedt
2017-12-15  8:31               ` Petr Mladek
2017-12-15  8:42                 ` Sergey Senozhatsky
2017-12-15  9:08                   ` Petr Mladek
2017-12-15 15:47                     ` Steven Rostedt
2017-12-18  9:36                     ` Sergey Senozhatsky
2017-12-18 10:36                       ` Sergey Senozhatsky
2017-12-18 12:35                         ` Sergey Senozhatsky
2017-12-18 13:51                         ` Petr Mladek
2017-12-18 13:31                       ` Petr Mladek
2017-12-18 13:39                         ` Sergey Senozhatsky
2017-12-18 14:13                           ` Petr Mladek
2017-12-18 17:46                             ` Steven Rostedt
2017-12-19  1:03                               ` Sergey Senozhatsky
2017-12-19  1:08                                 ` Steven Rostedt
2017-12-19  1:24                                   ` Sergey Senozhatsky
2017-12-19  2:03                                     ` Steven Rostedt
2017-12-19  2:46                                       ` Sergey Senozhatsky
2017-12-19  3:38                                         ` Steven Rostedt
2017-12-19  4:58                                           ` Sergey Senozhatsky
2017-12-19 14:40                                             ` Steven Rostedt
2017-12-20  7:46                                               ` Sergey Senozhatsky
2017-12-19 14:31                                     ` Michal Hocko
2017-12-20  7:10                                       ` Sergey Senozhatsky
2017-12-20 12:06                                         ` Tetsuo Handa
2017-12-21  6:52                                           ` Sergey Senozhatsky
2017-12-19  4:36                               ` Sergey Senozhatsky
2017-12-18 14:10                         ` Petr Mladek
2017-12-19  1:09                           ` Sergey Senozhatsky
2017-12-15 15:42                 ` Steven Rostedt
2017-12-15 15:19               ` Steven Rostedt
2017-12-19  0:52                 ` Sergey Senozhatsky [this message]
2017-12-19  1:03                   ` Steven Rostedt
2018-01-05  2:54 ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171219005248.GA8892@jagdpanzerIV \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox