All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Tejun Heo <tj@kernel.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Petr Mladek <pmladek@suse.com>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rafael Wysocki <rjw@rjwysocki.net>, Pavel Machek <pavel@ucw.cz>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Date: Fri, 15 Dec 2017 14:06:07 +0900	[thread overview]
Message-ID: <20171215050607.GC11199@jagdpanzerIV> (raw)
In-Reply-To: <20171214221831.3ead0298@vmware.local.home>

Hello,

On (12/14/17 22:18), Steven Rostedt wrote:
> > Steven, your approach works ONLY when we have the following preconditions:
> > 
> >  a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> >     etc) context
> > 
> >         what does guarantee that? what happens if there is NO non-atomic
> >         CPU or that non-atomic simplky missses the console_owner != false
> >         point? we are going to conclude
> > 
> >         "if printk() doesn't work for you, it's because you are holding it wrong"?
> > 
> > 
> >         what if that non-atomic CPU does not call printk(), but instead
> >         it does console_lock()/console_unlock()? why there is no handoff?
> > 
> >         CPU0				CPU1 ~ CPU10
> > 					in atomic contexts [!]. ping-ponging console_sem
> > 					ownership to each other. while what they really
> > 					need to do is to simply up() and let CPU0 to
> > 					handle it.
> > 					printk
> > 	console_lock()
> > 	 schedule()
> > 					...
> > 					printk
> > 					printk
> > 					...
> > 					printk
> > 					printk
> > 
> > 					up()
> > 
> > 	// woken up
> > 	console_unlock()
> > 
> >         why do we make an emphasis on fixing vprintk_printk()?
> 
> Where do we do the above? And has this been proven to be an issue?

um... hundreds of cases.

deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
happening at the same moment + NMI backtraces from all the CPUs (more
than 3 cpus) that follows the lockups, over not-so-fast serial console.
exactly the bug report I received two days ago. so which one of the CPUs
here is a good candidate to successfully emit all of the pending logbuf
entries? none. all of them either have local IRQs disabled, or dump_stack()
from either backtrace IPI or backtrace NMI (depending on the configuration).


do we periodically do console_lock() on a running system? yes, we do.
add to console_unlock()

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..1c811f6d94bf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2143,6 +2143,10 @@ void console_unlock(void)
        bool wake_klogd = false;
        bool do_cond_resched, retry;
 
+       if (!(current->flags & PF_KTHREAD))
+               dump_stack();
+
+
        if (console_suspended) {
                up_console_sem();
                return;

---

and just boot the system.


I work for a company that has several thousand engineers spread
across the globe. and people do use printk(), and issues do happen.

the scenarios that Tejun and I talk about are not theoretical. if those
scenarios are completely theoretical, as you suggest, - then, OK, what
exactly guarantees that

	whenever atomic CPUs printk there is always a non-atomic
	CPU to take over the printing?



> >  b) non-atomic CPU sees console_owner set (which is set for a very short
> >     period of time)
> > 
> >         again. what if that non-atomic CPU does not see console_owner?
> >         "don't use printk()"?
> 
> May I ask, why are we doing the printk in the first place?

this argument is really may be applied against your patch as well. I
really don't want us to have this type of "technical" discussion.

printk() is a tool for developers. but developers can't use.


> >  c) the task that is looping in console_unlock() sees non-atomic CPU when
> >     console_owner is set.
> 
> I haven't looked at the latest code, but my last patch didn't care
> about "atomic" and "non-atomic"

I know. and I think it is sort of a problem.

lots of printk-s are happening from IRQs / softirqs and so on.
take a look at CONFIG_IP_ROUTE_VERBOSE, for example.

do_softirq() -> ip_handle_martian_source() and a bunch of other
places. 
these irq->printk-s can "steal" the console_sem and go to
console_unlock().

"don't use printk() then" type of argument does not really help
to a guy who reports the lockup.


> > Steven, I thought we reached the agreement [**] that the solution we should
> > be working on is a combination of prinkt_kthread and console_sem hand
> > off. Simply because it adds the missing "there is a non-atomic CPU wishing
> > to console_unlock()" thing.
> > 
> > 	lkml.kernel.org/r/20171108162813.GA983427@devbig577.frc2.facebook.com
> > 
> > 	https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> > 	https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> > 	https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> > 	https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> > 	https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> > **	https://marc.info/?l=linux-kernel&m=151020565222469&w=2
> 
> I'm still fine with the hybrid approach, but I want to see a problem
> first before we fix it.
> 
> > 
> > 
> > what am I missing?
> 
> The reproducer.

will that printk_test module

  lkml.kernel.org/r/20171204135314.9122-2-sergey.senozhatsky@gmail.com

suffice?

	-ss

  reply	other threads:[~2017-12-15  5:06 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-04 13:48 [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 01/12] printk: move printk_pending out of per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 02/12] printk: introduce printing kernel thread Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 03/12] printk: consider watchdogs thresholds for offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 04/12] printk: add sync printk_emergency API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 05/12] printk: enable printk offloading Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 06/12] PM: switch between printk emergency modes Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 07/12] printk: register syscore notifier Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 08/12] printk: force printk_kthread to offload printing Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 09/12] printk: do not cond_resched() when we can offload Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 10/12] printk: move offloading logic to per-cpu Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 11/12] printk: add offloading watchdog API Sergey Senozhatsky
2017-12-04 13:48 ` [RFC][PATCHv6 12/12] printk: improve printk offloading mechanism Sergey Senozhatsky
2017-12-04 13:53 ` [PATCH 0/4] printk: offloading testing module/trace events Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 1/4] printk/lib: add offloading trace events and test_printk module Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 2/4] printk/lib: simulate slow consoles Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 3/4] printk: add offloading takeover traces Sergey Senozhatsky
2017-12-04 13:53   ` [PATCH 4/4] printk: add task name and CPU to console messages Sergey Senozhatsky
2017-12-14 14:27 ` [RFC][PATCHv6 00/12] printk: introduce printing kernel thread Petr Mladek
2017-12-14 14:39   ` Sergey Senozhatsky
2017-12-15 15:55     ` Steven Rostedt
2017-12-14 15:25   ` Tejun Heo
2017-12-14 17:55     ` Steven Rostedt
2017-12-14 18:11       ` Tejun Heo
2017-12-14 18:21         ` Steven Rostedt
2017-12-22  0:09           ` Tejun Heo
2017-12-22  4:19             ` Steven Rostedt
2017-12-28  6:48               ` Sergey Senozhatsky
2017-12-28 10:07                 ` Sergey Senozhatsky
2017-12-29 13:59                   ` Tetsuo Handa
2017-12-31  1:44                     ` Sergey Senozhatsky
2018-01-09 20:06               ` Tejun Heo
2018-01-09 22:08                 ` Tetsuo Handa
2018-01-09 22:17                   ` Tejun Heo
2018-01-11 11:14                     ` Tetsuo Handa
2018-01-09 22:08                 ` Steven Rostedt
2018-01-09 22:17                   ` Tejun Heo
2018-01-09 22:47                     ` Steven Rostedt
2018-01-09 22:53                       ` Tejun Heo
2018-01-10  7:18                         ` Steven Rostedt
2018-01-10 14:04                           ` Tejun Heo
2017-12-15  2:10         ` Sergey Senozhatsky
2017-12-15  3:18           ` Steven Rostedt
2017-12-15  5:06             ` Sergey Senozhatsky [this message]
2017-12-15  6:52               ` Sergey Senozhatsky
2017-12-15 15:39                 ` Steven Rostedt
2017-12-15  8:31               ` Petr Mladek
2017-12-15  8:42                 ` Sergey Senozhatsky
2017-12-15  9:08                   ` Petr Mladek
2017-12-15 15:47                     ` Steven Rostedt
2017-12-18  9:36                     ` Sergey Senozhatsky
2017-12-18 10:36                       ` Sergey Senozhatsky
2017-12-18 12:35                         ` Sergey Senozhatsky
2017-12-18 13:51                         ` Petr Mladek
2017-12-18 13:31                       ` Petr Mladek
2017-12-18 13:39                         ` Sergey Senozhatsky
2017-12-18 14:13                           ` Petr Mladek
2017-12-18 17:46                             ` Steven Rostedt
2017-12-19  1:03                               ` Sergey Senozhatsky
2017-12-19  1:08                                 ` Steven Rostedt
2017-12-19  1:24                                   ` Sergey Senozhatsky
2017-12-19  2:03                                     ` Steven Rostedt
2017-12-19  2:46                                       ` Sergey Senozhatsky
2017-12-19  3:38                                         ` Steven Rostedt
2017-12-19  4:58                                           ` Sergey Senozhatsky
2017-12-19 14:40                                             ` Steven Rostedt
2017-12-20  7:46                                               ` Sergey Senozhatsky
2017-12-19 14:31                                     ` Michal Hocko
2017-12-20  7:10                                       ` Sergey Senozhatsky
2017-12-20 12:06                                         ` Tetsuo Handa
2017-12-21  6:52                                           ` Sergey Senozhatsky
2017-12-19  4:36                               ` Sergey Senozhatsky
2017-12-18 14:10                         ` Petr Mladek
2017-12-19  1:09                           ` Sergey Senozhatsky
2017-12-15 15:42                 ` Steven Rostedt
2017-12-15 15:19               ` Steven Rostedt
2017-12-19  0:52                 ` Sergey Senozhatsky
2017-12-19  1:03                   ` Steven Rostedt
2018-01-05  2:54 ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171215050607.GC11199@jagdpanzerIV \
    --to=sergey.senozhatsky.work@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rjw@rjwysocki.net \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.