From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>, Nigel Croxon <ncroxon@redhat.com>,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
dm-devel@redhat.com, Mikulas Patocka <mpatocka@redhat.com>,
linux-serial@vger.kernel.org
Subject: Re: Serial console is causing system lock-up
Date: Thu, 7 Mar 2019 21:26:42 +0900
Message-ID: <20190307122642.GA10415@tigerII.localdomain>
In-Reply-To: <87o96nezr2.fsf@linutronix.de>
On (03/07/19 11:37), John Ogness wrote:
> > Sorry John, the reasoning is that I'm trying to understand
> > why this does not look like soft or hard lock-up or RCU stall
> > scenario.
>
> The reason is that you are seeing data being printed on the console. The
> watchdogs (soft, hard, rcu, nmi) are all touched with each emergency
> message.
Correct. Please see below.
> > The CPU which spins on prb_lock() can have preemption disabled and,
> > additionally, can have local IRQs disabled, or be under RCU read
> > side lock. If consoles are busy, then there are CPUs which printk()
> > data and keep prb_lock contended; prb_lock() does not seem to be
> > fair. What am I missing?
>
> You are correct. Making prb_lock fair might be something we want to look
> into. Perhaps also based on the loglevel of what needs to be
> printed. (For example, KERN_ALERT always wins over KERN_CRIT.)
Good.
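For illustration only: a FIFO ticket lock is one common way to make a spin lock fair, because waiters are served strictly in the order they arrived. The sketch below is plain C11 userspace code with hypothetical names (struct ticket_lock, ticket_take(), ticket_acquire(), ticket_release()). It is not prb_lock(), and it does not model the loglevel-based priority John mentions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fair spin lock (NOT prb_lock): a classic ticket lock. */
struct ticket_lock {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket currently allowed to hold the lock */
};

static void ticket_init(struct ticket_lock *l)
{
	atomic_init(&l->next, 0);
	atomic_init(&l->owner, 0);
}

/* Draw a ticket: arrival order fixes service order. */
static unsigned ticket_take(struct ticket_lock *l)
{
	return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

static bool ticket_my_turn(struct ticket_lock *l, unsigned t)
{
	return atomic_load_explicit(&l->owner, memory_order_acquire) == t;
}

static void ticket_acquire(struct ticket_lock *l)
{
	unsigned t = ticket_take(l);

	while (!ticket_my_turn(l, t))
		; /* busy-wait; kernel code would call cpu_relax() here */
}

/* Only the holder modifies owner; passing the lock to the next ticket. */
static void ticket_release(struct ticket_lock *l)
{
	atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

Fairness is exactly what turns "the winner takes it all" into a bounded queue: a CPU that took ticket k waits for k earlier holders, never for CPUs that arrived after it.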
I'm not insisting, but I have a feeling that touching watchdogs after
call_console_drivers() might sometimes be too late. When we spin in
prb_lock() we wait for all CPUs which are ahead of us to finish their
call_console_drivers(), one by one. So if CPUZ is very unlucky and is
in atomic context, then prb_lock() for that CPUZ can last for
N * call_console_drivers(). Depending on N (which also accounts for
unfairness) and on call_console_drivers() timings, the NMI watchdog
may pay CPUZ a visit before it gets its chance to touch watchdogs.
*Maybe* sometimes we might want to touch watchdogs in prb_lock().
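A toy model of the N * call_console_drivers() argument, under assumed numbers (the 10 s threshold and 2 s per flush below are illustrative, not measurements from this thread). struct spin_model and watchdog_fires() are hypothetical; the comment mentioning touch_nmi_watchdog() only indicates where a real touch would go.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: a CPU spinning for a lock waits
 * for 'ahead' CPUs to finish their console output, one by one. */
struct spin_model {
	unsigned ahead;      /* CPUs queued before us (N) */
	unsigned flush_ms;   /* duration of one call_console_drivers() */
	unsigned thresh_ms;  /* assumed watchdog timeout */
	bool touch_in_spin;  /* touch the watchdog inside the spin loop? */
};

/* Returns true if the watchdog would fire before we get the lock. */
static bool watchdog_fires(const struct spin_model *m)
{
	unsigned since_touch = 0;

	for (unsigned i = 0; i < m->ahead; i++) {
		since_touch += m->flush_ms;  /* one more CPU printed ahead of us */
		if (since_touch > m->thresh_ms)
			return true;
		if (m->touch_in_spin)
			since_touch = 0;     /* e.g. touch_nmi_watchdog() in the loop */
	}
	return false;
}
```

With ahead = 8, flush_ms = 2000 and thresh_ms = 10000, the watchdog fires if it is touched only after the lock is taken (8 * 2 s > 10 s), but not if it is touched on every handoff inside the spin loop. Touching inside the loop still cannot help when a single flush exceeds the threshold.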
So, given the design of the new printk(), I can't help thinking that
the current
    "the winner takes it all"
may become
    "the winner waits for all".
Mikulas mentioned that he observes "** X messages dropped" warnings.
This suggests that, _most likely_, we had significantly more than
2 CPUs calling printk() concurrently.
- A single source (a single CPU calling printk()) would not lose messages,
  because it would print its own message before calling printk() again (we
  could still have another CPU being rescheduled while holding console_sem,
  but I don't think that is the case here).
- Two CPUs would probably not lose messages either: Steven's console_owner
  would throttle them down.
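A deliberately simplified model of the dropped-messages reasoning (this is not how the printk ring buffer works): each tick, every printing CPU stores one record, the console prints exactly one, and at most cap unprinted records are kept before the oldest ones are overwritten and counted as dropped. It leaves out console_owner throttling entirely, so it only shows why one CPU keeps up while several CPUs overflow. simulate_drops() and its parameters are hypothetical.

```c
/* Toy model, not printk code: no console_owner throttling. */
static unsigned long simulate_drops(unsigned cpus, unsigned cap,
				    unsigned ticks)
{
	unsigned long pending = 0, dropped = 0;

	for (unsigned t = 0; t < ticks; t++) {
		pending += cpus;              /* every CPU calls printk() once */
		if (pending > cap) {
			dropped += pending - cap; /* oldest unprinted records lost */
			pending = cap;
		}
		if (pending > 0)
			pending--;            /* console prints one message */
	}
	return dropped;
}
```

With one CPU the backlog never grows, so nothing is dropped. With four CPUs and cap = 16, the backlog fills after five ticks and every later tick loses three records, so 100 ticks drop 285 messages.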
So I think what we had was a spike of WARN/ERR printk-s coming from
N CPUs concurrently.
And this brings us to another pessimistic scenario: a very unlucky
CPUZ has to spin in prb_lock() waiting for other CPUs to print out
the very same 2 million chars. In terms of printk() latency, that
looks to me just like the current printk.
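To put "2 million chars" into perspective, a back-of-the-envelope estimate, assuming a 115200 baud serial console with 8N1 framing (10 bits on the wire per character; neither figure comes from this thread): 2,000,000 * 10 / 115200 is roughly 174 seconds of pure transmit time. serial_drain_seconds() is a hypothetical helper.

```c
/* Transmit time for 'chars' characters on a UART, ignoring FIFO and
 * driver overhead. With 8N1 framing, bits_per_char = 10. */
static double serial_drain_seconds(unsigned long chars, unsigned baud,
				   unsigned bits_per_char)
{
	return (double)chars * bits_per_char / baud;
}
```

That is far beyond any reasonable watchdog threshold, so a CPUZ queued behind that output cannot rely on the lock alone to stay under it.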
John, sorry to ask this, but does the new printk() design always provide
latency guarantees good enough for PREEMPT_RT?
I'm surely missing something. Where am I wrong?
-ss