Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI

linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed

From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Daniel Thompson <daniel.thompson@linaro.org>,
	Jiri Kosina <jkosina@suse.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Chris Metcalf <cmetcalf@ezchip.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	adi-buildroot-devel@lists.sourceforge.net,
	linux-cris-kernel@axis.com, linux-mips@linux-mips.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
	Jan Kara <jack@suse.cz>, Ralf Baechle <ralf@linux-mips.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
Date: Thu, 27 Apr 2017 15:38:19 +0200	[thread overview]
Message-ID: <20170427133819.GW3452@pathway.suse.cz> (raw)
In-Reply-To: <20170424021747.GA630@jagdpanzerIV.localdomain>

On Mon 2017-04-24 11:17:47, Sergey Senozhatsky wrote:
> On (04/21/17 14:06), Petr Mladek wrote:
> [..]
> > > I agree that this_cpu_read(printk_context) covers slightly more than
> > > logbuf_lock scope, so we may get positive this_cpu_read(printk_context)
> > > with unlocked logbuf_lock, but I don't tend to think that it's a big
> > > problem.
> > 
> > PRINTK_SAFE_CONTEXT is set also in call_console_drivers().
> > It might take rather long and logbuf_lock is availe. So, it is
> > noticeable source of false positives.
> 
> yes, agree.
> 
> probably we need additional printk_safe annotations for
> 		"logbuf_lock is locked from _this_ CPU"
> 
> false positives there can be very painful.
> 
> [..]
> > 	if (raw_spin_is_locked(&logbuf_lock))
> > 		this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
> > 	else
> > 		this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
> 
> well, if everyone is fine with logbuf_lock access from every CPU from every
> NMI then I won't object either. but may be it makes sense to reduce the
> possibility of false positives. Steven is loosing critically important logs,
> after all.
> 
> 
> by the way,
> does this `nmi_print_seq' bypass even fix anything for Steven?

I think that this is the most important question.

Steven, does the patch from
https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
help you to see the debug messages, please?

> it sort of
> can, in theory, but just in theory. so may be we need direct message flush
> from NMI handler (printk->console_unlock), which will be a really big problem.

I thought about it a lot and got scared where this might go.
We need to balance the usefulness and the complexity of the solution.

It took one year to discover this regression. Before it was
suggested to avoid calling printk() in NMI context at all.
Now, we are trying to fix printk() to handle MBs of messages
in NMI context.

If my proposed patch solves the problem for Steven, I would still
like to get similar solution in. It is not that complex and helps
to bypass the limited per-CPU buffer in most cases. I always thought
that 8kB might be not enough in some cases.

Note that my patch is very defensive. It uses the main log buffer
only when it is really safe. It has higher potential for unneeded
fallback but if it works for Steven (really existing usecase), ...

On the other hand, I would prefer to avoid any much more complex
solution until we have a real reports that they are needed.

Also we need to look for alternatives. There is a chance
to create crashdump and get the ftrace messages from it.
Also this might be scenario when we might need to suggest
the early_printk() patchset from Peter Zijlstra.

> logbuf might not be big enough for 4890096 messages (Steven's report
> mentions "Lost 4890096 message(s)!"). we are counting on the fact that
> in case of `nmi_print_seq' bypass some other CPU will call console_unlock()
> and print pending logbuf messages, but this is not guaranteed and the
> messages can be dropped even from logbuf.

Yup. I tested the patch here and I needed to increase the main log buffer
size to see all ftrace messages. Fortunately, it was possible to use a really
huge global buffer. But it is not realistic to use huge per-CPU ones.

Best Regards,
Petr

next prev parent reply	other threads:[~2017-04-27 13:38 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-21 11:48 [PATCH v5 0/4] Cleaning printk stuff in NMI context Petr Mladek
2016-04-21 11:48 ` [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI Petr Mladek
2016-04-27  9:31   ` Russell King - ARM Linux
2017-04-19 17:13   ` Steven Rostedt
2017-04-19 17:21     ` Peter Zijlstra
2017-04-20  3:31     ` Sergey Senozhatsky
2017-04-20 13:11       ` Petr Mladek
2017-04-21  1:57         ` Sergey Senozhatsky
2017-04-21 12:06           ` Petr Mladek
2017-04-24  2:17             ` Sergey Senozhatsky
2017-04-27 13:38               ` Petr Mladek [this message]
2017-04-27 14:31                 ` Steven Rostedt
2017-04-27 15:28                   ` Petr Mladek
2017-04-27 15:42                     ` Steven Rostedt
2017-04-28  9:02                 ` Peter Zijlstra
2017-04-28 13:44                   ` Petr Mladek
2017-04-28 13:58                     ` Peter Zijlstra
2017-04-28 14:47                       ` Steven Rostedt
2017-04-27 16:14         ` Steven Rostedt
2017-04-28  1:35           ` Sergey Senozhatsky
2017-04-28 12:57             ` Petr Mladek
2017-04-28 14:16               ` Steven Rostedt
2017-04-28  1:25         ` Sergey Senozhatsky
2017-04-28 12:38           ` Petr Mladek
2016-04-21 11:48 ` [PATCH v5 2/4] printk/nmi: warn when some message has been lost in NMI context Petr Mladek
2016-04-27  9:34   ` Russell King - ARM Linux
2016-04-21 11:48 ` [PATCH v5 3/4] printk/nmi: increase the size of NMI buffer and make it configurable Petr Mladek
2016-04-21 11:48 ` [PATCH v5 4/4] printk/nmi: flush NMI messages on the system panic Petr Mladek
2016-04-23  3:49   ` Sergey Senozhatsky
2016-04-26 14:21     ` Petr Mladek
2016-04-27  0:34       ` Sergey Senozhatsky
2016-04-27  0:36 ` [PATCH v5 0/4] Cleaning printk stuff in NMI context Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170427133819.GW3452@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=adi-buildroot-devel@lists.sourceforge.net \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=cmetcalf@ezchip.com \
    --cc=daniel.thompson@linaro.org \
    --cc=davem@davemloft.net \
    --cc=jack@suse.cz \
    --cc=jkosina@suse.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-cris-kernel@axis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ralf@linux-mips.org \
    --cc=rmk+kernel@arm.linux.org.uk \
    --cc=rostedt@goodmis.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).