From: Greg KH <gregkh@linuxfoundation.org>
To: "long.wanglong" <long.wanglong@huawei.com>
Cc: Petr Mladek <pmladek@suse.cz>,
	rostedt@goodmis.org, jkosina@suse.cz, stable@vger.kernel.org,
	peifeiyue@huawei.com, linux-kernel@vger.kernel.org,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com, x86@kernel.org,
	morgan.wang@huawei.com, sasha.levin@oracle.com
Subject: Re: [PATCH v2 00/17] [request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely
Date: Mon, 29 Jun 2015 16:56:12 -0700
Message-ID: <20150629235612.GB29763@kroah.com>
In-Reply-To: <555D370F.2070501@huawei.com>

On Thu, May 21, 2015 at 09:38:23AM +0800, long.wanglong wrote:
> On 2015/5/20 21:22, Petr Mladek wrote:
> > On Tue 2015-05-19 14:57:46, Petr Mladek wrote:
> >> On Tue 2015-05-19 09:08:45, Wang Long wrote:
> >>> This is my backport patch series (for 3.10) to fix the following problem:
> >>> "
> >>> When trigger_all_cpu_backtrace() is called on x86, it will trigger an
> >>> NMI on each CPU and call show_regs(). But this can lead to a hard lock
> >>> up if the NMI comes in on another printk().
> >>> "
> >>> The solution is described in commit "a9edc88093287183ac934be44f295f183b2c62dd":
> >>> when the NMI triggers, it switches the printk routine for that CPU to call
> >>> an NMI-safe printk function that records the printk in a per_cpu seq_buf
> >>> descriptor. After all NMIs have finished recording their data, the
> >>> trace_seqs are printed in a safe context.
> >>>
> >>> The solution uses the "switch printk routine" and "seq_buf" infrastructure,
> >>> but 3.10 stable has neither of them.
> >>>
> >>> Patches 1-13 backport the "seq_buf" infrastructure. In detail, patches 1, 2
> >>> and 6 backport only the "seq_buf"-related code.
> >>>
> >>> Patches 14-15 backport the "switch printk routine" mechanism.
> >>>
> >>> Patches 16-17 are the patches that print all CPU stacks from NMI safely.
> >>>
> >>> As discussed in https://lkml.org/lkml/2015/5/13/497, in 3.10 stable this is
> >>> the only way to solve the problem, even though the backport is fairly large.
> >>>
> >>> v1 -> v2:
> >>>  * fix the indentation error.
> >>>  * rebase on 3.10.79
> >>>
> >>> Any thoughts?
> >>
> >> Please, wait with the integration. I am testing it with a storm of
> >> sysrq requests:
> >>
> >>     $> while true ; do echo l >/proc/sysrq-trigger ; done
> >>
> >> with iptables enabled:
> >>
> >>     $> iptables -A INPUT -j LOG --log-prefix "incomming packet:"
> >>
> >> and a storm of pings from another machine:
> >>
> >>     $> ping -f <patched-host>
> >>
> >>
> >> The machine somehow freezes. It does not make sense. I am trying to investigate.
> > 
> > OK, it seems that the machine freezes because there are still a few
> > messages printed in the NMI context, e.g.:
> > 
> > [ 3080.286277] Uhhuh. NMI received for unknown reason 3d on CPU 12.
> > [ 3637.939276] Uhhuh. NMI received for unknown reason 2d on CPU 13.
> > 
> > I am not exactly sure why I get them on the test machine. But I get
> > such messages from time to time when hammering it with pings and
> > sysrq-l requests.
> > 
> > I modified vprintk_emit() to do raw_spin_trylock(&logbuf_lock)
> > and to not try to lock the console in NMI context. The trylock fails
> > from time to time, but the machine no longer freezes.
> > 
> > I am going to clean up the vprintk_emit() modification and send it for
> > review.
> > 
> > Anyway, this patch set seems to work as expected. It heavily reduces
> > the risk of NMI/printk-related deadlocks => it is worth having.
> > 
> > Feel free to use the following for the whole patchset (backport):
> > 
> > Reviewed-by: Petr Mladek <pmladek@suse.cz>
> > Tested-by: Petr Mladek <pmladek@suse.cz>
> 
> Hi Greg,
> 
> This patch set is the only way to solve the NMI/printk-related deadlock problems.
> Could you please include it in 3.10 stable?
> 
> Although it is a fair amount of code, most of it is "seq_buf" infrastructure and
> it does not affect other parts of the kernel.

Yeah, but this is way too much for a -stable kernel.  I suggest that if
a user has this problem, they move to 3.14 or a newer kernel, which has
this fixed.  There are too many changes here for me to be comfortable
accepting them into a -stable kernel, sorry.

greg k-h
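
For readers unfamiliar with the approach quoted above (commit
a9edc88093287183ac934be44f295f183b2c62dd), the idea of switching a CPU's
print routine during an NMI, recording into a per-CPU buffer, and printing
that buffer later from a safe context can be sketched in userspace C. This
is only an illustration under stated assumptions, not the kernel code: the
names cpu_buf, cpu_printf, nmi_enter_sim, nmi_exit_sim and flush_cpu_buf are
hypothetical, and a plain array indexed by CPU number stands in for real
per-CPU variables and for seq_buf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS  4
#define BUF_SIZE 256

/* Hypothetical stand-in for seq_buf: a fixed buffer plus a fill level. */
struct cpu_buf {
	char   data[BUF_SIZE];
	size_t len;
};

typedef int (*print_fn)(int cpu, const char *fmt, va_list args);

static struct cpu_buf nmi_bufs[NR_CPUS];
static print_fn print_func[NR_CPUS];	/* NULL selects the default routine */

/* Default path: write straight to the "console" (stdout in this sketch). */
static int default_print(int cpu, const char *fmt, va_list args)
{
	(void)cpu;
	return vprintf(fmt, args);
}

/* NMI-safe path: append to this CPU's own buffer only, taking no locks. */
static int nmi_print(int cpu, const char *fmt, va_list args)
{
	struct cpu_buf *b = &nmi_bufs[cpu];
	size_t room = BUF_SIZE - b->len;
	int n;

	if (room <= 1)
		return 0;		/* buffer full: drop the message */
	n = vsnprintf(b->data + b->len, room, fmt, args);
	if (n < 0)
		return 0;
	if ((size_t)n >= room)
		n = (int)(room - 1);	/* output was truncated */
	b->len += (size_t)n;
	return n;
}

/* Entry point: dispatch to whichever routine is installed for this CPU. */
static int cpu_printf(int cpu, const char *fmt, ...)
{
	va_list args;
	print_fn fn;
	int n;

	assert(cpu >= 0 && cpu < NR_CPUS);
	fn = print_func[cpu] ? print_func[cpu] : default_print;
	va_start(args, fmt);
	n = fn(cpu, fmt, args);
	va_end(args);
	return n;
}

/* Simulated NMI entry and exit: divert, then restore, the print routine. */
static void nmi_enter_sim(int cpu) { print_func[cpu] = nmi_print; }
static void nmi_exit_sim(int cpu)  { print_func[cpu] = NULL; }

/* Safe context: emit what was recorded and reset the buffer. */
static size_t flush_cpu_buf(int cpu, FILE *out)
{
	struct cpu_buf *b = &nmi_bufs[cpu];
	size_t n = b->len;

	fwrite(b->data, 1, n, out);
	b->len = 0;
	return n;
}
```

In this sketch a caller would bracket the stack dump with nmi_enter_sim()
and nmi_exit_sim() on each CPU, and only afterwards call flush_cpu_buf()
from ordinary context, so nothing inside the simulated NMI touches a lock.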
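
The vprintk_emit() change Petr describes in the quoted text (try the log
buffer lock in NMI context and give up instead of spinning) is not shown in
this thread. A rough userspace sketch of that pattern, assuming a C11
atomic_flag as a stand-in for the raw spinlock and using the hypothetical
names log_emit, log_trylock and dropped, might look like this:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static atomic_flag  log_lock = ATOMIC_FLAG_INIT;	/* stand-in for logbuf_lock */
static char         log_buf[512];
static size_t       log_len;
static atomic_ulong dropped;	/* messages skipped because the lock was busy */

/* Returns true if the lock was free and is now held by the caller. */
static bool log_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&log_lock, memory_order_acquire);
}

/* Normal context may spin until the lock is available. */
static void log_lock_spin(void)
{
	while (!log_trylock())
		;
}

static void log_unlock(void)
{
	atomic_flag_clear_explicit(&log_lock, memory_order_release);
}

/*
 * Append msg to the log. In simulated NMI context never spin: if the
 * interrupted code already holds the lock, spinning would deadlock, so
 * the message is counted as dropped and the function returns false.
 */
static bool log_emit(const char *msg, bool in_nmi)
{
	size_t n = strlen(msg);

	if (in_nmi) {
		if (!log_trylock()) {
			atomic_fetch_add(&dropped, 1);
			return false;
		}
	} else {
		log_lock_spin();
	}

	if (n > sizeof(log_buf) - log_len)
		n = sizeof(log_buf) - log_len;	/* clamp to remaining space */
	memcpy(log_buf + log_len, msg, n);
	log_len += n;
	log_unlock();
	return true;
}
```

The trade-off matches what the quoted message reports: the trylock fails
from time to time, so some messages are lost, but the caller never hangs
waiting for a lock held by the context it interrupted.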

Thread overview: 22+ messages
2015-05-19  9:08 [PATCH v2 00/17] [request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely Wang Long
2015-05-19  9:08 ` [PATCH v2 01/17] tracing: Create seq_buf layer in trace_seq Wang Long
2015-05-19  9:08 ` [PATCH v2 02/17] tracing: Convert seq_buf_path() to be like seq_path() Wang Long
2015-05-19  9:08 ` [PATCH v2 03/17] tracing: Convert seq_buf fields to be like seq_file fields Wang Long
2015-05-19  9:08 ` [PATCH v2 04/17] tracing: Add a seq_buf_clear() helper and clear len and readpos in init Wang Long
2015-05-19  9:08 ` [PATCH v2 05/17] seq_buf: Create seq_buf_used() to find out how much was written Wang Long
2015-05-19  9:08 ` [PATCH v2 06/17] tracing: Use trace_seq_used() and seq_buf_used() instead of len Wang Long
2015-05-19  9:08 ` [PATCH v2 07/17] seq_buf: Add seq_buf_can_fit() helper function Wang Long
2015-05-19  9:08 ` [PATCH v2 08/17] tracing: Have seq_buf use full buffer Wang Long
2015-05-19  9:08 ` [PATCH v2 09/17] tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions Wang Long
2015-05-19  9:08 ` [PATCH v2 10/17] seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF Wang Long
2015-05-19  9:08 ` [PATCH v2 11/17] seq_buf: Move the seq_buf code to lib/ Wang Long
2015-05-19  9:08 ` [PATCH v2 12/17] seq_buf: Fix seq_buf_vprintf() truncation Wang Long
2015-05-19  9:08 ` [PATCH v2 13/17] seq_buf: Fix seq_buf_bprintf() truncation Wang Long
2015-05-19  9:08 ` [PATCH v2 14/17] printk: Add per_cpu printk func to allow printk to be diverted Wang Long
2015-05-19  9:09 ` [PATCH v2 15/17] printk/percpu: Define printk_func when printk is not defined Wang Long
2015-05-19  9:09 ` [PATCH v2 16/17] x86/nmi: Perform a safe NMI stack trace on all CPUs Wang Long
2015-05-19  9:09 ` [PATCH v2 17/17] x86/nmi: Fix use of unallocated cpumask_var_t Wang Long
2015-05-19 12:57 ` [PATCH v2 00/17] [request for stable 3.10 inclusion] x86/nmi: Print all cpu stacks from NMI safely Petr Mladek
2015-05-20 13:22   ` Petr Mladek
2015-05-21  1:38     ` long.wanglong
2015-06-29 23:56       ` Greg KH [this message]
