From: Petr Mladek <pmladek@suse.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Russell King <rmk+kernel@arm.linux.org.uk>,
	Daniel Thompson <daniel.thompson@linaro.org>,
	Jiri Kosina <jkosina@suse.com>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Chris Metcalf <cmetcalf@ezchip.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	adi-buildroot-devel@lists.sourceforge.net,
	linux-cris-kernel@axis.com, linux-mips@linux-mips.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
	Jan Kara <jack@suse.cz>, Ralf Baechle <ralf@linux-mips.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH v5 1/4] printk/nmi: generic solution for safe printk in NMI
Date: Thu, 27 Apr 2017 17:28:07 +0200	[thread overview]
Message-ID: <20170427152807.GY3452@pathway.suse.cz> (raw)
In-Reply-To: <20170427103118.56351d30@gandalf.local.home>

On Thu 2017-04-27 10:31:18, Steven Rostedt wrote:
> On Thu, 27 Apr 2017 15:38:19 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > > by the way,
> > > does this `nmi_print_seq' bypass even fix anything for Steven?  
> > 
> > I think that this is the most important question.
> > 
> > Steven, does the patch from
> > https://lkml.kernel.org/r/20170420131154.GL3452@pathway.suse.cz
> > help you to see the debug messages, please?
> 
> You'll have to wait for a bit. The box that I was debugging takes 45
> minutes to reboot. And I don't have much more time to play on it before
> I have to give it back. I already found the bug I was looking for and
> I'm trying not to crash it again (due to the huge bring up time).

I see.

> When I get a chance, I'll see if I can insert a trigger to crash the
> kernel from NMI on another box and see if this patch helps.

I actually tested it here using this hack:

diff --cc lib/nmi_backtrace.c
index d531f85c0c9b,0bc0a3535a8a..000000000000
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@@ -89,8 -90,7 +90,9 @@@ bool nmi_cpu_backtrace(struct pt_regs *
        int cpu = smp_processor_id();
  
        if (cpumask_test_cpu(cpu, to_cpumask(backtrace_mask))) {
 +              if (in_nmi())
 +                      panic("Simulating panic in NMI\n");
+               arch_spin_lock(&lock);
                if (regs && cpu_in_idle(instruction_pointer(regs))) {
                        pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
                                cpu, instruction_pointer(regs));

and triggered it with:

   echo l > /proc/sysrq-trigger

The patch really helped: I could see many more (actually all) of the
messages from the ftrace buffers in NMI mode.

But the test is a bit artificial. The patch might not help when there
is heavy printk() activity on the system at the time panic() is
triggered. We might wrongly fall back to the small per-CPU buffer when
logbuf_lock is being tested here while another CPU holds it at the same
time. This means that it will not always help.
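The failure mode above can be modelled in a few lines of userspace C. This is a simplified sketch, not the code from the patch: `log_lock`, `main_log`, `nmi_seq`, `nmi_printk()` and `run_scenario()` are hypothetical stand-ins for logbuf_lock, the main log buffer, the per-CPU nmi_print_seq buffer and printk() in NMI context. The buffer sizes are made up, "another CPU" holding the lock is simulated by simply taking the lock before printing, and the sketch uses a trylock rather than a separate test of the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes, chosen only to make the effect visible. */
#define MAIN_LOG_SIZE 4096
#define NMI_SEQ_SIZE 64

/* Hypothetical stand-in for logbuf_lock: a test-and-set lock. */
static atomic_flag log_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the main log buffer. */
static char main_log[MAIN_LOG_SIZE];
static size_t main_len;

/* Hypothetical stand-in for one CPU's small NMI buffer. */
static char nmi_seq[NMI_SEQ_SIZE];
static size_t nmi_len;
static size_t nmi_dropped;	/* bytes that did not fit anywhere */

static bool log_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&log_lock, memory_order_acquire);
}

static void log_unlock(void)
{
	atomic_flag_clear_explicit(&log_lock, memory_order_release);
}

/* Append msg only if it fits completely; return false otherwise. */
static bool append(char *buf, size_t cap, size_t *len, const char *msg)
{
	size_t n = strlen(msg);

	if (n > cap - *len)
		return false;
	memcpy(buf + *len, msg, n);
	*len += n;
	assert(*len <= cap);
	return true;
}

/* Hypothetical NMI-context print: it never spins on the lock. */
static void nmi_printk(const char *msg)
{
	if (log_trylock()) {
		/* Lock was free: store directly into the big main buffer. */
		append(main_log, MAIN_LOG_SIZE, &main_len, msg);
		log_unlock();
		return;
	}
	/* Lock busy (held by "another CPU"): use the small buffer. */
	if (!append(nmi_seq, NMI_SEQ_SIZE, &nmi_len, msg))
		nmi_dropped += strlen(msg);
}

/* Print count 20-byte lines, optionally while the lock is held elsewhere. */
static void run_scenario(int count, bool other_cpu_holds_lock)
{
	char line[32];

	main_len = nmi_len = nmi_dropped = 0;
	if (other_cpu_holds_lock)
		log_trylock();	/* lock is free here, so this takes it */
	for (int i = 0; i < count; i++) {
		snprintf(line, sizeof(line), "ftrace dump line %02d\n", i);
		nmi_printk(line);
	}
	if (other_cpu_holds_lock)
		log_unlock();
}
```

With these sizes, run_scenario(10, false) stores all 200 bytes in main_log, while run_scenario(10, true) keeps only the first 60 bytes in nmi_seq and counts the remaining 140 bytes as dropped. The sketch does not model the later flush of the per-CPU buffer; it only shows why the outcome depends on whether the lock happens to be busy at that moment.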

I personally think that the patch might be good enough. I am not sure
that a perfect (more complex) solution is worth it.

Best Regards,
Petr