linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Thompson <daniel.thompson@linaro.org>
To: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, Aaron Tomlin <atomlin@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Lezcano <daniel.lezcano@linaro.org>
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus
Date: Mon, 7 Mar 2016 15:26:37 +0700	[thread overview]
Message-ID: <56DD3B3D.1010404@linaro.org> (raw)
In-Reply-To: <56D5BCE6.3010300@mellanox.com>

On 01/03/16 23:01, Chris Metcalf wrote:
> (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
>
> On 03/01/2016 09:23 AM, Daniel Thompson wrote:
>> On 29/02/16 21:40, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, and most of them are idle,
>>> the output is a little overwhelming and very uninformative. Suppress
>>> messages for cpus that are idling when they are interrupted and
>>> just emit one line, "NMI backtrace for N skipped: idle".
>>
>> I can see this makes the logs more attractive but this is code for
>> emergency situations.
>>
>> The idle task is responsible for certain power management activities.
>> How can you be sure the system is wedged because of bugs in that code?
>
> It's a fair point, but as core count increases, you really run the risk
> of losing the valuable data in a sea of data that isn't.  For example,
> for the architecture I maintain, we have the TILE-Gx72, which is a
> 72-core chip.  If each core's register dump and backtrace is 40 lines,
> we're up to around 3,000 lines of console output. Filtering that down by
> a factor of 10x or more (if only a handful of cores happen to be active,
> which is not uncommon) is a substantial usability improvement.

No objections to your use case. The output feels very verbose even with 
"only" eight cores.


> That said, it's true that the original solution I offered (examining
> just is_idle_task() plus interrupt nesting) is imprecise.  It is
> relatively straightforward to add a bit of per-cpu state that is set at
> the same moment we currently do stop/start_critical_timings(), which
> would indicate much more specifically that the cpu was running the
> idling code itself, and not anything more complex.  In that case if the
> flag was set, you would know you were either sitting on a
> processor-specific idle instruction in arch_cpu_idle(), or else polling
> one or two memory locations in a tight loop in cpu_idle_poll(), which
> presumably would offer sufficient precision to feel safe.
>
> Here's an alternative version of the patch which incorporates this
> idea.  Do you think this is preferable?  Thanks!

I prefer the approach taken by the new patch although I think the 
implementation might be buggy...


> commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
> Author: Chris Metcalf <cmetcalf@ezchip.com>
> Date:   Mon Feb 29 11:56:32 2016 -0500
>
>      nmi_backtrace: generate one-line reports for idle cpus
>      When doing an nmi backtrace of many cores, and most of them are idle,
>      the output is a little overwhelming and very uninformative.  Suppress
>      messages for cpus that are idling when they are interrupted and
>      just emit one line, "NMI backtrace for N skipped: idle".
>      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 786ad32631a6..b8c3c4cf88ad 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct
> cpuidle_driver *drv,
>   /* kernel/sched/idle.c */
>   extern void sched_idle_set_state(struct cpuidle_state *idle_state);
>   extern void default_idle_call(void);
> +extern bool in_cpu_idle(void);
>
>   #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
>   void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
> atomic_t *a);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 544a7133cbd1..9aff315f278b 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>   __setup("hlt", cpu_idle_nopoll_setup);
>   #endif
>
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu was in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> +    return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running 
an interrupt handler. Interrupts are not masked at the point we set 
cpu_idling to false meaning we can easily be preempted before we clear 
the flag.


> +}
> +
>   static inline int cpu_idle_poll(void)
>   {
>       rcu_idle_enter();
>       trace_cpu_idle_rcuidle(0, smp_processor_id());
>       local_irq_enable();
>       stop_critical_timings();
> +    this_cpu_write(cpu_idling, true);
>       while (!tif_need_resched() &&
>           (cpu_idle_force_poll || tick_check_broadcast_expired()))
>           cpu_relax();
> +    this_cpu_write(cpu_idling, false);
>       start_critical_timings();
>       trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>       rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
>           local_irq_enable();
>       } else {
>           stop_critical_timings();
> +        this_cpu_write(cpu_idling, true);
>           arch_cpu_idle();
> +        this_cpu_write(cpu_idling, false);
>           start_critical_timings();
>       }
>   }
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index db63ac75eba0..75b5eacaa5d3 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,6 +17,7 @@
>   #include <linux/kprobes.h>
>   #include <linux/nmi.h>
>   #include <linux/seq_buf.h>
> +#include <linux/cpuidle.h>
>
>   #ifdef arch_trigger_cpumask_backtrace
>   /* For reliability, we're prepared to waste bits here. */
> @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>
>           /* Replace printk to write into the NMI seq */
>           this_cpu_write(printk_func, nmi_vprintk);
> -        pr_warn("NMI backtrace for cpu %d\n", cpu);
> -        if (regs)
> -            show_regs(regs);
> -        else
> -            dump_stack();
> +        if (in_cpu_idle()) {
> +            pr_warn("NMI backtrace for cpu %d skipped: idle\n",
> +                cpu);
> +        } else {
> +            pr_warn("NMI backtrace for cpu %d\n", cpu);
> +            if (regs)
> +                show_regs(regs);
> +            else
> +                dump_stack();
> +        }
>           this_cpu_write(printk_func, printk_func_save);
>
>           cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>

  reply	other threads:[~2016-03-07  8:26 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-29 21:40 [PATCH 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-02-29 21:40 ` [PATCH 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-02-29 21:40 ` [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-01 14:23   ` Daniel Thompson
2016-03-01 16:01     ` Chris Metcalf
2016-03-07  8:26       ` Daniel Thompson [this message]
2016-03-07 17:05         ` Chris Metcalf
2016-03-07  9:48       ` Peter Zijlstra
2016-03-07 17:38         ` Chris Metcalf
2016-03-07 20:43           ` Peter Zijlstra
2016-03-16 17:02             ` [PATCH v2 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-17 19:36                 ` Peter Zijlstra
2016-03-17 22:31                   ` Chris Metcalf
2016-03-17 22:38                     ` Peter Zijlstra
2016-03-17 22:41                       ` Chris Metcalf
2016-03-17 23:14                         ` Peter Zijlstra
2016-03-17 22:55                   ` Paul E. McKenney
2016-03-17 23:09                     ` Peter Zijlstra
2016-03-17 23:11                     ` Peter Zijlstra
2016-03-18  0:28                       ` Paul E. McKenney
2016-03-18  0:17                     ` Chris Metcalf
2016-03-18  0:33                       ` Paul E. McKenney
2016-03-18  9:40                         ` Daniel Thompson
2016-03-18 23:54                           ` Paul E. McKenney
2016-03-16 17:02               ` [PATCH v2 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-16 18:46                 ` kbuild test robot
2016-03-21 15:38                 ` Peter Zijlstra
2016-03-21 15:46                   ` Chris Metcalf
2016-03-21 15:42                 ` Peter Zijlstra
2016-03-21 16:15                   ` Chris Metcalf
2016-03-21 16:32                     ` Peter Zijlstra
2016-03-21 17:12                       ` Chris Metcalf
2016-03-21 17:17                         ` Peter Zijlstra
2016-03-21 16:48                 ` Peter Zijlstra
2016-03-21 21:49                 ` Peter Zijlstra
2016-03-22 17:19                   ` [PATCH v3 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-22 17:30                       ` Peter Zijlstra
2016-03-22 22:28                         ` Rafael J. Wysocki
2016-03-22 22:31                       ` Rafael J. Wysocki
2016-03-22 22:45                         ` Peter Zijlstra
2016-03-23  0:50                           ` Rafael J. Wysocki
2016-03-23  7:53                             ` Peter Zijlstra
2016-03-30 17:16                             ` [PATCH v4 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-02-29 21:40 ` [PATCH 3/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-02-29 21:40 ` [PATCH 4/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-01  0:49 ` [PATCH 0/4] improvements to the nmi_backtrace code Andrew Morton
2016-03-01 10:01   ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56DD3B3D.1010404@linaro.org \
    --to=daniel.thompson@linaro.org \
    --cc=akpm@osdl.org \
    --cc=atomlin@redhat.com \
    --cc=cmetcalf@mellanox.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).