All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Thompson <daniel.thompson@linaro.org>
To: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@arm.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, Aaron Tomlin <atomlin@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Daniel Lezcano <daniel.lezcano@linaro.org>
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus
Date: Mon, 7 Mar 2016 15:26:37 +0700	[thread overview]
Message-ID: <56DD3B3D.1010404@linaro.org> (raw)
In-Reply-To: <56D5BCE6.3010300@mellanox.com>

On 01/03/16 23:01, Chris Metcalf wrote:
> (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
>
> On 03/01/2016 09:23 AM, Daniel Thompson wrote:
>> On 29/02/16 21:40, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, and most of them are idle,
>>> the output is a little overwhelming and very uninformative. Suppress
>>> messages for cpus that are idling when they are interrupted and
>>> just emit one line, "NMI backtrace for N skipped: idle".
>>
>> I can see this makes the logs more attractive but this is code for
>> emergency situations.
>>
>> The idle task is responsible for certain power management activities.
>> How can you be sure the system is wedged because of bugs in that code?
>
> It's a fair point, but as core count increases, you really run the risk
> of losing the valuable data in a sea of data that isn't.  For example,
> for the architecture I maintain, we have the TILE-Gx72, which is a
> 72-core chip.  If each core's register dump and backtrace is 40 lines,
> we're up to around 3,000 lines of console output. Filtering that down by
> a factor of 10x or more (if only a handful of cores happen to be active,
> which is not uncommon) is a substantial usability improvement.

No objections to your use case. The output feels very verbose even with 
"only" eight cores.


> That said, it's true that the original solution I offered (examining
> just is_idle_task() plus interrupt nesting) is imprecise.  It is
> relatively straightforward to add a bit of per-cpu state that is set at
> the same moment we currently do stop/start_critical_timings(), which
> would indicate much more specifically that the cpu was running the
> idling code itself, and not anything more complex.  In that case if the
> flag was set, you would know you were either sitting on a
> processor-specific idle instruction in arch_cpu_idle(), or else polling
> one or two memory locations in a tight loop in cpu_idle_poll(), which
> presumably would offer sufficient precision to feel safe.
>
> Here's an alternative version of the patch which incorporates this
> idea.  Do you think this is preferable?  Thanks!

I prefer the approach taken by the new patch although I think the 
implementation might be buggy...


> commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
> Author: Chris Metcalf <cmetcalf@ezchip.com>
> Date:   Mon Feb 29 11:56:32 2016 -0500
>
>      nmi_backtrace: generate one-line reports for idle cpus
>      When doing an nmi backtrace of many cores, and most of them are idle,
>      the output is a little overwhelming and very uninformative.  Suppress
>      messages for cpus that are idling when they are interrupted and
>      just emit one line, "NMI backtrace for N skipped: idle".
>      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 786ad32631a6..b8c3c4cf88ad 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct
> cpuidle_driver *drv,
>   /* kernel/sched/idle.c */
>   extern void sched_idle_set_state(struct cpuidle_state *idle_state);
>   extern void default_idle_call(void);
> +extern bool in_cpu_idle(void);
>
>   #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
>   void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
> atomic_t *a);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 544a7133cbd1..9aff315f278b 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>   __setup("hlt", cpu_idle_nopoll_setup);
>   #endif
>
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu was in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> +    return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running 
an interrupt handler. Interrupts are not masked at the point we set 
cpu_idling to false meaning we can easily be preempted before we clear 
the flag.


> +}
> +
>   static inline int cpu_idle_poll(void)
>   {
>       rcu_idle_enter();
>       trace_cpu_idle_rcuidle(0, smp_processor_id());
>       local_irq_enable();
>       stop_critical_timings();
> +    this_cpu_write(cpu_idling, true);
>       while (!tif_need_resched() &&
>           (cpu_idle_force_poll || tick_check_broadcast_expired()))
>           cpu_relax();
> +    this_cpu_write(cpu_idling, false);
>       start_critical_timings();
>       trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>       rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
>           local_irq_enable();
>       } else {
>           stop_critical_timings();
> +        this_cpu_write(cpu_idling, true);
>           arch_cpu_idle();
> +        this_cpu_write(cpu_idling, false);
>           start_critical_timings();
>       }
>   }
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index db63ac75eba0..75b5eacaa5d3 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,6 +17,7 @@
>   #include <linux/kprobes.h>
>   #include <linux/nmi.h>
>   #include <linux/seq_buf.h>
> +#include <linux/cpuidle.h>
>
>   #ifdef arch_trigger_cpumask_backtrace
>   /* For reliability, we're prepared to waste bits here. */
> @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>
>           /* Replace printk to write into the NMI seq */
>           this_cpu_write(printk_func, nmi_vprintk);
> -        pr_warn("NMI backtrace for cpu %d\n", cpu);
> -        if (regs)
> -            show_regs(regs);
> -        else
> -            dump_stack();
> +        if (in_cpu_idle()) {
> +            pr_warn("NMI backtrace for cpu %d skipped: idle\n",
> +                cpu);
> +        } else {
> +            pr_warn("NMI backtrace for cpu %d\n", cpu);
> +            if (regs)
> +                show_regs(regs);
> +            else
> +                dump_stack();
> +        }
>           this_cpu_write(printk_func, printk_func_save);
>
>           cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>

  reply	other threads:[~2016-03-07  8:26 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-29 21:40 [PATCH 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-02-29 21:40 ` Chris Metcalf
2016-02-29 21:40 ` [PATCH 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-02-29 21:40   ` Chris Metcalf
2016-02-29 21:40 ` [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-01 14:23   ` Daniel Thompson
2016-03-01 16:01     ` Chris Metcalf
2016-03-07  8:26       ` Daniel Thompson [this message]
2016-03-07 17:05         ` Chris Metcalf
2016-03-07  9:48       ` Peter Zijlstra
2016-03-07 17:38         ` Chris Metcalf
2016-03-07 20:43           ` Peter Zijlstra
2016-03-16 17:02             ` [PATCH v2 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-16 17:02               ` Chris Metcalf
2016-03-16 17:02               ` Chris Metcalf
2016-03-16 17:02               ` Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-16 17:02                 ` Chris Metcalf
2016-03-17 19:36                 ` Peter Zijlstra
2016-03-17 19:36                   ` Peter Zijlstra
2016-03-17 22:31                   ` Chris Metcalf
2016-03-17 22:31                     ` Chris Metcalf
2016-03-17 22:38                     ` Peter Zijlstra
2016-03-17 22:38                       ` Peter Zijlstra
2016-03-17 22:41                       ` Chris Metcalf
2016-03-17 22:41                         ` Chris Metcalf
2016-03-17 23:14                         ` Peter Zijlstra
2016-03-17 23:14                           ` Peter Zijlstra
2016-03-17 22:55                   ` Paul E. McKenney
2016-03-17 22:55                     ` Paul E. McKenney
2016-03-17 23:09                     ` Peter Zijlstra
2016-03-17 23:09                       ` Peter Zijlstra
2016-03-17 23:11                     ` Peter Zijlstra
2016-03-17 23:11                       ` Peter Zijlstra
2016-03-18  0:28                       ` Paul E. McKenney
2016-03-18  0:28                         ` Paul E. McKenney
2016-03-18  0:17                     ` Chris Metcalf
2016-03-18  0:17                       ` Chris Metcalf
2016-03-18  0:33                       ` Paul E. McKenney
2016-03-18  0:33                         ` Paul E. McKenney
2016-03-18  9:40                         ` Daniel Thompson
2016-03-18  9:40                           ` Daniel Thompson
2016-03-18 23:54                           ` Paul E. McKenney
2016-03-18 23:54                             ` Paul E. McKenney
2016-03-16 17:02               ` [PATCH v2 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-16 17:02                 ` Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-16 17:02               ` [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-16 17:02                 ` Chris Metcalf
2016-03-16 17:02                 ` Chris Metcalf
2016-03-16 18:46                 ` kbuild test robot
2016-03-16 18:46                   ` kbuild test robot
2016-03-16 18:46                   ` kbuild test robot
2016-03-16 18:46                   ` kbuild test robot
2016-03-21 15:38                 ` Peter Zijlstra
2016-03-21 15:38                   ` Peter Zijlstra
2016-03-21 15:38                   ` Peter Zijlstra
2016-03-21 15:46                   ` Chris Metcalf
2016-03-21 15:46                     ` Chris Metcalf
2016-03-21 15:46                     ` Chris Metcalf
2016-03-21 15:46                     ` Chris Metcalf
2016-03-21 15:42                 ` Peter Zijlstra
2016-03-21 15:42                   ` Peter Zijlstra
2016-03-21 16:15                   ` Chris Metcalf
2016-03-21 16:15                     ` Chris Metcalf
2016-03-21 16:15                     ` Chris Metcalf
2016-03-21 16:32                     ` Peter Zijlstra
2016-03-21 16:32                       ` Peter Zijlstra
2016-03-21 17:12                       ` Chris Metcalf
2016-03-21 17:12                         ` Chris Metcalf
2016-03-21 17:12                         ` Chris Metcalf
2016-03-21 17:12                         ` Chris Metcalf
2016-03-21 17:17                         ` Peter Zijlstra
2016-03-21 17:17                           ` Peter Zijlstra
2016-03-21 16:48                 ` Peter Zijlstra
2016-03-21 16:48                   ` Peter Zijlstra
2016-03-21 21:49                 ` Peter Zijlstra
2016-03-21 21:49                   ` Peter Zijlstra
2016-03-22 17:19                   ` [PATCH v3 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-22 17:19                     ` Chris Metcalf
2016-03-22 17:19                     ` Chris Metcalf
2016-03-22 17:19                     ` Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-22 17:19                       ` Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-22 17:19                       ` Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-22 17:19                     ` [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-22 17:19                       ` Chris Metcalf
2016-03-22 17:19                       ` Chris Metcalf
2016-03-22 17:30                       ` Peter Zijlstra
2016-03-22 17:30                         ` Peter Zijlstra
2016-03-22 22:28                         ` Rafael J. Wysocki
2016-03-22 22:28                           ` Rafael J. Wysocki
2016-03-22 22:31                       ` Rafael J. Wysocki
2016-03-22 22:31                         ` Rafael J. Wysocki
2016-03-22 22:45                         ` Peter Zijlstra
2016-03-22 22:45                           ` Peter Zijlstra
2016-03-23  0:50                           ` Rafael J. Wysocki
2016-03-23  0:50                             ` Rafael J. Wysocki
2016-03-23  7:53                             ` Peter Zijlstra
2016-03-23  7:53                               ` Peter Zijlstra
2016-03-30 17:16                             ` [PATCH v4 0/4] improvements to the nmi_backtrace code Chris Metcalf
2016-03-30 17:16                               ` Chris Metcalf
2016-03-30 17:16                               ` Chris Metcalf
2016-03-30 17:16                               ` Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods Chris Metcalf
2016-03-30 17:16                                 ` Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-03-30 17:16                                 ` Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 3/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-30 17:16                               ` [PATCH v4 4/4] nmi_backtrace: generate one-line reports for idle cpus Chris Metcalf
2016-03-30 17:16                                 ` Chris Metcalf
2016-03-30 17:16                                 ` Chris Metcalf
2016-02-29 21:40 ` [PATCH 3/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI Chris Metcalf
2016-02-29 21:40   ` Chris Metcalf
2016-02-29 21:40 ` [PATCH 4/4] arch/tile: adopt the new nmi_backtrace framework Chris Metcalf
2016-03-01  0:49 ` [PATCH 0/4] improvements to the nmi_backtrace code Andrew Morton
2016-03-01  0:49   ` Andrew Morton
2016-03-01 10:01   ` Petr Mladek
2016-03-01 10:01     ` Petr Mladek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56DD3B3D.1010404@linaro.org \
    --to=daniel.thompson@linaro.org \
    --cc=akpm@osdl.org \
    --cc=atomlin@redhat.com \
    --cc=cmetcalf@mellanox.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.