public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/2 V4] Print traces on softlockup
@ 2014-04-23 20:40 Don Zickus
  2014-04-23 20:40 ` [PATCH 1/2 v4] nmi: Provide the option to issue an NMI back trace to every cpu but current Don Zickus
  2014-04-23 20:40 ` [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection Don Zickus
  0 siblings, 2 replies; 6+ messages in thread
From: Don Zickus @ 2014-04-23 20:40 UTC
  To: LKML; +Cc: akpm, x86, davem, sparclinux, mguzik, Don Zickus

This series adds patches to handle the 'uniprocessor' panic case by sending
NMIs to every cpu except the current one.  Only x86 and sparc are affected.

V4: roll x86 and sparc patches into patch1
    add CONFIG_SMP in various places based on Andrew's suggestion
V3: wrap x86 code with get/put_cpu based on Oleg's suggestions
V2: expand from one patch to 4

Aaron Tomlin (2):
  nmi: Provide the option to issue an NMI back trace to every cpu but
    current
  watchdog: Printing traces for all cpus on lockup detection

 Documentation/kernel-parameters.txt |    5 +++++
 Documentation/sysctl/kernel.txt     |   17 +++++++++++++++++
 arch/sparc/include/asm/irq_64.h     |    2 +-
 arch/sparc/kernel/process_64.c      |   14 +++++++++-----
 arch/x86/include/asm/irq.h          |    2 +-
 arch/x86/kernel/apic/hw_nmi.c       |   17 +++++++++++++----
 include/linux/nmi.h                 |   14 +++++++++++++-
 kernel/sysctl.c                     |   11 +++++++++++
 kernel/watchdog.c                   |   34 ++++++++++++++++++++++++++++++++++
 9 files changed, 104 insertions(+), 12 deletions(-)



* [PATCH 1/2 v4] nmi: Provide the option to issue an NMI back trace to every cpu but current
  2014-04-23 20:40 [PATCH 0/2 V4] Print traces on softlockup Don Zickus
@ 2014-04-23 20:40 ` Don Zickus
  2014-04-23 20:40 ` [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection Don Zickus
  1 sibling, 0 replies; 6+ messages in thread
From: Don Zickus @ 2014-04-23 20:40 UTC
  To: LKML; +Cc: akpm, x86, davem, sparclinux, mguzik, Aaron Tomlin, Don Zickus

From: Aaron Tomlin <atomlin@redhat.com>

Sometimes it is preferable not to use the
trigger_all_cpu_backtrace() routine, in order to
avoid capturing a back trace for the current cpu,
for instance when one has already been captured
recently.

This patch adds a new routine,
trigger_allbutself_cpu_backtrace(), which issues
an NMI to every cpu except the current one and
captures a back trace from each of them.

x86 and sparc are patched to support the new routine.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
[Added stub in #else clause]
[Don't print message in single processor case,
 wrap with get/put_cpu based on Oleg's suggestion]
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
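Note for readers: the target selection done in the x86 hunk below can be
modelled in user space as follows.  This is only a sketch under stated
assumptions: a 64-bit word stands in for the kernel cpumask, and
backtrace_targets() and should_send_nmi() are hypothetical names that do
not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model only: a 64-bit word stands in for the kernel's cpumask.
 * backtrace_targets() is a hypothetical name, not a kernel function.
 */
static uint64_t backtrace_targets(uint64_t online, unsigned int self,
				  bool include_self)
{
	uint64_t targets = online;

	assert(self < 64);	/* limit of this model, not of the kernel */

	/* Mirrors cpumask_clear_cpu(cpu, mask) when include_self is false. */
	if (!include_self)
		targets &= ~(UINT64_C(1) << self);

	return targets;
}

/* An empty target set means no NMI is sent and no message is printed. */
static bool should_send_nmi(uint64_t targets)
{
	return targets != 0;
}
```

With four cpus online (mask 0xF) and the caller on cpu 2, the "all but
self" case yields 0xB.  On a single-cpu system the result is empty, which
is why the message is skipped there.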
 arch/sparc/include/asm/irq_64.h |    2 +-
 arch/sparc/kernel/process_64.c  |   14 +++++++++-----
 arch/x86/include/asm/irq.h      |    2 +-
 arch/x86/kernel/apic/hw_nmi.c   |   17 +++++++++++++----
 include/linux/nmi.h             |   11 ++++++++++-
 5 files changed, 34 insertions(+), 12 deletions(-)
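
Note for readers: include/linux/nmi.h relies on the architecture header
defining arch_trigger_all_cpu_backtrace to itself, so that a plain #ifdef
can detect whether the hook exists.  A minimal user-space sketch of that
idiom is shown here.  The stub body, the nmi_requests counter and
demo_wrapper() are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an architecture that provides the hook. */
static int nmi_requests;	/* counts calls, for illustration only */

static void arch_trigger_all_cpu_backtrace(bool include_self)
{
	(void)include_self;
	nmi_requests++;
}
/* Defining the name to itself makes it visible to the #ifdef below. */
#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace

#ifdef arch_trigger_all_cpu_backtrace
static inline bool trigger_allbutself_cpu_backtrace(void)
{
	arch_trigger_all_cpu_backtrace(false);
	return true;	/* the architecture handled the request */
}
#else
static inline bool trigger_allbutself_cpu_backtrace(void)
{
	return false;	/* no arch support: caller keeps its old behaviour */
}
#endif

/* Usage: the return value tells the caller whether traces were requested. */
static void demo_wrapper(void)
{
	int before = nmi_requests;

	if (trigger_allbutself_cpu_backtrace())
		assert(nmi_requests == before + 1);
	else
		assert(nmi_requests == before);
}
```

Removing the #define line selects the stub variant, which is what
architectures without the hook would see.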

diff --git a/arch/sparc/include/asm/irq_64.h b/arch/sparc/include/asm/irq_64.h
index abf6afe..4f072b9 100644
--- a/arch/sparc/include/asm/irq_64.h
+++ b/arch/sparc/include/asm/irq_64.h
@@ -89,7 +89,7 @@ static inline unsigned long get_softint(void)
 	return retval;
 }
 
-void arch_trigger_all_cpu_backtrace(void);
+void arch_trigger_all_cpu_backtrace(bool);
 #define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
 
 extern void *hardirq_stack[NR_CPUS];
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 32a280e..3d61b98 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -237,7 +237,7 @@ static void __global_reg_poll(struct global_reg_snapshot *gp)
 	}
 }
 
-void arch_trigger_all_cpu_backtrace(void)
+void arch_trigger_all_cpu_backtrace(bool include_self)
 {
 	struct thread_info *tp = current_thread_info();
 	struct pt_regs *regs = get_irq_regs();
@@ -249,15 +249,19 @@ void arch_trigger_all_cpu_backtrace(void)
 
 	spin_lock_irqsave(&global_cpu_snapshot_lock, flags);
 
-	memset(global_cpu_snapshot, 0, sizeof(global_cpu_snapshot));
-
 	this_cpu = raw_smp_processor_id();
 
-	__global_reg_self(tp, regs, this_cpu);
+	memset(global_cpu_snapshot, 0, sizeof(global_cpu_snapshot));
+
+	if (include_self)
+		__global_reg_self(tp, regs, this_cpu);
 
 	smp_fetch_global_regs();
 
 	for_each_online_cpu(cpu) {
+		if (!include_self && cpu == this_cpu)
+			continue;
+
 		struct global_reg_snapshot *gp = &global_cpu_snapshot[cpu].reg;
 
 		__global_reg_poll(gp);
@@ -290,7 +294,7 @@ void arch_trigger_all_cpu_backtrace(void)
 
 static void sysrq_handle_globreg(int key)
 {
-	arch_trigger_all_cpu_backtrace();
+	arch_trigger_all_cpu_backtrace(true);
 }
 
 static struct sysrq_key_op sparc_globalreg_op = {
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index cb6cfcd..a80cbb8 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -43,7 +43,7 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
 extern void init_ISA_irqs(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(void);
+void arch_trigger_all_cpu_backtrace(bool);
 #define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
 #endif
 
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index a698d71..1400d72 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -33,31 +33,40 @@ static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
 /* "in progress" flag of arch_trigger_all_cpu_backtrace */
 static unsigned long backtrace_flag;
 
-void arch_trigger_all_cpu_backtrace(void)
+void arch_trigger_all_cpu_backtrace(bool include_self)
 {
 	int i;
+	int cpu = get_cpu();
 
-	if (test_and_set_bit(0, &backtrace_flag))
+	if (test_and_set_bit(0, &backtrace_flag)) {
 		/*
 		 * If there is already a trigger_all_cpu_backtrace() in progress
 		 * (backtrace_flag == 1), don't output double cpu dump infos.
 		 */
+		put_cpu();
 		return;
+	}
 
 	cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
+	if (!include_self)
+		cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
 
-	printk(KERN_INFO "sending NMI to all CPUs:\n");
-	apic->send_IPI_all(NMI_VECTOR);
+	if (!cpumask_empty(to_cpumask(backtrace_mask))) {
+		pr_info("sending NMI to %s CPUs:\n", (include_self ? "all" : "other"));
+		apic->send_IPI_mask(to_cpumask(backtrace_mask), NMI_VECTOR);
+	}
 
 	/* Wait for up to 10 seconds for all CPUs to do the backtrace */
 	for (i = 0; i < 10 * 1000; i++) {
 		if (cpumask_empty(to_cpumask(backtrace_mask)))
 			break;
 		mdelay(1);
+		touch_softlockup_watchdog();
 	}
 
 	clear_bit(0, &backtrace_flag);
 	smp_mb__after_clear_bit();
+	put_cpu();
 }
 
 static int __kprobes
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 6a45fb5..a17ab63 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -32,15 +32,24 @@ static inline void touch_nmi_watchdog(void)
 #ifdef arch_trigger_all_cpu_backtrace
 static inline bool trigger_all_cpu_backtrace(void)
 {
-	arch_trigger_all_cpu_backtrace();
+	arch_trigger_all_cpu_backtrace(true);
 
 	return true;
 }
+static inline bool trigger_allbutself_cpu_backtrace(void)
+{
+	arch_trigger_all_cpu_backtrace(false);
+	return true;
+}
 #else
 static inline bool trigger_all_cpu_backtrace(void)
 {
 	return false;
 }
+static inline bool trigger_allbutself_cpu_backtrace(void)
+{
+	return false;
+}
 #endif
 
 #ifdef CONFIG_LOCKUP_DETECTOR
-- 
1.7.1



* [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection
  2014-04-23 20:40 [PATCH 0/2 V4] Print traces on softlockup Don Zickus
  2014-04-23 20:40 ` [PATCH 1/2 v4] nmi: Provide the option to issue an NMI back trace to every cpu but current Don Zickus
@ 2014-04-23 20:40 ` Don Zickus
  2014-04-23 21:14   ` Andrew Morton
  1 sibling, 1 reply; 6+ messages in thread
From: Don Zickus @ 2014-04-23 20:40 UTC
  To: LKML; +Cc: akpm, x86, davem, sparclinux, mguzik, Aaron Tomlin, Don Zickus

From: Aaron Tomlin <atomlin@redhat.com>

A 'softlockup' is defined as a bug that causes the kernel to
loop in kernel mode for more than a predefined period of
time, without giving other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu
watchdog task, debug information (including a stack trace)
is sent to the system log.

On some occasions, we have observed that the "victim" rather
than the actual "culprit" (i.e. the owner/holder of the
contended resource) is reported to the user.
Often this information has proven to be insufficient to
assist debugging efforts.

To avoid loss of useful debug information, for architectures
which support NMI, this patch makes it possible to improve
soft lockup reporting. This is accomplished by issuing an
NMI to each cpu to obtain a stack trace.

If NMI is not supported, we fall back to the old method.
A sysctl and a boot-time parameter are available to toggle this
feature.

Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
[added CONFIG_SMP in certain areas]
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
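Note for readers: the watchdog hunk below lets only one cpu at a time act
as the reporter, using test_and_set_bit() on soft_lockup_nmi_warn.  The
same gate can be sketched in user space with a C11 atomic_flag.  This is
an analogue under that assumption, not the kernel code, and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Models soft_lockup_nmi_warn; the helper names are hypothetical. */
static atomic_flag dump_in_progress = ATOMIC_FLAG_INIT;

/* Returns true only for the first caller, like !test_and_set_bit(0, ...). */
static bool try_begin_dump(void)
{
	return !atomic_flag_test_and_set(&dump_in_progress);
}

/* Releases the gate, like clear_bit() followed by a barrier. */
static void end_dump(void)
{
	atomic_flag_clear(&dump_in_progress);
}

/* Usage: the first caller reports, a concurrent second caller backs off. */
static void demo_gate(void)
{
	assert(try_begin_dump());	/* first cpu becomes the reporter */
	assert(!try_begin_dump());	/* second cpu gives up */
	end_dump();
	assert(try_begin_dump());	/* gate is reusable after the dump */
	end_dump();
}
```

A cpu that loses the race marks itself as already warned and returns,
since the winner's NMI back trace will cover it.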
 Documentation/kernel-parameters.txt |    5 +++++
 Documentation/sysctl/kernel.txt     |   17 +++++++++++++++++
 include/linux/nmi.h                 |    3 +++
 kernel/sysctl.c                     |   11 +++++++++++
 kernel/watchdog.c                   |   34 ++++++++++++++++++++++++++++++++++
 5 files changed, 70 insertions(+), 0 deletions(-)
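
Note for readers: the boot parameter handler below normalises its argument
with !!simple_strtol(str, NULL, 0).  A user-space sketch of that
normalisation follows, with strtol() standing in for simple_strtol() (an
assumption of this example) and parse_all_cpu_backtrace() as a
hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * User-space analogue of the __setup() handler: strtol() stands in for the
 * kernel's simple_strtol(), and parse_all_cpu_backtrace() is a hypothetical
 * name. Base 0 accepts decimal, octal (leading 0) and hex (leading 0x).
 */
static int parse_all_cpu_backtrace(const char *str)
{
	assert(str != NULL);
	return !!strtol(str, NULL, 0);	/* any non-zero value becomes 1 */
}
```

So softlockup_all_cpu_backtrace=5 on the command line enables the feature
just like =1 does.  The sysctl is stricter: its table entry uses
proc_dointvec_minmax with bounds of zero and one, so writes outside that
range are rejected rather than normalised.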

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7116fda..80f2a21 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3047,6 +3047,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			[KNL] Should the soft-lockup detector generate panics.
 			Format: <integer>
 
+	softlockup_all_cpu_backtrace=
+			[KNL] Should the soft-lockup detector generate
+			backtraces on all cpus.
+			Format: <integer>
+
 	sonypi.*=	[HW] Sony Programmable I/O Control Device driver
 			See Documentation/laptops/sonypi.txt
 
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index e55124e..b6873b2 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -75,6 +75,7 @@ show up in /proc/sys/kernel:
 - shmall
 - shmmax                      [ sysv ipc ]
 - shmmni
+- softlockup_all_cpu_backtrace
 - stop-a                      [ SPARC only ]
 - sysrq                       ==> Documentation/sysrq.txt
 - tainted
@@ -768,6 +769,22 @@ without users and with a dead originative process will be destroyed.
 
 ==============================================================
 
+softlockup_all_cpu_backtrace:
+
+This value controls the soft lockup detector thread's behavior
+when a soft lockup condition is detected as to whether or not
+to gather further debug information. If enabled, each cpu will
+be issued an NMI and instructed to capture a stack trace.
+
+This feature is only applicable for architectures which support
+NMI.
+
+0: do nothing. This is the default behavior.
+
+1: on detection capture more debug information.
+
+==============================================================
+
 tainted:
 
 Non-zero if the kernel has been tainted.  Numeric values, which
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index a17ab63..961b177 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -57,6 +57,9 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *);
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
+#ifdef CONFIG_SMP
+extern int sysctl_softlockup_all_cpu_backtrace;
+#endif
 struct ctl_table;
 extern int proc_dowatchdog(struct ctl_table *, int ,
 			   void __user *, size_t *, loff_t *);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 49e13e1..caae52b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -854,6 +854,17 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+#ifdef CONFIG_SMP
+	{
+		.procname	= "softlockup_all_cpu_backtrace",
+		.data		= &sysctl_softlockup_all_cpu_backtrace,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif /* CONFIG_SMP */
 	{
 		.procname       = "nmi_watchdog",
 		.data           = &watchdog_user_enabled,
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4431610..d9ad681 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -31,6 +31,7 @@
 
 int watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
+int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
 
@@ -47,6 +48,7 @@ static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 #endif
+static unsigned long soft_lockup_nmi_warn;
 
 /* boot commands */
 /*
@@ -95,6 +97,15 @@ static int __init nosoftlockup_setup(char *str)
 }
 __setup("nosoftlockup", nosoftlockup_setup);
 /*  */
+#ifdef CONFIG_SMP
+static int __init softlockup_all_cpu_backtrace_setup(char *str)
+{
+	sysctl_softlockup_all_cpu_backtrace =
+		!!simple_strtol(str, NULL, 0);
+	return 1;
+}
+__setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
+#endif
 
 /*
  * Hard-lockup warnings should be triggered after just a few seconds. Soft-
@@ -267,6 +278,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
 	struct pt_regs *regs = get_irq_regs();
 	int duration;
+	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
@@ -313,6 +325,17 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		if (__this_cpu_read(soft_watchdog_warn) == true)
 			return HRTIMER_RESTART;
 
+		if (softlockup_all_cpu_backtrace) {
+			/* Prevent multiple soft-lockup reports if one cpu is already
+			 * engaged in dumping cpu back traces
+			 */
+			if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
+				/* Someone else will report us. Let's give up */
+				__this_cpu_write(soft_watchdog_warn, true);
+				return HRTIMER_RESTART;
+			}
+		}
+
 		printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
 			smp_processor_id(), duration,
 			current->comm, task_pid_nr(current));
@@ -323,6 +346,17 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		else
 			dump_stack();
 
+		if (softlockup_all_cpu_backtrace) {
+			/* Avoid generating two back traces for current
+			 * given that one is already made above
+			 */
+			trigger_allbutself_cpu_backtrace();
+
+			clear_bit(0, &soft_lockup_nmi_warn);
+			/* Barrier to sync with other cpus */
+			smp_mb__after_clear_bit();
+		}
+
 		if (softlockup_panic)
 			panic("softlockup: hung tasks");
 		__this_cpu_write(soft_watchdog_warn, true);
-- 
1.7.1



* Re: [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection
  2014-04-23 20:40 ` [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection Don Zickus
@ 2014-04-23 21:14   ` Andrew Morton
  2014-04-24 13:48     ` Don Zickus
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2014-04-23 21:14 UTC
  To: Don Zickus; +Cc: LKML, x86, davem, sparclinux, mguzik, Aaron Tomlin

On Wed, 23 Apr 2014 16:40:05 -0400 Don Zickus <dzickus@redhat.com> wrote:

> From: Aaron Tomlin <atomlin@redhat.com>
> 
> A 'softlockup' is defined as a bug that causes the kernel to
> loop in kernel mode for more than a predefined period of
> time, without giving other tasks a chance to run.
> 
> Currently, upon detection of this condition by the per-cpu
> watchdog task, debug information (including a stack trace)
> is sent to the system log.
> 
> On some occasions, we have observed that the "victim" rather
> than the actual "culprit" (i.e. the owner/holder of the
> contended resource) is reported to the user.
> Often this information has proven to be insufficient to
> assist debugging efforts.
> 
> To avoid loss of useful debug information, for architectures
> which support NMI, this patch makes it possible to improve
> soft lockup reporting. This is accomplished by issuing an
> NMI to each cpu to obtain a stack trace.
> 
> If NMI is not supported, we fall back to the old method.
> A sysctl and a boot-time parameter are available to toggle this
> feature.
> 
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -57,6 +57,9 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *);
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
>  extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
> +#ifdef CONFIG_SMP
> +extern int sysctl_softlockup_all_cpu_backtrace;
> +#endif

The ifdefs aren't really needed here.  If we omit them then error
reporting happens at link time rather than at compile time, but that's
a small price to pay for cleaning up the code.

> +		if (softlockup_all_cpu_backtrace) {
> +			/* Prevent multiple soft-lockup reports if one cpu is already
> +			 * engaged in dumping cpu back traces
> +			 */
> +			if (test_and_set_bit(0, &soft_lockup_nmi_warn)) {
> +				/* Someone else will report us. Let's give up */
> +				__this_cpu_write(soft_watchdog_warn, true);
> +				return HRTIMER_RESTART;
> +			}
> +		}

You missed my suggestion here.

   text    data     bss     dec     hex filename
   1519     524      24    2067     813 kernel/watchdog.o-before
   1471     520      16    2007     7d7 kernel/watchdog.o-after


--- a/include/linux/nmi.h~watchdog-printing-traces-for-all-cpus-on-lockup-detection-fix
+++ a/include/linux/nmi.h
@@ -57,9 +57,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
-#ifdef CONFIG_SMP
 extern int sysctl_softlockup_all_cpu_backtrace;
-#endif
 struct ctl_table;
 extern int proc_dowatchdog(struct ctl_table *, int ,
 			   void __user *, size_t *, loff_t *);
--- a/kernel/watchdog.c~watchdog-printing-traces-for-all-cpus-on-lockup-detection-fix
+++ a/kernel/watchdog.c
@@ -31,7 +31,12 @@
 
 int watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
+#ifdef CONFIG_SMP
 int __read_mostly sysctl_softlockup_all_cpu_backtrace;
+#else
+#define sysctl_softlockup_all_cpu_backtrace 0
+#endif
+
 static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
 
_
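
Note for readers: the effect of the fix above can be sketched in user
space.  When the variable is replaced by a macro that expands to 0, the
test at the use site becomes a constant and the compiler is free to drop
the branch, which is consistent with the smaller text size shown in the
table.  MODEL_SMP is a hypothetical switch standing in for CONFIG_SMP, and
report_soft_lockup() and all_cpu_dumps are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* MODEL_SMP is a hypothetical switch standing in for CONFIG_SMP. */
#ifdef MODEL_SMP
static int sysctl_softlockup_all_cpu_backtrace;
#else
#define sysctl_softlockup_all_cpu_backtrace 0
#endif

static int all_cpu_dumps;	/* counts simulated dumps, illustration only */

/* Returns true if the all-cpu path ran. */
static bool report_soft_lockup(void)
{
	/*
	 * With the macro form this condition is the constant 0, so the
	 * compiler can discard the branch without an #ifdef at the use site.
	 */
	if (sysctl_softlockup_all_cpu_backtrace) {
		all_cpu_dumps++;
		return true;
	}
	return false;
}

/* Usage: with the feature compiled out or disabled, nothing is dumped. */
static void demo_report(void)
{
	assert(!report_soft_lockup());
	assert(all_cpu_dumps == 0);
}
```

The use site stays free of preprocessor conditionals in both
configurations; only the definition differs.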



* Re: [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection
  2014-04-23 21:14   ` Andrew Morton
@ 2014-04-24 13:48     ` Don Zickus
  2014-04-24 13:50       ` Don Zickus
  0 siblings, 1 reply; 6+ messages in thread
From: Don Zickus @ 2014-04-24 13:48 UTC
  To: Andrew Morton; +Cc: LKML, x86, davem, sparclinux, mguzik, Aaron Tomlin

On Wed, Apr 23, 2014 at 02:14:07PM -0700, Andrew Morton wrote:
> You missed my suggestion here.
> 
>    text    data     bss     dec     hex filename
>    1519     524      24    2067     813 kernel/watchdog.o-before
>    1471     520      16    2007     7d7 kernel/watchdog.o-after
> 
> 
> --- a/include/linux/nmi.h~watchdog-printing-traces-for-all-cpus-on-lockup-detection-fix
> +++ a/include/linux/nmi.h
> @@ -57,9 +57,7 @@ int hw_nmi_is_cpu_stuck(struct pt_regs *
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
>  extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
> -#ifdef CONFIG_SMP
>  extern int sysctl_softlockup_all_cpu_backtrace;
> -#endif
>  struct ctl_table;
>  extern int proc_dowatchdog(struct ctl_table *, int ,
>  			   void __user *, size_t *, loff_t *);
> --- a/kernel/watchdog.c~watchdog-printing-traces-for-all-cpus-on-lockup-detection-fix
> +++ a/kernel/watchdog.c
> @@ -31,7 +31,12 @@
>  
>  int watchdog_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
> +#ifdef CONFIG_SMP
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> +#else
> +#define sysctl_softlockup_all_cpu_backtrace 0
> +#endif
> +
>  static int __read_mostly watchdog_running;
>  static u64 __read_mostly sample_period;
>  
> _

Ah ok.  I will respin the patch with that cleanup.  Thanks!

Cheers,
Don

> 


* Re: [PATCH 2/2 v4] watchdog: Printing traces for all cpus on lockup detection
  2014-04-24 13:48     ` Don Zickus
@ 2014-04-24 13:50       ` Don Zickus
  0 siblings, 0 replies; 6+ messages in thread
From: Don Zickus @ 2014-04-24 13:50 UTC
  To: Andrew Morton; +Cc: LKML, x86, davem, sparclinux, mguzik, Aaron Tomlin

On Thu, Apr 24, 2014 at 09:48:04AM -0400, Don Zickus wrote:
> Ah ok.  I will respin the patch with that cleanup.  Thanks!

Or I can just be happy you took care of that for me.  :-)

/me should read all his email first...

Thanks,
Don
