public inbox for linux-kernel@vger.kernel.org
* [PATCH] watchdog: Add a sysctl to disable soft lockup detector
@ 2013-12-03 21:54 Ben Zhang
  2013-12-03 22:27 ` Andrew Morton
  2013-12-04 21:29 ` Don Zickus
  0 siblings, 2 replies; 13+ messages in thread
From: Ben Zhang @ 2013-12-03 21:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Don Zickus, Andrew Morton, Ingo Molnar, Frederic Weisbecker,
	Ben Zhang

This gives user space a way to disable only the soft
lockup detector while keeping the hard lockup detector
running.

kernel.softlockup_detector_enable=1:
This is the default. The soft lockup detector is enabled.
When a soft lockup is detected, a warning message with
debug info is printed. The kernel may be configured to
panic in this case via the sysctl kernel.softlockup_panic.

kernel.softlockup_detector_enable=0:
The soft lockup detector is disabled. No warning message
is printed on a soft lockup, and the kernel does not panic
regardless of the value of kernel.softlockup_panic.
Note that kernel.softlockup_detector_enable does not affect
the hard lockup detector.

Signed-off-by: Ben Zhang <benzh@chromium.org>
---
 include/linux/sched.h |  1 +
 kernel/sysctl.c       |  9 +++++++++
 kernel/watchdog.c     | 15 +++++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 768b037..93ebec4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -270,6 +270,7 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
+extern unsigned int  softlockup_detector_enable;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 34a6047..8ae1f36 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -840,6 +840,15 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 	{
+		.procname	= "softlockup_detector_enable",
+		.data		= &softlockup_detector_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname       = "nmi_watchdog",
 		.data           = &watchdog_user_enabled,
 		.maxlen         = sizeof (int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4431610..b9594e6 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -80,6 +80,18 @@ static int __init softlockup_panic_setup(char *str)
 }
 __setup("softlockup_panic=", softlockup_panic_setup);
 
+unsigned int __read_mostly softlockup_detector_enable = 1;
+
+static int __init softlockup_detector_enable_setup(char *str)
+{
+	unsigned long res;
+	if (kstrtoul(str, 0, &res))
+		res = 1;
+	softlockup_detector_enable = res;
+	return 1;
+}
+__setup("softlockup_detector_enable=", softlockup_detector_enable_setup);
+
 static int __init nowatchdog_setup(char *str)
 {
 	watchdog_user_enabled = 0;
@@ -293,6 +305,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		return HRTIMER_RESTART;
 	}
 
+	if (!softlockup_detector_enable)
+		return HRTIMER_RESTART;
+
 	/* check for a softlockup
 	 * This is done by making sure a high priority task is
 	 * being scheduled.  The task touches the watchdog to
-- 
1.8.4.1
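
The behavior described in the changelog above can be modeled in plain
user-space C.  This is only a sketch under stated assumptions, not the
kernel code: on_timer_tick() and enum lockup_action are hypothetical names
invented for illustration, and the two variables merely mirror the sysctls
named in the patch.

```c
#include <stdbool.h>

/* User-space stand-ins for the two sysctls named in the patch. */
static unsigned int softlockup_detector_enable = 1;	/* patch default */
static unsigned int softlockup_panic;			/* 0: warn only */

/* Hypothetical result type; the kernel prints a warning or panics instead. */
enum lockup_action { ACTION_NONE, ACTION_WARN, ACTION_PANIC };

/*
 * Hypothetical model of one watchdog timer tick.  The enable flag is
 * tested first, so a disabled detector never warns and never panics,
 * whatever softlockup_panic is set to.
 */
static enum lockup_action on_timer_tick(bool soft_lockup_seen)
{
	if (!softlockup_detector_enable)
		return ACTION_NONE;
	if (!soft_lockup_seen)
		return ACTION_NONE;
	return softlockup_panic ? ACTION_PANIC : ACTION_WARN;
}
```

The ordering of the checks is the point of the sketch: the early return
on the enable flag is what makes softlockup_panic irrelevant when the
detector is off.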


* [PATCH] watchdog:  touch_nmi_watchdog should only touch local cpu not every one
@ 2010-11-05  1:18 Don Zickus
  2010-11-05 13:51 ` Sergey Senozhatsky
  2010-11-07 22:09 ` Frederic Weisbecker
  0 siblings, 2 replies; 13+ messages in thread
From: Don Zickus @ 2010-11-05  1:18 UTC (permalink / raw)
  To: fweisbec, Peter Zijlstra
  Cc: Ingo Molnar, LKML, akpm, sergey.senozhatsky, Don Zickus

I ran into a scenario where one cpu was stuck and should have panicked
because of the NMI watchdog, but it didn't.  The reason was that another cpu
was spewing stack dumps onto the console.  Upon investigation, I noticed that
the watchdog is touched when writing to the console and when dumping the stack.

This causes all the cpus to reset their NMI watchdog flags and the 'stuck' cpu
just spins forever.

This change alters the semantics of touch_nmi_watchdog slightly.
Previously, I accidentally changed the semantics, and we noticed there was a
codepath in which touch_nmi_watchdog could be called from a preemptible area.
That caused a BUG() to happen when CONFIG_DEBUG_PREEMPT was enabled.  I believe
it was the acpi code.

My attempt here re-introduces the change to have the touch_nmi_watchdog() code
only touch the local cpu instead of all of the cpus.  But instead of using
__get_cpu_var(), I use the __raw_get_cpu_var() version.

This avoids the preemption problem.  My reasoning here is not laziness, though.
I rationalized it as follows: if preemption is enabled, then interrupts should
be enabled too, and the NMI watchdog will have no reason to trigger.  So it
won't matter if the wrong cpu is touched, because the percpu interrupt counters
the NMI watchdog uses should still be incrementing.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index dc8e168..dd0c140 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -141,6 +141,21 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
+	/*
+	 * Using __raw here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	__raw_get_cpu_var(watchdog_nmi_touch) = true;
+
+	touch_softlockup_watchdog();
+}
+EXPORT_SYMBOL(touch_nmi_watchdog);
+
+void touch_all_nmi_watchdogs(void)
+{
 	if (watchdog_enabled) {
 		unsigned cpu;
 
@@ -151,7 +166,7 @@ void touch_nmi_watchdog(void)
 	}
 	touch_softlockup_watchdog();
 }
-EXPORT_SYMBOL(touch_nmi_watchdog);
+EXPORT_SYMBOL(touch_all_nmi_watchdogs);
 
 #endif
 
-- 
1.7.2.3
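
The masking problem described in the changelog above can be reproduced
with a small user-space model.  This is a simplified sketch, not the
kernel implementation: NR_CPUS_MODEL, touch_local(), touch_all() and
check_cpu() are hypothetical names, and a plain array stands in for the
percpu watchdog_nmi_touch flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4		/* hypothetical cpu count for the model */

/* One "touched" flag per cpu, standing in for percpu watchdog_nmi_touch. */
static bool nmi_touch[NR_CPUS_MODEL];

/* New semantics: only the calling cpu's flag is set. */
static void touch_local(size_t cpu)
{
	nmi_touch[cpu] = true;
}

/* Old semantics: every cpu's flag is set. */
static void touch_all(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		nmi_touch[cpu] = true;
}

/*
 * Model of one NMI watchdog check on 'cpu'.  A pending touch is consumed
 * and suppresses the report; otherwise the cpu is reported stuck when its
 * interrupt counter has not advanced since the last check.
 */
static bool check_cpu(size_t cpu, bool irq_count_advanced)
{
	if (nmi_touch[cpu]) {
		nmi_touch[cpu] = false;
		return false;
	}
	return !irq_count_advanced;
}
```

In this model, if cpu 1 calls touch_all() while dumping stacks, a check
on a stuck cpu 0 is suppressed.  With touch_local(1), the same check on
cpu 0 reports the lockup.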



end of thread, other threads:[~2013-12-16 15:55 UTC | newest]

Thread overview: 13+ messages
2013-12-03 21:54 [PATCH] watchdog: Add a sysctl to disable soft lockup detector Ben Zhang
2013-12-03 22:27 ` Andrew Morton
2013-12-04 21:29 ` Don Zickus
2013-12-05  1:55   ` [PATCH v2] " Ben Zhang
2013-12-05  3:12     ` Don Zickus
2013-12-05 20:42       ` [PATCH] watchdog: touch_nmi_watchdog should only touch local cpu not every one Ben Zhang
2013-12-16 15:55         ` Don Zickus
  -- strict thread matches above, loose matches on Subject: below --
2010-11-05  1:18 Don Zickus
2010-11-05 13:51 ` Sergey Senozhatsky
2010-11-05 19:58   ` Andrew Morton
2010-11-08 13:37     ` Don Zickus
2010-11-07 22:09 ` Frederic Weisbecker
2010-11-08 13:38   ` Don Zickus
