* [PATCH] Softlockup (out of cpu) killer
@ 2011-12-11 22:48 Vincent Li
2011-12-12 0:28 ` Frederic Weisbecker
2011-12-12 9:38 ` Peter Zijlstra
0 siblings, 2 replies; 4+ messages in thread
From: Vincent Li @ 2011-12-11 22:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Don Zickus, Peter Zijlstra, Andrew Morton, Mandeep Singh Baines,
linux-kernel, Vincent Li
The kernel has an out-of-memory (OOM) killer, so why not add an out-of-CPU (OOC) killer?
I tested the following patch by running a user-space CPU-hogging process, and the softlockup
detector killed the process successfully.

A softlockup can be caused by a user-space process hogging the CPU. Add a softlockup_kill
kernel config option to allow the kernel to kill the CPU-hogging user-space process. This
feature is useful for high-availability systems that have uptime guarantees and where a
softlockup must be resolved ASAP.

echo 1 > /proc/sys/kernel/softlockup_kill to enable the CPU hog process killer
echo 0 > /proc/sys/kernel/softlockup_kill to disable the CPU hog process killer

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
---
Documentation/kernel-parameters.txt | 4 ++++
include/linux/sched.h | 1 +
kernel/sysctl.c | 9 +++++++++
kernel/watchdog.c | 18 ++++++++++++++++++
lib/Kconfig.debug | 21 +++++++++++++++++++++
5 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 81c287f..1609387 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2418,6 +2418,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
[KNL] Should the soft-lockup detector generate panics.
Format: <integer>
+ softlockup_kill=
+ [KNL] Should the soft-lockup detector kill cpu hog process.
+ Format: <integer>
+
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/laptops/sonypi.txt
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1c4f3e9..4783fac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -315,6 +315,7 @@ extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos);
extern unsigned int softlockup_panic;
+extern unsigned int softlockup_kill;
void lockup_detector_init(void);
#else
static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ae27196..e79ea9c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -770,6 +770,15 @@ static struct ctl_table kern_table[] = {
.extra2 = &one,
},
{
+ .procname = "softlockup_kill",
+ .data = &softlockup_kill,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ {
.procname = "nmi_watchdog",
.data = &watchdog_enabled,
.maxlen = sizeof (int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 1d7bca7..5832a90 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -75,6 +75,17 @@ static int __init softlockup_panic_setup(char *str)
}
__setup("softlockup_panic=", softlockup_panic_setup);
+unsigned int __read_mostly softlockup_kill =
+ CONFIG_BOOTPARAM_SOFTLOCKUP_KILL_VALUE;
+
+static int __init softlockup_kill_setup(char *str)
+{
+ softlockup_kill = simple_strtoul(str, NULL, 0);
+
+ return 1;
+}
+__setup("softlockup_kill=", softlockup_kill_setup);
+
static int __init nowatchdog_setup(char *str)
{
watchdog_enabled = 0;
@@ -306,6 +317,13 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
else
dump_stack();
+ if (softlockup_kill) {
+ printk(KERN_ERR "Kill softlockup process [%s:%d] on CPU#%d\n",
+ current->comm, task_pid_nr(current),
+ smp_processor_id());
+ force_sig(SIGKILL, current);
+ }
+
if (softlockup_panic)
panic("softlockup: hung tasks");
__this_cpu_write(soft_watchdog_warn, true);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 82928f5..e4afc98 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -224,6 +224,27 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
+config BOOTPARAM_SOFTLOCKUP_KILL
+ bool "Kill (cpu hog process) On Soft Lockups"
+ depends on LOCKUP_DETECTOR
+ help
+ Say Y here to enable the kernel to kill cpu hog process on
+ "soft lockups", which are bugs that cause the kernel to
+ loop in kernel mode for more than 60 seconds, without giving
+ other tasks a chance to run.
+
+ This feature is useful for high-availability systems that
+ have uptime guarantees and where a lockup must be resolved ASAP.
+
+ Say N if unsure.
+
+config BOOTPARAM_SOFTLOCKUP_KILL_VALUE
+ int
+ depends on LOCKUP_DETECTOR
+ range 0 1
+ default 0 if !BOOTPARAM_SOFTLOCKUP_KILL
+ default 1 if BOOTPARAM_SOFTLOCKUP_KILL
+
config DETECT_HUNG_TASK
bool "Detect Hung Tasks"
depends on DEBUG_KERNEL
--
1.7.0.4
* Re: [PATCH] Softlockup (out of cpu) killer
From: Frederic Weisbecker @ 2011-12-12 0:28 UTC (permalink / raw)
To: Vincent Li
Cc: Ingo Molnar, Don Zickus, Peter Zijlstra, Andrew Morton,
Mandeep Singh Baines, linux-kernel
On Sun, Dec 11, 2011 at 02:48:55PM -0800, Vincent Li wrote:
> The kernel has an out-of-memory (OOM) killer, so why not add an out-of-CPU (OOC) killer?
> I tested the following patch by running a user-space CPU-hogging process, and the softlockup
> detector killed the process successfully.
>
> A softlockup can be caused by a user-space process hogging the CPU. Add a softlockup_kill
> kernel config option to allow the kernel to kill the CPU-hogging user-space process. This
> feature is useful for high-availability systems that have uptime guarantees and where a
> softlockup must be resolved ASAP.
>
> echo 1 > /proc/sys/kernel/softlockup_kill to enable the CPU hog process killer
> echo 0 > /proc/sys/kernel/softlockup_kill to disable the CPU hog process killer

That assumes a signal would be enough to pull a process out of its softlockup.
I believe this is seldom the case. A process in a softlockup is stuck in some
place that has preemption disabled. Unless it luckily polls there for pending
signals, that won't work.
But maybe that happens more often than I think. Maybe other people have
more insight.
* Re: [PATCH] Softlockup (out of cpu) killer
From: Peter Zijlstra @ 2011-12-12 9:38 UTC (permalink / raw)
To: Vincent Li
Cc: Ingo Molnar, Don Zickus, Andrew Morton, Mandeep Singh Baines,
linux-kernel
On Sun, 2011-12-11 at 14:48 -0800, Vincent Li wrote:
> The kernel has an out-of-memory (OOM) killer, so why not add an out-of-CPU (OOC) killer?
> I tested the following patch by running a user-space CPU-hogging process, and the softlockup
> detector killed the process successfully.
>
> A softlockup can be caused by a user-space process hogging the CPU. Add a softlockup_kill
> kernel config option to allow the kernel to kill the CPU-hogging user-space process. This
> feature is useful for high-availability systems that have uptime guarantees and where a
> softlockup must be resolved ASAP.
>
> echo 1 > /proc/sys/kernel/softlockup_kill to enable the CPU hog process killer
> echo 0 > /proc/sys/kernel/softlockup_kill to disable the CPU hog process killer
>
> Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Your whole premise is broken. Being a cpu hog and the softlockup
mechanism aren't related at all.
Furthermore, since the normal scheduling policy is a proportional one, a
cpu hog can't in fact starve anybody (although a fork bomb could). And
FIFO/RR are privileged ops.
Furthermore the distinction between memory and cpu-time is that memory
isn't a renewable resource, whereas time is. There's always more time,
but there's not always more memory.
So no, I don't think either your patch or your concept makes any sense.
Consider it nacked.
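
To make the proportional-share argument above concrete, here is a minimal
sketch of the arithmetic. It assumes tasks under the normal scheduling policy
each receive CPU time in proportion to their weight, and that a nice-0 task has
weight 1024 (the value in the kernel's nice-to-weight table). The function name
fair_share() is hypothetical and exists only for illustration; this is not
kernel code.

```c
/* Hypothetical helper, not a kernel API: long-run CPU fraction that one
 * runnable task can expect on a single CPU when time is divided in
 * proportion to task weights. */
double fair_share(unsigned long weight, unsigned long total_weight)
{
	if (total_weight == 0 || weight > total_weight)
		return -1.0;	/* inconsistent input */
	return (double)weight / (double)total_weight;
}
```

With one CPU hog and three other runnable nice-0 tasks on the same CPU,
fair_share(1024, 4 * 1024) is 0.25 for every task, the hog included. The hog
only gets more when the others have nothing to run, which is why it cannot
starve them.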
* Re: [PATCH] Softlockup (out of cpu) killer
From: Vincent Li @ 2011-12-12 18:00 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Don Zickus, Andrew Morton, Mandeep Singh Baines,
linux-kernel
>
> Your whole premise is broken. Being a cpu hog and the softlockup
> mechanism aren't related at all.
>
I fully understand that I may misunderstand the cpu hog and
softlockup mechanism :)
> Furthermore, since the normal scheduling policy is a proportional one, a
> cpu hog can't in fact starve anybody (although a fork bomb could). And
> FIFO/RR are privileged ops.
>
I have a test program with FIFO privileges,
http://www.vcn.bc.ca/~vli/schedrtcpu.c.txt, that reliably eats 100% CPU
in top, and the patch kills it reliably. We have a user-space
traffic-processing program that runs under FIFO, similar to the test
program. Under some conditions that user-space program can get stuck on
the CPU, and we want to kill it for high-availability reasons. With this
patch, we were able to do that.
I do notice that in the schedrtcpu.c test program, if I fork two
processes like below:

    /* fork a child that spins in busyloop(); the parent gets the child's pid */
    pid_t spawn(void) {
        pid_t pid = fork();
        if (pid == 0)
            busyloop();
        return pid;
    }

    pid1 = spawn();
    pid2 = spawn();
    waitpid(pid1, &status, 0);
    waitpid(pid2, &status, 0);

and run it on a two-CPU box, I get "sched: RT throttling activated" on the
console, and the test program does not get stuck on the CPU and only reaches
95% CPU. Strangely, if I don't fork and only run the busyloop, RT throttling
is not activated and the program consistently eats 100% of a single CPU.

In our corner case, it appears that the patch does help solve our problem.
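
The 95% figure is consistent with the default realtime throttling budget, which
is /proc/sys/kernel/sched_rt_runtime_us (default 950000) divided by
/proc/sys/kernel/sched_rt_period_us (default 1000000). A minimal sketch of that
calculation follows, assuming those defaults; the helper names read_long() and
rt_cpu_fraction() are hypothetical and only serve the illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a sysctl file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Hypothetical helper: fraction of each period that realtime tasks may
 * consume. A runtime of -1 disables throttling, so the fraction is 1.0.
 * Returns -1.0 for an invalid period. */
double rt_cpu_fraction(long runtime_us, long period_us)
{
	if (period_us <= 0)
		return -1.0;
	if (runtime_us < 0 || runtime_us >= period_us)
		return 1.0;
	return (double)runtime_us / (double)period_us;
}
```

Reading both sysctl files with read_long() and passing the values to
rt_cpu_fraction() gives about 0.95 with the defaults, meaning realtime tasks
are throttled once they have used 950 ms out of every second.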
> Furthermore the distinction between memory and cpu-time is that memory
> isn't a renewable resource, whereas time is. There's always more time,
> but there's not always more memory.
>
understood, thanks
> So no, I don't think either your patch or your concept makes any sense.
> Consider it nacked.