* [RFC] wait*() induced tasklist_lock starvation
@ 2014-01-26 23:04 David Rientjes
2014-01-27 17:48 ` Oleg Nesterov
From: David Rientjes @ 2014-01-26 23:04 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Andrew Morton, linux-kernel
Hi Oleg,
We've found that it's pretty easy to cause NMI watchdog timeouts due to
tasklist_lock starvation by calling wait4(), waitid(), or waitpid()
repeatedly: each takes the readside of the lock, and cascading calls to
these syscalls from multiple processes will starve anything in the fork()
or exit() path that is waiting on the writeside with irqs disabled.
The only way I've been able to remedy this problem is by serializing the
taking of the readside of this lock with a spinlock specific to these
syscalls; otherwise my testcase will panic any machine that panics on
these NMI watchdog timeouts, as ours do.
Is there a less expensive way to do this? Or is it just another case of
tasklist_lock problems that needs a major overhaul?
---
kernel/exit.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/exit.c b/kernel/exit.c
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -59,6 +59,14 @@
#include <asm/pgtable.h>
#include <asm/mmu_context.h>
+/*
+ * Ensures the wait family of syscalls -- wait4(), waitid(), and waitpid() --
+ * don't cascade taking the readside of tasklist_lock, which would starve
+ * processes doing fork() or exit() and cause NMI watchdog timeouts with
+ * interrupts disabled.
+ */
+static DEFINE_SPINLOCK(wait_lock);
+
static void exit_mm(struct task_struct * tsk);
static void __unhash_process(struct task_struct *p, bool group_dead)
@@ -1028,6 +1036,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
get_task_struct(p);
read_unlock(&tasklist_lock);
+ spin_unlock(&wait_lock);
if ((exit_code & 0x7f) == 0) {
why = CLD_EXITED;
status = exit_code >> 8;
@@ -1112,6 +1121,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* thread can reap it because we set its state to EXIT_DEAD.
*/
read_unlock(&tasklist_lock);
+ spin_unlock(&wait_lock);
retval = wo->wo_rusage
? getrusage(p, RUSAGE_BOTH, wo->wo_rusage) : 0;
@@ -1246,6 +1256,7 @@ unlock_sig:
pid = task_pid_vnr(p);
why = ptrace ? CLD_TRAPPED : CLD_STOPPED;
read_unlock(&tasklist_lock);
+ spin_unlock(&wait_lock);
if (unlikely(wo->wo_flags & WNOWAIT))
return wait_noreap_copyout(wo, p, pid, uid, why, exit_code);
@@ -1308,6 +1319,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
pid = task_pid_vnr(p);
get_task_struct(p);
read_unlock(&tasklist_lock);
+ spin_unlock(&wait_lock);
if (!wo->wo_info) {
retval = wo->wo_rusage
@@ -1523,6 +1535,7 @@ repeat:
goto notask;
set_current_state(TASK_INTERRUPTIBLE);
+ spin_lock(&wait_lock);
read_lock(&tasklist_lock);
tsk = current;
do {
@@ -1538,6 +1551,7 @@ repeat:
break;
} while_each_thread(current, tsk);
read_unlock(&tasklist_lock);
+ spin_unlock(&wait_lock);
notask:
retval = wo->notask_error;