All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: Nathan Lynch <nathanl@austin.ibm.com>,
	lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: 2.6.8.1-mm2
Date: Fri, 20 Aug 2004 17:40:51 +1000	[thread overview]
Message-ID: <1092987650.28849.349.camel@bach> (raw)
In-Reply-To: <20040819181603.700a9a0e.akpm@osdl.org>

On Fri, 2004-08-20 at 11:16, Andrew Morton wrote:
> Nathan Lynch <nathanl@austin.ibm.com> wrote:
> >
> > > +dont-sleep-after-were-out-of-task-list.patch
> > > 
> > >  CPU hotplug race fix
> > 
> > I don't mean to be a pain, but this patch does not fix the bug I
> > reported.

Nathan, can you revert that, and apply this?  This actually fixes the
might_sleep problem, and should fix at least the problem Vatsa saw.  If
it doesn't solve your problem, we need to look again.

Thanks,
Rusty.

Name: Don't Sleep After We're Out Of Task List
Status: Booted on 2.6.8.1-mm1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (authored)
Version: -mm

Ingo recently accidentally broke CPU hotplug by enabling preemption
around release_task(), which can be called on the current task if the
parent isn't interested.  This is because release_task() can now
sleep.

The problem is, the task can be preempted and then the CPU can go
down: it's not in the task list any more, and so it won't get migrated
after the CPU goes down.  It stays on the down CPU, which triggers a
BUG_ON.

We have had previous problems with tasks releasing themselves:
oprofile has a comment about it, and we had the case of trying to
deliver SIGXCPU in the timer tick to the current task which had called
release_task().  I tried shuffling release_task off the
finish_arch_switch, but that can't sleep either.  I tried using rcu,
but same problem.  Finally, I just use a workqueue and a per-cpu list,
which also guarantees the task has actually finished running.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29294-linux-2.6.8.1-mm1/include/linux/sched.h .29294-linux-2.6.8.1-mm1.updated/include/linux/sched.h
--- .29294-linux-2.6.8.1-mm1/include/linux/sched.h	2004-08-17 11:32:33.000000000 +1000
+++ .29294-linux-2.6.8.1-mm1.updated/include/linux/sched.h	2004-08-20 12:18:37.000000000 +1000
@@ -590,6 +591,8 @@ struct task_struct {
 	struct rw_semaphore pagg_sem;
 #endif
 
+	/* Delayed cleanup on death of task with uncaring parent. */
+	struct list_head death;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .29294-linux-2.6.8.1-mm1/kernel/exit.c .29294-linux-2.6.8.1-mm1.updated/kernel/exit.c
--- .29294-linux-2.6.8.1-mm1/kernel/exit.c	2004-08-17 11:32:33.000000000 +1000
+++ .29294-linux-2.6.8.1-mm1.updated/kernel/exit.c	2004-08-20 14:50:25.000000000 +1000
@@ -51,6 +51,41 @@ static void __unhash_process(struct task
 	REMOVE_LINKS(p);
 }
 
+/* Keventd handles tasks whose parent won't ever release_task them. */
+static DEFINE_PER_CPU(struct list_head, unreleased_tasks);
+static DEFINE_PER_CPU(struct work_struct, release_task_work);
+
+static void release_tasks(void *tasks)
+{
+	struct task_struct *tsk, *next;
+	LIST_HEAD(list);
+
+	write_lock_irq(&tasklist_lock);
+	list_splice_init((struct list_head *)tasks, &list);
+	write_unlock_irq(&tasklist_lock);
+
+	list_for_each_entry_safe(tsk, next, &list, death) {
+		BUG_ON(!(tsk->flags & PF_DEAD));
+		/* I don't *think* this should happen... --RR */
+		WARN_ON(tsk->state != TASK_DEAD);
+		list_del(&tsk->death);
+		release_task(tsk);
+	}
+}
+
+static __init int init_release_tasks(void)
+{
+	unsigned int i;
+
+	for_each_cpu(i) {
+		INIT_LIST_HEAD(&per_cpu(unreleased_tasks, i));
+		INIT_WORK(&per_cpu(release_task_work, i), release_tasks,
+			  &per_cpu(unreleased_tasks, i));
+	}
+	return 0;
+}
+core_initcall(init_release_tasks);
+
 void release_task(struct task_struct * p)
 {
 	int zap_leader;
@@ -656,7 +691,6 @@ static inline void forget_original_paren
  */
 static void exit_notify(struct task_struct *tsk)
 {
-	int state;
 	struct task_struct *t;
 	struct list_head ptrace_dead, *_p, *_n;
 
@@ -753,47 +787,37 @@ static void exit_notify(struct task_stru
 		do_notify_parent(tsk, SIGCHLD);
 	}
 
-	state = TASK_ZOMBIE;
-	if (tsk->exit_signal == -1 && tsk->ptrace == 0)
-		state = TASK_DEAD;
-	else
-		tsk->state = state;
-	/*
-	 * Clear these here so that update_process_times() won't try to deliver
-	 * itimer, profile or rlimit signals to this task while it is in late exit.
-	 */
-	tsk->it_virt_value = 0;
-	tsk->it_prof_value = 0;
-	tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
+	tsk->flags |= PF_DEAD;
+	if (tsk->exit_signal == -1 && tsk->ptrace == 0) {
+		/* Uncaring parent: keventd will do the final release_task */
+		tsk->state = TASK_DEAD;
+		list_add(&tsk->death, &__get_cpu_var(unreleased_tasks));
+		schedule_work(&__get_cpu_var(release_task_work));
+	} else
+		tsk->state = TASK_ZOMBIE;
 
 	/*
-	 * Get a reference to it so that we can set the state
-	 * as the last step. The state-setting only matters if the
-	 * current task is releasing itself, to trigger the final
-	 * put_task_struct() in finish_task_switch(). (thread self-reap)
+	 * In the preemption case it must be impossible for the task
+	 * to get runnable again, so use "_raw_" unlock to keep
+	 * preempt_count elevated until we schedule().  Also, this
+	 * ensures that keventd won't release this task until
+	 * after we schedule().
+	 *
+	 * To avoid deadlock on SMP, interrupts must be unmasked.  If we
+	 * don't, subsequently called functions (e.g, wait_task_inactive()
+	 * via release_task()) will spin, with interrupt flags
+	 * unwittingly blocked, until the other task sleeps.  That task
+	 * may itself be waiting for smp_call_function() to answer and
+	 * complete, and with interrupts blocked that will never happen.
 	 */
-	get_task_struct(tsk);
-
-	write_unlock_irq(&tasklist_lock);
+	_raw_write_unlock(&tasklist_lock);
+	local_irq_enable();
 
 	list_for_each_safe(_p, _n, &ptrace_dead) {
 		list_del_init(_p);
 		t = list_entry(_p,struct task_struct,ptrace_list);
 		release_task(t);
 	}
-
-	/* If the process is dead, release it - nobody will wait for it */
-	if (state == TASK_DEAD) {
-		release_task(tsk);
-		write_lock_irq(&tasklist_lock);
-		tsk->state = state;
-		_raw_write_unlock(&tasklist_lock);
-		local_irq_enable();
-	} else
-		preempt_disable();
-
-	tsk->flags |= PF_DEAD;
-	put_task_struct(tsk);
 }
 
 asmlinkage NORET_TYPE void do_exit(long code)

-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


  reply	other threads:[~2004-08-20  7:42 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-19  8:42 2.6.8.1-mm2 Andrew Morton
2004-08-19  9:10 ` 2.6.8.1-mm2 Ryan Cumming
2004-08-19  9:35   ` 2.6.8.1-mm2 Chris Wedgwood
2004-08-23 21:25   ` 2.6.8.1-mm2 Adrian Bunk
2004-08-25 18:32     ` 2.6.8.1-mm2 Bill Davidsen
2004-08-19  9:12 ` 2.6.8.1-mm2 Dipankar Sarma
2004-08-19  9:39 ` aliased directories, was 2.6.8.1-mm2 Christoph Hellwig
2004-08-19  9:43 ` 2.6.8.1-mm2 Christoph Hellwig
2004-08-19 11:36 ` 2.6.8.1-mm2 Gene Heskett
2004-08-19 12:29 ` [PATCH] 2.6.8.1-mm2 --- UML build fixes Chris Wedgwood
2004-08-19 20:55   ` Sam Ravnborg
2004-08-19 21:19     ` Jeff Dike
2004-08-19 22:32       ` Sam Ravnborg
2004-08-19 14:43 ` 2.6.8.1-mm2 Michael Geithe
2004-08-19 14:52 ` 2.6.8.1-mm2 John Cherry
2004-08-19 14:29   ` 2.6.8.1-mm2 Alan Cox
2004-08-20  7:06   ` 2.6.8.1-mm2 Hans Reiser
2004-08-20  7:16     ` 2.6.8.1-mm2 Andrew Morton
2004-08-20  7:37       ` 2.6.8.1-mm2 Hans Reiser
2004-08-20 13:53         ` 2.6.8.1-mm2 Alex Zarochentsev
2004-08-20 18:05           ` 2.6.8.1-mm2 Hans Reiser
     [not found] ` <200408191245.46726.gene.heskett@verizon.net>
     [not found]   ` <20040819182752.GA3024@viasys.com>
2004-08-19 19:17     ` 2.6.8.1-mm2 Gene Heskett
2004-08-20  1:51       ` 2.6.8.1-mm2 Gene Heskett
2004-08-20  0:50 ` 2.6.8.1-mm2 Marcelo Tosatti
2004-08-20  6:08   ` 2.6.8.1-mm2 Andrew Morton
2004-08-20  9:11     ` 2.6.8.1-mm2 Antonino A. Daplas
2004-08-20 16:20       ` Kronos
2004-08-20  1:08 ` 2.6.8.1-mm2 Nathan Lynch
2004-08-20  1:16   ` 2.6.8.1-mm2 Andrew Morton
2004-08-20  7:40     ` Rusty Russell [this message]
2004-08-20  8:14       ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20  8:29         ` 2.6.8.1-mm2 Srivatsa Vaddagiri
2004-08-20  8:59           ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20  9:03             ` 2.6.8.1-mm2 Ingo Molnar
2004-08-21  7:51         ` 2.6.8.1-mm2 Rusty Russell
2004-08-20  8:33       ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20 21:35         ` 2.6.8.1-mm2 Andrew Morton
2004-08-20 22:12           ` 2.6.8.1-mm2 Nathan Lynch
2004-08-20  6:17 ` 2.6.8.1-mm2 Srivatsa Vaddagiri
2004-08-20  6:59   ` 2.6.8.1-mm2 Paul Mackerras
2004-08-20  8:21 ` 2.6.8.1-mm2 ismail dönmez
2004-08-20 22:43 ` 2.6.8.1-mm2 Rik van Riel
2004-08-20 23:05   ` 2.6.8.1-mm2 - reiser4 Rik van Riel
2004-08-20 23:15     ` William Lee Irwin III
2004-08-20 23:20     ` Anton Blanchard
2004-08-20 23:34       ` Andrew Morton
2004-08-21  0:12         ` Rik van Riel
2004-08-21  6:24         ` Hans Reiser
2004-08-21  0:15     ` Rik van Riel
2004-08-21  8:57       ` Hans Reiser
2004-08-21  7:30   ` 2.6.8.1-mm2 Hans Reiser
2004-08-22 21:32   ` 2.6.8.1-mm2 Alex Zarochentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1092987650.28849.349.camel@bach \
    --to=rusty@rustcorp.com.au \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nathanl@austin.ibm.com \
    --cc=vatsa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.