public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@osdl.org>,
	Nathan Lynch <nathanl@austin.ibm.com>,
	lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>
Subject: Re: 2.6.8.1-mm2
Date: Sat, 21 Aug 2004 17:51:14 +1000	[thread overview]
Message-ID: <1093074658.4883.99.camel@bach> (raw)
In-Reply-To: <20040820081458.GA4949@elte.hu>

On Fri, 2004-08-20 at 18:14, Ingo Molnar wrote:
> * Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > +	/* Delayed cleanup on death of task with uncaring parent. */
> > +	struct list_head death;
> 
> > +/* Keventd handles tasks whose parent won't ever release_task them. */
> > +static DEFINE_PER_CPU(struct list_head, unreleased_tasks);
> > +static DEFINE_PER_CPU(struct work_struct, release_task_work);
> 
> no! this removes the whole performance optimization of self-reaping and
> re-introduces the context-switches. Just measure the
> creation/destruction performance of threads. Strong NACK.

OK, that patch sucked.  But this self-reaping optimization has caused
more than one subtle bug; I am determined to get rid of it.

What do you think of this? (against 2.6.8.1-mm2)
Rusty.

Name: Don't Sleep After We're Out Of Task List
Status: Booted on 2.6.8.1-mm2
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (authored)
Version: -mm

Ingo recently accidentally broke CPU hotplug by enabling preemption
around release_task(), which can be called on the current task if the
parent isn't interested.  This is because release_task() can now
sleep.

The problem is, the task can be preempted and then the CPU can go
down: it's not in the task list any more, and so it won't get migrated
after the CPU goes down.  It stays on the down CPU, which triggers a
BUG_ON.

We have had previous problems with tasks releasing themselves:
oprofile has a comment about it, and we had the case of trying to
deliver SIGXCPU in the timer tick to the current task which had called
release_task().  I tried shuffling release_task off the
finish_arch_switch, but that can't sleep either.  I tried using rcu,
but same problem.  Finally, I just use a workqueue and a per-cpu list,
which also guarantees the task has actually finished running.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .6239-linux-2.6.8.1-mm2/include/linux/sched.h .6239-linux-2.6.8.1-mm2.updated/include/linux/sched.h
--- .6239-linux-2.6.8.1-mm2/include/linux/sched.h	2004-08-20 17:33:26.000000000 +1000
+++ .6239-linux-2.6.8.1-mm2.updated/include/linux/sched.h	2004-08-21 17:12:34.000000000 +1000
@@ -615,7 +615,6 @@ do { if (atomic_dec_and_test(&(tsk)->usa
 					/* Not implemented yet, only for 486*/
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
-#define PF_DEAD		0x00000008	/* Dead */
 #define PF_SELFREAP	0x00000010	/* Never a zombie, must be released */
 #define PF_FORKNOEXEC	0x00000040	/* forked but didn't exec */
 #define PF_SUPERPRIV	0x00000100	/* used super-user privileges */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .6239-linux-2.6.8.1-mm2/kernel/exit.c .6239-linux-2.6.8.1-mm2.updated/kernel/exit.c
--- .6239-linux-2.6.8.1-mm2/kernel/exit.c	2004-08-20 17:33:27.000000000 +1000
+++ .6239-linux-2.6.8.1-mm2.updated/kernel/exit.c	2004-08-21 17:10:52.000000000 +1000
@@ -646,13 +646,14 @@ static inline void forget_original_paren
 	}
 }
 
+static DEFINE_PER_CPU(struct task_struct *, last_exited_task);
+
 /*
  * Send signals to all our closest relatives so that they know
  * to properly mourn us..
  */
 static void exit_notify(struct task_struct *tsk)
 {
-	int state;
 	struct task_struct *t;
 	struct list_head ptrace_dead, *_p, *_n;
 
@@ -749,18 +750,11 @@ static void exit_notify(struct task_stru
 		do_notify_parent(tsk, SIGCHLD);
 	}
 
-	state = TASK_ZOMBIE;
-	if (tsk->exit_signal == -1 && tsk->ptrace == 0)
-		state = TASK_DEAD;
-	tsk->state = state;
-
-	/*
-	 * Clear these here so that update_process_times() won't try to deliver
-	 * itimer, profile or rlimit signals to this task while it is in late exit.
-	 */
-	tsk->it_virt_value = 0;
-	tsk->it_prof_value = 0;
-	tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
+	if (tsk->exit_signal == -1 && tsk->ptrace == 0) {
+		tsk->state = TASK_DEAD;
+		tsk->flags |= PF_SELFREAP;
+	} else
+		tsk->state = TASK_ZOMBIE;
 
 	write_unlock_irq(&tasklist_lock);
 
@@ -770,12 +764,25 @@ static void exit_notify(struct task_stru
 		release_task(t);
 	}
 
+again:
 	preempt_disable();
-	/* PF_DEAD says drop ref after we schedule. */
-	tsk->flags |= PF_DEAD;
-	/* PF_SELFREAP says there's no parent to wait4() for us. */
-	if (state == TASK_DEAD)
-		tsk->flags |= PF_SELFREAP;
+
+	/* Now it's safe to do final put on last task which exited. */
+	t = __get_cpu_var(last_exited_task);
+	if (t) {
+		if (t->flags & PF_SELFREAP) {
+			/* release_task can sleep: release and retry. */
+			__get_cpu_var(last_exited_task) = NULL;
+			preempt_enable();
+			release_task(t);
+			put_task_struct(t);
+			goto again;
+		}
+		put_task_struct(t);
+	}
+
+	/* Once we've set this, we must not be preempted. */
+	__get_cpu_var(last_exited_task) = tsk;
 }
 
 asmlinkage NORET_TYPE void do_exit(long code)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .6239-linux-2.6.8.1-mm2/kernel/sched.c .6239-linux-2.6.8.1-mm2.updated/kernel/sched.c
--- .6239-linux-2.6.8.1-mm2/kernel/sched.c	2004-08-20 17:33:27.000000000 +1000
+++ .6239-linux-2.6.8.1-mm2.updated/kernel/sched.c	2004-08-21 17:01:31.000000000 +1000
@@ -1467,30 +1467,12 @@ static void finish_task_switch(task_t *p
 {
 	runqueue_t *rq = this_rq();
 	struct mm_struct *mm = rq->prev_mm;
-	unsigned long prev_task_flags;
 
 	rq->prev_mm = NULL;
 
-	/*
-	 * A task struct has one reference for the use as "current".
-	 * If a task dies, then it sets TASK_ZOMBIE in tsk->state and calls
-	 * schedule one last time. The schedule call will never return,
-	 * and the scheduled task must drop that reference.
-	 * The test for TASK_ZOMBIE must occur while the runqueue locks are
-	 * still held, otherwise prev could be scheduled on another cpu, die
-	 * there before we look at prev->state, and then the reference would
-	 * be dropped twice.
-	 *		Manfred Spraul <manfred@colorfullife.com>
-	 */
-	prev_task_flags = prev->flags;
 	finish_arch_switch(rq, prev);
 	if (mm)
 		mmdrop(mm);
-	if (unlikely(prev_task_flags & PF_DEAD)) {
-		if (prev_task_flags & PF_SELFREAP)
-			release_task(prev);
-		put_task_struct(prev);
-	}
 }
 
 /**

-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


  parent reply	other threads:[~2004-08-21  7:51 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-19  8:42 2.6.8.1-mm2 Andrew Morton
2004-08-19  9:10 ` 2.6.8.1-mm2 Ryan Cumming
2004-08-19  9:35   ` 2.6.8.1-mm2 Chris Wedgwood
2004-08-23 21:25   ` 2.6.8.1-mm2 Adrian Bunk
2004-08-25 18:32     ` 2.6.8.1-mm2 Bill Davidsen
2004-08-19  9:12 ` 2.6.8.1-mm2 Dipankar Sarma
2004-08-19 11:36 ` 2.6.8.1-mm2 Gene Heskett
2004-08-19 12:29 ` [PATCH] 2.6.8.1-mm2 --- UML build fixes Chris Wedgwood
2004-08-19 20:55   ` Sam Ravnborg
2004-08-19 21:19     ` Jeff Dike
2004-08-19 22:32       ` Sam Ravnborg
2004-08-19 14:43 ` 2.6.8.1-mm2 Michael Geithe
2004-08-19 14:52 ` 2.6.8.1-mm2 John Cherry
2004-08-19 14:29   ` 2.6.8.1-mm2 Alan Cox
2004-08-20  7:06   ` 2.6.8.1-mm2 Hans Reiser
2004-08-20  7:16     ` 2.6.8.1-mm2 Andrew Morton
2004-08-20  7:37       ` 2.6.8.1-mm2 Hans Reiser
2004-08-20 13:53         ` 2.6.8.1-mm2 Alex Zarochentsev
2004-08-20 18:05           ` 2.6.8.1-mm2 Hans Reiser
     [not found] ` <200408191245.46726.gene.heskett@verizon.net>
     [not found]   ` <20040819182752.GA3024@viasys.com>
2004-08-19 19:17     ` 2.6.8.1-mm2 Gene Heskett
2004-08-20  1:51       ` 2.6.8.1-mm2 Gene Heskett
2004-08-20  0:50 ` 2.6.8.1-mm2 Marcelo Tosatti
2004-08-20  1:08 ` 2.6.8.1-mm2 Nathan Lynch
2004-08-20  1:16   ` 2.6.8.1-mm2 Andrew Morton
2004-08-20  7:40     ` 2.6.8.1-mm2 Rusty Russell
2004-08-20  8:14       ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20  8:29         ` 2.6.8.1-mm2 Srivatsa Vaddagiri
2004-08-20  8:59           ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20  9:03             ` 2.6.8.1-mm2 Ingo Molnar
2004-08-21  7:51         ` Rusty Russell [this message]
2004-08-20  8:33       ` 2.6.8.1-mm2 Ingo Molnar
2004-08-20 21:35         ` 2.6.8.1-mm2 Andrew Morton
2004-08-20 22:12           ` 2.6.8.1-mm2 Nathan Lynch
2004-08-20  6:17 ` 2.6.8.1-mm2 Srivatsa Vaddagiri
2004-08-20  6:59   ` 2.6.8.1-mm2 Paul Mackerras
2004-08-20  8:21 ` 2.6.8.1-mm2 ismail dönmez
2004-08-20 22:43 ` 2.6.8.1-mm2 Rik van Riel
2004-08-20 23:05   ` 2.6.8.1-mm2 - reiser4 Rik van Riel
2004-08-20 23:15     ` William Lee Irwin III
2004-08-20 23:20     ` Anton Blanchard
2004-08-20 23:34       ` Andrew Morton
2004-08-21  0:12         ` Rik van Riel
2004-08-21  6:24         ` Hans Reiser
2004-08-21  0:15     ` Rik van Riel
2004-08-21  8:57       ` Hans Reiser
2004-08-21  7:30   ` 2.6.8.1-mm2 Hans Reiser
2004-08-22 21:32   ` 2.6.8.1-mm2 Alex Zarochentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1093074658.4883.99.camel@bach \
    --to=rusty@rustcorp.com.au \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nathanl@austin.ibm.com \
    --cc=vatsa@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox