public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Andrew Morton <akpm@osdl.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	nathanl@austin.ibm.com,
	lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Zwane Mwaikambo <zwane@linuxpower.ca>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: 2.6.8.1-mm1
Date: Wed, 18 Aug 2004 11:04:36 +1000	[thread overview]
Message-ID: <1092791073.27352.183.camel@bach> (raw)
In-Reply-To: <20040817105322.6f596061.akpm@osdl.org>

On Wed, 2004-08-18 at 03:53, Andrew Morton wrote:
> Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
> >
> > I found this to be due to task leak in exit code. In release_task:
> > 
> >  	a. Task is removed from task-list (unhash_process)
> >  	b. More processing is done (like proc_pid_flush etc)
> >  	   before task finally dies.
> > 
> >  The problem is the task can get preempted between a and b.
> 
> It seems wrong that a task can be preempted so late in its lifetime.  We're
> just asking for nasty bugs by permitting that.

Indeed, but recent changes mean we need to be able to sleep
(proc_pid_flush) after we've unlinked the task.

IMHO, the best solution is to side-step the release_task-on-yourself
case which has been traditionally problematic anyway (remember the
SIGXCPU problem?) and do the release_task from finish_task_switch if the
parent isn't going to do it.  

Ingo, I don't understand this comment in exit_notify;

	/*
	 * Get a reference to it so that we can set the state
	 * as the last step. The state-setting only matters if the
	 * current task is releasing itself, to trigger the final
	 * put_task_struct() in finish_task_switch(). (thread self-reap)
	 */
	get_task_struct(tsk);

The flags, not the state, is checked in finish_task_switch: we must not
hit schedule with the PF_DEAD flag set, but the state should be OK?
I also don't see the need for get_task_struct().  Am I missing
something?

Thanks,
Rusty.

Name: Don't Sleep After We're Out Of Task List
Status: Booted on 2.6.8.1-mm1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (authored)
Version: -mm

Ingo recently accidentally broke CPU hotplug by enabling preemption
around release_task(), which can be called on the current task if the
parent isn't interested.

The problem is, the task can be preempted and then the CPU can go
down: it's not in the task list any more, and so it won't get migrated
after the CPU goes down.  It stays on the down CPU, which triggers a
BUG_ON.

We have had previous problems with tasks releasing themselves:
oprofile has a comment about it, and we had the case of trying to
deliver SIGXCPU in the timer tick to the current task which had called
release_task().  This patch shuffles the self-reaping off to
finish_task_switch, so there's never a running task which isn't in the
task list, except idle threads.

diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .28758-linux-2.6.8.1-mm1/include/linux/sched.h .28758-linux-2.6.8.1-mm1.updated/include/linux/sched.h
--- .28758-linux-2.6.8.1-mm1/include/linux/sched.h	2004-08-17 11:32:33.000000000 +1000
+++ .28758-linux-2.6.8.1-mm1.updated/include/linux/sched.h	2004-08-18 09:34:24.000000000 +1000
@@ -610,6 +610,7 @@ do { if (atomic_dec_and_test(&(tsk)->usa
 #define PF_STARTING	0x00000002	/* being created */
 #define PF_EXITING	0x00000004	/* getting shut down */
 #define PF_DEAD		0x00000008	/* Dead */
+#define PF_SELFREAP	0x00000010	/* Never a zombie, must be released */
 #define PF_FORKNOEXEC	0x00000040	/* forked but didn't exec */
 #define PF_SUPERPRIV	0x00000100	/* used super-user privileges */
 #define PF_DUMPCORE	0x00000200	/* dumped core */
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .28758-linux-2.6.8.1-mm1/kernel/exit.c .28758-linux-2.6.8.1-mm1.updated/kernel/exit.c
--- .28758-linux-2.6.8.1-mm1/kernel/exit.c	2004-08-17 11:32:33.000000000 +1000
+++ .28758-linux-2.6.8.1-mm1.updated/kernel/exit.c	2004-08-18 10:10:26.000000000 +1000
@@ -756,8 +756,8 @@ static void exit_notify(struct task_stru
 	state = TASK_ZOMBIE;
 	if (tsk->exit_signal == -1 && tsk->ptrace == 0)
 		state = TASK_DEAD;
-	else
-		tsk->state = state;
+	tsk->state = state;
+
 	/*
 	 * Clear these here so that update_process_times() won't try to deliver
 	 * itimer, profile or rlimit signals to this task while it is in late exit.
@@ -766,14 +766,6 @@ static void exit_notify(struct task_stru
 	tsk->it_prof_value = 0;
 	tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
 
-	/*
-	 * Get a reference to it so that we can set the state
-	 * as the last step. The state-setting only matters if the
-	 * current task is releasing itself, to trigger the final
-	 * put_task_struct() in finish_task_switch(). (thread self-reap)
-	 */
-	get_task_struct(tsk);
-
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_safe(_p, _n, &ptrace_dead) {
@@ -782,18 +774,12 @@ static void exit_notify(struct task_stru
 		release_task(t);
 	}
 
-	/* If the process is dead, release it - nobody will wait for it */
-	if (state == TASK_DEAD) {
-		release_task(tsk);
-		write_lock_irq(&tasklist_lock);
-		tsk->state = state;
-		_raw_write_unlock(&tasklist_lock);
-		local_irq_enable();
-	} else
-		preempt_disable();
-
+	preempt_disable();
+	/* PF_DEAD says drop ref after we schedule. */
 	tsk->flags |= PF_DEAD;
-	put_task_struct(tsk);
+	/* PF_SELFREAP says there's no parent to wait4() for us. */
+	if (state == TASK_DEAD)
+		tsk->flags |= PF_SELFREAP;
 }
 
 asmlinkage NORET_TYPE void do_exit(long code)
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal .28758-linux-2.6.8.1-mm1/kernel/sched.c .28758-linux-2.6.8.1-mm1.updated/kernel/sched.c
--- .28758-linux-2.6.8.1-mm1/kernel/sched.c	2004-08-17 11:32:34.000000000 +1000
+++ .28758-linux-2.6.8.1-mm1.updated/kernel/sched.c	2004-08-18 09:35:29.000000000 +1000
@@ -1480,8 +1480,11 @@ static void finish_task_switch(task_t *p
 	finish_arch_switch(rq, prev);
 	if (mm)
 		mmdrop(mm);
-	if (unlikely(prev_task_flags & PF_DEAD))
+	if (unlikely(prev_task_flags & PF_DEAD)) {
+		if (prev_task_flags & PF_SELFREAP)
+			release_task(prev);
 		put_task_struct(prev);
+	}
 }
 
 /**


-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


  reply	other threads:[~2004-08-18  1:05 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-16 21:37 2.6.8.1-mm1 Andrew Morton
2004-08-16 21:47 ` 2.6.8.1-mm1 Christoph Hellwig
2004-08-17 13:20   ` 2.6.8.1-mm1 Frediano Ziglio
2004-08-18 23:57   ` 2.6.8.1-mm1 Peter Osterlund
2004-08-19  9:45     ` 2.6.8.1-mm1 Christoph Hellwig
2004-08-20  5:44       ` 2.6.8.1-mm1 Peter Osterlund
2004-08-20  6:03         ` 2.6.8.1-mm1 Christoph Hellwig
2004-08-16 22:30 ` 2.6.8.1-mm1 Bartlomiej Zolnierkiewicz
2004-08-16 21:51   ` 2.6.8.1-mm1 Alan Cox
2004-08-16 23:25 ` 2.6.8.1-mm1 Arkadiusz Miskiewicz
2004-08-16 23:39 ` 2.6.8.1-mm1 Martin J. Bligh
2004-08-17  1:32   ` 2.6.8.1-mm1 Nathan Lynch
2004-08-17  6:59     ` 2.6.8.1-mm1 Sam Ravnborg
2004-08-17  6:25       ` 2.6.8.1-mm1 Martin J. Bligh
2004-08-17  6:38         ` 2.6.8.1-mm1 Andrew Morton
2004-08-17  7:00       ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  7:05         ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  3:07 ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  3:09   ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  3:19     ` 2.6.8.1-mm1 Andrew Morton
2004-08-17  3:41       ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  4:16         ` 2.6.8.1-mm1 Nick Piggin
2004-08-17 14:38       ` 2.6.8.1-mm1 (compile stats) John Cherry
2004-08-17  5:59 ` 2.6.8.1-mm1 Nathan Lynch
2004-08-17  7:19   ` 2.6.8.1-mm1 Rusty Russell
2004-08-17  8:45     ` [patch] new-task-fix.patch, 2.6.8.1-mm1 Ingo Molnar
2004-08-17 11:35       ` Nick Piggin
2004-08-17 11:38   ` 2.6.8.1-mm1 Srivatsa Vaddagiri
2004-08-17 17:53     ` 2.6.8.1-mm1 Andrew Morton
2004-08-18  1:04       ` Rusty Russell [this message]
2004-08-18 17:36         ` 2.6.8.1-mm1 Nathan Lynch
2004-08-17  6:20 ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17  6:40   ` 2.6.8.1-mm1 sam
2004-08-17  7:05 ` 2.6.8.1-mm1 Nathan Lynch
2004-08-17 13:39   ` 2.6.8.1-mm1 Zwane Mwaikambo
2004-08-17 12:54 ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17 14:15   ` 2.6.8.1-mm1 William Lee Irwin III
2004-08-17 21:59 ` ldchk -> arch/arm/Makefile? [Was: 2.6.8.1-mm1] Sam Ravnborg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1092791073.27352.183.camel@bach \
    --to=rusty@rustcorp.com.au \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nathanl@austin.ibm.com \
    --cc=vatsa@in.ibm.com \
    --cc=zwane@linuxpower.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox