Problem with exiting threads under NPTL

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Problem with exiting threads under NPTL
@ 2003-12-14  5:25 Petr Vandrovec
  2003-12-14 15:02 ` Martin Schlemmer
  2003-12-14 19:38 ` [patch] " Ingo Molnar
  0 siblings, 2 replies; 23+ messages in thread
From: Petr Vandrovec @ 2003-12-14  5:25 UTC (permalink / raw)
  To: linux-kernel

Hi,
  several times one of our threads ended up as ZOMBIE and
nobody wants to pick him up - even init ignores it. Inspection
of kernel structures revealed that task's exit code is 0,
exit_signal is -1, ptrace is 0 and state is 8 (ZOMBIE).

   Quick look at exit_notify reveals that it should not happen:
if task has exit_signal == -1 and is not ptraced, we should immediate
move to TASK_DEAD and get rid of task as fast as possible.

   But problem is that in do_notify_parent we have code which
sets exit_signal to -1 if parent ignores SIGCHLD, and we call
do_notify_parent() for group leader from release_task() when
last member of thread group exits, without taking any care
about eventual changes in exit_signal field.

   So if some process ignores SIGCHLD, and spawns child process 
which creates additional thread, and main thread in child exits 
before child it created, you'll end up with immortal zombie.

						Best regards,
							Petr Vandrovec
							vandrove@vc.cvut.cz

F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
5     0  3709     1  15   0     0    0 exit   Z    tty3       0:00 [test] <defunct>

Creator of immortal zombies:

#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>

static void* thread(void* dummy) {
	/* Make sure that we exit as second thread */
        sleep(1);
        return NULL;
}

void main(void) {
        int pid;

        signal(SIGCHLD, SIG_IGN);

        pid = fork();
        if (pid == 0) {
                pthread_t p;

                pthread_create(&p, NULL, thread, NULL);
        } else {
                /* Sleep some time so we know that child threads exit before us */
                sleep(2);
                printf("Look for task %d...\n", pid);
        }
}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Problem with exiting threads under NPTL
  2003-12-14  5:25 Problem with exiting threads under NPTL Petr Vandrovec
@ 2003-12-14 15:02 ` Martin Schlemmer
  2003-12-14 19:38 ` [patch] " Ingo Molnar
  1 sibling, 0 replies; 23+ messages in thread
From: Martin Schlemmer @ 2003-12-14 15:02 UTC (permalink / raw)
  To: Petr Vandrovec; +Cc: Linux Kernel Mailing Lists

[-- Attachment #1: Type: text/plain, Size: 693 bytes --]

On Sun, 2003-12-14 at 07:25, Petr Vandrovec wrote:
> Hi,
>   several times one of our threads ended up as ZOMBIE and
> nobody wants to pick him up - even init ignores it. Inspection
> of kernel structures revealed that task's exit code is 0,
> exit_signal is -1, ptrace is 0 and state is 8 (ZOMBIE).
> 

>    So if some process ignores SIGCHLD, and spawns child process 
> which creates additional thread, and main thread in child exits 
> before child it created, you'll end up with immortal zombie.
> 

I can confirm this behavior here, although I must admit I do not
know if the sample code is legal.  Latest glibc from cvs + bk kernel.


Cheers,

-- 
Martin Schlemmer

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [patch] Re: Problem with exiting threads under NPTL
  2003-12-14  5:25 Problem with exiting threads under NPTL Petr Vandrovec
  2003-12-14 15:02 ` Martin Schlemmer
@ 2003-12-14 19:38 ` Ingo Molnar
  2003-12-14 20:38   ` Linus Torvalds
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 19:38 UTC (permalink / raw)
  To: Petr Vandrovec
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Roland McGrath


On Sun, 14 Dec 2003, Petr Vandrovec wrote:

>   several times one of our threads ended up as ZOMBIE and nobody wants
> to pick him up - even init ignores it. Inspection of kernel structures
> revealed that task's exit code is 0, exit_signal is -1, ptrace is 0 and
> state is 8 (ZOMBIE).

ok, we only recently started zapping SIGCHLD==ignore processes via the
detached-thread method. Your testcase shows one hole in this mechanism: if
a process that was created as a 'deatched process' itself creates a
detached thread (NPTL thread), and the thread exits before the thread
group leader, then nobody will release the leader. The patch below fixes
this bug, your testcase now works fine on my box. I've tested both UP and
SMP kernels.

the code is a bit ugly, but it's necessary - a parent can decide _after_
starting the child that it wants to detach it. (by setting SIGCHLD to
SIG_IGN. The testcase doesnt do this.) So the only place where we can
detect the detached-ness of a process is in do_notify_parent().

	Ingo

--- linux/kernel/exit.c.orig	
+++ linux/kernel/exit.c	
@@ -50,6 +50,7 @@ static void __unhash_process(struct task
 void release_task(struct task_struct * p)
 {
 	task_t *leader;
+	int zap_leader = 0;
 	struct dentry *proc_dentry;
  
 	BUG_ON(p->state < TASK_ZOMBIE);
@@ -72,8 +73,16 @@ void release_task(struct task_struct * p
 	 */
 	leader = p->group_leader;
 	if (leader != p && thread_group_empty(leader) &&
-		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1)
+		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1) {
 		do_notify_parent(leader, leader->exit_signal);
+		/*
+		 * If we were the last child thread and the leader has
+		 * exited already, and the leader's parent ignores SIGCHLD,
+		 * then we are the one who should release the leader:
+		 */
+		if (leader->exit_signal == -1)
+			zap_leader = 1;
+	}
 
 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
@@ -88,6 +97,13 @@ void release_task(struct task_struct * p
 	proc_pid_flush(proc_dentry);
 	release_thread(p);
 	put_task_struct(p);
+
+	/*
+	 * Do this outside the tasklist lock. The reference to the
+	 * leader is safe:
+	 */
+	if (zap_leader)
+		release_task(leader);
 }
 
 /* we are using it only for SMP init */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 19:38 ` [patch] " Ingo Molnar
@ 2003-12-14 20:38   ` Linus Torvalds
  2003-12-14 20:45     ` Linus Torvalds
                       ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-12-14 20:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

On Sun, 14 Dec 2003, Ingo Molnar wrote:
>
> the code is a bit ugly, but it's necessary - a parent can decide _after_
> starting the child that it wants to detach it. (by setting SIGCHLD to
> SIG_IGN. The testcase doesnt do this.) So the only place where we can
> detect the detached-ness of a process is in do_notify_parent().

Hmm.. What if "leader->exit_signal" was -1 already _before_ we call
"do_notify_parent()"? In that case we'd never call "do_notify_parent()"
for the leader at all, and we would also not release it outselves, the way
you've done the test.

Or is that case impossible to trigger? Looks a bit like that. But if it
_is_ impossible to trigger (ie exit_signal cannot be -1 for a thread
leader), then why does the current code test for "&& leader->exit_signal
!= -1)" at all?

That code looks fragile as hell. I think you fixed a bug and it might be
the absolutely proper fix, but I'd be happier about it if it was more
obvious what the rules are and _why_ that is the only case that matters..

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 20:38   ` Linus Torvalds
@ 2003-12-14 20:45     ` Linus Torvalds
  2003-12-14 21:02       ` Ingo Molnar
  2003-12-15 15:04       ` Jörn Engel
  2003-12-14 21:06     ` Ingo Molnar
  2003-12-15 23:06     ` Roland McGrath
  2 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-12-14 20:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath



On Sun, 14 Dec 2003, Linus Torvalds wrote:
>
> That code looks fragile as hell. I think you fixed a bug and it might be
> the absolutely proper fix, but I'd be happier about it if it was more
> obvious what the rules are and _why_ that is the only case that matters..

Btw, on another note: to avoid the appearance of recursion, I'd prefer a

	p = leader;
	goto top;

instead of a "release_task(leader);".

I realize that the recursion should be just one deep (the leader of the
leader is itself, and that will stop the thing from going further), but it
looks trivial to avoid it, and any automated source checking tool would be
confused by the apparent recursion.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 20:45     ` Linus Torvalds
@ 2003-12-14 21:02       ` Ingo Molnar
  2003-12-15 15:04       ` Jörn Engel
  1 sibling, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 21:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath


On Sun, 14 Dec 2003, Linus Torvalds wrote:

> Btw, on another note: to avoid the appearance of recursion, I'd prefer a
> 
> 	p = leader;
> 	goto top;
> 
> instead of a "release_task(leader);".
> 
> I realize that the recursion should be just one deep (the leader of the
> leader is itself, and that will stop the thing from going further), but
> it looks trivial to avoid it, and any automated source checking tool
> would be confused by the apparent recursion.

yeah. New (SMP-testbooted) patch attached. I also got rid of the
zap_leader flag. (we can signal leader-zapping via the leader pointer.)

	Ingo

--- linux/kernel/exit.c.orig	
+++ linux/kernel/exit.c	
@@ -51,7 +51,8 @@ void release_task(struct task_struct * p
 {
 	task_t *leader;
 	struct dentry *proc_dentry;
- 
+
+restart:	
 	BUG_ON(p->state < TASK_ZOMBIE);
  
 	atomic_dec(&p->user->processes);
@@ -72,8 +73,20 @@ void release_task(struct task_struct * p
 	 */
 	leader = p->group_leader;
 	if (leader != p && thread_group_empty(leader) &&
-		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1)
+		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1) {
 		do_notify_parent(leader, leader->exit_signal);
+		/*
+		 * If we were the last child thread and the leader has
+		 * exited already, and the leader's parent ignores SIGCHLD,
+		 * then we are the one who should release the leader.
+		 *
+		 * (non-NULL leader triggers the zapping of the leader at
+		 * the end of this function.)
+		 */
+		if (leader->exit_signal != -1)
+			leader = NULL;
+	} else
+		leader = NULL;
 
 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
@@ -88,6 +101,16 @@ void release_task(struct task_struct * p
 	proc_pid_flush(proc_dentry);
 	release_thread(p);
 	put_task_struct(p);
+
+	/*
+	 * Do this outside the tasklist lock. The reference to the
+	 * leader is safe. There's no recursion possible, since
+	 * the leader of the leader is itself:
+	 */
+	if (unlikely(leader)) {
+		p = leader;
+		goto restart;
+	}
 }
 
 /* we are using it only for SMP init */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 20:45     ` Linus Torvalds
  2003-12-14 21:02       ` Ingo Molnar
@ 2003-12-15 15:04       ` Jörn Engel
  1 sibling, 0 replies; 23+ messages in thread
From: Jörn Engel @ 2003-12-15 15:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, Andrew Morton

On Sun, 14 December 2003 12:45:34 -0800, Linus Torvalds wrote:
> 
> Btw, on another note: to avoid the appearance of recursion, I'd prefer a
> 
> 	p = leader;
> 	goto top;
> 
> instead of a "release_task(leader);".
> 
> I realize that the recursion should be just one deep (the leader of the
> leader is itself, and that will stop the thing from going further), but it
> looks trivial to avoid it, and any automated source checking tool would be
> confused by the apparent recursion.

Since you mentioned it - how would you prefer the asct (we need a
better acronym) to detect recursion depth.  Currently, I have those in
a seperate file that should come with the kernel, maybe in
Documentation/recursions or so.  But how about this:

/**
 * RECURSION:	2
 * NAME:	do_recurse
 */
void do_recurse(int recurse)
{
	if (recurse)
		do_recurse(0);
}

Ok, the format is ugly, feel free to pick anything nicer.  But
explicitly stating the recursion depth right where it happens makes
sense to me, as many human readers would like a similar comment
anyway.

Any opinion?

Jörn

-- 
Fools ignore complexity.  Pragmatists suffer it.
Some can avoid it.  Geniuses remove it.
-- Perlis's Programming Proverb #58, SIGPLAN Notices, Sept.  1982

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 20:38   ` Linus Torvalds
  2003-12-14 20:45     ` Linus Torvalds
@ 2003-12-14 21:06     ` Ingo Molnar
  2003-12-14 22:10       ` Linus Torvalds
  2003-12-15 23:06     ` Roland McGrath
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 21:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath


On Sun, 14 Dec 2003, Linus Torvalds wrote:

> That code looks fragile as hell. I think you fixed a bug and it might be
> the absolutely proper fix, but I'd be happier about it if it was more
> obvious what the rules are and _why_ that is the only case that
> matters..

agreed, and we had a fair number of bugs in this area already. But i dont
know whether (or how) it could be made simpler - i think most of the
ugliness is a reflexion of the POSIX semantics. (combined with ptrace
complexities.)

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 21:06     ` Ingo Molnar
@ 2003-12-14 22:10       ` Linus Torvalds
  2003-12-14 22:17         ` Ingo Molnar
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-12-14 22:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

Ingo,

I'm being anal for 2.6.0 again and trying to think of all the
ramifications of the changes, and I found a serious bug in your patch:

Even though the parent ignores SIGCHLD it _can_ be running on another CPU
in "wait4()". And since we drop the tasklist lock before we do the
"release_task()" on the leader, and since the leader is still marked
TASK_ZOMBIE, we may have _two_ different people trying to release it.
First the parent, and then the last thread that still remembers the
leader.

Note that we at this point have not unhashed the thread group leader yet:
we just unhashed the last thread. So we're dropping the task list lock,
and the leader pointer is _not_ safe to keep that way.

We've had that race before, and the fix is to mark the leader TASK_DEAD
while still under the tasklist lock - that way the parent won't see the
soon-to-be-reaped leader even if it were to happen to be in the middle of
a wait4() call and get in while the tasklist lock is released.

So how about this patch? It has this bug fixed, and is otherwise a mix of
your two patches, along with an added sanity check ("the exit_signal for a
thread group leader cannot be -1 until the last thread exited"), which
makes the test make a whole lot more sense in my opinion (the previous
test for "exit_signal != -1" looks nonsensical, and appears to be a
cut-and-paste from somewhere else).

Roland, Petr, comments? Can you see any hole in my logic?

		Linus

----
--- 1.119/kernel/exit.c	Mon Oct 27 11:49:59 2003
+++ edited/kernel/exit.c	Sun Dec 14 14:00:10 2003
@@ -49,9 +49,11 @@

 void release_task(struct task_struct * p)
 {
+	int zap_leader;
 	task_t *leader;
 	struct dentry *proc_dentry;
-
+
+repeat:
 	BUG_ON(p->state < TASK_ZOMBIE);

 	atomic_dec(&p->user->processes);
@@ -70,10 +72,24 @@
 	 * group, and the leader is zombie, then notify the
 	 * group leader's parent process. (if it wants notification.)
 	 */
+	zap_leader = 0;
 	leader = p->group_leader;
-	if (leader != p && thread_group_empty(leader) &&
-		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1)
+	if (leader != p && thread_group_empty(leader) && leader->state == TASK_ZOMBIE) {
+		BUG_ON(leader->exit_signal == -1);
 		do_notify_parent(leader, leader->exit_signal);
+		/*
+		 * If we were the last child thread and the leader has
+		 * exited already, and the leader's parent ignores SIGCHLD,
+		 * then we are the one who should release the leader.
+		 *
+		 * do_notify_parent() will have marked it self-reaping in
+		 * that case.
+		 */
+		if (leader->exit_signal == -1) {
+			zap_leader = 1;
+			leader->state = TASK_DEAD;
+		}
+	}

 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
@@ -88,6 +104,10 @@
 	proc_pid_flush(proc_dentry);
 	release_thread(p);
 	put_task_struct(p);
+
+	p = leader;
+	if (zap_leader)
+		goto repeat;
 }

 /* we are using it only for SMP init */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:10       ` Linus Torvalds
@ 2003-12-14 22:17         ` Ingo Molnar
  2003-12-14 22:32           ` Linus Torvalds
  2003-12-14 22:28         ` Ingo Molnar
  2003-12-15  8:54         ` Arjan van de Ven
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 22:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath


On Sun, 14 Dec 2003, Linus Torvalds wrote:

> Even though the parent ignores SIGCHLD it _can_ be running on another
> CPU in "wait4()". And since we drop the tasklist lock before we do the
> "release_task()" on the leader, and since the leader is still marked
> TASK_ZOMBIE, we may have _two_ different people trying to release it.
> First the parent, and then the last thread that still remembers the
> leader.

are you sure this can happen? eligible_child() does this:

        if (p->exit_signal == -1 && !p->ptrace)
                return 0;

so can the parallel situation really happen?

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:17         ` Ingo Molnar
@ 2003-12-14 22:32           ` Linus Torvalds
  2003-12-15 23:04             ` Roland McGrath
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-12-14 22:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

On Sun, 14 Dec 2003, Ingo Molnar wrote:
>
> are you sure this can happen? eligible_child() does this:

Interesting. That code should have been enough, but we've later on added
extra code into wait_task_zombie to check _exactly_ the same case because
we saw problems.

Hmm.. Maybe that was to protect against concurrent wait4()'ers (which can
happen - the wait case only takes the read lock). Threads can wait for
each others children, after all.

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:32           ` Linus Torvalds
@ 2003-12-15 23:04             ` Roland McGrath
  0 siblings, 0 replies; 23+ messages in thread
From: Roland McGrath @ 2003-12-15 23:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Petr Vandrovec, Kernel Mailing List, Andrew Morton

> On Sun, 14 Dec 2003, Ingo Molnar wrote:
> >
> > are you sure this can happen? eligible_child() does this:
> 
> Interesting. That code should have been enough, but we've later on added
> extra code into wait_task_zombie to check _exactly_ the same case because
> we saw problems.
> 
> Hmm.. Maybe that was to protect against concurrent wait4()'ers (which can
> happen - the wait case only takes the read lock). Threads can wait for
> each others children, after all.

My recollection about the wait_task_* changes was that they were mostly to
address races between concurrent wait4 calls.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:10       ` Linus Torvalds
  2003-12-14 22:17         ` Ingo Molnar
@ 2003-12-14 22:28         ` Ingo Molnar
  2003-12-14 22:45           ` Linus Torvalds
  2003-12-15  8:54         ` Arjan van de Ven
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath


here's yet another variant of the patch. Changes relative to yours:

- it sets zap_leader to 0 outside the critical section.

- it doesnt set leader to TASK_DEAD, exit_signal == -1 should really 
  protect it from sys_wait4() i believe.

- it sets 'p = leader' within the unlikely branch - slightly faster common
  case.

the patch testbooted fine here.

	Ingo

--- linux/kernel/exit.c.orig	
+++ linux/kernel/exit.c	
@@ -50,8 +50,11 @@ static void __unhash_process(struct task
 void release_task(struct task_struct * p)
 {
 	task_t *leader;
+	int zap_leader;
 	struct dentry *proc_dentry;
- 
+
+repeat:
+	zap_leader = 0;
 	BUG_ON(p->state < TASK_ZOMBIE);
  
 	atomic_dec(&p->user->processes);
@@ -72,8 +75,16 @@ void release_task(struct task_struct * p
 	 */
 	leader = p->group_leader;
 	if (leader != p && thread_group_empty(leader) &&
-		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1)
+		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1) {
 		do_notify_parent(leader, leader->exit_signal);
+		/*
+		 * If we were the last child thread and the leader has
+		 * exited already, and the leader's parent ignores SIGCHLD,
+		 * then we are the one who should release the leader.
+		 */
+		if (leader->exit_signal == -1)
+			zap_leader = 1;
+	}
 
 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
@@ -88,6 +99,16 @@ void release_task(struct task_struct * p
 	proc_pid_flush(proc_dentry);
 	release_thread(p);
 	put_task_struct(p);
+
+	/*
+	 * Do this outside the tasklist lock. The reference to the
+	 * leader is safe. There's no recursion possible, since
+	 * the leader of the leader is itself:
+	 */
+	if (unlikely(zap_leader)) {
+		p = leader;
+		goto repeat;
+	}
 }
 
 /* we are using it only for SMP init */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:28         ` Ingo Molnar
@ 2003-12-14 22:45           ` Linus Torvalds
  2003-12-14 23:08             ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2003-12-14 22:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

On Sun, 14 Dec 2003, Ingo Molnar wrote:
>
> - it sets zap_leader to 0 outside the critical section.
>
> - it sets 'p = leader' within the unlikely branch - slightly faster common
>   case.

Note that both of these micro-optimizations are equally likely to _hurt_.

In particular, the bigger the liveness region of a variable is, the more
it adds to register pressure, and the worse code gcc generates.
Initializing variables further away from its use tends to just increase
the likelihood that gcc will have to spill that (or another) variable to
the stack.

So setting variables earlier tends to actually pessimize code (yes, a
perfect compiler wouldn't care, and just move the assignment later. But
for such a perfect compiler it doesn't matter where you put it anyway).

Rule of thumb: don't use variables "early" just to get one instruction out
of a critical region. The cost of a critical region is not the variables
inside of it, but the locking, and using variables early not only has the
potential to screw the compiler, it makes the code less readable.

Use the variable where they logically make sense.

Similarly, straight-line unconditional code without any memory footprint
(ie with only cached reads/writes) tends to be basically "free" on any
modern CPU. In contrast, jumping around sure isn't (and gcc tends to
optimize for the wrong things here).

For this reason, don't do

	if (x) {
		a = ...
	} else {
		a = ...;
	}

if one of the cases is trivial and 'a' is a local variable. It's usually
better to use

	a = ...;
	if (x) {
		a = ...
	}

and avoid the jumps.

In fact, from a performance standpoint, it can often be better to use

	a = ...;
	tmp = ....;
	a = x ? tmp : a;

because that allows the compiler to generate straight-line code with
conditional moves - which it would _not_ be able to do if it decided that
one of the subexpressions might cause faults or other unintended
consequences.

Basically: linearize the code. That's what modern CPU's are good at
handling. It's ok to do a small number of extra "normal" instructions if
you can avoid the nasty kind (jumps etc).

			Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:45           ` Linus Torvalds
@ 2003-12-14 23:08             ` Ingo Molnar
  2003-12-15  6:31               ` dan carpenter
  2003-12-15 23:15               ` Roland McGrath
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2003-12-14 23:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath


On Sun, 14 Dec 2003, Linus Torvalds wrote:

> Note that both of these micro-optimizations are equally likely to
> _hurt_.

agreed. Patch without these changes reverted is attached. (this is pretty
close to your last patch.)

just one nit wrt. the first micro-optimization:

> So setting variables earlier tends to actually pessimize code [...]

in this specific case the critical section only sets the variable, and the
lifetime of the variable necessiates a stackslot anyway, on x86. So moving
the initialization outside the lock shouldnt pessimise the generated code,
in fact it could even free up a register, due to the compiler being able
to do:

  movl $1, 0x12(%esp)

due to the clear register spill, instead of using up a register due to the
initialization being too close to the setting to 1.
 
But i agree, due to worse readability alone it is not worth having this
change. And on non-x86 it could even result in a spilled register.

(the nonlinear code at the end of the function was clearly a bad idea.)

	Ingo

--- kernel/exit.c.orig
+++ kernel/exit.c
@@ -50,8 +50,10 @@ static void __unhash_process(struct task
 void release_task(struct task_struct * p)
 {
 	task_t *leader;
+	int zap_leader;
 	struct dentry *proc_dentry;
- 
+
+repeat: 
 	BUG_ON(p->state < TASK_ZOMBIE);
  
 	atomic_dec(&p->user->processes);
@@ -70,10 +72,22 @@ void release_task(struct task_struct * p
 	 * group, and the leader is zombie, then notify the
 	 * group leader's parent process. (if it wants notification.)
 	 */
+	zap_leader = 0;
 	leader = p->group_leader;
-	if (leader != p && thread_group_empty(leader) &&
-		    leader->state == TASK_ZOMBIE && leader->exit_signal != -1)
+	if (leader != p && thread_group_empty(leader) && leader->state == TASK_ZOMBIE) {
+		BUG_ON(leader->exit_signal == -1);
 		do_notify_parent(leader, leader->exit_signal);
+		/*
+		 * If we were the last child thread and the leader has
+		 * exited already, and the leader's parent ignores SIGCHLD,
+		 * then we are the one who should release the leader.
+		 *
+		 * do_notify_parent() will have marked it self-reaping in
+		 * that case.
+		 */
+		if (leader->exit_signal == -1)
+			zap_leader = 1;
+	}
 
 	p->parent->cutime += p->utime + p->cutime;
 	p->parent->cstime += p->stime + p->cstime;
@@ -88,6 +102,15 @@ void release_task(struct task_struct * p
 	proc_pid_flush(proc_dentry);
 	release_thread(p);
 	put_task_struct(p);
+
+	/*
+	 * Do this outside the tasklist lock. The reference to the
+	 * leader is safe. There's no recursion possible, since
+	 * the leader of the leader is itself:
+	 */
+	p = leader;
+	if (zap_leader)
+		goto repeat;
 }
 
 /* we are using it only for SMP init */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 23:08             ` Ingo Molnar
@ 2003-12-15  6:31               ` dan carpenter
  2003-12-15 11:43                 ` Ingo Molnar
  2003-12-15 15:11                 ` Linus Torvalds
  2003-12-15 23:15               ` Roland McGrath
  1 sibling, 2 replies; 23+ messages in thread
From: dan carpenter @ 2003-12-15  6:31 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

I'm running the patch from BK and I still get unkillable zombies.  I'm running 
strace on these processes and I don't know if that makes a difference.  
Basically, my program is a modified strace that calls the syscalls as well as 
printing it out...  (I'm trying to test the kernel with bogus syscalls).

When I hit ALT-SYSREQ-T most of the zombie processes look like this.

msgctl09      Z 9FF99D73     0  1503   1465          1505  1500 (L-TLB)
c9fc9f68 00000046 c88b49b0 9ff99d73 00000258 c81cea08 c9fc9f68 c012a65b 
       c81ce9b0 00000011 9ff99d73 00000258 c88b49b0 c11e5bc0 000734f9 a150a7ac 
       00000258 c81ce9b0 cbfa7f38 c81cefc4 cbfe8850 00000000 c81ce9b0 c9fc9f90 
Call Trace:
 [<c012a65b>] exit_notify+0x2eb/0x900
 [<c012b08f>] do_exit+0x41f/0x5c0
 [<c012b3d7>] do_group_exit+0x107/0x190
 [<c010aa8f>] syscall_call+0x7/0xb

There are some process that are stuck but not in zombie state that look like 
this.

mknod09       T 00000001     0  1403      1          1420  1394 (NOTLB)
c73adf80 00000082 c11e5bc0 00000001 00000001 c2e91c28 cbfa3f38 858434c0 
       00000256 c35529b0 c11e5bc0 00000ebe 85843808 c11e5bc0 000004c0 858731ea 
       00000256 c62c39b0 08053000 c73ac000 c73ac000 08053000 c73ac000 c73adfa4 
Call Trace:
 [<c0131209>] ptrace_notify+0x49/0xf6
 [<c015c4ac>] sys_brk+0x2c/0x130
 [<c0110cf8>] do_syscall_trace+0x48/0x73
 [<c010ab01>] syscall_trace_entry+0x11/0x24

The CPU usage is normal.

thanks,
dan carpenter


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-15  6:31               ` dan carpenter
@ 2003-12-15 11:43                 ` Ingo Molnar
  2003-12-15 13:07                   ` dan carpenter
  2003-12-15 15:11                 ` Linus Torvalds
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2003-12-15 11:43 UTC (permalink / raw)
  To: dan carpenter
  Cc: Linus Torvalds, Petr Vandrovec, Kernel Mailing List,
	Andrew Morton, Roland McGrath


On Sun, 14 Dec 2003, dan carpenter wrote:

>  [<c012a65b>] exit_notify+0x2eb/0x900
>  [<c012b08f>] do_exit+0x41f/0x5c0
>  [<c012b3d7>] do_group_exit+0x107/0x190
>  [<c010aa8f>] syscall_call+0x7/0xb
> 
> There are some process that are stuck but not in zombie state that look
> like this.

do they go away if you do a kill -CONT on them?

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-15 11:43                 ` Ingo Molnar
@ 2003-12-15 13:07                   ` dan carpenter
  0 siblings, 0 replies; 23+ messages in thread
From: dan carpenter @ 2003-12-15 13:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Petr Vandrovec, Kernel Mailing List,
	Andrew Morton, Roland McGrath

On Monday 15 December 2003 03:43 am, Ingo Molnar wrote:
>
> do they go away if you do a kill -CONT on them?
>
> 	Ingo

Yes, they do.   Thanks.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-15  6:31               ` dan carpenter
  2003-12-15 11:43                 ` Ingo Molnar
@ 2003-12-15 15:11                 ` Linus Torvalds
  1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2003-12-15 15:11 UTC (permalink / raw)
  To: dan carpenter
  Cc: Ingo Molnar, Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

On Sun, 14 Dec 2003, dan carpenter wrote:
>
> I'm running the patch from BK and I still get unkillable zombies.

Note that zombies are normal, and yes, they are _always_ unkillable.
That's why they are called zombies ;)

The only way to get rid of a zombie is to reap it with "wait()", or just
have the parent die (at which point init will do so).

Your case looks like the parent isn't waiting for the zombie, mostly
because the parent is stopped:

> mknod09       T 00000001     0  1403      1          1420  1394 (NOTLB)

and that is easy to make happen with strace if you don't detach from the
thing you are tracing.

You can fix it up by just a "killall -CONT mknod09", and once the parent
continues it will reap the zombie.

			Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 23:08             ` Ingo Molnar
  2003-12-15  6:31               ` dan carpenter
@ 2003-12-15 23:15               ` Roland McGrath
  1 sibling, 0 replies; 23+ messages in thread
From: Roland McGrath @ 2003-12-15 23:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Petr Vandrovec, Kernel Mailing List,
	Andrew Morton

Ingo's final patch looks good to me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 22:10       ` Linus Torvalds
  2003-12-14 22:17         ` Ingo Molnar
  2003-12-14 22:28         ` Ingo Molnar
@ 2003-12-15  8:54         ` Arjan van de Ven
  2003-12-15 22:55           ` Roland McGrath
  2 siblings, 1 reply; 23+ messages in thread
From: Arjan van de Ven @ 2003-12-15  8:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Petr Vandrovec, Kernel Mailing List, Andrew Morton,
	Roland McGrath

[-- Attachment #1: Type: text/plain, Size: 262 bytes --]

On Sun, 2003-12-14 at 23:10, Linus Torvalds wrote:

> Even though the parent ignores SIGCHLD it _can_ be running on another CPU
> in "wait4()".

which fwiw is a case of illegal behavior in the program ... of course
the kernel shouldn't die if it happens.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-15  8:54         ` Arjan van de Ven
@ 2003-12-15 22:55           ` Roland McGrath
  0 siblings, 0 replies; 23+ messages in thread
From: Roland McGrath @ 2003-12-15 22:55 UTC (permalink / raw)
  To: arjanv
  Cc: Linus Torvalds, Ingo Molnar, Petr Vandrovec, Kernel Mailing List,
	Andrew Morton

> On Sun, 2003-12-14 at 23:10, Linus Torvalds wrote:
> 
> > Even though the parent ignores SIGCHLD it _can_ be running on another CPU
> > in "wait4()".
> 
> which fwiw is a case of illegal behavior in the program ... of course
> the kernel shouldn't die if it happens.

No, it is legal to call wait* when ignoring SIGCHLD--and it is required to
return ECHILD for the dead ones.  For example, using waitpid with WNOHANG
is a valid way to poll for the liveness of a child (though there is no good
reason why an application wouldn't just use kill(,0) for that).  It's not a
method that has anything to recommend it, but it is perfectly valid and the
range of permissible results is well-specified.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [patch] Re: Problem with exiting threads under NPTL
  2003-12-14 20:38   ` Linus Torvalds
  2003-12-14 20:45     ` Linus Torvalds
  2003-12-14 21:06     ` Ingo Molnar
@ 2003-12-15 23:06     ` Roland McGrath
  2 siblings, 0 replies; 23+ messages in thread
From: Roland McGrath @ 2003-12-15 23:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Petr Vandrovec, Kernel Mailing List, Andrew Morton

> Or is that case impossible to trigger? Looks a bit like that. But if it
> _is_ impossible to trigger (ie exit_signal cannot be -1 for a thread
> leader), then why does the current code test for "&& leader->exit_signal
> != -1)" at all?

I think that most of the logic testing exit_signal was written when
CLONE_DETACHED could be used independent of CLONE_THREAD.  We concluded
that was unworkable due to some related races and changed clone so that the
situation of a (live) thread group leader with exit_signal == -1 is now
impossible.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2003-12-15 23:15 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-12-14  5:25 Problem with exiting threads under NPTL Petr Vandrovec
2003-12-14 15:02 ` Martin Schlemmer
2003-12-14 19:38 ` [patch] " Ingo Molnar
2003-12-14 20:38   ` Linus Torvalds
2003-12-14 20:45     ` Linus Torvalds
2003-12-14 21:02       ` Ingo Molnar
2003-12-15 15:04       ` Jörn Engel
2003-12-14 21:06     ` Ingo Molnar
2003-12-14 22:10       ` Linus Torvalds
2003-12-14 22:17         ` Ingo Molnar
2003-12-14 22:32           ` Linus Torvalds
2003-12-15 23:04             ` Roland McGrath
2003-12-14 22:28         ` Ingo Molnar
2003-12-14 22:45           ` Linus Torvalds
2003-12-14 23:08             ` Ingo Molnar
2003-12-15  6:31               ` dan carpenter
2003-12-15 11:43                 ` Ingo Molnar
2003-12-15 13:07                   ` dan carpenter
2003-12-15 15:11                 ` Linus Torvalds
2003-12-15 23:15               ` Roland McGrath
2003-12-15  8:54         ` Arjan van de Ven
2003-12-15 22:55           ` Roland McGrath
2003-12-15 23:06     ` Roland McGrath

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox