All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Roland McGrath <roland@redhat.com>
Cc: Andreas Schwab <schwab@suse.de>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>
Subject: Re: zombie with CLONE_THREAD
Date: Thu, 1 Jul 2004 06:08:34 +0200	[thread overview]
Message-ID: <20040701040834.GC15086@dualathlon.random> (raw)
In-Reply-To: <200407010322.i613MPDr016785@magilla.sf.frob.com>

On Wed, Jun 30, 2004 at 08:22:25PM -0700, Roland McGrath wrote:
> The reason strace hangs in that case is an strace bug.  strace is

As I wrote in my first email I suspected it needed a bug in strace to
trigger it (that's what exposes the kernel bug). But I only focused on
the clear kernel bug ;)

> The kernel bug is that when you kill strace, the non-leader zombie hangs
> around and that keeps the zombie leader around too (which is the one
> you'll see in ps).  The following patch fixes that for me.  I am not
> 100% confident that the locking dance required here doesn't create some
> weird issue (and it certainly seems inefficient how many times the lock
> is released and retaken in this sequence), but maybe 92% sure.

this looks much less obvious than my fix. Instead of fixing
TASK_DEAD like I did, you're actually working around the fact the child
didn't go away when exit_notify was called on it. The child is still
there and you now hack and try to release it by hand anyways. it looks
much less obvious and more risky, and it doesn't look the right fix to
me.

My patch sounds much simpler to me, can you tell me what's the point of
leaving ptrace enabled on a task that already executed do_exit? There's
no point IMHO, such a task will never run again ever and the address
space is long gone. So I believe my fix is prefereable, since it simply
make sure that the tasks that needs to be released with TASK_DEAD will
be actually released.

Infact maybe we should just call ptrace_unlink unconditionally there,
and remove it from release_task, but since the code was working fine for
the leader-thread and the normal processes, I only taken care of
releasing the clones (to keep the patch as simple as possible just in
case I was missing something about ptrace making any sense even after
do_exit has been called).

so maybe on the same lines of my yesterday fix, a cleaner approch could
be this. beware, patch completely untested, only to show you the idea.
Again I'm assuming ptrace makes no sense after do_exit, I don't see how
it could.

I believe the real bug is that we let tsk->ptrace mess with
release_task in the TASK_DEAD case with exit_signal == -1, and what I'm
changing make sure that the task is released when it has been set
TASK_DEAD with exit_signal == -1. There is no point that I can see that
would make sense to leave such tsk->ptrace checks there in exit_notify,
when ptrace cannot make a difference anymore.

--- sles/kernel/exit.c.~1~	2004-06-30 01:41:58.000000000 +0200
+++ sles/kernel/exit.c	2004-07-01 06:01:54.197527656 +0200
@@ -71,8 +71,7 @@ repeat: 
 	spin_lock(&p->proc_lock);
 	proc_dentry = proc_pid_unhash(p);
 	write_lock_irq(&tasklist_lock);
-	if (unlikely(p->ptrace))
-		__ptrace_unlink(p);
+	BUG_ON(p->ptrace);
 	BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
 	__exit_signal(p);
 	__exit_sighand(p);
@@ -734,8 +733,16 @@ static void exit_notify(struct task_stru
 		do_notify_parent(tsk, SIGCHLD);
 	}
 
+	/*
+	 * Give up ptrace here (task is gone and it will never run again),
+	 * this also make sure to release_task() any TASK_DEAD task with
+	 * exit_signal == -1.
+	 */
+	if (unlikely(tsk->ptrace))
+		__ptrace_unlink(tsk);
+
 	state = TASK_ZOMBIE;
-	if (tsk->exit_signal == -1 && tsk->ptrace == 0)
+	if (tsk->exit_signal == -1)
 		state = TASK_DEAD;
 	tsk->state = state;
 	tsk->flags |= PF_DEAD;

  reply	other threads:[~2004-07-01  4:09 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-06-30  6:00 zombie with CLONE_THREAD Andrea Arcangeli
2004-06-30  6:08 ` Andrew Morton
2004-06-30  7:14   ` Roland McGrath
2004-06-30  9:04     ` Andreas Schwab
2004-07-01  3:22       ` Roland McGrath
2004-07-01  4:08         ` Andrea Arcangeli [this message]
2004-07-01  4:42           ` Linus Torvalds
2004-07-01  5:39           ` Roland McGrath
2004-07-01  5:56             ` Linus Torvalds
2004-07-01  7:06               ` Roland McGrath
2004-07-01 14:26                 ` Andrea Arcangeli
2004-07-01 21:33                   ` Roland McGrath
2004-07-01 15:49                 ` Linus Torvalds
2004-07-01 16:23                   ` Andrea Arcangeli
2004-07-01 16:43                     ` Linus Torvalds
2004-07-01 20:27                   ` Roland McGrath
2004-07-01  4:57         ` Linus Torvalds
2004-07-01  7:02           ` Roland McGrath
2004-07-01 16:50             ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040701040834.GC15086@dualathlon.random \
    --to=andrea@suse.de \
    --cc=akpm@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=roland@redhat.com \
    --cc=schwab@suse.de \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.