* zombie with CLONE_THREAD @ 2004-06-30 6:00 Andrea Arcangeli 2004-06-30 6:08 ` Andrew Morton 0 siblings, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2004-06-30 6:00 UTC (permalink / raw) To: linux-kernel; +Cc: Andrew Morton Hello, I debugged a problem with CLONE_THREAD under strace generating zombies that cannot be reaped by init. Basically what's going on is that release_task is never called on the clones, and in turn the parent thread will remain zombie forever because thread_group_empty == 0 (it never notifies init). the group can become empty only after release_task has been called on all the clones. What's going on is that if the clone happen to be under strace by the time it exits its state will not be set to TASK_DEAD and nobody will ever call wait4 on the clone because the parent is being killed at the same time. But the parent cannot go away until the clone goes away too. I believe strace needs as well a little race where it has the sigchld disabled but what I'm discussing here is still a kernel bug generating zombie threads. I think I could have fixed even with a strictier patch (adding a exit_signal == -1 check just to cover that case), but I believe that it makes no sense to leave ptrace enabled on a clone that is being killed, it happens to be safe without a thread-group just because there will be always init able to call wait4->release_task on it, that will call ptrace_unlink later in release_task, same goes for the "leader" of the thread group, that as well can be detached by ptrace via release_task). so far this thing worked fine in practice. Would be nice to hear a comment from somebody who understand this CLONE_THREAD signal mess^Wstuff with ptrace better. --- sles/kernel/exit.c.~1~ 2004-06-30 01:41:58.000000000 +0200 +++ sles/kernel/exit.c 2004-06-30 07:49:21.705154000 +0200 @@ -734,6 +734,13 @@ static void exit_notify(struct task_stru do_notify_parent(tsk, SIGCHLD); } + /* + * To allow the group leader of a thread group to be released + * we must really go away synchronously if exit_signal == -1. + */ + if (unlikely(tsk->ptrace) && tsk != tsk->group_leader) + __ptrace_unlink(tsk); + state = TASK_ZOMBIE; if (tsk->exit_signal == -1 && tsk->ptrace == 0) state = TASK_DEAD; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-06-30 6:00 zombie with CLONE_THREAD Andrea Arcangeli @ 2004-06-30 6:08 ` Andrew Morton 2004-06-30 7:14 ` Roland McGrath 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2004-06-30 6:08 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Linus Torvalds, Roland McGrath Andrea Arcangeli <andrea@suse.de> wrote: > > Hello, > > I debugged a problem with CLONE_THREAD under strace generating zombies > that cannot be reaped by init. > > Basically what's going on is that release_task is never called on the > clones, and in turn the parent thread will remain zombie forever because > thread_group_empty == 0 (it never notifies init). the group can become > empty only after release_task has been called on all the clones. > > What's going on is that if the clone happen to be under strace by the > time it exits its state will not be set to TASK_DEAD and nobody will > ever call wait4 on the clone because the parent is being killed at the > same time. But the parent cannot go away until the clone goes away too. > I believe strace needs as well a little race where it has the sigchld > disabled but what I'm discussing here is still a kernel bug generating > zombie threads. > > I think I could have fixed even with a strictier patch (adding a > exit_signal == -1 check just to cover that case), but I believe that it > makes no sense to leave ptrace enabled on a clone that is being killed, > it happens to be safe without a thread-group just because there will be > always init able to call wait4->release_task on it, that will call > ptrace_unlink later in release_task, same goes for the "leader" of the > thread group, that as well can be detached by ptrace via release_task). > > so far this thing worked fine in practice. Would be nice to hear a > comment from somebody who understand this CLONE_THREAD signal > mess^Wstuff with ptrace better. Maybe Linus or Roland could comment. > --- sles/kernel/exit.c.~1~ 2004-06-30 01:41:58.000000000 +0200 > +++ sles/kernel/exit.c 2004-06-30 07:49:21.705154000 +0200 > @@ -734,6 +734,13 @@ static void exit_notify(struct task_stru > do_notify_parent(tsk, SIGCHLD); > } > > + /* > + * To allow the group leader of a thread group to be released > + * we must really go away synchronously if exit_signal == -1. > + */ > + if (unlikely(tsk->ptrace) && tsk != tsk->group_leader) > + __ptrace_unlink(tsk); > + Should this be using thread_group_leader()? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-06-30 6:08 ` Andrew Morton @ 2004-06-30 7:14 ` Roland McGrath 2004-06-30 9:04 ` Andreas Schwab 0 siblings, 1 reply; 19+ messages in thread From: Roland McGrath @ 2004-06-30 7:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Andrea Arcangeli, linux-kernel, Linus Torvalds That change sure looks incorrect and unnecessary to me. It wasn't clear to me from Andrea's description whether there is really any kernel bug here or not. If some process is ptrace'ing a CLONE_THREAD thread, then the thread should not disappear as this patch makes it do. It becomes a zombie and reports its death status to the ptracer. When the ptracer waits for it, it reverts to its original parent and then the logic in wait_task_zombie should call release_task for the CLONE_THREAD case. Are you saying that if the ptracer dies, it can leave some threads in limbo? I think that case is supposed to work because forget_original_parent will move all the threads ptrace'd by the dying tracer process to be ptrace'd by init, which will then clean up their zombies as previously described. Thanks, Roland ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-06-30 7:14 ` Roland McGrath @ 2004-06-30 9:04 ` Andreas Schwab 2004-07-01 3:22 ` Roland McGrath 0 siblings, 1 reply; 19+ messages in thread From: Andreas Schwab @ 2004-06-30 9:04 UTC (permalink / raw) To: Roland McGrath Cc: Andrew Morton, Andrea Arcangeli, linux-kernel, Linus Torvalds [-- Attachment #1: Type: text/plain, Size: 551 bytes --] Roland McGrath <roland@redhat.com> writes: > Are you saying that if the ptracer dies, it can leave some threads in limbo? > I think that case is supposed to work because forget_original_parent will > move all the threads ptrace'd by the dying tracer process to be ptrace'd by > init, which will then clean up their zombies as previously described. Here is the test case, run it with "strace -f ./clone". When the bug happens then strace is stuck waiting for it's traced child that just died, but you may have to try a few times before it happens. [-- Attachment #2: clone.c --] [-- Type: text/plain, Size: 966 bytes --] #include <stdio.h> #include <signal.h> #include <sched.h> #include <unistd.h> #include <sys/types.h> #include <sys/mman.h> extern int __clone (int (*__fn) (void *__arg), void *__child_stack, int __flags, void *__arg, ...); extern int __clone2 (int (*__fn) (void *__arg), void *__child_stack_base, size_t __child_stack_size, int __flags, void *__arg, ...); static int thread (void *arg) { write (2, "thread\n", sizeof ("thread\n")); *(volatile int *) 0; return 0; } #define STACK_SIZE 1024 * 1024 #define CLONE_FLAGS CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM int main (void) { void *stack = mmap (0, STACK_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0); pid_t pid; #ifdef __ia64__ pid = __clone2 (thread, stack, STACK_SIZE - 64, CLONE_FLAGS, 0); #else pid = __clone (thread, stack + STACK_SIZE - 64, CLONE_FLAGS, 0); #endif printf ("pid = %d\n", pid); sleep (1); return 0; } [-- Attachment #3: Type: text/plain, Size: 228 bytes --] Andreas. -- Andreas Schwab, SuSE Labs, schwab@suse.de SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-06-30 9:04 ` Andreas Schwab @ 2004-07-01 3:22 ` Roland McGrath 2004-07-01 4:08 ` Andrea Arcangeli 2004-07-01 4:57 ` Linus Torvalds 0 siblings, 2 replies; 19+ messages in thread From: Roland McGrath @ 2004-07-01 3:22 UTC (permalink / raw) To: Andrea Arcangeli Cc: Andreas Schwab, Andrew Morton, linux-kernel, Linus Torvalds The reason strace hangs in that case is an strace bug. strace is blocking in a wait4 call on the specific PID of the zombie group leader, which will never report until strace reaps the other non-leader thread it is tracing. But strace won't ever do that, because it's blocked saying it only cares about the zombie leader's PID. You should report that bug in strace. I pity the fool who agreed to maintain that unholy pile of spaghetti--what a maroon! The kernel bug is that when you kill strace, the non-leader zombie hangs around and that keeps the zombie leader around too (which is the one you'll see in ps). The following patch fixes that for me. I am not 100% confident that the locking dance required here doesn't create some weird issue (and it certainly seems inefficient how many times the lock is released and retaken in this sequence), but maybe 92% sure. Thanks, Roland --- linux-2.6.7-mm4/kernel/exit.c.~1~ 2004-06-30 16:29:06.000000000 -0700 +++ linux-2.6.7-mm4/kernel/exit.c 2004-06-30 18:55:36.000000000 -0700 @@ -618,9 +618,21 @@ static inline void forget_original_paren reparent_thread(p, father, 0); } else { ptrace_unlink (p); - if (p->state == TASK_ZOMBIE && p->exit_signal != -1 && - thread_group_empty(p)) - do_notify_parent(p, p->exit_signal); + if (p->state == TASK_ZOMBIE) { + if (p->exit_signal == -1) { + /* + * This was only a zombie because + * we were tracing it. Now it should + * disappear as it would have done + * if we hadn't been tracing it. + */ + write_unlock_irq(&tasklist_lock); + release_task(p); + write_lock_irq(&tasklist_lock); + } + else if (thread_group_empty(p)) + do_notify_parent(p, p->exit_signal); + } } } list_for_each_safe(_p, _n, &father->ptrace_children) { ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 3:22 ` Roland McGrath @ 2004-07-01 4:08 ` Andrea Arcangeli 2004-07-01 4:42 ` Linus Torvalds 2004-07-01 5:39 ` Roland McGrath 2004-07-01 4:57 ` Linus Torvalds 1 sibling, 2 replies; 19+ messages in thread From: Andrea Arcangeli @ 2004-07-01 4:08 UTC (permalink / raw) To: Roland McGrath Cc: Andreas Schwab, Andrew Morton, linux-kernel, Linus Torvalds On Wed, Jun 30, 2004 at 08:22:25PM -0700, Roland McGrath wrote: > The reason strace hangs in that case is an strace bug. strace is As I wrote in my first email I suspected it needed a bug in strace to trigger it (that's what exposes the kernel bug). But I only focused on the clear kernel bug ;) > The kernel bug is that when you kill strace, the non-leader zombie hangs > around and that keeps the zombie leader around too (which is the one > you'll see in ps). The following patch fixes that for me. I am not > 100% confident that the locking dance required here doesn't create some > weird issue (and it certainly seems inefficient how many times the lock > is released and retaken in this sequence), but maybe 92% sure. this looks much less obvious than my fix. Instead of fixing TASK_DEAD like I did, you're actually working around the fact the child didn't go away when exit_notify was called on it. The child is still there and you now hack and try to release it by hand anyways. it looks much less obvious and more risky, and it doesn't look the right fix to me. My patch sounds much simpler to me, can you tell me what's the point of leaving ptrace enabled on a task that already executed do_exit? There's no point IMHO, such a task will never run again ever and the address space is long gone. So I believe my fix is prefereable, since it simply make sure that the tasks that needs to be released with TASK_DEAD will be actually released. Infact maybe we should just call ptrace_unlink unconditionally there, and remove it from release_task, but since the code was working fine for the leader-thread and the normal processes, I only taken care of releasing the clones (to keep the patch as simple as possible just in case I was missing something about ptrace making any sense even after do_exit has been called). so maybe on the same lines of my yesterday fix, a cleaner approch could be this. beware, patch completely untested, only to show you the idea. Again I'm assuming ptrace makes no sense after do_exit, I don't see how it could. I believe the real bug is that we let tsk->ptrace mess with release_task in the TASK_DEAD case with exit_signal == -1, and what I'm changing make sure that the task is released when it has been set TASK_DEAD with exit_signal == -1. There is no point that I can see that would make sense to leave such tsk->ptrace checks there in exit_notify, when ptrace cannot make a difference anymore. --- sles/kernel/exit.c.~1~ 2004-06-30 01:41:58.000000000 +0200 +++ sles/kernel/exit.c 2004-07-01 06:01:54.197527656 +0200 @@ -71,8 +71,7 @@ repeat: spin_lock(&p->proc_lock); proc_dentry = proc_pid_unhash(p); write_lock_irq(&tasklist_lock); - if (unlikely(p->ptrace)) - __ptrace_unlink(p); + BUG_ON(p->ptrace); BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children)); __exit_signal(p); __exit_sighand(p); @@ -734,8 +733,16 @@ static void exit_notify(struct task_stru do_notify_parent(tsk, SIGCHLD); } + /* + * Give up ptrace here (task is gone and it will never run again), + * this also make sure to release_task() any TASK_DEAD task with + * exit_signal == -1. + */ + if (unlikely(tsk->ptrace)) + __ptrace_unlink(tsk); + state = TASK_ZOMBIE; - if (tsk->exit_signal == -1 && tsk->ptrace == 0) + if (tsk->exit_signal == -1) state = TASK_DEAD; tsk->state = state; tsk->flags |= PF_DEAD; ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 4:08 ` Andrea Arcangeli @ 2004-07-01 4:42 ` Linus Torvalds 2004-07-01 5:39 ` Roland McGrath 1 sibling, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 4:42 UTC (permalink / raw) To: Andrea Arcangeli Cc: Roland McGrath, Andreas Schwab, Andrew Morton, linux-kernel On Thu, 1 Jul 2004, Andrea Arcangeli wrote: > > My patch sounds much simpler to me, can you tell me what's the point of > leaving ptrace enabled on a task that already executed do_exit? To let the tracer look at the exit code? How would you otherwise see what exit code the child exited with? Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 4:08 ` Andrea Arcangeli 2004-07-01 4:42 ` Linus Torvalds @ 2004-07-01 5:39 ` Roland McGrath 2004-07-01 5:56 ` Linus Torvalds 1 sibling, 1 reply; 19+ messages in thread From: Roland McGrath @ 2004-07-01 5:39 UTC (permalink / raw) To: Andrea Arcangeli Cc: Andreas Schwab, Andrew Morton, linux-kernel, Linus Torvalds > this looks much less obvious than my fix. Instead of fixing > TASK_DEAD like I did, you're actually working around the fact the child > didn't go away when exit_notify was called on it. No, I am preserving the feature that the child doesn't go away in this case. ptraced threads always become zombies and let the ptracer see their exit notification and status value. That is the way we want it to stay. Linus makes the same point: > To let the tracer look at the exit code? > > How would you otherwise see what exit code the child exited with? In fact, the exit code is usually completely uninteresting for a CLONE_THREAD thread (after all, ptrace is the *only* way to see that value, so the _exit call didn't expect to pass useful information that way). However, the reliable notification of the fact that the thread died is very useful for anything tracing/debugging it. Thanks, Roland ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 5:39 ` Roland McGrath @ 2004-07-01 5:56 ` Linus Torvalds 2004-07-01 7:06 ` Roland McGrath 0 siblings, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 5:56 UTC (permalink / raw) To: Roland McGrath Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel On Wed, 30 Jun 2004, Roland McGrath wrote: > > No, I am preserving the feature that the child doesn't go away in this case. > ptraced threads always become zombies and let the ptracer see their exit > notification and status value. That is the way we want it to stay. Umm.. This is not the "ptrace_list". This is the _regular_ child list. Which means that a bad person can try to: - have "normal" children that are self-reaping. - _also_ have a self-reaping ptraced child. Now those _normal_ children may go away, no? > Linus makes the same point: > > To let the tracer look at the exit code? > > > > How would you otherwise see what exit code the child exited with? > > In fact, the exit code is usually completely uninteresting for a > CLONE_THREAD thread (after all, ptrace is the *only* way to see that value, > so the _exit call didn't expect to pass useful information that way). Hmm. That would argue for Andrea's patch. > However, the reliable notification of the fact that the thread died is very > useful for anything tracing/debugging it. .. since this information should be available anyway (we'll have woken up the tracer, and the tracer will see that the child is gone by simply seeing the ESRCH errorcode from ptrace). Wouldn't that be reliable enough? So now I'm starting to lean towards Andrea's patch after all.. (Let no-one say that I don't change my mind when presented with good arguments. Some people complain that I change my mind _too_ often ;) Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 5:56 ` Linus Torvalds @ 2004-07-01 7:06 ` Roland McGrath 2004-07-01 14:26 ` Andrea Arcangeli 2004-07-01 15:49 ` Linus Torvalds 0 siblings, 2 replies; 19+ messages in thread From: Roland McGrath @ 2004-07-01 7:06 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel > On Wed, 30 Jun 2004, Roland McGrath wrote: > > > > No, I am preserving the feature that the child doesn't go away in this case. > > ptraced threads always become zombies and let the ptracer see their exit > > notification and status value. That is the way we want it to stay. > > Umm.. This is not the "ptrace_list". This is the _regular_ child list. The ptrace_list/ptrace_children list is the list of your natural children that someone else has stolen by tracing them. The "regular" child list is all your natural children that noone has stolen, plus the ones you are tracing. We are talking here about an element on your "regular" child list that is one you have stolen by tracing it, not a natural child. If it is a natural child, whether or not you are tracing it, this is not the case we have been addressing so far. > Which means that a bad person can try to: > - have "normal" children that are self-reaping. > - _also_ have a self-reaping ptraced child. > > Now those _normal_ children may go away, no? I think you are talking about this case: #include <signal.h> #include <unistd.h> #include <sys/ptrace.h> int main (void) { signal(SIGCHLD,SIG_IGN); switch (fork()) { case -1:perror("fork");return 2; case 0: ptrace(PTRACE_TRACEME, 0, (char *) 1, 0); _exit(2); default: sleep(2); return 0; } } This program leaves a leaked zombie around. That is fixed by handling the case in reparent_thread where it possibly calls do_notify_parent in the same way as the forget_original_parent case. Not surprising, as both places have the same existing code to handle the same issue--and both overlook the same case. I've just tested a version of my prior patch that covers this case as well, and it works. I can give you either the lock-reacquiring version of that or the version based on the list-collection patch I just posted. > .. since this information should be available anyway (we'll have woken up > the tracer, and the tracer will see that the child is gone by simply > seeing the ESRCH errorcode from ptrace). When did you wake up the tracer? I don't see how that happened. If the tracer is blocked in a wait4 call and still has other live children, it stays blocked. Next time it wakes up for some other reason, it can poll via a wait4 or ptrace call for each specific thread it knows it was attached to and ascertain through ESRCH errors when one died. Having to do that sucks ass. Thanks, Roland ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 7:06 ` Roland McGrath @ 2004-07-01 14:26 ` Andrea Arcangeli 2004-07-01 21:33 ` Roland McGrath 2004-07-01 15:49 ` Linus Torvalds 1 sibling, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2004-07-01 14:26 UTC (permalink / raw) To: Roland McGrath Cc: Linus Torvalds, Andreas Schwab, Andrew Morton, linux-kernel On Thu, Jul 01, 2004 at 12:06:51AM -0700, Roland McGrath wrote: > When did you wake up the tracer? I don't see how that happened. If the > tracer is blocked in a wait4 call and still has other live children, it > stays blocked. Next time it wakes up for some other reason, it can poll the wait4 in strace needing to see the "stolen child" in its ->children list is the only good reason for deferring the release_task of the otherwise self-reaping tasks, until the killing of strace itself. You ignored my patch completely in your reply, so it wasn't clear to me if something was wrong with it in terms of stracing and so I asked what's the point of the "deferring", now it's clear. The thing I figured out was that the deferring of the release_task for self-reaping clones (the one I incidentally tried to remove ;) is what generates the buggy code since things goes a bit complex there and we need to make them even a bit more complex with your fix to preserve the feature (I now agree it is really a feature). I think I understood this code now, thanks a lot for the help. What about this? (fixes both testcases for me) --- sles/kernel/exit.c.~1~ 2004-06-30 01:41:58.000000000 +0200 +++ sles/kernel/exit.c 2004-07-01 16:19:13.207742224 +0200 @@ -596,7 +596,8 @@ static inline void reparent_thread(task_ * group, and if no such member exists, give it to * the global child reaper process (ie "init") */ -static inline void forget_original_parent(struct task_struct * father) +static inline void forget_original_parent(struct task_struct * father, + struct list_head *to_release) { struct task_struct *p, *reaper = father; struct list_head *_p, *_n; @@ -614,16 +615,34 @@ static inline void forget_original_paren * Search them and reparent children. */ list_for_each_safe(_p, _n, &father->children) { + int ptrace; p = list_entry(_p,struct task_struct,sibling); + + ptrace = p->ptrace; + + /* if father isn't the real parent, then ptrace must be enabled */ + BUG_ON(father != p->real_parent && !ptrace); + if (father == p->real_parent) { + /* reparent with a reaper, real father it's us */ choose_new_parent(p, reaper, child_reaper); reparent_thread(p, father, 0); } else { - ptrace_unlink (p); + /* reparent ptraced task to its real parent */ + __ptrace_unlink (p); if (p->state == TASK_ZOMBIE && p->exit_signal != -1 && thread_group_empty(p)) do_notify_parent(p, p->exit_signal); } + + /* + * if the ptraced child is a zombie with exit_signal == -1 + * we must collect it before we exit, or it will remain + * zombie forever since we prevented it from self-reap itself + * while it was being traced by us, to be able to see it in wait4. + */ + if (unlikely(ptrace && p->state == TASK_ZOMBIE && p->exit_signal == -1)) + list_add(&p->ptrace_list, to_release); } list_for_each_safe(_p, _n, &father->ptrace_children) { p = list_entry(_p,struct task_struct,ptrace_list); @@ -640,6 +659,7 @@ static void exit_notify(struct task_stru { int state; struct task_struct *t; + struct list_head ptrace_dead, *_p, *_n; ckrm_cb_exit(tsk); @@ -677,8 +697,10 @@ static void exit_notify(struct task_stru * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) */ - forget_original_parent(tsk); + INIT_LIST_HEAD(&ptrace_dead); + forget_original_parent(tsk, &ptrace_dead); BUG_ON(!list_empty(&tsk->children)); + BUG_ON(!list_empty(&tsk->ptrace_children)); /* * Check to see if any process groups have become orphaned @@ -763,6 +785,12 @@ static void exit_notify(struct task_stru _raw_write_unlock(&tasklist_lock); local_irq_enable(); + list_for_each_safe(_p, _n, &ptrace_dead) { + list_del_init(_p); + t = list_entry(_p,struct task_struct,ptrace_list); + release_task(t); + } + /* If the process is dead, release it - nobody will wait for it */ if (state == TASK_DEAD) release_task(tsk); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 14:26 ` Andrea Arcangeli @ 2004-07-01 21:33 ` Roland McGrath 0 siblings, 0 replies; 19+ messages in thread From: Roland McGrath @ 2004-07-01 21:33 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Andreas Schwab, Andrew Morton, linux-kernel Here is the full version of my patch, in case you care. It has the same effect as Andrea's variant. But I think this one doesn't introduce any new tests into the hot code path, which that one does. Thanks, Roland Signed-off-by: Roland McGrath <roland@redhat.com> --- linux-2.6.7-mm4/kernel/exit.c.~1~ 2004-06-30 16:29:06.000000000 -0700 +++ linux-2.6.7-mm4/kernel/exit.c 2004-07-01 14:23:36.000000000 -0700 @@ -537,7 +537,8 @@ static inline void choose_new_parent(tas BUG(); } -static inline void reparent_thread(task_t *p, task_t *father, int traced) +static inline void reparent_thread(task_t *p, task_t *father, int traced, + struct list_head *to_release) { /* We don't want people slaying init. */ if (p->exit_signal != -1) @@ -566,9 +567,12 @@ static inline void reparent_thread(task_ /* If we'd notified the old parent about this child's death, * also notify the new parent. */ - if (p->state == TASK_ZOMBIE && p->exit_signal != -1 && - thread_group_empty(p)) - do_notify_parent(p, p->exit_signal); + if (p->state == TASK_ZOMBIE) { + if (p->exit_signal == -1) + list_add(&p->ptrace_list, to_release); + else if (thread_group_empty(p)) + do_notify_parent(p, p->exit_signal); + } } /* @@ -594,7 +598,8 @@ static inline void reparent_thread(task_ * group, and if no such member exists, give it to * the global child reaper process (ie "init") */ -static inline void forget_original_parent(struct task_struct * father) +static inline void forget_original_parent(struct task_struct * father, + struct list_head *to_release) { struct task_struct *p, *reaper = father; struct list_head *_p, *_n; @@ -615,18 +620,29 @@ static inline void forget_original_paren p = list_entry(_p,struct task_struct,sibling); if (father == p->real_parent) { choose_new_parent(p, reaper, child_reaper); - reparent_thread(p, father, 0); + reparent_thread(p, father, 0, to_release); } else { - ptrace_unlink (p); - if (p->state == TASK_ZOMBIE && p->exit_signal != -1 && - thread_group_empty(p)) - do_notify_parent(p, p->exit_signal); + BUG_ON(!p->ptrace); + __ptrace_unlink (p); + if (p->state == TASK_ZOMBIE) { + if (p->exit_signal == -1) { + /* + * This was only a zombie because + * we were tracing it. Now it should + * disappear as it would have done + * if we hadn't been tracing it. + */ + list_add(&p->ptrace_list, to_release); + } + else if (thread_group_empty(p)) + do_notify_parent(p, p->exit_signal); + } } } list_for_each_safe(_p, _n, &father->ptrace_children) { p = list_entry(_p,struct task_struct,ptrace_list); choose_new_parent(p, reaper, child_reaper); - reparent_thread(p, father, 1); + reparent_thread(p, father, 1, to_release); } } @@ -638,6 +654,7 @@ static void exit_notify(struct task_stru { int state; struct task_struct *t; + struct list_head ptrace_dead, *_p, *_n; if (signal_pending(tsk) && !tsk->signal->group_exit && !thread_group_empty(tsk)) { @@ -673,7 +690,8 @@ static void exit_notify(struct task_stru * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) */ - forget_original_parent(tsk); + INIT_LIST_HEAD(&ptrace_dead); + forget_original_parent(tsk, &ptrace_dead); BUG_ON(!list_empty(&tsk->children)); /* @@ -759,6 +777,12 @@ static void exit_notify(struct task_stru _raw_write_unlock(&tasklist_lock); local_irq_enable(); + list_for_each_safe(_p, _n, &ptrace_dead) { + list_del_init(_p); + t = list_entry(_p,struct task_struct,ptrace_list); + release_task(t); + } + /* If the process is dead, release it - nobody will wait for it */ if (state == TASK_DEAD) release_task(tsk); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 7:06 ` Roland McGrath 2004-07-01 14:26 ` Andrea Arcangeli @ 2004-07-01 15:49 ` Linus Torvalds 2004-07-01 16:23 ` Andrea Arcangeli 2004-07-01 20:27 ` Roland McGrath 1 sibling, 2 replies; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 15:49 UTC (permalink / raw) To: Roland McGrath Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel On Thu, 1 Jul 2004, Roland McGrath wrote: > > > .. since this information should be available anyway (we'll have woken up > > the tracer, and the tracer will see that the child is gone by simply > > seeing the ESRCH errorcode from ptrace). > > When did you wake up the tracer? I don't see how that happened. exit_notify() will inform the tracer: if (tsk->exit_signal != -1 && thread_group_empty(tsk)) { int signal = tsk->parent == tsk->real_parent ? tsk->exit_signal : SIGCHLD; do_notify_parent(tsk, signal); } else if (tsk->ptrace) { *** do_notify_parent(tsk, SIGCHLD); ***** } so this should catch it. It even gets the pid of the child in the siginfo structure if it really wants to see that.. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 15:49 ` Linus Torvalds @ 2004-07-01 16:23 ` Andrea Arcangeli 2004-07-01 16:43 ` Linus Torvalds 2004-07-01 20:27 ` Roland McGrath 1 sibling, 1 reply; 19+ messages in thread From: Andrea Arcangeli @ 2004-07-01 16:23 UTC (permalink / raw) To: Linus Torvalds Cc: Roland McGrath, Andreas Schwab, Andrew Morton, linux-kernel On Thu, Jul 01, 2004 at 08:49:10AM -0700, Linus Torvalds wrote: > > > On Thu, 1 Jul 2004, Roland McGrath wrote: > > > > > .. since this information should be available anyway (we'll have woken up > > > the tracer, and the tracer will see that the child is gone by simply > > > seeing the ESRCH errorcode from ptrace). > > > > When did you wake up the tracer? I don't see how that happened. > > exit_notify() will inform the tracer: > > if (tsk->exit_signal != -1 && thread_group_empty(tsk)) { > int signal = tsk->parent == tsk->real_parent ? tsk->exit_signal : SIGCHLD; > do_notify_parent(tsk, signal); > } else if (tsk->ptrace) { > *** do_notify_parent(tsk, SIGCHLD); ***** > } > > so this should catch it. It even gets the pid of the child in the siginfo > structure if it really wants to see that.. it will get the wakeup, but that doesn't mean wait4 will return if it's waiting for all childs. Now I don't know but with strace it may even be ok by luck, but it certainly changes the semantics of wait4/ptrace at least a little and I could remotely imagine if somebody wants a wait4 to return when a ptraced child exited, if there are other "regular" children. That's why I admitted it can be considered a feature and not a completely worthless effort to leave self-reaping tasks as zombies if they're ptraced. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 16:23 ` Andrea Arcangeli @ 2004-07-01 16:43 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 16:43 UTC (permalink / raw) To: Andrea Arcangeli Cc: Roland McGrath, Andreas Schwab, Andrew Morton, linux-kernel On Thu, 1 Jul 2004, Andrea Arcangeli wrote: > > That's why I admitted it can be considered a feature and not a > completely worthless effort to leave self-reaping tasks as zombies if > they're ptraced. Hey, if you both agree, I'm not going to complain. Applied. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 15:49 ` Linus Torvalds 2004-07-01 16:23 ` Andrea Arcangeli @ 2004-07-01 20:27 ` Roland McGrath 1 sibling, 0 replies; 19+ messages in thread From: Roland McGrath @ 2004-07-01 20:27 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel > It even gets the pid of the child in the siginfo > structure if it really wants to see that.. Unless there were two deaths quickly before it got scheduled to run its signal handler. Then it only finds out one of the relevant PIDs. Thanks, Roland ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 3:22 ` Roland McGrath 2004-07-01 4:08 ` Andrea Arcangeli @ 2004-07-01 4:57 ` Linus Torvalds 2004-07-01 7:02 ` Roland McGrath 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 4:57 UTC (permalink / raw) To: Roland McGrath Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel On Wed, 30 Jun 2004, Roland McGrath wrote: > > The following patch fixes that for me. I am not 100% confident that the > locking dance required here doesn't create some weird issue (and it > certainly seems inefficient how many times the lock is released and > retaken in this sequence), but maybe 92% sure. I do think the locking is broken in your patch. Since you release the tasklist lock, the children on our list of children might go away while you released the lock, making the list_for_each_safe(..) thing unsafe - the next entry in the list (that we have cached in "next") may have gone away, resulting in us using a stale pointer when we re-start the loop after re-aquiring the lock. HOWEVER, I think you can fix it with something like _n = father->children.next; after you've re-aquired the lock (that will re-start the loop, but since we should have gotten rid of all the previous entries, the "restart" is actually going to just continue at the point where we were going to continue anyway, so it shouldn't cause any extra iterations). Does that still work for you, or have I totally messed up? I do agree with Andrea that it's ugly, and my patch just makes it uglier still. I wonder if there is some cleaner way to do the same thing. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 4:57 ` Linus Torvalds @ 2004-07-01 7:02 ` Roland McGrath 2004-07-01 16:50 ` Linus Torvalds 0 siblings, 1 reply; 19+ messages in thread From: Roland McGrath @ 2004-07-01 7:02 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel > I do think the locking is broken in your patch. I was afraid of that. > Since you release the tasklist lock, the children on our list of children > might go away while you released the lock, making the > > list_for_each_safe(..) You know, I never really looked at the macros, but seeing "_safe" here made me think that exactly this is what it's safe from. That was obviously a silly thing to think, since it's clearly just safe from removing the list element during the iteration. > HOWEVER, I think you can fix it with something like > > _n = father->children.next; > > after you've re-aquired the lock (that will re-start the loop, but since > we should have gotten rid of all the previous entries, the "restart" is > actually going to just continue at the point where we were going to > continue anyway, so it shouldn't cause any extra iterations). > > Does that still work for you, or have I totally messed up? That does still work. > I do agree with Andrea that it's ugly, and my patch just makes it uglier > still. I wonder if there is some cleaner way to do the same thing. Well, which thing? As to this locking issue, something I was considering to reduce that dance was changing release_task to take a flag saying the caller already holds some lock. That would be to make things hold the tasklist_lock for longer stretches than it does now, which might not be what we wnat. I can think of two approaches to simply avoid calling release_task inside that loop in forget_original_parent, and so have the locking issue to contend with, if that is what you mean. First, queue them on a list for calling release_task later when life is calmer. The following patch works: --- linux-2.6.7-mm4/kernel/exit.c.~1~ 2004-06-30 16:29:06.000000000 -0700 +++ linux-2.6.7-mm4/kernel/exit.c 2004-06-30 23:03:57.000000000 -0700 @@ -594,7 +594,8 @@ static inline void reparent_thread(task_ * group, and if no such member exists, give it to * the global child reaper process (ie "init") */ -static inline void forget_original_parent(struct task_struct * father) +static inline void forget_original_parent(struct task_struct * father, + struct list_head *to_release) { struct task_struct *p, *reaper = father; struct list_head *_p, *_n; @@ -618,9 +619,19 @@ static inline void forget_original_paren reparent_thread(p, father, 0); } else { ptrace_unlink (p); - if (p->state == TASK_ZOMBIE && p->exit_signal != -1 && - thread_group_empty(p)) - do_notify_parent(p, p->exit_signal); + if (p->state == TASK_ZOMBIE) { + if (p->exit_signal == -1) { + /* + * This was only a zombie because + * we were tracing it. Now it should + * disappear as it would have done + * if we hadn't been tracing it. + */ + list_add(&p->ptrace_list, to_release); + } + else if (thread_group_empty(p)) + do_notify_parent(p, p->exit_signal); + } } } list_for_each_safe(_p, _n, &father->ptrace_children) { @@ -638,6 +649,7 @@ static void exit_notify(struct task_stru { int state; struct task_struct *t; + struct list_head ptrace_dead, *_p, *_n; if (signal_pending(tsk) && !tsk->signal->group_exit && !thread_group_empty(tsk)) { @@ -673,7 +685,8 @@ static void exit_notify(struct task_stru * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) */ - forget_original_parent(tsk); + INIT_LIST_HEAD(&ptrace_dead); + forget_original_parent(tsk, &ptrace_dead); BUG_ON(!list_empty(&tsk->children)); /* @@ -759,6 +772,12 @@ static void exit_notify(struct task_stru _raw_write_unlock(&tasklist_lock); local_irq_enable(); + list_for_each_safe(_p, _n, &ptrace_dead) { + list_del_init(_p); + t = list_entry(_p,struct task_struct,ptrace_list); + release_task(t); + } + /* If the process is dead, release it - nobody will wait for it */ if (state == TASK_DEAD) release_task(tsk); The second approach to that is to have some other thread call release_task for you. The benefit there would be no change whatsoever in the main exit_notify code path, the only new code in the exit path being in just this one unusual case. To make init do that reaping in the normal course of things might be a pain. The thread would have to be fully divorced from its thread group and made a normal zombie (i.e. only one in its own thread group) in its own right. Tweaking the pid hashes to do that entails its own locking nightmare. If not init, it could be some random kernel service thread, but I don't see what existing thread would want to do such a thing. Both of those are probably worse in real costs than the ugliness of dropping and reacquiring the lock around calling release_task in the loop. If you are talking about reorganizing exit handling in a larger sense not to have this kind of trouble, then that would take more thought than I am going to give it tonight. Thanks, Roland ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: zombie with CLONE_THREAD 2004-07-01 7:02 ` Roland McGrath @ 2004-07-01 16:50 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2004-07-01 16:50 UTC (permalink / raw) To: Roland McGrath Cc: Andrea Arcangeli, Andreas Schwab, Andrew Morton, linux-kernel On Thu, 1 Jul 2004, Roland McGrath wrote: > > > > list_for_each_safe(..) > > You know, I never really looked at the macros, but seeing "_safe" here made > me think that exactly this is what it's safe from. That was obviously a > silly thing to think, since it's clearly just safe from removing the list > element during the iteration. Right. I'd love to say it's "always safe", but obviously that is not technically possible. I agree that the name is misleading, and it is more about "can_delete" than about being "safe". Anyway, I applied your/Andrea's patch (I happened to read email backwards this morning, so I saw Andrea's first). Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2004-07-01 21:34 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-06-30 6:00 zombie with CLONE_THREAD Andrea Arcangeli 2004-06-30 6:08 ` Andrew Morton 2004-06-30 7:14 ` Roland McGrath 2004-06-30 9:04 ` Andreas Schwab 2004-07-01 3:22 ` Roland McGrath 2004-07-01 4:08 ` Andrea Arcangeli 2004-07-01 4:42 ` Linus Torvalds 2004-07-01 5:39 ` Roland McGrath 2004-07-01 5:56 ` Linus Torvalds 2004-07-01 7:06 ` Roland McGrath 2004-07-01 14:26 ` Andrea Arcangeli 2004-07-01 21:33 ` Roland McGrath 2004-07-01 15:49 ` Linus Torvalds 2004-07-01 16:23 ` Andrea Arcangeli 2004-07-01 16:43 ` Linus Torvalds 2004-07-01 20:27 ` Roland McGrath 2004-07-01 4:57 ` Linus Torvalds 2004-07-01 7:02 ` Roland McGrath 2004-07-01 16:50 ` Linus Torvalds
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.