public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* More waitpid issues with CLONE_DETACHED/CLONE_THREAD
@ 2004-02-01  3:25 Daniel Jacobowitz
  2004-02-01  4:38 ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Jacobowitz @ 2004-02-01  3:25 UTC (permalink / raw)
  To: linux-kernel

This may be related to the python bug reported today...

I've been playing around with gdbserver support for NPTL threading all day
today.  Right now it works, except that when I say "kill" in the GDB client,
gdbserver hangs.  The problem is that we kill the child, and wait for it,
but wait never returns it.

write(2, "Killing inferior\n", 17)      = 17
ptrace(PTRACE_CONT, 8454, 0, SIG_0)     = 0
tkill(8454, SIGKILL)                    = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)
nanosleep({0, 1000000}, 0)              = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
setup()                                 = 0
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)

and so on (looping on waitpid).  At this point, process 8454 is marked as a
zombie, and nothing can reap it.  After gdbserver is killed, it reparents to
init:

Name:   linux-dp
State:  Z (zombie)
SleepAVG:       91%
Tgid:   8454
Pid:    8454
PPid:   1
TracerPid:      0

but init can't reap it either.

8454 was the original (i.e. non-CLONE_DETACHED) thread.  Same behavior if I
use ptrace_kill.

GDB doesn't suffer from the same problem.  A little time with strace and I
found out why: GDB PTRACE_KILL's the detached threads, PTRACE_KILL's the
parent thread, waitpid's the detached threads, and then waitpid's the parent
thread.  No design, just different order of items on the linked list.

If I change gdbserver to do "kill thread; wait for thread; kill next thread;
wait for next thread; kill parent last; wait for parent last" then it
terminates and I don't get an unkillable zombie.

ptrace(PTRACE_KILL, 18348, 0, 0)        = 0
waitpid(18348, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(18348, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18348
ptrace(PTRACE_KILL, 18349, 0, 0)        = 0
waitpid(18349, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18349, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18349
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18350, 0, 0)        = 0
waitpid(18350, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18350, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18350
ptrace(PTRACE_KILL, 18351, 0, 0)        = 0
waitpid(18351, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18351, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18351
ptrace(PTRACE_KILL, 18352, 0, 0)        = 0
waitpid(18352, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18352, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18352
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18329, 0, 0)        = 0
waitpid(18329, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG) = 18329
exit_group(0)                           = ?

So it looks like something gets very confused if the parent is SIGKILLed
before the children.  What should happen?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-02-05  4:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-01  3:25 More waitpid issues with CLONE_DETACHED/CLONE_THREAD Daniel Jacobowitz
2004-02-01  4:38 ` Linus Torvalds
2004-02-01  4:43   ` Daniel Jacobowitz
2004-02-01  5:12     ` Daniel Jacobowitz
2004-02-01 21:41       ` Linus Torvalds
2004-02-01 22:25         ` Roland McGrath
2004-02-02  0:55           ` Linus Torvalds
2004-02-02  2:20             ` Andries Brouwer
2004-02-02  2:30               ` Linus Torvalds
2004-02-02  0:52         ` Daniel Jacobowitz
2004-02-02  2:41           ` Davide Libenzi
2004-02-02  2:55             ` Davide Libenzi
2004-02-04 14:22               ` fs/eventpoll : reduce sizeof(struct epitem) dada1
2004-02-05  4:23                 ` Davide Libenzi
2004-02-01  5:12     ` More waitpid issues with CLONE_DETACHED/CLONE_THREAD Linus Torvalds
2004-02-01  5:14       ` Daniel Jacobowitz
2004-02-01  5:42         ` Roland McGrath
2004-02-01  5:46           ` Daniel Jacobowitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox