public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Jacobowitz <dan@debian.org>
To: linux-kernel@vger.kernel.org
Subject: More waitpid issues with CLONE_DETACHED/CLONE_THREAD
Date: Sat, 31 Jan 2004 22:25:25 -0500	[thread overview]
Message-ID: <20040201032525.GA10254@nevyn.them.org> (raw)

This may be related to the python bug reported today...

I've been playing around with gdbserver support for NPTL threading all day
today.  Right now it works, except that when I say "kill" in the GDB client,
gdbserver hangs.  The problem is that we kill the child, and wait for it,
but wait never returns it.

write(2, "Killing inferior\n", 17)      = 17
ptrace(PTRACE_CONT, 8454, 0, SIG_0)     = 0
tkill(8454, SIGKILL)                    = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)
nanosleep({0, 1000000}, 0)              = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
setup()                                 = 0
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)

and so on (looping on waitpid).  At this point, process 8454 is marked as a
zombie, and nothing can reap it.  After gdbserver is killed, it reparents to
init:

Name:   linux-dp
State:  Z (zombie)
SleepAVG:       91%
Tgid:   8454
Pid:    8454
PPid:   1
TracerPid:      0

but init can't reap it either.

8454 was the original (i.e. non-CLONE_DETACHED) thread.  Same behavior if I
use ptrace_kill.

GDB doesn't suffer from the same problem.  A little time with strace and I
found out why: GDB PTRACE_KILL's the detached threads, PTRACE_KILL's the
parent thread, waitpid's the detached threads, and then waitpid's the parent
thread.  No design, just different order of items on the linked list.

If I change gdbserver to do "kill thread; wait for thread; kill next thread;
wait for next thread; kill parent last; wait for parent last" then it
terminates and I don't get an unkillable zombie.

ptrace(PTRACE_KILL, 18348, 0, 0)        = 0
waitpid(18348, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(18348, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18348
ptrace(PTRACE_KILL, 18349, 0, 0)        = 0
waitpid(18349, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18349, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18349
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18350, 0, 0)        = 0
waitpid(18350, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18350, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18350
ptrace(PTRACE_KILL, 18351, 0, 0)        = 0
waitpid(18351, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18351, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18351
ptrace(PTRACE_KILL, 18352, 0, 0)        = 0
waitpid(18352, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18352, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18352
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18329, 0, 0)        = 0
waitpid(18329, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG) = 18329
exit_group(0)                           = ?

So it looks like something gets very confused if the parent is SIGKILLed
before the children.  What should happen?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

             reply	other threads:[~2004-02-01  3:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-01  3:25 Daniel Jacobowitz [this message]
2004-02-01  4:38 ` More waitpid issues with CLONE_DETACHED/CLONE_THREAD Linus Torvalds
2004-02-01  4:43   ` Daniel Jacobowitz
2004-02-01  5:12     ` Daniel Jacobowitz
2004-02-01 21:41       ` Linus Torvalds
2004-02-01 22:25         ` Roland McGrath
2004-02-02  0:55           ` Linus Torvalds
2004-02-02  2:20             ` Andries Brouwer
2004-02-02  2:30               ` Linus Torvalds
2004-02-02  0:52         ` Daniel Jacobowitz
2004-02-02  2:41           ` Davide Libenzi
2004-02-02  2:55             ` Davide Libenzi
2004-02-04 14:22               ` fs/eventpoll : reduce sizeof(struct epitem) dada1
2004-02-05  4:23                 ` Davide Libenzi
2004-02-01  5:12     ` More waitpid issues with CLONE_DETACHED/CLONE_THREAD Linus Torvalds
2004-02-01  5:14       ` Daniel Jacobowitz
2004-02-01  5:42         ` Roland McGrath
2004-02-01  5:46           ` Daniel Jacobowitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040201032525.GA10254@nevyn.them.org \
    --to=dan@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox