From: Daniel Jacobowitz <dan@debian.org>
To: linux-kernel@vger.kernel.org
Subject: More waitpid issues with CLONE_DETACHED/CLONE_THREAD
Date: Sat, 31 Jan 2004 22:25:25 -0500
Message-ID: <20040201032525.GA10254@nevyn.them.org>

This may be related to the Python bug reported today...

I've been playing around with gdbserver support for NPTL threading all day
today.  Right now it works, except that when I say "kill" in the GDB client,
gdbserver hangs.  The problem is that we kill the child and wait for it,
but waitpid never returns it.

write(2, "Killing inferior\n", 17)      = 17
ptrace(PTRACE_CONT, 8454, 0, SIG_0)     = 0
tkill(8454, SIGKILL)                    = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)
nanosleep({0, 1000000}, 0)              = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
setup()                                 = 0
waitpid(8454, 0xbfffec04, WNOHANG)      = 0
waitpid(8454, 0xbfffec04, WNOHANG|__WCLONE) = -1 ECHILD (No child processes)

and so on (looping on waitpid).  At this point, process 8454 is marked as a
zombie, and nothing can reap it.  After gdbserver is killed, it is reparented
to init:

Name:   linux-dp
State:  Z (zombie)
SleepAVG:       91%
Tgid:   8454
Pid:    8454
PPid:   1
TracerPid:      0

but init can't reap it either.
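
For anyone trying to reproduce this, the state shown above can be checked
by parsing /proc/<pid>/status.  A minimal sketch in Python; the helper names
(parse_status, is_zombie, read_status) are made up for illustration and are
not part of gdbserver or any library:

```python
import os


def parse_status(text):
    """Parse the 'Key:   value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_zombie(fields):
    # State looks like "Z (zombie)"; the first character is the state code.
    return fields.get("State", "").startswith("Z")


def read_status(pid):
    """Read and parse the status file for one pid (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


if __name__ == "__main__":
    info = read_status(os.getpid())
    print(info["State"], "PPid:", info["PPid"])
```

A zombie whose PPid is 1, as in the listing above, has already been
reparented to init and is still waiting to be reaped.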

8454 was the original (i.e. non-CLONE_DETACHED) thread.  The behavior is the
same if I use PTRACE_KILL instead of tkill.

GDB doesn't suffer from the same problem.  A little time with strace and I
found out why: GDB uses PTRACE_KILL on the detached threads, then on the
parent thread, then calls waitpid on the detached threads, and finally on
the parent thread.  This is not by design; the threads just happen to be in
a different order on the linked list.

If I change gdbserver to do "kill thread; wait for thread; kill next thread;
wait for next thread; kill parent last; wait for parent last" then it
terminates and I don't get an unkillable zombie.

ptrace(PTRACE_KILL, 18348, 0, 0)        = 0
waitpid(18348, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(18348, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18348
ptrace(PTRACE_KILL, 18349, 0, 0)        = 0
waitpid(18349, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18349, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18349
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18350, 0, 0)        = 0
waitpid(18350, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18350, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18350
ptrace(PTRACE_KILL, 18351, 0, 0)        = 0
waitpid(18351, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18351, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18351
ptrace(PTRACE_KILL, 18352, 0, 0)        = 0
waitpid(18352, 0xbfffec04, WNOHANG)     = -1 ECHILD (No child processes)
waitpid(18352, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG|__WCLONE) = 18352
--- SIGCHLD (Child exited) @ 0 (0) ---
ptrace(PTRACE_KILL, 18329, 0, 0)        = 0
waitpid(18329, [WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL], WNOHANG) = 18329
exit_group(0)                           = ?
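
The ordering in the trace above (kill one task, wait until it is reaped,
move on, original thread last) can be sketched roughly as follows.  This is
only an illustration under loud assumptions: ordinary fork()ed children stand
in for the ptrace'd NPTL threads, plain SIGKILL replaces PTRACE_KILL, __WCLONE
is left out, and the function names are hypothetical.  It shows the control
flow only and does not reproduce the zombie.

```python
import os
import signal
import time


def spawn_sleeper():
    """Fork a child that just sleeps; a stand-in for one traced task."""
    pid = os.fork()
    if pid == 0:
        try:
            time.sleep(60)
        finally:
            os._exit(0)
    return pid


def kill_and_reap(pid, timeout=5.0):
    """SIGKILL one child, then poll waitpid(WNOHANG) until it is reaped."""
    os.kill(pid, signal.SIGKILL)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        time.sleep(0.001)  # 1 ms back-off, like the nanosleep in the trace
    raise TimeoutError(f"pid {pid} was never returned by waitpid")


def kill_all(leader, others):
    """Kill and reap every other task first, and the leader last."""
    results = {}
    for pid in list(others) + [leader]:
        results[pid] = kill_and_reap(pid)
    return results


if __name__ == "__main__":
    leader = spawn_sleeper()
    others = [spawn_sleeper() for _ in range(3)]
    for pid, status in kill_all(leader, others).items():
        print(pid, "killed by signal", os.WTERMSIG(status))
```

Each task is fully reaped before the next one is killed, so the leader is
never killed while other tasks are still outstanding.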

So it looks like something gets very confused if the parent is SIGKILLed
before the children.  What should happen?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer
