From: Oleg Nesterov <oleg@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: roland@redhat.com, jan.kratochvil@redhat.com,
vda.linux@googlemail.com, linux-kernel@vger.kernel.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
indan@nul.nu
Subject: Re: [PATCH 3/8] job control: Fix ptracer wait(2) hang and explain notask_error clearing
Date: Mon, 21 Mar 2011 16:19:41 +0100 [thread overview]
Message-ID: <20110321151941.GA20917@redhat.com> (raw)
In-Reply-To: <1299614199-25142-4-git-send-email-tj@kernel.org>
On 03/08, Tejun Heo wrote:
>
> static void *nooper(void *arg)
> {
> pause();
> return NULL;
> }
>
> int main(void)
> {
> const struct timespec ts1s = { .tv_sec = 1 };
> pid_t tracee, tracer;
> siginfo_t si;
>
> tracee = fork();
> if (tracee == 0) {
> pthread_t thr;
>
> pthread_create(&thr, NULL, nooper, NULL);
> nanosleep(&ts1s, NULL);
> printf("tracee exiting\n");
> pthread_exit(NULL); /* let subthread run */
> }
>
> tracer = fork();
> if (tracer == 0) {
> ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
> while (1) {
> if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
> perror("waitid");
> break;
> }
> ptrace(PTRACE_CONT, tracee, NULL,
> (void *)(long)si.si_status);
> }
> return 0;
> }
>
> waitid(P_PID, tracer, &si, WEXITED);
> kill(tracee, SIGKILL);
> return 0;
> }
>
> Before the patch, after the tracee becomes a zombie, the tracer's
> waitid(WSTOPPED) never returns and the program doesn't terminate.
Yes. The program should terminate after nooper() exits. Agreed, this
looks strange.
> After the patch, tracee exiting triggers waitid() to fail.
>
> tracee exiting
> waitid: No child processes
OK, but
> @@ -1550,17 +1550,41 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> return 0;
> }
>
> - /*
> - * We don't reap group leaders with subthreads.
> - */
> - if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
> - return wait_task_zombie(wo, p);
> + /* slay zombie? */
> + if (p->exit_state == EXIT_ZOMBIE) {
> + /* we don't reap group leaders with subthreads */
> + if (!delay_group_leader(p))
> + return wait_task_zombie(wo, p);
>
> - /*
> - * It's stopped or running now, so it might
> - * later continue, exit, or stop again.
> - */
> - wo->notask_error = 0;
> + /*
> + * Allow access to stopped/continued state via zombie by
> + * falling through. Clearing of notask_error is complex.
> + *
> + * When !@ptrace:
> + *
> + * If WEXITED is set, notask_error should naturally be
> + * cleared. If not, subset of WSTOPPED|WCONTINUED is set,
> + * so, if there are live subthreads, there are events to
> + * wait for. If all subthreads are dead, it's still safe
> + * to clear - this function will be called again in finite
> + * amount time once all the subthreads are released and
> + * will then return without clearing.
> + *
> + * When @ptrace:
> + *
> + * Stopped state is per-task and thus can't change once the
> + * target task dies, so notask_error should be cleared only
> + * if WCONTINUED is set.
> + */
> + if (likely(!ptrace) || (wo->wo_flags & WCONTINUED))
> + wo->notask_error = 0;
I don't understand this part. Suppose that this task is not traced and
its real parent does do_wait(WEXITED). We shouldn't return -ECHLD in
this case (EXIT_ZOMBIE && delay_group_leader()).
If the task is traced and debugger does do_wait(WEXITED) we should not
return -ECHLD too.
So I think this patch is wrong in its current state. Hmm, I just noticed
the comment says "If WEXITED is set, notask_error should ...", but this
is not what the code does?
But the main problem is, I do not think do_wait() should block in this
case, and thus I am starting to think this patch is not "complete".
I mean, we are going to add the user-visible change, and I agree the
current behaviour looks strange. But if we change this code, perhaps
the tracer should ignore delay_group_leader() at all (unless we are
going to do wait_task_zombie) ?
Your test-case could use waitid(WEXITED) instead WSTOPPED with the same
result, it should hang. Why it hangs? The tracee is dead, we can't do
ptrace(PTRACE_DETACH), and we can do nothing until other threads exit.
This looks equally strange.
IOW. Assuming that ptrace == T and WEXITED is set, perhaps we should
do something like this pseudo-code
if (p->exit_state == EXIT_ZOMBIE) {
if (!delay_group_leader(p))
return wait_task_zombie(wo, p);
ptrace_unlink();
wait_task_zombie(WNOWAIT);
}
However. This is another user-visible change, we need another discussion
even if I am right. In particular, it is not clear what should we do
if parent == real_parent. And probably this can confuse gdb, but iirc
gdb already have the problems with the dead leader anyway.
Oleg.
next prev parent reply other threads:[~2011-03-21 15:28 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-08 19:56 [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent while ptraced Tejun Heo
2011-03-08 19:56 ` [PATCH 1/8] job control: Don't set group_stop exit_code if re-entering job control stop Tejun Heo
2011-03-21 13:20 ` Oleg Nesterov
2011-03-21 15:52 ` Tejun Heo
2011-03-22 18:44 ` Oleg Nesterov
2011-03-23 8:44 ` Tejun Heo
2011-03-23 16:40 ` Oleg Nesterov
2011-03-23 17:02 ` Tejun Heo
2011-03-23 17:09 ` Oleg Nesterov
2011-03-23 17:22 ` Tejun Heo
2011-03-08 19:56 ` [PATCH 2/8] job control: Small reorganization of wait_consider_task() Tejun Heo
2011-03-08 19:56 ` [PATCH 3/8] job control: Fix ptracer wait(2) hang and explain notask_error clearing Tejun Heo
2011-03-21 15:19 ` Oleg Nesterov [this message]
2011-03-21 16:09 ` Oleg Nesterov
2011-03-21 16:12 ` Tejun Heo
2011-03-22 19:08 ` Oleg Nesterov
2011-03-22 10:51 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 4/8] job control: Allow access to job control events through ptracees Tejun Heo
2011-03-21 16:39 ` Oleg Nesterov
2011-03-21 17:20 ` Tejun Heo
2011-03-22 11:10 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 5/8] job control: Add @for_ptrace to do_notify_parent_cldstop() Tejun Heo
2011-03-08 19:56 ` [PATCH 6/8] job control: Job control stop notifications should always go to the real parent Tejun Heo
2011-03-21 17:12 ` Oleg Nesterov
2011-03-08 19:56 ` [PATCH 7/8] job control: Notify the real parent of job control events regardless of ptrace Tejun Heo
2011-03-21 17:43 ` Oleg Nesterov
2011-03-22 8:04 ` Tejun Heo
2011-03-22 19:44 ` Oleg Nesterov
2011-03-23 9:17 ` Tejun Heo
2011-03-23 9:24 ` Tejun Heo
2011-03-23 16:46 ` Oleg Nesterov
2011-03-23 16:59 ` Tejun Heo
2011-03-23 17:07 ` Oleg Nesterov
2011-03-23 17:20 ` Tejun Heo
2011-03-23 17:17 ` Oleg Nesterov
2011-03-22 11:30 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 8/8] job control: Don't send duplicate job control stop notification while ptraced Tejun Heo
2011-03-21 17:48 ` Oleg Nesterov
2011-03-08 20:01 ` [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent " Linus Torvalds
2011-03-09 16:50 ` Oleg Nesterov
2011-03-22 10:20 ` [PATCH 0.1/8] ptrace: Collapse ptrace_untrace() into __ptrace_unlink() Tejun Heo
2011-03-22 10:20 ` [PATCH 0.2/8] ptrace: Always put ptracee into appropriate execution state Tejun Heo
2011-03-22 20:33 ` Oleg Nesterov
2011-03-23 8:00 ` Tejun Heo
2011-03-22 13:11 ` [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent while ptraced Tejun Heo
2011-03-22 20:59 ` Oleg Nesterov
2011-03-23 8:48 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110321151941.GA20917@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=indan@nul.nu \
--cc=jan.kratochvil@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=roland@redhat.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vda.linux@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.