From: Oleg Nesterov <oleg@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: roland@redhat.com, jan.kratochvil@redhat.com,
vda.linux@googlemail.com, linux-kernel@vger.kernel.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
indan@nul.nu
Subject: Re: [PATCH 3/8] job control: Fix ptracer wait(2) hang and explain notask_error clearing
Date: Mon, 21 Mar 2011 16:19:41 +0100 [thread overview]
Message-ID: <20110321151941.GA20917@redhat.com> (raw)
In-Reply-To: <1299614199-25142-4-git-send-email-tj@kernel.org>
On 03/08, Tejun Heo wrote:
>
> static void *nooper(void *arg)
> {
> pause();
> return NULL;
> }
>
> int main(void)
> {
> const struct timespec ts1s = { .tv_sec = 1 };
> pid_t tracee, tracer;
> siginfo_t si;
>
> tracee = fork();
> if (tracee == 0) {
> pthread_t thr;
>
> pthread_create(&thr, NULL, nooper, NULL);
> nanosleep(&ts1s, NULL);
> printf("tracee exiting\n");
> pthread_exit(NULL); /* let subthread run */
> }
>
> tracer = fork();
> if (tracer == 0) {
> ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
> while (1) {
> if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
> perror("waitid");
> break;
> }
> ptrace(PTRACE_CONT, tracee, NULL,
> (void *)(long)si.si_status);
> }
> return 0;
> }
>
> waitid(P_PID, tracer, &si, WEXITED);
> kill(tracee, SIGKILL);
> return 0;
> }
>
> Before the patch, after the tracee becomes a zombie, the tracer's
> waitid(WSTOPPED) never returns and the program doesn't terminate.
Yes. The program should terminate after nooper() exits. Agreed, this
looks strange.
> After the patch, tracee exiting triggers waitid() to fail.
>
> tracee exiting
> waitid: No child processes
OK, but
> @@ -1550,17 +1550,41 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> return 0;
> }
>
> - /*
> - * We don't reap group leaders with subthreads.
> - */
> - if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
> - return wait_task_zombie(wo, p);
> + /* slay zombie? */
> + if (p->exit_state == EXIT_ZOMBIE) {
> + /* we don't reap group leaders with subthreads */
> + if (!delay_group_leader(p))
> + return wait_task_zombie(wo, p);
>
> - /*
> - * It's stopped or running now, so it might
> - * later continue, exit, or stop again.
> - */
> - wo->notask_error = 0;
> + /*
> + * Allow access to stopped/continued state via zombie by
> + * falling through. Clearing of notask_error is complex.
> + *
> + * When !@ptrace:
> + *
> + * If WEXITED is set, notask_error should naturally be
> + * cleared. If not, subset of WSTOPPED|WCONTINUED is set,
> + * so, if there are live subthreads, there are events to
> + * wait for. If all subthreads are dead, it's still safe
> + * to clear - this function will be called again in finite
> + * amount time once all the subthreads are released and
> + * will then return without clearing.
> + *
> + * When @ptrace:
> + *
> + * Stopped state is per-task and thus can't change once the
> + * target task dies, so notask_error should be cleared only
> + * if WCONTINUED is set.
> + */
> + if (likely(!ptrace) || (wo->wo_flags & WCONTINUED))
> + wo->notask_error = 0;
I don't understand this part. Suppose that this task is not traced and
its real parent does do_wait(WEXITED). We shouldn't return -ECHLD in
this case (EXIT_ZOMBIE && delay_group_leader()).
If the task is traced and debugger does do_wait(WEXITED) we should not
return -ECHLD too.
So I think this patch is wrong in its current state. Hmm, I just noticed
the comment says "If WEXITED is set, notask_error should ...", but this
is not what the code does?
But the main problem is, I do not think do_wait() should block in this
case, and thus I am starting to think this patch is not "complete".
I mean, we are going to add the user-visible change, and I agree the
current behaviour looks strange. But if we change this code, perhaps
the tracer should ignore delay_group_leader() at all (unless we are
going to do wait_task_zombie) ?
Your test-case could use waitid(WEXITED) instead WSTOPPED with the same
result, it should hang. Why it hangs? The tracee is dead, we can't do
ptrace(PTRACE_DETACH), and we can do nothing until other threads exit.
This looks equally strange.
IOW. Assuming that ptrace == T and WEXITED is set, perhaps we should
do something like this pseudo-code
if (p->exit_state == EXIT_ZOMBIE) {
if (!delay_group_leader(p))
return wait_task_zombie(wo, p);
ptrace_unlink();
wait_task_zombie(WNOWAIT);
}
However. This is another user-visible change, we need another discussion
even if I am right. In particular, it is not clear what should we do
if parent == real_parent. And probably this can confuse gdb, but iirc
gdb already have the problems with the dead leader anyway.
Oleg.
next prev parent reply other threads:[~2011-03-21 15:28 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-08 19:56 [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent while ptraced Tejun Heo
2011-03-08 19:56 ` [PATCH 1/8] job control: Don't set group_stop exit_code if re-entering job control stop Tejun Heo
2011-03-21 13:20 ` Oleg Nesterov
2011-03-21 15:52 ` Tejun Heo
2011-03-22 18:44 ` Oleg Nesterov
2011-03-23 8:44 ` Tejun Heo
2011-03-23 16:40 ` Oleg Nesterov
2011-03-23 17:02 ` Tejun Heo
2011-03-23 17:09 ` Oleg Nesterov
2011-03-23 17:22 ` Tejun Heo
2011-03-08 19:56 ` [PATCH 2/8] job control: Small reorganization of wait_consider_task() Tejun Heo
2011-03-08 19:56 ` [PATCH 3/8] job control: Fix ptracer wait(2) hang and explain notask_error clearing Tejun Heo
2011-03-21 15:19 ` Oleg Nesterov [this message]
2011-03-21 16:09 ` Oleg Nesterov
2011-03-21 16:12 ` Tejun Heo
2011-03-22 19:08 ` Oleg Nesterov
2011-03-22 10:51 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 4/8] job control: Allow access to job control events through ptracees Tejun Heo
2011-03-21 16:39 ` Oleg Nesterov
2011-03-21 17:20 ` Tejun Heo
2011-03-22 11:10 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 5/8] job control: Add @for_ptrace to do_notify_parent_cldstop() Tejun Heo
2011-03-08 19:56 ` [PATCH 6/8] job control: Job control stop notifications should always go to the real parent Tejun Heo
2011-03-21 17:12 ` Oleg Nesterov
2011-03-08 19:56 ` [PATCH 7/8] job control: Notify the real parent of job control events regardless of ptrace Tejun Heo
2011-03-21 17:43 ` Oleg Nesterov
2011-03-22 8:04 ` Tejun Heo
2011-03-22 19:44 ` Oleg Nesterov
2011-03-23 9:17 ` Tejun Heo
2011-03-23 9:24 ` Tejun Heo
2011-03-23 16:46 ` Oleg Nesterov
2011-03-23 16:59 ` Tejun Heo
2011-03-23 17:07 ` Oleg Nesterov
2011-03-23 17:20 ` Tejun Heo
2011-03-23 17:17 ` Oleg Nesterov
2011-03-22 11:30 ` [PATCH UPDATED " Tejun Heo
2011-03-08 19:56 ` [PATCH 8/8] job control: Don't send duplicate job control stop notification while ptraced Tejun Heo
2011-03-21 17:48 ` Oleg Nesterov
2011-03-08 20:01 ` [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent " Linus Torvalds
2011-03-09 16:50 ` Oleg Nesterov
2011-03-22 10:20 ` [PATCH 0.1/8] ptrace: Collapse ptrace_untrace() into __ptrace_unlink() Tejun Heo
2011-03-22 10:20 ` [PATCH 0.2/8] ptrace: Always put ptracee into appropriate execution state Tejun Heo
2011-03-22 20:33 ` Oleg Nesterov
2011-03-23 8:00 ` Tejun Heo
2011-03-22 13:11 ` [RFC PATCHSET] ptrace,signal: Fix notifications to the real parent while ptraced Tejun Heo
2011-03-22 20:59 ` Oleg Nesterov
2011-03-23 8:48 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110321151941.GA20917@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=indan@nul.nu \
--cc=jan.kratochvil@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=roland@redhat.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vda.linux@googlemail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox