* ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Mike Galbraith @ 2013-07-23 10:05 UTC
To: LKML; +Cc: Oleg Nesterov
I received a report that glibc:elf/pldd hangs occasionally, and indeed..
for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
..will do so. Rummage.....
ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
pldd source:
      if (ptrace (PTRACE_ATTACH, tid, NULL, NULL) != 0)
        {
          /* There might be a race between reading the directory and
             threads terminating. Ignore errors attaching to unknown
             threads unless this is the main thread. */
          if (errno == ESRCH && tid != pid)
            continue;

          error (EXIT_FAILURE, errno, gettext ("cannot attach to process %lu"),
                 tid);
        }

      struct thread_list *newp = alloca (sizeof (*newp));
      newp->tid = tid;
      newp->next = thread_list;
      thread_list = newp;
    }

  closedir (dir);

  int status = get_process_info (dfd, pid);

  assert (thread_list != NULL);
  do
    {
      ptrace (PTRACE_DETACH, thread_list->tid, NULL, NULL);
      thread_list = thread_list->next;
    }
  while (thread_list != NULL);
Seems this usually works only because the cycles expended between attach
and detach are usually enough to let the trap happen, so the tracee can
set its state to TASK_TRACED as PTRACE_DETACH expects it to be.
Is this expected behavior? It looks a bit like "Doctor Doctor..".
-Mike
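
For illustration, the tracee state discussed in this message can be observed from user space. The following is a minimal sketch, assuming a Linux /proc filesystem; proc_state is a hypothetical helper name and is not part of pldd. Per proc(5), the state letter 'T' is a job-control stop (TASK_STOPPED) and 't' is a tracing stop (TASK_TRACED, reported separately since Linux 2.6.33).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the one-letter state from /proc/<pid>/stat,
   or '\0' if the file cannot be read or parsed. */
char proc_state(pid_t pid)
{
    char path[64], buf[512];

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is parenthesised and may itself contain spaces or
       ')', so find the last ')' and take the state field that follows. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}
```

Polling proc_state(tid) after PTRACE_ATTACH would show whether the tracee has reached 't' yet, but it is only a diagnostic aid: it does not replace waiting for the stop.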
* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 15:58 UTC
To: Mike Galbraith; +Cc: LKML
On 07/23, Mike Galbraith wrote:
>
> I received a report that glibc:elf/pldd hangs occasionally, and indeed..
>
> for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
>
> ..will do so. Rummage.....
>
> ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
>
> pldd source:
>
[...snip...]
>
> Seems this usually works only because the cycles expended between attach
> and detach are usually enough to let the trap happen, so the tracee can
> set its state to TASK_TRACED as PTRACE_DETACH expects it to be.
>
> Is this expected behavior?
Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
between, this is expected.
PTRACE_DETACH, like (almost) any other ptrace request, needs a stopped
tracee. Otherwise, say, ptrace_disable() or flush_ptrace_hw_breakpoint()
are not safe.
We could probably add PTRACE_UNTRACE which only does __ptrace_unlink/etc
like the exiting tracer does. (In particular, it could help to detach a
zombie).
But note that even PTRACE_ATTACH + PTRACE_UNTRACE won't be really correct.
PTRACE_ATTACH sends SIGSTOP, so without sys_wait() in between the tracee
can stop in TASK_STOPPED.
Oleg.
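
The attach, wait, detach ordering described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the pldd fix: attach_wait_detach is a hypothetical name, the function handles a single thread ID, and it treats any reported stop as the attach stop. A more careful tracer would also check that WSTOPSIG(status) is SIGSTOP and re-inject any other signal it intercepts.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: attach to tid, wait for it to stop, then detach.
   Returns 0 on success or a negative errno value on failure. */
int attach_wait_detach(pid_t tid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) != 0)
        return -errno;

    /* PTRACE_ATTACH only queues SIGSTOP. Block until the tracee has
       actually stopped; __WALL covers non-leader (clone) threads too. */
    while (waitpid(tid, &status, __WALL) < 0) {
        if (errno != EINTR)
            return -errno;
    }
    if (!WIFSTOPPED(status))
        return -ESRCH;  /* tracee exited or was killed before stopping */

    /* The tracee is stopped here, so it is safe to inspect it. */

    if (ptrace(PTRACE_DETACH, tid, NULL, NULL) != 0)
        return -errno;
    return 0;
}
```

Without the waitpid() loop, the final PTRACE_DETACH races with the tracee reaching its stop, which is the -ESRCH case reported at the top of the thread.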
* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 16:38 UTC
To: Mike Galbraith; +Cc: LKML
On 07/23, Oleg Nesterov wrote:
>
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> > for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so. Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because the cycles expended between attach
> > and detach are usually enough to let the trap happen, so the tracee can
> > set its state to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
>
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.
>
> PTRACE_DETACH, like (almost) any other ptrace request, needs a stopped
> tracee. Otherwise, say, ptrace_disable() or flush_ptrace_hw_breakpoint()
> are not safe.
I have found the source of pldd.c. It seems that it has another reason
for waitpid().
/* Stop all threads since otherwise the list of loaded modules might
change while we are reading it. */
Yes, but without waitpid() we can't know if it was actually stopped.
OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH,
it can simply exit.
Oleg.
* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 16:43 UTC
To: Mike Galbraith; +Cc: LKML
Damn. Sorry for noise Mike,
On 07/23, Oleg Nesterov wrote:
>
> OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH,
> it can simply exit.
No it can't, I forgot that exit_ptrace() doesn't (and can't) clear
->exit_code. And this is another reason why PTRACE_DETACH needs the
stopped tracee.
Oleg.
* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Mike Galbraith @ 2013-07-24 2:21 UTC
To: Oleg Nesterov; +Cc: LKML
On Tue, 2013-07-23 at 17:58 +0200, Oleg Nesterov wrote:
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> > for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so. Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because the cycles expended between attach
> > and detach are usually enough to let the trap happen, so the tracee can
> > set its state to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
>
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.
Thanks for confirmation. The man page was pretty clear (read it after
slogging through source/traces, oh well, educational;) that -ESRCH was
expected, but I wanted to be sure about tracee state thereafter.
-Mike