* ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Mike Galbraith @ 2013-07-23 10:05 UTC (permalink / raw)
  To: LKML; +Cc: Oleg Nesterov

I received a report that glibc:elf/pldd hangs occasionally, and indeed..

  for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done

..will do so.  Rummage.....

ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).

pldd source:

      if (ptrace (PTRACE_ATTACH, tid, NULL, NULL) != 0)
        {
          /* There might be a race between reading the directory and
             threads terminating.  Ignore errors attaching to unknown
             threads unless this is the main thread.  */
          if (errno == ESRCH && tid != pid)
            continue;

          error (EXIT_FAILURE, errno, gettext ("cannot attach to process %lu"),
                 tid);
        } 

      struct thread_list *newp = alloca (sizeof (*newp));
      newp->tid = tid;
      newp->next = thread_list;
      thread_list = newp;
    }

  closedir (dir);

  int status = get_process_info (dfd, pid);

  assert (thread_list != NULL);
  do
    {
      ptrace (PTRACE_DETACH, thread_list->tid, NULL, NULL);
      thread_list = thread_list->next;
    }
  while (thread_list != NULL);

Seems this usually works only because the cycles expended between attach
and detach are usually enough to let the trap happen, so the tracee can
set its state to TASK_TRACED as PTRACE_DETACH expects it to be.
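
For illustration, the race can be sketched outside pldd. This is a minimal sketch, not pldd code: the helper name immediate_detach_errno() and the fork()/pause() test child are assumptions made for the example. It attaches and detaches with no wait() in between and reports what PTRACE_DETACH returned, which is either ESRCH or 0 depending on whether the tracee happened to stop in time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attach to a freshly forked child and detach at once,
   with no wait() in between.  Returns 0 if PTRACE_DETACH succeeded, the
   errno value if it failed, or -1 if fork() or PTRACE_ATTACH failed.  */
int immediate_detach_errno(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* test child: sleep until killed */
    }
    assert(child > 0);          /* parent from here on */

    int result = -1;
    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) == 0) {
        /* No waitpid() here: the tracee may not have stopped yet.  */
        if (ptrace(PTRACE_DETACH, child, NULL, NULL) == 0)
            result = 0;
        else
            result = errno;
    }

    /* Clean up: kill the child and reap it, skipping any stop reports.  */
    kill(child, SIGKILL);
    for (;;) {
        int status;
        pid_t r = waitpid(child, &status, 0);
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0 || WIFEXITED(status) || WIFSIGNALED(status))
            break;
    }
    return result;
}
```

The sketch kills its own test child, so nothing is left stuck; a tool attached to a foreign process does not have that option.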

Is this expected behavior?  It looks a bit like "Doctor Doctor..".

-Mike



* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 15:58 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On 07/23, Mike Galbraith wrote:
>
> I received a report that glibc:elf/pldd hangs occasionally, and indeed..
>
>   for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
>
> ..will do so.  Rummage.....
>
> ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
>
> pldd source:
>
[...snip...]
>
> Seems this usually works only because cycles expended between attach and
> detach is usually enough to let trap happen so tracee can set its state
> to TASK_TRACED as PTRACE_DETACH expects it to be.
>
> Is this expected behavior?

Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
between, this is expected.

PTRACE_DETACH like (almost) any other ptrace request needs the stopped
tracee. Otherwise, say, ptrace_disable() or flush_ptrace_hw_breakpoint()
are not safe.
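
A minimal sketch of the sequence described above: attach, wait for the stop, then detach. The names attach_wait_detach() and demo_attach_wait_detach() are hypothetical, and the fork()/pause() child exists only to give the example something to trace. __WALL is passed so that waitpid() also reports non-leader threads. The sketch does not look at which signal caused the stop and detaches with signal 0; a real tool may want to re-inject a signal other than SIGSTOP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to tid, wait until it has really stopped, then detach.
   Returns 0 on success, -1 with errno set on failure.  */
static int attach_wait_detach(pid_t tid)
{
    int status;
    pid_t r;

    assert(tid > 0);
    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) != 0)
        return -1;

    /* Wait for the stop caused by the attach; retry if interrupted.  */
    do {
        r = waitpid(tid, &status, __WALL);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        return -1;
    if (!WIFSTOPPED(status)) {  /* tracee exited instead of stopping */
        errno = ESRCH;
        return -1;
    }

    /* The tracee is stopped now; it could be inspected safely here.  */

    return ptrace(PTRACE_DETACH, tid, NULL, NULL) == 0 ? 0 : -1;
}

/* Hypothetical demo: trace a forked child that only sleeps.  */
int demo_attach_wait_detach(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();
    }

    int rc = attach_wait_detach(child);

    kill(child, SIGKILL);       /* clean up the test child */
    while (waitpid(child, NULL, 0) < 0 && errno == EINTR)
        ;
    return rc;
}
```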

We could probably add PTRACE_UNTRACE which only does __ptrace_unlink/etc
like the exiting tracer does. (In particular, it could help to detach a
zombie).

But note that even PTRACE_ATTACH + PTRACE_UNTRACE won't be really correct.
PTRACE_ATTACH sends SIGSTOP, so without sys_wait() in between the tracee
can stop in TASK_STOPPED.

Oleg.



* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 16:38 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

On 07/23, Oleg Nesterov wrote:
>
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> >   for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so.  Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because cycles expended between attach and
> > detach is usually enough to let trap happen so tracee can set its state
> > to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
>
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.
>
> PTRACE_DETACH like (almost) any other ptrace request needs the stopped
> tracee. Otherwise, say, ptrace_disable() or flush_ptrace_hw_breakpoint()
> are not safe.

I have found the source of pldd.c. It seems that it has another reason
for waitpid().

	/* Stop all threads since otherwise the list of loaded modules might
	   change while we are reading it. */

Yes, but without waitpid() we can't know if it was actually stopped.
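
To make the point concrete, here is a rough sketch of stopping every thread and confirming each stop with waitpid() before reading anything. It is not the pldd implementation: stop_all_threads(), wait_for_stop() and demo_stop_all_threads() are hypothetical names, and the /proc/<pid>/task walk is only modelled on what pldd appears to do. Threads created while the directory is being read are not handled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait until tid reports a stop.  Returns 0 if it stopped, -1 otherwise.  */
static int wait_for_stop(pid_t tid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(tid, &status, __WALL);  /* __WALL: any kind of thread */
    } while (r < 0 && errno == EINTR);
    return (r == tid && WIFSTOPPED(status)) ? 0 : -1;
}

/* Attach to each thread listed in /proc/<pid>/task and confirm its stop.
   Stores the stopped tids in tids[] and returns how many there are,
   or -1 if the task directory cannot be opened.  */
static int stop_all_threads(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *d;
    int n = 0;

    assert(max > 0);
    snprintf(path, sizeof path, "/proc/%d/task", (int) pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (n < max && (d = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(d->d_name, &end, 10);
        if (*end != '\0' || tid <= 0)
            continue;                       /* skips "." and ".." */
        if (ptrace(PTRACE_ATTACH, (pid_t) tid, NULL, NULL) != 0)
            continue;                       /* thread may have exited */
        if (wait_for_stop((pid_t) tid) == 0)
            tids[n++] = (pid_t) tid;        /* known to be stopped */
    }
    closedir(dir);
    return n;
}

/* Hypothetical demo: a single-threaded child should yield one stopped tid.  */
int demo_stop_all_threads(void)
{
    pid_t tids[64];
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();
    }

    int n = stop_all_threads(child, tids, 64);
    for (int i = 0; i < n; i++)
        ptrace(PTRACE_DETACH, tids[i], NULL, NULL);

    kill(child, SIGKILL);                   /* clean up the test child */
    while (waitpid(child, NULL, 0) < 0 && errno == EINTR)
        ;
    return n;
}
```

Only tids that actually reported a stop are recorded, so the later PTRACE_DETACH calls are issued against stopped tracees.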

OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH,
it can simply exit.

Oleg.



* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Oleg Nesterov @ 2013-07-23 16:43 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: LKML

Damn. Sorry for noise Mike,

On 07/23, Oleg Nesterov wrote:
>
> OTOH, in this particular case pldd.c doesn't really need PTRACE_DETACH,
> it can simply exit.

No it can't, I forgot that exit_ptrace() doesn't (and can't) clear
->exit_code. And this is another reason why PTRACE_DETACH needs the
stopped tracee.

Oleg.



* Re: ptrace(PTRACE_ATTACH) [no intervening wait] ptrace(PTRACE_DETACH) may leave tracee stuck
From: Mike Galbraith @ 2013-07-24  2:21 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: LKML

On Tue, 2013-07-23 at 17:58 +0200, Oleg Nesterov wrote: 
> On 07/23, Mike Galbraith wrote:
> >
> > I received a report that glibc:elf/pldd hangs occasionally, and indeed..
> >
> >   for i in `seq 1 1000`; do taskset -c 3 pldd $$ > /dev/null 2>&1; done
> >
> > ..will do so.  Rummage.....
> >
> > ptrace(PTRACE_DETACH) returns -ESRCH when the trap hasn't happened yet,
> > which happens because pldd doesn't wait() before ptrace(PTRACE_DETACH).
> >
> > pldd source:
> >
> [...snip...]
> >
> > Seems this usually works only because cycles expended between attach and
> > detach is usually enough to let trap happen so tracee can set its state
> > to TASK_TRACED as PTRACE_DETACH expects it to be.
> >
> > Is this expected behavior?
> 
> Yes. PTRACE_ATTACH + PTRACE_DETACH is not correct without wait() in
> between, this is expected.

Thanks for confirmation.  The man page was pretty clear (read it after
slogging through source/traces, oh well, educational;) that -ESRCH was
expected, but I wanted to be sure about tracee state thereafter.

-Mike

