Questions about ptrace on a dying process

public inbox for linux-kernel@vger.kernel.org

* Questions about ptrace on a dying process
@ 2012-02-29 18:06 Tim Bird
  2012-02-29 18:50 ` Oleg Nesterov
  2012-02-29 19:12 ` Andi Kleen
  0 siblings, 2 replies; 10+ messages in thread
From: Tim Bird @ 2012-02-29 18:06 UTC
  To: Roland McGrath, Oleg Nesterov, Denys Vlasenko, linux kernel

ptrace maintainers (and interested parties)...

I'm working on a crash handler for Linux, which uses ptrace to retrieve information
about a process during its coredump.  Specifically, from within a core handler
program (started within do_coredump() as a user_mode_helper), I would like to make
ptrace calls against the dying process.

My problem is that the process state is not entering into TASK_TRACED, when
I do a PTRACE_ATTACH against it.  I have worked around the problem with the
hack below, and am now trying to find a more correct solution to my problem:

--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -122,6 +122,8 @@ int ptrace_check_attach(struct task_struct *child, int kill)
                WARN_ON_ONCE(task_is_stopped(child));
                if (task_is_traced(child) || kill)
                        ret = 0;
+               // FIXTHIS - force tracing support for crash handler
+               ret = 0;
                spin_unlock_irq(&child->sighand->siglock);
        }
        read_unlock(&tasklist_lock);

I'm trying to decipher the code, but I'm not sure if I understand it correctly.

Here's the problem I have found:

The code in ptrace_check_attach() tests whether the process state has __TASK_TRACED
(the 'task_is_traced(child)' conditional above).  This appears to be the gateway
for allowing ptrace operations on the process.

A process usually sets its state to TASK_TRACED (which includes __TASK_TRACED)
via the function ptrace_stop(), which is usually called (I believe) when a task
processes a STOP signal.  In the case of a dying process, however, it appears
this is never called, since the process never actually returns through the
signal-handling code on its way back to user-space, since user-space never
runs again.  At least, I tried to send a signal from my crash_handler program
to the dying process, and the dying process never processes the signal.

In ptrace_attach(), a stop signal *is* submitted to the process, but via
a call to send_sig_info(SIGSTOP...), not by calling ptrace_stop().
This ends up adding the STOP signal to the pending bit array, but not
converting the process to TASK_TRACED at that time.

The code in these code paths looks quite tricky, and I am loath to make any
changes to the generic state machine for my case.

However, what would you think of having special case code in ptrace_attach()
or ptrace_check_attach() code, for the in-coredump case?

Have you heard of other uses of ptrace on dying processes?  Have other people
gotten this working successfully?  That is, is there something
simple I'm missing, in terms of manipulating the process state to allow for
ptrace operation in this situation?

Thanks for any help or ideas you can provide.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================


* Re: Questions about ptrace on a dying process
  2012-02-29 18:06 Questions about ptrace on a dying process Tim Bird
@ 2012-02-29 18:50 ` Oleg Nesterov
  2012-02-29 20:53   ` Tim Bird
  2012-02-29 19:12 ` Andi Kleen
  1 sibling, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-02-29 18:50 UTC
  To: Tim Bird; +Cc: Roland McGrath, Denys Vlasenko, linux kernel

On 02/29, Tim Bird wrote:
>
> ptrace maintainers (and interested parties)...
>
> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
> about a process during its coredump.  Specifically, from within a core handler
> program (started within do_coredump() as a user_mode_helper), I would like to make
> ptrace calls against the dying process.

Which calls? just curious.

> My problem is that the process state is not entering into TASK_TRACED, when
> I do a PTRACE_ATTACH against it.

Yes, it can never do ptrace_stop() in do_coredump() paths.

Perhaps you can use PTRACE_O_TRACEEXIT. PTRACE_EVENT_EXIT will be reported
after the coredumping. I think the core handler should close the pipe first,
otherwise the dumping tracee will wait for the handler forever.

However. You need PTRACE_SEIZE, not PTRACE_ATTACH. And this can only work
with the recent patch from Denys which allows passing PTRACE_O_TRACEEXIT
with PTRACE_SEIZE (currently in -mm tree).

Just in case, it could have other threads sleeping in TASK_UNINTERRUPTIBLE
until do_coredump() completes. But these threads have already passed
ptrace_event(PTRACE_EVENT_EXIT).

Oleg.


* Re: Questions about ptrace on a dying process
  2012-02-29 18:50 ` Oleg Nesterov
@ 2012-02-29 20:53   ` Tim Bird
  0 siblings, 0 replies; 10+ messages in thread
From: Tim Bird @ 2012-02-29 20:53 UTC
  To: Oleg Nesterov; +Cc: Denys Vlasenko, linux kernel

On 02/29/2012 10:50 AM, Oleg Nesterov wrote:
> On 02/29, Tim Bird wrote:
>>
>> ptrace maintainers (and interested parties)...
>>
>> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
>> about a process during its coredump.  Specifically, from within a core handler
>> program (started within do_coredump() as a user_mode_helper), I would like to make
>> ptrace calls against the dying process.
> 
> Which calls? just curious.

Right now, I'm using:
	PTRACE_ATTACH, PTRACE_GETREGS,
	PTRACE_PEEKTEXT and PTRACE_GETSIGINFO


>> My problem is that the process state is not entering into TASK_TRACED, when
>> I do a PTRACE_ATTACH against it.
> 
> Yes, it can never do ptrace_stop() in do_coredump() paths.

That's what I figured.

> Perhaps you can use PTRACE_O_TRACEEXIT. PTRACE_EVENT_EXIT will be reported
> after the coredumping. I think the core handler should close the pipe first,
> otherwise the dumping tracee will wait for the handler forever.
>
> However. You need PTRACE_SEIZE, not PTRACE_ATTACH. And this can only work
> with the recent patch from Denys which allows passing PTRACE_O_TRACEEXIT
> with PTRACE_SEIZE (currently in -mm tree).
> 
> Just in case, it could have other threads sleeping in TASK_UNINTERRUPTIBLE
> until do_coredump() completes. But these threads have already passed
> ptrace_event(PTRACE_EVENT_EXIT).

I'll look at these and see if they might work.  I've determined
that there are some funny races for accessing /proc, using ptrace,
and reading the coredump from stdin (in the core pipe handler)
associated with the technique I'm currently using. Maybe some
of these other ptrace requests or options would help out.

Thanks very much for the suggestions.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================



* Re: Questions about ptrace on a dying process
  2012-02-29 18:06 Questions about ptrace on a dying process Tim Bird
  2012-02-29 18:50 ` Oleg Nesterov
@ 2012-02-29 19:12 ` Andi Kleen
  2012-02-29 20:45   ` Tim Bird
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2012-02-29 19:12 UTC
  To: Tim Bird; +Cc: Roland McGrath, Oleg Nesterov, Denys Vlasenko, linux kernel

Tim Bird <tim.bird@am.sony.com> writes:

> ptrace maintainers (and interested parties)...
>
> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
> about a process during its coredump.  Specifically, from within a core handler
> program (started within do_coredump() as a user_mode_helper), I would like to make
> ptrace calls against the dying process.

The standard approach is to define a core pipe handler and parse the
elf memory dump.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Questions about ptrace on a dying process
  2012-02-29 19:12 ` Andi Kleen
@ 2012-02-29 20:45   ` Tim Bird
  2012-03-01  7:12     ` Denys Vlasenko
  0 siblings, 1 reply; 10+ messages in thread
From: Tim Bird @ 2012-02-29 20:45 UTC
  To: Andi Kleen; +Cc: Oleg Nesterov, Denys Vlasenko, linux kernel

On 02/29/2012 11:12 AM, Andi Kleen wrote:
> Tim Bird <tim.bird@am.sony.com> writes:
> 
>> ptrace maintainers (and interested parties)...
>>
>> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
>> about a process during its coredump.  Specifically, from within a core handler
>> program (started within do_coredump() as a user_mode_helper), I would like to make
>> ptrace calls against the dying process.
> 
> The standard approach is to define a core pipe handler and parse the
> elf memory dump.

Yeah - I may be doing something new here.  Android uses ptrace
in debuggerd, which is their crash reporting tool, but they wake
it up with signals before the dying program goes into coredump.

I'm taking a different approach and trying to have it initiated
by the coredump feature in Linux.  This makes it so that
a process does not need to be persistently running to capture
these events.

This is on embedded systems, where the dump is not saved.  The dump
is available via stdin to the core pipe handler, but it would be
kind of a pain to wrapper that for random access, which is needed
for stuff like stack unwinding.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================



* Re: Questions about ptrace on a dying process
  2012-02-29 20:45   ` Tim Bird
@ 2012-03-01  7:12     ` Denys Vlasenko
  2012-03-01 18:18       ` Tim Bird
  2012-03-01 18:30       ` Tim Bird
  0 siblings, 2 replies; 10+ messages in thread
From: Denys Vlasenko @ 2012-03-01  7:12 UTC
  To: Tim Bird; +Cc: Andi Kleen, Oleg Nesterov, linux kernel

On Wednesday 29 February 2012 21:45, Tim Bird wrote:
> On 02/29/2012 11:12 AM, Andi Kleen wrote:
> > Tim Bird <tim.bird@am.sony.com> writes:
> > 
> >> ptrace maintainers (and interested parties)...
> >>
> >> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
> >> about a process during its coredump.  Specifically, from within a core handler
> >> program (started within do_coredump() as a user_mode_helper), I would like to make
> >> ptrace calls against the dying process.
> > 
> > The standard approach is to define a core pipe handler and parse the
> > elf memory dump.
> 
> Yeah - I may be doing something new here.  Android uses ptrace
> in debuggerd, which is their crash reporting tool, but they wake
> it up with signals before the dying program goes into coredump.

I think ptrace API does not provide guarantees that it is possible
to attach to the process when it coredumps.

It might work in current kernels, but might break in new ones.

> I'm taking a different approach and trying to have it initiated
> by the coredump feature in Linux.  This makes it so that
> a process does not need to be persistently running to capture
> these events.
> 
> This is on embedded systems, where the dump is not saved.  The dump
> is available via stdin to the core pipe handler, but it would be
> kind of a pain to wrapper that for random access, which is needed
> for stuff like stack unwinding.

Stack unwinding only requires the stack data and knowledge
of the mapped binary and library files. You can parse coredump's ELF header,
and skip all sizable data segments which you won't need anyway.

I estimate that usually you will need to save only ~150k of data
in order to produce a stacktrace, and even then,
only because Linux pre-allocates ridiculously large
stack for every new process - 132k. It can easily be reduced
to something saner with one-line patch.

-- 
vda


* Re: Questions about ptrace on a dying process
  2012-03-01  7:12     ` Denys Vlasenko
@ 2012-03-01 18:18       ` Tim Bird
  2012-03-02  1:29         ` Denys Vlasenko
  2012-03-01 18:30       ` Tim Bird
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Bird @ 2012-03-01 18:18 UTC
  To: Denys Vlasenko; +Cc: Andi Kleen, Oleg Nesterov, linux kernel

On 02/29/2012 11:12 PM, Denys Vlasenko wrote:
> On Wednesday 29 February 2012 21:45, Tim Bird wrote:
>> This is on embedded systems, where the dump is not saved.  The dump
>> is available via stdin to the core pipe handler, but it would be
>> kind of a pain to wrapper that for random access, which is needed
>> for stuff like stack unwinding.
> 
> Stack unwinding only requires the stack data and knowledge
> of the mapped binary and library files. You can parse coredump's ELF header,
> and skip all sizable data segments which you won't need anyway.
> 
> I estimate that usually you will need to save only ~150k of data
> in order to produce a stacktrace, and even then,
> only because Linux pre-allocates ridiculously large
> stack for every new process - 132k. It can easily be reduced
> to something saner with one-line patch.

My budget for each crash report is about 8k.  I have to do
the unwind at the time of the crash (on target) (and without
symbols - these are added later on a host).  Given the other
stuff I want to save, saving the whole stack is usually not
an option, and saving a coredump is out of the question.
 -- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================



* Re: Questions about ptrace on a dying process
  2012-03-01 18:18       ` Tim Bird
@ 2012-03-02  1:29         ` Denys Vlasenko
  0 siblings, 0 replies; 10+ messages in thread
From: Denys Vlasenko @ 2012-03-02  1:29 UTC
  To: Tim Bird; +Cc: Andi Kleen, Oleg Nesterov, linux kernel

On Thursday 01 March 2012 19:18, Tim Bird wrote:
> On 02/29/2012 11:12 PM, Denys Vlasenko wrote:
> > On Wednesday 29 February 2012 21:45, Tim Bird wrote:
> >> This is on embedded systems, where the dump is not saved.  The dump
> >> is available via stdin to the core pipe handler, but it would be
> >> kind of a pain to wrapper that for random access, which is needed
> >> for stuff like stack unwinding.
> > 
> > Stack unwinding only requires the stack data and knowledge
> > of the mapped binary and library files. You can parse coredump's ELF header,
> > and skip all sizable data segments which you won't need anyway.
> > 
> > I estimate that usually you will need to save only ~150k of data
> > in order to produce a stacktrace, and even then,
> > only because Linux pre-allocates ridiculously large
> > stack for every new process - 132k. It can easily be reduced
> > to something saner with one-line patch.
> 
> My budget for each crash report is about 8k.  I have to do
> the unwind at the time of the crash (on target) (and without
> symbols - these are added later on a host).  Given the other
> stuff I want to save, saving the whole stack is usually not
> an option, and saving a coredump is out of the question.

How about this algorithm?

Read coredump sequentially. First come ELF header and program headers.
Read headers and remember their virtual address ranges, sizes, and flags.
The rest you don't need.

Then comes NOTE segment. Parse it and fetch the value of stack pointer
register. Don't save anything else.

Then read LOAD segments and discard their data until you reach a
stack segment (one which stack pointer points to).

Read all words from stack starting from stack pointer up to the top
of the stack. If a word, when interpreted as a pointer, points into
an executable segment, then add it to stack trace. Else, ignore the word.

After you finished reading the stack, read and discard the rest (there
usually is nothing after stack).

Then you can sanitize stack trace: if some addresses there point
to a *beginning* of a function, not in the middle of it,
then it's likely a false positive (someone was passing function
pointer parameter, or had it in on-stack auto variable).
To do this sanitization, you don't need (and can't use, even if you'd have it)
coredump - you need to examine binary and libraries instead.

But I suspect even without sanitizing it, resulting stack trace
will often be rather good: it will contain all real return addresses,
and often will have no false positives at all.
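
The scanning step above can be sketched as follows.  This is an untested
illustration of the heuristic only: struct seg and both function names are
hypothetical, and the stack words are assumed to have already been read from
the stack's LOAD segment, starting at the stack pointer.

```c
#include <elf.h>
#include <stddef.h>

/* One PT_LOAD range: [start, end) plus its PF_R/PF_W/PF_X flags. */
struct seg {
	unsigned long start, end;
	unsigned int flags;
};

/* Does addr fall inside any segment mapped executable? */
int addr_is_executable(const struct seg *segs, int nsegs,
		       unsigned long addr)
{
	int i;

	for (i = 0; i < nsegs; i++)
		if ((segs[i].flags & PF_X) &&
		    addr >= segs[i].start && addr < segs[i].end)
			return 1;
	return 0;
}

/*
 * Walk the stack words in order and keep each one that, read as a
 * pointer, lands in an executable segment.  Stops once max_trace
 * entries are stored.  Returns the number of entries stored.
 */
size_t scan_stack(const unsigned long *words, size_t nwords,
		  const struct seg *segs, int nsegs,
		  unsigned long *trace, size_t max_trace)
{
	size_t i, n = 0;

	for (i = 0; i < nwords && n < max_trace; i++)
		if (addr_is_executable(segs, nsegs, words[i]))
			trace[n++] = words[i];
	return n;
}
```

The max_trace bound keeps the output within a fixed budget; the later
sanitizing pass against the binaries is not shown.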

-- 
vda


* Re: Questions about ptrace on a dying process
  2012-03-01  7:12     ` Denys Vlasenko
  2012-03-01 18:18       ` Tim Bird
@ 2012-03-01 18:30       ` Tim Bird
  2012-03-01 19:02         ` Denys Vlasenko
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Bird @ 2012-03-01 18:30 UTC
  To: Denys Vlasenko; +Cc: Andi Kleen, Oleg Nesterov, linux kernel, tj

On 02/29/2012 11:12 PM, Denys Vlasenko wrote:
> On Wednesday 29 February 2012 21:45, Tim Bird wrote:
>> On 02/29/2012 11:12 AM, Andi Kleen wrote:
>>> Tim Bird <tim.bird@am.sony.com> writes:
>>>
>>>> ptrace maintainers (and interested parties)...
>>>>
>>>> I'm working on a crash handler for Linux, which uses ptrace to retrieve information
>>>> about a process during its coredump.  Specifically, from within a core handler
>>>> program (started within do_coredump() as a user_mode_helper), I would like to make
>>>> ptrace calls against the dying process.
>>>
>>> The standard approach is to define a core pipe handler and parse the
>>> elf memory dump.
>>
>> Yeah - I may be doing something new here.  Android uses ptrace
>> in debuggerd, which is their crash reporting tool, but they wake
>> it up with signals before the dying program goes into coredump.
> 
> I think ptrace API does not provide guarantees that it is possible
> to attach to the process when it coredumps.
> 
> It might work in current kernels, but might break in new ones.

If it's not too much trouble, it would be nice to continue
this behaviour.   Admittedly, my style of crash handling appears
to be new, and I don't want to unnecessarily burden the code,
but so far it works really well, and currently requires very
minimal change to the existing ptrace code.  One thing that's
nice about what I'm doing, is that I don't rely on the
whole signal state machine of the process to interact with it
(since a dying process can't respond correctly).

So hopefully, continuing to support ptrace for a dying process
won't interfere or burden the existing (rather complex)
state processing in the current code.

Just for reference, below is the patch I settled on for my own kernel.
I'm planning to take a look at the PTRACE_SEIZE code to see if it
accomplishes what I need, but haven't done that yet.

I do have a question, though - how will a tracer know that it can
use PTRACE_SEIZE?  Is there some introspection API?  My code will
be running mainly against a patched 3.0 for some time (a few years),
but if I could make it interoperate with a kernel that supports
PTRACE_SEIZE (when it is mainlined), that would be great.

 -- Tim

commit dd54b901759428e60ed57b2d6cb77d25a8db767f
Author: tbird <tim.bird@am.sony.com>
Date:   Tue Feb 28 14:13:08 2012 -0800

    Support ptrace_attach with no signal side-effects.

    In the normal case, a ptrace_attach operation will convert
    a process to TASK_TRACED by sending it a SIGSTOP signal,
    after setting task->ptrace.  This won't work on a dying
    process because during do_coredump(), the dying process won't
    process the STOP signal and change state.

    Modify ptrace_attach() so that the tracee task state is modified
    directly.  This allows subsequent ptrace_check_attach() calls
    to work correctly, and avoids having a pending SIGSTOP signal
    on the tracee process (which interferes with waiting for
    the core pipe handler).

    Note that a more full-featured implementation of this is in the
    works (as of March, 2012) by Tejun Heo, called PTRACE_SEIZE.
    Once that gets mainlined, this patch may not be needed, or might
    need to be reworked.

    Signed-off-by: Tim Bird <tim.bird@am.sony.com>

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 26147d1..9c7bf8e 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -223,7 +223,16 @@ static int ptrace_attach(struct task_struct *task)
 		task->ptrace |= PT_PTRACE_CAP;

 	__ptrace_link(task, current);
-	send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+	
+	/*
+ 	 * If doing coredump, just convert directly to TASK_TRACED.
+ 	 * A dying process doesn't process signals normally.
+ 	 */
+	if (unlikely(task->mm->core_state)) {
+		set_task_state(task, TASK_TRACED);
+	} else {
+		send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+	}

 	spin_lock(&task->sighand->siglock);


=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================



* Re: Questions about ptrace on a dying process
  2012-03-01 18:30       ` Tim Bird
@ 2012-03-01 19:02         ` Denys Vlasenko
  0 siblings, 0 replies; 10+ messages in thread
From: Denys Vlasenko @ 2012-03-01 19:02 UTC
  To: Tim Bird; +Cc: Andi Kleen, Oleg Nesterov, linux kernel, tj

On Thu, Mar 1, 2012 at 7:30 PM, Tim Bird <tim.bird@am.sony.com> wrote:
>>> Yeah - I may be doing something new here.  Android uses ptrace
>>> in debuggerd, which is their crash reporting tool, but they wake
>>> it up with signals before the dying program goes into coredump.
>>
>> I think ptrace API does not provide guarantees that it is possible
>> to attach to the process when it coredumps.
>>
>> It might work in current kernels, but might break in new ones.
>
> If it's not too much trouble, it would be nice to continue
> this behaviour.   Admittedly, my style of crash handling appears
> to be new, and I don't want to unnecessarily burden the code,
> but so far it works really well, and currently requires very
> minimal change to the existing ptrace code.  One thing that's
> nice about what I'm doing, is that I don't rely on the
> whole signal state machine of the process to interact with it
> (since a dying process can't respond correctly).
>
> So hopefully, continuing to support ptrace for a dying process
> won't interfere or burden the existing (rather complex)
> state processing in the current code.
>
> Just for reference, below is the patch I settled on for my own kernel.

You added yet another quirk to ptrace API - and this API
already has no shortage of quirks.
Of course you can maintain it for your kernel, but
adding it in mainline is a bizarre proposition.

I'm not even sure that _if_ PTRACE_SEIZE with PTRACE_O_TRACEEXIT
option works on semi-dead core-dumping process today without
patching, it is guaranteed to do so in the future.


> I'm planning to take a look at the PTRACE_SEIZE code to see if it
> accomplishes what I need, but haven't done that yet.
>
> I do have a question, though - how will a tracer know that it can
> use PTRACE_SEIZE?

By looking at kernel version.
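
A sketch of such a check from user-space, using uname().  The function names
are hypothetical, and the threshold version to compare against is left to
the caller because it is not stated in this thread.  A vendor kernel with
backported patches would also defeat a pure version test.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor" from a release string such as "3.0.8-rc1". */
int parse_release(const char *rel, int *major, int *minor)
{
	return sscanf(rel, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/*
 * Returns 1 if the running kernel is at least major.minor, 0 if it
 * is older, and -1 if the release string could not be read.
 */
int kernel_at_least(int major, int minor)
{
	struct utsname u;
	int maj, min;

	if (uname(&u) != 0 || parse_release(u.release, &maj, &min) != 0)
		return -1;
	return maj > major || (maj == major && min >= minor);
}
```

A tracer could call kernel_at_least() once at startup and fall back to
PTRACE_ATTACH when it returns 0.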

-- 
vda

