From: Masami Hiramatsu <mhiramat@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mel Gorman <mel@csn.ul.ie>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Roland McGrath <roland@redhat.com>,
Oleg Nesterov <oleg@redhat.com>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH v1 7/10] Uprobes Implementation
Date: Tue, 23 Mar 2010 13:36:56 -0400 [thread overview]
Message-ID: <4BA8FC38.9060002@redhat.com> (raw)
In-Reply-To: <1269357311.5109.81.camel@twins>
Peter Zijlstra wrote:
> On Tue, 2010-03-23 at 10:20 -0400, Masami Hiramatsu wrote:
>
>>> And yes, all processes that share that DSO, consumers can install
>>> filters.
>>
>> Hmm, for low-level interface, it will be good. If we provide
>> a user interface(trace_uprobe.c), we'd better add pid filter
>> for it.
>
> ftrace already has pid filters.
Indeed.
>
>>>>> Also, like we discussed in person, I think we can do away with the
>>>>> handler_in_interrupt thing by letting the handler have an error return
>>>>> value and doing something like:
>>>>>
>>>>> do_int3:
>>>>>
>>>>> uprobe = find_probe_point(addr);
>>>>>
>>>>> pagefault_disable();
>>>>> err = uprobe->handler(uprobe, regs);
>>>>> pagefault_enable();
>>>>>
>>>>> if (err == -EFAULT) {
>>>>> /* set TIF flag and call the handler again from
>>>>> task context */
>>>>> }
>>>>>
>>>>> This should allow the handler to optimistically access memory from the
>>>>> trap handler, but in case it does need to fault pages in we'll call it
>>>>> from task context.
>>>>
>>>> Okay but what if the handler is coded to sleep.
>>>
>>> Don't do that ;-)
>>>
>>> What reason would you have to sleep from a int3 anyway? You want to log
>>> bits and get on with life, right? The only interesting case is faulting
>>> when some memory references you want are not currently available, and
>>> that can be done as suggested.
>>
>> Out of curiously, what does the task-context mean? ('current' is probed
>> task in int3, isn't it?). I think, uprobe handler can cause page fault
>> (and should sleep) if the page is swapped out.
>
> Task context means the regular kernel task stack where we can schedule,
> int3 has its own exception stack and we cannot schedule from that.
>
> And yes, the fault thing is the one case where sleeping makes sense and
> is dealt with in my proposal, you don't need two handlers for that, just
> call it from trap context with pagefault_disable() and when it fails
> with -EFAULT set a TIF flag to deal with it later when we're back in
> task context.
Ah, I see. so it will be done later. Actually, since int3 handler will
disable irq, it is reasonable.
> There is a very good probability that the memory you want to reference
> is mapped (because typically the program itself will want to access it
> as well) so doing the optimistic access with pagefault_disabled() will
> work most of the times and you only end up taking the slow path when it
> does indeed fault.
hm, similar technique can be applied to kprobe-tracer too (for getting
__user arguments). :)
>>>>> Everybody else simply places callbacks in kernel/fork.c and
>>>>> kernel/exit.c, but as it is I don't think you want per-task state like
>>>>> this.
>>>>>
>>>>> One thing I would like to see is a slot per task, that has a number of
>>>>> advantages over the current patch-set in that it doesn't have one page
>>>>> limit in number of probe sites, nor do you need to insert vmas into each
>>>>> and every address space that happens to have your DSO mapped.
>>>>>
>>>>
>>>> where are the per task slots stored?
>>>> or Are you looking at a XOL vma area per DSO?
>>>
>>> The per task slot (note the singular, each task needs only ever have a
>>> single slot since a task can only ever hit one trap at a time) would
>>> live in the task TLS or task stack.
>>
>> Hmm, I just worried about whether TLS/task stack can be executable
>> (no one set NX bit).
>
> You can remove the NX bit from that one page I guess.
OK.
>>>>> Also, I would simply kill the user_bkpt stuff and merge it into uprobes,
>>>>> we don't have a kernel_bkpt thing either, only kprobes.
>>>>>
>>>>
>>>> We had uprobes as one single layer. However it was suggested that
>>>> breaking it up into two layers was useful because it would help code
>>>> reuse. Esp it was felt that a generic user_bkpt layer would be far more
>>>> useful than being used for just uprobes.
>>>> Here are links where these discussion happened.
>>>
>>> I'm so not going to read ancient emails on a funky list. What re-use?
>>> uprobe should be the only interface to this, there's no second interface
>>> to kprobes either is there?
>>
>> It will be good when we start working on 'ptrace2' :)
>> Anyway, the patch order looks a bit odd, because user_bkpt uses XOL
>> but XOL patch is introduced after user_bkpt patch...
>
> But why would ptrace2 use a different interface? Also, why introduce
> some abstraction layer now without having a user for it, you could
> always restructure things and or add interfaces later when you have a
> clear idea what it is you need.
Because 'ptrace' doesn't have any breakpoint insertion helper.
Programs which uses ptrace must setup single-stepping buffer and
modify target code by themselves. This causes problems when
multiple debuggers/tracers attach to the same process and
try to modify same address. First program can see the original
instruction, but next one will see int3! I think we'd better
provide some abstraction interface for breakpoint setting in
next generation ptrace (of course, we also need to provide
memory peek interface which returns original instructions).
But anyway, I agree with you, we don't need it *now*, but someday.
Thank you,
--
Masami Hiramatsu
e-mail: mhiramat@redhat.com
next prev parent reply other threads:[~2010-03-23 17:39 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-20 14:24 [PATCH v1 0/10] Uprobes patches Srikar Dronamraju
2010-03-20 14:25 ` [PATCH v1 1/10] Move Macro W to insn.h Srikar Dronamraju
2010-03-20 15:50 ` Masami Hiramatsu
2010-03-22 6:24 ` Srikar Dronamraju
2010-03-22 14:11 ` Masami Hiramatsu
2010-03-20 14:25 ` [PATCH v1 2/10] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-20 14:25 ` [PATCH v1 3/10] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-20 14:25 ` [PATCH v1 4/10] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-23 1:40 ` Andrew Morton
2010-03-23 4:48 ` Randy Dunlap
2010-03-23 11:26 ` Srikar Dronamraju
2010-03-20 14:25 ` [PATCH v1 5/10] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-20 14:26 ` [PATCH v1 6/10] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-20 14:26 ` [PATCH v1 7/10] Uprobes Implementation Srikar Dronamraju
2010-03-23 11:01 ` Peter Zijlstra
2010-03-23 11:04 ` Peter Zijlstra
2010-03-23 12:23 ` Srikar Dronamraju
2010-03-23 13:46 ` Peter Zijlstra
2010-03-23 14:20 ` Masami Hiramatsu
2010-03-23 15:15 ` Peter Zijlstra
2010-03-23 17:36 ` Masami Hiramatsu [this message]
2010-03-24 10:22 ` Srikar Dronamraju
2010-03-23 15:05 ` Ananth N Mavinakayanahalli
2010-03-23 15:15 ` Peter Zijlstra
2010-03-23 15:26 ` Frank Ch. Eigler
2010-03-24 5:59 ` Ananth N Mavinakayanahalli
2010-03-24 7:58 ` Srikar Dronamraju
2010-03-24 13:00 ` Peter Zijlstra
2010-03-25 7:56 ` Srikar Dronamraju
2010-03-25 8:41 ` Srikar Dronamraju
2010-03-20 14:26 ` [PATCH v1 8/10] X86 details for uprobes Srikar Dronamraju
2010-03-20 14:26 ` [PATCH v1 9/10] Uprobes Documentation patch Srikar Dronamraju
2010-03-22 3:00 ` Randy Dunlap
2010-03-22 5:34 ` Srikar Dronamraju
2010-03-22 14:51 ` Randy Dunlap
2010-03-20 14:26 ` [PATCH v1 10/10] Uprobes samples Srikar Dronamraju
2010-03-23 1:38 ` [PATCH v1 0/10] Uprobes patches Andrew Morton
2010-03-23 10:55 ` Srikar Dronamraju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BA8FC38.9060002@redhat.com \
--to=mhiramat@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ananth@in.ibm.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=hch@infradead.org \
--cc=jkenisto@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=roland@redhat.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox