From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, Steven Rostedt <rostedt@goodmis.org>,
Arnaldo Carvalho de Melo <acme@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
Oleg Nesterov <oleg@redhat.com>, Mark Wielaard <mjw@redhat.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Andrew Morton <akpm@linux-foundation.org>,
Naren A Devaiah <naren.devaiah@in.ibm.com>,
Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
LKML <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Subject: Re: [PATCHv11 2.6.36-rc2-tip 5/15] 5: uprobes: Uprobes (un)registration and exception handling.
Date: Tue, 7 Sep 2010 17:21:17 +0530
Message-ID: <20100907115117.GJ14891@linux.vnet.ibm.com>
In-Reply-To: <1283852003.1930.1133.camel@laptop>
> > multiple processes example libc), and the user is interested in tracing
> > just one instance of the process, then won't the inode based tracing
> > amount to a far greater number of breakpoint hits?
>
> Not if your filter function works.
>
> So let me try this again, (assumes boosted probes):
>
> struct uprobe {
> struct inode *inode; /* we hold a ref */
> unsigned long offset;
>
> int (*handler)(void); /* arguments.. ? */
> int (*filter)(struct task_struct *);
>
> int insn_size; /* size of */
> char insn[MAX_INSN_SIZE]; /* the original insn */
>
> int ret_addr_offset; /* return addr offset
> in the slot */
> char replacement[SLOT_SIZE]; /* replacement
> instructions */
>
> atomic_t ref; /* lifetime muck */
> struct rcu_head rcu;
> };
struct uprobe is an input structure. Do we want to have
implementation details in it?
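One way to keep implementation details out of the input structure is to
split it in two. The following is only a userspace sketch under stated
assumptions: the names uprobe_consumer, uprobe_internal and
uprobe_internal_init() are hypothetical, MAX_INSN_SIZE is given an
arbitrary value, and plain integers stand in for struct inode and
struct task_struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INSN_SIZE 16	/* assumed value, for illustration only */

/* What the caller fills in: probe location plus callbacks. */
struct uprobe_consumer {
	unsigned long inode_nr;		/* stand-in for struct inode * */
	unsigned long offset;
	int (*handler)(void);
	int (*filter)(int pid);		/* stand-in for struct task_struct * */
};

/* What the core would keep privately for each registered probe. */
struct uprobe_internal {
	const struct uprobe_consumer *consumer;
	int insn_size;
	unsigned char insn[MAX_INSN_SIZE];	/* saved original instruction */
	int ref;
};

/*
 * Build the private state from the caller's description.
 * Returns 0 on success, -1 on invalid input.
 */
static int uprobe_internal_init(struct uprobe_internal *u,
				const struct uprobe_consumer *c,
				const unsigned char *insn, int insn_size)
{
	if (!u || !c || !insn || insn_size <= 0 || insn_size > MAX_INSN_SIZE)
		return -1;

	memset(u, 0, sizeof(*u));
	u->consumer = c;
	u->insn_size = insn_size;
	memcpy(u->insn, insn, (size_t)insn_size);
	u->ref = 1;	/* reference held by the registration itself */
	return 0;
}
```

With this split the caller never sees the saved instruction or the
reference count; whether that is worth the extra allocation is a
separate question.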
>
> static struct {
> raw_spinlock_t tree_lock;
> struct rb_root tree;
> } uprobes;
>
> static void uprobes_add(struct uprobe *uprobe)
> {
> /* add to uprobes.tree, sorted on inode:offset */
> }
>
> static void uprobes_del(struct uprobe *uprobe)
> {
> /* delete from uprobes.tree */
> }
>
> static struct uprobe *
> uprobes_find_get(struct address_space *mapping, unsigned long offset)
> {
> unsigned long flags;
> struct uprobe *uprobe;
>
> raw_spin_lock_irqsave(&uprobes.tree_lock, flags);
> uprobe = find_in_tree(&uprobes.tree);
Wouldn't this be a scalability issue on bigger machines?
Every probe hit having to search a global tree to figure out which
uprobe it was seems like overkill.
Consider 5000 uprobes placed on a 128-CPU box, with probes placed on
heavily used functions.
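To put a rough number on the per-hit work, here is a userspace sketch of
a lookup keyed on inode:offset. It rests on assumptions: a sorted array
stands in for the rbtree, probe_key, probe_cmp() and probe_lookup() are
hypothetical names, and the global lock (which is the real contention
point on a large machine) is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Key the tree would be sorted on: inode first, then offset. */
struct probe_key {
	unsigned long inode_nr;		/* stand-in for struct inode * */
	unsigned long offset;
};

/* Three-way compare of a wanted (inode, offset) pair against an entry. */
static int probe_cmp(unsigned long inode_nr, unsigned long offset,
		     const struct probe_key *k)
{
	if (inode_nr != k->inode_nr)
		return inode_nr < k->inode_nr ? -1 : 1;
	if (offset != k->offset)
		return offset < k->offset ? -1 : 1;
	return 0;
}

/*
 * Binary search over a table sorted by (inode_nr, offset).
 * *steps reports how many comparisons this lookup needed.
 */
static const struct probe_key *
probe_lookup(const struct probe_key *tab, size_t n,
	     unsigned long inode_nr, unsigned long offset, unsigned *steps)
{
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	assert(tab != NULL || n == 0);
	*steps = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = probe_cmp(inode_nr, offset, &tab[mid]);

		(*steps)++;
		if (c == 0)
			return &tab[mid];
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return NULL;
}
```

For 5000 entries this needs at most 13 comparisons per lookup, so the
search itself is cheap. The concern is that every hit on every CPU
serializes on the one lock around it.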
> if (!atomic_inc_not_zero(&uprobe->ref))
> uprobe = NULL;
> raw_spin_unlock_irqrestore(&uprobes.tree_lock, flags);
>
> return uprobe;
> }
>
> static void __uprobe_free(struct rcu_head *head)
> {
> struct uprobe *uprobe = container_of(head, struct uprobe, rcu);
>
> kfree(uprobe);
> }
>
> static void put_uprobe(struct uprobe *uprobe)
> {
> if (atomic_dec_and_test(&uprobe->ref))
> call_rcu(&uprobe->rcu, __uprobe_free);
How are we synchronizing put_uprobe() and a thread that has hit the
breakpoint and is searching through the global probes list?
One nit: on a probe hit we increment the ref only a few times, but
we decrement it every time. So if two probes hit on two CPUs
simultaneously, there is a chance of the uprobe being freed after both
of them are handled. (Or am I missing something?)
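The get/put pairing in question can be modeled in userspace with C11
atomics. This is a sketch under assumptions, not the kernel code:
ref_get_not_zero() and ref_put() are hypothetical stand-ins for
atomic_inc_not_zero() and put_uprobe(), and a flag replaces the
RCU-deferred kfree().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refobj {
	atomic_int ref;
	bool freed;	/* stand-in for call_rcu() followed by kfree() */
};

/* Take a reference unless the count has already dropped to zero. */
static bool ref_get_not_zero(struct refobj *o)
{
	int old = atomic_load(&o->ref);

	while (old != 0) {
		/* On failure the CAS reloads 'old', so the loop retries. */
		if (atomic_compare_exchange_weak(&o->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one frees the object. */
static void ref_put(struct refobj *o)
{
	int old = atomic_fetch_sub(&o->ref, 1);

	assert(old > 0);	/* a put without a matching get underflows */
	if (old == 1)
		o->freed = true;
}
```

In this model every successful lookup adds exactly one reference, so two
simultaneous hits followed by an unregister take the count 1, 2, 3 and
back down to 0, and the object is freed only on the last put. What the
model does not answer is how a lookup is kept from finding an entry that
is still in the tree after its count has reached zero.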
> }
>
> static inline int valid_vma(struct vm_area_struct *vma)
> {
> if (!vma->vm_file)
> return 0;
>
> if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) ==
> (VM_READ|VM_EXEC))
> return 1;
>
> return 0;
> }
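The flag test in valid_vma() is only correct when the mask is applied
before the comparison, because in C `==` binds tighter than `&`. A small
userspace sketch of the difference; the flag values and both function
names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative flag values only. */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL
#define VM_SHARED	0x8UL

/* Mask first, then compare: true only for private read+exec mappings. */
static int flags_are_private_rx(unsigned long vm_flags)
{
	return (vm_flags & (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)) ==
	       (VM_READ | VM_EXEC);
}

/*
 * What "flags & MASK == VALUE" computes without the extra parentheses:
 * the comparison is evaluated first and yields 0 here, so the result
 * is 0 for every input.
 */
static int flags_unparenthesised(unsigned long vm_flags)
{
	return (vm_flags & ((VM_READ | VM_WRITE | VM_EXEC | VM_SHARED) ==
			    (VM_READ | VM_EXEC))) != 0;
}
```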
>
> int register_uprobe(struct uprobe *uprobe)
> {
> struct vm_area_struct *vma;
>
> inode_get(uprobe->inode);
> atomic_set(&uprobe->ref, 1);
>
> uprobes_add(uprobe); /* add before the rmap walk, so that
> new mmap()s will find it too */
>
> for_each_rmap_vma(vma, uprobe->inode->i_mapping) {
I understand that perf top calls perf record in a loop.
For every perf record, we would be looping through each vma associated
with the inode.
For a probe on libc, we would iterate through all vmas. If the
tracing were per POSIX process, this might not be needed.
> struct mm_struct *mm = vma->vm_mm;
> int install_probe = 0;
>
> if (!valid_vma(vma))
> continue;
>
> for_each_task_in_process(p, mm->owner) {
> if (uprobe->filter(p)) {
> p->has_uprobe = 1;
> install_probe = 1;
> }
> }
>
> if (install_probe) {
> mm->has_uprobes = 1;
> frob_text(uprobe, mm);
> }
> }
> }
>
> void unregister_uprobe(struct uprobe *uprobe)
> {
> /* pretty much the same, except restore the original text */
> put_uprobe(uprobe);
> }
>
> void uprobe_fork(struct task_struct *child)
> {
> struct vm_area_struct *vma;
>
> if (!child->mm->has_uprobes)
> return;
>
> for_each_vma(vma, child->mm) {
> struct uprobe *uprobe;
>
> if (!valid_vma(vma))
> continue;
>
> for_each_probe_in_mapping(uprobe, vma->vm_file->f_mapping) {
Are you looking at keeping a list of uprobes per vma?
Or does it again traverse the global list?
> if (uprobe->filter(child)) {
> child->has_uprobe = 1;
> return;
> }
> }
> }
> }
>
> void uprobe_mmap(struct vm_area_struct *vma)
> {
> struct uprobe *uprobe;
>
> if (!valid_vma(vma))
> return;
>
> for_each_probe_in_mapping(uprobe, vma->vm_file->f_mapping) {
For each mmap, are we traversing all elements in the global tree?
What would happen if we have a huge number of uprobes in a system, all
from one user on his app? Won't it slow down mmap for all other users?
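Whether mmap pays for every probe in the system depends on how
for_each_probe_in_mapping() walks the tree, which the outline does not
say. If the tree is sorted on inode:offset, the walk could in principle
start at the first probe of the mapping's inode and stop at the first
probe of another inode. A userspace sketch of that idea, assuming a
sorted array in place of the rbtree and hypothetical names throughout:

```c
#include <assert.h>
#include <stddef.h>

struct probe {
	unsigned long inode_nr;		/* stand-in for struct inode * */
	unsigned long offset;
};

/* Index of the first entry whose inode_nr is >= the one asked for. */
static size_t first_probe_of(const struct probe *tab, size_t n,
			     unsigned long inode_nr)
{
	size_t lo = 0, hi = n;

	assert(tab != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].inode_nr < inode_nr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Visit only the probes of one inode; returns how many were visited. */
static size_t for_each_probe_of(const struct probe *tab, size_t n,
				unsigned long inode_nr,
				void (*fn)(const struct probe *, void *),
				void *arg)
{
	size_t i = first_probe_of(tab, n, inode_nr);
	size_t visited = 0;

	for (; i < n && tab[i].inode_nr == inode_nr; i++, visited++) {
		if (fn)
			fn(&tab[i], arg);
	}
	return visited;
}
```

Under these assumptions an mmap of a file with no probes costs one
search and visits nothing. The search still has to happen under the
global lock, so one user's probes can still delay another user's mmap.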
> int install_probe = 0;
>
> for_each_task_in_process(p, vma->vm_mm->owner) {
> if (uprobe->filter(p)) {
> p->has_uprobe = 1;
> install_probe = 1;
> }
> }
>
> if (install_probe) {
> vma->vm_mm->has_uprobes = 1;
> frob_text(uprobe, vma->vm_mm);
> }
> }
> }
>
> void uprobe_hit(struct pt_regs *regs)
> {
> unsigned long addr = instruction_pointer(regs);
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> unsigned long offset;
>
> down_read(&mm->mmap_sem);
I assume uprobe_hit() is going to be called in interrupt context.
I guess down_read() can sleep here.
> vma = find_vma(mm, addr);
>
> if (!valid_vma(vma))
> goto fail;
>
> offset = addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
Again, for every probe hit, we are going through the list of vmas and
checking if it has a probe, which I think is unnecessary.
Nit: On some archs, the instruction pointer might be pointing to the
next instruction after a breakpoint hit; we would have to adjust for that.
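The adjustment and the offset calculation can be sketched together in
userspace. Assumptions: 4 KiB pages, the x86 case where int3 is one byte
and the saved IP points just past it, and hypothetical names vma_model,
bp_addr_from_ip() and file_offset(). Other architectures would need a
different breakpoint size or no adjustment at all.

```c
#include <assert.h>

#define PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define BP_INSN_SIZE	1	/* x86 int3; arch specific */

/* Just the two vma fields the calculation needs. */
struct vma_model {
	unsigned long vm_start;		/* first address of the mapping */
	unsigned long vm_pgoff;		/* file offset of vm_start, in pages */
};

/* Address of the breakpoint, given the IP saved at trap time. */
static unsigned long bp_addr_from_ip(unsigned long ip)
{
	return ip - BP_INSN_SIZE;
}

/* Byte offset within the file that backs the given user address. */
static unsigned long file_offset(const struct vma_model *vma,
				 unsigned long addr)
{
	assert(addr >= vma->vm_start);
	return addr - vma->vm_start + (vma->vm_pgoff << PAGE_SHIFT);
}
```

For a mapping that starts at 0x400000 with vm_pgoff 2, a saved IP of
0x400125 gives a breakpoint address of 0x400124 and a file offset of
0x2124. Without the adjustment the lookup would be done with 0x2125 and
would miss the probe.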
> uprobe = uprobes_find_get(vma->vm_file->f_mapping, offset);
> up_read(&mm->mmap_sem);
>
> if (!uprobe)
> goto fail;
>
> if (current->has_uprobe && uprobe->filter(current))
> uprobe->handler();
>
> ret_addr = addr + uprobe->insn_size;
>
> cpu = get_cpu();
> slot = get_slot(cpu);
> memcpy(slot, uprobe->replacement, SLOT_SIZE);
> memcpy(slot + uprobe->ret_addr_offset, &ret_addr, sizeof(unsigned
> long));
> set_instruction_pointer(regs, uaddr_addr_of(slot));
> put_cpu(); /* preemption notifiers would take it from here */
>
What if we were preempted after this? Would preemption notifiers also
do a copy of the instruction to the new slot? If yes, can you please
give me more pointers?
And I don't know if we can do boosting for all instructions.
I think even on kprobes we don't do boosting for all instructions.
Masami, can you correct me on this?
> put_uprobe(uprobe);
> return;
>
> fail:
> SIGTRAP
> }
>
> See, no extra traps, no funny intermediate data structures to manage,
> and you get the power of ->filter() to implement whatever policy you
> want, including simple process wide things.
Yes, I see its advantages and disadvantages, but I feel this
implementation wouldn't scale. Just because we don't want to
housekeep some information, we are looping through the global tree to
figure out if there is uprobe-specific stuff to be done.
--
Thanks and Regards
Srikar