* [0/3] Kprobes: User space probes support
@ 2006-03-20 6:07 Prasanna S Panchamukhi
2006-03-20 6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi
2006-03-21 11:39 ` [0/3] Kprobes: User space probes support Christoph Hellwig
0 siblings, 2 replies; 33+ messages in thread
From: Prasanna S Panchamukhi @ 2006-03-20 6:07 UTC (permalink / raw)
To: akpm, Andi Kleen, davem, suparna, richardj_moore, prasanna; +Cc: linux-kernel
This patch set provides the basic infrastructure for user-space probes
based on IBM's Dprobes. Similar to kprobes, user-space probes uses the
breakpoint mechanism to insert probes. User has to write a kernel module
to insert probes in the executable/library and specify the handlers in
the kernel module. Using this mechanism user can not only log user-space
data structure present in the memory when probe was hit, but also log
kernel data structures, stack traces, system registers etc.
User-space tools like systemtap can use this interface to probe applications
and libraries.
Design and development of this user-space probe mechanism has been
discussed in systemtap mailing lists.
a) Features supported:
1. Probes on application executable.
2. Probes on libraries.
3. Probes are visible across fork().
4. Probes can be inserted even on executables/libraries
that are not even started running(pages not present in memory).
5. Multiple handlers can be inserted/removed for each probe.
b) Interfaces:
1. register_uprobe(struct uprobe *uprobe) : accepts a pointer to uprobe.
User has to allocate the uprobes structure and initialize following
elements:
pathname - points to the application's pathname
offset - offset of the probe from the file beginning;
[It's still the case that the user has to specify the offset as well
as the address (see TODO list)]
In case of library calls, the offset is the
relative offset from the beginning of the of
the mapped library.
kp.addr - virtual address within the executable.
kp.pre_handler - handler to be executed when probe is fired.
kp.post_handler - handler to be executed after single stepping
the original instruction.
kp.fault_handler- handler to be executed if fault occurs while
executing the original instruction or the
handlers.
As with a kprobe, the user should not modify the uprobe while it is
registered. This routine returns zero on successful registeration.
2. unregister_uprobe(struct uprobe *uprobe) : accepts a pointer to uprobe.
c) Objects:
struct uprobe - Allocated per probe by the user.
struct uprobe_module - Allocated per application by the userspace probe
mechanism.
struct uprobe {
/* pointer to the pathname of the application */
char *pathname;
/* kprobe structure with user specified handlers */
struct kprobe kp;
/* hlist of all the userspace probes per application */
struct hlist_node ulist;
/* inode of the probed application */
struct inode *inode;
/* probe offset within the file */
unsigned long offset;
};
struct uprobe_module {
/* hlist head of all userspace probes per application */
struct hlist_head ulist_head;
/* list of all uprobe_module for probed application */
struct list_head mlist;
/* to hold path/dentry etc. */
struct nameidata nd;
/* original readpage operations */
struct address_space_operations *ori_a_ops;
/* readpage hooks added operations */
struct address_space_operations user_a_ops;
};
d) Usage:
/* Allocate a uprobe structure */
struct uprobe p;
/* Define pre handler */
int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
.............collect useful data..............
return 0;
}
void handler_post(struct kprobe *p, struct pt_regs *regs,
unsigned long flags)
{
.............collect useful data..............
}
int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
{
........ release allocated resources & try to recover ....
return 0;
}
Before inserting the probe, specify the pathname of the application
on which the probe is to be inserted.
/*pointer to the pathname of the application */
p.pathname = "/home/prasanna/bin/myapp";
p.kp.pre_handler=handler_pre;
p.kp.post_handler=handler_post;
p.kp.fault_handler=handler_fault;
/* Specify the probe address */
/* $nm appln |grep func1 */
p.kp.addr = (kprobe_opcode_t *)0x080484d4;
/* Specify the offset within the application/executable*/
p.offset = (unsigned long)0x4d4;
/* Now register the userspace probe */
if (ret = register_uprobe(&p))
printk("register_uprobe: unsuccessful ret= %d\n", ret);
/* To unregister the registered probed, just call..*/
unregister_uprobe(&p);
e) TODO List:
1. Execution of probe handlers are serialized using a uprobe_lock,
need to make them scalable.
2. Provide jprobes type mechansim to allow the handlers to run in user
mode.
3. Insert probes on copy-on-write pages. Tracks all COW pages for the
page containing the specified probe point and inserts/removes all
the probe points for that page.
4. Optimize the insertion of probes through readpage hooks. Identify
all the probes to be inserted on the read page and insert them at
once.
5. A wrapper routine to calculate the offset from the probed file
beginning. In case of dynamic shared library, the offset is
calculated by subtracting the beginning address of the file mapped
from the address of the probe point.
6. Probes are visible if a copy of the probed executable is made,
remove probes from the new copied image.
7. Make user-space probes coexist with other debuggers like gdb etc.
8. Support probepoints within MAX_INSN_SIZE bytes of the end of a
vm area. (Right now we just copy MAX_INSN_SIZE bytes and assume we
won't read of the end of the vm area.) This would be useful for
return probes, where we'd like to put the trampoline between
mm->end_code and the end of that page, if possible. Really, all we
need is 1 byte.
--
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-51776329
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [1/3 PATCH] Kprobes: User space probes support- base interface 2006-03-20 6:07 [0/3] Kprobes: User space probes support Prasanna S Panchamukhi @ 2006-03-20 6:09 ` Prasanna S Panchamukhi 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi 2006-03-20 10:42 ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton 2006-03-21 11:39 ` [0/3] Kprobes: User space probes support Christoph Hellwig 1 sibling, 2 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 6:09 UTC (permalink / raw) To: akpm, Andi Kleen, davem, suparna, richardj_moore; +Cc: linux-kernel This patch provides two interfaces to insert and remove user space probes. Each probe is uniquely identified by inode and offset within that executable/library file. Insertion of a probe involves getting the code page for a given offset, mapping it into the memory and then insert the breakpoint at the given offset. Also the probe is added to the uprobe_table hash list. A uprobe_module data strcture is allocated for every probed application/library image on disk. Removal of a probe involves getting the code page for a given offset, mapping that page into the memory and then replacing the breakpoint instruction with a the original opcode. This patch also provides aggrigate probe handler feature, where user can define multiple handlers per probe. Signed-off-by : Prasanna S Panchamukhi <prasanna@in.ibm.com> arch/i386/kernel/uprobes.c | 71 +++++ fs/namei.c | 13 include/linux/kprobes.h | 51 +++ include/linux/namei.h | 1 kernel/Makefile | 2 kernel/kprobes.c | 3 kernel/uprobes.c | 589 +++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 725 insertions(+), 5 deletions(-) diff -puN include/linux/kprobes.h~kprobes_userspace_probes-base-interface include/linux/kprobes.h --- linux-2.6.16-rc6-mm2/include/linux/kprobes.h~kprobes_userspace_probes-base-interface 2006-03-20 10:34:58.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/include/linux/kprobes.h 2006-03-20 10:35:29.000000000 +0530 @@ -37,6 +37,10 @@ #include <linux/spinlock.h> #include <linux/rcupdate.h> #include <linux/mutex.h> +#include <linux/mm.h> +#include <linux/dcache.h> +#include <linux/namei.h> +#include <linux/pagemap.h> #ifdef CONFIG_KPROBES #include <asm/kprobes.h> @@ -54,6 +58,7 @@ struct kprobe; struct pt_regs; struct kretprobe; struct kretprobe_instance; +extern struct uprobe *current_uprobe; typedef int (*kprobe_pre_handler_t) (struct kprobe *, struct pt_regs *); typedef int (*kprobe_break_handler_t) (struct kprobe *, struct pt_regs *); typedef void (*kprobe_post_handler_t) (struct kprobe *, struct pt_regs *, @@ -117,6 +122,32 @@ struct jprobe { DECLARE_PER_CPU(struct kprobe *, current_kprobe); DECLARE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk); +struct uprobe { + /* pointer to the pathname of the application */ + char *pathname; + /* kprobe structure with user specified handlers */ + struct kprobe kp; + /* hlist of all the userspace probes per application */ + struct hlist_node ulist; + /* inode of the probed application */ + struct inode *inode; + /* probe offset within the file */ + unsigned long offset; +}; + +struct uprobe_module { + /* hlist head of all userspace probes per application */ + struct hlist_head ulist_head; + /* list of all uprobe_module for probed application */ + struct list_head mlist; + /* to hold path/dentry etc. */ + struct nameidata nd; + /* original readpage operations */ + struct address_space_operations *ori_a_ops; + /* readpage hooks added operations */ + struct address_space_operations user_a_ops; +}; + #ifdef ARCH_SUPPORTS_KRETPROBES extern void arch_prepare_kretprobe(struct kretprobe *rp, struct pt_regs *regs); #else /* ARCH_SUPPORTS_KRETPROBES */ @@ -162,9 +193,16 @@ extern void show_registers(struct pt_reg extern kprobe_opcode_t *get_insn_slot(void); extern void free_insn_slot(kprobe_opcode_t *slot); extern void kprobes_inc_nmissed_count(struct kprobe *p); +extern void copy_kprobe(struct kprobe *old_p, struct kprobe *p); +extern int arch_copy_uprobe(struct kprobe *p, kprobe_opcode_t *address); +extern void arch_arm_uprobe(kprobe_opcode_t *address); +extern void arch_disarm_uprobe(struct kprobe *p, kprobe_opcode_t *address); +extern void init_uprobes(void); /* Get the kprobe at this addr (if any) - called with preemption disabled */ struct kprobe *get_kprobe(void *addr); +struct kprobe *get_uprobe(void *addr); +extern int arch_alloc_insn(struct kprobe *p); struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk); /* kprobe_running() will just return the current_kprobe on this CPU */ @@ -183,6 +221,16 @@ static inline struct kprobe_ctlblk *get_ return (&__get_cpu_var(kprobe_ctlblk)); } +static inline void set_uprobe_instance(struct kprobe *p) +{ + current_uprobe = container_of(p, struct uprobe, kp); +} + +static inline void reset_uprobe_instance(void) +{ + current_uprobe = NULL; +} + int register_kprobe(struct kprobe *p); void unregister_kprobe(struct kprobe *p); int setjmp_pre_handler(struct kprobe *, struct pt_regs *); @@ -194,6 +242,9 @@ void jprobe_return(void); int register_kretprobe(struct kretprobe *rp); void unregister_kretprobe(struct kretprobe *rp); +int register_uprobe(struct uprobe *uprobe); +void unregister_uprobe(struct uprobe *uprobe); + struct kretprobe_instance *get_free_rp_inst(struct kretprobe *rp); void add_rp_inst(struct kretprobe_instance *ri); void kprobe_flush_task(struct task_struct *tk); diff -puN /dev/null kernel/uprobes.c --- /dev/null 2004-06-24 23:34:38.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/kernel/uprobes.c 2006-03-20 10:49:20.000000000 +0530 @@ -0,0 +1,589 @@ +/* + * User-space Probes (UProbes) + * kernel/uprobes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2006 + * + * 2006-Mar Created by Prasanna S Panchamukhi <prasanna@in.ibm.com> + * User-space probes initial implementation based on IBM's + * Dprobes. + */ +#include <linux/kprobes.h> +#include <linux/hash.h> +#include <linux/init.h> +#include <linux/slab.h> +#include <linux/module.h> +#include <linux/moduleloader.h> +#include <asm-generic/sections.h> +#include <asm/cacheflush.h> +#include <asm/errno.h> +#include <asm/kdebug.h> + +#define UPROBE_HASH_BITS 6 +#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS) + +/* user space probes lists */ +static struct list_head uprobe_module_list; +static struct hlist_head uprobe_table[UPROBE_TABLE_SIZE]; +DEFINE_SPINLOCK(uprobe_lock); /* Protects uprobe_table*/ +DEFINE_MUTEX(uprobe_mutex); /* Protects uprobe_module_table */ + +/* + * Aggregate handlers for multiple uprobes support - these handlers + * take care of invoking the individual uprobe handlers on p->list + */ +static int __kprobes aggr_user_pre_handler(struct kprobe *p, + struct pt_regs *regs) +{ + struct kprobe *kp; + + list_for_each_entry(kp, &p->list, list) { + if (kp->pre_handler) { + set_uprobe_instance(kp); + if (kp->pre_handler(kp, regs)) + return 1; + } + } + return 0; +} + +static void __kprobes aggr_user_post_handler(struct kprobe *p, + struct pt_regs *regs, unsigned long flags) +{ + struct kprobe *kp; + + list_for_each_entry(kp, &p->list, list) { + if (kp->post_handler) { + set_uprobe_instance(kp); + kp->post_handler(kp, regs, flags); + } + } +} + +static int __kprobes aggr_user_fault_handler(struct kprobe *p, + struct pt_regs *regs, int trapnr) +{ + struct kprobe *cur; + + /* + * if we faulted "during" the execution of a user specified + * probe handler, invoke just that probe's fault handler + */ + cur = ¤t_uprobe->kp; + if (cur && cur->fault_handler) + if (cur->fault_handler(cur, regs, trapnr)) + return 1; + return 0; +} + +/** + * This routine looks for an existing uprobe at the given offset and inode. + * If it's found, returns the corresponding kprobe pointer. + * This should be called with uprobe_lock held. + */ +static struct kprobe __kprobes *get_kprobe_user(struct inode *inode, + unsigned long offset) +{ + struct hlist_head *head; + struct hlist_node *node; + struct kprobe *p, *kpr; + struct uprobe *uprobe; + + head = &uprobe_table[hash_ptr((kprobe_opcode_t *) + (((unsigned long)inode) * offset), UPROBE_HASH_BITS)]; + + hlist_for_each_entry(p, node, head, hlist) { + if (p->pre_handler == aggr_user_pre_handler) { + kpr = list_entry(p->list.next, typeof(*kpr), list); + uprobe = container_of(kpr, struct uprobe, kp); + } else + uprobe = container_of(p, struct uprobe, kp); + + if ((uprobe->inode == inode) && (uprobe->offset == offset)) + return p; + } + + return NULL; +} + +/** + * Finds a uprobe at the specified user-space address in the current task. + * Points current_uprobe at that uprobe and returns the corresponding kprobe. + */ +struct kprobe __kprobes *get_uprobe(void *addr) +{ + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + struct inode *inode; + unsigned long offset; + struct kprobe *p, *kpr; + struct uprobe *uprobe; + + vma = find_vma(mm, (unsigned long)addr); + + BUG_ON(!vma); /* this should not happen, not in our memory map */ + + offset = (unsigned long)addr - (vma->vm_start + + (vma->vm_pgoff << PAGE_SHIFT)); + if (!vma->vm_file) + return NULL; + + inode = vma->vm_file->f_dentry->d_inode; + + p = get_kprobe_user(inode, offset); + if (!p) + return NULL; + + if (p->pre_handler == aggr_user_pre_handler) { + /* + * Walk the uprobe aggregate list and return firt + * element on aggregate list. + */ + kpr = list_entry((p)->list.next, typeof(*kpr), list); + uprobe = container_of(kpr, struct uprobe, kp); + } else + uprobe = container_of(p, struct uprobe, kp); + + if (uprobe) + current_uprobe = uprobe; + + return p; +} + +/* + * Fill in the required fields of the "manager uprobe". Replace the + * earlier kprobe in the hlist with the manager uprobe + */ +static inline void add_aggr_uprobe(struct kprobe *ap, struct kprobe *p) +{ + copy_kprobe(p, ap); + ap->addr = p->addr; + ap->pre_handler = aggr_user_pre_handler; + ap->post_handler = aggr_user_post_handler; + ap->fault_handler = aggr_user_fault_handler; + + INIT_LIST_HEAD(&ap->list); + list_add(&p->list, &ap->list); + + hlist_replace_rcu(&p->hlist, &ap->hlist); +} + +/* + * This is the second or subsequent uprobe at the address - handle + * the intricacies + */ +static int __kprobes register_aggr_uprobe(struct kprobe *old_p, + struct kprobe *p) +{ + struct kprobe *ap; + + if (old_p->pre_handler == aggr_user_pre_handler) { + copy_kprobe(old_p, p); + list_add(&p->list, &old_p->list); + } else { + ap = kzalloc(sizeof(struct kprobe), GFP_ATOMIC); + if (!ap) + return -ENOMEM; + add_aggr_uprobe(ap, old_p); + copy_kprobe(ap, p); + list_add(&p->list, &old_p->list); + } + return 0; +} + +typedef int (*process_uprobe_func_t)(struct uprobe *uprobe, + kprobe_opcode_t *address); + +/** + * Saves the original instruction in the uprobe structure and + * inserts the breakpoint at the given address. + */ +int __kprobes insert_kprobe_user(struct uprobe *uprobe, + kprobe_opcode_t *address) +{ + int ret = 0; + + ret = arch_copy_uprobe(&uprobe->kp, address); + if (ret) { + printk("Breakpoint already present\n"); + return ret; + } + arch_arm_uprobe(address); + + return 0; +} + +/** + * Wait for the page to be unlocked if someone else had locked it, + * then map the page and insert or remove the breakpoint. + */ +static int __kprobes map_uprobe_page(struct page *page, struct uprobe *uprobe, + process_uprobe_func_t process_kprobe_user) +{ + int ret = 0; + kprobe_opcode_t *uprobe_address; + + if (!page) + return -EINVAL; /* TODO: more suitable errno */ + + wait_on_page_locked(page); + /* could probably retry readpage here. */ + if (!PageUptodate(page)) + return -EINVAL; /* TODO: more suitable errno */ + + lock_page(page); + + uprobe_address = (kprobe_opcode_t *)kmap(page); + uprobe_address = (kprobe_opcode_t *)((unsigned long)uprobe_address + + (uprobe->offset & ~PAGE_MASK)); + ret = (*process_kprobe_user)(uprobe, uprobe_address); + kunmap(page); + + unlock_page(page); + + return ret; +} + +/** + * flush_vma walks through the list of process private mappings, + * gets the vma containing the offset and flushes all the vmas + * containing the probed page. + */ +static void __kprobes flush_vma(struct address_space *mapping, + struct page *page, struct uprobe *uprobe) +{ + struct vm_area_struct *vma = NULL; + struct prio_tree_iter iter; + struct prio_tree_root *head = &mapping->i_mmap; + struct mm_struct *mm; + unsigned long start, end, offset = uprobe->offset; + + spin_lock(&mapping->i_mmap_lock); + vma_prio_tree_foreach(vma, &iter, head, offset, offset) { + mm = vma->vm_mm; + start = vma->vm_start - (vma->vm_pgoff << PAGE_SHIFT); + end = vma->vm_end - (vma->vm_pgoff << PAGE_SHIFT); + + if ((start + offset) < end) + flush_icache_user_range(vma, page, + (unsigned long)uprobe->kp.addr, + sizeof(kprobe_opcode_t)); + } + spin_unlock(&mapping->i_mmap_lock); +} + +/** + * Walk the uprobe_module_list and return the uprobe module with matching + * inode. + */ +static struct uprobe_module __kprobes *get_module_by_inode(struct inode *inode) +{ + struct uprobe_module *umodule; + + list_for_each_entry(umodule, &uprobe_module_list, mlist) { + if (umodule->nd.dentry->d_inode == inode) + return umodule; + } + + return NULL; +} + +/** + * Gets exclusive write access to the given inode to ensure that the file + * on which probes are currently applied does not change. Use the function, + * deny_write_access_to_inode() we added in fs/namei.c. + */ +static inline int ex_write_lock(struct inode *inode) +{ + return deny_write_access_to_inode(inode); +} + +/** + * Called when removing user space probes to release the write lock on the + * inode. + */ +static inline int ex_write_unlock(struct inode *inode) +{ + atomic_inc(&inode->i_writecount); + return 0; +} + +/** + * Add uprobe and uprobe_module to the appropriate hash list. + */ +static void __kprobes get_inode_ops(struct uprobe *uprobe, + struct uprobe_module *umodule) +{ + INIT_HLIST_HEAD(&umodule->ulist_head); + hlist_add_head(&uprobe->ulist, &umodule->ulist_head); + list_add(&umodule->mlist, &uprobe_module_list); +} + +/* + * Removes the specified uprobe from either aggregate uprobe list + * or individual uprobe hash table. + */ + +static int __kprobes remove_uprobe(struct uprobe *uprobe) +{ + struct kprobe *old_p, *list_p, *p; + int ret = 0; + + p = &uprobe->kp; + old_p = get_kprobe_user(uprobe->inode, uprobe->offset); + if (unlikely(!old_p)) + return 0; + + if (p != old_p) { + list_for_each_entry(list_p, &old_p->list, list) + if (list_p == p) + /* kprobe p is a valid probe */ + goto valid_p; + return 0; + } + +valid_p: + if ((old_p == p) || + ((old_p->pre_handler == aggr_user_pre_handler) && + (p->list.next == &old_p->list) && + (p->list.prev == &old_p->list))) { + /* + * Only probe on the hash list, mark the corresponding + * instruction slot for freeing by return 1. + */ + ret = 1; + hlist_del(&old_p->hlist); + if (p != old_p) { + list_del(&p->list); + kfree(old_p); + } + } else + list_del(&p->list); + + return ret; +} + +/* + * Disarms the probe and frees the corresponding instruction slot. + */ +static int __kprobes remove_kprobe_user(struct uprobe *uprobe, + kprobe_opcode_t *address) +{ + struct kprobe *p = &uprobe->kp; + + arch_disarm_uprobe(p, address); + arch_remove_kprobe(p); + + return 0; +} + +/* + * Adds the given uprobe to the uprobe_hash table if it is + * the first probe to be inserted at the given address else + * adds to the aggregate uprobe's list. + */ +static int __kprobes insert_uprobe(struct uprobe *uprobe) +{ + struct kprobe *old_p; + int ret = 0; + unsigned long offset = uprobe->offset; + unsigned long inode = (unsigned long) uprobe->inode; + struct hlist_head *head; + unsigned long flags; + + spin_lock_irqsave(&uprobe_lock, flags); + uprobe->kp.nmissed = 0; + + old_p = get_kprobe_user(uprobe->inode, uprobe->offset); + + if (old_p) + register_aggr_uprobe(old_p, &uprobe->kp); + else { + head = &uprobe_table[hash_ptr((kprobe_opcode_t *) + (offset * inode), UPROBE_HASH_BITS)]; + INIT_HLIST_NODE(&uprobe->kp.hlist); + hlist_add_head(&uprobe->kp.hlist, head); + /* + * The original instruction must be copied into the instruction + * slot, hence return 1. + */ + ret = 1; + } + + spin_unlock_irqrestore(&uprobe_lock, flags); + + return ret; +} + +/** + * unregister_uprobe: Disarms the probe, removes the uprobe + * pointers from the hash list and unhooks readpage routines. + */ +void __kprobes unregister_uprobe(struct uprobe *uprobe) +{ + struct address_space *mapping; + struct uprobe_module *umodule; + struct page *page; + unsigned long flags; + int ret = 0; + + if (!uprobe->inode) + return; + + mapping = uprobe->inode->i_mapping; + + page = find_get_page(mapping, uprobe->offset >> PAGE_CACHE_SHIFT); + + spin_lock_irqsave(&uprobe_lock, flags); + ret = remove_uprobe(uprobe); + spin_unlock_irqrestore(&uprobe_lock, flags); + + mutex_lock(&uprobe_mutex); + if (!(umodule = get_module_by_inode(uprobe->inode))) + goto out; + + hlist_del(&uprobe->ulist); + if (hlist_empty(&umodule->ulist_head)) { + list_del(&umodule->mlist); + ex_write_unlock(uprobe->inode); + path_release(&umodule->nd); + kfree(umodule); + } + +out: + mutex_unlock(&uprobe_mutex); + if (ret) + ret = map_uprobe_page(page, uprobe, remove_kprobe_user); + + if (ret == -EINVAL) + return; + /* + * TODO: unregister_uprobe should not fail, need to handle + * if it fails. + */ + flush_vma(mapping, page, uprobe); + + if (page) + page_cache_release(page); +} + +/** + * register_uprobe(): combination of inode and offset is used to + * identify each probe uniquely. Each uprobe can be found from the + * uprobes_hash table by using inode and offset. register_uprobe(), + * inserts the breakpoint at the given address by locating and mapping + * the page. return 0 on success and error on failure. + */ +int __kprobes register_uprobe(struct uprobe *uprobe) +{ + struct address_space *mapping; + struct uprobe_module *umodule = NULL; + struct inode *inode; + struct nameidata nd; + struct page *page; + int error = 0; + + INIT_HLIST_NODE(&uprobe->ulist); + + /* + * TODO: Need to calculate the absolute file offset for dynamic + * shared libraries. + */ + if ((error = path_lookup(uprobe->pathname, LOOKUP_FOLLOW, &nd))) + return error; + + mutex_lock(&uprobe_mutex); + + inode = nd.dentry->d_inode; + error = ex_write_lock(inode); + if (error) + goto out; + + /* + * Check if there are probes already on this application and + * add the corresponding uprobe to per application probe's list. + */ + umodule = get_module_by_inode(inode); + if (!umodule) { + + error = arch_alloc_insn(&uprobe->kp); + if (error) + goto out; + + /* + * Allocate a uprobe_module structure for this + * application if not allocated before. + */ + umodule = kzalloc(sizeof(struct uprobe_module), GFP_KERNEL); + if (!umodule) { + error = -ENOMEM; + ex_write_unlock(inode); + arch_remove_kprobe(&uprobe->kp); + goto out; + } + memcpy(&umodule->nd, &nd, sizeof(struct nameidata)); + get_inode_ops(uprobe, umodule); + } else { + path_release(&nd); + ex_write_unlock(inode); + hlist_add_head(&uprobe->ulist, &umodule->ulist_head); + } + mutex_unlock(&uprobe_mutex); + + uprobe->inode = inode; + mapping = inode->i_mapping; + page = find_get_page(mapping, (uprobe->offset >> PAGE_CACHE_SHIFT)); + + if (insert_uprobe(uprobe)) + error = map_uprobe_page(page, uprobe, insert_kprobe_user); + + /* + * If error == -EINVAL, return success, probes will inserted by + * readpage hooks. + * TODO: Use a more suitable errno? + */ + if (error == -EINVAL) + error = 0; + flush_vma(mapping, page, uprobe); + + if (page) + page_cache_release(page); + + return error; +out: + path_release(&nd); + mutex_unlock(&uprobe_mutex); + + return error; +} + +void init_uprobes(void) +{ + int i; + + /* FIXME allocate the probe table, currently defined statically */ + /* initialize all list heads */ + for (i = 0; i < UPROBE_TABLE_SIZE; i++) + INIT_HLIST_HEAD(&uprobe_table[i]); + + INIT_LIST_HEAD(&uprobe_module_list); +} + +EXPORT_SYMBOL_GPL(register_uprobe); +EXPORT_SYMBOL_GPL(unregister_uprobe); + + diff -puN /dev/null arch/i386/kernel/uprobes.c --- /dev/null 2004-06-24 23:34:38.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/arch/i386/kernel/uprobes.c 2006-03-20 10:35:28.000000000 +0530 @@ -0,0 +1,71 @@ +/* + * User-space Probes (UProbes) + * arch/i386/kernel/uprobes.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2006. + * + * 2006-Mar Created by Prasanna S Panchamukhi <prasanna@in.ibm.com> + * User-space probes initial implementation based on IBM's + * Dprobes. + */ + +#include <linux/config.h> +#include <linux/kprobes.h> +#include <linux/ptrace.h> +#include <linux/preempt.h> +#include <asm/cacheflush.h> +#include <asm/kdebug.h> +#include <asm/desc.h> + +int __kprobes arch_alloc_insn(struct kprobe *p) +{ + mutex_lock(&kprobe_mutex); + p->ainsn.insn = get_insn_slot(); + mutex_unlock(&kprobe_mutex); + + if (!p->ainsn.insn) + return -ENOMEM; + + return 0; +} + +void __kprobes arch_disarm_uprobe(struct kprobe *p, kprobe_opcode_t *address) +{ + if (p->opcode != BREAKPOINT_INSTRUCTION) + *address = p->opcode; +} + +void __kprobes arch_arm_uprobe(kprobe_opcode_t *address) +{ + *address = BREAKPOINT_INSTRUCTION; +} + +int __kprobes arch_copy_uprobe(struct kprobe *p, kprobe_opcode_t *address) +{ + int ret = 1; + + /* + * TODO: Check if the given address is a valid to access user memory. + */ + if (*address != BREAKPOINT_INSTRUCTION) { + memcpy(p->ainsn.insn, address, MAX_INSN_SIZE * sizeof(kprobe_opcode_t)); + ret = 0; + } + p->opcode = *(kprobe_opcode_t *)address; + + return ret; +} diff -puN kernel/kprobes.c~kprobes_userspace_probes-base-interface kernel/kprobes.c --- linux-2.6.16-rc6-mm2/kernel/kprobes.c~kprobes_userspace_probes-base-interface 2006-03-20 10:34:58.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/kernel/kprobes.c 2006-03-20 10:34:58.000000000 +0530 @@ -356,7 +356,7 @@ static inline void free_rp_inst(struct k /* * Keep all fields in the kprobe consistent */ -static inline void copy_kprobe(struct kprobe *old_p, struct kprobe *p) +void copy_kprobe(struct kprobe *old_p, struct kprobe *p) { memcpy(&p->opcode, &old_p->opcode, sizeof(kprobe_opcode_t)); memcpy(&p->ainsn, &old_p->ainsn, sizeof(struct arch_specific_insn)); @@ -650,6 +650,7 @@ static int __init init_kprobes(void) INIT_HLIST_HEAD(&kretprobe_inst_table[i]); } + init_uprobes(); err = arch_init_kprobes(); if (!err) err = register_die_notifier(&kprobe_exceptions_nb); diff -puN include/linux/namei.h~kprobes_userspace_probes-base-interface include/linux/namei.h --- linux-2.6.16-rc6-mm2/include/linux/namei.h~kprobes_userspace_probes-base-interface 2006-03-20 10:34:58.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/include/linux/namei.h 2006-03-20 10:34:58.000000000 +0530 @@ -81,6 +81,7 @@ extern int follow_up(struct vfsmount **, extern struct dentry *lock_rename(struct dentry *, struct dentry *); extern void unlock_rename(struct dentry *, struct dentry *); +extern int deny_write_access_to_inode(struct inode *inode); static inline void nd_set_link(struct nameidata *nd, char *path) { diff -puN fs/namei.c~kprobes_userspace_probes-base-interface fs/namei.c --- linux-2.6.16-rc6-mm2/fs/namei.c~kprobes_userspace_probes-base-interface 2006-03-20 10:34:58.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/fs/namei.c 2006-03-20 10:34:58.000000000 +0530 @@ -322,10 +322,9 @@ int get_write_access(struct inode * inod return 0; } -int deny_write_access(struct file * file) +/* This routine increments the writecount for a given inode. */ +int deny_write_access_to_inode(struct inode *inode) { - struct inode *inode = file->f_dentry->d_inode; - spin_lock(&inode->i_lock); if (atomic_read(&inode->i_writecount) > 0) { spin_unlock(&inode->i_lock); @@ -337,6 +336,14 @@ int deny_write_access(struct file * file return 0; } +/* Wrapper routine that increments the writecount for a given file pointer. */ +int deny_write_access(struct file * file) +{ + struct inode *inode = file->f_dentry->d_inode; + + return deny_write_access_to_inode(inode); +} + void path_release(struct nameidata *nd) { dput(nd->dentry); diff -puN kernel/Makefile~kprobes_userspace_probes-base-interface kernel/Makefile --- linux-2.6.16-rc6-mm2/kernel/Makefile~kprobes_userspace_probes-base-interface 2006-03-20 10:34:58.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/kernel/Makefile 2006-03-20 10:34:58.000000000 +0530 @@ -31,7 +31,7 @@ obj-$(CONFIG_IKCONFIG) += configs.o obj-$(CONFIG_STOP_MACHINE) += stop_machine.o obj-$(CONFIG_AUDIT) += audit.o auditfilter.o obj-$(CONFIG_AUDITSYSCALL) += auditsc.o -obj-$(CONFIG_KPROBES) += kprobes.o +obj-$(CONFIG_KPROBES) += kprobes.o uprobes.o obj-$(CONFIG_SYSFS) += ksysfs.o obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ _ -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi @ 2006-03-20 6:10 ` Prasanna S Panchamukhi 2006-03-20 6:11 ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi ` (2 more replies) 2006-03-20 10:42 ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton 1 sibling, 3 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 6:10 UTC (permalink / raw) To: akpm, Andi Kleen, davem, suparna, richardj_moore; +Cc: linux-kernel This patch provides the feature of inserting probes on pages that are not present in the memory during registration. To add readpage and readpages() hooks, two new elements are added to the uprobe_module object: struct address_space_operations *ori_a_ops; struct address_space_operations user_a_ops; User-space probes allows probes to be inserted even in pages that are not present in the memory at the time of registration. This is done by adding hooks to the readpage and readpages routines. During registration, the address space operation object is modified by substituting user-space probes's specific readpage and readpages routines. When the pages are read into memory through the readpage and readpages address space operations, any associated probes are automatically inserted into those pages. These user-space probes readpage and readpages routines internally call the original readpage() and readpages() routines, and then check whether probes are to be added to these pages, inserting probes as necessary. The overhead of adding these hooks is limited to the application on which the probes are inserted. During unregistration, care should be taken to replace the readpage and readpages hooks with the original routines if no probes remain on that application. Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> kernel/uprobes.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 121 insertions(+) diff -puN kernel/uprobes.c~kprobes_userspace_probes-hook-readpage kernel/uprobes.c --- linux-2.6.16-rc6-mm2/kernel/uprobes.c~kprobes_userspace_probes-hook-readpage 2006-03-20 10:49:14.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/kernel/uprobes.c 2006-03-20 10:49:14.000000000 +0530 @@ -302,6 +302,115 @@ static struct uprobe_module __kprobes *g return NULL; } +static inline void insert_readpage_uprobe(struct page *page, + struct address_space *mapping, struct uprobe *uprobe) +{ + unsigned long page_start = page->index << PAGE_CACHE_SHIFT; + unsigned long page_end = page_start + PAGE_SIZE; + + if ((uprobe->offset >= page_start) && (uprobe->offset < page_end)) { + map_uprobe_page(page, uprobe, insert_kprobe_user); + flush_vma(mapping, page, uprobe); + } +} + +/** + * This function hooks the readpages() of all modules that have active + * probes on them. The original readpages() is called for the given + * inode/address_space to actually read the pages into the memory. + * Then all probes that are specified on these pages are inserted. + */ +static int __kprobes uprobe_readpages(struct file *file, + struct address_space *mapping, + struct list_head *pages, unsigned nr_pages) +{ + int retval = 0; + struct page *page; + struct uprobe_module *umodule; + struct uprobe *uprobe = NULL; + struct hlist_node *node; + + mutex_lock(&uprobe_mutex); + + umodule = get_module_by_inode(file->f_dentry->d_inode); + if (!umodule) { + /* + * No module associated with this file, call the + * original readpages(). + */ + retval = mapping->a_ops->readpages(file, mapping, + pages, nr_pages); + goto out; + } + + /* call original readpages() */ + retval = umodule->ori_a_ops->readpages(file, mapping, pages, nr_pages); + if (retval < 0) + goto out; + + /* + * TODO: Walk through readpages page list and get + * pages with probes instead of find_get_page(). + */ + hlist_for_each_entry(uprobe, node, &umodule->ulist_head, ulist) { + page = find_get_page(mapping, + uprobe->offset >> PAGE_CACHE_SHIFT); + if (!page) + continue; + + if (!uprobe->kp.opcode) + insert_readpage_uprobe(page, mapping, uprobe); + page_cache_release(page); + } + +out: + mutex_unlock(&uprobe_mutex); + + return retval; +} + +/** + * This function hooks the readpage() of all modules that have active + * probes on them. The original readpage() is called for the given + * inode/address_space to actually read the pages into the memory. + * Then all probes that are specified on this page are inserted. + */ +int __kprobes uprobe_readpage(struct file *file, struct page *page) +{ + int retval = 0; + struct uprobe_module *umodule; + struct uprobe *uprobe = NULL; + struct hlist_node *node; + struct address_space *mapping = file->f_dentry->d_inode->i_mapping; + + mutex_lock(&uprobe_mutex); + + umodule = get_module_by_inode(file->f_dentry->d_inode); + if (!umodule) { + /* + * No module associated with this file, call the + * original readpage(). + */ + retval = mapping->a_ops->readpage(file, page); + goto out; + } + + /* call original readpage() */ + retval = umodule->ori_a_ops->readpage(file, page); + if (retval < 0) + goto out; + + hlist_for_each_entry(uprobe, node, &umodule->ulist_head, ulist) { + if (!uprobe->kp.opcode) + insert_readpage_uprobe(page, mapping, uprobe); + } + +out: + mutex_unlock(&uprobe_mutex); + + return retval; +} + /** * Gets exclusive write access to the given inode to ensure that the file * on which probes are currently applied does not change. Use the function, @@ -324,13 +433,23 @@ static inline int ex_write_unlock(struct /** * Add uprobe and uprobe_module to the appropriate hash list. + * Also swithces i_op to hooks into readpage and readpages(). */ static void __kprobes get_inode_ops(struct uprobe *uprobe, struct uprobe_module *umodule) { + struct address_space *as; + INIT_HLIST_HEAD(&umodule->ulist_head); hlist_add_head(&uprobe->ulist, &umodule->ulist_head); list_add(&umodule->mlist, &uprobe_module_list); + as = umodule->nd.dentry->d_inode->i_mapping; + umodule->ori_a_ops = as->a_ops; + umodule->user_a_ops = *as->a_ops; + umodule->user_a_ops.readpage = uprobe_readpage; + umodule->user_a_ops.readpages = uprobe_readpages; + as->a_ops = &umodule->user_a_ops; + } /* @@ -459,6 +578,8 @@ void __kprobes unregister_uprobe(struct hlist_del(&uprobe->ulist); if (hlist_empty(&umodule->ulist_head)) { list_del(&umodule->mlist); + umodule->nd.dentry->d_inode->i_mapping->a_ops = + umodule->ori_a_ops; ex_write_unlock(uprobe->inode); path_release(&umodule->nd); kfree(umodule); _ -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi @ 2006-03-20 6:11 ` Prasanna S Panchamukhi 2006-03-20 11:09 ` Andrew Morton 2006-03-20 10:53 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton 2006-03-20 11:10 ` Andrew Morton 2 siblings, 1 reply; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 6:11 UTC (permalink / raw) To: akpm, Andi Kleen, davem, suparna, richardj_moore; +Cc: linux-kernel This patch provides a mechanism for probe handling and executing the user-specified handlers. Each userspace probe is uniquely identified by the combination of inode and offset, hence during registeration the inode and offset combination is added to uprobes hash table. Initially when breakpoint instruction is hit, the uprobes hash table is looked up for matching inode and offset. The pre_handlers are called in sequence if multiple probes are registered. Similar to kprobes, uprobes also adopts to single step out-of-line, so that probe miss in SMP environment can be avoided. But for userspace probes, instruction copied into kernel address space cannot be single stepped, hence the instruction must be copied to user address space. The solution is to find free space in the current process address space and then copy the original instruction and single step that instruction. User processes use stack space to store local variables, agruments and return values. Normally the stack space either below or above the stack pointer indicates the free stack space. The instruction to be single stepped can modify the stack space, hence before using the free stack space, sufficient stack space should be left. The instruction is copied to the bottom of the page and check is made such that the copied instruction does not cross the page boundry. The copied instruction is then single stepped. Several architectures does not allow the instruction to be executed from the stack location, since no-exec bit is set for the stack pages. In those architectures, the page table entry corresponding to the stack page is identified and the no-exec bit is unset making the instruction on that stack page to be executed. There are situations where even the free stack space is not enough for the user instruction to be copied and single stepped. In such situations, the virtual memory area(vma) can be expanded beyond the current stack vma. This expaneded stack can be used to copy the original instruction and single step out-of-line. Even if the vma cannot be extended then the instruction much be executed inline, by replacing the breakpoint instruction with original instruction. Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> arch/i386/kernel/Makefile | 2 arch/i386/kernel/kprobes.c | 4 arch/i386/kernel/uprobes.c | 537 +++++++++++++++++++++++++++++++++++++++++++++ arch/i386/mm/fault.c | 3 include/asm-i386/kprobes.h | 20 + include/linux/kprobes.h | 8 6 files changed, 569 insertions(+), 5 deletions(-) diff -puN include/asm-i386/kprobes.h~kprobes_userspace_probes-ss-out-of-line include/asm-i386/kprobes.h --- linux-2.6.16-rc6-mm2/include/asm-i386/kprobes.h~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/include/asm-i386/kprobes.h 2006-03-20 10:49:29.000000000 +0530 @@ -26,6 +26,7 @@ */ #include <linux/types.h> #include <linux/ptrace.h> +#include <asm/cacheflush.h> #define __ARCH_WANT_KPROBES_INSN_SLOT @@ -77,6 +78,19 @@ struct kprobe_ctlblk { struct prev_kprobe prev_kprobe; }; +/* per user probe control block */ +struct uprobe_ctlblk { + unsigned long uprobe_status; + unsigned long uprobe_saved_eflags; + unsigned long uprobe_old_eflags; + unsigned long singlestep_addr; + unsigned long flags; + struct kprobe *curr_p; + pte_t *upte; + struct page *upage; + struct task_struct *tsk; +}; + /* trap3/1 are intr gates for kprobes. So, restore the status of IF, * if necessary, before executing the original int3/1 (trap) handler. */ @@ -88,4 +102,10 @@ static inline void restore_interrupts(st extern int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *data); +extern int uprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data); +extern unsigned long get_segment_eip(struct pt_regs *regs, + unsigned long *eip_limit); +extern int is_IF_modifier(kprobe_opcode_t opcode); + #endif /* _ASM_KPROBES_H */ diff -puN arch/i386/kernel/uprobes.c~kprobes_userspace_probes-ss-out-of-line arch/i386/kernel/uprobes.c --- linux-2.6.16-rc6-mm2/arch/i386/kernel/uprobes.c~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/arch/i386/kernel/uprobes.c 2006-03-20 10:56:23.000000000 +0530 @@ -30,6 +30,10 @@ #include <asm/cacheflush.h> #include <asm/kdebug.h> #include <asm/desc.h> +#include <asm/uaccess.h> + +static struct uprobe_ctlblk uprobe_ctlblk; +struct uprobe *current_uprobe; int __kprobes arch_alloc_insn(struct kprobe *p) { @@ -69,3 +73,536 @@ int __kprobes arch_copy_uprobe(struct kp return ret; } + +/** + * This routines get the pte of the page containing the specified address. + */ +static pte_t __kprobes *get_uprobe_pte(unsigned long address) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte = NULL; + + pgd = pgd_offset(current->mm, address); + if (!pgd) + goto out; + + pud = pud_offset(pgd, address); + if (!pud) + goto out; + + pmd = pmd_offset(pud, address); + if (!pmd) + goto out; + + pte = pte_alloc_map(current->mm, pmd, address); + +out: + return pte; +} + +/** + * This routine check for space in the current process's stack + * address space. If enough address space is found, copy the original + * instruction on that page for single stepping out-of-line. + */ +static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe , + struct pt_regs *regs, struct vm_area_struct *vma) +{ + unsigned long addr, stack_addr = regs->esp; + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); + + if (vma->vm_flags & VM_GROWSDOWN) { + if (((stack_addr - sizeof(long long))) < + (vma->vm_start + size)) + return -ENOMEM; + addr = vma->vm_start; + } else if (vma->vm_flags & VM_GROWSUP) { + if ((vma->vm_end - size) < (stack_addr + sizeof(long long))) + return -ENOMEM; + addr = vma->vm_end - size; + } else + return -EFAULT; + + vma->vm_flags |= VM_LOCKED; + + if (__copy_to_user_inatomic((unsigned long *)addr, + (unsigned long *)uprobe->kp.ainsn.insn, size)) + return -EFAULT; + + regs->eip = addr; + + return 0; +} + +/** + * This routine expands the stack beyond the present process address + * space and copies the instruction to that location, so that + * processor can single step out-of-line. + */ +static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe, + struct pt_regs *regs, struct vm_area_struct *vma) +{ + unsigned long addr, vm_addr; + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); + struct vm_area_struct *new_vma; + struct mm_struct *mm = current->mm; + + + if (!down_read_trylock(¤t->mm->mmap_sem)) + return -ENOMEM; + + if (vma->vm_flags & VM_GROWSDOWN) + vm_addr = vma->vm_start - size; + else if (vma->vm_flags & VM_GROWSUP) + vm_addr = vma->vm_end + size; + else { + up_read(¤t->mm->mmap_sem); + return -EFAULT; + } + + new_vma = find_extend_vma(mm, vm_addr); + if (!new_vma) { + up_read(¤t->mm->mmap_sem); + return -ENOMEM; + } + + if (new_vma->vm_flags & VM_GROWSDOWN) + addr = new_vma->vm_start; + else + addr = new_vma->vm_end - size; + + new_vma->vm_flags |= VM_LOCKED; + up_read(¤t->mm->mmap_sem); + + if (__copy_to_user_inatomic((unsigned long *)addr, + (unsigned long *)uprobe->kp.ainsn.insn, size)) + return -EFAULT; + + regs->eip = addr; + + return 0; +} + +/** + * This routine checks for stack free space below the stack pointer + * and then copies the instructions at that location so that the + * processor can single step out-of-line. If there is not enough stack + * space or if copy_to_user fails or if the vma is invalid, it returns + * error. + */ +static int __kprobes copy_insn_onstack(struct uprobe *uprobe, + struct pt_regs *regs, unsigned long flags) +{ + unsigned long page_addr, stack_addr = regs->esp; + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); + unsigned long *source = (unsigned long *)uprobe->kp.ainsn.insn; + + if (flags & VM_GROWSDOWN) { + page_addr = stack_addr & PAGE_MASK; + + if (((stack_addr - sizeof(long long))) < (page_addr + size)) + return -ENOMEM; + + if (__copy_to_user_inatomic((unsigned long *)page_addr, + source, size)) + return -EFAULT; + + regs->eip = page_addr; + } else if (flags & VM_GROWSUP) { + page_addr = stack_addr & PAGE_MASK; + + if (page_addr == stack_addr) + return -ENOMEM; + else + page_addr += PAGE_SIZE; + + if ((page_addr - size) < (stack_addr + sizeof(long long))) + return -ENOMEM; + + if (__copy_to_user_inatomic( + (unsigned long *)(page_addr - size), source, size)) + return -EFAULT; + + regs->eip = page_addr - size; + } else + return -EINVAL; + + return 0; +} + +/** + * This routines get the page containing the probe, maps it and + * replaced the instruction at the probed address with specified + * opcode. + */ +void __kprobes replace_original_insn(struct uprobe *uprobe, + struct pt_regs *regs, kprobe_opcode_t opcode) +{ + kprobe_opcode_t *addr; + struct page *page; + + page = find_get_page(uprobe->inode->i_mapping, + uprobe->offset >> PAGE_CACHE_SHIFT); + BUG_ON(!page); + + __lock_page(page); + + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); + addr = (kprobe_opcode_t *)((unsigned long)addr + + (unsigned long)(uprobe->offset & ~PAGE_MASK)); + *addr = opcode; + /*TODO: flush vma ? */ + kunmap_atomic(addr, KM_USER1); + + unlock_page(page); + + if (page) + page_cache_release(page); + regs->eip = (unsigned long)uprobe->kp.addr; +} + +/** + * This routine provides the functionality of single stepping + * out-of-line. If single stepping out-of-line cannot be achieved, + * it replaces with the original instruction allowing it to single + * step inline. + */ +static inline int prepare_singlestep_uprobe(struct uprobe *uprobe, + struct uprobe_ctlblk *ucb, struct pt_regs *regs) +{ + unsigned long stack_addr = regs->esp, flags; + struct vm_area_struct *vma = NULL; + int err = 0; + + vma = find_vma(current->mm, (stack_addr & PAGE_MASK)); + if (!vma) { + /* TODO: Need better error reporting? */ + goto no_vma; + } + flags = vma->vm_flags; + + regs->eflags |= TF_MASK; + regs->eflags &= ~IF_MASK; + + /* + * Copy_insn_on_stack tries to find some room for the instruction slot + * in the same page as the current esp. + */ + err = copy_insn_onstack(uprobe, regs, flags); + + /* + * If copy_insn_on_stack() fails, copy_insn_on_new_page() is called to + * try to find some room in the next pages below the current esp; + */ + if (err) + err = copy_insn_on_new_page(uprobe, regs, vma); + /* + * If copy_insn_on_new_pagek() fails, copy_insn_on_expstack() is called to + * try to grow the stack's VM area by one page. + */ + if (err) + err = copy_insn_onexpstack(uprobe, regs, vma); + + ucb->uprobe_status = UPROBE_HIT_SS; + + if (!err) { + ucb->upte = get_uprobe_pte(regs->eip); + if (!ucb->upte) + goto no_vma; + ucb->upage = pte_page(*ucb->upte); + __lock_page(ucb->upage); + } +no_vma: + if (err) { + replace_original_insn(uprobe, regs, uprobe->kp.opcode); + ucb->uprobe_status = UPROBE_SS_INLINE; + } + + ucb->singlestep_addr = regs->eip; + + return 0; +} + +/* + * uprobe_handler() executes the user specified handler and setup for + * single stepping the original instruction either out-of-line or inline. + */ +static int __kprobes uprobe_handler(struct pt_regs *regs) +{ + struct kprobe *p; + int ret = 0; + kprobe_opcode_t *addr = NULL; + struct uprobe_ctlblk *ucb = &uprobe_ctlblk; + unsigned long limit; + + spin_lock_irqsave(&uprobe_lock, ucb->flags); + /* preemption is disabled, remains disabled + * until we single step on original instruction. + */ + preempt_disable(); + + addr = (kprobe_opcode_t *)(get_segment_eip(regs, &limit) - 1); + + p = get_uprobe(addr); + if (!p) { + + if (*addr != BREAKPOINT_INSTRUCTION) { + /* + * The breakpoint instruction was removed right + * after we hit it. Another cpu has removed + * either a probepoint or a debugger breakpoint + * at this address. In either case, no further + * handling of this interrupt is appropriate. + * Back up over the (now missing) int3 and run + * the original instruction. + */ + regs->eip -= sizeof(kprobe_opcode_t); + ret = 1; + } + /* Not one of ours: let kernel handle it */ + goto no_uprobe; + } + + if (p->opcode == BREAKPOINT_INSTRUCTION) { + /* + * Breakpoint was already present even before the probe + * was inserted, this might break some compatability with + * other debuggers like gdb etc. We dont handle such probes. + */ + current_uprobe = NULL; + goto no_uprobe; + } + + ucb->curr_p = p; + ucb->tsk = current; + ucb->uprobe_status = UPROBE_HIT_ACTIVE; + ucb->uprobe_saved_eflags = (regs->eflags & (TF_MASK | IF_MASK)); + ucb->uprobe_old_eflags = (regs->eflags & (TF_MASK | IF_MASK)); + + if (p->pre_handler && p->pre_handler(p, regs)) + /* handler has already set things up, so skip ss setup */ + return 1; + + prepare_singlestep_uprobe(current_uprobe, ucb, regs); + /* + * Avoid scheduling the current while returning from + * kernel to user mode. + */ + clear_need_resched(); + return 1; + +no_uprobe: + spin_unlock_irqrestore(&uprobe_lock, ucb->flags); + preempt_enable_no_resched(); + + return ret; +} + +/* + * Called after single-stepping. p->addr is the address of the + * instruction whose first byte has been replaced by the "int 3" + * instruction. To avoid the SMP problems that can occur when we + * temporarily put back the original opcode to single-step, we + * single-stepped a copy of the instruction. The address of this + * copy is p->ainsn.insn. + * + * This function prepares to return from the post-single-step + * interrupt. We have to fix up the stack as follows: + * + * 0) Typically, the new eip is relative to the copied instruction. We + * need to make it relative to the original instruction. Exceptions are + * return instructions and absolute or indirect jump or call instructions. + * + * 1) If the single-stepped instruction was pushfl, then the TF and IF + * flags are set in the just-pushed eflags, and may need to be cleared. + * + * 2) If the single-stepped instruction was a call, the return address + * that is atop the stack is the address following the copied instruction. + * We need to make it the address following the original instruction. + */ +static void __kprobes resume_execution_user(struct kprobe *p, + struct pt_regs *regs, struct uprobe_ctlblk *ucb) +{ + unsigned long *tos = (unsigned long *)regs->esp; + unsigned long next_eip = 0; + unsigned long copy_eip = ucb->singlestep_addr; + unsigned long orig_eip = (unsigned long)p->addr; + + switch (p->ainsn.insn[0]) { + case 0x9c: /* pushfl */ + *tos &= ~(TF_MASK | IF_MASK); + *tos |= ucb->uprobe_old_eflags; + break; + case 0xc3: /* ret/lret */ + case 0xcb: + case 0xc2: + case 0xca: + next_eip = regs->eip; + /* eip is already adjusted, no more changes required*/ + return; + break; + case 0xe8: /* call relative - Fix return addr */ + *tos = orig_eip + (*tos - copy_eip); + break; + case 0xff: + if ((p->ainsn.insn[1] & 0x30) == 0x10) { + /* call absolute, indirect */ + /* Fix return addr; eip is correct. */ + next_eip = regs->eip; + *tos = orig_eip + (*tos - copy_eip); + } else if (((p->ainsn.insn[1] & 0x31) == 0x20) || + ((p->ainsn.insn[1] & 0x31) == 0x21)) { + /* jmp near or jmp far absolute indirect */ + /* eip is correct. */ + next_eip = regs->eip; + } + break; + case 0xea: /* jmp absolute -- eip is correct */ + next_eip = regs->eip; + break; + default: + break; + } + + regs->eflags &= ~TF_MASK; + if (next_eip) + regs->eip = next_eip; + else + regs->eip = orig_eip + (regs->eip - copy_eip); +} + +/* + * post_uprobe_handler(), executes the user specified handlers and + * resumes with the normal execution. + */ +static inline int post_uprobe_handler(struct pt_regs *regs) +{ + struct kprobe *cur; + struct uprobe_ctlblk *ucb; + + if (!current_uprobe) + return 0; + + ucb = &uprobe_ctlblk; + cur = ucb->curr_p; + + if (!cur || ucb->tsk != current) + return 0; + + if (cur->post_handler) { + if (ucb->uprobe_status == UPROBE_SS_INLINE) + ucb->uprobe_status = UPROBE_SSDONE_INLINE; + else + ucb->uprobe_status = UPROBE_HIT_SSDONE; + cur->post_handler(cur, regs, 0); + } + + resume_execution_user(cur, regs, ucb); + regs->eflags |= ucb->uprobe_saved_eflags; + + if (ucb->uprobe_status == UPROBE_SSDONE_INLINE) + replace_original_insn(current_uprobe, regs, + BREAKPOINT_INSTRUCTION); + else { + unlock_page(ucb->upage); + pte_unmap(ucb->upte); + } + current_uprobe = NULL; + spin_unlock_irqrestore(&uprobe_lock, ucb->flags); + preempt_enable_no_resched(); + /* + * if somebody else is singlestepping across a probe point, eflags + * will have TF set, in which case, continue the remaining processing + * of do_debug, as if this is not a probe hit. + */ + if (regs->eflags & TF_MASK) + return 0; + + return 1; +} + +static inline int uprobe_fault_handler(struct pt_regs *regs, int trapnr) +{ + struct kprobe *cur; + struct uprobe_ctlblk *ucb; + int ret = 0; + + ucb = &uprobe_ctlblk; + cur = ucb->curr_p; + + if (ucb->tsk != current || !cur) + return 0; + + switch(ucb->uprobe_status) { + case UPROBE_HIT_SS: + unlock_page(ucb->upage); + pte_unmap(ucb->upte); + /* TODO: All acceptable number of faults before disabling */ + replace_original_insn(current_uprobe, regs, cur->opcode); + /* Fall through and reset the current probe */ + case UPROBE_SS_INLINE: + regs->eip = (unsigned long)cur->addr; + regs->eflags |= ucb->uprobe_old_eflags; + regs->eflags &= ~TF_MASK; + current_uprobe = NULL; + ret = 1; + spin_unlock_irqrestore(&uprobe_lock, ucb->flags); + preempt_enable_no_resched(); + break; + case UPROBE_HIT_ACTIVE: + case UPROBE_SSDONE_INLINE: + case UPROBE_HIT_SSDONE: + if (cur->fault_handler && cur->fault_handler(cur, regs, trapnr)) + return 1; + + if (fixup_exception(regs)) + return 1; + /* + * We must not allow the system page handler to continue while + * holding a lock, since page fault handler can sleep and + * reschedule it on different cpu. Hence return 1. + */ + return 1; + break; + default: + break; + } + return ret; +} + +/* + * Wrapper routine to for handling exceptions. + */ +int __kprobes uprobe_exceptions_notify(struct notifier_block *self, + unsigned long val, void *data) +{ + struct die_args *args = (struct die_args *)data; + int ret = NOTIFY_DONE; + + if (args->regs->eflags & VM_MASK) { + /* We are in virtual-8086 mode. Return NOTIFY_DONE */ + return ret; + } + + switch (val) { + case DIE_INT3: + if (uprobe_handler(args->regs)) + ret = NOTIFY_STOP; + break; + case DIE_DEBUG: + if (post_uprobe_handler(args->regs)) + ret = NOTIFY_STOP; + break; + case DIE_GPF: + case DIE_PAGE_FAULT: + if (current_uprobe && + uprobe_fault_handler(args->regs, args->trapnr)) + ret = NOTIFY_STOP; + break; + default: + break; + } + return ret; +} diff -puN include/linux/kprobes.h~kprobes_userspace_probes-ss-out-of-line include/linux/kprobes.h --- linux-2.6.16-rc6-mm2/include/linux/kprobes.h~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/include/linux/kprobes.h 2006-03-20 10:49:29.000000000 +0530 @@ -51,6 +51,13 @@ #define KPROBE_REENTER 0x00000004 #define KPROBE_HIT_SSDONE 0x00000008 +/* uprobe_status settings */ +#define UPROBE_HIT_ACTIVE 0x00000001 +#define UPROBE_HIT_SS 0x00000002 +#define UPROBE_HIT_SSDONE 0x00000004 +#define UPROBE_SS_INLINE 0x00000008 +#define UPROBE_SSDONE_INLINE 0x00000010 + /* Attach to insert probes on any functions which should be ignored*/ #define __kprobes __attribute__((__section__(".kprobes.text"))) @@ -183,6 +190,7 @@ struct kretprobe_instance { struct task_struct *task; }; +extern spinlock_t uprobe_lock; extern spinlock_t kretprobe_lock; extern struct mutex kprobe_mutex; extern int arch_prepare_kprobe(struct kprobe *p); diff -puN arch/i386/kernel/kprobes.c~kprobes_userspace_probes-ss-out-of-line arch/i386/kernel/kprobes.c --- linux-2.6.16-rc6-mm2/arch/i386/kernel/kprobes.c~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/arch/i386/kernel/kprobes.c 2006-03-20 10:49:29.000000000 +0530 @@ -88,7 +88,7 @@ static inline int can_boost(kprobe_opcod /* * returns non-zero if opcode modifies the interrupt flag. */ -static inline int is_IF_modifier(kprobe_opcode_t opcode) +int is_IF_modifier(kprobe_opcode_t opcode) { switch (opcode) { case 0xfa: /* cli */ @@ -610,7 +610,7 @@ int __kprobes kprobe_exceptions_notify(s int ret = NOTIFY_DONE; if (args->regs && user_mode(args->regs)) - return ret; + return uprobe_exceptions_notify(self, val, data); switch (val) { case DIE_INT3: diff -puN arch/i386/mm/fault.c~kprobes_userspace_probes-ss-out-of-line arch/i386/mm/fault.c --- linux-2.6.16-rc6-mm2/arch/i386/mm/fault.c~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/arch/i386/mm/fault.c 2006-03-20 10:49:29.000000000 +0530 @@ -71,8 +71,7 @@ void bust_spinlocks(int yes) * * This is slow, but is very rarely executed. */ -static inline unsigned long get_segment_eip(struct pt_regs *regs, - unsigned long *eip_limit) +unsigned long get_segment_eip(struct pt_regs *regs, unsigned long *eip_limit) { unsigned long eip = regs->eip; unsigned seg = regs->xcs & 0xffff; diff -puN arch/i386/kernel/Makefile~kprobes_userspace_probes-ss-out-of-line arch/i386/kernel/Makefile --- linux-2.6.16-rc6-mm2/arch/i386/kernel/Makefile~kprobes_userspace_probes-ss-out-of-line 2006-03-20 10:49:29.000000000 +0530 +++ linux-2.6.16-rc6-mm2-prasanna/arch/i386/kernel/Makefile 2006-03-20 10:49:29.000000000 +0530 @@ -29,7 +29,7 @@ obj-$(CONFIG_KEXEC) += machine_kexec.o obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_X86_NUMAQ) += numaq.o obj-$(CONFIG_X86_SUMMIT_NUMA) += summit.o -obj-$(CONFIG_KPROBES) += kprobes.o +obj-$(CONFIG_KPROBES) += kprobes.o uprobes.o obj-$(CONFIG_MODULES) += module.o obj-y += sysenter.o vsyscall.o obj-$(CONFIG_ACPI_SRAT) += srat.o _ -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 6:11 ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi @ 2006-03-20 11:09 ` Andrew Morton 2006-03-20 11:24 ` Arjan van de Ven ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Andrew Morton @ 2006-03-20 11:09 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > This patch provides a mechanism for probe handling and > executing the user-specified handlers. > > Each userspace probe is uniquely identified by the combination of > inode and offset, hence during registeration the inode and offset > combination is added to uprobes hash table. Initially when > breakpoint instruction is hit, the uprobes hash table is looked up > for matching inode and offset. The pre_handlers are called in > sequence if multiple probes are registered. Similar to kprobes, > uprobes also adopts to single step out-of-line, so that probe miss in > SMP environment can be avoided. But for userspace probes, instruction > copied into kernel address space cannot be single stepped, hence the > instruction must be copied to user address space. The solution is to > find free space in the current process address space and then copy the > original instruction and single step that instruction. > > User processes use stack space to store local variables, agruments and > return values. Normally the stack space either below or above the > stack pointer indicates the free stack space. > > The instruction to be single stepped can modify the stack space, > hence before using the free stack space, sufficient stack space should > be left. The instruction is copied to the bottom of the page and check > is made such that the copied instruction does not cross the page > boundry. The copied instruction is then single stepped. Several > architectures does not allow the instruction to be executed from the > stack location, since no-exec bit is set for the stack pages. In those > architectures, the page table entry corresponding to the stack page is > identified and the no-exec bit is unset making the instruction on that > stack page to be executed. > > There are situations where even the free stack space is not enough for > the user instruction to be copied and single stepped. In such > situations, the virtual memory area(vma) can be expanded beyond the > current stack vma. This expaneded stack can be used to copy the > original instruction and single step out-of-line. > > Even if the vma cannot be extended then the instruction much be > executed inline, by replacing the breakpoint instruction with original > instruction. > > ... > > + > +/** > + * This routines get the pte of the page containing the specified address. > + */ > +static pte_t __kprobes *get_uprobe_pte(unsigned long address) > +{ > + pgd_t *pgd; > + pud_t *pud; > + pmd_t *pmd; > + pte_t *pte = NULL; > + > + pgd = pgd_offset(current->mm, address); > + if (!pgd) > + goto out; > + > + pud = pud_offset(pgd, address); > + if (!pud) > + goto out; > + > + pmd = pmd_offset(pud, address); > + if (!pmd) > + goto out; > + > + pte = pte_alloc_map(current->mm, pmd, address); > + > +out: > + return pte; > +} That's familiar looking code.. I guess this should be given a more generic name then placed in mm/memory.c, which is where we do pagetable walking. > +/** > + * This routine check for space in the current process's stack > + * address space. If enough address space is found, copy the original > + * instruction on that page for single stepping out-of-line. > + */ > +static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe , > + struct pt_regs *regs, struct vm_area_struct *vma) > +{ > + unsigned long addr, stack_addr = regs->esp; > + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); > + > + if (vma->vm_flags & VM_GROWSDOWN) { > + if (((stack_addr - sizeof(long long))) < > + (vma->vm_start + size)) > + return -ENOMEM; > + addr = vma->vm_start; > + } else if (vma->vm_flags & VM_GROWSUP) { > + if ((vma->vm_end - size) < (stack_addr + sizeof(long long))) > + return -ENOMEM; > + addr = vma->vm_end - size; > + } else > + return -EFAULT; > + > + vma->vm_flags |= VM_LOCKED; > + > + if (__copy_to_user_inatomic((unsigned long *)addr, > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > + return -EFAULT; > + > + regs->eip = addr; > + > + return 0; > +} If we're going to use __copy_to_user_inatomic() then we'll need some nice comments explaining why this is happening. And we'll need to actually *be* in-atomic. That means we need an open-coded inc_preempt_count() and dec_preempt_count() in there and I don't see them. > +/** > + * This routine expands the stack beyond the present process address > + * space and copies the instruction to that location, so that > + * processor can single step out-of-line. > + */ > +static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe, > + struct pt_regs *regs, struct vm_area_struct *vma) > +{ > + unsigned long addr, vm_addr; > + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); > + struct vm_area_struct *new_vma; > + struct mm_struct *mm = current->mm; > + > + > + if (!down_read_trylock(¤t->mm->mmap_sem)) > + return -ENOMEM; > + > + if (vma->vm_flags & VM_GROWSDOWN) > + vm_addr = vma->vm_start - size; > + else if (vma->vm_flags & VM_GROWSUP) > + vm_addr = vma->vm_end + size; > + else { > + up_read(¤t->mm->mmap_sem); > + return -EFAULT; > + } > + > + new_vma = find_extend_vma(mm, vm_addr); > + if (!new_vma) { > + up_read(¤t->mm->mmap_sem); > + return -ENOMEM; > + } > + > + if (new_vma->vm_flags & VM_GROWSDOWN) > + addr = new_vma->vm_start; > + else > + addr = new_vma->vm_end - size; > + > + new_vma->vm_flags |= VM_LOCKED; > + up_read(¤t->mm->mmap_sem); > + > + if (__copy_to_user_inatomic((unsigned long *)addr, > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > + return -EFAULT; > + > + regs->eip = addr; > + > + return 0; > +} Why is VM_LOCKED being set? (It needs a comment). Where does it get unset? > + > + if (__copy_to_user_inatomic((unsigned long *)page_addr, > + source, size)) > + if (__copy_to_user_inatomic( > + (unsigned long *)(page_addr - size), source, size)) See above. > + > +/** > + * This routines get the page containing the probe, maps it and > + * replaced the instruction at the probed address with specified > + * opcode. > + */ > +void __kprobes replace_original_insn(struct uprobe *uprobe, > + struct pt_regs *regs, kprobe_opcode_t opcode) > +{ > + kprobe_opcode_t *addr; > + struct page *page; > + > + page = find_get_page(uprobe->inode->i_mapping, > + uprobe->offset >> PAGE_CACHE_SHIFT); > + BUG_ON(!page); > + > + __lock_page(page); Whoa. Why is __lock_page() being used here? It looks like a bug is being covered up. > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > + addr = (kprobe_opcode_t *)((unsigned long)addr + > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > + *addr = opcode; > + /*TODO: flush vma ? */ flush_dcache_page() would be needed. But then, what happens if the page is shared by other processes? Do they all start taking debug traps? > + kunmap_atomic(addr, KM_USER1); > + > + unlock_page(page); > + > + if (page) > + page_cache_release(page); > + regs->eip = (unsigned long)uprobe->kp.addr; > +} > + > +/** > + * This routine provides the functionality of single stepping > + * out-of-line. If single stepping out-of-line cannot be achieved, > + * it replaces with the original instruction allowing it to single > + * step inline. > + */ > +static inline int prepare_singlestep_uprobe(struct uprobe *uprobe, > + struct uprobe_ctlblk *ucb, struct pt_regs *regs) > +{ > + unsigned long stack_addr = regs->esp, flags; > + struct vm_area_struct *vma = NULL; > + int err = 0; > + > + vma = find_vma(current->mm, (stack_addr & PAGE_MASK)); I don't think mmap_sem is held here? > +static inline int uprobe_fault_handler(struct pt_regs *regs, int trapnr) If, for some reason, the compiler decides to not inline this function then you have a hunk of code which isn't marked __kprobes, but it should be. I'd suggest that you remove all inlining from this code and add the appropriate section markers. Or I guess you could use __always_inline, but I'm not sure that it's really worth the fuss and obscurity of doing that. All kprobes-related code should be audited for this problem. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 11:09 ` Andrew Morton @ 2006-03-20 11:24 ` Arjan van de Ven 2006-03-20 14:05 ` Prasanna S Panchamukhi 2006-03-20 11:32 ` Nick Piggin 2006-03-20 13:48 ` Prasanna S Panchamukhi 2 siblings, 1 reply; 33+ messages in thread From: Arjan van de Ven @ 2006-03-20 11:24 UTC (permalink / raw) To: Andrew Morton; +Cc: prasanna, ak, davem, suparna, richardj_moore, linux-kernel > And we'll need to actually *be* in-atomic. That means we need an > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > see them. > .. > Why is VM_LOCKED being set? (It needs a comment). > > Where does it get unset? if this is an attempt to make the copy_in_atomic to be atomic, then it is a bug; the user can unset this bit after all via mprotect, even from another thread, and concurrently. UnGood(tm). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 11:24 ` Arjan van de Ven @ 2006-03-20 14:05 ` Prasanna S Panchamukhi 2006-03-20 14:13 ` Arjan van de Ven 0 siblings, 1 reply; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 14:05 UTC (permalink / raw) To: Arjan van de Ven Cc: Andrew Morton, ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 12:24:01PM +0100, Arjan van de Ven wrote: > > Lines: 16 > > > And we'll need to actually *be* in-atomic. That means we need an > > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > > see them. > > > > .. > > > Why is VM_LOCKED being set? (It needs a comment). > > > > Where does it get unset? > > > if this is an attempt to make the copy_in_atomic to be atomic, then it > is a bug; the user can unset this bit after all via mprotect, even from > another thread, and concurrently. U You are right, the purpose was to make copy_to_user_inatomic() to suceed. I need to look at this issue again. Any suggestions? Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 14:05 ` Prasanna S Panchamukhi @ 2006-03-20 14:13 ` Arjan van de Ven 0 siblings, 0 replies; 33+ messages in thread From: Arjan van de Ven @ 2006-03-20 14:13 UTC (permalink / raw) To: prasanna; +Cc: Andrew Morton, ak, davem, suparna, richardj_moore, linux-kernel On Mon, 2006-03-20 at 19:35 +0530, Prasanna S Panchamukhi wrote: > On Mon, Mar 20, 2006 at 12:24:01PM +0100, Arjan van de Ven wrote: > > > > Lines: 16 > > > > > And we'll need to actually *be* in-atomic. That means we need an > > > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > > > see them. > > > > > > > .. > > > > > Why is VM_LOCKED being set? (It needs a comment). > > > > > > Where does it get unset? > > > > > > if this is an attempt to make the copy_in_atomic to be atomic, then it > > is a bug; the user can unset this bit after all via mprotect, even from > > another thread, and concurrently. U > > You are right, the purpose was to make copy_to_user_inatomic() to suceed. I > need to look at this issue again. Any suggestions? get_user_pages seems to be the only viable API for this.. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 11:09 ` Andrew Morton 2006-03-20 11:24 ` Arjan van de Ven @ 2006-03-20 11:32 ` Nick Piggin 2006-03-20 13:52 ` Prasanna S Panchamukhi 2006-03-20 13:48 ` Prasanna S Panchamukhi 2 siblings, 1 reply; 33+ messages in thread From: Nick Piggin @ 2006-03-20 11:32 UTC (permalink / raw) To: prasanna; +Cc: Andrew Morton, ak, davem, suparna, richardj_moore, linux-kernel Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: >>+ >>+/** >>+ * This routines get the pte of the page containing the specified address. >>+ */ >>+static pte_t __kprobes *get_uprobe_pte(unsigned long address) >>+{ >>+ pgd_t *pgd; >>+ pud_t *pud; >>+ pmd_t *pmd; >>+ pte_t *pte = NULL; >>+ >>+ pgd = pgd_offset(current->mm, address); >>+ if (!pgd) >>+ goto out; >>+ >>+ pud = pud_offset(pgd, address); >>+ if (!pud) >>+ goto out; >>+ >>+ pmd = pmd_offset(pud, address); >>+ if (!pmd) >>+ goto out; >>+ >>+ pte = pte_alloc_map(current->mm, pmd, address); >>+ >>+out: >>+ return pte; >>+} > > > That's familiar looking code.. > > I guess this should be given a more generic name then placed in > mm/memory.c, which is where we do pagetable walking. > Apart from this, there looks like quite a bit of other mm code that has been crammed into everywhere but mm/ (yes this has happened before, but it shouldn't be encouraged in new code). For this specific example, I'm not sure that a function returning a pointer to a pte is a good idea to be exporting. I'd like to see some good reasons why things like get_user_pages, find_*_page, and other standard APIs can't be used. Then you can list those reasons in an individual patch to add your required API to mm/. This can be more easily reviewed by people who aren't as good at wading through code as Andrew. Also, adding your own mm code outside core files really makes things hard to maintain and audit when somebody would like to change anything. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 11:32 ` Nick Piggin @ 2006-03-20 13:52 ` Prasanna S Panchamukhi 0 siblings, 0 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 13:52 UTC (permalink / raw) To: Nick Piggin Cc: Andrew Morton, ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 10:32:40PM +1100, Nick Piggin wrote: > Andrew Morton wrote: > >Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > >>+ > >>+/** > >>+ * This routines get the pte of the page containing the specified > >>address. > >>+ */ > >>+static pte_t __kprobes *get_uprobe_pte(unsigned long address) > >>+{ > >>+ pgd_t *pgd; > >>+ pud_t *pud; > >>+ pmd_t *pmd; > >>+ pte_t *pte = NULL; > >>+ > >>+ pgd = pgd_offset(current->mm, address); > >>+ if (!pgd) > >>+ goto out; > >>+ > >>+ pud = pud_offset(pgd, address); > >>+ if (!pud) > >>+ goto out; > >>+ > >>+ pmd = pmd_offset(pud, address); > >>+ if (!pmd) > >>+ goto out; > >>+ > >>+ pte = pte_alloc_map(current->mm, pmd, address); > >>+ > >>+out: > >>+ return pte; > >>+} > > > > > >That's familiar looking code.. > > > >I guess this should be given a more generic name then placed in > >mm/memory.c, which is where we do pagetable walking. > > > > Apart from this, there looks like quite a bit of other mm code > that has been crammed into everywhere but mm/ (yes this has > happened before, but it shouldn't be encouraged in new code). > > For this specific example, I'm not sure that a function returning > a pointer to a pte is a good idea to be exporting. I'd like to see > some good reasons why things like get_user_pages, find_*_page, and > other standard APIs can't be used. Then you can list those reasons > in an individual patch to add your required API to mm/. This can > be more easily reviewed by people who aren't as good at wading > through code as Andrew. > > Also, adding your own mm code outside core files really makes > things hard to maintain and audit when somebody would like to > change anything. > Nick, I will send out a separate patch to add this piece of code with proper comments. Thanks Prasanna > -- > SUSE Labs, Novell Inc. > Send instant messages to your online friends http://au.messenger.yahoo.com -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 11:09 ` Andrew Morton 2006-03-20 11:24 ` Arjan van de Ven 2006-03-20 11:32 ` Nick Piggin @ 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-20 20:40 ` Andrew Morton 2 siblings, 1 reply; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 13:48 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 03:09:22AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > This patch provides a mechanism for probe handling and > > executing the user-specified handlers. > > > > Each userspace probe is uniquely identified by the combination of > > inode and offset, hence during registeration the inode and offset > > combination is added to uprobes hash table. Initially when > > breakpoint instruction is hit, the uprobes hash table is looked up > > for matching inode and offset. The pre_handlers are called in > > sequence if multiple probes are registered. Similar to kprobes, > > uprobes also adopts to single step out-of-line, so that probe miss in > > SMP environment can be avoided. But for userspace probes, instruction > > copied into kernel address space cannot be single stepped, hence the > > instruction must be copied to user address space. The solution is to > > find free space in the current process address space and then copy the > > original instruction and single step that instruction. > > > > User processes use stack space to store local variables, agruments and > > return values. Normally the stack space either below or above the > > stack pointer indicates the free stack space. > > > > The instruction to be single stepped can modify the stack space, > > hence before using the free stack space, sufficient stack space should > > be left. The instruction is copied to the bottom of the page and check > > is made such that the copied instruction does not cross the page > > boundry. The copied instruction is then single stepped. Several > > architectures does not allow the instruction to be executed from the > > stack location, since no-exec bit is set for the stack pages. In those > > architectures, the page table entry corresponding to the stack page is > > identified and the no-exec bit is unset making the instruction on that > > stack page to be executed. > > > > There are situations where even the free stack space is not enough for > > the user instruction to be copied and single stepped. In such > > situations, the virtual memory area(vma) can be expanded beyond the > > current stack vma. This expaneded stack can be used to copy the > > original instruction and single step out-of-line. > > > > Even if the vma cannot be extended then the instruction much be > > executed inline, by replacing the breakpoint instruction with original > > instruction. > > > > ... > > > > + > > +/** > > + * This routines get the pte of the page containing the specified address. > > + */ > > +static pte_t __kprobes *get_uprobe_pte(unsigned long address) > > +{ > > + pgd_t *pgd; > > + pud_t *pud; > > + pmd_t *pmd; > > + pte_t *pte = NULL; > > + > > + pgd = pgd_offset(current->mm, address); > > + if (!pgd) > > + goto out; > > + > > + pud = pud_offset(pgd, address); > > + if (!pud) > > + goto out; > > + > > + pmd = pmd_offset(pud, address); > > + if (!pmd) > > + goto out; > > + > > + pte = pte_alloc_map(current->mm, pmd, address); > > + > > +out: > > + return pte; > > +} > > That's familiar looking code.. > > I guess this should be given a more generic name then placed in > mm/memory.c, which is where we do pagetable walking. Agreed, I will send out a separate patch for the helpers. > > > +/** > > + * This routine check for space in the current process's stack > > + * address space. If enough address space is found, copy the original > > + * instruction on that page for single stepping out-of-line. > > + */ > > +static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe , > > + struct pt_regs *regs, struct vm_area_struct *vma) > > +{ > > + unsigned long addr, stack_addr = regs->esp; > > + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); > > + > > + if (vma->vm_flags & VM_GROWSDOWN) { > > + if (((stack_addr - sizeof(long long))) < > > + (vma->vm_start + size)) > > + return -ENOMEM; > > + addr = vma->vm_start; > > + } else if (vma->vm_flags & VM_GROWSUP) { > > + if ((vma->vm_end - size) < (stack_addr + sizeof(long long))) > > + return -ENOMEM; > > + addr = vma->vm_end - size; > > + } else > > + return -EFAULT; > > + > > + vma->vm_flags |= VM_LOCKED; > > + > > + if (__copy_to_user_inatomic((unsigned long *)addr, > > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > > + return -EFAULT; > > + > > + regs->eip = addr; > > + > > + return 0; > > +} > > If we're going to use __copy_to_user_inatomic() then we'll need some nice > comments explaining why this is happening. > > And we'll need to actually *be* in-atomic. That means we need an > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > see them. > We come here, after probe is hit, through uporbe_handler() with interrupts disabled (since it is a interrupt gate). In uprobe_handler() preemption is disabled and remains disabled until original instruction is single stepped. I will add proper comments in next iteration. > > +/** > > + * This routine expands the stack beyond the present process address > > + * space and copies the instruction to that location, so that > > + * processor can single step out-of-line. > > + */ > > +static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe, > > + struct pt_regs *regs, struct vm_area_struct *vma) > > +{ > > + unsigned long addr, vm_addr; > > + int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t); > > + struct vm_area_struct *new_vma; > > + struct mm_struct *mm = current->mm; > > + > > + > > + if (!down_read_trylock(¤t->mm->mmap_sem)) > > + return -ENOMEM; > > + > > + if (vma->vm_flags & VM_GROWSDOWN) > > + vm_addr = vma->vm_start - size; > > + else if (vma->vm_flags & VM_GROWSUP) > > + vm_addr = vma->vm_end + size; > > + else { > > + up_read(¤t->mm->mmap_sem); > > + return -EFAULT; > > + } > > + > > + new_vma = find_extend_vma(mm, vm_addr); > > + if (!new_vma) { > > + up_read(¤t->mm->mmap_sem); > > + return -ENOMEM; > > + } > > + > > + if (new_vma->vm_flags & VM_GROWSDOWN) > > + addr = new_vma->vm_start; > > + else > > + addr = new_vma->vm_end - size; > > + > > + new_vma->vm_flags |= VM_LOCKED; > > + up_read(¤t->mm->mmap_sem); > > + > > + if (__copy_to_user_inatomic((unsigned long *)addr, > > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > > + return -EFAULT; > > + > > + regs->eip = addr; > > + > > + return 0; > > +} > > Why is VM_LOCKED being set? (It needs a comment). > > Where does it get unset? As Arjan says, idea was to make copy_to_user_inatomic() succeed but there is some issue here, I have to look at it again. > > + > > + if (__copy_to_user_inatomic((unsigned long *)page_addr, > > + source, size)) > > + if (__copy_to_user_inatomic( > > + (unsigned long *)(page_addr - size), source, size)) > > See above. > > > + > > +/** > > + * This routines get the page containing the probe, maps it and > > + * replaced the instruction at the probed address with specified > > + * opcode. > > + */ > > +void __kprobes replace_original_insn(struct uprobe *uprobe, > > + struct pt_regs *regs, kprobe_opcode_t opcode) > > +{ > > + kprobe_opcode_t *addr; > > + struct page *page; > > + > > + page = find_get_page(uprobe->inode->i_mapping, > > + uprobe->offset >> PAGE_CACHE_SHIFT); > > + BUG_ON(!page); > > + > > + __lock_page(page); > > Whoa. Why is __lock_page() being used here? It looks like a bug is being > covered up. > we come here with a spinlock held. I will add the comment. > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > + *addr = opcode; > > + /*TODO: flush vma ? */ > > flush_dcache_page() would be needed. > > But then, what happens if the page is shared by other processes? Do they > all start taking debug traps? Yes, you are right. I think single stepping inline was a bad idea, disarming the probe looks to be a better option > > + kunmap_atomic(addr, KM_USER1); > > + > > + unlock_page(page); > > + > > + if (page) > > + page_cache_release(page); > > + regs->eip = (unsigned long)uprobe->kp.addr; > > +} > > + > > +/** > > + * This routine provides the functionality of single stepping > > + * out-of-line. If single stepping out-of-line cannot be achieved, > > + * it replaces with the original instruction allowing it to single > > + * step inline. > > + */ > > +static inline int prepare_singlestep_uprobe(struct uprobe *uprobe, > > + struct uprobe_ctlblk *ucb, struct pt_regs *regs) > > +{ > > + unsigned long stack_addr = regs->esp, flags; > > + struct vm_area_struct *vma = NULL; > > + int err = 0; > > + > > + vma = find_vma(current->mm, (stack_addr & PAGE_MASK)); > > I don't think mmap_sem is held here? Yes, this will be taken care. > > > +static inline int uprobe_fault_handler(struct pt_regs *regs, int trapnr) > > If, for some reason, the compiler decides to not inline this function then > you have a hunk of code which isn't marked __kprobes, but it should be. > > I'd suggest that you remove all inlining from this code and add the > appropriate section markers. > > Or I guess you could use __always_inline, but I'm not sure that it's really > worth the fuss and obscurity of doing that. > > All kprobes-related code should be audited for this problem. Yes, I will audit it and send out a patch if necessary. Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 13:48 ` Prasanna S Panchamukhi @ 2006-03-20 20:40 ` Andrew Morton 2006-03-21 2:02 ` Prasanna S Panchamukhi 0 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2006-03-20 20:40 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > + > > > + if (__copy_to_user_inatomic((unsigned long *)addr, > > > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > > > + return -EFAULT; > > > + > > > + regs->eip = addr; > > > + > > > + return 0; > > > +} > > > > If we're going to use __copy_to_user_inatomic() then we'll need some nice > > comments explaining why this is happening. > > > > And we'll need to actually *be* in-atomic. That means we need an > > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > > see them. > > > > We come here, after probe is hit, through uporbe_handler() with > interrupts disabled (since it is a interrupt gate). In uprobe_handler() > preemption is disabled and remains disabled until original instruction > is single stepped. > > I will add proper comments in next iteration. preempt_disable() is insufficient - it is a no-op on !CONFIG_PREEMPT. You _must_ run inc_preempt_count(). See how kmap_atomic() and kunmap_atomic() work. > > > + */ > > > +void __kprobes replace_original_insn(struct uprobe *uprobe, > > > + struct pt_regs *regs, kprobe_opcode_t opcode) > > > +{ > > > + kprobe_opcode_t *addr; > > > + struct page *page; > > > + > > > + page = find_get_page(uprobe->inode->i_mapping, > > > + uprobe->offset >> PAGE_CACHE_SHIFT); > > > + BUG_ON(!page); > > > + > > > + __lock_page(page); > > > > Whoa. Why is __lock_page() being used here? It looks like a bug is being > > covered up. > > > > we come here with a spinlock held. I will add the comment. Then the code is buggy. __lock_page() can schedule away, causing this CPU to recur onto the same lock and deadlock. > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > + *addr = opcode; > > > + /*TODO: flush vma ? */ > > > > flush_dcache_page() would be needed. > > > > But then, what happens if the page is shared by other processes? Do they > > all start taking debug traps? > > Yes, you are right. I think single stepping inline was a bad idea, disarming > the probe looks to be a better option > You skipped my second question? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-20 20:40 ` Andrew Morton @ 2006-03-21 2:02 ` Prasanna S Panchamukhi 2006-03-21 10:05 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-21 2:02 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 12:40:49PM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > > > + > > > > + if (__copy_to_user_inatomic((unsigned long *)addr, > > > > + (unsigned long *)uprobe->kp.ainsn.insn, size)) > > > > + return -EFAULT; > > > > + > > > > + regs->eip = addr; > > > > + > > > > + return 0; > > > > +} > > > > > > If we're going to use __copy_to_user_inatomic() then we'll need some nice > > > comments explaining why this is happening. > > > > > > And we'll need to actually *be* in-atomic. That means we need an > > > open-coded inc_preempt_count() and dec_preempt_count() in there and I don't > > > see them. > > > > > > > We come here, after probe is hit, through uporbe_handler() with > > interrupts disabled (since it is a interrupt gate). In uprobe_handler() > > preemption is disabled and remains disabled until original instruction > > is single stepped. > > > > I will add proper comments in next iteration. > > preempt_disable() is insufficient - it is a no-op on !CONFIG_PREEMPT. > > You _must_ run inc_preempt_count(). See how kmap_atomic() and > kunmap_atomic() work. > Yes, I will use inc_preempt_count() instead of preempt_disable(). > > > > + */ > > > > +void __kprobes replace_original_insn(struct uprobe *uprobe, > > > > + struct pt_regs *regs, kprobe_opcode_t opcode) > > > > +{ > > > > + kprobe_opcode_t *addr; > > > > + struct page *page; > > > > + > > > > + page = find_get_page(uprobe->inode->i_mapping, > > > > + uprobe->offset >> PAGE_CACHE_SHIFT); > > > > + BUG_ON(!page); > > > > + > > > > + __lock_page(page); > > > > > > Whoa. Why is __lock_page() being used here? It looks like a bug is being > > > covered up. > > > > > > > we come here with a spinlock held. I will add the comment. > > Then the code is buggy. __lock_page() can schedule away, causing this CPU > to recur onto the same lock and deadlock. Agreed. I will look at this issue and remove the __lock_page(). > > > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > > + *addr = opcode; > > > > + /*TODO: flush vma ? */ > > > > > > flush_dcache_page() would be needed. > > > > > > But then, what happens if the page is shared by other processes? Do they > > > all start taking debug traps? > > > > Yes, you are right. I think single stepping inline was a bad idea, disarming > > the probe looks to be a better option > > > > You skipped my second question? There is a small window in which other processor will not be able to see the breakpoint if we are single step inline. But since single stepping inline is a bad idea, we will disarm the probe forever (replace with original instrcution) if we cannot single step out-of-line. Any suggestions? Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-21 2:02 ` Prasanna S Panchamukhi @ 2006-03-21 10:05 ` Andrew Morton 2006-03-21 11:05 ` Richard J Moore ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Andrew Morton @ 2006-03-21 10:05 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > > > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > > > + *addr = opcode; > > > > > + /*TODO: flush vma ? */ > > > > > > > > flush_dcache_page() would be needed. > > > > > > > > But then, what happens if the page is shared by other processes? Do they > > > > all start taking debug traps? > > > > > > Yes, you are right. I think single stepping inline was a bad idea, disarming > > > the probe looks to be a better option > > > > > > > You skipped my second question? > > There is a small window in which other processor will not be able to see > the breakpoint if we are single step inline. But since single stepping inline > is a bad idea, we will disarm the probe forever (replace with original instrcution) if we cannot single step out-of-line. > Any suggestions? This doesn't appear to be working. Let's go back in time and pretend that these patches were never written, OK? We're standing around the water cooler saying "hey, wouldn't it be cool if someone did X". You guys are way too far into this and you keep on leaving everyone else behind. When I try to drag you up, you resist ;) So let me rephrase, and thrash around in the dark a little more. >From my reading of the code (and thus far that's my _only_ source of this information) it appears that a design decision has been made (for reasons which have yet to be disclosed) that the way to implement this (yet to be described) requirement is to set user breakpoints upon particular instructions within executables. System-wide, for all processes and threads. There are other things that could have been done. For example, you might have chosen to set breakpoints upon a particular virtual address within a heavyweight process. That's a process-oriented viewpoint, rather than a file-oriented one, if you like. This raises interesting questions, like - How come that decision was made? Why _is_ this an executable-oriented rather than process-oriented thing? Richard has covered that somewhat. - What happens if the executable is writeably mapped? - What happens if someone writes to the executable? (I think both of these were disallowed in the implementation-which-is-not-to-be-named). - What happens if different processes map the executable at different addresses? - Various other things which you've thought of and I haven't and which it would be REALLY interesting to hear about. But this is just one example. I don't think I'm being too picky here - my reaction on seeing all this stuff was, basically, "wtf is all this code for?". References to dprobes won't help, sorry - I've never looked at dprobes and I don't know anyone apart from you guys who has. What I'm asking you for is a description of what problem we're trying to solve and how this code solves that problem. It is hard, it is inefficient and, worse, it is error-prone for us to try to work all that out from a particular implementation. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-21 10:05 ` Andrew Morton @ 2006-03-21 11:05 ` Richard J Moore 2006-03-21 11:13 ` Suparna Bhattacharya 2006-03-21 12:23 ` Prasanna S Panchamukhi 2 siblings, 0 replies; 33+ messages in thread From: Richard J Moore @ 2006-03-21 11:05 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, linux-kernel, prasanna, suparna Andrew Morton <akpm@osdl.org> wrote on 21/03/2006 10:05:21: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > > > > > > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > > > > + *addr = opcode; > > > > > > + /*TODO: flush vma ? */ > > > > > > > > > > flush_dcache_page() would be needed. > > > > > > > > > > But then, what happens if the page is shared by other > processes? Do they > > > > > all start taking debug traps? > > > > > > > > Yes, you are right. I think single stepping inline was a bad > idea, disarming > > > > the probe looks to be a better option > > > > > > > > > > You skipped my second question? > > > > There is a small window in which other processor will not be able to see > > the breakpoint if we are single step inline. But since single > stepping inline > > is a bad idea, we will disarm the probe forever (replace with > original instrcution) if we cannot single step out-of-line. > > Any suggestions? > > This doesn't appear to be working. > > Let's go back in time and pretend that these patches were never written, > OK? We're standing around the water cooler saying "hey, wouldn't it be > cool if someone did X". You guys are way too far into this and you keep on > leaving everyone else behind. When I try to drag you up, you resist ;) > > So let me rephrase, and thrash around in the dark a little more. > > From my reading of the code (and thus far that's my _only_ source of this > information) it appears that a design decision has been made (for reasons > which have yet to be disclosed) that the way to implement this (yet to be > described) requirement is to set user breakpoints upon particular > instructions within executables. System-wide, for all processes and > threads. > > There are other things that could have been done. For example, you might > have chosen to set breakpoints upon a particular virtual address within a > heavyweight process. That's a process-oriented viewpoint, rather than a > file-oriented one, if you like. > > This raises interesting questions, like > > - How come that decision was made? Why _is_ this an executable-oriented > rather than process-oriented thing? Richard has covered that somewhat. > > - What happens if the executable is writeably mapped? > > - What happens if someone writes to the executable? (I think both > of these were disallowed in the implementation-which-is-not-to-be-named). > > - What happens if different processes map the executable at different > addresses? > > - Various other things which you've thought of and I haven't and which it > would be REALLY interesting to hear about. > > But this is just one example. I don't think I'm being too picky here - my > reaction on seeing all this stuff was, basically, "wtf is all this code > for?". References to dprobes won't help, sorry - I've never looked at > dprobes and I don't know anyone apart from you guys who has. > > What I'm asking you for is a description of what problem we're trying to > solve and how this code solves that problem. It is hard, it is inefficient > and, worse, it is error-prone for us to try to work all that out from a > particular implementation. This is completely reasonable. And we don't need to refer to dprobes - that was merely an evolutionary step. I do have a notes that captured most the early design discussions and decisions, which Prasanna and I should dive into to help answer some of the points you raise. Let me suggest, we chat among ourselves and pull together the relevant details to answer each of your questions. Richard ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-21 10:05 ` Andrew Morton 2006-03-21 11:05 ` Richard J Moore @ 2006-03-21 11:13 ` Suparna Bhattacharya 2006-03-21 12:23 ` Prasanna S Panchamukhi 2 siblings, 0 replies; 33+ messages in thread From: Suparna Bhattacharya @ 2006-03-21 11:13 UTC (permalink / raw) To: Andrew Morton; +Cc: prasanna, ak, davem, richardj_moore, linux-kernel On Tue, Mar 21, 2006 at 02:05:21AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > > > > > > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > > > > + *addr = opcode; > > > > > > + /*TODO: flush vma ? */ > > > > > > > > > > flush_dcache_page() would be needed. > > > > > > > > > > But then, what happens if the page is shared by other processes? Do they > > > > > all start taking debug traps? > > > > > > > > Yes, you are right. I think single stepping inline was a bad idea, disarming > > > > the probe looks to be a better option > > > > > > > > > > You skipped my second question? > > > > There is a small window in which other processor will not be able to see > > the breakpoint if we are single step inline. But since single stepping inline > > is a bad idea, we will disarm the probe forever (replace with original instrcution) if we cannot single step out-of-line. > > Any suggestions? > > This doesn't appear to be working. > > Let's go back in time and pretend that these patches were never written, > OK? We're standing around the water cooler saying "hey, wouldn't it be > cool if someone did X". You guys are way too far into this and you keep on > leaving everyone else behind. When I try to drag you up, you resist ;) > > So let me rephrase, and thrash around in the dark a little more. > > >From my reading of the code (and thus far that's my _only_ source of this > information) it appears that a design decision has been made (for reasons > which have yet to be disclosed) that the way to implement this (yet to be > described) requirement is to set user breakpoints upon particular > instructions within executables. System-wide, for all processes and > threads. > > There are other things that could have been done. For example, you might > have chosen to set breakpoints upon a particular virtual address within a > heavyweight process. That's a process-oriented viewpoint, rather than a > file-oriented one, if you like. > > This raises interesting questions, like > > - How come that decision was made? Why _is_ this an executable-oriented > rather than process-oriented thing? Richard has covered that somewhat. > > - What happens if the executable is writeably mapped? > > - What happens if someone writes to the executable? (I think both > of these were disallowed in the implementation-which-is-not-to-be-named). > > - What happens if different processes map the executable at different > addresses? > > - Various other things which you've thought of and I haven't and which it > would be REALLY interesting to hear about. > > But this is just one example. I don't think I'm being too picky here - my > reaction on seeing all this stuff was, basically, "wtf is all this code > for?". References to dprobes won't help, sorry - I've never looked at > dprobes and I don't know anyone apart from you guys who has. > > What I'm asking you for is a description of what problem we're trying to > solve and how this code solves that problem. It is hard, it is inefficient > and, worse, it is error-prone for us to try to work all that out from a > particular implementation. Prasanna, I guess putting together a short writeup on the problem description and the thinking behind the design decisions, known flaws etc, in the form of a Documentation patch may help for starters. These questions are likely to come up everytime anyone looks at the code. The key thinking behind a lot of the design decisions was the need for a very low overhead probe mechanism that would allow thousands of active probes on the system and could detect any instance which hits the probe, including probes on shared libraries which may be loaded by lots of processes. Thus, (1) no forcing copy-on-write pages, (2) no forcing in executable pages in memory just to place a probe on them (hence zero overhead for probes which are very unlikely to be hit), (3) no restrictions on evicting a page with a probe on it from memory (4) probes being tracked by an (inode, offset) tuple rather than by virtual address so that they can be shared across all processes mapping the executable/library even at different virtual addresses, etc. Regards Suparna -- Suparna Bhattacharya (suparna@in.ibm.com) Linux Technology Center IBM Software Lab, India ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line 2006-03-21 10:05 ` Andrew Morton 2006-03-21 11:05 ` Richard J Moore 2006-03-21 11:13 ` Suparna Bhattacharya @ 2006-03-21 12:23 ` Prasanna S Panchamukhi 2 siblings, 0 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-21 12:23 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Tue, Mar 21, 2006 at 02:05:21AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > > > > > > > > + addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1); > > > > > > + addr = (kprobe_opcode_t *)((unsigned long)addr + > > > > > > + (unsigned long)(uprobe->offset & ~PAGE_MASK)); > > > > > > + *addr = opcode; > > > > > > + /*TODO: flush vma ? */ > > > > > > > > > > flush_dcache_page() would be needed. > > > > > > > > > > But then, what happens if the page is shared by other processes? Do they > > > > > all start taking debug traps? > > > > > > > > Yes, you are right. I think single stepping inline was a bad idea, disarming > > > > the probe looks to be a better option > > > > > > > > > > You skipped my second question? > > > > There is a small window in which other processor will not be able to see > > the breakpoint if we are single step inline. But since single stepping inline > > is a bad idea, we will disarm the probe forever (replace with original instrcution) if we cannot single step out-of-line. > > Any suggestions? > > This doesn't appear to be working. > > Let's go back in time and pretend that these patches were never written, > OK? We're standing around the water cooler saying "hey, wouldn't it be > cool if someone did X". You guys are way too far into this and you keep on > leaving everyone else behind. When I try to drag you up, you resist ;) > > So let me rephrase, and thrash around in the dark a little more. > > >From my reading of the code (and thus far that's my _only_ source of this > information) it appears that a design decision has been made (for reasons > which have yet to be disclosed) that the way to implement this (yet to be > described) requirement is to set user breakpoints upon particular > instructions within executables. System-wide, for all processes and > threads. > > There are other things that could have been done. For example, you might > have chosen to set breakpoints upon a particular virtual address within a > heavyweight process. That's a process-oriented viewpoint, rather than a > file-oriented one, if you like. > > This raises interesting questions, like > > - How come that decision was made? Why _is_ this an executable-oriented > rather than process-oriented thing? Richard has covered that somewhat. > > - What happens if the executable is writeably mapped? > > - What happens if someone writes to the executable? (I think both > of these were disallowed in the implementation-which-is-not-to-be-named). > > - What happens if different processes map the executable at different > addresses? > > - Various other things which you've thought of and I haven't and which it > would be REALLY interesting to hear about. > > But this is just one example. I don't think I'm being too picky here - my > reaction on seeing all this stuff was, basically, "wtf is all this code > for?". References to dprobes won't help, sorry - I've never looked at > dprobes and I don't know anyone apart from you guys who has. > > What I'm asking you for is a description of what problem we're trying to > solve and how this code solves that problem. It is hard, it is inefficient > and, worse, it is error-prone for us to try to work all that out from a > particular implementation. Andrew, The basic need is to provide infrastructure for user-space dynamic instrumentation. As with kprobes, there is no need to recompile or restart applications for instrumentation, under a debugger for instance. Possible approaches which were looked up 1. Attaching or loading the application into a tool. In this method the user application must be loaded into a tool or the tool is attached to already running application. Before the user can instrument an application the user should decide what that instrumentation will consist of. Dynaprof uses such a mechanism. http://www.dyninst.org/tools.html 2. Using a "jump" instruction to a trampoline and trampoline executing the instrumented code in user-space. Eg: Paradyn tool. (http://www.paradyn.org/) Issues with method 1 and 2 are: * Induces Intel erratum E49 where the other processors see stale data while one processor replaces the jump instruction. * Instruction can only be replaced atomically if the size of the jump instruction is greater than or equal to the original instruction. * Other processors need to be stopped if the "jump" instruction size is less than the original instruction. 3. Using breakpoint instruction Using a breakpoint instruction and executing the instrumentation code from within the breakpoint handler in the interrupt context. The advantages of the approach (3) taken, apart from what Suparna listed earlier in this thread - User is able to collect user-space data and kernel space data using the same instrumenation code and getting a complete picture - Probes are visible across fork() syscall. Richard's mail earlier in this thread details the per process Vs per executable file based probe. This is just an RFC and we are looking for suggestions (like Christoph's). Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi 2006-03-20 6:11 ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi @ 2006-03-20 10:53 ` Andrew Morton 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-20 11:10 ` Andrew Morton 2 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2006-03-20 10:53 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > This patch provides the feature of inserting probes on pages that are > not present in the memory during registration. > > To add readpage and readpages() hooks, two new elements are added to > the uprobe_module object: > struct address_space_operations *ori_a_ops; > struct address_space_operations user_a_ops; > > User-space probes allows probes to be inserted even in pages that are > not present in the memory at the time of registration. This is done > by adding hooks to the readpage and readpages routines. During > registration, the address space operation object is modified by > substituting user-space probes's specific readpage and readpages > routines. When the pages are read into memory through the readpage and > readpages address space operations, any associated probes are > automatically inserted into those pages. These user-space probes > readpage and readpages routines internally call the original > readpage() and readpages() routines, and then check whether probes are > to be added to these pages, inserting probes as necessary. The > overhead of adding these hooks is limited to the application on which > the probes are inserted. > > During unregistration, care should be taken to replace the readpage and > readpages hooks with the original routines if no probes remain on that > application. > So... The code's replacing the address_space's address_space_operations with a copy which has .readpage() and .readpages() modified, because it happens that filemap_nopage() uses those. This is all rather hacky. I think we need to step back and discuss what services this feature is trying to provide, and how it is to provide them. Your covering description didn't describe that - it dives straigt into details without even describing what the patches are trying to achieve. So. What are we trying to achieve here, and how are we trying to achieve it? What problems were encountered in the development of the feature and how were they solved? What alternative solutions were there? I can mostly work all that out from background knowledge and looking at the code, but I'd rather hear it from the designers please. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 10:53 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton @ 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-21 2:12 ` Andrew Morton 0 siblings, 1 reply; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 13:48 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 02:53:11AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > This patch provides the feature of inserting probes on pages that are > > not present in the memory during registration. > > > > To add readpage and readpages() hooks, two new elements are added to > > the uprobe_module object: > > struct address_space_operations *ori_a_ops; > > struct address_space_operations user_a_ops; > > > > User-space probes allows probes to be inserted even in pages that are > > not present in the memory at the time of registration. This is done > > by adding hooks to the readpage and readpages routines. During > > registration, the address space operation object is modified by > > substituting user-space probes's specific readpage and readpages > > routines. When the pages are read into memory through the readpage and > > readpages address space operations, any associated probes are > > automatically inserted into those pages. These user-space probes > > readpage and readpages routines internally call the original > > readpage() and readpages() routines, and then check whether probes are > > to be added to these pages, inserting probes as necessary. The > > overhead of adding these hooks is limited to the application on which > > the probes are inserted. > > > > During unregistration, care should be taken to replace the readpage and > > readpages hooks with the original routines if no probes remain on that > > application. > > > > So... The code's replacing the address_space's address_space_operations > with a copy which has .readpage() and .readpages() modified, because it > happens that filemap_nopage() uses those. > > This is all rather hacky. > > I think we need to step back and discuss what services this feature is > trying to provide, and how it is to provide them. Your covering > description didn't describe that - it dives straigt into details without > even describing what the patches are trying to achieve. > > So. What are we trying to achieve here, and how are we trying to achieve > it? What problems were encountered in the development of the feature and > how were they solved? What alternative solutions were there? > The basic idea is to insert probes on user applications which may or may not be in memory, at the time of probe insertion. The base interface patch (1/3) allows the user to insert the probes on the text pages that are already present in the memory. This is done by mapping the page via kmap() and then insert the breakpoint instruction at the given page offset. Then comes the issue of inserting a probe on pages not currently in memory. This is useful if we'd want to probe an application that hasn't yet started executing. In such situations, we still want to insert the probe, but defer insertion of actual probe until the text page is read into the memory. Here are a few ways to accomplish this: 1. Add a notifier in the readpage(), readpages() routine, which will notify when the text pages are read into the memory. 2. Reading the page from the executable into memory and then insert probes on that text page. But there are situations where the system can run into low memory problems and those text pages get discarded. 3. Change the readpage() and readpages() routines only for that application where probes will be inserted. This is done by hooking the readpage() and readpages() routines, which is limited to the probed application and will not change anything related to other applications. The current patchset uses approach 3. I'd like to know if there are better/cleaner ways to accomplish this? Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 13:48 ` Prasanna S Panchamukhi @ 2006-03-21 2:12 ` Andrew Morton 2006-03-21 9:14 ` Richard J Moore 2006-03-21 11:28 ` Christoph Hellwig 0 siblings, 2 replies; 33+ messages in thread From: Andrew Morton @ 2006-03-21 2:12 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > The basic idea is to insert probes on user applications which may or > may not be in memory, at the time of probe insertion. umm yes, but what for? What does this entire feature *do*? Why does Linux need it? OK, so it allows kernel modules to set breakpoints (via debug traps) into user code. But why do we want to be able to do that? What are the use-cases? This may sound like boringly obvious stuff to you, but without a complete problem statement from the designers, how are we to evaluate their proposed solution? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 2:12 ` Andrew Morton @ 2006-03-21 9:14 ` Richard J Moore 2006-03-21 11:14 ` Christoph Hellwig 2006-03-21 11:28 ` Christoph Hellwig 1 sibling, 1 reply; 33+ messages in thread From: Richard J Moore @ 2006-03-21 9:14 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, linux-kernel, prasanna, suparna Andrew, the need is probably better stated as one where system-wide probing or tracing is possible. There are times where one needs a global view. Of course one can use multiple tools to obtain such data e.g. probes in kernel space strace in user space and so on. The advantages of supporting user probes are as follows: 1) A single tool providing the data capture in a consistent manner eases the problems of correlation of events across multiple tools (for kernel and user space) 2) The dynamic aspect allows ad hoc probepoints to be inserted where no existing instrumentation is provided (emergency debug scenario for example). 3) The user probe distinguishes itself from all other externally managed tracing mechanisms in that probepoints are globally applied - i.e. without reference to PID. Compare this with ptrace breakpoints (hence strace and gdb) where tracepoints and breakpoints are localized to a specified set of processes. user-probes achieves this by design without the side effects (privatization of pages) that ptrace has. Again this supports the global view. 4) user-probes also supports the registering of the probepoints before an the probed code is loaded. The clearly has advantages for catching initialization problems. A real life example of where this capability would have been very useful is with a performance problem I am currently investigating. It involves a GPFS + SAMBA + TCPIP + RDAC and some user-space video serving application. We are looking are where the latencies are accumulating in the system for the specific user application. It's a very hard problem. Having multiple tools serve up a partial view and having to coordinate these view from both data analysis and data gathering perspectives is a real nightmare. - - Richard J Moore IBM Advanced Linux Response Team - Linux Technology Centre MOBEX: 264807; Mobile (+44) (0)7739-875237 Office: (+44) (0)1962-817072 Andrew Morton <akpm@osdl.org> To 21/03/2006 prasanna@in.ibm.com 02:12 cc ak@suse.de, davem@davemloft.net, suparna@in.ibm.com, Richard J Moore/UK/IBM@IBMGB, linux-kernel@vger.kernel.org bcc Subject Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > The basic idea is to insert probes on user applications which may or > may not be in memory, at the time of probe insertion. umm yes, but what for? What does this entire feature *do*? Why does Linux need it? OK, so it allows kernel modules to set breakpoints (via debug traps) into user code. But why do we want to be able to do that? What are the use-cases? This may sound like boringly obvious stuff to you, but without a complete problem statement from the designers, how are we to evaluate their proposed solution? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 9:14 ` Richard J Moore @ 2006-03-21 11:14 ` Christoph Hellwig 2006-03-21 11:38 ` Richard J Moore 2006-03-21 12:40 ` Theodore Ts'o 0 siblings, 2 replies; 33+ messages in thread From: Christoph Hellwig @ 2006-03-21 11:14 UTC (permalink / raw) To: Richard J Moore; +Cc: Andrew Morton, ak, davem, linux-kernel, prasanna, suparna > A real life example of where this capability would have been very useful is > with a performance problem I am currently investigating. It involves a GPFS > + SAMBA + TCPIP + RDAC this pobablt tells more about the crappy code quality of your propritary code than a real need for this. please argue without reference to huge blobs of junk. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 11:14 ` Christoph Hellwig @ 2006-03-21 11:38 ` Richard J Moore 2006-03-21 12:40 ` Theodore Ts'o 1 sibling, 0 replies; 33+ messages in thread From: Richard J Moore @ 2006-03-21 11:38 UTC (permalink / raw) To: Christoph Hellwig Cc: ak, Andrew Morton, davem, linux-kernel, prasanna, suparna Christoph Hellwig <hch@infradead.org> wrote on 21/03/2006 11:14:52: > > A real life example of where this capability would have been very useful is > > with a performance problem I am currently investigating. It involves a GPFS > > + SAMBA + TCPIP + RDAC > > this pobablt tells more about the crappy code quality of your propritary > code than a real need for this. please argue without reference to huge > blobs of junk. > Damn! I knew there was an obvious answer. Thanks Christoph, I'll code a fix over lunch. Your insight is as always most refreshing. Richard ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 11:14 ` Christoph Hellwig 2006-03-21 11:38 ` Richard J Moore @ 2006-03-21 12:40 ` Theodore Ts'o 1 sibling, 0 replies; 33+ messages in thread From: Theodore Ts'o @ 2006-03-21 12:40 UTC (permalink / raw) To: Christoph Hellwig, Richard J Moore, Andrew Morton, ak, davem, linux-kernel, prasanna, suparna On Tue, Mar 21, 2006 at 11:14:52AM +0000, Christoph Hellwig wrote: > > A real life example of where this capability would have been very useful is > > with a performance problem I am currently investigating. It involves a GPFS > > + SAMBA + TCPIP + RDAC > > this pobablt tells more about the crappy code quality of your propritary > code than a real need for this. please argue without reference to huge > blobs of junk. In real life there are complicated stacks; sometimes they are open source (for example, like JBoss or Tomcat), sometimes they are propietary products, sometimes they are custom applications written by the end-user. Sun has been making big hay about how with dtrace, you can easily figure out what is going on. Systemtap is a tool that will allow us to have have this kind of capability, and user space probes is part of that project. - Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 2:12 ` Andrew Morton 2006-03-21 9:14 ` Richard J Moore @ 2006-03-21 11:28 ` Christoph Hellwig 2006-03-21 11:42 ` Richard J Moore 1 sibling, 1 reply; 33+ messages in thread From: Christoph Hellwig @ 2006-03-21 11:28 UTC (permalink / raw) To: Andrew Morton; +Cc: prasanna, ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 06:12:55PM -0800, Andrew Morton wrote: > What does this entire feature *do*? Why does Linux need it? it's what dtrace does. thus the marketing departments at ibm and redhat decided to copy the features 1:1 instead of thinking what problem they want to solve. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 11:28 ` Christoph Hellwig @ 2006-03-21 11:42 ` Richard J Moore 2006-03-21 12:15 ` Christoph Hellwig 0 siblings, 1 reply; 33+ messages in thread From: Richard J Moore @ 2006-03-21 11:42 UTC (permalink / raw) To: Christoph Hellwig Cc: ak, Andrew Morton, davem, linux-kernel, prasanna, suparna Christoph Hellwig <hch@infradead.org> wrote on 21/03/2006 11:28:07: > On Mon, Mar 20, 2006 at 06:12:55PM -0800, Andrew Morton wrote: > > What does this entire feature *do*? Why does Linux need it? > > it's what dtrace does. thus the marketing departments at ibm and redhat > decided to copy the features 1:1 instead of thinking what problem they > want to solve. > I think you'll find it happened the other way round. Sun openly references my white papers. They even stole the name of an ancestor to kprobes. But who cares, it not relevant or particularly interesting whether the chicken or the egg came first. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 11:42 ` Richard J Moore @ 2006-03-21 12:15 ` Christoph Hellwig 2006-03-21 16:17 ` Richard J Moore 0 siblings, 1 reply; 33+ messages in thread From: Christoph Hellwig @ 2006-03-21 12:15 UTC (permalink / raw) To: Richard J Moore Cc: Christoph Hellwig, ak, Andrew Morton, davem, linux-kernel, prasanna, suparna > I think you'll find it happened the other way round. Sun openly references > my white papers. They even stole the name of an ancestor to kprobes. But > who cares, it not relevant or particularly interesting whether the chicken > or the egg came first. I know your papers, too. In fact dprobes' RPN program downloads are a far better design than systemtap's generation of kernel code. it's a pity that you gave up on dprobes instead of applying the required work to it and integrate it with other bits of a tracing framework. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-21 12:15 ` Christoph Hellwig @ 2006-03-21 16:17 ` Richard J Moore 0 siblings, 0 replies; 33+ messages in thread From: Richard J Moore @ 2006-03-21 16:17 UTC (permalink / raw) To: Christoph Hellwig Cc: ak, Andrew Morton, davem, Christoph Hellwig, linux-kernel, prasanna, suparna 72 Christoph Hellwig <hch@infradead.org> wrote on 21/03/2006 12:15:50: > > I think you'll find it happened the other way round. Sun openly references > > my white papers. They even stole the name of an ancestor to kprobes. But > > who cares, it not relevant or particularly interesting whether the chicken > > or the egg came first. > > I know your papers, too. In fact dprobes' RPN program downloads are a far > better design than systemtap's generation of kernel code. it's a pity that > you gave up on dprobes instead of applying the required work to it and > integrate it with other bits of a tracing framework. Fascinating, gave up on dprobes, not really! I thought the kernel community felt it was the wrong implementation. We did remove all the RPN stuff to a loadable kernel module and left behind a minimal API set - krpobes - which comprised the kernel probing mechanism, user-space probes extensions and watchpoint probes extension. The result was identical functionality to the original dprobes but with a minimal patch to the mainline kernel. But in addition it provided a very much more generalized interface that would allow other utilities to exploit the kernel interface, which they have. In this sense dprobes still exists and can be used on top of krpobes. What would you recommend be retained from dprobes? And what further modifications? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi 2006-03-20 6:11 ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi 2006-03-20 10:53 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton @ 2006-03-20 11:10 ` Andrew Morton 2006-03-20 13:59 ` Prasanna S Panchamukhi 2 siblings, 1 reply; 33+ messages in thread From: Andrew Morton @ 2006-03-20 11:10 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > +/** > + * This function hooks the readpages() of all modules that have active > + * probes on them. The original readpages() is called for the given > + * inode/address_space to actually read the pages into the memory. > + * Then all probes that are specified on these pages are inserted. > + */ The "/**" thing is designed to introduce a kerneldoc-style comment, but these comments aren't using kerneldoc. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [2/3 PATCH] Kprobes: User space probes support- readpage hooks 2006-03-20 11:10 ` Andrew Morton @ 2006-03-20 13:59 ` Prasanna S Panchamukhi 0 siblings, 0 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 13:59 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 03:10:17AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > +/** > > + * This function hooks the readpages() of all modules that have active > > + * probes on them. The original readpages() is called for the given > > + * inode/address_space to actually read the pages into the memory. > > + * Then all probes that are specified on these pages are inserted. > > + */ > > The "/**" thing is designed to introduce a kerneldoc-style comment, but > these comments aren't using kerneldoc. Yes. I will take care of this issue. Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [1/3 PATCH] Kprobes: User space probes support- base interface 2006-03-20 6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi @ 2006-03-20 10:42 ` Andrew Morton 2006-03-20 13:48 ` Prasanna S Panchamukhi 1 sibling, 1 reply; 33+ messages in thread From: Andrew Morton @ 2006-03-20 10:42 UTC (permalink / raw) To: prasanna; +Cc: ak, davem, suparna, richardj_moore, linux-kernel Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > This patch provides two interfaces to insert and remove > user space probes. Each probe is uniquely identified by > inode and offset within that executable/library file. > Insertion of a probe involves getting the code page for > a given offset, mapping it into the memory and then insert > the breakpoint at the given offset. Also the probe is added > to the uprobe_table hash list. A uprobe_module data strcture > is allocated for every probed application/library image on disk. > Removal of a probe involves getting the code page for a given > offset, mapping that page into the memory and then replacing > the breakpoint instruction with a the original opcode. > This patch also provides aggrigate probe handler feature, > where user can define multiple handlers per probe. > > +/** > + * Finds a uprobe at the specified user-space address in the current task. > + * Points current_uprobe at that uprobe and returns the corresponding kprobe. > + */ > +struct kprobe __kprobes *get_uprobe(void *addr) > +{ > + struct mm_struct *mm = current->mm; > + struct vm_area_struct *vma; > + struct inode *inode; > + unsigned long offset; > + struct kprobe *p, *kpr; > + struct uprobe *uprobe; > + > + vma = find_vma(mm, (unsigned long)addr); > + > + BUG_ON(!vma); /* this should not happen, not in our memory map */ > + > + offset = (unsigned long)addr - (vma->vm_start + > + (vma->vm_pgoff << PAGE_SHIFT)); > + if (!vma->vm_file) > + return NULL; All this appears to be happening without mmap_sem held? > +/** > + * Wait for the page to be unlocked if someone else had locked it, > + * then map the page and insert or remove the breakpoint. > + */ > +static int __kprobes map_uprobe_page(struct page *page, struct uprobe *uprobe, > + process_uprobe_func_t process_kprobe_user) > +{ > + int ret = 0; > + kprobe_opcode_t *uprobe_address; > + > + if (!page) > + return -EINVAL; /* TODO: more suitable errno */ > + > + wait_on_page_locked(page); > + /* could probably retry readpage here. */ > + if (!PageUptodate(page)) > + return -EINVAL; /* TODO: more suitable errno */ > + > + lock_page(page); That's a lot of fuss and might be racy with truncate. Why not just run lock_page()? > + uprobe_address = (kprobe_opcode_t *)kmap(page); > + uprobe_address = (kprobe_opcode_t *)((unsigned long)uprobe_address + > + (uprobe->offset & ~PAGE_MASK)); > + ret = (*process_kprobe_user)(uprobe, uprobe_address); > + kunmap(page); kmap_atomic() is preferred. > +/** > + * Gets exclusive write access to the given inode to ensure that the file > + * on which probes are currently applied does not change. Use the function, > + * deny_write_access_to_inode() we added in fs/namei.c. > + */ > +static inline int ex_write_lock(struct inode *inode) > +{ > + return deny_write_access_to_inode(inode); > +} hm, this code likes to poke around in VFS internals. It would be nice to have an overall description of what it's trying to do, why and how. > +/** > + * unregister_uprobe: Disarms the probe, removes the uprobe > + * pointers from the hash list and unhooks readpage routines. > + */ > +void __kprobes unregister_uprobe(struct uprobe *uprobe) > +{ > + struct address_space *mapping; > + struct uprobe_module *umodule; > + struct page *page; > + unsigned long flags; > + int ret = 0; > + > + if (!uprobe->inode) > + return; > + > + mapping = uprobe->inode->i_mapping; > + > + page = find_get_page(mapping, uprobe->offset >> PAGE_CACHE_SHIFT); > + > + spin_lock_irqsave(&uprobe_lock, flags); > + ret = remove_uprobe(uprobe); > + spin_unlock_irqrestore(&uprobe_lock, flags); > + > + mutex_lock(&uprobe_mutex); > + if (!(umodule = get_module_by_inode(uprobe->inode))) > + goto out; > + > + hlist_del(&uprobe->ulist); > + if (hlist_empty(&umodule->ulist_head)) { > + list_del(&umodule->mlist); > + ex_write_unlock(uprobe->inode); > + path_release(&umodule->nd); > + kfree(umodule); > + } > + > +out: > + mutex_unlock(&uprobe_mutex); > + if (ret) > + ret = map_uprobe_page(page, uprobe, remove_kprobe_user); > + > + if (ret == -EINVAL) > + return; > + /* > + * TODO: unregister_uprobe should not fail, need to handle > + * if it fails. > + */ > + flush_vma(mapping, page, uprobe); > + > + if (page) > + page_cache_release(page); > +} That's some pretty awkward coding. Buggy too. It doesn't drop the refcount on the page if map_uprobe_page() returned -EINVAL because it's assuming that EINVAL meant "there was no page". But there are multiple reasons for map_uprobe_page() to return -EINVAL. If that page isn't uptodate, we leak a ref. This function should be doing the checking for a find_get_page() failure, not map_uprobe_page(). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [1/3 PATCH] Kprobes: User space probes support- base interface 2006-03-20 10:42 ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton @ 2006-03-20 13:48 ` Prasanna S Panchamukhi 0 siblings, 0 replies; 33+ messages in thread From: Prasanna S Panchamukhi @ 2006-03-20 13:48 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 02:42:48AM -0800, Andrew Morton wrote: > Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote: > > > > This patch provides two interfaces to insert and remove > > user space probes. Each probe is uniquely identified by > > inode and offset within that executable/library file. > > Insertion of a probe involves getting the code page for > > a given offset, mapping it into the memory and then insert > > the breakpoint at the given offset. Also the probe is added > > to the uprobe_table hash list. A uprobe_module data strcture > > is allocated for every probed application/library image on disk. > > Removal of a probe involves getting the code page for a given > > offset, mapping that page into the memory and then replacing > > the breakpoint instruction with a the original opcode. > > This patch also provides aggrigate probe handler feature, > > where user can define multiple handlers per probe. > > > > +/** > > + * Finds a uprobe at the specified user-space address in the current task. > > + * Points current_uprobe at that uprobe and returns the corresponding kprobe. > > + */ > > +struct kprobe __kprobes *get_uprobe(void *addr) > > +{ > > + struct mm_struct *mm = current->mm; > > + struct vm_area_struct *vma; > > + struct inode *inode; > > + unsigned long offset; > > + struct kprobe *p, *kpr; > > + struct uprobe *uprobe; > > + > > + vma = find_vma(mm, (unsigned long)addr); > > + > > + BUG_ON(!vma); /* this should not happen, not in our memory map */ > > + > > + offset = (unsigned long)addr - (vma->vm_start + > > + (vma->vm_pgoff << PAGE_SHIFT)); > > + if (!vma->vm_file) > > + return NULL; > > All this appears to be happening without mmap_sem held? Yes, I will make the changes to hold the mmap_sem. > > > +/** > > + * Wait for the page to be unlocked if someone else had locked it, > > + * then map the page and insert or remove the breakpoint. > > + */ > > +static int __kprobes map_uprobe_page(struct page *page, struct uprobe *uprobe, > > + process_uprobe_func_t process_kprobe_user) > > +{ > > + int ret = 0; > > + kprobe_opcode_t *uprobe_address; > > + > > + if (!page) > > + return -EINVAL; /* TODO: more suitable errno */ > > + > > + wait_on_page_locked(page); > > + /* could probably retry readpage here. */ > > + if (!PageUptodate(page)) > > + return -EINVAL; /* TODO: more suitable errno */ > > + > > + lock_page(page); > > That's a lot of fuss and might be racy with truncate. > > Why not just run lock_page()? Yes, I will do that. > > > + uprobe_address = (kprobe_opcode_t *)kmap(page); > > + uprobe_address = (kprobe_opcode_t *)((unsigned long)uprobe_address + > > + (uprobe->offset & ~PAGE_MASK)); > > + ret = (*process_kprobe_user)(uprobe, uprobe_address); > > + kunmap(page); > > kmap_atomic() is preferred. > Agreed. > > +/** > > + * Gets exclusive write access to the given inode to ensure that the file > > + * on which probes are currently applied does not change. Use the function, > > + * deny_write_access_to_inode() we added in fs/namei.c. > > + */ > > +static inline int ex_write_lock(struct inode *inode) > > +{ > > + return deny_write_access_to_inode(inode); > > +} > > hm, this code likes to poke around in VFS internals. It would be nice to > have an overall description of what it's trying to do, why and how. ok, I should probably separate VFS changes in a different patch with proper comments. However this is required to ensure that the executable associated with this inode on which probes are inserted does not change. deny_write_access_to_inode() just decrements the inode usage count to -1. > > > +/** > > + * unregister_uprobe: Disarms the probe, removes the uprobe > > + * pointers from the hash list and unhooks readpage routines. > > + */ > > +void __kprobes unregister_uprobe(struct uprobe *uprobe) > > +{ > > + struct address_space *mapping; > > + struct uprobe_module *umodule; > > + struct page *page; > > + unsigned long flags; > > + int ret = 0; > > + > > + if (!uprobe->inode) > > + return; > > + > > + mapping = uprobe->inode->i_mapping; > > + > > + page = find_get_page(mapping, uprobe->offset >> PAGE_CACHE_SHIFT); > > + > > + spin_lock_irqsave(&uprobe_lock, flags); > > + ret = remove_uprobe(uprobe); > > + spin_unlock_irqrestore(&uprobe_lock, flags); > > + > > + mutex_lock(&uprobe_mutex); > > + if (!(umodule = get_module_by_inode(uprobe->inode))) > > + goto out; > > + > > + hlist_del(&uprobe->ulist); > > + if (hlist_empty(&umodule->ulist_head)) { > > + list_del(&umodule->mlist); > > + ex_write_unlock(uprobe->inode); > > + path_release(&umodule->nd); > > + kfree(umodule); > > + } > > + > > +out: > > + mutex_unlock(&uprobe_mutex); > > + if (ret) > > + ret = map_uprobe_page(page, uprobe, remove_kprobe_user); > > + > > + if (ret == -EINVAL) > > + return; > > + /* > > + * TODO: unregister_uprobe should not fail, need to handle > > + * if it fails. > > + */ > > + flush_vma(mapping, page, uprobe); > > + > > + if (page) > > + page_cache_release(page); > > +} > > That's some pretty awkward coding. Buggy too. It doesn't drop the > refcount on the page if map_uprobe_page() returned -EINVAL because it's > assuming that EINVAL meant "there was no page". But there are multiple > reasons for map_uprobe_page() to return -EINVAL. If that page isn't > uptodate, we leak a ref. > > This function should be doing the checking for a find_get_page() failure, > not map_uprobe_page(). Yes, that is buggy, will fix. Thanks Prasanna -- Prasanna S Panchamukhi Linux Technology Center India Software Labs, IBM Bangalore Email: prasanna@in.ibm.com Ph: 91-80-51776329 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [0/3] Kprobes: User space probes support 2006-03-20 6:07 [0/3] Kprobes: User space probes support Prasanna S Panchamukhi 2006-03-20 6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi @ 2006-03-21 11:39 ` Christoph Hellwig 1 sibling, 0 replies; 33+ messages in thread From: Christoph Hellwig @ 2006-03-21 11:39 UTC (permalink / raw) To: Prasanna S Panchamukhi Cc: akpm, Andi Kleen, davem, suparna, richardj_moore, linux-kernel On Mon, Mar 20, 2006 at 11:37:45AM +0530, Prasanna S Panchamukhi wrote: > This patch set provides the basic infrastructure for user-space probes > based on IBM's Dprobes. Similar to kprobes, user-space probes uses the > breakpoint mechanism to insert probes. User has to write a kernel module > to insert probes in the executable/library and specify the handlers in > the kernel module. Doing this from kernelspace is wrong. It should be done from userspace, best using a systemcall. Of couse the handlers can't work as-is you'd need to redo that to work similarlt to ptrace - which will hopefully allow some code reuse aswell. And please - independent of the best api in this case - please don't ever submit large amounts of code again that don't have any in-tree user. ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2006-03-21 16:21 UTC | newest] Thread overview: 33+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-03-20 6:07 [0/3] Kprobes: User space probes support Prasanna S Panchamukhi 2006-03-20 6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi 2006-03-20 6:10 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi 2006-03-20 6:11 ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi 2006-03-20 11:09 ` Andrew Morton 2006-03-20 11:24 ` Arjan van de Ven 2006-03-20 14:05 ` Prasanna S Panchamukhi 2006-03-20 14:13 ` Arjan van de Ven 2006-03-20 11:32 ` Nick Piggin 2006-03-20 13:52 ` Prasanna S Panchamukhi 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-20 20:40 ` Andrew Morton 2006-03-21 2:02 ` Prasanna S Panchamukhi 2006-03-21 10:05 ` Andrew Morton 2006-03-21 11:05 ` Richard J Moore 2006-03-21 11:13 ` Suparna Bhattacharya 2006-03-21 12:23 ` Prasanna S Panchamukhi 2006-03-20 10:53 ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-21 2:12 ` Andrew Morton 2006-03-21 9:14 ` Richard J Moore 2006-03-21 11:14 ` Christoph Hellwig 2006-03-21 11:38 ` Richard J Moore 2006-03-21 12:40 ` Theodore Ts'o 2006-03-21 11:28 ` Christoph Hellwig 2006-03-21 11:42 ` Richard J Moore 2006-03-21 12:15 ` Christoph Hellwig 2006-03-21 16:17 ` Richard J Moore 2006-03-20 11:10 ` Andrew Morton 2006-03-20 13:59 ` Prasanna S Panchamukhi 2006-03-20 10:42 ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton 2006-03-20 13:48 ` Prasanna S Panchamukhi 2006-03-21 11:39 ` [0/3] Kprobes: User space probes support Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox