All of lore.kernel.org
 help / color / mirror / Atom feed
From: Prasanna S Panchamukhi <prasanna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: ak@suse.de, davem@davemloft.net, suparna@in.ibm.com,
	richardj_moore@uk.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line
Date: Mon, 20 Mar 2006 19:18:54 +0530	[thread overview]
Message-ID: <20060320134854.GG8662@in.ibm.com> (raw)
In-Reply-To: <20060320030922.4ea9445b.akpm@osdl.org>

On Mon, Mar 20, 2006 at 03:09:22AM -0800, Andrew Morton wrote:
> Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote:
> >
> > This patch provides a mechanism for probe handling and
> > executing the user-specified handlers.
> > 
> > Each userspace probe is uniquely identified by the combination of
> > inode and offset, hence during registeration the inode and offset
> > combination is added to uprobes hash table. Initially when
> > breakpoint instruction is hit, the uprobes hash table is looked up
> > for matching inode and offset. The pre_handlers are called in
> > sequence if multiple probes are registered. Similar to kprobes,
> > uprobes also adopts to single step out-of-line, so that probe miss in
> > SMP environment can be avoided. But for userspace probes, instruction
> > copied into kernel address space cannot be single stepped, hence the
> > instruction must be copied to user address space. The solution is to
> > find free space in the current process address space and then copy the
> > original instruction and single step that instruction.
> > 
> > User processes use stack space to store local variables, agruments and
> > return values. Normally the stack space either below or above the
> > stack pointer indicates the free stack space.
> > 
> > The instruction to be single stepped can modify the stack space,
> > hence before using the free stack space, sufficient stack space should
> > be left. The instruction is copied to the bottom of the page and check
> > is made such that the copied instruction does not cross the page
> > boundry. The copied instruction is then single stepped. Several
> > architectures does not allow the instruction to be executed from the
> > stack location, since no-exec bit is set for the stack pages. In those
> > architectures, the page table entry corresponding to the stack page is
> > identified and the no-exec bit is unset making the instruction on that
> > stack page to be executed.
> > 
> > There are situations where even the free stack space is not enough for
> > the user instruction to be copied and single stepped. In such
> > situations, the virtual memory area(vma) can be expanded beyond the
> > current stack vma. This expaneded stack can be used to copy the
> > original instruction and single step out-of-line.
> > 
> > Even if the vma cannot be extended then the instruction much be
> > executed inline, by replacing the breakpoint instruction with original
> > instruction.
> > 
> > ...
> >
> > +
> > +/**
> > + * This routines get the pte of the page containing the specified address.
> > + */
> > +static pte_t  __kprobes *get_uprobe_pte(unsigned long address)
> > +{
> > +	pgd_t *pgd;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> > +	pte_t *pte = NULL;
> > +
> > +	pgd = pgd_offset(current->mm, address);
> > +	if (!pgd)
> > +		goto out;
> > +
> > +	pud = pud_offset(pgd, address);
> > +	if (!pud)
> > +		goto out;
> > +
> > +	pmd = pmd_offset(pud, address);
> > +	if (!pmd)
> > +		goto out;
> > +
> > +	pte = pte_alloc_map(current->mm, pmd, address);
> > +
> > +out:
> > +	return pte;
> > +}
> 
> That's familiar looking code..
> 
> I guess this should be given a more generic name then placed in
> mm/memory.c, which is where we do pagetable walking.

Agreed, I will send out a separate patch for the helpers.

> 
> > +/**
> > + *  This routine check for space in the current process's stack
> > + *  address space. If enough address space is found, copy the original
> > + *  instruction on that page for single stepping out-of-line.
> > + */
> > +static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe ,
> > +			struct pt_regs *regs, struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr, stack_addr = regs->esp;
> > +	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
> > +
> > +	if (vma->vm_flags & VM_GROWSDOWN) {
> > +		if (((stack_addr - sizeof(long long))) <
> > +						(vma->vm_start + size))
> > +			return -ENOMEM;
> > +		addr = vma->vm_start;
> > +	} else if (vma->vm_flags & VM_GROWSUP) {
> > +		if ((vma->vm_end - size) < (stack_addr + sizeof(long long)))
> > +			return -ENOMEM;
> > +		addr = vma->vm_end - size;
> > +	} else
> > +		return -EFAULT;
> > +
> > +	vma->vm_flags |= VM_LOCKED;
> > +
> > +	if (__copy_to_user_inatomic((unsigned long *)addr,
> > +				(unsigned long *)uprobe->kp.ainsn.insn, size))
> > +		return -EFAULT;
> > +
> > +	regs->eip = addr;
> > +
> > +	return 0;
> > +}
> 
> If we're going to use __copy_to_user_inatomic() then we'll need some nice
> comments explaining why this is happening.
> 
> And we'll need to actually *be* in-atomic.  That means we need an
> open-coded inc_preempt_count() and dec_preempt_count() in there and I don't
> see them.
> 

We come here, after probe is hit, through uporbe_handler() with
interrupts disabled (since it is a interrupt gate). In uprobe_handler()
preemption is disabled and remains disabled until original instruction
is single stepped.

I will add proper comments in next iteration.

> > +/**
> > + * This routine expands the stack beyond the present process address
> > + * space and copies the instruction to that location, so that
> > + * processor can single step out-of-line.
> > + */
> > +static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe,
> > +			struct pt_regs *regs, struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr, vm_addr;
> > +	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
> > +	struct vm_area_struct *new_vma;
> > +	struct mm_struct *mm = current->mm;
> > +
> > +
> > +	 if (!down_read_trylock(&current->mm->mmap_sem))
> > +		 return -ENOMEM;
> > +
> > +	if (vma->vm_flags & VM_GROWSDOWN)
> > +		vm_addr = vma->vm_start - size;
> > +	else if (vma->vm_flags & VM_GROWSUP)
> > +		vm_addr = vma->vm_end + size;
> > +	else {
> > +		up_read(&current->mm->mmap_sem);
> > +		return -EFAULT;
> > +	}
> > +
> > +	new_vma = find_extend_vma(mm, vm_addr);
> > +	if (!new_vma) {
> > +		up_read(&current->mm->mmap_sem);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (new_vma->vm_flags & VM_GROWSDOWN)
> > +		addr = new_vma->vm_start;
> > +	else
> > +		addr = new_vma->vm_end - size;
> > +
> > +	new_vma->vm_flags |= VM_LOCKED;
> > +	up_read(&current->mm->mmap_sem);
> > +
> > +	if (__copy_to_user_inatomic((unsigned long *)addr,
> > +				(unsigned long *)uprobe->kp.ainsn.insn, size))
> > +		return -EFAULT;
> > +
> > +	regs->eip = addr;
> > +
> > +	return  0;
> > +}
> 
> Why is VM_LOCKED being set? (It needs a comment).
> 
> Where does it get unset?

As Arjan says, idea was to make copy_to_user_inatomic() succeed but
there is some issue here, I have to look at it again.


> > +
> > +		if (__copy_to_user_inatomic((unsigned long *)page_addr,
> > +								source, size))
> > +		if (__copy_to_user_inatomic(
> > +			(unsigned long *)(page_addr - size), source, size))
> 
> See above.
> 
> > +
> > +/**
> > + * This routines get the page containing the probe, maps it and
> > + * replaced the instruction at the probed address with specified
> > + * opcode.
> > + */
> > +void __kprobes replace_original_insn(struct uprobe *uprobe,
> > +				struct pt_regs *regs, kprobe_opcode_t opcode)
> > +{
> > +	kprobe_opcode_t *addr;
> > +	struct page *page;
> > +
> > +	page = find_get_page(uprobe->inode->i_mapping,
> > +					uprobe->offset >> PAGE_CACHE_SHIFT);
> > +	BUG_ON(!page);
> > +
> > +	__lock_page(page);
> 
> Whoa.  Why is __lock_page() being used here?  It looks like a bug is being
> covered up.
> 

we come here with a spinlock held. I will add the comment.

> > +	addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1);
> > +	addr = (kprobe_opcode_t *)((unsigned long)addr +
> > +				 (unsigned long)(uprobe->offset & ~PAGE_MASK));
> > +	*addr = opcode;
> > +	/*TODO: flush vma ? */
> 
> flush_dcache_page() would be needed.
> 
> But then, what happens if the page is shared by other processes?  Do they
> all start taking debug traps?

Yes, you are right. I think single stepping inline was a bad idea, disarming
the probe looks to be a better option


> > +	kunmap_atomic(addr, KM_USER1);
> > +
> > +	unlock_page(page);
> > +
> > +	if (page)
> > +		page_cache_release(page);
> > +	regs->eip = (unsigned long)uprobe->kp.addr;
> > +}
> > +
> > +/**
> > + * This routine provides the functionality of single stepping
> > + * out-of-line. If single stepping out-of-line cannot be achieved,
> > + * it replaces with the original instruction allowing it to single
> > + * step inline.
> > + */
> > +static inline int prepare_singlestep_uprobe(struct uprobe *uprobe,
> > +				struct uprobe_ctlblk *ucb, struct pt_regs *regs)
> > +{
> > +	unsigned long stack_addr = regs->esp, flags;
> > +	struct vm_area_struct *vma = NULL;
> > +	int err = 0;
> > +
> > +	vma = find_vma(current->mm, (stack_addr & PAGE_MASK));
> 
> I don't think mmap_sem is held here?

Yes, this will be taken care.
> 
> > +static inline int uprobe_fault_handler(struct pt_regs *regs, int trapnr)
> 
> If, for some reason, the compiler decides to not inline this function then
> you have a hunk of code which isn't marked __kprobes, but it should be.
> 
> I'd suggest that you remove all inlining from this code and add the
> appropriate section markers.
> 
> Or I guess you could use __always_inline, but I'm not sure that it's really
> worth the fuss and obscurity of doing that.
> 
> All kprobes-related code should be audited for this problem.

Yes, I will audit it and send out a patch if necessary.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-51776329

  parent reply	other threads:[~2006-03-20 13:48 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-20  6:07 [0/3] Kprobes: User space probes support Prasanna S Panchamukhi
2006-03-20  6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi
2006-03-20  6:10   ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi
2006-03-20  6:11     ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi
2006-03-20 11:09       ` Andrew Morton
2006-03-20 11:24         ` Arjan van de Ven
2006-03-20 14:05           ` Prasanna S Panchamukhi
2006-03-20 14:13             ` Arjan van de Ven
2006-03-20 11:32         ` Nick Piggin
2006-03-20 13:52           ` Prasanna S Panchamukhi
2006-03-20 13:48         ` Prasanna S Panchamukhi [this message]
2006-03-20 20:40           ` Andrew Morton
2006-03-21  2:02             ` Prasanna S Panchamukhi
2006-03-21 10:05               ` Andrew Morton
2006-03-21 11:05                 ` Richard J Moore
2006-03-21 11:13                 ` Suparna Bhattacharya
2006-03-21 12:23                 ` Prasanna S Panchamukhi
2006-03-20 10:53     ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton
2006-03-20 13:48       ` Prasanna S Panchamukhi
2006-03-21  2:12         ` Andrew Morton
2006-03-21  9:14           ` Richard J Moore
2006-03-21 11:14             ` Christoph Hellwig
2006-03-21 11:38               ` Richard J Moore
2006-03-21 12:40               ` Theodore Ts'o
2006-03-21 11:28           ` Christoph Hellwig
2006-03-21 11:42             ` Richard J Moore
2006-03-21 12:15               ` Christoph Hellwig
2006-03-21 16:17                 ` Richard J Moore
2006-03-20 11:10     ` Andrew Morton
2006-03-20 13:59       ` Prasanna S Panchamukhi
2006-03-20 10:42   ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton
2006-03-20 13:48     ` Prasanna S Panchamukhi
2006-03-21 11:39 ` [0/3] Kprobes: User space probes support Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060320134854.GG8662@in.ibm.com \
    --to=prasanna@in.ibm.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richardj_moore@uk.ibm.com \
    --cc=suparna@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.