public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Prasanna S Panchamukhi <prasanna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: ak@suse.de, davem@davemloft.net, suparna@in.ibm.com,
	richardj_moore@uk.ibm.com, linux-kernel@vger.kernel.org
Subject: Re: [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line
Date: Mon, 20 Mar 2006 19:18:54 +0530	[thread overview]
Message-ID: <20060320134854.GG8662@in.ibm.com> (raw)
In-Reply-To: <20060320030922.4ea9445b.akpm@osdl.org>

On Mon, Mar 20, 2006 at 03:09:22AM -0800, Andrew Morton wrote:
> Prasanna S Panchamukhi <prasanna@in.ibm.com> wrote:
> >
> > This patch provides a mechanism for probe handling and
> > executing the user-specified handlers.
> > 
> > Each userspace probe is uniquely identified by the combination of
> > inode and offset, hence during registeration the inode and offset
> > combination is added to uprobes hash table. Initially when
> > breakpoint instruction is hit, the uprobes hash table is looked up
> > for matching inode and offset. The pre_handlers are called in
> > sequence if multiple probes are registered. Similar to kprobes,
> > uprobes also adopts to single step out-of-line, so that probe miss in
> > SMP environment can be avoided. But for userspace probes, instruction
> > copied into kernel address space cannot be single stepped, hence the
> > instruction must be copied to user address space. The solution is to
> > find free space in the current process address space and then copy the
> > original instruction and single step that instruction.
> > 
> > User processes use stack space to store local variables, agruments and
> > return values. Normally the stack space either below or above the
> > stack pointer indicates the free stack space.
> > 
> > The instruction to be single stepped can modify the stack space,
> > hence before using the free stack space, sufficient stack space should
> > be left. The instruction is copied to the bottom of the page and check
> > is made such that the copied instruction does not cross the page
> > boundry. The copied instruction is then single stepped. Several
> > architectures does not allow the instruction to be executed from the
> > stack location, since no-exec bit is set for the stack pages. In those
> > architectures, the page table entry corresponding to the stack page is
> > identified and the no-exec bit is unset making the instruction on that
> > stack page to be executed.
> > 
> > There are situations where even the free stack space is not enough for
> > the user instruction to be copied and single stepped. In such
> > situations, the virtual memory area(vma) can be expanded beyond the
> > current stack vma. This expaneded stack can be used to copy the
> > original instruction and single step out-of-line.
> > 
> > Even if the vma cannot be extended then the instruction much be
> > executed inline, by replacing the breakpoint instruction with original
> > instruction.
> > 
> > ...
> >
> > +
> > +/**
> > + * This routines get the pte of the page containing the specified address.
> > + */
> > +static pte_t  __kprobes *get_uprobe_pte(unsigned long address)
> > +{
> > +	pgd_t *pgd;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> > +	pte_t *pte = NULL;
> > +
> > +	pgd = pgd_offset(current->mm, address);
> > +	if (!pgd)
> > +		goto out;
> > +
> > +	pud = pud_offset(pgd, address);
> > +	if (!pud)
> > +		goto out;
> > +
> > +	pmd = pmd_offset(pud, address);
> > +	if (!pmd)
> > +		goto out;
> > +
> > +	pte = pte_alloc_map(current->mm, pmd, address);
> > +
> > +out:
> > +	return pte;
> > +}
> 
> That's familiar looking code..
> 
> I guess this should be given a more generic name then placed in
> mm/memory.c, which is where we do pagetable walking.

Agreed, I will send out a separate patch for the helpers.

> 
> > +/**
> > + *  This routine check for space in the current process's stack
> > + *  address space. If enough address space is found, copy the original
> > + *  instruction on that page for single stepping out-of-line.
> > + */
> > +static int __kprobes copy_insn_on_new_page(struct uprobe *uprobe ,
> > +			struct pt_regs *regs, struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr, stack_addr = regs->esp;
> > +	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
> > +
> > +	if (vma->vm_flags & VM_GROWSDOWN) {
> > +		if (((stack_addr - sizeof(long long))) <
> > +						(vma->vm_start + size))
> > +			return -ENOMEM;
> > +		addr = vma->vm_start;
> > +	} else if (vma->vm_flags & VM_GROWSUP) {
> > +		if ((vma->vm_end - size) < (stack_addr + sizeof(long long)))
> > +			return -ENOMEM;
> > +		addr = vma->vm_end - size;
> > +	} else
> > +		return -EFAULT;
> > +
> > +	vma->vm_flags |= VM_LOCKED;
> > +
> > +	if (__copy_to_user_inatomic((unsigned long *)addr,
> > +				(unsigned long *)uprobe->kp.ainsn.insn, size))
> > +		return -EFAULT;
> > +
> > +	regs->eip = addr;
> > +
> > +	return 0;
> > +}
> 
> If we're going to use __copy_to_user_inatomic() then we'll need some nice
> comments explaining why this is happening.
> 
> And we'll need to actually *be* in-atomic.  That means we need an
> open-coded inc_preempt_count() and dec_preempt_count() in there and I don't
> see them.
> 

We come here, after probe is hit, through uporbe_handler() with
interrupts disabled (since it is a interrupt gate). In uprobe_handler()
preemption is disabled and remains disabled until original instruction
is single stepped.

I will add proper comments in next iteration.

> > +/**
> > + * This routine expands the stack beyond the present process address
> > + * space and copies the instruction to that location, so that
> > + * processor can single step out-of-line.
> > + */
> > +static int __kprobes copy_insn_onexpstack(struct uprobe *uprobe,
> > +			struct pt_regs *regs, struct vm_area_struct *vma)
> > +{
> > +	unsigned long addr, vm_addr;
> > +	int size = MAX_INSN_SIZE * sizeof(kprobe_opcode_t);
> > +	struct vm_area_struct *new_vma;
> > +	struct mm_struct *mm = current->mm;
> > +
> > +
> > +	 if (!down_read_trylock(&current->mm->mmap_sem))
> > +		 return -ENOMEM;
> > +
> > +	if (vma->vm_flags & VM_GROWSDOWN)
> > +		vm_addr = vma->vm_start - size;
> > +	else if (vma->vm_flags & VM_GROWSUP)
> > +		vm_addr = vma->vm_end + size;
> > +	else {
> > +		up_read(&current->mm->mmap_sem);
> > +		return -EFAULT;
> > +	}
> > +
> > +	new_vma = find_extend_vma(mm, vm_addr);
> > +	if (!new_vma) {
> > +		up_read(&current->mm->mmap_sem);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	if (new_vma->vm_flags & VM_GROWSDOWN)
> > +		addr = new_vma->vm_start;
> > +	else
> > +		addr = new_vma->vm_end - size;
> > +
> > +	new_vma->vm_flags |= VM_LOCKED;
> > +	up_read(&current->mm->mmap_sem);
> > +
> > +	if (__copy_to_user_inatomic((unsigned long *)addr,
> > +				(unsigned long *)uprobe->kp.ainsn.insn, size))
> > +		return -EFAULT;
> > +
> > +	regs->eip = addr;
> > +
> > +	return  0;
> > +}
> 
> Why is VM_LOCKED being set? (It needs a comment).
> 
> Where does it get unset?

As Arjan says, idea was to make copy_to_user_inatomic() succeed but
there is some issue here, I have to look at it again.


> > +
> > +		if (__copy_to_user_inatomic((unsigned long *)page_addr,
> > +								source, size))
> > +		if (__copy_to_user_inatomic(
> > +			(unsigned long *)(page_addr - size), source, size))
> 
> See above.
> 
> > +
> > +/**
> > + * This routines get the page containing the probe, maps it and
> > + * replaced the instruction at the probed address with specified
> > + * opcode.
> > + */
> > +void __kprobes replace_original_insn(struct uprobe *uprobe,
> > +				struct pt_regs *regs, kprobe_opcode_t opcode)
> > +{
> > +	kprobe_opcode_t *addr;
> > +	struct page *page;
> > +
> > +	page = find_get_page(uprobe->inode->i_mapping,
> > +					uprobe->offset >> PAGE_CACHE_SHIFT);
> > +	BUG_ON(!page);
> > +
> > +	__lock_page(page);
> 
> Whoa.  Why is __lock_page() being used here?  It looks like a bug is being
> covered up.
> 

we come here with a spinlock held. I will add the comment.

> > +	addr = (kprobe_opcode_t *)kmap_atomic(page, KM_USER1);
> > +	addr = (kprobe_opcode_t *)((unsigned long)addr +
> > +				 (unsigned long)(uprobe->offset & ~PAGE_MASK));
> > +	*addr = opcode;
> > +	/*TODO: flush vma ? */
> 
> flush_dcache_page() would be needed.
> 
> But then, what happens if the page is shared by other processes?  Do they
> all start taking debug traps?

Yes, you are right. I think single stepping inline was a bad idea, disarming
the probe looks to be a better option


> > +	kunmap_atomic(addr, KM_USER1);
> > +
> > +	unlock_page(page);
> > +
> > +	if (page)
> > +		page_cache_release(page);
> > +	regs->eip = (unsigned long)uprobe->kp.addr;
> > +}
> > +
> > +/**
> > + * This routine provides the functionality of single stepping
> > + * out-of-line. If single stepping out-of-line cannot be achieved,
> > + * it replaces with the original instruction allowing it to single
> > + * step inline.
> > + */
> > +static inline int prepare_singlestep_uprobe(struct uprobe *uprobe,
> > +				struct uprobe_ctlblk *ucb, struct pt_regs *regs)
> > +{
> > +	unsigned long stack_addr = regs->esp, flags;
> > +	struct vm_area_struct *vma = NULL;
> > +	int err = 0;
> > +
> > +	vma = find_vma(current->mm, (stack_addr & PAGE_MASK));
> 
> I don't think mmap_sem is held here?

Yes, this will be taken care.
> 
> > +static inline int uprobe_fault_handler(struct pt_regs *regs, int trapnr)
> 
> If, for some reason, the compiler decides to not inline this function then
> you have a hunk of code which isn't marked __kprobes, but it should be.
> 
> I'd suggest that you remove all inlining from this code and add the
> appropriate section markers.
> 
> Or I guess you could use __always_inline, but I'm not sure that it's really
> worth the fuss and obscurity of doing that.
> 
> All kprobes-related code should be audited for this problem.

Yes, I will audit it and send out a patch if necessary.

Thanks
Prasanna
-- 
Prasanna S Panchamukhi
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-51776329

  parent reply	other threads:[~2006-03-20 13:48 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-20  6:07 [0/3] Kprobes: User space probes support Prasanna S Panchamukhi
2006-03-20  6:09 ` [1/3 PATCH] Kprobes: User space probes support- base interface Prasanna S Panchamukhi
2006-03-20  6:10   ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Prasanna S Panchamukhi
2006-03-20  6:11     ` [3/3 PATCH] Kprobes: User space probes support- single stepping out-of-line Prasanna S Panchamukhi
2006-03-20 11:09       ` Andrew Morton
2006-03-20 11:24         ` Arjan van de Ven
2006-03-20 14:05           ` Prasanna S Panchamukhi
2006-03-20 14:13             ` Arjan van de Ven
2006-03-20 11:32         ` Nick Piggin
2006-03-20 13:52           ` Prasanna S Panchamukhi
2006-03-20 13:48         ` Prasanna S Panchamukhi [this message]
2006-03-20 20:40           ` Andrew Morton
2006-03-21  2:02             ` Prasanna S Panchamukhi
2006-03-21 10:05               ` Andrew Morton
2006-03-21 11:05                 ` Richard J Moore
2006-03-21 11:13                 ` Suparna Bhattacharya
2006-03-21 12:23                 ` Prasanna S Panchamukhi
2006-03-20 10:53     ` [2/3 PATCH] Kprobes: User space probes support- readpage hooks Andrew Morton
2006-03-20 13:48       ` Prasanna S Panchamukhi
2006-03-21  2:12         ` Andrew Morton
2006-03-21  9:14           ` Richard J Moore
2006-03-21 11:14             ` Christoph Hellwig
2006-03-21 11:38               ` Richard J Moore
2006-03-21 12:40               ` Theodore Ts'o
2006-03-21 11:28           ` Christoph Hellwig
2006-03-21 11:42             ` Richard J Moore
2006-03-21 12:15               ` Christoph Hellwig
2006-03-21 16:17                 ` Richard J Moore
2006-03-20 11:10     ` Andrew Morton
2006-03-20 13:59       ` Prasanna S Panchamukhi
2006-03-20 10:42   ` [1/3 PATCH] Kprobes: User space probes support- base interface Andrew Morton
2006-03-20 13:48     ` Prasanna S Panchamukhi
2006-03-21 11:39 ` [0/3] Kprobes: User space probes support Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060320134854.GG8662@in.ibm.com \
    --to=prasanna@in.ibm.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richardj_moore@uk.ibm.com \
    --cc=suparna@in.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox