Re: [PATCH v2 7/11] Uprobes Implementation

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Jim Keniston <jkenisto@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Wed, 21 Apr 2010 12:29:48 +0530	[thread overview]
Message-ID: <20100421065948.GA5440@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100420153023.GA9351@redhat.com>

* Oleg Nesterov <oleg@redhat.com> [2010-04-20 17:30:23]:

> On 04/20, Srikar Dronamraju wrote:
> >
> > > > > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > > > > +					struct task_struct *start)
> > > > > > +		tsk = next_thread(tsk);
> > > > >
> > > I meant, it is not safe to use next_thread(tsk) if tsk was already
> > > removed from list by __unhash_process before we take rcu_read_lock().
> >
> > Okay, cleanup_process() gets called only and only if add_utask() fails
> > to allocated utask struct.
> 
> Yes, but afaics we have the same issues in find_next_thread() called
> by create_uprocess().

Okay.

> 
> > Based on your inputs I will synchronize
> > exit_signals() and uprobe_free_utask(). However it still can happen that
> > uprobe calls cleanup_uprocess() with reference to task struct which has just
> > called __unhash_process(). Is there a way out of this?
> 
> In this particular case, probably we can rely on uprobe_mutex. Currently
> cleanup_uprocess() is called with start == cur_t. Instead, we should use
> the last task on which add_utask() succeeded, it can't exit (assuming we
> fix other discussed races with exit) because uprobe_free_utask() takes
> this mutex too.
> 
> However, perhaps it is better to rework this all. Say, can't we move
> uprobe_free_utask() into __put_task_struct() ? Afaics, this can greatly
> simplify things. If we add mm_struct->uproc, then utask doesn't need
> the pointer to uprobe_process.

Okay. I will use mm_struct->uproc, dynamic allocation of utask on probe
hit and freeing of utask on __put_task_struct.

> 
> > > > > > +static struct pid *get_tg_leader(pid_t p)
> > > > > > +{
> > > > > > +	struct pid *pid = NULL;
> > > > > > +
> > > > > > +	rcu_read_lock();
> > > > > > +	if (current->nsproxy)
> > > > > > +		pid = find_vpid(p);
> > > > >
> > > > > Is it really possible to call register/unregister with nsproxy == NULL?
> > >
> > > You didn't answer ;)
> >
> > Can you please let me know when nsproxy is set to NULL?
> 
> exit_notify()->exit_task_namespaces()
> 
> > If we are sure
> > that register/unregister will be called with nsproxy set, then I am
> > happy to remove this check.
> 
> I think the exiting task shouldn't call register/unregister.

Okay I will remove the check for current->nsproxy being non-NULL.

> 
> > > 	- uprobe_process->tg_leader is not really used ?
> >
> > Currently we have a reference to pid struct from the time we created a
> > uprobe_process to the time we free the uprobe process.
> 
> Yes, but
> 
> > So are you
> > suggesting that we dont have a reference to the pid structure or is that
> > we dont need to cache the pid struct and access it thro
> > task_pid(current) in free_uprobes()?
> 
> I must have missed something. But I do not see where do we use
> uprobe_process->tg_leader. We never read it, apart from
> BUG_ON(uproc->tg_leader != tg_leader). No?

static int free_uprocess(struct uprobe_process *uproc)
{
	....
	put_pid(uproc->tg_leader);
        uproc->tg_leader = NULL;
	
}

> 
> > > 	- I don't understand why do we need uprobe_{en,dis}able_interrupts
> > > 	  helpers. pre_ssout() could just do local_irq_enable(). This path
> > > 	  leads to get_signal_to_deliver() which enables irqs anyway, it is
> > > 	  always safe to do this earlier and I don't think we need to disable
> > > 	  irqs again later. In any case, I don't understand why these helpers
> > > 	  use native_irq_xxx().
> >
> > On i686, (unlike x86_64), do_notify_resume() gets called with irqs
> > disabled.  I had tried local_irq_enable couple of times but that didnt
> > help probably because CONFIG_PARAVIRT is set in my .config and hence
> > raw_local_irq_enable resolves to
> >
> > static inline void raw_local_irq_enable(void)
> > {
> > 	PVOP_VCALLEE0(pv_irq_ops.irq_enable);
> > }
> >
> > What we need is the "sti" instruction.  It looks like local_irq_enable
> > actually doesnt do "sti".  So I had to go back to using
> > native_irq_enable().
> 
> Hmm. No, I can't explain this, I know nothing about paravirt. But this
> doesn't look right to me. Probably this should be discussed with paravirt
> developers...

Okay.

> 
> > > 	- pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
> > > 	  bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
> > > 	  under lock.
> >
> > Can you please elaborate.
> 
> pre_ssout() does
> 
> 	if (!user_bkpt.xol_vaddr)
> 		user_bkpt.xol_vaddr = xol_get_insn_slot();
> 
> but it could just do
> 
> 	if (!user_bkpt.xol_vaddr)
> 		xol_get_insn_slot();
> 
> because xol_get_insn_slot() populates user_bkpt.xol_vaddr.

Agreed 

> 
> Btw. Why do we have the !CONFIG_USER_BKPT_XOL code in
> include/linux/user_bkpt_xol.h? CONFIG_UPROBES depends on CONFIG_USER_BKPT_XOL.

Okay we can remove the !CONFIG_USER_BKPT_XOL code in user_bkpt_xol.h

> 
> Also the declarations don't look nice... Probably I missed something,
> but why the code uses "void *" instead of "user_bkpt_xol_area *" for
> xol_area everywhere?
> 
> OK, even if "void *" makes sense for uproc->uprobe_process, why
> xol_alloc_area/xol_get_insn_slot/etc do not use "user_bkpt_xol_area *" ?
> 

user_bkpt_xol_area isn't exposed. This provides flexibility in changing
the algorithm for more efficient slot allocation. Currently we allocate
slots from just one page. Later on we could end-up having to allocate
from more than contiguous pages. There was some discussion about
allocating slots from TLS.  So there is more than one reason that
user_bkpt_xol can change. We could expose the struct and not access the
fields directly but that would be hard to enforce.

> > > 	- I don't really understand why ->handler_in_interrupt is really
> > > 	  useful, but never mind.
> >
> > There is a small overhead when running the handlers in task context.
> 
> Sure, but
> 
> > overhead of task over interrupt =  (1.016851 - .907400) = .109451 usec
> > % additional overhead = (.109451/.907400) * 100 = 12.062%
> 
> this overhead looks very minor. To me, it is better to simplify the
> code, at least in the first version.
> 
> That said, this is up to you, I am not asking you to remove this
> optimization. Just imho.


Okay. 

> 
> > > 	- However, handler_in_interrupt && !uses_xol_strategy() doesn't
> > > 	  look right. uprobe_bkpt_notifier() is called with irqs disabled,
> > > 	  right? but set_orig_insn() is might_sleep().
> > >
> >
> > 	Yes, Uprobes currently supports only xol strategy. I.e I have
> > dropped single stepping inline strategy for uprobes. Hence when
> > user_bkpt_pre_sstep gets called from uprobe_bkpt_notifier; we are sure
> > that it doesnt call set_orig_insn().
> 
> OK, thanks. Perhaps a small comment to explain this makes sense...

Okay. 

> 
> > > Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
> > > process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
> > > will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
> > > this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.
> > > ...
> > > I think uprobe_process should be per ->mm, not per-process.
> >
> > True, One possibility could be to move the uprobe_process structure to
> > mm_struct. But now sure if VM folks would be okay with that idea.
> 
> Yes, I was thinking about mm->struct->uproc too.
> 

Okay, I will try with mm_struct->uproc. 

> And, assuming we have this pointer in mm_struct:
> 
> > > I wonder if there any possibility to avoid task_struct->utask, or at least,
> > > if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.
> >
> > Except for the pointer to uprobe_process, all other fields in utask are
> > per task. This per task information is mostly used at probe hit. Hence
> > having it in task_struct makes it easily accessible.  Do you have other
> > ideas from where we could refer utask.
> 
> Well, we could add the list of uprobe_task's into uprobe_process, it
> represents the tasks "inside" the probe hit. But yes, this is not easy,
> lets forget this, at least for now.
> 
> > I did think about allocating a utask on the first hit of a breakpoint. However
> > there are couple of issues.
> >
> > 1. Uprobes needs access to uprobe_process to search the breakpoints
> > installed for that process. Currently we hang it out of utask.
> > 	However if uprobe_process is made a part of mm_struct, this would no
> > more be an issue.
> 
> Yes,
> 
> > 2. Currently when a breakpoint is hit, uprobes increments the refcount
> > for the corresponding probepoint, and sets active_ppt in the utask for
> > the current thread. This happens in interrupt context. Allocating utask
> > on first breakpoint hit for that thread; has to be handled in task
> > context.
> 
> we could use GFP_ATOMIC, but I agree, this is not nice.
> 
> > If the utask has to be allocated, then uprobes has to search
> > for the probepoint again in task context.
> > 	I dont think it would be an issue to search for the probepoint a
> > second time in the task context.
> 
> Agreed. Although we need the new TIF_ bit for tracehook_notify_resume(),
> it can't trust "if (current->utask...)" checks.

But do we need a new TIF bit? Can we just reuse the TIF_NOTIFY_RESUME
flag that we use now?

> 
> 
> Alternatively, without the "on demand" allocations, register_uprobe()
> has to find all threads which use the same ->mm and initialize ->utask.
> This is not easy.

Okay I will try the on demand allocations in the next iteration.

Thanks again for your detailed explainations and suggestions.

--
Thanks and Regards
Srikar

next prev parent reply	other threads:[~2010-04-21  6:59 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35   ` Oleg Nesterov
2010-04-15  9:35     ` Srikar Dronamraju
2010-04-19 19:31       ` Oleg Nesterov
2010-04-20 12:43         ` Srikar Dronamraju
2010-04-20 15:30           ` Oleg Nesterov
2010-04-21  6:59             ` Srikar Dronamraju [this message]
2010-04-21 16:05               ` Oleg Nesterov
2010-04-22 13:31                 ` Srikar Dronamraju
2010-04-22 15:40                   ` Oleg Nesterov
2010-04-23 14:58                     ` Srikar Dronamraju
2010-04-23 18:53                       ` Oleg Nesterov
2010-05-11 20:47                       ` Peter Zijlstra
2010-05-11 20:44                     ` Peter Zijlstra
2010-05-11 20:45                     ` Peter Zijlstra
2010-05-12 10:31                       ` Srikar Dronamraju
2010-05-13 19:40                       ` Oleg Nesterov
2010-05-13 19:59                         ` Linus Torvalds
2010-05-13 22:12                           ` Andi Kleen
2010-05-13 22:25                             ` Linus Torvalds
2010-05-14  0:56                           ` Roland McGrath
2010-05-14  5:42                           ` Srikar Dronamraju
2010-05-11 20:43                   ` Peter Zijlstra
2010-05-12 10:41                     ` Srikar Dronamraju
2010-05-12 11:12                       ` Peter Zijlstra
2010-05-12 14:24                         ` Srikar Dronamraju
2010-05-11 20:32             ` Peter Zijlstra
2010-05-11 20:57               ` Frank Ch. Eigler
2010-05-11 21:01                 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24   ` Steven Rostedt
2010-04-01  4:16     ` Masami Hiramatsu
2010-05-12 14:57       ` Frederic Weisbecker
2010-05-12 11:02   ` Frederic Weisbecker
2010-05-12 14:34     ` Srikar Dronamraju
2010-05-12 15:15   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100421065948.GA5440@linux.vnet.ibm.com \
    --to=srikar@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=jkenisto@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@xenotime.net \
    --cc=roland@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).