From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Masami Hiramatsu <mhiramat@redhat.com>,
Randy Dunlap <rdunlap@xenotime.net>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Wed, 21 Apr 2010 12:29:48 +0530 [thread overview]
Message-ID: <20100421065948.GA5440@linux.vnet.ibm.com> (raw)
In-Reply-To: <20100420153023.GA9351@redhat.com>
* Oleg Nesterov <oleg@redhat.com> [2010-04-20 17:30:23]:
> On 04/20, Srikar Dronamraju wrote:
> >
> > > > > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > > > > + struct task_struct *start)
> > > > > > + tsk = next_thread(tsk);
> > > > >
> > > I meant, it is not safe to use next_thread(tsk) if tsk was already
> > > removed from list by __unhash_process before we take rcu_read_lock().
> >
> > Okay, cleanup_process() gets called only and only if add_utask() fails
> > to allocated utask struct.
>
> Yes, but afaics we have the same issues in find_next_thread() called
> by create_uprocess().
Okay.
>
> > Based on your inputs I will synchronize
> > exit_signals() and uprobe_free_utask(). However it still can happen that
> > uprobe calls cleanup_uprocess() with reference to task struct which has just
> > called __unhash_process(). Is there a way out of this?
>
> In this particular case, probably we can rely on uprobe_mutex. Currently
> cleanup_uprocess() is called with start == cur_t. Instead, we should use
> the last task on which add_utask() succeeded, it can't exit (assuming we
> fix other discussed races with exit) because uprobe_free_utask() takes
> this mutex too.
>
> However, perhaps it is better to rework this all. Say, can't we move
> uprobe_free_utask() into __put_task_struct() ? Afaics, this can greatly
> simplify things. If we add mm_struct->uproc, then utask doesn't need
> the pointer to uprobe_process.
Okay. I will use mm_struct->uproc, dynamic allocation of utask on probe
hit and freeing of utask on __put_task_struct.
>
> > > > > > +static struct pid *get_tg_leader(pid_t p)
> > > > > > +{
> > > > > > + struct pid *pid = NULL;
> > > > > > +
> > > > > > + rcu_read_lock();
> > > > > > + if (current->nsproxy)
> > > > > > + pid = find_vpid(p);
> > > > >
> > > > > Is it really possible to call register/unregister with nsproxy == NULL?
> > >
> > > You didn't answer ;)
> >
> > Can you please let me know when nsproxy is set to NULL?
>
> exit_notify()->exit_task_namespaces()
>
> > If we are sure
> > that register/unregister will be called with nsproxy set, then I am
> > happy to remove this check.
>
> I think the exiting task shouldn't call register/unregister.
Okay I will remove the check for current->nsproxy being non-NULL.
>
> > > - uprobe_process->tg_leader is not really used ?
> >
> > Currently we have a reference to pid struct from the time we created a
> > uprobe_process to the time we free the uprobe process.
>
> Yes, but
>
> > So are you
> > suggesting that we dont have a reference to the pid structure or is that
> > we dont need to cache the pid struct and access it thro
> > task_pid(current) in free_uprobes()?
>
> I must have missed something. But I do not see where do we use
> uprobe_process->tg_leader. We never read it, apart from
> BUG_ON(uproc->tg_leader != tg_leader). No?
static int free_uprocess(struct uprobe_process *uproc)
{
....
put_pid(uproc->tg_leader);
uproc->tg_leader = NULL;
}
>
> > > - I don't understand why do we need uprobe_{en,dis}able_interrupts
> > > helpers. pre_ssout() could just do local_irq_enable(). This path
> > > leads to get_signal_to_deliver() which enables irqs anyway, it is
> > > always safe to do this earlier and I don't think we need to disable
> > > irqs again later. In any case, I don't understand why these helpers
> > > use native_irq_xxx().
> >
> > On i686, (unlike x86_64), do_notify_resume() gets called with irqs
> > disabled. I had tried local_irq_enable couple of times but that didnt
> > help probably because CONFIG_PARAVIRT is set in my .config and hence
> > raw_local_irq_enable resolves to
> >
> > static inline void raw_local_irq_enable(void)
> > {
> > PVOP_VCALLEE0(pv_irq_ops.irq_enable);
> > }
> >
> > What we need is the "sti" instruction. It looks like local_irq_enable
> > actually doesnt do "sti". So I had to go back to using
> > native_irq_enable().
>
> Hmm. No, I can't explain this, I know nothing about paravirt. But this
> doesn't look right to me. Probably this should be discussed with paravirt
> developers...
Okay.
>
> > > - pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
> > > bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
> > > under lock.
> >
> > Can you please elaborate.
>
> pre_ssout() does
>
> if (!user_bkpt.xol_vaddr)
> user_bkpt.xol_vaddr = xol_get_insn_slot();
>
> but it could just do
>
> if (!user_bkpt.xol_vaddr)
> xol_get_insn_slot();
>
> because xol_get_insn_slot() populates user_bkpt.xol_vaddr.
Agreed
>
> Btw. Why do we have the !CONFIG_USER_BKPT_XOL code in
> include/linux/user_bkpt_xol.h? CONFIG_UPROBES depends on CONFIG_USER_BKPT_XOL.
Okay we can remove the !CONFIG_USER_BKPT_XOL code in user_bkpt_xol.h
>
> Also the declarations don't look nice... Probably I missed something,
> but why the code uses "void *" instead of "user_bkpt_xol_area *" for
> xol_area everywhere?
>
> OK, even if "void *" makes sense for uproc->uprobe_process, why
> xol_alloc_area/xol_get_insn_slot/etc do not use "user_bkpt_xol_area *" ?
>
user_bkpt_xol_area isn't exposed. This provides flexibility in changing
the algorithm for more efficient slot allocation. Currently we allocate
slots from just one page. Later on we could end-up having to allocate
from more than contiguous pages. There was some discussion about
allocating slots from TLS. So there is more than one reason that
user_bkpt_xol can change. We could expose the struct and not access the
fields directly but that would be hard to enforce.
> > > - I don't really understand why ->handler_in_interrupt is really
> > > useful, but never mind.
> >
> > There is a small overhead when running the handlers in task context.
>
> Sure, but
>
> > overhead of task over interrupt = (1.016851 - .907400) = .109451 usec
> > % additional overhead = (.109451/.907400) * 100 = 12.062%
>
> this overhead looks very minor. To me, it is better to simplify the
> code, at least in the first version.
>
> That said, this is up to you, I am not asking you to remove this
> optimization. Just imho.
Okay.
>
> > > - However, handler_in_interrupt && !uses_xol_strategy() doesn't
> > > look right. uprobe_bkpt_notifier() is called with irqs disabled,
> > > right? but set_orig_insn() is might_sleep().
> > >
> >
> > Yes, Uprobes currently supports only xol strategy. I.e I have
> > dropped single stepping inline strategy for uprobes. Hence when
> > user_bkpt_pre_sstep gets called from uprobe_bkpt_notifier; we are sure
> > that it doesnt call set_orig_insn().
>
> OK, thanks. Perhaps a small comment to explain this makes sense...
Okay.
>
> > > Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
> > > process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
> > > will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
> > > this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.
> > > ...
> > > I think uprobe_process should be per ->mm, not per-process.
> >
> > True, One possibility could be to move the uprobe_process structure to
> > mm_struct. But now sure if VM folks would be okay with that idea.
>
> Yes, I was thinking about mm->struct->uproc too.
>
Okay, I will try with mm_struct->uproc.
> And, assuming we have this pointer in mm_struct:
>
> > > I wonder if there any possibility to avoid task_struct->utask, or at least,
> > > if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.
> >
> > Except for the pointer to uprobe_process, all other fields in utask are
> > per task. This per task information is mostly used at probe hit. Hence
> > having it in task_struct makes it easily accessible. Do you have other
> > ideas from where we could refer utask.
>
> Well, we could add the list of uprobe_task's into uprobe_process, it
> represents the tasks "inside" the probe hit. But yes, this is not easy,
> lets forget this, at least for now.
>
> > I did think about allocating a utask on the first hit of a breakpoint. However
> > there are couple of issues.
> >
> > 1. Uprobes needs access to uprobe_process to search the breakpoints
> > installed for that process. Currently we hang it out of utask.
> > However if uprobe_process is made a part of mm_struct, this would no
> > more be an issue.
>
> Yes,
>
> > 2. Currently when a breakpoint is hit, uprobes increments the refcount
> > for the corresponding probepoint, and sets active_ppt in the utask for
> > the current thread. This happens in interrupt context. Allocating utask
> > on first breakpoint hit for that thread; has to be handled in task
> > context.
>
> we could use GFP_ATOMIC, but I agree, this is not nice.
>
> > If the utask has to be allocated, then uprobes has to search
> > for the probepoint again in task context.
> > I dont think it would be an issue to search for the probepoint a
> > second time in the task context.
>
> Agreed. Although we need the new TIF_ bit for tracehook_notify_resume(),
> it can't trust "if (current->utask...)" checks.
But do we need a new TIF bit? Can we just reuse the TIF_NOTIFY_RESUME
flag that we use now?
>
>
> Alternatively, without the "on demand" allocations, register_uprobe()
> has to find all threads which use the same ->mm and initialize ->utask.
> This is not easy.
Okay I will try the on demand allocations in the next iteration.
Thanks again for your detailed explainations and suggestions.
--
Thanks and Regards
Srikar
next prev parent reply other threads:[~2010-04-21 6:59 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35 ` Oleg Nesterov
2010-04-15 9:35 ` Srikar Dronamraju
2010-04-19 19:31 ` Oleg Nesterov
2010-04-20 12:43 ` Srikar Dronamraju
2010-04-20 15:30 ` Oleg Nesterov
2010-04-21 6:59 ` Srikar Dronamraju [this message]
2010-04-21 16:05 ` Oleg Nesterov
2010-04-22 13:31 ` Srikar Dronamraju
2010-04-22 15:40 ` Oleg Nesterov
2010-04-23 14:58 ` Srikar Dronamraju
2010-04-23 18:53 ` Oleg Nesterov
2010-05-11 20:47 ` Peter Zijlstra
2010-05-11 20:44 ` Peter Zijlstra
2010-05-11 20:45 ` Peter Zijlstra
2010-05-12 10:31 ` Srikar Dronamraju
2010-05-13 19:40 ` Oleg Nesterov
2010-05-13 19:59 ` Linus Torvalds
2010-05-13 22:12 ` Andi Kleen
2010-05-13 22:25 ` Linus Torvalds
2010-05-14 0:56 ` Roland McGrath
2010-05-14 5:42 ` Srikar Dronamraju
2010-05-11 20:43 ` Peter Zijlstra
2010-05-12 10:41 ` Srikar Dronamraju
2010-05-12 11:12 ` Peter Zijlstra
2010-05-12 14:24 ` Srikar Dronamraju
2010-05-11 20:32 ` Peter Zijlstra
2010-05-11 20:57 ` Frank Ch. Eigler
2010-05-11 21:01 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24 ` Steven Rostedt
2010-04-01 4:16 ` Masami Hiramatsu
2010-05-12 14:57 ` Frederic Weisbecker
2010-05-12 11:02 ` Frederic Weisbecker
2010-05-12 14:34 ` Srikar Dronamraju
2010-05-12 15:15 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100421065948.GA5440@linux.vnet.ibm.com \
--to=srikar@linux.vnet.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=ananth@in.ibm.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jkenisto@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@redhat.com \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rdunlap@xenotime.net \
--cc=roland@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.