From: Oleg Nesterov <oleg@redhat.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Masami Hiramatsu <mhiramat@redhat.com>,
Randy Dunlap <rdunlap@xenotime.net>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Tue, 20 Apr 2010 17:30:23 +0200 [thread overview]
Message-ID: <20100420153023.GA9351@redhat.com> (raw)
In-Reply-To: <20100420124358.GA20675@linux.vnet.ibm.com>
On 04/20, Srikar Dronamraju wrote:
>
> > > > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > > > + struct task_struct *start)
> > > > > + tsk = next_thread(tsk);
> > > >
> > I meant, it is not safe to use next_thread(tsk) if tsk was already
> > removed from list by __unhash_process before we take rcu_read_lock().
>
> Okay, cleanup_process() gets called only and only if add_utask() fails
> to allocated utask struct.
Yes, but afaics we have the same issues in find_next_thread() called
by create_uprocess().
> Based on your inputs I will synchronize
> exit_signals() and uprobe_free_utask(). However it still can happen that
> uprobe calls cleanup_uprocess() with reference to task struct which has just
> called __unhash_process(). Is there a way out of this?
In this particular case, probably we can rely on uprobe_mutex. Currently
cleanup_uprocess() is called with start == cur_t. Instead, we should use
the last task on which add_utask() succeeded, it can't exit (assuming we
fix other discussed races with exit) because uprobe_free_utask() takes
this mutex too.
However, perhaps it is better to rework this all. Say, can't we move
uprobe_free_utask() into __put_task_struct() ? Afaics, this can greatly
simplify things. If we add mm_struct->uproc, then utask doesn't need
the pointer to uprobe_process.
> > > > > +static struct pid *get_tg_leader(pid_t p)
> > > > > +{
> > > > > + struct pid *pid = NULL;
> > > > > +
> > > > > + rcu_read_lock();
> > > > > + if (current->nsproxy)
> > > > > + pid = find_vpid(p);
> > > >
> > > > Is it really possible to call register/unregister with nsproxy == NULL?
> >
> > You didn't answer ;)
>
> Can you please let me know when nsproxy is set to NULL?
exit_notify()->exit_task_namespaces()
> If we are sure
> that register/unregister will be called with nsproxy set, then I am
> happy to remove this check.
I think the exiting task shouldn't call register/unregister.
> > - uprobe_process->tg_leader is not really used ?
>
> Currently we have a reference to pid struct from the time we created a
> uprobe_process to the time we free the uprobe process.
Yes, but
> So are you
> suggesting that we dont have a reference to the pid structure or is that
> we dont need to cache the pid struct and access it thro
> task_pid(current) in free_uprobes()?
I must have missed something. But I do not see where do we use
uprobe_process->tg_leader. We never read it, apart from
BUG_ON(uproc->tg_leader != tg_leader). No?
> > - I don't understand why do we need uprobe_{en,dis}able_interrupts
> > helpers. pre_ssout() could just do local_irq_enable(). This path
> > leads to get_signal_to_deliver() which enables irqs anyway, it is
> > always safe to do this earlier and I don't think we need to disable
> > irqs again later. In any case, I don't understand why these helpers
> > use native_irq_xxx().
>
> On i686, (unlike x86_64), do_notify_resume() gets called with irqs
> disabled. I had tried local_irq_enable couple of times but that didnt
> help probably because CONFIG_PARAVIRT is set in my .config and hence
> raw_local_irq_enable resolves to
>
> static inline void raw_local_irq_enable(void)
> {
> PVOP_VCALLEE0(pv_irq_ops.irq_enable);
> }
>
> What we need is the "sti" instruction. It looks like local_irq_enable
> actually doesnt do "sti". So I had to go back to using
> native_irq_enable().
Hmm. No, I can't explain this, I know nothing about paravirt. But this
doesn't look right to me. Probably this should be discussed with paravirt
developers...
> > - pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
> > bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
> > under lock.
>
> Can you please elaborate.
pre_ssout() does
if (!user_bkpt.xol_vaddr)
user_bkpt.xol_vaddr = xol_get_insn_slot();
but it could just do
if (!user_bkpt.xol_vaddr)
xol_get_insn_slot();
because xol_get_insn_slot() populates user_bkpt.xol_vaddr.
Btw. Why do we have the !CONFIG_USER_BKPT_XOL code in
include/linux/user_bkpt_xol.h? CONFIG_UPROBES depends on CONFIG_USER_BKPT_XOL.
Also the declarations don't look nice... Probably I missed something,
but why the code uses "void *" instead of "user_bkpt_xol_area *" for
xol_area everywhere?
OK, even if "void *" makes sense for uproc->uprobe_process, why
xol_alloc_area/xol_get_insn_slot/etc do not use "user_bkpt_xol_area *" ?
> > - I don't really understand why ->handler_in_interrupt is really
> > useful, but never mind.
>
> There is a small overhead when running the handlers in task context.
Sure, but
> overhead of task over interrupt = (1.016851 - .907400) = .109451 usec
> % additional overhead = (.109451/.907400) * 100 = 12.062%
this overhead looks very minor. To me, it is better to simplify the
code, at least in the first version.
That said, this is up to you, I am not asking you to remove this
optimization. Just imho.
> > - However, handler_in_interrupt && !uses_xol_strategy() doesn't
> > look right. uprobe_bkpt_notifier() is called with irqs disabled,
> > right? but set_orig_insn() is might_sleep().
> >
>
> Yes, Uprobes currently supports only xol strategy. I.e I have
> dropped single stepping inline strategy for uprobes. Hence when
> user_bkpt_pre_sstep gets called from uprobe_bkpt_notifier; we are sure
> that it doesnt call set_orig_insn().
OK, thanks. Perhaps a small comment to explain this makes sense...
> > Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
> > process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
> > will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
> > this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.
> > ...
> > I think uprobe_process should be per ->mm, not per-process.
>
> True, One possibility could be to move the uprobe_process structure to
> mm_struct. But now sure if VM folks would be okay with that idea.
Yes, I was thinking about mm->struct->uproc too.
And, assuming we have this pointer in mm_struct:
> > I wonder if there any possibility to avoid task_struct->utask, or at least,
> > if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.
>
> Except for the pointer to uprobe_process, all other fields in utask are
> per task. This per task information is mostly used at probe hit. Hence
> having it in task_struct makes it easily accessible. Do you have other
> ideas from where we could refer utask.
Well, we could add the list of uprobe_task's into uprobe_process, it
represents the tasks "inside" the probe hit. But yes, this is not easy,
lets forget this, at least for now.
> I did think about allocating a utask on the first hit of a breakpoint. However
> there are couple of issues.
>
> 1. Uprobes needs access to uprobe_process to search the breakpoints
> installed for that process. Currently we hang it out of utask.
> However if uprobe_process is made a part of mm_struct, this would no
> more be an issue.
Yes,
> 2. Currently when a breakpoint is hit, uprobes increments the refcount
> for the corresponding probepoint, and sets active_ppt in the utask for
> the current thread. This happens in interrupt context. Allocating utask
> on first breakpoint hit for that thread; has to be handled in task
> context.
we could use GFP_ATOMIC, but I agree, this is not nice.
> If the utask has to be allocated, then uprobes has to search
> for the probepoint again in task context.
> I dont think it would be an issue to search for the probepoint a
> second time in the task context.
Agreed. Although we need the new TIF_ bit for tracehook_notify_resume(),
it can't trust "if (current->utask...)" checks.
Alternatively, without the "on demand" allocations, register_uprobe()
has to find all threads which use the same ->mm and initialize ->utask.
This is not easy.
Oleg.
next prev parent reply other threads:[~2010-04-20 15:33 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35 ` Oleg Nesterov
2010-04-15 9:35 ` Srikar Dronamraju
2010-04-19 19:31 ` Oleg Nesterov
2010-04-20 12:43 ` Srikar Dronamraju
2010-04-20 15:30 ` Oleg Nesterov [this message]
2010-04-21 6:59 ` Srikar Dronamraju
2010-04-21 16:05 ` Oleg Nesterov
2010-04-22 13:31 ` Srikar Dronamraju
2010-04-22 15:40 ` Oleg Nesterov
2010-04-23 14:58 ` Srikar Dronamraju
2010-04-23 18:53 ` Oleg Nesterov
2010-05-11 20:47 ` Peter Zijlstra
2010-05-11 20:44 ` Peter Zijlstra
2010-05-11 20:45 ` Peter Zijlstra
2010-05-12 10:31 ` Srikar Dronamraju
2010-05-13 19:40 ` Oleg Nesterov
2010-05-13 19:59 ` Linus Torvalds
2010-05-13 22:12 ` Andi Kleen
2010-05-13 22:25 ` Linus Torvalds
2010-05-14 0:56 ` Roland McGrath
2010-05-14 5:42 ` Srikar Dronamraju
2010-05-11 20:43 ` Peter Zijlstra
2010-05-12 10:41 ` Srikar Dronamraju
2010-05-12 11:12 ` Peter Zijlstra
2010-05-12 14:24 ` Srikar Dronamraju
2010-05-11 20:32 ` Peter Zijlstra
2010-05-11 20:57 ` Frank Ch. Eigler
2010-05-11 21:01 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24 ` Steven Rostedt
2010-04-01 4:16 ` Masami Hiramatsu
2010-05-12 14:57 ` Frederic Weisbecker
2010-05-12 11:02 ` Frederic Weisbecker
2010-05-12 14:34 ` Srikar Dronamraju
2010-05-12 15:15 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100420153023.GA9351@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ananth@in.ibm.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jkenisto@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@redhat.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rdunlap@xenotime.net \
--cc=roland@redhat.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.