From: Oleg Nesterov <oleg@redhat.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Masami Hiramatsu <mhiramat@redhat.com>,
Randy Dunlap <rdunlap@xenotime.net>,
Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
Jim Keniston <jkenisto@linux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
"Frank Ch. Eigler" <fche@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Mon, 19 Apr 2010 21:31:39 +0200 [thread overview]
Message-ID: <20100419193139.GA24080@redhat.com> (raw)
In-Reply-To: <20100415093506.GA2064@linux.vnet.ibm.com>
Srikar, sorry for delay...
On 04/15, Srikar Dronamraju wrote:
>
> > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > + struct task_struct *start)
> > > +{
> > > + struct task_struct *tsk = start;
> > > +
> > > + if (!start)
> > > + return;
> > > +
> > > + rcu_read_lock();
> > > + do {
> > > + if (tsk->utask) {
> > > + kfree(tsk->utask);
> > > + tsk->utask = NULL;
> > > + }
> > > + tsk = next_thread(tsk);
> >
> > This doesn't look right. We can't trust ->thread_group list even under
> > rcu_read_lock(). The task can exit and __exit_signal() can remove it
> > from ->thread_group list before we take rcu_read_lock().
>
> Oh Okay, I get that the thread could be exiting from the time we
> allocated the utask to the time we are cleaning up here and hence we
> could be leaking utask.
>
> Would it be okay if we explicitly (instead of the using
> tracehook_report_exit) call uprobe_free_utask() just after we set
> PF_EXITING. We could take the task_lock to synchronize with the
> add_utask() and do_exit().
Not sure I understand....
I meant, it is not safe to use next_thread(tsk) if tsk was already
removed from list by __unhash_process before we take rcu_read_lock().
> > > +static struct uprobe_task *add_utask(struct task_struct *t,
> > > + struct uprobe_process *uproc)
> > > +{
> > > + struct uprobe_task *utask;
> > > +
> > > + if (!t)
> > > + return NULL;
> > > + utask = kzalloc(sizeof *utask, GFP_USER);
> > > + if (unlikely(utask == NULL))
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + utask->uproc = uproc;
> > > + utask->active_ppt = NULL;
> > > + t->utask = utask;
> > > + atomic_inc(&uproc->refcount);
> > > +
> > > + return utask;
> > > +}
> >
> > This is called by create_uprocess(). Who will free t->utask if t has
> > already passed tracehook_report_exit() ?
>
> Okay.
> Would it work if we
>
> [... snip ...]
I think yes. But see below.
> > > + * Create and populate one utask per thread in this process. We
> > > + * can't call add_utask() while holding RCU lock, so we:
> > > + * 1. rcu_read_lock()
> > > + * 2. Find the next thread, add_me, in this process that's not
> > > + * having a utask struct allocated.
> > > + * 3. rcu_read_unlock()
> > > + * 4. add_utask(add_me, uproc)
> > > + * Repeat 1-4 'til we have utasks for all threads.
> > > + */
> > > + cur_t = get_pid_task(tg_leader, PIDTYPE_PID);
> > > + do {
> > > + utask = add_utask(cur_t, uproc);
> > > + if (IS_ERR(utask)) {
> > > + err = PTR_ERR(utask);
> > > + goto fail;
> > > + }
> > > + add_me = find_next_thread(uproc, cur_t);
> > > + put_task_struct(cur_t);
> > > + cur_t = add_me;
> > > + } while (add_me != NULL);
> >
> > can't we race with clone(CLONE_THREAD) and miss the new thread? Probably
> > I missed something, but afaics we need some barriers to ensure that either
> > tracehook_report_clone() sees current->utask != NULL or find_next_thread()
> > sees the new thread in ->thread_group.
>
> The tracehook_report_clone is called after the element gets added to the
> thread_group list in copy_process().
> Looking at three cases where current thread could be cloning a new thread.
>
> a) current thread has a utask and tracehook_report_clone is not yet
> called.
> utask for the new thread will be created by either
> tracehook_report_clone or the find_next_thread whichever comes first.
Yes, but my point was, we probably need mb's on both sides. Of course,
this is only theoretical problem, but tracehook_report_clone() can read
current->utask == NULL before the result of copy_process()->list_add_tail()
is visible to another CPU which does create_uprocess().
> > > +static struct pid *get_tg_leader(pid_t p)
> > > +{
> > > + struct pid *pid = NULL;
> > > +
> > > + rcu_read_lock();
> > > + if (current->nsproxy)
> > > + pid = find_vpid(p);
> >
> > Is it really possible to call register/unregister with nsproxy == NULL?
You didn't answer ;)
> Do you see a need for checking if the process is exiting before we place
> the probes?
Oh, I don't know. You are going to change this code anyway, I can't see
in advance.
I tried to read the next 8/11 patch, and I have a couple more random questions.
- uprobe_process->tg_leader is not really used ?
- looks like, 7/11 can't be compiled without the next 8/11 ?
say, the next patch defines arch_uprobe_disable_sstep() but
it is used by 7/11
- I don't understand why do we need uprobe_{en,dis}able_interrupts
helpers. pre_ssout() could just do local_irq_enable(). This path
leads to get_signal_to_deliver() which enables irqs anyway, it is
always safe to do this earlier and I don't think we need to disable
irqs again later. In any case, I don't understand why these helpers
use native_irq_xxx().
- pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
under lock.
- pre_ssout() does user_bkpt_set_ip() after user_bkpt_pre_sstep().
Why? Shouldn't user_bkpt_pre_sstep() always set regs->ip ?
Otherwise uprobe_bkpt_notifier()->user_bkpt_pre_sstep() is not
right.
- I don't really understand why ->handler_in_interrupt is really
useful, but never mind.
- However, handler_in_interrupt && !uses_xol_strategy() doesn't
look right. uprobe_bkpt_notifier() is called with irqs disabled,
right? but set_orig_insn() is might_sleep().
> > > + } else {
> > > + struct uprobe_probept *ppt;
> > > + int ret;
> > > +
> > > + /*
> > > + * New process spawned by parent. Remove the probepoints
> > > + * in the child's text.
> > > + *
> > > + * We also hold the uproc->mutex for the parent - so no
> > > + * new uprobes will be registered 'til we return.
> > > + */
> > > + mutex_lock(&uproc->mutex);
> > > + ctask = child->utask;
> > > + if (unlikely(ctask)) {
> > > + /*
> > > + * create_uprocess() ran just as this fork
> > > + * happened, and has already created a new utask.
> > > + */
> > > + mutex_unlock(&uproc->mutex);
> > > + return;
> > > + }
> > > + list_for_each_entry(ppt, &uproc->uprobe_list, ut_node) {
> > > + ret = user_bkpt_remove_bkpt(child, &ppt->user_bkpt);
> >
> > OK, iiuc this should restore the original instruction, right?
> >
> > But what about clone(CLONE_VM)? In this case this child shares ->mm with
> > parent.
>
> Okay, So I will remove the breakpoints only for ! CLONE(CLONE_VM) and
> !CLONE(CLONE_THREAD)
> For CLONE(CLONE_VM) I will create a new uproc and utask structures.
> Since mm is shared; I guess the XOL vma gets shared between the processes.
Yes, I think CLONE_VM without CLONE_THREAD needs utask too, but do we need
the new uproc? OK, please forget about this for the moment.
Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.
I think uprobe_process should be per ->mm, not per-process.
I wonder if there any possibility to avoid task_struct->utask, or at least,
if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.
Oleg.
next prev parent reply other threads:[~2010-04-19 19:39 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35 ` Oleg Nesterov
2010-04-15 9:35 ` Srikar Dronamraju
2010-04-19 19:31 ` Oleg Nesterov [this message]
2010-04-20 12:43 ` Srikar Dronamraju
2010-04-20 15:30 ` Oleg Nesterov
2010-04-21 6:59 ` Srikar Dronamraju
2010-04-21 16:05 ` Oleg Nesterov
2010-04-22 13:31 ` Srikar Dronamraju
2010-04-22 15:40 ` Oleg Nesterov
2010-04-23 14:58 ` Srikar Dronamraju
2010-04-23 18:53 ` Oleg Nesterov
2010-05-11 20:47 ` Peter Zijlstra
2010-05-11 20:44 ` Peter Zijlstra
2010-05-11 20:45 ` Peter Zijlstra
2010-05-12 10:31 ` Srikar Dronamraju
2010-05-13 19:40 ` Oleg Nesterov
2010-05-13 19:59 ` Linus Torvalds
2010-05-13 22:12 ` Andi Kleen
2010-05-13 22:25 ` Linus Torvalds
2010-05-14 0:56 ` Roland McGrath
2010-05-14 5:42 ` Srikar Dronamraju
2010-05-11 20:43 ` Peter Zijlstra
2010-05-12 10:41 ` Srikar Dronamraju
2010-05-12 11:12 ` Peter Zijlstra
2010-05-12 14:24 ` Srikar Dronamraju
2010-05-11 20:32 ` Peter Zijlstra
2010-05-11 20:57 ` Frank Ch. Eigler
2010-05-11 21:01 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24 ` Steven Rostedt
2010-04-01 4:16 ` Masami Hiramatsu
2010-05-12 14:57 ` Frederic Weisbecker
2010-05-12 11:02 ` Frederic Weisbecker
2010-05-12 14:34 ` Srikar Dronamraju
2010-05-12 15:15 ` Frederic Weisbecker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100419193139.GA24080@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ananth@in.ibm.com \
--cc=fche@redhat.com \
--cc=fweisbec@gmail.com \
--cc=jkenisto@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@redhat.com \
--cc=mingo@elte.hu \
--cc=peterz@infradead.org \
--cc=rdunlap@xenotime.net \
--cc=roland@redhat.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).