All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Jim Keniston <jkenisto@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Mon, 19 Apr 2010 21:31:39 +0200	[thread overview]
Message-ID: <20100419193139.GA24080@redhat.com> (raw)
In-Reply-To: <20100415093506.GA2064@linux.vnet.ibm.com>

Srikar, sorry for delay...

On 04/15, Srikar Dronamraju wrote:
>
> > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > +					struct task_struct *start)
> > > +{
> > > +	struct task_struct *tsk = start;
> > > +
> > > +	if (!start)
> > > +		return;
> > > +
> > > +	rcu_read_lock();
> > > +	do {
> > > +		if (tsk->utask) {
> > > +			kfree(tsk->utask);
> > > +			tsk->utask = NULL;
> > > +		}
> > > +		tsk = next_thread(tsk);
> >
> > This doesn't look right. We can't trust ->thread_group list even under
> > rcu_read_lock(). The task can exit and __exit_signal() can remove it
> > from ->thread_group list before we take rcu_read_lock().
>
> Oh Okay, I get that the thread could be exiting from the time we
> allocated the utask to the time we are cleaning up here and hence we
> could be leaking utask.
>
> Would it be okay if we explicitly (instead of the using
> tracehook_report_exit) call uprobe_free_utask() just after we set
> PF_EXITING. We could take the task_lock to synchronize with the
> add_utask() and do_exit().

Not sure I understand....

I meant, it is not safe to use next_thread(tsk) if tsk was already
removed from list by __unhash_process before we take rcu_read_lock().

> > > +static struct uprobe_task *add_utask(struct task_struct *t,
> > > +					struct uprobe_process *uproc)
> > > +{
> > > +	struct uprobe_task *utask;
> > > +
> > > +	if (!t)
> > > +		return NULL;
> > > +	utask = kzalloc(sizeof *utask, GFP_USER);
> > > +	if (unlikely(utask == NULL))
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	utask->uproc = uproc;
> > > +	utask->active_ppt = NULL;
> > > +	t->utask = utask;
> > > +	atomic_inc(&uproc->refcount);
> > > +
> > > +	return utask;
> > > +}
> >
> > This is called by create_uprocess(). Who will free t->utask if t has
> > already passed tracehook_report_exit() ?
>
> Okay.
> Would it work if we
>
> [... snip ...]

I think yes. But see below.

> > > +	 * Create and populate one utask per thread in this process.  We
> > > +	 * can't call add_utask() while holding RCU lock, so we:
> > > +	 *	1. rcu_read_lock()
> > > +	 *	2. Find the next thread, add_me, in this process that's not
> > > +	 *	having a utask struct allocated.
> > > +	 *	3. rcu_read_unlock()
> > > +	 *	4. add_utask(add_me, uproc)
> > > +	 *	Repeat 1-4 'til we have utasks for all threads.
> > > +	 */
> > > +	cur_t = get_pid_task(tg_leader, PIDTYPE_PID);
> > > +	do {
> > > +		utask = add_utask(cur_t, uproc);
> > > +		if (IS_ERR(utask)) {
> > > +			err = PTR_ERR(utask);
> > > +			goto fail;
> > > +		}
> > > +		add_me = find_next_thread(uproc, cur_t);
> > > +		put_task_struct(cur_t);
> > > +		cur_t = add_me;
> > > +	} while (add_me != NULL);
> >
> > can't we race with clone(CLONE_THREAD) and miss the new thread? Probably
> > I missed something, but afaics we need some barriers to ensure that either
> > tracehook_report_clone() sees current->utask != NULL or find_next_thread()
> > sees the new thread in ->thread_group.
>
> The tracehook_report_clone is called after the element gets added to the
> thread_group list in copy_process().
> Looking at three cases where current thread could be cloning a new thread.
>
> a) current thread has a utask and tracehook_report_clone is not yet
> called.
> 	utask for the new thread will be created by either
> tracehook_report_clone or the find_next_thread whichever comes first.

Yes, but my point was, we probably need mb's on both sides. Of course,
this is only theoretical problem, but tracehook_report_clone() can read
current->utask == NULL before the result of copy_process()->list_add_tail()
is visible to another CPU which does create_uprocess().

> > > +static struct pid *get_tg_leader(pid_t p)
> > > +{
> > > +	struct pid *pid = NULL;
> > > +
> > > +	rcu_read_lock();
> > > +	if (current->nsproxy)
> > > +		pid = find_vpid(p);
> >
> > Is it really possible to call register/unregister with nsproxy == NULL?

You didn't answer ;)

> Do you see a need for checking if the process is exiting before we place
> the probes?

Oh, I don't know. You are going to change this code anyway, I can't see
in advance.


I tried to read the next 8/11 patch, and I have a couple more random questions.

	- uprobe_process->tg_leader is not really used ?

	- looks like, 7/11 can't be compiled without the next 8/11 ?
	  say, the next patch defines arch_uprobe_disable_sstep() but
	  it is used by 7/11

	- I don't understand why do we need uprobe_{en,dis}able_interrupts
	  helpers. pre_ssout() could just do local_irq_enable(). This path
	  leads to get_signal_to_deliver() which enables irqs anyway, it is
	  always safe to do this earlier and I don't think we need to disable
	  irqs again later. In any case, I don't understand why these helpers
	  use native_irq_xxx().

	- pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
	  bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
	  under lock.

	- pre_ssout() does user_bkpt_set_ip() after user_bkpt_pre_sstep().
	  Why? Shouldn't user_bkpt_pre_sstep() always set regs->ip ?
	  Otherwise uprobe_bkpt_notifier()->user_bkpt_pre_sstep() is not
	  right.

	- I don't really understand why ->handler_in_interrupt is really
	  useful, but never mind.

	- However, handler_in_interrupt && !uses_xol_strategy() doesn't
	  look right. uprobe_bkpt_notifier() is called with irqs disabled,
	  right? but set_orig_insn() is might_sleep().


> > > +	} else {
> > > +		struct uprobe_probept *ppt;
> > > +		int ret;
> > > +
> > > +		/*
> > > +		 * New process spawned by parent.  Remove the probepoints
> > > +		 * in the child's text.
> > > +		 *
> > > +		 * We also hold the uproc->mutex for the parent - so no
> > > +		 * new uprobes will be registered 'til we return.
> > > +		 */
> > > +		mutex_lock(&uproc->mutex);
> > > +		ctask = child->utask;
> > > +		if (unlikely(ctask)) {
> > > +			/*
> > > +			 * create_uprocess() ran just as this fork
> > > +			 * happened, and has already created a new utask.
> > > +			 */
> > > +			mutex_unlock(&uproc->mutex);
> > > +			return;
> > > +		}
> > > +		list_for_each_entry(ppt, &uproc->uprobe_list, ut_node) {
> > > +			ret = user_bkpt_remove_bkpt(child, &ppt->user_bkpt);
> >
> > OK, iiuc this should restore the original instruction, right?
> >
> > But what about clone(CLONE_VM)? In this case this child shares ->mm with
> > parent.
>
> Okay, So I will remove the breakpoints only for ! CLONE(CLONE_VM) and
> !CLONE(CLONE_THREAD)
> For CLONE(CLONE_VM) I will create a new uproc and utask structures.
> Since mm is shared; I guess the XOL vma gets shared between the processes.

Yes, I think CLONE_VM without CLONE_THREAD needs utask too, but do we need
the new uproc? OK, please forget about this for the moment.

Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.

I think uprobe_process should be per ->mm, not per-process.

I wonder if there any possibility to avoid task_struct->utask, or at least,
if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.

Oleg.


  reply	other threads:[~2010-04-19 19:39 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35   ` Oleg Nesterov
2010-04-15  9:35     ` Srikar Dronamraju
2010-04-19 19:31       ` Oleg Nesterov [this message]
2010-04-20 12:43         ` Srikar Dronamraju
2010-04-20 15:30           ` Oleg Nesterov
2010-04-21  6:59             ` Srikar Dronamraju
2010-04-21 16:05               ` Oleg Nesterov
2010-04-22 13:31                 ` Srikar Dronamraju
2010-04-22 15:40                   ` Oleg Nesterov
2010-04-23 14:58                     ` Srikar Dronamraju
2010-04-23 18:53                       ` Oleg Nesterov
2010-05-11 20:47                       ` Peter Zijlstra
2010-05-11 20:44                     ` Peter Zijlstra
2010-05-11 20:45                     ` Peter Zijlstra
2010-05-12 10:31                       ` Srikar Dronamraju
2010-05-13 19:40                       ` Oleg Nesterov
2010-05-13 19:59                         ` Linus Torvalds
2010-05-13 22:12                           ` Andi Kleen
2010-05-13 22:25                             ` Linus Torvalds
2010-05-14  0:56                           ` Roland McGrath
2010-05-14  5:42                           ` Srikar Dronamraju
2010-05-11 20:43                   ` Peter Zijlstra
2010-05-12 10:41                     ` Srikar Dronamraju
2010-05-12 11:12                       ` Peter Zijlstra
2010-05-12 14:24                         ` Srikar Dronamraju
2010-05-11 20:32             ` Peter Zijlstra
2010-05-11 20:57               ` Frank Ch. Eigler
2010-05-11 21:01                 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24   ` Steven Rostedt
2010-04-01  4:16     ` Masami Hiramatsu
2010-05-12 14:57       ` Frederic Weisbecker
2010-05-12 11:02   ` Frederic Weisbecker
2010-05-12 14:34     ` Srikar Dronamraju
2010-05-12 15:15   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100419193139.GA24080@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=jkenisto@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rdunlap@xenotime.net \
    --cc=roland@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.