linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Masami Hiramatsu <mhiramat@redhat.com>,
	Randy Dunlap <rdunlap@xenotime.net>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Jim Keniston <jkenisto@linux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Roland McGrath <roland@redhat.com>
Subject: Re: [PATCH v2 7/11] Uprobes Implementation
Date: Mon, 19 Apr 2010 21:31:39 +0200	[thread overview]
Message-ID: <20100419193139.GA24080@redhat.com> (raw)
In-Reply-To: <20100415093506.GA2064@linux.vnet.ibm.com>

Srikar, sorry for delay...

On 04/15, Srikar Dronamraju wrote:
>
> > > +static void cleanup_uprocess(struct uprobe_process *uproc,
> > > +					struct task_struct *start)
> > > +{
> > > +	struct task_struct *tsk = start;
> > > +
> > > +	if (!start)
> > > +		return;
> > > +
> > > +	rcu_read_lock();
> > > +	do {
> > > +		if (tsk->utask) {
> > > +			kfree(tsk->utask);
> > > +			tsk->utask = NULL;
> > > +		}
> > > +		tsk = next_thread(tsk);
> >
> > This doesn't look right. We can't trust ->thread_group list even under
> > rcu_read_lock(). The task can exit and __exit_signal() can remove it
> > from ->thread_group list before we take rcu_read_lock().
>
> Oh Okay, I get that the thread could be exiting from the time we
> allocated the utask to the time we are cleaning up here and hence we
> could be leaking utask.
>
> Would it be okay if we explicitly (instead of the using
> tracehook_report_exit) call uprobe_free_utask() just after we set
> PF_EXITING. We could take the task_lock to synchronize with the
> add_utask() and do_exit().

Not sure I understand....

I meant, it is not safe to use next_thread(tsk) if tsk was already
removed from list by __unhash_process before we take rcu_read_lock().

> > > +static struct uprobe_task *add_utask(struct task_struct *t,
> > > +					struct uprobe_process *uproc)
> > > +{
> > > +	struct uprobe_task *utask;
> > > +
> > > +	if (!t)
> > > +		return NULL;
> > > +	utask = kzalloc(sizeof *utask, GFP_USER);
> > > +	if (unlikely(utask == NULL))
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	utask->uproc = uproc;
> > > +	utask->active_ppt = NULL;
> > > +	t->utask = utask;
> > > +	atomic_inc(&uproc->refcount);
> > > +
> > > +	return utask;
> > > +}
> >
> > This is called by create_uprocess(). Who will free t->utask if t has
> > already passed tracehook_report_exit() ?
>
> Okay.
> Would it work if we
>
> [... snip ...]

I think yes. But see below.

> > > +	 * Create and populate one utask per thread in this process.  We
> > > +	 * can't call add_utask() while holding RCU lock, so we:
> > > +	 *	1. rcu_read_lock()
> > > +	 *	2. Find the next thread, add_me, in this process that's not
> > > +	 *	having a utask struct allocated.
> > > +	 *	3. rcu_read_unlock()
> > > +	 *	4. add_utask(add_me, uproc)
> > > +	 *	Repeat 1-4 'til we have utasks for all threads.
> > > +	 */
> > > +	cur_t = get_pid_task(tg_leader, PIDTYPE_PID);
> > > +	do {
> > > +		utask = add_utask(cur_t, uproc);
> > > +		if (IS_ERR(utask)) {
> > > +			err = PTR_ERR(utask);
> > > +			goto fail;
> > > +		}
> > > +		add_me = find_next_thread(uproc, cur_t);
> > > +		put_task_struct(cur_t);
> > > +		cur_t = add_me;
> > > +	} while (add_me != NULL);
> >
> > can't we race with clone(CLONE_THREAD) and miss the new thread? Probably
> > I missed something, but afaics we need some barriers to ensure that either
> > tracehook_report_clone() sees current->utask != NULL or find_next_thread()
> > sees the new thread in ->thread_group.
>
> The tracehook_report_clone is called after the element gets added to the
> thread_group list in copy_process().
> Looking at three cases where current thread could be cloning a new thread.
>
> a) current thread has a utask and tracehook_report_clone is not yet
> called.
> 	utask for the new thread will be created by either
> tracehook_report_clone or the find_next_thread whichever comes first.

Yes, but my point was, we probably need mb's on both sides. Of course,
this is only theoretical problem, but tracehook_report_clone() can read
current->utask == NULL before the result of copy_process()->list_add_tail()
is visible to another CPU which does create_uprocess().

> > > +static struct pid *get_tg_leader(pid_t p)
> > > +{
> > > +	struct pid *pid = NULL;
> > > +
> > > +	rcu_read_lock();
> > > +	if (current->nsproxy)
> > > +		pid = find_vpid(p);
> >
> > Is it really possible to call register/unregister with nsproxy == NULL?

You didn't answer ;)

> Do you see a need for checking if the process is exiting before we place
> the probes?

Oh, I don't know. You are going to change this code anyway, I can't see
in advance.


I tried to read the next 8/11 patch, and I have a couple more random questions.

	- uprobe_process->tg_leader is not really used ?

	- looks like, 7/11 can't be compiled without the next 8/11 ?
	  say, the next patch defines arch_uprobe_disable_sstep() but
	  it is used by 7/11

	- I don't understand why do we need uprobe_{en,dis}able_interrupts
	  helpers. pre_ssout() could just do local_irq_enable(). This path
	  leads to get_signal_to_deliver() which enables irqs anyway, it is
	  always safe to do this earlier and I don't think we need to disable
	  irqs again later. In any case, I don't understand why these helpers
	  use native_irq_xxx().

	- pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a
	  bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly
	  under lock.

	- pre_ssout() does user_bkpt_set_ip() after user_bkpt_pre_sstep().
	  Why? Shouldn't user_bkpt_pre_sstep() always set regs->ip ?
	  Otherwise uprobe_bkpt_notifier()->user_bkpt_pre_sstep() is not
	  right.

	- I don't really understand why ->handler_in_interrupt is really
	  useful, but never mind.

	- However, handler_in_interrupt && !uses_xol_strategy() doesn't
	  look right. uprobe_bkpt_notifier() is called with irqs disabled,
	  right? but set_orig_insn() is might_sleep().


> > > +	} else {
> > > +		struct uprobe_probept *ppt;
> > > +		int ret;
> > > +
> > > +		/*
> > > +		 * New process spawned by parent.  Remove the probepoints
> > > +		 * in the child's text.
> > > +		 *
> > > +		 * We also hold the uproc->mutex for the parent - so no
> > > +		 * new uprobes will be registered 'til we return.
> > > +		 */
> > > +		mutex_lock(&uproc->mutex);
> > > +		ctask = child->utask;
> > > +		if (unlikely(ctask)) {
> > > +			/*
> > > +			 * create_uprocess() ran just as this fork
> > > +			 * happened, and has already created a new utask.
> > > +			 */
> > > +			mutex_unlock(&uproc->mutex);
> > > +			return;
> > > +		}
> > > +		list_for_each_entry(ppt, &uproc->uprobe_list, ut_node) {
> > > +			ret = user_bkpt_remove_bkpt(child, &ppt->user_bkpt);
> >
> > OK, iiuc this should restore the original instruction, right?
> >
> > But what about clone(CLONE_VM)? In this case this child shares ->mm with
> > parent.
>
> Okay, So I will remove the breakpoints only for ! CLONE(CLONE_VM) and
> !CLONE(CLONE_THREAD)
> For CLONE(CLONE_VM) I will create a new uproc and utask structures.
> Since mm is shared; I guess the XOL vma gets shared between the processes.

Yes, I think CLONE_VM without CLONE_THREAD needs utask too, but do we need
the new uproc? OK, please forget about this for the moment.

Suppose that register_uprobe() succeeds and does set_bkpt(). What if another
process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier()
will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills
this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not.

I think uprobe_process should be per ->mm, not per-process.

I wonder if there any possibility to avoid task_struct->utask, or at least,
if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure.

Oleg.


  reply	other threads:[~2010-04-19 19:39 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-31 15:51 [PATCH v2 0/11] Uprobes patches Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 1/11] Move Macro W to insn.h Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 2/11] Move replace_page() to mm/memory.c Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 3/11] Enhance replace_page() to support pagecache Srikar Dronamraju
2010-03-31 15:51 ` [PATCH v2 4/11] User Space Breakpoint Assistance Layer Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 5/11] X86 details for user space breakpoint assistance Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 6/11] Slot allocation for Execution out of line Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 7/11] Uprobes Implementation Srikar Dronamraju
2010-04-13 18:35   ` Oleg Nesterov
2010-04-15  9:35     ` Srikar Dronamraju
2010-04-19 19:31       ` Oleg Nesterov [this message]
2010-04-20 12:43         ` Srikar Dronamraju
2010-04-20 15:30           ` Oleg Nesterov
2010-04-21  6:59             ` Srikar Dronamraju
2010-04-21 16:05               ` Oleg Nesterov
2010-04-22 13:31                 ` Srikar Dronamraju
2010-04-22 15:40                   ` Oleg Nesterov
2010-04-23 14:58                     ` Srikar Dronamraju
2010-04-23 18:53                       ` Oleg Nesterov
2010-05-11 20:47                       ` Peter Zijlstra
2010-05-11 20:44                     ` Peter Zijlstra
2010-05-11 20:45                     ` Peter Zijlstra
2010-05-12 10:31                       ` Srikar Dronamraju
2010-05-13 19:40                       ` Oleg Nesterov
2010-05-13 19:59                         ` Linus Torvalds
2010-05-13 22:12                           ` Andi Kleen
2010-05-13 22:25                             ` Linus Torvalds
2010-05-14  0:56                           ` Roland McGrath
2010-05-14  5:42                           ` Srikar Dronamraju
2010-05-11 20:43                   ` Peter Zijlstra
2010-05-12 10:41                     ` Srikar Dronamraju
2010-05-12 11:12                       ` Peter Zijlstra
2010-05-12 14:24                         ` Srikar Dronamraju
2010-05-11 20:32             ` Peter Zijlstra
2010-05-11 20:57               ` Frank Ch. Eigler
2010-05-11 21:01                 ` Peter Zijlstra
2010-03-31 15:52 ` [PATCH v2 8/11] X86 details for uprobes Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 9/11] Uprobes Documentation patch Srikar Dronamraju
2010-03-31 15:52 ` [PATCH v2 10/11] Uprobes samples Srikar Dronamraju
2010-03-31 15:53 ` [PATCH v2 11/11] Uprobes traceevents patch Srikar Dronamraju
2010-03-31 21:24   ` Steven Rostedt
2010-04-01  4:16     ` Masami Hiramatsu
2010-05-12 14:57       ` Frederic Weisbecker
2010-05-12 11:02   ` Frederic Weisbecker
2010-05-12 14:34     ` Srikar Dronamraju
2010-05-12 15:15   ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100419193139.GA24080@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=fche@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=jkenisto@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rdunlap@xenotime.net \
    --cc=roland@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).