From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752695Ab0DSTjK (ORCPT ); Mon, 19 Apr 2010 15:39:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21993 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751209Ab0DSTjI (ORCPT ); Mon, 19 Apr 2010 15:39:08 -0400 Date: Mon, 19 Apr 2010 21:31:39 +0200 From: Oleg Nesterov To: Srikar Dronamraju Cc: Peter Zijlstra , Ingo Molnar , Andrew Morton , Linus Torvalds , Masami Hiramatsu , Randy Dunlap , Ananth N Mavinakayanahalli , Jim Keniston , Frederic Weisbecker , "Frank Ch. Eigler" , LKML , Roland McGrath Subject: Re: [PATCH v2 7/11] Uprobes Implementation Message-ID: <20100419193139.GA24080@redhat.com> References: <20100331155106.4181.50759.sendpatchset@localhost6.localdomain6> <20100331155228.4181.61294.sendpatchset@localhost6.localdomain6> <20100413183537.GA17538@redhat.com> <20100415093506.GA2064@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100415093506.GA2064@linux.vnet.ibm.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Srikar, sorry for delay... On 04/15, Srikar Dronamraju wrote: > > > > +static void cleanup_uprocess(struct uprobe_process *uproc, > > > + struct task_struct *start) > > > +{ > > > + struct task_struct *tsk = start; > > > + > > > + if (!start) > > > + return; > > > + > > > + rcu_read_lock(); > > > + do { > > > + if (tsk->utask) { > > > + kfree(tsk->utask); > > > + tsk->utask = NULL; > > > + } > > > + tsk = next_thread(tsk); > > > > This doesn't look right. We can't trust ->thread_group list even under > > rcu_read_lock(). The task can exit and __exit_signal() can remove it > > from ->thread_group list before we take rcu_read_lock(). > > Oh Okay, I get that the thread could be exiting from the time we > allocated the utask to the time we are cleaning up here and hence we > could be leaking utask. > > Would it be okay if we explicitly (instead of the using > tracehook_report_exit) call uprobe_free_utask() just after we set > PF_EXITING. We could take the task_lock to synchronize with the > add_utask() and do_exit(). Not sure I understand.... I meant, it is not safe to use next_thread(tsk) if tsk was already removed from list by __unhash_process before we take rcu_read_lock(). > > > +static struct uprobe_task *add_utask(struct task_struct *t, > > > + struct uprobe_process *uproc) > > > +{ > > > + struct uprobe_task *utask; > > > + > > > + if (!t) > > > + return NULL; > > > + utask = kzalloc(sizeof *utask, GFP_USER); > > > + if (unlikely(utask == NULL)) > > > + return ERR_PTR(-ENOMEM); > > > + > > > + utask->uproc = uproc; > > > + utask->active_ppt = NULL; > > > + t->utask = utask; > > > + atomic_inc(&uproc->refcount); > > > + > > > + return utask; > > > +} > > > > This is called by create_uprocess(). Who will free t->utask if t has > > already passed tracehook_report_exit() ? > > Okay. > Would it work if we > > [... snip ...] I think yes. But see below. > > > + * Create and populate one utask per thread in this process. We > > > + * can't call add_utask() while holding RCU lock, so we: > > > + * 1. rcu_read_lock() > > > + * 2. Find the next thread, add_me, in this process that's not > > > + * having a utask struct allocated. > > > + * 3. rcu_read_unlock() > > > + * 4. add_utask(add_me, uproc) > > > + * Repeat 1-4 'til we have utasks for all threads. > > > + */ > > > + cur_t = get_pid_task(tg_leader, PIDTYPE_PID); > > > + do { > > > + utask = add_utask(cur_t, uproc); > > > + if (IS_ERR(utask)) { > > > + err = PTR_ERR(utask); > > > + goto fail; > > > + } > > > + add_me = find_next_thread(uproc, cur_t); > > > + put_task_struct(cur_t); > > > + cur_t = add_me; > > > + } while (add_me != NULL); > > > > can't we race with clone(CLONE_THREAD) and miss the new thread? Probably > > I missed something, but afaics we need some barriers to ensure that either > > tracehook_report_clone() sees current->utask != NULL or find_next_thread() > > sees the new thread in ->thread_group. > > The tracehook_report_clone is called after the element gets added to the > thread_group list in copy_process(). > Looking at three cases where current thread could be cloning a new thread. > > a) current thread has a utask and tracehook_report_clone is not yet > called. > utask for the new thread will be created by either > tracehook_report_clone or the find_next_thread whichever comes first. Yes, but my point was, we probably need mb's on both sides. Of course, this is only theoretical problem, but tracehook_report_clone() can read current->utask == NULL before the result of copy_process()->list_add_tail() is visible to another CPU which does create_uprocess(). > > > +static struct pid *get_tg_leader(pid_t p) > > > +{ > > > + struct pid *pid = NULL; > > > + > > > + rcu_read_lock(); > > > + if (current->nsproxy) > > > + pid = find_vpid(p); > > > > Is it really possible to call register/unregister with nsproxy == NULL? You didn't answer ;) > Do you see a need for checking if the process is exiting before we place > the probes? Oh, I don't know. You are going to change this code anyway, I can't see in advance. I tried to read the next 8/11 patch, and I have a couple more random questions. - uprobe_process->tg_leader is not really used ? - looks like, 7/11 can't be compiled without the next 8/11 ? say, the next patch defines arch_uprobe_disable_sstep() but it is used by 7/11 - I don't understand why do we need uprobe_{en,dis}able_interrupts helpers. pre_ssout() could just do local_irq_enable(). This path leads to get_signal_to_deliver() which enables irqs anyway, it is always safe to do this earlier and I don't think we need to disable irqs again later. In any case, I don't understand why these helpers use native_irq_xxx(). - pre_ssout() does .xol_vaddr = xol_get_insn_slot(). This looks a bit confusing, xol_get_insn_slot() should set .xol_vaddr correctly under lock. - pre_ssout() does user_bkpt_set_ip() after user_bkpt_pre_sstep(). Why? Shouldn't user_bkpt_pre_sstep() always set regs->ip ? Otherwise uprobe_bkpt_notifier()->user_bkpt_pre_sstep() is not right. - I don't really understand why ->handler_in_interrupt is really useful, but never mind. - However, handler_in_interrupt && !uses_xol_strategy() doesn't look right. uprobe_bkpt_notifier() is called with irqs disabled, right? but set_orig_insn() is might_sleep(). > > > + } else { > > > + struct uprobe_probept *ppt; > > > + int ret; > > > + > > > + /* > > > + * New process spawned by parent. Remove the probepoints > > > + * in the child's text. > > > + * > > > + * We also hold the uproc->mutex for the parent - so no > > > + * new uprobes will be registered 'til we return. > > > + */ > > > + mutex_lock(&uproc->mutex); > > > + ctask = child->utask; > > > + if (unlikely(ctask)) { > > > + /* > > > + * create_uprocess() ran just as this fork > > > + * happened, and has already created a new utask. > > > + */ > > > + mutex_unlock(&uproc->mutex); > > > + return; > > > + } > > > + list_for_each_entry(ppt, &uproc->uprobe_list, ut_node) { > > > + ret = user_bkpt_remove_bkpt(child, &ppt->user_bkpt); > > > > OK, iiuc this should restore the original instruction, right? > > > > But what about clone(CLONE_VM)? In this case this child shares ->mm with > > parent. > > Okay, So I will remove the breakpoints only for ! CLONE(CLONE_VM) and > !CLONE(CLONE_THREAD) > For CLONE(CLONE_VM) I will create a new uproc and utask structures. > Since mm is shared; I guess the XOL vma gets shared between the processes. Yes, I think CLONE_VM without CLONE_THREAD needs utask too, but do we need the new uproc? OK, please forget about this for the moment. Suppose that register_uprobe() succeeds and does set_bkpt(). What if another process (not sub-thread) with the same ->mm hits this bp? uprobe_bkpt_notifier() will see ->utask == NULL and return 0. Then do_int3() sends SIGTRAP and kills this task. OK, probably CLONE_VM alone is exotic, but CLONE_VFORK | VM is not. I think uprobe_process should be per ->mm, not per-process. I wonder if there any possibility to avoid task_struct->utask, or at least, if we can allocate it in uprobe_bkpt_notifier() on demand. Not sure. Oleg.