Date: Thu, 23 Sep 2010 14:55:40 -0400
From: Mathieu Desnoyers
To: Jason Baron
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, Ingo Molnar,
	Andrew Morton, Frederic Weisbecker, Andi Kleen, David Miller,
	"Paul E. McKenney", Rusty Russell, Masami Hiramatsu
Subject: Re: [PATCH 03/11] jump label: Base patch for jump label
Message-ID: <20100923185540.GA30262@Krystal>
References: <20100923034910.867858597@goodmis.org>
	<20100923035608.228041459@goodmis.org>
	<20100923143758.GA4022@Krystal>
	<20100923153902.GD2825@redhat.com>
	<20100923154852.GA12648@Krystal>
	<20100923184006.GE2825@redhat.com>
In-Reply-To: <20100923184006.GE2825@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

* Jason Baron (jbaron@redhat.com) wrote:
> On Thu, Sep 23, 2010 at 11:48:52AM -0400, Mathieu Desnoyers wrote:
> > * Jason Baron (jbaron@redhat.com) wrote:
> > > On Thu, Sep 23, 2010 at 10:37:58AM -0400, Mathieu Desnoyers wrote:
> > > > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > > > From: Jason Baron
> > > > >
> > > > > base patch to implement 'jump labeling'.
> > > > > Based on a new 'asm goto' inline assembly gcc mechanism, we can
> > > > > now branch to labels from an 'asm goto' statement. This allows us
> > > > > to create a 'no-op' fastpath, which can subsequently be patched
> > > > > with a jump to the slowpath code. This is useful for code which
> > > > > might be rarely used, but which we'd like to be able to call, if
> > > > > needed. Tracepoints are the current use case these are being
> > > > > implemented for.
> > > >
> > > > [...]
> > > >
> > > > > +/***
> > > > > + * jump_label_update - update jump label text
> > > > > + * @key - key value associated with a jump label
> > > > > + * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
> > > > > + *
> > > > > + * Will enable/disable the jump for jump label @key, depending on the
> > > > > + * value of @type.
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +void jump_label_update(unsigned long key, enum jump_label_type type)
> > > > > +{
> > > > > +	struct jump_entry *iter;
> > > > > +	struct jump_label_entry *entry;
> > > > > +	struct hlist_node *module_node;
> > > > > +	struct jump_label_module_entry *e_module;
> > > > > +	int count;
> > > > > +
> > > > > +	mutex_lock(&jump_label_mutex);
> > > > > +	entry = get_jump_label_entry((jump_label_t)key);
> > > > > +	if (entry) {
> > > > > +		count = entry->nr_entries;
> > > > > +		iter = entry->table;
> > > > > +		while (count--) {
> > > > > +			if (kernel_text_address(iter->code))
> > > >
> > > > As I pointed out in another thread, I'm concerned about the use of
> > > > kernel_text_address() without the module mutex here.
> > > > kernel_text_address() calls is_module_text_address(), which calls
> > > > __module_text_address() with preemption off.
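As a side note, the 'no-op' fastpath described in the patch changelog can be sketched in plain userspace C. This is a hypothetical illustration only, not the kernel's actual jump label macros: the function name is invented, and it assumes a gcc recent enough to support 'asm goto' (gcc >= 4.5). Unpatched, the no-op simply falls through:

```c
#include <stdbool.h>

/* Hypothetical userspace sketch of the 'asm goto' fastpath: the inline
 * asm emits a single no-op and declares 'do_trace' as a possible branch
 * target. The kernel would later patch the no-op into a jump to the
 * slowpath; here, unpatched, execution falls through. */
static bool trace_event_enabled(void)
{
	__asm__ goto("nop"		/* fastpath: one no-op, never jumps */
		     : : : : do_trace);	/* declared branch target */
	return false;			/* fallthrough: label disabled */
do_trace:
	return true;			/* reached only once patched to a jmp */
}
```

Since nothing ever rewrites the no-op in this sketch, the slow path is dead code as far as normal control flow is concerned, which is exactly what makes the disabled case nearly free.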
> > > >
> > > > __module_text_address() looks like:
> > > >
> > > > struct module *__module_address(unsigned long addr)
> > > > {
> > > > 	struct module *mod;
> > > >
> > > > 	if (addr < module_addr_min || addr > module_addr_max)
> > > > 		return NULL;
> > > >
> > > > 	list_for_each_entry_rcu(mod, &modules, list)
> > > > 		if (within_module_core(addr, mod)
> > > > 		    || within_module_init(addr, mod))
> > > > 			return mod;
> > > > 	return NULL;
> > > > }
> > > >
> > > > struct module *__module_text_address(unsigned long addr)
> > > > {
> > > > 	struct module *mod = __module_address(addr);
> > > > 	if (mod) {
> > > > 		/* Make sure it's within the text section. */
> > > > 		if (!within(addr, mod->module_init, mod->init_text_size)
> > > > 		    && !within(addr, mod->module_core, mod->core_text_size))
> > > > 			mod = NULL;
> > > > 	}
> > > > 	return mod;
> > > > }
> > > >
> > > > So the test for the address being in the module core is already
> > > > problematic, since we hold preemption off only within
> > > > is_module_text_address(). The is_module_text_address() caller is
> > > > then free to write to this address even after the module has been
> > > > unloaded and the module unload grace period has ended.
> > > >
> > > > Even worse, such a grace period is not waited for at module load
> > > > time within:
> > > >
> > > > init_module()
> > > > 	module_free(mod, mod->module_init);
> > > > 	mod->module_init = NULL;
> > > > 	mod->init_size = 0;
> > > > 	mod->init_text_size = 0;
> > > >
> > > > (done with module_mutex held, while the module is already in the
> > > > module list)
> > > >
> > > > We'd probably have to hold the module mutex around the
> > > > is_module_text_address() call and the address use (which can be a
> > > > pain), or to correctly address this part of init_module() with RCU
> > > > and require that preemption stay disabled across both the
> > > > __module_text_address() call site and the actual use of that
> > > > pointer (which does not fit with jump label, which needs to sleep,
> > > > so we'd have to move module.c to a preemptible
> > > > rcu_read_lock()/synchronize_rcu() C.S.).
> > > >
> > > > Thoughts ?
> > > >
> > >
> > > I was thinking about rcu_read_lock()/synchronize_rcu() for this
> > > race. We can hold rcu_read_lock() across the
> > > is_module_text_address() check in the jump label code, and then we
> > > can do, in module.c:
> > >
> > > mod->module_init = NULL;
> > > synchronize_rcu();
> > > module_free(mod, mod->module_init);
> >
> > Beware that you need to copy the module_init address first. Also make
> > sure you audit the (per-architecture) "module_free" implementations
> > to make sure they don't use "mod" in ways you did not foresee.
> >
> > > .
> > > .
> > > .
> > >
> > > or we could push the rcu_read_lock() further down into
> > > is_module_address()?
> >
> > We need to pull rcu_read_lock() further _up_. It needs to be held
> > across both is_module_address() and the actual use of the address,
> > otherwise the memory mapping can be removed underneath us.
> >
> > You can see the RCU read lock as keeping the memory mapping alive
> > for as long as it is held.
> >
> > We'd also need to add a synchronize_rcu() in module removal.
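To make the required ordering concrete, here is a hypothetical userspace mock of the publish-then-free sequence being discussed. The names (module_init_mem, synchronize_rcu_stub, free_init_section) are invented for illustration; in the kernel, the real synchronize_rcu() blocks until every pre-existing RCU read-side critical section has completed, which is what makes the subsequent free safe:

```c
#include <stdlib.h>

/* Stands in for mod->module_init. */
static void *module_init_mem;

/* Stub: the real synchronize_rcu() waits until all RCU readers that
 * could still see the old pointer value have finished. */
static void synchronize_rcu_stub(void)
{
}

/* Mirrors the proposed init_module() fix: copy the pointer, publish
 * NULL so new readers bail out, wait a grace period, then free. */
static void free_init_section(void)
{
	void *init_code = module_init_mem;	/* copy before clearing */
	module_init_mem = NULL;			/* new readers see NULL */
	synchronize_rcu_stub();			/* wait out in-flight readers */
	free(init_code);			/* kernel: module_free(mod, init_code) */
}
```

Note that copying the pointer before publishing NULL is what the "Beware that you need to copy the module_init address first" remark above is about: freeing mod->module_init after clearing it would free NULL and leak the section.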
>
> I agree that we need this synchronization for the module __init
> section. However, I believe we are OK for the module removal case.
> free_module() is called *after* the blocking_notifier_call_chain()
> call. blocking_notifier_call_chain() is going to call back into the
> jump label code, grab jump_label_mutex, and remove the reference to
> the module that is about to be freed. Thus, the jump label code can
> no longer reference it.

Ah, yes, this makes sense.

>
> So I think the following patch is all that is required here (lightly
> tested). Some comments below.
>
> Steve, I'll re-post as a separate patch, if we agree on this fix.
>
> thanks,
>
> -Jason
>
>
> jump label: fix __init module section race
>
> Jump label uses is_module_text_address() to ensure that the module
> __init sections are valid before updating them. However, between the
> check for a valid module __init section and the subsequent jump label
> update, the module's __init section could be freed out from under us.
>
> We fix this potential race by putting the address check *and* the
> jump label update under an rcu_read_lock(), and making sure a grace
> period has completed before we free the __init section.
>
> Thanks to Mathieu Desnoyers for pointing out this race condition.
>
> Signed-off-by: Jason Baron

You can put my

Reported-by: Mathieu Desnoyers

> ---
>  kernel/jump_label.c |    2 ++
>  kernel/module.c     |    5 ++++-
>  2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index f82878b..7830bfb 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -160,6 +160,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
>  			iter++;
>  		}
>  		/* eanble/disable jump labels in modules */

A separate patch for the typo (eanble -> enable) would be appropriate.
> +		rcu_read_lock();
>  		hlist_for_each_entry(e_module, module_node, &(entry->modules),
>  							hlist) {
>  			count = e_module->nr_entries;
> @@ -170,6 +171,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
>  				iter++;
>  			}
>  		}
> +		rcu_read_unlock();

Please note that the impact of this read lock is that synchronize_rcu()
cannot be called from text_poke anymore (because text_poke would then be
called within an RCU read-side C.S.). So hopefully Masami did not plan
to call synchronize_rcu() from the upcoming breakpoint-based text_poke.

>  	}
>  	mutex_unlock(&jump_label_mutex);
>  }
> diff --git a/kernel/module.c b/kernel/module.c
> index eba1341..09f7e9e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2692,6 +2692,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>  		unsigned long, len, const char __user *, uargs)
>  {
>  	struct module *mod;
> +	void *init_code;
>  	int ret = 0;
>
>  	/* Must have permission */
> @@ -2749,8 +2750,10 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>  	mod->symtab = mod->core_symtab;
>  	mod->strtab = mod->core_strtab;
>  #endif
> -	module_free(mod, mod->module_init);
> +	init_code = mod->module_init;
>  	mod->module_init = NULL;
> +	synchronize_rcu();
> +	module_free(mod, init_code);

I'm a bit concerned about the fact that "mod" is passed as a parameter
to module_free(). It does not seem to be used on x86, but have you
reviewed all the other architectures? I would feel more comfortable if
we removed this parameter from module_free() if it is indeed unused
everywhere.

Thanks,

Mathieu

>  	mod->init_size = 0;
>  	mod->init_text_size = 0;
>  	mutex_unlock(&module_mutex);
> --
> 1.7.1

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com