public inbox for linux-kernel@vger.kernel.org
From: Jason Baron <jbaron@redhat.com>
To: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Andi Kleen <andi@firstfloor.org>,
	David Miller <davem@davemloft.net>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 03/11] jump label: Base patch for jump label
Date: Thu, 23 Sep 2010 14:40:06 -0400
Message-ID: <20100923184006.GE2825@redhat.com>
In-Reply-To: <20100923154852.GA12648@Krystal>

On Thu, Sep 23, 2010 at 11:48:52AM -0400, Mathieu Desnoyers wrote:
> * Jason Baron (jbaron@redhat.com) wrote:
> > On Thu, Sep 23, 2010 at 10:37:58AM -0400, Mathieu Desnoyers wrote:
> > > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > > From: Jason Baron <jbaron@redhat.com>
> > > > 
> > > > base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
> > > > assembly gcc mechanism, we can now branch to labels from an 'asm goto'
> > > > statement. This allows us to create a 'no-op' fastpath, which can subsequently
> > > > be patched with a jump to the slowpath code. This is useful for code which
> > > > might be rarely used, but which we'd like to be able to call, if needed.
> > > > Tracepoints are the current usecase that these are being implemented for.
> > > > 
> > > [...]
> > > > +/***
> > > > + * jump_label_update - update jump label text
> > > > + * @key -  key value associated with a jump label
> > > > + * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
> > > > + *
> > > > + * Will enable/disable the jump for jump label @key, depending on the
> > > > + * value of @type.
> > > > + *
> > > > + */
> > > > +
> > > > +void jump_label_update(unsigned long key, enum jump_label_type type)
> > > > +{
> > > > +	struct jump_entry *iter;
> > > > +	struct jump_label_entry *entry;
> > > > +	struct hlist_node *module_node;
> > > > +	struct jump_label_module_entry *e_module;
> > > > +	int count;
> > > > +
> > > > +	mutex_lock(&jump_label_mutex);
> > > > +	entry = get_jump_label_entry((jump_label_t)key);
> > > > +	if (entry) {
> > > > +		count = entry->nr_entries;
> > > > +		iter = entry->table;
> > > > +		while (count--) {
> > > > +			if (kernel_text_address(iter->code))
> > > 
> > > As I pointed out in another thread, I'm concerned about the use of
> > > kernel_text_address without module mutex here. kernel_text_address calls
> > > is_module_text_address(), which calls __module_text_address() with
> > > preemption off.
> > > 
> > > __module_text_address() looks like:
> > > 
> > > struct module *__module_address(unsigned long addr)
> > > {
> > >         struct module *mod;
> > > 
> > >         if (addr < module_addr_min || addr > module_addr_max)
> > >                 return NULL;
> > > 
> > >         list_for_each_entry_rcu(mod, &modules, list)
> > >                 if (within_module_core(addr, mod)
> > >                     || within_module_init(addr, mod))
> > >                         return mod;
> > >         return NULL;
> > > }
> > > 
> > > struct module *__module_text_address(unsigned long addr)
> > > {
> > >         struct module *mod = __module_address(addr);
> > >         if (mod) {
> > >                 /* Make sure it's within the text section. */
> > >                 if (!within(addr, mod->module_init, mod->init_text_size)
> > >                     && !within(addr, mod->module_core, mod->core_text_size))
> > >                         mod = NULL;
> > >         }
> > >         return mod;
> > > }
> > > 
> > > So the test for the address being in the module core is already
> > > problematic, since we hold preempt off only within
> > > is_module_text_address(). The is_module_text_address() caller is then
> > > free to write to this address even after the module has been unloaded
> > > and the module unload grace period ended.
> > > 
> > > Even worse, such grace period is not waited for at module load time
> > > within:
> > > 
> > > init_module()
> > >        module_free(mod, mod->module_init);
> > >        mod->module_init = NULL;
> > >        mod->init_size = 0;
> > >        mod->init_text_size = 0;
> > >   (done with module_mutex held, while the module is already in the
> > >    module list)
> > > 
> > > We'd probably have to hold the module mutex around the
> > > is_module_text_address() call and address use (which can be a pain), or
> > > to correctly address this part of init_module() with RCU and require
> > > that preempt off is held across both __module_text_address() call site
> > > and the actual use of that pointer (which does not fit with jump label,
> > > which need to sleep, so we'd have to move module.c to a preemptable
> > > rcu_read_lock/synchronize_rcu() C.S.).
> > > 
> > > Thoughts ?
> > > 
> > 
> > I was thinking about the rcu_read_lock/synchronize_rcu() for this race.
> > We can hold the rcu_read_lock() across the is_module_text_address()
> > check in the jump label code, and then we can do in module.c:
> > 
> > mod->module_init = NULL;
> > synchronize_rcu();
> > module_free(mod, mod->module_init);
> 
> Beware that you need to copy the module_init address. Also make sure you
> audit the "module_free" (per-architecture) to make sure they don't use
> "mod" in ways you did not foresee.
> 
> > .
> > .
> > .
> > 
> > or we could push the rcu_read_lock() further down into
> > is_module_address()?
> 
> We need to pull rcu_read_lock further _up_. It needs to be held across
> both is_module_address() and the actual use of the address, otherwise
> the memory mapping can be removed underneath us.
> 
> You can see the rcu read lock as keeping the memory mapping alive for as
> long as the rcu read lock is held.
> 
> We'd also need to add a synchronize_rcu() in module removal.
> 

I agree that we need this synchronization for the module __init section.

However, I believe we are ok for the module removal case. free_module() is
called *after* the blocking_notifier_call_chain() call. The
blocking_notifier_call_chain() is going to call back into the jump label
code, grab the jump_label_mutex and remove the reference to the module that
is about to be freed. Thus, the jump label code can no longer reference it.
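
To make the ordering argument above concrete, here is a minimal user-space
sketch. It is not kernel code: every toy_* name is hypothetical, the
atomic_flag spinlock only stands in for jump_label_mutex, and the table
stands in for the per-key module list. The one point it tries to show is
that the notifier drops the reference under the lock *before* the memory is
freed, so a later update pass cannot touch it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_MODULES 4

/* Stand-in for jump_label_mutex: a minimal spinlock (assumption). */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

static void toy_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&toy_lock))
		;	/* spin until the flag was previously clear */
}

static void toy_lock_release(void)
{
	atomic_flag_clear(&toy_lock);
}

/* Text regions the updater is currently allowed to patch. */
static unsigned char *toy_table[TOY_MAX_MODULES];

/* Record a region; returns the slot used, or -1 if the table is full. */
static int toy_register(unsigned char *text)
{
	int i, slot = -1;

	assert(text != NULL);
	toy_lock_acquire();
	for (i = 0; i < TOY_MAX_MODULES; i++) {
		if (!toy_table[i]) {
			toy_table[i] = text;
			slot = i;
			break;
		}
	}
	toy_lock_release();
	return slot;
}

/* Plays the role of the notifier callback: drop the reference. */
static void toy_notify_going(unsigned char *text)
{
	int i;

	toy_lock_acquire();
	for (i = 0; i < TOY_MAX_MODULES; i++)
		if (toy_table[i] == text)
			toy_table[i] = NULL;
	toy_lock_release();
}

/* Plays the role of jump_label_update(): patch every listed region. */
static int toy_update_all(unsigned char val)
{
	int i, patched = 0;

	toy_lock_acquire();
	for (i = 0; i < TOY_MAX_MODULES; i++) {
		if (toy_table[i]) {
			toy_table[i][0] = val;
			patched++;
		}
	}
	toy_lock_release();
	return patched;
}

/* Unload path: notify first, free second. */
static void toy_unload(unsigned char *text)
{
	toy_notify_going(text);
	free(text);
}
```

Because toy_update_all() and toy_notify_going() serialize on the same lock,
an update either finishes before the reference is dropped or never sees the
region at all. Swapping the two lines in toy_unload() would reintroduce the
use-after-free.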

So I think the following patch is all that is required here (lightly
tested).

Steve, I'll re-post as a separate patch, if we agree on this fix.

thanks,

-Jason




jump label: fix __init module section race

Jump label uses is_module_text_address() to ensure that the module
__init sections are valid before updating them. However, between the
check for a valid module __init section and the subsequent jump
label update, the module's __init section could be freed out from under
us.

We fix this potential race by putting the address check *and* the jump
label update under an rcu_read_lock(), and making sure a grace period
has completed before we free the __init section.

Thanks to Mathieu Desnoyers for pointing out this race condition.

Signed-off-by: Jason Baron <jbaron@redhat.com>
---
 kernel/jump_label.c |    2 ++
 kernel/module.c     |    5 ++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index f82878b..7830bfb 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -160,6 +160,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
 			iter++;
 		}
 		/* eanble/disable jump labels in modules */
+		rcu_read_lock();
 		hlist_for_each_entry(e_module, module_node, &(entry->modules),
 							hlist) {
 			count = e_module->nr_entries;
@@ -170,6 +171,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
 				iter++;
 			}
 		}
+		rcu_read_unlock();
 	}
 	mutex_unlock(&jump_label_mutex);
 }
diff --git a/kernel/module.c b/kernel/module.c
index eba1341..09f7e9e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2692,6 +2692,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
 		unsigned long, len, const char __user *, uargs)
 {
 	struct module *mod;
+	void *init_code;
 	int ret = 0;
 
 	/* Must have permission */
@@ -2749,8 +2750,10 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
 	mod->symtab = mod->core_symtab;
 	mod->strtab = mod->core_strtab;
 #endif
-	module_free(mod, mod->module_init);
+	init_code = mod->module_init;
 	mod->module_init = NULL;
+	synchronize_rcu();
+	module_free(mod, init_code);
 	mod->init_size = 0;
 	mod->init_text_size = 0;
 	mutex_unlock(&module_mutex);
-- 
1.7.1
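
For readers who want to experiment with the ordering used in the module.c
hunk (copy the pointer, clear it, wait for readers, then free), here is a
rough user-space model. It is only a sketch under loud assumptions: the
reader counter is a crude stand-in for RCU (it waits for *all* readers to
leave, not just pre-existing ones), and struct toy_module and the toy_*
and reader_* names are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Crude stand-in for RCU: number of readers inside a read-side section. */
static atomic_int active_readers;

static void reader_lock(void)
{
	atomic_fetch_add(&active_readers, 1);
}

static void reader_unlock(void)
{
	atomic_fetch_sub(&active_readers, 1);
}

/* Stand-in for synchronize_rcu(): spin until no reader is active. */
static void wait_for_readers(void)
{
	while (atomic_load(&active_readers) != 0)
		;
}

/* Hypothetical model of the two module fields the patch touches. */
struct toy_module {
	_Atomic(unsigned char *) init_area;
	atomic_size_t init_size;
};

/* Allocate a zeroed init area; returns 0 on success, -1 on failure. */
static int toy_load(struct toy_module *mod, size_t size)
{
	unsigned char *area;

	assert(size > 0);
	area = calloc(1, size);
	if (!area)
		return -1;
	atomic_init(&mod->init_area, area);
	atomic_init(&mod->init_size, size);
	return 0;
}

/* Reader: the validity check and the write share one read-side section. */
static int toy_patch_init(struct toy_module *mod, size_t off,
			  unsigned char val)
{
	unsigned char *area;
	int patched = 0;

	reader_lock();
	area = atomic_load(&mod->init_area);
	if (area && off < atomic_load(&mod->init_size)) {
		area[off] = val;
		patched = 1;
	}
	reader_unlock();
	return patched;
}

/* Writer: same order as the patch: copy, clear, wait, free. */
static void toy_retire_init(struct toy_module *mod)
{
	unsigned char *init_code = atomic_load(&mod->init_area);

	atomic_store(&mod->init_area, (unsigned char *)NULL);
	wait_for_readers();
	free(init_code);
	atomic_store(&mod->init_size, 0);
}
```

A reader that observed a non-NULL init_area is still counted when the writer
reaches wait_for_readers(), so the free cannot overtake its write. A reader
that starts after the pointer was cleared sees NULL and skips the update.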



