From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Andi Kleen <andi@firstfloor.org>,
David Miller <davem@davemloft.net>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Rusty Russell <rusty@rustcorp.com.au>,
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Subject: Re: [PATCH 03/11] jump label: Base patch for jump label
Date: Thu, 23 Sep 2010 14:55:40 -0400 [thread overview]
Message-ID: <20100923185540.GA30262@Krystal> (raw)
In-Reply-To: <20100923184006.GE2825@redhat.com>
* Jason Baron (jbaron@redhat.com) wrote:
> On Thu, Sep 23, 2010 at 11:48:52AM -0400, Mathieu Desnoyers wrote:
> > * Jason Baron (jbaron@redhat.com) wrote:
> > > On Thu, Sep 23, 2010 at 10:37:58AM -0400, Mathieu Desnoyers wrote:
> > > > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > > > From: Jason Baron <jbaron@redhat.com>
> > > > >
> > > > > base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
> > > > > assembly gcc mechanism, we can now branch to labels from an 'asm goto'
> > > > > statment. This allows us to create a 'no-op' fastpath, which can subsequently
> > > > > be patched with a jump to the slowpath code. This is useful for code which
> > > > > might be rarely used, but which we'd like to be able to call, if needed.
> > > > > Tracepoints are the current usecase that these are being implemented for.
> > > > >
> > > > [...]
> > > > > +/***
> > > > > + * jump_label_update - update jump label text
> > > > > + * @key - key value associated with a a jump label
> > > > > + * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
> > > > > + *
> > > > > + * Will enable/disable the jump for jump label @key, depending on the
> > > > > + * value of @type.
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +void jump_label_update(unsigned long key, enum jump_label_type type)
> > > > > +{
> > > > > + struct jump_entry *iter;
> > > > > + struct jump_label_entry *entry;
> > > > > + struct hlist_node *module_node;
> > > > > + struct jump_label_module_entry *e_module;
> > > > > + int count;
> > > > > +
> > > > > + mutex_lock(&jump_label_mutex);
> > > > > + entry = get_jump_label_entry((jump_label_t)key);
> > > > > + if (entry) {
> > > > > + count = entry->nr_entries;
> > > > > + iter = entry->table;
> > > > > + while (count--) {
> > > > > + if (kernel_text_address(iter->code))
> > > >
> > > > As I pointed out in another thread, I'm concerned about the use of
> > > > kernel_text_address without module mutex here. kernel_text_address calls
> > > > is_module_text_address(), which calls __module_text_address() with
> > > > preemption off.
> > > >
> > > > __module_text_address() looks like:
> > > >
> > > > struct module *__module_address(unsigned long addr)
> > > > {
> > > > struct module *mod;
> > > >
> > > > if (addr < module_addr_min || addr > module_addr_max)
> > > > return NULL;
> > > >
> > > > list_for_each_entry_rcu(mod, &modules, list)
> > > > if (within_module_core(addr, mod)
> > > > || within_module_init(addr, mod))
> > > > return mod;
> > > > return NULL;
> > > > }
> > > >
> > > > struct module *__module_text_address(unsigned long addr)
> > > > {
> > > > struct module *mod = __module_address(addr);
> > > > if (mod) {
> > > > /* Make sure it's within the text section. */
> > > > if (!within(addr, mod->module_init, mod->init_text_size)
> > > > && !within(addr, mod->module_core, mod->core_text_size))
> > > > mod = NULL;
> > > > }
> > > > return mod;
> > > > }
> > > >
> > > > So the test for the address being in the module core is already
> > > > problematic, since we hold preempt off only within
> > > > is_module_text_address(). The is_module_text_address() caller is then
> > > > free to write to this address even after the module has been unloaded
> > > > and the module unload grace period ended.
> > > >
> > > > Even worse, such grace period is not waited for at module load time
> > > > within:
> > > >
> > > > init_module()
> > > > module_free(mod, mod->module_init);
> > > > mod->module_init = NULL;
> > > > mod->init_size = 0;
> > > > mod->init_text_size = 0;
> > > > (done with module_mutex held, while the module is already in the
> > > > module list)
> > > >
> > > > We'd probably have to hold the module mutex around the
> > > > is_module_text_address() call and address use (which can be a pain), or
> > > > to correctly address this part of init_module() with RCU and require
> > > > that preempt off is held across both __module_text_address() call site
> > > > and the actual use of that pointer (which does not fit with jump label,
> > > > which need to sleep, so we'd have to move module.c to a preemptable
> > > > rcu_read_lock/synchronize_rcu() C.S.).
> > > >
> > > > Thoughts ?
> > > >
> > >
> > > I was thinking about the rcu_read_lock/synchronize_rcu() for this race.
> > > We can hold the rcu_read_lock() across the is_module_text_address()
> > > check in the jump label code, and then we can do in module.c:
> > >
> > > mod->module_init = NULL;
> > > synchronize_rcu();
> > > module_free(mod, mod->module_init);
> >
> > Beware that you need to copy the module_init address. Also make sure you
> > audit the "module_free" (per-architecture) to make sure they don't use
> > "mod" in ways you did not foresee.
> >
> > > .
> > > .
> > > .
> > >
> > > or we could push the rcu_read_lock() further down into
> > > is_module_address()?
> >
> > We need to pull rcu_read_lock further _up_. It needs to be held across
> > both is_module_address() and the actual use of the address, otherwise
> > the memory mapping can be removed underneath us.
> >
> > You can see the rcu read lock as keeping the memory mapping alive for as
> > long as the rcu read lock is held.
> >
> > We'd also need to add a synchronize_rcu() in module removal.
> >
>
> I agree that we this synchronization for the module __init section.
>
> However, I believe we are ok for module removal case. free_module() is
> called *after* blocking_notifier_call_chain() call. The
> blocking_notifier_call_chain() is going to call back into the jump label
> code, grab the jump_label_mutex and remove the reference to the module that
> is about to freed. Thus, the jump label code can no longer reference it.
Ah, yes, this makes sense.
>
> So I think the following patch is all that is required here (lightly
> tested).
Some comments below,
>
> Steve, I'll re-post as a separate patch, if we agree on this fix.
>
> thanks,
>
> -Jason
>
>
>
>
> jump label: fix __init module section race
>
> Jump label uses is_module_text_address() to ensure that the module
> __init sections are valid before updating them. However, between the
> check for a valid module __init section and the subsequent jump
> label update, the module's __init section could be free out from under
> us.
>
> We fix this potential race putting the address check *and* the jump
> label update under a rcu_read_lock(), and making sure a grace period
> has completed before we free the __init section.
>
> Thanks to Mathieu Desnoyers for pointing out this race condition.
>
> Signed-off-by: Jason Baron <jbaron@redhat.com>
You can put my
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> ---
> kernel/jump_label.c | 2 ++
> kernel/module.c | 5 ++++-
> 2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index f82878b..7830bfb 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -160,6 +160,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
> iter++;
> }
> /* eanble/disable jump labels in modules */
Separate patch for typo: eanble -> enable would be appropriate.
> + rcu_read_lock();
> hlist_for_each_entry(e_module, module_node, &(entry->modules),
> hlist) {
> count = e_module->nr_entries;
> @@ -170,6 +171,7 @@ void jump_label_update(unsigned long key, enum jump_label_type type)
> iter++;
> }
> }
> + rcu_read_unlock();
Please note that the impact of this read lock is that synchronize_rcu()
cannot be called from text_poke anymore (because text_poke is called
within a RCU read-side C.S.). So hopefully Masami did not plan to call
synchronize_rcu() from the upcoming breakpoint-based text_poke.
> }
> mutex_unlock(&jump_label_mutex);
> }
> diff --git a/kernel/module.c b/kernel/module.c
> index eba1341..09f7e9e 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2692,6 +2692,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
> unsigned long, len, const char __user *, uargs)
> {
> struct module *mod;
> + void *init_code;
> int ret = 0;
>
> /* Must have permission */
> @@ -2749,8 +2750,10 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
> mod->symtab = mod->core_symtab;
> mod->strtab = mod->core_strtab;
> #endif
> - module_free(mod, mod->module_init);
> + init_code = mod->module_init;
> mod->module_init = NULL;
> + synchronize_rcu();
> + module_free(mod, init_code);
I'm a bit concerned about the fact that "mod" is passed as parameter to
module_free(). It does not seem to be used on x86, but have you reviewed
all other architectures ? I would feel more comfortable if we removed
this parameter from module_free() if it is indeed unused everywhere.
Thanks,
Mathieu
> mod->init_size = 0;
> mod->init_text_size = 0;
> mutex_unlock(&module_mutex);
> --
> 1.7.1
>
>
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2010-09-23 19:00 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-23 3:49 [GIT PULL] jump label: add jump label code Steven Rostedt
2010-09-23 3:49 ` [PATCH 01/11] jump label: Make dynamic no-op selection available outside of ftrace Steven Rostedt
2010-09-23 3:49 ` [PATCH 02/11] jump label: Make text_poke_early() globally visible Steven Rostedt
2010-09-23 3:49 ` [PATCH 03/11] jump label: Base patch for jump label Steven Rostedt
2010-09-23 14:37 ` Mathieu Desnoyers
2010-09-23 15:39 ` Jason Baron
2010-09-23 15:48 ` Mathieu Desnoyers
2010-09-23 18:40 ` Jason Baron
2010-09-23 18:55 ` Mathieu Desnoyers [this message]
2010-09-23 19:08 ` Mathieu Desnoyers
2010-09-23 19:11 ` Jason Baron
2010-09-24 0:44 ` Rusty Russell
2010-09-24 13:54 ` Steven Rostedt
2010-09-24 20:54 ` Mathieu Desnoyers
2010-09-24 21:45 ` Jason Baron
2010-09-23 16:52 ` Steven Rostedt
2010-09-23 17:09 ` Mathieu Desnoyers
2010-09-23 3:49 ` [PATCH 04/11] jump label: Initialize workqueue tracepoints *before* they are registered Steven Rostedt
2010-09-23 3:49 ` [PATCH 05/11] jump label: Add jump_label_text_reserved() to reserve jump points Steven Rostedt
2010-09-23 3:49 ` [PATCH 06/11] jump label: Tracepoint support for jump labels Steven Rostedt
2010-09-23 3:49 ` [PATCH 07/11] jump label: Convert dynamic debug to use " Steven Rostedt
2010-09-23 3:49 ` [PATCH 08/11] jump label: x86 support Steven Rostedt
2010-09-23 3:49 ` [PATCH 09/11] jump label: Add sparc64 support Steven Rostedt
2010-09-23 3:49 ` [PATCH 10/11] jump label: Remove duplicate structure for x86 Steven Rostedt
2010-09-23 3:49 ` [PATCH 11/11] jump label/x86/sparc64: Remove !CC_OPTIMIZE_FOR_SIZE config conditions Steven Rostedt
2010-09-23 4:06 ` [GIT PULL] jump label: add jump label code Steven Rostedt
2010-09-23 6:32 ` Ingo Molnar
2010-09-23 12:42 ` Steven Rostedt
2010-09-24 9:02 ` [tip:perf/core] jump label: Fix GCC feature check when distcc is used tip-bot for Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100923185540.GA30262@Krystal \
--to=compudj@krystal.dyndns.org \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=davem@davemloft.net \
--cc=fweisbec@gmail.com \
--cc=jbaron@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=mingo@elte.hu \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rostedt@goodmis.org \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.