From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Namhyung Kim <namhyung@kernel.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH] tracing: Fix race of function probes counting
Date: Thu, 20 Nov 2014 20:45:09 +0900 [thread overview]
Message-ID: <546DD445.5080108@hitachi.com> (raw)
In-Reply-To: <20141118134643.4b550ee4@gandalf.local.home>
(2014/11/19 3:46), Steven Rostedt wrote:
>
> The function probe counting for traceon and traceoff suffered a race
> condition where if the probe was executing on two or more CPUs at the
> same time, it could decrement the counter by more than one when
> disabling (or enabling) the tracer only once.
>
> The way the traceon and traceoff probes are suppose to work is that
> they disable (or enable) tracing once per count. If a user were to
> echo 'schedule:traceoff:3' into set_ftrace_filter, then when the
> schedule function was called, it would disable tracing. But the count
> should only be decremented once (to 2). Then if the user enabled tracing
> again (via tracing_on file), the next call to schedule would disable
> tracing again and the count would be decremented to 1.
>
> But if multiple CPUS called schedule at the same time, it is possible
> that the count would be decremented more than once because of the
> simple "count--" used.
>
> By reading the count into a local variable and using memory barriers
> we can guarantee that the count would only be decremented once per
> disable (or enable).
>
> The stack trace probe had a similar race, but here the stack trace will
> decrement for each time it is called. But this had the read-modify-
> write race, where it could stack trace more than the number of times
> that was specified. This case we use a cmpxchg to stack trace only the
> number of times specified.
>
> The dump probes can still use the old "update_count()" function as
> they only run once, and that is controlled by the dump logic
> itself.
>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I have found a couple of typos, but basically, it looks OK.
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> ---
> kernel/trace/trace_functions.c | 117 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 96 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
> index a8e0c7666164..973db52eb070 100644
> --- a/kernel/trace/trace_functions.c
> +++ b/kernel/trace/trace_functions.c
> @@ -261,37 +261,74 @@ static struct tracer function_trace __tracer_data =
> };
>
> #ifdef CONFIG_DYNAMIC_FTRACE
> -static int update_count(void **data)
> +static void update_traceon_count(void **data, int on)
btw, why don't you use bool instead of int?
> {
> - unsigned long *count = (long *)data;
> + long *count = (long *)data;
> + long old_count = *count;
>
> - if (!*count)
> - return 0;
> + /*
> + * Tracing gets disabled (or enabled) once per count.
> + * This function can be called at the same time on mulitple CPUs.
^^^^^^^^ multiple
> + * It is fine if both disable (or enable) tracing, as disabling
> + * (or enabling) the second time doesn't do anything as the
> + * state of the tracer is already disabled (or enabled).
> + * What needs to be synchronized in this case is that the count
> + * only gets decremented once, even if the tracer is disabled
> + * (or enabled) twice, as the second one is really a nop.
> + *
> + * The memory barriers guarantee that we only decrement the
> + * counter once. First the count is read to a local variable
> + * and a read barrier is used to make sure that it is loaded
> + * before checking if the tracer is in the state we want.
> + * If the tracer is not in the state we want, then the count
> + * is guaranteed to be the old count.
> + *
> + * Next the tracer is set to the state we want (disabled or enabled)
> + * then a write memory barrier is used to make sure that
> + * the new state is visible before changing the counter by
> + * one minus the old counter. This guarantees that another CPU
> + * executing this code will see the new state before seeing
> + * the new counter value, and would not do anthing if the new
^^^^^^^anything?
> + * counter is seen.
> + *
> + * Note, there is no synchronization between this and a user
> + * setting the tracing_on file. But we currently don't care
> + * about that.
> + */
> + if (!old_count)
> + return;
>
> - if (*count != -1)
> - (*count)--;
> + /* Make sure we see count before checking tracing state */
> + smp_rmb();
>
> - return 1;
> + if (on == !!tracing_is_on())
> + return;
> +
> + if (on)
> + tracing_on();
> + else
> + tracing_off();
> +
> + /* unlimited? */
> + if (old_count == -1)
> + return;
> +
> + /* Make sure tracing state is visible before updating count */
> + smp_wmb();
> +
> + *count = old_count - 1;
> }
>
> static void
> ftrace_traceon_count(unsigned long ip, unsigned long parent_ip, void **data)
> {
> - if (tracing_is_on())
> - return;
> -
> - if (update_count(data))
> - tracing_on();
> + update_traceon_count(data, 1);
> }
>
> static void
> ftrace_traceoff_count(unsigned long ip, unsigned long parent_ip, void **data)
> {
> - if (!tracing_is_on())
> - return;
> -
> - if (update_count(data))
> - tracing_off();
> + update_traceon_count(data, 0);
> }
>
> static void
> @@ -330,11 +367,49 @@ ftrace_stacktrace(unsigned long ip, unsigned long parent_ip, void **data)
> static void
> ftrace_stacktrace_count(unsigned long ip, unsigned long parent_ip, void **data)
> {
> - if (!tracing_is_on())
> - return;
> + long *count = (long *)data;
> + long old_count;
> + long new_count;
>
> - if (update_count(data))
> - trace_dump_stack(STACK_SKIP);
> + /*
> + * Stack traces should only execute the number of times the
> + * user specified in the counter.
> + */
> + do {
> +
> + if (!tracing_is_on())
> + return;
> +
> + old_count = *count;
> +
> + if (!old_count)
> + return;
> +
> + /* unlimited? */
> + if (old_count == -1) {
> + trace_dump_stack(STACK_SKIP);
> + return;
> + }
> +
> + new_count = old_count - 1;
> + new_count = cmpxchg(count, old_count, new_count);
> + if (new_count == old_count)
> + trace_dump_stack(STACK_SKIP);
> +
> + } while (new_count != old_count);
> +}
> +
> +static int update_count(void **data)
> +{
> + unsigned long *count = (long *)data;
> +
> + if (!*count)
> + return 0;
> +
> + if (*count != -1)
> + (*count)--;
> +
> + return 1;
> }
>
> static void
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com
next prev parent reply other threads:[~2014-11-20 11:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-18 18:46 [PATCH] tracing: Fix race of function probes counting Steven Rostedt
2014-11-20 11:45 ` Masami Hiramatsu [this message]
2014-11-20 13:54 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=546DD445.5080108@hitachi.com \
--to=masami.hiramatsu.pt@hitachi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox