stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] tracepoints: Do not trace when cpu is offline
       [not found] <20160216194908.005437159@goodmis.org>
@ 2016-02-16 19:49 ` Steven Rostedt
  2016-02-16 20:09   ` Mathieu Desnoyers
  2016-02-16 19:49 ` [PATCH 2/2] tracing: Fix freak link error caused by branch tracer Steven Rostedt
  1 sibling, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2016-02-16 19:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney,
	Mathieu Desnoyers, Denis Kirjanov, stable

[-- Attachment #1: 0001-tracepoints-Do-not-trace-when-cpu-is-offline.patch --]
[-- Type: text/plain, Size: 3297 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.

This can probuce the following warning:

 [ INFO: suspicious RCU usage. ]
 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
 -------------------------------
 include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!

 other info that might help us debug this:

 RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/8/0.

 stack backtrace:
  CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
  Call Trace:
  [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
  [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
  [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
  [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
  [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
  [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
  [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
  [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
  [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
  [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
  [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
  [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14

This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.

Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.

Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.

Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org

Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Cc: stable@vger.kernel.org # v2.6.28+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/tracepoint.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index acd522a91539..acfdbf353a0b 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -14,8 +14,10 @@
  * See the file COPYING for more details.
  */
 
+#include <linux/smp.h>
 #include <linux/errno.h>
 #include <linux/types.h>
+#include <linux/cpumask.h>
 #include <linux/rcupdate.h>
 #include <linux/tracepoint-defs.h>
 
@@ -132,6 +134,9 @@ extern void syscall_unregfunc(void);
 		void *it_func;						\
 		void *__data;						\
 									\
+		if (!cpu_online(raw_smp_processor_id()))		\
+			return;						\
+									\
 		if (!(cond))						\
 			return;						\
 		prercu;							\
-- 
2.6.4



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] tracing: Fix freak link error caused by branch tracer
       [not found] <20160216194908.005437159@goodmis.org>
  2016-02-16 19:49 ` [PATCH 1/2] tracepoints: Do not trace when cpu is offline Steven Rostedt
@ 2016-02-16 19:49 ` Steven Rostedt
  1 sibling, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2016-02-16 19:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney,
	Mathieu Desnoyers, Nicolas Pitre, stable, Arnd Bergmann

[-- Attachment #1: 0002-tracing-Fix-freak-link-error-caused-by-branch-tracer.patch --]
[-- Type: text/plain, Size: 2816 bytes --]

From: Arnd Bergmann <arnd@arndb.de>

In my randconfig tests, I came across a bug that involves several
components:

* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
  replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing

        u32 adc_clock = 450560; /* 45.056 MHz */
        if (state->config.adc_clock)
                adc_clock = state->config.adc_clock;
        do_div(value, adc_clock);

In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.

That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():

dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!

This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.

I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.

Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de

Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e22 ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Cc: stable@vger.kernel.org # v2.6.30+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 00b042c49ccd..48f5aab117ae 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -144,7 +144,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
  */
 #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
 #define __trace_if(cond) \
-	if (__builtin_constant_p((cond)) ? !!(cond) :			\
+	if (__builtin_constant_p(!!(cond)) ? !!(cond) :			\
 	({								\
 		int ______r;						\
 		static struct ftrace_branch_data			\
-- 
2.6.4

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] tracepoints: Do not trace when cpu is offline
  2016-02-16 19:49 ` [PATCH 1/2] tracepoints: Do not trace when cpu is offline Steven Rostedt
@ 2016-02-16 20:09   ` Mathieu Desnoyers
  2016-02-16 20:32     ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Mathieu Desnoyers @ 2016-02-16 20:09 UTC (permalink / raw)
  To: rostedt, Thomas Gleixner
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Paul E. McKenney, Denis Kirjanov, stable

----- On Feb 16, 2016, at 2:49 PM, rostedt rostedt@goodmis.org wrote:

> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> The tracepoint infrastructure uses RCU sched protection to enable and
> disable tracepoints safely. There are some instances where tracepoints are
> used in infrastructure code (like kfree()) that get called after a CPU is
> going offline, and perhaps when it is coming back online but hasn't been
> registered yet.
> 
> This can probuce the following warning:
> 
> [ INFO: suspicious RCU usage. ]
> 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
> -------------------------------
> include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
> no locks held by swapper/8/0.
> 
> stack backtrace:
>  CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S
>  4.4.0-00006-g0fe53e8-dirty #34
>  Call Trace:
>  [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
>  [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
>  [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
>  [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
>  [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
>  [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
>  [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
>  [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
>  [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
>  [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
>  [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
>  [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
> 
> This warning is not a false positive either. RCU is not protecting code that
> is being executed while the CPU is offline.
> 
> Instead of playing "whack-a-mole(TM)" and adding conditional statements to
> the tracepoints we find that are used in this instance, simply add a
> cpu_online() test to the tracepoint code where the tracepoint will be
> ignored if the CPU is offline.
> 
> Use of raw_smp_processor_id() is fine, as there should never be a case where
> the tracepoint code goes from running on a CPU that is online and suddenly
> gets migrated to a CPU that is offline.
> 
> Link:
> http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org

If I get this right, you are proposing to "hide" events happening
during CPU hot-unplug on dying CPUs from the tracers to fix an issue
caused by interaction of RCU-sched (used for Tracepoint synchronization)
wrt CPU hotplug.

Removing tracing visibility of hot-unplug events seems to be an unwelcome
side-effect. I don't know how far Thomas Gleixner got in his overhaul of
CPU hotplug, but he might have something to say about this, as I believe
he would be the first user concerned.

Thoughts ?

Thanks,

Mathieu

> 
> Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
> Cc: stable@vger.kernel.org # v2.6.28+
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> include/linux/tracepoint.h | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index acd522a91539..acfdbf353a0b 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -14,8 +14,10 @@
>  * See the file COPYING for more details.
>  */
> 
> +#include <linux/smp.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> +#include <linux/cpumask.h>
> #include <linux/rcupdate.h>
> #include <linux/tracepoint-defs.h>
> 
> @@ -132,6 +134,9 @@ extern void syscall_unregfunc(void);
> 		void *it_func;						\
> 		void *__data;						\
> 									\
> +		if (!cpu_online(raw_smp_processor_id()))		\
> +			return;						\
> +									\
> 		if (!(cond))						\
> 			return;						\
> 		prercu;							\
> --
> 2.6.4

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] tracepoints: Do not trace when cpu is offline
  2016-02-16 20:09   ` Mathieu Desnoyers
@ 2016-02-16 20:32     ` Steven Rostedt
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2016-02-16 20:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Thomas Gleixner, linux-kernel, Linus Torvalds, Ingo Molnar,
	Andrew Morton, Paul E. McKenney, Denis Kirjanov, stable

On Tue, 16 Feb 2016 20:09:35 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:


> If I get this right, you are proposing to "hide" events happening
> during CPU hot-unplug on dying CPUs from the tracers to fix an issue
> caused by interaction of RCU-sched (used for Tracepoint synchronization)
> wrt CPU hotplug.
> 
> Removing tracing visibility of hot-unplug events seems to be an unwelcome
> side-effect. I don't know how far Thomas Gleixner got in his overhaul of
> CPU hotplug, but he might have something to say about this, as I believe
> he would be the first user concerned.
> 

Well, trace_printk() still works. But right now you *can't* have a
tracepoint executed on a CPU that is offline, because it is a bug.
Period. That's because we use RCU sched to protect tracepoints. When
the CPU is offline, there is no protection. It is possible that the
tracepoint structures may get corrupted, or worse, crash the system.
Granted, the race is quite small but it is a bug never the less.

Now, if you want tracepoints to be visible for CPUs that are offline,
then we need something else to protect it. But until then, this fixes
the issue.

And before this patch, we've been adding conditional tracepoints to
check "if (cpu_online(raw_smp_processor_id()))" when a warning appeared.
This patch gets rid of the need to keep adding these whack-a-mole
patches.

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-02-16 20:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20160216194908.005437159@goodmis.org>
2016-02-16 19:49 ` [PATCH 1/2] tracepoints: Do not trace when cpu is offline Steven Rostedt
2016-02-16 20:09   ` Mathieu Desnoyers
2016-02-16 20:32     ` Steven Rostedt
2016-02-16 19:49 ` [PATCH 2/2] tracing: Fix freak link error caused by branch tracer Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).