* [PATCH 0/2] [GIT PULL] tracing: Fix tlb_flush TP called in RCU ignored location
@ 2015-02-08 0:58 Steven Rostedt
2015-02-08 0:58 ` [PATCH 1/2] tracing: Add condition check to RCU lockdep checks Steven Rostedt
2015-02-08 0:58 ` [PATCH 2/2] x86/tlb/trace: Do not trace on CPU that is offline Steven Rostedt
0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2015-02-08 0:58 UTC (permalink / raw)
To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton
Linus,
During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed
out a real bug. During suspend and resume the tlb_flush tracepoint is
called when the CPU is going offline. As the CPU has been noted as offline,
RCU is ignoring that CPU, which means that it can not use RCU protected
locks. When tracepoints are activated, they require RCU locking, and
if RCU is ignoring a CPU that runs a tracepoint, there is a chance that
the tracepoint could cause corruption.
The solution was to change the tracepoint into a TRACE_EVENT_CONDITION()
which allows us to check a condition to determine if the tracepoint
should be called or not. If the condition is not met, the rcu protected
code will not be executed. By adding the condition
"cpu_online(smp_processor_id())", this will prevent the RCU protected
code from being executed if the CPU is marked offline.
After adding this, another bug was discovered. As RCU checks rcu callers,
if a rcu call is not done, there is no check (obviously). We found that
tracepoints could be added in RCU ignored locations and not have lockdep
complain until the tracepoint is activated. This missed places where
tracepoints were added in places they should not have been. To fix this,
code was added in 3.18 that if lockdep is enabled, any tracepoint will
still call the rcu checks even if the tracepoint is not enabled. The bug
here, is that the check does not take the CONDITION into account. As the
condition may prevent tracepoints from being activated in RCU ignored
areas (as the one patch does), we get false positives when we enable
lockdep and hit a tracepoint that the condition prevents it from being
called in a RCU ignored location. The fix for this is to add the
CONDITION to the rcu checks, even if the tracepoint is not enabled.
Please pull the latest trace-fixes-v3.19-rc7 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-fixes-v3.19-rc7
Tag SHA1: 8c1c683d0b8af59da6b65976bb7f2a2f1cac7345
Head SHA1: 6c8465a82a605bc692304bab42703017dcfff013
Steven Rostedt (Red Hat) (2):
tracing: Add condition check to RCU lockdep checks
x86/tlb/trace: Do not trace on CPU that is offline
----
include/linux/tracepoint.h | 2 +-
include/trace/events/tlb.h | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] tracing: Add condition check to RCU lockdep checks
2015-02-08 0:58 [PATCH 0/2] [GIT PULL] tracing: Fix tlb_flush TP called in RCU ignored location Steven Rostedt
@ 2015-02-08 0:58 ` Steven Rostedt
2015-02-08 0:58 ` [PATCH 2/2] x86/tlb/trace: Do not trace on CPU that is offline Steven Rostedt
1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2015-02-08 0:58 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Dave Hansen,
Sedat Dilek
[-- Attachment #1: 0001-tracing-Add-condition-check-to-RCU-lockdep-checks.patch --]
[-- Type: text/plain, Size: 2039 bytes --]
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
When a CPU is offline, RCU is no longer watching that CPU and since the
tracepoint is protected by RCU, it must not be called. To prevent the
tlb_flush tracepoint from being called when the CPU is offline, it was
converted to a TRACE_EVENT_CONDITION where the condition checks if the
CPU is online before calling the tracepoint.
Unfortunately, this was not enough to stop lockdep from complaining about
it. Even though the RCU protected code of the tracepoint will never be
called, the condition is hidden within the tracepoint, and even though the
condition prevents RCU code from being called, the lockdep checks are
outside the tracepoint (this is to test tracepoints even when they are not
enabled).
Even though tracepoints should be checked to be RCU safe when they are not
enabled, the condition should still be considered when checking RCU.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Fixes: 3a630178fd5f "tracing: generate RCU warnings even when tracepoints are disabled"
Cc: stable@vger.kernel.org # 3.18+
Acked-by: Dave Hansen <dave@sr71.net>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/tracepoint.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index e08e21e5f601..c72851328ca9 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -173,7 +173,7 @@ extern void syscall_unregfunc(void);
TP_PROTO(data_proto), \
TP_ARGS(data_args), \
TP_CONDITION(cond),,); \
- if (IS_ENABLED(CONFIG_LOCKDEP)) { \
+ if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
rcu_read_lock_sched_notrace(); \
rcu_dereference_sched(__tracepoint_##name.funcs);\
rcu_read_unlock_sched_notrace(); \
--
2.1.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] x86/tlb/trace: Do not trace on CPU that is offline
2015-02-08 0:58 [PATCH 0/2] [GIT PULL] tracing: Fix tlb_flush TP called in RCU ignored location Steven Rostedt
2015-02-08 0:58 ` [PATCH 1/2] tracing: Add condition check to RCU lockdep checks Steven Rostedt
@ 2015-02-08 0:58 ` Steven Rostedt
1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2015-02-08 0:58 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, stable, Sedat Dilek,
Paul E. McKenney, Dave Hansen
[-- Attachment #1: 0002-x86-tlb-trace-Do-not-trace-on-CPU-that-is-offline.patch --]
[-- Type: text/plain, Size: 3066 bytes --]
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.
Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:
...
Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting
===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting
...
By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Cc: stable@vger.kernel.org # 3.17+
Fixes: d17d8f9dedb9 "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/trace/events/tlb.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/tlb.h b/include/trace/events/tlb.h
index 13391d288107..0e7635765153 100644
--- a/include/trace/events/tlb.h
+++ b/include/trace/events/tlb.h
@@ -13,11 +13,13 @@
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" }, \
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }
-TRACE_EVENT(tlb_flush,
+TRACE_EVENT_CONDITION(tlb_flush,
TP_PROTO(int reason, unsigned long pages),
TP_ARGS(reason, pages),
+ TP_CONDITION(cpu_online(smp_processor_id())),
+
TP_STRUCT__entry(
__field( int, reason)
__field(unsigned long, pages)
--
2.1.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-02-08 0:59 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-08 0:58 [PATCH 0/2] [GIT PULL] tracing: Fix tlb_flush TP called in RCU ignored location Steven Rostedt
2015-02-08 0:58 ` [PATCH 1/2] tracing: Add condition check to RCU lockdep checks Steven Rostedt
2015-02-08 0:58 ` [PATCH 2/2] x86/tlb/trace: Do not trace on CPU that is offline Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox