public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking
@ 2013-06-06 22:16 Steven Rostedt
  2013-06-06 22:16 ` [PATCH 1/4] rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu() Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Steven Rostedt @ 2013-06-06 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 2065 bytes --]


Linus,

This includes the same patches as my last pull request with
trace-fixes-v3.10-rc3-v2. I added two more fixes to this series.

This contains 4 fixes.

The first two fix the case where full RCU debugging is enabled, enabling
function tracing causes a live lock of the system. This is due to the added
debug checks in rcu_dereference_raw() that is used by the function tracer.
These checks are also traced by the function tracer as well as cause enough
overhead to the function tracer to slow down the system enough that
the time to finish an interrupt can take longer than when the next
interrupt is triggered, causing a live lock from the timer interrupt.

Talking this over with Paul McKenney, we came up with a fix that adds
a new rcu_dereference_raw_notrace() that does not perform these added checks,
and let the function tracer use that.

The third commit fixes a failed compile when branch tracing is enabled,
due to the conversion of the trace_test_buffer() selftest that the
branch trace wasn't converted for.

The forth patch fixes a bug caught by the RCU lockdep code where a
rcu_read_lock() is performed when rcu is disabled (either going to
or from idle, or user space). This happened on the irqsoff tracer
as it calls task_uid(). The fix here was to use current_uid() when
possible that doesn't use rcu locking. Which luckily, is always used
when irqsoff calls this code.


Please pull the latest trace-fixes-v3.10-rc3-v3 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-fixes-v3.10-rc3-v3

Tag SHA1: 0e87101aa65ebfe736070474684d1b38fea4d805
Head SHA1: f17a5194859a82afe4164e938b92035b86c55794


Steven Rostedt (2):
      rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
      ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends

Steven Rostedt (Red Hat) (2):
      tracing: Fix bad parameter passed in branch selftest
      tracing: Use current_uid() for critical time tracing

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
  2013-06-06 22:16 [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking Steven Rostedt
@ 2013-06-06 22:16 ` Steven Rostedt
  2013-06-06 22:16 ` [PATCH 2/4] ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends Steven Rostedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2013-06-06 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 3000 bytes --]

From: Steven Rostedt <rostedt@goodmis.org>

As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Add a new interface to RCU for both rcu_dereference_raw_notrace() as well
as hlist_for_each_entry_rcu_notrace() as the hlist iterator uses the
rcu_dereference_raw() as well, and is used a bit with the function tracer.

Link: http://lkml.kernel.org/r/20130528184209.304356745@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/rculist.h  |   20 ++++++++++++++++++++
 include/linux/rcupdate.h |    9 +++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..f4b1001 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -461,6 +461,26 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
 			&(pos)->member)), typeof(*(pos)), member))
 
 /**
+ * hlist_for_each_entry_rcu_notrace - iterate over rcu list of given type (for tracing)
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the hlist_node within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as hlist_add_head_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as hlist_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define hlist_for_each_entry_rcu_notrace(pos, head, member)			\
+	for (pos = hlist_entry_safe (rcu_dereference_raw_notrace(hlist_first_rcu(head)),\
+			typeof(*(pos)), member);			\
+		pos;							\
+		pos = hlist_entry_safe(rcu_dereference_raw_notrace(hlist_next_rcu(\
+			&(pos)->member)), typeof(*(pos)), member))
+
+/**
  * hlist_for_each_entry_rcu_bh - iterate over rcu list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4ccd68e..ddcc782 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -640,6 +640,15 @@ static inline void rcu_preempt_sleep_check(void)
 
 #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/
 
+/*
+ * The tracing infrastructure traces RCU (we want that), but unfortunately
+ * some of the RCU checks causes tracing to lock up the system.
+ *
+ * The tracing version of rcu_dereference_raw() must not call
+ * rcu_read_lock_held().
+ */
+#define rcu_dereference_raw_notrace(p) __rcu_dereference_check((p), 1, __rcu)
+
 /**
  * rcu_access_index() - fetch RCU index with no dereferencing
  * @p: The index to read
-- 
1.7.10.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
  2013-06-06 22:16 [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking Steven Rostedt
  2013-06-06 22:16 ` [PATCH 1/4] rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu() Steven Rostedt
@ 2013-06-06 22:16 ` Steven Rostedt
  2013-06-06 22:16 ` [PATCH 3/4] tracing: Fix bad parameter passed in branch selftest Steven Rostedt
  2013-06-06 22:16 ` [PATCH 4/4] tracing: Use current_uid() for critical time tracing Steven Rostedt
  3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2013-06-06 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 3538 bytes --]

From: Steven Rostedt <rostedt@goodmis.org>

As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Have the function tracer use the new RCU _notrace equivalents that do
not do the debug checks for RCU.

Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.org

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ftrace.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b549b0f..6c508ff 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -120,22 +120,22 @@ static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip);
 
 /*
  * Traverse the ftrace_global_list, invoking all entries.  The reason that we
- * can use rcu_dereference_raw() is that elements removed from this list
+ * can use rcu_dereference_raw_notrace() is that elements removed from this list
  * are simply leaked, so there is no need to interact with a grace-period
- * mechanism.  The rcu_dereference_raw() calls are needed to handle
+ * mechanism.  The rcu_dereference_raw_notrace() calls are needed to handle
  * concurrent insertions into the ftrace_global_list.
  *
  * Silly Alpha and silly pointer-speculation compiler optimizations!
  */
 #define do_for_each_ftrace_op(op, list)			\
-	op = rcu_dereference_raw(list);			\
+	op = rcu_dereference_raw_notrace(list);			\
 	do
 
 /*
  * Optimized for just a single item in the list (as that is the normal case).
  */
 #define while_for_each_ftrace_op(op)				\
-	while (likely(op = rcu_dereference_raw((op)->next)) &&	\
+	while (likely(op = rcu_dereference_raw_notrace((op)->next)) &&	\
 	       unlikely((op) != &ftrace_list_end))
 
 static inline void ftrace_ops_init(struct ftrace_ops *ops)
@@ -779,7 +779,7 @@ ftrace_find_profiled_func(struct ftrace_profile_stat *stat, unsigned long ip)
 	if (hlist_empty(hhd))
 		return NULL;
 
-	hlist_for_each_entry_rcu(rec, hhd, node) {
+	hlist_for_each_entry_rcu_notrace(rec, hhd, node) {
 		if (rec->ip == ip)
 			return rec;
 	}
@@ -1165,7 +1165,7 @@ ftrace_lookup_ip(struct ftrace_hash *hash, unsigned long ip)
 
 	hhd = &hash->buckets[key];
 
-	hlist_for_each_entry_rcu(entry, hhd, hlist) {
+	hlist_for_each_entry_rcu_notrace(entry, hhd, hlist) {
 		if (entry->ip == ip)
 			return entry;
 	}
@@ -1422,8 +1422,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
 	struct ftrace_hash *notrace_hash;
 	int ret;
 
-	filter_hash = rcu_dereference_raw(ops->filter_hash);
-	notrace_hash = rcu_dereference_raw(ops->notrace_hash);
+	filter_hash = rcu_dereference_raw_notrace(ops->filter_hash);
+	notrace_hash = rcu_dereference_raw_notrace(ops->notrace_hash);
 
 	if ((ftrace_hash_empty(filter_hash) ||
 	     ftrace_lookup_ip(filter_hash, ip)) &&
@@ -2920,7 +2920,7 @@ static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
 	 * on the hash. rcu_read_lock is too dangerous here.
 	 */
 	preempt_disable_notrace();
-	hlist_for_each_entry_rcu(entry, hhd, node) {
+	hlist_for_each_entry_rcu_notrace(entry, hhd, node) {
 		if (entry->ip == ip)
 			entry->ops->func(ip, parent_ip, &entry->data);
 	}
-- 
1.7.10.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] tracing: Fix bad parameter passed in branch selftest
  2013-06-06 22:16 [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking Steven Rostedt
  2013-06-06 22:16 ` [PATCH 1/4] rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu() Steven Rostedt
  2013-06-06 22:16 ` [PATCH 2/4] ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends Steven Rostedt
@ 2013-06-06 22:16 ` Steven Rostedt
  2013-06-06 22:16 ` [PATCH 4/4] tracing: Use current_uid() for critical time tracing Steven Rostedt
  3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2013-06-06 22:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney,
	Fengguang Wu

[-- Attachment #1: Type: text/plain, Size: 1138 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The branch selftest calls trace_test_buffer(), but with the new code
it expects the first parameter to be a pointer to a struct trace_buffer.
All self tests were changed but the branch selftest was missed.

This caused either a crash or failed test when the branch selftest was
enabled.

Link: http://lkml.kernel.org/r/20130529141333.GA24064@localhost

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_selftest.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 55e2cf6..2901e3b 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -1159,7 +1159,7 @@ trace_selftest_startup_branch(struct tracer *trace, struct trace_array *tr)
 	/* stop the tracing. */
 	tracing_stop();
 	/* check the trace buffer */
-	ret = trace_test_buffer(tr, &count);
+	ret = trace_test_buffer(&tr->trace_buffer, &count);
 	trace->reset(tr);
 	tracing_start();
 
-- 
1.7.10.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] tracing: Use current_uid() for critical time tracing
  2013-06-06 22:16 [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking Steven Rostedt
                   ` (2 preceding siblings ...)
  2013-06-06 22:16 ` [PATCH 3/4] tracing: Fix bad parameter passed in branch selftest Steven Rostedt
@ 2013-06-06 22:16 ` Steven Rostedt
  3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2013-06-06 22:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Paul E. McKenney

[-- Attachment #1: Type: text/plain, Size: 4707 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

The irqsoff tracer records the max time that interrupts are disabled.
There are hooks in the assembly code that calls back into the tracer when
interrupts are disabled or enabled.

When they are enabled, the tracer checks if the amount of time they
were disabled is larger than the previous recorded max interrupts off
time. If it is, it creates a snapshot of the currently running trace
to store where the last largest interrupts off time was held and how
it happened.

During testing, this RCU lockdep dump appeared:

[ 1257.829021] ===============================
[ 1257.829021] [ INFO: suspicious RCU usage. ]
[ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G        W
[ 1257.829021] -------------------------------
[ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1257.829021]
[ 1257.829021] other info that might help us debug this:
[ 1257.829021]
[ 1257.829021]
[ 1257.829021] RCU used illegally from idle CPU!
[ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
[ 1257.829021] RCU used illegally from extended quiescent state!
[ 1257.829021] 2 locks held by trace-cmd/4831:
[ 1257.829021]  #0:  (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
[ 1257.829021]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
[ 1257.829021]
[ 1257.829021] stack backtrace:
[ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G        W    3.10.0-rc1-test+ #171
[ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 1257.829021]  0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
[ 1257.829021]  ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
[ 1257.829021]  ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
[ 1257.829021] Call Trace:
[ 1257.829021]  [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
[ 1257.829021]  [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
[ 1257.829021]  [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
[ 1257.829021]  [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
[ 1257.829021]  [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
[ 1257.829021]  [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810029b4>] do_notify_resume+0x92/0x97
[ 1257.829021]  [<ffffffff8154bdca>] int_signal+0x12/0x17

What happened was entering into the user code, the interrupts were enabled
and a max interrupts off was recorded. The trace buffer was saved along with
various information about the task: comm, pid, uid, priority, etc.

The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock()
to retrieve the data, and this happened to happen where RCU is blind (user_enter).

As only the preempt and irqs off tracers can have this happen, and they both
only have the tsk == current, if tsk == current, use current_uid() instead of
task_uid(), as current_uid() does not use RCU as only current can change its uid.

This fixes the RCU suspicious splat.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4d79485..1a41023 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -843,7 +843,15 @@ __update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu)
 
 	memcpy(max_data->comm, tsk->comm, TASK_COMM_LEN);
 	max_data->pid = tsk->pid;
-	max_data->uid = task_uid(tsk);
+	/*
+	 * If tsk == current, then use current_uid(), as that does not use
+	 * RCU. The irq tracer can be called out of RCU scope.
+	 */
+	if (tsk == current)
+		max_data->uid = current_uid();
+	else
+		max_data->uid = task_uid(tsk);
+
 	max_data->nice = tsk->static_prio - 20 - MAX_RT_PRIO;
 	max_data->policy = tsk->policy;
 	max_data->rt_priority = tsk->rt_priority;
-- 
1.7.10.4



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-06-06 22:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-06 22:16 [PATCH 0/4] [GIT PULL] tracing: Fixes with dealing with RCU and context tracking Steven Rostedt
2013-06-06 22:16 ` [PATCH 1/4] rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu() Steven Rostedt
2013-06-06 22:16 ` [PATCH 2/4] ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends Steven Rostedt
2013-06-06 22:16 ` [PATCH 3/4] tracing: Fix bad parameter passed in branch selftest Steven Rostedt
2013-06-06 22:16 ` [PATCH 4/4] tracing: Use current_uid() for critical time tracing Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox