From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [for-next][PATCH 15/34] tracepoint: Make rcuidle tracepoint callers use SRCU
Date: Sat, 11 Aug 2018 09:49:43 -0400
Message-ID: <20180811135013.409952552@goodmis.org>
In-Reply-To: 20180811134928.034373761@goodmis.org

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In recent tests with IRQ on/off tracepoints, a large performance
overhead of ~10% was observed when running hackbench. The root cause is
the calls to rcu_irq_enter_irqson() and rcu_irq_exit_irqson() from the
tracepoint code. Following a long discussion on the list [1], we
concluded that SRCU is a better alternative for use during RCU idle.
Although SRCU does involve extra barriers, it is lighter than the
sched-RCU version, which has to make additional RCU calls to notify RCU
that an idle CPU is entering an RCU read-side section.
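To make the trade-off concrete, here is a toy, single-threaded userspace
model of the SRCU reader/updater protocol. It is not the kernel's
implementation, and all names (struct toy_srcu, toy_srcu_read_lock(), and
so on) are hypothetical. A reader only increments a counter for the
currently active index and remembers that index; nothing tells the RCU
core that the CPU has left idle. The updater flips the active index and
treats the grace period as over once the old index's counter drains.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy SRCU domain: two reader counters plus the active index.
 * The kernel uses per-CPU counters, memory barriers and waits on both
 * indexes; this sketch only shows the lock/unlock/flip protocol. */
struct toy_srcu {
	atomic_long readers[2];
	atomic_int idx;
};

static void toy_srcu_init(struct toy_srcu *sp)
{
	atomic_init(&sp->readers[0], 0);
	atomic_init(&sp->readers[1], 0);
	atomic_init(&sp->idx, 0);
}

/* Reader entry: count against the active index and return it. */
static int toy_srcu_read_lock(struct toy_srcu *sp)
{
	int idx = atomic_load(&sp->idx) & 1;

	atomic_fetch_add(&sp->readers[idx], 1);
	return idx;
}

/* Reader exit: drop the count on the index taken at lock time. */
static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
	atomic_fetch_sub(&sp->readers[idx], 1);
}

/* Updater step 1: send new readers to the other index and return the
 * index whose pre-existing readers must drain. */
static int toy_srcu_flip(struct toy_srcu *sp)
{
	return atomic_fetch_xor(&sp->idx, 1) & 1;
}

/* Updater step 2: the old readers are gone once their count is zero.
 * A real synchronize would wait or poll here instead of returning. */
static bool toy_srcu_old_readers_done(struct toy_srcu *sp, int old_idx)
{
	return atomic_load(&sp->readers[old_idx]) == 0;
}
```

A reader that started before the flip keeps the old index busy, while a
reader that starts after it lands on the new index and does not delay
the updater. In the kernel's SRCU the reader side also carries a full
memory barrier, which is the "extra barriers" cost mentioned above.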

In this patch, we change the underlying implementation of the
trace_*_rcuidle() API to use SRCU. This has been shown to improve
performance considerably for the high-frequency IRQ enable/disable
tracepoints.
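The reader-side sequence that the patched __DO_TRACE() follows can be
modelled in userspace as below. This is only a sketch: preempt_depth and
srcu_depth are hypothetical counters standing in for
preempt_disable_notrace() and srcu_read_lock_notrace(), and struct
toy_probe mirrors just the func/data fields of struct tracepoint_func.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state, for illustration only. */
static int preempt_depth;	/* models preempt_disable_notrace() nesting */
static int srcu_depth;		/* models srcu_read_lock_notrace() nesting */

/* Callback plus its private data, like struct tracepoint_func. */
struct toy_probe {
	void (*func)(void *data, int arg);
	void *data;
};

/* Toy __DO_TRACE(): preemption is always disabled (this replaces
 * rcu_read_lock_sched_notrace()); SRCU is entered only for rcuidle
 * callers. The probe array ends with an entry whose func is NULL. */
static void toy_do_trace(const struct toy_probe *probes, bool rcuidle, int arg)
{
	preempt_depth++;		/* preempt_disable_notrace() */
	if (rcuidle)
		srcu_depth++;		/* idx = srcu_read_lock_notrace(...) */

	if (probes) {
		const struct toy_probe *p = probes;

		/* Same loop shape as __DO_TRACE: call the probe, then
		 * advance until the NULL-func terminator. */
		do {
			p->func(p->data, arg);
		} while ((++p)->func);
	}

	if (rcuidle)
		srcu_depth--;		/* srcu_read_unlock_notrace(..., idx) */
	preempt_depth--;		/* preempt_enable_notrace() */
}

/* Example probe: accumulate the traced argument. */
static void add_probe(void *data, int arg)
{
	*(int *)data += arg;
}

/* Example probe: record whether the SRCU read lock was held. */
static void record_srcu_probe(void *data, int arg)
{
	(void)arg;
	*(int *)data = srcu_depth;
}
```

Non-rcuidle callers pay only for the preemption toggle, which is what
rcu_read_lock_sched_notrace() amounted to; only rcuidle callers also
take the SRCU read lock.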

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

Results of running the following command 30 times on a single-core x86
QEMU instance with 1GB of memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66%)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement over the "without
	       this series" case to be just noise.)
Median: 3.013
Std Dev: 0.033
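The parenthesized percentages for the RCU run match (baseline - RCU) / RCU,
i.e. the difference expressed relative to the RCU run's own time. Relative
to the baseline, the same slowdown is about 13% (mean) to 14% (median).
The helper below is a reconstruction of that arithmetic from the rounded
figures above, not part of the original measurements; the residual
difference of about 0.02 percentage points may come from that rounding.

```c
/* Percentage by which `x` differs from the reference value `ref`. */
static double relative_pct(double x, double ref)
{
	return (x - ref) / ref * 100.0;
}
```

For example, relative_pct(3.048, 3.451) gives about -11.68 and
relative_pct(3.451, 3.048) about +13.22. For the SRCU row,
relative_pct(3.020, 3.048) is about -0.92%, a difference of 0.028 s that
is smaller than one baseline standard deviation (0.064 s), which is
consistent with treating it as noise.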

[1] https://patchwork.kernel.org/patch/10344297/

[remove rcu_read_lock_sched_notrace, as it's the equivalent of
preempt_disable_notrace and is unnecessary to call in tracepoint code]
Link: http://lkml.kernel.org/r/20180730222423.196630-3-joel@joelfernandes.org

Cleaned-up-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ Simplified WARN_ON_ONCE() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/tracepoint.h | 40 ++++++++++++++++++++++++++++++--------
 kernel/tracepoint.c        | 16 ++++++++++++++-
 2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 19a690b559ca..d9a084c72541 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,31 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/* srcu can't be used from NMI */			\
+		WARN_ON_ONCE(rcuidle && in_nmi());			\
+									\
+		/* keep srcu and sched-rcu usage consistent */		\
+		preempt_disable_notrace();				\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle)						\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+									\
+		it_func_ptr = rcu_dereference_raw((tp)->funcs);		\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +170,11 @@ extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle)						\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+									\
+		preempt_enable_notrace();				\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 6dc6356c3327..955148d91b74 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@ static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU.
+		 * By calling the SRCU callback from the sched RCU callback, we
+		 * cover both cases: chaining the SRCU and sched RCU callbacks
+		 * waits for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }
-- 
2.18.0
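release_probes() does not free the old probe array directly: it queues a
sched-RCU callback whose only job is to queue an SRCU callback, and only
the SRCU callback calls kfree(). The toy userspace model below sketches
why the free cannot happen until a sched-RCU grace period and then a
later SRCU grace period have both ended. All names are hypothetical, and
real grace periods are replaced by explicit toy_end_grace_period() calls
that run whatever callbacks were queued beforehand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy callback record, standing in for struct rcu_head. */
struct toy_head {
	void (*func)(struct toy_head *head);
	struct toy_head *next;
};

/* One pending-callback list per RCU flavor. */
struct toy_flavor {
	struct toy_head *pending;
};

static struct toy_flavor toy_sched_rcu, toy_srcu;

/* Queue a callback, like call_rcu_sched() / call_srcu(). */
static void toy_call(struct toy_flavor *f, struct toy_head *head,
		     void (*func)(struct toy_head *head))
{
	head->func = func;
	head->next = f->pending;
	f->pending = head;
}

/* End of a grace period: run callbacks queued before it. Callbacks
 * queued while these run must wait for the next grace period. */
static void toy_end_grace_period(struct toy_flavor *f)
{
	struct toy_head *list = f->pending;

	f->pending = NULL;
	while (list) {
		struct toy_head *next = list->next;

		list->func(list);
		list = next;
	}
}

/* Stand-in for struct tp_probes; `freed` replaces the actual kfree(). */
struct toy_probes {
	bool freed;
	struct toy_head rcu;
};

static void toy_srcu_free_old_probes(struct toy_head *head)
{
	struct toy_probes *p = (struct toy_probes *)
		((char *)head - offsetof(struct toy_probes, rcu));

	p->freed = true;
}

/* Sched-RCU callback: chain onto SRCU instead of freeing. */
static void toy_rcu_free_old_probes(struct toy_head *head)
{
	toy_call(&toy_srcu, head, toy_srcu_free_old_probes);
}

static void toy_release_probes(struct toy_probes *p)
{
	toy_call(&toy_sched_rcu, &p->rcu, toy_rcu_free_old_probes);
}
```

Because the SRCU callback is only queued once the sched-RCU grace period
has elapsed, an SRCU grace period that ends earlier does not count; the
free always waits for sched RCU first and SRCU second.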


