The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH 0/2] [GIT PULL] tracing: performance boosts for recording traces
@ 2009-11-12  4:41 Steven Rostedt
  2009-11-12  4:41 ` [PATCH 1/2] [PATCH 1/2] ring-buffer: Add multiple iterations between benchmark timestamps Steven Rostedt
  2009-11-12  4:41 ` [PATCH 2/2] [PATCH 2/2] tracing: do not disable interrupts for trace_clock_local Steven Rostedt
  0 siblings, 2 replies; 3+ messages in thread
From: Steven Rostedt @ 2009-11-12  4:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Frederic Weisbecker,
	Mathieu Desnoyers


Ingo,

The first patch is really a fix to the ring_buffer_benchmark. It was
not giving a correct representation of the time of a trace. The
logic to time the trace was 1/3 of the measurement.

The second patch removes the disabling of interrupts in trace_clock_local.
This showed a 30 to 40 ns latency.

I'm still seeing the time stamp being a large overhead in the tracing.
I'll be posting an RFC patch set on using normalizing the timestamp
on read, to take some of the work out of the write side. This
drops the write from 179 ns to 151 ns. More about that later.

Please pull the latest tip/tracing/core tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/core


Steven Rostedt (2):
      ring-buffer: Add multiple iterations between benchmark timestamps
      tracing: do not disable interrupts for trace_clock_local

----
 kernel/trace/ring_buffer_benchmark.c |   25 ++++++++++++++++---------
 kernel/trace/trace_clock.c           |    8 +++++---
 2 files changed, 21 insertions(+), 12 deletions(-)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] [PATCH 1/2] ring-buffer: Add multiple iterations between benchmark timestamps
  2009-11-12  4:41 [PATCH 0/2] [GIT PULL] tracing: performance boosts for recording traces Steven Rostedt
@ 2009-11-12  4:41 ` Steven Rostedt
  2009-11-12  4:41 ` [PATCH 2/2] [PATCH 2/2] tracing: do not disable interrupts for trace_clock_local Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2009-11-12  4:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Frederic Weisbecker,
	Mathieu Desnoyers

[-- Attachment #1: 0001-ring-buffer-Add-multiple-iterations-between-benchmar.patch --]
[-- Type: text/plain, Size: 4453 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The ring_buffer_benchmark does a gettimeofday after every write to the
ring buffer in its measurements. This adds the overhead of the call
to gettimeofday to the measurements and does not give an accurate picture
of the length of time it takes to record a trace.

This was first noticed with perf top:

------------------------------------------------------------------------------
   PerfTop:     679 irqs/sec  kernel:99.9% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             1673.00 - 27.8% : trace_clock_local
              806.00 - 13.4% : do_gettimeofday
              590.00 -  9.8% : rb_reserve_next_event
              554.00 -  9.2% : native_read_tsc
              431.00 -  7.2% : ring_buffer_lock_reserve
              365.00 -  6.1% : __rb_reserve_next
              355.00 -  5.9% : rb_end_commit
              322.00 -  5.4% : getnstimeofday
              268.00 -  4.5% : ring_buffer_unlock_commit
              262.00 -  4.4% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              113.00 -  1.9% : read_tsc
               91.00 -  1.5% : debug_smp_processor_id
               69.00 -  1.1% : trace_recursive_unlock
               66.00 -  1.1% : ring_buffer_event_data
               25.00 -  0.4% : _spin_unlock_irq

And the length of each write to the ring buffer measured at 310ns.

This patch adds a new module parameter called "write_interval" which is
defaulted to 50. This is the number of writes performed between
timestamps. After this patch perf top shows:

------------------------------------------------------------------------------
   PerfTop:     244 irqs/sec  kernel:100.0% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             2842.00 - 40.4% : trace_clock_local
             1043.00 - 14.8% : rb_reserve_next_event
              784.00 - 11.1% : ring_buffer_lock_reserve
              600.00 -  8.5% : __rb_reserve_next
              579.00 -  8.2% : rb_end_commit
              440.00 -  6.3% : ring_buffer_unlock_commit
              290.00 -  4.1% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              155.00 -  2.2% : debug_smp_processor_id
              117.00 -  1.7% : trace_recursive_unlock
              103.00 -  1.5% : ring_buffer_event_data
               28.00 -  0.4% : do_gettimeofday
               22.00 -  0.3% : _spin_unlock_irq
               14.00 -  0.2% : native_read_tsc
               11.00 -  0.2% : getnstimeofday

do_gettimeofday dropped from 13% usage to a mere 0.4%! (using the default
50 interval)  The measurement for each timestamp went from 310ns to 210ns.
That's 100ns (1/3rd) overhead that the gettimeofday call was introducing.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer_benchmark.c |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/ring_buffer_benchmark.c b/kernel/trace/ring_buffer_benchmark.c
index 573d3cc..70df73e 100644
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -35,6 +35,10 @@ static int disable_reader;
 module_param(disable_reader, uint, 0644);
 MODULE_PARM_DESC(disable_reader, "only run producer");
 
+static int write_iteration = 50;
+module_param(write_iteration, uint, 0644);
+MODULE_PARM_DESC(write_iteration, "# of writes between timestamp readings");
+
 static int read_events;
 
 static int kill_test;
@@ -208,15 +212,18 @@ static void ring_buffer_producer(void)
 	do {
 		struct ring_buffer_event *event;
 		int *entry;
-
-		event = ring_buffer_lock_reserve(buffer, 10);
-		if (!event) {
-			missed++;
-		} else {
-			hit++;
-			entry = ring_buffer_event_data(event);
-			*entry = smp_processor_id();
-			ring_buffer_unlock_commit(buffer, event);
+		int i;
+
+		for (i = 0; i < write_iteration; i++) {
+			event = ring_buffer_lock_reserve(buffer, 10);
+			if (!event) {
+				missed++;
+			} else {
+				hit++;
+				entry = ring_buffer_event_data(event);
+				*entry = smp_processor_id();
+				ring_buffer_unlock_commit(buffer, event);
+			}
 		}
 		do_gettimeofday(&end_tv);
 
-- 
1.6.5



^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] [PATCH 2/2] tracing: do not disable interrupts for trace_clock_local
  2009-11-12  4:41 [PATCH 0/2] [GIT PULL] tracing: performance boosts for recording traces Steven Rostedt
  2009-11-12  4:41 ` [PATCH 1/2] [PATCH 1/2] ring-buffer: Add multiple iterations between benchmark timestamps Steven Rostedt
@ 2009-11-12  4:41 ` Steven Rostedt
  1 sibling, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2009-11-12  4:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Frederic Weisbecker,
	Mathieu Desnoyers

[-- Attachment #1: 0002-tracing-do-not-disable-interrupts-for-trace_clock_lo.patch --]
[-- Type: text/plain, Size: 4066 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Disabling interrupts in trace_clock_local takes quite a performance
hit to the recording of traces. Using perf top we see:

------------------------------------------------------------------------------
   PerfTop:     244 irqs/sec  kernel:100.0% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             2842.00 - 40.4% : trace_clock_local
             1043.00 - 14.8% : rb_reserve_next_event
              784.00 - 11.1% : ring_buffer_lock_reserve
              600.00 -  8.5% : __rb_reserve_next
              579.00 -  8.2% : rb_end_commit
              440.00 -  6.3% : ring_buffer_unlock_commit
              290.00 -  4.1% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              155.00 -  2.2% : debug_smp_processor_id
              117.00 -  1.7% : trace_recursive_unlock
              103.00 -  1.5% : ring_buffer_event_data
               28.00 -  0.4% : do_gettimeofday
               22.00 -  0.3% : _spin_unlock_irq
               14.00 -  0.2% : native_read_tsc
               11.00 -  0.2% : getnstimeofday

Where trace_clock_local is 40% of the tracing, and the time for recording
a trace according to ring_buffer_benchmark is 210ns. After converting
the interrupts to preemption disabling we have from perf top:

------------------------------------------------------------------------------
   PerfTop:    1084 irqs/sec  kernel:99.9% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             1277.00 - 16.8% : native_read_tsc
             1148.00 - 15.1% : rb_reserve_next_event
              896.00 - 11.8% : ring_buffer_lock_reserve
              688.00 -  9.1% : __rb_reserve_next
              664.00 -  8.8% : rb_end_commit
              563.00 -  7.4% : ring_buffer_unlock_commit
              508.00 -  6.7% : _spin_unlock_irq
              365.00 -  4.8% : debug_smp_processor_id
              321.00 -  4.2% : trace_clock_local
              303.00 -  4.0% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              273.00 -  3.6% : native_sched_clock
              122.00 -  1.6% : trace_recursive_unlock
              113.00 -  1.5% : sched_clock
              101.00 -  1.3% : ring_buffer_event_data
               53.00 -  0.7% : tick_nohz_stop_sched_tick

Where trace_clock_local drops from 40% to only taking 4% of the total time.
The trace time also goes from 210ns down to 179ns (31ns).

I talked with Peter Zijlstra about the impact that sched_clock may have
without having interrupts disabled, and he told me that if a timer interrupt
comes in, sched_clock may report a wrong time.

Balancing a seldom incorrect timestamp with a 15% performance boost, I'll
take the performance boost.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_clock.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 20c5f92..878c03f 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -20,6 +20,8 @@
 #include <linux/ktime.h>
 #include <linux/trace_clock.h>
 
+#include "trace.h"
+
 /*
  * trace_clock_local(): the simplest and least coherent tracing clock.
  *
@@ -28,17 +30,17 @@
  */
 u64 notrace trace_clock_local(void)
 {
-	unsigned long flags;
 	u64 clock;
+	int resched;
 
 	/*
 	 * sched_clock() is an architecture implemented, fast, scalable,
 	 * lockless clock. It is not guaranteed to be coherent across
 	 * CPUs, nor across CPU idle events.
 	 */
-	raw_local_irq_save(flags);
+	resched = ftrace_preempt_disable();
 	clock = sched_clock();
-	raw_local_irq_restore(flags);
+	ftrace_preempt_enable(resched);
 
 	return clock;
 }
-- 
1.6.5



^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-11-12  4:49 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-12  4:41 [PATCH 0/2] [GIT PULL] tracing: performance boosts for recording traces Steven Rostedt
2009-11-12  4:41 ` [PATCH 1/2] [PATCH 1/2] ring-buffer: Add multiple iterations between benchmark timestamps Steven Rostedt
2009-11-12  4:41 ` [PATCH 2/2] [PATCH 2/2] tracing: do not disable interrupts for trace_clock_local Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox