public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
@ 2009-05-12  4:08 Steven Rostedt
  2009-05-12  4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton


Ingo,

This patch series tunes the ring buffer to be a bit faster. I used
the ring-buffer-benchmark test to help give a good idea of the
performance of the buffer. I ran it on a 2.8 GHz 4-way box on an
idle system. I only wanted to test the write side without the reader, since
the reader can produce some cacheline bouncing. To do this I inserted
the benchmark module with the "disable_reader=1" option.

Note, when I disable the ring buffer and run the test, I get an average
of 87 ns. Thus the overhead of the test is 87 ns, and I will show both
the full time and the time with the 87 ns subtracted (in parentheses).

I'm also including the size of the ring_buffer.o object since some changes
helped in shrinking the text segments too.

Before the patch series:

benchmark:  307 ns (220 ns)
   text    data     bss     dec     hex filename
  16554      24      12   16590    40ce kernel/trace/ring_buffer.o
          
         
commit 1cd8d7358948909ab80b254eb14bcebc555ad417
ring-buffer: remove type parameter from rb_reserve_next_event
             
benchmark: 302 ns (215 ns)
   text    data     bss     dec     hex filename
  16538      24      12   16574    40be kernel/trace/ring_buffer.o
    
commit be957c447f7233a67904a1b11eb3ab61e702bf4d
ring-buffer: move calculation of event length

benchmark: 293 ns (206 ns)
   text    data     bss     dec     hex filename
  16490      24      12   16526    408e kernel/trace/ring_buffer.o
    
commit 0f0c85fc80adbbd2265d89867d743f929d516805
ring-buffer: small optimizations

benchmark: 285 ns (198 ns)
   text    data     bss     dec     hex filename
  16474      24      12   16510    407e kernel/trace/ring_buffer.o

commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9
ring-buffer: use internal time stamp function

benchmark: 282 ns (195 ns)
   text    data     bss     dec     hex filename
  16474      24      12   16510    407e kernel/trace/ring_buffer.o


commit 168b6b1d0594c7866caa73b12f3b8d91075695f2
ring-buffer: move code around to remove some branches

benchmark: 270 ns (183 ns)
   text    data     bss     dec     hex filename
  16490      24      12   16526    408e kernel/trace/ring_buffer.o

Thus we went from an average of 220 ns per recording to 183 ns,
which is about a 17% performance gain.

For your information:

Adding a reader that reads via pages (like splice), the time jumps to
326 ns.

Adding a reader that reads event by event, it jumps (with lots of
overruns) to 469 ns.

But with the ring buffer disabled, the overhead for the test jumps from
87 ns to 113 ns, making the ring buffer cost with a busy reader 213 ns
and 356 ns respectively.

Please pull the latest tip/tracing/ftrace tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/ftrace


Steven Rostedt (5):
      ring-buffer: remove type parameter from rb_reserve_next_event
      ring-buffer: move calculation of event length
      ring-buffer: small optimizations
      ring-buffer: use internal time stamp function
      ring-buffer: move code around to remove some branches

----
 kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++++-------------------
 1 files changed, 36 insertions(+), 27 deletions(-)
-- 


* [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
@ 2009-05-12  4:08 ` Steven Rostedt
  2009-05-12  4:08 ` [PATCH 2/5] ring-buffer: move calculation of event length Steven Rostedt
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton

[-- Attachment #1: 0001-ring-buffer-remove-type-parameter-from-rb_reserve_n.patch --]
[-- Type: text/plain, Size: 2003 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

rb_reserve_next_event() is only called for the data type (type = 0).
There is no reason to pass the type in to the function.

Before:
   text    data     bss     dec     hex filename
  16554      24      12   16590    40ce kernel/trace/ring_buffer.o

After:
   text    data     bss     dec     hex filename
  16538      24      12   16574    40be kernel/trace/ring_buffer.o

[ Impact: cleaner, smaller and slightly more efficient code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3611706..fe40f6c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1389,7 +1389,7 @@ rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
 
 static struct ring_buffer_event *
 rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
-		      unsigned type, unsigned long length)
+		      unsigned long length)
 {
 	struct ring_buffer_event *event;
 	u64 ts, delta;
@@ -1448,7 +1448,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 		/* Non commits have zero deltas */
 		delta = 0;
 
-	event = __rb_reserve_next(cpu_buffer, type, length, &ts);
+	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
 	if (PTR_ERR(event) == -EAGAIN)
 		goto again;
 
@@ -1556,7 +1556,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	if (length > BUF_PAGE_SIZE)
 		goto out;
 
-	event = rb_reserve_next_event(cpu_buffer, 0, length);
+	event = rb_reserve_next_event(cpu_buffer, length);
 	if (!event)
 		goto out;
 
@@ -1782,7 +1782,7 @@ int ring_buffer_write(struct ring_buffer *buffer,
 		goto out;
 
 	event_length = rb_calculate_event_length(length);
-	event = rb_reserve_next_event(cpu_buffer, 0, event_length);
+	event = rb_reserve_next_event(cpu_buffer, event_length);
 	if (!event)
 		goto out;
 
-- 
1.6.2.4

-- 


* [PATCH 2/5] ring-buffer: move calculation of event length
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
  2009-05-12  4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
@ 2009-05-12  4:08 ` Steven Rostedt
  2009-05-12  4:08 ` [PATCH 3/5] ring-buffer: small optimizations Steven Rostedt
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton

[-- Attachment #1: 0002-ring-buffer-move-calculation-of-event-length.patch --]
[-- Type: text/plain, Size: 2454 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The event length is calculated and passed in to rb_reserve_next_event
in two different locations. Having rb_reserve_next_event do the
calculation directly leaves only one location to change and lets
gcc inline the calculation.

Before:
   text    data     bss     dec     hex filename
  16538      24      12   16574    40be kernel/trace/ring_buffer.o

After:
   text    data     bss     dec     hex filename
  16490      24      12   16526    408e kernel/trace/ring_buffer.o

[ Impact: smaller more efficient code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fe40f6c..493cba4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -367,6 +367,9 @@ static inline int test_time_stamp(u64 delta)
 
 #define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
 
+/* Max payload is BUF_PAGE_SIZE - header (8bytes) */
+#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
+
 int ring_buffer_print_page_header(struct trace_seq *s)
 {
 	struct buffer_data_page field;
@@ -1396,6 +1399,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	int commit = 0;
 	int nr_loops = 0;
 
+	length = rb_calculate_event_length(length);
  again:
 	/*
 	 * We allow for interrupts to reenter here and do a trace.
@@ -1552,8 +1556,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	if (atomic_read(&cpu_buffer->record_disabled))
 		goto out;
 
-	length = rb_calculate_event_length(length);
-	if (length > BUF_PAGE_SIZE)
+	if (length > BUF_MAX_DATA_SIZE)
 		goto out;
 
 	event = rb_reserve_next_event(cpu_buffer, length);
@@ -1758,7 +1761,6 @@ int ring_buffer_write(struct ring_buffer *buffer,
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
 	struct ring_buffer_event *event;
-	unsigned long event_length;
 	void *body;
 	int ret = -EBUSY;
 	int cpu, resched;
@@ -1781,8 +1783,10 @@ int ring_buffer_write(struct ring_buffer *buffer,
 	if (atomic_read(&cpu_buffer->record_disabled))
 		goto out;
 
-	event_length = rb_calculate_event_length(length);
-	event = rb_reserve_next_event(cpu_buffer, event_length);
+	if (length > BUF_MAX_DATA_SIZE)
+		goto out;
+
+	event = rb_reserve_next_event(cpu_buffer, length);
 	if (!event)
 		goto out;
 
-- 
1.6.2.4

-- 


* [PATCH 3/5] ring-buffer: small optimizations
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
  2009-05-12  4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
  2009-05-12  4:08 ` [PATCH 2/5] ring-buffer: move calculation of event length Steven Rostedt
@ 2009-05-12  4:08 ` Steven Rostedt
  2009-05-12  4:08 ` [PATCH 4/5] ring-buffer: use internal time stamp function Steven Rostedt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton

[-- Attachment #1: 0003-ring-buffer-small-optimizations.patch --]
[-- Type: text/plain, Size: 1976 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

A few small changes in the fast path of ring buffer recording
save over 3% in the ring-buffer-benchmark test.

[ Impact: a little faster ring buffer recording ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 493cba4..f452de2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1000,7 +1000,7 @@ rb_event_index(struct ring_buffer_event *event)
 	return (addr & ~PAGE_MASK) - (PAGE_SIZE - BUF_PAGE_SIZE);
 }
 
-static int
+static inline int
 rb_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
 	     struct ring_buffer_event *event)
 {
@@ -1423,9 +1423,9 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	 * also be made. But only the entry that did the actual
 	 * commit will be something other than zero.
 	 */
-	if (cpu_buffer->tail_page == cpu_buffer->commit_page &&
-	    rb_page_write(cpu_buffer->tail_page) ==
-	    rb_commit_index(cpu_buffer)) {
+	if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
+		   rb_page_write(cpu_buffer->tail_page) ==
+		   rb_commit_index(cpu_buffer))) {
 
 		delta = ts - cpu_buffer->write_stamp;
 
@@ -1436,7 +1436,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 		if (unlikely(ts < cpu_buffer->write_stamp))
 			delta = 0;
 
-		if (test_time_stamp(delta)) {
+		else if (unlikely(test_time_stamp(delta))) {
 
 			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
 
@@ -1470,7 +1470,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	 * If the timestamp was commited, make the commit our entry
 	 * now so that we will update it when needed.
 	 */
-	if (commit)
+	if (unlikely(commit))
 		rb_set_commit_event(cpu_buffer, event);
 	else if (!rb_is_commit(cpu_buffer, event))
 		delta = 0;
-- 
1.6.2.4

-- 


* [PATCH 4/5] ring-buffer: use internal time stamp function
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
                   ` (2 preceding siblings ...)
  2009-05-12  4:08 ` [PATCH 3/5] ring-buffer: small optimizations Steven Rostedt
@ 2009-05-12  4:08 ` Steven Rostedt
  2009-05-12  4:08 ` [PATCH 5/5] ring-buffer: move code around to remove some branches Steven Rostedt
  2009-05-12  8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
  5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton

[-- Attachment #1: 0004-ring-buffer-use-internal-time-stamp-function.patch --]
[-- Type: text/plain, Size: 1955 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

The exported ring_buffer_time_stamp() adds a little more overhead
than is needed when using it internally. This patch adds an internal
timestamp function that can be inlined (a single-line function)
and used internally by the ring buffer.

[ Impact: a little less overhead to the ring buffer ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f452de2..a9e645a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -454,13 +454,18 @@ struct ring_buffer_iter {
 /* Up this if you want to test the TIME_EXTENTS and normalization */
 #define DEBUG_SHIFT 0
 
+static inline u64 rb_time_stamp(struct ring_buffer *buffer, int cpu)
+{
+	/* shift to debug/test normalization and TIME_EXTENTS */
+	return buffer->clock() << DEBUG_SHIFT;
+}
+
 u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu)
 {
 	u64 time;
 
 	preempt_disable_notrace();
-	/* shift to debug/test normalization and TIME_EXTENTS */
-	time = buffer->clock() << DEBUG_SHIFT;
+	time = rb_time_stamp(buffer, cpu);
 	preempt_enable_no_resched_notrace();
 
 	return time;
@@ -1247,7 +1252,7 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
 		cpu_buffer->tail_page = next_page;
 
 		/* reread the time stamp */
-		*ts = ring_buffer_time_stamp(buffer, cpu_buffer->cpu);
+		*ts = rb_time_stamp(buffer, cpu_buffer->cpu);
 		cpu_buffer->tail_page->page->time_stamp = *ts;
 	}
 
@@ -1413,7 +1418,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
 		return NULL;
 
-	ts = ring_buffer_time_stamp(cpu_buffer->buffer, cpu_buffer->cpu);
+	ts = rb_time_stamp(cpu_buffer->buffer, cpu_buffer->cpu);
 
 	/*
 	 * Only the first commit can update the timestamp.
-- 
1.6.2.4

-- 


* [PATCH 5/5] ring-buffer: move code around to remove some branches
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
                   ` (3 preceding siblings ...)
  2009-05-12  4:08 ` [PATCH 4/5] ring-buffer: use internal time stamp function Steven Rostedt
@ 2009-05-12  4:08 ` Steven Rostedt
  2009-05-12  8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
  5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton

[-- Attachment #1: 0005-ring-buffer-move-code-around-to-remove-some-branche.patch --]
[-- Type: text/plain, Size: 2055 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

These are a few micro-optimizations. But since the ring buffer is used
in tracing every function call, it is an extremely hot path. Every
nanosecond counts.

This change shows over 5% improvement in the ring-buffer-benchmark.

[ Impact: more efficient code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/ring_buffer.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a9e645a..16b24d4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1400,7 +1400,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 		      unsigned long length)
 {
 	struct ring_buffer_event *event;
-	u64 ts, delta;
+	u64 ts, delta = 0;
 	int commit = 0;
 	int nr_loops = 0;
 
@@ -1431,20 +1431,21 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 	if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
 		   rb_page_write(cpu_buffer->tail_page) ==
 		   rb_commit_index(cpu_buffer))) {
+		u64 diff;
 
-		delta = ts - cpu_buffer->write_stamp;
+		diff = ts - cpu_buffer->write_stamp;
 
-		/* make sure this delta is calculated here */
+		/* make sure this diff is calculated here */
 		barrier();
 
 		/* Did the write stamp get updated already? */
 		if (unlikely(ts < cpu_buffer->write_stamp))
-			delta = 0;
+			goto get_event;
 
-		else if (unlikely(test_time_stamp(delta))) {
+		delta = diff;
+		if (unlikely(test_time_stamp(delta))) {
 
 			commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
-
 			if (commit == -EBUSY)
 				return NULL;
 
@@ -1453,12 +1454,11 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
 
 			RB_WARN_ON(cpu_buffer, commit < 0);
 		}
-	} else
-		/* Non commits have zero deltas */
-		delta = 0;
+	}
 
+ get_event:
 	event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
-	if (PTR_ERR(event) == -EAGAIN)
+	if (unlikely(PTR_ERR(event) == -EAGAIN))
 		goto again;
 
 	if (!event) {
-- 
1.6.2.4

-- 


* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
                   ` (4 preceding siblings ...)
  2009-05-12  4:08 ` [PATCH 5/5] ring-buffer: move code around to remove some branches Steven Rostedt
@ 2009-05-12  8:33 ` Ingo Molnar
  2009-05-12 13:27   ` Steven Rostedt
  5 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2009-05-12  8:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Andrew Morton


* Steven Rostedt <rostedt@goodmis.org> wrote:

> Ingo,
> 
> This patch series tunes the ring buffer to be a bit faster. I used 
> the ring-buffer-benchmark test to help give a good idea of the 
> performance of the buffer. I ran it on a 2.8 GHz 4-way box on an 
> idle system. I only wanted to test the write side without the reader, 
> since the reader can produce some cacheline bouncing. To do this I 
> inserted the benchmark module with the "disable_reader=1" option.
> 
> Note, when I disable the ring buffer and run the test, I get an 
> average of 87 ns. Thus the overhead of the test is 87 ns, and I 
> will show both the full time and the time with the 87 ns subtracted 
> (in parentheses).
> 
> I'm also including the size of the ring_buffer.o object since some 
> changes helped in shrinking the text segments too.
> 
> Before the patch series:
> 
> benchmark:  307 ns (220 ns)
>    text    data     bss     dec     hex filename
>   16554      24      12   16590    40ce kernel/trace/ring_buffer.o
>           
>          
> commit 1cd8d7358948909ab80b254eb14bcebc555ad417
> ring-buffer: remove type parameter from rb_reserve_next_event
>              
> benchmark: 302 ns (215 ns)
>    text    data     bss     dec     hex filename
>   16538      24      12   16574    40be kernel/trace/ring_buffer.o
>     
> commit be957c447f7233a67904a1b11eb3ab61e702bf4d
> ring-buffer: move calculation of event length
> 
> benchmark: 293 ns (206 ns)
>    text    data     bss     dec     hex filename
>   16490      24      12   16526    408e kernel/trace/ring_buffer.o
>     
> commit 0f0c85fc80adbbd2265d89867d743f929d516805
> ring-buffer: small optimizations
> 
> benchmark: 285 ns (198 ns)
>    text    data     bss     dec     hex filename
>   16474      24      12   16510    407e kernel/trace/ring_buffer.o
> 
> commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9
> ring-buffer: use internal time stamp function
> 
> benchmark: 282 ns (195 ns)
>    text    data     bss     dec     hex filename
>   16474      24      12   16510    407e kernel/trace/ring_buffer.o
> 
> 
> commit 168b6b1d0594c7866caa73b12f3b8d91075695f2
> ring-buffer: move code around to remove some branches
> 
> benchmark: 270 ns (183 ns)
>    text    data     bss     dec     hex filename
>   16490      24      12   16526    408e kernel/trace/ring_buffer.o
> 
> Thus we went from an average of 220 ns per recording to 183 ns,
> which is about a 17% performance gain.

Nice!

It's also interesting to see that text size went down when speed 
went up. I'm also wondering about these compiler options:

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  CONFIG_OPTIMIZE_INLINING=y

My guess is that the combo with the highest performance is:

  CONFIG_CC_OPTIMIZE_FOR_SIZE=y
  # CONFIG_OPTIMIZE_INLINING is not set

Especially if you run it on a fast box with a lot of caches and a 
modern x86 CPU.
 
> For your information:
> 
> Adding a reader that reads via pages (like splice), the time jumps to
> 326 ns.
> 
> Adding a reader that reads event by event, it jumps (with lots
> of overruns) to 469 ns.
> 
> But disabling the ring buffer, the overhead for the test jumps from 87 ns
> to 113 ns, making the ring buffer cost with busy reader: 213 ns and 356 ns.
> 
> Please pull the latest tip/tracing/ftrace tree, which can be found at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/ftrace
> 
> 
> Steven Rostedt (5):
>       ring-buffer: remove type parameter from rb_reserve_next_event
>       ring-buffer: move calculation of event length
>       ring-buffer: small optimizations
>       ring-buffer: use internal time stamp function
>       ring-buffer: move code around to remove some branches
> 
> ----
>  kernel/trace/ring_buffer.c |   63 +++++++++++++++++++++++++-------------------
>  1 files changed, 36 insertions(+), 27 deletions(-)

Pulled, thanks Steve!

	Ingo


* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12  8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
@ 2009-05-12 13:27   ` Steven Rostedt
  2009-05-12 14:25     ` Steven Rostedt
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 13:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Tue, 12 May 2009, Ingo Molnar wrote:

> 
> * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > 
> > Before the patch series:
> > 
> > benchmark:  307 ns (220 ns)
> >    text    data     bss     dec     hex filename
> >   16554      24      12   16590    40ce kernel/trace/ring_buffer.o
> >           
> >          
> > commit 1cd8d7358948909ab80b254eb14bcebc555ad417
> > ring-buffer: remove type parameter from rb_reserve_next_event
> >              
> > benchmark: 302 ns (215 ns)
> >    text    data     bss     dec     hex filename
> >   16538      24      12   16574    40be kernel/trace/ring_buffer.o
> >     
> > commit be957c447f7233a67904a1b11eb3ab61e702bf4d
> > ring-buffer: move calculation of event length
> > 
> > benchmark: 293 ns (206 ns)
> >    text    data     bss     dec     hex filename
> >   16490      24      12   16526    408e kernel/trace/ring_buffer.o
> >     
> > commit 0f0c85fc80adbbd2265d89867d743f929d516805
> > ring-buffer: small optimizations
> > 
> > benchmark: 285 ns (198 ns)
> >    text    data     bss     dec     hex filename
> >   16474      24      12   16510    407e kernel/trace/ring_buffer.o
> > 
> > commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9
> > ring-buffer: use internal time stamp function
> > 
> > benchmark: 282 ns (195 ns)
> >    text    data     bss     dec     hex filename
> >   16474      24      12   16510    407e kernel/trace/ring_buffer.o
> > 
> > 
> > commit 168b6b1d0594c7866caa73b12f3b8d91075695f2
> > ring-buffer: move code around to remove some branches
> > 
> > benchmark: 270 ns (183 ns)
> >    text    data     bss     dec     hex filename
> >   16490      24      12   16526    408e kernel/trace/ring_buffer.o
> > 
> > Thus we went from an average of 220 ns per recording, to 183 ns.
> > Which is about a 17% performance gain.
> 
> Nice!
> 
> It's also interesting to see that text size went down when speed 
> went up. I'm wondering how these compiler options:

But that was not always the case. The biggest boost in performance of the 
series (the last patch) also increased the size.

> 
>   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
>   CONFIG_OPTIMIZE_INLINING=y

Note, my test runs had both the above configure options disabled.
I'll run it again and see how they affect the results:

-- Steve



* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12 13:27   ` Steven Rostedt
@ 2009-05-12 14:25     ` Steven Rostedt
  2009-05-12 14:28       ` Steven Rostedt
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 14:25 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Tue, 12 May 2009, Steven Rostedt wrote:
> > 
> > It's also interesting to see that text size went down when speed 
> > went up. I'm wondering how these compiler options:
> 
> But that was not always the case. The biggest boost in performance of the 
> series (the last patch) also increased the size.
> 
> > 
> >   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> >   CONFIG_OPTIMIZE_INLINING=y
> 
> Note, my test runs had both the above configure options disabled.
> I'll run it again and see how they affect the results:

Here's the results:

size=n inline=n  270 
size=n inline=y  290
size=y inline=n  315
size=y inline=y  372 (ouch!)

Thus it seems to keep both optimizations off is best for performance.

-- Steve




* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12 14:25     ` Steven Rostedt
@ 2009-05-12 14:28       ` Steven Rostedt
  2009-05-12 14:33       ` Ingo Molnar
  2009-05-13 13:44       ` Steven Rostedt
  2 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 14:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Tue, 12 May 2009, Steven Rostedt wrote:
> 
> Here's the results:
> 
> size=n inline=n  270 
> size=n inline=y  290
> size=y inline=n  315
> size=y inline=y  372 (ouch!)
> 
> Thus it seems to keep both optimizations off is best for performance.

I did not run with ring buffer off on all tests, but the last one brought 
the overhead of the test up from 87 to 130.

-- Steve



* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12 14:25     ` Steven Rostedt
  2009-05-12 14:28       ` Steven Rostedt
@ 2009-05-12 14:33       ` Ingo Molnar
  2009-05-13 13:44       ` Steven Rostedt
  2 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2009-05-12 14:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Andrew Morton, Frédéric Weisbecker,
	Peter Zijlstra


* Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> On Tue, 12 May 2009, Steven Rostedt wrote:
> > > 
> > > It's also interesting to see that text size went down when speed 
> > > went up. I'm wondering how these compiler options:
> > 
> > But that was not always the case. The biggest boost in performance of the 
> > series (the last patch) also increased the size.
> > 
> > > 
> > >   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> > >   CONFIG_OPTIMIZE_INLINING=y
> > 
> > Note, my test runs had both the above configure options disabled.
> > I'll run it again and see how they affect the results:
> 
> Here's the results:
> 
> size=n inline=n  270 
> size=n inline=y  290
> size=y inline=n  315
> size=y inline=y  372 (ouch!)
> 
> Thus it seems to keep both optimizations off is best for performance.

ok. Maybe newer gcc does better with size=y. optimize-inline=n is 
not a surprise - this is in essence a micro-benchmark where inlining 
helps - and gcc's inliner isn't too fantastic either.

	Ingo


* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
  2009-05-12 14:25     ` Steven Rostedt
  2009-05-12 14:28       ` Steven Rostedt
  2009-05-12 14:33       ` Ingo Molnar
@ 2009-05-13 13:44       ` Steven Rostedt
  2 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-13 13:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton


On Tue, 12 May 2009, Steven Rostedt wrote:

> 
> On Tue, 12 May 2009, Steven Rostedt wrote:
> > > 
> > > It's also interesting to see that text size went down when speed 
> > > went up. I'm wondering how these compiler options:
> > 
> > But that was not always the case. The biggest boost in performance of the 
> > series (the last patch) also increased the size.
> > 
> > > 
> > >   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> > >   CONFIG_OPTIMIZE_INLINING=y
> > 
> > Note, my test runs had both the above configure options disabled.
> > I'll run it again and see how they affect the results:
> 
> Here's the results:
> 
> size=n inline=n  270 
> size=n inline=y  290
> size=y inline=n  315
> size=y inline=y  372 (ouch!)
> 
> Thus it seems to keep both optimizations off is best for performance.

Versions do make a difference. I just upgraded gcc on all my distcc 
boxes from 4.2.2 to 4.4.0 and here are the new results:

size=n inline=n  257
size=n inline=y  259
size=y inline=n  295
size=y inline=y  315

The gcc inlining got a little better, but optimizing for size still
seems to hurt.

-- Steve



end of thread, other threads:[~2009-05-13 13:45 UTC | newest]

Thread overview: 12+ messages
2009-05-12  4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
2009-05-12  4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
2009-05-12  4:08 ` [PATCH 2/5] ring-buffer: move calculation of event length Steven Rostedt
2009-05-12  4:08 ` [PATCH 3/5] ring-buffer: small optimizations Steven Rostedt
2009-05-12  4:08 ` [PATCH 4/5] ring-buffer: use internal time stamp function Steven Rostedt
2009-05-12  4:08 ` [PATCH 5/5] ring-buffer: move code around to remove some branches Steven Rostedt
2009-05-12  8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
2009-05-12 13:27   ` Steven Rostedt
2009-05-12 14:25     ` Steven Rostedt
2009-05-12 14:28       ` Steven Rostedt
2009-05-12 14:33       ` Ingo Molnar
2009-05-13 13:44       ` Steven Rostedt
