* [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
@ 2009-05-12 4:08 ` Steven Rostedt
2009-05-12 4:08 ` [PATCH 2/5] ring-buffer: move calculation of event length Steven Rostedt
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton
[-- Attachment #1: 0001-ring-buffer-remove-type-parameter-from-rb_reserve_n.patch --]
[-- Type: text/plain, Size: 2003 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
rb_reserve_next_event is only ever called for the data type (type = 0),
so there is no reason to pass the type in to the function.
Before:
text data bss dec hex filename
16554 24 12 16590 40ce kernel/trace/ring_buffer.o
After:
text data bss dec hex filename
16538 24 12 16574 40be kernel/trace/ring_buffer.o
[ Impact: cleaner, smaller and slightly more efficient code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3611706..fe40f6c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1389,7 +1389,7 @@ rb_add_time_stamp(struct ring_buffer_per_cpu *cpu_buffer,
static struct ring_buffer_event *
rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
- unsigned type, unsigned long length)
+ unsigned long length)
{
struct ring_buffer_event *event;
u64 ts, delta;
@@ -1448,7 +1448,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
/* Non commits have zero deltas */
delta = 0;
- event = __rb_reserve_next(cpu_buffer, type, length, &ts);
+ event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
if (PTR_ERR(event) == -EAGAIN)
goto again;
@@ -1556,7 +1556,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
if (length > BUF_PAGE_SIZE)
goto out;
- event = rb_reserve_next_event(cpu_buffer, 0, length);
+ event = rb_reserve_next_event(cpu_buffer, length);
if (!event)
goto out;
@@ -1782,7 +1782,7 @@ int ring_buffer_write(struct ring_buffer *buffer,
goto out;
event_length = rb_calculate_event_length(length);
- event = rb_reserve_next_event(cpu_buffer, 0, event_length);
+ event = rb_reserve_next_event(cpu_buffer, event_length);
if (!event)
goto out;
--
1.6.2.4
--
* [PATCH 2/5] ring-buffer: move calculation of event length
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
2009-05-12 4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
@ 2009-05-12 4:08 ` Steven Rostedt
2009-05-12 4:08 ` [PATCH 3/5] ring-buffer: small optimizations Steven Rostedt
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton
[-- Attachment #1: 0002-ring-buffer-move-calculation-of-event-length.patch --]
[-- Type: text/plain, Size: 2454 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The event length is calculated and passed in to rb_reserve_next_event
from two different locations. Having rb_reserve_next_event do the
calculation itself leaves only one place to change and lets gcc
inline the calculation.
Before:
text data bss dec hex filename
16538 24 12 16574 40be kernel/trace/ring_buffer.o
After:
text data bss dec hex filename
16490 24 12 16526 408e kernel/trace/ring_buffer.o
[ Impact: smaller more efficient code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 14 +++++++++-----
1 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fe40f6c..493cba4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -367,6 +367,9 @@ static inline int test_time_stamp(u64 delta)
#define BUF_PAGE_SIZE (PAGE_SIZE - BUF_PAGE_HDR_SIZE)
+/* Max payload is BUF_PAGE_SIZE - header (8bytes) */
+#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
+
int ring_buffer_print_page_header(struct trace_seq *s)
{
struct buffer_data_page field;
@@ -1396,6 +1399,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
int commit = 0;
int nr_loops = 0;
+ length = rb_calculate_event_length(length);
again:
/*
* We allow for interrupts to reenter here and do a trace.
@@ -1552,8 +1556,7 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
if (atomic_read(&cpu_buffer->record_disabled))
goto out;
- length = rb_calculate_event_length(length);
- if (length > BUF_PAGE_SIZE)
+ if (length > BUF_MAX_DATA_SIZE)
goto out;
event = rb_reserve_next_event(cpu_buffer, length);
@@ -1758,7 +1761,6 @@ int ring_buffer_write(struct ring_buffer *buffer,
{
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
- unsigned long event_length;
void *body;
int ret = -EBUSY;
int cpu, resched;
@@ -1781,8 +1783,10 @@ int ring_buffer_write(struct ring_buffer *buffer,
if (atomic_read(&cpu_buffer->record_disabled))
goto out;
- event_length = rb_calculate_event_length(length);
- event = rb_reserve_next_event(cpu_buffer, event_length);
+ if (length > BUF_MAX_DATA_SIZE)
+ goto out;
+
+ event = rb_reserve_next_event(cpu_buffer, length);
if (!event)
goto out;
--
1.6.2.4
--
* [PATCH 3/5] ring-buffer: small optimizations
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
2009-05-12 4:08 ` [PATCH 1/5] ring-buffer: remove type parameter from rb_reserve_next_event Steven Rostedt
2009-05-12 4:08 ` [PATCH 2/5] ring-buffer: move calculation of event length Steven Rostedt
@ 2009-05-12 4:08 ` Steven Rostedt
2009-05-12 4:08 ` [PATCH 4/5] ring-buffer: use internal time stamp function Steven Rostedt
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton
[-- Attachment #1: 0003-ring-buffer-small-optimizations.patch --]
[-- Type: text/plain, Size: 1976 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
A few small changes in the fast path of ring buffer recording
save over 3% in the ring-buffer-benchmark test.
[ Impact: a little faster ring buffer recording ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 493cba4..f452de2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1000,7 +1000,7 @@ rb_event_index(struct ring_buffer_event *event)
return (addr & ~PAGE_MASK) - (PAGE_SIZE - BUF_PAGE_SIZE);
}
-static int
+static inline int
rb_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
struct ring_buffer_event *event)
{
@@ -1423,9 +1423,9 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
* also be made. But only the entry that did the actual
* commit will be something other than zero.
*/
- if (cpu_buffer->tail_page == cpu_buffer->commit_page &&
- rb_page_write(cpu_buffer->tail_page) ==
- rb_commit_index(cpu_buffer)) {
+ if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
+ rb_page_write(cpu_buffer->tail_page) ==
+ rb_commit_index(cpu_buffer))) {
delta = ts - cpu_buffer->write_stamp;
@@ -1436,7 +1436,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
if (unlikely(ts < cpu_buffer->write_stamp))
delta = 0;
- if (test_time_stamp(delta)) {
+ else if (unlikely(test_time_stamp(delta))) {
commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
@@ -1470,7 +1470,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
* If the timestamp was commited, make the commit our entry
* now so that we will update it when needed.
*/
- if (commit)
+ if (unlikely(commit))
rb_set_commit_event(cpu_buffer, event);
else if (!rb_is_commit(cpu_buffer, event))
delta = 0;
--
1.6.2.4
--
* [PATCH 4/5] ring-buffer: use internal time stamp function
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
` (2 preceding siblings ...)
2009-05-12 4:08 ` [PATCH 3/5] ring-buffer: small optimizations Steven Rostedt
@ 2009-05-12 4:08 ` Steven Rostedt
2009-05-12 4:08 ` [PATCH 5/5] ring-buffer: move code around to remove some branches Steven Rostedt
2009-05-12 8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton
[-- Attachment #1: 0004-ring-buffer-use-internal-time-stamp-function.patch --]
[-- Type: text/plain, Size: 1955 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The exported ring_buffer_time_stamp adds a little more overhead than
is needed for internal use. This patch adds an internal time stamp
function (a single-line function) that can be inlined and used
internally by the ring buffer.
[ Impact: a little less overhead to the ring buffer ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 13 +++++++++----
1 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f452de2..a9e645a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -454,13 +454,18 @@ struct ring_buffer_iter {
/* Up this if you want to test the TIME_EXTENTS and normalization */
#define DEBUG_SHIFT 0
+static inline u64 rb_time_stamp(struct ring_buffer *buffer, int cpu)
+{
+ /* shift to debug/test normalization and TIME_EXTENTS */
+ return buffer->clock() << DEBUG_SHIFT;
+}
+
u64 ring_buffer_time_stamp(struct ring_buffer *buffer, int cpu)
{
u64 time;
preempt_disable_notrace();
- /* shift to debug/test normalization and TIME_EXTENTS */
- time = buffer->clock() << DEBUG_SHIFT;
+ time = rb_time_stamp(buffer, cpu);
preempt_enable_no_resched_notrace();
return time;
@@ -1247,7 +1252,7 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
cpu_buffer->tail_page = next_page;
/* reread the time stamp */
- *ts = ring_buffer_time_stamp(buffer, cpu_buffer->cpu);
+ *ts = rb_time_stamp(buffer, cpu_buffer->cpu);
cpu_buffer->tail_page->page->time_stamp = *ts;
}
@@ -1413,7 +1418,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
return NULL;
- ts = ring_buffer_time_stamp(cpu_buffer->buffer, cpu_buffer->cpu);
+ ts = rb_time_stamp(cpu_buffer->buffer, cpu_buffer->cpu);
/*
* Only the first commit can update the timestamp.
--
1.6.2.4
--
* [PATCH 5/5] ring-buffer: move code around to remove some branches
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
` (3 preceding siblings ...)
2009-05-12 4:08 ` [PATCH 4/5] ring-buffer: use internal time stamp function Steven Rostedt
@ 2009-05-12 4:08 ` Steven Rostedt
2009-05-12 8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
5 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 4:08 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton
[-- Attachment #1: 0005-ring-buffer-move-code-around-to-remove-some-branche.patch --]
[-- Type: text/plain, Size: 2055 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
These are a few micro-optimizations. But since the ring buffer is used
when tracing every function call, it is an extremely hot path. Every
nanosecond counts.
This change shows over 5% improvement in the ring-buffer-benchmark.
[ Impact: more efficient code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a9e645a..16b24d4 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1400,7 +1400,7 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
unsigned long length)
{
struct ring_buffer_event *event;
- u64 ts, delta;
+ u64 ts, delta = 0;
int commit = 0;
int nr_loops = 0;
@@ -1431,20 +1431,21 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
if (likely(cpu_buffer->tail_page == cpu_buffer->commit_page &&
rb_page_write(cpu_buffer->tail_page) ==
rb_commit_index(cpu_buffer))) {
+ u64 diff;
- delta = ts - cpu_buffer->write_stamp;
+ diff = ts - cpu_buffer->write_stamp;
- /* make sure this delta is calculated here */
+ /* make sure this diff is calculated here */
barrier();
/* Did the write stamp get updated already? */
if (unlikely(ts < cpu_buffer->write_stamp))
- delta = 0;
+ goto get_event;
- else if (unlikely(test_time_stamp(delta))) {
+ delta = diff;
+ if (unlikely(test_time_stamp(delta))) {
commit = rb_add_time_stamp(cpu_buffer, &ts, &delta);
-
if (commit == -EBUSY)
return NULL;
@@ -1453,12 +1454,11 @@ rb_reserve_next_event(struct ring_buffer_per_cpu *cpu_buffer,
RB_WARN_ON(cpu_buffer, commit < 0);
}
- } else
- /* Non commits have zero deltas */
- delta = 0;
+ }
+ get_event:
event = __rb_reserve_next(cpu_buffer, 0, length, &ts);
- if (PTR_ERR(event) == -EAGAIN)
+ if (unlikely(PTR_ERR(event) == -EAGAIN))
goto again;
if (!event) {
--
1.6.2.4
--
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 4:08 [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Steven Rostedt
` (4 preceding siblings ...)
2009-05-12 4:08 ` [PATCH 5/5] ring-buffer: move code around to remove some branches Steven Rostedt
@ 2009-05-12 8:33 ` Ingo Molnar
2009-05-12 13:27 ` Steven Rostedt
5 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2009-05-12 8:33 UTC (permalink / raw)
To: Steven Rostedt; +Cc: linux-kernel, Andrew Morton
* Steven Rostedt <rostedt@goodmis.org> wrote:
> Ingo,
>
> This patch series tunes the ring buffer to be a bit faster. I used
> the ring-buffer-benchmark test to help give a good idea of the
> performance of the buffer. I ran it on a 2.8 GHz 4-way box on an
> idle system. I only wanted to test the write side without the reader,
> since the reader can produce some cacheline bouncing. To do this I
> inserted the benchmark module with the "disable_reader=1" option.
>
> Note, when I disable the ring buffer and run the test, I get an
> average of 87 ns. Thus the overhead of the test is 87 ns, and I
> will show both the full time and the time with the 87 ns subtracted
> (in parentheses).
>
> I'm also including the size of the ring_buffer.o object since some
> changes helped in shrinking the text segments too.
>
> Before the patch series:
>
> benchmark: 307 ns (220 ns)
> text data bss dec hex filename
> 16554 24 12 16590 40ce kernel/trace/ring_buffer.o
>
>
> commit 1cd8d7358948909ab80b254eb14bcebc555ad417
> ring-buffer: remove type parameter from rb_reserve_next_event
>
> benchmark: 302 ns (215 ns)
> text data bss dec hex filename
> 16538 24 12 16574 40be kernel/trace/ring_buffer.o
>
> commit be957c447f7233a67904a1b11eb3ab61e702bf4d
> ring-buffer: move calculation of event length
>
> benchmark: 293 ns (206 ns)
> text data bss dec hex filename
> 16490 24 12 16526 408e kernel/trace/ring_buffer.o
>
> commit 0f0c85fc80adbbd2265d89867d743f929d516805
> ring-buffer: small optimizations
>
> benchmark: 285 ns (198 ns)
> text data bss dec hex filename
> 16474 24 12 16510 407e kernel/trace/ring_buffer.o
>
> commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9
> ring-buffer: use internal time stamp function
>
> benchmark: 282 ns (195 ns)
> text data bss dec hex filename
> 16474 24 12 16510 407e kernel/trace/ring_buffer.o
>
>
> commit 168b6b1d0594c7866caa73b12f3b8d91075695f2
> ring-buffer: move code around to remove some branches
>
> benchmark: 270 ns (183 ns)
> text data bss dec hex filename
> 16490 24 12 16526 408e kernel/trace/ring_buffer.o
>
> Thus we went from an average of 220 ns per recording, to 183 ns.
> Which is about a 17% performance gain.
Nice!
It's also interesting to see that text size went down when speed
went up. I'm wondering how these compiler options affect the results:
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_OPTIMIZE_INLINING=y
My guess is that the combo with the highest performance is:
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_OPTIMIZE_INLINING is not set
Especially if you run it on a fast box with a lot of caches and a
modern x86 CPU.
> For your information:
>
> Adding a reader that reads via pages (like splice), the time jumps to
> 326 ns.
>
> Adding a reader that reads event by event, it jumps to 469 ns
> (with lots of overruns).
>
> But when a reader is attached and the ring buffer is disabled, the
> overhead of the test jumps from 87 ns to 113 ns, making the ring
> buffer cost with a busy reader 213 ns and 356 ns respectively.
>
> Please pull the latest tip/tracing/ftrace tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/ftrace
>
>
> Steven Rostedt (5):
> ring-buffer: remove type parameter from rb_reserve_next_event
> ring-buffer: move calculation of event length
> ring-buffer: small optimizations
> ring-buffer: use internal time stamp function
> ring-buffer: move code around to remove some branches
>
> ----
> kernel/trace/ring_buffer.c | 63 +++++++++++++++++++++++++-------------------
> 1 files changed, 36 insertions(+), 27 deletions(-)
Pulled, thanks Steve!
Ingo
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 8:33 ` [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase Ingo Molnar
@ 2009-05-12 13:27 ` Steven Rostedt
2009-05-12 14:25 ` Steven Rostedt
0 siblings, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 13:27 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton
On Tue, 12 May 2009, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > Before the patch series:
> >
> > benchmark: 307 ns (220 ns)
> > text data bss dec hex filename
> > 16554 24 12 16590 40ce kernel/trace/ring_buffer.o
> >
> >
> > commit 1cd8d7358948909ab80b254eb14bcebc555ad417
> > ring-buffer: remove type parameter from rb_reserve_next_event
> >
> > benchmark: 302 ns (215 ns)
> > text data bss dec hex filename
> > 16538 24 12 16574 40be kernel/trace/ring_buffer.o
> >
> > commit be957c447f7233a67904a1b11eb3ab61e702bf4d
> > ring-buffer: move calculation of event length
> >
> > benchmark: 293 ns (206 ns)
> > text data bss dec hex filename
> > 16490 24 12 16526 408e kernel/trace/ring_buffer.o
> >
> > commit 0f0c85fc80adbbd2265d89867d743f929d516805
> > ring-buffer: small optimizations
> >
> > benchmark: 285 ns (198 ns)
> > text data bss dec hex filename
> > 16474 24 12 16510 407e kernel/trace/ring_buffer.o
> >
> > commit 88eb0125362f2ab272cbaf84252cf101ddc2dec9
> > ring-buffer: use internal time stamp function
> >
> > benchmark: 282 ns (195 ns)
> > text data bss dec hex filename
> > 16474 24 12 16510 407e kernel/trace/ring_buffer.o
> >
> >
> > commit 168b6b1d0594c7866caa73b12f3b8d91075695f2
> > ring-buffer: move code around to remove some branches
> >
> > benchmark: 270 ns (183 ns)
> > text data bss dec hex filename
> > 16490 24 12 16526 408e kernel/trace/ring_buffer.o
> >
> > Thus we went from an average of 220 ns per recording, to 183 ns.
> > Which is about a 17% performance gain.
>
> Nice!
>
> It's also interesting to see that text size went down when speed
> went up. I'm wondering how these compiler options:
But that was not always the case. The biggest boost in performance of the
series (the last patch) also increased the size.
>
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_OPTIMIZE_INLINING=y
Note, my test runs had both the above configure options disabled.
I'll run it again and see how they affect the results:
-- Steve
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 13:27 ` Steven Rostedt
@ 2009-05-12 14:25 ` Steven Rostedt
2009-05-12 14:28 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 14:25 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton
On Tue, 12 May 2009, Steven Rostedt wrote:
> >
> > It's also interesting to see that text size went down when speed
> > went up. I'm wondering how these compiler options:
>
> But that was not always the case. The biggest boost in performance of the
> series (the last patch) also increased the size.
>
> >
> > CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> > CONFIG_OPTIMIZE_INLINING=y
>
> Note, my test runs had both the above configure options disabled.
> I'll run it again and see how they affect the results:
Here are the results:
size=n inline=n 270
size=n inline=y 290
size=y inline=n 315
size=y inline=y 372 (ouch!)
Thus it seems keeping both optimizations off is best for performance.
-- Steve
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 14:25 ` Steven Rostedt
@ 2009-05-12 14:28 ` Steven Rostedt
2009-05-12 14:33 ` Ingo Molnar
2009-05-13 13:44 ` Steven Rostedt
2 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-12 14:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton
On Tue, 12 May 2009, Steven Rostedt wrote:
>
> Here's the results:
>
> size=n inline=n 270
> size=n inline=y 290
> size=y inline=n 315
> size=y inline=y 372 (ouch!)
>
> Thus it seems to keep both optimizations off is best for performance.
I did not run with the ring buffer off for all tests, but the last one
brought the overhead of the test up from 87 ns to 130 ns.
-- Steve
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 14:25 ` Steven Rostedt
2009-05-12 14:28 ` Steven Rostedt
@ 2009-05-12 14:33 ` Ingo Molnar
2009-05-13 13:44 ` Steven Rostedt
2 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2009-05-12 14:33 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Andrew Morton, Frédéric Weisbecker,
Peter Zijlstra
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 12 May 2009, Steven Rostedt wrote:
> > >
> > > It's also interesting to see that text size went down when speed
> > > went up. I'm wondering how these compiler options:
> >
> > But that was not always the case. The biggest boost in performance of the
> > series (the last patch) also increased the size.
> >
> > >
> > > CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> > > CONFIG_OPTIMIZE_INLINING=y
> >
> > Note, my test runs had both the above configure options disabled.
> > I'll run it again and see how they affect the results:
>
> Here's the results:
>
> size=n inline=n 270
> size=n inline=y 290
> size=y inline=n 315
> size=y inline=y 372 (ouch!)
>
> Thus it seems to keep both optimizations off is best for performance.
ok. Maybe newer gcc does better with size=y. optimize-inline=n is
not a surprise - this is in essence a micro-benchmark where inlining
helps - and gcc's inliner isn't too fantastic either.
Ingo
* Re: [PATCH 0/5] [GIT PULL] ring-buffer: optimize to 17% performance increase
2009-05-12 14:25 ` Steven Rostedt
2009-05-12 14:28 ` Steven Rostedt
2009-05-12 14:33 ` Ingo Molnar
@ 2009-05-13 13:44 ` Steven Rostedt
2 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2009-05-13 13:44 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Andrew Morton
On Tue, 12 May 2009, Steven Rostedt wrote:
>
> On Tue, 12 May 2009, Steven Rostedt wrote:
> > >
> > > It's also interesting to see that text size went down when speed
> > > went up. I'm wondering how these compiler options:
> >
> > But that was not always the case. The biggest boost in performance of the
> > series (the last patch) also increased the size.
> >
> > >
> > > CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> > > CONFIG_OPTIMIZE_INLINING=y
> >
> > Note, my test runs had both the above configure options disabled.
> > I'll run it again and see how they affect the results:
>
> Here's the results:
>
> size=n inline=n 270
> size=n inline=y 290
> size=y inline=n 315
> size=y inline=y 372 (ouch!)
>
> Thus it seems to keep both optimizations off is best for performance.
Versions do make a difference. I just upgraded gcc on all my distcc
boxes from 4.2.2 to 4.4.0, and here are the new results:
size=n inline=n 257
size=n inline=y 259
size=y inline=n 295
size=y inline=y 315
The gcc inlining got a little better, but optimizing for size still
seems to hurt.
-- Steve