* [PATCH] jump label: Reduce the cycle count by changing the link order
@ 2011-08-05 20:40 Jason Baron
2011-08-05 22:10 ` Peter Zijlstra
2011-08-09 14:29 ` [tip:perf/urgent] " tip-bot for Jason Baron
0 siblings, 2 replies; 7+ messages in thread
From: Jason Baron @ 2011-08-05 20:40 UTC (permalink / raw)
To: a.p.zijlstra, rostedt; +Cc: pjt, mingo, rth, linux-kernel
In the course of testing jump labels for use with the CFS bandwidth controller,
Paul Turner discovered that using jump labels reduced the branch count and the
instruction count, but did not reduce the cycle count or wall time.
I noticed that merely having jump_label.o included in the kernel, without using
it in any way, still caused this increase in cycle count and wall time. I
therefore moved jump_label.o within kernel/Makefile, changing the link order and
presumably moving it out of hot icache areas. This brought down the cycle
count and wall time as expected.
In addition to Paul's testing, I've tested the patch using a single
'static_branch()' in the getppid() path, and basically running tight loops of
calls to getppid(). Here are my results for the branch disabled case:
With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled:
Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):
3,969,510,217 instructions # 0.864 IPC ( +-0.000% )
4,592,334,954 cycles ( +- 0.046% )
751,634,470 branches ( +- 0.000% )
1.722635797 seconds time elapsed ( +- 0.046% )
Jump labels turned off (CONFIG_JUMP_LABEL not set), branch disabled:
Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):
4,009,611,846 instructions # 0.867 IPC ( +-0.000% )
4,622,210,580 cycles ( +- 0.012% )
771,662,904 branches ( +- 0.000% )
1.734341454 seconds time elapsed ( +- 0.022% )
Signed-off-by: Jason Baron <jbaron@redhat.com>
Tested-by: Paul Turner <pjt@google.com>
---
kernel/Makefile | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/Makefile b/kernel/Makefile
index 2d64cfc..329dfcc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o range.o jump_label.o
+ async.o range.o
obj-y += groups.o
ifdef CONFIG_FUNCTION_TRACER
@@ -107,6 +107,7 @@ obj-$(CONFIG_PERF_EVENTS) += events/
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
+obj-$(CONFIG_JUMP_LABEL) += jump_label.o
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
--
1.7.5.4
* Re: [PATCH] jump label: Reduce the cycle count by changing the link order
2011-08-05 20:40 [PATCH] jump label: Reduce the cycle count by changing the link order Jason Baron
@ 2011-08-05 22:10 ` Peter Zijlstra
2011-08-06 3:20 ` Paul Turner
2011-08-08 15:40 ` Jason Baron
2011-08-09 14:29 ` [tip:perf/urgent] " tip-bot for Jason Baron
1 sibling, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2011-08-05 22:10 UTC (permalink / raw)
To: Jason Baron; +Cc: rostedt, pjt, mingo, rth, linux-kernel
On Fri, 2011-08-05 at 16:40 -0400, Jason Baron wrote:
> In the course of testing jump labels for use with the CFS bandwidth controller,
> Paul Turner, discovered that using jump labels reduced the branch count and the
> instruction count, but did not reduce the cycle count or wall time.
>
> I noticed that having the jump_label.o included in the kernel but not used in
> any way still caused this increase in cycle count and wall time. Thus, I moved
> jump_label.o in the kernel/Makefile, thus changing the link order, and
> presumably moving it out of hot icache areas. This brought down the cycle
> count/time as expected.
>
> In addition to Paul's testing, I've tested the patch using a single
> 'static_branch()' in the getppid() path, and basically running tight loops of
> calls to getppid(). Here are my results for the branch disabled case:
Those numbers don't seem to be pre/post patch, but merely
CONFIG_JUMP_LABEL=y/n so they don't tell us what the patch does.
Anyway, should we put a comment in the Makefile telling us we should
keep jump_label.o last?
Also, pjt mentioned on IRC that mucking about with link order is
something google is not unfamiliar with.. could we use some sort of
runtime feedback to generate linker layout maps or so? That seems like a
more scalable version than randomly mucking about with Makefiles :-)
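The runtime-feedback idea can be pictured with a small sketch: given per-object sample counts from a profiler, emit the objects hottest first so that cold ones end up last in the link order. This is only an illustration under stated assumptions. The object names come from kernel/Makefile, but the sample counts and the link_order() helper are invented here, and a real layout tool would more likely work at function or section granularity.

```python
# Hypothetical per-object sample counts, e.g. aggregated from a profiler.
# These numbers are invented for illustration; they are not from the thread.
samples = {
    "sched.o": 9200,
    "fork.o": 1500,
    "hrtimer.o": 4100,
    "jump_label.o": 3,
    "async.o": 40,
}


def link_order(sample_counts):
    """Return object names sorted hottest first, ties broken by name."""
    return sorted(sample_counts, key=lambda obj: (-sample_counts[obj], obj))


order = link_order(samples)
print(" ".join(order))
```

With these made-up counts, jump_label.o sorts last, which is the placement the patch arrives at by hand.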
* Re: [PATCH] jump label: Reduce the cycle count by changing the link order
2011-08-05 22:10 ` Peter Zijlstra
@ 2011-08-06 3:20 ` Paul Turner
2011-08-08 15:40 ` Jason Baron
1 sibling, 0 replies; 7+ messages in thread
From: Paul Turner @ 2011-08-06 3:20 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Jason Baron, rostedt, mingo, rth, linux-kernel
On Fri, Aug 5, 2011 at 3:10 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Fri, 2011-08-05 at 16:40 -0400, Jason Baron wrote:
>> In the course of testing jump labels for use with the CFS bandwidth controller,
>> Paul Turner, discovered that using jump labels reduced the branch count and the
>> instruction count, but did not reduce the cycle count or wall time.
>>
>> I noticed that having the jump_label.o included in the kernel but not used in
>> any way still caused this increase in cycle count and wall time. Thus, I moved
>> jump_label.o in the kernel/Makefile, thus changing the link order, and
>> presumably moving it out of hot icache areas. This brought down the cycle
>> count/time as expected.
>>
>> In addition to Paul's testing, I've tested the patch using a single
>> 'static_branch()' in the getppid() path, and basically running tight loops of
>> calls to getppid(). Here are my results for the branch disabled case:
>
> Those numbers don't seem to be pre/post patch, but merely
> CONFIG_JUMP_LABEL=y/n so they don't tell us what the patch does.
>
I have some numbers to support this:
[
Key:
npo_XXX = with CONFIG_JUMP_LABEL, without link order patch (no patched order)
po_XXX = with CONFIG_JUMP_LABEL, with link order patch (patched order)
njl_XXX = without CONFIG_JUMP_LABEL
head is tip (c5bafb3)
Test was repeated 3 times, each run was 50 repeats w/ typically ~<0.1
in-test variance on reported output
]
              instructions         cycles               branches             elapsed
-----------------------------------------------------------------------------------------------
Westmere:
njl_base.1    798832892            722624737            145375836            0.203218936
njl_base.2    798888783 (+0.01)    746118188 (+3.25)    145386807 (+0.01)    0.208573683 (-2.18)
njl_base.3    798864253 (+0.00)    731537139 (+1.23)    145382747 (+0.00)    0.204098175 (-4.28)
npo_base.1    797033521 (-0.23)    731239359 (+1.19)    144571358 (-0.55)    0.206910496 (-2.96)
npo_base.2    797166434 (-0.21)    728926020 (+0.87)    144603465 (-0.53)    0.202906392 (-4.84)
npo_base.3    797165370 (-0.21)    725930458 (+0.46)    144603438 (-0.53)    0.202118274 (-5.21)
po_base.1     797019904 (-0.23)    699008145 (-3.27)    144567652 (-0.56)    0.197272615 (-7.48)
po_base.2     797037682 (-0.22)    705732419 (-2.34)    144572115 (-0.55)    0.197101692 (-7.56)
po_base.3     797079804 (-0.22)    698007668 (-3.41)    144580964 (-0.55)    0.194871253 (-8.61)
Barcelona:
njl_base.1    816842028            748362637            147462095            0.341654152
njl_base.2    816849735 (+0.00)    748480742 (+0.02)    147462652 (+0.00)    0.341450734 (-2.90)
njl_base.3    816834963 (-0.00)    747083797 (-0.17)    147460200 (-0.00)    0.340802353 (-3.09)
npo_base.1    815068563 (-0.22)    775012690 (+3.56)    146661357 (-0.54)    0.353797321 (+0.61)
npo_base.2    815033261 (-0.22)    759613364 (+1.50)    146654106 (-0.55)    0.346462671 (-1.48)
npo_base.3    815029611 (-0.22)    762660196 (+1.91)    146654169 (-0.55)    0.347565129 (-1.16)
po_base.1     815026489 (-0.22)    767229109 (+2.52)    146653376 (-0.55)    0.350241833 (-0.40)
po_base.2     815035127 (-0.22)    770224495 (+2.92)    146654019 (-0.55)    0.351352092 (-0.09)
po_base.3     815109904 (-0.21)    774954096 (+3.55)    146662020 (-0.54)    0.353505054 (+0.53)
At least on Nehalem/Westmere systems it looks worthwhile.
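To make the per-machine trend easier to see, the three runs of each configuration can be averaged and compared against the no-jump-label baseline. The sketch below does that for the cycles column only, using the figures reported above. The mean_pct_vs_baseline() helper and the choice to average the three runs are assumptions made for illustration, not part of the original measurements.

```python
# Cycle counts copied from the table above: three runs per configuration.
cycles = {
    "Westmere": {
        "njl": [722624737, 746118188, 731537139],
        "npo": [731239359, 728926020, 725930458],
        "po": [699008145, 705732419, 698007668],
    },
    "Barcelona": {
        "njl": [748362637, 748480742, 747083797],
        "npo": [775012690, 759613364, 762660196],
        "po": [767229109, 770224495, 774954096],
    },
}


def mean(values):
    return sum(values) / len(values)


def mean_pct_vs_baseline(runs, baseline_runs):
    """Percent change of the mean of runs relative to the mean of baseline_runs."""
    base = mean(baseline_runs)
    return (mean(runs) - base) / base * 100.0


for machine, data in cycles.items():
    for config in ("npo", "po"):
        delta = mean_pct_vs_baseline(data[config], data["njl"])
        print(f"{machine:10s} {config:4s} {delta:+.2f}% cycles vs njl")
```

Averaged this way, the patched order lowers the mean cycle count on Westmere but not on Barcelona, which matches the per-run deltas in the table.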
> Anyway, should we put a comment in the Makefile telling us we should
> keep jump_label.o last?
Without doing some sort of FDO sampling this list is always going to
have junk arbitrary ordering constraints (which unfortunately extend
beyond jump_label.o).
This commit being in the history for the file is already going to serve
as evidence of that. :(
>
> Also, pjt mentioned on IRC that mucking about with link order is
> something google is not unfamiliar with.. could we use some sort of
> runtime feedback to generate linker layout maps or so? That seems like a
> more scalable version than randomly mucking about with Makefiles :-)
>
I think this is a good longer term direction, but that getting there
will take a while (What are the right workloads to drive the FDO data
for example?).
In the short term it's probably just worth taking since the effects
aren't going away.
* Re: [PATCH] jump label: Reduce the cycle count by changing the link order
2011-08-05 22:10 ` Peter Zijlstra
2011-08-06 3:20 ` Paul Turner
@ 2011-08-08 15:40 ` Jason Baron
2011-08-08 17:52 ` Arnaud Lacombe
1 sibling, 1 reply; 7+ messages in thread
From: Jason Baron @ 2011-08-08 15:40 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: rostedt, pjt, mingo, rth, linux-kernel
On Sat, Aug 06, 2011 at 12:10:09AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-08-05 at 16:40 -0400, Jason Baron wrote:
> > In the course of testing jump labels for use with the CFS bandwidth controller,
> > Paul Turner, discovered that using jump labels reduced the branch count and the
> > instruction count, but did not reduce the cycle count or wall time.
> >
> > I noticed that having the jump_label.o included in the kernel but not used in
> > any way still caused this increase in cycle count and wall time. Thus, I moved
> > jump_label.o in the kernel/Makefile, thus changing the link order, and
> > presumably moving it out of hot icache areas. This brought down the cycle
> > count/time as expected.
> >
> > In addition to Paul's testing, I've tested the patch using a single
> > 'static_branch()' in the getppid() path, and basically running tight loops of
> > calls to getppid(). Here are my results for the branch disabled case:
>
> Those numbers don't seem to be pre/post patch, but merely
> CONFIG_JUMP_LABEL=y/n so they don't tell us what the patch does.
>
oops. I did record all that data, I just didn't include it :( So here it is:
jump label enabled:
new makefile ordering:
Performance counter stats for 'bash -c /tmp/timing;true' (50 runs):
4,578,321,415 cycles ( +- 0.021% )
3,969,511,833 instructions # 0.867 IPC ( +- 0.000% )
751,633,846 branches ( +- 0.000% )
1.717374497 seconds time elapsed ( +- 0.021% )
old makefile ordering:
Performance counter stats for 'bash -c /tmp/timing;true' (50 runs):
4,623,129,746 cycles ( +- 0.015% )
3,969,600,140 instructions # 0.859 IPC ( +- 0.000% )
751,648,318 branches ( +- 0.000% )
1.734843587 seconds time elapsed ( +- 0.028% )
jump label disabled:
new makefile ordering:
Performance counter stats for 'bash -c /tmp/timing;true' (50 runs):
4,620,784,202 cycles ( +- 0.014% )
4,009,564,429 instructions # 0.868 IPC ( +- 0.000% )
771,654,211 branches ( +- 0.000% )
1.733853839 seconds time elapsed ( +- 0.031% )
old makefile ordering:
Performance counter stats for 'bash -c /tmp/timing;true' (50 runs):
4,623,191,826 cycles ( +- 0.009% )
4,009,561,402 instructions # 0.867 IPC ( +- 0.000% )
771,655,250 branches ( +- 0.000% )
1.734191186 seconds time elapsed ( +- 0.009% )
So, with jump labels enabled, instructions and branches fall even with the
old Makefile ordering, but the corresponding fall in cycles and wall time
only appears with the new Makefile ordering. This testing was done on a
Kentsfield system.
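The size of the effect can be read off the perf output above by computing the relative change from the old to the new ordering. The following is a minimal sketch using only the cycle and elapsed-time figures quoted in this message; the pct_change() helper and the dictionary layout are illustrative choices, not part of the original test setup.

```python
def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (new ordering, old ordering) pairs copied from the perf stat output above.
jump_label_enabled = {
    "cycles": (4_578_321_415, 4_623_129_746),
    "seconds": (1.717374497, 1.734843587),
}
jump_label_disabled = {
    "cycles": (4_620_784_202, 4_623_191_826),
    "seconds": (1.733853839, 1.734191186),
}

for name, results in (("enabled", jump_label_enabled),
                      ("disabled", jump_label_disabled)):
    for metric, (new, old) in results.items():
        print(f"jump label {name:8s} {metric:8s} {pct_change(new, old):+.3f}%")
```

With jump labels enabled, the new ordering comes out roughly 1% lower in both cycles and elapsed time; with jump labels disabled, the change is well under 0.1%.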
> Anyway, should we put a comment in the Makefile telling us we should
> keep jump_label.o last?
>
Yes, I think that would be a good idea. I can re-post with the complete
testing results and a Makefile comment, if we are ok with this change.
> Also, pjt mentioned on IRC that mucking about with link order is
> something google is not unfamiliar with.. could we use some sort of
> runtime feedback to generate linker layout maps or so? That seems like a
> more scalable version than randomly mucking about with Makefiles :-)
Agreed. Definitely a good area to research. However, until we have that done, I
think this patch makes sense.
Thanks,
-Jason
* Re: [PATCH] jump label: Reduce the cycle count by changing the link order
2011-08-08 15:40 ` Jason Baron
@ 2011-08-08 17:52 ` Arnaud Lacombe
2011-08-08 17:55 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Arnaud Lacombe @ 2011-08-08 17:52 UTC (permalink / raw)
To: Jason Baron; +Cc: Peter Zijlstra, rostedt, pjt, mingo, rth, linux-kernel
Hi,
On Mon, Aug 8, 2011 at 11:40 AM, Jason Baron <jbaron@redhat.com> wrote:
> On Sat, Aug 06, 2011 at 12:10:09AM +0200, Peter Zijlstra wrote:
>> On Fri, 2011-08-05 at 16:40 -0400, Jason Baron wrote:
>> [...]
>> Also, pjt mentioned on IRC that mucking about with link order is
>> something google is not unfamiliar with.. could we use some sort of
>> runtime feedback to generate linker layout maps or so? That seems like a
>> more scalable version than randomly mucking about with Makefiles :-)
>
> Agreed. Definitely a good area to research. However, until we have that done, I
> think this patch makes sense.
>
This might be a dumb question, as I do not know much about this, but
couldn't hot code be put in its own section?
Thanks,
- Arnaud
> Thanks,
>
> -Jason
* Re: [PATCH] jump label: Reduce the cycle count by changing the link order
2011-08-08 17:52 ` Arnaud Lacombe
@ 2011-08-08 17:55 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2011-08-08 17:55 UTC (permalink / raw)
To: Arnaud Lacombe; +Cc: Jason Baron, rostedt, pjt, mingo, rth, linux-kernel
On Mon, 2011-08-08 at 13:52 -0400, Arnaud Lacombe wrote:
> >> Also, pjt mentioned on IRC that mucking about with link order is
> >> something google is not unfamiliar with.. could we use some sort of
> >> runtime feedback to generate linker layout maps or so? That seems like a
> >> more scalable version than randomly mucking about with Makefiles :-)
> >
> > Agreed. Definitely a good area to research. However, until we have that done, I
> > think this patch makes sense.
> >
> this might be a dumb question as I do not know much about that, but
> couldn't hot code be put in their own section ?
Ah, but what is the hot code? The whole problem statement is finding
that.
* [tip:perf/urgent] jump label: Reduce the cycle count by changing the link order
2011-08-05 20:40 [PATCH] jump label: Reduce the cycle count by changing the link order Jason Baron
2011-08-05 22:10 ` Peter Zijlstra
@ 2011-08-09 14:29 ` tip-bot for Jason Baron
1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Jason Baron @ 2011-08-09 14:29 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, pjt, jbaron, tglx, mingo
Commit-ID: b77f0f3c1f587791aa5d9bd1b0012c9a89eb9258
Gitweb: http://git.kernel.org/tip/b77f0f3c1f587791aa5d9bd1b0012c9a89eb9258
Author: Jason Baron <jbaron@redhat.com>
AuthorDate: Fri, 5 Aug 2011 16:40:40 -0400
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 5 Aug 2011 23:57:33 +0200
jump label: Reduce the cycle count by changing the link order
In the course of testing jump labels for use with the CFS
bandwidth controller, Paul Turner, discovered that using jump
labels reduced the branch count and the instruction count, but
did not reduce the cycle count or wall time.
I noticed that having the jump_label.o included in the kernel
but not used in any way still caused this increase in cycle
count and wall time. Thus, I moved jump_label.o in the
kernel/Makefile, thus changing the link order, and presumably
moving it out of hot icache areas. This brought down the cycle
count/time as expected.
In addition to Paul's testing, I've tested the patch using a
single 'static_branch()' in the getppid() path, and basically
running tight loops of calls to getppid(). Here are my results
for the branch disabled case:
With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled:
Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):
3,969,510,217 instructions # 0.864 IPC ( +-0.000% )
4,592,334,954 cycles ( +- 0.046% )
751,634,470 branches ( +- 0.000% )
1.722635797 seconds time elapsed ( +- 0.046% )
Jump labels turned off (CONFIG_JUMP_LABEL not set), branch
disabled:
Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs):
4,009,611,846 instructions # 0.867 IPC ( +-0.000% )
4,622,210,580 cycles ( +- 0.012% )
771,662,904 branches ( +- 0.000% )
1.734341454 seconds time elapsed ( +- 0.022% )
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: rth@redhat.com
Cc: a.p.zijlstra@chello.nl
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20110805204040.GG2522@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Paul Turner <pjt@google.com>
---
kernel/Makefile | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/Makefile b/kernel/Makefile
index d06467f..eca595e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o range.o jump_label.o
+ async.o range.o
obj-y += groups.o
ifdef CONFIG_FUNCTION_TRACER
@@ -107,6 +107,7 @@ obj-$(CONFIG_PERF_EVENTS) += events/
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
obj-$(CONFIG_PADATA) += padata.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
+obj-$(CONFIG_JUMP_LABEL) += jump_label.o
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is