From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Jeremy Fitzhardinge <jeremy@goop.org>,
Andi Kleen <andi@firstfloor.org>,
LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Miller <davem@davemloft.net>,
Roland McGrath <roland@redhat.com>,
Ulrich Drepper <drepper@redhat.com>,
Rusty Russell <rusty@rustcorp.com.au>,
Gregory Haskins <ghaskins@novell.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
Clark Williams <williams@redhat.com>
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon
Date: Wed, 13 Aug 2008 11:38:00 -0400
Message-ID: <20080813153800.GF5853@Krystal>
In-Reply-To: <20080813063126.GA12335@Krystal>
* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> >
> > On Fri, 8 Aug 2008, Linus Torvalds wrote:
> > >
> > >
> > > On Fri, 8 Aug 2008, Jeremy Fitzhardinge wrote:
> > > >
> > > > Steven Rostedt wrote:
> > > > > I wish we had a true 5 byte nop.
> > > >
> > > > 0x66 0x66 0x66 0x66 0x90
> > >
> > > I don't think so. Multiple redundant prefixes can be really expensive on
> > > some uarchs.
> > >
> > > A no-op that isn't cheap isn't a no-op at all, it's a slow-op.
> >
> >
> > A quick meaningless benchmark showed a slight performance hit.
> >
>
> Hi Steven,
>
> I tried to run my own tests to see whether these
> numbers are actually meaningful at all. My results seem to show that
> there is no significant difference between the various
> configurations, and that the only tendency I see is that
> the 2-bytes jump offset 0x03 would be slightly faster than the 3/2 nop
> on Intel Xeon. But we would have to run these a few more times to
> confirm that, I guess.
>
> I am just trying to get a sense of whether we are really trying hard to
> optimize something worthless in practice, and to me it looks like it.
> But it could be the architecture I am using that brings these results.
>
> Mathieu
>
> Intel Xeon dual quad-core
> Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
>
> 3/2 nop used :
> K8_NOP3 K8_NOP2
> #define K8_NOP2 ".byte 0x66,0x90\n"
> #define K8_NOP3 ".byte 0x66,0x66,0x90\n"
>
Small correction: my architecture uses P6_NOP5, which is an atomic
5-byte nop (I just looked at the runtime output of find_nop_table()).

5: nopl 0x00(%eax,%eax,1)
#define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"

But the results stand. Maybe I should try to force a test with a
K8_NOP3 K8_NOP2 nop.

Mathieu
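
For reference, the four 5-byte fills being compared can be written out byte by byte. This is only a sketch: the enum, table, and function names are made up for illustration, and the three 0x90 bytes after the short jump are an assumption (those bytes are skipped and never executed, so the actual filler may differ). The nop bytes come from the K8_NOP2, K8_NOP3 and P6_NOP5 definitions quoted in this message; 0xeb and 0xe9 are the standard x86 opcodes for jmp rel8 and jmp rel32.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SITE_LEN 5 /* a "call mcount" (0xe8 + rel32) is 5 bytes long */

enum site_fill {
	FILL_K8_NOP3_NOP2, /* "3/2 nops": K8_NOP3 followed by K8_NOP2 */
	FILL_P6_NOP5,      /* nopl 0x00(%eax,%eax,1) */
	FILL_JMP_SHORT,    /* 2-bytes jump, offset 0x03 */
	FILL_JMP_NEAR      /* 5-bytes jump, offset 0x00 */
};

static const unsigned char fills[][SITE_LEN] = {
	[FILL_K8_NOP3_NOP2] = { 0x66, 0x66, 0x90, 0x66, 0x90 },
	[FILL_P6_NOP5]      = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	/* Assumption: the 3 skipped bytes are shown as 0x90 here. */
	[FILL_JMP_SHORT]    = { 0xeb, 0x03, 0x90, 0x90, 0x90 },
	[FILL_JMP_NEAR]     = { 0xe9, 0x00, 0x00, 0x00, 0x00 },
};

/* Length of the first instruction in each fill. */
static const size_t first_len[] = {
	[FILL_K8_NOP3_NOP2] = 3,
	[FILL_P6_NOP5]      = 5,
	[FILL_JMP_SHORT]    = 2,
	[FILL_JMP_NEAR]     = 5,
};

/* Copy the chosen 5-byte fill into a buffer standing in for a call site. */
static void fill_site(unsigned char *dst, enum site_fill kind)
{
	assert(dst != NULL);
	memcpy(dst, fills[kind], SITE_LEN);
}

/* Return how many bytes the first instruction of the fill occupies. */
static size_t first_insn_len(enum site_fill kind)
{
	return first_len[kind];
}
```

first_insn_len() returns SITE_LEN only for P6_NOP5 and the near jump, i.e. those two cover the whole site with a single instruction, while the K8 pair and the short jump leave an instruction boundary inside it.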
> ** Summary **
>
> Test A : make -j20 2.6.27-rc2 kernel (real time)
>                                          Avg.       std. dev.
> Case 1 : ftrace not compiled-in.         1m9.76s    0.41s
> Case 2 : 3/2 nops                        1m9.95s    0.36s
> Case 3 : 2-bytes jump, offset 0x03       1m9.10s    0.40s
> Case 4 : 5-bytes jump, offset 0x00       1m9.25s    0.34s
>
> Test B : hackbench 15
>                                          Avg.       std. dev.
> Case 1 : ftrace not compiled-in.         0.349s     0.007s
> Case 2 : 3/2 nops                        0.351s     0.014s
> Case 3 : 2-bytes jump, offset 0x03       0.350s     0.007s
> Case 4 : 5-bytes jump, offset 0x00       0.351s     0.010s
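
The average and std. dev. figures can be recomputed from the raw "real" times listed under ** Detail **. A minimal sketch, assuming the population form of the standard deviation (divide by n), which is the form that matches the 0.41s quoted for Test A, Case 1. The function names are illustrative, and the square root is a small Newton iteration only so that the snippet builds without -lm.

```c
#include <assert.h>
#include <stddef.h>

/* Test A, Case 1 "real" times from the detail section, in seconds. */
static const double case1_real[] = { 69.980, 69.330, 69.393, 69.674, 70.441 };

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Square root by Newton's method, to avoid depending on libm. */
static double sqrt_newton(double v)
{
	double r = v > 1.0 ? v : 1.0;

	if (v <= 0.0)
		return 0.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + v / r);
	return r;
}

/* Population standard deviation: sqrt(sum((x - mean)^2) / n). */
static double stddev_pop(const double *x, size_t n)
{
	double m = mean(x, n);
	double ss = 0.0;

	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return sqrt_newton(ss / (double)n);
}
```

Calling mean(case1_real, 5) and stddev_pop(case1_real, 5) gives values that round to the 1m9.76s and 0.41s shown for Case 1; the other cases can be checked by swapping in their times.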
>
>
>
> ** Detail **
>
> * Test A
>
> benchmark : make -j20 2.6.27-rc2 kernel
> make clean; make -j20; make clean done before the tests to prime caches.
> Same .config used.
>
>
> Case 1 : ftrace not compiled-in.
>
> real 1m9.980s
> user 7m27.664s
> sys 0m48.771s
>
> real 1m9.330s
> user 7m27.244s
> sys 0m50.567s
>
> real 1m9.393s
> user 7m27.408s
> sys 0m50.511s
>
> real 1m9.674s
> user 7m28.088s
> sys 0m50.327s
>
> real 1m10.441s
> user 7m27.736s
> sys 0m49.687s
>
> real time
> average : 1m9.76s
> std. dev. : 0.41s
>
> after a reboot with the same kernel :
>
> real 1m8.758s
> user 7m26.012s
> sys 0m48.835s
>
> real 1m11.035s
> user 7m26.432s
> sys 0m49.171s
>
> real 1m9.834s
> user 7m25.768s
> sys 0m49.167s
>
>
> Case 2 : 3/2 nops
>
> real 1m9.713s
> user 7m27.524s
> sys 0m48.315s
>
> real 1m9.481s
> user 7m27.144s
> sys 0m48.587s
>
> real 1m10.565s
> user 7m27.048s
> sys 0m48.715s
>
> real 1m10.008s
> user 7m26.436s
> sys 0m49.295s
>
> real 1m9.982s
> user 7m27.160s
> sys 0m48.667s
>
> real time
> avg : 1m9.95s
> std. dev. : 0.36s
>
>
> Case 3 : 2-bytes jump, offset 0x03
>
> real 1m9.158s
> user 7m27.108s
> sys 0m48.775s
>
> real 1m9.159s
> user 7m27.320s
> sys 0m48.659s
>
> real 1m8.390s
> user 7m27.976s
> sys 0m48.359s
>
> real 1m9.143s
> user 7m26.624s
> sys 0m48.719s
>
> real 1m9.642s
> user 7m26.228s
> sys 0m49.483s
>
> real time
> avg : 1m9.10s
> std. dev. : 0.40s
>
> one extra after reboot with same kernel :
>
> real 1m8.855s
> user 7m27.372s
> sys 0m48.543s
>
>
> Case 4 : 5-bytes jump, offset 0x00
>
> real 1m9.173s
> user 7m27.228s
> sys 0m48.151s
>
> real 1m9.735s
> user 7m26.852s
> sys 0m48.499s
>
> real 1m9.502s
> user 7m27.148s
> sys 0m48.107s
>
> real 1m8.727s
> user 7m27.416s
> sys 0m48.071s
>
> real 1m9.115s
> user 7m26.932s
> sys 0m48.727s
>
> real time
> avg : 1m9.25s
> std. dev. : 0.34s
>
>
> * Test B
>
> Hackbench
>
> Case 1 : ftrace not compiled-in.
>
> ./hackbench 15
> Time: 0.358
> ./hackbench 15
> Time: 0.342
> ./hackbench 15
> Time: 0.354
> ./hackbench 15
> Time: 0.338
> ./hackbench 15
> Time: 0.347
>
> Average : 0.349
> std. dev. : 0.007
>
> Case 2 : 3/2 nops
>
> ./hackbench 15
> Time: 0.328
> ./hackbench 15
> Time: 0.368
> ./hackbench 15
> Time: 0.351
> ./hackbench 15
> Time: 0.343
> ./hackbench 15
> Time: 0.366
>
> Average : 0.351
> std. dev. : 0.014
>
> Case 3 : jmp 2 bytes
>
> ./hackbench 15
> Time: 0.346
> ./hackbench 15
> Time: 0.359
> ./hackbench 15
> Time: 0.356
> ./hackbench 15
> Time: 0.350
> ./hackbench 15
> Time: 0.340
>
> Average : 0.350
> std. dev. : 0.007
>
> Case 4 : jmp 5 bytes
>
> ./hackbench 15
> Time: 0.346
> ./hackbench 15
> Time: 0.346
> ./hackbench 15
> Time: 0.364
> ./hackbench 15
> Time: 0.362
> ./hackbench 15
> Time: 0.338
>
> Average : 0.351
> std. dev. : 0.010
>
>
> Hardware used :
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 23
> model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
> stepping : 6
> cpu MHz : 2000.114
> cache size : 6144 KB
> physical id : 0
> siblings : 4
> core id : 0
> cpu cores : 4
> apicid : 0
> initial apicid : 0
> fpu : yes
> fpu_exception : yes
> cpuid level : 10
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx tm2 ssse3
> cx16 xtpr dca sse4_1 lahf_lm
> bogomips : 4000.22
> clflush size : 64
> cache_alignment : 64
> address sizes : 38 bits physical, 48 bits virtual
> power management:
>
> (7 other similar cpus)
>
>
> > Here's 10 runs of "hackbench 50" using the two part 5 byte nop:
> >
> > run 1
> > Time: 4.501
> > run 2
> > Time: 4.855
> > run 3
> > Time: 4.198
> > run 4
> > Time: 4.587
> > run 5
> > Time: 5.016
> > run 6
> > Time: 4.757
> > run 7
> > Time: 4.477
> > run 8
> > Time: 4.693
> > run 9
> > Time: 4.710
> > run 10
> > Time: 4.715
> > avg = 4.6509
> >
> >
> > And 10 runs using the above 5 byte nop:
> >
> > run 1
> > Time: 4.832
> > run 2
> > Time: 5.319
> > run 3
> > Time: 5.213
> > run 4
> > Time: 4.830
> > run 5
> > Time: 4.363
> > run 6
> > Time: 4.391
> > run 7
> > Time: 4.772
> > run 8
> > Time: 4.992
> > run 9
> > Time: 4.727
> > run 10
> > Time: 4.825
> > avg = 4.8264
> >
> > # cat /proc/cpuinfo
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 15
> > model : 65
> > model name : Dual-Core AMD Opteron(tm) Processor 2220
> > stepping : 3
> > cpu MHz : 2799.992
> > cache size : 1024 KB
> > physical id : 0
> > siblings : 2
> > core id : 0
> > cpu cores : 2
> > apicid : 0
> > initial apicid : 0
> > fdiv_bug : no
> > hlt_bug : no
> > f00f_bug : no
> > coma_bug : no
> > fpu : yes
> > fpu_exception : yes
> > cpuid level : 1
> > wp : yes
> > flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> > cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> > rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic
> > cr8_legacy
> > bogomips : 5599.98
> > clflush size : 64
> > power management: ts fid vid ttp tm stc
> >
> > There's 4 of these.
> >
> > Just to make sure, I ran the above nop test again:
> >
> > [ this is reverse from the above runs ]
> >
> > run 1
> > Time: 4.723
> > run 2
> > Time: 5.080
> > run 3
> > Time: 4.521
> > run 4
> > Time: 4.841
> > run 5
> > Time: 4.696
> > run 6
> > Time: 4.946
> > run 7
> > Time: 4.754
> > run 8
> > Time: 4.717
> > run 9
> > Time: 4.905
> > run 10
> > Time: 4.814
> > avg = 4.7997
> >
> > And again the two part nop:
> >
> > run 1
> > Time: 4.434
> > run 2
> > Time: 4.496
> > run 3
> > Time: 4.801
> > run 4
> > Time: 4.714
> > run 5
> > Time: 4.631
> > run 6
> > Time: 5.178
> > run 7
> > Time: 4.728
> > run 8
> > Time: 4.920
> > run 9
> > Time: 4.898
> > run 10
> > Time: 4.770
> > avg = 4.757
> >
> >
> > This time it was close, but still seems to have some difference.
> >
> > heh, perhaps it's just noise.
> >
> > -- Steve
> >
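
One rough way to judge whether the 4.6509 vs 4.8264 gap in the first pair of "hackbench 50" runs stands out from run-to-run variation is Welch's t statistic, t = (m2 - m1) / sqrt(s1^2/n1 + s2^2/n2), with s^2 the sample variance. This is a sketch added for illustration, not something computed in the thread; the array and function names are made up, and the square root is a small Newton iteration only so that the snippet builds without -lm.

```c
#include <assert.h>
#include <stddef.h>

/* First 10 runs with the two part 5 byte nop, in seconds. */
static const double two_part[] = { 4.501, 4.855, 4.198, 4.587, 5.016,
				   4.757, 4.477, 4.693, 4.710, 4.715 };
/* First 10 runs with "the above 5 byte nop", in seconds. */
static const double single[] = { 4.832, 5.319, 5.213, 4.830, 4.363,
				 4.391, 4.772, 4.992, 4.727, 4.825 };

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Sample variance: sum((x - mean)^2) / (n - 1). */
static double var_sample(const double *x, size_t n)
{
	double m = mean(x, n);
	double ss = 0.0;

	assert(n > 1);
	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return ss / (double)(n - 1);
}

/* Square root by Newton's method, to avoid depending on libm. */
static double sqrt_newton(double v)
{
	double r = v > 1.0 ? v : 1.0;

	if (v <= 0.0)
		return 0.0;
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + v / r);
	return r;
}

/* Welch's t statistic for mean(b) - mean(a), unequal variances allowed. */
static double welch_t(const double *a, size_t na, const double *b, size_t nb)
{
	double se2 = var_sample(a, na) / (double)na +
		     var_sample(b, nb) / (double)nb;

	assert(se2 > 0.0);
	return (mean(b, nb) - mean(a, na)) / sqrt_newton(se2);
}
```

welch_t(two_part, 10, single, 10) expresses the difference in units of its estimated standard error. With only 10 runs per side, a value that is not well above 2 is usually read as compatible with noise.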
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68