public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Andi Kleen <andi@firstfloor.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	Roland McGrath <roland@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: [PATCH 0/5] ftrace: to kill a daemon
Date: Wed, 13 Aug 2008 02:31:26 -0400
Message-ID: <20080813063126.GA12335@Krystal>
In-Reply-To: <alpine.DEB.1.10.0808082113090.3707@gandalf.stny.rr.com>

* Steven Rostedt (rostedt@goodmis.org) wrote:
> 
> On Fri, 8 Aug 2008, Linus Torvalds wrote:
> > 
> > 
> > On Fri, 8 Aug 2008, Jeremy Fitzhardinge wrote:
> > >
> > > Steven Rostedt wrote:
> > > > I wish we had a true 5 byte nop. 
> > > 
> > > 0x66 0x66 0x66 0x66 0x90
> > 
> > I don't think so. Multiple redundant prefixes can be really expensive on 
> > some uarchs.
> > 
> > A no-op that isn't cheap isn't a no-op at all, it's a slow-op.
> 
> 
> A quick meaningless benchmark showed a slight performance hit.
> 

Hi Steven,

I ran my own tests to see whether these numbers are meaningful at all.
My results seem to show that there is no significant difference between
the various configurations. The only tendency I see is that the 2-byte
jump with offset 0x03 might be slightly faster than the 3/2 nops on the
Intel Xeon, but I guess we would need more runs to confirm that.

I am just trying to get a sense of whether we are working hard to
optimize something that is worthless in practice, and to me it looks
like we are. But these results might be specific to the architecture I
am using.

Mathieu

Intel Xeon dual quad-core
Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz

3/2 nop used :
K8_NOP3 K8_NOP2
#define K8_NOP2 ".byte 0x66,0x90\n"
#define K8_NOP3 ".byte 0x66,0x66,0x90\n"

** Summary **

Test A : make -j20 2.6.27-rc2 kernel (real time)
                                          Avg.      Std. dev.
Case 1 : ftrace not compiled-in.          1m9.76s   0.41s
Case 2 : 3/2 nops                         1m9.95s   0.36s
Case 3 : 2-byte jump, offset 0x03         1m9.10s   0.40s
Case 4 : 5-byte jump, offset 0x00         1m9.25s   0.34s

Test B : hackbench 15
                                          Avg.      Std. dev.
Case 1 : ftrace not compiled-in.          0.349s    0.007s
Case 2 : 3/2 nops                         0.351s    0.014s
Case 3 : 2-byte jump, offset 0x03         0.350s    0.007s
Case 4 : 5-byte jump, offset 0x00         0.351s    0.010s



** Detail **

* Test A

benchmark : make -j20 2.6.27-rc2 kernel
Before the tests, a "make clean; make -j20; make clean" cycle was run to
prime the caches. The same .config was used for all cases.


Case 1 : ftrace not compiled-in.

real	1m9.980s
user	7m27.664s
sys	0m48.771s

real	1m9.330s
user	7m27.244s
sys	0m50.567s

real	1m9.393s
user	7m27.408s
sys	0m50.511s

real	1m9.674s
user	7m28.088s
sys	0m50.327s

real	1m10.441s
user	7m27.736s
sys	0m49.687s

real time
average : 1m9.76s
std. dev. : 0.41s

After a reboot with the same kernel (not included in the average above) :

real	1m8.758s
user	7m26.012s
sys	0m48.835s

real	1m11.035s
user	7m26.432s
sys	0m49.171s

real	1m9.834s
user	7m25.768s
sys	0m49.167s


Case 2 : 3/2 nops

real	1m9.713s
user	7m27.524s
sys	0m48.315s

real	1m9.481s
user	7m27.144s
sys	0m48.587s

real	1m10.565s
user	7m27.048s
sys	0m48.715s

real	1m10.008s
user	7m26.436s
sys	0m49.295s

real	1m9.982s
user	7m27.160s
sys	0m48.667s

real time
avg : 1m9.95s
std. dev. : 0.36s


Case 3 : 2-byte jump, offset 0x03

real	1m9.158s
user	7m27.108s
sys	0m48.775s

real	1m9.159s
user	7m27.320s
sys	0m48.659s

real	1m8.390s
user	7m27.976s
sys	0m48.359s

real	1m9.143s
user	7m26.624s
sys	0m48.719s

real	1m9.642s
user	7m26.228s
sys	0m49.483s

real time
avg : 1m9.10s
std. dev. : 0.40s

One extra run after a reboot with the same kernel (not included in the average above) :

real	1m8.855s
user	7m27.372s
sys	0m48.543s


Case 4 : 5-byte jump, offset 0x00

real	1m9.173s
user	7m27.228s
sys	0m48.151s

real	1m9.735s
user	7m26.852s
sys	0m48.499s

real	1m9.502s
user	7m27.148s
sys	0m48.107s

real	1m8.727s
user	7m27.416s
sys	0m48.071s

real	1m9.115s
user	7m26.932s
sys	0m48.727s

real time
avg : 1m9.25s
std. dev. : 0.34s


* Test B

Hackbench

Case 1 : ftrace not compiled-in.

./hackbench 15
Time: 0.358
./hackbench 15
Time: 0.342
./hackbench 15
Time: 0.354
./hackbench 15
Time: 0.338
./hackbench 15
Time: 0.347

Average : 0.349
std. dev. : 0.007

Case 2 : 3/2 nops

./hackbench 15
Time: 0.328
./hackbench 15
Time: 0.368
./hackbench 15
Time: 0.351
./hackbench 15
Time: 0.343
./hackbench 15
Time: 0.366

Average : 0.351
std. dev. : 0.014

Case 3 : 2-byte jump, offset 0x03

./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.359
./hackbench 15
Time: 0.356
./hackbench 15
Time: 0.350
./hackbench 15
Time: 0.340

Average : 0.350
std. dev. : 0.007

Case 4 : 5-byte jump, offset 0x00

./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.346
./hackbench 15
Time: 0.364
./hackbench 15
Time: 0.362
./hackbench 15
Time: 0.338

Average : 0.351
std. dev. : 0.010


Hardware used :

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping	: 6
cpu MHz		: 2000.114
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx tm2 ssse3
cx16 xtpr dca sse4_1 lahf_lm
bogomips	: 4000.22
clflush size	: 64
cache_alignment	: 64
address sizes	: 38 bits physical, 48 bits virtual
power management:

(7 other similar cpus)


> Here's 10 runs of "hackbench 50" using the two part 5 byte nop:
> 
> run 1
> Time: 4.501
> run 2
> Time: 4.855
> run 3
> Time: 4.198
> run 4
> Time: 4.587
> run 5
> Time: 5.016
> run 6
> Time: 4.757
> run 7
> Time: 4.477
> run 8
> Time: 4.693
> run 9
> Time: 4.710
> run 10
> Time: 4.715
> avg = 4.6509
> 
> 
> And 10 runs using the above 5 byte nop:
> 
> run 1
> Time: 4.832
> run 2
> Time: 5.319
> run 3
> Time: 5.213
> run 4
> Time: 4.830
> run 5
> Time: 4.363
> run 6
> Time: 4.391
> run 7
> Time: 4.772
> run 8
> Time: 4.992
> run 9
> Time: 4.727
> run 10
> Time: 4.825
> avg = 4.8264
> 
> # cat /proc/cpuinfo
> processor	: 0
> vendor_id	: AuthenticAMD
> cpu family	: 15
> model		: 65
> model name	: Dual-Core AMD Opteron(tm) Processor 2220
> stepping	: 3
> cpu MHz		: 2799.992
> cache size	: 1024 KB
> physical id	: 0
> siblings	: 2
> core id		: 0
> cpu cores	: 2
> apicid		: 0
> initial apicid	: 0
> fdiv_bug	: no
> hlt_bug		: no
> f00f_bug	: no
> coma_bug	: no
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 1
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt 
> rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic 
> cr8_legacy
> bogomips	: 5599.98
> clflush size	: 64
> power management: ts fid vid ttp tm stc
> 
> There's 4 of these.
> 
> Just to make sure, I ran the above nop test again:
> 
> [ this is reverse from the above runs ]
> 
> run 1
> Time: 4.723
> run 2
> Time: 5.080
> run 3
> Time: 4.521
> run 4
> Time: 4.841
> run 5
> Time: 4.696
> run 6
> Time: 4.946
> run 7
> Time: 4.754
> run 8
> Time: 4.717
> run 9
> Time: 4.905
> run 10
> Time: 4.814
> avg = 4.7997
> 
> And again the two part nop:
> 
> run 1
> Time: 4.434
> run 2
> Time: 4.496
> run 3
> Time: 4.801
> run 4
> Time: 4.714
> run 5
> Time: 4.631
> run 6
> Time: 5.178
> run 7
> Time: 4.728
> run 8
> Time: 4.920
> run 9
> Time: 4.898
> run 10
> Time: 4.770
> avg = 4.757
> 
> 
> This time it was close, but still seems to have some difference.
> 
> heh, perhaps it's just noise.
> 
> -- Steve
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
