public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Steven Rostedt <srostedt@redhat.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	Andi Kleen <andi@firstfloor.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	Roland McGrath <roland@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
Date: Wed, 13 Aug 2008 15:16:14 -0400	[thread overview]
Message-ID: <20080813191614.GA15547@Krystal> (raw)
In-Reply-To: <alpine.LFD.1.10.0808131119290.3462@nehalem.linux-foundation.org>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Wed, 13 Aug 2008, Mathieu Desnoyers wrote:
> > 
> > I also did some microbenchmarks on my Intel Xeon 64 bits, AMD64 and
> > Intel Pentium 4 boxes to compare a baseline
> 
> Note that the biggest problems of a jump-based nop are likely to happen 
> when there are I$ misses and/or when there are other jumps involved. Ie a 
> some microarchitectures tend to have issues with jumps to jumps, or when 
> there are multiple control changes in the same (possibly partial) 
> cacheline because the instruction stream prediction may be predecoded in 
> the L1 I$, and multiple branches in the same cacheline - or in the same 
> execution cycle - can pollute that kind of thing.
> 

Yup, I agree. Actually, the tests I ran shows that using jumps as nops
does not seems to be the best solution, even cycle-wise.

> So microbenchmarking this way will probably make some things look 
> unrealistically good. 
> 

Yes, I am aware of these "high locality" effects. I use these tests as a
starting point to find out which nops are good candidates, and then it
can be later validated with more thorough testing on real workloads,
which will suffer from higher standard deviation.

Interestingly enough, the P6_NOPS seems to be a poor choice both at the
macro and micro levels for the Intel Xeon (referring to
http://lkml.org/lkml/2008/8/13/253 for the macro-benchmarks).

> On the P4, the trace cache makes things even more interesting, since it's 
> another level of I$ entirely, with very different behavior for the hit 
> case vs the miss case.

As long as the whole kernel agrees on which instructions should be used
for frequently used nops, the instruction trace cache should behave
properly.

> 
> And I$ misses for the kernel are actually fairly high. Not in 
> microbenchmarks that tend to have very repetive behavior and a small I$ 
> footprint, but in a lot of real-life loads the *bulk* of all action is in 
> user space, and then the kernel side is often invoced with few loops (the 
> kernel has very few loops indeed) and a cold I$.

I assume the effect of I$ miss to be the same for all the tested
scenarios (except on P4, and maybe except for the jump cases), given
that in each case we load 5-bytes worth of instructions. Even
considering this, the results I get show that the choices made in the
current kernel does might not be the best ones.

> 
> So your numbers are interesting, but it would be really good to also get 
> some info from Intel/AMD who may know about microarchitectural issues for 
> the cases that don't show up in the hot-I$-cache environment.
> 

Yep. I think it may make a difference if we use jumps, but I doubt it
will change anything to the other various nops. Still, having that
information would be good.

Some more numbers follow for older architectures.

Intel Pentium 3, 550MHz

NR_TESTS                                    10000000
test empty cycles :                        510000254
test 2-bytes jump cycles :                 510000077
test 5-bytes jump cycles :                 510000101
test 3/2 nops cycles :                     500000072
test 5-bytes nop with long prefix cycles : 500000107
test 5-bytes P6 nop cycles :               500000069 (current choice ok)
test Generic 1/4 5-bytes nops cycles :     514687590
test K7 1/4 5-bytes nops cycles :          530000012

Intel Pentium 3, 933MHz

NR_TESTS                                    10000000
test empty cycles :                        510000565
test 2-bytes jump cycles :                 510000133
test 5-bytes jump cycles :                 510000363
test 3/2 nops cycles :                     500000358
test 5-bytes nop with long prefix cycles : 500000331
test 5-bytes P6 nop cycles :               500000625 (current choice ok)
test Generic 1/4 5-bytes nops cycles :     514687797
test K7 1/4 5-bytes nops cycles :          530000273


Intel Pentium M, 2GHz

NR_TESTS                                    10000000
test empty cycles :                        180000515
test 2-bytes jump cycles :                 180000386 (would be the best)
test 5-bytes jump cycles :                 205000435
test 3/2 nops cycles :                     193333517
test 5-bytes nop with long prefix cycles : 205000167
test 5-bytes P6 nop cycles :               205937652
test Generic 1/4 5-bytes nops cycles :     187500174
test K7 1/4 5-bytes nops cycles :          193750161


Intel Pentium 3, 550MHz

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 7
model name	: Pentium III (Katmai)
stepping	: 3
cpu MHz		: 551.295
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips	: 1103.44
clflush size	: 32

Intel Pentium 3, 933MHz

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 6
cpu MHz		: 933.134
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips	: 1868.22
clflush size	: 32

Intel Pentium M, 2GHz

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 13
model name	: Intel(R) Pentium(R) M processor 2.00GHz
stepping	: 8
cpu MHz		: 2000.000
cache size	: 2048 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx bts est tm2
bogomips	: 3994.64
clflush size	: 64

Mathieu

> 			Linus




-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  parent reply	other threads:[~2008-08-13 19:16 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-07 18:20 [PATCH 0/5] ftrace: to kill a daemon Steven Rostedt
2008-08-07 18:20 ` [PATCH 1/5] ftrace: create __mcount_loc section Steven Rostedt
2008-08-07 18:20 ` [PATCH 2/5] ftrace: mcount call site on boot nops core Steven Rostedt
2008-08-07 18:20 ` [PATCH 3/5] ftrace: enable mcount recording for modules Steven Rostedt
2008-08-08  6:43   ` Rusty Russell
2008-08-08 12:51     ` Steven Rostedt
2008-08-07 18:20 ` [PATCH 4/5] ftrace: rebuild everything on change to FTRACE_MCOUNT_RECORD Steven Rostedt
2008-08-07 18:20 ` [PATCH 5/5] ftrace: enable using mcount recording on x86 Steven Rostedt
2008-08-07 18:47 ` [PATCH 0/5] ftrace: to kill a daemon Mathieu Desnoyers
2008-08-07 20:42   ` Steven Rostedt
2008-08-08 17:22     ` Mathieu Desnoyers
2008-08-08 17:36       ` Steven Rostedt
2008-08-08 17:46         ` Mathieu Desnoyers
2008-08-08 18:13           ` Steven Rostedt
2008-08-08 18:15             ` Peter Zijlstra
2008-08-08 18:21             ` Mathieu Desnoyers
2008-08-08 18:41               ` Steven Rostedt
2008-08-08 19:04                 ` Linus Torvalds
2008-08-08 19:05                 ` Mathieu Desnoyers
2008-08-08 23:38                   ` Steven Rostedt
2008-08-09  0:23                     ` Andi Kleen
2008-08-09  0:36                       ` Steven Rostedt
2008-08-09  0:47                         ` Jeremy Fitzhardinge
2008-08-09  0:51                           ` Linus Torvalds
2008-08-09  1:25                             ` Steven Rostedt
2008-08-13  6:31                               ` Mathieu Desnoyers
2008-08-13 15:38                                 ` Mathieu Desnoyers
2008-08-13 17:52                               ` Efficient x86 and x86_64 NOP microbenchmarks Mathieu Desnoyers
2008-08-13 18:27                                 ` Linus Torvalds
2008-08-13 18:41                                   ` Andi Kleen
2008-08-13 18:45                                     ` Avi Kivity
2008-08-13 18:51                                       ` Andi Kleen
2008-08-13 18:56                                         ` Avi Kivity
2008-08-13 19:30                                     ` Mathieu Desnoyers
2008-08-13 19:37                                       ` Andi Kleen
2008-08-13 20:01                                         ` Mathieu Desnoyers
2008-08-13 23:41                                           ` [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug Mathieu Desnoyers
2008-08-14  0:01                                             ` H. Peter Anvin
2008-08-14  1:13                                               ` Mathieu Desnoyers
2008-08-14  1:22                                               ` Jeremy Fitzhardinge
2008-08-14  1:26                                                 ` Roland McGrath
2008-08-14  1:49                                                 ` Mathieu Desnoyers
2008-08-14  3:35                                                   ` Jeremy Fitzhardinge
2008-08-14 15:18                                                     ` Mathieu Desnoyers
2008-08-14 16:10                                                       ` Linus Torvalds
2008-08-14 16:13                                                       ` H. Peter Anvin
2008-08-14 16:58                                                         ` Mathieu Desnoyers
2008-08-14 17:05                                                           ` Jeremy Fitzhardinge
2008-08-14 17:30                                                             ` Mathieu Desnoyers
2008-08-14 17:43                                                               ` Jeremy Fitzhardinge
2008-08-14 18:37                                                                 ` H. Peter Anvin
2008-08-14 18:53                                                                   ` Mathieu Desnoyers
2008-08-14 19:29                                                                     ` Jeremy Fitzhardinge
2008-08-14 20:31                                                                       ` Mathieu Desnoyers
2008-08-14 20:39                                                                         ` H. Peter Anvin
2008-08-14 21:46                                                                         ` Jeremy Fitzhardinge
2008-08-14 22:26                                                                           ` H. Peter Anvin
2008-08-14 17:17                                                           ` H. Peter Anvin
2008-08-14 18:09                                                             ` Mathieu Desnoyers
2008-08-14 19:49                                                             ` Mathieu Desnoyers
2008-08-14 17:04                                                       ` Jeremy Fitzhardinge
2008-08-14 17:18                                                         ` H. Peter Anvin
2008-08-14 17:28                                                           ` Jeremy Fitzhardinge
2008-08-14 17:31                                                             ` H. Peter Anvin
2008-08-14 17:46                                                           ` Mathieu Desnoyers
2008-08-14 17:49                                                             ` Jeremy Fitzhardinge
2008-08-14 17:55                                                               ` Mathieu Desnoyers
2008-08-14 18:59                                                                 ` Gregory Haskins
2008-08-15 21:34                                         ` Efficient x86 and x86_64 NOP microbenchmarks Steven Rostedt
2008-08-15 21:51                                           ` Andi Kleen
2008-08-13 19:16                                   ` Mathieu Desnoyers [this message]
2008-08-09  0:51                           ` [PATCH 0/5] ftrace: to kill a daemon Steven Rostedt
2008-08-09  0:53                         ` Roland McGrath
2008-08-09  1:13                           ` Andi Kleen
2008-08-09  1:19                         ` Andi Kleen
2008-08-09  1:30                           ` Steven Rostedt
2008-08-09  1:55                             ` Andi Kleen
2008-08-09  2:03                               ` Steven Rostedt
2008-08-09  2:23                                 ` Andi Kleen
2008-08-09  4:12                           ` Steven Rostedt
2008-08-09  0:30                     ` Steven Rostedt
2008-08-11 18:21                       ` Mathieu Desnoyers
2008-08-11 19:28                         ` Steven Rostedt
2008-08-08 19:08                 ` Jeremy Fitzhardinge
2008-08-11  2:41                 ` Rusty Russell
2008-08-11 12:33                   ` Steven Rostedt
2008-08-07 21:11 ` Jeremy Fitzhardinge
2008-08-07 21:29   ` Steven Rostedt
2008-08-07 22:26     ` Roland McGrath
2008-08-08  1:21       ` Steven Rostedt
2008-08-08  1:24         ` Steven Rostedt
2008-08-08  1:56         ` Steven Rostedt
2008-08-08  7:22         ` Peter Zijlstra
2008-08-08 11:31           ` Steven Rostedt
2008-08-08  4:54       ` Sam Ravnborg
2008-08-09  9:48 ` Abhishek Sagar
2008-08-09 13:01   ` Steven Rostedt
2008-08-09 15:01     ` Abhishek Sagar
2008-08-09 15:37       ` Steven Rostedt
2008-08-09 17:14         ` Abhishek Sagar
     [not found] <20080813191926.GB15547@Krystal>
2008-08-13 20:00 ` Efficient x86 and x86_64 NOP microbenchmarks Steven Rostedt
2008-08-13 20:06   ` Jeremy Fitzhardinge
2008-08-13 20:34     ` Steven Rostedt
2008-08-13 20:15   ` Andi Kleen
2008-08-13 20:21     ` Linus Torvalds
2008-08-13 20:21     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080813191614.GA15547@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=acme@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=drepper@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=jeremy@goop.org \
    --cc=lclaudio@uudg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=roland@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox