public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Jiri Olsa <jolsa@redhat.com>
Cc: "Arnaldo Carvalho de Melo" <acme@redhat.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Peter Zijlstra" <a.p.zijlstra@chello.nl>,
	masami.hiramatsu.pt@hitachi.com, hpa@zytor.com,
	ananth@in.ibm.com, davem@davemloft.net,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	eric.dumazet@gmail.com, 2nddept-manager@sdl.hitachi.co.jp
Subject: Re: [PATCH 1/2] x86: separating entry text section
Date: Mon, 7 Mar 2011 16:29:21 +0100	[thread overview]
Message-ID: <20110307152921.GC28226@elte.hu> (raw)
In-Reply-To: <20110307104423.GA3661@jolsa.redhat.com>


* Jiri Olsa <jolsa@redhat.com> wrote:

> hi,
> any feedback?

Did you get my reply below?

Thanks,

	Ingo

----- Forwarded message from Ingo Molnar <mingo@elte.hu> -----

Date: Tue, 22 Feb 2011 09:09:34 +0100
From: Ingo Molnar <mingo@elte.hu>
To: Jiri Olsa <jolsa@redhat.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Frédéric Weisbecker <fweisbec@gmail.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: masami.hiramatsu.pt@hitachi.com, acme@redhat.com, fweisbec@gmail.com,
	hpa@zytor.com, ananth@in.ibm.com, davem@davemloft.net,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	a.p.zijlstra@chello.nl, eric.dumazet@gmail.com,
	2nddept-manager@sdl.hitachi.co.jp
Subject: Re: [PATCH 1/2] x86: separating entry text section


* Jiri Olsa <jolsa@redhat.com> wrote:

> Putting x86 entry code to the separate section: .entry.text.

Trying to apply your patch i noticed one detail:

> before patch:
>      26282174  L1-icache-load-misses      ( +-   0.099% )  (scaled from 81.00%)
>   0.206651959  seconds time elapsed   ( +-   0.152% )
> 
> after patch:
>      24237651  L1-icache-load-misses      ( +-   0.117% )  (scaled from 80.96%)
>   0.210509948  seconds time elapsed   ( +-   0.140% )

So time elapsed actually went up.

hackbench is notoriously unstable when it comes to runtime - and increasing the 
--repeat value only has limited effects on that.

Dropping all system caches:

   echo 1 > /proc/sys/vm/drop_caches

Seems to do a better job of 'resetting' system state, but if we put that into the 
measured workload then the results are all over the place (as we now depend on IO 
being done):

 # cat hb10

 echo 1 > /proc/sys/vm/drop_caches
 ./hackbench 10

 # perf stat --repeat 3 ./hb10

 Time: 0.097
 Time: 0.095
 Time: 0.101

 Performance counter stats for './hb10' (3 runs):

         21.351257 task-clock-msecs         #      0.044 CPUs    ( +-  27.165% )
                 6 context-switches         #      0.000 M/sec   ( +-  34.694% )
                 1 CPU-migrations           #      0.000 M/sec   ( +-  25.000% )
               410 page-faults              #      0.019 M/sec   ( +-   0.081% )
        25,407,650 cycles                   #   1189.984 M/sec   ( +-  49.154% )
        25,407,650 instructions             #      1.000 IPC     ( +-  49.154% )
         5,126,580 branches                 #    240.107 M/sec   ( +-  46.012% )
           192,272 branch-misses            #      3.750 %       ( +-  44.911% )
           901,701 cache-references         #     42.232 M/sec   ( +-  12.857% )
           802,767 cache-misses             #     37.598 M/sec   ( +-   9.282% )

        0.483297792  seconds time elapsed   ( +-  31.152% )

So here's a perf stat feature suggestion to solve such measurement problems: a new 
'pre-run' 'dry' command could be specified that is executed before the real 'hot' 
run is executed. Something like this:

  perf stat --pre-run-script ./hb10 --repeat 10 ./hackbench 10

Would do the cache-clearing before each run, it would run hackbench once (dry run) 
and then would run hackbench 10 for real - and would repeat the whole thing 10 
times. Only the 'hot' portion of the run would be measured and displayed in the perf 
stat output event counts.

Another observation:

>      24237651  L1-icache-load-misses      ( +-   0.117% )  (scaled from 80.96%)

Could you please do runs that do not display 'scaled from' messages? Since we are 
measuring a relatively small effect here, and scaling adds noise, it would be nice 
to ensure that the effect persists with non-scaled events as well:

You can do that by reducing the number of events that are measured. The PMU can not 
measure all those L1 cache events you listed - so only use the most important one 
and add cycles and instructions to make sure the measurements are comparable:

  -e L1-icache-load-misses -e instructions -e cycles

Btw., there's another 'perf stat' feature suggestion: it would be nice if it was 
possible to 'record' a perf stat run, and do a 'perf diff' over it. That would 
compare the two runs all automatically, without you having to do the comparison 
manually.

Thanks,

	Ingo


  reply	other threads:[~2011-03-07 15:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-14 15:12 [RFC,PATCH] kprobes - optimized kprobes might crash before setting kernel stack Jiri Olsa
2011-02-15  9:41 ` Masami Hiramatsu
2011-02-15 12:30   ` Jiri Olsa
2011-02-15 15:55     ` Masami Hiramatsu
2011-02-15 16:54       ` Jiri Olsa
2011-02-15 17:05       ` [PATCH] kprobes - do not allow optimized kprobes in entry code Jiri Olsa
2011-02-16  3:36         ` Masami Hiramatsu
2011-02-17 15:11           ` Ingo Molnar
2011-02-17 15:20             ` Jiri Olsa
2011-02-18 16:26             ` Jiri Olsa
2011-02-19 14:14               ` Masami Hiramatsu
2011-02-20 12:59                 ` Ingo Molnar
2011-02-21 11:54                   ` Jiri Olsa
2011-02-21 14:25                   ` [PATCH 0/2] x86: separating entry text section + kprobes fix Jiri Olsa
2011-02-21 14:25                     ` [PATCH 1/2] x86: separating entry text section Jiri Olsa
2011-02-22  3:22                       ` Masami Hiramatsu
2011-02-22  8:09                       ` Ingo Molnar
2011-02-22 12:52                         ` Jiri Olsa
2011-03-07 10:44                           ` Jiri Olsa
2011-03-07 15:29                             ` Ingo Molnar [this message]
2011-03-07 18:10                               ` Jiri Olsa
2011-03-08 16:15                                 ` Ingo Molnar
2011-03-08 20:15                                 ` [tip:perf/core] x86: Separate out " tip-bot for Jiri Olsa
2011-02-21 14:25                     ` [PATCH 2/2] kprobes: disabling optimized kprobes for " Jiri Olsa
2011-02-22  3:22                       ` Masami Hiramatsu
2011-03-08 20:16                       ` [tip:perf/core] kprobes: Disabling " tip-bot for Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110307152921.GC28226@elte.hu \
    --to=mingo@elte.hu \
    --cc=2nddept-manager@sdl.hitachi.co.jp \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@redhat.com \
    --cc=ananth@in.ibm.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox