From: Fengguang Wu <fengguang.wu@intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@linux.intel.com
Subject: Re: perf-stat changes after "Use hrtimers for event multiplexing"
Date: Tue, 7 Jan 2014 21:20:34 +0800
Message-ID: <20140107132034.GA1079@localhost>
In-Reply-To: <CABPqkBRqg4K3US3LLw6g6tspvQ53ZwQgy4y7R83w-L9EyhrvFA@mail.gmail.com>

Hi Stephane,

On Tue, Jan 07, 2014 at 10:52:50AM +0100, Stephane Eranian wrote:
> Hi,
> 
> With the hrtimer patch, you will get more regular multiplexing when
> you have idle cores during your benchmark.
> Without the patch, multiplexing was piggybacked on timer tick. The
> timer tick does not occur when a core is idle
> when using a tickless kernel. Thus, the quality of the results with
> hrtimers should be improved.

OK, got it. Thanks for the explanations!

Thanks,
Fengguang
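
For background on the explanation above: when more events are requested than
the PMU has counters, perf stat extrapolates each multiplexed count from the
fraction of time the event was actually on the PMU (the time_enabled and
time_running values described in perf_event_open(2)). A minimal sketch of that
scaling follows. The function name and the numbers are illustrative
assumptions, not taken from this thread.

```python
def scale_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full enabled interval.

    raw_count    -- events counted while the event was on the PMU
    time_enabled -- ns the event was enabled
    time_running -- ns the event was actually scheduled on the PMU
    """
    if time_running == 0:
        # The event never got a counter, so there is nothing to extrapolate.
        return None
    return raw_count * time_enabled / time_running


# Hypothetical event that was scheduled for a quarter of the interval.
estimate = scale_count(raw_count=1_000_000,
                       time_enabled=10_000_000_000,
                       time_running=2_500_000_000)
print(estimate)  # 4000000.0
```

The estimate is only as good as the sample it is extrapolated from. If
rotation stalls while a tickless core is idle, an event may be counted over
an unrepresentative window, which is one plausible reading of why the numbers
moved once rotation was driven by an hrtimer.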

> 
> On Sun, Jan 5, 2014 at 2:14 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> > On Sat, Jan 04, 2014 at 08:02:28PM +0100, Peter Zijlstra wrote:
> >> On Thu, Jan 02, 2014 at 02:12:42PM +0800, fengguang.wu@intel.com wrote:
> >> > Greetings,
> >> >
> >> > We noticed many perf-stat changes between commit 9e6302056f ("perf: Use
> >> > hrtimers for event multiplexing") and its parent commit ab573844e.
> >> > Are these expected changes?
> >> >
> >> > ab573844e3058ee  9e6302056f8029f438e853432
> >> > ---------------  -------------------------
> >> >     152917         +842.9%    1441897       TOTAL interrupts.0:IO-APIC-edge.timer
> >> >     545996         +478.0%    3155637       TOTAL interrupts.LOC
> >> >     182281          +12.3%     204718       TOTAL softirqs.SCHED
> >> >  1.986e+08          -96.4%    7105919       TOTAL perf-stat.node-store-misses
> >> >  107241719          -99.7%     317525       TOTAL perf-stat.node-prefetch-misses
> >> >  1.938e+08          -90.7%   17930426       TOTAL perf-stat.node-load-misses
> >> >       2590         +247.8%       9009       TOTAL vmstat.system.in
> >> >  4.549e+12         +158.3%  1.175e+13       TOTAL perf-stat.stalled-cycles-backend
> >> >  6.807e+12         +149.1%  1.696e+13       TOTAL perf-stat.stalled-cycles-frontend
> >> >  1.753e+08          -50.8%   86339289       TOTAL perf-stat.node-prefetches
> >> >  8.326e+11          +45.0%  1.207e+12       TOTAL perf-stat.cpu-cycles
> >> >   37932143          +32.2%   50146025       TOTAL perf-stat.iTLB-load-misses
> >> >  4.738e+11          +30.1%  6.165e+11       TOTAL perf-stat.iTLB-loads
> >> >   2.56e+11          +30.1%   3.33e+11       TOTAL perf-stat.L1-icache-loads
> >> >  4.951e+11          +24.6%  6.169e+11       TOTAL perf-stat.instructions
> >> >   7.85e+08           +7.5%  8.439e+08       TOTAL perf-stat.LLC-prefetch-misses
> >> >  1.891e+12          +22.8%  2.322e+12       TOTAL perf-stat.ref-cycles
> >> >  4.344e+08          -20.3%  3.462e+08       TOTAL perf-stat.node-loads
> >> >  2.836e+11          +17.4%  3.328e+11       TOTAL perf-stat.branch-loads
> >> >  9.506e+10          +24.5%  1.183e+11       TOTAL perf-stat.branch-load-misses
> >> >  2.803e+11          +18.4%  3.319e+11       TOTAL perf-stat.branch-instructions
> >> >  7.988e+10          +20.9%  9.658e+10       TOTAL perf-stat.bus-cycles
> >> >  2.041e+09          +22.2%  2.495e+09       TOTAL perf-stat.branch-misses
> >> >     229145          -17.3%     189601       TOTAL perf-stat.cpu-migrations
> >> >  1.782e+11          +17.9%    2.1e+11       TOTAL perf-stat.dTLB-loads
> >> >  4.702e+08          -14.8%  4.006e+08       TOTAL perf-stat.LLC-load-misses
> >> >  1.418e+11          +17.4%  1.666e+11       TOTAL perf-stat.L1-dcache-loads
> >> >  1.838e+09          +16.1%  2.133e+09       TOTAL perf-stat.LLC-stores
> >> >  2.428e+09          +11.3%  2.702e+09       TOTAL perf-stat.LLC-loads
> >> >  2.788e+11           +8.6%  3.029e+11       TOTAL perf-stat.dTLB-stores
> >> >   8.66e+08          +10.8%  9.594e+08       TOTAL perf-stat.LLC-prefetches
> >> >  1.117e+09          +10.5%  1.234e+09       TOTAL perf-stat.dTLB-store-misses
> >> >  1.705e+09           +5.3%  1.796e+09       TOTAL perf-stat.L1-dcache-store-misses
> >> >  5.671e+09           +6.1%  6.015e+09       TOTAL perf-stat.L1-dcache-load-misses
> >> >  8.794e+10           +3.6%  9.109e+10       TOTAL perf-stat.L1-dcache-stores
> >> >   3.46e+09           +4.6%  3.618e+09       TOTAL perf-stat.cache-references
> >> >  8.696e+08           +1.8%  8.849e+08       TOTAL perf-stat.cache-misses
> >> >    1613129           +2.6%    1655724       TOTAL perf-stat.context-switches
> >> >
> >> > All of the changes happen in one of our test boxes, which has a DX58SO
> >> > baseboard and 4-core CPU. The boot dmesg and kconfig are attached.
> >> > We can test more boxes if necessary.
> >>
> >> How do you run perf stat?
> >
> > perf stat -a $(-e hardware, cache, software events)
> >
> >> Curious that you notice this now, it's a fairly old commit.
> >
> > Yeah, we are feeding old kernels to the 0day performance test system, too. :)
> >
> >> IIRC we did have a few wobbles with that, but I cannot remember much
> >> detail.
> >>
> >> The biggest difference between before and after that patch is that we'd
> >> rotate while the core is 'idle'. So if you do something like 'perf stat
> >> -a' and have significant idle time it does indeed make a difference.
> >
> > It is 'perf stat -a'; the CPU is mostly idle because it's an IO workload.
> >
> > btw, we found another commit that changed some perf-stat output:
> >
> > 2f7f73a520 ("perf/x86: Fix shared register mutual exclusion enforcement")
> >
> > Comparing to its parent commit:
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  1.308e+08 ~26%     -77.8%   29029594 ~12%  fat/micro/dd-write/1HDD-deadline-xfs-10dd
> >  1.308e+08          -77.8%   29029594       TOTAL perf-stat.LLC-prefetch-misses
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >   97086131 ~ 7%     -71.0%   28127157 ~11%  fat/micro/dd-write/1HDD-deadline-xfs-10dd
> >   97086131          -71.0%   28127157       TOTAL perf-stat.node-prefetches
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >    1.4e+08 ~ 3%     -56.6%   60744486 ~ 9%  fat/micro/dd-write/1HDD-deadline-xfs-10dd
> >    1.4e+08          -56.6%   60744486       TOTAL perf-stat.LLC-load-misses
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  6.967e+08 ~ 0%     -49.6%  3.513e+08 ~ 6%  fat/micro/dd-write/1HDD-deadline-xfs-10dd
> >  6.967e+08          -49.6%  3.513e+08       TOTAL perf-stat.node-stores
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  1.933e+09 ~ 1%     -43.0%  1.103e+09 ~ 2%  fat/micro/dd-write/1HDD-deadline-xfs-10dd
> >  1.933e+09          -43.0%  1.103e+09       TOTAL perf-stat.LLC-stores
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  7.013e+08 ~ 5%     -55.5%  3.118e+08 ~ 4%  fat/micro/dd-write/1HDD-deadline-btrfs-100dd
> >  6.775e+09 ~ 1%     -20.4%  5.391e+09 ~ 1%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> >  7.477e+09          -23.7%  5.703e+09       TOTAL perf-stat.LLC-store-misses
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  2.294e+09 ~ 1%     -10.0%  2.065e+09 ~ 0%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> >  2.294e+09          -10.0%  2.065e+09       TOTAL perf-stat.LLC-prefetches
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  8.685e+09 ~ 0%     -10.0%  7.814e+09 ~ 1%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> >  8.685e+09          -10.0%  7.814e+09       TOTAL perf-stat.cache-misses
> >
> > 069e0c3c4058147  2f7f73a52078b667d64df16ea
> > ---------------  -------------------------
> >  1.591e+12 ~ 0%      -8.7%  1.453e+12 ~ 1%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-ext4-1dd
> >  1.591e+12           -8.7%  1.453e+12       TOTAL perf-stat.dTLB-loads
> >
> >
> > Thanks,
> > Fengguang
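
The middle column of the comparison tables above is the relative change from
the parent commit to the commit under test. A small sketch that recomputes it
for three rows quoted in this thread; the helper name is an assumption, and
only rows with unrounded integer values are used so the result is not
affected by the e-notation rounding in the other rows.

```python
def percent_change(base, head):
    """Relative change from base to head, in percent."""
    return (head - base) / base * 100.0


# (parent ab573844e, commit 9e6302056f) pairs copied from the first table.
rows = {
    "interrupts.0:IO-APIC-edge.timer": (152917, 1441897),
    "interrupts.LOC": (545996, 3155637),
    "perf-stat.cpu-migrations": (229145, 189601),
}

for name, (base, head) in rows.items():
    print(f"{percent_change(base, head):+8.1f}%  {name}")
```

Rounded to one decimal place these give +842.9%, +478.0% and -17.3%, matching
the reported column.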

