significant variation in performance counters on POWER6

From: Victor Javier @ 2010-05-04  6:54 UTC
  To: linux-kernel


Hello,

I am doing some research where I need to collect performance information
for SPEC CPU2006 benchmarks on a POWER6 JS22 system. Previously I was
using perfmon2, but after the release of "performance counters for
linux" (and the 'perf' tool), I decided to try it. One of the reasons
was the native support for multiplexing.

However, I have noticed much higher variability with perf than with
perfmon2. As an example, I provide data for the 'bwaves' benchmark run
with the reference input set (it takes around 20 minutes to finish).

The information for the kernels I am using is:
* perfmon2: Linux version 2.6.28-pfmon2 (gcc version 4.1.2 20070115 (SUSE Linux)) #6 SMP
* perf: Linux version 2.6.33.3-perf (gcc version 4.1.2 20070115 (SUSE Linux)) #1 SMP

I am using libpfm version 3.8.

I can provide more information (modules, detailed processor
information, etc.) if necessary.

The commands I used to collect the counters are:

perfmon2: pfmon -e PM_CYC,PM_INST_CMPL,PM_LD_MISS_L1 ./bwaves_base.Linux64
perf: perf stat -e r1e:u,r2:u,r80080:u ./bwaves_base.Linux64

I also tried to pin the execution to a given CPU, but the results were
the same.
I repeated the executions 10 times, so I am also providing the mean and
the standard deviation.
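
The repeat-and-pin procedure described above could be scripted roughly
as follows. This is only a sketch, not the setup actually used for the
numbers in this message: the helper name run_repeated and its parameters
are made up for illustration, pinning is done with Python's
os.sched_setaffinity (Linux only), and the script assumes the tool prints
its counter summary on stderr, as perf stat does by default.

```python
import os
import subprocess
import sys


def run_repeated(cmd, runs=10, cpu=None):
    """Run cmd `runs` times, optionally pinned to one CPU.

    Returns a list with the stderr text of each run. The name and the
    parameters are hypothetical; they are not part of perf or pfmon.
    """
    def pin_to_cpu():
        # Executed in the child process just before exec (Linux only).
        os.sched_setaffinity(0, {cpu})

    reports = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,   # discard the benchmark's own output
            stderr=subprocess.PIPE,      # keep the counter summary
            text=True,
            check=True,                  # fail loudly if a run breaks
            preexec_fn=pin_to_cpu if cpu is not None else None,
        )
        reports.append(proc.stderr)
    return reports


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real one,
    # e.g. ["perf", "stat", "-e", "r1e:u,r2:u,r80080:u", "./bwaves_base.Linux64"]
    demo = [sys.executable, "-c", "import sys; sys.stderr.write('run done\\n')"]
    for i, report in enumerate(run_repeated(demo, runs=3), start=1):
        print(i, report.strip())
```

Passing cpu=0 would pin every run to CPU 0; leaving it as None lets the
scheduler place the process freely.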

============
= perfmon2 =
============

        cycles               instrs completed     L1 load misses
        4,567,041,667,206    2,772,827,993,242    6,918,871,375
        4,569,071,274,248    2,772,827,992,642    6,931,066,292
        4,568,234,790,260    2,772,827,992,716    6,922,975,235
        4,566,485,780,016    2,772,827,992,065    6,917,600,192
        4,566,437,677,239    2,772,827,992,067    6,915,222,376
        4,566,640,807,800    2,772,827,992,066    6,915,703,838
        4,566,466,402,423    2,772,827,992,062    6,914,107,325
        4,569,322,329,138    2,772,828,006,865    6,933,546,730
        4,567,018,722,323    2,772,827,992,066    6,914,210,622
        4,566,778,622,700    2,772,827,992,066    6,914,251,098

mean    4,567,349,807,335    2,772,827,993,786    6,919,755,508
stdev       1,107,043,810                4,614        7,178,958

========
= perf =
========

        cycles               instrs completed     L1 load misses
        4,562,017,366,591    2,772,768,370,128    7,134,353,697
        4,541,500,651,248    2,772,868,724,285    6,341,491,710
        4,550,876,532,582    2,772,787,520,375    6,661,719,666
        4,540,558,691,334    2,772,868,724,156    6,266,617,715
        4,573,942,460,136    2,772,861,831,519    7,419,020,488
        4,587,876,861,751    2,772,868,724,189    8,174,507,077
        4,550,771,568,044    2,772,841,147,861    6,547,437,055
        4,600,947,093,875    2,772,787,520,375    9,152,895,835
        4,572,501,705,517    2,772,861,831,526    7,765,464,256
        4,561,690,369,227    2,772,787,520,368    6,902,452,934

mean    4,564,268,330,031    2,772,830,191,478    7,236,596,043
stdev      19,770,352,264           41,980,009      914,965,698

As can be seen, the standard deviation for perf is significantly higher.
For instructions completed, perf shows a standard deviation roughly
9,000x higher (41,980,009 vs. 4,614). Although this variation is small
relative to the absolute number of instructions completed, it is a real
problem for L1 load misses. With perfmon2 I can expect misses to fall
within [6,905,397,592 .. 6,934,113,424] (mean +/- 2 standard
deviations), which is a tight interval. With perf, however, the interval
widens to [5,406,664,646 .. 9,066,527,440]. This variation is clearly
not acceptable, as I cannot draw any conclusions from those results.
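
For anyone who wants to re-check the arithmetic, here is a minimal Python
sketch that recomputes the L1 load miss statistics from the ten runs
listed in the tables. The quoted ranges correspond to the mean plus or
minus two sample standard deviations (n - 1 denominator). The function
name summarize and the parameter k are illustrative only.

```python
from statistics import mean, stdev

# L1 load misses for the ten runs of each tool, copied from the tables.
PERFMON2_L1_MISSES = [
    6_918_871_375, 6_931_066_292, 6_922_975_235, 6_917_600_192,
    6_915_222_376, 6_915_703_838, 6_914_107_325, 6_933_546_730,
    6_914_210_622, 6_914_251_098,
]
PERF_L1_MISSES = [
    7_134_353_697, 6_341_491_710, 6_661_719_666, 6_266_617_715,
    7_419_020_488, 8_174_507_077, 6_547_437_055, 9_152_895_835,
    7_765_464_256, 6_902_452_934,
]


def summarize(samples, k=2):
    """Return (mean, sample stdev, low, high) where the range is mean +/- k*stdev."""
    m = mean(samples)
    s = stdev(samples)  # sample standard deviation (n - 1 denominator)
    return m, s, m - k * s, m + k * s


if __name__ == "__main__":
    for name, samples in (("perfmon2", PERFMON2_L1_MISSES),
                          ("perf", PERF_L1_MISSES)):
        m, s, low, high = summarize(samples)
        print(f"{name:9s} mean={m:,.0f} stdev={s:,.0f} "
              f"range=[{low:,.0f} .. {high:,.0f}] rel={s / m:.2%}")
```

The relative standard deviation (stdev divided by mean) printed at the
end of each line makes the two tools directly comparable.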

I would like to know whether you are aware of this issue and what its
causes might be. I would also appreciate any help in fixing it.

In case it is not easy to read the data, I provide it as a separate PDF
file as well. I also attach a couple of graphs showing the variation for
instructions and misses.

Thank you for any help on this,
Victor

[-- Attachment #2: graphs.pdf --]
[-- Type: video/x-ms-wm, Size: 20524 bytes --]

[-- Attachment #3: data.pdf --]
[-- Type: video/x-ms-wm, Size: 38315 bytes --]
