From: Ingo Molnar <mingo@elte.hu>
To: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
paulus@samba.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [perf] howto switch from pfmon
Date: Tue, 23 Jun 2009 16:36:01 +0200
Message-ID: <20090623143601.GA13415@elte.hu>
In-Reply-To: <4A40DFF5.7010207@inria.fr>
* Brice Goglin <Brice.Goglin@inria.fr> wrote:
> Ingo Molnar wrote:
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> >
> >> $ perf stat -e cycles -e instructions -e r1000ffe0 ./hackbench 10
> >> Time: 0.186
> >>
> >
> > Correction: that should be r10000ffe0.
>
> Oh thanks a lot, it seems to work now!
btw., it might make sense to expose NUMA imbalance via generic
enumeration. Right now we have:
        PERF_COUNT_HW_CPU_CYCLES                = 0,
        PERF_COUNT_HW_INSTRUCTIONS              = 1,
        PERF_COUNT_HW_CACHE_REFERENCES          = 2,
        PERF_COUNT_HW_CACHE_MISSES              = 3,
        PERF_COUNT_HW_BRANCH_INSTRUCTIONS       = 4,
        PERF_COUNT_HW_BRANCH_MISSES             = 5,
        PERF_COUNT_HW_BUS_CYCLES                = 6,
plus we have cache stats:
        /*
         * Generalized hardware cache counters:
         *
         *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
         *       { read, write, prefetch } x
         *       { accesses, misses }
         */
NUMA is here to stay, and expressing local versus remote access
stats seems useful. We could add two generic counters:
        PERF_COUNT_HW_RAM_LOCAL                 = 7,
        PERF_COUNT_HW_RAM_REMOTE                = 8,
And map them properly on all CPUs that support such stats. They'd be
accessible via '-e ram-local-refs' and '-e ram-remote-refs' type of
event symbols.
What is your typical usage pattern of this counter? What (general)
kind of app do you profile with it and how do you make use of the
specific node masks?
Would a local/all-remote distinction be enough, or do you need to
make a distinction between the individual nodes to get the best
insight into the workload?
> One strange thing I noticed: sometimes perf reports that there
> were some accesses to target numa nodes 4-7 while my box only has
> 4 numa nodes: If I request counters only for the non-existing
> target numa nodes (4-7, with -e r1000010e0 -e r1000020e0 -e
> r1000040e0 -e r1000080e0), I always get 4 zeros.
>
> But if I mix some counters from the existing nodes (0-3) with
> some counters from non-existing nodes (4-7), the non-existing ones
> report some small but non-empty values. Does it ring any bell?
I can see that too. I have a similar system (4 nodes), and if I use
the stats for nodes 4-7 (non-existent) I get:
phoenix:~> perf stat -e r1000010e0 -e r1000020e0 -e r1000040e0 -e r1000080e0 --repeat 10 ./hackbench 30
Time: 0.490
Time: 0.435
Time: 0.492
Time: 0.569
Time: 0.491
Time: 0.498
Time: 0.549
Time: 0.530
Time: 0.543
Time: 0.482
Performance counter stats for './hackbench 30' (10 runs):
0 raw 0x1000010e0 ( +- 0.000% )
0 raw 0x1000020e0 ( +- 0.000% )
0 raw 0x1000040e0 ( +- 0.000% )
0 raw 0x1000080e0 ( +- 0.000% )
0.610303953 seconds time elapsed.
( Note the --repeat option - that way you can repeat workloads and
observe their statistical properties. )
If I try the first 4 nodes (0-3), I get:
phoenix:~> perf stat -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 --repeat 10 ./hackbench 30
Time: 0.403
Time: 0.431
Time: 0.406
Time: 0.421
Time: 0.461
Time: 0.423
Time: 0.495
Time: 0.462
Time: 0.434
Time: 0.459
Performance counter stats for './hackbench 30' (10 runs):
52255370 raw 0x1000001e0 ( +- 5.510% )
46052950 raw 0x1000002e0 ( +- 8.067% )
45966395 raw 0x1000004e0 ( +- 10.341% )
63240044 raw 0x1000008e0 ( +- 11.707% )
0.530894007 seconds time elapsed.
Quite noisy across runs, which is expected on NUMA: memory
allocations are not fully deterministic, and some runs end up more
NUMA-friendly than others. This box has all relevant NUMA options enabled:
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_ACPI_NUMA=y
But if I 'mix' counters, I too get weird stats:
phoenix:~> perf stat -e r1000020e0 -e r1000040e0 -e r1000080e0 -e r10000ffe0 --repeat 10 ./hackbench 30
Time: 0.432
Time: 0.446
Time: 0.428
Time: 0.472
Time: 0.443
Time: 0.454
Time: 0.398
Time: 0.438
Time: 0.403
Time: 0.463
Performance counter stats for './hackbench 30' (10 runs):
2355436 raw 0x1000020e0 ( +- 8.989% )
0 raw 0x1000040e0 ( +- 0.000% )
0 raw 0x1000080e0 ( +- 0.000% )
204768941 raw 0x10000ffe0 ( +- 0.788% )
0.528447241 seconds time elapsed.
That 2355436 count for node 5 (raw 0x1000020e0) should have been zero.
Ingo