* x86 perf's dTLB-load-misses broken on IvyBridge?
From: Dave Hansen @ 2014-02-18 23:11 UTC
To: Andi Kleen, LKML, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo
I noticed that perf's dTLB-load-misses event isn't working on my
Ivybridge system:
> Performance counter stats for 'system wide':
>
> 0 dTLB-load-misses [100.00%]
> 48,570 dTLB-store-misses [100.00%]
> 202,573 iTLB-loads [100.00%]
> 271,546 iTLB-load-misses # 134.05% of all iTLB cache hits
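The "134.05% of all iTLB cache hits" figure is presumably just the iTLB-load-misses count divided by the iTLB-loads count from the same run. The two are separately programmed hardware events, so nothing keeps the ratio below 100%. A minimal sketch of that arithmetic; `miss_ratio_pct` is a hypothetical helper, not perf's own code:

```c
/* Hypothetical helper: the percentage perf stat prints next to a
 * "*-misses" counter, assumed to be misses / paired "*-loads" * 100.
 * The two counts come from independent events, so the result can
 * exceed 100% (271,546 / 202,573 is roughly 134.05%). */
static double miss_ratio_pct(unsigned long long misses,
                             unsigned long long accesses)
{
    if (accesses == 0)
        return 0.0; /* avoid dividing by zero */
    return 100.0 * (double)misses / (double)accesses;
}
```

The same arithmetic reproduces the 0.30% perf prints for dTLB-load-misses later in the thread (19,504 / 6,471,308).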
But it works on a SandyBridge system that I have.
arch/x86/kernel/cpu/perf_event_intel.c seems to use the same tables for
SandyBridge and IvyBridge, so they both use the
'MEM_UOP_RETIRED.ALL_LOADS' event:
> [ C(DTLB) ] = {
> [ C(OP_READ) ] = {
> [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
> [ C(RESULT_MISS) ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> },
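The table values are raw PMU configs: the low byte is the event select and the next byte is the unit mask. That is why 0x81d0 lines up with `cpu/event=0xd0,umask=0x81/` in the perf command that follows, and 0x0108 with event 0x08, umask 0x01. A minimal sketch of that packing, assuming only the low 16 bits are in play (the other event-select flags such as edge, inv or cmask are ignored); `decode_raw` and `encode_raw` are hypothetical helpers:

```c
#include <stdint.h>

/* Event select (bits 0-7) and unit mask (bits 8-15) of a raw x86
 * PMU config; higher bits (edge, inv, cmask, ...) are not modelled. */
struct raw_event {
    uint8_t event;
    uint8_t umask;
};

/* Hypothetical helper: split a table value such as 0x81d0. */
static struct raw_event decode_raw(uint64_t config)
{
    struct raw_event r;
    r.event = (uint8_t)(config & 0xff);
    r.umask = (uint8_t)((config >> 8) & 0xff);
    return r;
}

/* Hypothetical helper: the inverse, from perf's event=/umask= terms. */
static uint64_t encode_raw(uint8_t event, uint8_t umask)
{
    return ((uint64_t)umask << 8) | event;
}
```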
But that event looks to be unsupported on this CPU:
> ./ocperf.py stat -a -e mem_uops_retired.all_loads sleep 1
> perf stat -a -e cpu/event=0xd0,umask=0x81,name=mem_uops_retired_all_loads/ sleep 1
>
> Performance counter stats for 'system wide':
>
> <not supported> mem_uops_retired_all_loads
> 50,204,763 mem_uops_retired_all_loads_ps
But there's a "_ps" version which uses PEBS which does work?
> mem_uops_retired.all_loads [Load uops retired to architected path with filter on bits 0 and 1 applied. (Supports PEBS)]
> mem_uops_retired.all_loads_ps [Load uops retired to architected path with filter on bits 0 and 1 applied. (Uses PEBS) (Uses PEBS)]
Should we swap perf_event_intel.c over to use the PEBS version so that
it works everywhere?
* Re: x86 perf's dTLB-load-misses broken on IvyBridge?
From: Peter Zijlstra @ 2014-02-19 8:43 UTC
To: Dave Hansen; +Cc: Andi Kleen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo
On Tue, Feb 18, 2014 at 03:11:59PM -0800, Dave Hansen wrote:
> I noticed that perf's dTLB-load-misses event isn't working on my
> Ivybridge system:
>
> > Performance counter stats for 'system wide':
> >
> > 0 dTLB-load-misses [100.00%]
> > 48,570 dTLB-store-misses [100.00%]
> > 202,573 iTLB-loads [100.00%]
> > 271,546 iTLB-load-misses # 134.05% of all iTLB cache hits
>
> But it works on a SandyBridge system that I have.
>
> arch/x86/kernel/cpu/perf_event_intel.c seems to use the same tables for
> SandyBridge and IvyBridge, so they both use the
> 'MEM_UOP_RETIRED.ALL_LOADS' event:
>
> > [ C(DTLB) ] = {
> > [ C(OP_READ) ] = {
> > [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
> > [ C(RESULT_MISS) ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> > },
>
> But that event looks to be unsupported on this CPU:
>
> > ./ocperf.py stat -a -e mem_uops_retired.all_loads sleep 1
What kind of snake voo-doo is that?
> > perf stat -a -e cpu/event=0xd0,umask=0x81,name=mem_uops_retired_all_loads/ sleep 1
So this line only produces the mem_uops_retired_all_loads thing, not the
_ps thing.
> >
> > Performance counter stats for 'system wide':
> >
> > <not supported> mem_uops_retired_all_loads
> > 50,204,763 mem_uops_retired_all_loads_ps
>
> But there's a "_ps" version which uses PEBS which does work?
So clearly there is no _ps version, as I'm still utterly confused as to
wtf you mean and where it came from.
> > mem_uops_retired.all_loads [Load uops retired to architected path with filter on bits 0 and 1 applied. (Supports PEBS)]
> > mem_uops_retired.all_loads_ps [Load uops retired to architected path with filter on bits 0 and 1 applied. (Uses PEBS) (Uses PEBS)]
What's that; SDM not haz this.
> Should we swap perf_event_intel.c over to use the PEBS version so that
> it works everywhere?
I'm confused; where does this _ps thing come from? There's nothing like
that in the SDM. That only lists the D0H event, and says it should work.
Of course the SDM is trying to confuse the living daylight out of people
by calling it crap like "3rd gen intel core", which just shows they
can't bloody well count either, since it went:
core, core2, nhm, wsm, snb, ivb
so it's damn well 6th gen.
* Re: x86 perf's dTLB-load-misses broken on IvyBridge?
From: Andi Kleen @ 2014-02-19 15:23 UTC
To: Dave Hansen; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
On Tue, Feb 18, 2014 at 03:11:59PM -0800, Dave Hansen wrote:
> I noticed that perf's dTLB-load-misses event isn't working on my
> Ivybridge system:
>
> > Performance counter stats for 'system wide':
> >
> > 0 dTLB-load-misses [100.00%]
> > 48,570 dTLB-store-misses [100.00%]
> > 202,573 iTLB-loads [100.00%]
> > 271,546 iTLB-load-misses # 134.05% of all iTLB cache hits
>
> But it works on a SandyBridge system that I have.
>
> arch/x86/kernel/cpu/perf_event_intel.c seems to use the same tables for
> SandyBridge and IvyBridge, so they both use the
> 'MEM_UOP_RETIRED.ALL_LOADS' event:
>
> > [ C(DTLB) ] = {
> > [ C(OP_READ) ] = {
> > [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
> > [ C(RESULT_MISS) ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> > },
>
> But that event looks to be unsupported on this CPU:
I thought you wanted the miss event?
That would be the second entry.
ALL_LOADS is the access event. It works for me, both raw and perf cooked
(not sure why the two numbers are different, though):
% perf stat -e dTLB-loads,r81d0 -a sleep 1
Performance counter stats for 'system wide':
12,685,064 dTLB-loads [100.00%]
13,277,367 r81d0
1.001420860 seconds time elapsed
The miss event counts too:
perf stat -e dTLB-load-misses,dTLB-load -a sleep 1
Performance counter stats for 'system wide':
19,504 dTLB-load-misses # 0.30% of all dTLB cache hits [100.00%]
6,471,308 dTLB-load
1.001894328 seconds time elapsed
Same raw:
perf stat -e r0108 -a sleep 1
Performance counter stats for 'system wide':
82,285 r0108
1.001353060 seconds time elapsed
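For reference, `dTLB-load-misses` is a generic hardware-cache event that the kernel translates through the per-model table quoted earlier, while `r0108` hands the config straight to the PMU, so the cooked name is only as good as that table entry. A minimal sketch of the two `perf_event_attr` setups using the encoding documented in `perf_event_open(2)`; `hw_cache_config`, `cache_attr`, `raw_attr` and `open_on_cpu` are hypothetical helper names:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Generic cache-event config per perf_event_open(2):
 * cache id | (op << 8) | (result << 16). */
static uint64_t hw_cache_config(uint64_t cache, uint64_t op, uint64_t result)
{
    return cache | (op << 8) | (result << 16);
}

/* "dTLB-load-misses": the kernel resolves this via its per-model table. */
static struct perf_event_attr cache_attr(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HW_CACHE;
    attr.config = config;
    return attr;
}

/* "r0108": the config goes to the PMU unchanged, no table lookup. */
static struct perf_event_attr raw_attr(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_RAW;
    attr.config = config;
    return attr;
}

/* pid = -1 with cpu >= 0 counts every task on that CPU; perf stat -a
 * opens one such event per CPU. Usually needs privileges, depending on
 * /proc/sys/kernel/perf_event_paranoid. Returns an fd, or -1 on error. */
static int open_on_cpu(struct perf_event_attr *attr, int cpu)
{
    return (int)syscall(SYS_perf_event_open, attr, -1, cpu, -1, 0UL);
}
```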
> > perf stat -a -e cpu/event=0xd0,umask=0x81,name=mem_uops_retired_all_loads/ sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > <not supported> mem_uops_retired_all_loads
> > 50,204,763 mem_uops_retired_all_loads_ps
>
> But there's a "_ps" version which uses PEBS which does work?
Both work for me on an IvyBridge.
> Should we swap perf_event_intel.c over to use the PEBS version so that
> it works everywhere?
Shouldn't be needed.
PEBS for counting normally doesn't make much sense.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
* Re: x86 perf's dTLB-load-misses broken on IvyBridge? II
From: Andi Kleen @ 2014-02-19 15:40 UTC
To: Dave Hansen; +Cc: LKML, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
> > [ C(DTLB) ] = {
> > [ C(OP_READ) ] = {
> > [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
> > [ C(RESULT_MISS) ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> > },
Actually I tested on the wrong system earlier, sorry. The miss event code
is wrong on IvyBridge; it should be 0x8108 (DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK),
not 0x0108.
BTW, that event only counts demand loads, not prefetches; those would need a separate event.
dTLB-loads works with and without PEBS.
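Since the quoted code shares one table between both models, the correction above means the IvyBridge dTLB read-miss slot has to diverge from SandyBridge's: same event select 0x08, but umask 0x81 instead of 0x01. A simplified sketch of that split, assuming flat per-model arrays; `snb_dtlb_read`, `ivb_dtlb_read` and `dtlb_read_table` are hypothetical names, not the kernel's actual tables:

```c
#include <stdint.h>

/* Hypothetical index names mirroring the C(RESULT_*) slots. */
enum { RES_ACCESS, RES_MISS, RES_MAX };

/* SandyBridge dTLB read entries, as quoted in the thread. */
static const uint64_t snb_dtlb_read[RES_MAX] = {
    [RES_ACCESS] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
    [RES_MISS]   = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
};

/* IvyBridge: same access event, different umask for the miss event. */
static const uint64_t ivb_dtlb_read[RES_MAX] = {
    [RES_ACCESS] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
    [RES_MISS]   = 0x8108, /* DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK */
};

/* Hypothetical selector: pick the per-model entries. */
static const uint64_t *dtlb_read_table(int is_ivybridge)
{
    return is_ivybridge ? ivb_dtlb_read : snb_dtlb_read;
}
```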
-Andi
* Re: x86 perf's dTLB-load-misses broken on IvyBridge? II
From: Peter Zijlstra @ 2014-02-19 15:54 UTC
To: Andi Kleen; +Cc: Dave Hansen, LKML, Ingo Molnar, Arnaldo Carvalho de Melo
On Wed, Feb 19, 2014 at 07:40:36AM -0800, Andi Kleen wrote:
> > > [ C(DTLB) ] = {
> > > [ C(OP_READ) ] = {
> > > [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOP_RETIRED.ALL_LOADS */
> > > [ C(RESULT_MISS) ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> > > },
>
>
> Actually I tested on the wrong system earlier, sorry. The miss event code
> is wrong on IvyBridge, it should be 0x8108 (DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK),
> not 0x0108.
Ah indeed, that is different between snb and ivb.