From: Don Zickus <dzickus@redhat.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, acme@ghostprotocols.net, jolsa@redhat.com,
jmario@redhat.com, eranian@google.com
Subject: Haswell mem-store question
Date: Wed, 14 May 2014 16:50:21 -0400
Message-ID: <20140514205021.GU39568@redhat.com>
Hi Andi,
Joe was playing with our c2c tool today and noticed we were losing store
events from perf's mem-stores event. Upon investigation we stumbled into
some differences in data that Haswell reports vs. Ivy/Sandy Bridge.
This leaves our tool needing two different code paths depending on the
architecture, which seems odd.
I was hoping you or someone can explain to me the correct way to interpret
the mem-stores data.
My current problem is mem_lvl. Its possible values are defined as:
/* memory hierarchy (memory level, hit or miss) */
#define PERF_MEM_LVL_NA         0x01   /* not available */
#define PERF_MEM_LVL_HIT        0x02   /* hit level */
#define PERF_MEM_LVL_MISS       0x04   /* miss level */
#define PERF_MEM_LVL_L1         0x08   /* L1 */
#define PERF_MEM_LVL_LFB        0x10   /* Line Fill Buffer */
#define PERF_MEM_LVL_L2         0x20   /* L2 */
#define PERF_MEM_LVL_L3         0x40   /* L3 */
#define PERF_MEM_LVL_LOC_RAM    0x80   /* Local DRAM */
#define PERF_MEM_LVL_REM_RAM1   0x100  /* Remote DRAM (1 hop) */
#define PERF_MEM_LVL_REM_RAM2   0x200  /* Remote DRAM (2 hops) */
#define PERF_MEM_LVL_REM_CCE1   0x400  /* Remote Cache (1 hop) */
#define PERF_MEM_LVL_REM_CCE2   0x800  /* Remote Cache (2 hops) */
#define PERF_MEM_LVL_IO         0x1000 /* I/O memory */
#define PERF_MEM_LVL_UNC        0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT      5
Currently IVB and SNB report LVL_L1 combined with either LVL_HIT or
LVL_MISS, as seen here in arch/x86/kernel/cpu/perf_event_intel_ds.c:
static u64 precise_store_data(u64 status)
{
        union intel_x86_pebs_dse dse;
        u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
                                                ^^^^^^^^^^
                                                defined here
        dse.val = status;
<snip>
        /*
         * bit 0: hit L1 data cache
         * if not set, then all we know is that
         * it missed L1D
         */
        if (dse.st_l1d_hit)
                val |= P(LVL, HIT);
        else
                val |= P(LVL, MISS);
                       ^^^^^^^^^^^^
                       updated here
<snip>
}
However Haswell does something different:
static u64 precise_store_data_hsw(u64 status)
{
        union perf_mem_data_src dse;
        dse.val = 0;
        dse.mem_op = PERF_MEM_OP_STORE;
        dse.mem_lvl = PERF_MEM_LVL_NA;
                               ^^^^^^
                               defines NA here
        if (status & 1)
                dse.mem_lvl = PERF_MEM_LVL_L1;
                                       ^^^^^^
                                       switch to LVL_L1 here
<snip>
}
Our c2c tool keeps store statistics to help determine which types of
stores are causing conflicts:
<snip>
} else if (op & P(OP,STORE)) {
        /* store */
        stats->t.store++;
        if (!daddr) {
                stats->t.st_noadrs++;
                return -1;
        }
        if (lvl & P(LVL,HIT)) {
                if (lvl & P(LVL,UNC)) stats->t.st_uncache++;
                if (lvl & P(LVL,L1 )) stats->t.st_l1hit++;
        } else if (lvl & P(LVL,MISS)) {
                if (lvl & P(LVL,L1)) stats->t.st_l1miss++;
        }
}
<snip>
This no longer works on Haswell because Haswell doesn't set LVL_HIT or
LVL_MISS any more. Instead it uses LVL_NA or LVL_L1.
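To make the mismatch concrete, here is a minimal standalone sketch (not taken from the c2c tool) that runs the original hit/miss test over a bare mem_lvl value. The LVL_* constants mirror the PERF_MEM_LVL_* defines quoted above; the enum and the old_classify() helper are hypothetical names made up for illustration.

```c
/* Values mirror the PERF_MEM_LVL_* defines quoted above. */
#define LVL_NA   0x01
#define LVL_HIT  0x02
#define LVL_MISS 0x04
#define LVL_L1   0x08

/* Hypothetical result type: which counter the store would land in. */
enum st_class { ST_UNCOUNTED, ST_L1HIT, ST_L1MISS };

/* Hypothetical helper applying the same test as the original store branch. */
static enum st_class old_classify(unsigned int lvl)
{
        if (lvl & LVL_HIT) {
                if (lvl & LVL_L1)
                        return ST_L1HIT;
        } else if (lvl & LVL_MISS) {
                if (lvl & LVL_L1)
                        return ST_L1MISS;
        }
        /* Neither HIT nor MISS is set: no L1 counter is bumped. */
        return ST_UNCOUNTED;
}
```

With the SNB/IVB encodings, old_classify(LVL_L1 | LVL_HIT) and old_classify(LVL_L1 | LVL_MISS) land in the hit and miss counters respectively. With the Haswell encodings, both old_classify(LVL_L1) and old_classify(LVL_NA) fall through to ST_UNCOUNTED, which matches the lost store events described above.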
So from a generic tool perspective, what is the recommended way to
properly capture these stats to cover both arches? The hack I have now
is:
} else if (op & P(OP,STORE)) {
        /* store */
        stats->t.store++;
        if (!daddr) {
                stats->t.st_noadrs++;
                return -1;
        }
        if ((lvl & P(LVL,HIT)) || (lvl & P(LVL,L1))) {
                if (lvl & P(LVL,UNC)) stats->t.st_uncache++;
                if (lvl & P(LVL,L1 )) stats->t.st_l1hit++;
        } else if ((lvl & P(LVL,MISS)) || (lvl & P(LVL,NA))) {
                if (lvl & P(LVL,L1)) stats->t.st_l1miss++;
                if (lvl & P(LVL,NA)) stats->t.st_l1miss++;
        }
}
I am not sure that is really future proof. Thoughts? Help?
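For illustration only, one way to keep the two encodings out of the statistics code is to normalize mem_lvl into a single L1 hit/miss state first. The sketch below is a standalone assumption, not code from the kernel or the c2c tool: the enum and store_l1_state() are hypothetical names, and treating a bare LVL_NA as an L1 miss is the same guess the hack above makes, which is exactly the open question. One detail worth noting: because the SNB/IVB path always sets LVL_L1, a combined test of the form (HIT || L1) also matches L1|MISS, so the sketch checks the explicit HIT and MISS bits before falling back to the Haswell-style values.

```c
/* Values mirror the PERF_MEM_LVL_* defines quoted above. */
#define LVL_NA   0x01
#define LVL_HIT  0x02
#define LVL_MISS 0x04
#define LVL_L1   0x08

/* Hypothetical normalized view of a store's L1 outcome. */
enum st_l1 { ST_L1_UNKNOWN, ST_L1_HIT, ST_L1_MISS };

/* Hypothetical helper: map either encoding onto one hit/miss state. */
static enum st_l1 store_l1_state(unsigned int lvl)
{
        /* SNB/IVB style: LVL_L1 plus an explicit HIT or MISS bit. */
        if (lvl & LVL_HIT)
                return (lvl & LVL_L1) ? ST_L1_HIT : ST_L1_UNKNOWN;
        if (lvl & LVL_MISS)
                return (lvl & LVL_L1) ? ST_L1_MISS : ST_L1_UNKNOWN;

        /* HSW style: LVL_L1 alone means the store hit L1. */
        if (lvl & LVL_L1)
                return ST_L1_HIT;

        /* ASSUMPTION: a bare LVL_NA is counted as an L1 miss. */
        if (lvl & LVL_NA)
                return ST_L1_MISS;

        return ST_L1_UNKNOWN;
}
```

The store branch could then switch on the returned state and bump st_l1hit or st_l1miss accordingly. Uncached-store accounting is left out of the sketch on purpose.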
Cheers,
Don