From: Lin Ming <ming.m.lin@intel.com>
To: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>, Andi Kleen <andi@firstfloor.org>,
	lkml <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 -tip] perf: x86, add SandyBridge support
Date: Mon, 28 Feb 2011 22:03:41 +0800
Message-ID: <1298901821.2169.36.camel@localhost>
In-Reply-To: <AANLkTim_2ZgZTwJw_fSOdeyvAC96e5H4scGp+byuk6ZD@mail.gmail.com>

On Mon, 2011-02-28 at 17:02 +0800, Stephane Eranian wrote:
> On Mon, Feb 28, 2011 at 9:51 AM, Lin Ming <ming.m.lin@intel.com> wrote:
> > On Mon, 2011-02-28 at 16:20 +0800, Stephane Eranian wrote:
> >> On Mon, Feb 28, 2011 at 8:22 AM, Lin Ming <ming.m.lin@intel.com> wrote:
> >> > This patch adds basic SandyBridge support, including hardware cache
> >> > events and PEBS events support.
> >> >
> >> > LLC-* hardware cache events don't work for now; they depend on the
> >> > offcore patches.
> >> >
> >> > All PEBS events are tested on my SandyBridge machine and work well.
> >> > Note that SandyBridge does not support the INST_RETIRED.ANY (0x00c0) PEBS
> >> > event; instead it supports the INST_RETIRED.PRECDIST (0x01c0) event, on
> >> > PMC1 only.
> >> >
> >> > v1 -> v2:
> >> > - add more raw and PEBS events constraints
> >> > - use offcore events for LLC-* cache events
> >> > - remove the call to Nehalem workaround enable_all function
> >> >
> >> > todo:
> >> > - precise store
> >> > - precise distribution of instructions retired
> >> >
> >> > Signed-off-by: Lin Ming <ming.m.lin@intel.com>
> >> > ---
> >> >  arch/x86/kernel/cpu/perf_event.c          |    2 +
> >> >  arch/x86/kernel/cpu/perf_event_intel.c    |  123 +++++++++++++++++++++++++++++
> >> >  arch/x86/kernel/cpu/perf_event_intel_ds.c |   44 ++++++++++-
> >> >  3 files changed, 168 insertions(+), 1 deletions(-)
> >> >
> >> > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> >> > index 10bfe24..49d51be 100644
> >> > --- a/arch/x86/kernel/cpu/perf_event.c
> >> > +++ b/arch/x86/kernel/cpu/perf_event.c
> >> > @@ -148,6 +148,8 @@ struct cpu_hw_events {
> >> >  */
> >> >  #define INTEL_EVENT_CONSTRAINT(c, n)   \
> >> >        EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT)
> >> > +#define INTEL_EVENT_CONSTRAINT2(c, n)  \
> >> > +       EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
> >> >
> >> >  /*
> >> >  * Constraint on the Event code + UMask + fixed-mask
> >> > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> >> > index 084b383..3085868 100644
> >> > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> >> > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> >> > @@ -76,6 +76,19 @@ static struct event_constraint intel_westmere_event_constraints[] =
> >> >        EVENT_CONSTRAINT_END
> >> >  };
> >> >
> >> > +static struct event_constraint intel_snb_event_constraints[] =
> >> > +{
> >> > +       FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
> >> > +       FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
> >> > +       /* FIXED_EVENT_CONSTRAINT(0x013c, 2), CPU_CLK_UNHALTED.REF */
> >> > +       INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
> >> > +       INTEL_EVENT_CONSTRAINT(0xb7, 0x1), /* OFF_CORE_RESPONSE_0 */
> >> > +       INTEL_EVENT_CONSTRAINT(0xbb, 0x8), /* OFF_CORE_RESPONSE_1 */
> >> > +       INTEL_EVENT_CONSTRAINT2(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
> >> > +       INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
> >> > +       EVENT_CONSTRAINT_END
> >> > +};
> >> > +
> >> >  static struct event_constraint intel_gen_event_constraints[] =
> >> >  {
> >> >        FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
> >> > @@ -89,6 +102,106 @@ static u64 intel_pmu_event_map(int hw_event)
> >> >        return intel_perfmon_event_map[hw_event];
> >> >  }
> >> >
> >> > +static __initconst const u64 snb_hw_cache_event_ids
> >> > +                               [PERF_COUNT_HW_CACHE_MAX]
> >> > +                               [PERF_COUNT_HW_CACHE_OP_MAX]
> >> > +                               [PERF_COUNT_HW_CACHE_RESULT_MAX] =
> >> > +{
> >> > + [ C(L1D) ] = {
> >> > +       [ C(OP_READ) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0xf1d0, /* MEM_UOP_RETIRED.LOADS        */
> >> > +               [ C(RESULT_MISS)   ] = 0x0151, /* L1D.REPLACEMENT              */
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0xf2d0, /* MEM_UOP_RETIRED.STORES       */
> >> > +               [ C(RESULT_MISS)   ] = 0x0851, /* L1D.ALL_M_REPLACEMENT        */
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x0,
> >> > +               [ C(RESULT_MISS)   ] = 0x024e, /* HW_PRE_REQ.DL1_MISS          */
> >> > +       },
> >> > + },
> >> > + [ C(L1I ) ] = {
> >> > +       [ C(OP_READ) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x0,
> >> > +               [ C(RESULT_MISS)   ] = 0x0280, /* ICACHE.MISSES */
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = -1,
> >> > +               [ C(RESULT_MISS)   ] = -1,
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x0,
> >> > +               [ C(RESULT_MISS)   ] = 0x0,
> >> > +       },
> >> > + },
> >> > + [ C(LL  ) ] = {
> >> > +       /*
> >> > +        * TBD: Need Off-core Response Performance Monitoring support
> >> > +        */
> >> > +       [ C(OP_READ) ] = {
> >> > +               /* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
> >> > +               [ C(RESULT_ACCESS) ] = 0x01b7,
> >> > +               /* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
> >> > +               [ C(RESULT_MISS)   ] = 0x01bb,
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               /* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
> >> > +               [ C(RESULT_ACCESS) ] = 0x01b7,
> >> > +               /* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
> >> > +               [ C(RESULT_MISS)   ] = 0x01bb,
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               /* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
> >> > +               [ C(RESULT_ACCESS) ] = 0x01b7,
> >> > +               /* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
> >> > +               [ C(RESULT_MISS)   ] = 0x01bb,
> >> > +       },
> >> > + },
> >> > + [ C(DTLB) ] = {
> >> > +       [ C(OP_READ) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x01d0, /* MEM_UOP_RETIRED.LOADS */
> >> > +               [ C(RESULT_MISS)   ] = 0x0108, /* DTLB_LOAD_MISSES.CAUSES_A_WALK */
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x02d0, /* MEM_UOP_RETIRED.STORES */
> >> > +               [ C(RESULT_MISS)   ] = 0x0149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x0,
> >> > +               [ C(RESULT_MISS)   ] = 0x0,
> >> > +       },
> >> > + },
> >> > + [ C(ITLB) ] = {
> >> > +       [ C(OP_READ) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x1085, /* ITLB_MISSES.STLB_HIT         */
> >> > +               [ C(RESULT_MISS)   ] = 0x0185, /* ITLB_MISSES.CAUSES_A_WALK    */
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = -1,
> >> > +               [ C(RESULT_MISS)   ] = -1,
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = -1,
> >> > +               [ C(RESULT_MISS)   ] = -1,
> >> > +       },
> >> > + },
> >> > + [ C(BPU ) ] = {
> >> > +       [ C(OP_READ) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = 0x00c4, /* BR_INST_RETIRED.ALL_BRANCHES */
> >> > +               [ C(RESULT_MISS)   ] = 0x00c5, /* BR_MISP_RETIRED.ALL_BRANCHES */
> >> > +       },
> >> > +       [ C(OP_WRITE) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = -1,
> >> > +               [ C(RESULT_MISS)   ] = -1,
> >> > +       },
> >> > +       [ C(OP_PREFETCH) ] = {
> >> > +               [ C(RESULT_ACCESS) ] = -1,
> >> > +               [ C(RESULT_MISS)   ] = -1,
> >> > +       },
> >> > + },
> >> > +};
> >> > +
> >> >  static __initconst const u64 westmere_hw_cache_event_ids
> >> >                                [PERF_COUNT_HW_CACHE_MAX]
> >> >                                [PERF_COUNT_HW_CACHE_OP_MAX]
> >> > @@ -1062,6 +1175,16 @@ static __init int intel_pmu_init(void)
> >> >                pr_cont("Westmere events, ");
> >> >                break;
> >> >
> >> > +       case 42: /* SandyBridge */
> >> > +               memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
> >> > +                      sizeof(hw_cache_event_ids));
> >> > +
> >> > +               intel_pmu_lbr_init_nhm();
> >> > +
> >> > +               x86_pmu.event_constraints = intel_snb_event_constraints;
> >> > +               pr_cont("SandyBridge events, ");
> >> > +               break;
> >> > +
> >> >        default:
> >> >                /*
> >> >                 * default constraints for v2 and up
> >> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> >> > index b7dcd9f..e60f91b 100644
> >> > --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> >> > +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> >> > @@ -388,6 +388,42 @@ static struct event_constraint intel_nehalem_pebs_events[] = {
> >> >        EVENT_CONSTRAINT_END
> >> >  };
> >> >
> >> > +static struct event_constraint intel_snb_pebs_events[] = {
> >> > +       PEBS_EVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
> >> > +       PEBS_EVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
> >> > +       PEBS_EVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
> >> > +       PEBS_EVENT_CONSTRAINT(0x01c4, 0xf), /* BR_INST_RETIRED.CONDITIONAL */
> >> > +       PEBS_EVENT_CONSTRAINT(0x02c4, 0xf), /* BR_INST_RETIRED.NEAR_CALL */
> >> > +       PEBS_EVENT_CONSTRAINT(0x04c4, 0xf), /* BR_INST_RETIRED.ALL_BRANCHES */
> >> > +       PEBS_EVENT_CONSTRAINT(0x08c4, 0xf), /* BR_INST_RETIRED.NEAR_RETURN */
> >> > +       PEBS_EVENT_CONSTRAINT(0x10c4, 0xf), /* BR_INST_RETIRED.NOT_TAKEN */
> >> > +       PEBS_EVENT_CONSTRAINT(0x20c4, 0xf), /* BR_INST_RETIRED.NEAR_TAKEN */
> >> > +       PEBS_EVENT_CONSTRAINT(0x40c4, 0xf), /* BR_INST_RETIRED.FAR_BRANCH */
> >> > +       PEBS_EVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
> >> > +       PEBS_EVENT_CONSTRAINT(0x02c5, 0xf), /* BR_MISP_RETIRED.NEAR_CALL */
> >> > +       PEBS_EVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
> >> > +       PEBS_EVENT_CONSTRAINT(0x10c5, 0xf), /* BR_MISP_RETIRED.NOT_TAKEN */
> >> > +       PEBS_EVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.TAKEN */
> >> > +       PEBS_EVENT_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
> >> > +       PEBS_EVENT_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORE */
> >>
> >> > +       PEBS_EVENT_CONSTRAINT(0x01d0, 0xf), /* MEM_UOP_RETIRED.LOADS */
> >> > +       PEBS_EVENT_CONSTRAINT(0x02d0, 0xf), /* MEM_UOP_RETIRED.STORES */
> >> > +       PEBS_EVENT_CONSTRAINT(0x10d0, 0xf), /* MEM_UOP_RETIRED.STLB_MISS */
> >> > +       PEBS_EVENT_CONSTRAINT(0x20d0, 0xf), /* MEM_UOP_RETIRED.LOCK */
> >> > +       PEBS_EVENT_CONSTRAINT(0x40d0, 0xf), /* MEM_UOP_RETIRED.SPLIT */
> >> > +       PEBS_EVENT_CONSTRAINT(0x80d0, 0xf), /* MEM_UOP_RETIRED.ALL */
> >>
> >> Not quite. For event 0xd0, you are not listing the right umask combinations.
> >> The following combinations are supported for event 0xd0:
> >>
> >> 0x5381d0      snb::MEM_UOP_RETIRED:ANY_LOADS
> >> 0x5382d0      snb::MEM_UOP_RETIRED:ANY_STORES
> >> 0x5321d0      snb::MEM_UOP_RETIRED:LOCK_LOADS
> >> 0x5322d0      snb::MEM_UOP_RETIRED:LOCK_STORES
> >> 0x5341d0      snb::MEM_UOP_RETIRED:SPLIT_LOADS
> >> 0x5342d0      snb::MEM_UOP_RETIRED:SPLIT_STORES
> >> 0x5311d0      snb::MEM_UOP_RETIRED:STLB_MISS_LOADS
> >> 0x5312d0      snb::MEM_UOP_RETIRED:STLB_MISS_STORES
> >>
> >> In other words, bits 0-3 of the umask cannot all be zero.
> >
> > I got the umask from "Table 30-20. PEBS Performance Events for Intel
> > microarchitecture code name Sandy Bridge".
> >
> > But from "Table A-2. Non-Architectural Performance Events In the
> > Processor Core for Intel Core Processor 2xxx Series", the combinations
> > are needed as you show above.
> >
> > Which one is correct?
> >
> I think Table A-2 is correct. Umasks 10h, 20h, 40h, 80h MUST be combined
> with 01h (loads) or 02h (stores) to collect something meaningful.

Yes, thanks for figuring this out.
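
For reference, the umask rule discussed above can be sketched in C. This is
only an illustration and not part of the patch: the helper names
mem_uop_umask_valid() and mem_uop_raw_config() are hypothetical, and the
check assumes that exactly the eight combinations Stephane listed are valid
(one qualifier bit from 10h/20h/40h/80h plus 01h for loads or 02h for
stores). The 0x53 prefix in the listed raw codes is taken to be the usual
USR, OS, INT and EN control bits of the event select register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event select for MEM_UOP_RETIRED, as used in the patch above. */
#define MEM_UOP_RETIRED_EVENT 0xd0u

/* Low nibble of the umask: access type (the ..81d0 / ..82d0 pairs). */
#define MEM_UOP_LOADS  0x01u
#define MEM_UOP_STORES 0x02u

/* Event select control bits: USR (16), OS (17), INT (20), EN (22) = 0x53 << 16. */
#define EVTSEL_FLAGS ((1u << 16) | (1u << 17) | (1u << 20) | (1u << 22))

/*
 * Hypothetical helper: true only for the eight listed combinations,
 * i.e. exactly one qualifier bit (10h, 20h, 40h or 80h) combined with
 * either LOADS or STORES.
 */
static bool mem_uop_umask_valid(uint8_t umask)
{
	uint8_t type = umask & 0x0fu;
	uint8_t qual = umask & 0xf0u;

	bool one_type = (type == MEM_UOP_LOADS || type == MEM_UOP_STORES);
	bool one_qual = (qual != 0) && ((qual & (qual - 1)) == 0);

	return one_type && one_qual;
}

/* Hypothetical helper: build the raw code in the form shown in the list. */
static uint32_t mem_uop_raw_config(uint8_t umask)
{
	return EVTSEL_FLAGS | ((uint32_t)umask << 8) | MEM_UOP_RETIRED_EVENT;
}
```

With these helpers, umask 0x81 (ANY_LOADS) passes the check and encodes to
0x5381d0, while the bare 0x80 used in the v2 PEBS table is rejected.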



