* [PATCH 0/4] powerpc/perf: Fixes for power10 PMU
@ 2020-11-11 4:33 Athira Rajeev
2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
Patchset contains PMU fixes for power10.
This patchset contains 4 patches.
Patch1 includes fix to update event code with radix_scope_qual
bit in power10.
Patch2 updates the event group constraints for L2/L3 and threshold
events in power10.
Patch3 includes the event code changes for l2/l3 events and
some of the generic events.
Patch4 adds fixes for PMCCEXT bit in power10.
Athira Rajeev (4):
powerpc/perf: Fix to update radix_scope_qual in power10
powerpc/perf: Update the PMU group constraints for l2l3 and threshold
events in power10
powerpc/perf: Fix to update l2l3 events and generic event codes for
power10
powerpc/perf: MMCR0 control for PMU registers under PMCC=00
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kernel/cpu_setup_power.S | 2 +
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/perf/core-book3s.c | 16 +++
arch/powerpc/perf/isa207-common.c | 27 ++++-
arch/powerpc/perf/isa207-common.h | 16 ++-
arch/powerpc/perf/power10-events-list.h | 9 ++
arch/powerpc/perf/power10-pmu.c | 177 ++++++++++++++++++++++++++++++--
8 files changed, 236 insertions(+), 13 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev @ 2020-11-11 4:33 ` Athira Rajeev 2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev ` (2 subsequent siblings) 3 siblings, 0 replies; 9+ messages in thread From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw) To: mpe; +Cc: mikey, maddy, linuxppc-dev power10 uses bit 9 of the raw event code as RADIX_SCOPE_QUAL. This bit is used for enabling the radix process events. Patch fixes the PMU counter support functions to program bit 18 of MMCR1 ( Monitor Mode Control Register1 ) with the RADIX_SCOPE_QUAL bit value. Since this field is not per-pmc, add this to PMU group constraints to make sure events in a group will have same bit value for this field. Use bit 21 as constraint bit field for radix_scope_qual. Patch also updates the power10 raw event encoding layout information, format field and constraints bit layout to include the radix_scope_qual bit. Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> --- arch/powerpc/perf/isa207-common.c | 12 ++++++++++++ arch/powerpc/perf/isa207-common.h | 13 ++++++++++--- arch/powerpc/perf/power10-pmu.c | 11 +++++++---- 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c index 2848904..f57f54f 100644 --- a/arch/powerpc/perf/isa207-common.c +++ b/arch/powerpc/perf/isa207-common.c @@ -339,6 +339,11 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp) value |= CNST_L1_QUAL_VAL(cache); } + if (cpu_has_feature(CPU_FTR_ARCH_31)) { + mask |= CNST_RADIX_SCOPE_GROUP_MASK; + value |= CNST_RADIX_SCOPE_GROUP_VAL(event >> p10_EVENT_RADIX_SCOPE_QUAL_SHIFT); + } + if (is_event_marked(event)) { mask |= CNST_SAMPLE_MASK; value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT); @@ -456,6 +461,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev, } } + /* Set RADIX_SCOPE_QUAL bit */ + if (cpu_has_feature(CPU_FTR_ARCH_31)) { + val = (event[i] >> p10_EVENT_RADIX_SCOPE_QUAL_SHIFT) & + p10_EVENT_RADIX_SCOPE_QUAL_MASK; + mmcr1 |= val << p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT; + } + if (is_event_marked(event[i])) { mmcra |= MMCRA_SAMPLE_ENABLE; diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h index 7025de5..dc9c3d2 100644 --- a/arch/powerpc/perf/isa207-common.h +++ b/arch/powerpc/perf/isa207-common.h @@ -101,6 +101,9 @@ #define p10_EVENT_CACHE_SEL_MASK 0x3ull #define p10_EVENT_MMCR3_MASK 0x7fffull #define p10_EVENT_MMCR3_SHIFT 45 +#define p10_EVENT_RADIX_SCOPE_QUAL_SHIFT 9 +#define p10_EVENT_RADIX_SCOPE_QUAL_MASK 0x1 +#define p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT 45 #define p10_EVENT_VALID_MASK \ ((p10_SDAR_MODE_MASK << p10_SDAR_MODE_SHIFT | \ @@ -112,6 +115,7 @@ (p9_EVENT_COMBINE_MASK << p9_EVENT_COMBINE_SHIFT) | \ (p10_EVENT_MMCR3_MASK << p10_EVENT_MMCR3_SHIFT) | \ (EVENT_MARKED_MASK << EVENT_MARKED_SHIFT) | \ + (p10_EVENT_RADIX_SCOPE_QUAL_MASK << p10_EVENT_RADIX_SCOPE_QUAL_SHIFT) | \ EVENT_LINUX_MASK | \ EVENT_PSEL_MASK)) /* @@ -125,9 +129,9 @@ * * 28 24 20 16 12 8 4 0 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - * [ ] | [ ] [ sample ] [ ] [6] [5] [4] [3] [2] [1] - * | | | | - * BHRB IFM -* | | | Count of events for each PMC. + * [ ] | [ ] | [ sample ] [ ] [6] [5] [4] [3] [2] [1] + * | | | | | + * BHRB IFM -* | | |*radix_scope | Count of events for each PMC. * EBB -* | | p1, p2, p3, p4, p5, p6. * L1 I/D qualifier -* | * nc - number of counters -* @@ -165,6 +169,9 @@ #define CNST_L2L3_GROUP_VAL(v) (((v) & 0x1full) << 55) #define CNST_L2L3_GROUP_MASK CNST_L2L3_GROUP_VAL(0x1f) +#define CNST_RADIX_SCOPE_GROUP_VAL(v) (((v) & 0x1ull) << 21) +#define CNST_RADIX_SCOPE_GROUP_MASK CNST_RADIX_SCOPE_GROUP_VAL(1) + /* * For NC we are counting up to 4 events. This requires three bits, and we need * the fifth event to overflow and set the 4th bit. To achieve that we bias the diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c index 9dbe8f9..cf44fb7 100644 --- a/arch/powerpc/perf/power10-pmu.c +++ b/arch/powerpc/perf/power10-pmu.c @@ -23,10 +23,10 @@ * * 28 24 20 16 12 8 4 0 * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - * [ ] [ sample ] [ ] [ ] [ pmc ] [unit ] [ ] m [ pmcxsel ] - * | | | | | | - * | | | | | *- mark - * | | | *- L1/L2/L3 cache_sel | + * [ ] [ sample ] [ ] [ ] [ pmc ] [unit ] [ ] | m [ pmcxsel ] + * | | | | | | | + * | | | | | | *- mark + * | | | *- L1/L2/L3 cache_sel | |*-radix_scope_qual * | | sdar_mode | * | *- sampling mode for marked events *- combine * | @@ -59,6 +59,7 @@ * * MMCR1[16] = cache_sel[0] * MMCR1[17] = cache_sel[1] + * MMCR1[18] = radix_scope_qual * * if mark: * MMCRA[63] = 1 (SAMPLE_ENABLE) @@ -175,6 +176,7 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) PMU_FORMAT_ATTR(invert_bit, "config:47"); PMU_FORMAT_ATTR(src_mask, "config:48-53"); PMU_FORMAT_ATTR(src_match, "config:54-59"); +PMU_FORMAT_ATTR(radix_scope, "config:9"); static struct attribute *power10_pmu_format_attr[] = { &format_attr_event.attr, @@ -194,6 +196,7 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) &format_attr_invert_bit.attr, &format_attr_src_mask.attr, &format_attr_src_match.attr, + &format_attr_radix_scope.attr, NULL, }; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10 2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev 2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev @ 2020-11-11 4:33 ` Athira Rajeev 2020-11-18 4:32 ` Michael Ellerman 2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev 2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev 3 siblings, 1 reply; 9+ messages in thread From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw) To: mpe; +Cc: mikey, maddy, linuxppc-dev In Power9, L2/L3 bus events are always available as a "bank" of 4 events. To obtain the counts for any of the l2/l3 bus events in a given bank, the user will have to program PMC4 with corresponding l2/l3 bus event for that bank. Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events") enforced this rule in Power9. But this is not valid for Power10, since in Power10 Monitor Mode Control Register2 (MMCR2) has bits to configure l2/l3 event bits. Hence remove this PMC4 constraint check from power10. Since the l2/l3 bits in MMCR2 are not per-pmc, patch handles group constrints checks for l2/l3 bits in MMCR2. Patch also updates constraints for threshold events in power10. Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> --- arch/powerpc/perf/isa207-common.c | 15 +++++++++++---- arch/powerpc/perf/isa207-common.h | 3 +++ 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c index f57f54f..0f4983e 100644 --- a/arch/powerpc/perf/isa207-common.c +++ b/arch/powerpc/perf/isa207-common.c @@ -311,9 +311,11 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp) } if (unit >= 6 && unit <= 9) { - if (cpu_has_feature(CPU_FTR_ARCH_31) && (unit == 6)) { - mask |= CNST_L2L3_GROUP_MASK; - value |= CNST_L2L3_GROUP_VAL(event >> p10_L2L3_EVENT_SHIFT); + if (cpu_has_feature(CPU_FTR_ARCH_31)) { + if (unit == 6) { + mask |= CNST_L2L3_GROUP_MASK; + value |= CNST_L2L3_GROUP_VAL(event >> p10_L2L3_EVENT_SHIFT); + } } else if (cpu_has_feature(CPU_FTR_ARCH_300)) { mask |= CNST_CACHE_GROUP_MASK; value |= CNST_CACHE_GROUP_VAL(event & 0xff); @@ -349,7 +351,12 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp) value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT); } - if (cpu_has_feature(CPU_FTR_ARCH_300)) { + if (cpu_has_feature(CPU_FTR_ARCH_31)) { + if (event_is_threshold(event)) { + mask |= CNST_THRESH_CTL_SEL_MASK; + value |= CNST_THRESH_CTL_SEL_VAL(event >> EVENT_THRESH_SHIFT); + } + } else if (cpu_has_feature(CPU_FTR_ARCH_300)) { if (event_is_threshold(event) && is_thresh_cmp_valid(event)) { mask |= CNST_THRESH_MASK; value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT); diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h index dc9c3d2..4208764 100644 --- a/arch/powerpc/perf/isa207-common.h +++ b/arch/powerpc/perf/isa207-common.h @@ -149,6 +149,9 @@ #define CNST_THRESH_VAL(v) (((v) & EVENT_THRESH_MASK) << 32) #define CNST_THRESH_MASK CNST_THRESH_VAL(EVENT_THRESH_MASK) +#define CNST_THRESH_CTL_SEL_VAL(v) (((v) & 0x7ffull) << 32) +#define CNST_THRESH_CTL_SEL_MASK CNST_THRESH_CTL_SEL_VAL(0x7ff) + #define CNST_EBB_VAL(v) (((v) & EVENT_EBB_MASK) << 24) #define CNST_EBB_MASK CNST_EBB_VAL(EVENT_EBB_MASK) -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10 2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev @ 2020-11-18 4:32 ` Michael Ellerman 2020-11-18 5:21 ` Athira Rajeev 0 siblings, 1 reply; 9+ messages in thread From: Michael Ellerman @ 2020-11-18 4:32 UTC (permalink / raw) To: Athira Rajeev; +Cc: mikey, maddy, linuxppc-dev Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes: > In Power9, L2/L3 bus events are always available as a > "bank" of 4 events. To obtain the counts for any of the > l2/l3 bus events in a given bank, the user will have to > program PMC4 with corresponding l2/l3 bus event for that > bank. > > Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events") > enforced this rule in Power9. But this is not valid for > Power10, since in Power10 Monitor Mode Control Register2 > (MMCR2) has bits to configure l2/l3 event bits. Hence remove > this PMC4 constraint check from power10. > > Since the l2/l3 bits in MMCR2 are not per-pmc, patch handles > group constrints checks for l2/l3 bits in MMCR2. > Patch also updates constraints for threshold events in power10. That should be done in a separate patch please. cheers ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10 2020-11-18 4:32 ` Michael Ellerman @ 2020-11-18 5:21 ` Athira Rajeev 0 siblings, 0 replies; 9+ messages in thread From: Athira Rajeev @ 2020-11-18 5:21 UTC (permalink / raw) To: Michael Ellerman; +Cc: Michael Neuling, Madhavan Srinivasan, linuxppc-dev > On 18-Nov-2020, at 10:02 AM, Michael Ellerman <mpe@ellerman.id.au> wrote: > > Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes: >> In Power9, L2/L3 bus events are always available as a >> "bank" of 4 events. To obtain the counts for any of the >> l2/l3 bus events in a given bank, the user will have to >> program PMC4 with corresponding l2/l3 bus event for that >> bank. >> >> Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events") >> enforced this rule in Power9. But this is not valid for >> Power10, since in Power10 Monitor Mode Control Register2 >> (MMCR2) has bits to configure l2/l3 event bits. Hence remove >> this PMC4 constraint check from power10. >> >> Since the l2/l3 bits in MMCR2 are not per-pmc, patch handles >> group constrints checks for l2/l3 bits in MMCR2. > >> Patch also updates constraints for threshold events in power10. > > That should be done in a separate patch please. Thanks mpe for checking the patch set. Sure, I will make threshold constraint changes as a separate patch and send next version > > cheers ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev 2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev 2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev @ 2020-11-11 4:33 ` Athira Rajeev 2020-11-18 4:36 ` Michael Ellerman 2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev 3 siblings, 1 reply; 9+ messages in thread From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw) To: mpe; +Cc: mikey, maddy, linuxppc-dev Fix the event code for events: branch-instructions (to PM_BR_FIN), branch-misses (to PM_BR_MPRED_FIN) and cache-misses (to PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the list of generic events with this modified event code. Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches (PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events. To maintain the current event code work with DD1, rename existing array of generic_events, cache_events and pmu_attr_groups with suffix _dd1. Update the power10 pmu init code to pick the dd1 list while registering the power PMU, based on the pvr (Processor Version Register) value. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> --- arch/powerpc/perf/power10-events-list.h | 9 ++ arch/powerpc/perf/power10-pmu.c | 166 +++++++++++++++++++++++++++++++- 2 files changed, 173 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h index 60c1b81..9e0b3c9 100644 --- a/arch/powerpc/perf/power10-events-list.h +++ b/arch/powerpc/perf/power10-events-list.h @@ -15,6 +15,9 @@ EVENT(PM_RUN_INST_CMPL, 0x500fa); EVENT(PM_BR_CMPL, 0x4d05e); EVENT(PM_BR_MPRED_CMPL, 0x400f6); +EVENT(PM_BR_FIN, 0x2f04a); +EVENT(PM_BR_MPRED_FIN, 0x35884); +EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0); /* All L1 D cache load references counted at finish, gated by reject */ EVENT(PM_LD_REF_L1, 0x100fc); @@ -36,6 +39,12 @@ EVENT(PM_DATA_FROM_L3, 0x01340000001c040); /* Demand LD - L3 Miss (not L2 hit and not L3 hit) */ EVENT(PM_DATA_FROM_L3MISS, 0x300fe); +/* All successful D-side store dispatches for this thread */ +EVENT(PM_L2_ST, 0x010000046080); +/* All successful D-side store dispatches for this thread that were L2 Miss */ +EVENT(PM_L2_ST_MISS, 0x26880); +/* Total HW L3 prefetches(Load+store) */ +EVENT(PM_L3_PF_MISS_L3, 0x100000016080); /* Data PTEG reload */ EVENT(PM_DTLB_MISS, 0x300fc); /* ITLB Reloaded */ diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c index cf44fb7..86665ad 100644 --- a/arch/powerpc/perf/power10-pmu.c +++ b/arch/powerpc/perf/power10-pmu.c @@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1); GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS); GENERIC_EVENT_ATTR(mem-stores, MEM_STORES); +GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN); +GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN); +GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN); CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1); CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1); @@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ); CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS); CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3); +CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3); +CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS); +CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST); CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL); CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL); CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS); CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS); -static struct attribute *power10_events_attr[] = { +static struct attribute *power10_events_attr_dd1[] = { GENERIC_EVENT_PTR(PM_RUN_CYC), GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), GENERIC_EVENT_PTR(PM_BR_CMPL), @@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) NULL }; +static struct attribute *power10_events_attr[] = { + GENERIC_EVENT_PTR(PM_RUN_CYC), + GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), + GENERIC_EVENT_PTR(PM_BR_FIN), + GENERIC_EVENT_PTR(PM_BR_MPRED_FIN), + GENERIC_EVENT_PTR(PM_LD_REF_L1), + GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN), + GENERIC_EVENT_PTR(MEM_LOADS), + GENERIC_EVENT_PTR(MEM_STORES), + CACHE_EVENT_PTR(PM_LD_MISS_L1), + CACHE_EVENT_PTR(PM_LD_REF_L1), + CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS), + CACHE_EVENT_PTR(PM_ST_MISS_L1), + CACHE_EVENT_PTR(PM_L1_ICACHE_MISS), + CACHE_EVENT_PTR(PM_INST_FROM_L1), + CACHE_EVENT_PTR(PM_IC_PREF_REQ), + CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS), + CACHE_EVENT_PTR(PM_DATA_FROM_L3), + CACHE_EVENT_PTR(PM_L3_PF_MISS_L3), + CACHE_EVENT_PTR(PM_L2_ST_MISS), + CACHE_EVENT_PTR(PM_L2_ST), + CACHE_EVENT_PTR(PM_BR_MPRED_CMPL), + CACHE_EVENT_PTR(PM_BR_CMPL), + CACHE_EVENT_PTR(PM_DTLB_MISS), + CACHE_EVENT_PTR(PM_ITLB_MISS), + NULL +}; + static struct attribute_group power10_pmu_events_group = { .name = "events", .attrs = power10_events_attr, }; +static struct attribute_group power10_pmu_events_group_dd1 = { + .name = "events", + .attrs = power10_events_attr_dd1, +}; + PMU_FORMAT_ATTR(event, "config:0-59"); PMU_FORMAT_ATTR(pmcxsel, "config:0-7"); PMU_FORMAT_ATTR(mark, "config:8"); @@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) NULL, }; -static int power10_generic_events[] = { +static const struct attribute_group *power10_pmu_attr_groups_dd1[] = { + &power10_pmu_format_group, + &power10_pmu_events_group_dd1, + NULL, +}; + +static int power10_generic_events_dd1[] = { [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL, @@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1, }; +static int power10_generic_events[] = { + [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, + [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN, + [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN, + [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1, + [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN, +}; + static u64 power10_bhrb_filter_map(u64 branch_sample_type) { u64 pmu_bhrb_filter = 0; @@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, }, [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = PM_L2_ST, + [C(RESULT_MISS)] = PM_L2_ST_MISS, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3, + [C(RESULT_MISS)] = 0, + }, + }, + [C(DTLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = 0, + [C(RESULT_MISS)] = PM_DTLB_MISS, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + }, + [C(ITLB)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = 0, + [C(RESULT_MISS)] = PM_ITLB_MISS, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + }, + [C(BPU)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = PM_BR_CMPL, + [C(RESULT_MISS)] = PM_BR_MPRED_CMPL, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + }, + [C(NODE)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = -1, + [C(RESULT_MISS)] = -1, + }, + }, +}; + +static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = { + [C(L1D)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = PM_LD_REF_L1, + [C(RESULT_MISS)] = PM_LD_MISS_L1, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = 0, + [C(RESULT_MISS)] = PM_ST_MISS_L1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS, + [C(RESULT_MISS)] = 0, + }, + }, + [C(L1I)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = PM_INST_FROM_L1, + [C(RESULT_MISS)] = PM_L1_ICACHE_MISS, + }, + [C(OP_WRITE)] = { + [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS, + [C(RESULT_MISS)] = -1, + }, + [C(OP_PREFETCH)] = { + [C(RESULT_ACCESS)] = PM_IC_PREF_REQ, + [C(RESULT_MISS)] = 0, + }, + }, + [C(LL)] = { + [C(OP_READ)] = { + [C(RESULT_ACCESS)] = PM_DATA_FROM_L3, + [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, + }, + [C(OP_WRITE)] = { [C(RESULT_ACCESS)] = -1, [C(RESULT_MISS)] = -1, }, @@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) int init_power10_pmu(void) { int rc; + unsigned int pvr = mfspr(SPRN_PVR); /* Comes from cpu_specs[] */ if (!cur_cpu_spec->oprofile_cpu_type || @@ -416,6 +572,12 @@ int init_power10_pmu(void) /* Set the PERF_REG_EXTENDED_MASK here */ PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31; + if ((PVR_MAJ(pvr) == 1)) { + power10_pmu.generic_events = power10_generic_events_dd1; + power10_pmu.attr_groups = power10_pmu_attr_groups_dd1; + power10_pmu.cache_events = &power10_cache_events_dd1; + } + rc = register_power_pmu(&power10_pmu); if (rc) return rc; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev @ 2020-11-18 4:36 ` Michael Ellerman 2020-11-18 5:23 ` Athira Rajeev 0 siblings, 1 reply; 9+ messages in thread From: Michael Ellerman @ 2020-11-18 4:36 UTC (permalink / raw) To: Athira Rajeev; +Cc: mikey, maddy, linuxppc-dev Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes: > Fix the event code for events: branch-instructions (to PM_BR_FIN), > branch-misses (to PM_BR_MPRED_FIN) and cache-misses (to > PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the > list of generic events with this modified event code. That should be one patch. > Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches > (PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events. That should be another patch. > To maintain the current event code work with DD1, rename > existing array of generic_events, cache_events and pmu_attr_groups > with suffix _dd1. Update the power10 pmu init code to pick the > dd1 list while registering the power PMU, based on the pvr > (Processor Version Register) value. And that should be a third patch. cheers > diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h > index 60c1b81..9e0b3c9 100644 > --- a/arch/powerpc/perf/power10-events-list.h > +++ b/arch/powerpc/perf/power10-events-list.h > @@ -15,6 +15,9 @@ > EVENT(PM_RUN_INST_CMPL, 0x500fa); > EVENT(PM_BR_CMPL, 0x4d05e); > EVENT(PM_BR_MPRED_CMPL, 0x400f6); > +EVENT(PM_BR_FIN, 0x2f04a); > +EVENT(PM_BR_MPRED_FIN, 0x35884); > +EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0); > > /* All L1 D cache load references counted at finish, gated by reject */ > EVENT(PM_LD_REF_L1, 0x100fc); > @@ -36,6 +39,12 @@ > EVENT(PM_DATA_FROM_L3, 0x01340000001c040); > /* Demand LD - L3 Miss (not L2 hit and not L3 hit) */ > EVENT(PM_DATA_FROM_L3MISS, 0x300fe); > +/* All successful D-side store dispatches for this thread */ > +EVENT(PM_L2_ST, 0x010000046080); > +/* All successful D-side store dispatches for this thread that were L2 Miss */ > +EVENT(PM_L2_ST_MISS, 0x26880); > +/* Total HW L3 prefetches(Load+store) */ > +EVENT(PM_L3_PF_MISS_L3, 0x100000016080); > /* Data PTEG reload */ > EVENT(PM_DTLB_MISS, 0x300fc); > /* ITLB Reloaded */ > diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c > index cf44fb7..86665ad 100644 > --- a/arch/powerpc/perf/power10-pmu.c > +++ b/arch/powerpc/perf/power10-pmu.c > @@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) > GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1); > GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS); > GENERIC_EVENT_ATTR(mem-stores, MEM_STORES); > +GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN); > +GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN); > +GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN); > > CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1); > CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1); > @@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) > CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ); > CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS); > CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3); > +CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3); > +CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS); > +CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST); > CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL); > CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL); > CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS); > CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS); > > -static struct attribute *power10_events_attr[] = { > +static struct attribute *power10_events_attr_dd1[] = { > GENERIC_EVENT_PTR(PM_RUN_CYC), > GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), > GENERIC_EVENT_PTR(PM_BR_CMPL), > @@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) > NULL > }; > > +static struct attribute *power10_events_attr[] = { > + GENERIC_EVENT_PTR(PM_RUN_CYC), > + GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), > + GENERIC_EVENT_PTR(PM_BR_FIN), > + GENERIC_EVENT_PTR(PM_BR_MPRED_FIN), > + GENERIC_EVENT_PTR(PM_LD_REF_L1), > + GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN), > + GENERIC_EVENT_PTR(MEM_LOADS), > + GENERIC_EVENT_PTR(MEM_STORES), > + CACHE_EVENT_PTR(PM_LD_MISS_L1), > + CACHE_EVENT_PTR(PM_LD_REF_L1), > + CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS), > + CACHE_EVENT_PTR(PM_ST_MISS_L1), > + CACHE_EVENT_PTR(PM_L1_ICACHE_MISS), > + CACHE_EVENT_PTR(PM_INST_FROM_L1), > + CACHE_EVENT_PTR(PM_IC_PREF_REQ), > + CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS), > + CACHE_EVENT_PTR(PM_DATA_FROM_L3), > + CACHE_EVENT_PTR(PM_L3_PF_MISS_L3), > + CACHE_EVENT_PTR(PM_L2_ST_MISS), > + CACHE_EVENT_PTR(PM_L2_ST), > + CACHE_EVENT_PTR(PM_BR_MPRED_CMPL), > + CACHE_EVENT_PTR(PM_BR_CMPL), > + CACHE_EVENT_PTR(PM_DTLB_MISS), > + CACHE_EVENT_PTR(PM_ITLB_MISS), > + NULL > +}; > + > static struct attribute_group power10_pmu_events_group = { > .name = "events", > .attrs = power10_events_attr, > }; > > +static struct attribute_group power10_pmu_events_group_dd1 = { > + .name = "events", > + .attrs = power10_events_attr_dd1, > +}; > + > PMU_FORMAT_ATTR(event, "config:0-59"); > PMU_FORMAT_ATTR(pmcxsel, "config:0-7"); > PMU_FORMAT_ATTR(mark, "config:8"); > @@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) > NULL, > }; > > -static int power10_generic_events[] = { > +static const struct attribute_group *power10_pmu_attr_groups_dd1[] = { > + &power10_pmu_format_group, > + &power10_pmu_events_group_dd1, > + NULL, > +}; > + > +static int power10_generic_events_dd1[] = { > [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, > [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, > [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL, > @@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) > [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1, > }; > > +static int power10_generic_events[] = { > + [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, > + [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, > + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN, > + [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN, > + [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1, > + [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN, > +}; > + > static u64 power10_bhrb_filter_map(u64 branch_sample_type) > { > u64 pmu_bhrb_filter = 0; > @@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) > [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, > }, > [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = PM_L2_ST, > + [C(RESULT_MISS)] = PM_L2_ST_MISS, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3, > + [C(RESULT_MISS)] = 0, > + }, > + }, > + [C(DTLB)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = 0, > + [C(RESULT_MISS)] = PM_DTLB_MISS, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + }, > + [C(ITLB)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = 0, > + [C(RESULT_MISS)] = PM_ITLB_MISS, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + }, > + [C(BPU)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = PM_BR_CMPL, > + [C(RESULT_MISS)] = PM_BR_MPRED_CMPL, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + }, > + [C(NODE)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = -1, > + [C(RESULT_MISS)] = -1, > + }, > + }, > +}; > + > +static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = { > + [C(L1D)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = PM_LD_REF_L1, > + [C(RESULT_MISS)] = PM_LD_MISS_L1, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = 0, > + [C(RESULT_MISS)] = PM_ST_MISS_L1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS, > + [C(RESULT_MISS)] = 0, > + }, > + }, > + [C(L1I)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = PM_INST_FROM_L1, > + [C(RESULT_MISS)] = PM_L1_ICACHE_MISS, > + }, > + [C(OP_WRITE)] = { > + [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS, > + [C(RESULT_MISS)] = -1, > + }, > + [C(OP_PREFETCH)] = { > + [C(RESULT_ACCESS)] = PM_IC_PREF_REQ, > + [C(RESULT_MISS)] = 0, > + }, > + }, > + [C(LL)] = { > + [C(OP_READ)] = { > + [C(RESULT_ACCESS)] = PM_DATA_FROM_L3, > + [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, > + }, > + [C(OP_WRITE)] = { > [C(RESULT_ACCESS)] = -1, > [C(RESULT_MISS)] = -1, > }, > @@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) > int init_power10_pmu(void) > { > int rc; > + unsigned int pvr = mfspr(SPRN_PVR); > > /* Comes from cpu_specs[] */ > if (!cur_cpu_spec->oprofile_cpu_type || > @@ -416,6 +572,12 @@ int init_power10_pmu(void) > /* Set the PERF_REG_EXTENDED_MASK here */ > PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31; > > + if ((PVR_MAJ(pvr) == 1)) { > + power10_pmu.generic_events = power10_generic_events_dd1; > + power10_pmu.attr_groups = power10_pmu_attr_groups_dd1; > + power10_pmu.cache_events = &power10_cache_events_dd1; > + } > + > rc = register_power_pmu(&power10_pmu); > if (rc) > return rc; > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 2020-11-18 4:36 ` Michael Ellerman @ 2020-11-18 5:23 ` Athira Rajeev 0 siblings, 0 replies; 9+ messages in thread From: Athira Rajeev @ 2020-11-18 5:23 UTC (permalink / raw) To: Michael Ellerman; +Cc: Michael Neuling, Madhavan Srinivasan, linuxppc-dev > On 18-Nov-2020, at 10:06 AM, Michael Ellerman <mpe@ellerman.id.au> wrote: > > Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes: >> Fix the event code for events: branch-instructions (to PM_BR_FIN), >> branch-misses (to PM_BR_MPRED_FIN) and cache-misses (to >> PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the >> list of generic events with this modified event code. > > That should be one patch. Ok, > >> Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches >> (PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events. > > That should be another patch. Ok, > >> To maintain the current event code work with DD1, rename >> existing array of generic_events, cache_events and pmu_attr_groups >> with suffix _dd1. Update the power10 pmu init code to pick the >> dd1 list while registering the power PMU, based on the pvr >> (Processor Version Register) value. > > And that should be a third patch. > Ok, I will make these changes in the next version Thanks Athira > cheers > >> diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h >> index 60c1b81..9e0b3c9 100644 >> --- a/arch/powerpc/perf/power10-events-list.h >> +++ b/arch/powerpc/perf/power10-events-list.h >> @@ -15,6 +15,9 @@ >> EVENT(PM_RUN_INST_CMPL, 0x500fa); >> EVENT(PM_BR_CMPL, 0x4d05e); >> EVENT(PM_BR_MPRED_CMPL, 0x400f6); >> +EVENT(PM_BR_FIN, 0x2f04a); >> +EVENT(PM_BR_MPRED_FIN, 0x35884); >> +EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0); >> >> /* All L1 D cache load references counted at finish, gated by reject */ >> EVENT(PM_LD_REF_L1, 0x100fc); >> @@ -36,6 +39,12 @@ >> EVENT(PM_DATA_FROM_L3, 0x01340000001c040); >> /* Demand LD - L3 Miss (not L2 hit and not L3 hit) */ >> EVENT(PM_DATA_FROM_L3MISS, 0x300fe); >> +/* All successful D-side store dispatches for this thread */ >> +EVENT(PM_L2_ST, 0x010000046080); >> +/* All successful D-side store dispatches for this thread that were L2 Miss */ >> +EVENT(PM_L2_ST_MISS, 0x26880); >> +/* Total HW L3 prefetches(Load+store) */ >> +EVENT(PM_L3_PF_MISS_L3, 0x100000016080); >> /* Data PTEG reload */ >> EVENT(PM_DTLB_MISS, 0x300fc); >> /* ITLB Reloaded */ >> diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c >> index cf44fb7..86665ad 100644 >> --- a/arch/powerpc/perf/power10-pmu.c >> +++ b/arch/powerpc/perf/power10-pmu.c >> @@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) >> GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1); >> GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS); >> GENERIC_EVENT_ATTR(mem-stores, MEM_STORES); >> +GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN); >> +GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN); >> +GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN); >> >> CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1); >> CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1); >> @@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) >> CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ); >> CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS); >> CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3); >> +CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3); >> +CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS); >> +CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST); >> CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL); >> CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL); >> CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS); >> CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS); >> >> -static struct attribute *power10_events_attr[] = { >> +static struct attribute *power10_events_attr_dd1[] = { >> GENERIC_EVENT_PTR(PM_RUN_CYC), >> GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), >> GENERIC_EVENT_PTR(PM_BR_CMPL), >> @@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) >> NULL >> }; >> >> +static struct attribute *power10_events_attr[] = { >> + GENERIC_EVENT_PTR(PM_RUN_CYC), >> + GENERIC_EVENT_PTR(PM_RUN_INST_CMPL), >> + GENERIC_EVENT_PTR(PM_BR_FIN), >> + GENERIC_EVENT_PTR(PM_BR_MPRED_FIN), >> + GENERIC_EVENT_PTR(PM_LD_REF_L1), >> + GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN), >> + GENERIC_EVENT_PTR(MEM_LOADS), >> + GENERIC_EVENT_PTR(MEM_STORES), >> + CACHE_EVENT_PTR(PM_LD_MISS_L1), >> + CACHE_EVENT_PTR(PM_LD_REF_L1), >> + CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS), >> + CACHE_EVENT_PTR(PM_ST_MISS_L1), >> + CACHE_EVENT_PTR(PM_L1_ICACHE_MISS), >> + CACHE_EVENT_PTR(PM_INST_FROM_L1), >> + CACHE_EVENT_PTR(PM_IC_PREF_REQ), >> + CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS), >> + CACHE_EVENT_PTR(PM_DATA_FROM_L3), >> + CACHE_EVENT_PTR(PM_L3_PF_MISS_L3), >> + CACHE_EVENT_PTR(PM_L2_ST_MISS), >> + CACHE_EVENT_PTR(PM_L2_ST), >> + CACHE_EVENT_PTR(PM_BR_MPRED_CMPL), >> + CACHE_EVENT_PTR(PM_BR_CMPL), >> + CACHE_EVENT_PTR(PM_DTLB_MISS), >> + CACHE_EVENT_PTR(PM_ITLB_MISS), >> + NULL >> +}; >> + >> static struct attribute_group power10_pmu_events_group = { >> .name = "events", >> .attrs = power10_events_attr, >> }; >> >> +static struct attribute_group power10_pmu_events_group_dd1 = { >> + .name = "events", >> + .attrs = power10_events_attr_dd1, >> +}; >> + >> PMU_FORMAT_ATTR(event, "config:0-59"); >> PMU_FORMAT_ATTR(pmcxsel, "config:0-7"); >> PMU_FORMAT_ATTR(mark, "config:8"); >> @@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) >> NULL, >> }; >> >> -static int power10_generic_events[] = { >> +static const struct attribute_group *power10_pmu_attr_groups_dd1[] = { >> + &power10_pmu_format_group, >> + &power10_pmu_events_group_dd1, >> + NULL, >> +}; >> + >> +static int power10_generic_events_dd1[] = { >> [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, >> [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, >> [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL, >> @@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[]) >> [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1, >> }; >> >> +static int power10_generic_events[] = { >> + [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC, >> + [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL, >> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN, >> + [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN, >> + [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1, >> + [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN, >> +}; >> + >> static u64 power10_bhrb_filter_map(u64 branch_sample_type) >> { >> u64 pmu_bhrb_filter = 0; >> @@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) >> [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, >> }, >> [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = PM_L2_ST, >> + [C(RESULT_MISS)] = PM_L2_ST_MISS, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3, >> + [C(RESULT_MISS)] = 0, >> + }, >> + }, >> + [C(DTLB)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = 0, >> + [C(RESULT_MISS)] = PM_DTLB_MISS, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + }, >> + [C(ITLB)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = 0, >> + [C(RESULT_MISS)] = PM_ITLB_MISS, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + }, >> + [C(BPU)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = PM_BR_CMPL, >> + [C(RESULT_MISS)] = PM_BR_MPRED_CMPL, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + }, >> + [C(NODE)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = -1, >> + [C(RESULT_MISS)] = -1, >> + }, >> + }, >> +}; >> + >> +static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = { >> + [C(L1D)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = PM_LD_REF_L1, >> + [C(RESULT_MISS)] = PM_LD_MISS_L1, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = 0, >> + [C(RESULT_MISS)] = PM_ST_MISS_L1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS, >> + [C(RESULT_MISS)] = 0, >> + }, >> + }, >> + [C(L1I)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1, >> + [C(RESULT_MISS)] = PM_L1_ICACHE_MISS, >> + }, >> + [C(OP_WRITE)] = { >> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS, >> + [C(RESULT_MISS)] = -1, >> + }, >> + [C(OP_PREFETCH)] = { >> + [C(RESULT_ACCESS)] = PM_IC_PREF_REQ, >> + [C(RESULT_MISS)] = 0, >> + }, >> + }, >> + [C(LL)] = { >> + [C(OP_READ)] = { >> + [C(RESULT_ACCESS)] = PM_DATA_FROM_L3, >> + [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS, >> + }, >> + [C(OP_WRITE)] = { >> [C(RESULT_ACCESS)] = -1, >> [C(RESULT_MISS)] = -1, >> }, >> @@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) >> int init_power10_pmu(void) >> { >> int rc; >> + unsigned int pvr = mfspr(SPRN_PVR); >> >> /* Comes from cpu_specs[] */ >> if (!cur_cpu_spec->oprofile_cpu_type || >> @@ -416,6 +572,12 @@ int init_power10_pmu(void) >> /* Set the PERF_REG_EXTENDED_MASK here */ >> PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31; >> >> + if ((PVR_MAJ(pvr) == 1)) { >> + power10_pmu.generic_events = power10_generic_events_dd1; >> + power10_pmu.attr_groups = power10_pmu_attr_groups_dd1; >> + power10_pmu.cache_events = &power10_cache_events_dd1; >> + } >> + >> rc = register_power_pmu(&power10_pmu); >> if (rc) >> return rc; >> -- >> 1.8.3.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev ` (2 preceding siblings ...) 2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev @ 2020-11-11 4:33 ` Athira Rajeev 3 siblings, 0 replies; 9+ messages in thread From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw) To: mpe; +Cc: mikey, maddy, linuxppc-dev PowerISA v3.1 introduces new control bit (PMCCEXT) for enabling secure access to group B PMU registers in problem state when MMCR0 PMCC=0b00. This patch adds support for MMCR0 PMCCEXT bit in power10 by enabling this bit during boot and during the PMU event enable/disable operations when MMCR0 PMCC=0b00 Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> --- arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kernel/cpu_setup_power.S | 2 ++ arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + arch/powerpc/perf/core-book3s.c | 16 ++++++++++++++++ 4 files changed, 20 insertions(+) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index f877a57..cba9965 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -864,6 +864,7 @@ #define MMCR0_BHRBA 0x00200000UL /* BHRB Access allowed in userspace */ #define MMCR0_EBE 0x00100000UL /* Event based branch enable */ #define MMCR0_PMCC 0x000c0000UL /* PMC control */ +#define MMCR0_PMCCEXT ASM_CONST(0x00000200) /* PMCCEXT control */ #define MMCR0_PMCC_U6 0x00080000UL /* PMC1-6 are R/W by user (PR) */ #define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/ #define MMCR0_PMCjCE ASM_CONST(0x00004000) /* PMCj count enable*/ diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S index 704e8b9..8fc8b72 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -249,4 +249,6 @@ __init_PMU_ISA31: mtspr SPRN_MMCR3,r5 LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE) mtspr SPRN_MMCRA,r5 + LOAD_REG_IMMEDIATE(r5, MMCR0_PMCCEXT) + mtspr SPRN_MMCR0,r5 blr diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 1098863..9d07965 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -454,6 +454,7 @@ static void init_pmu_power10(void) mtspr(SPRN_MMCR3, 0); mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE); + mtspr(SPRN_MMCR0, MMCR0_PMCCEXT); } static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 08643cb..f328bc0 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -95,6 +95,7 @@ struct cpu_hw_events { #define SPRN_SIER3 0 #define MMCRA_SAMPLE_ENABLE 0 #define MMCRA_BHRB_DISABLE 0 +#define MMCR0_PMCCEXT 0 static inline unsigned long perf_ip_adjust(struct pt_regs *regs) { @@ -1242,6 +1243,9 @@ static void power_pmu_disable(struct pmu *pmu) val |= MMCR0_FC; val &= ~(MMCR0_EBE | MMCR0_BHRBA | MMCR0_PMCC | MMCR0_PMAO | MMCR0_FC56); + /* Set mmcr0 PMCCEXT for p10 */ + if (ppmu->flags & PPMU_ARCH_31) + val |= MMCR0_PMCCEXT; /* * The barrier is to make sure the mtspr has been @@ -1449,6 +1453,18 @@ static void power_pmu_enable(struct pmu *pmu) mmcr0 = ebb_switch_in(ebb, cpuhw); + /* + * Set mmcr0 (PMCCEXT) for p10 + * if mmcr0 PMCC=0b00 to allow secure + * mode of access to group B registers. + */ + if (ppmu->flags & PPMU_ARCH_31) { + if (!(mmcr0 & MMCR0_PMCC)) { + cpuhw->mmcr.mmcr0 |= MMCR0_PMCCEXT; + mmcr0 |= MMCR0_PMCCEXT; + } + } + mb(); if (cpuhw->bhrb_users) ppmu->config_bhrb(cpuhw->bhrb_filter); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-11-18 5:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev 2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev 2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev 2020-11-18 4:32 ` Michael Ellerman 2020-11-18 5:21 ` Athira Rajeev 2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev 2020-11-18 4:36 ` Michael Ellerman 2020-11-18 5:23 ` Athira Rajeev 2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).