* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-01 10:07 ` Kulkarni, Ganapatrao 0 siblings, 0 replies; 18+ messages in thread From: Kulkarni, Ganapatrao @ 2018-10-01 10:07 UTC (permalink / raw) To: linux-arm-kernel Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, since L1D_CACHE_REFILL counts both load and store misses. Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses and dTLB-loads are wrongly mapped. Hence Deleting all these cache events from armv8_pmuv3 cache mapping. Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> --- arch/arm64/kernel/perf_event.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 33147aacdafd..6a67ad22d1eb 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_RESULT_MAX] = { PERF_CACHE_MAP_ALL_UNSUPPORTED, - [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE, - [C(L1D)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL, - [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE, - [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL, - [C(L1I)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE, [C(L1I)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL, - [C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL, - [C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB, - [C(ITLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL, [C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB, -- 2.18.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-01 10:07 ` Kulkarni, Ganapatrao 0 siblings, 0 replies; 18+ messages in thread From: Kulkarni, Ganapatrao @ 2018-10-01 10:07 UTC (permalink / raw) To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Cc: Will.Deacon@arm.com, mark.rutland@arm.com, catalin.marinas@arm.com, peterz@infradead.org, mingo@redhat.com, acme@kernel.org, Nair, Jayachandran, Richter, Robert, Lomovtsev, Vadim, Jan Glauber, gklkml16@gmail.com Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, since L1D_CACHE_REFILL counts both load and store misses. Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses and dTLB-loads are wrongly mapped. Hence Deleting all these cache events from armv8_pmuv3 cache mapping. Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> --- arch/arm64/kernel/perf_event.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 33147aacdafd..6a67ad22d1eb 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_RESULT_MAX] = { PERF_CACHE_MAP_ALL_UNSUPPORTED, - [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE, - [C(L1D)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL, - [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE, - [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL, - [C(L1I)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE, [C(L1I)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL, - [C(DTLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL, - [C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB, - [C(ITLB)][C(OP_READ)][C(RESULT_MISS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL, [C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB, -- 2.18.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-01 10:07 ` Kulkarni, Ganapatrao @ 2018-10-01 14:29 ` Will Deacon -1 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-01 14:29 UTC (permalink / raw) To: linux-arm-kernel Hi Ganapat, On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > since L1D_CACHE_REFILL counts both load and store misses. > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > from armv8_pmuv3 cache mapping. > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > --- > arch/arm64/kernel/perf_event.c | 8 -------- > 1 file changed, 8 deletions(-) The "generic" events are really implemented on a best-effort basis, as they rarely tend to map exactly to what the hardware supports. I think they originally stemmed from the x86 CPU PMU, but that doesn't really help us. I had a discussion with Ingo back when we originally implemented perf because I actually preferred not to implement the generic events at all. However, he was strongly of the opinion that a best-effort approach was sufficient to get casual users going with the tool, so that's what we went with. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-01 14:29 ` Will Deacon 0 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-01 14:29 UTC (permalink / raw) To: Kulkarni, Ganapatrao Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, catalin.marinas@arm.com, peterz@infradead.org, mingo@redhat.com, acme@kernel.org, Nair, Jayachandran, Richter, Robert, Lomovtsev, Vadim, Jan Glauber, gklkml16@gmail.com Hi Ganapat, On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > since L1D_CACHE_REFILL counts both load and store misses. > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > from armv8_pmuv3 cache mapping. > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > --- > arch/arm64/kernel/perf_event.c | 8 -------- > 1 file changed, 8 deletions(-) The "generic" events are really implemented on a best-effort basis, as they rarely tend to map exactly to what the hardware supports. I think they originally stemmed from the x86 CPU PMU, but that doesn't really help us. I had a discussion with Ingo back when we originally implemented perf because I actually preferred not to implement the generic events at all. However, he was strongly of the opinion that a best-effort approach was sufficient to get casual users going with the tool, so that's what we went with. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-01 14:29 ` Will Deacon @ 2018-10-01 16:39 ` Ganapatrao Kulkarni -1 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-01 16:39 UTC (permalink / raw) To: linux-arm-kernel Hi Will, On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote: > > Hi Ganapat, > > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > > since L1D_CACHE_REFILL counts both load and store misses. > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > > from armv8_pmuv3 cache mapping. > > > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > > --- > > arch/arm64/kernel/perf_event.c | 8 -------- > > 1 file changed, 8 deletions(-) > > The "generic" events are really implemented on a best-effort basis, as > they rarely tend to map exactly to what the hardware supports. I think > they originally stemmed from the x86 CPU PMU, but that doesn't really > help us. This works fairly well for DT based boots, since almost all SoCs have added remapping using custom dt object binding. However we have concluded in the past to drop SoC specific from the ACPI mapping and use json to add SoC/micro architecture specific events support. At present , When we boot with ACPI, it is misleading for these events. In fact, this has been pointed internally from benchmark team and reported it as hardware bug! IMHO, it would be much simpler to delete these misleading events mapping rather explaining to perf tool users. We already have proper mapping for these events, armv8_pmuv3_0/l1d_cache_refill/ armv8_pmuv3_0/l1d_cache/ [core imp def:] l1d_cache_rd l1d_cache_wr l1d_cache_refill_rd l1d_cache_refill_wr > > I had a discussion with Ingo back when we originally implemented perf > because I actually preferred not to implement the generic events at all. > However, he was strongly of the opinion that a best-effort approach was > sufficient to get casual users going with the tool, so that's what we went > with. thanks for the background info, these generic mapping fairly works except these events. > > Will thanks, Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-01 16:39 ` Ganapatrao Kulkarni 0 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-01 16:39 UTC (permalink / raw) To: Will Deacon Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland, catalin.marinas, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter, Vadim.Lomovtsev, Jan.Glauber Hi Will, On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote: > > Hi Ganapat, > > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > > since L1D_CACHE_REFILL counts both load and store misses. > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > > from armv8_pmuv3 cache mapping. > > > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > > --- > > arch/arm64/kernel/perf_event.c | 8 -------- > > 1 file changed, 8 deletions(-) > > The "generic" events are really implemented on a best-effort basis, as > they rarely tend to map exactly to what the hardware supports. I think > they originally stemmed from the x86 CPU PMU, but that doesn't really > help us. This works fairly well for DT based boots, since almost all SoCs have added remapping using custom dt object binding. However we have concluded in the past to drop SoC specific from the ACPI mapping and use json to add SoC/micro architecture specific events support. At present , When we boot with ACPI, it is misleading for these events. In fact, this has been pointed internally from benchmark team and reported it as hardware bug! IMHO, it would be much simpler to delete these misleading events mapping rather explaining to perf tool users. We already have proper mapping for these events, armv8_pmuv3_0/l1d_cache_refill/ armv8_pmuv3_0/l1d_cache/ [core imp def:] l1d_cache_rd l1d_cache_wr l1d_cache_refill_rd l1d_cache_refill_wr > > I had a discussion with Ingo back when we originally implemented perf > because I actually preferred not to implement the generic events at all. > However, he was strongly of the opinion that a best-effort approach was > sufficient to get casual users going with the tool, so that's what we went > with. thanks for the background info, these generic mapping fairly works except these events. > > Will thanks, Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-01 16:39 ` Ganapatrao Kulkarni @ 2018-10-04 5:42 ` Ganapatrao Kulkarni -1 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-04 5:42 UTC (permalink / raw) To: linux-arm-kernel Hi Will, can you please pull this patch? On Mon, Oct 1, 2018 at 10:09 PM Ganapatrao Kulkarni <gklkml16@gmail.com> wrote: > > Hi Will, > > On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote: > > > > Hi Ganapat, > > > > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > > > since L1D_CACHE_REFILL counts both load and store misses. > > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > > > from armv8_pmuv3 cache mapping. > > > > > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > > > --- > > > arch/arm64/kernel/perf_event.c | 8 -------- > > > 1 file changed, 8 deletions(-) > > > > The "generic" events are really implemented on a best-effort basis, as > > they rarely tend to map exactly to what the hardware supports. I think > > they originally stemmed from the x86 CPU PMU, but that doesn't really > > help us. > > This works fairly well for DT based boots, since almost all SoCs have > added remapping using custom dt object binding. > However we have concluded in the past to drop SoC specific from the > ACPI mapping and use json to add SoC/micro architecture specific > events support. > At present , When we boot with ACPI, it is misleading for these events. > > In fact, this has been pointed internally from benchmark team and > reported it as hardware bug! > IMHO, it would be much simpler to delete these misleading events > mapping rather explaining to perf tool users. > > We already have proper mapping for these events, > armv8_pmuv3_0/l1d_cache_refill/ > armv8_pmuv3_0/l1d_cache/ > [core imp def:] > l1d_cache_rd > l1d_cache_wr > l1d_cache_refill_rd > l1d_cache_refill_wr > > > > > I had a discussion with Ingo back when we originally implemented perf > > because I actually preferred not to implement the generic events at all. > > However, he was strongly of the opinion that a best-effort approach was > > sufficient to get casual users going with the tool, so that's what we went > > with. > > thanks for the background info, these generic mapping fairly works > except these events. > > > > > Will > > thanks, > Ganapat thanks, Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-04 5:42 ` Ganapatrao Kulkarni 0 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-04 5:42 UTC (permalink / raw) To: Will Deacon Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland, catalin.marinas, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter, Vadim.Lomovtsev, Jan.Glauber Hi Will, can you please pull this patch? On Mon, Oct 1, 2018 at 10:09 PM Ganapatrao Kulkarni <gklkml16@gmail.com> wrote: > > Hi Will, > > On Mon, Oct 1, 2018 at 7:58 PM Will Deacon <will.deacon@arm.com> wrote: > > > > Hi Ganapat, > > > > On Mon, Oct 01, 2018 at 10:07:43AM +0000, Kulkarni, Ganapatrao wrote: > > > Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to > > > armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect, > > > since L1D_CACHE_REFILL counts both load and store misses. > > > Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses > > > and dTLB-loads are wrongly mapped. Hence Deleting all these cache events > > > from armv8_pmuv3 cache mapping. > > > > > > Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> > > > --- > > > arch/arm64/kernel/perf_event.c | 8 -------- > > > 1 file changed, 8 deletions(-) > > > > The "generic" events are really implemented on a best-effort basis, as > > they rarely tend to map exactly to what the hardware supports. I think > > they originally stemmed from the x86 CPU PMU, but that doesn't really > > help us. > > This works fairly well for DT based boots, since almost all SoCs have > added remapping using custom dt object binding. > However we have concluded in the past to drop SoC specific from the > ACPI mapping and use json to add SoC/micro architecture specific > events support. > At present , When we boot with ACPI, it is misleading for these events. > > In fact, this has been pointed internally from benchmark team and > reported it as hardware bug! > IMHO, it would be much simpler to delete these misleading events > mapping rather explaining to perf tool users. > > We already have proper mapping for these events, > armv8_pmuv3_0/l1d_cache_refill/ > armv8_pmuv3_0/l1d_cache/ > [core imp def:] > l1d_cache_rd > l1d_cache_wr > l1d_cache_refill_rd > l1d_cache_refill_wr > > > > > I had a discussion with Ingo back when we originally implemented perf > > because I actually preferred not to implement the generic events at all. > > However, he was strongly of the opinion that a best-effort approach was > > sufficient to get casual users going with the tool, so that's what we went > > with. > > thanks for the background info, these generic mapping fairly works > except these events. > > > > > Will > > thanks, > Ganapat thanks, Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-04 5:42 ` Ganapatrao Kulkarni @ 2018-10-04 12:22 ` Will Deacon -1 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-04 12:22 UTC (permalink / raw) To: linux-arm-kernel Hi Ganapat, On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > can you please pull this patch? I still don't like the idea of just removing events like this, especially when other architectures (including some x86 and Power CPUs afaict) playa similar games for generic events, and these events do actually appear in user code. I also don't understand why you remove the TLB events. I think that logic would imply we should remove all of the events, because we can't distinguish prefetches from reads either. If we want to be consistent, then I think we should just remove the OP_WRITE events for L1D and BPU -- would you be ok with that instead? Also, looking at the code, I think our PMCEID parsing is broken for 8.1 parts, where the upper 32 bits of the register are offset by 0x4000 in the event numbering space. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-04 12:22 ` Will Deacon 0 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-04 12:22 UTC (permalink / raw) To: Ganapatrao Kulkarni Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland, catalin.marinas, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter, Vadim.Lomovtsev, Jan.Glauber Hi Ganapat, On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > can you please pull this patch? I still don't like the idea of just removing events like this, especially when other architectures (including some x86 and Power CPUs afaict) playa similar games for generic events, and these events do actually appear in user code. I also don't understand why you remove the TLB events. I think that logic would imply we should remove all of the events, because we can't distinguish prefetches from reads either. If we want to be consistent, then I think we should just remove the OP_WRITE events for L1D and BPU -- would you be ok with that instead? Also, looking at the code, I think our PMCEID parsing is broken for 8.1 parts, where the upper 32 bits of the register are offset by 0x4000 in the event numbering space. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-04 12:22 ` Will Deacon @ 2018-10-04 19:39 ` Ganapatrao Kulkarni -1 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-04 19:39 UTC (permalink / raw) To: linux-arm-kernel Hi Will, On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote: > > Hi Ganapat, > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > > can you please pull this patch? > > I still don't like the idea of just removing events like this, especially > when other architectures (including some x86 and Power CPUs afaict) playa > similar games for generic events, and these events do actually appear in > user code. > > I also don't understand why you remove the TLB events. I think that logic > would imply we should remove all of the events, because we can't distinguish > prefetches from reads either. If we want to be consistent, then I think > we should just remove the OP_WRITE events for L1D and BPU -- would you be > ok with that instead? IIUC, dTLB-load-misses is mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec, counts TLB access/misses for both memory-read operation and memory-write operation. IMO, It won't help in keeping these events, knowingly that their mapping is not accurate, only thing i can say to users is , dont use events that are marked as "Hardware cache event" > > Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > parts, where the upper 32 bits of the register are offset by 0x4000 in the > event numbering space. yes, i did not find any mapping in PMCEIDx registers for implementation defined events, otherwise we would have remapped at runtime. > > Will thanks Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-04 19:39 ` Ganapatrao Kulkarni 0 siblings, 0 replies; 18+ messages in thread From: Ganapatrao Kulkarni @ 2018-10-04 19:39 UTC (permalink / raw) To: Will Deacon Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland, catalin.marinas, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter, Vadim.Lomovtsev, Jan.Glauber Hi Will, On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote: > > Hi Ganapat, > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > > can you please pull this patch? > > I still don't like the idea of just removing events like this, especially > when other architectures (including some x86 and Power CPUs afaict) playa > similar games for generic events, and these events do actually appear in > user code. > > I also don't understand why you remove the TLB events. I think that logic > would imply we should remove all of the events, because we can't distinguish > prefetches from reads either. If we want to be consistent, then I think > we should just remove the OP_WRITE events for L1D and BPU -- would you be > ok with that instead? IIUC, dTLB-load-misses is mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec, counts TLB access/misses for both memory-read operation and memory-write operation. IMO, It won't help in keeping these events, knowingly that their mapping is not accurate, only thing i can say to users is , dont use events that are marked as "Hardware cache event" > > Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > parts, where the upper 32 bits of the register are offset by 0x4000 in the > event numbering space. yes, i did not find any mapping in PMCEIDx registers for implementation defined events, otherwise we would have remapped at runtime. > > Will thanks Ganapat ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-04 19:39 ` Ganapatrao Kulkarni @ 2018-10-05 13:34 ` Will Deacon -1 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-05 13:34 UTC (permalink / raw) To: linux-arm-kernel On Fri, Oct 05, 2018 at 01:09:33AM +0530, Ganapatrao Kulkarni wrote: > On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote: > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > > > can you please pull this patch? > > > > I still don't like the idea of just removing events like this, especially > > when other architectures (including some x86 and Power CPUs afaict) playa > > similar games for generic events, and these events do actually appear in > > user code. > > > > I also don't understand why you remove the TLB events. I think that logic > > would imply we should remove all of the events, because we can't distinguish > > prefetches from reads either. If we want to be consistent, then I think > > we should just remove the OP_WRITE events for L1D and BPU -- would you be > > ok with that instead? > > IIUC, dTLB-load-misses is mapped to > ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is > mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec, > counts TLB access/misses for both memory-read operation and > memory-write operation. > > IMO, It won't help in keeping these events, knowingly that their > mapping is not accurate, only thing i can say to users is , dont use > events that are marked as "Hardware cache event" Right, but my point is that all of the events are inaccurate by this line of reasoning, because we don't support PREFETCH for any of them. So I'd rather just drop the duplicate events from the WRITE entries, like other CPUs and architectures do. I'm about to pack my desk up because we're moving office, but I pushed out some patches I hacked up on my perf/updates branch: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=perf/updates I'll post them to the list next week if it's not the merge window. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-05 13:34 ` Will Deacon 0 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-05 13:34 UTC (permalink / raw) To: Ganapatrao Kulkarni Cc: Ganapatrao Kulkarni, LKML, linux-arm-kernel, Mark Rutland, catalin.marinas, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Nair, Jayachandran, Robert Richter, Vadim.Lomovtsev, Jan.Glauber On Fri, Oct 05, 2018 at 01:09:33AM +0530, Ganapatrao Kulkarni wrote: > On Thu, Oct 4, 2018 at 5:51 PM Will Deacon <will.deacon@arm.com> wrote: > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > > > can you please pull this patch? > > > > I still don't like the idea of just removing events like this, especially > > when other architectures (including some x86 and Power CPUs afaict) playa > > similar games for generic events, and these events do actually appear in > > user code. > > > > I also don't understand why you remove the TLB events. I think that logic > > would imply we should remove all of the events, because we can't distinguish > > prefetches from reads either. If we want to be consistent, then I think > > we should just remove the OP_WRITE events for L1D and BPU -- would you be > > ok with that instead? > > IIUC, dTLB-load-misses is mapped to > ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL(event 0x05) and dTLB-loads is > mapped to ARMV8_PMUV3_PERFCTR_L1D_TLB(0x25). Which are as per spec, > counts TLB access/misses for both memory-read operation and > memory-write operation. > > IMO, It won't help in keeping these events, knowingly that their > mapping is not accurate, only thing i can say to users is , dont use > events that are marked as "Hardware cache event" Right, but my point is that all of the events are inaccurate by this line of reasoning, because we don't support PREFETCH for any of them. So I'd rather just drop the duplicate events from the WRITE entries, like other CPUs and architectures do. I'm about to pack my desk up because we're moving office, but I pushed out some patches I hacked up on my perf/updates branch: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=perf/updates I'll post them to the list next week if it's not the merge window. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-04 12:22 ` Will Deacon @ 2018-10-05 12:27 ` John Garry -1 siblings, 0 replies; 18+ messages in thread From: John Garry @ 2018-10-05 12:27 UTC (permalink / raw) To: linux-arm-kernel On 04/10/2018 13:22, Will Deacon wrote: > Hi Ganapat, > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: >> can you please pull this patch? > > I still don't like the idea of just removing events like this, especially > when other architectures (including some x86 and Power CPUs afaict) playa > similar games for generic events, and these events do actually appear in > user code. > > I also don't understand why you remove the TLB events. I think that logic > would imply we should remove all of the events, because we can't distinguish > prefetches from reads either. If we want to be consistent, then I think > we should just remove the OP_WRITE events for L1D and BPU -- would you be > ok with that instead? > > Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > parts, where the upper 32 bits of the register are offset by 0x4000 in the > event numbering space. > Here's something I noticed: static ssize_t armv8pmu_events_sysfs_show(struct device *dev, struct device_attribute *attr, char *page) { struct perf_pmu_events_attr *pmu_attr; pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); return sprintf(page, "event=0x%03llx\n", pmu_attr->id); Should this be min width now be 4, since event width is now 16 bits (even though I don't know why we need to specify this width at all)? Cheers, John > Will > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > . > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-05 12:27 ` John Garry 0 siblings, 0 replies; 18+ messages in thread From: John Garry @ 2018-10-05 12:27 UTC (permalink / raw) To: Will Deacon, Ganapatrao Kulkarni Cc: Mark Rutland, Nair, Jayachandran, Jan.Glauber, Peter Zijlstra, catalin.marinas, LKML, Arnaldo Carvalho de Melo, Robert Richter, Ingo Molnar, Vadim.Lomovtsev, Ganapatrao Kulkarni, linux-arm-kernel On 04/10/2018 13:22, Will Deacon wrote: > Hi Ganapat, > > On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: >> can you please pull this patch? > > I still don't like the idea of just removing events like this, especially > when other architectures (including some x86 and Power CPUs afaict) playa > similar games for generic events, and these events do actually appear in > user code. > > I also don't understand why you remove the TLB events. I think that logic > would imply we should remove all of the events, because we can't distinguish > prefetches from reads either. If we want to be consistent, then I think > we should just remove the OP_WRITE events for L1D and BPU -- would you be > ok with that instead? > > Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > parts, where the upper 32 bits of the register are offset by 0x4000 in the > event numbering space. > Here's something I noticed: static ssize_t armv8pmu_events_sysfs_show(struct device *dev, struct device_attribute *attr, char *page) { struct perf_pmu_events_attr *pmu_attr; pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); return sprintf(page, "event=0x%03llx\n", pmu_attr->id); Should this be min width now be 4, since event width is now 16 bits (even though I don't know why we need to specify this width at all)? Cheers, John > Will > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > . > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. 2018-10-05 12:27 ` John Garry @ 2018-10-05 12:39 ` Will Deacon -1 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-05 12:39 UTC (permalink / raw) To: linux-arm-kernel On Fri, Oct 05, 2018 at 01:27:08PM +0100, John Garry wrote: > On 04/10/2018 13:22, Will Deacon wrote: > >Hi Ganapat, > > > >On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > >>can you please pull this patch? > > > >I still don't like the idea of just removing events like this, especially > >when other architectures (including some x86 and Power CPUs afaict) playa > >similar games for generic events, and these events do actually appear in > >user code. > > > >I also don't understand why you remove the TLB events. I think that logic > >would imply we should remove all of the events, because we can't distinguish > >prefetches from reads either. If we want to be consistent, then I think > >we should just remove the OP_WRITE events for L1D and BPU -- would you be > >ok with that instead? > > > >Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > >parts, where the upper 32 bits of the register are offset by 0x4000 in the > >event numbering space. > > > > Here's something I noticed: > static ssize_t > armv8pmu_events_sysfs_show(struct device *dev, > struct device_attribute *attr, char *page) > { > struct perf_pmu_events_attr *pmu_attr; > > pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); > > return sprintf(page, "event=0x%03llx\n", pmu_attr->id); > > Should this be min width now be 4, since event width is now 16 bits (even > though I don't know why we need to specify this width at all)? Yeah, that is pretty weird, but the 16-bit events won't be truncated, so I think I'd rather just leave that string as-is since it's ABI. I agree that we probably shouldn't have bothered with the width at all in hindsight. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events. @ 2018-10-05 12:39 ` Will Deacon 0 siblings, 0 replies; 18+ messages in thread From: Will Deacon @ 2018-10-05 12:39 UTC (permalink / raw) To: John Garry Cc: Ganapatrao Kulkarni, Mark Rutland, Nair, Jayachandran, Jan.Glauber, Peter Zijlstra, catalin.marinas, LKML, Arnaldo Carvalho de Melo, Robert Richter, Ingo Molnar, Vadim.Lomovtsev, Ganapatrao Kulkarni, linux-arm-kernel On Fri, Oct 05, 2018 at 01:27:08PM +0100, John Garry wrote: > On 04/10/2018 13:22, Will Deacon wrote: > >Hi Ganapat, > > > >On Thu, Oct 04, 2018 at 11:12:09AM +0530, Ganapatrao Kulkarni wrote: > >>can you please pull this patch? > > > >I still don't like the idea of just removing events like this, especially > >when other architectures (including some x86 and Power CPUs afaict) playa > >similar games for generic events, and these events do actually appear in > >user code. > > > >I also don't understand why you remove the TLB events. I think that logic > >would imply we should remove all of the events, because we can't distinguish > >prefetches from reads either. If we want to be consistent, then I think > >we should just remove the OP_WRITE events for L1D and BPU -- would you be > >ok with that instead? > > > >Also, looking at the code, I think our PMCEID parsing is broken for 8.1 > >parts, where the upper 32 bits of the register are offset by 0x4000 in the > >event numbering space. > > > > Here's something I noticed: > static ssize_t > armv8pmu_events_sysfs_show(struct device *dev, > struct device_attribute *attr, char *page) > { > struct perf_pmu_events_attr *pmu_attr; > > pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); > > return sprintf(page, "event=0x%03llx\n", pmu_attr->id); > > Should this be min width now be 4, since event width is now 16 bits (even > though I don't know why we need to specify this width at all)? Yeah, that is pretty weird, but the 16-bit events won't be truncated, so I think I'd rather just leave that string as-is since it's ABI. I agree that we probably shouldn't have bothered with the width at all in hindsight. Will ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2018-10-05 13:34 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-10-01 10:07 [PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events Kulkarni, Ganapatrao 2018-10-01 10:07 ` Kulkarni, Ganapatrao 2018-10-01 14:29 ` Will Deacon 2018-10-01 14:29 ` Will Deacon 2018-10-01 16:39 ` Ganapatrao Kulkarni 2018-10-01 16:39 ` Ganapatrao Kulkarni 2018-10-04 5:42 ` Ganapatrao Kulkarni 2018-10-04 5:42 ` Ganapatrao Kulkarni 2018-10-04 12:22 ` Will Deacon 2018-10-04 12:22 ` Will Deacon 2018-10-04 19:39 ` Ganapatrao Kulkarni 2018-10-04 19:39 ` Ganapatrao Kulkarni 2018-10-05 13:34 ` Will Deacon 2018-10-05 13:34 ` Will Deacon 2018-10-05 12:27 ` John Garry 2018-10-05 12:27 ` John Garry 2018-10-05 12:39 ` Will Deacon 2018-10-05 12:39 ` Will Deacon
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.