* [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Ravi Bangoria
` (8 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
include/uapi/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 37675437b768..39c6a250dd1b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1339,7 +1339,8 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0x8 available */
+/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_UNC 0x08 /* Uncached */
#define PERF_MEM_LVLNUM_CXL 0x09 /* CXL */
#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src Ravi Bangoria
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_NA is used to set default values.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
include/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..064c5ad7ff11 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1183,7 +1183,8 @@ struct perf_sample_data {
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
- PERF_MEM_S(TLB, NA))
+ PERF_MEM_S(TLB, NA) |\
+ PERF_MEM_S(LVLNUM, NA))
static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h Ravi Bangoria
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
From: Namhyung Kim <namhyung@kernel.org>
Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
limitations. Mainly:
1. mem_lvl_num doesn't allow setting multiple sources whereas old API
allows it. Setting multiple data sources is useful because IBS on
pre-zen4 uarch doesn't provide fine granular DataSrc details (there
is only one such DataSrc(2h) though).
2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
1st one can be handled by using ANY_CACHE with HOPS_0. 2nd is purely
perf tool specific issue and should be fixed separately.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
arch/x86/events/amd/ibs.c | 156 +++++++++++++++++---------------------
1 file changed, 68 insertions(+), 88 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 64582954b5f6..b6b833f5da50 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -703,38 +703,63 @@ static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
return op_data2->data_src_lo;
}
-static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
- union ibs_op_data3 *op_data3,
- struct perf_sample_data *data)
+#define L(x) (PERF_MEM_S(LVL, x) | PERF_MEM_S(LVL, HIT))
+#define LN(x) PERF_MEM_S(LVLNUM, x)
+#define REM PERF_MEM_S(REMOTE, REMOTE)
+#define HOPS(x) PERF_MEM_S(HOPS, x)
+
+static u64 g_data_src[8] = {
+ [IBS_DATA_SRC_LOC_CACHE] = L(L3) | L(REM_CCE1) | LN(ANY_CACHE) | HOPS(0),
+ [IBS_DATA_SRC_DRAM] = L(LOC_RAM) | LN(RAM),
+ [IBS_DATA_SRC_REM_CACHE] = L(REM_CCE2) | LN(ANY_CACHE) | REM | HOPS(1),
+ [IBS_DATA_SRC_IO] = L(IO) | LN(IO),
+};
+
+#define RMT_NODE_BITS (1 << IBS_DATA_SRC_DRAM)
+#define RMT_NODE_APPLICABLE(x) (RMT_NODE_BITS & (1 << x))
+
+static u64 g_zen4_data_src[32] = {
+ [IBS_DATA_SRC_EXT_LOC_CACHE] = L(L3) | LN(L3),
+ [IBS_DATA_SRC_EXT_NEAR_CCX_CACHE] = L(REM_CCE1) | LN(ANY_CACHE) | REM | HOPS(0),
+ [IBS_DATA_SRC_EXT_DRAM] = L(LOC_RAM) | LN(RAM),
+ [IBS_DATA_SRC_EXT_FAR_CCX_CACHE] = L(REM_CCE2) | LN(ANY_CACHE) | REM | HOPS(1),
+ [IBS_DATA_SRC_EXT_PMEM] = LN(PMEM),
+ [IBS_DATA_SRC_EXT_IO] = L(IO) | LN(IO),
+ [IBS_DATA_SRC_EXT_EXT_MEM] = LN(CXL),
+};
+
+#define ZEN4_RMT_NODE_BITS ((1 << IBS_DATA_SRC_EXT_DRAM) | \
+ (1 << IBS_DATA_SRC_EXT_PMEM) | \
+ (1 << IBS_DATA_SRC_EXT_EXT_MEM))
+#define ZEN4_RMT_NODE_APPLICABLE(x) (ZEN4_RMT_NODE_BITS & (1 << x))
+
+static __u64 perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
+ union ibs_op_data3 *op_data3,
+ struct perf_sample_data *data)
{
union perf_mem_data_src *data_src = &data->data_src;
u8 ibs_data_src = perf_ibs_data_src(op_data2);
data_src->mem_lvl = 0;
+ data_src->mem_lvl_num = 0;
/*
* DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
* memory accesses. So, check DcUcMemAcc bit early.
*/
- if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
- data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
- return;
- }
+ if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO)
+ return L(UNC) | LN(UNC);
/* L1 Hit */
- if (op_data3->dc_miss == 0) {
- data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
- return;
- }
+ if (op_data3->dc_miss == 0)
+ return L(L1) | LN(L1);
/* L2 Hit */
if (op_data3->l2_miss == 0) {
/* Erratum #1293 */
if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
- !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
- data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
- return;
- }
+ !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc))
+ return L(L2) | LN(L2);
}
/*
@@ -744,82 +769,36 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
if (data_src->mem_op != PERF_MEM_OP_LOAD)
goto check_mab;
- /* L3 Hit */
if (ibs_caps & IBS_CAPS_ZEN4) {
- if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
- data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
- return;
- }
- } else {
- if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
- data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
- PERF_MEM_LVL_HIT;
- return;
- }
- }
+ u64 val = g_zen4_data_src[ibs_data_src];
- /* A peer cache in a near CCX */
- if (ibs_caps & IBS_CAPS_ZEN4 &&
- ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
- data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
- return;
- }
+ if (!val)
+ goto check_mab;
- /* A peer cache in a far CCX */
- if (ibs_caps & IBS_CAPS_ZEN4) {
- if (ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
- data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
- return;
- }
- } else {
- if (ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
- data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
- return;
+ /* HOPS_1 because IBS doesn't provide remote socket detail */
+ if (op_data2->rmt_node && ZEN4_RMT_NODE_APPLICABLE(ibs_data_src)) {
+ if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM)
+ val = L(REM_RAM1) | LN(RAM) | REM | HOPS(1);
+ else
+ val |= REM | HOPS(1);
}
- }
- /* DRAM */
- if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
- if (op_data2->rmt_node == 0)
- data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
- else
- data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
- return;
- }
+ return val;
+ } else {
+ u64 val = g_data_src[ibs_data_src];
- /* PMEM */
- if (ibs_caps & IBS_CAPS_ZEN4 && ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
- data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
- if (op_data2->rmt_node) {
- data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
- /* IBS doesn't provide Remote socket detail */
- data_src->mem_hops = PERF_MEM_HOPS_1;
- }
- return;
- }
+ if (!val)
+ goto check_mab;
- /* Extension Memory */
- if (ibs_caps & IBS_CAPS_ZEN4 &&
- ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
- data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
- if (op_data2->rmt_node) {
- data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
- /* IBS doesn't provide Remote socket detail */
- data_src->mem_hops = PERF_MEM_HOPS_1;
+ /* HOPS_1 because IBS doesn't provide remote socket detail */
+ if (op_data2->rmt_node && RMT_NODE_APPLICABLE(ibs_data_src)) {
+ if (ibs_data_src == IBS_DATA_SRC_DRAM)
+ val = L(REM_RAM1) | LN(RAM) | REM | HOPS(1);
+ else
+ val |= REM | HOPS(1);
}
- return;
- }
- /* IO */
- if (ibs_data_src == IBS_DATA_SRC_EXT_IO) {
- data_src->mem_lvl = PERF_MEM_LVL_IO;
- data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
- if (op_data2->rmt_node) {
- data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
- /* IBS doesn't provide Remote socket detail */
- data_src->mem_hops = PERF_MEM_HOPS_1;
- }
- return;
+ return val;
}
check_mab:
@@ -830,12 +809,11 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
* DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
* MAB only when IBS fails to provide DataSrc.
*/
- if (op_data3->dc_miss_no_mab_alloc) {
- data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
- return;
- }
+ if (op_data3->dc_miss_no_mab_alloc)
+ return L(LFB) | LN(LFB);
- data_src->mem_lvl = PERF_MEM_LVL_NA;
+ /* Don't set HIT with NA */
+ return PERF_MEM_S(LVL, NA) | LN(NA);
}
static bool perf_ibs_cache_hit_st_valid(void)
@@ -925,7 +903,9 @@ static void perf_ibs_get_data_src(struct perf_ibs_data *ibs_data,
union ibs_op_data2 *op_data2,
union ibs_op_data3 *op_data3)
{
- perf_ibs_get_mem_lvl(op_data2, op_data3, data);
+ union perf_mem_data_src *data_src = &data->data_src;
+
+ data_src->val |= perf_ibs_get_mem_lvl(op_data2, op_data3, data);
perf_ibs_get_mem_snoop(op_data2, data);
perf_ibs_get_tlb_lvl(op_data3, data);
perf_ibs_get_mem_lock(op_data3, data);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (2 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE Ravi Bangoria
` (5 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
... to bring PERF_MEM_LVLNUM_UNC definition to userspace
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/include/uapi/linux/perf_event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 37675437b768..39c6a250dd1b 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1339,7 +1339,8 @@ union perf_mem_data_src {
#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */
#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */
#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */
-/* 5-0x8 available */
+/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_UNC 0x08 /* Uncached */
#define PERF_MEM_LVLNUM_CXL 0x09 /* CXL */
#define PERF_MEM_LVLNUM_IO 0x0a /* I/O */
#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (3 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC Ravi Bangoria
` (4 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_DATA_SRC_NONE is used to set
default values.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/event.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6663a676eadc..de20e01c9d72 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -89,7 +89,8 @@ enum {
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
- PERF_MEM_S(TLB, NA))
+ PERF_MEM_S(TLB, NA) |\
+ PERF_MEM_S(LVLNUM, NA))
/* Attribute type for custom synthesized events */
#define PERF_TYPE_SYNTH (INT_MAX + 1U)
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (4 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf() Ravi Bangoria
` (3 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Add support for printing PERF_MEM_LVLNUM_UNC in perf mem report.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/mem-events.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index b3a91093069a..3a7c72be8326 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -295,6 +295,7 @@ static const char * const mem_lvl[] = {
};
static const char * const mem_lvlnum[] = {
+ [PERF_MEM_LVLNUM_UNC] = "Uncached",
[PERF_MEM_LVLNUM_CXL] = "CXL",
[PERF_MEM_LVLNUM_IO] = "I/O",
[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf()
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (5 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars Ravi Bangoria
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() is
non-intuitive. For ex, it ignores mem_lvl when mem_hops is set but
considers it otherwise. It prints both mem_lvl_num and mem_lvl when
mem_hops is not set.
Refactor this function such that it behaves more intuitively: Use
new API mem_lvl_num|mem_remote|mem_hops if mem_lvl_num contains
value other than PERF_MEM_LVLNUM_NA. Otherwise, fallback to old API
mem_lvl. Since new API has no way to indicate MISS, use it from old
api, otherwise don't club old and new APIs while parsing as well as
printing.
Before:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ ........................
#
250097 N/A
188907 L1 hit
4116 L2 hit
3496 Remote Cache (1 hop) hit
3271 Remote Cache (2 hops) hit
873 L3 hit
598 Local RAM hit
438 Remote RAM (1 hop) hit
1 Uncached hit
After:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ .......................................
#
255517 N/A
189989 L1 hit
4541 L2 hit
3363 Remote core, same node Any cache hit
3336 Remote node, same socket Any cache hit
1275 L3 hit
743 RAM hit
545 Remote node, same socket RAM hit
4 Uncached hit
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/mem-events.c | 89 +++++++++++++++++++-----------------
1 file changed, 47 insertions(+), 42 deletions(-)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 3a7c72be8326..ed1ee4b05356 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -344,66 +344,71 @@ static int perf_mem__op_scnprintf(char *out, size_t sz, struct mem_info *mem_inf
int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
{
- size_t i, l = 0;
- u64 m = PERF_MEM_LVL_NA;
- u64 hit, miss;
+ union perf_mem_data_src data_src;
int printed = 0;
-
- if (mem_info)
- m = mem_info->data_src.mem_lvl;
+ size_t l = 0;
+ size_t i;
+ int lvl;
+ char hit_miss[5] = {0};
sz -= 1; /* -1 for null termination */
out[0] = '\0';
- hit = m & PERF_MEM_LVL_HIT;
- miss = m & PERF_MEM_LVL_MISS;
+ if (!mem_info)
+ goto na;
- /* already taken care of */
- m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+ data_src = mem_info->data_src;
- if (mem_info && mem_info->data_src.mem_remote) {
- strcat(out, "Remote ");
- l += 7;
- }
+ if (data_src.mem_lvl & PERF_MEM_LVL_HIT)
+ memcpy(hit_miss, "hit", 3);
+ else if (data_src.mem_lvl & PERF_MEM_LVL_MISS)
+ memcpy(hit_miss, "miss", 4);
- /*
- * Incase mem_hops field is set, we can skip printing data source via
- * PERF_MEM_LVL namespace.
- */
- if (mem_info && mem_info->data_src.mem_hops) {
- l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
- } else {
- for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
- if (!(m & 0x1))
- continue;
- if (printed++) {
- strcat(out, " or ");
- l += 4;
- }
- l += scnprintf(out + l, sz - l, mem_lvl[i]);
+ lvl = data_src.mem_lvl_num;
+ if (lvl && lvl != PERF_MEM_LVLNUM_NA) {
+ if (data_src.mem_remote) {
+ strcat(out, "Remote ");
+ l += 7;
}
+
+ if (data_src.mem_hops)
+ l += scnprintf(out + l, sz - l, "%s ", mem_hops[data_src.mem_hops]);
+
+ if (mem_lvlnum[lvl])
+ l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
+ else
+ l += scnprintf(out + l, sz - l, "L%d", lvl);
+
+ l += scnprintf(out + l, sz - l, " %s", hit_miss);
+ return l;
}
- if (mem_info && mem_info->data_src.mem_lvl_num) {
- int lvl = mem_info->data_src.mem_lvl_num;
+ lvl = data_src.mem_lvl;
+ if (!lvl)
+ goto na;
+
+ lvl &= ~(PERF_MEM_LVL_NA | PERF_MEM_LVL_HIT | PERF_MEM_LVL_MISS);
+ if (!lvl)
+ goto na;
+
+ for (i = 0; lvl && i < ARRAY_SIZE(mem_lvl); i++, lvl >>= 1) {
+ if (!(lvl & 0x1))
+ continue;
if (printed++) {
strcat(out, " or ");
l += 4;
}
- if (mem_lvlnum[lvl])
- l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
- else
- l += scnprintf(out + l, sz - l, "L%d", lvl);
+ l += scnprintf(out + l, sz - l, mem_lvl[i]);
}
- if (l == 0)
- l += scnprintf(out + l, sz - l, "N/A");
- if (hit)
- l += scnprintf(out + l, sz - l, " hit");
- if (miss)
- l += scnprintf(out + l, sz - l, " miss");
+ if (printed) {
+ l += scnprintf(out + l, sz - l, " %s", hit_miss);
+ return l;
+ }
- return l;
+na:
+ strcat(out, "N/A");
+ return 3;
}
static const char * const snoop_access[] = {
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (6 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf() Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR Ravi Bangoria
2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
39 is taken from the length of longest printable new API string:
"Remote socket, same board Any cache hit". Although, using old API
can result into even longer strings, let's not overkill by making
it dynamic length.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/hist.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b6e4b4edde43..46ce4527919a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -207,7 +207,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
hists__new_col_len(hists, HISTC_MEM_TLB, 22);
hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
- hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
+ hists__new_col_len(hists, HISTC_MEM_LVL, 36 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (7 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
To: peterz, namhyung, acme
Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Some of the IBS_OP_DATA2 bit descriptions were stale (taken from old
version of PPR). Change it according to latest PPR.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/amd-sample-raw.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index b0e70ce9d87a..6a6ddba76c75 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -105,17 +105,17 @@ static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
static const char * const data_src_str[] = {
"",
" DataSrc 1=Local L3 or other L1/L2 in CCX",
- " DataSrc 2=A peer cache in a near CCX",
- " DataSrc 3=Data returned from DRAM",
+ " DataSrc 2=Another CCX cache in the same NUMA node",
+ " DataSrc 3=DRAM",
" DataSrc 4=(reserved)",
- " DataSrc 5=A peer cache in a far CCX",
- " DataSrc 6=DRAM address map with \"long latency\" bit set",
- " DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
- " DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+ " DataSrc 5=Another CCX cache in a different NUMA node",
+ " DataSrc 6=Long-latency DIMM",
+ " DataSrc 7=MMIO/Config/PCI/APIC",
+ " DataSrc 8=Extension Memory",
" DataSrc 9=(reserved)",
" DataSrc 10=(reserved)",
" DataSrc 11=(reserved)",
- " DataSrc 12=Peer Agent Memory",
+ " DataSrc 12=Coherent Memory of a different processor type",
/* 13 to 31 are reserved. Avoid printing them. */
};
int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
--
2.34.1
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
` (8 preceding siblings ...)
2023-04-07 11:24 ` [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR Ravi Bangoria
@ 2023-04-07 21:44 ` Namhyung Kim
2023-04-10 2:23 ` Ravi Bangoria
9 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2023-04-07 21:44 UTC (permalink / raw)
To: Ravi Bangoria
Cc: peterz, acme, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Hi Ravi,
On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> limitations. Mainly:
>
> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> allows it. Setting multiple data sources is useful because IBS on
> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> is only one such DataSrc(2h) though).
> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>
> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> setting new API fields, convert all individual field assignments to
> compile time wrapper macros built using PERF_MEM_S(). Also convert
> DataSrc conditional code to array lookups.
>
> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> non-intuitive. Make it sane.
Looks good, but I think you need to split kernel and user patches.
>
> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> v2->v3:
> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> - Make perf_mem__lvl_scnprintf() more sane
> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> and interpreat it in tool.
> - Add PERF_MEM_LVLNUM_NA to default data_src value
> - Change some of the IBS bit description according to latest PPR
>
> Namhyung Kim (1):
> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>
> Ravi Bangoria (8):
> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> perf headers: Sync uapi/linux/perf_event.h
> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> perf mem: Refactor perf_mem__lvl_scnprintf()
> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> perf script ibs: Change bit description according to latest PPR
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
>
> arch/x86/events/amd/ibs.c | 156 +++++++++++---------------
> include/linux/perf_event.h | 3 +-
> include/uapi/linux/perf_event.h | 3 +-
> tools/include/uapi/linux/perf_event.h | 3 +-
> tools/perf/util/amd-sample-raw.c | 14 +--
> tools/perf/util/event.h | 3 +-
> tools/perf/util/hist.c | 2 +-
> tools/perf/util/mem-events.c | 90 ++++++++-------
> 8 files changed, 132 insertions(+), 142 deletions(-)
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
@ 2023-04-10 2:23 ` Ravi Bangoria
2023-04-10 15:15 ` Arnaldo Carvalho de Melo
2023-05-16 2:45 ` Ravi Bangoria
0 siblings, 2 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-10 2:23 UTC (permalink / raw)
To: Namhyung Kim, peterz, acme
Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
kjain, linux-perf-users, linux-kernel, sandipan.das,
ananth.narayan, santosh.shukla, Ravi Bangoria
On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> Hi Ravi,
>
> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>> limitations. Mainly:
>>
>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>> allows it. Setting multiple data sources is useful because IBS on
>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>> is only one such DataSrc(2h) though).
>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>
>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>> setting new API fields, convert all individual field assignments to
>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>> DataSrc conditional code to array lookups.
>>
>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>> non-intuitive. Make it sane.
>
> Looks good, but I think you need to split kernel and user patches.
Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
Arnaldo, Peter, please let me know if you wants to split the series and
resend.
>
>>
>> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
>> v2->v3:
>> - IBS: Don't club RmtNode with DataSrc=7 (IO)
>> - Make perf_mem__lvl_scnprintf() more sane
>> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
>> and interpreat it in tool.
>> - Add PERF_MEM_LVLNUM_NA to default data_src value
>> - Change some of the IBS bit description according to latest PPR
>>
>> Namhyung Kim (1):
>> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>>
>> Ravi Bangoria (8):
>> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
>> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
>> perf headers: Sync uapi/linux/perf_event.h
>> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
>> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
>> perf mem: Refactor perf_mem__lvl_scnprintf()
>> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
>> perf script ibs: Change bit description according to latest PPR
>
> Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks!
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-04-10 2:23 ` Ravi Bangoria
@ 2023-04-10 15:15 ` Arnaldo Carvalho de Melo
2023-04-10 22:31 ` Arnaldo Carvalho de Melo
2023-05-16 2:45 ` Ravi Bangoria
1 sibling, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-10 15:15 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Namhyung Kim, peterz, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > Hi Ravi,
> >
> > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> >> limitations. Mainly:
> >>
> >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> >> allows it. Setting multiple data sources is useful because IBS on
> >> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> >> is only one such DataSrc(2h) though).
> >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> >> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> >> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> >>
> >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> >> setting new API fields, convert all individual field assignments to
> >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> >> DataSrc conditional code to array lookups.
> >>
> >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> >> non-intuitive. Make it sane.
> >
> > Looks good, but I think you need to split kernel and user patches.
>
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.
I can always use b4's -P option :-) So no need to resubmit, I can pick
the tools bits,
- Arnaldo
> >
> >>
> >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> >> v2->v3:
> >> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> >> - Make perf_mem__lvl_scnprintf() more sane
> >> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> >> and interpreat it in tool.
> >> - Add PERF_MEM_LVLNUM_NA to default data_src value
> >> - Change some of the IBS bit description according to latest PPR
> >>
> >> Namhyung Kim (1):
> >> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> >>
> >> Ravi Bangoria (8):
> >> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> >> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> >> perf headers: Sync uapi/linux/perf_event.h
> >> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> >> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> >> perf mem: Refactor perf_mem__lvl_scnprintf()
> >> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> >> perf script ibs: Change bit description according to latest PPR
> >
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
>
> Thanks!
--
- Arnaldo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-04-10 15:15 ` Arnaldo Carvalho de Melo
@ 2023-04-10 22:31 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-10 22:31 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Namhyung Kim, peterz, mingo, eranian, kan.liang, jolsa, irogers,
adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
sandipan.das, ananth.narayan, santosh.shukla
Em Mon, Apr 10, 2023 at 12:15:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> > On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > > Hi Ravi,
> > >
> > > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> > >>
> > >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> > >> limitations. Mainly:
> > >>
> > >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> > >> allows it. Setting multiple data sources is useful because IBS on
> > >> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> > >> is only one such DataSrc(2h) though).
> > >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> > >> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> > >> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> > >>
> > >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> > >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> > >> setting new API fields, convert all individual field assignments to
> > >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> > >> DataSrc conditional code to array lookups.
> > >>
> > >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> > >> non-intuitive. Make it sane.
> > >
> > > Looks good, but I think you need to split kernel and user patches.
> >
> > Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> > Arnaldo, Peter, please let me know if you wants to split the series and
> > resend.
>
> I can always use b4's -P option :-) So no need to resubmit, I can pick
>> the tools bits,
Done
> - Arnaldo
>
> > >
> > >>
> > >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> > >> v2->v3:
> > >> - IBS: Don't club RmtNode with DataSrc=7 (IO)
> > >> - Make perf_mem__lvl_scnprintf() more sane
> > >> - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> > >> and interpreat it in tool.
> > >> - Add PERF_MEM_LVLNUM_NA to default data_src value
> > >> - Change some of the IBS bit description according to latest PPR
> > >>
> > >> Namhyung Kim (1):
> > >> perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> > >>
> > >> Ravi Bangoria (8):
> > >> perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> > >> perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> > >> perf headers: Sync uapi/linux/perf_event.h
> > >> perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> > >> perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> > >> perf mem: Refactor perf_mem__lvl_scnprintf()
> > >> perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> > >> perf script ibs: Change bit description according to latest PPR
> > >
> > > Acked-by: Namhyung Kim <namhyung@kernel.org>
> >
> > Thanks!
>
> --
>
> - Arnaldo
--
- Arnaldo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-04-10 2:23 ` Ravi Bangoria
2023-04-10 15:15 ` Arnaldo Carvalho de Melo
@ 2023-05-16 2:45 ` Ravi Bangoria
2023-05-29 4:05 ` Ravi Bangoria
1 sibling, 1 reply; 17+ messages in thread
From: Ravi Bangoria @ 2023-05-16 2:45 UTC (permalink / raw)
To: peterz
Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
kjain, linux-perf-users, linux-kernel, sandipan.das,
ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria
On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>> Hi Ravi,
>>
>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>
>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>> limitations. Mainly:
>>>
>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>> allows it. Setting multiple data sources is useful because IBS on
>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>> is only one such DataSrc(2h) though).
>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>
>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>> setting new API fields, convert all individual field assignments to
>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>> DataSrc conditional code to array lookups.
>>>
>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>> non-intuitive. Make it sane.
>>
>> Looks good, but I think you need to split kernel and user patches.
>
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.
Hi Peter, tools/ patches are already upstream. Can you please pick up
kernel changes.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-05-16 2:45 ` Ravi Bangoria
@ 2023-05-29 4:05 ` Ravi Bangoria
2023-06-30 6:35 ` Ravi Bangoria
0 siblings, 1 reply; 17+ messages in thread
From: Ravi Bangoria @ 2023-05-29 4:05 UTC (permalink / raw)
To: peterz
Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
kjain, linux-perf-users, linux-kernel, sandipan.das,
ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria
On 16-May-23 8:15 AM, Ravi Bangoria wrote:
> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>> Hi Ravi,
>>>
>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>> limitations. Mainly:
>>>>
>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>> allows it. Setting multiple data sources is useful because IBS on
>>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>> is only one such DataSrc(2h) though).
>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>
>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>> setting new API fields, convert all individual field assignments to
>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>> DataSrc conditional code to array lookups.
>>>>
>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>> non-intuitive. Make it sane.
>>>
>>> Looks good, but I think you need to split kernel and user patches.
>>
>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>> Arnaldo, Peter, please let me know if you wants to split the series and
>> resend.
>
> Hi Peter, tools/ patches are already upstream. Can you please pick up
> kernel changes.
Gentle ping, Peter!
Thanks,
Ravi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
2023-05-29 4:05 ` Ravi Bangoria
@ 2023-06-30 6:35 ` Ravi Bangoria
0 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-06-30 6:35 UTC (permalink / raw)
To: peterz
Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
kjain, linux-perf-users, linux-kernel, sandipan.das,
ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria
On 29-May-23 9:35 AM, Ravi Bangoria wrote:
> On 16-May-23 8:15 AM, Ravi Bangoria wrote:
>> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>>> Hi Ravi,
>>>>
>>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>
>>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>>> limitations. Mainly:
>>>>>
>>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>>> allows it. Setting multiple data sources is useful because IBS on
>>>>> pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>>> is only one such DataSrc(2h) though).
>>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>>> c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>>> prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>>
>>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>>> setting new API fields, convert all individual field assignments to
>>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>>> DataSrc conditional code to array lookups.
>>>>>
>>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>>> non-intuitive. Make it sane.
>>>>
>>>> Looks good, but I think you need to split kernel and user patches.
>>>
>>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>>> Arnaldo, Peter, please let me know if you wants to split the series and
>>> resend.
>>
>> Hi Peter, tools/ patches are already upstream. Can you please pick up
>> kernel changes.
>
> Gentle ping, Peter!
Hello Peter, this is pending from long time. Can you please consider taking
it. Please let me know if you want me to resend the patches.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 17+ messages in thread