linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
@ 2023-04-07 11:24 Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
limitations. Mainly:

1. mem_lvl_num doesn't allow setting multiple sources whereas old API
   allows it. Setting multiple data sources is useful because IBS on
   pre-zen4 uarch doesn't provide fine granular DataSrc details (there
   is only one such DataSrc(2h) though).
2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
   c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
   prints mem_lvl and mem_lvl_num both if both are set, which is ugly.

Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
setting new API fields, convert all individual field assignments to
compile time wrapper macros built using PERF_MEM_S(). Also convert
DataSrc conditional code to array lookups.

Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
non-intuitive. Make it sane.

v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
v2->v3:
  - IBS: Don't club RmtNode with DataSrc=7 (IO)
  - Make perf_mem__lvl_scnprintf() more sane
  - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
    and interpreat it in tool.
  - Add PERF_MEM_LVLNUM_NA to default data_src value
  - Change some of the IBS bit description according to latest PPR

Namhyung Kim (1):
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src

Ravi Bangoria (8):
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf headers: Sync uapi/linux/perf_event.h
  perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
  perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
  perf mem: Refactor perf_mem__lvl_scnprintf()
  perf mem: Increase HISTC_MEM_LVL column size to 39 chars
  perf script ibs: Change bit description according to latest PPR

 arch/x86/events/amd/ibs.c             | 156 +++++++++++---------------
 include/linux/perf_event.h            |   3 +-
 include/uapi/linux/perf_event.h       |   3 +-
 tools/include/uapi/linux/perf_event.h |   3 +-
 tools/perf/util/amd-sample-raw.c      |  14 +--
 tools/perf/util/event.h               |   3 +-
 tools/perf/util/hist.c                |   2 +-
 tools/perf/util/mem-events.c          |  90 ++++++++-------
 8 files changed, 132 insertions(+), 142 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Ravi Bangoria
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Older API PERF_MEM_LVL_UNC can be replaced by PERF_MEM_LVLNUM_UNC.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 37675437b768..39c6a250dd1b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1339,7 +1339,8 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0x8 available */
+/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_UNC	0x08 /* Uncached */
 #define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
 #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src Ravi Bangoria
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_NA is used to set default values.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 include/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..064c5ad7ff11 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1183,7 +1183,8 @@ struct perf_sample_data {
 		    PERF_MEM_S(LVL, NA)   |\
 		    PERF_MEM_S(SNOOP, NA) |\
 		    PERF_MEM_S(LOCK, NA)  |\
-		    PERF_MEM_S(TLB, NA))
+		    PERF_MEM_S(TLB, NA)   |\
+		    PERF_MEM_S(LVLNUM, NA))
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
 					 u64 addr, u64 period)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h Ravi Bangoria
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

From: Namhyung Kim <namhyung@kernel.org>

Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
limitations. Mainly:

1. mem_lvl_num doesn't allow setting multiple sources whereas old API
   allows it. Setting multiple data sources is useful because IBS on
   pre-zen4 uarch doesn't provide fine granular DataSrc details (there
   is only one such DataSrc(2h) though).
2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
   c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
   prints mem_lvl and mem_lvl_num both if both are set, which is ugly.

1st one can be handled by using ANY_CACHE with HOPS_0. 2nd is purely
perf tool specific issue and should be fixed separately.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 arch/x86/events/amd/ibs.c | 156 +++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 88 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 64582954b5f6..b6b833f5da50 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -703,38 +703,63 @@ static u8 perf_ibs_data_src(union ibs_op_data2 *op_data2)
 	return op_data2->data_src_lo;
 }
 
-static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
-				 union ibs_op_data3 *op_data3,
-				 struct perf_sample_data *data)
+#define	L(x)		(PERF_MEM_S(LVL, x) | PERF_MEM_S(LVL, HIT))
+#define	LN(x)		PERF_MEM_S(LVLNUM, x)
+#define	REM		PERF_MEM_S(REMOTE, REMOTE)
+#define	HOPS(x)		PERF_MEM_S(HOPS, x)
+
+static u64 g_data_src[8] = {
+	[IBS_DATA_SRC_LOC_CACHE]	  = L(L3) | L(REM_CCE1) | LN(ANY_CACHE) | HOPS(0),
+	[IBS_DATA_SRC_DRAM]		  = L(LOC_RAM) | LN(RAM),
+	[IBS_DATA_SRC_REM_CACHE]	  = L(REM_CCE2) | LN(ANY_CACHE) | REM | HOPS(1),
+	[IBS_DATA_SRC_IO]		  = L(IO) | LN(IO),
+};
+
+#define RMT_NODE_BITS			(1 << IBS_DATA_SRC_DRAM)
+#define RMT_NODE_APPLICABLE(x)		(RMT_NODE_BITS & (1 << x))
+
+static u64 g_zen4_data_src[32] = {
+	[IBS_DATA_SRC_EXT_LOC_CACHE]	  = L(L3) | LN(L3),
+	[IBS_DATA_SRC_EXT_NEAR_CCX_CACHE] = L(REM_CCE1) | LN(ANY_CACHE) | REM | HOPS(0),
+	[IBS_DATA_SRC_EXT_DRAM]		  = L(LOC_RAM) | LN(RAM),
+	[IBS_DATA_SRC_EXT_FAR_CCX_CACHE]  = L(REM_CCE2) | LN(ANY_CACHE) | REM | HOPS(1),
+	[IBS_DATA_SRC_EXT_PMEM]		  = LN(PMEM),
+	[IBS_DATA_SRC_EXT_IO]		  = L(IO) | LN(IO),
+	[IBS_DATA_SRC_EXT_EXT_MEM]	  = LN(CXL),
+};
+
+#define ZEN4_RMT_NODE_BITS		((1 << IBS_DATA_SRC_EXT_DRAM) | \
+					 (1 << IBS_DATA_SRC_EXT_PMEM) | \
+					 (1 << IBS_DATA_SRC_EXT_EXT_MEM))
+#define ZEN4_RMT_NODE_APPLICABLE(x)	(ZEN4_RMT_NODE_BITS & (1 << x))
+
+static __u64 perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
+				  union ibs_op_data3 *op_data3,
+				  struct perf_sample_data *data)
 {
 	union perf_mem_data_src *data_src = &data->data_src;
 	u8 ibs_data_src = perf_ibs_data_src(op_data2);
 
 	data_src->mem_lvl = 0;
+	data_src->mem_lvl_num = 0;
 
 	/*
 	 * DcMiss, L2Miss, DataSrc, DcMissLat etc. are all invalid for Uncached
 	 * memory accesses. So, check DcUcMemAcc bit early.
 	 */
-	if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO) {
-		data_src->mem_lvl = PERF_MEM_LVL_UNC | PERF_MEM_LVL_HIT;
-		return;
-	}
+	if (op_data3->dc_uc_mem_acc && ibs_data_src != IBS_DATA_SRC_EXT_IO)
+		return L(UNC) | LN(UNC);
 
 	/* L1 Hit */
-	if (op_data3->dc_miss == 0) {
-		data_src->mem_lvl = PERF_MEM_LVL_L1 | PERF_MEM_LVL_HIT;
-		return;
-	}
+	if (op_data3->dc_miss == 0)
+		return L(L1) | LN(L1);
 
 	/* L2 Hit */
 	if (op_data3->l2_miss == 0) {
 		/* Erratum #1293 */
 		if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xF ||
-		    !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc)) {
-			data_src->mem_lvl = PERF_MEM_LVL_L2 | PERF_MEM_LVL_HIT;
-			return;
-		}
+		    !(op_data3->sw_pf || op_data3->dc_miss_no_mab_alloc))
+			return L(L2) | LN(L2);
 	}
 
 	/*
@@ -744,82 +769,36 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
 	if (data_src->mem_op != PERF_MEM_OP_LOAD)
 		goto check_mab;
 
-	/* L3 Hit */
 	if (ibs_caps & IBS_CAPS_ZEN4) {
-		if (ibs_data_src == IBS_DATA_SRC_EXT_LOC_CACHE) {
-			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_HIT;
-			return;
-		}
-	} else {
-		if (ibs_data_src == IBS_DATA_SRC_LOC_CACHE) {
-			data_src->mem_lvl = PERF_MEM_LVL_L3 | PERF_MEM_LVL_REM_CCE1 |
-					    PERF_MEM_LVL_HIT;
-			return;
-		}
-	}
+		u64 val = g_zen4_data_src[ibs_data_src];
 
-	/* A peer cache in a near CCX */
-	if (ibs_caps & IBS_CAPS_ZEN4 &&
-	    ibs_data_src == IBS_DATA_SRC_EXT_NEAR_CCX_CACHE) {
-		data_src->mem_lvl = PERF_MEM_LVL_REM_CCE1 | PERF_MEM_LVL_HIT;
-		return;
-	}
+		if (!val)
+			goto check_mab;
 
-	/* A peer cache in a far CCX */
-	if (ibs_caps & IBS_CAPS_ZEN4) {
-		if (ibs_data_src == IBS_DATA_SRC_EXT_FAR_CCX_CACHE) {
-			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
-			return;
-		}
-	} else {
-		if (ibs_data_src == IBS_DATA_SRC_REM_CACHE) {
-			data_src->mem_lvl = PERF_MEM_LVL_REM_CCE2 | PERF_MEM_LVL_HIT;
-			return;
+		/* HOPS_1 because IBS doesn't provide remote socket detail */
+		if (op_data2->rmt_node && ZEN4_RMT_NODE_APPLICABLE(ibs_data_src)) {
+			if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM)
+				val = L(REM_RAM1) | LN(RAM) | REM | HOPS(1);
+			else
+				val |= REM | HOPS(1);
 		}
-	}
 
-	/* DRAM */
-	if (ibs_data_src == IBS_DATA_SRC_EXT_DRAM) {
-		if (op_data2->rmt_node == 0)
-			data_src->mem_lvl = PERF_MEM_LVL_LOC_RAM | PERF_MEM_LVL_HIT;
-		else
-			data_src->mem_lvl = PERF_MEM_LVL_REM_RAM1 | PERF_MEM_LVL_HIT;
-		return;
-	}
+		return val;
+	} else {
+		u64 val = g_data_src[ibs_data_src];
 
-	/* PMEM */
-	if (ibs_caps & IBS_CAPS_ZEN4 && ibs_data_src == IBS_DATA_SRC_EXT_PMEM) {
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_PMEM;
-		if (op_data2->rmt_node) {
-			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
-			/* IBS doesn't provide Remote socket detail */
-			data_src->mem_hops = PERF_MEM_HOPS_1;
-		}
-		return;
-	}
+		if (!val)
+			goto check_mab;
 
-	/* Extension Memory */
-	if (ibs_caps & IBS_CAPS_ZEN4 &&
-	    ibs_data_src == IBS_DATA_SRC_EXT_EXT_MEM) {
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_CXL;
-		if (op_data2->rmt_node) {
-			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
-			/* IBS doesn't provide Remote socket detail */
-			data_src->mem_hops = PERF_MEM_HOPS_1;
+		/* HOPS_1 because IBS doesn't provide remote socket detail */
+		if (op_data2->rmt_node && RMT_NODE_APPLICABLE(ibs_data_src)) {
+			if (ibs_data_src == IBS_DATA_SRC_DRAM)
+				val = L(REM_RAM1) | LN(RAM) | REM | HOPS(1);
+			else
+				val |= REM | HOPS(1);
 		}
-		return;
-	}
 
-	/* IO */
-	if (ibs_data_src == IBS_DATA_SRC_EXT_IO) {
-		data_src->mem_lvl = PERF_MEM_LVL_IO;
-		data_src->mem_lvl_num = PERF_MEM_LVLNUM_IO;
-		if (op_data2->rmt_node) {
-			data_src->mem_remote = PERF_MEM_REMOTE_REMOTE;
-			/* IBS doesn't provide Remote socket detail */
-			data_src->mem_hops = PERF_MEM_HOPS_1;
-		}
-		return;
+		return val;
 	}
 
 check_mab:
@@ -830,12 +809,11 @@ static void perf_ibs_get_mem_lvl(union ibs_op_data2 *op_data2,
 	 * DataSrc simultaneously. Prioritize DataSrc over MAB, i.e. set
 	 * MAB only when IBS fails to provide DataSrc.
 	 */
-	if (op_data3->dc_miss_no_mab_alloc) {
-		data_src->mem_lvl = PERF_MEM_LVL_LFB | PERF_MEM_LVL_HIT;
-		return;
-	}
+	if (op_data3->dc_miss_no_mab_alloc)
+		return L(LFB) | LN(LFB);
 
-	data_src->mem_lvl = PERF_MEM_LVL_NA;
+	/* Don't set HIT with NA */
+	return PERF_MEM_S(LVL, NA) | LN(NA);
 }
 
 static bool perf_ibs_cache_hit_st_valid(void)
@@ -925,7 +903,9 @@ static void perf_ibs_get_data_src(struct perf_ibs_data *ibs_data,
 				  union ibs_op_data2 *op_data2,
 				  union ibs_op_data3 *op_data3)
 {
-	perf_ibs_get_mem_lvl(op_data2, op_data3, data);
+	union perf_mem_data_src *data_src = &data->data_src;
+
+	data_src->val |= perf_ibs_get_mem_lvl(op_data2, op_data3, data);
 	perf_ibs_get_mem_snoop(op_data2, data);
 	perf_ibs_get_tlb_lvl(op_data3, data);
 	perf_ibs_get_mem_lock(op_data3, data);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (2 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE Ravi Bangoria
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

... to bring PERF_MEM_LVLNUM_UNC definition to userspace

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/include/uapi/linux/perf_event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 37675437b768..39c6a250dd1b 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1339,7 +1339,8 @@ union perf_mem_data_src {
 #define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
 #define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
 #define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
-/* 5-0x8 available */
+/* 5-0x7 available */
+#define PERF_MEM_LVLNUM_UNC	0x08 /* Uncached */
 #define PERF_MEM_LVLNUM_CXL	0x09 /* CXL */
 #define PERF_MEM_LVLNUM_IO	0x0a /* I/O */
 #define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (3 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC Ravi Bangoria
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_DATA_SRC_NONE is used to set
default values.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/event.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 6663a676eadc..de20e01c9d72 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -89,7 +89,8 @@ enum {
 	 PERF_MEM_S(LVL, NA) |\
 	 PERF_MEM_S(SNOOP, NA) |\
 	 PERF_MEM_S(LOCK, NA) |\
-	 PERF_MEM_S(TLB, NA))
+	 PERF_MEM_S(TLB, NA) |\
+	 PERF_MEM_S(LVLNUM, NA))
 
 /* Attribute type for custom synthesized events */
 #define PERF_TYPE_SYNTH		(INT_MAX + 1U)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (4 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf() Ravi Bangoria
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Add support for printing PERF_MEM_LVLNUM_UNC in perf mem report.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index b3a91093069a..3a7c72be8326 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -295,6 +295,7 @@ static const char * const mem_lvl[] = {
 };
 
 static const char * const mem_lvlnum[] = {
+	[PERF_MEM_LVLNUM_UNC] = "Uncached",
 	[PERF_MEM_LVLNUM_CXL] = "CXL",
 	[PERF_MEM_LVLNUM_IO] = "I/O",
 	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf()
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (5 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars Ravi Bangoria
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() is
non-intuitive. For ex, it ignores mem_lvl when mem_hops is set but
considers it otherwise. It prints both mem_lvl_num and mem_lvl when
mem_hops is not set.

Refactor this function such that it behaves more intuitively: Use
new API mem_lvl_num|mem_remote|mem_hops if mem_lvl_num contains
value other than PERF_MEM_LVLNUM_NA. Otherwise, fallback to old API
mem_lvl. Since new API has no way to indicate MISS, use it from old
api, otherwise don't club old and new APIs while parsing as well as
printing.

Before:
  $ sudo ./perf mem report -F sample,mem --stdio
  #      Samples  Memory access
  # ............  ........................
  #
          250097  N/A
          188907  L1 hit
            4116  L2 hit
            3496  Remote Cache (1 hop) hit
            3271  Remote Cache (2 hops) hit
             873  L3 hit
             598  Local RAM hit
             438  Remote RAM (1 hop) hit
               1  Uncached hit

After:
  $ sudo ./perf mem report -F sample,mem --stdio
  #      Samples  Memory access
  # ............  .......................................
  #
          255517  N/A
          189989  L1 hit
            4541  L2 hit
            3363  Remote core, same node Any cache hit
            3336  Remote node, same socket Any cache hit
            1275  L3 hit
             743  RAM hit
             545  Remote node, same socket RAM hit
               4  Uncached hit

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/mem-events.c | 89 +++++++++++++++++++-----------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 3a7c72be8326..ed1ee4b05356 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -344,66 +344,71 @@ static int perf_mem__op_scnprintf(char *out, size_t sz, struct mem_info *mem_inf
 
 int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
-	size_t i, l = 0;
-	u64 m =  PERF_MEM_LVL_NA;
-	u64 hit, miss;
+	union perf_mem_data_src data_src;
 	int printed = 0;
-
-	if (mem_info)
-		m  = mem_info->data_src.mem_lvl;
+	size_t l = 0;
+	size_t i;
+	int lvl;
+	char hit_miss[5] = {0};
 
 	sz -= 1; /* -1 for null termination */
 	out[0] = '\0';
 
-	hit = m & PERF_MEM_LVL_HIT;
-	miss = m & PERF_MEM_LVL_MISS;
+	if (!mem_info)
+		goto na;
 
-	/* already taken care of */
-	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+	data_src = mem_info->data_src;
 
-	if (mem_info && mem_info->data_src.mem_remote) {
-		strcat(out, "Remote ");
-		l += 7;
-	}
+	if (data_src.mem_lvl & PERF_MEM_LVL_HIT)
+		memcpy(hit_miss, "hit", 3);
+	else if (data_src.mem_lvl & PERF_MEM_LVL_MISS)
+		memcpy(hit_miss, "miss", 4);
 
-	/*
-	 * Incase mem_hops field is set, we can skip printing data source via
-	 * PERF_MEM_LVL namespace.
-	 */
-	if (mem_info && mem_info->data_src.mem_hops) {
-		l += scnprintf(out + l, sz - l, "%s ", mem_hops[mem_info->data_src.mem_hops]);
-	} else {
-		for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
-			if (!(m & 0x1))
-				continue;
-			if (printed++) {
-				strcat(out, " or ");
-				l += 4;
-			}
-			l += scnprintf(out + l, sz - l, mem_lvl[i]);
+	lvl = data_src.mem_lvl_num;
+	if (lvl && lvl != PERF_MEM_LVLNUM_NA) {
+		if (data_src.mem_remote) {
+			strcat(out, "Remote ");
+			l += 7;
 		}
+
+		if (data_src.mem_hops)
+			l += scnprintf(out + l, sz - l, "%s ", mem_hops[data_src.mem_hops]);
+
+		if (mem_lvlnum[lvl])
+			l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
+		else
+			l += scnprintf(out + l, sz - l, "L%d", lvl);
+
+		l += scnprintf(out + l, sz - l, " %s", hit_miss);
+		return l;
 	}
 
-	if (mem_info && mem_info->data_src.mem_lvl_num) {
-		int lvl = mem_info->data_src.mem_lvl_num;
+	lvl = data_src.mem_lvl;
+	if (!lvl)
+		goto na;
+
+	lvl &= ~(PERF_MEM_LVL_NA | PERF_MEM_LVL_HIT | PERF_MEM_LVL_MISS);
+	if (!lvl)
+		goto na;
+
+	for (i = 0; lvl && i < ARRAY_SIZE(mem_lvl); i++, lvl >>= 1) {
+		if (!(lvl & 0x1))
+			continue;
 		if (printed++) {
 			strcat(out, " or ");
 			l += 4;
 		}
-		if (mem_lvlnum[lvl])
-			l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
-		else
-			l += scnprintf(out + l, sz - l, "L%d", lvl);
+		l += scnprintf(out + l, sz - l, mem_lvl[i]);
 	}
 
-	if (l == 0)
-		l += scnprintf(out + l, sz - l, "N/A");
-	if (hit)
-		l += scnprintf(out + l, sz - l, " hit");
-	if (miss)
-		l += scnprintf(out + l, sz - l, " miss");
+	if (printed) {
+		l += scnprintf(out + l, sz - l, " %s", hit_miss);
+		return l;
+	}
 
-	return l;
+na:
+	strcat(out, "N/A");
+	return 3;
 }
 
 static const char * const snoop_access[] = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (6 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf() Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 11:24 ` [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR Ravi Bangoria
  2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

39 is taken from the length of longest printable new API string:
"Remote socket, same board Any cache hit". Although, using old API
can result into even longer strings, let's not overkill by making
it dynamic length.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/hist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b6e4b4edde43..46ce4527919a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -207,7 +207,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
 	hists__new_col_len(hists, HISTC_MEM_TLB, 22);
 	hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
-	hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
+	hists__new_col_len(hists, HISTC_MEM_LVL, 36 + 3);
 	hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
 	hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
 	hists__new_col_len(hists, HISTC_MEM_BLOCKED, 10);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (7 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars Ravi Bangoria
@ 2023-04-07 11:24 ` Ravi Bangoria
  2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
  9 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-07 11:24 UTC (permalink / raw)
  To: peterz, namhyung, acme
  Cc: ravi.bangoria, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Some of the IBS_OP_DATA2 bit descriptions were stale (taken from old
version of PPR). Change it according to latest PPR.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
 tools/perf/util/amd-sample-raw.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index b0e70ce9d87a..6a6ddba76c75 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -105,17 +105,17 @@ static void pr_ibs_op_data2_extended(union ibs_op_data2 reg)
 	static const char * const data_src_str[] = {
 		"",
 		" DataSrc 1=Local L3 or other L1/L2 in CCX",
-		" DataSrc 2=A peer cache in a near CCX",
-		" DataSrc 3=Data returned from DRAM",
+		" DataSrc 2=Another CCX cache in the same NUMA node",
+		" DataSrc 3=DRAM",
 		" DataSrc 4=(reserved)",
-		" DataSrc 5=A peer cache in a far CCX",
-		" DataSrc 6=DRAM address map with \"long latency\" bit set",
-		" DataSrc 7=Data returned from MMIO/Config/PCI/APIC",
-		" DataSrc 8=Extension Memory (S-Link, GenZ, etc)",
+		" DataSrc 5=Another CCX cache in a different NUMA node",
+		" DataSrc 6=Long-latency DIMM",
+		" DataSrc 7=MMIO/Config/PCI/APIC",
+		" DataSrc 8=Extension Memory",
 		" DataSrc 9=(reserved)",
 		" DataSrc 10=(reserved)",
 		" DataSrc 11=(reserved)",
-		" DataSrc 12=Peer Agent Memory",
+		" DataSrc 12=Coherent Memory of a different processor type",
 		/* 13 to 31 are reserved. Avoid printing them. */
 	};
 	int data_src = (reg.data_src_hi << 3) | reg.data_src_lo;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
                   ` (8 preceding siblings ...)
  2023-04-07 11:24 ` [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR Ravi Bangoria
@ 2023-04-07 21:44 ` Namhyung Kim
  2023-04-10  2:23   ` Ravi Bangoria
  9 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2023-04-07 21:44 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: peterz, acme, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Hi Ravi,

On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>
> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> limitations. Mainly:
>
> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>    allows it. Setting multiple data sources is useful because IBS on
>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>    is only one such DataSrc(2h) though).
> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>
> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> setting new API fields, convert all individual field assignments to
> compile time wrapper macros built using PERF_MEM_S(). Also convert
> DataSrc conditional code to array lookups.
>
> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> non-intuitive. Make it sane.

Looks good, but I think you need to split kernel and user patches.

>
> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> v2->v3:
>   - IBS: Don't club RmtNode with DataSrc=7 (IO)
>   - Make perf_mem__lvl_scnprintf() more sane
>   - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
>     and interpreat it in tool.
>   - Add PERF_MEM_LVLNUM_NA to default data_src value
>   - Change some of the IBS bit description according to latest PPR
>
> Namhyung Kim (1):
>   perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>
> Ravi Bangoria (8):
>   perf/mem: Introduce PERF_MEM_LVLNUM_UNC
>   perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
>   perf headers: Sync uapi/linux/perf_event.h
>   perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
>   perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
>   perf mem: Refactor perf_mem__lvl_scnprintf()
>   perf mem: Increase HISTC_MEM_LVL column size to 39 chars
>   perf script ibs: Change bit description according to latest PPR

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


>
>  arch/x86/events/amd/ibs.c             | 156 +++++++++++---------------
>  include/linux/perf_event.h            |   3 +-
>  include/uapi/linux/perf_event.h       |   3 +-
>  tools/include/uapi/linux/perf_event.h |   3 +-
>  tools/perf/util/amd-sample-raw.c      |  14 +--
>  tools/perf/util/event.h               |   3 +-
>  tools/perf/util/hist.c                |   2 +-
>  tools/perf/util/mem-events.c          |  90 ++++++++-------
>  8 files changed, 132 insertions(+), 142 deletions(-)
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
@ 2023-04-10  2:23   ` Ravi Bangoria
  2023-04-10 15:15     ` Arnaldo Carvalho de Melo
  2023-05-16  2:45     ` Ravi Bangoria
  0 siblings, 2 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-04-10  2:23 UTC (permalink / raw)
  To: Namhyung Kim, peterz, acme
  Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
	kjain, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Ravi Bangoria

On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> Hi Ravi,
> 
> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>> limitations. Mainly:
>>
>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>    allows it. Setting multiple data sources is useful because IBS on
>>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>    is only one such DataSrc(2h) though).
>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>
>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>> setting new API fields, convert all individual field assignments to
>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>> DataSrc conditional code to array lookups.
>>
>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>> non-intuitive. Make it sane.
> 
> Looks good, but I think you need to split kernel and user patches.

Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
Arnaldo, Peter, please let me know if you wants to split the series and
resend.

> 
>>
>> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
>> v2->v3:
>>   - IBS: Don't club RmtNode with DataSrc=7 (IO)
>>   - Make perf_mem__lvl_scnprintf() more sane
>>   - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
>>     and interpreat it in tool.
>>   - Add PERF_MEM_LVLNUM_NA to default data_src value
>>   - Change some of the IBS bit description according to latest PPR
>>
>> Namhyung Kim (1):
>>   perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
>>
>> Ravi Bangoria (8):
>>   perf/mem: Introduce PERF_MEM_LVLNUM_UNC
>>   perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
>>   perf headers: Sync uapi/linux/perf_event.h
>>   perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
>>   perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
>>   perf mem: Refactor perf_mem__lvl_scnprintf()
>>   perf mem: Increase HISTC_MEM_LVL column size to 39 chars
>>   perf script ibs: Change bit description according to latest PPR
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-04-10  2:23   ` Ravi Bangoria
@ 2023-04-10 15:15     ` Arnaldo Carvalho de Melo
  2023-04-10 22:31       ` Arnaldo Carvalho de Melo
  2023-05-16  2:45     ` Ravi Bangoria
  1 sibling, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-10 15:15 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Namhyung Kim, peterz, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > Hi Ravi,
> > 
> > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> >> limitations. Mainly:
> >>
> >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> >>    allows it. Setting multiple data sources is useful because IBS on
> >>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> >>    is only one such DataSrc(2h) though).
> >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> >>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> >>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> >>
> >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> >> setting new API fields, convert all individual field assignments to
> >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> >> DataSrc conditional code to array lookups.
> >>
> >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> >> non-intuitive. Make it sane.
> > 
> > Looks good, but I think you need to split kernel and user patches.
> 
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.

I can always use b4's -P option :-) So no need to resubmit, I can pick
the tools bits,

- Arnaldo
 
> > 
> >>
> >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> >> v2->v3:
> >>   - IBS: Don't club RmtNode with DataSrc=7 (IO)
> >>   - Make perf_mem__lvl_scnprintf() more sane
> >>   - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> >>     and interpreat it in tool.
> >>   - Add PERF_MEM_LVLNUM_NA to default data_src value
> >>   - Change some of the IBS bit description according to latest PPR
> >>
> >> Namhyung Kim (1):
> >>   perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> >>
> >> Ravi Bangoria (8):
> >>   perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> >>   perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> >>   perf headers: Sync uapi/linux/perf_event.h
> >>   perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> >>   perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> >>   perf mem: Refactor perf_mem__lvl_scnprintf()
> >>   perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> >>   perf script ibs: Change bit description according to latest PPR
> > 
> > Acked-by: Namhyung Kim <namhyung@kernel.org>
> 
> Thanks!

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-04-10 15:15     ` Arnaldo Carvalho de Melo
@ 2023-04-10 22:31       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-04-10 22:31 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Namhyung Kim, peterz, mingo, eranian, kan.liang, jolsa, irogers,
	adrian.hunter, leo.yan, kjain, linux-perf-users, linux-kernel,
	sandipan.das, ananth.narayan, santosh.shukla

Em Mon, Apr 10, 2023 at 12:15:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Apr 10, 2023 at 07:53:57AM +0530, Ravi Bangoria escreveu:
> > On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
> > > Hi Ravi,
> > > 
> > > On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> > >>
> > >> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
> > >> limitations. Mainly:
> > >>
> > >> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
> > >>    allows it. Setting multiple data sources is useful because IBS on
> > >>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
> > >>    is only one such DataSrc(2h) though).
> > >> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
> > >>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
> > >>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
> > >>
> > >> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
> > >> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
> > >> setting new API fields, convert all individual field assignments to
> > >> compile time wrapper macros built using PERF_MEM_S(). Also convert
> > >> DataSrc conditional code to array lookups.
> > >>
> > >> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
> > >> non-intuitive. Make it sane.
> > > 
> > > Looks good, but I think you need to split kernel and user patches.
> > 
> > Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> > Arnaldo, Peter, please let me know if you wants to split the series and
> > resend.
> 
> I can always use b4's -P option :-) So no need to resubmit, I can pick
>> the tools bits,

Done
 
> - Arnaldo
>  
> > > 
> > >>
> > >> v2: https://lore.kernel.org/r/20230327130851.1565-1-ravi.bangoria%40amd.com
> > >> v2->v3:
> > >>   - IBS: Don't club RmtNode with DataSrc=7 (IO)
> > >>   - Make perf_mem__lvl_scnprintf() more sane
> > >>   - Introduce PERF_MEM_LVLNUM_UNC, set it along with PERF_MEM_LVL_UNC
> > >>     and interpreat it in tool.
> > >>   - Add PERF_MEM_LVLNUM_NA to default data_src value
> > >>   - Change some of the IBS bit description according to latest PPR
> > >>
> > >> Namhyung Kim (1):
> > >>   perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
> > >>
> > >> Ravi Bangoria (8):
> > >>   perf/mem: Introduce PERF_MEM_LVLNUM_UNC
> > >>   perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
> > >>   perf headers: Sync uapi/linux/perf_event.h
> > >>   perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE
> > >>   perf mem: Add support for printing PERF_MEM_LVLNUM_UNC
> > >>   perf mem: Refactor perf_mem__lvl_scnprintf()
> > >>   perf mem: Increase HISTC_MEM_LVL column size to 39 chars
> > >>   perf script ibs: Change bit description according to latest PPR
> > > 
> > > Acked-by: Namhyung Kim <namhyung@kernel.org>
> > 
> > Thanks!
> 
> -- 
> 
> - Arnaldo

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-04-10  2:23   ` Ravi Bangoria
  2023-04-10 15:15     ` Arnaldo Carvalho de Melo
@ 2023-05-16  2:45     ` Ravi Bangoria
  2023-05-29  4:05       ` Ravi Bangoria
  1 sibling, 1 reply; 17+ messages in thread
From: Ravi Bangoria @ 2023-05-16  2:45 UTC (permalink / raw)
  To: peterz
  Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
	kjain, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria

On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>> Hi Ravi,
>>
>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>
>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>> limitations. Mainly:
>>>
>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>    allows it. Setting multiple data sources is useful because IBS on
>>>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>    is only one such DataSrc(2h) though).
>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>
>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>> setting new API fields, convert all individual field assignments to
>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>> DataSrc conditional code to array lookups.
>>>
>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>> non-intuitive. Make it sane.
>>
>> Looks good, but I think you need to split kernel and user patches.
> 
> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
> Arnaldo, Peter, please let me know if you wants to split the series and
> resend.

Hi Peter, tools/ patches are already upstream. Can you please pick up
kernel changes.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-05-16  2:45     ` Ravi Bangoria
@ 2023-05-29  4:05       ` Ravi Bangoria
  2023-06-30  6:35         ` Ravi Bangoria
  0 siblings, 1 reply; 17+ messages in thread
From: Ravi Bangoria @ 2023-05-29  4:05 UTC (permalink / raw)
  To: peterz
  Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
	kjain, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria

On 16-May-23 8:15 AM, Ravi Bangoria wrote:
> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>> Hi Ravi,
>>>
>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>> limitations. Mainly:
>>>>
>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>>    allows it. Setting multiple data sources is useful because IBS on
>>>>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>>    is only one such DataSrc(2h) though).
>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>
>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>> setting new API fields, convert all individual field assignments to
>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>> DataSrc conditional code to array lookups.
>>>>
>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>> non-intuitive. Make it sane.
>>>
>>> Looks good, but I think you need to split kernel and user patches.
>>
>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>> Arnaldo, Peter, please let me know if you wants to split the series and
>> resend.
> 
> Hi Peter, tools/ patches are already upstream. Can you please pick up
> kernel changes.

Gentle ping, Peter!

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements
  2023-05-29  4:05       ` Ravi Bangoria
@ 2023-06-30  6:35         ` Ravi Bangoria
  0 siblings, 0 replies; 17+ messages in thread
From: Ravi Bangoria @ 2023-06-30  6:35 UTC (permalink / raw)
  To: peterz
  Cc: mingo, eranian, kan.liang, jolsa, irogers, adrian.hunter, leo.yan,
	kjain, linux-perf-users, linux-kernel, sandipan.das,
	ananth.narayan, santosh.shukla, Namhyung Kim, acme, Ravi Bangoria

On 29-May-23 9:35 AM, Ravi Bangoria wrote:
> On 16-May-23 8:15 AM, Ravi Bangoria wrote:
>> On 10-Apr-23 7:53 AM, Ravi Bangoria wrote:
>>> On 08-Apr-23 3:14 AM, Namhyung Kim wrote:
>>>> Hi Ravi,
>>>>
>>>> On Fri, Apr 7, 2023 at 4:25 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>
>>>>> Kernel IBS driver wasn't using new PERF_MEM_* APIs due to some of its
>>>>> limitations. Mainly:
>>>>>
>>>>> 1. mem_lvl_num doesn't allow setting multiple sources whereas old API
>>>>>    allows it. Setting multiple data sources is useful because IBS on
>>>>>    pre-zen4 uarch doesn't provide fine granular DataSrc details (there
>>>>>    is only one such DataSrc(2h) though).
>>>>> 2. perf mem sorting logic (sort__lvl_cmp()) ignores mem_lvl_num. perf
>>>>>    c2c (c2c_decode_stats()) does not use mem_lvl_num at all. perf mem
>>>>>    prints mem_lvl and mem_lvl_num both if both are set, which is ugly.
>>>>>
>>>>> Set mem_lvl_num, mem_remote and mem_hops for data_src via IBS. Handle
>>>>> first issue using mem_lvl_num = ANY_CACHE | HOPS_0. In addition to
>>>>> setting new API fields, convert all individual field assignments to
>>>>> compile time wrapper macros built using PERF_MEM_S(). Also convert
>>>>> DataSrc conditional code to array lookups.
>>>>>
>>>>> Interpretation of perf_mem_data_src by perf_mem__lvl_scnprintf() was
>>>>> non-intuitive. Make it sane.
>>>>
>>>> Looks good, but I think you need to split kernel and user patches.
>>>
>>> Patch #1 to #3 are kernel changes. Patch #4 to #9 are userspace changes.
>>> Arnaldo, Peter, please let me know if you wants to split the series and
>>> resend.
>>
>> Hi Peter, tools/ patches are already upstream. Can you please pick up
>> kernel changes.
> 
> Gentle ping, Peter!

Hello Peter, this is pending from long time. Can you please consider taking
it. Please let me know if you want me to resend the patches.

Thanks,
Ravi

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-30  6:37 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-07 11:24 [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 1/9] perf/mem: Introduce PERF_MEM_LVLNUM_UNC Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 2/9] perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 3/9] perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 4/9] perf headers: Sync uapi/linux/perf_event.h Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 5/9] perf mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_DATA_SRC_NONE Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 6/9] perf mem: Add support for printing PERF_MEM_LVLNUM_UNC Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 7/9] perf mem: Refactor perf_mem__lvl_scnprintf() Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 8/9] perf mem: Increase HISTC_MEM_LVL column size to 39 chars Ravi Bangoria
2023-04-07 11:24 ` [PATCH v3 9/9] perf script ibs: Change bit description according to latest PPR Ravi Bangoria
2023-04-07 21:44 ` [PATCH v3 0/9] perf/mem: AMD IBS and generic tools improvements Namhyung Kim
2023-04-10  2:23   ` Ravi Bangoria
2023-04-10 15:15     ` Arnaldo Carvalho de Melo
2023-04-10 22:31       ` Arnaldo Carvalho de Melo
2023-05-16  2:45     ` Ravi Bangoria
2023-05-29  4:05       ` Ravi Bangoria
2023-06-30  6:35         ` Ravi Bangoria

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).