* [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes)
@ 2025-04-29 3:59 Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
` (4 more replies)
0 siblings, 5 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-29 3:59 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Ravi Bangoria, Peter Zijlstra, Joe Mario, Stephane Eranian,
Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel, linux-perf-users,
Santosh Shukla, Ananth Narayan, Sandipan Das
IBS on Zen5:
- Introduced Load Latency filtering capability.
- Shows DTLB and page size information differently from prior generations.
Kernel changes for these enhancements are already upstream, so the tools
changes are resent separately.
Patches are prepared on perf-tools-next/perf-tools-next (85447f68a1e3).
v3: https://lore.kernel.org/r/20250205060547.1337-1-ravi.bangoria@amd.com
v3->v4:
- Remove kernel changes.
- Improve IBS sample period unit test
Ravi Bangoria (4):
perf amd ibs: Add Load Latency bits in raw dump
perf amd ibs: Incorporate Zen5 DTLB and PageSize information
perf mem/c2c amd: Add ldlat support
perf test amd ibs: Add sample period unit test
tools/perf/Documentation/perf-amd-ibs.txt | 9 +
tools/perf/Documentation/perf-c2c.txt | 11 +-
tools/perf/Documentation/perf-mem.txt | 13 +-
tools/perf/arch/x86/include/arch-tests.h | 1 +
tools/perf/arch/x86/tests/Build | 1 +
tools/perf/arch/x86/tests/amd-ibs-period.c | 1001 ++++++++++++++++++++
tools/perf/arch/x86/tests/arch-tests.c | 2 +
tools/perf/arch/x86/util/mem-events.c | 6 +
tools/perf/arch/x86/util/mem-events.h | 1 +
tools/perf/arch/x86/util/pmu.c | 20 +-
tools/perf/tests/shell/test_data_symbol.sh | 29 +-
tools/perf/util/amd-sample-raw.c | 77 +-
tools/perf/util/pmu.c | 11 +
tools/perf/util/pmu.h | 2 +
14 files changed, 1160 insertions(+), 24 deletions(-)
create mode 100644 tools/perf/arch/x86/tests/amd-ibs-period.c
--
2.43.0
* [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
@ 2025-04-29 3:59 ` Ravi Bangoria
2025-04-30 16:58 ` Namhyung Kim
2025-04-29 3:59 ` [PATCH v4 2/4] perf amd ibs: Incorporate Zen5 DTLB and PageSize information Ravi Bangoria
` (3 subsequent siblings)
4 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-29 3:59 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Ravi Bangoria, Peter Zijlstra, Joe Mario, Stephane Eranian,
Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel, linux-perf-users,
Santosh Shukla, Ananth Narayan, Sandipan Das
IBS OP PMU on Zen5 supports Load Latency filtering. Decode and dump Load
Latency filtering related bits into perf script raw dump.
Also add a one-liner example to the perf-amd-ibs man page.
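As a rough illustration of the raw-dump change, the standalone sketch below mirrors how the optional LdLat suffix is built before being appended to the ibs_op_ctl line. The helper name format_ldlat_suffix() and its plain integer parameters are hypothetical stand-ins for the ldlat_cap flag and the reg.ldlat_thrsh / reg.ldlat_en fields used in the patch; it is not the perf code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the " LdLatThrsh NN LdLatEn N" suffix only
 * when the PMU advertises the ldlat capability. Otherwise the buffer is
 * left empty, so the ibs_op_ctl line prints exactly as it did before.
 */
static void format_ldlat_suffix(char *buf, size_t len, bool ldlat_cap,
				unsigned int thrsh, unsigned int en)
{
	buf[0] = '\0';
	if (ldlat_cap)
		snprintf(buf, len, " LdLatThrsh %2u LdLatEn %u", thrsh, en);
}
```

Keeping the suffix as an empty string on older hardware means a single printf() format can serve both cases, which is the same approach the existing L3MissOnly field takes.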
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/Documentation/perf-amd-ibs.txt | 9 +++++++++
tools/perf/util/amd-sample-raw.c | 14 ++++++++++++--
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-amd-ibs.txt b/tools/perf/Documentation/perf-amd-ibs.txt
index 2fd31d9d7b71..55f80beae037 100644
--- a/tools/perf/Documentation/perf-amd-ibs.txt
+++ b/tools/perf/Documentation/perf-amd-ibs.txt
@@ -85,6 +85,15 @@ System-wide profile, uOps event, sampling period: 100000, L3MissOnly (Zen4 onwar
# perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
+System-wide profile, cycles event, sampling period: 100000, LdLat filtering (Zen5
+onward)
+
+ # perf record -e ibs_op/ldlat=128/ -c 100000 -a
+
+ Supported load latency threshold values are 128 to 2048 (both inclusive).
+ Latency value which is a multiple of 128 incurs a little less profiling
+ overhead compared to other values.
+
Per process(upstream v6.2 onward), uOps event, sampling period: 100000
# perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index 9d0ce88e90e4..ac34b18ccc0c 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -19,6 +19,7 @@
static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
static bool zen4_ibs_extensions;
+static bool ldlat_cap;
static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
{
@@ -78,14 +79,20 @@ static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
static void pr_ibs_op_ctl(union ibs_op_ctl reg)
{
char l3_miss_only[sizeof(" L3MissOnly _")] = "";
+ char ldlat[sizeof(" LdLatThrsh __ LdLatEn _")] = "";
if (zen4_ibs_extensions)
snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
- printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
+ if (ldlat_cap) {
+ snprintf(ldlat, sizeof(ldlat), " LdLatThrsh %2d LdLatEn %d",
+ reg.ldlat_thrsh, reg.ldlat_en);
+ }
+
+ printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d%s\n",
reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, l3_miss_only,
reg.op_en, reg.op_val, reg.cnt_ctl,
- reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
+ reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt, ldlat);
}
static void pr_ibs_op_data(union ibs_op_data reg)
@@ -331,6 +338,9 @@ bool evlist__has_amd_ibs(struct evlist *evlist)
if (perf_env__find_pmu_cap(env, "ibs_op", "zen4_ibs_extensions"))
zen4_ibs_extensions = 1;
+ if (perf_env__find_pmu_cap(env, "ibs_op", "ldlat"))
+ ldlat_cap = 1;
+
if (ibs_fetch_type || ibs_op_type) {
if (!cpu_family)
parse_cpuid(env);
--
2.43.0
* [PATCH v4 2/4] perf amd ibs: Incorporate Zen5 DTLB and PageSize information
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
@ 2025-04-29 3:59 ` Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 3/4] perf mem/c2c amd: Add ldlat support Ravi Bangoria
` (2 subsequent siblings)
4 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-29 3:59 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Ravi Bangoria, Peter Zijlstra, Joe Mario, Stephane Eranian,
Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel, linux-perf-users,
Santosh Shukla, Ananth Narayan, Sandipan Das
IBS Op PMU on Zen5 reports DTLB and page size information differently
from prior generations.
  IBS_OP_DATA3 Zen3/4            Zen5
  ----------------------------------------------------------------
            19 IbsDcL2TlbHit1G   Reserved
  ----------------------------------------------------------------
             6 IbsDcL2tlbHit2M   Reserved
  ----------------------------------------------------------------
             5 IbsDcL1TlbHit1G   PageSize:
             4 IbsDcL1TlbHit2M     0 - 4K
                                   1 - 2M
                                   2 - 1G
                                   3 - Reserved
                                 Valid only if
                                 IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
             3 IbsDcL2TlbMiss    IbsDcL2TlbMiss
                                 Valid only if
                                 IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
             2 IbsDcL1tlbMiss    IbsDcL1tlbMiss
                                 Valid only if
                                 IbsDcPhyAddrValid = 1
  ----------------------------------------------------------------
The kernel exposes this change as the "dtlb_pgsize" capability in PMU sysfs.
Change the IBS register raw-dump logic according to the new bit definitions.
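To make the new Zen5 encoding concrete, here is a minimal standalone sketch that decodes the PageSize field from a raw IBS_OP_DATA3 value, using only the bit positions listed in the table (bits 5:4). The function name zen5_dc_page_size() is hypothetical, and the caller is assumed to pass the IbsDcPhyAddrValid state separately so that the sketch does not depend on that bit's position.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Zen5 layout from the table: IBS_OP_DATA3 bits 5:4 hold PageSize
 * (0 = 4K, 1 = 2M, 2 = 1G, 3 = reserved).
 * Returns NULL when the physical address is not valid, because the
 * field is only meaningful if IbsDcPhyAddrValid = 1.
 */
static const char *zen5_dc_page_size(uint64_t op_data3, bool phy_addr_valid)
{
	static const char * const sizes[] = { "4K", "2M", "1G", "??" };

	if (!phy_addr_valid)
		return NULL;
	return sizes[(op_data3 >> 4) & 0x3];
}
```

The patch computes the same index from the two existing bitfields, (dc_l1tlb_hit_1g << 1) | dc_l1tlb_hit_2m, which is equivalent to extracting bits 5:4 directly.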
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/util/amd-sample-raw.c | 63 ++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
index ac34b18ccc0c..022c9eb39509 100644
--- a/tools/perf/util/amd-sample-raw.c
+++ b/tools/perf/util/amd-sample-raw.c
@@ -20,6 +20,7 @@
static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
static bool zen4_ibs_extensions;
static bool ldlat_cap;
+static bool dtlb_pgsize_cap;
static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
{
@@ -161,9 +162,20 @@ static void pr_ibs_op_data2(union ibs_op_data2 reg)
static void pr_ibs_op_data3(union ibs_op_data3 reg)
{
- char l2_miss_str[sizeof(" L2Miss _")] = "";
- char op_mem_width_str[sizeof(" OpMemWidth _____ bytes")] = "";
+ static const char * const dc_page_sizes[] = {
+ " 4K",
+ " 2M",
+ " 1G",
+ " ??",
+ };
char op_dc_miss_open_mem_reqs_str[sizeof(" OpDcMissOpenMemReqs __")] = "";
+ char dc_l1_l2tlb_miss_str[sizeof(" DcL1TlbMiss _ DcL2TlbMiss _")] = "";
+ char dc_l1tlb_hit_str[sizeof(" DcL1TlbHit2M _ DcL1TlbHit1G _")] = "";
+ char op_mem_width_str[sizeof(" OpMemWidth _____ bytes")] = "";
+ char dc_l2tlb_hit_2m_str[sizeof(" DcL2TlbHit2M _")] = "";
+ char dc_l2tlb_hit_1g_str[sizeof(" DcL2TlbHit1G _")] = "";
+ char dc_page_size_str[sizeof(" DcPageSize ____")] = "";
+ char l2_miss_str[sizeof(" L2Miss _")] = "";
/*
* Erratum #1293
@@ -179,16 +191,40 @@ static void pr_ibs_op_data3(union ibs_op_data3 reg)
snprintf(op_mem_width_str, sizeof(op_mem_width_str),
" OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
- printf("ibs_op_data3:\t%016llx LdOp %d StOp %d DcL1TlbMiss %d DcL2TlbMiss %d "
- "DcL1TlbHit2M %d DcL1TlbHit1G %d DcL2TlbHit2M %d DcMiss %d DcMisAcc %d "
- "DcWcMemAcc %d DcUcMemAcc %d DcLockedOp %d DcMissNoMabAlloc %d DcLinAddrValid %d "
- "DcPhyAddrValid %d DcL2TlbHit1G %d%s SwPf %d%s%s DcMissLat %5d TlbRefillLat %5d\n",
- reg.val, reg.ld_op, reg.st_op, reg.dc_l1tlb_miss, reg.dc_l2tlb_miss,
- reg.dc_l1tlb_hit_2m, reg.dc_l1tlb_hit_1g, reg.dc_l2tlb_hit_2m, reg.dc_miss,
- reg.dc_mis_acc, reg.dc_wc_mem_acc, reg.dc_uc_mem_acc, reg.dc_locked_op,
- reg.dc_miss_no_mab_alloc, reg.dc_lin_addr_valid, reg.dc_phy_addr_valid,
- reg.dc_l2_tlb_hit_1g, l2_miss_str, reg.sw_pf, op_mem_width_str,
- op_dc_miss_open_mem_reqs_str, reg.dc_miss_lat, reg.tlb_refill_lat);
+ if (dtlb_pgsize_cap) {
+ if (reg.dc_phy_addr_valid) {
+ int idx = (reg.dc_l1tlb_hit_1g << 1) | reg.dc_l1tlb_hit_2m;
+
+ snprintf(dc_l1_l2tlb_miss_str, sizeof(dc_l1_l2tlb_miss_str),
+ " DcL1TlbMiss %d DcL2TlbMiss %d",
+ reg.dc_l1tlb_miss, reg.dc_l2tlb_miss);
+ snprintf(dc_page_size_str, sizeof(dc_page_size_str),
+ " DcPageSize %4s", dc_page_sizes[idx]);
+ }
+ } else {
+ snprintf(dc_l1_l2tlb_miss_str, sizeof(dc_l1_l2tlb_miss_str),
+ " DcL1TlbMiss %d DcL2TlbMiss %d",
+ reg.dc_l1tlb_miss, reg.dc_l2tlb_miss);
+ snprintf(dc_l1tlb_hit_str, sizeof(dc_l1tlb_hit_str),
+ " DcL1TlbHit2M %d DcL1TlbHit1G %d",
+ reg.dc_l1tlb_hit_2m, reg.dc_l1tlb_hit_1g);
+ snprintf(dc_l2tlb_hit_2m_str, sizeof(dc_l2tlb_hit_2m_str),
+ " DcL2TlbHit2M %d", reg.dc_l2tlb_hit_2m);
+ snprintf(dc_l2tlb_hit_1g_str, sizeof(dc_l2tlb_hit_1g_str),
+ " DcL2TlbHit1G %d", reg.dc_l2_tlb_hit_1g);
+ }
+
+ printf("ibs_op_data3:\t%016llx LdOp %d StOp %d%s%s%s DcMiss %d DcMisAcc %d "
+ "DcWcMemAcc %d DcUcMemAcc %d DcLockedOp %d DcMissNoMabAlloc %d "
+ "DcLinAddrValid %d DcPhyAddrValid %d%s%s SwPf %d%s%s "
+ "DcMissLat %5d TlbRefillLat %5d\n",
+ reg.val, reg.ld_op, reg.st_op, dc_l1_l2tlb_miss_str,
+ dtlb_pgsize_cap ? dc_page_size_str : dc_l1tlb_hit_str,
+ dc_l2tlb_hit_2m_str, reg.dc_miss, reg.dc_mis_acc, reg.dc_wc_mem_acc,
+ reg.dc_uc_mem_acc, reg.dc_locked_op, reg.dc_miss_no_mab_alloc,
+ reg.dc_lin_addr_valid, reg.dc_phy_addr_valid, dc_l2tlb_hit_1g_str,
+ l2_miss_str, reg.sw_pf, op_mem_width_str, op_dc_miss_open_mem_reqs_str,
+ reg.dc_miss_lat, reg.tlb_refill_lat);
}
/*
@@ -341,6 +377,9 @@ bool evlist__has_amd_ibs(struct evlist *evlist)
if (perf_env__find_pmu_cap(env, "ibs_op", "ldlat"))
ldlat_cap = 1;
+ if (perf_env__find_pmu_cap(env, "ibs_op", "dtlb_pgsize"))
+ dtlb_pgsize_cap = 1;
+
if (ibs_fetch_type || ibs_op_type) {
if (!cpu_family)
parse_cpuid(env);
--
2.43.0
* [PATCH v4 3/4] perf mem/c2c amd: Add ldlat support
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 2/4] perf amd ibs: Incorporate Zen5 DTLB and PageSize information Ravi Bangoria
@ 2025-04-29 3:59 ` Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 4/4] perf test amd ibs: Add sample period unit test Ravi Bangoria
2025-04-30 2:00 ` [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Arnaldo Carvalho de Melo
4 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-29 3:59 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Ravi Bangoria, Peter Zijlstra, Joe Mario, Stephane Eranian,
Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel, linux-perf-users,
Santosh Shukla, Ananth Narayan, Sandipan Das
perf mem and perf c2c use the IBS Op PMU on AMD platforms. The IBS Op PMU
on the Zen5 uarch adds support for Load Latency filtering. Implement perf
mem/c2c --ldlat using the IBS Op Load Latency filtering capability.
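For readers who want to check the capability outside of perf, the sketch below reads the sysfs file that the documentation hunks of this patch point to and tests whether it contains '1'. The helper name ibs_op_has_ldlat() is hypothetical; perf itself goes through perf_pmu__caps_parse() and the new perf_pmu__get_cap(), not through code like this.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Path documented in the perf-mem/perf-c2c man page text of this patch. */
#define IBS_OP_LDLAT_CAP "/sys/bus/event_source/devices/ibs_op/caps/ldlat"

/*
 * Hypothetical helper: return true if the capability file at 'path'
 * exists and its first line is exactly "1".
 */
static bool ibs_op_has_ldlat(const char *path)
{
	char buf[8] = "";
	FILE *fp = fopen(path, "r");
	bool ok;

	if (!fp)
		return false;	/* no such file: capability not advertised */
	ok = fgets(buf, sizeof(buf), fp) != NULL;
	fclose(fp);
	if (!ok)
		return false;
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */
	return strcmp(buf, "1") == 0;
}
```

A caller would typically invoke it as ibs_op_has_ldlat(IBS_OP_LDLAT_CAP); taking the path as a parameter just keeps the helper testable on machines without IBS.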
Some subtle differences between AMD and other architectures:
o --ldlat is disabled by default on AMD.
o Supported values are 128 to 2048 (both inclusive).
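The value constraints can be sketched as two small predicates. This is only an illustration of the documented range (128 to 2048, both inclusive, per the man page text in this patch) and of the note that multiples of 128 incur slightly less overhead; the names are hypothetical and the sketch says nothing about how the kernel enforces the range.

```c
#include <stdbool.h>

#define IBS_LDLAT_MIN	128u
#define IBS_LDLAT_MAX	2048u

/* True if 'ldlat' lies in the supported range (both ends inclusive). */
static bool ibs_ldlat_in_range(unsigned int ldlat)
{
	return ldlat >= IBS_LDLAT_MIN && ldlat <= IBS_LDLAT_MAX;
}

/*
 * True if 'ldlat' is in range and a multiple of 128, the case the man
 * page text describes as incurring a little less profiling overhead.
 */
static bool ibs_ldlat_is_low_overhead(unsigned int ldlat)
{
	return ibs_ldlat_in_range(ldlat) && ldlat % IBS_LDLAT_MIN == 0;
}
```

For example, --ldlat=150 (the value used in test_data_symbol.sh) is in range but not a multiple of 128, while --ldlat=256 satisfies both predicates.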
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/Documentation/perf-c2c.txt | 11 ++++++--
tools/perf/Documentation/perf-mem.txt | 13 ++++++++--
tools/perf/arch/x86/util/mem-events.c | 6 +++++
tools/perf/arch/x86/util/mem-events.h | 1 +
tools/perf/arch/x86/util/pmu.c | 20 ++++++++++++---
tools/perf/tests/shell/test_data_symbol.sh | 29 +++++++++++++++++++---
tools/perf/util/pmu.c | 11 ++++++++
tools/perf/util/pmu.h | 2 ++
8 files changed, 83 insertions(+), 10 deletions(-)
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 856f0dfb8e5a..f4af2dd6ab31 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -54,8 +54,15 @@ RECORD OPTIONS
-l::
--ldlat::
- Configure mem-loads latency. Supported on Intel and Arm64 processors
- only. Ignored on other archs.
+ Configure mem-loads latency. Supported on Intel, Arm64 and some AMD
+ processors. Ignored on other archs.
+
+ On supported AMD processors:
+ - /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
+ - Supported latency values are 128 to 2048 (both inclusive).
+ - Latency value which is a multiple of 128 incurs a little less profiling
+ overhead compared to other values.
+ - Load latency filtering is disabled by default.
-k::
--all-kernel::
diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 8a1bd9ff0f86..a9e3c71a2205 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -28,6 +28,8 @@ and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.
+On AMD this uses IBS Op PMU to sample load-store operations.
+
COMMON OPTIONS
--------------
-f::
@@ -67,8 +69,15 @@ RECORD OPTIONS
Configure all used events to run in user space.
--ldlat <n>::
- Specify desired latency for loads event. Supported on Intel and Arm64
- processors only. Ignored on other archs.
+ Specify desired latency for loads event. Supported on Intel, Arm64 and
+ some AMD processors. Ignored on other archs.
+
+ On supported AMD processors:
+ - /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
+ - Supported latency values are 128 to 2048 (both inclusive).
+ - Latency value which is a multiple of 128 incurs a little less profiling
+ overhead compared to other values.
+ - Load latency filtering is disabled by default.
REPORT OPTIONS
--------------
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 62df03e91c7e..b38f519020ff 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -26,3 +26,9 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
E(NULL, NULL, NULL, false, 0),
E("mem-ldst", "%s//", NULL, false, 0),
};
+
+struct perf_mem_event perf_mem_events_amd_ldlat[PERF_MEM_EVENTS__MAX] = {
+ E(NULL, NULL, NULL, false, 0),
+ E(NULL, NULL, NULL, false, 0),
+ E("mem-ldst", "%s/ldlat=%u/", NULL, true, 0),
+};
diff --git a/tools/perf/arch/x86/util/mem-events.h b/tools/perf/arch/x86/util/mem-events.h
index f55c8d3b7d59..11e09a256f5b 100644
--- a/tools/perf/arch/x86/util/mem-events.h
+++ b/tools/perf/arch/x86/util/mem-events.h
@@ -6,5 +6,6 @@ extern struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX];
extern struct perf_mem_event perf_mem_events_intel_aux[PERF_MEM_EVENTS__MAX];
extern struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX];
+extern struct perf_mem_event perf_mem_events_amd_ldlat[PERF_MEM_EVENTS__MAX];
#endif /* _X86_MEM_EVENTS_H */
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index e0060dac2a9f..8712cbbbc712 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -18,8 +18,10 @@
#include "mem-events.h"
#include "util/env.h"
-void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
+void perf_pmu__arch_init(struct perf_pmu *pmu)
{
+ struct perf_pmu_caps *ldlat_cap;
+
#ifdef HAVE_AUXTRACE_SUPPORT
if (!strcmp(pmu->name, INTEL_PT_PMU_NAME)) {
pmu->auxtrace = true;
@@ -33,8 +35,20 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
#endif
if (x86__is_amd_cpu()) {
- if (!strcmp(pmu->name, "ibs_op"))
- pmu->mem_events = perf_mem_events_amd;
+ if (strcmp(pmu->name, "ibs_op"))
+ return;
+
+ pmu->mem_events = perf_mem_events_amd;
+
+ if (!perf_pmu__caps_parse(pmu))
+ return;
+
+ ldlat_cap = perf_pmu__get_cap(pmu, "ldlat");
+ if (!ldlat_cap || strcmp(ldlat_cap->value, "1"))
+ return;
+
+ perf_mem_events__loads_ldlat = 0;
+ pmu->mem_events = perf_mem_events_amd_ldlat;
} else if (pmu->is_core) {
if (perf_pmu__have_event(pmu, "mem-loads-aux"))
pmu->mem_events = perf_mem_events_intel_aux;
diff --git a/tools/perf/tests/shell/test_data_symbol.sh b/tools/perf/tests/shell/test_data_symbol.sh
index bbe8277496ae..d61b5659a46d 100755
--- a/tools/perf/tests/shell/test_data_symbol.sh
+++ b/tools/perf/tests/shell/test_data_symbol.sh
@@ -54,11 +54,34 @@ trap cleanup_files exit term int
echo "Recording workload..."
-# perf mem/c2c internally uses IBS PMU on AMD CPU which doesn't support
-# user/kernel filtering and per-process monitoring, spin program on
-# specific CPU and test in per-CPU mode.
is_amd=$(grep -E -c 'vendor_id.*AuthenticAMD' /proc/cpuinfo)
if (($is_amd >= 1)); then
+ mem_events="$(perf mem record -v -e list 2>&1)"
+ if ! [[ "$mem_events" =~ ^mem\-ldst.*ibs_op/(.*)/.*available ]]; then
+ echo "ERROR: mem-ldst event is not matching"
+ exit 1
+ fi
+
+ # --ldlat on AMD:
+ # o Zen4 and earlier uarch does not support ldlat
+ # o Even on supported platforms, it's disabled (--ldlat=0) by default.
+ ldlat=${BASH_REMATCH[1]}
+ if [[ -n $ldlat ]]; then
+ if ! [[ "$ldlat" =~ ldlat=0 ]]; then
+ echo "ERROR: ldlat not initialized to 0?"
+ exit 1
+ fi
+
+ mem_events="$(perf mem record -v --ldlat=150 -e list 2>&1)"
+ if ! [[ "$mem_events" =~ ^mem-ldst.*ibs_op/ldlat=150/.*available ]]; then
+ echo "ERROR: --ldlat not honored?"
+ exit 1
+ fi
+ fi
+
+ # perf mem/c2c internally uses IBS PMU on AMD CPU which doesn't
+ # support user/kernel filtering and per-process monitoring on older
+ # kernels, spin program on specific CPU and test in per-CPU mode.
perf mem record -vvv -o ${PERF_DATA} -C 0 -- taskset -c 0 $TEST_PROGRAM 2>"${ERR_FILE}"
else
perf mem record -vvv --all-user -o ${PERF_DATA} -- $TEST_PROGRAM 2>"${ERR_FILE}"
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index bbb906bb2159..d08972aa461c 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -2259,6 +2259,17 @@ static void perf_pmu__del_caps(struct perf_pmu *pmu)
}
}
+struct perf_pmu_caps *perf_pmu__get_cap(struct perf_pmu *pmu, const char *name)
+{
+ struct perf_pmu_caps *caps;
+
+ list_for_each_entry(caps, &pmu->caps, list) {
+ if (!strcmp(caps->name, name))
+ return caps;
+ }
+ return NULL;
+}
+
/*
* Reading/parsing the given pmu capabilities, which should be located at:
* /sys/bus/event_source/devices/<dev>/caps as sysfs group attributes.
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 13dd3511f504..a1fdd6d50c53 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -277,6 +277,8 @@ bool pmu_uncore_identifier_match(const char *compat, const char *id);
int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
+struct perf_pmu_caps *perf_pmu__get_cap(struct perf_pmu *pmu, const char *name);
+
int perf_pmu__caps_parse(struct perf_pmu *pmu);
void perf_pmu__warn_invalid_config(struct perf_pmu *pmu, __u64 config,
--
2.43.0
* [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
` (2 preceding siblings ...)
2025-04-29 3:59 ` [PATCH v4 3/4] perf mem/c2c amd: Add ldlat support Ravi Bangoria
@ 2025-04-29 3:59 ` Ravi Bangoria
2025-04-29 20:55 ` Arnaldo Carvalho de Melo
2025-04-30 2:00 ` [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Arnaldo Carvalho de Melo
4 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-29 3:59 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim
Cc: Ravi Bangoria, Peter Zijlstra, Joe Mario, Stephane Eranian,
Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel, linux-perf-users,
Santosh Shukla, Ananth Narayan, Sandipan Das
IBS Fetch and IBS Op PMUs have various constraints on supported sample
periods. Add perf unit tests to cover them.
Running it in parallel with other tests causes intermittent failures.
Mark it exclusive to force it to run sequentially. Sample output on a
Zen5 machine:
Without kernel fixes:
$ sudo ./perf test -vv 112
112: AMD IBS sample period:
--- start ---
test child forked, pid 8774
Using CPUID AuthenticAMD-26-2-1
IBS config tests:
-----------------
Fetch PMU tests:
0xffff : Ok (nr samples: 1078)
0x1000 : Ok (nr samples: 17030)
0xff : Ok (nr samples: 41068)
0x1 : Ok (nr samples: 40543)
0x0 : Ok
0x10000 : Ok
Op PMU tests:
0x0 : Ok
0x1 : Fail
0x8 : Fail
0x9 : Ok (nr samples: 40543)
0xf : Ok (nr samples: 40543)
0x1000 : Ok (nr samples: 18736)
0xffff : Ok (nr samples: 1168)
0x10000 : Ok
0x100000 : Fail (nr samples: 14)
0xf00000 : Fail (nr samples: 1)
0xf0ffff : Fail (nr samples: 1)
0x1f0ffff : Fail (nr samples: 1)
0x7f0ffff : Fail (nr samples: 0)
0x8f0ffff : Ok
0x17f0ffff : Ok
IBS sample period constraint tests:
-----------------------------------
Fetch PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Fail
freq 0, sample_freq 15: Fail
freq 0, sample_freq 16: Ok (nr samples: 1604)
freq 0, sample_freq 17: Ok (nr samples: 1604)
freq 0, sample_freq 143: Ok (nr samples: 1604)
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1566)
freq 0, sample_freq 4103: Ok (nr samples: 1119)
freq 0, sample_freq 65520: Ok (nr samples: 2264)
freq 0, sample_freq 65535: Ok (nr samples: 2263)
freq 0, sample_freq 65552: Ok (nr samples: 1166)
freq 0, sample_freq 8388607: Ok (nr samples: 268)
freq 0, sample_freq 268435455: Ok (nr samples: 8)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Ok (nr samples: 4)
freq 1, sample_freq 15: Ok (nr samples: 4)
freq 1, sample_freq 16: Ok (nr samples: 4)
freq 1, sample_freq 17: Ok (nr samples: 4)
freq 1, sample_freq 143: Ok (nr samples: 5)
freq 1, sample_freq 144: Ok (nr samples: 5)
freq 1, sample_freq 145: Ok (nr samples: 5)
freq 1, sample_freq 1234: Ok (nr samples: 7)
freq 1, sample_freq 4103: Ok (nr samples: 35)
freq 1, sample_freq 65520: Ok (nr samples: 642)
freq 1, sample_freq 65535: Ok (nr samples: 636)
freq 1, sample_freq 65552: Ok (nr samples: 651)
freq 1, sample_freq 8388607: Ok
Op PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Fail
freq 0, sample_freq 15: Fail
freq 0, sample_freq 16: Fail
freq 0, sample_freq 17: Fail
freq 0, sample_freq 143: Fail
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1604)
freq 0, sample_freq 4103: Ok (nr samples: 1604)
freq 0, sample_freq 65520: Ok (nr samples: 2227)
freq 0, sample_freq 65535: Ok (nr samples: 2296)
freq 0, sample_freq 65552: Ok (nr samples: 2213)
freq 0, sample_freq 8388607: Ok (nr samples: 250)
freq 0, sample_freq 268435455: Ok (nr samples: 8)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Fail (nr samples: 4)
freq 1, sample_freq 15: Fail (nr samples: 4)
freq 1, sample_freq 16: Fail (nr samples: 4)
freq 1, sample_freq 17: Fail (nr samples: 4)
freq 1, sample_freq 143: Fail (nr samples: 5)
freq 1, sample_freq 144: Fail (nr samples: 5)
freq 1, sample_freq 145: Fail (nr samples: 5)
freq 1, sample_freq 1234: Fail (nr samples: 8)
freq 1, sample_freq 4103: Fail (nr samples: 33)
freq 1, sample_freq 65520: Fail (nr samples: 546)
freq 1, sample_freq 65535: Fail (nr samples: 544)
freq 1, sample_freq 65552: Fail (nr samples: 555)
freq 1, sample_freq 8388607: Ok
IBS ioctl() tests:
------------------
Fetch PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Fail
ioctl(period = 0xf ): Fail
ioctl(period = 0x10 ): Ok
ioctl(period = 0x11 ): Fail
ioctl(period = 0x1f ): Fail
ioctl(period = 0x20 ): Ok
ioctl(period = 0x80 ): Ok
ioctl(period = 0x8f ): Fail
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Fail
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Fail
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Fail
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
Op PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Fail
ioctl(period = 0xf ): Fail
ioctl(period = 0x10 ): Fail
ioctl(period = 0x11 ): Fail
ioctl(period = 0x1f ): Fail
ioctl(period = 0x20 ): Fail
ioctl(period = 0x80 ): Fail
ioctl(period = 0x8f ): Fail
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Fail
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Fail
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Fail
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
IBS freq (negative) tests:
--------------------------
freq 1, sample_freq 200000: Fail
IBS L3MissOnly test: (takes a while)
--------------------
Fetch L3MissOnly: Fail (nr_samples: 1213)
Op L3MissOnly: Ok (nr_samples: 1193)
---- end(-1) ----
112: AMD IBS sample period : FAILED!
With kernel fixes:
$ sudo ./perf test -vv 112
112: AMD IBS sample period:
--- start ---
test child forked, pid 6939
Using CPUID AuthenticAMD-26-2-1
IBS config tests:
-----------------
Fetch PMU tests:
0xffff : Ok (nr samples: 969)
0x1000 : Ok (nr samples: 15540)
0xff : Ok (nr samples: 40555)
0x1 : Ok (nr samples: 40543)
0x0 : Ok
0x10000 : Ok
Op PMU tests:
0x0 : Ok
0x1 : Ok
0x8 : Ok
0x9 : Ok (nr samples: 40543)
0xf : Ok (nr samples: 40543)
0x1000 : Ok (nr samples: 19156)
0xffff : Ok (nr samples: 1169)
0x10000 : Ok
0x100000 : Ok (nr samples: 1151)
0xf00000 : Ok (nr samples: 76)
0xf0ffff : Ok (nr samples: 73)
0x1f0ffff : Ok (nr samples: 33)
0x7f0ffff : Ok (nr samples: 10)
0x8f0ffff : Ok
0x17f0ffff : Ok
IBS sample period constraint tests:
-----------------------------------
Fetch PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Ok
freq 0, sample_freq 15: Ok
freq 0, sample_freq 16: Ok (nr samples: 1203)
freq 0, sample_freq 17: Ok (nr samples: 1604)
freq 0, sample_freq 143: Ok (nr samples: 1604)
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1604)
freq 0, sample_freq 4103: Ok (nr samples: 1343)
freq 0, sample_freq 65520: Ok (nr samples: 2254)
freq 0, sample_freq 65535: Ok (nr samples: 2136)
freq 0, sample_freq 65552: Ok (nr samples: 1158)
freq 0, sample_freq 8388607: Ok (nr samples: 257)
freq 0, sample_freq 268435455: Ok (nr samples: 8)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Ok (nr samples: 4)
freq 1, sample_freq 15: Ok (nr samples: 4)
freq 1, sample_freq 16: Ok (nr samples: 4)
freq 1, sample_freq 17: Ok (nr samples: 4)
freq 1, sample_freq 143: Ok (nr samples: 5)
freq 1, sample_freq 144: Ok (nr samples: 5)
freq 1, sample_freq 145: Ok (nr samples: 5)
freq 1, sample_freq 1234: Ok (nr samples: 8)
freq 1, sample_freq 4103: Ok (nr samples: 34)
freq 1, sample_freq 65520: Ok (nr samples: 458)
freq 1, sample_freq 65535: Ok (nr samples: 628)
freq 1, sample_freq 65552: Ok (nr samples: 396)
freq 1, sample_freq 8388607: Ok
Op PMU test:
freq 0, sample_freq 0: Ok
freq 0, sample_freq 1: Ok
freq 0, sample_freq 15: Ok
freq 0, sample_freq 16: Ok
freq 0, sample_freq 17: Ok
freq 0, sample_freq 143: Ok
freq 0, sample_freq 144: Ok (nr samples: 1604)
freq 0, sample_freq 145: Ok (nr samples: 1604)
freq 0, sample_freq 1234: Ok (nr samples: 1604)
freq 0, sample_freq 4103: Ok (nr samples: 1604)
freq 0, sample_freq 65520: Ok (nr samples: 2250)
freq 0, sample_freq 65535: Ok (nr samples: 2158)
freq 0, sample_freq 65552: Ok (nr samples: 2296)
freq 0, sample_freq 8388607: Ok (nr samples: 243)
freq 0, sample_freq 268435455: Ok (nr samples: 6)
freq 1, sample_freq 0: Ok
freq 1, sample_freq 1: Ok (nr samples: 4)
freq 1, sample_freq 15: Ok (nr samples: 4)
freq 1, sample_freq 16: Ok (nr samples: 4)
freq 1, sample_freq 17: Ok (nr samples: 4)
freq 1, sample_freq 143: Ok (nr samples: 4)
freq 1, sample_freq 144: Ok (nr samples: 5)
freq 1, sample_freq 145: Ok (nr samples: 4)
freq 1, sample_freq 1234: Ok (nr samples: 6)
freq 1, sample_freq 4103: Ok (nr samples: 27)
freq 1, sample_freq 65520: Ok (nr samples: 542)
freq 1, sample_freq 65535: Ok (nr samples: 550)
freq 1, sample_freq 65552: Ok (nr samples: 552)
freq 1, sample_freq 8388607: Ok
IBS ioctl() tests:
------------------
Fetch PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Ok
ioctl(period = 0xf ): Ok
ioctl(period = 0x10 ): Ok
ioctl(period = 0x11 ): Ok
ioctl(period = 0x1f ): Ok
ioctl(period = 0x20 ): Ok
ioctl(period = 0x80 ): Ok
ioctl(period = 0x8f ): Ok
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Ok
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Ok
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Ok
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
Op PMU tests
ioctl(period = 0x0 ): Ok
ioctl(period = 0x1 ): Ok
ioctl(period = 0xf ): Ok
ioctl(period = 0x10 ): Ok
ioctl(period = 0x11 ): Ok
ioctl(period = 0x1f ): Ok
ioctl(period = 0x20 ): Ok
ioctl(period = 0x80 ): Ok
ioctl(period = 0x8f ): Ok
ioctl(period = 0x90 ): Ok
ioctl(period = 0x91 ): Ok
ioctl(period = 0x100 ): Ok
ioctl(period = 0xfff0 ): Ok
ioctl(period = 0xffff ): Ok
ioctl(period = 0x10000 ): Ok
ioctl(period = 0x1fff0 ): Ok
ioctl(period = 0x1fff5 ): Ok
ioctl(freq = 0x0 ): Ok
ioctl(freq = 0x1 ): Ok
ioctl(freq = 0xf ): Ok
ioctl(freq = 0x10 ): Ok
ioctl(freq = 0x11 ): Ok
ioctl(freq = 0x1f ): Ok
ioctl(freq = 0x20 ): Ok
ioctl(freq = 0x80 ): Ok
ioctl(freq = 0x8f ): Ok
ioctl(freq = 0x90 ): Ok
ioctl(freq = 0x91 ): Ok
ioctl(freq = 0x100 ): Ok
IBS freq (negative) tests:
--------------------------
freq 1, sample_freq 200000: Ok
IBS L3MissOnly test: (takes a while)
--------------------
Fetch L3MissOnly: Ok (nr_samples: 1301)
Op L3MissOnly: Ok (nr_samples: 1590)
---- end(0) ----
112: AMD IBS sample period : Ok
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
---
tools/perf/arch/x86/include/arch-tests.h | 1 +
tools/perf/arch/x86/tests/Build | 1 +
tools/perf/arch/x86/tests/amd-ibs-period.c | 1001 ++++++++++++++++++++
tools/perf/arch/x86/tests/arch-tests.c | 2 +
4 files changed, 1005 insertions(+)
create mode 100644 tools/perf/arch/x86/tests/amd-ibs-period.c
diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
index c0421a26b875..4fd425157d7d 100644
--- a/tools/perf/arch/x86/include/arch-tests.h
+++ b/tools/perf/arch/x86/include/arch-tests.h
@@ -14,6 +14,7 @@ int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
int test__bp_modify(struct test_suite *test, int subtest);
int test__x86_sample_parsing(struct test_suite *test, int subtest);
int test__amd_ibs_via_core_pmu(struct test_suite *test, int subtest);
+int test__amd_ibs_period(struct test_suite *test, int subtest);
int test__hybrid(struct test_suite *test, int subtest);
extern struct test_suite *arch_tests[];
diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
index 86262c720857..5e00cbfd2d56 100644
--- a/tools/perf/arch/x86/tests/Build
+++ b/tools/perf/arch/x86/tests/Build
@@ -10,6 +10,7 @@ perf-test-$(CONFIG_AUXTRACE) += insn-x86.o
endif
perf-test-$(CONFIG_X86_64) += bp-modify.o
perf-test-y += amd-ibs-via-core-pmu.o
+perf-test-y += amd-ibs-period.o
ifdef SHELLCHECK
SHELL_TESTS := gen-insn-x86-dat.sh
diff --git a/tools/perf/arch/x86/tests/amd-ibs-period.c b/tools/perf/arch/x86/tests/amd-ibs-period.c
new file mode 100644
index 000000000000..0cf3656e4b9b
--- /dev/null
+++ b/tools/perf/arch/x86/tests/amd-ibs-period.c
@@ -0,0 +1,1001 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <sched.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <string.h>
+
+#include "arch-tests.h"
+#include "linux/perf_event.h"
+#include "linux/zalloc.h"
+#include "tests/tests.h"
+#include "../perf-sys.h"
+#include "pmu.h"
+#include "pmus.h"
+#include "debug.h"
+#include "util.h"
+#include "strbuf.h"
+#include "../util/env.h"
+
+#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+
+#define PERF_MMAP_DATA_PAGES 32L
+#define PERF_MMAP_DATA_SIZE (PERF_MMAP_DATA_PAGES * PAGE_SIZE)
+#define PERF_MMAP_DATA_MASK (PERF_MMAP_DATA_SIZE - 1)
+#define PERF_MMAP_TOTAL_PAGES (PERF_MMAP_DATA_PAGES + 1)
+#define PERF_MMAP_TOTAL_SIZE (PERF_MMAP_TOTAL_PAGES * PAGE_SIZE)
+
+#define rmb() asm volatile("lfence":::"memory")
+
+enum {
+ FD_ERROR,
+ FD_SUCCESS,
+};
+
+enum {
+ IBS_FETCH,
+ IBS_OP,
+};
+
+struct perf_pmu *fetch_pmu;
+struct perf_pmu *op_pmu;
+unsigned int perf_event_max_sample_rate;
+
+/* Dummy workload to generate IBS samples. */
+static int dummy_workload_1(unsigned long count)
+{
+ int (*func)(void);
+ int ret = 0;
+ char *p;
+ char insn1[] = {
+ 0xb8, 0x01, 0x00, 0x00, 0x00, /* mov 1,%eax */
+ 0xc3, /* ret */
+ 0xcc, /* int 3 */
+ };
+
+ char insn2[] = {
+ 0xb8, 0x02, 0x00, 0x00, 0x00, /* mov 2,%eax */
+ 0xc3, /* ret */
+ 0xcc, /* int 3 */
+ };
+
+ p = zalloc(2 * PAGE_SIZE);
+ if (!p) {
+ printf("zalloc() failed. %m");
+ return 1;
+ }
+
+ func = (void *)((unsigned long)(p + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
+
+ ret = mprotect(func, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC);
+ if (ret) {
+ printf("mprotect() failed. %m");
+ goto out;
+ }
+
+ if (count < 100000)
+ count = 100000;
+ else if (count > 10000000)
+ count = 10000000;
+ while (count--) {
+ memcpy(func, insn1, sizeof(insn1));
+ if (func() != 1) {
+ pr_debug("ERROR insn1\n");
+ ret = -1;
+ goto out;
+ }
+ memcpy(func, insn2, sizeof(insn2));
+ if (func() != 2) {
+ pr_debug("ERROR insn2\n");
+ ret = -1;
+ goto out;
+ }
+ }
+
+out:
+ free(p);
+ return ret;
+}
+
+/* Another dummy workload to generate IBS samples. */
+static void dummy_workload_2(char *perf)
+{
+ char bench[] = " bench sched messaging -g 10 -l 5000 > /dev/null 2>&1";
+ char taskset[] = "taskset -c 0 ";
+ int ret __maybe_unused;
+ struct strbuf sb;
+ char *cmd;
+
+ strbuf_init(&sb, 0);
+ strbuf_add(&sb, taskset, strlen(taskset));
+ strbuf_add(&sb, perf, strlen(perf));
+ strbuf_add(&sb, bench, strlen(bench));
+ cmd = strbuf_detach(&sb, NULL);
+ ret = system(cmd);
+ free(cmd);
+}
+
+static int sched_affine(int cpu)
+{
+ cpu_set_t set;
+
+ CPU_ZERO(&set);
+ CPU_SET(cpu, &set);
+ if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
+ pr_debug("sched_setaffinity() failed. [%m]");
+ return -1;
+ }
+ return 0;
+}
+
+static void
+copy_sample_data(void *src, unsigned long offset, void *dest, size_t size)
+{
+ size_t chunk1_size, chunk2_size;
+
+ if ((offset + size) < (size_t)PERF_MMAP_DATA_SIZE) {
+ memcpy(dest, src + offset, size);
+ } else {
+ chunk1_size = PERF_MMAP_DATA_SIZE - offset;
+ chunk2_size = size - chunk1_size;
+
+ memcpy(dest, src + offset, chunk1_size);
+ memcpy(dest + chunk1_size, src, chunk2_size);
+ }
+}
+
+static int rb_read(struct perf_event_mmap_page *rb, void *dest, size_t size)
+{
+ void *base;
+ unsigned long data_tail, data_head;
+
+ /* Casting to (void *) is needed. */
+ base = (void *)rb + PAGE_SIZE;
+
+ data_head = rb->data_head;
+ rmb();
+ data_tail = rb->data_tail;
+
+ if ((data_head - data_tail) < size)
+ return -1;
+
+ data_tail &= PERF_MMAP_DATA_MASK;
+ copy_sample_data(base, data_tail, dest, size);
+ rb->data_tail += size;
+ return 0;
+}
+
+static void rb_skip(struct perf_event_mmap_page *rb, size_t size)
+{
+ size_t data_head = rb->data_head;
+
+ rmb();
+
+ if ((rb->data_tail + size) > data_head)
+ rb->data_tail = data_head;
+ else
+ rb->data_tail += size;
+}
+
+/* Sample period value taken from perf sample must match with expected value. */
+static int period_equal(unsigned long exp_period, unsigned long act_period)
+{
+ return exp_period == act_period ? 0 : -1;
+}
+
+/*
+ * Sample period value taken from perf sample must be >= minimum sample period
+ * supported by IBS HW.
+ */
+static int period_higher(unsigned long min_period, unsigned long act_period)
+{
+ return min_period <= act_period ? 0 : -1;
+}
+
+static int rb_drain_samples(struct perf_event_mmap_page *rb,
+ unsigned long exp_period,
+ int *nr_samples,
+ int (*callback)(unsigned long, unsigned long))
+{
+ struct perf_event_header hdr;
+ unsigned long period;
+ int ret = 0;
+
+ /*
+ * PERF_RECORD_SAMPLE:
+ * struct {
+ * struct perf_event_header hdr;
+ * { u64 period; } && PERF_SAMPLE_PERIOD
+ * };
+ */
+ while (1) {
+ if (rb_read(rb, &hdr, sizeof(hdr)))
+ return ret;
+
+ if (hdr.type == PERF_RECORD_SAMPLE) {
+ (*nr_samples)++;
+ period = 0;
+ if (rb_read(rb, &period, sizeof(period)))
+ pr_debug("rb_read(period) error. [%m]");
+ ret |= callback(exp_period, period);
+ } else {
+ rb_skip(rb, hdr.size - sizeof(hdr));
+ }
+ }
+ return ret;
+}
+
+static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
+ int cpu, int group_fd, unsigned long flags)
+{
+ return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+static void fetch_prepare_attr(struct perf_event_attr *attr,
+ unsigned long long config, int freq,
+ unsigned long sample_period)
+{
+ memset(attr, 0, sizeof(struct perf_event_attr));
+
+ attr->type = fetch_pmu->type;
+ attr->size = sizeof(struct perf_event_attr);
+ attr->config = config;
+ attr->disabled = 1;
+ attr->sample_type = PERF_SAMPLE_PERIOD;
+ attr->freq = freq;
+ attr->sample_period = sample_period; /* = ->sample_freq */
+}
+
+static void op_prepare_attr(struct perf_event_attr *attr,
+ unsigned long config, int freq,
+ unsigned long sample_period)
+{
+ memset(attr, 0, sizeof(struct perf_event_attr));
+
+ attr->type = op_pmu->type;
+ attr->size = sizeof(struct perf_event_attr);
+ attr->config = config;
+ attr->disabled = 1;
+ attr->sample_type = PERF_SAMPLE_PERIOD;
+ attr->freq = freq;
+ attr->sample_period = sample_period; /* = ->sample_freq */
+}
+
+struct ibs_configs {
+ /* Input */
+ unsigned long config;
+
+ /* Expected output */
+ unsigned long period;
+ int fd;
+};
+
+/*
+ * Somehow first Fetch event with sample period = 0x10 causes 0
+ * samples. So start with large period and decrease it gradually.
+ */
+struct ibs_configs fetch_configs[] = {
+ { .config = 0xffff, .period = 0xffff0, .fd = FD_SUCCESS },
+ { .config = 0x1000, .period = 0x10000, .fd = FD_SUCCESS },
+ { .config = 0xff, .period = 0xff0, .fd = FD_SUCCESS },
+ { .config = 0x1, .period = 0x10, .fd = FD_SUCCESS },
+ { .config = 0x0, .period = -1, .fd = FD_ERROR },
+ { .config = 0x10000, .period = -1, .fd = FD_ERROR },
+};
+
+struct ibs_configs op_configs[] = {
+ { .config = 0x0, .period = -1, .fd = FD_ERROR },
+ { .config = 0x1, .period = -1, .fd = FD_ERROR },
+ { .config = 0x8, .period = -1, .fd = FD_ERROR },
+ { .config = 0x9, .period = 0x90, .fd = FD_SUCCESS },
+ { .config = 0xf, .period = 0xf0, .fd = FD_SUCCESS },
+ { .config = 0x1000, .period = 0x10000, .fd = FD_SUCCESS },
+ { .config = 0xffff, .period = 0xffff0, .fd = FD_SUCCESS },
+ { .config = 0x10000, .period = -1, .fd = FD_ERROR },
+ { .config = 0x100000, .period = 0x100000, .fd = FD_SUCCESS },
+ { .config = 0xf00000, .period = 0xf00000, .fd = FD_SUCCESS },
+ { .config = 0xf0ffff, .period = 0xfffff0, .fd = FD_SUCCESS },
+ { .config = 0x1f0ffff, .period = 0x1fffff0, .fd = FD_SUCCESS },
+ { .config = 0x7f0ffff, .period = 0x7fffff0, .fd = FD_SUCCESS },
+ { .config = 0x8f0ffff, .period = -1, .fd = FD_ERROR },
+ { .config = 0x17f0ffff, .period = -1, .fd = FD_ERROR },
+};
+
+static int __ibs_config_test(int ibs_type, struct ibs_configs *config, int *nr_samples)
+{
+ struct perf_event_attr attr;
+ int fd, i;
+ void *rb;
+ int ret = 0;
+
+ if (ibs_type == IBS_FETCH)
+ fetch_prepare_attr(&attr, config->config, 0, 0);
+ else
+ op_prepare_attr(&attr, config->config, 0, 0);
+
+ /* CPU0, All processes */
+ fd = perf_event_open(&attr, -1, 0, -1, 0);
+ if (config->fd == FD_ERROR) {
+ if (fd != -1) {
+ close(fd);
+ return -1;
+ }
+ return 0;
+ }
+ if (fd <= -1)
+ return -1;
+
+ rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
+ MAP_SHARED, fd, 0);
+ if (rb == MAP_FAILED) {
+ pr_debug("mmap() failed. [%m]\n");
+ return -1;
+ }
+
+ ioctl(fd, PERF_EVENT_IOC_RESET, 0);
+ ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
+
+ i = 5;
+ while (i--) {
+ dummy_workload_1(1000000);
+
+ ret = rb_drain_samples(rb, config->period, nr_samples,
+ period_equal);
+ if (ret)
+ break;
+ }
+
+ ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
+ munmap(rb, PERF_MMAP_TOTAL_SIZE);
+ close(fd);
+ return ret;
+}
+
+static int ibs_config_test(void)
+{
+ int nr_samples = 0;
+ unsigned long i;
+ int ret = 0;
+ int r;
+
+ pr_debug("\nIBS config tests:\n");
+ pr_debug("-----------------\n");
+
+ pr_debug("Fetch PMU tests:\n");
+ for (i = 0; i < ARRAY_SIZE(fetch_configs); i++) {
+ nr_samples = 0;
+ r = __ibs_config_test(IBS_FETCH, &(fetch_configs[i]), &nr_samples);
+
+ if (fetch_configs[i].fd == FD_ERROR) {
+ pr_debug("0x%-16lx: %-4s\n", fetch_configs[i].config,
+ !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("0x%-16lx: %-4s (nr samples: %d)\n", fetch_configs[i].config,
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+
+ ret |= r;
+ }
+
+ pr_debug("Op PMU tests:\n");
+ for (i = 0; i < ARRAY_SIZE(op_configs); i++) {
+ nr_samples = 0;
+ r = __ibs_config_test(IBS_OP, &(op_configs[i]), &nr_samples);
+
+ if (op_configs[i].fd == FD_ERROR) {
+ pr_debug("0x%-16lx: %-4s\n", op_configs[i].config,
+ !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("0x%-16lx: %-4s (nr samples: %d)\n", op_configs[i].config,
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+
+ ret |= r;
+ }
+
+ return ret;
+}
+
+struct ibs_period {
+ /* Input */
+ int freq;
+ unsigned long sample_freq;
+
+ /* Output */
+ int ret;
+ unsigned long period;
+};
+
+struct ibs_period fetch_period[] = {
+ { .freq = 0, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 1, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0xf, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 0, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 0, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x80 },
+ { .freq = 0, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 0, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 0, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x4d0 },
+ { .freq = 0, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x1000 },
+ { .freq = 0, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0xfff0 },
+ { .freq = 0, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0xfff0 },
+ { .freq = 0, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10010 },
+ { .freq = 0, .sample_freq = 0x7fffff, .ret = FD_SUCCESS, .period = 0x7ffff0 },
+ { .freq = 0, .sample_freq = 0xfffffff, .ret = FD_SUCCESS, .period = 0xffffff0 },
+ { .freq = 1, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
+ { .freq = 1, .sample_freq = 1, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0xf, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0x10 },
+ { .freq = 1, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10 },
+ /* ret=FD_ERROR because freq > default perf_event_max_sample_rate (100000) */
+ { .freq = 1, .sample_freq = 0x7fffff, .ret = FD_ERROR, .period = -1 },
+};
+
+struct ibs_period op_period[] = {
+ { .freq = 0, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 1, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0xf, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0x10, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0x11, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0x8f, .ret = FD_ERROR, .period = -1 },
+ { .freq = 0, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 0, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 0, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x4d0 },
+ { .freq = 0, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x1000 },
+ { .freq = 0, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0xfff0 },
+ { .freq = 0, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0xfff0 },
+ { .freq = 0, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10010 },
+ { .freq = 0, .sample_freq = 0x7fffff, .ret = FD_SUCCESS, .period = 0x7ffff0 },
+ { .freq = 0, .sample_freq = 0xfffffff, .ret = FD_SUCCESS, .period = 0xffffff0 },
+ { .freq = 1, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
+ { .freq = 1, .sample_freq = 1, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0xf, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0x90 },
+ { .freq = 1, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x90 },
+ /* ret=FD_ERROR because freq > default perf_event_max_sample_rate (100000) */
+ { .freq = 1, .sample_freq = 0x7fffff, .ret = FD_ERROR, .period = -1 },
+};
+
+static int __ibs_period_constraint_test(int ibs_type, struct ibs_period *period,
+ int *nr_samples)
+{
+ struct perf_event_attr attr;
+ int ret = 0;
+ void *rb;
+ int fd;
+
+ if (period->freq && period->sample_freq > perf_event_max_sample_rate)
+ period->ret = FD_ERROR;
+
+ if (ibs_type == IBS_FETCH)
+ fetch_prepare_attr(&attr, 0, period->freq, period->sample_freq);
+ else
+ op_prepare_attr(&attr, 0, period->freq, period->sample_freq);
+
+ /* CPU0, All processes */
+ fd = perf_event_open(&attr, -1, 0, -1, 0);
+ if (period->ret == FD_ERROR) {
+ if (fd != -1) {
+ close(fd);
+ return -1;
+ }
+ return 0;
+ }
+ if (fd <= -1)
+ return -1;
+
+ rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
+ MAP_SHARED, fd, 0);
+ if (rb == MAP_FAILED) {
+ pr_debug("mmap() failed. [%m]\n");
+ close(fd);
+ return -1;
+ }
+
+ ioctl(fd, PERF_EVENT_IOC_RESET, 0);
+ ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
+
+ if (period->freq) {
+ dummy_workload_1(100000);
+ ret = rb_drain_samples(rb, period->period, nr_samples,
+ period_higher);
+ } else {
+ dummy_workload_1(period->sample_freq * 10);
+ ret = rb_drain_samples(rb, period->period, nr_samples,
+ period_equal);
+ }
+
+ ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
+ munmap(rb, PERF_MMAP_TOTAL_SIZE);
+ close(fd);
+ return ret;
+}
+
+static int ibs_period_constraint_test(void)
+{
+ unsigned long i;
+ int nr_samples;
+ int ret = 0;
+ int r;
+
+ pr_debug("\nIBS sample period constraint tests:\n");
+ pr_debug("-----------------------------------\n");
+
+ pr_debug("Fetch PMU test:\n");
+ for (i = 0; i < ARRAY_SIZE(fetch_period); i++) {
+ nr_samples = 0;
+ r = __ibs_period_constraint_test(IBS_FETCH, &fetch_period[i],
+ &nr_samples);
+
+ if (fetch_period[i].ret == FD_ERROR) {
+ pr_debug("freq %d, sample_freq %9ld: %-4s\n",
+ fetch_period[i].freq, fetch_period[i].sample_freq,
+ !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("freq %d, sample_freq %9ld: %-4s (nr samples: %d)\n",
+ fetch_period[i].freq, fetch_period[i].sample_freq,
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+ ret |= r;
+ }
+
+ pr_debug("Op PMU test:\n");
+ for (i = 0; i < ARRAY_SIZE(op_period); i++) {
+ nr_samples = 0;
+ r = __ibs_period_constraint_test(IBS_OP, &op_period[i],
+ &nr_samples);
+
+ if (op_period[i].ret == FD_ERROR) {
+ pr_debug("freq %d, sample_freq %9ld: %-4s\n",
+ op_period[i].freq, op_period[i].sample_freq,
+ !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("freq %d, sample_freq %9ld: %-4s (nr samples: %d)\n",
+ op_period[i].freq, op_period[i].sample_freq,
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+ ret |= r;
+ }
+
+ return ret;
+}
+
+struct ibs_ioctl {
+ /* Input */
+ int freq;
+ unsigned long period;
+
+ /* Expected output */
+ int ret;
+};
+
+struct ibs_ioctl fetch_ioctl[] = {
+ { .freq = 0, .period = 0x0, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x1, .ret = FD_ERROR },
+ { .freq = 0, .period = 0xf, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x10, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x11, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x1f, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x20, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x80, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x8f, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x90, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x91, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x100, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0xfff0, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0xffff, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x10000, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x1fff0, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x1fff5, .ret = FD_ERROR },
+ { .freq = 1, .period = 0x0, .ret = FD_ERROR },
+ { .freq = 1, .period = 0x1, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0xf, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x10, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x11, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x1f, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x20, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x80, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x8f, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x90, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x91, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x100, .ret = FD_SUCCESS },
+};
+
+struct ibs_ioctl op_ioctl[] = {
+ { .freq = 0, .period = 0x0, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x1, .ret = FD_ERROR },
+ { .freq = 0, .period = 0xf, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x10, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x11, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x1f, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x20, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x80, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x8f, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x90, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x91, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x100, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0xfff0, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0xffff, .ret = FD_ERROR },
+ { .freq = 0, .period = 0x10000, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x1fff0, .ret = FD_SUCCESS },
+ { .freq = 0, .period = 0x1fff5, .ret = FD_ERROR },
+ { .freq = 1, .period = 0x0, .ret = FD_ERROR },
+ { .freq = 1, .period = 0x1, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0xf, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x10, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x11, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x1f, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x20, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x80, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x8f, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x90, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x91, .ret = FD_SUCCESS },
+ { .freq = 1, .period = 0x100, .ret = FD_SUCCESS },
+};
+
+static int __ibs_ioctl_test(int ibs_type, struct ibs_ioctl *ibs_ioctl)
+{
+ struct perf_event_attr attr;
+ int ret = 0;
+ int fd;
+ int r;
+
+ if (ibs_type == IBS_FETCH)
+ fetch_prepare_attr(&attr, 0, ibs_ioctl->freq, 1000);
+ else
+ op_prepare_attr(&attr, 0, ibs_ioctl->freq, 1000);
+
+ /* CPU0, All processes */
+ fd = perf_event_open(&attr, -1, 0, -1, 0);
+ if (fd <= -1) {
+ pr_debug("event_open() Failed\n");
+ return -1;
+ }
+
+ r = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ibs_ioctl->period);
+ if ((ibs_ioctl->ret == FD_SUCCESS && r <= -1) ||
+ (ibs_ioctl->ret == FD_ERROR && r >= 0)) {
+ ret = -1;
+ }
+
+ close(fd);
+ return ret;
+}
+
+static int ibs_ioctl_test(void)
+{
+ unsigned long i;
+ int ret = 0;
+ int r;
+
+ pr_debug("\nIBS ioctl() tests:\n");
+ pr_debug("------------------\n");
+
+ pr_debug("Fetch PMU tests\n");
+ for (i = 0; i < ARRAY_SIZE(fetch_ioctl); i++) {
+ r = __ibs_ioctl_test(IBS_FETCH, &fetch_ioctl[i]);
+
+ pr_debug("ioctl(%s = 0x%-7lx): %s\n",
+ fetch_ioctl[i].freq ? "freq " : "period",
+ fetch_ioctl[i].period, r ? "Fail" : "Ok");
+ ret |= r;
+ }
+
+ pr_debug("Op PMU tests\n");
+ for (i = 0; i < ARRAY_SIZE(op_ioctl); i++) {
+ r = __ibs_ioctl_test(IBS_OP, &op_ioctl[i]);
+
+ pr_debug("ioctl(%s = 0x%-7lx): %s\n",
+ op_ioctl[i].freq ? "freq " : "period",
+ op_ioctl[i].period, r ? "Fail" : "Ok");
+ ret |= r;
+ }
+
+ return ret;
+}
+
+static int ibs_freq_neg_test(void)
+{
+ struct perf_event_attr attr;
+ int fd;
+
+ pr_debug("\nIBS freq (negative) tests:\n");
+ pr_debug("--------------------------\n");
+
+ /*
+ * Assuming perf_event_max_sample_rate <= 100000,
+ * config: 0x300D40 ==> MaxCnt: 200000
+ */
+ op_prepare_attr(&attr, 0x300D40, 1, 0);
+
+ /* CPU0, All processes */
+ fd = perf_event_open(&attr, -1, 0, -1, 0);
+ if (fd != -1) {
+ pr_debug("freq 1, sample_freq 200000: Fail\n");
+ close(fd);
+ return -1;
+ }
+
+ pr_debug("freq 1, sample_freq 200000: Ok\n");
+
+ return 0;
+}
+
+struct ibs_l3missonly {
+ /* Input */
+ int freq;
+ unsigned long sample_freq;
+
+ /* Expected output */
+ int ret;
+ unsigned long min_period;
+};
+
+struct ibs_l3missonly fetch_l3missonly = {
+ .freq = 1,
+ .sample_freq = 10000,
+ .ret = FD_SUCCESS,
+ .min_period = 0x10,
+};
+
+struct ibs_l3missonly op_l3missonly = {
+ .freq = 1,
+ .sample_freq = 10000,
+ .ret = FD_SUCCESS,
+ .min_period = 0x90,
+};
+
+static int __ibs_l3missonly_test(char *perf, int ibs_type, int *nr_samples,
+ struct ibs_l3missonly *l3missonly)
+{
+ struct perf_event_attr attr;
+ int ret = 0;
+ void *rb;
+ int fd;
+
+ if (l3missonly->sample_freq > perf_event_max_sample_rate)
+ l3missonly->ret = FD_ERROR;
+
+ if (ibs_type == IBS_FETCH) {
+ fetch_prepare_attr(&attr, 0x800000000000000UL, l3missonly->freq,
+ l3missonly->sample_freq);
+ } else {
+ op_prepare_attr(&attr, 0x10000, l3missonly->freq,
+ l3missonly->sample_freq);
+ }
+
+ /* CPU0, All processes */
+ fd = perf_event_open(&attr, -1, 0, -1, 0);
+ if (l3missonly->ret == FD_ERROR) {
+ if (fd != -1) {
+ close(fd);
+ return -1;
+ }
+ return 0;
+ }
+ if (fd == -1) {
+ pr_debug("perf_event_open() failed. [%m]\n");
+ return -1;
+ }
+
+ rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
+ MAP_SHARED, fd, 0);
+ if (rb == MAP_FAILED) {
+ pr_debug("mmap() failed. [%m]\n");
+ close(fd);
+ return -1;
+ }
+
+ ioctl(fd, PERF_EVENT_IOC_RESET, 0);
+ ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
+
+ dummy_workload_2(perf);
+
+ ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
+
+ ret = rb_drain_samples(rb, l3missonly->min_period, nr_samples, period_higher);
+
+ munmap(rb, PERF_MMAP_TOTAL_SIZE);
+ close(fd);
+ return ret;
+}
+
+static int ibs_l3missonly_test(char *perf)
+{
+ int nr_samples = 0;
+ int ret = 0;
+ int r = 0;
+
+ pr_debug("\nIBS L3MissOnly test: (takes a while)\n");
+ pr_debug("--------------------\n");
+
+ if (perf_pmu__has_format(fetch_pmu, "l3missonly")) {
+ nr_samples = 0;
+ r = __ibs_l3missonly_test(perf, IBS_FETCH, &nr_samples, &fetch_l3missonly);
+ if (fetch_l3missonly.ret == FD_ERROR) {
+ pr_debug("Fetch L3MissOnly: %-4s\n", !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("Fetch L3MissOnly: %-4s (nr_samples: %d)\n",
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+ ret |= r;
+ }
+
+ if (perf_pmu__has_format(op_pmu, "l3missonly")) {
+ nr_samples = 0;
+ r = __ibs_l3missonly_test(perf, IBS_OP, &nr_samples, &op_l3missonly);
+ if (op_l3missonly.ret == FD_ERROR) {
+ pr_debug("Op L3MissOnly: %-4s\n", !r ? "Ok" : "Fail");
+ } else {
+ /*
+ * Although nr_samples == 0 is reported as Fail here,
+ * the failure status is not cascaded up because we
+ * cannot decide whether the test really failed or not
+ * without actual samples.
+ */
+ pr_debug("Op L3MissOnly: %-4s (nr_samples: %d)\n",
+ (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
+ }
+ ret |= r;
+ }
+
+ return ret;
+}
+
+static unsigned int get_perf_event_max_sample_rate(void)
+{
+ unsigned int max_sample_rate = 100000;
+ FILE *fp;
+ int ret;
+
+ fp = fopen("/proc/sys/kernel/perf_event_max_sample_rate", "r");
+ if (!fp) {
+ pr_debug("Can't open perf_event_max_sample_rate. Assuming %u\n",
+ max_sample_rate);
+ goto out;
+ }
+
+ ret = fscanf(fp, "%u", &max_sample_rate);
+ if (ret != 1) {
+ pr_debug("Can't read perf_event_max_sample_rate. Assuming 100000\n");
+ max_sample_rate = 100000;
+ }
+ fclose(fp);
+
+out:
+ return max_sample_rate;
+}
+
+int test__amd_ibs_period(struct test_suite *test __maybe_unused,
+ int subtest __maybe_unused)
+{
+ char perf[PATH_MAX] = {'\0'};
+ int ret = TEST_OK;
+
+ /*
+ * Reading perf_event_max_sample_rate only once _might_ cause some
+ * of the tests to fail if the kernel changes it after it is read here.
+ */
+ perf_event_max_sample_rate = get_perf_event_max_sample_rate();
+ fetch_pmu = perf_pmus__find("ibs_fetch");
+ op_pmu = perf_pmus__find("ibs_op");
+
+ if (!x86__is_amd_cpu() || !fetch_pmu || !op_pmu)
+ return TEST_SKIP;
+
+ perf_exe(perf, sizeof(perf));
+
+ if (sched_affine(0))
+ return TEST_FAIL;
+
+ /*
+ * Perf event can be opened in two modes:
+ * 1 Freq mode
+ * perf_event_attr->freq = 1, ->sample_freq = <frequency>
+ * 2 Sample period mode
+ * perf_event_attr->freq = 0, ->sample_period = <period>
+ *
+ * Instead of using above interface, IBS event in 'sample period mode'
+ * can also be opened by passing <period> value directly in the MaxCnt
+ * bitfields of perf_event_attr->config. Test this IBS specific special
+ * interface.
+ */
+ if (ibs_config_test())
+ ret = TEST_FAIL;
+
+ /*
+ * IBS Fetch and Op PMUs have HW constraints on minimum sample period.
+ * Also, sample period value must be a multiple of 0x10. Test that IBS
+ * driver honors HW constraints for various possible values in Freq as
+ * well as Sample Period mode IBS events.
+ */
+ if (ibs_period_constraint_test())
+ ret = TEST_FAIL;
+
+ /*
+ * Test ioctl() with various sample period values for IBS event.
+ */
+ if (ibs_ioctl_test())
+ ret = TEST_FAIL;
+
+ /*
+ * Test that opening of freq mode IBS event fails when the freq value
+ * is passed through ->config, not explicitly in ->sample_freq. Also
+ * use high freq value (beyond perf_event_max_sample_rate) to test IBS
+ * driver does not bypass perf_event_max_sample_rate checks.
+ */
+ if (ibs_freq_neg_test())
+ ret = TEST_FAIL;
+
+ /*
+ * L3MissOnly is a post-processing filter, i.e. IBS HW checks for L3
+ * Miss at the completion of the tagged uOp. The sample is discarded
+ * if the tagged uOp did not cause L3Miss. Also, IBS HW internally
+ * resets CurCnt to a small pseudo-random value and resumes counting.
+ * A new uOp is tagged once CurCnt reaches MaxCnt. But the process
+ * repeats until the tagged uOp causes an L3 Miss.
+ *
+ * With the freq mode event, the next sample period is calculated by
+ * generic kernel on every sample to achieve desired freq of samples.
+ *
+ * Since the number of times HW internally reset CurCnt and the pseudo-
+ * random value of CurCnt for all those occurrences are not known to SW,
+ * the sample period adjustment by kernel goes for a toss for freq mode
+ * IBS events. Kernel will set very small period for the next sample if
+ * the window between current sample and prev sample is too high due to
+ * multiple samples being discarded internally by IBS HW.
+ *
+ * Test that IBS sample period constraints are honored when L3MissOnly
+ * is ON.
+ */
+ if (ibs_l3missonly_test(perf))
+ ret = TEST_FAIL;
+
+ return ret;
+}
diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
index a216a5d172ed..bfee2432515b 100644
--- a/tools/perf/arch/x86/tests/arch-tests.c
+++ b/tools/perf/arch/x86/tests/arch-tests.c
@@ -25,6 +25,7 @@ DEFINE_SUITE("x86 bp modify", bp_modify);
#endif
DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
DEFINE_SUITE("AMD IBS via core pmu", amd_ibs_via_core_pmu);
+DEFINE_SUITE_EXCLUSIVE("AMD IBS sample period", amd_ibs_period);
static struct test_case hybrid_tests[] = {
TEST_CASE_REASON("x86 hybrid event parsing", hybrid, "not hybrid"),
{ .name = NULL, }
@@ -50,6 +51,7 @@ struct test_suite *arch_tests[] = {
#endif
&suite__x86_sample_parsing,
&suite__amd_ibs_via_core_pmu,
+ &suite__amd_ibs_period,
&suite__hybrid,
NULL,
};
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-29 3:59 ` [PATCH v4 4/4] perf test amd ibs: Add sample period unit test Ravi Bangoria
@ 2025-04-29 20:55 ` Arnaldo Carvalho de Melo
2025-04-30 1:13 ` Arnaldo Carvalho de Melo
2025-04-30 6:33 ` Ravi Bangoria
0 siblings, 2 replies; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-29 20:55 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Tue, Apr 29, 2025 at 03:59:38AM +0000, Ravi Bangoria wrote:
> IBS Fetch and IBS Op PMUs have various constraints on supported sample
> periods. Add perf unit tests to test those.
>
> Running it in parallel with other tests causes intermittent failures.
> Mark it exclusive to force it to run sequentially. Sample output on a
> Zen5 machine:
I've applied the series and will test it now, but I found a problem when
building on some non-glibc systems: the test defines PAGE_SIZE, which libc
headers also define. That is true even for glibc; it's just that with glibc
we happen not to include the header where PAGE_SIZE gets redefined:
⬢ [acme@toolbx perf-tools-next]$ grep PAGE_SIZE /usr/include/sys/*.h
/usr/include/sys/user.h:#define PAGE_SIZE (1UL << PAGE_SHIFT)
/usr/include/sys/user.h:#define PAGE_MASK (~(PAGE_SIZE-1))
/usr/include/sys/user.h:#define NBPG PAGE_SIZE
⬢ [acme@toolbx perf-tools-next]$
So I folded in the following patch; please check whether it is acceptable
and ack.
Thanks for respinning it!
- Arnaldo
diff --git a/tools/perf/arch/x86/tests/amd-ibs-period.c b/tools/perf/arch/x86/tests/amd-ibs-period.c
index 0cf3656e4b9bdacf..946b0a377554fb81 100644
--- a/tools/perf/arch/x86/tests/amd-ibs-period.c
+++ b/tools/perf/arch/x86/tests/amd-ibs-period.c
@@ -17,13 +17,13 @@
#include "strbuf.h"
#include "../util/env.h"
-#define PAGE_SIZE sysconf(_SC_PAGESIZE)
+static int page_size;
#define PERF_MMAP_DATA_PAGES 32L
-#define PERF_MMAP_DATA_SIZE (PERF_MMAP_DATA_PAGES * PAGE_SIZE)
+#define PERF_MMAP_DATA_SIZE (PERF_MMAP_DATA_PAGES * page_size)
#define PERF_MMAP_DATA_MASK (PERF_MMAP_DATA_SIZE - 1)
#define PERF_MMAP_TOTAL_PAGES (PERF_MMAP_DATA_PAGES + 1)
-#define PERF_MMAP_TOTAL_SIZE (PERF_MMAP_TOTAL_PAGES * PAGE_SIZE)
+#define PERF_MMAP_TOTAL_SIZE (PERF_MMAP_TOTAL_PAGES * page_size)
#define rmb() asm volatile("lfence":::"memory")
@@ -59,15 +59,15 @@ static int dummy_workload_1(unsigned long count)
0xcc, /* int 3 */
};
- p = zalloc(2 * PAGE_SIZE);
+ p = zalloc(2 * page_size);
if (!p) {
printf("malloc() failed. %m");
return 1;
}
- func = (void *)((unsigned long)(p + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
+ func = (void *)((unsigned long)(p + page_size - 1) & ~(page_size - 1));
- ret = mprotect(func, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC);
+ ret = mprotect(func, page_size, PROT_READ | PROT_WRITE | PROT_EXEC);
if (ret) {
printf("mprotect() failed. %m");
goto out;
@@ -150,7 +150,7 @@ static int rb_read(struct perf_event_mmap_page *rb, void *dest, size_t size)
unsigned long data_tail, data_head;
/* Casting to (void *) is needed. */
- base = (void *)rb + PAGE_SIZE;
+ base = (void *)rb + page_size;
data_head = rb->data_head;
rmb();
@@ -918,6 +918,8 @@ int test__amd_ibs_period(struct test_suite *test __maybe_unused,
char perf[PATH_MAX] = {'\0'};
int ret = TEST_OK;
+ page_size = sysconf(_SC_PAGESIZE);
+
/*
* Reading perf_event_max_sample_rate only once _might_ cause some
* of the test to fail if kernel changes it after reading it here.
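For reference, the idea behind the fold-in above can be reduced to a
standalone sketch: query the page size once at runtime, keep it in a
variable whose name cannot collide with a libc PAGE_SIZE macro, and use it
for page alignment. The names get_page_size() and page_align_up() are
hypothetical and are not part of the patch, which simply assigns a
file-scope page_size at the start of the test.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: cache the runtime page size on first use. */
static long cached_page_size;

static long get_page_size(void)
{
	if (!cached_page_size)
		cached_page_size = sysconf(_SC_PAGESIZE);
	return cached_page_size;
}

/* Round addr up to the next page boundary. */
static uintptr_t page_align_up(uintptr_t addr)
{
	long ps = get_page_size();
	uintptr_t mask;

	/* The mask trick below only works for a power-of-two page size. */
	assert(ps > 0 && (ps & (ps - 1)) == 0);

	mask = (uintptr_t)ps - 1;
	return (addr + mask) & ~mask;
}
```

This mirrors the `(p + page_size - 1) & ~(page_size - 1)` expression in
dummy_workload_1(), only with the mask computed in an unsigned type.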
> Without kernel fixes:
>
> $ sudo ./perf test -vv 112
> 112: AMD IBS sample period:
> --- start ---
> test child forked, pid 8774
> Using CPUID AuthenticAMD-26-2-1
>
> IBS config tests:
> -----------------
> Fetch PMU tests:
> 0xffff : Ok (nr samples: 1078)
> 0x1000 : Ok (nr samples: 17030)
> 0xff : Ok (nr samples: 41068)
> 0x1 : Ok (nr samples: 40543)
> 0x0 : Ok
> 0x10000 : Ok
> Op PMU tests:
> 0x0 : Ok
> 0x1 : Fail
> 0x8 : Fail
> 0x9 : Ok (nr samples: 40543)
> 0xf : Ok (nr samples: 40543)
> 0x1000 : Ok (nr samples: 18736)
> 0xffff : Ok (nr samples: 1168)
> 0x10000 : Ok
> 0x100000 : Fail (nr samples: 14)
> 0xf00000 : Fail (nr samples: 1)
> 0xf0ffff : Fail (nr samples: 1)
> 0x1f0ffff : Fail (nr samples: 1)
> 0x7f0ffff : Fail (nr samples: 0)
> 0x8f0ffff : Ok
> 0x17f0ffff : Ok
>
> IBS sample period constraint tests:
> -----------------------------------
> Fetch PMU test:
> freq 0, sample_freq 0: Ok
> freq 0, sample_freq 1: Fail
> freq 0, sample_freq 15: Fail
> freq 0, sample_freq 16: Ok (nr samples: 1604)
> freq 0, sample_freq 17: Ok (nr samples: 1604)
> freq 0, sample_freq 143: Ok (nr samples: 1604)
> freq 0, sample_freq 144: Ok (nr samples: 1604)
> freq 0, sample_freq 145: Ok (nr samples: 1604)
> freq 0, sample_freq 1234: Ok (nr samples: 1566)
> freq 0, sample_freq 4103: Ok (nr samples: 1119)
> freq 0, sample_freq 65520: Ok (nr samples: 2264)
> freq 0, sample_freq 65535: Ok (nr samples: 2263)
> freq 0, sample_freq 65552: Ok (nr samples: 1166)
> freq 0, sample_freq 8388607: Ok (nr samples: 268)
> freq 0, sample_freq 268435455: Ok (nr samples: 8)
> freq 1, sample_freq 0: Ok
> freq 1, sample_freq 1: Ok (nr samples: 4)
> freq 1, sample_freq 15: Ok (nr samples: 4)
> freq 1, sample_freq 16: Ok (nr samples: 4)
> freq 1, sample_freq 17: Ok (nr samples: 4)
> freq 1, sample_freq 143: Ok (nr samples: 5)
> freq 1, sample_freq 144: Ok (nr samples: 5)
> freq 1, sample_freq 145: Ok (nr samples: 5)
> freq 1, sample_freq 1234: Ok (nr samples: 7)
> freq 1, sample_freq 4103: Ok (nr samples: 35)
> freq 1, sample_freq 65520: Ok (nr samples: 642)
> freq 1, sample_freq 65535: Ok (nr samples: 636)
> freq 1, sample_freq 65552: Ok (nr samples: 651)
> freq 1, sample_freq 8388607: Ok
> Op PMU test:
> freq 0, sample_freq 0: Ok
> freq 0, sample_freq 1: Fail
> freq 0, sample_freq 15: Fail
> freq 0, sample_freq 16: Fail
> freq 0, sample_freq 17: Fail
> freq 0, sample_freq 143: Fail
> freq 0, sample_freq 144: Ok (nr samples: 1604)
> freq 0, sample_freq 145: Ok (nr samples: 1604)
> freq 0, sample_freq 1234: Ok (nr samples: 1604)
> freq 0, sample_freq 4103: Ok (nr samples: 1604)
> freq 0, sample_freq 65520: Ok (nr samples: 2227)
> freq 0, sample_freq 65535: Ok (nr samples: 2296)
> freq 0, sample_freq 65552: Ok (nr samples: 2213)
> freq 0, sample_freq 8388607: Ok (nr samples: 250)
> freq 0, sample_freq 268435455: Ok (nr samples: 8)
> freq 1, sample_freq 0: Ok
> freq 1, sample_freq 1: Fail (nr samples: 4)
> freq 1, sample_freq 15: Fail (nr samples: 4)
> freq 1, sample_freq 16: Fail (nr samples: 4)
> freq 1, sample_freq 17: Fail (nr samples: 4)
> freq 1, sample_freq 143: Fail (nr samples: 5)
> freq 1, sample_freq 144: Fail (nr samples: 5)
> freq 1, sample_freq 145: Fail (nr samples: 5)
> freq 1, sample_freq 1234: Fail (nr samples: 8)
> freq 1, sample_freq 4103: Fail (nr samples: 33)
> freq 1, sample_freq 65520: Fail (nr samples: 546)
> freq 1, sample_freq 65535: Fail (nr samples: 544)
> freq 1, sample_freq 65552: Fail (nr samples: 555)
> freq 1, sample_freq 8388607: Ok
>
> IBS ioctl() tests:
> ------------------
> Fetch PMU tests
> ioctl(period = 0x0 ): Ok
> ioctl(period = 0x1 ): Fail
> ioctl(period = 0xf ): Fail
> ioctl(period = 0x10 ): Ok
> ioctl(period = 0x11 ): Fail
> ioctl(period = 0x1f ): Fail
> ioctl(period = 0x20 ): Ok
> ioctl(period = 0x80 ): Ok
> ioctl(period = 0x8f ): Fail
> ioctl(period = 0x90 ): Ok
> ioctl(period = 0x91 ): Fail
> ioctl(period = 0x100 ): Ok
> ioctl(period = 0xfff0 ): Ok
> ioctl(period = 0xffff ): Fail
> ioctl(period = 0x10000 ): Ok
> ioctl(period = 0x1fff0 ): Ok
> ioctl(period = 0x1fff5 ): Fail
> ioctl(freq = 0x0 ): Ok
> ioctl(freq = 0x1 ): Ok
> ioctl(freq = 0xf ): Ok
> ioctl(freq = 0x10 ): Ok
> ioctl(freq = 0x11 ): Ok
> ioctl(freq = 0x1f ): Ok
> ioctl(freq = 0x20 ): Ok
> ioctl(freq = 0x80 ): Ok
> ioctl(freq = 0x8f ): Ok
> ioctl(freq = 0x90 ): Ok
> ioctl(freq = 0x91 ): Ok
> ioctl(freq = 0x100 ): Ok
> Op PMU tests
> ioctl(period = 0x0 ): Ok
> ioctl(period = 0x1 ): Fail
> ioctl(period = 0xf ): Fail
> ioctl(period = 0x10 ): Fail
> ioctl(period = 0x11 ): Fail
> ioctl(period = 0x1f ): Fail
> ioctl(period = 0x20 ): Fail
> ioctl(period = 0x80 ): Fail
> ioctl(period = 0x8f ): Fail
> ioctl(period = 0x90 ): Ok
> ioctl(period = 0x91 ): Fail
> ioctl(period = 0x100 ): Ok
> ioctl(period = 0xfff0 ): Ok
> ioctl(period = 0xffff ): Fail
> ioctl(period = 0x10000 ): Ok
> ioctl(period = 0x1fff0 ): Ok
> ioctl(period = 0x1fff5 ): Fail
> ioctl(freq = 0x0 ): Ok
> ioctl(freq = 0x1 ): Ok
> ioctl(freq = 0xf ): Ok
> ioctl(freq = 0x10 ): Ok
> ioctl(freq = 0x11 ): Ok
> ioctl(freq = 0x1f ): Ok
> ioctl(freq = 0x20 ): Ok
> ioctl(freq = 0x80 ): Ok
> ioctl(freq = 0x8f ): Ok
> ioctl(freq = 0x90 ): Ok
> ioctl(freq = 0x91 ): Ok
> ioctl(freq = 0x100 ): Ok
>
> IBS freq (negative) tests:
> --------------------------
> freq 1, sample_freq 200000: Fail
>
> IBS L3MissOnly test: (takes a while)
> --------------------
> Fetch L3MissOnly: Fail (nr_samples: 1213)
> Op L3MissOnly: Ok (nr_samples: 1193)
> ---- end(-1) ----
> 112: AMD IBS sample period : FAILED!
>
> With kernel fixes:
>
> $ sudo ./perf test -vv 112
> 112: AMD IBS sample period:
> --- start ---
> test child forked, pid 6939
> Using CPUID AuthenticAMD-26-2-1
>
> IBS config tests:
> -----------------
> Fetch PMU tests:
> 0xffff : Ok (nr samples: 969)
> 0x1000 : Ok (nr samples: 15540)
> 0xff : Ok (nr samples: 40555)
> 0x1 : Ok (nr samples: 40543)
> 0x0 : Ok
> 0x10000 : Ok
> Op PMU tests:
> 0x0 : Ok
> 0x1 : Ok
> 0x8 : Ok
> 0x9 : Ok (nr samples: 40543)
> 0xf : Ok (nr samples: 40543)
> 0x1000 : Ok (nr samples: 19156)
> 0xffff : Ok (nr samples: 1169)
> 0x10000 : Ok
> 0x100000 : Ok (nr samples: 1151)
> 0xf00000 : Ok (nr samples: 76)
> 0xf0ffff : Ok (nr samples: 73)
> 0x1f0ffff : Ok (nr samples: 33)
> 0x7f0ffff : Ok (nr samples: 10)
> 0x8f0ffff : Ok
> 0x17f0ffff : Ok
>
> IBS sample period constraint tests:
> -----------------------------------
> Fetch PMU test:
> freq 0, sample_freq 0: Ok
> freq 0, sample_freq 1: Ok
> freq 0, sample_freq 15: Ok
> freq 0, sample_freq 16: Ok (nr samples: 1203)
> freq 0, sample_freq 17: Ok (nr samples: 1604)
> freq 0, sample_freq 143: Ok (nr samples: 1604)
> freq 0, sample_freq 144: Ok (nr samples: 1604)
> freq 0, sample_freq 145: Ok (nr samples: 1604)
> freq 0, sample_freq 1234: Ok (nr samples: 1604)
> freq 0, sample_freq 4103: Ok (nr samples: 1343)
> freq 0, sample_freq 65520: Ok (nr samples: 2254)
> freq 0, sample_freq 65535: Ok (nr samples: 2136)
> freq 0, sample_freq 65552: Ok (nr samples: 1158)
> freq 0, sample_freq 8388607: Ok (nr samples: 257)
> freq 0, sample_freq 268435455: Ok (nr samples: 8)
> freq 1, sample_freq 0: Ok
> freq 1, sample_freq 1: Ok (nr samples: 4)
> freq 1, sample_freq 15: Ok (nr samples: 4)
> freq 1, sample_freq 16: Ok (nr samples: 4)
> freq 1, sample_freq 17: Ok (nr samples: 4)
> freq 1, sample_freq 143: Ok (nr samples: 5)
> freq 1, sample_freq 144: Ok (nr samples: 5)
> freq 1, sample_freq 145: Ok (nr samples: 5)
> freq 1, sample_freq 1234: Ok (nr samples: 8)
> freq 1, sample_freq 4103: Ok (nr samples: 34)
> freq 1, sample_freq 65520: Ok (nr samples: 458)
> freq 1, sample_freq 65535: Ok (nr samples: 628)
> freq 1, sample_freq 65552: Ok (nr samples: 396)
> freq 1, sample_freq 8388607: Ok
> Op PMU test:
> freq 0, sample_freq 0: Ok
> freq 0, sample_freq 1: Ok
> freq 0, sample_freq 15: Ok
> freq 0, sample_freq 16: Ok
> freq 0, sample_freq 17: Ok
> freq 0, sample_freq 143: Ok
> freq 0, sample_freq 144: Ok (nr samples: 1604)
> freq 0, sample_freq 145: Ok (nr samples: 1604)
> freq 0, sample_freq 1234: Ok (nr samples: 1604)
> freq 0, sample_freq 4103: Ok (nr samples: 1604)
> freq 0, sample_freq 65520: Ok (nr samples: 2250)
> freq 0, sample_freq 65535: Ok (nr samples: 2158)
> freq 0, sample_freq 65552: Ok (nr samples: 2296)
> freq 0, sample_freq 8388607: Ok (nr samples: 243)
> freq 0, sample_freq 268435455: Ok (nr samples: 6)
> freq 1, sample_freq 0: Ok
> freq 1, sample_freq 1: Ok (nr samples: 4)
> freq 1, sample_freq 15: Ok (nr samples: 4)
> freq 1, sample_freq 16: Ok (nr samples: 4)
> freq 1, sample_freq 17: Ok (nr samples: 4)
> freq 1, sample_freq 143: Ok (nr samples: 4)
> freq 1, sample_freq 144: Ok (nr samples: 5)
> freq 1, sample_freq 145: Ok (nr samples: 4)
> freq 1, sample_freq 1234: Ok (nr samples: 6)
> freq 1, sample_freq 4103: Ok (nr samples: 27)
> freq 1, sample_freq 65520: Ok (nr samples: 542)
> freq 1, sample_freq 65535: Ok (nr samples: 550)
> freq 1, sample_freq 65552: Ok (nr samples: 552)
> freq 1, sample_freq 8388607: Ok
>
> IBS ioctl() tests:
> ------------------
> Fetch PMU tests
> ioctl(period = 0x0 ): Ok
> ioctl(period = 0x1 ): Ok
> ioctl(period = 0xf ): Ok
> ioctl(period = 0x10 ): Ok
> ioctl(period = 0x11 ): Ok
> ioctl(period = 0x1f ): Ok
> ioctl(period = 0x20 ): Ok
> ioctl(period = 0x80 ): Ok
> ioctl(period = 0x8f ): Ok
> ioctl(period = 0x90 ): Ok
> ioctl(period = 0x91 ): Ok
> ioctl(period = 0x100 ): Ok
> ioctl(period = 0xfff0 ): Ok
> ioctl(period = 0xffff ): Ok
> ioctl(period = 0x10000 ): Ok
> ioctl(period = 0x1fff0 ): Ok
> ioctl(period = 0x1fff5 ): Ok
> ioctl(freq = 0x0 ): Ok
> ioctl(freq = 0x1 ): Ok
> ioctl(freq = 0xf ): Ok
> ioctl(freq = 0x10 ): Ok
> ioctl(freq = 0x11 ): Ok
> ioctl(freq = 0x1f ): Ok
> ioctl(freq = 0x20 ): Ok
> ioctl(freq = 0x80 ): Ok
> ioctl(freq = 0x8f ): Ok
> ioctl(freq = 0x90 ): Ok
> ioctl(freq = 0x91 ): Ok
> ioctl(freq = 0x100 ): Ok
> Op PMU tests
> ioctl(period = 0x0 ): Ok
> ioctl(period = 0x1 ): Ok
> ioctl(period = 0xf ): Ok
> ioctl(period = 0x10 ): Ok
> ioctl(period = 0x11 ): Ok
> ioctl(period = 0x1f ): Ok
> ioctl(period = 0x20 ): Ok
> ioctl(period = 0x80 ): Ok
> ioctl(period = 0x8f ): Ok
> ioctl(period = 0x90 ): Ok
> ioctl(period = 0x91 ): Ok
> ioctl(period = 0x100 ): Ok
> ioctl(period = 0xfff0 ): Ok
> ioctl(period = 0xffff ): Ok
> ioctl(period = 0x10000 ): Ok
> ioctl(period = 0x1fff0 ): Ok
> ioctl(period = 0x1fff5 ): Ok
> ioctl(freq = 0x0 ): Ok
> ioctl(freq = 0x1 ): Ok
> ioctl(freq = 0xf ): Ok
> ioctl(freq = 0x10 ): Ok
> ioctl(freq = 0x11 ): Ok
> ioctl(freq = 0x1f ): Ok
> ioctl(freq = 0x20 ): Ok
> ioctl(freq = 0x80 ): Ok
> ioctl(freq = 0x8f ): Ok
> ioctl(freq = 0x90 ): Ok
> ioctl(freq = 0x91 ): Ok
> ioctl(freq = 0x100 ): Ok
>
> IBS freq (negative) tests:
> --------------------------
> freq 1, sample_freq 200000: Ok
>
> IBS L3MissOnly test: (takes a while)
> --------------------
> Fetch L3MissOnly: Ok (nr_samples: 1301)
> Op L3MissOnly: Ok (nr_samples: 1590)
> ---- end(0) ----
> 112: AMD IBS sample period : Ok
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
> tools/perf/arch/x86/include/arch-tests.h | 1 +
> tools/perf/arch/x86/tests/Build | 1 +
> tools/perf/arch/x86/tests/amd-ibs-period.c | 1001 ++++++++++++++++++++
> tools/perf/arch/x86/tests/arch-tests.c | 2 +
> 4 files changed, 1005 insertions(+)
> create mode 100644 tools/perf/arch/x86/tests/amd-ibs-period.c
>
> diff --git a/tools/perf/arch/x86/include/arch-tests.h b/tools/perf/arch/x86/include/arch-tests.h
> index c0421a26b875..4fd425157d7d 100644
> --- a/tools/perf/arch/x86/include/arch-tests.h
> +++ b/tools/perf/arch/x86/include/arch-tests.h
> @@ -14,6 +14,7 @@ int test__intel_pt_hybrid_compat(struct test_suite *test, int subtest);
> int test__bp_modify(struct test_suite *test, int subtest);
> int test__x86_sample_parsing(struct test_suite *test, int subtest);
> int test__amd_ibs_via_core_pmu(struct test_suite *test, int subtest);
> +int test__amd_ibs_period(struct test_suite *test, int subtest);
> int test__hybrid(struct test_suite *test, int subtest);
>
> extern struct test_suite *arch_tests[];
> diff --git a/tools/perf/arch/x86/tests/Build b/tools/perf/arch/x86/tests/Build
> index 86262c720857..5e00cbfd2d56 100644
> --- a/tools/perf/arch/x86/tests/Build
> +++ b/tools/perf/arch/x86/tests/Build
> @@ -10,6 +10,7 @@ perf-test-$(CONFIG_AUXTRACE) += insn-x86.o
> endif
> perf-test-$(CONFIG_X86_64) += bp-modify.o
> perf-test-y += amd-ibs-via-core-pmu.o
> +perf-test-y += amd-ibs-period.o
>
> ifdef SHELLCHECK
> SHELL_TESTS := gen-insn-x86-dat.sh
> diff --git a/tools/perf/arch/x86/tests/amd-ibs-period.c b/tools/perf/arch/x86/tests/amd-ibs-period.c
> new file mode 100644
> index 000000000000..0cf3656e4b9b
> --- /dev/null
> +++ b/tools/perf/arch/x86/tests/amd-ibs-period.c
> @@ -0,0 +1,1001 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <sched.h>
> +#include <sys/syscall.h>
> +#include <sys/mman.h>
> +#include <sys/ioctl.h>
> +#include <string.h>
> +
> +#include "arch-tests.h"
> +#include "linux/perf_event.h"
> +#include "linux/zalloc.h"
> +#include "tests/tests.h"
> +#include "../perf-sys.h"
> +#include "pmu.h"
> +#include "pmus.h"
> +#include "debug.h"
> +#include "util.h"
> +#include "strbuf.h"
> +#include "../util/env.h"
> +
> +#define PAGE_SIZE sysconf(_SC_PAGESIZE)
> +
> +#define PERF_MMAP_DATA_PAGES 32L
> +#define PERF_MMAP_DATA_SIZE (PERF_MMAP_DATA_PAGES * PAGE_SIZE)
> +#define PERF_MMAP_DATA_MASK (PERF_MMAP_DATA_SIZE - 1)
> +#define PERF_MMAP_TOTAL_PAGES (PERF_MMAP_DATA_PAGES + 1)
> +#define PERF_MMAP_TOTAL_SIZE (PERF_MMAP_TOTAL_PAGES * PAGE_SIZE)
> +
> +#define rmb() asm volatile("lfence":::"memory")
> +
> +enum {
> + FD_ERROR,
> + FD_SUCCESS,
> +};
> +
> +enum {
> + IBS_FETCH,
> + IBS_OP,
> +};
> +
> +struct perf_pmu *fetch_pmu;
> +struct perf_pmu *op_pmu;
> +unsigned int perf_event_max_sample_rate;
> +
> +/* Dummy workload to generate IBS samples. */
> +static int dummy_workload_1(unsigned long count)
> +{
> + int (*func)(void);
> + int ret = 0;
> + char *p;
> + char insn1[] = {
> + 0xb8, 0x01, 0x00, 0x00, 0x00, /* mov 1,%eax */
> + 0xc3, /* ret */
> + 0xcc, /* int 3 */
> + };
> +
> + char insn2[] = {
> + 0xb8, 0x02, 0x00, 0x00, 0x00, /* mov 2,%eax */
> + 0xc3, /* ret */
> + 0xcc, /* int 3 */
> + };
> +
> + p = zalloc(2 * PAGE_SIZE);
> + if (!p) {
> + printf("malloc() failed. %m");
> + return 1;
> + }
> +
> + func = (void *)((unsigned long)(p + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
> +
> + ret = mprotect(func, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC);
> + if (ret) {
> + printf("mprotect() failed. %m");
> + goto out;
> + }
> +
> + if (count < 100000)
> + count = 100000;
> + else if (count > 10000000)
> + count = 10000000;
> + while (count--) {
> + memcpy(func, insn1, sizeof(insn1));
> + if (func() != 1) {
> + pr_debug("ERROR insn1\n");
> + ret = -1;
> + goto out;
> + }
> + memcpy(func, insn2, sizeof(insn2));
> + if (func() != 2) {
> + pr_debug("ERROR insn2\n");
> + ret = -1;
> + goto out;
> + }
> + }
> +
> +out:
> + free(p);
> + return ret;
> +}
> +
> +/* Another dummy workload to generate IBS samples. */
> +static void dummy_workload_2(char *perf)
> +{
> + char bench[] = " bench sched messaging -g 10 -l 5000 > /dev/null 2>&1";
> + char taskset[] = "taskset -c 0 ";
> + int ret __maybe_unused;
> + struct strbuf sb;
> + char *cmd;
> +
> + strbuf_init(&sb, 0);
> + strbuf_add(&sb, taskset, strlen(taskset));
> + strbuf_add(&sb, perf, strlen(perf));
> + strbuf_add(&sb, bench, strlen(bench));
> + cmd = strbuf_detach(&sb, NULL);
> + ret = system(cmd);
> + free(cmd);
> +}
> +
> +static int sched_affine(int cpu)
> +{
> + cpu_set_t set;
> +
> + CPU_ZERO(&set);
> + CPU_SET(cpu, &set);
> + if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
> + pr_debug("sched_setaffinity() failed. [%m]");
> + return -1;
> + }
> + return 0;
> +}
> +
> +static void
> +copy_sample_data(void *src, unsigned long offset, void *dest, size_t size)
> +{
> + size_t chunk1_size, chunk2_size;
> +
> + if ((offset + size) < (size_t)PERF_MMAP_DATA_SIZE) {
> + memcpy(dest, src + offset, size);
> + } else {
> + chunk1_size = PERF_MMAP_DATA_SIZE - offset;
> + chunk2_size = size - chunk1_size;
> +
> + memcpy(dest, src + offset, chunk1_size);
> + memcpy(dest + chunk1_size, src, chunk2_size);
> + }
> +}
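The wraparound handling in copy_sample_data() can be looked at in
isolation. Below is a minimal sketch of the same two-chunk copy with the
buffer size passed in explicitly, so it can be exercised without a perf
mmap. ring_copy() and its parameters are hypothetical names, and the
sketch assumes offset < buf_size and size <= buf_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `size` bytes starting at `offset` out of a circular buffer of
 * `buf_size` bytes, splitting the copy in two when it crosses the end.
 */
static void ring_copy(const void *buf, size_t buf_size, size_t offset,
		      void *dest, size_t size)
{
	const char *src = buf;
	char *dst = dest;
	size_t first;

	assert(offset < buf_size && size <= buf_size);

	first = buf_size - offset;	/* bytes left before the end */
	if (size <= first) {
		memcpy(dst, src + offset, size);
	} else {
		memcpy(dst, src + offset, first);
		memcpy(dst + first, src, size - first);	/* wrapped tail */
	}
}
```

With an 8-byte buffer holding "abcdefgh", reading 4 bytes at offset 6
yields "ghab": two bytes from the end followed by two from the start.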
> +
> +static int rb_read(struct perf_event_mmap_page *rb, void *dest, size_t size)
> +{
> + void *base;
> + unsigned long data_tail, data_head;
> +
> + /* Casting to (void *) is needed. */
> + base = (void *)rb + PAGE_SIZE;
> +
> + data_head = rb->data_head;
> + rmb();
> + data_tail = rb->data_tail;
> +
> + if ((data_head - data_tail) < size)
> + return -1;
> +
> + data_tail &= PERF_MMAP_DATA_MASK;
> + copy_sample_data(base, data_tail, dest, size);
> + rb->data_tail += size;
> + return 0;
> +}
> +
> +static void rb_skip(struct perf_event_mmap_page *rb, size_t size)
> +{
> + size_t data_head = rb->data_head;
> +
> + rmb();
> +
> + if ((rb->data_tail + size) > data_head)
> + rb->data_tail = data_head;
> + else
> + rb->data_tail += size;
> +}
> +
> +/* Sample period value taken from perf sample must match the expected value. */
> +static int period_equal(unsigned long exp_period, unsigned long act_period)
> +{
> + return exp_period == act_period ? 0 : -1;
> +}
> +
> +/*
> + * Sample period value taken from perf sample must be >= minimum sample period
> + * supported by IBS HW.
> + */
> +static int period_higher(unsigned long min_period, unsigned long act_period)
> +{
> + return min_period <= act_period ? 0 : -1;
> +}
> +
> +static int rb_drain_samples(struct perf_event_mmap_page *rb,
> + unsigned long exp_period,
> + int *nr_samples,
> + int (*callback)(unsigned long, unsigned long))
> +{
> + struct perf_event_header hdr;
> + unsigned long period;
> + int ret = 0;
> +
> + /*
> + * PERF_RECORD_SAMPLE:
> + * struct {
> + * struct perf_event_header hdr;
> + * { u64 period; } && PERF_SAMPLE_PERIOD
> + * };
> + */
> + while (1) {
> + if (rb_read(rb, &hdr, sizeof(hdr)))
> + return ret;
> +
> + if (hdr.type == PERF_RECORD_SAMPLE) {
> + (*nr_samples)++;
> + period = 0;
> + if (rb_read(rb, &period, sizeof(period)))
> + pr_debug("rb_read(period) error. [%m]");
> + ret |= callback(exp_period, period);
> + } else {
> + rb_skip(rb, hdr.size - sizeof(hdr));
> + }
> + }
> + return ret;
> +}
> +
> +static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
> + int cpu, int group_fd, unsigned long flags)
> +{
> + return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> +}
> +
> +static void fetch_prepare_attr(struct perf_event_attr *attr,
> + unsigned long long config, int freq,
> + unsigned long sample_period)
> +{
> + memset(attr, 0, sizeof(struct perf_event_attr));
> +
> + attr->type = fetch_pmu->type;
> + attr->size = sizeof(struct perf_event_attr);
> + attr->config = config;
> + attr->disabled = 1;
> + attr->sample_type = PERF_SAMPLE_PERIOD;
> + attr->freq = freq;
> + attr->sample_period = sample_period; /* = ->sample_freq */
> +}
> +
> +static void op_prepare_attr(struct perf_event_attr *attr,
> + unsigned long config, int freq,
> + unsigned long sample_period)
> +{
> + memset(attr, 0, sizeof(struct perf_event_attr));
> +
> + attr->type = op_pmu->type;
> + attr->size = sizeof(struct perf_event_attr);
> + attr->config = config;
> + attr->disabled = 1;
> + attr->sample_type = PERF_SAMPLE_PERIOD;
> + attr->freq = freq;
> + attr->sample_period = sample_period; /* = ->sample_freq */
> +}
> +
> +struct ibs_configs {
> + /* Input */
> + unsigned long config;
> +
> + /* Expected output */
> + unsigned long period;
> + int fd;
> +};
> +
> +/*
> + * Somehow first Fetch event with sample period = 0x10 causes 0
> + * samples. So start with large period and decrease it gradually.
> + */
> +struct ibs_configs fetch_configs[] = {
> + { .config = 0xffff, .period = 0xffff0, .fd = FD_SUCCESS },
> + { .config = 0x1000, .period = 0x10000, .fd = FD_SUCCESS },
> + { .config = 0xff, .period = 0xff0, .fd = FD_SUCCESS },
> + { .config = 0x1, .period = 0x10, .fd = FD_SUCCESS },
> + { .config = 0x0, .period = -1, .fd = FD_ERROR },
> + { .config = 0x10000, .period = -1, .fd = FD_ERROR },
> +};
> +
> +struct ibs_configs op_configs[] = {
> + { .config = 0x0, .period = -1, .fd = FD_ERROR },
> + { .config = 0x1, .period = -1, .fd = FD_ERROR },
> + { .config = 0x8, .period = -1, .fd = FD_ERROR },
> + { .config = 0x9, .period = 0x90, .fd = FD_SUCCESS },
> + { .config = 0xf, .period = 0xf0, .fd = FD_SUCCESS },
> + { .config = 0x1000, .period = 0x10000, .fd = FD_SUCCESS },
> + { .config = 0xffff, .period = 0xffff0, .fd = FD_SUCCESS },
> + { .config = 0x10000, .period = -1, .fd = FD_ERROR },
> + { .config = 0x100000, .period = 0x100000, .fd = FD_SUCCESS },
> + { .config = 0xf00000, .period = 0xf00000, .fd = FD_SUCCESS },
> + { .config = 0xf0ffff, .period = 0xfffff0, .fd = FD_SUCCESS },
> + { .config = 0x1f0ffff, .period = 0x1fffff0, .fd = FD_SUCCESS },
> + { .config = 0x7f0ffff, .period = 0x7fffff0, .fd = FD_SUCCESS },
> + { .config = 0x8f0ffff, .period = -1, .fd = FD_ERROR },
> + { .config = 0x17f0ffff, .period = -1, .fd = FD_ERROR },
> +};
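The FD_SUCCESS rows in the two tables above follow a simple pattern, which
can be written down as a small sketch. The masks here are inferred only
from the expected values in those tables, and the helper names are
hypothetical. The sketch does not model which configs get rejected (the
FD_ERROR rows).

```c
#include <stdint.h>

/* Inferred from the tables: config bits 15:0 hold period bits 19:4. */
#define MAXCNT_LOW_MASK		0xffffULL
/* Inferred, Op only: config bits 26:20 map to the same period bits. */
#define OP_MAXCNT_EXT_MASK	0x7f00000ULL

/* Expected sample period for a raw Fetch config value. */
static uint64_t fetch_config_to_period(uint64_t config)
{
	return (config & MAXCNT_LOW_MASK) << 4;
}

/* Expected sample period for a raw Op config value. */
static uint64_t op_config_to_period(uint64_t config)
{
	return ((config & MAXCNT_LOW_MASK) << 4) |
	       (config & OP_MAXCNT_EXT_MASK);
}
```

For example, Op config 0xf0ffff gives 0xffff0 | 0xf00000 = 0xfffff0, which
matches the corresponding .period entry.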
> +
> +static int __ibs_config_test(int ibs_type, struct ibs_configs *config, int *nr_samples)
> +{
> + struct perf_event_attr attr;
> + int fd, i;
> + void *rb;
> + int ret = 0;
> +
> + if (ibs_type == IBS_FETCH)
> + fetch_prepare_attr(&attr, config->config, 0, 0);
> + else
> + op_prepare_attr(&attr, config->config, 0, 0);
> +
> + /* CPU0, All processes */
> + fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (config->fd == FD_ERROR) {
> + if (fd != -1) {
> + close(fd);
> + return -1;
> + }
> + return 0;
> + }
> + if (fd <= -1)
> + return -1;
> +
> + rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
> + MAP_SHARED, fd, 0);
> + if (rb == MAP_FAILED) {
> + pr_debug("mmap() failed. [%m]\n");
> + return -1;
> + }
> +
> + ioctl(fd, PERF_EVENT_IOC_RESET, 0);
> + ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> +
> + i = 5;
> + while (i--) {
> + dummy_workload_1(1000000);
> +
> + ret = rb_drain_samples(rb, config->period, nr_samples,
> + period_equal);
> + if (ret)
> + break;
> + }
> +
> + ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
> + munmap(rb, PERF_MMAP_TOTAL_SIZE);
> + close(fd);
> + return ret;
> +}
> +
> +static int ibs_config_test(void)
> +{
> + int nr_samples = 0;
> + unsigned long i;
> + int ret = 0;
> + int r;
> +
> + pr_debug("\nIBS config tests:\n");
> + pr_debug("-----------------\n");
> +
> + pr_debug("Fetch PMU tests:\n");
> + for (i = 0; i < ARRAY_SIZE(fetch_configs); i++) {
> + nr_samples = 0;
> + r = __ibs_config_test(IBS_FETCH, &(fetch_configs[i]), &nr_samples);
> +
> + if (fetch_configs[i].fd == FD_ERROR) {
> + pr_debug("0x%-16lx: %-4s\n", fetch_configs[i].config,
> + !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up, because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("0x%-16lx: %-4s (nr samples: %d)\n", fetch_configs[i].config,
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> +
> + ret |= r;
> + }
> +
> + pr_debug("Op PMU tests:\n");
> + for (i = 0; i < ARRAY_SIZE(op_configs); i++) {
> + nr_samples = 0;
> + r = __ibs_config_test(IBS_OP, &(op_configs[i]), &nr_samples);
> +
> + if (op_configs[i].fd == FD_ERROR) {
> + pr_debug("0x%-16lx: %-4s\n", op_configs[i].config,
> + !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up, because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("0x%-16lx: %-4s (nr samples: %d)\n", op_configs[i].config,
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> +
> + ret |= r;
> + }
> +
> + return ret;
> +}
> +
> +struct ibs_period {
> + /* Input */
> + int freq;
> + unsigned long sample_freq;
> +
> + /* Output */
> + int ret;
> + unsigned long period;
> +};
> +
> +struct ibs_period fetch_period[] = {
> + { .freq = 0, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 1, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0xf, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 0, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 0, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x80 },
> + { .freq = 0, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 0, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 0, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x4d0 },
> + { .freq = 0, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x1000 },
> + { .freq = 0, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0xfff0 },
> + { .freq = 0, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0xfff0 },
> + { .freq = 0, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10010 },
> + { .freq = 0, .sample_freq = 0x7fffff, .ret = FD_SUCCESS, .period = 0x7ffff0 },
> + { .freq = 0, .sample_freq = 0xfffffff, .ret = FD_SUCCESS, .period = 0xffffff0 },
> + { .freq = 1, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
> + { .freq = 1, .sample_freq = 1, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0xf, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0x10 },
> + { .freq = 1, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10 },
> + /* ret=FD_ERROR because freq > default perf_event_max_sample_rate (100000) */
> + { .freq = 1, .sample_freq = 0x7fffff, .ret = FD_ERROR, .period = -1 },
> +};
> +
> +struct ibs_period op_period[] = {
> + { .freq = 0, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 1, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0xf, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0x10, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0x11, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0x8f, .ret = FD_ERROR, .period = -1 },
> + { .freq = 0, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 0, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 0, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x4d0 },
> + { .freq = 0, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x1000 },
> + { .freq = 0, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0xfff0 },
> + { .freq = 0, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0xfff0 },
> + { .freq = 0, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x10010 },
> + { .freq = 0, .sample_freq = 0x7fffff, .ret = FD_SUCCESS, .period = 0x7ffff0 },
> + { .freq = 0, .sample_freq = 0xfffffff, .ret = FD_SUCCESS, .period = 0xffffff0 },
> + { .freq = 1, .sample_freq = 0, .ret = FD_ERROR, .period = -1 },
> + { .freq = 1, .sample_freq = 1, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0xf, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x10, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x11, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x8f, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x90, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x91, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x4d2, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x1007, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0xfff0, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0xffff, .ret = FD_SUCCESS, .period = 0x90 },
> + { .freq = 1, .sample_freq = 0x10010, .ret = FD_SUCCESS, .period = 0x90 },
> + /* ret=FD_ERROR because freq > default perf_event_max_sample_rate (100000) */
> + { .freq = 1, .sample_freq = 0x7fffff, .ret = FD_ERROR, .period = -1 },
> +};
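The freq = 0 rows of fetch_period[] and op_period[] encode one rule: a
request below the PMU's minimum period is expected to fail, and anything
else is expected to be rounded down to a multiple of 0x10. A minimal
sketch of that rule, as read off the tables, follows. The macro and
function names are hypothetical, and a return value of 0 stands in for
FD_ERROR.

```c
#include <assert.h>
#include <stdint.h>

#define IBS_FETCH_MIN_PERIOD	0x10ULL	/* smallest Fetch period in the table */
#define IBS_OP_MIN_PERIOD	0x90ULL	/* smallest Op period in the table */

/*
 * Expected effective period for a fixed-period (freq = 0) request,
 * or 0 if the request is expected to be rejected.
 */
static uint64_t expected_period(uint64_t requested, uint64_t min_period)
{
	/* Rounding down can only stay >= min if min is a multiple of 0x10. */
	assert((min_period & 0xf) == 0);

	if (requested < min_period)
		return 0;
	return requested & ~0xfULL;	/* low 4 bits are dropped */
}
```

For the freq = 1 rows the tables only carry the minimum period, since the
test checks each sample with period_higher() rather than period_equal().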
> +
> +static int __ibs_period_constraint_test(int ibs_type, struct ibs_period *period,
> + int *nr_samples)
> +{
> + struct perf_event_attr attr;
> + int ret = 0;
> + void *rb;
> + int fd;
> +
> + if (period->freq && period->sample_freq > perf_event_max_sample_rate)
> + period->ret = FD_ERROR;
> +
> + if (ibs_type == IBS_FETCH)
> + fetch_prepare_attr(&attr, 0, period->freq, period->sample_freq);
> + else
> + op_prepare_attr(&attr, 0, period->freq, period->sample_freq);
> +
> + /* CPU0, All processes */
> + fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (period->ret == FD_ERROR) {
> + if (fd != -1) {
> + close(fd);
> + return -1;
> + }
> + return 0;
> + }
> + if (fd <= -1)
> + return -1;
> +
> + rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
> + MAP_SHARED, fd, 0);
> + if (rb == MAP_FAILED) {
> + pr_debug("mmap() failed. [%m]\n");
> + close(fd);
> + return -1;
> + }
> +
> + ioctl(fd, PERF_EVENT_IOC_RESET, 0);
> + ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> +
> + if (period->freq) {
> + dummy_workload_1(100000);
> + ret = rb_drain_samples(rb, period->period, nr_samples,
> + period_higher);
> + } else {
> + dummy_workload_1(period->sample_freq * 10);
> + ret = rb_drain_samples(rb, period->period, nr_samples,
> + period_equal);
> + }
> +
> + ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
> + munmap(rb, PERF_MMAP_TOTAL_SIZE);
> + close(fd);
> + return ret;
> +}
> +
> +static int ibs_period_constraint_test(void)
> +{
> + unsigned long i;
> + int nr_samples;
> + int ret = 0;
> + int r;
> +
> + pr_debug("\nIBS sample period constraint tests:\n");
> + pr_debug("-----------------------------------\n");
> +
> + pr_debug("Fetch PMU test:\n");
> + for (i = 0; i < ARRAY_SIZE(fetch_period); i++) {
> + nr_samples = 0;
> + r = __ibs_period_constraint_test(IBS_FETCH, &fetch_period[i],
> + &nr_samples);
> +
> + if (fetch_period[i].ret == FD_ERROR) {
> + pr_debug("freq %d, sample_freq %9ld: %-4s\n",
> + fetch_period[i].freq, fetch_period[i].sample_freq,
> + !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("freq %d, sample_freq %9ld: %-4s (nr samples: %d)\n",
> + fetch_period[i].freq, fetch_period[i].sample_freq,
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> + ret |= r;
> + }
> +
> + pr_debug("Op PMU test:\n");
> + for (i = 0; i < ARRAY_SIZE(op_period); i++) {
> + nr_samples = 0;
> + r = __ibs_period_constraint_test(IBS_OP, &op_period[i],
> + &nr_samples);
> +
> + if (op_period[i].ret == FD_ERROR) {
> + pr_debug("freq %d, sample_freq %9ld: %-4s\n",
> + op_period[i].freq, op_period[i].sample_freq,
> + !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("freq %d, sample_freq %9ld: %-4s (nr samples: %d)\n",
> + op_period[i].freq, op_period[i].sample_freq,
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> + ret |= r;
> + }
> +
> + return ret;
> +}
> +
> +struct ibs_ioctl {
> + /* Input */
> + int freq;
> + unsigned long period;
> +
> + /* Expected output */
> + int ret;
> +};
> +
> +struct ibs_ioctl fetch_ioctl[] = {
> + { .freq = 0, .period = 0x0, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x1, .ret = FD_ERROR },
> + { .freq = 0, .period = 0xf, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x10, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x11, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x1f, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x20, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x80, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x8f, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x90, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x91, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x100, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0xfff0, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0xffff, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x10000, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x1fff0, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x1fff5, .ret = FD_ERROR },
> + { .freq = 1, .period = 0x0, .ret = FD_ERROR },
> + { .freq = 1, .period = 0x1, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0xf, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x10, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x11, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x1f, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x20, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x80, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x8f, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x90, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x91, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x100, .ret = FD_SUCCESS },
> +};
> +
> +struct ibs_ioctl op_ioctl[] = {
> + { .freq = 0, .period = 0x0, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x1, .ret = FD_ERROR },
> + { .freq = 0, .period = 0xf, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x10, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x11, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x1f, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x20, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x80, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x8f, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x90, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x91, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x100, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0xfff0, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0xffff, .ret = FD_ERROR },
> + { .freq = 0, .period = 0x10000, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x1fff0, .ret = FD_SUCCESS },
> + { .freq = 0, .period = 0x1fff5, .ret = FD_ERROR },
> + { .freq = 1, .period = 0x0, .ret = FD_ERROR },
> + { .freq = 1, .period = 0x1, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0xf, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x10, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x11, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x1f, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x20, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x80, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x8f, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x90, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x91, .ret = FD_SUCCESS },
> + { .freq = 1, .period = 0x100, .ret = FD_SUCCESS },
> +};
> +
> +static int __ibs_ioctl_test(int ibs_type, struct ibs_ioctl *ibs_ioctl)
> +{
> + struct perf_event_attr attr;
> + int ret = 0;
> + int fd;
> + int r;
> +
> + if (ibs_type == IBS_FETCH)
> + fetch_prepare_attr(&attr, 0, ibs_ioctl->freq, 1000);
> + else
> + op_prepare_attr(&attr, 0, ibs_ioctl->freq, 1000);
> +
> + /* CPU0, All processes */
> + fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (fd <= -1) {
> + pr_debug("event_open() Failed\n");
> + return -1;
> + }
> +
> + r = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ibs_ioctl->period);
> + if ((ibs_ioctl->ret == FD_SUCCESS && r <= -1) ||
> + (ibs_ioctl->ret == FD_ERROR && r >= 0)) {
> + ret = -1;
> + }
> +
> + close(fd);
> + return ret;
> +}
> +
> +static int ibs_ioctl_test(void)
> +{
> + unsigned long i;
> + int ret = 0;
> + int r;
> +
> + pr_debug("\nIBS ioctl() tests:\n");
> + pr_debug("------------------\n");
> +
> + pr_debug("Fetch PMU tests\n");
> + for (i = 0; i < ARRAY_SIZE(fetch_ioctl); i++) {
> + r = __ibs_ioctl_test(IBS_FETCH, &fetch_ioctl[i]);
> +
> + pr_debug("ioctl(%s = 0x%-7lx): %s\n",
> + fetch_ioctl[i].freq ? "freq " : "period",
> + fetch_ioctl[i].period, r ? "Fail" : "Ok");
> + ret |= r;
> + }
> +
> + pr_debug("Op PMU tests\n");
> + for (i = 0; i < ARRAY_SIZE(op_ioctl); i++) {
> + r = __ibs_ioctl_test(IBS_OP, &op_ioctl[i]);
> +
> + pr_debug("ioctl(%s = 0x%-7lx): %s\n",
> + op_ioctl[i].freq ? "freq " : "period",
> + op_ioctl[i].period, r ? "Fail" : "Ok");
> + ret |= r;
> + }
> +
> + return ret;
> +}
> +
> +static int ibs_freq_neg_test(void)
> +{
> + struct perf_event_attr attr;
> + int fd;
> +
> + pr_debug("\nIBS freq (negative) tests:\n");
> + pr_debug("--------------------------\n");
> +
> + /*
> + * Assuming perf_event_max_sample_rate <= 100000,
> + * config: 0x300D40 ==> MaxCnt: 200000
> + */
> + op_prepare_attr(&attr, 0x300D40, 1, 0);
> +
> + /* CPU0, All processes */
> + fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (fd != -1) {
> + pr_debug("freq 1, sample_freq 200000: Fail\n");
> + close(fd);
> + return -1;
> + }
> +
> + pr_debug("freq 1, sample_freq 200000: Ok\n");
> +
> + return 0;
> +}
> +
> +struct ibs_l3missonly {
> + /* Input */
> + int freq;
> + unsigned long sample_freq;
> +
> + /* Expected output */
> + int ret;
> + unsigned long min_period;
> +};
> +
> +struct ibs_l3missonly fetch_l3missonly = {
> + .freq = 1,
> + .sample_freq = 10000,
> + .ret = FD_SUCCESS,
> + .min_period = 0x10,
> +};
> +
> +struct ibs_l3missonly op_l3missonly = {
> + .freq = 1,
> + .sample_freq = 10000,
> + .ret = FD_SUCCESS,
> + .min_period = 0x90,
> +};
> +
> +static int __ibs_l3missonly_test(char *perf, int ibs_type, int *nr_samples,
> + struct ibs_l3missonly *l3missonly)
> +{
> + struct perf_event_attr attr;
> + int ret = 0;
> + void *rb;
> + int fd;
> +
> + if (l3missonly->sample_freq > perf_event_max_sample_rate)
> + l3missonly->ret = FD_ERROR;
> +
> + if (ibs_type == IBS_FETCH) {
> + fetch_prepare_attr(&attr, 0x800000000000000UL, l3missonly->freq,
> + l3missonly->sample_freq);
> + } else {
> + op_prepare_attr(&attr, 0x10000, l3missonly->freq,
> + l3missonly->sample_freq);
> + }
> +
> + /* CPU0, All processes */
> + fd = perf_event_open(&attr, -1, 0, -1, 0);
> + if (l3missonly->ret == FD_ERROR) {
> + if (fd != -1) {
> + close(fd);
> + return -1;
> + }
> + return 0;
> + }
> + if (fd == -1) {
> + pr_debug("perf_event_open() failed. [%m]\n");
> + return -1;
> + }
> +
> + rb = mmap(NULL, PERF_MMAP_TOTAL_SIZE, PROT_READ | PROT_WRITE,
> + MAP_SHARED, fd, 0);
> + if (rb == MAP_FAILED) {
> + pr_debug("mmap() failed. [%m]\n");
> + close(fd);
> + return -1;
> + }
> +
> + ioctl(fd, PERF_EVENT_IOC_RESET, 0);
> + ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
> +
> + dummy_workload_2(perf);
> +
> + ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
> +
> + ret = rb_drain_samples(rb, l3missonly->min_period, nr_samples, period_higher);
> +
> + munmap(rb, PERF_MMAP_TOTAL_SIZE);
> + close(fd);
> + return ret;
> +}
> +
> +static int ibs_l3missonly_test(char *perf)
> +{
> + int nr_samples = 0;
> + int ret = 0;
> + int r = 0;
> +
> + pr_debug("\nIBS L3MissOnly test: (takes a while)\n");
> + pr_debug("--------------------\n");
> +
> + if (perf_pmu__has_format(fetch_pmu, "l3missonly")) {
> + nr_samples = 0;
> + r = __ibs_l3missonly_test(perf, IBS_FETCH, &nr_samples, &fetch_l3missonly);
> + if (fetch_l3missonly.ret == FD_ERROR) {
> + pr_debug("Fetch L3MissOnly: %-4s\n", !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("Fetch L3MissOnly: %-4s (nr_samples: %d)\n",
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> + ret |= r;
> + }
> +
> + if (perf_pmu__has_format(op_pmu, "l3missonly")) {
> + nr_samples = 0;
> + r = __ibs_l3missonly_test(perf, IBS_OP, &nr_samples, &op_l3missonly);
> + if (op_l3missonly.ret == FD_ERROR) {
> + pr_debug("Op L3MissOnly: %-4s\n", !r ? "Ok" : "Fail");
> + } else {
> + /*
> + * Although nr_samples == 0 is reported as Fail here,
> + * the failure status is not cascaded up because we
> + * cannot decide whether the test really failed
> + * without actual samples.
> + */
> + pr_debug("Op L3MissOnly: %-4s (nr_samples: %d)\n",
> + (!r && nr_samples != 0) ? "Ok" : "Fail", nr_samples);
> + }
> + ret |= r;
> + }
> +
> + return ret;
> +}
> +
> +static unsigned int get_perf_event_max_sample_rate(void)
> +{
> + unsigned int max_sample_rate = 100000;
> + FILE *fp;
> + int ret;
> +
> + fp = fopen("/proc/sys/kernel/perf_event_max_sample_rate", "r");
> + if (!fp) {
> + pr_debug("Can't open perf_event_max_sample_rate. Assuming %d\n",
> + max_sample_rate);
> + goto out;
> + }
> +
> + ret = fscanf(fp, "%d", &max_sample_rate);
> + if (ret == EOF) {
> + pr_debug("Can't read perf_event_max_sample_rate. Assuming 100000\n");
> + max_sample_rate = 100000;
> + }
> + fclose(fp);
> +
> +out:
> + return max_sample_rate;
> +}
> +
> +int test__amd_ibs_period(struct test_suite *test __maybe_unused,
> + int subtest __maybe_unused)
> +{
> + char perf[PATH_MAX] = {'\0'};
> + int ret = TEST_OK;
> +
> + /*
> + * Reading perf_event_max_sample_rate only once _might_ cause some
> + * of the tests to fail if the kernel changes it after it is read here.
> + */
> + perf_event_max_sample_rate = get_perf_event_max_sample_rate();
> + fetch_pmu = perf_pmus__find("ibs_fetch");
> + op_pmu = perf_pmus__find("ibs_op");
> +
> + if (!x86__is_amd_cpu() || !fetch_pmu || !op_pmu)
> + return TEST_SKIP;
> +
> + perf_exe(perf, sizeof(perf));
> +
> + if (sched_affine(0))
> + return TEST_FAIL;
> +
> + /*
> + * Perf event can be opened in two modes:
> + * 1 Freq mode
> + * perf_event_attr->freq = 1, ->sample_freq = <frequency>
> + * 2 Sample period mode
> + * perf_event_attr->freq = 0, ->sample_period = <period>
> + *
> + * Instead of using the above interface, an IBS event in 'sample period
> + * mode' can also be opened by passing the <period> value directly in the
> + * MaxCnt bitfield of perf_event_attr->config. Test this IBS-specific
> + * interface.
> + */
> + if (ibs_config_test())
> + ret = TEST_FAIL;
> +
> + /*
> + * IBS Fetch and Op PMUs have HW constraints on minimum sample period.
> + * Also, the sample period value must be a multiple of 0x10. Test that the
> + * IBS driver honors HW constraints for various possible values in Freq
> + * as well as Sample Period mode IBS events.
> + */
> + if (ibs_period_constraint_test())
> + ret = TEST_FAIL;
> +
> + /*
> + * Test ioctl() with various sample period values for IBS event.
> + */
> + if (ibs_ioctl_test())
> + ret = TEST_FAIL;
> +
> + /*
> + * Test that opening a freq mode IBS event fails when the freq value
> + * is passed through ->config, not explicitly in ->sample_freq. Also
> + * use a high freq value (beyond perf_event_max_sample_rate) to test that
> + * the IBS driver does not bypass perf_event_max_sample_rate checks.
> + */
> + if (ibs_freq_neg_test())
> + ret = TEST_FAIL;
> +
> + /*
> + * L3MissOnly is a post-processing filter, i.e. IBS HW checks for L3
> + * Miss at the completion of the tagged uOp. The sample is discarded
> + * if the tagged uOp did not cause L3Miss. Also, IBS HW internally
> + * resets CurCnt to a small pseudo-random value and resumes counting.
> + * A new uOp is tagged once CurCnt reaches MaxCnt, and the process
> + * repeats until the tagged uOp causes an L3 Miss.
> + *
> + * With a freq mode event, the next sample period is calculated by
> + * the generic kernel code on every sample to achieve the desired
> + * sample frequency.
> + *
> + * Since the number of times HW internally resets CurCnt and the pseudo-
> + * random values of CurCnt for all those occurrences are not known to SW,
> + * the sample period adjustment by the kernel goes for a toss for freq
> + * mode IBS events. The kernel will set a very small period for the next
> + * sample if the window between the current and previous sample is too
> + * large due to multiple samples being discarded internally by IBS HW.
> + *
> + * Test that IBS sample period constraints are honored when L3MissOnly
> + * is ON.
> + */
> + if (ibs_l3missonly_test(perf))
> + ret = TEST_FAIL;
> +
> + return ret;
> +}
> diff --git a/tools/perf/arch/x86/tests/arch-tests.c b/tools/perf/arch/x86/tests/arch-tests.c
> index a216a5d172ed..bfee2432515b 100644
> --- a/tools/perf/arch/x86/tests/arch-tests.c
> +++ b/tools/perf/arch/x86/tests/arch-tests.c
> @@ -25,6 +25,7 @@ DEFINE_SUITE("x86 bp modify", bp_modify);
> #endif
> DEFINE_SUITE("x86 Sample parsing", x86_sample_parsing);
> DEFINE_SUITE("AMD IBS via core pmu", amd_ibs_via_core_pmu);
> +DEFINE_SUITE_EXCLUSIVE("AMD IBS sample period", amd_ibs_period);
> static struct test_case hybrid_tests[] = {
> TEST_CASE_REASON("x86 hybrid event parsing", hybrid, "not hybrid"),
> { .name = NULL, }
> @@ -50,6 +51,7 @@ struct test_suite *arch_tests[] = {
> #endif
> &suite__x86_sample_parsing,
> &suite__amd_ibs_via_core_pmu,
> + &suite__amd_ibs_period,
> &suite__hybrid,
> NULL,
> };
> --
> 2.43.0
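For anyone skimming the period tables quoted above, the expectations they encode can be summarised in a small sketch. This is only a model of what the test tables assert (round down to a multiple of 0x10 at open time, reject anything below the PMU minimum, and require an exact multiple in the ioctl path); it is not the IBS driver code. The helper and macro names are made up for illustration, and the minimums 0x10 (Fetch) and 0x90 (Op) are taken from the min_period values in the test.

```c
/* Hypothetical names; the values mirror the min_period fields in the test. */
#define IBS_PERIOD_GRANULARITY	0x10UL
#define IBS_FETCH_MIN_PERIOD	0x10UL
#define IBS_OP_MIN_PERIOD	0x90UL

/*
 * Model of the perf_event_open() expectations in sample period mode:
 * the requested period is rounded down to a multiple of 0x10 and the
 * open is expected to fail if the result is below the PMU minimum.
 * Returns 0 and stores the effective period, or -1 on expected failure.
 */
static int ibs_open_period(unsigned long requested, unsigned long min_period,
			   unsigned long *effective)
{
	unsigned long rounded = requested & ~(IBS_PERIOD_GRANULARITY - 1);

	if (rounded < min_period)
		return -1;

	*effective = rounded;
	return 0;
}

/*
 * Model of the PERF_EVENT_IOC_PERIOD expectations in sample period mode:
 * no rounding here, the value must already be a multiple of 0x10 and
 * at least the PMU minimum. Returns 1 if the ioctl is expected to succeed.
 */
static int ibs_ioctl_period_ok(unsigned long requested, unsigned long min_period)
{
	if (requested & (IBS_PERIOD_GRANULARITY - 1))
		return 0;

	return requested >= min_period;
}
```

With IBS_OP_MIN_PERIOD, ibs_open_period() maps 0x4d2 to 0x4d0 and rejects 0x8f, matching the corresponding rows of the table. Freq mode rows are not modelled, since there the kernel picks the period itself.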
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-29 20:55 ` Arnaldo Carvalho de Melo
@ 2025-04-30 1:13 ` Arnaldo Carvalho de Melo
2025-04-30 1:22 ` Arnaldo Carvalho de Melo
2025-04-30 6:36 ` Ravi Bangoria
2025-04-30 6:33 ` Ravi Bangoria
1 sibling, 2 replies; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 1:13 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Tue, Apr 29, 2025 at 05:55:13PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 29, 2025 at 03:59:38AM +0000, Ravi Bangoria wrote:
> > IBS Fetch and IBS Op PMUs have various constraints on supported sample
> > periods. Add perf unit tests to test those.
> >
> > Running it in parallel with other tests causes intermittent failures.
> > Mark it exclusive to force it to run sequentially. Sample output on a
> > Zen5 machine:
>
> I've applied the series and will test it now, but found some problems
> when building on some non-glibc systems, namely the use of PAGE_SIZE,
> which is defined in libc headers, even in glibc; it's just that with glibc
> we happen not to include the header where PAGE_SIZE gets redefined:
>
> ⬢ [acme@toolbx perf-tools-next]$ grep PAGE_SIZE /usr/include/sys/*.h
> /usr/include/sys/user.h:#define PAGE_SIZE (1UL << PAGE_SHIFT)
> /usr/include/sys/user.h:#define PAGE_MASK (~(PAGE_SIZE-1))
> /usr/include/sys/user.h:#define NBPG PAGE_SIZE
> ⬢ [acme@toolbx perf-tools-next]$
>
> So I folded the following patch, see if it is acceptable and please ack.
>
> Thanks for respinning it!
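For context on the PAGE_SIZE clash described in the quote: one generic way to sidestep it is to not define a PAGE_SIZE macro at all and query the page size at run time. The sketch below is not the patch that was folded in (that patch is not shown in this message); RB_DATA_PAGES and rb_total_size() are hypothetical names, and the value 32 is an arbitrary assumption.

```c
#include <unistd.h>

/* Hypothetical name, chosen so it cannot collide with PAGE_SIZE from <sys/user.h>. */
#define RB_DATA_PAGES	32L

/*
 * Total number of bytes to mmap() for a perf ring buffer: one metadata
 * page followed by RB_DATA_PAGES data pages. Returns -1 if the page
 * size cannot be queried.
 */
static long rb_total_size(void)
{
	long page = sysconf(_SC_PAGESIZE);

	if (page <= 0)
		return -1;

	return (RB_DATA_PAGES + 1) * page;
}
```

The point is only that nothing in the file defines PAGE_SIZE itself, so it no longer matters which libc header happens to be pulled in.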
Another issue when building with clang on musl:
arch/x86/tests/amd-ibs-period.c:81:3: error: no matching function for call to 'memcpy'
memcpy(func, insn1, sizeof(insn1));
^~~~~~
/usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument
void *memcpy (void *__restrict, const void *__restrict, size_t);
^
/usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument
_FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od,
^
arch/x86/tests/amd-ibs-period.c:87:3: error: no matching function for call to 'memcpy'
memcpy(func, insn2, sizeof(insn2));
^~~~~~
/usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument
void *memcpy (void *__restrict, const void *__restrict, size_t);
^
/usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument
_FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od,
^
2 errors generated.
CC /tmp/build/perf/ui/browsers/header.o
CC /tmp/build/perf/arch/x86/util/mem-events.o
Adding the patch below cures it, still need to test on a Zen 5 system.
These issues were just in the regression test.
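
For readers who have not looked at the test: the pattern involved is copying a few instruction bytes into an executable buffer through a function pointer and then calling it. Judging by the "candidate function not viable" notes, clang with the fortified string.h resolves memcpy() like an overloaded function, so the implicit conversion from 'int (*)(void)' to 'void *' is not accepted and the explicit cast is needed. A minimal stand-alone sketch of the same pattern follows; it is x86 only, the buffer setup and the patch_and_call() name are assumptions for illustration, and it returns -1 on other architectures or when an RWX mapping is not permitted.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on strict -std= builds */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write "mov $value, %eax; ret" into an executable page and call it.
 * Returns the value produced by the generated code, or -1 if the
 * mapping fails or the architecture is not x86 (so -1 is ambiguous
 * when value itself is -1).
 */
static int patch_and_call(int value)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned char insn[] = { 0xb8, 0x00, 0x00, 0x00, 0x00, 0xc3 };
	long page = sysconf(_SC_PAGESIZE);
	int (*func)(void);
	void *buf;
	int ret;

	if (page <= 0)
		return -1;

	/* Fill in the little-endian 32-bit immediate of the mov instruction. */
	memcpy(&insn[1], &value, sizeof(value));

	buf = mmap(NULL, page, PROT_READ | PROT_WRITE | PROT_EXEC,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	func = (int (*)(void))buf;

	/* The explicit cast is what the fix adds: func is not a 'void *'. */
	memcpy((void *)func, insn, sizeof(insn));
	ret = func();

	munmap(buf, page);
	return ret;
#else
	(void)value;
	return -1;
#endif
}
```

Converting between object and function pointers is not guaranteed by ISO C, although POSIX systems support it, so a pedantic build may still warn about the casts.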
- Arnaldo
diff --git a/tools/perf/arch/x86/tests/amd-ibs-period.c b/tools/perf/arch/x86/tests/amd-ibs-period.c
index 946b0a377554fb81..a198434da9b5c4a1 100644
--- a/tools/perf/arch/x86/tests/amd-ibs-period.c
+++ b/tools/perf/arch/x86/tests/amd-ibs-period.c
@@ -78,13 +78,13 @@ static int dummy_workload_1(unsigned long count)
else if (count > 10000000)
count = 10000000;
while (count--) {
- memcpy(func, insn1, sizeof(insn1));
+ memcpy((void *)func, insn1, sizeof(insn1));
if (func() != 1) {
pr_debug("ERROR insn1\n");
ret = -1;
goto out;
}
- memcpy(func, insn2, sizeof(insn2));
+ memcpy((void *)func, insn2, sizeof(insn2));
if (func() != 2) {
pr_debug("ERROR insn2\n");
ret = -1;
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 1:13 ` Arnaldo Carvalho de Melo
@ 2025-04-30 1:22 ` Arnaldo Carvalho de Melo
2025-04-30 9:02 ` Ravi Bangoria
2025-04-30 6:36 ` Ravi Bangoria
1 sibling, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 1:22 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Tue, Apr 29, 2025 at 10:13:53PM -0300, Arnaldo Carvalho de Melo wrote:
> Adding the patch below cures it, still need to test on a Zen 5 system.
>
> These issues were just in the regression test.
BTW, all is at the tmp.perf-tools-next branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
- Arnaldo
* Re: [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes)
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
` (3 preceding siblings ...)
2025-04-29 3:59 ` [PATCH v4 4/4] perf test amd ibs: Add sample period unit test Ravi Bangoria
@ 2025-04-30 2:00 ` Arnaldo Carvalho de Melo
2025-05-13 8:32 ` Ravi Bangoria
4 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 2:00 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Tue, Apr 29, 2025 at 03:59:34AM +0000, Ravi Bangoria wrote:
> IBS on Zen5:
> - Introduced Load Latency filtering capability.
> - Shows DTLB and page size information differently from prior generations.
>
> Kernel changes for these enhancements are already upstream. So, resending
> tools changes separately.
>
> Patches are prepared on perf-tools-next/perf-tools-next (85447f68a1e3).
>
> v3: https://lore.kernel.org/r/20250205060547.1337-1-ravi.bangoria@amd.com
> v3->v4:
> - Remove kernel changes.
> - Improve IBS sample period unit test
Preliminary tests with what is in tmp.perf-tools-next:
root@number:~# perf mem record find / > /dev/null
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.992 MB perf.data (31824 samples) ]
root@number:~# perf mem report -s mem --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 31K of event 'ibs_op//'
# Total weight : 66561
# Sort order : mem
#
# Overhead Samples Memory access
# ........ ............ .......................................
#
36.51% 456 L2 hit
30.26% 20141 N/A
16.75% 11149 L1 hit
10.08% 18 RAM hit
6.39% 52 L3 hit
0.01% 8 LFB/MAB hit
#
# (Tip: To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed))
#
root@number:~#
root@number:~# perf evlist -v
ibs_op//: type: 11 (ibs_op), size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~#
root@number:~# perf report --header-only | head -25
# ========
# captured on : Tue Apr 29 22:54:04 2025
# header version : 1
# data offset : 512
# data size : 668520
# feat offset : 669032
# hostname : number
# os release : 6.15.0-rc4+
# perf version : 6.15.rc2.g3e8278077117
# arch : x86_64
# nrcpus online : 32
# nrcpus avail : 32
# cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
# cpuid : AuthenticAMD,26,68,0
# total memory : 31928240 kB
# cmdline : /home/acme/bin/perf mem record find /
# event : name = ibs_op//, , id = { 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335 }, type = 11 (ibs_op), size = 136, config = 0, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, mmap_data = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, amd_df = 12, amd_iommu_0 = 15, amd_l3 = 13, amd_umc_0 = 14, breakpoint = 5, hwmon_amdgpu = 4294901761, hwmon_k10temp = 4294901762, hwmon_nvme = 4294901760, hwmon_r8169_0_e00_00 = 4294901763, ibs_fetch = 10, ibs_op = 11, kprobe = 8, msr = 16, power = 17, power_core = 18, software = 1, tool = 4294967294, tracepoint = 2, uprobe = 9
# CACHE info available, use -I to display
# time of first sample : 244.312475
# time of last sample : 246.801803
# sample duration : 2489.328 ms
# MEM_TOPOLOGY info available, use -I to display
root@number:~#
root@number:~# perf report | head
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'ibs_op//'
# Event count (approx.): 12948758501
#
# Overhead Command Shared Object Symbol
# ........ ....... ......................... ........................................
root@number:~# perf report | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'ibs_op//'
# Event count (approx.): 12948758501
#
# Overhead Command Shared Object Symbol
# ........ ....... ......................... ........................................
#
6.11% find [kernel.kallsyms] [k] btrfs_bin_search
4.91% find [kernel.kallsyms] [k] filldir64
4.77% find find [.] consider_visiting
3.95% find [kernel.kallsyms] [k] memcpy
2.76% find [kernel.kallsyms] [k] entry_SYSCALL_64
2.59% find libc.so.6 [.] __printf_buffer
2.52% find [kernel.kallsyms] [k] btrfs_getattr
2.09% find [kernel.kallsyms] [k] pid_delete_dentry
1.88% find libc.so.6 [.] msort_with_tmp.part.0
root@number:~#
root@number:~# perf annotate -v --stdio2 btrfs_bin_search
build id event received for [vdso]: 6dc5707510cc7434be3d6cb4dc6bae12881efda3 [20]
build id event received for /usr/bin/find: 3804e1e1214a39a975e093a79ec04961743ef5c5 [20]
build id event received for /usr/lib64/libc.so.6: 2b3c02fe7e4d3811767175b6f323692a10a4e116 [20]
build id event received for [kernel.kallsyms]: d391f0e79126801bc8a8f907e763de7979941712 [20]
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/6.15.0-rc4+/build/vmlinux for symbols
read_gnu_debugdata: using .gnu_debugdata of /usr/bin/find
symbol__disassemble: filename=/lib/modules/6.15.0-rc4+/build/vmlinux, sym=btrfs_bin_search, start=0xffffffffac97e890, end=0xffffffffac97ead9
annotating [0x2e87fbf0] /lib/modules/6.15.0-rc4+/build/vmlinux : [0x2fa7f070] btrfs_bin_search
Disassembled with llvm
Samples: 585 of event 'ibs_op//', 4000 Hz, Event count (approx.): 790819874, [percent: local period]
btrfs_bin_search() /lib/modules/6.15.0-rc4+/build/vmlinux
Percent 0xffffffff8197e890 <btrfs_bin_search>:
0.17 endbr64
→ callq __fentry__
0.16 pushq %r15
0.18 movq %rdx,%r15
pushq %r14
pushq %r13
0.18 pushq %r12
0.34 pushq %rbp
movl %esi,%ebp
pushq %rbx
0.35 movq %rdi,%rbx
subq $0x48,%rsp
movq (%rdi),%r9
movq %rcx,(%rsp)
0.34 movq %r9,%rdx
andl $0xfff,%edx
movq __stack_chk_guard,%r14
0.18 movq %r14,0x40(%rsp)
0.33 movl %esi,%r14d
0.17 movq 0x70(%rdi),%rsi
movq %rsi,%rax
subq vmemmap_base,%rax
sarq $0x6, %rax
0.17 shlq $0xc, %rax
0.15 addq page_offset_base,%rax
0.17 addq %rdx,%rax
movl 0x60(%rax),%r13d
0.17 cmpl %ebp,%r13d
→ jb btrfs_bin_search.cold
cmpb $0x1,0x64(%rax)
sbbl %r12d,%r12d
andl $-0x8,%r12d
0.18 addl $0x21,%r12d
cmpl %r13d,%r14d
↓ jae 20f
1.04 84: leal (%r14,%r13),%ebp
0.81 movb $0x0,0x3f(%rsp)
1.20 movslq 0xc(%rbx),%r10
0.66 movl $0xfff,%r11d
1.06 movq $0x0,0x2f(%rsp)
0.85 shrl %ebp
1.35 movq $0x0,0x37(%rsp)
1.93 movl %ebp,%eax
1.04 movq (%rsi),%rdx
2.20 imull %r12d,%eax
10.77 cltq
10.26 addq $0x65,%rax
3.11 addq %rax,%r9
0.68 andl $0x40,%edx
↓ je e3
movq 0x40(%rsi),%rsi
movl $0x1000,%r11d
movzbl %sil,%ecx
shlq %cl, %r11
subq $0x1,%r11
root@number:~#
I'll do more tests tomorrow and try some of the workloads that Joe uses.
Thanks a lot!
- Arnaldo
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-29 20:55 ` Arnaldo Carvalho de Melo
2025-04-30 1:13 ` Arnaldo Carvalho de Melo
@ 2025-04-30 6:33 ` Ravi Bangoria
1 sibling, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-30 6:33 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das,
Ravi Bangoria
On 30-Apr-25 2:25 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 29, 2025 at 03:59:38AM +0000, Ravi Bangoria wrote:
>> IBS Fetch and IBS Op PMUs have various constraints on supported sample
>> periods. Add perf unit tests to test those.
>>
>> Running it in parallel with other tests causes intermittent failures.
>> Mark it exclusive to force it to run sequentially. Sample output on a
>> Zen5 machine:
>
> I've applied the series and will test it now, but found some problems
> when building on some non-glibc systems, namely the use of PAGE_SIZE,
> which is defined in libc headers, even in glibc; it's just that with glibc
> we happen not to include the header where PAGE_SIZE gets redefined:
>
> ⬢ [acme@toolbx perf-tools-next]$ grep PAGE_SIZE /usr/include/sys/*.h
> /usr/include/sys/user.h:#define PAGE_SIZE (1UL << PAGE_SHIFT)
> /usr/include/sys/user.h:#define PAGE_MASK (~(PAGE_SIZE-1))
> /usr/include/sys/user.h:#define NBPG PAGE_SIZE
> ⬢ [acme@toolbx perf-tools-next]$
>
> So I folded the following patch, see if it is acceptable and please ack.
Thanks for the fix Arnaldo. It LGTM.
Ravi
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 1:13 ` Arnaldo Carvalho de Melo
2025-04-30 1:22 ` Arnaldo Carvalho de Melo
@ 2025-04-30 6:36 ` Ravi Bangoria
1 sibling, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-30 6:36 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das,
Ravi Bangoria
> Another issue when building with clang on musl:
>
> arch/x86/tests/amd-ibs-period.c:81:3: error: no matching function for call to 'memcpy'
> memcpy(func, insn1, sizeof(insn1));
> ^~~~~~
> /usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument
> void *memcpy (void *__restrict, const void *__restrict, size_t);
> ^
> /usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument
> _FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od,
> ^
> arch/x86/tests/amd-ibs-period.c:87:3: error: no matching function for call to 'memcpy'
> memcpy(func, insn2, sizeof(insn2));
> ^~~~~~
> /usr/include/string.h:27:7: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *' for 1st argument
> void *memcpy (void *__restrict, const void *__restrict, size_t);
> ^
> /usr/include/fortify/string.h:40:27: note: candidate function not viable: no known conversion from 'int (*)(void)' to 'void *const' for 1st argument
> _FORTIFY_FN(memcpy) void *memcpy(void * _FORTIFY_POS0 __od,
> ^
> 2 errors generated.
> CC /tmp/build/perf/ui/browsers/header.o
> CC /tmp/build/perf/arch/x86/util/mem-events.o
>
> Adding the patch below cures it, still need to test on a Zen 5 system.
Thanks for the fix. Looks good.
Ravi
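The patch referred to above is not reproduced in this message. As a rough
illustration of the class of fix, and assuming the failure comes from the
fortify headers declaring memcpy() in a way that makes clang apply overload
resolution (so a function pointer is not implicitly converted to 'void *'),
an explicit cast at the call site is one way to make the call resolve. The
names func_t and copy_insns() are hypothetical and are not taken from
amd-ibs-period.c:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical typedef matching the 'int (*)(void)' type in the error. */
typedef int (*func_t)(void);

/*
 * Hypothetical helper: copy instruction bytes to the address held in a
 * function pointer. The explicit (void *) cast is the point of the sketch;
 * it gives memcpy() an object pointer, so no implicit conversion from a
 * function pointer type is needed.
 */
static void copy_insns(func_t func, const unsigned char *insns, size_t len)
{
	assert(func && insns);
	memcpy((void *)func, insns, len);
}
```

Converting between function and object pointers is not guaranteed by ISO C,
but it is what POSIX systems rely on for dlsym(), so it should be fine for an
x86-only test.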
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 1:22 ` Arnaldo Carvalho de Melo
@ 2025-04-30 9:02 ` Ravi Bangoria
2025-04-30 13:06 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-30 9:02 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das,
Ravi Bangoria
On 30-Apr-25 6:52 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 29, 2025 at 10:13:53PM -0300, Arnaldo Carvalho de Melo wrote:
>> Adding the patch below cures it, still need to test on a Zen 5 system.
>>
>> These issues were just in the regression test.
>
> BTW, all is at the tmp.perf-tools-next branch at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
I tested with a few simple perf mem/c2c commands and they seem to be working
fine.
Thanks,
Ravi
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 9:02 ` Ravi Bangoria
@ 2025-04-30 13:06 ` Arnaldo Carvalho de Melo
2025-04-30 13:31 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 13:06 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Wed, Apr 30, 2025 at 02:32:12PM +0530, Ravi Bangoria wrote:
> On 30-Apr-25 6:52 AM, Arnaldo Carvalho de Melo wrote:
> > On Tue, Apr 29, 2025 at 10:13:53PM -0300, Arnaldo Carvalho de Melo wrote:
> >> Adding the patch below cures it, still need to test on a Zen 5 system.
> >> These issues were just in the regression test.
> > BTW, all is at the tmp.perf-tools-next branch at:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
> I tested with a few simple perf mem/c2c commands and they seem to be working
> fine.
Thanks for checking!
While testing it I noticed that locally built kernels using O= to
separate the build files from source are ending up with:
root@number:/home/acme/git/linux# readelf -wi ../build/v6.15.0-rc4+/vmlinux | grep -m1 DW_AT_comp_dir
<17> DW_AT_comp_dir : (indirect line string, offset: 0): /home/acme/git/build/v6.15.0-rc4+
root@number:/home/acme/git/linux# readelf -wi ../build/v6.15.0-rc4+/vmlinux | grep DW_AT_comp_dir | cut -d: -f4 | sort | uniq -c
49 /home/acme/git/build/v6.15.0-rc2+
3104 /home/acme/git/build/v6.15.0-rc4+
root@number:/home/acme/git/linux#
I reused a previous build dir, so some objects were apparently carried over
from the older build with their old DW_AT_comp_dir, and tools like perf
annotate and objdump -dS then can't find the sources.
A distraction while testing your patches :-\
- Arnaldo
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 13:06 ` Arnaldo Carvalho de Melo
@ 2025-04-30 13:31 ` Arnaldo Carvalho de Melo
2025-04-30 16:07 ` Ravi Bangoria
0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 13:31 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
On Wed, Apr 30, 2025 at 10:06:33AM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Apr 30, 2025 at 02:32:12PM +0530, Ravi Bangoria wrote:
> > On 30-Apr-25 6:52 AM, Arnaldo Carvalho de Melo wrote:
> > > On Tue, Apr 29, 2025 at 10:13:53PM -0300, Arnaldo Carvalho de Melo wrote:
> > >> Adding the patch below cures it, still need to test on a Zen 5 system.
>
> > >> These issues were just in the regression test.
>
> > > BTW, all is at the tmp.perf-tools-next branch at:
>
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
>
> > I tested with a few simple perf mem/c2c commands and they seem to be working
> > fine.
>
> Thanks for checking!
>
BTW:
root@number:/home/acme/git/linux# perf test -vvv ibs
73: AMD IBS via core pmu:
--- start ---
test child forked, pid 10047
Using CPUID AuthenticAMD-26-44-0
type: 0x0, config: 0x0, fd: 3 - Pass
type: 0x0, config: 0x1, fd: -1 - Pass
type: 0x4, config: 0x76, fd: 3 - Pass
type: 0x4, config: 0xc1, fd: 3 - Pass
type: 0x4, config: 0x12, fd: -1 - Pass
---- end(0) ----
73: AMD IBS via core pmu : Ok
root@number:/home/acme/git/linux#
- Arnaldo
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 13:31 ` Arnaldo Carvalho de Melo
@ 2025-04-30 16:07 ` Ravi Bangoria
2025-04-30 23:39 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-30 16:07 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Shukla, Santosh, Ananth Narayan, Sandipan Das, Ravi Bangoria
Hi Arnaldo,
> root@number:/home/acme/git/linux# perf test -vvv ibs
> 73: AMD IBS via core pmu:
> --- start ---
> test child forked, pid 10047
> Using CPUID AuthenticAMD-26-44-0
> type: 0x0, config: 0x0, fd: 3 - Pass
> type: 0x0, config: 0x1, fd: -1 - Pass
> type: 0x4, config: 0x76, fd: 3 - Pass
> type: 0x4, config: 0xc1, fd: 3 - Pass
> type: 0x4, config: 0x12, fd: -1 - Pass
> ---- end(0) ----
> 73: AMD IBS via core pmu : Ok
> root@number:/home/acme/git/linux#
It picks up both the IBS tests for me. (Is that what you mean?)
$ sudo ./perf test ibs
73: AMD IBS via core pmu : Ok
112: AMD IBS sample period : Ok
$ ./perf --version
perf version 6.15.rc2.g35db59fa8ea2
Thanks,
Ravi
* Re: [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump
2025-04-29 3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
@ 2025-04-30 16:58 ` Namhyung Kim
2025-04-30 17:45 ` Ravi Bangoria
0 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2025-04-30 16:58 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das
Hello,
On Tue, Apr 29, 2025 at 03:59:35AM +0000, Ravi Bangoria wrote:
> IBS OP PMU on Zen5 supports Load Latency filtering. Decode and dump Load
> Latency filtering related bits into perf script raw dump.
>
> Also add oneliner example in the perf-amd-ibs man page.
>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
> ---
> tools/perf/Documentation/perf-amd-ibs.txt | 9 +++++++++
> tools/perf/util/amd-sample-raw.c | 14 ++++++++++++--
> 2 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-amd-ibs.txt b/tools/perf/Documentation/perf-amd-ibs.txt
> index 2fd31d9d7b71..55f80beae037 100644
> --- a/tools/perf/Documentation/perf-amd-ibs.txt
> +++ b/tools/perf/Documentation/perf-amd-ibs.txt
> @@ -85,6 +85,15 @@ System-wide profile, uOps event, sampling period: 100000, L3MissOnly (Zen4 onwar
>
> # perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
>
> +System-wide profile, cycles event, sampling period: 100000, LdLat filtering (Zen5
> +onward)
> +
> + # perf record -e ibs_op/ldlat=128/ -c 100000 -a
> +
> + Supported load latency threshold values are 128 to 2048 (both inclusive).
What happens if the user gives an out-of-range value?
> + Latency value which is a multiple of 128 incurs a little less profiling
> + overhead compared to other values.
> +
> Per process(upstream v6.2 onward), uOps event, sampling period: 100000
>
> # perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
> diff --git a/tools/perf/util/amd-sample-raw.c b/tools/perf/util/amd-sample-raw.c
> index 9d0ce88e90e4..ac34b18ccc0c 100644
> --- a/tools/perf/util/amd-sample-raw.c
> +++ b/tools/perf/util/amd-sample-raw.c
> @@ -19,6 +19,7 @@
>
> static u32 cpu_family, cpu_model, ibs_fetch_type, ibs_op_type;
> static bool zen4_ibs_extensions;
> +static bool ldlat_cap;
>
> static void pr_ibs_fetch_ctl(union ibs_fetch_ctl reg)
> {
> @@ -78,14 +79,20 @@ static void pr_ic_ibs_extd_ctl(union ic_ibs_extd_ctl reg)
> static void pr_ibs_op_ctl(union ibs_op_ctl reg)
> {
> char l3_miss_only[sizeof(" L3MissOnly _")] = "";
> + char ldlat[sizeof(" LdLatThrsh __ LdLatEn _")] = "";
Shouldn't it reserve 4 characters for the threshold since it can be up
to 2048?
>
> if (zen4_ibs_extensions)
> snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
>
> - printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
> + if (ldlat_cap) {
> + snprintf(ldlat, sizeof(ldlat), " LdLatThrsh %2d LdLatEn %d",
Here, it would be %4d.
Thanks,
Namhyung
> + reg.ldlat_thrsh, reg.ldlat_en);
> + }
> +
> + printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d%s\n",
> reg.val, ((reg.opmaxcnt_ext << 16) | reg.opmaxcnt) << 4, l3_miss_only,
> reg.op_en, reg.op_val, reg.cnt_ctl,
> - reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt);
> + reg.cnt_ctl ? "uOps" : "cycles", reg.opcurcnt, ldlat);
> }
>
> static void pr_ibs_op_data(union ibs_op_data reg)
> @@ -331,6 +338,9 @@ bool evlist__has_amd_ibs(struct evlist *evlist)
> if (perf_env__find_pmu_cap(env, "ibs_op", "zen4_ibs_extensions"))
> zen4_ibs_extensions = 1;
>
> + if (perf_env__find_pmu_cap(env, "ibs_op", "ldlat"))
> + ldlat_cap = 1;
> +
> if (ibs_fetch_type || ibs_op_type) {
> if (!cpu_family)
> parse_cpuid(env);
> --
> 2.43.0
>
* Re: [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump
2025-04-30 16:58 ` Namhyung Kim
@ 2025-04-30 17:45 ` Ravi Bangoria
0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-04-30 17:45 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das,
Ravi Bangoria
Hi Namhyung,
>> + # perf record -e ibs_op/ldlat=128/ -c 100000 -a
>> +
>> + Supported load latency threshold values are 128 to 2048 (both inclusive).
>
> What happens if the user gives an out-of-range value?
The kernel returns an error.
>> static void pr_ibs_op_ctl(union ibs_op_ctl reg)
>> {
>> char l3_miss_only[sizeof(" L3MissOnly _")] = "";
>> + char ldlat[sizeof(" LdLatThrsh __ LdLatEn _")] = "";
>
> Shouldn't it reserve 4 characters for the threshold since it can be up
> to 2048?
This function dumps the raw HW register content, not the threshold the user
asked for. IBS_OP_CTL[LdLatThrsh] is a 4-bit field which should be programmed
as:
    (actual threshold / 128) - 1
Valid values for LdLatThrsh are therefore 0 to 15 (128 maps to 0, 2048 to 15).
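To make the encoding concrete, here is a minimal sketch of the conversion
described above. The helper name ldlat_to_thrsh() and the LDLAT_* macros are
hypothetical, and rounding non-multiples of 128 down is an assumption of this
sketch, not something stated in this thread:

```c
#include <assert.h>
#include <errno.h>

#define LDLAT_MIN	128	/* lowest supported threshold */
#define LDLAT_MAX	2048	/* highest supported threshold */
#define LDLAT_UNIT	128	/* one LdLatThrsh step */

/*
 * Hypothetical helper: map a user-visible load latency threshold to the
 * 4-bit LdLatThrsh encoding, i.e. (threshold / 128) - 1.
 * Returns -EINVAL for values outside 128..2048.
 * Assumption: values that are not a multiple of 128 are rounded down.
 */
static int ldlat_to_thrsh(unsigned int ldlat)
{
	int thrsh;

	if (ldlat < LDLAT_MIN || ldlat > LDLAT_MAX)
		return -EINVAL;

	thrsh = (int)(ldlat / LDLAT_UNIT) - 1;
	assert(thrsh >= 0 && thrsh <= 15);	/* must fit the 4-bit field */
	return thrsh;
}
```

With this, ldlat=128 encodes as 0 and ldlat=2048 as 15, which is why the raw
dump never needs more than two digits for the field.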
>> if (zen4_ibs_extensions)
>> snprintf(l3_miss_only, sizeof(l3_miss_only), " L3MissOnly %d", reg.l3_miss_only);
>>
>> - printf("ibs_op_ctl:\t%016llx MaxCnt %9d%s En %d Val %d CntCtl %d=%s CurCnt %9d\n",
>> + if (ldlat_cap) {
>> + snprintf(ldlat, sizeof(ldlat), " LdLatThrsh %2d LdLatEn %d",
>
> Here, it would be %4d.
Since the valid values for LdLatThrsh are 0 to 15, two characters are
sufficient.
Thanks,
Ravi
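The buffer sizing argument can also be checked mechanically. The sketch below
reuses the template string from the patch to size the buffer and reports
whether a given field value would fit. The helper name ldlat_fits() and the
LDLAT_TEMPLATE macro are hypothetical, and %u is used here in place of the
patch's %d because the sketch takes plain unsigned ints rather than bitfields:

```c
#include <assert.h>
#include <stdio.h>

/* Same template the patch uses to size the buffer: '_' marks one digit. */
#define LDLAT_TEMPLATE " LdLatThrsh __ LdLatEn _"

/*
 * Hypothetical helper: format the two fields the way pr_ibs_op_ctl() does
 * and return 1 if the result fits in a buffer sized from the template,
 * 0 if snprintf() had to truncate it.
 */
static int ldlat_fits(unsigned int thrsh, unsigned int en)
{
	char buf[sizeof(LDLAT_TEMPLATE)];
	int n = snprintf(buf, sizeof(buf), " LdLatThrsh %2u LdLatEn %u", thrsh, en);

	assert(n >= 0);	/* snprintf() only fails on an output error */
	return (size_t)n < sizeof(buf);
}
```

Any register value from 0 to 15 fits, while a raw threshold such as 2048
would not, so the two-character reservation is only correct because the
encoded field is printed rather than the threshold itself.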
* Re: [PATCH v4 4/4] perf test amd ibs: Add sample period unit test
2025-04-30 16:07 ` Ravi Bangoria
@ 2025-04-30 23:39 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-04-30 23:39 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang,
linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
Shukla, Santosh, Ananth Narayan, Sandipan Das
On Wed, Apr 30, 2025 at 09:37:22PM +0530, Ravi Bangoria wrote:
> Hi Arnaldo,
>
> > root@number:/home/acme/git/linux# perf test -vvv ibs
> > 73: AMD IBS via core pmu:
> > --- start ---
> > test child forked, pid 10047
> > Using CPUID AuthenticAMD-26-44-0
> > type: 0x0, config: 0x0, fd: 3 - Pass
> > type: 0x0, config: 0x1, fd: -1 - Pass
> > type: 0x4, config: 0x76, fd: 3 - Pass
> > type: 0x4, config: 0xc1, fd: 3 - Pass
> > type: 0x4, config: 0x12, fd: -1 - Pass
> > ---- end(0) ----
> > 73: AMD IBS via core pmu : Ok
> > root@number:/home/acme/git/linux#
>
> It picks up both the IBS tests for me. (Is that what you mean?)
>
> $ sudo ./perf test ibs
> 73: AMD IBS via core pmu : Ok
> 112: AMD IBS sample period : Ok
>
> $ ./perf --version
> perf version 6.15.rc2.g35db59fa8ea2
Are there two? I probably tested it with just the first patch on your
series applied, lemme see...
The second takes quite a while to finish :)
root@number:/# perf test ibs
73: AMD IBS via core pmu : Ok
112: AMD IBS sample period : Ok
root@number:/#
root@number:/# perf stat --null perf test 73
73: AMD IBS via core pmu : Ok
Performance counter stats for 'perf test 73':
0.018951061 seconds time elapsed
0.004640000 seconds user
0.005627000 seconds sys
root@number:/# perf stat --null perf test 112
112: AMD IBS sample period : Ok
Performance counter stats for 'perf test 112':
17.888279280 seconds time elapsed
1.539229000 seconds user
16.325610000 seconds sys
root@number:/#
But yeah, both are passing on this 9950x3d.
- Arnaldo
* Re: [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes)
2025-04-30 2:00 ` [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Arnaldo Carvalho de Melo
@ 2025-05-13 8:32 ` Ravi Bangoria
0 siblings, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2025-05-13 8:32 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ingo Molnar, Namhyung Kim, Peter Zijlstra, Joe Mario,
Stephane Eranian, Jiri Olsa, Ian Rogers, Kan Liang, linux-kernel,
linux-perf-users, Santosh Shukla, Ananth Narayan, Sandipan Das,
Ravi Bangoria
On 30-Apr-25 7:30 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 29, 2025 at 03:59:34AM +0000, Ravi Bangoria wrote:
>> IBS on Zen5:
>> - Introduced Load Latency filtering capability.
>> - Shows DTLB and page size information differently from prior generations.
>>
>> Kernel changes for these enhancements are already upstream. So, resending
>> tools changes separately.
>>
>> Patches are prepared on perf-tools-next/perf-tools-next (85447f68a1e3).
>>
>> v3: https://lore.kernel.org/r/20250205060547.1337-1-ravi.bangoria@amd.com
>> v3->v4:
>> - Remove kernel changes.
>> - Improve IBS sample period unit test
>
> Preliminary tests with what is in tmp.perf-tools-next:
[...]
> I'll do more tests tomorrow and try some of the workloads that Joe uses.
Gentle ping, Arnaldo!
Thanks,
Ravi
end of thread, other threads:[~2025-05-13 8:32 UTC | newest]
Thread overview: 19+ messages
2025-04-29 3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
2025-04-30 16:58 ` Namhyung Kim
2025-04-30 17:45 ` Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 2/4] perf amd ibs: Incorporate Zen5 DTLB and PageSize information Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 3/4] perf mem/c2c amd: Add ldlat support Ravi Bangoria
2025-04-29 3:59 ` [PATCH v4 4/4] perf test amd ibs: Add sample period unit test Ravi Bangoria
2025-04-29 20:55 ` Arnaldo Carvalho de Melo
2025-04-30 1:13 ` Arnaldo Carvalho de Melo
2025-04-30 1:22 ` Arnaldo Carvalho de Melo
2025-04-30 9:02 ` Ravi Bangoria
2025-04-30 13:06 ` Arnaldo Carvalho de Melo
2025-04-30 13:31 ` Arnaldo Carvalho de Melo
2025-04-30 16:07 ` Ravi Bangoria
2025-04-30 23:39 ` Arnaldo Carvalho de Melo
2025-04-30 6:36 ` Ravi Bangoria
2025-04-30 6:33 ` Ravi Bangoria
2025-04-30 2:00 ` [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Arnaldo Carvalho de Melo
2025-05-13 8:32 ` Ravi Bangoria