* [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
@ 2024-10-01 0:20 Namhyung Kim
2024-10-01 0:20 ` [PATCH 1/8] perf tools: Add fallback for exclude_guest Namhyung Kim
` (10 more replies)
0 siblings, 11 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
Hello,
I found perf tools set exclude_guest bit inconsistently. It used to
set the bit but now the default event for perf record doesn't. So I'm
wondering why we want the bit in the first place.
Actually it's not good for PMUs don't support any exclusion like AMD
IBS because it disables new features after the exclude_guest due to
the missing feature detection logic.
v4 changes)
* handle EOPNOTSUPP error in compatible way (Kan)
* drop --exclude-guest option in perf stat
* not to separate exclude_hv fallback
* rename to exclude_GH_default (Kan)
* drop the RFC from the subject
v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
* move exclude_guest fallback to the front
* fix precise_max handling on AMD
* simplify the default event for perf record
v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
* update the missing feature detection logic
* separate exclude_hv fallback
* add new fallback for exclude_guest
v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
AFAIK it doesn't matter for the most cases but perf kvm. If users
need to set the bit, they can still use :H modifier. For vPMU pass-
through or Apple M1, it'd add the exclude_guest during the fallback
logic.
Also the kernel feature detection logic should be separated from the
exclude bit tests since it depends on the PMU implementation rather
than the core kernel features. So I changed it to use a software
event for the detection and factor out some hw-specific checks.
The code is available at 'perf/exclude-v4' branch in
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (8):
perf tools: Add fallback for exclude_guest
perf tools: Don't set attr.exclude_guest by default
perf tools: Simplify evsel__add_modifier()
perf tools: Do not set exclude_guest for precise_ip
perf tools: Detect missing kernel features properly
perf tools: Move x86__is_amd_cpu() to util/env.c
perf tools: Check fallback error and order
perf record: Just use "cycles:P" as the default event
tools/perf/arch/x86/util/Build | 1 -
tools/perf/arch/x86/util/env.c | 19 -
tools/perf/arch/x86/util/env.h | 7 -
tools/perf/arch/x86/util/pmu.c | 2 +-
tools/perf/builtin-kvm.c | 1 +
tools/perf/builtin-record.c | 4 +-
tools/perf/builtin-stat.c | 18 +-
tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
tools/perf/tests/parse-events.c | 30 +-
tools/perf/util/env.c | 24 ++
tools/perf/util/env.h | 4 +
tools/perf/util/evsel.c | 394 ++++++++++++++------
tools/perf/util/evsel.h | 1 -
tools/perf/util/parse-events.c | 6 +-
tools/perf/util/util.c | 10 +-
tools/perf/util/util.h | 3 +
18 files changed, 364 insertions(+), 166 deletions(-)
delete mode 100644 tools/perf/arch/x86/util/env.c
delete mode 100644 tools/perf/arch/x86/util/env.h
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/8] perf tools: Add fallback for exclude_guest
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 17:41 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default Namhyung Kim
` (9 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, James Clark
Commit 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default")
changed to parse "cycles:P" event instead of creating a new cycles
event for perf record. But it also changed the way how modifiers are
handled so it doesn't set the exclude_guest bit by default.
It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP
if not. Let's add a fallback so that it can work with default events.
Fixes: 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default")
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-stat.c | 18 +++++++++++++++---
tools/perf/util/evsel.c | 21 +++++++++++++++++++++
2 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1521b6df26065ccf..fd9ea15f6b1c0809 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -639,8 +639,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
* (behavior changed with commit b0a873e).
*/
if (errno == EINVAL || errno == ENOSYS ||
- errno == ENOENT || errno == EOPNOTSUPP ||
- errno == ENXIO) {
+ errno == ENOENT || errno == ENXIO) {
if (verbose > 0)
ui__warning("%s event is not supported by the kernel.\n",
evsel__name(counter));
@@ -658,7 +657,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
if (verbose > 0)
ui__warning("%s\n", msg);
return COUNTER_RETRY;
- } else if (target__has_per_thread(&target) &&
+ } else if (target__has_per_thread(&target) && errno != EOPNOTSUPP &&
evsel_list->core.threads &&
evsel_list->core.threads->err_thread != -1) {
/*
@@ -679,6 +678,19 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
return COUNTER_SKIP;
}
+ if (errno == EOPNOTSUPP) {
+ if (verbose > 0) {
+ ui__warning("%s event is not supported by the kernel.\n",
+ evsel__name(counter));
+ }
+ counter->supported = false;
+ counter->errored = true;
+
+ if ((evsel__leader(counter) != counter) ||
+ !(counter->core.leader->nr_members > 1))
+ return COUNTER_SKIP;
+ }
+
evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
ui__error("%s\n", msg);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index da0bada62140d9b0..0ddd77c139e89a2e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3259,6 +3259,27 @@ bool evsel__fallback(struct evsel *evsel, struct target *target, int err,
evsel->core.attr.exclude_kernel = 1;
evsel->core.attr.exclude_hv = 1;
+ return true;
+ } else if (err == EOPNOTSUPP && !evsel->core.attr.exclude_guest &&
+ !evsel->exclude_GH) {
+ const char *name = evsel__name(evsel);
+ char *new_name;
+ const char *sep = ":";
+
+ /* Is there already the separator in the name. */
+ if (strchr(name, '/') ||
+ (strchr(name, ':') && !evsel->is_libpfm_event))
+ sep = "";
+
+ if (asprintf(&new_name, "%s%sH", name, sep) < 0)
+ return false;
+
+ free(evsel->name);
+ evsel->name = new_name;
+ /* Apple M1 requires exclude_guest */
+ scnprintf(msg, msgsize, "trying to fall back to excluding guest samples");
+ evsel->core.attr.exclude_guest = 1;
+
return true;
}
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
2024-10-01 0:20 ` [PATCH 1/8] perf tools: Add fallback for exclude_guest Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 17:43 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 3/8] perf tools: Simplify evsel__add_modifier() Namhyung Kim
` (8 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, James Clark
The exclude_guest in the event attribute is to limit profiling in the
host environment. But I'm not sure why we want to set it by default
cause we don't care about it in most cases and I feel like it just
makes new PMU implementation complicated.
Of course it's useful for perf kvm command so I added the
exclude_GH_default variable to preserve the old behavior for perf kvm
and other commands like perf record and stat won't set the exclude bit.
This is helpful for AMD IBS case since having exclude_guest bit will
clear new feature bit due to the missing feature check logic.
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 0
$ perf record -W -e ibs_op// -vv true 2>&1 | grep switching
switching off PERF_FORMAT_LOST support
switching off weight struct support
switching off bpf_event
switching off ksymbol
switching off cloexec flag
switching off mmap2
switching off exclude_guest, exclude_host
Intestingly, I found it sets the exclude_bit if "u" modifier is used.
I don't know why but it's neither intuitive nor consistent. Let's
remove the bit there too.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-kvm.c | 1 +
tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
tools/perf/tests/parse-events.c | 18 +++++++++---------
tools/perf/util/parse-events.c | 2 +-
tools/perf/util/util.c | 10 ++++++++--
tools/perf/util/util.h | 3 +++
6 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 55ea17c5ff02acf7..099ce3ebf67ce6ee 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -2147,6 +2147,7 @@ int cmd_kvm(int argc, const char **argv)
"buildid-list", "stat", NULL };
const char *kvm_usage[] = { NULL, NULL };
+ exclude_GH_default = true;
perf_host = 0;
perf_guest = 1;
diff --git a/tools/perf/tests/attr/test-record-dummy-C0 b/tools/perf/tests/attr/test-record-dummy-C0
index 576ec48b3aafaa6a..8ce6f0a5df5b7013 100644
--- a/tools/perf/tests/attr/test-record-dummy-C0
+++ b/tools/perf/tests/attr/test-record-dummy-C0
@@ -37,7 +37,7 @@ precise_ip=0
mmap_data=0
sample_id_all=1
exclude_host=0
-exclude_guest=1
+exclude_guest=0
exclude_callchain_kernel=0
exclude_callchain_user=0
mmap2=1
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 78e999f03d2d75f4..727683f249f66f5a 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -932,7 +932,7 @@ static int test__group2(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
@@ -947,7 +947,7 @@ static int test__group2(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
if (evsel__has_leader(evsel, leader))
@@ -1072,7 +1072,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
@@ -1222,7 +1222,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
@@ -1437,7 +1437,7 @@ static int test__leader_sample1(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
@@ -1453,7 +1453,7 @@ static int test__leader_sample1(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
@@ -1468,7 +1468,7 @@ static int test__leader_sample1(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
@@ -1497,7 +1497,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
@@ -1513,7 +1513,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index e96cf13dc396193f..ff67213d6e887169 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1776,7 +1776,7 @@ static int parse_events__modifier_list(struct parse_events_state *parse_state,
if (mod.user) {
if (!exclude)
exclude = eu = ek = eh = 1;
- if (!exclude_GH && !perf_guest)
+ if (!exclude_GH && !perf_guest && exclude_GH_default)
eG = 1;
eu = 0;
}
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 9d55a13787ce3c05..280c86d61d8a7956 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -78,17 +78,23 @@ bool sysctl__nmi_watchdog_enabled(void)
bool test_attr__enabled;
+bool exclude_GH_default;
+
bool perf_host = true;
bool perf_guest = false;
void event_attr_init(struct perf_event_attr *attr)
{
+ /* to capture ABI version */
+ attr->size = sizeof(*attr);
+
+ if (!exclude_GH_default)
+ return;
+
if (!perf_host)
attr->exclude_host = 1;
if (!perf_guest)
attr->exclude_guest = 1;
- /* to capture ABI version */
- attr->size = sizeof(*attr);
}
int mkdir_p(char *path, mode_t mode)
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 9966c21aaf048479..4920e102ff54879a 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -21,6 +21,9 @@ extern const char perf_more_info_string[];
extern const char *input_name;
+/* This will control if perf_{host,guest} will set attr.exclude_{host,guest}. */
+extern bool exclude_GH_default;
+
extern bool perf_host;
extern bool perf_guest;
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 3/8] perf tools: Simplify evsel__add_modifier()
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
2024-10-01 0:20 ` [PATCH 1/8] perf tools: Add fallback for exclude_guest Namhyung Kim
2024-10-01 0:20 ` [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 17:44 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip Namhyung Kim
` (7 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
Since it doesn't set the exclude_guest, no need to special handle the
bit and simply show only if one of host or guest bit is set. Now the
default event name might not have :H prefix anymore so change the
dlfilter test not to compare the ":" at the end.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
tools/perf/util/evsel.c | 5 +----
3 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/tools/perf/dlfilters/dlfilter-test-api-v0.c b/tools/perf/dlfilters/dlfilter-test-api-v0.c
index 4083b1abeaabe605..4ca2d7b2ea6c8200 100644
--- a/tools/perf/dlfilters/dlfilter-test-api-v0.c
+++ b/tools/perf/dlfilters/dlfilter-test-api-v0.c
@@ -220,7 +220,7 @@ static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample
CHECK_SAMPLE(raw_callchain_nr);
CHECK(!sample->raw_callchain);
-#define EVENT_NAME "branches:"
+#define EVENT_NAME "branches"
CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME)));
return 0;
diff --git a/tools/perf/dlfilters/dlfilter-test-api-v2.c b/tools/perf/dlfilters/dlfilter-test-api-v2.c
index 32ff619e881caa50..00d73a16c4fdaece 100644
--- a/tools/perf/dlfilters/dlfilter-test-api-v2.c
+++ b/tools/perf/dlfilters/dlfilter-test-api-v2.c
@@ -235,7 +235,7 @@ static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample
CHECK_SAMPLE(raw_callchain_nr);
CHECK(!sample->raw_callchain);
-#define EVENT_NAME "branches:"
+#define EVENT_NAME "branches"
CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME)));
return 0;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0ddd77c139e89a2e..f202d28147d62a44 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -545,7 +545,6 @@ static int evsel__add_modifiers(struct evsel *evsel, char *bf, size_t size)
{
int colon = 0, r = 0;
struct perf_event_attr *attr = &evsel->core.attr;
- bool exclude_guest_default = false;
#define MOD_PRINT(context, mod) do { \
if (!attr->exclude_##context) { \
@@ -557,17 +556,15 @@ static int evsel__add_modifiers(struct evsel *evsel, char *bf, size_t size)
MOD_PRINT(kernel, 'k');
MOD_PRINT(user, 'u');
MOD_PRINT(hv, 'h');
- exclude_guest_default = true;
}
if (attr->precise_ip) {
if (!colon)
colon = ++r;
r += scnprintf(bf + r, size - r, "%.*s", attr->precise_ip, "ppp");
- exclude_guest_default = true;
}
- if (attr->exclude_host || attr->exclude_guest == exclude_guest_default) {
+ if (attr->exclude_host || attr->exclude_guest) {
MOD_PRINT(host, 'H');
MOD_PRINT(guest, 'G');
}
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (2 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 3/8] perf tools: Simplify evsel__add_modifier() Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 17:46 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 5/8] perf tools: Detect missing kernel features properly Namhyung Kim
` (6 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
It seems perf sets the exclude_guest bit because of Intel PEBS
implementation which uses a virtual address. IIUC now kernel disables
PEBS when it goes to the guest mode regardless of this bit so we don't
need to set it explicitly. At least for the other archs/vendors.
I found the commit 1342798cc13e set the exclude_guest for precise_ip
in the tool and the commit 20b279ddb38c added kernel side enforcement
which was reverted by commit a706d965dcfd later.
Actually it doesn't set the exclude_guest for the default event
(cycles:P) already.
$ grep -m1 vendor /proc/cpuinfo
vendor_id : GenuineIntel
$ perf record -e cycles:P true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
$ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
precise_ip: 3
But having lower 'p' modifier set the bit for some reason.
$ perf record -e cycles:pp true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
$ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
precise_ip: 2
exclude_guest: 1
Actually AMD IBS suffers from this because it doesn't support excludes
and having this bit effectively disables new features in the current
implementation (due to the missing feature check).
$ grep -m1 vendor /proc/cpuinfo
vendor_id : AuthenticAMD
$ perf record -W -e cycles:p -vv true 2>&1 | grep switching
switching off PERF_FORMAT_LOST support
switching off weight struct support
switching off bpf_event
switching off ksymbol
switching off cloexec flag
switching off mmap2
switching off exclude_guest, exclude_host
By not setting exclude_guest, we can fix this inconsistency and the
troubles.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/tests/parse-events.c | 12 ++++--------
tools/perf/util/parse-events.c | 4 ----
2 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
index 727683f249f66f5a..82a19674a38f774e 100644
--- a/tools/perf/tests/parse-events.c
+++ b/tools/perf/tests/parse-events.c
@@ -898,8 +898,7 @@ static int test__group1(struct evlist *evlist)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- /* use of precise requires exclude_guest */
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 2);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
@@ -1016,9 +1015,8 @@ static int test__group3(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_kernel",
!evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- /* use of precise requires exclude_guest */
TEST_ASSERT_VAL("wrong exclude guest",
- evsel->core.attr.exclude_guest);
+ !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host",
!evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip",
@@ -1103,8 +1101,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- /* use of precise requires exclude_guest */
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 1);
TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
@@ -1122,8 +1119,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
- /* use of precise requires exclude_guest */
- TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
+ TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 2);
TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index ff67213d6e887169..63da105ba914ef29 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1769,10 +1769,6 @@ static int parse_events__modifier_list(struct parse_events_state *parse_state,
int exclude = eu | ek | eh;
int exclude_GH = group ? evsel->exclude_GH : 0;
- if (mod.precise) {
- /* use of precise requires exclude_guest */
- eG = 1;
- }
if (mod.user) {
if (!exclude)
exclude = eu = ek = eh = 1;
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 5/8] perf tools: Detect missing kernel features properly
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (3 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 17:53 ` Ian Rogers
2024-10-15 4:19 ` Ravi Bangoria
2024-10-01 0:20 ` [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c Namhyung Kim
` (5 subsequent siblings)
10 siblings, 2 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
The evsel__detect_missing_features() is to check if the attributes of
the evsel is supported or not. But it checks the attribute based on the
given evsel, it might miss something if the attr doesn't have the bit or
give incorrect results if the event is special.
Also it maintains the order of the feature that was added to the kernel
which means it can assume older features should be supported once it
detects the current feature is working. To minimized the confusion and
to accurately check the kernel features, I think it's better to use a
software event and go through all the features at once.
Also make the function static since it's only used in evsel.c.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/evsel.c | 345 +++++++++++++++++++++++++++++-----------
tools/perf/util/evsel.h | 1 -
2 files changed, 249 insertions(+), 97 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f202d28147d62a44..32e30c293d0c6198 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -20,6 +20,7 @@
#include <linux/zalloc.h>
#include <sys/ioctl.h>
#include <sys/resource.h>
+#include <sys/syscall.h>
#include <sys/types.h>
#include <dirent.h>
#include <stdlib.h>
@@ -2150,120 +2151,272 @@ int evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
return err;
}
-bool evsel__detect_missing_features(struct evsel *evsel)
+static bool has_attr_feature(struct perf_event_attr *attr, unsigned long flags)
{
+ int fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
+ /*group_fd=*/-1, flags);
+ close(fd);
+
+ if (fd < 0) {
+ attr->exclude_kernel = 1;
+
+ fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
+ /*group_fd=*/-1, flags);
+ close(fd);
+ }
+
+ if (fd < 0) {
+ attr->exclude_hv = 1;
+
+ fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
+ /*group_fd=*/-1, flags);
+ close(fd);
+ }
+
+ if (fd < 0) {
+ attr->exclude_guest = 1;
+
+ fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
+ /*group_fd=*/-1, flags);
+ close(fd);
+ }
+
+ attr->exclude_kernel = 0;
+ attr->exclude_guest = 0;
+ attr->exclude_hv = 0;
+
+ return fd >= 0;
+}
+
+static void evsel__detect_missing_brstack_features(struct evsel *evsel)
+{
+ static bool detection_done = false;
+ struct perf_event_attr attr = {
+ .type = evsel->core.attr.type,
+ .config = evsel->core.attr.config,
+ .disabled = 1,
+ .sample_type = PERF_SAMPLE_BRANCH_STACK,
+ .sample_period = 1000,
+ };
+ int old_errno;
+
+ if (detection_done)
+ return;
+
+ old_errno = errno;
+
/*
* Must probe features in the order they were added to the
- * perf_event_attr interface.
+ * perf_event_attr interface. These are PMU specific limitation
+ * so we can detect with the given hardware event and stop on the
+ * first one succeeded.
*/
- if (!perf_missing_features.branch_counters &&
- (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
- perf_missing_features.branch_counters = true;
- pr_debug2("switching off branch counters support\n");
- return true;
- } else if (!perf_missing_features.read_lost &&
- (evsel->core.attr.read_format & PERF_FORMAT_LOST)) {
- perf_missing_features.read_lost = true;
- pr_debug2("switching off PERF_FORMAT_LOST support\n");
+
+ /* Please add new feature detection here. */
+
+ attr.branch_sample_type = PERF_SAMPLE_BRANCH_COUNTERS;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.branch_counters = true;
+ pr_debug2("switching off branch counters support\n");
+
+ attr.branch_sample_type = PERF_SAMPLE_BRANCH_HW_INDEX;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.branch_hw_idx = true;
+ pr_debug2("switching off branch HW index support\n");
+
+ attr.branch_sample_type = PERF_SAMPLE_BRANCH_NO_CYCLES | PERF_SAMPLE_BRANCH_NO_FLAGS;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.lbr_flags = true;
+ pr_debug2_peo("switching off branch sample type no (cycles/flags)\n");
+
+found:
+ detection_done = true;
+ errno = old_errno;
+}
+
+static bool evsel__detect_missing_features(struct evsel *evsel)
+{
+ static bool detection_done = false;
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_SOFTWARE,
+ .config = PERF_COUNT_SW_TASK_CLOCK,
+ .disabled = 1,
+ };
+ int old_errno;
+
+ if (evsel__has_br_stack(evsel))
+ evsel__detect_missing_brstack_features(evsel);
+
+ if (detection_done)
+ goto check;
+
+ old_errno = errno;
+
+ /*
+ * Must probe features in the order they were added to the
+ * perf_event_attr interface. These are kernel core limitation
+ * not PMU-specific so we can detect with a software event and
+ * stop on the first one succeeded.
+ */
+
+ /* Please add new feature detection here. */
+
+ attr.read_format = PERF_FORMAT_LOST;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.read_lost = true;
+ pr_debug2("switching off PERF_FORMAT_LOST support\n");
+ attr.read_format = 0;
+
+ attr.sample_type = PERF_SAMPLE_WEIGHT_STRUCT;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.weight_struct = true;
+ pr_debug2("switching off weight struct support\n");
+ attr.sample_type = 0;
+
+ attr.sample_type = PERF_SAMPLE_CODE_PAGE_SIZE;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.code_page_size = true;
+ pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support\n");
+ attr.sample_type = 0;
+
+ attr.sample_type = PERF_SAMPLE_DATA_PAGE_SIZE;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.data_page_size = true;
+ pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support\n");
+ attr.sample_type = 0;
+
+ attr.cgroup = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.cgroup = true;
+ pr_debug2_peo("Kernel has no cgroup sampling support\n");
+ attr.cgroup = 0;
+
+ attr.aux_output = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.aux_output = true;
+ pr_debug2_peo("Kernel has no attr.aux_output support\n");
+ attr.aux_output = 0;
+
+ attr.bpf_event = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.bpf = true;
+ pr_debug2_peo("switching off bpf_event\n");
+ attr.bpf_event = 0;
+
+ attr.ksymbol = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.ksymbol = true;
+ pr_debug2_peo("switching off ksymbol\n");
+ attr.ksymbol = 0;
+
+ attr.write_backward = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.write_backward = true;
+ pr_debug2_peo("switching off write_backward\n");
+ attr.write_backward = 0;
+
+ attr.use_clockid = 1;
+ attr.clockid = CLOCK_MONOTONIC;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.clockid = true;
+ pr_debug2_peo("switching off clockid\n");
+ attr.use_clockid = 0;
+ attr.clockid = 0;
+
+ if (has_attr_feature(&attr, /*flags=*/PERF_FLAG_FD_CLOEXEC))
+ goto found;
+ perf_missing_features.cloexec = true;
+ pr_debug2_peo("switching off cloexec flag\n");
+
+ attr.mmap2 = 1;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.mmap2 = true;
+ pr_debug2_peo("switching off mmap2\n");
+ attr.mmap2 = 0;
+
+ /* set this unconditionally? */
+ perf_missing_features.sample_id_all = true;
+ pr_debug2_peo("switching off sample_id_all\n");
+
+ attr.inherit = 1;
+ attr.read_format = PERF_FORMAT_GROUP;
+ if (has_attr_feature(&attr, /*flags=*/0))
+ goto found;
+ perf_missing_features.group_read = true;
+ pr_debug2_peo("switching off group read\n");
+ attr.inherit = 0;
+ attr.read_format = 0;
+
+found:
+ detection_done = true;
+ errno = old_errno;
+
+check:
+ if ((evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+ perf_missing_features.branch_counters)
return true;
- } else if (!perf_missing_features.weight_struct &&
- (evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT)) {
- perf_missing_features.weight_struct = true;
- pr_debug2("switching off weight struct support\n");
+
+ if ((evsel->core.attr.read_format & PERF_FORMAT_LOST) &&
+ perf_missing_features.read_lost)
return true;
- } else if (!perf_missing_features.code_page_size &&
- (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)) {
- perf_missing_features.code_page_size = true;
- pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support, bailing out\n");
- return false;
- } else if (!perf_missing_features.data_page_size &&
- (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)) {
- perf_missing_features.data_page_size = true;
- pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support, bailing out\n");
- return false;
- } else if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
- perf_missing_features.cgroup = true;
- pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
- return false;
- } else if (!perf_missing_features.branch_hw_idx &&
- (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
- perf_missing_features.branch_hw_idx = true;
- pr_debug2("switching off branch HW index support\n");
+
+ if ((evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT) &&
+ perf_missing_features.weight_struct)
return true;
- } else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
- perf_missing_features.aux_output = true;
- pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
- return false;
- } else if (!perf_missing_features.bpf && evsel->core.attr.bpf_event) {
- perf_missing_features.bpf = true;
- pr_debug2_peo("switching off bpf_event\n");
+
+ if (evsel->core.attr.use_clockid && evsel->core.attr.clockid != CLOCK_MONOTONIC &&
+ !perf_missing_features.clockid) {
+ perf_missing_features.clockid_wrong = true;
return true;
- } else if (!perf_missing_features.ksymbol && evsel->core.attr.ksymbol) {
- perf_missing_features.ksymbol = true;
- pr_debug2_peo("switching off ksymbol\n");
+ }
+
+ if (evsel->core.attr.use_clockid && perf_missing_features.clockid)
return true;
- } else if (!perf_missing_features.write_backward && evsel->core.attr.write_backward) {
- perf_missing_features.write_backward = true;
- pr_debug2_peo("switching off write_backward\n");
- return false;
- } else if (!perf_missing_features.clockid_wrong && evsel->core.attr.use_clockid) {
- perf_missing_features.clockid_wrong = true;
- pr_debug2_peo("switching off clockid\n");
+
+ if ((evsel->open_flags & PERF_FLAG_FD_CLOEXEC) &&
+ perf_missing_features.cloexec)
return true;
- } else if (!perf_missing_features.clockid && evsel->core.attr.use_clockid) {
- perf_missing_features.clockid = true;
- pr_debug2_peo("switching off use_clockid\n");
+
+ if (evsel->core.attr.mmap2 && perf_missing_features.mmap2)
return true;
- } else if (!perf_missing_features.cloexec && (evsel->open_flags & PERF_FLAG_FD_CLOEXEC)) {
- perf_missing_features.cloexec = true;
- pr_debug2_peo("switching off cloexec flag\n");
+
+ if ((evsel->core.attr.branch_sample_type & (PERF_SAMPLE_BRANCH_NO_FLAGS |
+ PERF_SAMPLE_BRANCH_NO_CYCLES)) &&
+ perf_missing_features.lbr_flags)
return true;
- } else if (!perf_missing_features.mmap2 && evsel->core.attr.mmap2) {
- perf_missing_features.mmap2 = true;
- pr_debug2_peo("switching off mmap2\n");
+
+ if (evsel->core.attr.inherit && (evsel->core.attr.read_format & PERF_FORMAT_GROUP) &&
+ perf_missing_features.group_read)
return true;
- } else if (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) {
- if (evsel->pmu == NULL)
- evsel->pmu = evsel__find_pmu(evsel);
-
- if (evsel->pmu)
- evsel->pmu->missing_features.exclude_guest = true;
- else {
- /* we cannot find PMU, disable attrs now */
- evsel->core.attr.exclude_host = false;
- evsel->core.attr.exclude_guest = false;
- }
- if (evsel->exclude_GH) {
- pr_debug2_peo("PMU has no exclude_host/guest support, bailing out\n");
- return false;
- }
- if (!perf_missing_features.exclude_guest) {
- perf_missing_features.exclude_guest = true;
- pr_debug2_peo("switching off exclude_guest, exclude_host\n");
- }
+ if (evsel->core.attr.ksymbol && perf_missing_features.ksymbol)
return true;
- } else if (!perf_missing_features.sample_id_all) {
- perf_missing_features.sample_id_all = true;
- pr_debug2_peo("switching off sample_id_all\n");
+
+ if (evsel->core.attr.bpf_event && perf_missing_features.bpf)
return true;
- } else if (!perf_missing_features.lbr_flags &&
- (evsel->core.attr.branch_sample_type &
- (PERF_SAMPLE_BRANCH_NO_CYCLES |
- PERF_SAMPLE_BRANCH_NO_FLAGS))) {
- perf_missing_features.lbr_flags = true;
- pr_debug2_peo("switching off branch sample type no (cycles/flags)\n");
+
+ if ((evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) &&
+ perf_missing_features.branch_hw_idx)
return true;
- } else if (!perf_missing_features.group_read &&
- evsel->core.attr.inherit &&
- (evsel->core.attr.read_format & PERF_FORMAT_GROUP) &&
- evsel__is_group_leader(evsel)) {
- perf_missing_features.group_read = true;
- pr_debug2_peo("switching off group read\n");
+
+ if (evsel->core.attr.sample_id_all && perf_missing_features.sample_id_all)
return true;
- } else {
- return false;
- }
+
+ return false;
}
static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3e751ea769ac4d3a..ea3140cd91c589fd 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -368,7 +368,6 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
void evsel__close(struct evsel *evsel);
int evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
struct perf_thread_map *threads);
-bool evsel__detect_missing_features(struct evsel *evsel);
bool evsel__precise_ip_fallback(struct evsel *evsel);
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (4 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 5/8] perf tools: Detect missing kernel features properly Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 16:03 ` Liang, Kan
2024-10-01 0:20 ` [PATCH 7/8] perf tools: Check fallback error and order Namhyung Kim
` (4 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
It can be called from non-x86 platform so let's move it to the general
util directory. Also add a new helper perf_env__is_x86_amd_cpu() so
that it can be called with an existing perf_env as well.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/arch/x86/util/Build | 1 -
tools/perf/arch/x86/util/env.c | 19 -------------------
tools/perf/arch/x86/util/env.h | 7 -------
tools/perf/arch/x86/util/pmu.c | 2 +-
tools/perf/util/env.c | 24 ++++++++++++++++++++++++
tools/perf/util/env.h | 4 ++++
6 files changed, 29 insertions(+), 28 deletions(-)
delete mode 100644 tools/perf/arch/x86/util/env.c
delete mode 100644 tools/perf/arch/x86/util/env.h
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 2607ed5c42966543..ce6d802a1381c5ab 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -10,7 +10,6 @@ perf-util-y += evlist.o
perf-util-y += mem-events.o
perf-util-y += evsel.o
perf-util-y += iostat.o
-perf-util-y += env.o
perf-util-$(CONFIG_DWARF) += dwarf-regs.o
perf-util-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/env.c b/tools/perf/arch/x86/util/env.c
deleted file mode 100644
index 3e537ffb1353aab2..0000000000000000
--- a/tools/perf/arch/x86/util/env.c
+++ /dev/null
@@ -1,19 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include "linux/string.h"
-#include "util/env.h"
-#include "env.h"
-
-bool x86__is_amd_cpu(void)
-{
- struct perf_env env = { .total_mem = 0, };
- static int is_amd; /* 0: Uninitialized, 1: Yes, -1: No */
-
- if (is_amd)
- goto ret;
-
- perf_env__cpuid(&env);
- is_amd = env.cpuid && strstarts(env.cpuid, "AuthenticAMD") ? 1 : -1;
- perf_env__exit(&env);
-ret:
- return is_amd >= 1 ? true : false;
-}
diff --git a/tools/perf/arch/x86/util/env.h b/tools/perf/arch/x86/util/env.h
deleted file mode 100644
index d78f080b6b3f889a..0000000000000000
--- a/tools/perf/arch/x86/util/env.h
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _X86_ENV_H
-#define _X86_ENV_H
-
-bool x86__is_amd_cpu(void);
-
-#endif /* _X86_ENV_H */
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index c3d89d6ba1bf03ad..e0060dac2a9f9242 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -16,7 +16,7 @@
#include "../../../util/fncache.h"
#include "../../../util/pmus.h"
#include "mem-events.h"
-#include "env.h"
+#include "util/env.h"
void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
{
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 1edbccfc3281d2b1..470a0156e0722e4e 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -5,6 +5,7 @@
#include "util/header.h"
#include "linux/compiler.h"
#include <linux/ctype.h>
+#include <linux/string.h>
#include <linux/zalloc.h>
#include "cgroup.h"
#include <errno.h>
@@ -625,6 +626,7 @@ char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
return NULL;
}
+
void perf_env__find_br_cntr_info(struct perf_env *env,
unsigned int *nr,
unsigned int *width)
@@ -639,3 +641,25 @@ void perf_env__find_br_cntr_info(struct perf_env *env,
env->pmu_caps->br_cntr_width;
}
}
+
+bool perf_env__is_x86_amd_cpu(struct perf_env *env)
+{
+ static int is_amd; /* 0: Uninitialized, 1: Yes, -1: No */
+
+ if (is_amd == 0)
+ is_amd = env->cpuid && strstarts(env->cpuid, "AuthenticAMD") ? 1 : -1;
+
+ return is_amd >= 1 ? true : false;
+}
+
+bool x86__is_amd_cpu(void)
+{
+ struct perf_env env = { .total_mem = 0, };
+ bool is_amd;
+
+ perf_env__cpuid(&env);
+ is_amd = perf_env__is_x86_amd_cpu(&env);
+ perf_env__exit(&env);
+
+ return is_amd;
+}
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 51b36c36019be666..ae604c4edbb7eb44 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -195,4 +195,8 @@ bool perf_env__has_pmu_mapping(struct perf_env *env, const char *pmu_name);
void perf_env__find_br_cntr_info(struct perf_env *env,
unsigned int *nr,
unsigned int *width);
+
+bool x86__is_amd_cpu(void);
+bool perf_env__is_x86_amd_cpu(struct perf_env *env);
+
#endif /* __PERF_ENV_H */
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 7/8] perf tools: Check fallback error and order
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (5 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 18:00 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 8/8] perf record: Just use "cycles:P" as the default event Namhyung Kim
` (3 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
The perf_event_open might fail due to various reasons, so blindly
reducing precise_ip level might not be the best way to deal with it.
It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
given precise level. Let's try again with the correct error code.
This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
user events with exclude_kernel=1 cannot make progress. Let's add the
evsel__handle_error_quirks() to this case specially. I plan to work on
the kernel side to improve this situation but it'd still need some
special handling for IBS.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 32e30c293d0c6198..ef8356260eea54cd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
return false;
}
+static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
+{
+ /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
+ if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
+ evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
+ evsel->core.attr.precise_ip = 0;
+ pr_debug2_peo("removing precise_ip on AMD\n");
+ display_attr(&evsel->core.attr);
+ return true;
+ }
+
+ return false;
+}
+
static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
struct perf_thread_map *threads,
int start_cpu_map_idx, int end_cpu_map_idx)
@@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
return 0;
try_fallback:
- if (evsel__precise_ip_fallback(evsel))
- goto retry_open;
-
if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
idx, threads, thread, err)) {
/* We just removed 1 thread, so lower the upper nthreads limit. */
@@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
goto retry_open;
- if (err != -EINVAL || idx > 0 || thread > 0)
- goto out_close;
+ if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
+ goto retry_open;
- if (evsel__detect_missing_features(evsel))
+ if (err == -EINVAL && evsel__detect_missing_features(evsel))
goto fallback_missing_features;
+
+ if (evsel__handle_error_quirks(evsel, err))
+ goto retry_open;
+
out_close:
if (err)
threads->err_thread = thread;
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH 8/8] perf record: Just use "cycles:P" as the default event
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (6 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 7/8] perf tools: Check fallback error and order Namhyung Kim
@ 2024-10-01 0:20 ` Namhyung Kim
2024-10-01 18:00 ` Ian Rogers
2024-10-01 17:46 ` [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Liang, Kan
` (2 subsequent siblings)
10 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 0:20 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
The fallback logic can add ":u" modifier if needed.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-record.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index adbaf80b398c1f4c..f8325247292112d7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -4157,9 +4157,7 @@ int cmd_record(int argc, const char **argv)
record.opts.tail_synthesize = true;
if (rec->evlist->core.nr_entries == 0) {
- bool can_profile_kernel = perf_event_paranoid_check(1);
-
- err = parse_event(rec->evlist, can_profile_kernel ? "cycles:P" : "cycles:Pu");
+ err = parse_event(rec->evlist, "cycles:P");
if (err)
goto out;
}
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c
2024-10-01 0:20 ` [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c Namhyung Kim
@ 2024-10-01 16:03 ` Liang, Kan
2024-10-01 22:39 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: Liang, Kan @ 2024-10-01 16:03 UTC (permalink / raw)
To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
On 2024-09-30 8:20 p.m., Namhyung Kim wrote:
> It can be called from non-x86 platform so let's move it to the general
> util directory. Also add a new helper perf_env__is_x86_amd_cpu() so
> that it can be called with an existing perf_env as well.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/arch/x86/util/Build | 1 -
> tools/perf/arch/x86/util/env.c | 19 -------------------
> tools/perf/arch/x86/util/env.h | 7 -------
> tools/perf/arch/x86/util/pmu.c | 2 +-
> tools/perf/util/env.c | 24 ++++++++++++++++++++++++
> tools/perf/util/env.h | 4 ++++
> 6 files changed, 29 insertions(+), 28 deletions(-)
> delete mode 100644 tools/perf/arch/x86/util/env.c
> delete mode 100644 tools/perf/arch/x86/util/env.h
>
> diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
> index 2607ed5c42966543..ce6d802a1381c5ab 100644
> --- a/tools/perf/arch/x86/util/Build
> +++ b/tools/perf/arch/x86/util/Build
> @@ -10,7 +10,6 @@ perf-util-y += evlist.o
> perf-util-y += mem-events.o
> perf-util-y += evsel.o
> perf-util-y += iostat.o
> -perf-util-y += env.o
>
> perf-util-$(CONFIG_DWARF) += dwarf-regs.o
> perf-util-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
> diff --git a/tools/perf/arch/x86/util/env.c b/tools/perf/arch/x86/util/env.c
> deleted file mode 100644
> index 3e537ffb1353aab2..0000000000000000
> --- a/tools/perf/arch/x86/util/env.c
> +++ /dev/null
> @@ -1,19 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -#include "linux/string.h"
> -#include "util/env.h"
> -#include "env.h"
> -
> -bool x86__is_amd_cpu(void)
> -{
> - struct perf_env env = { .total_mem = 0, };
> - static int is_amd; /* 0: Uninitialized, 1: Yes, -1: No */
> -
> - if (is_amd)
> - goto ret;
> -
> - perf_env__cpuid(&env);
> - is_amd = env.cpuid && strstarts(env.cpuid, "AuthenticAMD") ? 1 : -1;
> - perf_env__exit(&env);
> -ret:
> - return is_amd >= 1 ? true : false;
> -}
> diff --git a/tools/perf/arch/x86/util/env.h b/tools/perf/arch/x86/util/env.h
> deleted file mode 100644
> index d78f080b6b3f889a..0000000000000000
> --- a/tools/perf/arch/x86/util/env.h
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _X86_ENV_H
> -#define _X86_ENV_H
> -
> -bool x86__is_amd_cpu(void);
> -
> -#endif /* _X86_ENV_H */
> diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
> index c3d89d6ba1bf03ad..e0060dac2a9f9242 100644
> --- a/tools/perf/arch/x86/util/pmu.c
> +++ b/tools/perf/arch/x86/util/pmu.c
> @@ -16,7 +16,7 @@
> #include "../../../util/fncache.h"
> #include "../../../util/pmus.h"
> #include "mem-events.h"
> -#include "env.h"
> +#include "util/env.h"
>
> void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> {
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index 1edbccfc3281d2b1..470a0156e0722e4e 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -5,6 +5,7 @@
> #include "util/header.h"
> #include "linux/compiler.h"
> #include <linux/ctype.h>
> +#include <linux/string.h>
> #include <linux/zalloc.h>
> #include "cgroup.h"
> #include <errno.h>
> @@ -625,6 +626,7 @@ char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
> return NULL;
> }
>
> +
Useless empty line.
Thanks,
Kan
> void perf_env__find_br_cntr_info(struct perf_env *env,
> unsigned int *nr,
> unsigned int *width)
> @@ -639,3 +641,25 @@ void perf_env__find_br_cntr_info(struct perf_env *env,
> env->pmu_caps->br_cntr_width;
> }
> }
> +
> +bool perf_env__is_x86_amd_cpu(struct perf_env *env)
> +{
> + static int is_amd; /* 0: Uninitialized, 1: Yes, -1: No */
> +
> + if (is_amd == 0)
> + is_amd = env->cpuid && strstarts(env->cpuid, "AuthenticAMD") ? 1 : -1;
> +
> + return is_amd >= 1 ? true : false;
> +}
> +
> +bool x86__is_amd_cpu(void)
> +{
> + struct perf_env env = { .total_mem = 0, };
> + bool is_amd;
> +
> + perf_env__cpuid(&env);
> + is_amd = perf_env__is_x86_amd_cpu(&env);
> + perf_env__exit(&env);
> +
> + return is_amd;
> +}
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 51b36c36019be666..ae604c4edbb7eb44 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -195,4 +195,8 @@ bool perf_env__has_pmu_mapping(struct perf_env *env, const char *pmu_name);
> void perf_env__find_br_cntr_info(struct perf_env *env,
> unsigned int *nr,
> unsigned int *width);
> +
> +bool x86__is_amd_cpu(void);
> +bool perf_env__is_x86_amd_cpu(struct perf_env *env);
> +
> #endif /* __PERF_ENV_H */
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/8] perf tools: Add fallback for exclude_guest
2024-10-01 0:20 ` [PATCH 1/8] perf tools: Add fallback for exclude_guest Namhyung Kim
@ 2024-10-01 17:41 ` Ian Rogers
0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 17:41 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang,
James Clark
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Commit 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default")
> changed to parse "cycles:P" event instead of creating a new cycles
> event for perf record. But it also changed the way how modifiers are
> handled so it doesn't set the exclude_guest bit by default.
>
> It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP
> if not. Let's add a fallback so that it can work with default events.
>
> Fixes: 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default")
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: James Clark <james.clark@linaro.org>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/builtin-stat.c | 18 +++++++++++++++---
> tools/perf/util/evsel.c | 21 +++++++++++++++++++++
> 2 files changed, 36 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 1521b6df26065ccf..fd9ea15f6b1c0809 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -639,8 +639,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> * (behavior changed with commit b0a873e).
> */
> if (errno == EINVAL || errno == ENOSYS ||
> - errno == ENOENT || errno == EOPNOTSUPP ||
> - errno == ENXIO) {
> + errno == ENOENT || errno == ENXIO) {
> if (verbose > 0)
> ui__warning("%s event is not supported by the kernel.\n",
> evsel__name(counter));
> @@ -658,7 +657,7 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> if (verbose > 0)
> ui__warning("%s\n", msg);
> return COUNTER_RETRY;
> - } else if (target__has_per_thread(&target) &&
> + } else if (target__has_per_thread(&target) && errno != EOPNOTSUPP &&
> evsel_list->core.threads &&
> evsel_list->core.threads->err_thread != -1) {
> /*
> @@ -679,6 +678,19 @@ static enum counter_recovery stat_handle_error(struct evsel *counter)
> return COUNTER_SKIP;
> }
>
> + if (errno == EOPNOTSUPP) {
> + if (verbose > 0) {
> + ui__warning("%s event is not supported by the kernel.\n",
> + evsel__name(counter));
> + }
> + counter->supported = false;
> + counter->errored = true;
> +
> + if ((evsel__leader(counter) != counter) ||
> + !(counter->core.leader->nr_members > 1))
> + return COUNTER_SKIP;
> + }
> +
> evsel__open_strerror(counter, &target, errno, msg, sizeof(msg));
> ui__error("%s\n", msg);
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index da0bada62140d9b0..0ddd77c139e89a2e 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -3259,6 +3259,27 @@ bool evsel__fallback(struct evsel *evsel, struct target *target, int err,
> evsel->core.attr.exclude_kernel = 1;
> evsel->core.attr.exclude_hv = 1;
>
> + return true;
> + } else if (err == EOPNOTSUPP && !evsel->core.attr.exclude_guest &&
> + !evsel->exclude_GH) {
> + const char *name = evsel__name(evsel);
> + char *new_name;
> + const char *sep = ":";
> +
> + /* Is there already the separator in the name. */
> + if (strchr(name, '/') ||
> + (strchr(name, ':') && !evsel->is_libpfm_event))
> + sep = "";
> +
> + if (asprintf(&new_name, "%s%sH", name, sep) < 0)
> + return false;
> +
> + free(evsel->name);
> + evsel->name = new_name;
> + /* Apple M1 requires exclude_guest */
> + scnprintf(msg, msgsize, "trying to fall back to excluding guest samples");
> + evsel->core.attr.exclude_guest = 1;
> +
> return true;
> }
>
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default
2024-10-01 0:20 ` [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default Namhyung Kim
@ 2024-10-01 17:43 ` Ian Rogers
0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 17:43 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang,
James Clark
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The exclude_guest in the event attribute is to limit profiling in the
> host environment. But I'm not sure why we want to set it by default
> cause we don't care about it in most cases and I feel like it just
> makes new PMU implementation complicated.
>
> Of course it's useful for perf kvm command so I added the
> exclude_GH_default variable to preserve the old behavior for perf kvm
> and other commands like perf record and stat won't set the exclude bit.
> This is helpful for AMD IBS case since having exclude_guest bit will
> clear new feature bit due to the missing feature check logic.
>
> $ sysctl kernel.perf_event_paranoid
> kernel.perf_event_paranoid = 0
>
> $ perf record -W -e ibs_op// -vv true 2>&1 | grep switching
> switching off PERF_FORMAT_LOST support
> switching off weight struct support
> switching off bpf_event
> switching off ksymbol
> switching off cloexec flag
> switching off mmap2
> switching off exclude_guest, exclude_host
>
> Intestingly, I found it sets the exclude_bit if "u" modifier is used.
> I don't know why but it's neither intuitive nor consistent. Let's
> remove the bit there too.
>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: James Clark <james.clark@linaro.org>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/builtin-kvm.c | 1 +
> tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
> tools/perf/tests/parse-events.c | 18 +++++++++---------
> tools/perf/util/parse-events.c | 2 +-
> tools/perf/util/util.c | 10 ++++++++--
> tools/perf/util/util.h | 3 +++
> 6 files changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 55ea17c5ff02acf7..099ce3ebf67ce6ee 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -2147,6 +2147,7 @@ int cmd_kvm(int argc, const char **argv)
> "buildid-list", "stat", NULL };
> const char *kvm_usage[] = { NULL, NULL };
>
> + exclude_GH_default = true;
> perf_host = 0;
> perf_guest = 1;
>
> diff --git a/tools/perf/tests/attr/test-record-dummy-C0 b/tools/perf/tests/attr/test-record-dummy-C0
> index 576ec48b3aafaa6a..8ce6f0a5df5b7013 100644
> --- a/tools/perf/tests/attr/test-record-dummy-C0
> +++ b/tools/perf/tests/attr/test-record-dummy-C0
> @@ -37,7 +37,7 @@ precise_ip=0
> mmap_data=0
> sample_id_all=1
> exclude_host=0
> -exclude_guest=1
> +exclude_guest=0
> exclude_callchain_kernel=0
> exclude_callchain_user=0
> mmap2=1
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 78e999f03d2d75f4..727683f249f66f5a 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -932,7 +932,7 @@ static int test__group2(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
> @@ -947,7 +947,7 @@ static int test__group2(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> if (evsel__has_leader(evsel, leader))
> @@ -1072,7 +1072,7 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
> @@ -1222,7 +1222,7 @@ static int test__group5(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong leader", evsel__is_group_leader(evsel));
> @@ -1437,7 +1437,7 @@ static int test__leader_sample1(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> @@ -1453,7 +1453,7 @@ static int test__leader_sample1(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> @@ -1468,7 +1468,7 @@ static int test__leader_sample1(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", !evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> @@ -1497,7 +1497,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> @@ -1513,7 +1513,7 @@ static int test__leader_sample2(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index e96cf13dc396193f..ff67213d6e887169 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1776,7 +1776,7 @@ static int parse_events__modifier_list(struct parse_events_state *parse_state,
> if (mod.user) {
> if (!exclude)
> exclude = eu = ek = eh = 1;
> - if (!exclude_GH && !perf_guest)
> + if (!exclude_GH && !perf_guest && exclude_GH_default)
> eG = 1;
> eu = 0;
> }
> diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
> index 9d55a13787ce3c05..280c86d61d8a7956 100644
> --- a/tools/perf/util/util.c
> +++ b/tools/perf/util/util.c
> @@ -78,17 +78,23 @@ bool sysctl__nmi_watchdog_enabled(void)
>
> bool test_attr__enabled;
>
> +bool exclude_GH_default;
> +
> bool perf_host = true;
> bool perf_guest = false;
>
> void event_attr_init(struct perf_event_attr *attr)
> {
> + /* to capture ABI version */
> + attr->size = sizeof(*attr);
> +
> + if (!exclude_GH_default)
> + return;
> +
> if (!perf_host)
> attr->exclude_host = 1;
> if (!perf_guest)
> attr->exclude_guest = 1;
> - /* to capture ABI version */
> - attr->size = sizeof(*attr);
> }
>
> int mkdir_p(char *path, mode_t mode)
> diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
> index 9966c21aaf048479..4920e102ff54879a 100644
> --- a/tools/perf/util/util.h
> +++ b/tools/perf/util/util.h
> @@ -21,6 +21,9 @@ extern const char perf_more_info_string[];
>
> extern const char *input_name;
>
> +/* This will control if perf_{host,guest} will set attr.exclude_{host,guest}. */
> +extern bool exclude_GH_default;
> +
> extern bool perf_host;
> extern bool perf_guest;
>
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/8] perf tools: Simplify evsel__add_modifier()
2024-10-01 0:20 ` [PATCH 3/8] perf tools: Simplify evsel__add_modifier() Namhyung Kim
@ 2024-10-01 17:44 ` Ian Rogers
0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 17:44 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Since it doesn't set the exclude_guest, no need to special handle the
> bit and simply show only if one of host or guest bit is set. Now the
> default event name might not have :H prefix anymore so change the
> dlfilter test not to compare the ":" at the end.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
> tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
> tools/perf/util/evsel.c | 5 +----
> 3 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/dlfilters/dlfilter-test-api-v0.c b/tools/perf/dlfilters/dlfilter-test-api-v0.c
> index 4083b1abeaabe605..4ca2d7b2ea6c8200 100644
> --- a/tools/perf/dlfilters/dlfilter-test-api-v0.c
> +++ b/tools/perf/dlfilters/dlfilter-test-api-v0.c
> @@ -220,7 +220,7 @@ static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample
> CHECK_SAMPLE(raw_callchain_nr);
> CHECK(!sample->raw_callchain);
>
> -#define EVENT_NAME "branches:"
> +#define EVENT_NAME "branches"
> CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME)));
>
> return 0;
> diff --git a/tools/perf/dlfilters/dlfilter-test-api-v2.c b/tools/perf/dlfilters/dlfilter-test-api-v2.c
> index 32ff619e881caa50..00d73a16c4fdaece 100644
> --- a/tools/perf/dlfilters/dlfilter-test-api-v2.c
> +++ b/tools/perf/dlfilters/dlfilter-test-api-v2.c
> @@ -235,7 +235,7 @@ static int check_sample(struct filter_data *d, const struct perf_dlfilter_sample
> CHECK_SAMPLE(raw_callchain_nr);
> CHECK(!sample->raw_callchain);
>
> -#define EVENT_NAME "branches:"
> +#define EVENT_NAME "branches"
> CHECK(!strncmp(sample->event, EVENT_NAME, strlen(EVENT_NAME)));
>
> return 0;
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 0ddd77c139e89a2e..f202d28147d62a44 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -545,7 +545,6 @@ static int evsel__add_modifiers(struct evsel *evsel, char *bf, size_t size)
> {
> int colon = 0, r = 0;
> struct perf_event_attr *attr = &evsel->core.attr;
> - bool exclude_guest_default = false;
>
> #define MOD_PRINT(context, mod) do { \
> if (!attr->exclude_##context) { \
> @@ -557,17 +556,15 @@ static int evsel__add_modifiers(struct evsel *evsel, char *bf, size_t size)
> MOD_PRINT(kernel, 'k');
> MOD_PRINT(user, 'u');
> MOD_PRINT(hv, 'h');
> - exclude_guest_default = true;
> }
>
> if (attr->precise_ip) {
> if (!colon)
> colon = ++r;
> r += scnprintf(bf + r, size - r, "%.*s", attr->precise_ip, "ppp");
> - exclude_guest_default = true;
> }
>
> - if (attr->exclude_host || attr->exclude_guest == exclude_guest_default) {
> + if (attr->exclude_host || attr->exclude_guest) {
> MOD_PRINT(host, 'H');
> MOD_PRINT(guest, 'G');
> }
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip
2024-10-01 0:20 ` [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip Namhyung Kim
@ 2024-10-01 17:46 ` Ian Rogers
0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 17:46 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> It seems perf sets the exclude_guest bit because of Intel PEBS
> implementation which uses a virtual address. IIUC now kernel disables
> PEBS when it goes to the guest mode regardless of this bit so we don't
> need to set it explicitly. At least for the other archs/vendors.
>
> I found the commit 1342798cc13e set the exclude_guest for precise_ip
> in the tool and the commit 20b279ddb38c added kernel side enforcement
> which was reverted by commit a706d965dcfd later.
>
> Actually it doesn't set the exclude_guest for the default event
> (cycles:P) already.
>
> $ grep -m1 vendor /proc/cpuinfo
> vendor_id : GenuineIntel
>
> $ perf record -e cycles:P true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
>
> $ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
> precise_ip: 3
>
> But having lower 'p' modifier set the bit for some reason.
>
> $ perf record -e cycles:pp true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ]
>
> $ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise
> precise_ip: 2
> exclude_guest: 1
>
> Actually AMD IBS suffers from this because it doesn't support excludes
> and having this bit effectively disables new features in the current
> implementation (due to the missing feature check).
>
> $ grep -m1 vendor /proc/cpuinfo
> vendor_id : AuthenticAMD
>
> $ perf record -W -e cycles:p -vv true 2>&1 | grep switching
> switching off PERF_FORMAT_LOST support
> switching off weight struct support
> switching off bpf_event
> switching off ksymbol
> switching off cloexec flag
> switching off mmap2
> switching off exclude_guest, exclude_host
>
> By not setting exclude_guest, we can fix this inconsistency and the
> troubles.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/tests/parse-events.c | 12 ++++--------
> tools/perf/util/parse-events.c | 4 ----
> 2 files changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 727683f249f66f5a..82a19674a38f774e 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -898,8 +898,7 @@ static int test__group1(struct evlist *evlist)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - /* use of precise requires exclude_guest */
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 2);
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> @@ -1016,9 +1015,8 @@ static int test__group3(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_kernel",
> !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - /* use of precise requires exclude_guest */
> TEST_ASSERT_VAL("wrong exclude guest",
> - evsel->core.attr.exclude_guest);
> + !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host",
> !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip",
> @@ -1103,8 +1101,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - /* use of precise requires exclude_guest */
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 1);
> TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> @@ -1122,8 +1119,7 @@ static int test__group4(struct evlist *evlist __maybe_unused)
> TEST_ASSERT_VAL("wrong exclude_user", evsel->core.attr.exclude_user);
> TEST_ASSERT_VAL("wrong exclude_kernel", !evsel->core.attr.exclude_kernel);
> TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> - /* use of precise requires exclude_guest */
> - TEST_ASSERT_VAL("wrong exclude guest", evsel->core.attr.exclude_guest);
> + TEST_ASSERT_VAL("wrong exclude guest", !evsel->core.attr.exclude_guest);
> TEST_ASSERT_VAL("wrong exclude host", !evsel->core.attr.exclude_host);
> TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip == 2);
> TEST_ASSERT_VAL("wrong leader", evsel__has_leader(evsel, leader));
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index ff67213d6e887169..63da105ba914ef29 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1769,10 +1769,6 @@ static int parse_events__modifier_list(struct parse_events_state *parse_state,
> int exclude = eu | ek | eh;
> int exclude_GH = group ? evsel->exclude_GH : 0;
>
> - if (mod.precise) {
> - /* use of precise requires exclude_guest */
> - eG = 1;
> - }
> if (mod.user) {
> if (!exclude)
> exclude = eu = ek = eh = 1;
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (7 preceding siblings ...)
2024-10-01 0:20 ` [PATCH 8/8] perf record: Just use "cycles:P" as the default event Namhyung Kim
@ 2024-10-01 17:46 ` Liang, Kan
2024-10-01 21:19 ` Namhyung Kim
2024-10-02 9:49 ` James Clark
2024-10-15 4:25 ` Ravi Bangoria
10 siblings, 1 reply; 37+ messages in thread
From: Liang, Kan @ 2024-10-01 17:46 UTC (permalink / raw)
To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang
On 2024-09-30 8:20 p.m., Namhyung Kim wrote:
> Hello,
>
> I found perf tools set exclude_guest bit inconsistently. It used to
> set the bit but now the default event for perf record doesn't. So I'm
> wondering why we want the bit in the first place.
>
> Actually it's not good for PMUs don't support any exclusion like AMD
> IBS because it disables new features after the exclude_guest due to
> the missing feature detection logic.
>
> v4 changes)
>
> * handle EOPNOTSUPP error in compatible way (Kan)
> * drop --exclude-guest option in perf stat
> * not to separate exclude_hv fallback
> * rename to exclude_GH_default (Kan)
> * drop the RFC from the subject
>
> v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
>
> * move exclude_guest fallback to the front
> * fix precise_max handling on AMD
> * simplify the default event for perf record
>
> v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
>
> * update the missing feature detection logic
> * separate exclude_hv fallback
> * add new fallback for exclude_guest
>
> v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
>
> AFAIK it doesn't matter for the most cases but perf kvm. If users
> need to set the bit, they can still use :H modifier. For vPMU pass-
> through or Apple M1, it'd add the exclude_guest during the fallback
> logic.
>
> Also the kernel feature detection logic should be separated from the
> exclude bit tests since it depends on the PMU implementation rather
> than the core kernel features. So I changed it to use a software
> event for the detection and factor out some hw-specific checks.
>
> The code is available at 'perf/exclude-v4' branch in
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (8):
> perf tools: Add fallback for exclude_guest
> perf tools: Don't set attr.exclude_guest by default
> perf tools: Simplify evsel__add_modifier()
> perf tools: Do not set exclude_guest for precise_ip
> perf tools: Detect missing kernel features properly
> perf tools: Move x86__is_amd_cpu() to util/env.c
> perf tools: Check fallback error and order
> perf record: Just use "cycles:P" as the default event
>
The patch set looks good to me.
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> tools/perf/arch/x86/util/Build | 1 -
> tools/perf/arch/x86/util/env.c | 19 -
> tools/perf/arch/x86/util/env.h | 7 -
> tools/perf/arch/x86/util/pmu.c | 2 +-
> tools/perf/builtin-kvm.c | 1 +
> tools/perf/builtin-record.c | 4 +-
> tools/perf/builtin-stat.c | 18 +-
> tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
> tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
> tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
> tools/perf/tests/parse-events.c | 30 +-
> tools/perf/util/env.c | 24 ++
> tools/perf/util/env.h | 4 +
> tools/perf/util/evsel.c | 394 ++++++++++++++------
> tools/perf/util/evsel.h | 1 -
> tools/perf/util/parse-events.c | 6 +-
> tools/perf/util/util.c | 10 +-
> tools/perf/util/util.h | 3 +
> 18 files changed, 364 insertions(+), 166 deletions(-)
> delete mode 100644 tools/perf/arch/x86/util/env.c
> delete mode 100644 tools/perf/arch/x86/util/env.h
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/8] perf tools: Detect missing kernel features properly
2024-10-01 0:20 ` [PATCH 5/8] perf tools: Detect missing kernel features properly Namhyung Kim
@ 2024-10-01 17:53 ` Ian Rogers
2024-10-01 21:32 ` Namhyung Kim
2024-10-15 4:19 ` Ravi Bangoria
1 sibling, 1 reply; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 17:53 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The evsel__detect_missing_features() is to check if the attributes of
> the evsel is supported or not. But it checks the attribute based on the
> given evsel, it might miss something if the attr doesn't have the bit or
> give incorrect results if the event is special.
>
> Also it maintains the order of the feature that was added to the kernel
> which means it can assume older features should be supported once it
> detects the current feature is working. To minimized the confusion and
> to accurately check the kernel features, I think it's better to use a
> software event and go through all the features at once.
>
> Also make the function static since it's only used in evsel.c.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/evsel.c | 345 +++++++++++++++++++++++++++++-----------
> tools/perf/util/evsel.h | 1 -
> 2 files changed, 249 insertions(+), 97 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index f202d28147d62a44..32e30c293d0c6198 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -20,6 +20,7 @@
> #include <linux/zalloc.h>
> #include <sys/ioctl.h>
> #include <sys/resource.h>
> +#include <sys/syscall.h>
> #include <sys/types.h>
> #include <dirent.h>
> #include <stdlib.h>
> @@ -2150,120 +2151,272 @@ int evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
> return err;
> }
>
> -bool evsel__detect_missing_features(struct evsel *evsel)
> +static bool has_attr_feature(struct perf_event_attr *attr, unsigned long flags)
> {
> + int fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> + /*group_fd=*/-1, flags);
> + close(fd);
> +
> + if (fd < 0) {
> + attr->exclude_kernel = 1;
> +
> + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> + /*group_fd=*/-1, flags);
> + close(fd);
> + }
> +
> + if (fd < 0) {
> + attr->exclude_hv = 1;
> +
> + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> + /*group_fd=*/-1, flags);
> + close(fd);
> + }
> +
> + if (fd < 0) {
> + attr->exclude_guest = 1;
> +
> + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> + /*group_fd=*/-1, flags);
> + close(fd);
> + }
> +
> + attr->exclude_kernel = 0;
> + attr->exclude_guest = 0;
> + attr->exclude_hv = 0;
> +
> + return fd >= 0;
> +}
> +
> +static void evsel__detect_missing_brstack_features(struct evsel *evsel)
In the future could other PMU specific unsupported features be added
not just brstack? Perhaps evsel__detect_missing_pmu_features would
better capture that.
> +{
> + static bool detection_done = false;
> + struct perf_event_attr attr = {
> + .type = evsel->core.attr.type,
> + .config = evsel->core.attr.config,
> + .disabled = 1,
> + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> + .sample_period = 1000,
> + };
> + int old_errno;
> +
> + if (detection_done)
> + return;
> +
> + old_errno = errno;
> +
> /*
> * Must probe features in the order they were added to the
> - * perf_event_attr interface.
> + * perf_event_attr interface. These are PMU specific limitation
> + * so we can detect with the given hardware event and stop on the
> + * first one succeeded.
> */
> - if (!perf_missing_features.branch_counters &&
> - (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
> - perf_missing_features.branch_counters = true;
> - pr_debug2("switching off branch counters support\n");
> - return true;
> - } else if (!perf_missing_features.read_lost &&
> - (evsel->core.attr.read_format & PERF_FORMAT_LOST)) {
> - perf_missing_features.read_lost = true;
> - pr_debug2("switching off PERF_FORMAT_LOST support\n");
> +
> + /* Please add new feature detection here. */
> +
> + attr.branch_sample_type = PERF_SAMPLE_BRANCH_COUNTERS;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.branch_counters = true;
It feels like these global variables should be part of the PMU state.
There is already perf_pmu.missing_features.
Thanks,
Ian
> + pr_debug2("switching off branch counters support\n");
> +
> + attr.branch_sample_type = PERF_SAMPLE_BRANCH_HW_INDEX;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.branch_hw_idx = true;
> + pr_debug2("switching off branch HW index support\n");
> +
> + attr.branch_sample_type = PERF_SAMPLE_BRANCH_NO_CYCLES | PERF_SAMPLE_BRANCH_NO_FLAGS;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.lbr_flags = true;
> + pr_debug2_peo("switching off branch sample type no (cycles/flags)\n");
> +
> +found:
> + detection_done = true;
> + errno = old_errno;
> +}
> +
> +static bool evsel__detect_missing_features(struct evsel *evsel)
> +{
> + static bool detection_done = false;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_SOFTWARE,
> + .config = PERF_COUNT_SW_TASK_CLOCK,
> + .disabled = 1,
> + };
> + int old_errno;
> +
> + if (evsel__has_br_stack(evsel))
> + evsel__detect_missing_brstack_features(evsel);
> +
> + if (detection_done)
> + goto check;
> +
> + old_errno = errno;
> +
> + /*
> + * Must probe features in the order they were added to the
> + * perf_event_attr interface. These are kernel core limitation
> + * not PMU-specific so we can detect with a software event and
> + * stop on the first one succeeded.
> + */
> +
> + /* Please add new feature detection here. */
> +
> + attr.read_format = PERF_FORMAT_LOST;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.read_lost = true;
> + pr_debug2("switching off PERF_FORMAT_LOST support\n");
> + attr.read_format = 0;
> +
> + attr.sample_type = PERF_SAMPLE_WEIGHT_STRUCT;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.weight_struct = true;
> + pr_debug2("switching off weight struct support\n");
> + attr.sample_type = 0;
> +
> + attr.sample_type = PERF_SAMPLE_CODE_PAGE_SIZE;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.code_page_size = true;
> + pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support\n");
> + attr.sample_type = 0;
> +
> + attr.sample_type = PERF_SAMPLE_DATA_PAGE_SIZE;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.data_page_size = true;
> + pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support\n");
> + attr.sample_type = 0;
> +
> + attr.cgroup = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.cgroup = true;
> + pr_debug2_peo("Kernel has no cgroup sampling support\n");
> + attr.cgroup = 0;
> +
> + attr.aux_output = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.aux_output = true;
> + pr_debug2_peo("Kernel has no attr.aux_output support\n");
> + attr.aux_output = 0;
> +
> + attr.bpf_event = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.bpf = true;
> + pr_debug2_peo("switching off bpf_event\n");
> + attr.bpf_event = 0;
> +
> + attr.ksymbol = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.ksymbol = true;
> + pr_debug2_peo("switching off ksymbol\n");
> + attr.ksymbol = 0;
> +
> + attr.write_backward = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.write_backward = true;
> + pr_debug2_peo("switching off write_backward\n");
> + attr.write_backward = 0;
> +
> + attr.use_clockid = 1;
> + attr.clockid = CLOCK_MONOTONIC;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.clockid = true;
> + pr_debug2_peo("switching off clockid\n");
> + attr.use_clockid = 0;
> + attr.clockid = 0;
> +
> + if (has_attr_feature(&attr, /*flags=*/PERF_FLAG_FD_CLOEXEC))
> + goto found;
> + perf_missing_features.cloexec = true;
> + pr_debug2_peo("switching off cloexec flag\n");
> +
> + attr.mmap2 = 1;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.mmap2 = true;
> + pr_debug2_peo("switching off mmap2\n");
> + attr.mmap2 = 0;
> +
> + /* set this unconditionally? */
> + perf_missing_features.sample_id_all = true;
> + pr_debug2_peo("switching off sample_id_all\n");
> +
> + attr.inherit = 1;
> + attr.read_format = PERF_FORMAT_GROUP;
> + if (has_attr_feature(&attr, /*flags=*/0))
> + goto found;
> + perf_missing_features.group_read = true;
> + pr_debug2_peo("switching off group read\n");
> + attr.inherit = 0;
> + attr.read_format = 0;
> +
> +found:
> + detection_done = true;
> + errno = old_errno;
> +
> +check:
> + if ((evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
> + perf_missing_features.branch_counters)
> return true;
> - } else if (!perf_missing_features.weight_struct &&
> - (evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT)) {
> - perf_missing_features.weight_struct = true;
> - pr_debug2("switching off weight struct support\n");
> +
> + if ((evsel->core.attr.read_format & PERF_FORMAT_LOST) &&
> + perf_missing_features.read_lost)
> return true;
> - } else if (!perf_missing_features.code_page_size &&
> - (evsel->core.attr.sample_type & PERF_SAMPLE_CODE_PAGE_SIZE)) {
> - perf_missing_features.code_page_size = true;
> - pr_debug2_peo("Kernel has no PERF_SAMPLE_CODE_PAGE_SIZE support, bailing out\n");
> - return false;
> - } else if (!perf_missing_features.data_page_size &&
> - (evsel->core.attr.sample_type & PERF_SAMPLE_DATA_PAGE_SIZE)) {
> - perf_missing_features.data_page_size = true;
> - pr_debug2_peo("Kernel has no PERF_SAMPLE_DATA_PAGE_SIZE support, bailing out\n");
> - return false;
> - } else if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
> - perf_missing_features.cgroup = true;
> - pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
> - return false;
> - } else if (!perf_missing_features.branch_hw_idx &&
> - (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX)) {
> - perf_missing_features.branch_hw_idx = true;
> - pr_debug2("switching off branch HW index support\n");
> +
> + if ((evsel->core.attr.sample_type & PERF_SAMPLE_WEIGHT_STRUCT) &&
> + perf_missing_features.weight_struct)
> return true;
> - } else if (!perf_missing_features.aux_output && evsel->core.attr.aux_output) {
> - perf_missing_features.aux_output = true;
> - pr_debug2_peo("Kernel has no attr.aux_output support, bailing out\n");
> - return false;
> - } else if (!perf_missing_features.bpf && evsel->core.attr.bpf_event) {
> - perf_missing_features.bpf = true;
> - pr_debug2_peo("switching off bpf_event\n");
> +
> + if (evsel->core.attr.use_clockid && evsel->core.attr.clockid != CLOCK_MONOTONIC &&
> + !perf_missing_features.clockid) {
> + perf_missing_features.clockid_wrong = true;
> return true;
> - } else if (!perf_missing_features.ksymbol && evsel->core.attr.ksymbol) {
> - perf_missing_features.ksymbol = true;
> - pr_debug2_peo("switching off ksymbol\n");
> + }
> +
> + if (evsel->core.attr.use_clockid && perf_missing_features.clockid)
> return true;
> - } else if (!perf_missing_features.write_backward && evsel->core.attr.write_backward) {
> - perf_missing_features.write_backward = true;
> - pr_debug2_peo("switching off write_backward\n");
> - return false;
> - } else if (!perf_missing_features.clockid_wrong && evsel->core.attr.use_clockid) {
> - perf_missing_features.clockid_wrong = true;
> - pr_debug2_peo("switching off clockid\n");
> +
> + if ((evsel->open_flags & PERF_FLAG_FD_CLOEXEC) &&
> + perf_missing_features.cloexec)
> return true;
> - } else if (!perf_missing_features.clockid && evsel->core.attr.use_clockid) {
> - perf_missing_features.clockid = true;
> - pr_debug2_peo("switching off use_clockid\n");
> +
> + if (evsel->core.attr.mmap2 && perf_missing_features.mmap2)
> return true;
> - } else if (!perf_missing_features.cloexec && (evsel->open_flags & PERF_FLAG_FD_CLOEXEC)) {
> - perf_missing_features.cloexec = true;
> - pr_debug2_peo("switching off cloexec flag\n");
> +
> + if ((evsel->core.attr.branch_sample_type & (PERF_SAMPLE_BRANCH_NO_FLAGS |
> + PERF_SAMPLE_BRANCH_NO_CYCLES)) &&
> + perf_missing_features.lbr_flags)
> return true;
> - } else if (!perf_missing_features.mmap2 && evsel->core.attr.mmap2) {
> - perf_missing_features.mmap2 = true;
> - pr_debug2_peo("switching off mmap2\n");
> +
> + if (evsel->core.attr.inherit && (evsel->core.attr.read_format & PERF_FORMAT_GROUP) &&
> + perf_missing_features.group_read)
> return true;
> - } else if (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) {
> - if (evsel->pmu == NULL)
> - evsel->pmu = evsel__find_pmu(evsel);
> -
> - if (evsel->pmu)
> - evsel->pmu->missing_features.exclude_guest = true;
> - else {
> - /* we cannot find PMU, disable attrs now */
> - evsel->core.attr.exclude_host = false;
> - evsel->core.attr.exclude_guest = false;
> - }
>
> - if (evsel->exclude_GH) {
> - pr_debug2_peo("PMU has no exclude_host/guest support, bailing out\n");
> - return false;
> - }
> - if (!perf_missing_features.exclude_guest) {
> - perf_missing_features.exclude_guest = true;
> - pr_debug2_peo("switching off exclude_guest, exclude_host\n");
> - }
> + if (evsel->core.attr.ksymbol && perf_missing_features.ksymbol)
> return true;
> - } else if (!perf_missing_features.sample_id_all) {
> - perf_missing_features.sample_id_all = true;
> - pr_debug2_peo("switching off sample_id_all\n");
> +
> + if (evsel->core.attr.bpf_event && perf_missing_features.bpf)
> return true;
> - } else if (!perf_missing_features.lbr_flags &&
> - (evsel->core.attr.branch_sample_type &
> - (PERF_SAMPLE_BRANCH_NO_CYCLES |
> - PERF_SAMPLE_BRANCH_NO_FLAGS))) {
> - perf_missing_features.lbr_flags = true;
> - pr_debug2_peo("switching off branch sample type no (cycles/flags)\n");
> +
> + if ((evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX) &&
> + perf_missing_features.branch_hw_idx)
> return true;
> - } else if (!perf_missing_features.group_read &&
> - evsel->core.attr.inherit &&
> - (evsel->core.attr.read_format & PERF_FORMAT_GROUP) &&
> - evsel__is_group_leader(evsel)) {
> - perf_missing_features.group_read = true;
> - pr_debug2_peo("switching off group read\n");
> +
> + if (evsel->core.attr.sample_id_all && perf_missing_features.sample_id_all)
> return true;
> - } else {
> - return false;
> - }
> +
> + return false;
> }
>
> static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 3e751ea769ac4d3a..ea3140cd91c589fd 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -368,7 +368,6 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
> void evsel__close(struct evsel *evsel);
> int evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
> struct perf_thread_map *threads);
> -bool evsel__detect_missing_features(struct evsel *evsel);
>
> bool evsel__precise_ip_fallback(struct evsel *evsel);
>
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-01 0:20 ` [PATCH 7/8] perf tools: Check fallback error and order Namhyung Kim
@ 2024-10-01 18:00 ` Ian Rogers
2024-10-01 21:36 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 18:00 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The perf_event_open might fail due to various reasons, so blindly
> reducing precise_ip level might not be the best way to deal with it.
>
> It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> given precise level. Let's try again with the correct error code.
>
> This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> user events with exclude_kernel=1 cannot make progress. Let's add the
> evsel__handle_error_quirks() to this case specially. I plan to work on
> the kernel side to improve this situation but it'd still need some
> special handling for IBS.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 32e30c293d0c6198..ef8356260eea54cd 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> return false;
> }
>
> +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> +{
> + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
Should the quirk handling be specific to evsels with the IBS PMU given
the comment above? ie this is a PMU specific workaround rather than an
AMD specific workaround, however, the PMU only exists on AMD :-)
> + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
So here rather than x86__is_amd_cpu it would be
!strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
the logic into pmu.c.
Thanks,
Ian
> + evsel->core.attr.precise_ip = 0;
> + pr_debug2_peo("removing precise_ip on AMD\n");
> + display_attr(&evsel->core.attr);
> + return true;
> + }
> +
> + return false;
> +}
> +
> static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> struct perf_thread_map *threads,
> int start_cpu_map_idx, int end_cpu_map_idx)
> @@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> return 0;
>
> try_fallback:
> - if (evsel__precise_ip_fallback(evsel))
> - goto retry_open;
> -
> if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> idx, threads, thread, err)) {
> /* We just removed 1 thread, so lower the upper nthreads limit. */
> @@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> goto retry_open;
>
> - if (err != -EINVAL || idx > 0 || thread > 0)
> - goto out_close;
> + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> + goto retry_open;
>
> - if (evsel__detect_missing_features(evsel))
> + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> goto fallback_missing_features;
> +
> + if (evsel__handle_error_quirks(evsel, err))
> + goto retry_open;
> +
> out_close:
> if (err)
> threads->err_thread = thread;
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 8/8] perf record: Just use "cycles:P" as the default event
2024-10-01 0:20 ` [PATCH 8/8] perf record: Just use "cycles:P" as the default event Namhyung Kim
@ 2024-10-01 18:00 ` Ian Rogers
0 siblings, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 18:00 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The fallback logic can add ":u" modifier if needed.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Thanks,
Ian
> ---
> tools/perf/builtin-record.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index adbaf80b398c1f4c..f8325247292112d7 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -4157,9 +4157,7 @@ int cmd_record(int argc, const char **argv)
> record.opts.tail_synthesize = true;
>
> if (rec->evlist->core.nr_entries == 0) {
> - bool can_profile_kernel = perf_event_paranoid_check(1);
> -
> - err = parse_event(rec->evlist, can_profile_kernel ? "cycles:P" : "cycles:Pu");
> + err = parse_event(rec->evlist, "cycles:P");
> if (err)
> goto out;
> }
> --
> 2.46.1.824.gd892dcdcdd-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-01 17:46 ` [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Liang, Kan
@ 2024-10-01 21:19 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 21:19 UTC (permalink / raw)
To: Liang, Kan
Cc: Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 01, 2024 at 01:46:36PM -0400, Liang, Kan wrote:
>
>
> On 2024-09-30 8:20 p.m., Namhyung Kim wrote:
> > Hello,
> >
> > I found perf tools set exclude_guest bit inconsistently. It used to
> > set the bit but now the default event for perf record doesn't. So I'm
> > wondering why we want the bit in the first place.
> >
> > Actually it's not good for PMUs don't support any exclusion like AMD
> > IBS because it disables new features after the exclude_guest due to
> > the missing feature detection logic.
> >
> > v4 changes)
> >
> > * handle EOPNOTSUPP error in compatible way (Kan)
> > * drop --exclude-guest option in perf stat
> > * not to separate exclude_hv fallback
> > * rename to exclude_GH_default (Kan)
> > * drop the RFC from the subject
> >
> > v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
> >
> > * move exclude_guest fallback to the front
> > * fix precise_max handling on AMD
> > * simplify the default event for perf record
> >
> > v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
> >
> > * update the missing feature detection logic
> > * separate exclude_hv fallback
> > * add new fallback for exclude_guest
> >
> > v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
> >
> > AFAIK it doesn't matter for the most cases but perf kvm. If users
> > need to set the bit, they can still use :H modifier. For vPMU pass-
> > through or Apple M1, it'd add the exclude_guest during the fallback
> > logic.
> >
> > Also the kernel feature detection logic should be separated from the
> > exclude bit tests since it depends on the PMU implementation rather
> > than the core kernel features. So I changed it to use a software
> > event for the detection and factor out some hw-specific checks.
> >
> > The code is available at 'perf/exclude-v4' branch in
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> > Thanks,
> > Namhyung
> >
> >
> > Namhyung Kim (8):
> > perf tools: Add fallback for exclude_guest
> > perf tools: Don't set attr.exclude_guest by default
> > perf tools: Simplify evsel__add_modifier()
> > perf tools: Do not set exclude_guest for precise_ip
> > perf tools: Detect missing kernel features properly
> > perf tools: Move x86__is_amd_cpu() to util/env.c
> > perf tools: Check fallback error and order
> > perf record: Just use "cycles:P" as the default event
> >
>
> The patch set looks good to me.
>
> Acked-by: Kan Liang <kan.liang@linux.intel.com>
Thanks for your review!
Namhyung
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/8] perf tools: Detect missing kernel features properly
2024-10-01 17:53 ` Ian Rogers
@ 2024-10-01 21:32 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 21:32 UTC (permalink / raw)
To: Ian Rogers, Kan Liang
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 01, 2024 at 10:53:02AM -0700, Ian Rogers wrote:
> On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The evsel__detect_missing_features() is to check if the attributes of
> > the evsel is supported or not. But it checks the attribute based on the
> > given evsel, it might miss something if the attr doesn't have the bit or
> > give incorrect results if the event is special.
> >
> > Also it maintains the order of the feature that was added to the kernel
> > which means it can assume older features should be supported once it
> > detects the current feature is working. To minimized the confusion and
> > to accurately check the kernel features, I think it's better to use a
> > software event and go through all the features at once.
> >
> > Also make the function static since it's only used in evsel.c.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/util/evsel.c | 345 +++++++++++++++++++++++++++++-----------
> > tools/perf/util/evsel.h | 1 -
> > 2 files changed, 249 insertions(+), 97 deletions(-)
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index f202d28147d62a44..32e30c293d0c6198 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -20,6 +20,7 @@
> > #include <linux/zalloc.h>
> > #include <sys/ioctl.h>
> > #include <sys/resource.h>
> > +#include <sys/syscall.h>
> > #include <sys/types.h>
> > #include <dirent.h>
> > #include <stdlib.h>
> > @@ -2150,120 +2151,272 @@ int evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
> > return err;
> > }
> >
> > -bool evsel__detect_missing_features(struct evsel *evsel)
> > +static bool has_attr_feature(struct perf_event_attr *attr, unsigned long flags)
> > {
> > + int fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> > + /*group_fd=*/-1, flags);
> > + close(fd);
> > +
> > + if (fd < 0) {
> > + attr->exclude_kernel = 1;
> > +
> > + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> > + /*group_fd=*/-1, flags);
> > + close(fd);
> > + }
> > +
> > + if (fd < 0) {
> > + attr->exclude_hv = 1;
> > +
> > + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> > + /*group_fd=*/-1, flags);
> > + close(fd);
> > + }
> > +
> > + if (fd < 0) {
> > + attr->exclude_guest = 1;
> > +
> > + fd = syscall(SYS_perf_event_open, attr, /*pid=*/0, /*cpu=*/-1,
> > + /*group_fd=*/-1, flags);
> > + close(fd);
> > + }
> > +
> > + attr->exclude_kernel = 0;
> > + attr->exclude_guest = 0;
> > + attr->exclude_hv = 0;
> > +
> > + return fd >= 0;
> > +}
> > +
> > +static void evsel__detect_missing_brstack_features(struct evsel *evsel)
>
> In the future could other PMU specific unsupported features be added
> not just brstack? Perhaps evsel__detect_missing_pmu_features would
> better capture that.
Yep, sounds reasonable. I think we can add that if we have another
thing to check.
>
> > +{
> > + static bool detection_done = false;
> > + struct perf_event_attr attr = {
> > + .type = evsel->core.attr.type,
> > + .config = evsel->core.attr.config,
> > + .disabled = 1,
> > + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > + .sample_period = 1000,
> > + };
> > + int old_errno;
> > +
> > + if (detection_done)
> > + return;
> > +
> > + old_errno = errno;
> > +
> > /*
> > * Must probe features in the order they were added to the
> > - * perf_event_attr interface.
> > + * perf_event_attr interface. These are PMU specific limitation
> > + * so we can detect with the given hardware event and stop on the
> > + * first one succeeded.
> > */
> > - if (!perf_missing_features.branch_counters &&
> > - (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
> > - perf_missing_features.branch_counters = true;
> > - pr_debug2("switching off branch counters support\n");
> > - return true;
> > - } else if (!perf_missing_features.read_lost &&
> > - (evsel->core.attr.read_format & PERF_FORMAT_LOST)) {
> > - perf_missing_features.read_lost = true;
> > - pr_debug2("switching off PERF_FORMAT_LOST support\n");
> > +
> > + /* Please add new feature detection here. */
> > +
> > + attr.branch_sample_type = PERF_SAMPLE_BRANCH_COUNTERS;
> > + if (has_attr_feature(&attr, /*flags=*/0))
> > + goto found;
> > + perf_missing_features.branch_counters = true;
>
> It feels like these global variables should be part of the PMU state.
> There is already perf_pmu.missing_features.
You're right. But I think this is kinda global feature unless we have
different PMUs that provide different branch sampling capability. It's
just the feature test requires a specific PMU and event. But it'd be
better putting this into struct perf_pmu later.
Kan, can you confirm if Intel hybrid systems have the same branch stack
sampling capabilities on both cores?
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-01 18:00 ` Ian Rogers
@ 2024-10-01 21:36 ` Namhyung Kim
2024-10-01 22:21 ` Ian Rogers
0 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 21:36 UTC (permalink / raw)
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The perf_event_open might fail due to various reasons, so blindly
> > reducing precise_ip level might not be the best way to deal with it.
> >
> > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > given precise level. Let's try again with the correct error code.
> >
> > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > user events with exclude_kernel=1 cannot make progress. Let's add the
> > evsel__handle_error_quirks() to this case specially. I plan to work on
> > the kernel side to improve this situation but it'd still need some
> > special handling for IBS.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > 1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > return false;
> > }
> >
> > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > +{
> > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
>
> Should the quirk handling be specific to evsels with the IBS PMU given
> the comment above? ie this is a PMU specific workaround rather than an
> AMD specific workaround, however, the PMU only exists on AMD :-)
>
> > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
>
> So here rather than x86__is_amd_cpu it would be
> !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> the logic into pmu.c.
I guess the problem is that AMD driver does implicit forwarding to IBS
if the legacy hardware events have precise_ip. So it might have just
core pmu (or no pmu) in the evsel.
Thanks,
Namhyung
>
> > + evsel->core.attr.precise_ip = 0;
> > + pr_debug2_peo("removing precise_ip on AMD\n");
> > + display_attr(&evsel->core.attr);
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > struct perf_thread_map *threads,
> > int start_cpu_map_idx, int end_cpu_map_idx)
> > @@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > return 0;
> >
> > try_fallback:
> > - if (evsel__precise_ip_fallback(evsel))
> > - goto retry_open;
> > -
> > if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> > idx, threads, thread, err)) {
> > /* We just removed 1 thread, so lower the upper nthreads limit. */
> > @@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> > goto retry_open;
> >
> > - if (err != -EINVAL || idx > 0 || thread > 0)
> > - goto out_close;
> > + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> > + goto retry_open;
> >
> > - if (evsel__detect_missing_features(evsel))
> > + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> > goto fallback_missing_features;
> > +
> > + if (evsel__handle_error_quirks(evsel, err))
> > + goto retry_open;
> > +
> > out_close:
> > if (err)
> > threads->err_thread = thread;
> > --
> > 2.46.1.824.gd892dcdcdd-goog
> >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-01 21:36 ` Namhyung Kim
@ 2024-10-01 22:21 ` Ian Rogers
2024-10-03 17:06 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: Ian Rogers @ 2024-10-01 22:21 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > The perf_event_open might fail due to various reasons, so blindly
> > > reducing precise_ip level might not be the best way to deal with it.
> > >
> > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > given precise level. Let's try again with the correct error code.
> > >
> > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > the kernel side to improve this situation but it'd still need some
> > > special handling for IBS.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > --- a/tools/perf/util/evsel.c
> > > +++ b/tools/perf/util/evsel.c
> > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > return false;
> > > }
> > >
> > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > +{
> > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> >
> > Should the quirk handling be specific to evsels with the IBS PMU given
> > the comment above? ie this is a PMU specific workaround rather than an
> > AMD specific workaround, however, the PMU only exists on AMD :-)
> >
> > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> >
> > So here rather than x86__is_amd_cpu it would be
> > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > the logic into pmu.c.
>
> I guess the problem is that AMD driver does implicit forwarding to IBS
> if the legacy hardware events have precise_ip. So it might have just
> core pmu (or no pmu) in the evsel.
I think the no PMU case should probably have a PMU possibly similar to
the tool PMU in:
https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
But that's not in place yet. You can always use
perf_pmus__scan_core(NULL) or
perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
PMU whereas the code above could fire for IBS or other PMUs.
Thanks,
Ian
> >
> > > + evsel->core.attr.precise_ip = 0;
> > > + pr_debug2_peo("removing precise_ip on AMD\n");
> > > + display_attr(&evsel->core.attr);
> > > + return true;
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > > static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > struct perf_thread_map *threads,
> > > int start_cpu_map_idx, int end_cpu_map_idx)
> > > @@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > return 0;
> > >
> > > try_fallback:
> > > - if (evsel__precise_ip_fallback(evsel))
> > > - goto retry_open;
> > > -
> > > if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> > > idx, threads, thread, err)) {
> > > /* We just removed 1 thread, so lower the upper nthreads limit. */
> > > @@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> > > goto retry_open;
> > >
> > > - if (err != -EINVAL || idx > 0 || thread > 0)
> > > - goto out_close;
> > > + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> > > + goto retry_open;
> > >
> > > - if (evsel__detect_missing_features(evsel))
> > > + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> > > goto fallback_missing_features;
> > > +
> > > + if (evsel__handle_error_quirks(evsel, err))
> > > + goto retry_open;
> > > +
> > > out_close:
> > > if (err)
> > > threads->err_thread = thread;
> > > --
> > > 2.46.1.824.gd892dcdcdd-goog
> > >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c
2024-10-01 16:03 ` Liang, Kan
@ 2024-10-01 22:39 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-01 22:39 UTC (permalink / raw)
To: Liang, Kan
Cc: Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 01, 2024 at 12:03:29PM -0400, Liang, Kan wrote:
>
>
> On 2024-09-30 8:20 p.m., Namhyung Kim wrote:
> > It can be called from non-x86 platform so let's move it to the general
> > util directory. Also add a new helper perf_env__is_x86_amd_cpu() so
> > that it can be called with an existing perf_env as well.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/arch/x86/util/Build | 1 -
> > tools/perf/arch/x86/util/env.c | 19 -------------------
> > tools/perf/arch/x86/util/env.h | 7 -------
> > tools/perf/arch/x86/util/pmu.c | 2 +-
> > tools/perf/util/env.c | 24 ++++++++++++++++++++++++
> > tools/perf/util/env.h | 4 ++++
> > 6 files changed, 29 insertions(+), 28 deletions(-)
> > delete mode 100644 tools/perf/arch/x86/util/env.c
> > delete mode 100644 tools/perf/arch/x86/util/env.h
> >
> > diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
> > index 2607ed5c42966543..ce6d802a1381c5ab 100644
> > --- a/tools/perf/arch/x86/util/Build
> > +++ b/tools/perf/arch/x86/util/Build
> > @@ -10,7 +10,6 @@ perf-util-y += evlist.o
> > perf-util-y += mem-events.o
> > perf-util-y += evsel.o
> > perf-util-y += iostat.o
> > -perf-util-y += env.o
> >
> > perf-util-$(CONFIG_DWARF) += dwarf-regs.o
> > perf-util-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
> > diff --git a/tools/perf/arch/x86/util/env.c b/tools/perf/arch/x86/util/env.c
> > deleted file mode 100644
> > index 3e537ffb1353aab2..0000000000000000
> > --- a/tools/perf/arch/x86/util/env.c
> > +++ /dev/null
> > @@ -1,19 +0,0 @@
> > -// SPDX-License-Identifier: GPL-2.0
> > -#include "linux/string.h"
> > -#include "util/env.h"
> > -#include "env.h"
> > -
> > -bool x86__is_amd_cpu(void)
> > -{
> > - struct perf_env env = { .total_mem = 0, };
> > - static int is_amd; /* 0: Uninitialized, 1: Yes, -1: No */
> > -
> > - if (is_amd)
> > - goto ret;
> > -
> > - perf_env__cpuid(&env);
> > - is_amd = env.cpuid && strstarts(env.cpuid, "AuthenticAMD") ? 1 : -1;
> > - perf_env__exit(&env);
> > -ret:
> > - return is_amd >= 1 ? true : false;
> > -}
> > diff --git a/tools/perf/arch/x86/util/env.h b/tools/perf/arch/x86/util/env.h
> > deleted file mode 100644
> > index d78f080b6b3f889a..0000000000000000
> > --- a/tools/perf/arch/x86/util/env.h
> > +++ /dev/null
> > @@ -1,7 +0,0 @@
> > -/* SPDX-License-Identifier: GPL-2.0 */
> > -#ifndef _X86_ENV_H
> > -#define _X86_ENV_H
> > -
> > -bool x86__is_amd_cpu(void);
> > -
> > -#endif /* _X86_ENV_H */
> > diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
> > index c3d89d6ba1bf03ad..e0060dac2a9f9242 100644
> > --- a/tools/perf/arch/x86/util/pmu.c
> > +++ b/tools/perf/arch/x86/util/pmu.c
> > @@ -16,7 +16,7 @@
> > #include "../../../util/fncache.h"
> > #include "../../../util/pmus.h"
> > #include "mem-events.h"
> > -#include "env.h"
> > +#include "util/env.h"
> >
> > void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
> > {
> > diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> > index 1edbccfc3281d2b1..470a0156e0722e4e 100644
> > --- a/tools/perf/util/env.c
> > +++ b/tools/perf/util/env.c
> > @@ -5,6 +5,7 @@
> > #include "util/header.h"
> > #include "linux/compiler.h"
> > #include <linux/ctype.h>
> > +#include <linux/string.h>
> > #include <linux/zalloc.h>
> > #include "cgroup.h"
> > #include <errno.h>
> > @@ -625,6 +626,7 @@ char *perf_env__find_pmu_cap(struct perf_env *env, const char *pmu_name,
> > return NULL;
> > }
> >
> > +
>
> Useless empty line.
Ugh, will remove.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (8 preceding siblings ...)
2024-10-01 17:46 ` [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Liang, Kan
@ 2024-10-02 9:49 ` James Clark
2024-10-02 18:29 ` Namhyung Kim
2024-10-15 4:25 ` Ravi Bangoria
10 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2024-10-02 9:49 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
On 01/10/2024 1:20 am, Namhyung Kim wrote:
> Hello,
>
> I found perf tools set exclude_guest bit inconsistently. It used to
> set the bit but now the default event for perf record doesn't. So I'm
> wondering why we want the bit in the first place.
>
> Actually it's not good for PMUs don't support any exclusion like AMD
> IBS because it disables new features after the exclude_guest due to
> the missing feature detection logic.
>
> v4 changes)
>
> * handle EOPNOTSUPP error in compatible way (Kan)
> * drop --exclude-guest option in perf stat
> * not to separate exclude_hv fallback
> * rename to exclude_GH_default (Kan)
> * drop the RFC from the subject
>
> v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
>
> * move exclude_guest fallback to the front
> * fix precise_max handling on AMD
> * simplify the default event for perf record
>
> v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
>
> * update the missing feature detection logic
> * separate exclude_hv fallback
> * add new fallback for exclude_guest
>
> v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
>
> AFAIK it doesn't matter for the most cases but perf kvm. If users
> need to set the bit, they can still use :H modifier. For vPMU pass-
> through or Apple M1, it'd add the exclude_guest during the fallback
> logic.
>
> Also the kernel feature detection logic should be separated from the
> exclude bit tests since it depends on the PMU implementation rather
> than the core kernel features. So I changed it to use a software
> event for the detection and factor out some hw-specific checks.
>
> The code is available at 'perf/exclude-v4' branch in
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
Looks like you need to allow for :H in the perf stat test on M1 now:
diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
index 5a2ca2bcf94d..77cb95859649 100755
--- a/tools/perf/tests/shell/stat.sh
+++ b/tools/perf/tests/shell/stat.sh
@@ -161,7 +161,7 @@ test_hybrid() {
fi
# Run default Perf stat
- cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles
" | wc -l)
+ cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles
| cycles:H " | wc -l)
if [ "$pmus" -ne "$cycles_events" ]
then
Other than that:
Reviewed-by: James Clark <james.clark@linaro.org>
> Namhyung Kim (8):
> perf tools: Add fallback for exclude_guest
> perf tools: Don't set attr.exclude_guest by default
> perf tools: Simplify evsel__add_modifier()
> perf tools: Do not set exclude_guest for precise_ip
> perf tools: Detect missing kernel features properly
> perf tools: Move x86__is_amd_cpu() to util/env.c
> perf tools: Check fallback error and order
> perf record: Just use "cycles:P" as the default event
>
> tools/perf/arch/x86/util/Build | 1 -
> tools/perf/arch/x86/util/env.c | 19 -
> tools/perf/arch/x86/util/env.h | 7 -
> tools/perf/arch/x86/util/pmu.c | 2 +-
> tools/perf/builtin-kvm.c | 1 +
> tools/perf/builtin-record.c | 4 +-
> tools/perf/builtin-stat.c | 18 +-
> tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
> tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
> tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
> tools/perf/tests/parse-events.c | 30 +-
> tools/perf/util/env.c | 24 ++
> tools/perf/util/env.h | 4 +
> tools/perf/util/evsel.c | 394 ++++++++++++++------
> tools/perf/util/evsel.h | 1 -
> tools/perf/util/parse-events.c | 6 +-
> tools/perf/util/util.c | 10 +-
> tools/perf/util/util.h | 3 +
> 18 files changed, 364 insertions(+), 166 deletions(-)
> delete mode 100644 tools/perf/arch/x86/util/env.c
> delete mode 100644 tools/perf/arch/x86/util/env.h
>
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-02 9:49 ` James Clark
@ 2024-10-02 18:29 ` Namhyung Kim
2024-10-04 15:40 ` James Clark
0 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-02 18:29 UTC (permalink / raw)
To: James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
On Wed, Oct 02, 2024 at 10:49:36AM +0100, James Clark wrote:
>
>
> On 01/10/2024 1:20 am, Namhyung Kim wrote:
> > Hello,
> >
> > I found perf tools set exclude_guest bit inconsistently. It used to
> > set the bit but now the default event for perf record doesn't. So I'm
> > wondering why we want the bit in the first place.
> >
> > Actually it's not good for PMUs don't support any exclusion like AMD
> > IBS because it disables new features after the exclude_guest due to
> > the missing feature detection logic.
> >
> > v4 changes)
> >
> > * handle EOPNOTSUPP error in compatible way (Kan)
> > * drop --exclude-guest option in perf stat
> > * not to separate exclude_hv fallback
> > * rename to exclude_GH_default (Kan)
> > * drop the RFC from the subject
> >
> > v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
> >
> > * move exclude_guest fallback to the front
> > * fix precise_max handling on AMD
> > * simplify the default event for perf record
> >
> > v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
> >
> > * update the missing feature detection logic
> > * separate exclude_hv fallback
> > * add new fallback for exclude_guest
> >
> > v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
> >
> > AFAIK it doesn't matter for the most cases but perf kvm. If users
> > need to set the bit, they can still use :H modifier. For vPMU pass-
> > through or Apple M1, it'd add the exclude_guest during the fallback
> > logic.
> >
> > Also the kernel feature detection logic should be separated from the
> > exclude bit tests since it depends on the PMU implementation rather
> > than the core kernel features. So I changed it to use a software
> > event for the detection and factor out some hw-specific checks.
> >
> > The code is available at 'perf/exclude-v4' branch in
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> > Thanks,
> > Namhyung
> >
> >
>
> Looks like you need to allow for :H in the perf stat test on M1 now:
>
> diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
> index 5a2ca2bcf94d..77cb95859649 100755
> --- a/tools/perf/tests/shell/stat.sh
> +++ b/tools/perf/tests/shell/stat.sh
> @@ -161,7 +161,7 @@ test_hybrid() {
> fi
>
> # Run default Perf stat
> - cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles " |
> wc -l)
> + cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles | cycles:H " | wc -l)
Ok, what about "u"? Probably it can be
cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/(uH)*| cycles(:[uH])* " | wc -l)
>
> if [ "$pmus" -ne "$cycles_events" ]
> then
>
> Other than that:
>
> Reviewed-by: James Clark <james.clark@linaro.org>
Thanks for the review!
Namhyung
>
> > Namhyung Kim (8):
> > perf tools: Add fallback for exclude_guest
> > perf tools: Don't set attr.exclude_guest by default
> > perf tools: Simplify evsel__add_modifier()
> > perf tools: Do not set exclude_guest for precise_ip
> > perf tools: Detect missing kernel features properly
> > perf tools: Move x86__is_amd_cpu() to util/env.c
> > perf tools: Check fallback error and order
> > perf record: Just use "cycles:P" as the default event
> >
> > tools/perf/arch/x86/util/Build | 1 -
> > tools/perf/arch/x86/util/env.c | 19 -
> > tools/perf/arch/x86/util/env.h | 7 -
> > tools/perf/arch/x86/util/pmu.c | 2 +-
> > tools/perf/builtin-kvm.c | 1 +
> > tools/perf/builtin-record.c | 4 +-
> > tools/perf/builtin-stat.c | 18 +-
> > tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
> > tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
> > tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
> > tools/perf/tests/parse-events.c | 30 +-
> > tools/perf/util/env.c | 24 ++
> > tools/perf/util/env.h | 4 +
> > tools/perf/util/evsel.c | 394 ++++++++++++++------
> > tools/perf/util/evsel.h | 1 -
> > tools/perf/util/parse-events.c | 6 +-
> > tools/perf/util/util.c | 10 +-
> > tools/perf/util/util.h | 3 +
> > 18 files changed, 364 insertions(+), 166 deletions(-)
> > delete mode 100644 tools/perf/arch/x86/util/env.c
> > delete mode 100644 tools/perf/arch/x86/util/env.h
> >
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-01 22:21 ` Ian Rogers
@ 2024-10-03 17:06 ` Namhyung Kim
2024-10-03 17:32 ` Ian Rogers
0 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-03 17:06 UTC (permalink / raw)
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 01, 2024 at 03:21:50PM -0700, Ian Rogers wrote:
> On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > The perf_event_open might fail due to various reasons, so blindly
> > > > reducing precise_ip level might not be the best way to deal with it.
> > > >
> > > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > > given precise level. Let's try again with the correct error code.
> > > >
> > > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > > the kernel side to improve this situation but it'd still need some
> > > > special handling for IBS.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > > --- a/tools/perf/util/evsel.c
> > > > +++ b/tools/perf/util/evsel.c
> > > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > > return false;
> > > > }
> > > >
> > > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > > +{
> > > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> > >
> > > Should the quirk handling be specific to evsels with the IBS PMU given
> > > the comment above? ie this is a PMU specific workaround rather than an
> > > AMD specific workaround, however, the PMU only exists on AMD :-)
> > >
> > > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> > >
> > > So here rather than x86__is_amd_cpu it would be
> > > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > > the logic into pmu.c.
> >
> > I guess the problem is that AMD driver does implicit forwarding to IBS
> > if the legacy hardware events have precise_ip. So it might have just
> > core pmu (or no pmu) in the evsel.
>
> I think the no PMU case should probably have a PMU possibly similar to
> the tool PMU in:
> https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
> But that's not in place yet. You can always use
> perf_pmus__scan_core(NULL) or
> perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
> evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
> PMU whereas the code above could fire for IBS or other PMUs.
But IBS is the only PMU on AMD that provides precise_ip IIRC. So I
don't think other events would have it nor have any effect on changing
the value. So maybe we can skip the PMU scanning and just drop to 0?
Thanks,
Namhyung
>
> > >
> > > > + evsel->core.attr.precise_ip = 0;
> > > > + pr_debug2_peo("removing precise_ip on AMD\n");
> > > > + display_attr(&evsel->core.attr);
> > > > + return true;
> > > > + }
> > > > +
> > > > + return false;
> > > > +}
> > > > +
> > > > static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > struct perf_thread_map *threads,
> > > > int start_cpu_map_idx, int end_cpu_map_idx)
> > > > @@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > return 0;
> > > >
> > > > try_fallback:
> > > > - if (evsel__precise_ip_fallback(evsel))
> > > > - goto retry_open;
> > > > -
> > > > if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> > > > idx, threads, thread, err)) {
> > > > /* We just removed 1 thread, so lower the upper nthreads limit. */
> > > > @@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> > > > goto retry_open;
> > > >
> > > > - if (err != -EINVAL || idx > 0 || thread > 0)
> > > > - goto out_close;
> > > > + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> > > > + goto retry_open;
> > > >
> > > > - if (evsel__detect_missing_features(evsel))
> > > > + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> > > > goto fallback_missing_features;
> > > > +
> > > > + if (evsel__handle_error_quirks(evsel, err))
> > > > + goto retry_open;
> > > > +
> > > > out_close:
> > > > if (err)
> > > > threads->err_thread = thread;
> > > > --
> > > > 2.46.1.824.gd892dcdcdd-goog
> > > >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-03 17:06 ` Namhyung Kim
@ 2024-10-03 17:32 ` Ian Rogers
2024-10-03 22:38 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: Ian Rogers @ 2024-10-03 17:32 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Thu, Oct 3, 2024 at 10:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Oct 01, 2024 at 03:21:50PM -0700, Ian Rogers wrote:
> > On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > > > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > The perf_event_open might fail due to various reasons, so blindly
> > > > > reducing precise_ip level might not be the best way to deal with it.
> > > > >
> > > > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > > > given precise level. Let's try again with the correct error code.
> > > > >
> > > > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > > > the kernel side to improve this situation but it'd still need some
> > > > > special handling for IBS.
> > > > >
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > > > --- a/tools/perf/util/evsel.c
> > > > > +++ b/tools/perf/util/evsel.c
> > > > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > > > return false;
> > > > > }
> > > > >
> > > > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > > > +{
> > > > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> > > >
> > > > Should the quirk handling be specific to evsels with the IBS PMU given
> > > > the comment above? ie this is a PMU specific workaround rather than an
> > > > AMD specific workaround, however, the PMU only exists on AMD :-)
> > > >
> > > > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> > > >
> > > > So here rather than x86__is_amd_cpu it would be
> > > > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > > > the logic into pmu.c.
> > >
> > > I guess the problem is that AMD driver does implicit forwarding to IBS
> > > if the legacy hardware events have precise_ip. So it might have just
> > > core pmu (or no pmu) in the evsel.
> >
> > I think the no PMU case should probably have a PMU possibly similar to
> > the tool PMU in:
> > https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
> > But that's not in place yet. You can always use
> > perf_pmus__scan_core(NULL) or
> > perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
> > evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
> > PMU whereas the code above could fire for IBS or other PMUs.
>
> But IBS is the only PMU on AMD that provides precise_ip IIRC. So I
> don't think other events would have it nor have any effect on changing
> the value. So maybe we can skip the PMU scanning and just drop to 0?
cpu supports precise_ip as it forwards it to IBS, IBS is an ambiguous
term as there are ibs_op and ibs_fetch PMUs. The code is using values
in the attribute to infer the PMU that is being used, it feels it
would be more intention revealing to do things like:
```
if (error == ... && perf_pmu__is_ibs_op_or_fetch(evsel->pmu)) ..
```
perhaps to not burden the code this can be:
```
if (...) {
assert(perf_pmu__is_ibs_op_or_fetch(evsel->pmu));
```
Thanks,
Ian
> >
> > > >
> > > > > + evsel->core.attr.precise_ip = 0;
> > > > > + pr_debug2_peo("removing precise_ip on AMD\n");
> > > > > + display_attr(&evsel->core.attr);
> > > > > + return true;
> > > > > + }
> > > > > +
> > > > > + return false;
> > > > > +}
> > > > > +
> > > > > static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > > struct perf_thread_map *threads,
> > > > > int start_cpu_map_idx, int end_cpu_map_idx)
> > > > > @@ -2580,9 +2594,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > > return 0;
> > > > >
> > > > > try_fallback:
> > > > > - if (evsel__precise_ip_fallback(evsel))
> > > > > - goto retry_open;
> > > > > -
> > > > > if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> > > > > idx, threads, thread, err)) {
> > > > > /* We just removed 1 thread, so lower the upper nthreads limit. */
> > > > > @@ -2599,11 +2610,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > > > > if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> > > > > goto retry_open;
> > > > >
> > > > > - if (err != -EINVAL || idx > 0 || thread > 0)
> > > > > - goto out_close;
> > > > > + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> > > > > + goto retry_open;
> > > > >
> > > > > - if (evsel__detect_missing_features(evsel))
> > > > > + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> > > > > goto fallback_missing_features;
> > > > > +
> > > > > + if (evsel__handle_error_quirks(evsel, err))
> > > > > + goto retry_open;
> > > > > +
> > > > > out_close:
> > > > > if (err)
> > > > > threads->err_thread = thread;
> > > > > --
> > > > > 2.46.1.824.gd892dcdcdd-goog
> > > > >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-03 17:32 ` Ian Rogers
@ 2024-10-03 22:38 ` Namhyung Kim
2024-10-14 19:22 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2024-10-03 22:38 UTC (permalink / raw)
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Thu, Oct 03, 2024 at 10:32:47AM -0700, Ian Rogers wrote:
> On Thu, Oct 3, 2024 at 10:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Oct 01, 2024 at 03:21:50PM -0700, Ian Rogers wrote:
> > > On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > > > > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > The perf_event_open might fail due to various reasons, so blindly
> > > > > > reducing precise_ip level might not be the best way to deal with it.
> > > > > >
> > > > > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > > > > given precise level. Let's try again with the correct error code.
> > > > > >
> > > > > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > > > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > > > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > > > > the kernel side to improve this situation but it'd still need some
> > > > > > special handling for IBS.
> > > > > >
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > > > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > > > > --- a/tools/perf/util/evsel.c
> > > > > > +++ b/tools/perf/util/evsel.c
> > > > > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > > > > return false;
> > > > > > }
> > > > > >
> > > > > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > > > > +{
> > > > > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> > > > >
> > > > > Should the quirk handling be specific to evsels with the IBS PMU given
> > > > > the comment above? ie this is a PMU specific workaround rather than an
> > > > > AMD specific workaround, however, the PMU only exists on AMD :-)
> > > > >
> > > > > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > > > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> > > > >
> > > > > So here rather than x86__is_amd_cpu it would be
> > > > > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > > > > the logic into pmu.c.
> > > >
> > > > I guess the problem is that AMD driver does implicit forwarding to IBS
> > > > if the legacy hardware events have precise_ip. So it might have just
> > > > core pmu (or no pmu) in the evsel.
> > >
> > > I think the no PMU case should probably have a PMU possibly similar to
> > > the tool PMU in:
> > > https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
> > > But that's not in place yet. You can always use
> > > perf_pmus__scan_core(NULL) or
> > > perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
> > > evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
> > > PMU whereas the code above could fire for IBS or other PMUs.
> >
> > But IBS is the only PMU on AMD that provides precise_ip IIRC. So I
> > don't think other events would have it nor have any effect on changing
> > the value. So maybe we can skip the PMU scanning and just drop to 0?
>
> cpu supports precise_ip as it forwards it to IBS, IBS is an ambiguous
> term as there are ibs_op and ibs_fetch PMUs. The code is using values
> in the attribute to infer the PMU that is being used, it feels it
> would be more intention revealing to do things like:
> ```
> if (error == ... && perf_pmu__is_ibs_op_or_fetch(evsel->pmu)) ..
> ```
I guess it'd get a core PMU for the default cycles event. I think the
intention is already confusing with the implicit forwarding.
Thanks,
Namhyung
> perhaps to not burden the code this can be:
> ```
> if (...) {
> assert(perf_pmu__is_ibs_op_or_fetch(evsel->pmu));
> ```
>
> Thanks,
> Ian
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-02 18:29 ` Namhyung Kim
@ 2024-10-04 15:40 ` James Clark
2024-10-04 19:17 ` Namhyung Kim
0 siblings, 1 reply; 37+ messages in thread
From: James Clark @ 2024-10-04 15:40 UTC (permalink / raw)
To: Namhyung Kim
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
On 02/10/2024 19:29, Namhyung Kim wrote:
> On Wed, Oct 02, 2024 at 10:49:36AM +0100, James Clark wrote:
>>
>>
>> On 01/10/2024 1:20 am, Namhyung Kim wrote:
>>> Hello,
>>>
>>> I found perf tools set exclude_guest bit inconsistently. It used to
>>> set the bit but now the default event for perf record doesn't. So I'm
>>> wondering why we want the bit in the first place.
>>>
>>> Actually it's not good for PMUs don't support any exclusion like AMD
>>> IBS because it disables new features after the exclude_guest due to
>>> the missing feature detection logic.
>>>
>>> v4 changes)
>>>
>>> * handle EOPNOTSUPP error in compatible way (Kan)
>>> * drop --exclude-guest option in perf stat
>>> * not to separate exclude_hv fallback
>>> * rename to exclude_GH_default (Kan)
>>> * drop the RFC from the subject
>>>
>>> v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
>>>
>>> * move exclude_guest fallback to the front
>>> * fix precise_max handling on AMD
>>> * simplify the default event for perf record
>>>
>>> v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
>>>
>>> * update the missing feature detection logic
>>> * separate exclude_hv fallback
>>> * add new fallback for exclude_guest
>>>
>>> v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
>>>
>>> AFAIK it doesn't matter for the most cases but perf kvm. If users
>>> need to set the bit, they can still use :H modifier. For vPMU pass-
>>> through or Apple M1, it'd add the exclude_guest during the fallback
>>> logic.
>>>
>>> Also the kernel feature detection logic should be separated from the
>>> exclude bit tests since it depends on the PMU implementation rather
>>> than the core kernel features. So I changed it to use a software
>>> event for the detection and factor out some hw-specific checks.
>>>
>>> The code is available at 'perf/exclude-v4' branch in
>>> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>>>
>>> Thanks,
>>> Namhyung
>>>
>>>
>>
>> Looks like you need to allow for :H in the perf stat test on M1 now:
>>
>> diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
>> index 5a2ca2bcf94d..77cb95859649 100755
>> --- a/tools/perf/tests/shell/stat.sh
>> +++ b/tools/perf/tests/shell/stat.sh
>> @@ -161,7 +161,7 @@ test_hybrid() {
>> fi
>>
>> # Run default Perf stat
>> - cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles " |
>> wc -l)
>> + cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles | cycles:H " | wc -l)
>
> Ok, what about "u"? Probably it can be
>
> cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/(uH)*| cycles(:[uH])* " | wc -l)
>
Yep that probably works too and is a bit more robust
>>
>> if [ "$pmus" -ne "$cycles_events" ]
>> then
>>
>> Other than that:
>>
>> Reviewed-by: James Clark <james.clark@linaro.org>
>
> Thanks for the review!
> Namhyung
>
>>
>>> Namhyung Kim (8):
>>> perf tools: Add fallback for exclude_guest
>>> perf tools: Don't set attr.exclude_guest by default
>>> perf tools: Simplify evsel__add_modifier()
>>> perf tools: Do not set exclude_guest for precise_ip
>>> perf tools: Detect missing kernel features properly
>>> perf tools: Move x86__is_amd_cpu() to util/env.c
>>> perf tools: Check fallback error and order
>>> perf record: Just use "cycles:P" as the default event
>>>
>>> tools/perf/arch/x86/util/Build | 1 -
>>> tools/perf/arch/x86/util/env.c | 19 -
>>> tools/perf/arch/x86/util/env.h | 7 -
>>> tools/perf/arch/x86/util/pmu.c | 2 +-
>>> tools/perf/builtin-kvm.c | 1 +
>>> tools/perf/builtin-record.c | 4 +-
>>> tools/perf/builtin-stat.c | 18 +-
>>> tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
>>> tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
>>> tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
>>> tools/perf/tests/parse-events.c | 30 +-
>>> tools/perf/util/env.c | 24 ++
>>> tools/perf/util/env.h | 4 +
>>> tools/perf/util/evsel.c | 394 ++++++++++++++------
>>> tools/perf/util/evsel.h | 1 -
>>> tools/perf/util/parse-events.c | 6 +-
>>> tools/perf/util/util.c | 10 +-
>>> tools/perf/util/util.h | 3 +
>>> 18 files changed, 364 insertions(+), 166 deletions(-)
>>> delete mode 100644 tools/perf/arch/x86/util/env.c
>>> delete mode 100644 tools/perf/arch/x86/util/env.h
>>>
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-04 15:40 ` James Clark
@ 2024-10-04 19:17 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-04 19:17 UTC (permalink / raw)
To: James Clark
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Ravi Bangoria, Mark Rutland, James Clark,
Kajol Jain, Thomas Richter, Atish Patra, Palmer Dabbelt,
Mingwei Zhang, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
On Fri, Oct 04, 2024 at 04:40:18PM +0100, James Clark wrote:
>
>
> On 02/10/2024 19:29, Namhyung Kim wrote:
> > On Wed, Oct 02, 2024 at 10:49:36AM +0100, James Clark wrote:
> > >
> > >
> > > On 01/10/2024 1:20 am, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > I found perf tools set exclude_guest bit inconsistently. It used to
> > > > set the bit but now the default event for perf record doesn't. So I'm
> > > > wondering why we want the bit in the first place.
> > > >
> > > > Actually it's not good for PMUs don't support any exclusion like AMD
> > > > IBS because it disables new features after the exclude_guest due to
> > > > the missing feature detection logic.
> > > >
> > > > v4 changes)
> > > >
> > > > * handle EOPNOTSUPP error in compatible way (Kan)
> > > > * drop --exclude-guest option in perf stat
> > > > * not to separate exclude_hv fallback
> > > > * rename to exclude_GH_default (Kan)
> > > > * drop the RFC from the subject
> > > >
> > > > v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
> > > >
> > > > * move exclude_guest fallback to the front
> > > > * fix precise_max handling on AMD
> > > > * simplify the default event for perf record
> > > >
> > > > v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
> > > >
> > > > * update the missing feature detection logic
> > > > * separate exclude_hv fallback
> > > > * add new fallback for exclude_guest
> > > >
> > > > v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
> > > >
> > > > AFAIK it doesn't matter for the most cases but perf kvm. If users
> > > > need to set the bit, they can still use :H modifier. For vPMU pass-
> > > > through or Apple M1, it'd add the exclude_guest during the fallback
> > > > logic.
> > > >
> > > > Also the kernel feature detection logic should be separated from the
> > > > exclude bit tests since it depends on the PMU implementation rather
> > > > than the core kernel features. So I changed it to use a software
> > > > event for the detection and factor out some hw-specific checks.
> > > >
> > > > The code is available at 'perf/exclude-v4' branch in
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> > > >
> > > > Thanks,
> > > > Namhyung
> > > >
> > > >
> > >
> > > Looks like you need to allow for :H in the perf stat test on M1 now:
> > >
> > > diff --git a/tools/perf/tests/shell/stat.sh b/tools/perf/tests/shell/stat.sh
> > > index 5a2ca2bcf94d..77cb95859649 100755
> > > --- a/tools/perf/tests/shell/stat.sh
> > > +++ b/tools/perf/tests/shell/stat.sh
> > > @@ -161,7 +161,7 @@ test_hybrid() {
> > > fi
> > >
> > > # Run default Perf stat
> > > - cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles " |
> > > wc -l)
> > > + cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/| cycles | cycles:H " | wc -l)
> >
> > Ok, what about "u"? Probably it can be
> >
> > cycles_events=$(perf stat -- true 2>&1 | grep -E "/cycles/(uH)*| cycles(:[uH])* " | wc -l)
> >
>
> Yep that probably works too and is a bit more robust
Great, I'll merge this change with the fix above if nobody objects.
It'd be nice if more folks on other architectures could test this
patchset too.
Thanks,
Namhyung
>
> > >
> > > if [ "$pmus" -ne "$cycles_events" ]
> > > then
> > >
> > > Other than that:
> > >
> > > Reviewed-by: James Clark <james.clark@linaro.org>
> >
> > Thanks for the review!
> > Namhyung
> >
> > >
> > > > Namhyung Kim (8):
> > > > perf tools: Add fallback for exclude_guest
> > > > perf tools: Don't set attr.exclude_guest by default
> > > > perf tools: Simplify evsel__add_modifier()
> > > > perf tools: Do not set exclude_guest for precise_ip
> > > > perf tools: Detect missing kernel features properly
> > > > perf tools: Move x86__is_amd_cpu() to util/env.c
> > > > perf tools: Check fallback error and order
> > > > perf record: Just use "cycles:P" as the default event
> > > >
> > > > tools/perf/arch/x86/util/Build | 1 -
> > > > tools/perf/arch/x86/util/env.c | 19 -
> > > > tools/perf/arch/x86/util/env.h | 7 -
> > > > tools/perf/arch/x86/util/pmu.c | 2 +-
> > > > tools/perf/builtin-kvm.c | 1 +
> > > > tools/perf/builtin-record.c | 4 +-
> > > > tools/perf/builtin-stat.c | 18 +-
> > > > tools/perf/dlfilters/dlfilter-test-api-v0.c | 2 +-
> > > > tools/perf/dlfilters/dlfilter-test-api-v2.c | 2 +-
> > > > tools/perf/tests/attr/test-record-dummy-C0 | 2 +-
> > > > tools/perf/tests/parse-events.c | 30 +-
> > > > tools/perf/util/env.c | 24 ++
> > > > tools/perf/util/env.h | 4 +
> > > > tools/perf/util/evsel.c | 394 ++++++++++++++------
> > > > tools/perf/util/evsel.h | 1 -
> > > > tools/perf/util/parse-events.c | 6 +-
> > > > tools/perf/util/util.c | 10 +-
> > > > tools/perf/util/util.h | 3 +
> > > > 18 files changed, 364 insertions(+), 166 deletions(-)
> > > > delete mode 100644 tools/perf/arch/x86/util/env.c
> > > > delete mode 100644 tools/perf/arch/x86/util/env.h
> > > >
> > >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-03 22:38 ` Namhyung Kim
@ 2024-10-14 19:22 ` Namhyung Kim
2024-10-15 4:21 ` Ravi Bangoria
2024-10-16 4:47 ` Ian Rogers
0 siblings, 2 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-14 19:22 UTC (permalink / raw)
To: Ian Rogers
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
Hi Ian,
On Thu, Oct 03, 2024 at 03:38:01PM -0700, Namhyung Kim wrote:
> On Thu, Oct 03, 2024 at 10:32:47AM -0700, Ian Rogers wrote:
> > On Thu, Oct 3, 2024 at 10:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Tue, Oct 01, 2024 at 03:21:50PM -0700, Ian Rogers wrote:
> > > > On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > > > > > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > >
> > > > > > > The perf_event_open might fail due to various reasons, so blindly
> > > > > > > reducing precise_ip level might not be the best way to deal with it.
> > > > > > >
> > > > > > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > > > > > given precise level. Let's try again with the correct error code.
> > > > > > >
> > > > > > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > > > > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > > > > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > > > > > the kernel side to improve this situation but it'd still need some
> > > > > > > special handling for IBS.
> > > > > > >
> > > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > > ---
> > > > > > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > > > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > > > > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > > > > > --- a/tools/perf/util/evsel.c
> > > > > > > +++ b/tools/perf/util/evsel.c
> > > > > > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > > > > > return false;
> > > > > > > }
> > > > > > >
> > > > > > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > > > > > +{
> > > > > > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> > > > > >
> > > > > > Should the quirk handling be specific to evsels with the IBS PMU given
> > > > > > the comment above? ie this is a PMU specific workaround rather than an
> > > > > > AMD specific workaround, however, the PMU only exists on AMD :-)
> > > > > >
> > > > > > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > > > > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> > > > > >
> > > > > > So here rather than x86__is_amd_cpu it would be
> > > > > > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > > > > > the logic into pmu.c.
> > > > >
> > > > > I guess the problem is that AMD driver does implicit forwarding to IBS
> > > > > if the legacy hardware events have precise_ip. So it might have just
> > > > > core pmu (or no pmu) in the evsel.
> > > >
> > > > I think the no PMU case should probably have a PMU possibly similar to
> > > > the tool PMU in:
> > > > https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
> > > > But that's not in place yet. You can always use
> > > > perf_pmus__scan_core(NULL) or
> > > > perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
> > > > evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
> > > > PMU whereas the code above could fire for IBS or other PMUs.
> > >
> > > But IBS is the only PMU on AMD that provides precise_ip IIRC. So I
> > > don't think other events would have it nor have any effect on changing
> > > the value. So maybe we can skip the PMU scanning and just drop to 0?
> >
> > cpu supports precise_ip as it forwards it to IBS, IBS is an ambiguous
> > term as there are ibs_op and ibs_fetch PMUs. The code is using values
> > in the attribute to infer the PMU that is being used, it feels it
> > would be more intention revealing to do things like:
> > ```
> > if (error == ... && perf_pmu__is_ibs_op_or_fetch(evsel->pmu)) ..
> > ```
>
> I guess it'd get a core PMU for the default cycles event. I think the
> intention is already confusing with the implicit forwarding.
What about this?
---8<---
From 70d39fb5c2956ba264a292f112f7fd7272dc91be Mon Sep 17 00:00:00 2001
From: Namhyung Kim <namhyung@kernel.org>
Date: Tue, 3 Sep 2024 22:50:09 -0700
Subject: [PATCH] perf tools: Check fallback error and order
The perf_event_open might fail due to various reasons, so blindly
reducing precise_ip level might not be the best way to deal with it.
It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
given precise level. Let's try again with the correct error code.
This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
user events with exclude_kernel=1 cannot make progress. Let's add the
evsel__handle_error_quirks() to this case specially. I plan to work on
the kernel side to improve this situation but it'd still need some
special handling for IBS.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/evsel.c | 50 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 44 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 476658143822c346..2c45c55222c43dd4 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2246,6 +2246,43 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
return false;
}
+/*
+ * AMD core PMU forwards some events with precise_ip to IBS implicitly.
+ * This logic matches to the kernel function (core_pmu_ibs_config).
+ */
+static bool evsel__is_implicit_ibs_event(struct evsel *evsel)
+{
+ if (evsel->core.attr.precise_ip == 0 || !x86__is_amd_cpu())
+ return false;
+
+ if (evsel->core.attr.type == PERF_TYPE_HARDWARE &&
+ evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES)
+ return true;
+
+ if (evsel->core.attr.type == PERF_TYPE_RAW &&
+ (evsel->core.attr.config == 0x76 || evsel->core.attr.config == 0xc1))
+ return true;
+
+ return false;
+}
+
+static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
+{
+ /*
+ * AMD IBS doesn't support exclude_kernel, forward it back to the core
+ * PMU by clearing precise_ip only if it's from precise_max (:P).
+ */
+ if (error == -EINVAL && evsel__is_implicit_ibs_event(evsel) &&
+ evsel->core.attr.exclude_kernel && evsel->precise_max) {
+ evsel->core.attr.precise_ip = 0;
+ pr_debug2_peo("removing precise_ip on AMD\n");
+ display_attr(&evsel->core.attr);
+ return true;
+ }
+
+ return false;
+}
+
static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
struct perf_thread_map *threads,
int start_cpu_map_idx, int end_cpu_map_idx)
@@ -2366,9 +2403,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
return 0;
try_fallback:
- if (evsel__precise_ip_fallback(evsel))
- goto retry_open;
-
if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
idx, threads, thread, err)) {
/* We just removed 1 thread, so lower the upper nthreads limit. */
@@ -2385,11 +2419,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
goto retry_open;
- if (err != -EINVAL || idx > 0 || thread > 0)
- goto out_close;
+ if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
+ goto retry_open;
- if (evsel__detect_missing_features(evsel))
+ if (err == -EINVAL && evsel__detect_missing_features(evsel))
goto fallback_missing_features;
+
+ if (evsel__handle_error_quirks(evsel, err))
+ goto retry_open;
+
out_close:
if (err)
threads->err_thread = thread;
--
2.47.0.rc1.288.g06298d1525-goog
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 5/8] perf tools: Detect missing kernel features properly
2024-10-01 0:20 ` [PATCH 5/8] perf tools: Detect missing kernel features properly Namhyung Kim
2024-10-01 17:53 ` Ian Rogers
@ 2024-10-15 4:19 ` Ravi Bangoria
2024-10-16 4:49 ` Namhyung Kim
1 sibling, 1 reply; 37+ messages in thread
From: Ravi Bangoria @ 2024-10-15 4:19 UTC (permalink / raw)
To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang,
Ravi Bangoria
> - } else if (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) {
> - if (evsel->pmu == NULL)
> - evsel->pmu = evsel__find_pmu(evsel);
> -
> - if (evsel->pmu)
> - evsel->pmu->missing_features.exclude_guest = true;
> - else {
> - /* we cannot find PMU, disable attrs now */
> - evsel->core.attr.exclude_host = false;
> - evsel->core.attr.exclude_guest = false;
> - }
>
> - if (evsel->exclude_GH) {
> - pr_debug2_peo("PMU has no exclude_host/guest support, bailing out\n");
> - return false;
> - }
> - if (!perf_missing_features.exclude_guest) {
> - perf_missing_features.exclude_guest = true;
> - pr_debug2_peo("switching off exclude_guest, exclude_host\n");
> - }
Shall we get rid of:
perf_missing_features.exclude_guest
pmu->missing_features.exclude_guest
they don't seem to be used anywhere after the patch.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-14 19:22 ` Namhyung Kim
@ 2024-10-15 4:21 ` Ravi Bangoria
2024-10-16 4:33 ` Namhyung Kim
2024-10-16 4:47 ` Ian Rogers
1 sibling, 1 reply; 37+ messages in thread
From: Ravi Bangoria @ 2024-10-15 4:21 UTC (permalink / raw)
To: Namhyung Kim, Ian Rogers
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users, Mark Rutland,
James Clark, Kajol Jain, Thomas Richter, Atish Patra,
Palmer Dabbelt, Mingwei Zhang, Ravi Bangoria
> @@ -2366,9 +2403,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> return 0;
>
> try_fallback:
> - if (evsel__precise_ip_fallback(evsel))
> - goto retry_open;
> -
> if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> idx, threads, thread, err)) {
> /* We just removed 1 thread, so lower the upper nthreads limit. */
> @@ -2385,11 +2419,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> goto retry_open;
>
> - if (err != -EINVAL || idx > 0 || thread > 0)
> - goto out_close;
> + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> + goto retry_open;
This will change the behavior of events like instructions:P on AMD.
Without patch:
$ ./perf record -e instructions:P -- true
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.009 MB perf.data (9 samples) ]
With the patch:
$ ./perf record -e instructions:P -- true
Error:
The instructions:Pu event is not supported.
Thanks,
Ravi
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4)
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
` (9 preceding siblings ...)
2024-10-02 9:49 ` James Clark
@ 2024-10-15 4:25 ` Ravi Bangoria
10 siblings, 0 replies; 37+ messages in thread
From: Ravi Bangoria @ 2024-10-15 4:25 UTC (permalink / raw)
To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang,
Ravi Bangoria
On 01-Oct-24 5:50 AM, Namhyung Kim wrote:
> Hello,
>
> I found perf tools set exclude_guest bit inconsistently. It used to
> set the bit but now the default event for perf record doesn't. So I'm
> wondering why we want the bit in the first place.
>
> Actually it's not good for PMUs don't support any exclusion like AMD
> IBS because it disables new features after the exclude_guest due to
> the missing feature detection logic.
>
> v4 changes)
>
> * handle EOPNOTSUPP error in compatible way (Kan)
> * drop --exclude-guest option in perf stat
> * not to separate exclude_hv fallback
> * rename to exclude_GH_default (Kan)
> * drop the RFC from the subject
>
> v3) https://lore.kernel.org/lkml/20240905202426.2690105-1-namhyung@kernel.org/
>
> * move exclude_guest fallback to the front
> * fix precise_max handling on AMD
> * simplify the default event for perf record
>
> v2) https://lore.kernel.org/lkml/20240904064131.2377873-1-namhyung@kernel.org/
>
> * update the missing feature detection logic
> * separate exclude_hv fallback
> * add new fallback for exclude_guest
>
> v1) https://lore.kernel.org/lkml/20240902014621.2002343-1-namhyung@kernel.org/
>
> AFAIK it doesn't matter for the most cases but perf kvm. If users
> need to set the bit, they can still use :H modifier. For vPMU pass-
> through or Apple M1, it'd add the exclude_guest during the fallback
> logic.
>
> Also the kernel feature detection logic should be separated from the
> exclude bit tests since it depends on the PMU implementation rather
> than the core kernel features. So I changed it to use a software
> event for the detection and factor out some hw-specific checks.
>
> The code is available at 'perf/exclude-v4' branch in
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Apart from few minor things, series looks good to me:
Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Thanks,
Ravi
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-15 4:21 ` Ravi Bangoria
@ 2024-10-16 4:33 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-16 4:33 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Ian Rogers, Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa,
Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
Hello Ravi,
On Tue, Oct 15, 2024 at 09:51:50AM +0530, Ravi Bangoria wrote:
> > @@ -2366,9 +2403,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > return 0;
> >
> > try_fallback:
> > - if (evsel__precise_ip_fallback(evsel))
> > - goto retry_open;
> > -
> > if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> > idx, threads, thread, err)) {
> > /* We just removed 1 thread, so lower the upper nthreads limit. */
> > @@ -2385,11 +2419,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> > if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> > goto retry_open;
> >
> > - if (err != -EINVAL || idx > 0 || thread > 0)
> > - goto out_close;
> > + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> > + goto retry_open;
>
> This will change the behavior of events like instructions:P on AMD.
>
> Without patch:
> $ ./perf record -e instructions:P -- true
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.009 MB perf.data (9 samples) ]
>
> With the patch:
>
> $ ./perf record -e instructions:P -- true
> Error:
> The instructions:Pu event is not supported.
Thanks for the test, it should support other events too. I've noticed
it returns -ENOENT for non-IBS events with precise_ip > 0. Will handle
them as well.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/8] perf tools: Check fallback error and order
2024-10-14 19:22 ` Namhyung Kim
2024-10-15 4:21 ` Ravi Bangoria
@ 2024-10-16 4:47 ` Ian Rogers
1 sibling, 0 replies; 37+ messages in thread
From: Ian Rogers @ 2024-10-16 4:47 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Kan Liang, Jiri Olsa, Adrian Hunter,
Peter Zijlstra, Ingo Molnar, LKML, linux-perf-users,
Ravi Bangoria, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Mon, Oct 14, 2024 at 12:22 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Thu, Oct 03, 2024 at 03:38:01PM -0700, Namhyung Kim wrote:
> > On Thu, Oct 03, 2024 at 10:32:47AM -0700, Ian Rogers wrote:
> > > On Thu, Oct 3, 2024 at 10:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Tue, Oct 01, 2024 at 03:21:50PM -0700, Ian Rogers wrote:
> > > > > On Tue, Oct 1, 2024 at 2:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, Oct 01, 2024 at 11:00:20AM -0700, Ian Rogers wrote:
> > > > > > > On Mon, Sep 30, 2024 at 5:20 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > > >
> > > > > > > > The perf_event_open might fail due to various reasons, so blindly
> > > > > > > > reducing precise_ip level might not be the best way to deal with it.
> > > > > > > >
> > > > > > > > It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> > > > > > > > given precise level. Let's try again with the correct error code.
> > > > > > > >
> > > > > > > > This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> > > > > > > > user events with exclude_kernel=1 cannot make progress. Let's add the
> > > > > > > > evsel__handle_error_quirks() to this case specially. I plan to work on
> > > > > > > > the kernel side to improve this situation but it'd still need some
> > > > > > > > special handling for IBS.
> > > > > > > >
> > > > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > > > ---
> > > > > > > > tools/perf/util/evsel.c | 27 +++++++++++++++++++++------
> > > > > > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > > > > > > index 32e30c293d0c6198..ef8356260eea54cd 100644
> > > > > > > > --- a/tools/perf/util/evsel.c
> > > > > > > > +++ b/tools/perf/util/evsel.c
> > > > > > > > @@ -2419,6 +2419,20 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> > > > > > > > return false;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> > > > > > > > +{
> > > > > > > > + /* AMD IBS doesn't support exclude_kernel, forward it to core PMU */
> > > > > > >
> > > > > > > Should the quirk handling be specific to evsels with the IBS PMU given
> > > > > > > the comment above? ie this is a PMU specific workaround rather than an
> > > > > > > AMD specific workaround, however, the PMU only exists on AMD :-)
> > > > > > >
> > > > > > > > + if (error == -EINVAL && evsel->precise_max && evsel->core.attr.precise_ip &&
> > > > > > > > + evsel->core.attr.exclude_kernel && x86__is_amd_cpu()) {
> > > > > > >
> > > > > > > So here rather than x86__is_amd_cpu it would be
> > > > > > > !strcmp(evsel->pmu->name, "ibs_...") . But it may be cleaner to move
> > > > > > > the logic into pmu.c.
> > > > > >
> > > > > > I guess the problem is that AMD driver does implicit forwarding to IBS
> > > > > > if the legacy hardware events have precise_ip. So it might have just
> > > > > > core pmu (or no pmu) in the evsel.
> > > > >
> > > > > I think the no PMU case should probably have a PMU possibly similar to
> > > > > the tool PMU in:
> > > > > https://lore.kernel.org/all/20240912190341.919229-1-irogers@google.com/
> > > > > But that's not in place yet. You can always use
> > > > > perf_pmus__scan_core(NULL) or
> > > > > perf_pmus__find_by_type(evsel->core.attr.type or PERF_TYPE_RAW).
> > > > > evsel->pmu->is_core && x86__is_amd_cpu() would identify an AMD core
> > > > > PMU whereas the code above could fire for IBS or other PMUs.
> > > >
> > > > But IBS is the only PMU on AMD that provides precise_ip IIRC. So I
> > > > don't think other events would have it nor have any effect on changing
> > > > the value. So maybe we can skip the PMU scanning and just drop to 0?
> > >
> > > cpu supports precise_ip as it forwards it to IBS, IBS is an ambiguous
> > > term as there are ibs_op and ibs_fetch PMUs. The code is using values
> > > in the attribute to infer the PMU that is being used, it feels it
> > > would be more intention revealing to do things like:
> > > ```
> > > if (error == ... && perf_pmu__is_ibs_op_or_fetch(evsel->pmu)) ..
> > > ```
> >
> > I guess it'd get a core PMU for the default cycles event. I think the
> > intention is already confusing with the implicit forwarding.
>
> What about this?
>
> ---8<---
>
> From 70d39fb5c2956ba264a292f112f7fd7272dc91be Mon Sep 17 00:00:00 2001
> From: Namhyung Kim <namhyung@kernel.org>
> Date: Tue, 3 Sep 2024 22:50:09 -0700
> Subject: [PATCH] perf tools: Check fallback error and order
>
> The perf_event_open might fail due to various reasons, so blindly
> reducing precise_ip level might not be the best way to deal with it.
>
> It seems the kernel return -EOPNOTSUPP when PMU doesn't support the
> given precise level. Let's try again with the correct error code.
>
> This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but
> user events with exclude_kernel=1 cannot make progress. Let's add the
> evsel__handle_error_quirks() to this case specially. I plan to work on
> the kernel side to improve this situation but it'd still need some
> special handling for IBS.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/evsel.c | 50 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 44 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 476658143822c346..2c45c55222c43dd4 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2246,6 +2246,43 @@ static bool evsel__detect_missing_features(struct evsel *evsel)
> return false;
> }
>
> +/*
> + * AMD core PMU forwards some events with precise_ip to IBS implicitly.
> + * This logic matches to the kernel function (core_pmu_ibs_config).
> + */
> +static bool evsel__is_implicit_ibs_event(struct evsel *evsel)
> +{
> + if (evsel->core.attr.precise_ip == 0 || !x86__is_amd_cpu())
> + return false;
> +
> + if (evsel->core.attr.type == PERF_TYPE_HARDWARE &&
> + evsel->core.attr.config == PERF_COUNT_HW_CPU_CYCLES)
> + return true;
Lgtm, still seems strange we're not asserting something like
evsel->is_pmu_core. There's some clean up of that in (unmerged):
https://lore.kernel.org/lkml/20240918220133.102964-3-irogers@google.com/#t
Thanks,
Ian
> +
> + if (evsel->core.attr.type == PERF_TYPE_RAW &&
> + (evsel->core.attr.config == 0x76 || evsel->core.attr.config == 0xc1))
> + return true;
> +
> + return false;
> +}
> +
> +static bool evsel__handle_error_quirks(struct evsel *evsel, int error)
> +{
> + /*
> + * AMD IBS doesn't support exclude_kernel, forward it back to the core
> + * PMU by clearing precise_ip only if it's from precise_max (:P).
> + */
> + if (error == -EINVAL && evsel__is_implicit_ibs_event(evsel) &&
> + evsel->core.attr.exclude_kernel && evsel->precise_max) {
> + evsel->core.attr.precise_ip = 0;
> + pr_debug2_peo("removing precise_ip on AMD\n");
> + display_attr(&evsel->core.attr);
> + return true;
> + }
> +
> + return false;
> +}
> +
> static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> struct perf_thread_map *threads,
> int start_cpu_map_idx, int end_cpu_map_idx)
> @@ -2366,9 +2403,6 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> return 0;
>
> try_fallback:
> - if (evsel__precise_ip_fallback(evsel))
> - goto retry_open;
> -
> if (evsel__ignore_missing_thread(evsel, perf_cpu_map__nr(cpus),
> idx, threads, thread, err)) {
> /* We just removed 1 thread, so lower the upper nthreads limit. */
> @@ -2385,11 +2419,15 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> if (err == -EMFILE && rlimit__increase_nofile(&set_rlimit))
> goto retry_open;
>
> - if (err != -EINVAL || idx > 0 || thread > 0)
> - goto out_close;
> + if (err == -EOPNOTSUPP && evsel__precise_ip_fallback(evsel))
> + goto retry_open;
>
> - if (evsel__detect_missing_features(evsel))
> + if (err == -EINVAL && evsel__detect_missing_features(evsel))
> goto fallback_missing_features;
> +
> + if (evsel__handle_error_quirks(evsel, err))
> + goto retry_open;
> +
> out_close:
> if (err)
> threads->err_thread = thread;
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/8] perf tools: Detect missing kernel features properly
2024-10-15 4:19 ` Ravi Bangoria
@ 2024-10-16 4:49 ` Namhyung Kim
0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2024-10-16 4:49 UTC (permalink / raw)
To: Ravi Bangoria
Cc: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang, Jiri Olsa,
Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Mark Rutland, James Clark, Kajol Jain,
Thomas Richter, Atish Patra, Palmer Dabbelt, Mingwei Zhang
On Tue, Oct 15, 2024 at 09:49:52AM +0530, Ravi Bangoria wrote:
> > - } else if (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) {
> > - if (evsel->pmu == NULL)
> > - evsel->pmu = evsel__find_pmu(evsel);
> > -
> > - if (evsel->pmu)
> > - evsel->pmu->missing_features.exclude_guest = true;
> > - else {
> > - /* we cannot find PMU, disable attrs now */
> > - evsel->core.attr.exclude_host = false;
> > - evsel->core.attr.exclude_guest = false;
> > - }
> >
> > - if (evsel->exclude_GH) {
> > - pr_debug2_peo("PMU has no exclude_host/guest support, bailing out\n");
> > - return false;
> > - }
> > - if (!perf_missing_features.exclude_guest) {
> > - perf_missing_features.exclude_guest = true;
> > - pr_debug2_peo("switching off exclude_guest, exclude_host\n");
> > - }
>
> Shall we get rid of:
>
> perf_missing_features.exclude_guest
> pmu->missing_features.exclude_guest
>
> they don't seem to be used anywhere after the patch.
Hmm.. it's my bad. I think we should keep it for ancient kernels.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2024-10-16 4:49 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-01 0:20 [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Namhyung Kim
2024-10-01 0:20 ` [PATCH 1/8] perf tools: Add fallback for exclude_guest Namhyung Kim
2024-10-01 17:41 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 2/8] perf tools: Don't set attr.exclude_guest by default Namhyung Kim
2024-10-01 17:43 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 3/8] perf tools: Simplify evsel__add_modifier() Namhyung Kim
2024-10-01 17:44 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 4/8] perf tools: Do not set exclude_guest for precise_ip Namhyung Kim
2024-10-01 17:46 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 5/8] perf tools: Detect missing kernel features properly Namhyung Kim
2024-10-01 17:53 ` Ian Rogers
2024-10-01 21:32 ` Namhyung Kim
2024-10-15 4:19 ` Ravi Bangoria
2024-10-16 4:49 ` Namhyung Kim
2024-10-01 0:20 ` [PATCH 6/8] perf tools: Move x86__is_amd_cpu() to util/env.c Namhyung Kim
2024-10-01 16:03 ` Liang, Kan
2024-10-01 22:39 ` Namhyung Kim
2024-10-01 0:20 ` [PATCH 7/8] perf tools: Check fallback error and order Namhyung Kim
2024-10-01 18:00 ` Ian Rogers
2024-10-01 21:36 ` Namhyung Kim
2024-10-01 22:21 ` Ian Rogers
2024-10-03 17:06 ` Namhyung Kim
2024-10-03 17:32 ` Ian Rogers
2024-10-03 22:38 ` Namhyung Kim
2024-10-14 19:22 ` Namhyung Kim
2024-10-15 4:21 ` Ravi Bangoria
2024-10-16 4:33 ` Namhyung Kim
2024-10-16 4:47 ` Ian Rogers
2024-10-01 0:20 ` [PATCH 8/8] perf record: Just use "cycles:P" as the default event Namhyung Kim
2024-10-01 18:00 ` Ian Rogers
2024-10-01 17:46 ` [PATCHSET 0/8] perf tools: Do not set attr.exclude_guest by default (v4) Liang, Kan
2024-10-01 21:19 ` Namhyung Kim
2024-10-02 9:49 ` James Clark
2024-10-02 18:29 ` Namhyung Kim
2024-10-04 15:40 ` James Clark
2024-10-04 19:17 ` Namhyung Kim
2024-10-15 4:25 ` Ravi Bangoria
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).