* [PATCH v1 00/10] Move uid filtering to BPF filters
@ 2025-01-11 19:01 Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
` (11 more replies)
0 siblings, 12 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Rather than scanning /proc and skipping PIDs based on their UIDs, use
BPF filters for uid filtering. The /proc scanning in thread_map is
racy as the PID may exit before the perf_event_open causing perf to
abort. BPF UID filters are more robust as they avoid the race. Add a
helper for commands that support UID filtering and wire up. Remove the
non-BPF UID filtering support.
Ian Rogers (10):
perf bench evlist-open-close: Reduce scope of 2 variables
perf parse-events filter: Use evsel__find_pmu
perf target: Separate parse_uid into its own function
perf parse-events: Add parse_uid_filter helper
perf record: Switch user option to use BPF filter
perf top: Switch user option to use BPF filter
perf trace: Switch user option to use BPF filter
perf bench evlist-open-close: Switch user option to use BPF filter
perf target: Remove uid from target
perf thread_map: Remove uid options
tools/perf/bench/evlist-open-close.c | 76 ++++++++++++---------
tools/perf/builtin-ftrace.c | 1 -
tools/perf/builtin-kvm.c | 2 -
tools/perf/builtin-record.c | 26 +++----
tools/perf/builtin-stat.c | 4 +-
tools/perf/builtin-top.c | 22 +++---
tools/perf/builtin-trace.c | 25 ++++---
tools/perf/tests/backward-ring-buffer.c | 1 -
tools/perf/tests/event-times.c | 8 +--
tools/perf/tests/keep-tracking.c | 2 +-
tools/perf/tests/mmap-basic.c | 2 +-
tools/perf/tests/openat-syscall-all-cpus.c | 2 +-
tools/perf/tests/openat-syscall-tp-fields.c | 1 -
tools/perf/tests/openat-syscall.c | 2 +-
tools/perf/tests/perf-record.c | 1 -
tools/perf/tests/perf-time-to-tsc.c | 2 +-
tools/perf/tests/switch-tracking.c | 2 +-
tools/perf/tests/task-exit.c | 1 -
tools/perf/tests/thread-map.c | 2 +-
tools/perf/util/bpf-filter.c | 2 +-
tools/perf/util/evlist.c | 3 +-
tools/perf/util/parse-events.c | 25 ++++---
tools/perf/util/parse-events.h | 1 +
tools/perf/util/python.c | 10 +--
tools/perf/util/target.c | 54 ++-------------
tools/perf/util/target.h | 15 +---
tools/perf/util/thread_map.c | 32 +--------
tools/perf/util/thread_map.h | 6 +-
tools/perf/util/top.c | 4 +-
tools/perf/util/top.h | 1 +
30 files changed, 135 insertions(+), 200 deletions(-)
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-02-12 14:17 ` Arnaldo Carvalho de Melo
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
` (10 subsequent siblings)
11 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Make 2 global variables local. Reduces ELF binary size by removing
relocations. For a no flags build, the perf binary size is reduced by
4,144 bytes on x86-64.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/bench/evlist-open-close.c | 42 +++++++++++++++-------------
1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
index 5a27691469ed..79cedcf94a39 100644
--- a/tools/perf/bench/evlist-open-close.c
+++ b/tools/perf/bench/evlist-open-close.c
@@ -46,25 +46,6 @@ static struct record_opts opts = {
.ctl_fd_ack = -1,
};
-static const struct option options[] = {
- OPT_STRING('e', "event", &event_string, "event", "event selector. use 'perf list' to list available events"),
- OPT_INTEGER('n', "nr-events", &nr_events,
- "number of dummy events to create (default 1). If used with -e, it clones those events n times (1 = no change)"),
- OPT_INTEGER('i', "iterations", &iterations, "Number of iterations used to compute average (default=100)"),
- OPT_BOOLEAN('a', "all-cpus", &opts.target.system_wide, "system-wide collection from all CPUs"),
- OPT_STRING('C', "cpu", &opts.target.cpu_list, "cpu", "list of cpus where to open events"),
- OPT_STRING('p', "pid", &opts.target.pid, "pid", "record events on existing process id"),
- OPT_STRING('t', "tid", &opts.target.tid, "tid", "record events on existing thread id"),
- OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
- OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
- OPT_END()
-};
-
-static const char *const bench_usage[] = {
- "perf bench internals evlist-open-close <options>",
- NULL
-};
-
static int evlist__count_evsel_fds(struct evlist *evlist)
{
struct evsel *evsel;
@@ -225,6 +206,29 @@ static char *bench__repeat_event_string(const char *evstr, int n)
int bench_evlist_open_close(int argc, const char **argv)
{
+ const struct option options[] = {
+ OPT_STRING('e', "event", &event_string, "event",
+ "event selector. use 'perf list' to list available events"),
+ OPT_INTEGER('n', "nr-events", &nr_events,
+ "number of dummy events to create (default 1). If used with -e, it clones those events n times (1 = no change)"),
+ OPT_INTEGER('i', "iterations", &iterations,
+ "Number of iterations used to compute average (default=100)"),
+ OPT_BOOLEAN('a', "all-cpus", &opts.target.system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_STRING('C', "cpu", &opts.target.cpu_list, "cpu",
+ "list of cpus where to open events"),
+ OPT_STRING('p', "pid", &opts.target.pid, "pid",
+ "record events on existing process id"),
+ OPT_STRING('t', "tid", &opts.target.tid, "tid",
+ "record events on existing thread id"),
+ OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
+ OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
+ OPT_END()
+ };
+ const char *const bench_usage[] = {
+ "perf bench internals evlist-open-close <options>",
+ NULL
+ };
char *evstr, errbuf[BUFSIZ];
int err;
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-02-12 14:51 ` Arnaldo Carvalho de Melo
2025-01-11 19:01 ` [PATCH v1 03/10] perf target: Separate parse_uid into its own function Ian Rogers
` (9 subsequent siblings)
11 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Rather than manually scanning PMUs, use evsel__find_pmu that can use
the PMU set during event parsing.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/parse-events.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1e23faa364b1..f147e13a7017 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2406,9 +2406,8 @@ foreach_evsel_in_last_glob(struct evlist *evlist,
static int set_filter(struct evsel *evsel, const void *arg)
{
const char *str = arg;
- bool found = false;
int nr_addr_filters = 0;
- struct perf_pmu *pmu = NULL;
+ struct perf_pmu *pmu;
if (evsel == NULL) {
fprintf(stderr,
@@ -2426,16 +2425,11 @@ static int set_filter(struct evsel *evsel, const void *arg)
return 0;
}
- while ((pmu = perf_pmus__scan(pmu)) != NULL)
- if (pmu->type == evsel->core.attr.type) {
- found = true;
- break;
- }
-
- if (found)
+ pmu = evsel__find_pmu(evsel);
+ if (pmu) {
perf_pmu__scan_file(pmu, "nr_addr_filters",
"%d", &nr_addr_filters);
-
+ }
if (!nr_addr_filters)
return perf_bpf_filter__parse(&evsel->bpf_filters, str);
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 03/10] perf target: Separate parse_uid into its own function
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper Ian Rogers
` (8 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Allow parse_uid to be called without a struct target. Rather than have
two errors, remove TARGET_ERRNO__USER_NOT_FOUND and use
TARGET_ERRNO__INVALID_UID as the handling is identical.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/target.c | 22 ++++++++++++----------
tools/perf/util/target.h | 3 ++-
2 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index 0f383418e3df..f3ad59ccfa99 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -94,15 +94,13 @@ enum target_errno target__validate(struct target *target)
return ret;
}
-enum target_errno target__parse_uid(struct target *target)
+uid_t parse_uid(const char *str)
{
struct passwd pwd, *result;
char buf[1024];
- const char *str = target->uid_str;
- target->uid = UINT_MAX;
if (str == NULL)
- return TARGET_ERRNO__SUCCESS;
+ return UINT_MAX;
/* Try user name first */
getpwnam_r(str, &pwd, buf, sizeof(buf), &result);
@@ -115,16 +113,22 @@ enum target_errno target__parse_uid(struct target *target)
int uid = strtol(str, &endptr, 10);
if (*endptr != '\0')
- return TARGET_ERRNO__INVALID_UID;
+ return UINT_MAX;
getpwuid_r(uid, &pwd, buf, sizeof(buf), &result);
if (result == NULL)
- return TARGET_ERRNO__USER_NOT_FOUND;
+ return UINT_MAX;
}
- target->uid = result->pw_uid;
- return TARGET_ERRNO__SUCCESS;
+ return result->pw_uid;
+}
+
+enum target_errno target__parse_uid(struct target *target)
+{
+ target->uid = parse_uid(target->uid_str);
+
+ return target->uid != UINT_MAX ? TARGET_ERRNO__SUCCESS : TARGET_ERRNO__INVALID_UID;
}
/*
@@ -142,7 +146,6 @@ static const char *target__error_str[] = {
"BPF switch overriding UID",
"BPF switch overriding THREAD",
"Invalid User: %s",
- "Problems obtaining information for user %s",
};
int target__strerror(struct target *target, int errnum,
@@ -171,7 +174,6 @@ int target__strerror(struct target *target, int errnum,
break;
case TARGET_ERRNO__INVALID_UID:
- case TARGET_ERRNO__USER_NOT_FOUND:
snprintf(buf, buflen, msg, target->uid_str);
break;
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 2ee2cc30340f..e082bda990fb 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -48,12 +48,13 @@ enum target_errno {
/* for target__parse_uid() */
TARGET_ERRNO__INVALID_UID,
- TARGET_ERRNO__USER_NOT_FOUND,
__TARGET_ERRNO__END,
};
enum target_errno target__validate(struct target *target);
+
+uid_t parse_uid(const char *str);
enum target_errno target__parse_uid(struct target *target);
int target__strerror(struct target *target, int errnum, char *buf, size_t buflen);
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (2 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 03/10] perf target: Separate parse_uid into its own function Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 05/10] perf record: Switch user option to use BPF filter Ian Rogers
` (7 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Add parse_uid_filter filter as a helper to parse_filter, that
constructs a uid filter string.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/parse-events.c | 11 +++++++++++
tools/perf/util/parse-events.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f147e13a7017..dd298dd6655a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2451,6 +2451,17 @@ int parse_filter(const struct option *opt, const char *str,
(const void *)str);
}
+int parse_uid_filter(struct evlist *evlist, uid_t uid)
+{
+ struct option opt = {
+ .value = &evlist,
+ };
+ char buf[128];
+
+ snprintf(buf, sizeof(buf), "uid == %d", uid);
+ return parse_filter(&opt, buf, /*unset=*/0);
+}
+
static int add_exclude_perf_filter(struct evsel *evsel,
const void *arg __maybe_unused)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e176a34ab088..289afd42d642 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -45,6 +45,7 @@ static inline int parse_events(struct evlist *evlist, const char *str,
int parse_event(struct evlist *evlist, const char *str);
int parse_filter(const struct option *opt, const char *str, int unset);
+int parse_uid_filter(struct evlist *evlist, uid_t uid);
int exclude_perf(const struct option *opt, const char *arg, int unset);
enum parse_events__term_val_type {
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 05/10] perf record: Switch user option to use BPF filter
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (3 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 06/10] perf top: " Ian Rogers
` (6 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Finding user processes by scanning /proc is inherently racy and
results in perf_event_open failures. Use a BPF filter to drop samples
where the uid doesn't match.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-record.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5db1aedf48df..d0203f62faeb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -172,6 +172,7 @@ struct record {
bool timestamp_boundary;
bool off_cpu;
const char *filter_action;
+ const char *uid_str;
struct switch_output switch_output;
unsigned long long samples;
unsigned long output_max_size; /* = 0: unlimited */
@@ -3451,8 +3452,7 @@ static struct option __record_options[] = {
"or ranges of time to enable events e.g. '-D 10-20,30-40'",
record__parse_event_enable_time),
OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
- OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
- "user to profile"),
+ OPT_STRING('u', "uid", &record.uid_str, "user", "user to profile"),
OPT_CALLBACK_NOOPT('b', "branch-any", &record.opts.branch_stack,
"branch any", "sample any taken branches",
@@ -4171,19 +4171,21 @@ int cmd_record(int argc, const char **argv)
ui__warning("%s\n", errbuf);
}
- err = target__parse_uid(&rec->opts.target);
- if (err) {
- int saved_errno = errno;
-
- target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
- ui__error("%s", errbuf);
+ if (rec->uid_str) {
+ uid_t uid = parse_uid(rec->uid_str);
- err = -saved_errno;
- goto out;
+ if (uid == UINT_MAX) {
+ ui__error("Invalid User: %s", rec->uid_str);
+ err = -EINVAL;
+ goto out;
+ }
+ err = parse_uid_filter(rec->evlist, uid);
+ if (err)
+ goto out;
}
- /* Enable ignoring missing threads when -u/-p option is defined. */
- rec->opts.ignore_missing_thread = rec->opts.target.uid != UINT_MAX || rec->opts.target.pid;
+ /* Enable ignoring missing threads when -p option is defined. */
+ rec->opts.ignore_missing_thread = rec->opts.target.pid;
evlist__warn_user_requested_cpus(rec->evlist, rec->opts.target.cpu_list);
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 06/10] perf top: Switch user option to use BPF filter
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (4 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 05/10] perf record: Switch user option to use BPF filter Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 07/10] perf trace: " Ian Rogers
` (5 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Finding user processes by scanning /proc is inherently racy and
results in perf_event_open failures. Use a BPF filter to drop samples
where the uid doesn't match.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-top.c | 22 ++++++++++++----------
tools/perf/util/top.c | 4 ++--
tools/perf/util/top.h | 1 +
3 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ca3e8eca6610..f6d837be185b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -642,7 +642,7 @@ static void *display_thread_tui(void *arg)
*/
evlist__for_each_entry(top->evlist, pos) {
struct hists *hists = evsel__hists(pos);
- hists->uid_filter_str = top->record_opts.target.uid_str;
+ hists->uid_filter_str = top->uid_str;
}
ret = evlist__tui_browse_hists(top->evlist, help, &hbt, top->min_percent,
@@ -1562,7 +1562,7 @@ int cmd_top(int argc, const char **argv)
"Add prefix to source file path names in programs (with --prefix-strip)"),
OPT_STRING(0, "prefix-strip", &annotate_opts.prefix_strip, "N",
"Strip first N entries of source file path name in programs (with --prefix)"),
- OPT_STRING('u', "uid", &target->uid_str, "user", "user to profile"),
+ OPT_STRING('u', "uid", &top.uid_str, "user", "user to profile"),
OPT_CALLBACK(0, "percent-limit", &top, "percent",
"Don't show entries under that percent", parse_percent_limit),
OPT_CALLBACK(0, "percentage", NULL, "relative|absolute",
@@ -1753,15 +1753,17 @@ int cmd_top(int argc, const char **argv)
ui__warning("%s\n", errbuf);
}
- status = target__parse_uid(target);
- if (status) {
- int saved_errno = errno;
-
- target__strerror(target, status, errbuf, BUFSIZ);
- ui__error("%s\n", errbuf);
+ if (top.uid_str) {
+ uid_t uid = parse_uid(top.uid_str);
- status = -saved_errno;
- goto out_delete_evlist;
+ if (uid == UINT_MAX) {
+ ui__error("Invalid User: %s", top.uid_str);
+ status = -EINVAL;
+ goto out_delete_evlist;
+ }
+ status = parse_uid_filter(top.evlist, uid);
+ if (status)
+ goto out_delete_evlist;
}
if (target__none(target))
diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
index 4db3d1bd686c..b06e10a116bb 100644
--- a/tools/perf/util/top.c
+++ b/tools/perf/util/top.c
@@ -88,9 +88,9 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
else if (target->tid)
ret += SNPRINTF(bf + ret, size - ret, " (target_tid: %s",
target->tid);
- else if (target->uid_str != NULL)
+ else if (top->uid_str != NULL)
ret += SNPRINTF(bf + ret, size - ret, " (uid: %s",
- target->uid_str);
+ top->uid_str);
else
ret += SNPRINTF(bf + ret, size - ret, " (all");
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index 4c5588dbb131..04ff926846be 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -48,6 +48,7 @@ struct perf_top {
const char *sym_filter;
float min_percent;
unsigned int nr_threads_synthesize;
+ const char *uid_str;
struct {
struct ordered_events *in;
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 07/10] perf trace: Switch user option to use BPF filter
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (5 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 06/10] perf top: " Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 08/10] perf bench evlist-open-close: " Ian Rogers
` (4 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Finding user processes by scanning /proc is inherently racy and
results in perf_event_open failures. Use a BPF filter to drop samples
where the uid doesn't match.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-trace.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 6f5ae3ac0638..4b9a34c9179f 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -219,6 +219,7 @@ struct trace {
struct ordered_events data;
u64 last;
} oe;
+ const char *uid_str;
};
static void trace__load_vmlinux_btf(struct trace *trace __maybe_unused)
@@ -4233,8 +4234,8 @@ static int trace__run(struct trace *trace, int argc, const char **argv)
evlist__add(evlist, pgfault_min);
}
- /* Enable ignoring missing threads when -u/-p option is defined. */
- trace->opts.ignore_missing_thread = trace->opts.target.uid != UINT_MAX || trace->opts.target.pid;
+ /* Enable ignoring missing threads when -p option is defined. */
+ trace->opts.ignore_missing_thread = trace->opts.target.pid;
if (trace->sched &&
evlist__add_newtp(evlist, "sched", "sched_stat_runtime", trace__sched_stat_runtime))
@@ -5138,8 +5139,7 @@ int cmd_trace(int argc, const char **argv)
"child tasks do not inherit counters"),
OPT_CALLBACK('m', "mmap-pages", &trace.opts.mmap_pages, "pages",
"number of mmap data pages", evlist__parse_mmap_pages),
- OPT_STRING('u', "uid", &trace.opts.target.uid_str, "user",
- "user to profile"),
+ OPT_STRING('u', "uid", &trace.uid_str, "user", "user to profile"),
OPT_CALLBACK(0, "duration", &trace, "float",
"show only events with duration > N.M ms",
trace__set_duration),
@@ -5473,11 +5473,17 @@ int cmd_trace(int argc, const char **argv)
goto out_close;
}
- err = target__parse_uid(&trace.opts.target);
- if (err) {
- target__strerror(&trace.opts.target, err, bf, sizeof(bf));
- fprintf(trace.output, "%s", bf);
- goto out_close;
+ if (trace.uid_str) {
+ uid_t uid = parse_uid(trace.uid_str);
+
+ if (uid == UINT_MAX) {
+ ui__error("Invalid User: %s", trace.uid_str);
+ err = -EINVAL;
+ goto out_close;
+ }
+ err = parse_uid_filter(trace.evlist, uid);
+ if (err)
+ goto out_close;
}
if (!argc && target__none(&trace.opts.target))
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 08/10] perf bench evlist-open-close: Switch user option to use BPF filter
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (6 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 07/10] perf trace: " Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 09/10] perf target: Remove uid from target Ian Rogers
` (3 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Finding user processes by scanning /proc is inherently racy and
results in perf_event_open failures. Use a BPF filter to drop samples
where the uid doesn't match.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/bench/evlist-open-close.c | 36 ++++++++++++++++------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
index 79cedcf94a39..bfaf50e4e519 100644
--- a/tools/perf/bench/evlist-open-close.c
+++ b/tools/perf/bench/evlist-open-close.c
@@ -57,7 +57,7 @@ static int evlist__count_evsel_fds(struct evlist *evlist)
return cnt;
}
-static struct evlist *bench__create_evlist(char *evstr)
+static struct evlist *bench__create_evlist(char *evstr, const char *uid_str)
{
struct parse_events_error err;
struct evlist *evlist = evlist__new();
@@ -78,6 +78,18 @@ static struct evlist *bench__create_evlist(char *evstr)
goto out_delete_evlist;
}
parse_events_error__exit(&err);
+ if (uid_str) {
+ uid_t uid = parse_uid(uid_str);
+
+ if (uid == UINT_MAX) {
+ pr_err("Invalid User: %s", uid_str);
+ ret = -EINVAL;
+ goto out_delete_evlist;
+ }
+ ret = parse_uid_filter(evlist, uid);
+ if (ret)
+ goto out_delete_evlist;
+ }
ret = evlist__create_maps(evlist, &opts.target);
if (ret < 0) {
pr_err("Not enough memory to create thread/cpu maps\n");
@@ -117,10 +129,10 @@ static int bench__do_evlist_open_close(struct evlist *evlist)
return 0;
}
-static int bench_evlist_open_close__run(char *evstr)
+static int bench_evlist_open_close__run(char *evstr, const char *uid_str)
{
// used to print statistics only
- struct evlist *evlist = bench__create_evlist(evstr);
+ struct evlist *evlist = bench__create_evlist(evstr, uid_str);
double time_average, time_stddev;
struct timeval start, end, diff;
struct stats time_stats;
@@ -142,7 +154,7 @@ static int bench_evlist_open_close__run(char *evstr)
for (i = 0; i < iterations; i++) {
pr_debug("Started iteration %d\n", i);
- evlist = bench__create_evlist(evstr);
+ evlist = bench__create_evlist(evstr, uid_str);
if (!evlist)
return -ENOMEM;
@@ -206,6 +218,7 @@ static char *bench__repeat_event_string(const char *evstr, int n)
int bench_evlist_open_close(int argc, const char **argv)
{
+ const char *uid_str = NULL;
const struct option options[] = {
OPT_STRING('e', "event", &event_string, "event",
"event selector. use 'perf list' to list available events"),
@@ -221,7 +234,7 @@ int bench_evlist_open_close(int argc, const char **argv)
"record events on existing process id"),
OPT_STRING('t', "tid", &opts.target.tid, "tid",
"record events on existing thread id"),
- OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
+ OPT_STRING('u', "uid", &uid_str, "user", "user to profile"),
OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
OPT_END()
};
@@ -245,15 +258,8 @@ int bench_evlist_open_close(int argc, const char **argv)
goto out;
}
- err = target__parse_uid(&opts.target);
- if (err) {
- target__strerror(&opts.target, err, errbuf, sizeof(errbuf));
- pr_err("%s", errbuf);
- goto out;
- }
-
- /* Enable ignoring missing threads when -u/-p option is defined. */
- opts.ignore_missing_thread = opts.target.uid != UINT_MAX || opts.target.pid;
+ /* Enable ignoring missing threads when -p option is defined. */
+ opts.ignore_missing_thread = opts.target.pid;
evstr = bench__repeat_event_string(event_string, nr_events);
if (!evstr) {
@@ -261,7 +267,7 @@ int bench_evlist_open_close(int argc, const char **argv)
goto out;
}
- err = bench_evlist_open_close__run(evstr);
+ err = bench_evlist_open_close__run(evstr, uid_str);
free(evstr);
out:
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 09/10] perf target: Remove uid from target
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (7 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 08/10] perf bench evlist-open-close: " Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 10/10] perf thread_map: Remove uid options Ian Rogers
` (2 subsequent siblings)
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Gathering threads with a uid by scanning /proc is inherently racy
leading to perf_event_open failures that quit perf. All users of the
functionality now use BPF filters, so remove uid and uid_str from
target.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-ftrace.c | 1 -
tools/perf/builtin-kvm.c | 2 -
tools/perf/builtin-stat.c | 4 +-
tools/perf/builtin-trace.c | 1 -
tools/perf/tests/backward-ring-buffer.c | 1 -
tools/perf/tests/event-times.c | 4 +-
tools/perf/tests/openat-syscall-tp-fields.c | 1 -
tools/perf/tests/perf-record.c | 1 -
tools/perf/tests/task-exit.c | 1 -
tools/perf/util/bpf-filter.c | 2 +-
tools/perf/util/evlist.c | 3 +-
tools/perf/util/target.c | 46 +--------------------
tools/perf/util/target.h | 12 +-----
13 files changed, 6 insertions(+), 73 deletions(-)
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index 08c1cc429b27..e7947cfa3497 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -1572,7 +1572,6 @@ int cmd_ftrace(int argc, const char **argv)
int (*cmd_func)(struct perf_ftrace *) = NULL;
struct perf_ftrace ftrace = {
.tracer = DEFAULT_TRACER,
- .target = { .uid = UINT_MAX, },
};
const struct option common_options[] = {
OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 67fd2b006b0b..d75bd3684980 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1871,8 +1871,6 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
kvm->opts.user_interval = 1;
kvm->opts.mmap_pages = 512;
kvm->opts.target.uses_mmap = false;
- kvm->opts.target.uid_str = NULL;
- kvm->opts.target.uid = UINT_MAX;
symbol__init(NULL);
disable_buildid_cache();
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 77e327d4a9a7..dec47552a294 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -108,9 +108,7 @@ static struct parse_events_option_args parse_events_option_args = {
static bool all_counters_use_bpf = true;
-static struct target target = {
- .uid = UINT_MAX,
-};
+static struct target target;
static volatile sig_atomic_t child_pid = -1;
static int detailed_run = 0;
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 4b9a34c9179f..3c4d696da12e 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -5092,7 +5092,6 @@ int cmd_trace(int argc, const char **argv)
struct trace trace = {
.opts = {
.target = {
- .uid = UINT_MAX,
.uses_mmap = true,
},
.user_freq = UINT_MAX,
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 79a980b1e786..c5e7999f2817 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -91,7 +91,6 @@ static int test__backward_ring_buffer(struct test_suite *test __maybe_unused, in
struct parse_events_error parse_error;
struct record_opts opts = {
.target = {
- .uid = UINT_MAX,
.uses_mmap = true,
},
.freq = 0,
diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
index deefe5003bfc..2148024b4f4a 100644
--- a/tools/perf/tests/event-times.c
+++ b/tools/perf/tests/event-times.c
@@ -17,9 +17,7 @@
static int attach__enable_on_exec(struct evlist *evlist)
{
struct evsel *evsel = evlist__last(evlist);
- struct target target = {
- .uid = UINT_MAX,
- };
+ struct target target = {};
const char *argv[] = { "true", NULL, };
char sbuf[STRERR_BUFSIZE];
int err;
diff --git a/tools/perf/tests/openat-syscall-tp-fields.c b/tools/perf/tests/openat-syscall-tp-fields.c
index 3943da441979..30bd521e65b4 100644
--- a/tools/perf/tests/openat-syscall-tp-fields.c
+++ b/tools/perf/tests/openat-syscall-tp-fields.c
@@ -28,7 +28,6 @@ static int test__syscall_openat_tp_fields(struct test_suite *test __maybe_unused
{
struct record_opts opts = {
.target = {
- .uid = UINT_MAX,
.uses_mmap = true,
},
.no_buffering = true,
diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 1c4feec1adff..ed1dabd9e87b 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -45,7 +45,6 @@ static int test__PERF_RECORD(struct test_suite *test __maybe_unused, int subtest
{
struct record_opts opts = {
.target = {
- .uid = UINT_MAX,
.uses_mmap = true,
},
.no_buffering = true,
diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index 8e328bbd509d..4053ff2813bb 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -46,7 +46,6 @@ static int test__task_exit(struct test_suite *test __maybe_unused, int subtest _
struct evsel *evsel;
struct evlist *evlist;
struct target target = {
- .uid = UINT_MAX,
.uses_mmap = true,
};
const char *argv[] = { "true", NULL };
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index a4fdf6911ec1..ed4845ff4a10 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -449,7 +449,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target)
struct bpf_program *prog;
struct bpf_link *link;
struct perf_bpf_filter_entry *entry;
- bool needs_idx_hash = !target__has_cpu(target) && !target->uid_str;
+ bool needs_idx_hash = !target__has_cpu(target);
entry = calloc(MAX_FILTERS, sizeof(*entry));
if (entry == NULL)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f0dd174e2deb..1fd91b5e8777 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1006,8 +1006,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
* per-thread data. thread_map__new_str will call
* thread_map__new_all_cpus to enumerate all threads.
*/
- threads = thread_map__new_str(target->pid, target->tid, target->uid,
- all_threads);
+ threads = thread_map__new_str(target->pid, target->tid, UINT_MAX, all_threads);
if (!threads)
return -1;
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index f3ad59ccfa99..8cf71bea295a 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -28,20 +28,6 @@ enum target_errno target__validate(struct target *target)
ret = TARGET_ERRNO__PID_OVERRIDE_CPU;
}
- /* UID and PID are mutually exclusive */
- if (target->tid && target->uid_str) {
- target->uid_str = NULL;
- if (ret == TARGET_ERRNO__SUCCESS)
- ret = TARGET_ERRNO__PID_OVERRIDE_UID;
- }
-
- /* UID and CPU are mutually exclusive */
- if (target->uid_str && target->cpu_list) {
- target->cpu_list = NULL;
- if (ret == TARGET_ERRNO__SUCCESS)
- ret = TARGET_ERRNO__UID_OVERRIDE_CPU;
- }
-
/* PID and SYSTEM are mutually exclusive */
if (target->tid && target->system_wide) {
target->system_wide = false;
@@ -49,13 +35,6 @@ enum target_errno target__validate(struct target *target)
ret = TARGET_ERRNO__PID_OVERRIDE_SYSTEM;
}
- /* UID and SYSTEM are mutually exclusive */
- if (target->uid_str && target->system_wide) {
- target->system_wide = false;
- if (ret == TARGET_ERRNO__SUCCESS)
- ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
- }
-
/* BPF and CPU are mutually exclusive */
if (target->bpf_str && target->cpu_list) {
target->cpu_list = NULL;
@@ -70,13 +49,6 @@ enum target_errno target__validate(struct target *target)
ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
}
- /* BPF and UID are mutually exclusive */
- if (target->bpf_str && target->uid_str) {
- target->uid_str = NULL;
- if (ret == TARGET_ERRNO__SUCCESS)
- ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
- }
-
/* BPF and THREADS are mutually exclusive */
if (target->bpf_str && target->per_thread) {
target->per_thread = false;
@@ -124,31 +96,19 @@ uid_t parse_uid(const char *str)
return result->pw_uid;
}
-enum target_errno target__parse_uid(struct target *target)
-{
- target->uid = parse_uid(target->uid_str);
-
- return target->uid != UINT_MAX ? TARGET_ERRNO__SUCCESS : TARGET_ERRNO__INVALID_UID;
-}
-
/*
* This must have a same ordering as the enum target_errno.
*/
static const char *target__error_str[] = {
"PID/TID switch overriding CPU",
- "PID/TID switch overriding UID",
- "UID switch overriding CPU",
"PID/TID switch overriding SYSTEM",
- "UID switch overriding SYSTEM",
"SYSTEM/CPU switch overriding PER-THREAD",
"BPF switch overriding CPU",
"BPF switch overriding PID/TID",
- "BPF switch overriding UID",
"BPF switch overriding THREAD",
- "Invalid User: %s",
};
-int target__strerror(struct target *target, int errnum,
+int target__strerror(struct target *target __maybe_unused, int errnum,
char *buf, size_t buflen)
{
int idx;
@@ -173,10 +133,6 @@ int target__strerror(struct target *target, int errnum,
snprintf(buf, buflen, "%s", msg);
break;
- case TARGET_ERRNO__INVALID_UID:
- snprintf(buf, buflen, msg, target->uid_str);
- break;
-
default:
/* cannot reach here */
break;
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index e082bda990fb..84ebb9c940c6 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -9,9 +9,7 @@ struct target {
const char *pid;
const char *tid;
const char *cpu_list;
- const char *uid_str;
const char *bpf_str;
- uid_t uid;
bool system_wide;
bool uses_mmap;
bool default_per_cpu;
@@ -36,32 +34,24 @@ enum target_errno {
/* for target__validate() */
TARGET_ERRNO__PID_OVERRIDE_CPU = __TARGET_ERRNO__START,
- TARGET_ERRNO__PID_OVERRIDE_UID,
- TARGET_ERRNO__UID_OVERRIDE_CPU,
TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
- TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
TARGET_ERRNO__BPF_OVERRIDE_CPU,
TARGET_ERRNO__BPF_OVERRIDE_PID,
- TARGET_ERRNO__BPF_OVERRIDE_UID,
TARGET_ERRNO__BPF_OVERRIDE_THREAD,
- /* for target__parse_uid() */
- TARGET_ERRNO__INVALID_UID,
-
__TARGET_ERRNO__END,
};
enum target_errno target__validate(struct target *target);
uid_t parse_uid(const char *str);
-enum target_errno target__parse_uid(struct target *target);
int target__strerror(struct target *target, int errnum, char *buf, size_t buflen);
static inline bool target__has_task(struct target *target)
{
- return target->tid || target->pid || target->uid_str;
+ return target->tid || target->pid;
}
static inline bool target__has_cpu(struct target *target)
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v1 10/10] perf thread_map: Remove uid options
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (8 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 09/10] perf target: Remove uid from target Ian Rogers
@ 2025-01-11 19:01 ` Ian Rogers
2025-02-10 18:18 ` [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-02-10 19:59 ` Namhyung Kim
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-01-11 19:01 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
Yang Jihong, linux-perf-users, linux-kernel
Now the target doesn't have a uid, it is handled through BPF filters,
remove the uid options to thread_map creation. Tidy up the functions
used in tests to avoid passing unused arguments.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/tests/event-times.c | 4 +--
tools/perf/tests/keep-tracking.c | 2 +-
tools/perf/tests/mmap-basic.c | 2 +-
tools/perf/tests/openat-syscall-all-cpus.c | 2 +-
tools/perf/tests/openat-syscall.c | 2 +-
tools/perf/tests/perf-time-to-tsc.c | 2 +-
tools/perf/tests/switch-tracking.c | 2 +-
tools/perf/tests/thread-map.c | 2 +-
tools/perf/util/evlist.c | 2 +-
tools/perf/util/python.c | 10 +++----
tools/perf/util/thread_map.c | 32 ++--------------------
tools/perf/util/thread_map.h | 6 ++--
12 files changed, 20 insertions(+), 48 deletions(-)
diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
index 2148024b4f4a..ae3b98bb42cf 100644
--- a/tools/perf/tests/event-times.c
+++ b/tools/perf/tests/event-times.c
@@ -62,7 +62,7 @@ static int attach__current_disabled(struct evlist *evlist)
pr_debug("attaching to current thread as disabled\n");
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
if (threads == NULL) {
pr_debug("thread_map__new\n");
return -1;
@@ -88,7 +88,7 @@ static int attach__current_enabled(struct evlist *evlist)
pr_debug("attaching to current thread as enabled\n");
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
if (threads == NULL) {
pr_debug("failed to call thread_map__new\n");
return -1;
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
index 5a3b2bed07f3..eafb49eb0b56 100644
--- a/tools/perf/tests/keep-tracking.c
+++ b/tools/perf/tests/keep-tracking.c
@@ -78,7 +78,7 @@ static int test__keep_tracking(struct test_suite *test __maybe_unused, int subte
int found, err = -1;
const char *comm;
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
CHECK_NOT_NULL__(threads);
cpus = perf_cpu_map__new_online_cpus();
diff --git a/tools/perf/tests/mmap-basic.c b/tools/perf/tests/mmap-basic.c
index 012c8ae439fd..cf2ca28aa0f7 100644
--- a/tools/perf/tests/mmap-basic.c
+++ b/tools/perf/tests/mmap-basic.c
@@ -46,7 +46,7 @@ static int test__basic_mmap(struct test_suite *test __maybe_unused, int subtest
char sbuf[STRERR_BUFSIZE];
struct mmap *md;
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
if (threads == NULL) {
pr_debug("thread_map__new\n");
return -1;
diff --git a/tools/perf/tests/openat-syscall-all-cpus.c b/tools/perf/tests/openat-syscall-all-cpus.c
index fb114118c876..3644d6f52c07 100644
--- a/tools/perf/tests/openat-syscall-all-cpus.c
+++ b/tools/perf/tests/openat-syscall-all-cpus.c
@@ -28,7 +28,7 @@ static int test__openat_syscall_event_on_all_cpus(struct test_suite *test __mayb
struct evsel *evsel;
unsigned int nr_openat_calls = 111, i;
cpu_set_t cpu_set;
- struct perf_thread_map *threads = thread_map__new(-1, getpid(), UINT_MAX);
+ struct perf_thread_map *threads = thread_map__new_by_tid(getpid());
char sbuf[STRERR_BUFSIZE];
char errbuf[BUFSIZ];
diff --git a/tools/perf/tests/openat-syscall.c b/tools/perf/tests/openat-syscall.c
index 131b62271bfa..b54cbe5f1808 100644
--- a/tools/perf/tests/openat-syscall.c
+++ b/tools/perf/tests/openat-syscall.c
@@ -20,7 +20,7 @@ static int test__openat_syscall_event(struct test_suite *test __maybe_unused,
int err = TEST_FAIL, fd;
struct evsel *evsel;
unsigned int nr_openat_calls = 111, i;
- struct perf_thread_map *threads = thread_map__new(-1, getpid(), UINT_MAX);
+ struct perf_thread_map *threads = thread_map__new_by_tid(getpid());
char sbuf[STRERR_BUFSIZE];
char errbuf[BUFSIZ];
diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
index bbe2ddeb9b74..c2a6b451d7ff 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -90,7 +90,7 @@ static int test__perf_time_to_tsc(struct test_suite *test __maybe_unused, int su
struct mmap *md;
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
CHECK_NOT_NULL__(threads);
cpus = perf_cpu_map__new_online_cpus();
diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index 576f82a15015..0691c272537c 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -345,7 +345,7 @@ static int test__switch_tracking(struct test_suite *test __maybe_unused, int sub
const char *comm;
int err = -1;
- threads = thread_map__new(-1, getpid(), UINT_MAX);
+ threads = thread_map__new_by_tid(getpid());
if (!threads) {
pr_debug("thread_map__new failed!\n");
goto out_err;
diff --git a/tools/perf/tests/thread-map.c b/tools/perf/tests/thread-map.c
index 1fe521466bf4..54209592168d 100644
--- a/tools/perf/tests/thread-map.c
+++ b/tools/perf/tests/thread-map.c
@@ -115,7 +115,7 @@ static int test__thread_map_remove(struct test_suite *test __maybe_unused, int s
TEST_ASSERT_VAL("failed to allocate map string",
asprintf(&str, "%d,%d", getpid(), getppid()) >= 0);
- threads = thread_map__new_str(str, NULL, 0, false);
+ threads = thread_map__new_str(str, /*tid=*/NULL, /*all_threads=*/false);
free(str);
TEST_ASSERT_VAL("failed to allocate thread_map",
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1fd91b5e8777..f686c9f5054d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1006,7 +1006,7 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
* per-thread data. thread_map__new_str will call
* thread_map__new_all_cpus to enumerate all threads.
*/
- threads = thread_map__new_str(target->pid, target->tid, UINT_MAX, all_threads);
+ threads = thread_map__new_str(target->pid, target->tid, all_threads);
if (!threads)
return -1;
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index b4bc57859f73..2488f6f4a3f7 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -566,14 +566,14 @@ struct pyrf_thread_map {
static int pyrf_thread_map__init(struct pyrf_thread_map *pthreads,
PyObject *args, PyObject *kwargs)
{
- static char *kwlist[] = { "pid", "tid", "uid", NULL };
- int pid = -1, tid = -1, uid = UINT_MAX;
+ static char *kwlist[] = { "pid", "tid", NULL };
+ int pid = -1, tid = -1;
- if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iii",
- kwlist, &pid, &tid, &uid))
+ if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|ii",
+ kwlist, &pid, &tid))
return -1;
- pthreads->threads = thread_map__new(pid, tid, uid);
+ pthreads->threads = thread_map__new(pid, tid);
if (pthreads->threads == NULL)
return -1;
return 0;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index b5f12390c355..ca193c1374ed 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -72,7 +72,7 @@ struct perf_thread_map *thread_map__new_by_tid(pid_t tid)
return threads;
}
-static struct perf_thread_map *__thread_map__new_all_cpus(uid_t uid)
+static struct perf_thread_map *thread_map__new_all_cpus(void)
{
DIR *proc;
int max_threads = 32, items, i;
@@ -98,15 +98,6 @@ static struct perf_thread_map *__thread_map__new_all_cpus(uid_t uid)
if (*end) /* only interested in proper numerical dirents */
continue;
- snprintf(path, sizeof(path), "/proc/%s", dirent->d_name);
-
- if (uid != UINT_MAX) {
- struct stat st;
-
- if (stat(path, &st) != 0 || st.st_uid != uid)
- continue;
- }
-
snprintf(path, sizeof(path), "/proc/%d/task", pid);
items = scandir(path, &namelist, filter, NULL);
if (items <= 0) {
@@ -157,24 +148,11 @@ static struct perf_thread_map *__thread_map__new_all_cpus(uid_t uid)
goto out_closedir;
}
-struct perf_thread_map *thread_map__new_all_cpus(void)
-{
- return __thread_map__new_all_cpus(UINT_MAX);
-}
-
-struct perf_thread_map *thread_map__new_by_uid(uid_t uid)
-{
- return __thread_map__new_all_cpus(uid);
-}
-
-struct perf_thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid)
+struct perf_thread_map *thread_map__new(pid_t pid, pid_t tid)
{
if (pid != -1)
return thread_map__new_by_pid(pid);
- if (tid == -1 && uid != UINT_MAX)
- return thread_map__new_by_uid(uid);
-
return thread_map__new_by_tid(tid);
}
@@ -289,15 +267,11 @@ struct perf_thread_map *thread_map__new_by_tid_str(const char *tid_str)
goto out;
}
-struct perf_thread_map *thread_map__new_str(const char *pid, const char *tid,
- uid_t uid, bool all_threads)
+struct perf_thread_map *thread_map__new_str(const char *pid, const char *tid, bool all_threads)
{
if (pid)
return thread_map__new_by_pid_str(pid);
- if (!tid && uid != UINT_MAX)
- return thread_map__new_by_uid(uid);
-
if (all_threads)
return thread_map__new_all_cpus();
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 00ec05fc1656..fc16d87f32fb 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -11,13 +11,11 @@ struct perf_record_thread_map;
struct perf_thread_map *thread_map__new_dummy(void);
struct perf_thread_map *thread_map__new_by_pid(pid_t pid);
struct perf_thread_map *thread_map__new_by_tid(pid_t tid);
-struct perf_thread_map *thread_map__new_by_uid(uid_t uid);
-struct perf_thread_map *thread_map__new_all_cpus(void);
-struct perf_thread_map *thread_map__new(pid_t pid, pid_t tid, uid_t uid);
+struct perf_thread_map *thread_map__new(pid_t pid, pid_t tid);
struct perf_thread_map *thread_map__new_event(struct perf_record_thread_map *event);
struct perf_thread_map *thread_map__new_str(const char *pid,
- const char *tid, uid_t uid, bool all_threads);
+ const char *tid, bool all_threads);
struct perf_thread_map *thread_map__new_by_tid_str(const char *tid_str);
--
2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (9 preceding siblings ...)
2025-01-11 19:01 ` [PATCH v1 10/10] perf thread_map: Remove uid options Ian Rogers
@ 2025-02-10 18:18 ` Ian Rogers
2025-02-10 19:59 ` Namhyung Kim
11 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-02-10 18:18 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Hao Ge, James Clark,
Howard Chu, Dominique Martinet, Levi Yun, Xu Yang, Tengda Wu,
linux-perf-users, linux-kernel
On Sat, Jan 11, 2025 at 11:01 AM Ian Rogers <irogers@google.com> wrote:
>
> Rather than scanning /proc and skipping PIDs based on their UIDs, use
> BPF filters for uid filtering. The /proc scanning in thread_map is
> racy as the PID may exit before the perf_event_open causing perf to
> abort. BPF UID filters are more robust as they avoid the race. Add a
> helper for commands that support UID filtering and wire up. Remove the
> non-BPF UID filtering support.
>
> Ian Rogers (10):
> perf bench evlist-open-close: Reduce scope of 2 variables
> perf parse-events filter: Use evsel__find_pmu
> perf target: Separate parse_uid into its own function
> perf parse-events: Add parse_uid_filter helper
> perf record: Switch user option to use BPF filter
> perf top: Switch user option to use BPF filter
> perf trace: Switch user option to use BPF filter
> perf bench evlist-open-close: Switch user option to use BPF filter
> perf target: Remove uid from target
> perf thread_map: Remove uid options
>
> tools/perf/bench/evlist-open-close.c | 76 ++++++++++++---------
> tools/perf/builtin-ftrace.c | 1 -
> tools/perf/builtin-kvm.c | 2 -
> tools/perf/builtin-record.c | 26 +++----
> tools/perf/builtin-stat.c | 4 +-
> tools/perf/builtin-top.c | 22 +++---
> tools/perf/builtin-trace.c | 25 ++++---
> tools/perf/tests/backward-ring-buffer.c | 1 -
> tools/perf/tests/event-times.c | 8 +--
> tools/perf/tests/keep-tracking.c | 2 +-
> tools/perf/tests/mmap-basic.c | 2 +-
> tools/perf/tests/openat-syscall-all-cpus.c | 2 +-
> tools/perf/tests/openat-syscall-tp-fields.c | 1 -
> tools/perf/tests/openat-syscall.c | 2 +-
> tools/perf/tests/perf-record.c | 1 -
> tools/perf/tests/perf-time-to-tsc.c | 2 +-
> tools/perf/tests/switch-tracking.c | 2 +-
> tools/perf/tests/task-exit.c | 1 -
> tools/perf/tests/thread-map.c | 2 +-
> tools/perf/util/bpf-filter.c | 2 +-
> tools/perf/util/evlist.c | 3 +-
> tools/perf/util/parse-events.c | 25 ++++---
> tools/perf/util/parse-events.h | 1 +
> tools/perf/util/python.c | 10 +--
> tools/perf/util/target.c | 54 ++-------------
> tools/perf/util/target.h | 15 +---
> tools/perf/util/thread_map.c | 32 +--------
> tools/perf/util/thread_map.h | 6 +-
> tools/perf/util/top.c | 4 +-
> tools/perf/util/top.h | 1 +
> 30 files changed, 135 insertions(+), 200 deletions(-)
Ping. Always nice to see code being removed and the BPF based uid
filtering functionality is superior.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
` (10 preceding siblings ...)
2025-02-10 18:18 ` [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
@ 2025-02-10 19:59 ` Namhyung Kim
2025-02-10 22:06 ` Ian Rogers
11 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-10 19:59 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> Rather than scanning /proc and skipping PIDs based on their UIDs, use
> BPF filters for uid filtering. The /proc scanning in thread_map is
> racy as the PID may exit before the perf_event_open causing perf to
> abort. BPF UID filters are more robust as they avoid the race. Add a
> helper for commands that support UID filtering and wire up. Remove the
> non-BPF UID filtering support.
Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
Also non-root users will be limited unless it pinned the BPF program in
advance.
I think you can keep the original behavior and convert to BPF only when
it's available.
Thanks,
Namhyung
>
> Ian Rogers (10):
> perf bench evlist-open-close: Reduce scope of 2 variables
> perf parse-events filter: Use evsel__find_pmu
> perf target: Separate parse_uid into its own function
> perf parse-events: Add parse_uid_filter helper
> perf record: Switch user option to use BPF filter
> perf top: Switch user option to use BPF filter
> perf trace: Switch user option to use BPF filter
> perf bench evlist-open-close: Switch user option to use BPF filter
> perf target: Remove uid from target
> perf thread_map: Remove uid options
>
> tools/perf/bench/evlist-open-close.c | 76 ++++++++++++---------
> tools/perf/builtin-ftrace.c | 1 -
> tools/perf/builtin-kvm.c | 2 -
> tools/perf/builtin-record.c | 26 +++----
> tools/perf/builtin-stat.c | 4 +-
> tools/perf/builtin-top.c | 22 +++---
> tools/perf/builtin-trace.c | 25 ++++---
> tools/perf/tests/backward-ring-buffer.c | 1 -
> tools/perf/tests/event-times.c | 8 +--
> tools/perf/tests/keep-tracking.c | 2 +-
> tools/perf/tests/mmap-basic.c | 2 +-
> tools/perf/tests/openat-syscall-all-cpus.c | 2 +-
> tools/perf/tests/openat-syscall-tp-fields.c | 1 -
> tools/perf/tests/openat-syscall.c | 2 +-
> tools/perf/tests/perf-record.c | 1 -
> tools/perf/tests/perf-time-to-tsc.c | 2 +-
> tools/perf/tests/switch-tracking.c | 2 +-
> tools/perf/tests/task-exit.c | 1 -
> tools/perf/tests/thread-map.c | 2 +-
> tools/perf/util/bpf-filter.c | 2 +-
> tools/perf/util/evlist.c | 3 +-
> tools/perf/util/parse-events.c | 25 ++++---
> tools/perf/util/parse-events.h | 1 +
> tools/perf/util/python.c | 10 +--
> tools/perf/util/target.c | 54 ++-------------
> tools/perf/util/target.h | 15 +---
> tools/perf/util/thread_map.c | 32 +--------
> tools/perf/util/thread_map.h | 6 +-
> tools/perf/util/top.c | 4 +-
> tools/perf/util/top.h | 1 +
> 30 files changed, 135 insertions(+), 200 deletions(-)
>
> --
> 2.47.1.613.gc27f4b7a9f-goog
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-10 19:59 ` Namhyung Kim
@ 2025-02-10 22:06 ` Ian Rogers
2025-02-11 3:12 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-10 22:06 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > BPF filters for uid filtering. The /proc scanning in thread_map is
> > racy as the PID may exit before the perf_event_open causing perf to
> > abort. BPF UID filters are more robust as they avoid the race. Add a
> > helper for commands that support UID filtering and wire up. Remove the
> > non-BPF UID filtering support.
>
> Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> Also non-root users will be limited unless it pinned the BPF program in
> advance.
>
> I think you can keep the original behavior and convert to BPF only when
> it's available.
Using BPF when available would be limited progress. The issues I have
with not removing the existing approach are:
1) It is broken
Scanning /proc for pids and then doing perf_event_open means that any
time a process exits the perf_event_open fails.
Steps to reproduce:
This bug reproduces easily but if your machine is lightly loaded in
one terminal run `perf test`, in another terminal run `sudo perf top
-u $(id -u)` the perf top command will exit with:
```
Error:
The sys_perf_event_open() syscall returned with 3 (No such process)
for event (cycles:P).
/bin/dmesg | grep -i perf may provide additional information.
```
2) It is a work in progress that isn't progressing. Scanning /proc
will only tell you about started processes; it won't tell you about
processes that start during the profiling run, whether it be perf top
or perf record. Extra work would be necessary to make this most basic
of use-cases work, perhaps you could use tracepoints to capture
starting processes and then enable user profiling on those processes.
It would be horribly complicated, suffer from delays between observing
things happen and doing the perf_event_open, biases in the samples,
etc. I don't expect anyone to do it, especially when BPF filtering
already solves the problem better. There have been 13 years that
someone could have fixed it.
3) it adds significant useless complexity to the code base. Having
'user' in the target makes it look like perf_event_open can work on a
pid, system wide or user basis. The user basis doesn't exist so the
majority of the code base is just ignoring it - search for users of
uid_str on target. Those that do run into problems (1) and (2).
4) It is redundant and leads to confusion with BPF filtering. Having
the BPF filter on evsels is non-zero cost in terms of the code base
complexity, but it is something broadly useful. As user filtering has
never worked (see above) it isn't broadly used but is adding
complexity. If both approaches were wanted then it is unclear what the
code should be doing, presumably the mish-mash of BPF filtering and
/proc scanning that happens today and will be broken due to (1) and
(2). Putting everything into the BPF filter makes sense as you can
combine a BPF filter with an additional BPF filter on user.
5) It is untested and adding a test leads to an always broken test due
to testing being an example of where things break, see point 1 and its
example.
6) Nobody wants the non-BPF approach. As it was broken nobody used the
previous feature, maintaining it for no users is overhead. Let me know
if someone is using this option (I doubt it given points 1 and 2) and
they wouldn't be better served by BPF. People building perf today have
to explicitly opt-out of wanting BPF in their tooling. I posted this
change a month ago and nobody has jumped up saying please don't remove
the old approach.
7) The interaction with this feature and changes in behavior, say
ignoring events that fail to open, is non-obvious and not testable as
testing would be broken as the functionality itself is broken. Having
the broken feature hanging around and being untestable means that we
slow progress on new features, testing and other improvements.
Yes, we could try to fix all of that but why? Nobody has cared or
tried in 13 years and I would like the tech debt off our plate with a
better approach in its place.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-10 22:06 ` Ian Rogers
@ 2025-02-11 3:12 ` Namhyung Kim
2025-02-11 4:40 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-11 3:12 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > racy as the PID may exit before the perf_event_open causing perf to
> > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > helper for commands that support UID filtering and wire up. Remove the
> > > non-BPF UID filtering support.
> >
> > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > Also non-root users will be limited unless it pinned the BPF program in
> > advance.
> >
> > I think you can keep the original behavior and convert to BPF only when
> > it's available.
>
> Using BPF when available would be limited progress. The issues I have
> with not removing the existing approach are:
>
> 1) It is broken
> Scanning /proc for pids and then doing perf_event_open means that any
> time a process exits the perf_event_open fails.
> Steps to reproduce:
> This bug reproduces easily but if your machine is lightly loaded in
> one terminal run `perf test`, in another terminal run `sudo perf top
> -u $(id -u)` the perf top command will exit with:
> ```
> Error:
> The sys_perf_event_open() syscall returned with 3 (No such process)
> for event (cycles:P).
> /bin/dmesg | grep -i perf may provide additional information.
> ```
>
> 2) It is a work in progress that isn't progressing. Scanning /proc
> will only tell you about started processes; it won't tell you about
> processes that start during the profiling run, whether it be perf top
> or perf record. Extra work would be necessary to make this most basic
> of use-cases work, perhaps you could use tracepoints to capture
> starting processes and then enable user profiling on those processes.
> It would be horribly complicated, suffer from delays between observing
> things happen and doing the perf_event_open, biases in the samples,
> etc. I don't expect anyone to do it, especially when BPF filtering
> already solves the problem better. There have been 13 years that
> someone could have fixed it.
>
> 3) it adds significant useless complexity to the code base. Having
> 'user' in the target makes it look like perf_event_open can work on a
> pid, system wide or user basis. The user basis doesn't exist so the
> majority of the code base is just ignoring it - search for users of
> uid_str on target. Those that do run into problems (1) and (2).
>
> 4) It is redundant and leads to confusion with BPF filtering. Having
> the BPF filter on evsels is non-zero cost in terms of the code base
> complexity, but it is something broadly useful. As user filtering has
> never worked (see above) it isn't broadly used but is adding
> complexity. If both approaches were wanted then it is unclear what the
> code should be doing, presumably the mish-mash of BPF filtering and
> /proc scanning that happens today and will be broken due to (1) and
> (2). Putting everything into the BPF filter makes sense as you can
> combine a BPF filter with an additional BPF filter on user.
>
> 5) It is untested and adding a test leads to an always broken test due
> to testing being an example of where things break, see point 1 and its
> example.
>
> 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> previous feature, maintaining it for no users is overhead. Let me know
> if someone is using this option (I doubt it given points 1 and 2) and
> they wouldn't be better served by BPF. People building perf today have
> to explicitly opt-out of wanting BPF in their tooling. I posted this
> change a month ago and nobody has jumped up saying please don't remove
> the old approach.
>
> 7) The interaction with this feature and changes in behavior, say
> ignoring events that fail to open, is non-obvious and not testable as
> testing would be broken as the functionality itself is broken. Having
> the broken feature hanging around and being untestable means that we
> slow progress on new features, testing and other improvements.
>
> Yes, we could try to fix all of that but why? Nobody has cared or
> tried in 13 years and I would like the tech debt off our plate with a
> better approach in its place.
Thanks for writing this up. I agree BPF approach is better but it has
its own limitation - basically it requires root. And I know a few of
people who don't use BPF. :) And maybe we need to check if user passes
filters to the event already.
Also, I admit that I don't know who actually uses this. But I can say
sometimes people uses tools in a creative way. Anyway, I can imagine
an use case that system is in a steady state and each user has dedicated
jobs. Then scanning /proc would work well.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-11 3:12 ` Namhyung Kim
@ 2025-02-11 4:40 ` Ian Rogers
2025-02-11 17:51 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-11 4:40 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > helper for commands that support UID filtering and wire up. Remove the
> > > > non-BPF UID filtering support.
> > >
> > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > Also non-root users will be limited unless it pinned the BPF program in
> > > advance.
> > >
> > > I think you can keep the original behavior and convert to BPF only when
> > > it's available.
> >
> > Using BPF when available would be limited progress. The issues I have
> > with not removing the existing approach are:
> >
> > 1) It is broken
> > Scanning /proc for pids and then doing perf_event_open means that any
> > time a process exits the perf_event_open fails.
> > Steps to reproduce:
> > This bug reproduces easily but if your machine is lightly loaded in
> > one terminal run `perf test`, in another terminal run `sudo perf top
> > -u $(id -u)` the perf top command will exit with:
> > ```
> > Error:
> > The sys_perf_event_open() syscall returned with 3 (No such process)
> > for event (cycles:P).
> > /bin/dmesg | grep -i perf may provide additional information.
> > ```
> >
> > 2) It is a work in progress that isn't progressing. Scanning /proc
> > will only tell you about started processes; it won't tell you about
> > processes that start during the profiling run, whether it be perf top
> > or perf record. Extra work would be necessary to make this most basic
> > of use-cases work, perhaps you could use tracepoints to capture
> > starting processes and then enable user profiling on those processes.
> > It would be horribly complicated, suffer from delays between observing
> > things happen and doing the perf_event_open, biases in the samples,
> > etc. I don't expect anyone to do it, especially when BPF filtering
> > already solves the problem better. There have been 13 years that
> > someone could have fixed it.
> >
> > 3) it adds significant useless complexity to the code base. Having
> > 'user' in the target makes it look like perf_event_open can work on a
> > pid, system wide or user basis. The user basis doesn't exist so the
> > majority of the code base is just ignoring it - search for users of
> > uid_str on target. Those that do run into problems (1) and (2).
> >
> > 4) It is redundant and leads to confusion with BPF filtering. Having
> > the BPF filter on evsels is non-zero cost in terms of the code base
> > complexity, but it is something broadly useful. As user filtering has
> > never worked (see above) it isn't broadly used but is adding
> > complexity. If both approaches were wanted then it is unclear what the
> > code should be doing, presumably the mish-mash of BPF filtering and
> > /proc scanning that happens today and will be broken due to (1) and
> > (2). Putting everything into the BPF filter makes sense as you can
> > combine a BPF filter with an additional BPF filter on user.
> >
> > 5) It is untested and adding a test leads to an always broken test due
> > to testing being an example of where things break, see point 1 and its
> > example.
> >
> > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > previous feature, maintaining it for no users is overhead. Let me know
> > if someone is using this option (I doubt it given points 1 and 2) and
> > they wouldn't be better served by BPF. People building perf today have
> > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > change a month ago and nobody has jumped up saying please don't remove
> > the old approach.
> >
> > 7) The interaction with this feature and changes in behavior, say
> > ignoring events that fail to open, is non-obvious and not testable as
> > testing would be broken as the functionality itself is broken. Having
> > the broken feature hanging around and being untestable means that we
> > slow progress on new features, testing and other improvements.
> >
> > Yes, we could try to fix all of that but why? Nobody has cared or
> > tried in 13 years and I would like the tech debt off our plate with a
> > better approach in its place.
>
> Thanks for writing this up. I agree BPF approach is better but it has
> its own limitation - basically it requires root. And I know a few of
> people who don't use BPF. :) And maybe we need to check if user passes
> filters to the event already.
I thought you were working on making the BPF filters pin-able? So root
could enable the filter but then users have access to it. You have to
be root to be looking at other users in any case.
> Also, I admit that I don't know who actually uses this. But I can say
> sometimes people uses tools in a creative way. Anyway, I can imagine
> an use case that system is in a steady state and each user has dedicated
> jobs. Then scanning /proc would work well.
Another one for Google's tree then.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-11 4:40 ` Ian Rogers
@ 2025-02-11 17:51 ` Namhyung Kim
2025-02-11 18:06 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-11 17:51 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Mon, Feb 10, 2025 at 08:40:01PM -0800, Ian Rogers wrote:
> On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > > helper for commands that support UID filtering and wire up. Remove the
> > > > > non-BPF UID filtering support.
> > > >
> > > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > > Also non-root users will be limited unless it pinned the BPF program in
> > > > advance.
> > > >
> > > > I think you can keep the original behavior and convert to BPF only when
> > > > it's available.
> > >
> > > Using BPF when available would be limited progress. The issues I have
> > > with not removing the existing approach are:
> > >
> > > 1) It is broken
> > > Scanning /proc for pids and then doing perf_event_open means that any
> > > time a process exits the perf_event_open fails.
> > > Steps to reproduce:
> > > This bug reproduces easily but if your machine is lightly loaded in
> > > one terminal run `perf test`, in another terminal run `sudo perf top
> > > -u $(id -u)` the perf top command will exit with:
> > > ```
> > > Error:
> > > The sys_perf_event_open() syscall returned with 3 (No such process)
> > > for event (cycles:P).
> > > /bin/dmesg | grep -i perf may provide additional information.
> > > ```
> > >
> > > 2) It is a work in progress that isn't progressing. Scanning /proc
> > > will only tell you about started processes; it won't tell you about
> > > processes that start during the profiling run, whether it be perf top
> > > or perf record. Extra work would be necessary to make this most basic
> > > of use-cases work, perhaps you could use tracepoints to capture
> > > starting processes and then enable user profiling on those processes.
> > > It would be horribly complicated, suffer from delays between observing
> > > things happen and doing the perf_event_open, biases in the samples,
> > > etc. I don't expect anyone to do it, especially when BPF filtering
> > > already solves the problem better. There have been 13 years that
> > > someone could have fixed it.
> > >
> > > 3) it adds significant useless complexity to the code base. Having
> > > 'user' in the target makes it look like perf_event_open can work on a
> > > pid, system wide or user basis. The user basis doesn't exist so the
> > > majority of the code base is just ignoring it - search for users of
> > > uid_str on target. Those that do run into problems (1) and (2).
> > >
> > > 4) It is redundant and leads to confusion with BPF filtering. Having
> > > the BPF filter on evsels is non-zero cost in terms of the code base
> > > complexity, but it is something broadly useful. As user filtering has
> > > never worked (see above) it isn't broadly used but is adding
> > > complexity. If both approaches were wanted then it is unclear what the
> > > code should be doing, presumably the mish-mash of BPF filtering and
> > > /proc scanning that happens today and will be broken due to (1) and
> > > (2). Putting everything into the BPF filter makes sense as you can
> > > combine a BPF filter with an additional BPF filter on user.
> > >
> > > 5) It is untested and adding a test leads to an always broken test due
> > > to testing being an example of where things break, see point 1 and its
> > > example.
> > >
> > > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > > previous feature, maintaining it for no users is overhead. Let me know
> > > if someone is using this option (I doubt it given points 1 and 2) and
> > > they wouldn't be better served by BPF. People building perf today have
> > > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > > change a month ago and nobody has jumped up saying please don't remove
> > > the old approach.
> > >
> > > 7) The interaction with this feature and changes in behavior, say
> > > ignoring events that fail to open, is non-obvious and not testable as
> > > testing would be broken as the functionality itself is broken. Having
> > > the broken feature hanging around and being untestable means that we
> > > slow progress on new features, testing and other improvements.
> > >
> > > Yes, we could try to fix all of that but why? Nobody has cared or
> > > tried in 13 years and I would like the tech debt off our plate with a
> > > better approach in its place.
> >
> > Thanks for writing this up. I agree BPF approach is better but it has
> > its own limitation - basically it requires root. And I know a few of
> > people who don't use BPF. :) And maybe we need to check if user passes
> > filters to the event already.
>
> I thought you were working on making the BPF filters pin-able? So root
> could enable the filter but then users have access to it.
Right, 'perf record --setup-filter pin' would do the job. But it has to
be run in advance.
> You have to be root to be looking at other users in any case.
That's true. But at least you can profile your processes. :)
>
> > Also, I admit that I don't know who actually uses this. But I can say
> > sometimes people uses tools in a creative way. Anyway, I can imagine
> > an use case that system is in a steady state and each user has dedicated
> > jobs. Then scanning /proc would work well.
>
> Another one for Google's tree then.
Any chance you update the patchset to retain the old behavior and use
BPF only if available?
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-11 17:51 ` Namhyung Kim
@ 2025-02-11 18:06 ` Ian Rogers
2025-02-12 1:51 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-11 18:06 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Tue, Feb 11, 2025 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 10, 2025 at 08:40:01PM -0800, Ian Rogers wrote:
> > On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > > > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > > > helper for commands that support UID filtering and wire up. Remove the
> > > > > > non-BPF UID filtering support.
> > > > >
> > > > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > > > Also non-root users will be limited unless it pinned the BPF program in
> > > > > advance.
> > > > >
> > > > > I think you can keep the original behavior and convert to BPF only when
> > > > > it's available.
> > > >
> > > > Using BPF when available would be limited progress. The issues I have
> > > > with not removing the existing approach are:
> > > >
> > > > 1) It is broken
> > > > Scanning /proc for pids and then doing perf_event_open means that any
> > > > time a process exits the perf_event_open fails.
> > > > Steps to reproduce:
> > > > This bug reproduces easily but if your machine is lightly loaded in
> > > > one terminal run `perf test`, in another terminal run `sudo perf top
> > > > -u $(id -u)` the perf top command will exit with:
> > > > ```
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 3 (No such process)
> > > > for event (cycles:P).
> > > > /bin/dmesg | grep -i perf may provide additional information.
> > > > ```
> > > >
> > > > 2) It is a work in progress that isn't progressing. Scanning /proc
> > > > will only tell you about started processes; it won't tell you about
> > > > processes that start during the profiling run, whether it be perf top
> > > > or perf record. Extra work would be necessary to make this most basic
> > > > of use-cases work, perhaps you could use tracepoints to capture
> > > > starting processes and then enable user profiling on those processes.
> > > > It would be horribly complicated, suffer from delays between observing
> > > > things happen and doing the perf_event_open, biases in the samples,
> > > > etc. I don't expect anyone to do it, especially when BPF filtering
> > > > already solves the problem better. There have been 13 years that
> > > > someone could have fixed it.
> > > >
> > > > 3) it adds significant useless complexity to the code base. Having
> > > > 'user' in the target makes it look like perf_event_open can work on a
> > > > pid, system wide or user basis. The user basis doesn't exist so the
> > > > majority of the code base is just ignoring it - search for users of
> > > > uid_str on target. Those that do run into problems (1) and (2).
> > > >
> > > > 4) It is redundant and leads to confusion with BPF filtering. Having
> > > > the BPF filter on evsels is non-zero cost in terms of the code base
> > > > complexity, but it is something broadly useful. As user filtering has
> > > > never worked (see above) it isn't broadly used but is adding
> > > > complexity. If both approaches were wanted then it is unclear what the
> > > > code should be doing, presumably the mish-mash of BPF filtering and
> > > > /proc scanning that happens today and will be broken due to (1) and
> > > > (2). Putting everything into the BPF filter makes sense as you can
> > > > combine a BPF filter with an additional BPF filter on user.
> > > >
> > > > 5) It is untested and adding a test leads to an always broken test due
> > > > to testing being an example of where things break, see point 1 and its
> > > > example.
> > > >
> > > > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > > > previous feature, maintaining it for no users is overhead. Let me know
> > > > if someone is using this option (I doubt it given points 1 and 2) and
> > > > they wouldn't be better served by BPF. People building perf today have
> > > > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > > > change a month ago and nobody has jumped up saying please don't remove
> > > > the old approach.
> > > >
> > > > 7) The interaction with this feature and changes in behavior, say
> > > > ignoring events that fail to open, is non-obvious and not testable as
> > > > testing would be broken as the functionality itself is broken. Having
> > > > the broken feature hanging around and being untestable means that we
> > > > slow progress on new features, testing and other improvements.
> > > >
> > > > Yes, we could try to fix all of that but why? Nobody has cared or
> > > > tried in 13 years and I would like the tech debt off our plate with a
> > > > better approach in its place.
> > >
> > > Thanks for writing this up. I agree BPF approach is better but it has
> > > its own limitation - basically it requires root. And I know a few of
> > > people who don't use BPF. :) And maybe we need to check if user passes
> > > filters to the event already.
> >
> > I thought you were working on making the BPF filters pin-able? So root
> > could enable the filter but then users have access to it.
>
> Right, 'perf record --setup-filter pin' would do the job. But it has to
> be run in advance.
>
> > You have to be root to be looking at other users in any case.
>
> That's true. But at least you can profile your processes. :)
>
> >
> > > Also, I admit that I don't know who actually uses this. But I can say
> > > sometimes people uses tools in a creative way. Anyway, I can imagine
> > > an use case that system is in a steady state and each user has dedicated
> > > jobs. Then scanning /proc would work well.
> >
> > Another one for Google's tree then.
>
> Any chance you update the patchset to retain the old behavior and use
> BPF only if available?
The point of the series is:
1) get rid of unnecessary notions of target, the uid_str, it is extra
complexity and doesn't make sense;
2) switch the two users of if to BPF.
You are focussing on the thing that isn't the main point of the series.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-11 18:06 ` Ian Rogers
@ 2025-02-12 1:51 ` Namhyung Kim
2025-02-12 5:41 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-12 1:51 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Tue, Feb 11, 2025 at 10:06:34AM -0800, Ian Rogers wrote:
> On Tue, Feb 11, 2025 at 9:51 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Feb 10, 2025 at 08:40:01PM -0800, Ian Rogers wrote:
> > > On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > > > > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > > > > helper for commands that support UID filtering and wire up. Remove the
> > > > > > > non-BPF UID filtering support.
> > > > > >
> > > > > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > > > > Also non-root users will be limited unless it pinned the BPF program in
> > > > > > advance.
> > > > > >
> > > > > > I think you can keep the original behavior and convert to BPF only when
> > > > > > it's available.
> > > > >
> > > > > Using BPF when available would be limited progress. The issues I have
> > > > > with not removing the existing approach are:
> > > > >
> > > > > 1) It is broken
> > > > > Scanning /proc for pids and then doing perf_event_open means that any
> > > > > time a process exits the perf_event_open fails.
> > > > > Steps to reproduce:
> > > > > This bug reproduces easily but if your machine is lightly loaded in
> > > > > one terminal run `perf test`, in another terminal run `sudo perf top
> > > > > -u $(id -u)` the perf top command will exit with:
> > > > > ```
> > > > > Error:
> > > > > The sys_perf_event_open() syscall returned with 3 (No such process)
> > > > > for event (cycles:P).
> > > > > /bin/dmesg | grep -i perf may provide additional information.
> > > > > ```
> > > > >
> > > > > 2) It is a work in progress that isn't progressing. Scanning /proc
> > > > > will only tell you about started processes; it won't tell you about
> > > > > processes that start during the profiling run, whether it be perf top
> > > > > or perf record. Extra work would be necessary to make this most basic
> > > > > of use-cases work, perhaps you could use tracepoints to capture
> > > > > starting processes and then enable user profiling on those processes.
> > > > > It would be horribly complicated, suffer from delays between observing
> > > > > things happen and doing the perf_event_open, biases in the samples,
> > > > > etc. I don't expect anyone to do it, especially when BPF filtering
> > > > > already solves the problem better. There have been 13 years that
> > > > > someone could have fixed it.
> > > > >
> > > > > 3) it adds significant useless complexity to the code base. Having
> > > > > 'user' in the target makes it look like perf_event_open can work on a
> > > > > pid, system wide or user basis. The user basis doesn't exist so the
> > > > > majority of the code base is just ignoring it - search for users of
> > > > > uid_str on target. Those that do run into problems (1) and (2).
> > > > >
> > > > > 4) It is redundant and leads to confusion with BPF filtering. Having
> > > > > the BPF filter on evsels is non-zero cost in terms of the code base
> > > > > complexity, but it is something broadly useful. As user filtering has
> > > > > never worked (see above) it isn't broadly used but is adding
> > > > > complexity. If both approaches were wanted then it is unclear what the
> > > > > code should be doing, presumably the mish-mash of BPF filtering and
> > > > > /proc scanning that happens today and will be broken due to (1) and
> > > > > (2). Putting everything into the BPF filter makes sense as you can
> > > > > combine a BPF filter with an additional BPF filter on user.
> > > > >
> > > > > 5) It is untested and adding a test leads to an always broken test due
> > > > > to testing being an example of where things break, see point 1 and its
> > > > > example.
> > > > >
> > > > > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > > > > previous feature, maintaining it for no users is overhead. Let me know
> > > > > if someone is using this option (I doubt it given points 1 and 2) and
> > > > > they wouldn't be better served by BPF. People building perf today have
> > > > > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > > > > change a month ago and nobody has jumped up saying please don't remove
> > > > > the old approach.
> > > > >
> > > > > 7) The interaction with this feature and changes in behavior, say
> > > > > ignoring events that fail to open, is non-obvious and not testable as
> > > > > testing would be broken as the functionality itself is broken. Having
> > > > > the broken feature hanging around and being untestable means that we
> > > > > slow progress on new features, testing and other improvements.
> > > > >
> > > > > Yes, we could try to fix all of that but why? Nobody has cared or
> > > > > tried in 13 years and I would like the tech debt off our plate with a
> > > > > better approach in its place.
> > > >
> > > > Thanks for writing this up. I agree BPF approach is better but it has
> > > > its own limitation - basically it requires root. And I know a few of
> > > > people who don't use BPF. :) And maybe we need to check if user passes
> > > > filters to the event already.
> > >
> > > I thought you were working on making the BPF filters pin-able? So root
> > > could enable the filter but then users have access to it.
> >
> > Right, 'perf record --setup-filter pin' would do the job. But it has to
> > be run in advance.
> >
> > > You have to be root to be looking at other users in any case.
> >
> > That's true. But at least you can profile your processes. :)
> >
> > >
> > > > Also, I admit that I don't know who actually uses this. But I can say
> > > > sometimes people uses tools in a creative way. Anyway, I can imagine
> > > > an use case that system is in a steady state and each user has dedicated
> > > > jobs. Then scanning /proc would work well.
> > >
> > > Another one for Google's tree then.
> >
> > Any chance you update the patchset to retain the old behavior and use
> > BPF only if available?
>
> The point of the series is:
> 1) get rid of unnecessary notions of target, the uid_str, it is extra
> complexity and doesn't make sense;
> 2) switch the two users of if to BPF.
> You are focussing on the thing that isn't the main point of the series.
But you removed non-BPF and non-root (w/o pinning BPF) use cases.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 1:51 ` Namhyung Kim
@ 2025-02-12 5:41 ` Ian Rogers
2025-02-12 18:46 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-12 5:41 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Tue, Feb 11, 2025 at 5:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> But you removed non-BPF and non-root (w/o pinning BPF) use cases.
I didn't think this was a hard series to understand. It moves the -u
options of perf top and perf record to using BPF filters. The many
reasons for this I already explained:
https://lore.kernel.org/lkml/CAP-5=fUY83gifsMZA0Q45kiQQbAKp2uJxkuwrwGtHK2hiUFRDA@mail.gmail.com/
Your case is a user that isn't exiting and starting processes and
wants to process themself or some other user they some how have
permissions for? They need to not be starting and exiting processes as
new processes are ignored and exiting processes cause the
perf_event_open to fail. What stops such a user passing the list of
processes they have that aren't starting and exiting as a -p option?
If you try something like:
$ perf top -p $(ps --no-headers -u $USER -o pid | awk '{printf "%s%s",
sep, $1; sep=","}')
this is exactly what you get. Does it work? No, the ps and awk
processes terminating but being in the list of processes causes the
perf_event_open for those pids to fail. This is exactly the same
problem as the -u option that you want to keep for this case. The
approach just doesn't work.
Why not make failing to add a -u option fallback on doing the /proc
scan and pretend the processes are a -p option? So now there's a
silent fallback to a broken behavior. New processes won't be profiled
and the data race between the scan and the perf_event_open will cause
failures with non-obvious causes/solutions - mainly, use sudo to run
as root. I'd say this isn't ideal.
This series fixes an option on two commands so that it works in the
sensible use-case, perf running with privileges trying to filter
samples belonging to a specific user. We can say the patch series
doesn't work for the case you give, I don't think anybody cares about
that case, they can get near identical behavior from -p, the behavior
from -p will be clearer than just having -u doing something similar,
namely failing to open for a process and terminate.
You may hate and ignore every point I make and still want to keep the
existing broken behavior. As I've already tried to make clear, you're
adding to the maintenance burden to everyone working in the code base
as the notion of target is fundamental and because you are insisting
on keeping a broken behavior you are also making it untestable. Given
the -u option doesn't work, I strongly suspect nobody uses it. Do I
worry about this series causing harm to the people who aren't using
the option? No I don't as there is a better opportunity in having an
option that (1) does work and (2) results in a simpler code base.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
@ 2025-02-12 14:17 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 32+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-02-12 14:17 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Hao Ge,
James Clark, Howard Chu, Dominique Martinet, Levi Yun, Xu Yang,
Tengda Wu, Yang Jihong, linux-perf-users, linux-kernel
On Sat, Jan 11, 2025 at 11:01:34AM -0800, Ian Rogers wrote:
> Make 2 global variables local. Reduces ELF binary size by removing
> relocations. For a no flags build, the perf binary size is reduced by
> 4,144 bytes on x86-64.
I'm trying to reproduce your results:
$ gcc --version | head -1
gcc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3)
$
$ rm -rf /tmp/build/$(basename $PWD)/ ; mkdir -p /tmp/build/$(basename $PWD)/
$ make -k O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
Without your patch:
$ ls -la ~/bin/perf
-rwxr-xr-x. 2 acme acme 11411680 Feb 12 10:57 /home/acme/bin/perf
$ size ~/bin/perf
text data bss dec hex filename
10071297 302496 34540 10408333 9ed18d /home/acme/bin/perf
$
Then, with your patch:
$ git log --oneline -1
abd904389b3f0807 (HEAD -> perf-tools-next) perf bench evlist-open-close: Reduce scope of 2 variables
$ perf -v
perf version 6.13.rc2.gabd904389b3f
$ size ~/bin/perf
text data bss dec hex filename
10072001 301568 34540 10408109 9ed0ad /home/acme/bin/perf
$
$ ls -la ~/bin/perf
-rwxr-xr-x. 2 acme acme 11411632 Feb 12 11:02 /home/acme/bin/perf
$
So a more modest 224 bytes reduction in the perf binary size.
In the distant past several of these moves from global to local were
made, for instance:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d20deb64e0490ee9442b5181bc08a62d2cadcb90
I tried but didn't find the before/after effects on binary size...
Anyways,
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thanks,
- Arnaldo
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/bench/evlist-open-close.c | 42 +++++++++++++++-------------
> 1 file changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
> index 5a27691469ed..79cedcf94a39 100644
> --- a/tools/perf/bench/evlist-open-close.c
> +++ b/tools/perf/bench/evlist-open-close.c
> @@ -46,25 +46,6 @@ static struct record_opts opts = {
> .ctl_fd_ack = -1,
> };
>
> -static const struct option options[] = {
> - OPT_STRING('e', "event", &event_string, "event", "event selector. use 'perf list' to list available events"),
> - OPT_INTEGER('n', "nr-events", &nr_events,
> - "number of dummy events to create (default 1). If used with -e, it clones those events n times (1 = no change)"),
> - OPT_INTEGER('i', "iterations", &iterations, "Number of iterations used to compute average (default=100)"),
> - OPT_BOOLEAN('a', "all-cpus", &opts.target.system_wide, "system-wide collection from all CPUs"),
> - OPT_STRING('C', "cpu", &opts.target.cpu_list, "cpu", "list of cpus where to open events"),
> - OPT_STRING('p', "pid", &opts.target.pid, "pid", "record events on existing process id"),
> - OPT_STRING('t', "tid", &opts.target.tid, "tid", "record events on existing thread id"),
> - OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
> - OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
> - OPT_END()
> -};
> -
> -static const char *const bench_usage[] = {
> - "perf bench internals evlist-open-close <options>",
> - NULL
> -};
> -
> static int evlist__count_evsel_fds(struct evlist *evlist)
> {
> struct evsel *evsel;
> @@ -225,6 +206,29 @@ static char *bench__repeat_event_string(const char *evstr, int n)
>
> int bench_evlist_open_close(int argc, const char **argv)
> {
> + const struct option options[] = {
> + OPT_STRING('e', "event", &event_string, "event",
> + "event selector. use 'perf list' to list available events"),
> + OPT_INTEGER('n', "nr-events", &nr_events,
> + "number of dummy events to create (default 1). If used with -e, it clones those events n times (1 = no change)"),
> + OPT_INTEGER('i', "iterations", &iterations,
> + "Number of iterations used to compute average (default=100)"),
> + OPT_BOOLEAN('a', "all-cpus", &opts.target.system_wide,
> + "system-wide collection from all CPUs"),
> + OPT_STRING('C', "cpu", &opts.target.cpu_list, "cpu",
> + "list of cpus where to open events"),
> + OPT_STRING('p', "pid", &opts.target.pid, "pid",
> + "record events on existing process id"),
> + OPT_STRING('t', "tid", &opts.target.tid, "tid",
> + "record events on existing thread id"),
> + OPT_STRING('u', "uid", &opts.target.uid_str, "user", "user to profile"),
> + OPT_BOOLEAN(0, "per-thread", &opts.target.per_thread, "use per-thread mmaps"),
> + OPT_END()
> + };
> + const char *const bench_usage[] = {
> + "perf bench internals evlist-open-close <options>",
> + NULL
> + };
> char *evstr, errbuf[BUFSIZ];
> int err;
>
> --
> 2.47.1.613.gc27f4b7a9f-goog
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
@ 2025-02-12 14:51 ` Arnaldo Carvalho de Melo
2025-02-12 16:11 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-02-12 14:51 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Hao Ge,
James Clark, Howard Chu, Dominique Martinet, Levi Yun, Xu Yang,
Tengda Wu, Yang Jihong, linux-perf-users, linux-kernel
On Sat, Jan 11, 2025 at 11:01:35AM -0800, Ian Rogers wrote:
> Rather than manually scanning PMUs, use evsel__find_pmu that can use
> the PMU set during event parsing.
Right, and then evsel__find_pmu() also does some extra checks to call
pmu_read_sysfs() more selectively, right?
- Arnaldo
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/util/parse-events.c | 14 ++++----------
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 1e23faa364b1..f147e13a7017 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -2406,9 +2406,8 @@ foreach_evsel_in_last_glob(struct evlist *evlist,
> static int set_filter(struct evsel *evsel, const void *arg)
> {
> const char *str = arg;
> - bool found = false;
> int nr_addr_filters = 0;
> - struct perf_pmu *pmu = NULL;
> + struct perf_pmu *pmu;
>
> if (evsel == NULL) {
> fprintf(stderr,
> @@ -2426,16 +2425,11 @@ static int set_filter(struct evsel *evsel, const void *arg)
> return 0;
> }
>
> - while ((pmu = perf_pmus__scan(pmu)) != NULL)
> - if (pmu->type == evsel->core.attr.type) {
> - found = true;
> - break;
> - }
> -
> - if (found)
> + pmu = evsel__find_pmu(evsel);
> + if (pmu) {
> perf_pmu__scan_file(pmu, "nr_addr_filters",
> "%d", &nr_addr_filters);
> -
> + }
> if (!nr_addr_filters)
> return perf_bpf_filter__parse(&evsel->bpf_filters, str);
>
> --
> 2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu
2025-02-12 14:51 ` Arnaldo Carvalho de Melo
@ 2025-02-12 16:11 ` Ian Rogers
0 siblings, 0 replies; 32+ messages in thread
From: Ian Rogers @ 2025-02-12 16:11 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Hao Ge,
James Clark, Howard Chu, Dominique Martinet, Levi Yun, Xu Yang,
Tengda Wu, Yang Jihong, linux-perf-users, linux-kernel
On Wed, Feb 12, 2025 at 6:51 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> On Sat, Jan 11, 2025 at 11:01:35AM -0800, Ian Rogers wrote:
> > Rather than manually scanning PMUs, use evsel__find_pmu that can use
> > the PMU set during event parsing.
>
> Right, and then evsel__find_pmu() also does some extra checks to call
> pmu_read_sysfs() more selectively, right?
Right, but the pmu should already be initialized by parse_events so no
scanning should be necessary:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n759
You are right that the fall back to perf_pmus__find_by_type is more
selective in what it scans:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n302
First seeing if the PMU is already loaded then just loading the PMUs
relevant to the type number.
Thanks,
Ian
> - Arnaldo
>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/perf/util/parse-events.c | 14 ++++----------
> > 1 file changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 1e23faa364b1..f147e13a7017 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -2406,9 +2406,8 @@ foreach_evsel_in_last_glob(struct evlist *evlist,
> > static int set_filter(struct evsel *evsel, const void *arg)
> > {
> > const char *str = arg;
> > - bool found = false;
> > int nr_addr_filters = 0;
> > - struct perf_pmu *pmu = NULL;
> > + struct perf_pmu *pmu;
> >
> > if (evsel == NULL) {
> > fprintf(stderr,
> > @@ -2426,16 +2425,11 @@ static int set_filter(struct evsel *evsel, const void *arg)
> > return 0;
> > }
> >
> > - while ((pmu = perf_pmus__scan(pmu)) != NULL)
> > - if (pmu->type == evsel->core.attr.type) {
> > - found = true;
> > - break;
> > - }
> > -
> > - if (found)
> > + pmu = evsel__find_pmu(evsel);
> > + if (pmu) {
> > perf_pmu__scan_file(pmu, "nr_addr_filters",
> > "%d", &nr_addr_filters);
> > -
> > + }
> > if (!nr_addr_filters)
> > return perf_bpf_filter__parse(&evsel->bpf_filters, str);
> >
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 5:41 ` Ian Rogers
@ 2025-02-12 18:46 ` Namhyung Kim
2025-02-12 20:00 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-12 18:46 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Tue, Feb 11, 2025 at 09:41:04PM -0800, Ian Rogers wrote:
> On Tue, Feb 11, 2025 at 5:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > But you removed non-BPF and non-root (w/o pinning BPF) use cases.
>
> I didn't think this was a hard series to understand. It moves the -u
> options of perf top and perf record to using BPF filters. The many
> reasons for this I already explained:
> https://lore.kernel.org/lkml/CAP-5=fUY83gifsMZA0Q45kiQQbAKp2uJxkuwrwGtHK2hiUFRDA@mail.gmail.com/
>
> Your case is a user that isn't exiting and starting processes and
> wants to process themself or some other user they some how have
> permissions for? They need to not be starting and exiting processes as
> new processes are ignored and exiting processes cause the
> perf_event_open to fail. What stops such a user passing the list of
> processes they have that aren't starting and exiting as a -p option?
>
> If you try something like:
> $ perf top -p $(ps --no-headers -u $USER -o pid | awk '{printf "%s%s",
> sep, $1; sep=","}')
> this is exactly what you get. Does it work? No, the ps and awk
> processes terminating but being in the list of processes causes the
> perf_event_open for those pids to fail. This is exactly the same
> problem as the -u option that you want to keep for this case. The
> approach just doesn't work.
>
> Why not make failing to add a -u option fallback on doing the /proc
> scan and pretend the processes are a -p option? So now there's a
> silent fallback to a broken behavior. New processes won't be profiled
> and the data race between the scan and the perf_event_open will cause
> failures with non-obvious causes/solutions - mainly, use sudo to run
> as root. I'd say this isn't ideal.
>
> This series fixes an option on two commands so that it works in the
> sensible use-case, perf running with privileges trying to filter
> samples belonging to a specific user. We can say the patch series
> doesn't work for the case you give, I don't think anybody cares about
> that case, they can get near identical behavior from -p, the behavior
> from -p will be clearer than just having -u doing something similar,
> namely failing to open for a process and terminate.
>
> You may hate and ignore every point I make and still want to keep the
> existing broken behavior. As I've already tried to make clear, you're
> adding to the maintenance burden to everyone working in the code base
> as the notion of target is fundamental and because you are insisting
> on keeping a broken behavior you are also making it untestable. Given
> the -u option doesn't work, I strongly suspect nobody uses it. Do I
> worry about this series causing harm to the people who aren't using
> the option? No I don't as there is a better opportunity in having an
> option that (1) does work and (2) results in a simpler code base.
It's not completely broken and works sometimes. And it seems we have an
issue with BPF sideband events. But it worked when you ran it as root.
$ sudo perf record -u $(id -u) --no-bpf-event -- sleep 1
WARNING: Ignored open failure for pid 404758
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.754 MB perf.data (3638 samples) ]
$ sudo perf report -s sym --stdio -q | head
1.43% [k] audit_filter_rules.isra.0
1.33% [.] pthread_mutex_lock@@GLIBC_2.2.5
1.06% [k] __audit_filter_op
1.05% [.] __vdso_clock_gettime
0.94% [.] _dbus_type_reader_get_current_type
0.82% [.] pthread_mutex_trylock@@GLIBC_2.34
0.76% [k] __fdget
0.72% [.] _dbus_first_type_in_signature
0.61% [.] __GI___pthread_mutex_unlock_usercnt
0.56% [k] native_sched_clock
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 18:46 ` Namhyung Kim
@ 2025-02-12 20:00 ` Ian Rogers
2025-02-12 22:56 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-12 20:00 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> It's not completely broken and works sometimes.
No this is the definition of completely broken. If it only works
sometimes then you can't use it, we can't put a test on it, there is
no point in it. Even when it doesn't fail on perf_event_open, does it
work for processes that start after /proc is scanned? No, it is
completely broken.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 20:00 ` Ian Rogers
@ 2025-02-12 22:56 ` Namhyung Kim
2025-02-12 23:17 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-12 22:56 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > It's not completely broken and works sometimes.
>
> No this is the definition of completely broken. If it only works
> sometimes then you can't use it, we can't put a test on it, there is
> no point in it. Even when it doesn't fail on perf_event_open, does it
> work for processes that start after /proc is scanned? No, it is
> completely broken.
Ok, we have a different definition for it. Let's ignore the imaginary
users of the broken features. Have you added a test for this change?
Anyway I've tested your change and found some issues:
1. It silently exited when BPF-skel is not built. Better to put some
messages at least.
$ sudo ./perf record -u $(id -u) -- sleep 1
2. Even with BPF-skel, perf record doesn't work well. It did something
but failed to get sample data for some reason.
$ sudo ./perf record -u $(id -u) -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.045 MB perf.data ]
Oh, I think you now need to pass -a because it now works in
system-wide mode and drops samples for other users.
3. With BPF-skel, non-root users will see this.
$ ./perf record -u $(id -u) -- sleep 1
cannot get fd for 'filters' map
failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
I think it's confusing and better to tell user that you need to run
'perf record --setup-filter pin' as root first. But maybe due to the
issue #2, you still need to run it as root.
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 22:56 ` Namhyung Kim
@ 2025-02-12 23:17 ` Ian Rogers
2025-02-13 1:44 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-12 23:17 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > It's not completely broken and works sometimes.
> >
> > No this is the definition of completely broken. If it only works
> > sometimes then you can't use it, we can't put a test on it, there is
> > no point in it. Even when it doesn't fail on perf_event_open, does it
> > work for processes that start after /proc is scanned? No, it is
> > completely broken.
>
> Ok, we have a different definition for it. Let's ignore the imaginary
> users of the broken features. Have you added a test for this change?
>
> Anyway I've tested your change and found some issues:
>
> 1. It silently exited when BPF-skel is not built. Better to put some
> messages at least.
>
> $ sudo ./perf record -u $(id -u) -- sleep 1
>
> 2. Even with BPF-skel, perf record doesn't work well. It did something
> but failed to get sample data for some reason.
>
> $ sudo ./perf record -u $(id -u) -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.045 MB perf.data ]
>
> Oh, I think you now need to pass -a because it now works in
> system-wide mode and drops samples for other users.
This is a pre-existing problem, no?
Thanks,
Ian
> 3. With BPF-skel, non-root users will see this.
>
> $ ./perf record -u $(id -u) -- sleep 1
> cannot get fd for 'filters' map
> failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
>
> I think it's confusing and better to tell user that you need to run
> 'perf record --setup-filter pin' as root first. But maybe due to the
> issue #2, you still need to run it as root.
>
> Thanks,
> Namhyung
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-12 23:17 ` Ian Rogers
@ 2025-02-13 1:44 ` Namhyung Kim
2025-02-13 7:27 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-13 1:44 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 03:17:35PM -0800, Ian Rogers wrote:
> On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > It's not completely broken and works sometimes.
> > >
> > > No this is the definition of completely broken. If it only works
> > > sometimes then you can't use it, we can't put a test on it, there is
> > > no point in it. Even when it doesn't fail on perf_event_open, does it
> > > work for processes that start after /proc is scanned? No, it is
> > > completely broken.
> >
> > Ok, we have a different definition for it. Let's ignore the imaginary
> > users of the broken features. Have you added a test for this change?
> >
> > Anyway I've tested your change and found some issues:
> >
> > 1. It silently exited when BPF-skel is not built. Better to put some
> > messages at least.
> >
> > $ sudo ./perf record -u $(id -u) -- sleep 1
> >
> > 2. Even with BPF-skel, perf record doesn't work well. It did something
> > but failed to get sample data for some reason.
> >
> > $ sudo ./perf record -u $(id -u) -- sleep 1
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.045 MB perf.data ]
> >
> > Oh, I think you now need to pass -a because it now works in
> > system-wide mode and drops samples for other users.
>
> This is a pre-existing problem, no?
No, it worked without -a in the past. Please see my previous reply.
I think -u/--uid is one of the supported target in the perf tool (not
for the system call) and it used to disable system-wide mode if -u is
used at the same time.
Thanks,
Namhyung
>
> > 3. With BPF-skel, non-root users will see this.
> >
> > $ ./perf record -u $(id -u) -- sleep 1
> > cannot get fd for 'filters' map
> > failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
> >
> > I think it's confusing and better to tell user that you need to run
> > 'perf record --setup-filter pin' as root first. But maybe due to the
> > issue #2, you still need to run it as root.
> >
> > Thanks,
> > Namhyung
> >
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-13 1:44 ` Namhyung Kim
@ 2025-02-13 7:27 ` Ian Rogers
2025-02-13 17:47 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-13 7:27 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 5:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 03:17:35PM -0800, Ian Rogers wrote:
> > On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > > > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > It's not completely broken and works sometimes.
> > > >
> > > > No this is the definition of completely broken. If it only works
> > > > sometimes then you can't use it, we can't put a test on it, there is
> > > > no point in it. Even when it doesn't fail on perf_event_open, does it
> > > > work for processes that start after /proc is scanned? No, it is
> > > > completely broken.
> > >
> > > Ok, we have a different definition for it. Let's ignore the imaginary
> > > users of the broken features. Have you added a test for this change?
> > >
> > > Anyway I've tested your change and found some issues:
> > >
> > > 1. It silently exited when BPF-skel is not built. Better to put some
> > > messages at least.
> > >
> > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > >
> > > 2. Even with BPF-skel, perf record doesn't work well. It did something
> > > but failed to get sample data for some reason.
> > >
> > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.045 MB perf.data ]
> > >
> > > Oh, I think you now need to pass -a because it now works in
> > > system-wide mode and drops samples for other users.
> >
> > This is a pre-existing problem, no?
>
> No, it worked without -a in the past. Please see my previous reply.
> I think -u/--uid is one of the supported target in the perf tool (not
> for the system call) and it used to disable system-wide mode if -u is
> used at the same time.
As I keep repeating the '-u' option has never worked in the past, it
either early exited or missed recording new processes.
The documentation for `perf record` is:
```
...
-a, --all-cpus
System-wide collection from all CPUs (default if no target
is specified).
...
-u, --uid=
Record events in threads owned by uid. Name or number.
...
```
So `-a` is implied without a target/workload but you are specifying a
workload and `-u`. So do you want samples from the workload or from
the user's processes? I can tell from your command line being sleep
that you want it to imply `-a` but would that be true if it were a
benchmark? For the benchmark case you probably don't want `-a`
implied. If you specify `-p` and `-u` should the processes that don't
match -u be sampled or are you expecting implicit system wide
profiling now?
I agree the behavior in the patch series doesn't match the existing
behavior, but I'm not sure the existing behavior agrees with the
documentation nor working as expected. Having the `-u` not select
system wide, as in the patch, seems to agree with the documentation
better. You have specified a target workload/process/eventual pid and
you want samples owned by the uid, why should you now start getting
samples from other processes? With `-p` you wouldn't expect `-a` to
suddenly get implied, I'm not sure it makes any more sense with any
target/workload.
> > > 3. With BPF-skel, non-root users will see this.
> > >
> > > $ ./perf record -u $(id -u) -- sleep 1
> > > cannot get fd for 'filters' map
> > > failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
> > >
> > > I think it's confusing and better to tell user that you need to run
> > > 'perf record --setup-filter pin' as root first. But maybe due to the
> > > issue #2, you still need to run it as root.
I think that is an okay addition to BPF filters in general. I'm wary
of having patch series feature crept into requiring the entire tool
being reimplemented.
Thanks,
Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-13 7:27 ` Ian Rogers
@ 2025-02-13 17:47 ` Namhyung Kim
2025-02-13 18:13 ` Ian Rogers
0 siblings, 1 reply; 32+ messages in thread
From: Namhyung Kim @ 2025-02-13 17:47 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Wed, Feb 12, 2025 at 11:27:38PM -0800, Ian Rogers wrote:
> On Wed, Feb 12, 2025 at 5:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Feb 12, 2025 at 03:17:35PM -0800, Ian Rogers wrote:
> > > On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > > > > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > It's not completely broken and works sometimes.
> > > > >
> > > > > No this is the definition of completely broken. If it only works
> > > > > sometimes then you can't use it, we can't put a test on it, there is
> > > > > no point in it. Even when it doesn't fail on perf_event_open, does it
> > > > > work for processes that start after /proc is scanned? No, it is
> > > > > completely broken.
> > > >
> > > > Ok, we have a different definition for it. Let's ignore the imaginary
> > > > users of the broken features. Have you added a test for this change?
> > > >
> > > > Anyway I've tested your change and found some issues:
> > > >
> > > > 1. It silently exited when BPF-skel is not built. Better to put some
> > > > messages at least.
> > > >
> > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > >
> > > > 2. Even with BPF-skel, perf record doesn't work well. It did something
> > > > but failed to get sample data for some reason.
> > > >
> > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.045 MB perf.data ]
> > > >
> > > > Oh, I think you now need to pass -a because it now works in
> > > > system-wide mode and drops samples for other users.
> > >
> > > This is a pre-existing problem, no?
> >
> > No, it worked without -a in the past. Please see my previous reply.
> > I think -u/--uid is one of the supported target in the perf tool (not
> > for the system call) and it used to disable system-wide mode if -u is
> > used at the same time.
>
> As I keep repeating the '-u' option has never worked in the past, it
> either early exited or missed recording new processes.
I agree the early exit of existing process is a problem of '-u'. But
I'm not sure it would miss all new processes. IIUC '-u' is a shortcut
for '-p' with all processes belong to the user. That means it'll have
perf_event with attr.inherit set then it can track new processes from
them. Of course, there's a race between /proc scanning and process
creation so it will miss some. But even '-p' has the same race.
>
> The documentation for `perf record` is:
> ```
> ...
> -a, --all-cpus
> System-wide collection from all CPUs (default if no target
> is specified).
> ...
> -u, --uid=
> Record events in threads owned by uid. Name or number.
> ...
> ```
> So `-a` is implied without a target/workload but you are specifying a
> workload and `-u`. So do you want samples from the workload or from
> the user's processes? I can tell from your command line being sleep
> that you want it to imply `-a` but would that be true if it were a
> benchmark? For the benchmark case you probably don't want `-a`
> implied. If you specify `-p` and `-u` should the processes that don't
> match -u be sampled or are you expecting implicit system wide
> profiling now?
I don't follow. You cannot specify two targets at the same time. If
you use -a and -u together currently, you will see:
$ perf record -a -u $(id -u)
Warning:
UID switch overriding SYSTEM
And for -p and -u:
$ perf record -p 1 -u $(id -u)
Warning:
PID/TID switch overriding UID
And specifying a target and a workload means it will profile the target
while the workload is running.
$ perf record -u $(id -u) -- sleep 1
So the above command will profile all processes belong to me for 1
second. As you're doing system wide for 1 second with below command.
$ perf record -a -- sleep 1
You can change sleep with any other workload but it's not the target of
the profile unless it matches to the target specifically.
>
> I agree the behavior in the patch series doesn't match the existing
> behavior, but I'm not sure the existing behavior agrees with the
> documentation nor working as expected. Having the `-u` not select
> system wide, as in the patch, seems to agree with the documentation
> better. You have specified a target workload/process/eventual pid and
> you want samples owned by the uid, why should you now start getting
> samples from other processes? With `-p` you wouldn't expect `-a` to
> suddenly get implied, I'm not sure it makes any more sense with any
> target/workload.
As I said, the existing behavior doesn't imply system wide. But your
change now requires it.
Thanks,
Namhyung
>
> > > > 3. With BPF-skel, non-root users will see this.
> > > >
> > > > $ ./perf record -u $(id -u) -- sleep 1
> > > > cannot get fd for 'filters' map
> > > > failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
> > > >
> > > > I think it's confusing and better to tell user that you need to run
> > > > 'perf record --setup-filter pin' as root first. But maybe due to the
> > > > issue #2, you still need to run it as root.
>
> I think that is an okay addition to BPF filters in general. I'm wary
> of having patch series feature crept into requiring the entire tool
> being reimplemented.
>
> Thanks,
> Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-13 17:47 ` Namhyung Kim
@ 2025-02-13 18:13 ` Ian Rogers
2025-02-13 18:59 ` Namhyung Kim
0 siblings, 1 reply; 32+ messages in thread
From: Ian Rogers @ 2025-02-13 18:13 UTC (permalink / raw)
To: Namhyung Kim
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Thu, Feb 13, 2025 at 9:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 11:27:38PM -0800, Ian Rogers wrote:
> > On Wed, Feb 12, 2025 at 5:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, Feb 12, 2025 at 03:17:35PM -0800, Ian Rogers wrote:
> > > > On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > > > > > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > > It's not completely broken and works sometimes.
> > > > > >
> > > > > > No this is the definition of completely broken. If it only works
> > > > > > sometimes then you can't use it, we can't put a test on it, there is
> > > > > > no point in it. Even when it doesn't fail on perf_event_open, does it
> > > > > > work for processes that start after /proc is scanned? No, it is
> > > > > > completely broken.
> > > > >
> > > > > Ok, we have a different definition for it. Let's ignore the imaginary
> > > > > users of the broken features. Have you added a test for this change?
> > > > >
> > > > > Anyway I've tested your change and found some issues:
> > > > >
> > > > > 1. It silently exited when BPF-skel is not built. Better to put some
> > > > > messages at least.
> > > > >
> > > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > > >
> > > > > 2. Even with BPF-skel, perf record doesn't work well. It did something
> > > > > but failed to get sample data for some reason.
> > > > >
> > > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 0.045 MB perf.data ]
> > > > >
> > > > > Oh, I think you now need to pass -a because it now works in
> > > > > system-wide mode and drops samples for other users.
> > > >
> > > > This is a pre-existing problem, no?
> > >
> > > No, it worked without -a in the past. Please see my previous reply.
> > > I think -u/--uid is one of the supported target in the perf tool (not
> > > for the system call) and it used to disable system-wide mode if -u is
> > > used at the same time.
> >
> > As I keep repeating the '-u' option has never worked in the past, it
> > either early exited or missed recording new processes.
>
> I agree the early exit of existing process is a problem of '-u'. But
> I'm not sure it would miss all new processes. IIUC '-u' is a shortcut
> for '-p' with all processes belong to the user. That means it'll have
> perf_event with attr.inherit set then it can track new processes from
> them. Of course, there's a race between /proc scanning and process
> creation so it will miss some. But even '-p' has the same race.
>
> >
> > The documentation for `perf record` is:
> > ```
> > ...
> > -a, --all-cpus
> > System-wide collection from all CPUs (default if no target
> > is specified).
> > ...
> > -u, --uid=
> > Record events in threads owned by uid. Name or number.
> > ...
> > ```
> > So `-a` is implied without a target/workload but you are specifying a
> > workload and `-u`. So do you want samples from the workload or from
> > the user's processes? I can tell from your command line being sleep
> > that you want it to imply `-a` but would that be true if it were a
> > benchmark? For the benchmark case you probably don't want `-a`
> > implied. If you specify `-p` and `-u` should the processes that don't
> > match -u be sampled or are you expecting implicit system wide
> > profiling now?
>
> I don't follow. You cannot specify two targets at the same time. If
> you use -a and -u together currently, you will see:
>
> $ perf record -a -u $(id -u)
> Warning:
> UID switch overriding SYSTEM
>
> And for -p and -u:
>
> $ perf record -p 1 -u $(id -u)
> Warning:
> PID/TID switch overriding UID
>
> And specifying a target and a workload means it will profile the target
> while the workload is running.
>
> $ perf record -u $(id -u) -- sleep 1
>
> So the above command will profile all processes belong to me for 1
> second. As you're doing system wide for 1 second with below command.
>
> $ perf record -a -- sleep 1
>
> You can change sleep with any other workload but it's not the target of
> the profile unless it matches to the target specifically.
>
> >
> > I agree the behavior in the patch series doesn't match the existing
> > behavior, but I'm not sure the existing behavior agrees with the
> > documentation nor working as expected. Having the `-u` not select
> > system wide, as in the patch, seems to agree with the documentation
> > better. You have specified a target workload/process/eventual pid and
> > you want samples owned by the uid, why should you now start getting
> > samples from other processes? With `-p` you wouldn't expect `-a` to
> > suddenly get implied, I'm not sure it makes any more sense with any
> > target/workload.
>
> As I said, the existing behavior doesn't imply system wide. But your
> change now requires it.
The existing behavior doesn't work. If we can't put a test on it, it
doesn't work.
I'm asking what should the behavior be? You are asking to match
something that doesn't work, why? Pointing out warnings ignores my
question, you just seem to be out to prove you can find fault with the
patch series as-is.
The major use-case for `-u` I find being perf top and not perf record,
largely because if you are trying to do perf record you are less
tolerant to the brokeness that is the existing -u behavior. perf top
doesn't have a workload and system wide is implied.
Should the behavior be with -u to imply system wide or should it be a
filter to the user's other options or should we scan /proc and have
that brokenness again? The latter has been your expressed preference
as you don't see -u as broken.
Thanks,
Ian
>
> >
> > > > > 3. With BPF-skel, non-root users will see this.
> > > > >
> > > > > $ ./perf record -u $(id -u) -- sleep 1
> > > > > cannot get fd for 'filters' map
> > > > > failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
> > > > >
> > > > > I think it's confusing and better to tell user that you need to run
> > > > > 'perf record --setup-filter pin' as root first. But maybe due to the
> > > > > issue #2, you still need to run it as root.
> >
> > I think that is an okay addition to BPF filters in general. I'm wary
> > of having patch series feature crept into requiring the entire tool
> > being reimplemented.
> >
> > Thanks,
> > Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v1 00/10] Move uid filtering to BPF filters
2025-02-13 18:13 ` Ian Rogers
@ 2025-02-13 18:59 ` Namhyung Kim
0 siblings, 0 replies; 32+ messages in thread
From: Namhyung Kim @ 2025-02-13 18:59 UTC (permalink / raw)
To: Ian Rogers
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Kan Liang, Hao Ge, James Clark, Howard Chu, Dominique Martinet,
Levi Yun, Xu Yang, Tengda Wu, Yang Jihong, linux-perf-users,
linux-kernel
On Thu, Feb 13, 2025 at 10:13:57AM -0800, Ian Rogers wrote:
> On Thu, Feb 13, 2025 at 9:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Feb 12, 2025 at 11:27:38PM -0800, Ian Rogers wrote:
> > > On Wed, Feb 12, 2025 at 5:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 12, 2025 at 03:17:35PM -0800, Ian Rogers wrote:
> > > > > On Wed, Feb 12, 2025 at 2:56 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Wed, Feb 12, 2025 at 12:00:42PM -0800, Ian Rogers wrote:
> > > > > > > On Wed, Feb 12, 2025 at 10:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > > > It's not completely broken and works sometimes.
> > > > > > >
> > > > > > > No this is the definition of completely broken. If it only works
> > > > > > > sometimes then you can't use it, we can't put a test on it, there is
> > > > > > > no point in it. Even when it doesn't fail on perf_event_open, does it
> > > > > > > work for processes that start after /proc is scanned? No, it is
> > > > > > > completely broken.
> > > > > >
> > > > > > Ok, we have a different definition for it. Let's ignore the imaginary
> > > > > > users of the broken features. Have you added a test for this change?
> > > > > >
> > > > > > Anyway I've tested your change and found some issues:
> > > > > >
> > > > > > 1. It silently exited when BPF-skel is not built. Better to put some
> > > > > > messages at least.
> > > > > >
> > > > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > > > >
> > > > > > 2. Even with BPF-skel, perf record doesn't work well. It did something
> > > > > > but failed to get sample data for some reason.
> > > > > >
> > > > > > $ sudo ./perf record -u $(id -u) -- sleep 1
> > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > [ perf record: Captured and wrote 0.045 MB perf.data ]
> > > > > >
> > > > > > Oh, I think you now need to pass -a because it now works in
> > > > > > system-wide mode and drops samples for other users.
> > > > >
> > > > > This is a pre-existing problem, no?
> > > >
> > > > No, it worked without -a in the past. Please see my previous reply.
> > > > I think -u/--uid is one of the supported target in the perf tool (not
> > > > for the system call) and it used to disable system-wide mode if -u is
> > > > used at the same time.
> > >
> > > As I keep repeating the '-u' option has never worked in the past, it
> > > either early exited or missed recording new processes.
> >
> > I agree the early exit of existing process is a problem of '-u'. But
> > I'm not sure it would miss all new processes. IIUC '-u' is a shortcut
> > for '-p' with all processes belong to the user. That means it'll have
> > perf_event with attr.inherit set then it can track new processes from
> > them. Of course, there's a race between /proc scanning and process
> > creation so it will miss some. But even '-p' has the same race.
> >
> > >
> > > The documentation for `perf record` is:
> > > ```
> > > ...
> > > -a, --all-cpus
> > > System-wide collection from all CPUs (default if no target
> > > is specified).
> > > ...
> > > -u, --uid=
> > > Record events in threads owned by uid. Name or number.
> > > ...
> > > ```
> > > So `-a` is implied without a target/workload but you are specifying a
> > > workload and `-u`. So do you want samples from the workload or from
> > > the user's processes? I can tell from your command line being sleep
> > > that you want it to imply `-a` but would that be true if it were a
> > > benchmark? For the benchmark case you probably don't want `-a`
> > > implied. If you specify `-p` and `-u` should the processes that don't
> > > match -u be sampled or are you expecting implicit system wide
> > > profiling now?
> >
> > I don't follow. You cannot specify two targets at the same time. If
> > you use -a and -u together currently, you will see:
> >
> > $ perf record -a -u $(id -u)
> > Warning:
> > UID switch overriding SYSTEM
> >
> > And for -p and -u:
> >
> > $ perf record -p 1 -u $(id -u)
> > Warning:
> > PID/TID switch overriding UID
> >
> > And specifying a target and a workload means it will profile the target
> > while the workload is running.
> >
> > $ perf record -u $(id -u) -- sleep 1
> >
> > So the above command will profile all processes belong to me for 1
> > second. As you're doing system wide for 1 second with below command.
> >
> > $ perf record -a -- sleep 1
> >
> > You can change sleep with any other workload but it's not the target of
> > the profile unless it matches to the target specifically.
> >
> > >
> > > I agree the behavior in the patch series doesn't match the existing
> > > behavior, but I'm not sure the existing behavior agrees with the
> > > documentation nor working as expected. Having the `-u` not select
> > > system wide, as in the patch, seems to agree with the documentation
> > > better. You have specified a target workload/process/eventual pid and
> > > you want samples owned by the uid, why should you now start getting
> > > samples from other processes? With `-p` you wouldn't expect `-a` to
> > > suddenly get implied, I'm not sure it makes any more sense with any
> > > target/workload.
> >
> > As I said, the existing behavior doesn't imply system wide. But your
> > change now requires it.
>
> The existing behavior doesn't work. If we can't put a test on it, it
> doesn't work.
Ok, then I'd be happy to see a test case with this series.
>
> I'm asking what should the behavior be? You are asking to match
> something that doesn't work, why? Pointing out warnings ignores my
> question, you just seem to be out to prove you can find fault with the
> patch series as-is.
Sorry if you felt that way. I just pointed out the behavior changes in
the series. You can freely use BPF filters if the condition is met.
Otherwise it can fallback to the existing behavior or at least fail with
a message explaining the situation and hopefully how to fix it.
>
> The major use-case for `-u` I find being perf top and not perf record,
> largely because if you are trying to do perf record you are less
> tolerant to the brokeness that is the existing -u behavior. perf top
> doesn't have a workload and system wide is implied.
I agree it'd be useful to perf top more than perf record. And it will
be run as root mostly so it should be good to go. But that doesn't mean
you can leave perf record in a broken state.
>
> Should the behavior be with -u to imply system wide or should it be a
> filter to the user's other options or should we scan /proc and have
> that brokenness again? The latter has been your expressed preference
> as you don't see -u as broken.
I think it's natural to imply system-wide. And as I pointed out, it's
not (used to be) possible to mix -u with other target options.
But it now requires higher privilege and/or perf_event_paranoid.
Thanks,
Namhyung
> >
> > >
> > > > > > 3. With BPF-skel, non-root users will see this.
> > > > > >
> > > > > > $ ./perf record -u $(id -u) -- sleep 1
> > > > > > cannot get fd for 'filters' map
> > > > > > failed to set filter "BPF" on event cycles:P with 13 (Permission denied)
> > > > > >
> > > > > > I think it's confusing and better to tell user that you need to run
> > > > > > 'perf record --setup-filter pin' as root first. But maybe due to the
> > > > > > issue #2, you still need to run it as root.
> > >
> > > I think that is an okay addition to BPF filters in general. I'm wary
> > > of having patch series feature crept into requiring the entire tool
> > > being reimplemented.
> > >
> > > Thanks,
> > > Ian
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-02-13 18:59 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
2025-02-12 14:17 ` Arnaldo Carvalho de Melo
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
2025-02-12 14:51 ` Arnaldo Carvalho de Melo
2025-02-12 16:11 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 03/10] perf target: Separate parse_uid into its own function Ian Rogers
2025-01-11 19:01 ` [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper Ian Rogers
2025-01-11 19:01 ` [PATCH v1 05/10] perf record: Switch user option to use BPF filter Ian Rogers
2025-01-11 19:01 ` [PATCH v1 06/10] perf top: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 07/10] perf trace: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 08/10] perf bench evlist-open-close: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 09/10] perf target: Remove uid from target Ian Rogers
2025-01-11 19:01 ` [PATCH v1 10/10] perf thread_map: Remove uid options Ian Rogers
2025-02-10 18:18 ` [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-02-10 19:59 ` Namhyung Kim
2025-02-10 22:06 ` Ian Rogers
2025-02-11 3:12 ` Namhyung Kim
2025-02-11 4:40 ` Ian Rogers
2025-02-11 17:51 ` Namhyung Kim
2025-02-11 18:06 ` Ian Rogers
2025-02-12 1:51 ` Namhyung Kim
2025-02-12 5:41 ` Ian Rogers
2025-02-12 18:46 ` Namhyung Kim
2025-02-12 20:00 ` Ian Rogers
2025-02-12 22:56 ` Namhyung Kim
2025-02-12 23:17 ` Ian Rogers
2025-02-13 1:44 ` Namhyung Kim
2025-02-13 7:27 ` Ian Rogers
2025-02-13 17:47 ` Namhyung Kim
2025-02-13 18:13 ` Ian Rogers
2025-02-13 18:59 ` Namhyung Kim
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).