From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Kan Liang <kan.liang@linux.intel.com>, Hao Ge <gehao@kylinos.cn>,
James Clark <james.clark@linaro.org>,
Howard Chu <howardchu95@gmail.com>,
Dominique Martinet <asmadeus@codewreck.org>,
Levi Yun <yeoreum.yun@arm.com>, Xu Yang <xu.yang_2@nxp.com>,
Tengda Wu <wutengda@huaweicloud.com>,
Yang Jihong <yangjihong1@huawei.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 00/10] Move uid filtering to BPF filters
Date: Tue, 11 Feb 2025 09:51:20 -0800 [thread overview]
Message-ID: <Z6uOGNO6p7i9Fese@google.com> (raw)
In-Reply-To: <CAP-5=fXjmJ+Rr7ejL6fCMeu6PZSit7n84hQkjh=0jQhkr1on3Q@mail.gmail.com>
On Mon, Feb 10, 2025 at 08:40:01PM -0800, Ian Rogers wrote:
> On Mon, Feb 10, 2025 at 7:12 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> > > On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > > > racy as the PID may exit before the perf_event_open causing perf to
> > > > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > > > helper for commands that support UID filtering and wire up. Remove the
> > > > > non-BPF UID filtering support.
> > > >
> > > > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > > > Also non-root users will be limited unless it pinned the BPF program in
> > > > advance.
> > > >
> > > > I think you can keep the original behavior and convert to BPF only when
> > > > it's available.
> > >
> > > Using BPF when available would be limited progress. The issues I have
> > > with not removing the existing approach are:
> > >
> > > 1) It is broken
> > > Scanning /proc for pids and then doing perf_event_open means that any
> > > time a process exits the perf_event_open fails.
> > > Steps to reproduce:
> > > This bug reproduces easily but if your machine is lightly loaded in
> > > one terminal run `perf test`, in another terminal run `sudo perf top
> > > -u $(id -u)` the perf top command will exit with:
> > > ```
> > > Error:
> > > The sys_perf_event_open() syscall returned with 3 (No such process)
> > > for event (cycles:P).
> > > /bin/dmesg | grep -i perf may provide additional information.
> > > ```
> > >
> > > 2) It is a work in progress that isn't progressing. Scanning /proc
> > > will only tell you about started processes; it won't tell you about
> > > processes that start during the profiling run, whether it be perf top
> > > or perf record. Extra work would be necessary to make this most basic
> > > of use-cases work, perhaps you could use tracepoints to capture
> > > starting processes and then enable user profiling on those processes.
> > > It would be horribly complicated, suffer from delays between observing
> > > things happen and doing the perf_event_open, biases in the samples,
> > > etc. I don't expect anyone to do it, especially when BPF filtering
> > > already solves the problem better. There have been 13 years that
> > > someone could have fixed it.
> > >
> > > 3) it adds significant useless complexity to the code base. Having
> > > 'user' in the target makes it look like perf_event_open can work on a
> > > pid, system wide or user basis. The user basis doesn't exist so the
> > > majority of the code base is just ignoring it - search for users of
> > > uid_str on target. Those that do run into problems (1) and (2).
> > >
> > > 4) It is redundant and leads to confusion with BPF filtering. Having
> > > the BPF filter on evsels is non-zero cost in terms of the code base
> > > complexity, but it is something broadly useful. As user filtering has
> > > never worked (see above) it isn't broadly used but is adding
> > > complexity. If both approaches were wanted then it is unclear what the
> > > code should be doing, presumably the mish-mash of BPF filtering and
> > > /proc scanning that happens today and will be broken due to (1) and
> > > (2). Putting everything into the BPF filter makes sense as you can
> > > combine a BPF filter with an additional BPF filter on user.
> > >
> > > 5) It is untested and adding a test leads to an always broken test due
> > > to testing being an example of where things break, see point 1 and its
> > > example.
> > >
> > > 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> > > previous feature, maintaining it for no users is overhead. Let me know
> > > if someone is using this option (I doubt it given points 1 and 2) and
> > > they wouldn't be better served by BPF. People building perf today have
> > > to explicitly opt-out of wanting BPF in their tooling. I posted this
> > > change a month ago and nobody has jumped up saying please don't remove
> > > the old approach.
> > >
> > > 7) The interaction with this feature and changes in behavior, say
> > > ignoring events that fail to open, is non-obvious and not testable as
> > > testing would be broken as the functionality itself is broken. Having
> > > the broken feature hanging around and being untestable means that we
> > > slow progress on new features, testing and other improvements.
> > >
> > > Yes, we could try to fix all of that but why? Nobody has cared or
> > > tried in 13 years and I would like the tech debt off our plate with a
> > > better approach in its place.
> >
> > Thanks for writing this up. I agree BPF approach is better but it has
> > its own limitation - basically it requires root. And I know a few of
> > people who don't use BPF. :) And maybe we need to check if user passes
> > filters to the event already.
>
> I thought you were working on making the BPF filters pin-able? So root
> could enable the filter but then users have access to it.
Right, 'perf record --setup-filter pin' would do the job. But it has to
be run in advance.
> You have to be root to be looking at other users in any case.
That's true. But at least you can profile your processes. :)
>
> > Also, I admit that I don't know who actually uses this. But I can say
> > sometimes people uses tools in a creative way. Anyway, I can imagine
> > an use case that system is in a steady state and each user has dedicated
> > jobs. Then scanning /proc would work well.
>
> Another one for Google's tree then.
Any chance you update the patchset to retain the old behavior and use
BPF only if available?
Thanks,
Namhyung
next prev parent reply other threads:[~2025-02-11 17:51 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
2025-02-12 14:17 ` Arnaldo Carvalho de Melo
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
2025-02-12 14:51 ` Arnaldo Carvalho de Melo
2025-02-12 16:11 ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 03/10] perf target: Separate parse_uid into its own function Ian Rogers
2025-01-11 19:01 ` [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper Ian Rogers
2025-01-11 19:01 ` [PATCH v1 05/10] perf record: Switch user option to use BPF filter Ian Rogers
2025-01-11 19:01 ` [PATCH v1 06/10] perf top: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 07/10] perf trace: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 08/10] perf bench evlist-open-close: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 09/10] perf target: Remove uid from target Ian Rogers
2025-01-11 19:01 ` [PATCH v1 10/10] perf thread_map: Remove uid options Ian Rogers
2025-02-10 18:18 ` [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-02-10 19:59 ` Namhyung Kim
2025-02-10 22:06 ` Ian Rogers
2025-02-11 3:12 ` Namhyung Kim
2025-02-11 4:40 ` Ian Rogers
2025-02-11 17:51 ` Namhyung Kim [this message]
2025-02-11 18:06 ` Ian Rogers
2025-02-12 1:51 ` Namhyung Kim
2025-02-12 5:41 ` Ian Rogers
2025-02-12 18:46 ` Namhyung Kim
2025-02-12 20:00 ` Ian Rogers
2025-02-12 22:56 ` Namhyung Kim
2025-02-12 23:17 ` Ian Rogers
2025-02-13 1:44 ` Namhyung Kim
2025-02-13 7:27 ` Ian Rogers
2025-02-13 17:47 ` Namhyung Kim
2025-02-13 18:13 ` Ian Rogers
2025-02-13 18:59 ` Namhyung Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6uOGNO6p7i9Fese@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=asmadeus@codewreck.org \
--cc=gehao@kylinos.cn \
--cc=howardchu95@gmail.com \
--cc=irogers@google.com \
--cc=james.clark@linaro.org \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=wutengda@huaweicloud.com \
--cc=xu.yang_2@nxp.com \
--cc=yangjihong1@huawei.com \
--cc=yeoreum.yun@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).