Re: [PATCH v1 00/10] Move uid filtering to BPF filters

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>, Hao Ge <gehao@kylinos.cn>,
	James Clark <james.clark@linaro.org>,
	Howard Chu <howardchu95@gmail.com>,
	Dominique Martinet <asmadeus@codewreck.org>,
	Levi Yun <yeoreum.yun@arm.com>, Xu Yang <xu.yang_2@nxp.com>,
	Tengda Wu <wutengda@huaweicloud.com>,
	Yang Jihong <yangjihong1@huawei.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 00/10] Move uid filtering to BPF filters
Date: Mon, 10 Feb 2025 19:12:30 -0800	[thread overview]
Message-ID: <Z6rAHhAIdlkAryGJ@google.com> (raw)
In-Reply-To: <CAP-5=fUY83gifsMZA0Q45kiQQbAKp2uJxkuwrwGtHK2hiUFRDA@mail.gmail.com>

On Mon, Feb 10, 2025 at 02:06:18PM -0800, Ian Rogers wrote:
> On Mon, Feb 10, 2025 at 11:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Sat, Jan 11, 2025 at 11:01:33AM -0800, Ian Rogers wrote:
> > > Rather than scanning /proc and skipping PIDs based on their UIDs, use
> > > BPF filters for uid filtering. The /proc scanning in thread_map is
> > > racy as the PID may exit before the perf_event_open causing perf to
> > > abort. BPF UID filters are more robust as they avoid the race. Add a
> > > helper for commands that support UID filtering and wire up. Remove the
> > > non-BPF UID filtering support.
> >
> > Hmm.. then non-BPF build cannot use the UID filtering anymore, right?
> > Also non-root users will be limited unless it pinned the BPF program in
> > advance.
> >
> > I think you can keep the original behavior and convert to BPF only when
> > it's available.
> 
> Using BPF when available would be limited progress. The issues I have
> with not removing the existing approach are:
> 
> 1) It is broken
> Scanning /proc for pids and then doing perf_event_open means that any
> time a process exits the perf_event_open fails.
> Steps to reproduce:
> This bug reproduces easily but if your machine is lightly loaded in
> one terminal run `perf test`, in another terminal run `sudo perf top
> -u $(id -u)` the perf top command will exit with:
> ```
> Error:
> The sys_perf_event_open() syscall returned with 3 (No such process)
> for event (cycles:P).
> /bin/dmesg | grep -i perf may provide additional information.
> ```
> 
> 2) It is a work in progress that isn't progressing. Scanning /proc
> will only tell you about started processes; it won't tell you about
> processes that start during the profiling run, whether it be perf top
> or perf record. Extra work would be necessary to make this most basic
> of use-cases work, perhaps you could use tracepoints to capture
> starting processes and then enable user profiling on those processes.
> It would be horribly complicated, suffer from delays between observing
> things happen and doing the perf_event_open, biases in the samples,
> etc. I don't expect anyone to do it, especially when BPF filtering
> already solves the problem better. There have been 13 years that
> someone could have fixed it.
> 
> 3) it adds significant useless complexity to the code base. Having
> 'user' in the target makes it look like perf_event_open can work on a
> pid, system wide or user basis. The user basis doesn't exist so the
> majority of the code base is just ignoring it - search for users of
> uid_str on target. Those that do run into problems (1) and (2).
> 
> 4) It is redundant and leads to confusion with BPF filtering. Having
> the BPF filter on evsels is non-zero cost in terms of the code base
> complexity, but it is something broadly useful. As user filtering has
> never worked (see above) it isn't broadly used but is adding
> complexity. If both approaches were wanted then it is unclear what the
> code should be doing, presumably the mish-mash of BPF filtering and
> /proc scanning that happens today and will be broken due to (1) and
> (2). Putting everything into the BPF filter makes sense as you can
> combine a BPF filter with an additional BPF filter on user.
> 
> 5) It is untested and adding a test leads to an always broken test due
> to testing being an example of where things break, see point 1 and its
> example.
> 
> 6) Nobody wants the non-BPF approach. As it was broken nobody used the
> previous feature, maintaining it for no users is overhead. Let me know
> if someone is using this option (I doubt it given points 1 and 2) and
> they wouldn't be better served by BPF. People building perf today have
> to explicitly opt-out of wanting BPF in their tooling. I posted this
> change a month ago and nobody has jumped up saying please don't remove
> the old approach.
> 
> 7) The interaction with this feature and changes in behavior, say
> ignoring events that fail to open, is non-obvious and not testable as
> testing would be broken as the functionality itself is broken. Having
> the broken feature hanging around and being untestable means that we
> slow progress on new features, testing and other improvements.
> 
> Yes, we could try to fix all of that but why? Nobody has cared or
> tried in 13 years and I would like the tech debt off our plate with a
> better approach in its place.

Thanks for writing this up.  I agree BPF approach is better but it has
its own limitation - basically it requires root.  And I know a few of
people who don't use BPF. :)  And maybe we need to check if user passes
filters to the event already.

Also, I admit that I don't know who actually uses this.  But I can say
sometimes people uses tools in a creative way.  Anyway, I can imagine
an use case that system is in a steady state and each user has dedicated
jobs.  Then scanning /proc would work well.

Thanks,
Namhyung

next prev parent reply	other threads:[~2025-02-11  3:12 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-11 19:01 [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-01-11 19:01 ` [PATCH v1 01/10] perf bench evlist-open-close: Reduce scope of 2 variables Ian Rogers
2025-02-12 14:17   ` Arnaldo Carvalho de Melo
2025-01-11 19:01 ` [PATCH v1 02/10] perf parse-events filter: Use evsel__find_pmu Ian Rogers
2025-02-12 14:51   ` Arnaldo Carvalho de Melo
2025-02-12 16:11     ` Ian Rogers
2025-01-11 19:01 ` [PATCH v1 03/10] perf target: Separate parse_uid into its own function Ian Rogers
2025-01-11 19:01 ` [PATCH v1 04/10] perf parse-events: Add parse_uid_filter helper Ian Rogers
2025-01-11 19:01 ` [PATCH v1 05/10] perf record: Switch user option to use BPF filter Ian Rogers
2025-01-11 19:01 ` [PATCH v1 06/10] perf top: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 07/10] perf trace: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 08/10] perf bench evlist-open-close: " Ian Rogers
2025-01-11 19:01 ` [PATCH v1 09/10] perf target: Remove uid from target Ian Rogers
2025-01-11 19:01 ` [PATCH v1 10/10] perf thread_map: Remove uid options Ian Rogers
2025-02-10 18:18 ` [PATCH v1 00/10] Move uid filtering to BPF filters Ian Rogers
2025-02-10 19:59 ` Namhyung Kim
2025-02-10 22:06   ` Ian Rogers
2025-02-11  3:12     ` Namhyung Kim [this message]
2025-02-11  4:40       ` Ian Rogers
2025-02-11 17:51         ` Namhyung Kim
2025-02-11 18:06           ` Ian Rogers
2025-02-12  1:51             ` Namhyung Kim
2025-02-12  5:41               ` Ian Rogers
2025-02-12 18:46                 ` Namhyung Kim
2025-02-12 20:00                   ` Ian Rogers
2025-02-12 22:56                     ` Namhyung Kim
2025-02-12 23:17                       ` Ian Rogers
2025-02-13  1:44                         ` Namhyung Kim
2025-02-13  7:27                           ` Ian Rogers
2025-02-13 17:47                             ` Namhyung Kim
2025-02-13 18:13                               ` Ian Rogers
2025-02-13 18:59                                 ` Namhyung Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6rAHhAIdlkAryGJ@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=asmadeus@codewreck.org \
    --cc=gehao@kylinos.cn \
    --cc=howardchu95@gmail.com \
    --cc=irogers@google.com \
    --cc=james.clark@linaro.org \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=wutengda@huaweicloud.com \
    --cc=xu.yang_2@nxp.com \
    --cc=yangjihong1@huawei.com \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox