linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Michael Petlan <mpetlan@redhat.com>,
	Ian Rogers <irogers@google.com>,
	"linux-perf-use." <linux-perf-users@vger.kernel.org>,
	Christy Lee <christylee@fb.com>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH 1/3] perf/bpf: Remove prologue generation
Date: Sun, 13 Feb 2022 16:02:49 +0100	[thread overview]
Message-ID: <YgkdmQVN1XSuFDWM@krava> (raw)
In-Reply-To: <CAEf4Bza9a5_U2U9ZK_su4VGSA91EMNouipk=PFudGqkN_iGsPg@mail.gmail.com>

On Thu, Feb 10, 2022 at 09:28:51PM -0800, Andrii Nakryiko wrote:
> On Thu, Feb 10, 2022 at 1:31 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Feb 10, 2022 at 04:18:10PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Sun, Jan 23, 2022 at 11:19:30PM +0100, Jiri Olsa escreveu:
> > > > Removing code for ebpf program prologue generation.
> > > >
> > > > The prologue code was used to get data for extra arguments specified
> > > > in program section name, like:
> > > >
> > > >   SEC("lock_page=__lock_page page->flags")
> > > >   int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
> > > >   {
> > > >          return 1;
> > > >   }
> > > >
> > > > This code is using deprecated libbpf API and blocks its removal.
> > > >
> > > > This feature was not documented and broken for some time without
> > > > anyone complaining, also original authors are not responding,
> > > > so I'm removing it.
> > >
> > > So, the example below breaks, how hard would be to move the deprecated
> > > APIs to perf like was done in some other cases?
> > >
> 
> Just copy/pasting libbpf code won't work. But there are three parts:
> 
> 1. bpf_(program|map|object)__set_priv(). There is no equivalent API,
> but perf can maintain this private data by building hashmap where the
> key is bpf_object/bpf_map/bpf_program pointer itself. Annoying but
> very straightforward to replace.
> 
> 2. For prologue generation, bpf_program__set_prep() doesn't have a
> direct equivalent. But program cloning and adjustment of the code can
> be achieved through bpf_program__insns()/bpf_program__insn_cnt() API
> to load one "prototype" program, gets its underlying insns and clone
> programs as necessary. After that, bpf_prog_load() would be used to
> load those cloned programs into kernel.

hm, I can't see how to clone a program.. so we need to end up with
several copies of the single program defined in the object.. I can
get its intructions with bpf_program__insns, but how do I add more
programs with these instructions customized/prefixed?

thanks,
jirka

> 
> 3. Those *very* custom SEC() definitions will be possible for perf to
> handle once [0] lands (I'll send new revision tomorrow, probably).
> You'll be able to register your own "fallback" handler with
> libbpf_register_prog_handler(NULL, ...).
> 
>   [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=611491&state=*
> 
> Sorry, it's not a straightforward copy/paste, but I hope this helps a bit.
> 
> > > - Arnaldo
> > >
> > > Before:
> > >
> > > [root@quaco perf]# cat tools/perf/examples/bpf/5sec.c
> > > // SPDX-License-Identifier: GPL-2.0
> > > /*
> > >     Description:
> > >
> > >     . Disable strace like syscall tracing (--no-syscalls), or try tracing
> > >       just some (-e *sleep).
> > >
> > >     . Attach a filter function to a kernel function, returning when it should
> > >       be considered, i.e. appear on the output.
> > >
> > >     . Run it system wide, so that any sleep of >= 5 seconds and < than 6
> > >       seconds gets caught.
> > >
> > >     . Ask for callgraphs using DWARF info, so that userspace can be unwound
> > >
> > >     . While this is running, run something like "sleep 5s".
> > >
> > >     . If we decide to add tv_nsec as well, then it becomes:
> > >
> > >       int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec)
> > >
> > >       I.e. add where it comes from (rqtp->tv_nsec) and where it will be
> > >       accessible in the function body (nsec)
> > >
> > >     # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/
> > >          0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5
> > >                                            hrtimer_nanosleep ([kernel.kallsyms])
> > >                                            __x64_sys_nanosleep ([kernel.kallsyms])
> > >                                            do_syscall_64 ([kernel.kallsyms])
> > >                                            entry_SYSCALL_64 ([kernel.kallsyms])
> > >                                            __GI___nanosleep (/usr/lib64/libc-2.26.so)
> > >                                            rpl_nanosleep (/usr/bin/sleep)
> > >                                            xnanosleep (/usr/bin/sleep)
> > >                                            main (/usr/bin/sleep)
> > >                                            __libc_start_main (/usr/lib64/libc-2.26.so)
> > >                                            _start (/usr/bin/sleep)
> > >     ^C#
> > >
> > >    Copyright (C) 2018 Red Hat, Inc., Arnaldo Carvalho de Melo <acme@redhat.com>
> > > */
> > >
> > > #include <bpf.h>
> > >
> > > #define NSEC_PER_SEC  1000000000L
> > >
> > > int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> > > {
> > >       return sec / NSEC_PER_SEC == 5ULL;
> > > }
> >
> > that sucks ;-) I'll check if we can re-implement as we discussed earlier,
> > however below is workaround how to do it without the prologue support
> >
> > jirka
> >
> >
> > ---
> > diff --git a/tools/perf/examples/bpf/5sec.c b/tools/perf/examples/bpf/5sec.c
> > index e6b6181c6dc6..734d39debdb8 100644
> > --- a/tools/perf/examples/bpf/5sec.c
> > +++ b/tools/perf/examples/bpf/5sec.c
> > @@ -43,9 +43,17 @@
> >
> >  #define NSEC_PER_SEC   1000000000L
> >
> > -int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> > +struct pt_regs {
> > +        long di;
> > +};
> > +
> > +SEC("function=hrtimer_nanosleep")
> > +int krava(struct pt_regs *ctx)
> >  {
> > -       return sec / NSEC_PER_SEC == 5ULL;
> > +       unsigned long arg;
> > +
> > +       probe_read_kernel(&arg, sizeof(arg), __builtin_preserve_access_index(&ctx->di));
> > +       return arg / NSEC_PER_SEC == 5ULL;
> >  }
> >
> >  license(GPL);
> > diff --git a/tools/perf/include/bpf/bpf.h b/tools/perf/include/bpf/bpf.h
> > index b422aeef5339..b7d6d2fc8342 100644
> > --- a/tools/perf/include/bpf/bpf.h
> > +++ b/tools/perf/include/bpf/bpf.h
> > @@ -64,6 +64,7 @@ int _version SEC("version") = LINUX_VERSION_CODE;
> >
> >  static int (*probe_read)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read;
> >  static int (*probe_read_str)(void *dst, int size, const void *unsafe_addr) = (void *)BPF_FUNC_probe_read_str;
> > +static long (*probe_read_kernel)(void *dst, __u32 size, const void *unsafe_ptr) = (void *) BPF_FUNC_probe_read_kernel;
> >
> >  static int (*perf_event_output)(void *, struct bpf_map *, int, void *, unsigned long) = (void *)BPF_FUNC_perf_event_output;
> >
> > diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
> > index 96c8ef60f4f8..9274a3373847 100644
> > --- a/tools/perf/util/llvm-utils.c
> > +++ b/tools/perf/util/llvm-utils.c
> > @@ -25,7 +25,7 @@
> >                 "$CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS " \
> >                 "-Wno-unused-value -Wno-pointer-sign "          \
> >                 "-working-directory $WORKING_DIR "              \
> > -               "-c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
> > +               "-g -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE"
> >
> >  struct llvm_param llvm_param = {
> >         .clang_path = "clang",

  reply	other threads:[~2022-02-13 15:02 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-23 22:19 [PATCH 1/3] perf/bpf: Remove prologue generation Jiri Olsa
2022-01-23 22:19 ` [PATCH 2/3] perf/bpf: Remove special bpf config support Jiri Olsa
2022-01-23 22:19 ` [PATCH 3/3] perf tests: Remove bpf prologue generation test Jiri Olsa
2022-01-24 20:24 ` [PATCH 1/3] perf/bpf: Remove prologue generation Andrii Nakryiko
2022-02-02  1:01   ` Andrii Nakryiko
2022-02-02 10:08     ` Arnaldo Carvalho de Melo
2022-02-02 17:21       ` Andrii Nakryiko
2022-02-10 19:18 ` Arnaldo Carvalho de Melo
2022-02-10 21:30   ` Jiri Olsa
2022-02-11  5:28     ` Andrii Nakryiko
2022-02-13 15:02       ` Jiri Olsa [this message]
2022-02-14  5:57         ` Andrii Nakryiko
2022-02-14  8:33           ` Jiri Olsa
2022-02-14  6:02     ` Andrii Nakryiko
2022-02-14  8:16       ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgkdmQVN1XSuFDWM@krava \
    --to=olsajiri@gmail.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christylee@fb.com \
    --cc=irogers@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=mpetlan@redhat.com \
    --cc=namhyung@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).