public inbox for netdev@vger.kernel.org
From: "Daniel Xu" <dxu@dxuuu.xyz>
To: "Andrii Nakryiko" <andrii.nakryiko@gmail.com>
Cc: "Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Yonghong Song" <yhs@fb.com>, "Martin Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"Networking" <netdev@vger.kernel.org>,
	"bpf" <bpf@vger.kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"open list" <linux-kernel@vger.kernel.org>,
	"Kernel Team" <kernel-team@fb.com>
Subject: Re: [PATCH bpf] bpf: Add LBR data to BPF_PROG_TYPE_PERF_EVENT prog context
Date: Mon, 16 Dec 2019 11:29:03 -0800
Message-ID: <BZ73B4ZKTVZP.2ME64EWPJ01T8@dlxu-fedora-R90QNFJV>
In-Reply-To: <CAEf4BzY-ahRm5HPrqRWF5seOjGM+PJs+J+DTbuws3r=jd_PArg@mail.gmail.com>

On Fri Dec 6, 2019 at 9:10 AM, Andrii Nakryiko wrote:
> On Thu, Dec 5, 2019 at 4:13 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > Last branch record (LBR) is an Intel CPU feature that can be configured
> > to record certain branches that are taken during code execution. This
> > data is particularly interesting for profile-guided optimization (PGO).
> > perf has had LBR support for a while, but its data collection can be a
> > bit coarse-grained.
> >
> > We (Facebook) have recently run a lot of experiments with feeding
> > filtered LBR data to various PGO pipelines. We've seen really good
> > results (+2.5% throughput with lower CPU utilization and lower latency)
> > by feeding LBR branches from high-latency requests to the compiler on a
> > request-oriented service. We used bpf to read a special request context
> > ID (which is how we associate branches with latency) from a fixed
> > userspace address. Reading from the fixed address is why bpf support is
> > useful.
> >
> > Aside from this particular use case, having LBR data available to bpf
> > progs can be useful to get stack traces out of userspace applications
> > that omit frame pointers.
> >
> > This patch adds support for LBR data to bpf perf progs.
> >
> > Some notes:
> > * We use `__u64 entries[BPF_MAX_LBR_ENTRIES * 3]` instead of
> >   `struct perf_branch_entry[BPF_MAX_LBR_ENTRIES]` because checkpatch.pl
> >   warns about including a uapi header from another uapi header.
> >
> > * We define BPF_MAX_LBR_ENTRIES as 32 (instead of using the value from
> >   arch/x86/events/perf_event.h) because including arch-specific headers
> >   seems wrong and could introduce circular header includes.
> >
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> >  include/uapi/linux/bpf_perf_event.h |  5 ++++
> >  kernel/trace/bpf_trace.c            | 39 +++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
> > index eb1b9d21250c..dc87e3d50390 100644
> > --- a/include/uapi/linux/bpf_perf_event.h
> > +++ b/include/uapi/linux/bpf_perf_event.h
> > @@ -10,10 +10,15 @@
> >
> >  #include <asm/bpf_perf_event.h>
> >
> > +#define BPF_MAX_LBR_ENTRIES 32
> > +
> >  struct bpf_perf_event_data {
> >         bpf_user_pt_regs_t regs;
> >         __u64 sample_period;
> >         __u64 addr;
> > +       __u64 nr_lbr;
> > +       /* Cast to struct perf_branch_entry* before using */
> > +       __u64 entries[BPF_MAX_LBR_ENTRIES * 3];
> >  };
> >
>
> I wonder if instead of hard-coding this in bpf_perf_event_data, could
> we achieve this and perhaps even more flexibility by letting users
> access underlying bpf_perf_event_data_kern and use CO-RE to read
> whatever needs to be read from perf_sample_data, perf_event, etc?
> Would that work?
>
> >  #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ffc91d4935ac..96ba7995b3d7 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
>
> [...]

Sorry about the late response. I chatted with Andrii last week and spent
some time playing with alternatives. It turns out we can read LBR data
by casting the bpf_perf_event_data ctx to the internal kernel data
structure (struct bpf_perf_event_data_kern) and doing some well-placed
bpf_probe_read() calls.
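
A rough userspace sketch of that cast-and-read pattern follows. Everything
in it is an assumption made for illustration: the struct definitions are
trimmed stand-ins for the kernel's bpf_perf_event_data_kern,
perf_sample_data, perf_branch_stack and perf_branch_entry, and probe_read()
is a memcpy-based placeholder for bpf_probe_read(). It is not the code that
was actually used, only the shape of the pointer walk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed stand-ins for kernel-internal types (assumed layouts, reduced to
 * the fields this sketch touches; the real ones live in kernel headers). */
struct branch_entry {           /* stands in for struct perf_branch_entry */
	uint64_t from;
	uint64_t to;
	uint64_t flags;         /* a bitfield (mispred, predicted, ...) in the kernel */
};

/* Three u64s per entry, which is where the patch's "* 3" comes from. */
static_assert(sizeof(struct branch_entry) == 3 * sizeof(uint64_t),
	      "branch entry is three u64s");

struct branch_stack {           /* stands in for struct perf_branch_stack */
	uint64_t nr;
	struct branch_entry entries[];
};

struct sample_data {            /* stands in for struct perf_sample_data */
	struct branch_stack *br_stack;
};

struct ctx_kern {               /* stands in for struct bpf_perf_event_data_kern */
	void *regs;
	struct sample_data *data;
	void *event;
};

/* Placeholder for bpf_probe_read(dst, size, src); always succeeds here. */
static int probe_read(void *dst, uint32_t size, const void *src)
{
	memcpy(dst, src, size);
	return 0;
}

#define MAX_LBR 32

/* Walk ctx -> data -> br_stack with one guarded read per pointer hop.
 * Returns the number of entries copied into out, or -1 on failure. */
static int read_lbr(const void *ctx, struct branch_entry *out, uint32_t max)
{
	const struct ctx_kern *kctx = ctx;      /* the cast */
	struct sample_data *data;
	struct branch_stack *stack;
	uint64_t nr;

	if (probe_read(&data, sizeof(data), &kctx->data) || !data)
		return -1;
	if (probe_read(&stack, sizeof(stack), &data->br_stack) || !stack)
		return -1;              /* no branch stack in this sample */
	if (probe_read(&nr, sizeof(nr), &stack->nr))
		return -1;
	if (nr > max)
		nr = max;               /* clamp to the caller's buffer */
	if (probe_read(out, (uint32_t)(nr * sizeof(*out)), stack->entries))
		return -1;
	return (int)nr;
}
```

In a real BPF program the struct definitions would come from kernel headers
and each probe_read() would be a bpf_probe_read(); the one-read-per-hop
structure and the clamp on nr are the parts that carry over.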

Unless someone else thinks this patch would be useful, I will probably
abandon it for now, at least until we experience enough pain from doing
these casts. If I did a v2, I would probably add a bpf helper instead of
modifying the ctx, to get around the ugly API limitations.
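
To make the contrast with a fixed-size ctx array concrete, here is a
userspace sketch of what a helper-shaped interface could look like. The
name lbr_read_records(), its arguments, its return convention and the
lbr_source type are all hypothetical and are not taken from any patch; the
only point is that the caller chooses the buffer size, so no maximum entry
count has to be baked into a uapi struct:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: three u64s per branch, as in the patch. */
struct lbr_record {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

static_assert(sizeof(struct lbr_record) == 24, "three u64s per record");

/* Hypothetical stand-in for the branch data the kernel side would hold. */
struct lbr_source {
	const struct lbr_record *records;
	uint64_t nr;
};

/*
 * Hypothetical helper shape: copy as many whole records as fit into buf.
 * Returns the number of bytes written, or -1 on bad arguments or when the
 * buffer cannot hold even one record.
 */
static long lbr_read_records(const struct lbr_source *src, void *buf,
			     uint32_t size)
{
	uint64_t fit = size / sizeof(struct lbr_record);
	uint64_t n;

	if (!src || !src->records || !buf || fit == 0)
		return -1;

	n = src->nr < fit ? src->nr : fit;      /* never overrun the caller */
	memcpy(buf, src->records, n * sizeof(struct lbr_record));
	return (long)(n * sizeof(struct lbr_record));
}
```

A caller that only wants the most recent few branches passes a small
buffer; one that wants everything passes a larger one. Neither needs the
program context to change.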

Daniel

Thread overview: 6+ messages
2019-12-06  0:12 [PATCH bpf] bpf: Add LBR data to BPF_PROG_TYPE_PERF_EVENT prog context Daniel Xu
2019-12-06  6:39 ` Alexei Starovoitov
2019-12-06  6:55 ` Martin Lau
2019-12-06 11:37 ` Peter Zijlstra
2019-12-06 17:10 ` Andrii Nakryiko
2019-12-16 19:29   ` Daniel Xu [this message]
