netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiri Olsa <jolsa@redhat.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andriin@fb.com>,
	"Steven Rostedt (VMware)" <rostedt@goodmis.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@chromium.org>, Daniel Xu <dxu@dxuuu.xyz>,
	Viktor Malik <vmalik@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Subject: Re: [RFC] store function address in BTF
Date: Thu, 7 Oct 2021 08:55:14 +0200	[thread overview]
Message-ID: <YV6Z0jcB8jdsG4Ly@krava> (raw)
In-Reply-To: <CAEf4BzYBS8+9XADjJdvKB=_6tf8_t19UVGfm2Lk1+Nb6qWk5cw@mail.gmail.com>

On Wed, Oct 06, 2021 at 03:37:16PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 6, 2021 at 3:05 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Oct 06, 2021 at 02:22:28PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Oct 6, 2021 at 1:06 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 06, 2021 at 09:17:39AM -0700, Andrii Nakryiko wrote:
> > > > > On Wed, Oct 6, 2021 at 1:42 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > >
> > > > > > hi,
> > > > > > I'm hitting performance issue and soft lock ups with the new version
> > > > > > of the patchset and the reason seems to be kallsyms lookup that we
> > > > > > need to do for each btf id we want to attach
> > > > > >
> > > > > > I tried to change kallsyms_lookup_name linear search into rbtree search,
> > > > > > but it has its own pitfalls like duplicate function names and it still
> > > > > > seems not to be fast enough when you want to attach like 30k functions
> > > > >
> > > > > How not fast enough is it exactly? How long does it take?
> > > >
> > > > 30k functions takes 75 seconds for me, it's loop calling bpf_check_attach_target
> > > >
> > > > getting soft lock up messages:
> > > >
> > > > krava33 login: [  168.896671] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [bpftrace:1087]
> > > >
> > >
> > > That's without RB tree right? I was curious about the case of you
> > > converting kallsyms to RB tree and it still being slow. Can't imagine
> > > 30k queries against RB tree with ~160k kallsyms taking 75 seconds.
> >
> > yep, that's the standard kallsyms lookup api
> >
> > I need to make some adjustment for rbtree kalsyms code, I think I found
> > a bug in there, so the numbers are probably better as you suggest
> 
> ok, cool, let's see what are the new numbers then
> 
> >
> > >
> > > But as I said, why not map BTF IDs into function names, sort function
> > > names, and then pass over kallsyms once, doing binary search into a
> > > sorted array of requested function names and then recording addr for
> > > each. Then check that you found addresses for all functions (it also
> > > leaves a question of what to do when we have multiple matching
> > > functions, but it's a problem with any approach). If everything checks
> > > out, you have a nice btf id -> func name -> func addr mapping. It's
> > > O(N log(M)), which sounds like it shouldn't be slow. Definitely not
> > > multiple seconds slow.
> >
> > ok, now that's clear to me, thanks for these details
> 
> great
> 
> >
> > >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > so I wonder we could 'fix this' by storing function address in BTF,
> > > > > > which would cut kallsyms lookup completely, because it'd be done in
> > > > > > compile time
> > > > > >
> > > > > > my first thought was to add extra BTF section for that, after discussion
> > > > > > with Arnaldo perhaps we could be able to store extra 8 bytes after
> > > > > > BTF_KIND_FUNC record, using one of the 'unused' bits in btf_type to
> > > > > > indicate that? or new BTF_KIND_FUNC2 type?
> > > > > >
> > > > > > thoughts?
> > > > >
> > > > > I'm strongly against this, because (besides the BTF bloat reason) we
> > > > > need similar mass attachment functionality for kprobe/kretprobe and
> > > > > that one won't be relying on BTF FUNCs, so I think it's better to
> > > > > stick to the same mechanism for figuring out the address of the
> > > > > function.
> > > >
> > > > ok
> > > >
> > > > >
> > > > > If RB tree is not feasible, we can do a linear search over unsorted
> > > > > kallsyms and do binary search over sorted function names (derived from
> > > > > BTF IDs). That would be O(Nlog(M)), where N is number of ksyms, M is
> > > > > number of BTF IDs/functions-to-be-attached-to. If we did have an RB
> > > > > tree for kallsyms (is it hard to support duplicates? why?) it could be
> > > > > even faster O(Mlog(N)).
> > > >
> > > > I had issues with generic kallsyms rbtree in the post some time ago,
> > > > I'll revisit it to check on details.. but having the tree with just
> > > > btf id functions might clear that.. I'll check
> > >
> > > That's not what I'm proposing. See above. Please let me know if
> > > something is not clear before going all in for RB tree implementation
> > > :)
> > >
> > >
> > > But while we are on topic, do you think (with ftrace changes you are
> > > doing) it would be hard to support multi-attach for
> > > kprobes/kretprobes? We now have bpf_link interface for attaching
> > > kprobes, so API can be pretty aligned with fentry/fexit, except
> > > instead of btf IDs we'd need to pass array of pointers of C strings, I
> > > suppose.
> >
> > hum, I think kprobe/kretprobe is made of perf event (kprobe/kretprobe
> > pmus), then you pass event fd and program fd to bpf link syscall,
> > and it attaches bpf program to that perf event
> >
> > so perhaps the user interface would be array of perf events fds and prog fd
> >
> > also I think you can have just one probe for function, so we will not need
> > to share kprobes for multiple users like we need for trampolines, so the
> > attach logic will be simple
> 
> creating thousands of perf_event FDs seems expensive (and you'll be
> running into the limit of open files pretty soon in a lot of systems).
> So I think for multi-attach we'll have to have a separate way where
> you'd specify kernel function name (and maybe also offset). I'm just
> saying that having BPF_LINK_CREATE command, it's easier (probably) to
> extend this for kprobe multi-attach, than trying to retrofit this into
> perf_event_open.

ah true.. I wonder we could bypass perf by using directly kernel
kprobe interface

jirka


      reply	other threads:[~2021-10-07  6:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-06  8:41 [RFC] store function address in BTF Jiri Olsa
2021-10-06  8:44 ` Jiri Olsa
2021-10-06 14:53   ` Alexei Starovoitov
2021-10-06 20:07     ` Jiri Olsa
2021-10-06 16:17 ` Andrii Nakryiko
2021-10-06 20:06   ` Jiri Olsa
2021-10-06 21:22     ` Andrii Nakryiko
2021-10-06 22:05       ` Jiri Olsa
2021-10-06 22:37         ` Andrii Nakryiko
2021-10-07  6:55           ` Jiri Olsa [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YV6Z0jcB8jdsG4Ly@krava \
    --to=jolsa@redhat.com \
    --cc=acme@kernel.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dxu@dxuuu.xyz \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    --cc=vmalik@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).