Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Eduard Zingerman <eddyz87@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Donglin Peng <pengdonglin@sangfor.com.cn>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>, Song Liu <song@kernel.org>,
	Yonghong Song <yhs@fb.com>, Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	 dinghui@sangfor.com.cn, huangcun@sangfor.com.cn,
	bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind
Date: Tue, 12 Sep 2023 20:03:40 +0300	[thread overview]
Message-ID: <035ab912d7d6bd11c54c038464795da01dbed2de.camel@gmail.com> (raw)
In-Reply-To: <CAADnVQL0O_WFYcYQRig7osO0piPdOH2yHkdH0CxCfNV7NkA0Lw@mail.gmail.com>

On Tue, 2023-09-12 at 09:40 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 12, 2023 at 7:19 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Tue, 2023-09-12 at 16:51 +0300, Eduard Zingerman wrote:
> > > On Sat, 2023-09-09 at 02:16 -0700, Donglin Peng wrote:
> > > > Currently, we are only using the linear search method to find the type id
> > > > by the name, which has a time complexity of O(n). This change involves
> > > > sorting the names of btf types in ascending order and using binary search,
> > > > which has a time complexity of O(log(n)). This idea was inspired by the
> > > > following patch:
> > > > 
> > > > 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()").
> > > > 
> > > > At present, this improvement is only for searching in vmlinux's and
> > > > module's BTFs, and the kind should only be BTF_KIND_FUNC or BTF_KIND_STRUCT.
> > > > 
> > > > Another change is the search direction, where we search the BTF first and
> > > > then its base, the type id of the first matched btf_type will be returned.
> > > > 
> > > > Here is a time-consuming result that finding all the type ids of 67,819 kernel
> > > > functions in vmlinux's BTF by their names:
> > > > 
> > > > Before: 17000 ms
> > > > After:     10 ms
> > > > 
> > > > The average lookup performance has improved about 1700x at the above scenario.
> > > > 
> > > > However, this change will consume more memory, for example, 67,819 kernel
> > > > functions will allocate about 530KB memory.
> > > 
> > > Hi Donglin,
> > > 
> > > I think this is a good improvement. However, I wonder, why did you
> > > choose to have a separate name map for each BTF kind?
> > > 
> > > I did some analysis for my local testing kernel config and got such numbers:
> > > - total number of BTF objects: 97350
> > > - number of FUNC and STRUCT objects: 51597
> > > - number of FUNC, STRUCT, UNION, ENUM, ENUM64, TYPEDEF, DATASEC objects: 56817
> > >   (these are all kinds for which lookup by name might make sense)
> > > - number of named objects: 54246
> > > - number of name collisions:
> > >   - unique names: 53985 counts
> > >   - 2 objects with the same name: 129 counts
> > >   - 3 objects with the same name: 3 counts
> > > 
> > > So, it appears that having a single map for all named objects makes
> > > sense and would also simplify the implementation, what do you think?
> > 
> > Some more numbers for my config:
> > - 13241 types (struct, union, typedef, enum), log2 13241 = 13.7
> > - 43575 funcs, log2 43575 = 15.4
> > Thus, having separate map for types vs functions might save ~1.7
> > search iterations. Is this a significant slowdown in practice?
> 
> What do you propose to do in case of duplicates ?
> func and struct can have the same name, but they will have two different
> btf_ids. How do we store them ?
> Also we might add global vars to BTF. Such request came up several times.
> So we need to make sure our search approach scales to
> func, struct, vars. I don't recall whether we search any other kinds.
> Separate arrays for different kinds seems ok.
> It's a bit of code complexity, but it's not an increase in memory.

Binary search gives, say, lowest index of a thing with name A, then
increment index while name remains A looking for correct kind.
Given the name conflicts info from above, 99% of times there would be
no need to iterate and in very few cases there would a couple of iterations.

Same logic would be necessary with current approach if different BTF
kinds would be allowed in BTF_ID_NAME_* cohorts. I figured that these
cohorts are mainly a way to split the tree for faster lookups, but
maybe that is not the main intent.

> With 13k structs and 43k funcs it's 56k * (4 + 4) that's 0.5 Mbyte
> extra memory. That's quite a bit. Anything we can do to compress it?

That's an interesting question, from the top of my head:
pre-sort in pahole (re-assign IDs so that increasing ID also would
mean "increasing" name), shouldn't be that difficult.

> Folks requested vmlinux BTF to be a module, so it's loaded on demand.
> BTF memory consumption is a concern to many.
> I think before we add these per-kind search arrays we better make
> BTF optional as a module.

next prev parent reply	other threads:[~2023-09-12 17:03 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-09  9:16 [RFC PATCH v2] bpf: Using binary search to improve the performance of btf_find_by_name_kind Donglin Peng
2023-09-09 12:29 ` Masami Hiramatsu
2023-09-12  2:06   ` pengdonglin
2023-09-12  3:35     ` pengdonglin
2023-09-13  8:07       ` Masami Hiramatsu
2023-09-13  9:27         ` pengdonglin
2023-09-12 13:51 ` Eduard Zingerman
2023-09-12 14:19   ` Eduard Zingerman
2023-09-12 16:40     ` Alexei Starovoitov
2023-09-12 17:03       ` Eduard Zingerman [this message]
2023-09-12 18:46         ` Alexei Starovoitov
2023-09-13 10:32           ` pengdonglin
2023-09-13 13:34             ` Alan Maguire
2023-09-13 13:45               ` Eduard Zingerman
2023-09-14 12:10                 ` pengdonglin
2023-09-14 10:13               ` pengdonglin
2023-09-14 12:46                 ` Alan Maguire
2023-09-14 13:05                   ` pengdonglin
2023-09-14 17:14                     ` Alan Maguire
2023-09-14 22:07                       ` Alexei Starovoitov
2023-09-17  9:07                       ` pengdonglin
2023-09-13  9:51         ` pengdonglin
2023-09-13 10:52       ` pengdonglin
2023-09-13  9:33     ` pengdonglin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=035ab912d7d6bd11c54c038464795da01dbed2de.camel@gmail.com \
    --to=eddyz87@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=dinghui@sangfor.com.cn \
    --cc=huangcun@sangfor.com.cn \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=mhiramat@kernel.org \
    --cc=pengdonglin@sangfor.com.cn \
    --cc=rostedt@goodmis.org \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox