From: Eduard Zingerman <eddyz87@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Alan Maguire <alan.maguire@oracle.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org,
daniel@iogearbox.net, kernel-team@fb.com, yhs@fb.com,
arnaldo.melo@gmail.com
Subject: Re: [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations
Date: Mon, 31 Oct 2022 03:00:29 +0200 [thread overview]
Message-ID: <8da108a196542eb2d85327b4eeb9c8630fe8cf60.camel@gmail.com> (raw)
In-Reply-To: <CAEf4Bzaf8XhO6OPoKSHPPSa1oQQ+KFHeN5Rmp0vn_9dgvOkOYw@mail.gmail.com>
On Thu, 2022-10-27 at 15:07 -0700, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > Deduplicate forward declarations that don't take part in type graphs
> > comparisons if declaration name is unambiguous. Example:
> >
> > CU #1:
> >
> > struct foo; // standalone forward declaration
> > struct foo *some_global;
> >
> > CU #2:
> >
> > struct foo { int x; };
> > struct foo *another_global;
> >
> > The `struct foo` from CU #1 is not a part of any definition that is
> > compared against another definition while `btf_dedup_struct_types`
> > processes structural types. The the BTF after `btf_dedup_struct_types`
> > the BTF looks as follows:
> >
> > [1] STRUCT 'foo' size=4 vlen=1 ...
> > [2] INT 'int' size=4 ...
> > [3] PTR '(anon)' type_id=1
> > [4] FWD 'foo' fwd_kind=struct
> > [5] PTR '(anon)' type_id=4
> >
> > This commit adds a new pass `btf_dedup_standalone_fwds`, that maps
> > such forward declarations to structs or unions with identical name in
> > case if the name is not ambiguous.
> >
> > The pass is positioned before `btf_dedup_ref_types` so that types
> > [3] and [5] could be merged as a same type after [1] and [4] are merged.
> > The final result for the example above looks as follows:
> >
> > [1] STRUCT 'foo' size=4 vlen=1
> > 'x' type_id=2 bits_offset=0
> > [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > [3] PTR '(anon)' type_id=1
> >
> > For defconfig kernel with BTF enabled this removes 63 forward
> > declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
> > The running time of `btf__dedup` function is increased by about 3%.
> >
>
> What about modules, can you share stats for module BTFs?
>
> Also cc Alan as he was looking at BTF dedup improvements for kernel
> module BTF dedup.
>
> > Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> > ---
> > tools/lib/bpf/btf.c | 178 +++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 174 insertions(+), 4 deletions(-)
> >
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index d88647da2c7f..c34c68d8e8a0 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
> > @@ -2881,6 +2881,7 @@ static int btf_dedup_strings(struct btf_dedup *d);
> > static int btf_dedup_prim_types(struct btf_dedup *d);
> > static int btf_dedup_struct_types(struct btf_dedup *d);
> > static int btf_dedup_ref_types(struct btf_dedup *d);
> > +static int btf_dedup_standalone_fwds(struct btf_dedup *d);
> > static int btf_dedup_compact_types(struct btf_dedup *d);
> > static int btf_dedup_remap_types(struct btf_dedup *d);
> >
> > @@ -2988,15 +2989,16 @@ static int btf_dedup_remap_types(struct btf_dedup *d);
> > * Algorithm summary
> > * =================
> > *
> > - * Algorithm completes its work in 6 separate passes:
> > + * Algorithm completes its work in 7 separate passes:
> > *
> > * 1. Strings deduplication.
> > * 2. Primitive types deduplication (int, enum, fwd).
> > * 3. Struct/union types deduplication.
> > - * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> > + * 4. Standalone fwd declarations deduplication.
>
> Let's call this "Resolve unambiguous forward declarations", we don't
> really deduplicate anything. And call the function
> btf_dedup_resolve_fwds()?
>
> > + * 5. Reference types deduplication (pointers, typedefs, arrays, funcs, func
> > * protos, and const/volatile/restrict modifiers).
> > - * 5. Types compaction.
> > - * 6. Types remapping.
> > + * 6. Types compaction.
> > + * 7. Types remapping.
> > *
> > * Algorithm determines canonical type descriptor, which is a single
> > * representative type for each truly unique type. This canonical type is the
> > @@ -3060,6 +3062,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
> > pr_debug("btf_dedup_struct_types failed:%d\n", err);
> > goto done;
> > }
> > + err = btf_dedup_standalone_fwds(d);
> > + if (err < 0) {
> > + pr_debug("btf_dedup_standalone_fwd failed:%d\n", err);
> > + goto done;
> > + }
> > err = btf_dedup_ref_types(d);
> > if (err < 0) {
> > pr_debug("btf_dedup_ref_types failed:%d\n", err);
> > @@ -4525,6 +4532,169 @@ static int btf_dedup_ref_types(struct btf_dedup *d)
> > return 0;
> > }
> >
> > +/*
> > + * `name_off_map` maps name offsets to type ids (essentially __u32 -> __u32).
> > + *
> > + * The __u32 key/value representations are cast to `void *` before passing
> > + * to `hashmap__*` functions. These pseudo-pointers are never dereferenced.
> > + *
> > + */
> > +static struct hashmap *name_off_map__new(void)
> > +{
> > + return hashmap__new(btf_dedup_identity_hash_fn,
> > + btf_dedup_equal_fn,
> > + NULL);
> > +}
>
> is there a point in name_off_map__new and name_off_map__find wrappers
> except to add one extra function to jump through when reading the
> code? If you look at other uses of hashmaps in this file, we use the
> directly. Let's drop those.
>
> > +
> > +static int name_off_map__find(struct hashmap *map, __u32 name_off, __u32 *type_id)
> > +{
> > + /* This has to be sizeof(void *) in order to be passed to hashmap__find */
> > + void *tmp;
> > + int found = hashmap__find(map, (void *)(ptrdiff_t)name_off, &tmp);
>
> but this (void *) casting everything was an error in API design, mea
> culpa. I've been wanting to switch hashmap to use long as key/value
> type for a long while, maybe let's do it now, as we are adding even
> more code that looks weird? It seems like accepting long will make
> hashmap API usage cleaner in most cases. There are not a lot of places
> where we use hashmap APIs in libbpf, but we'll also need to fix up
> bpftool usage, and I believe perf copy/pasted hashmap.h (cc Arnaldo),
> so we'd need to make sure to not break all that. But good thing it's
> all in the same repo and we can convert them at the same time with no
> breakage.
>
> WDYT?
Well, I did the change, excluding tests it amounts to:
- 15 files changed, 114 insertions(+), 137 deletions(-);
- 45 casts removed;
- 30 casts added.
TBH, it seems like I should just use "u32_as_hash_field" and be done
with it. In any case I'll post this as a part of v1 series for
"libbpf: Resolve unambigous forward declarations".
To account for a case when map has to store pointers and pointers are
32 bit I chose to update the map interface to be "uintptr_t -> uintptr_t".
Had it been "u64 -> u64" the additional temporary variable would be
necessary for "old" values, e.g. while working with hashmap__insert.
(Contrary to what we discussed on Friday).
>
> > + /*
> > + * __u64 cast is necessary to avoid pointer to integer conversion size warning.
> > + * It is fine to get rid of this warning as `void *` is used as an integer value.
> > + */
> > + if (found)
> > + *type_id = (__u64)tmp;
> > + return found;
> > +}
> > +
> > +static int name_off_map__set(struct hashmap *map, __u32 name_off, __u32 type_id)
> > +{
> > + return hashmap__set(map, (void *)(size_t)name_off, (void *)(size_t)type_id,
> > + NULL, NULL);
> > +}
>
> this function will also be completely unnecessary with longs
>
> > +
> > +/*
> > + * Collect a `name_off_map` that maps type names to type ids for all
> > + * canonical structs and unions. If the same name is shared by several
> > + * canonical types use a special value 0 to indicate this fact.
> > + */
> > +static int btf_dedup_fill_unique_names_map(struct btf_dedup *d, struct hashmap *names_map)
> > +{
> > + int i, err = 0;
> > + __u32 type_id, collision_id;
> > + __u16 kind;
> > + struct btf_type *t;
> > +
> > + for (i = 0; i < d->btf->nr_types; i++) {
> > + type_id = d->btf->start_id + i;
> > + t = btf_type_by_id(d->btf, type_id);
> > + kind = btf_kind(t);
> > +
> > + if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
> > + continue;
>
> let's also do ENUM FWD resolution. ENUM FWD is just ENUM with vlen=0
>
> > +
> > + /* Skip non-canonical types */
> > + if (type_id != d->map[type_id])
> > + continue;
> > +
> > + err = 0;
> > + if (name_off_map__find(names_map, t->name_off, &collision_id)) {
> > + /* Mark non-unique names with 0 */
> > + if (collision_id != 0 && collision_id != type_id)
> > + err = name_off_map__set(names_map, t->name_off, 0);
> > + } else {
> > + err = name_off_map__set(names_map, t->name_off, type_id);
> > + }
>
> err = hashmap__add(..., t->name_off, type_id);
> if (err == -EEXISTS) {
> hashmap__set(..., 0);
> return 0;
> }
>
> see comment for hashmap_insert_strategy in hashmap.h
>
> > +
> > + if (err < 0)
> > + return err;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int btf_dedup_standalone_fwd(struct btf_dedup *d,
> > + struct hashmap *names_map,
> > + __u32 type_id)
> > +{
> > + struct btf_type *t = btf_type_by_id(d->btf, type_id);
> > + __u16 kind = btf_kind(t);
> > + enum btf_fwd_kind fwd_kind = BTF_INFO_KFLAG(t->info);
> > +
>
> nit: don't break variables block in two parts, there shouldn't be empty lines
>
> also please use btf_kflag(t)
>
>
> > + struct btf_type *cand_t;
> > + __u16 cand_kind;
> > + __u32 cand_id = 0;
> > +
> > + if (kind != BTF_KIND_FWD)
> > + return 0;
> > +
> > + /* Skip if this FWD already has a mapping */
> > + if (type_id != d->map[type_id])
> > + return 0;
> > +
>
> [...]
next prev parent reply other threads:[~2022-10-31 1:00 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-25 22:27 [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 01/12] libbpf: Deduplicate unambigous standalone forward declarations Eduard Zingerman
2022-10-27 22:07 ` Andrii Nakryiko
2022-10-31 1:00 ` Eduard Zingerman [this message]
2022-10-31 15:49 ` Eduard Zingerman
2022-11-01 17:08 ` Alan Maguire
2022-11-01 17:37 ` Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 02/12] selftests/bpf: Tests for standalone forward BTF declarations deduplication Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 03/12] libbpf: Support for BTF_DECL_TAG dump in C format Eduard Zingerman
2022-10-27 22:36 ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 04/12] selftests/bpf: Tests " Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 05/12] libbpf: Header guards for selected data structures in vmlinux.h Eduard Zingerman
2022-10-27 22:44 ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 06/12] selftests/bpf: Tests for header guards printing in BTF dump Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 07/12] bpftool: Enable header guards generation Eduard Zingerman
2022-10-25 22:27 ` [RFC bpf-next 08/12] kbuild: Script to infer header guard values for uapi headers Eduard Zingerman
2022-10-27 22:51 ` Andrii Nakryiko
2022-10-25 22:27 ` [RFC bpf-next 09/12] kbuild: Header guards for types from include/uapi/*.h in kernel BTF Eduard Zingerman
2022-10-27 18:43 ` Yonghong Song
2022-10-27 18:55 ` Yonghong Song
2022-10-27 22:44 ` Yonghong Song
2022-10-28 0:00 ` Eduard Zingerman
2022-10-28 0:14 ` Mykola Lysenko
2022-10-28 1:23 ` Yonghong Song
2022-10-28 1:21 ` Yonghong Song
2022-10-25 22:27 ` [RFC bpf-next 10/12] selftests/bpf: Script to verify uapi headers usage with vmlinux.h Eduard Zingerman
2022-10-25 22:28 ` [RFC bpf-next 11/12] selftests/bpf: Known good uapi headers for test_uapi_headers.py Eduard Zingerman
2022-10-25 22:28 ` [RFC bpf-next 12/12] selftests/bpf: script for infer_header_guards.pl testing Eduard Zingerman
2022-10-25 23:46 ` [RFC bpf-next 00/12] Use uapi kernel headers with vmlinux.h Alexei Starovoitov
2022-10-26 22:46 ` Eduard Zingerman
2022-10-26 11:10 ` Alan Maguire
2022-10-26 23:54 ` Eduard Zingerman
2022-10-27 23:14 ` Andrii Nakryiko
2022-10-28 1:33 ` Yonghong Song
2022-10-28 17:13 ` Andrii Nakryiko
2022-10-28 18:56 ` Yonghong Song
2022-10-28 21:35 ` Andrii Nakryiko
2022-11-01 16:01 ` Alan Maguire
2022-11-01 18:35 ` Alexei Starovoitov
2022-11-01 19:21 ` Eduard Zingerman
2022-11-01 19:44 ` Alexei Starovoitov
2022-11-11 21:55 ` Eduard Zingerman
2022-11-14 7:52 ` Yonghong Song
2022-11-14 21:13 ` Eduard Zingerman
2022-11-14 21:50 ` Alexei Starovoitov
2022-11-16 2:01 ` Eduard Zingerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8da108a196542eb2d85327b4eeb9c8630fe8cf60.camel@gmail.com \
--to=eddyz87@gmail.com \
--cc=alan.maguire@oracle.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=arnaldo.melo@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox