BPF List
 help / color / mirror / Atom feed
From: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Alan Maguire <alan.maguire@oracle.com>,
	Mykyta Yatsenko <yatsenko@meta.com>
Cc: Ihor Solodrai <ihor.solodrai@pm.me>,
	bpf@vger.kernel.org, andrii@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, mykolal@fb.com
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: do not update vmlinux.h unnecessarily
Date: Fri, 30 Aug 2024 22:23:41 +0100	[thread overview]
Message-ID: <45a24817-358d-4d25-ae7c-118539ec2ba7@gmail.com> (raw)
In-Reply-To: <CAEf4BzaBMhb4a2Y-2_mcLmYjJ2UWQuwNF-2sPVJXo39+0ziqzw@mail.gmail.com>

On 30/08/2024 21:34, Andrii Nakryiko wrote:
> On Wed, Aug 28, 2024 at 3:02 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>> On Wed, 2024-08-28 at 17:46 +0000, Ihor Solodrai wrote:
>>> %.bpf.o objects depend on vmlinux.h, which makes them transitively
>>> dependent on unnecessary libbpf headers. However vmlinux.h doesn't
>>> actually change as often.
>>>
>>> When generating vmlinux.h, compare it to a previous version and update
>>> it only if there are changes.
>>>
>>> Example of build time improvement (after first clean build):
>>>    $ touch ../../../lib/bpf/bpf.h
>>>    $ time make -j8
>>> Before: real  1m37.592s
>>> After:  real  0m27.310s
>>>
>>> Notice that %.bpf.o gen step is skipped if vmlinux.h hasn't changed.
>>>
>>> Link: https://lore.kernel.org/bpf/CAEf4BzY1z5cC7BKye8=A8aTVxpsCzD=p1jdTfKC7i0XVuYoHUQ@mail.gmail.com
>>>
>>> Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
>>> ---
>> Unfortunately, I think that this is a half-measure.
>> E.g. the following command forces tests rebuild for me:
>>
>>    touch ../../../../kernel/bpf/verifier.c; \
>>    make -j22 -C ../../../../; \
>>    time make test_progs
>>
>> To workaround this we need to enable reproducible_build option:
>>
>>      diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
>>      index b75f09f3f424..8cd648f3e32b 100644
>>      --- a/scripts/Makefile.btf
>>      +++ b/scripts/Makefile.btf
>>      @@ -19,7 +19,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 125)      += --skip_encoding_btf_inconsis
>>       else
>>
>>       # Switch to using --btf_features for v1.26 and later.
>>      -pahole-flags-$(call test-ge, $(pahole-ver), 126)  = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs
>>      +pahole-flags-$(call test-ge, $(pahole-ver), 126)  = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build
>>
>>       ifneq ($(KBUILD_EXTMOD),)
>>       module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base
>>
>> Question to the mailing list: do we want this?
> Alan, can you please give us a summary of what are the consequences of
> the reproducible_build pahole option? In terms of performance and
> otherwise.
>
> I've applied patches as is, despite them not solving the issue
> completely, as they are moving us in the right direction anyways. I do
> get slightly different BTF every single time I rebuild my kernel, so
> the change in patch #2 doesn't yet help me.
>
> For libbpf headers, Ihor, can you please follow up with adding
> bpf_helper_defs.h as a dependency?
>
> I have some ideas on how to make BTF regeneration in vmlinux.h itself
> unnecessary, that might help with this issue. Separately (depending on
> what are the negatives of the reproducible_build option) we can look
> into making pahole have more consistent internal BTF type ordering
> without negatively affecting the overall BTF dedup performance in
> pahole. Hopefully I can work with Ihor on this as follow ups.
>
> P.S. I also spent more time than I'm willing to admit trying to
> improve bpftool's BTF sorting to minimize the chance of vmlinux.h
> contents being different, and I think I removed a bunch of cases where
> we had unnecessary differences, but still, it's fundamentally
> non-deterministic to do everything based on type and field names,
> unfortunately.
>
> Anyways, Mykyta (cc'ed), what do you think about the changes below?
> Note that I'm also fixing the incorrect handling of enum64 (would be
> nice to prepare a proper patch and send it upstream, if you get a
> chance).
>
> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 6789c7a4d5ca..e8a244b09d56 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -50,6 +50,7 @@ struct sort_datum {
>          int type_rank;
>          const char *sort_name;
>          const char *own_name;
> +       __u64 disambig_hash;
>   };
>
>   static const char *btf_int_enc_str(__u8 encoding)
> @@ -552,35 +553,92 @@ static int btf_type_rank(const struct btf *btf,
> __u32 index, bool has_name)
>          }
>   }
>
> -static const char *btf_type_sort_name(const struct btf *btf, __u32
> index, bool from_ref)
> +static const char *btf_type_sort_name(const struct btf *btf, __u32
> index, bool from_ref, const char *typedef_name)
>   {
>          const struct btf_type *t = btf__type_by_id(btf, index);
> +       int name_off;
>
>          switch (btf_kind(t)) {
>          case BTF_KIND_ENUM:
> -       case BTF_KIND_ENUM64: {
> -               int name_off = t->name_off;
> -
>                  /* Use name of the first element for anonymous enums
> if allowed */
>                  if (!from_ref && !t->name_off && btf_vlen(t))
>                          name_off = btf_enum(t)->name_off;
> +               else
> +                       name_off = t->name_off;
> +
> +               return btf__name_by_offset(btf, name_off);
> +       case BTF_KIND_ENUM64:
> +               /* Use name of the first element for anonymous enums
> if allowed */
> +               if (!from_ref && !t->name_off && btf_vlen(t))
> +                       name_off = btf_enum64(t)->name_off;
> +               else
> +                       name_off = t->name_off;
>
>                  return btf__name_by_offset(btf, name_off);
> -       }
>          case BTF_KIND_ARRAY:
> -               return btf_type_sort_name(btf, btf_array(t)->type, true);
> +               return btf_type_sort_name(btf, btf_array(t)->type,
> true, typedef_name);
> +       case BTF_KIND_STRUCT:
> +       case BTF_KIND_UNION:
> +               if (t->name_off == 0)
> +                       return typedef_name;
> +               return btf__name_by_offset(btf, t->name_off);
> +       case BTF_KIND_TYPEDEF:
> +               return btf_type_sort_name(btf, t->type, true,
> +                                         btf__name_by_offset(btf,
> t->name_off));
>          case BTF_KIND_TYPE_TAG:
>          case BTF_KIND_CONST:
>          case BTF_KIND_PTR:
>          case BTF_KIND_VOLATILE:
>          case BTF_KIND_RESTRICT:
> -       case BTF_KIND_TYPEDEF:
>          case BTF_KIND_DECL_TAG:
> -               return btf_type_sort_name(btf, t->type, true);
> +               return btf_type_sort_name(btf, t->type, true, typedef_name);
>          default:
>                  return btf__name_by_offset(btf, t->name_off);
>          }
> -       return NULL;
> +}
> +
> +static __u64 hasher(__u64 hash, __u64 val)
> +{
> +       return hash * 31 + val;
> +}
> +
> +static __u64 btf_type_disambig_hash(const struct btf *btf, __u32 index)
> +{
> +       const struct btf_type *t = btf__type_by_id(btf, index);
> +       int i;
> +       size_t hash = 0;
> +
> +       switch (btf_kind(t)) {
> +       case BTF_KIND_ENUM:
> +               hash = hasher(hash, t->size);
> +               for (i = 0; i < btf_vlen(t); i++)
> +                       hash = hasher(hash, btf_enum(t)[i].name_off);
> +               break;
> +       case BTF_KIND_ENUM64:
> +               hash = hasher(hash, t->size);
> +               for (i = 0; i < btf_vlen(t); i++)
> +                       hash = hasher(hash, btf_enum64(t)[i].name_off);
> +               break;
> +       case BTF_KIND_STRUCT:
> +       case BTF_KIND_UNION: {
> +               const struct btf_member *m;
> +               const char *ftname;
> +
> +               hash = hasher(hash, t->size);
> +               for (i = 0; i < btf_vlen(t); i++) {
> +                       m = btf_members(t) + i;
> +                       hash = hasher(hash, m->name_off);
> +
> +                       /* resolve field type's name and hash it as well */
> +                       ftname = btf_type_sort_name(btf, m->type, false, "");
> +                       hash = hasher(hash, str_hash(ftname));
> +               }
> +               break;
> +       }
> +       default:
> +               break;
> +       }
> +       return hash;
>   }
>
>   static int btf_type_compare(const void *left, const void *right)
> @@ -596,7 +654,14 @@ static int btf_type_compare(const void *left,
> const void *right)
>          if (r)
>                  return r;
>
> -       return strcmp(d1->own_name, d2->own_name);
> +       r = strcmp(d1->own_name, d2->own_name);
> +       if (r)
> +               return r;
> +
> +       if (d1->disambig_hash != d2->disambig_hash)
> +               return d1->disambig_hash < d2->disambig_hash ? -1 : 1;
> +
> +       return d1->index < d2->index ? -1 : 1;
>   }
>
>   static struct sort_datum *sort_btf_c(const struct btf *btf)
> @@ -615,8 +680,9 @@ static struct sort_datum *sort_btf_c(const struct btf *btf)
>
>                  d->index = i;
>                  d->type_rank = btf_type_rank(btf, i, false);
> -               d->sort_name = btf_type_sort_name(btf, i, false);
> +               d->sort_name = btf_type_sort_name(btf, i, false, "");
>                  d->own_name = btf__name_by_offset(btf, t->name_off);
> +               d->disambig_hash = btf_type_disambig_hash(btf, i);
>          }
>
>          qsort(datums, n, sizeof(struct sort_datum), btf_type_compare);
>
Thanks for pointing to the bug of enum64 handling. I'll create a patch.
Reading the rest of the code, hashing struct/union/enum fields is 
introduced:
this is only useful for disambiguating ordering of the anonymous 
structs/unions/enums.

I suspect the biggest source of the issues are structs and unions, though.
Are definitions like this create problems?
typedef struct {...} foo_t;
?
I'll check what other differences this change makes.
>> [...]
>>


  parent reply	other threads:[~2024-08-30 21:23 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-28 17:46 [PATCH bpf-next 1/2] selftests/bpf: specify libbpf headers required for %.bpf.o progs Ihor Solodrai
2024-08-28 17:46 ` [PATCH bpf-next 2/2] selftests/bpf: do not update vmlinux.h unnecessarily Ihor Solodrai
2024-08-28 22:02   ` Eduard Zingerman
2024-08-28 23:25     ` Alexei Starovoitov
2024-08-30 20:34     ` Andrii Nakryiko
2024-08-30 21:03       ` Alan Maguire
2024-08-30 23:42         ` Arnaldo Carvalho de Melo
2024-08-30 21:23       ` Mykyta Yatsenko [this message]
2024-08-30 22:18         ` Andrii Nakryiko
2024-08-31 18:18       ` Ihor Solodrai
2024-09-03 16:58         ` Andrii Nakryiko
2024-08-28 21:20 ` [PATCH bpf-next 1/2] selftests/bpf: specify libbpf headers required for %.bpf.o progs Eduard Zingerman
2024-08-30 20:40 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45a24817-358d-4d25-ae7c-118539ec2ba7@gmail.com \
    --to=mykyta.yatsenko5@gmail.com \
    --cc=alan.maguire@oracle.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=ihor.solodrai@pm.me \
    --cc=mykolal@fb.com \
    --cc=yatsenko@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox