BPF List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next 1/3] selftests/bpf: Copy over libbpf configs
From: sdf @ 2022-07-12 22:33 UTC (permalink / raw)
  To: Daniel Müller
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Kernel Team, Mykola Lysenko
In-Reply-To: <20220712215322.rw3z6eoix3yagi2q@muellerd-fedora-MJ0AC3F3>

On 07/12, Daniel M�ller wrote:
> On Tue, Jul 12, 2022 at 02:27:47PM -0700, Alexei Starovoitov wrote:
> > On Tue, Jul 12, 2022 at 2:21 PM Daniel M�ller <deso@posteo.net> wrote:
> > >
> > > This change integrates the libbpf maintained configurations and
> > > black/white lists [0] into the repository, co-located with the BPF
> > > selftests themselves. The only differences from the source is that we
> > > replaced the terms blacklist & whitelist with denylist and allowlist,
> > > respectively.
> > >
> > > [0]  
> https://github.com/libbpf/libbpf/tree/20f03302350a4143825cedcbd210c4d7112c1898/travis-ci/vmtest/configs
> > >
> > > Signed-off-by: Daniel M�ller <deso@posteo.net>
> > > ---
> > >  .../bpf/configs/allowlist/ALLOWLIST-4.9.0     |    8 +
> > >  .../bpf/configs/allowlist/ALLOWLIST-5.5.0     |   55 +
> > >  .../selftests/bpf/configs/config-latest.s390x | 2711 +++++++++++++++
> > >  .../bpf/configs/config-latest.x86_64          | 3073  
> +++++++++++++++++
> >
> > Instead of checking in the full config please trim it to
> > relevant dependencies like existing selftests/bpf/config.
> > Otherwise every update/addition would trigger massive patches.

> Thanks for taking a look. Sure. Do we have some kind of tooling for that  
> or are
> there any suggestions on the best approach to minimize?

I would be interested to know as well if somebody knows some tricks on
how to deal with kconfig. I've spent some time yesterday manually
crafting various minimal bpf configs (for build tests), running make
olddefconfig and then verifying that all my options are still present in
the final config file.

It seems like kconfig tool can resolve some of the dependencies,
but there is a lot of if/endif that can break in non-obvious ways.
For example, putting CONFIG_TRACING=y and doing 'make olddefconfig'
won't get you CONFIG_TRACING=y in the final .config

So the only thing, for me, that helped, was to manually go through
the kconfig files trying to see what the dependencies are.
I've tried scripts/kconfig/merge_config.sh, but it doesn't
seem to bring anything new to the table..

So here is what I ended up with, I don't think it will help you that
much, but at least can highlight the moving parts (I was thinking that
maybe we can eventually put them in the CI as well to make sure all weird
configurations are build-tested?):

.config+all:CONFIG_MODULES=y
.config+all:CONFIG_HAVE_EBPF_JIT=y
.config+all:CONFIG_BPF=y
.config+all:CONFIG_BPF_SYSCALL=y
.config+all:CONFIG_BPF_JIT=y
.config+all:CONFIG_BPF_JIT_ALWAYS_ON=y
.config+all:CONFIG_BPF_JIT_DEFAULT_ON=y
.config+all:CONFIG_CGROUPS=y
.config+all:CONFIG_CGROUP_BPF=y
.config+all:CONFIG_SECURITY=y
.config+all:CONFIG_KPROBES=y
.config+all:CONFIG_TRACING=y
.config+all:CONFIG_FTRACE=y
.config+all:CONFIG_BPF_KPROBE_OVERRIDE=y
.config+all:CONFIG_BPF_EVENTS=y
.config+all:CONFIG_BPF_LSM=y
.config+all:CONFIG_NET=y
.config+all:CONFIG_INET=y
.config+all:CONFIG_NET_SCHED=y
.config+all:CONFIG_NET_CLS_ACT=y
.config+all:CONFIG_BPF_STREAM_PARSER=y
.config+all:CONFIG_NET_ACT_BPF=y
.config+all:CONFIG_NET_CLS_BPF=y
.config+all:CONFIG_TEST_BPF=y

.config-all:CONFIG_BPFILTER=n
.config-all:CONFIG_BPF=n
.config-all:CONFIG_BPF_JIT=n
.config-all:CONFIG_BPF_SYSCALL=n
.config-all:CONFIG_HAVE_EBPF_JIT=n
.config-all:CONFIG_NET_ACT_BPF=n
.config-all:CONFIG_NET_CLS_BPF=n
.config-all:CONFIG_TEST_BPF=n

.config-net:CONFIG_MODULES=y
.config-net:CONFIG_HAVE_EBPF_JIT=y
.config-net:CONFIG_BPF=y
.config-net:CONFIG_BPF_SYSCALL=y
.config-net:CONFIG_BPF_JIT=y
.config-net:CONFIG_BPF_JIT_ALWAYS_ON=y
.config-net:CONFIG_BPF_JIT_DEFAULT_ON=y
.config-net:CONFIG_CGROUPS=y
.config-net:CONFIG_CGROUP_BPF=y
.config-net:CONFIG_SECURITY=y
.config-net:CONFIG_KPROBES=y
.config-net:CONFIG_TRACING=y
.config-net:CONFIG_FTRACE=y
.config-net:CONFIG_BPF_KPROBE_OVERRIDE=y
.config-net:CONFIG_BPF_EVENTS=y
.config-net:CONFIG_BPF_LSM=y
.config-net:CONFIG_NET=n
.config-net:CONFIG_INET=n
.config-net:CONFIG_NET_SCHED=n
.config-net:CONFIG_NET_CLS_ACT=n
.config-net:CONFIG_BPF_STREAM_PARSER=n
.config-net:CONFIG_NET_ACT_BPF=n
.config-net:CONFIG_NET_CLS_BPF=n
.config-net:CONFIG_TEST_BPF=n

.config-cg-lsm:CONFIG_MODULES=y
.config-cg-lsm:CONFIG_HAVE_EBPF_JIT=y
.config-cg-lsm:CONFIG_BPF=y
.config-cg-lsm:CONFIG_BPF_SYSCALL=y
.config-cg-lsm:CONFIG_BPF_JIT=y
.config-cg-lsm:CONFIG_BPF_JIT_ALWAYS_ON=y
.config-cg-lsm:CONFIG_BPF_JIT_DEFAULT_ON=y
.config-cg-lsm:CONFIG_SECURITY=y
.config-cg-lsm:CONFIG_KPROBES=y
.config-cg-lsm:CONFIG_TRACING=n
.config-cg-lsm:CONFIG_FTRACE=n
.config-cg-lsm:CONFIG_BPF_EVENTS=n
.config-cg-lsm:CONFIG_BPF_LSM=n
.config-cg-lsm:CONFIG_CGROUP_BPF=n

.config+classic:CONFIG_MODULES=y
.config+classic:CONFIG_HAVE_EBPF_JIT=y
.config+classic:CONFIG_BPF=y
.config+classic:CONFIG_BPF_SYSCALL=n
.config+classic:CONFIG_BPF_JIT=y
.config+classic:CONFIG_BPF_JIT_ALWAYS_ON=n
.config+classic:CONFIG_BPF_JIT_DEFAULT_ON=n
.config+classic:CONFIG_SECURITY=n
.config+classic:CONFIG_KPROBES=n
.config+classic:CONFIG_TRACING=n
.config+classic:CONFIG_FTRACE=n
.config+classic:CONFIG_BPF_EVENTS=n
.config+classic:CONFIG_BPF_LSM=n
.config+classic:CONFIG_CGROUP_BPF=n
.config+classic:CONFIG_NET=y
.config+classic:CONFIG_INET=y

.config+syscall:CONFIG_MODULES=y
.config+syscall:CONFIG_HAVE_EBPF_JIT=y
.config+syscall:CONFIG_BPF=y
.config+syscall:CONFIG_BPF_SYSCALL=y
.config+syscall:CONFIG_BPF_JIT=n
.config+syscall:CONFIG_BPF_JIT_ALWAYS_ON=n
.config+syscall:CONFIG_BPF_JIT_DEFAULT_ON=n
.config+syscall:CONFIG_SECURITY=n
.config+syscall:CONFIG_KPROBES=n
.config+syscall:CONFIG_TRACING=n
.config+syscall:CONFIG_FTRACE=n
.config+syscall:CONFIG_BPF_EVENTS=n
.config+syscall:CONFIG_BPF_LSM=n
.config+syscall:CONFIG_CGROUP_BPF=n

.config-cg+lsm:CONFIG_MODULES=y
.config-cg+lsm:CONFIG_HAVE_EBPF_JIT=y
.config-cg+lsm:CONFIG_BPF=y
.config-cg+lsm:CONFIG_BPF_SYSCALL=y
.config-cg+lsm:CONFIG_BPF_JIT=y
.config-cg+lsm:CONFIG_BPF_JIT_ALWAYS_ON=y
.config-cg+lsm:CONFIG_BPF_JIT_DEFAULT_ON=y
.config-cg+lsm:CONFIG_SECURITY=y
.config-cg+lsm:CONFIG_KPROBES=y
.config-cg+lsm:CONFIG_TRACING=y
.config-cg+lsm:CONFIG_FTRACE=y
.config-cg+lsm:CONFIG_BPF_EVENTS=y
.config-cg+lsm:CONFIG_BPF_LSM=y
.config-cg+lsm:CONFIG_CGROUP_BPF=n

^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: reparent bpf maps on memcg offlining
From: Alexei Starovoitov @ 2022-07-12 22:15 UTC (permalink / raw)
  To: Shakeel Butt; +Cc: Roman Gushchin, bpf, Alexei Starovoitov, LKML
In-Reply-To: <CALvZod5uPV9cNKCMjs3HmadVnF--fum5BgG-Zcv1vTM_Bak8hw@mail.gmail.com>

On Tue, Jul 12, 2022 at 3:11 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Jul 12, 2022 at 2:49 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jul 11, 2022 at 9:28 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > The memory consumed by a mpf map is always accounted to the memory
> > > cgroup of the process which created the map. The map can outlive
> > > the memory cgroup if it's used by processes in other cgroups or
> > > is pinned on bpffs. In this case the map pins the original cgroup
> > > in the dying state.
> > >
> > > For other types of objects (slab objects, non-slab kernel allocations,
> > > percpu objects and recently LRU pages) there is a reparenting process
> > > implemented: on cgroup offlining charged objects are getting
> > > reassigned to the parent cgroup. Because all charges and statistics
> > > are fully recursive it's a fairly cheap operation.
> > >
> > > For efficiency and consistency with other types of objects, let's do
> > > the same for bpf maps. Fortunately thanks to the objcg API, the
> > > required changes are minimal.
> > >
> > > Please, note that individual allocations (slabs, percpu and large
> > > kmallocs) already have the reparenting mechanism. This commit adds
> > > it to the saved map->memcg pointer by replacing it to map->objcg.
> > > Because dying cgroups are not visible for a user and all charges are
> > > recursive, this commit doesn't bring any behavior changes for a user.
> > >
> > > v2:
> > >   added a missing const qualifier
> > >
> > > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > ---
> > >  include/linux/bpf.h  |  2 +-
> > >  kernel/bpf/syscall.c | 35 +++++++++++++++++++++++++++--------
> > >  2 files changed, 28 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 2b21f2a3452f..85a4db3e0536 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -221,7 +221,7 @@ struct bpf_map {
> > >         u32 btf_vmlinux_value_type_id;
> > >         struct btf *btf;
> > >  #ifdef CONFIG_MEMCG_KMEM
> > > -       struct mem_cgroup *memcg;
> > > +       struct obj_cgroup *objcg;
> > >  #endif
> > >         char name[BPF_OBJ_NAME_LEN];
> > >         struct bpf_map_off_arr *off_arr;
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index ab688d85b2c6..ef60dbc21b17 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -419,35 +419,52 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
> > >  #ifdef CONFIG_MEMCG_KMEM
> > >  static void bpf_map_save_memcg(struct bpf_map *map)
> > >  {
> > > -       map->memcg = get_mem_cgroup_from_mm(current->mm);
> > > +       /* Currently if a map is created by a process belonging to the root
> > > +        * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > > +        * So we have to check map->objcg for being NULL each time it's
> > > +        * being used.
> > > +        */
> > > +       map->objcg = get_obj_cgroup_from_current();
> > >  }
> > >
> > >  static void bpf_map_release_memcg(struct bpf_map *map)
> > >  {
> > > -       mem_cgroup_put(map->memcg);
> > > +       if (map->objcg)
> > > +               obj_cgroup_put(map->objcg);
> > > +}
> > > +
> > > +static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) {
> > > +       if (map->objcg)
> > > +               return get_mem_cgroup_from_objcg(map->objcg);
> > > +
> > > +       return root_mem_cgroup;
> > >  }
> > >
> > >  void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
> > >                            int node)
> > >  {
> > > -       struct mem_cgroup *old_memcg;
> > > +       struct mem_cgroup *memcg, *old_memcg;
> > >         void *ptr;
> > >
> > > -       old_memcg = set_active_memcg(map->memcg);
> > > +       memcg = bpf_map_get_memcg(map);
> > > +       old_memcg = set_active_memcg(memcg);
> > >         ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
> > >         set_active_memcg(old_memcg);
> > > +       mem_cgroup_put(memcg);
> >
> > Here we might css_put root_mem_cgroup.
> > Should we css_get it when returning or
> > it's marked as CSS_NO_REF ?
> > But mem_cgroup_alloc() doesn't seem to be doing that marking.
> > I'm lost at that code.
>
> CSS_NO_REF is set for root_mem_cgroup in cgroup_init_subsys().

Ahh. I see that
css = ss->css_alloc(NULL); css->flags |= CSS_NO_REF; now.
Thanks.

^ permalink raw reply

* Re: [net 0/3] seg6: fix skb checksum for SRH encapsulation/insertion
From: sdf @ 2022-07-12 22:14 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, David Lebrun,
	Mathieu Xhonneux, netdev, linux-kernel, bpf, Stefano Salsano,
	Paolo Lungaroni, Ahmed Abdelsalam, Anton Makarov
In-Reply-To: <20220712175837.16267-1-andrea.mayer@uniroma2.it>

On 07/12, Andrea Mayer wrote:
> The Linux kernel supports Segment Routing Header (SRH)
> encapsulation/insertion operations by providing the capability to: i)
> encapsulate a packet in an outer IPv6 header with a specified SRH; ii)
> insert a specified SRH directly after the IPv6 header of the packet.
> Note that the insertion operation is also referred to as 'injection'.

> The two operations are respectively supported by seg6_do_srh_encap() and
> seg6_do_srh_inline(), which operate on the skb associated to the packet as
> needed (e.g. adding the necessary headers and initializing them, while
> taking care to recalculate the skb checksum).

> seg6_do_srh_encap() and seg6_do_srh_inline() do not initialize the payload
> length of the IPv6 header, which is carried out by the caller functions.
> However, this approach causes the corruption of the skb checksum which
> needs to be updated only after initialization of headers is completed
> (thanks to Paolo Abeni for detecting this issue).

> The patchset fixes the skb checksum corruption by moving the IPv6 header
> payload length initialization from the callers of seg6_do_srh_encap() and
> seg6_do_srh_inline() directly into these functions.

> This patchset is organized as follows:
>   - patch 1/3, seg6: fix skb checksum evaluation in SRH
>     encapsulation/insertion;
>      (* SRH encapsulation/insertion available since v4.10)

>   - patch 2/3, seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps
>     behaviors;
>      (* SRv6 End.B6 and End.B6.Encaps behaviors available since v4.14)

>   - patch 3/3, seg6: bpf: fix skb checksum in bpf_push_seg6_encap();
>      (* bpf IPv6 Segment Routing helpers available since v4.18)

BPF changes make sense. I've tested them by applying the whole series and
running test_lwt_seg6local.sh.

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Stanislav Fomichev <sdf@google.com>


> Thank you all,
> Andrea

> Andrea Mayer (3):
>    seg6: fix skb checksum evaluation in SRH encapsulation/insertion
>    seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
>    seg6: bpf: fix skb checksum in bpf_push_seg6_encap()

>   net/core/filter.c        | 1 -
>   net/ipv6/seg6_iptunnel.c | 5 ++++-
>   net/ipv6/seg6_local.c    | 2 --
>   3 files changed, 4 insertions(+), 4 deletions(-)

> --
> 2.20.1


^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: reparent bpf maps on memcg offlining
From: Shakeel Butt @ 2022-07-12 22:11 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Roman Gushchin, bpf, Alexei Starovoitov, LKML
In-Reply-To: <CAADnVQ+2Az23WLHj_1pQWYXdd8CbeKooCLrkT_GnzKXV7Yp8hw@mail.gmail.com>

On Tue, Jul 12, 2022 at 2:49 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jul 11, 2022 at 9:28 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > The memory consumed by a mpf map is always accounted to the memory
> > cgroup of the process which created the map. The map can outlive
> > the memory cgroup if it's used by processes in other cgroups or
> > is pinned on bpffs. In this case the map pins the original cgroup
> > in the dying state.
> >
> > For other types of objects (slab objects, non-slab kernel allocations,
> > percpu objects and recently LRU pages) there is a reparenting process
> > implemented: on cgroup offlining charged objects are getting
> > reassigned to the parent cgroup. Because all charges and statistics
> > are fully recursive it's a fairly cheap operation.
> >
> > For efficiency and consistency with other types of objects, let's do
> > the same for bpf maps. Fortunately thanks to the objcg API, the
> > required changes are minimal.
> >
> > Please, note that individual allocations (slabs, percpu and large
> > kmallocs) already have the reparenting mechanism. This commit adds
> > it to the saved map->memcg pointer by replacing it to map->objcg.
> > Because dying cgroups are not visible for a user and all charges are
> > recursive, this commit doesn't bring any behavior changes for a user.
> >
> > v2:
> >   added a missing const qualifier
> >
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > ---
> >  include/linux/bpf.h  |  2 +-
> >  kernel/bpf/syscall.c | 35 +++++++++++++++++++++++++++--------
> >  2 files changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 2b21f2a3452f..85a4db3e0536 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -221,7 +221,7 @@ struct bpf_map {
> >         u32 btf_vmlinux_value_type_id;
> >         struct btf *btf;
> >  #ifdef CONFIG_MEMCG_KMEM
> > -       struct mem_cgroup *memcg;
> > +       struct obj_cgroup *objcg;
> >  #endif
> >         char name[BPF_OBJ_NAME_LEN];
> >         struct bpf_map_off_arr *off_arr;
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index ab688d85b2c6..ef60dbc21b17 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -419,35 +419,52 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
> >  #ifdef CONFIG_MEMCG_KMEM
> >  static void bpf_map_save_memcg(struct bpf_map *map)
> >  {
> > -       map->memcg = get_mem_cgroup_from_mm(current->mm);
> > +       /* Currently if a map is created by a process belonging to the root
> > +        * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > +        * So we have to check map->objcg for being NULL each time it's
> > +        * being used.
> > +        */
> > +       map->objcg = get_obj_cgroup_from_current();
> >  }
> >
> >  static void bpf_map_release_memcg(struct bpf_map *map)
> >  {
> > -       mem_cgroup_put(map->memcg);
> > +       if (map->objcg)
> > +               obj_cgroup_put(map->objcg);
> > +}
> > +
> > +static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) {
> > +       if (map->objcg)
> > +               return get_mem_cgroup_from_objcg(map->objcg);
> > +
> > +       return root_mem_cgroup;
> >  }
> >
> >  void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
> >                            int node)
> >  {
> > -       struct mem_cgroup *old_memcg;
> > +       struct mem_cgroup *memcg, *old_memcg;
> >         void *ptr;
> >
> > -       old_memcg = set_active_memcg(map->memcg);
> > +       memcg = bpf_map_get_memcg(map);
> > +       old_memcg = set_active_memcg(memcg);
> >         ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
> >         set_active_memcg(old_memcg);
> > +       mem_cgroup_put(memcg);
>
> Here we might css_put root_mem_cgroup.
> Should we css_get it when returning or
> it's marked as CSS_NO_REF ?
> But mem_cgroup_alloc() doesn't seem to be doing that marking.
> I'm lost at that code.

CSS_NO_REF is set for root_mem_cgroup in cgroup_init_subsys().

^ permalink raw reply

* Re: [PATCH bpf] bpf, x86: fix freeing of not-finalized bpf_prog_pack
From: Alexei Starovoitov @ 2022-07-12 22:09 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Daniel Borkmann, Kernel Team, Alexei Starovoitov,
	Andrii Nakryiko, syzbot+2f649ec6d2eea1495a8f,
	syzbot+87f65c75f4a72db05445
In-Reply-To: <20220706002612.4013790-1-song@kernel.org>

On Tue, Jul 5, 2022 at 5:26 PM Song Liu <song@kernel.org> wrote:
>
> syzbot reported a few issues with bpf_prog_pack [1], [2]. These are
> triggered when the program passed initial JIT in jit_subprogs(), but
> failed final pass of JIT. At this point, bpf_jit_binary_pack_free() is
> called before bpf_jit_binary_pack_finalize(), and the whole 2MB page is
> freed.
>
> Fix this with a custom bpf_jit_free() for x86_64, which calls
> bpf_jit_binary_pack_finalize() if necessary. Also, with custom
> bpf_jit_free(), bpf_prog_aux->use_bpf_prog_pack is not needed any more,
> remove it.
>
> Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc")
> [1] https://syzkaller.appspot.com/bug?extid=2f649ec6d2eea1495a8f
> [2] https://syzkaller.appspot.com/bug?extid=87f65c75f4a72db05445
> Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com
> Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  arch/x86/net/bpf_jit_comp.c | 25 +++++++++++++++++++++++++
>  include/linux/bpf.h         |  1 -
>  include/linux/filter.h      |  8 ++++++++
>  kernel/bpf/core.c           | 29 ++++++++++++-----------------
>  4 files changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c98b8c0ed3b8..c3dca4c97e48 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -2492,3 +2492,28 @@ void *bpf_arch_text_copy(void *dst, void *src, size_t len)
>                 return ERR_PTR(-EINVAL);
>         return dst;
>  }
> +
> +void bpf_jit_free(struct bpf_prog *prog)
> +{
> +       if (prog->jited) {
> +               struct x64_jit_data *jit_data = prog->aux->jit_data;
> +               struct bpf_binary_header *hdr;
> +
> +               /*
> +                * If we fail the final pass of JIT (from jit_subprogs),
> +                * the program may not be finalized yet. Call finalize here
> +                * before freeing it.
> +                */
> +               if (jit_data) {
> +                       bpf_jit_binary_pack_finalize(prog, jit_data->header,
> +                                                    jit_data->rw_header);
> +                       kvfree(jit_data->addrs);
> +                       kfree(jit_data);
> +               }

It looks like a workaround for missed cleanup on the JIT side.
When bpf_int_jit_compile() fails it is supposed to free jit_data
immediately.

> passed initial JIT in jit_subprogs(), but
> failed final pass of JIT. At this point, bpf_jit_binary_pack_free() is
> called before bpf_jit_binary_pack_finalize()

It feels that bpf_int_jit_compile() should call
bpf_jit_binary_pack_finalize() instead in the path where
it's failing.
I could be missing details on what exactly
"failed final pass of JIT" means.

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] selftests/bpf: Copy over libbpf configs
From: Daniel Müller @ 2022-07-12 21:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Kernel Team, Mykola Lysenko
In-Reply-To: <CAADnVQLLNQHHJuqd-pKzU09Uw3N-kBsztPy0ysYEKVipP=yMqw@mail.gmail.com>

On Tue, Jul 12, 2022 at 02:27:47PM -0700, Alexei Starovoitov wrote:
> On Tue, Jul 12, 2022 at 2:21 PM Daniel Müller <deso@posteo.net> wrote:
> >
> > This change integrates the libbpf maintained configurations and
> > black/white lists [0] into the repository, co-located with the BPF
> > selftests themselves. The only differences from the source is that we
> > replaced the terms blacklist & whitelist with denylist and allowlist,
> > respectively.
> >
> > [0] https://github.com/libbpf/libbpf/tree/20f03302350a4143825cedcbd210c4d7112c1898/travis-ci/vmtest/configs
> >
> > Signed-off-by: Daniel Müller <deso@posteo.net>
> > ---
> >  .../bpf/configs/allowlist/ALLOWLIST-4.9.0     |    8 +
> >  .../bpf/configs/allowlist/ALLOWLIST-5.5.0     |   55 +
> >  .../selftests/bpf/configs/config-latest.s390x | 2711 +++++++++++++++
> >  .../bpf/configs/config-latest.x86_64          | 3073 +++++++++++++++++
> 
> Instead of checking in the full config please trim it to
> relevant dependencies like existing selftests/bpf/config.
> Otherwise every update/addition would trigger massive patches.

Thanks for taking a look. Sure. Do we have some kind of tooling for that or are
there any suggestions on the best approach to minimize?

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: reparent bpf maps on memcg offlining
From: Alexei Starovoitov @ 2022-07-12 21:48 UTC (permalink / raw)
  To: Roman Gushchin; +Cc: bpf, Shakeel Butt, Alexei Starovoitov, LKML
In-Reply-To: <20220711162827.184743-1-roman.gushchin@linux.dev>

On Mon, Jul 11, 2022 at 9:28 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> The memory consumed by a mpf map is always accounted to the memory
> cgroup of the process which created the map. The map can outlive
> the memory cgroup if it's used by processes in other cgroups or
> is pinned on bpffs. In this case the map pins the original cgroup
> in the dying state.
>
> For other types of objects (slab objects, non-slab kernel allocations,
> percpu objects and recently LRU pages) there is a reparenting process
> implemented: on cgroup offlining charged objects are getting
> reassigned to the parent cgroup. Because all charges and statistics
> are fully recursive it's a fairly cheap operation.
>
> For efficiency and consistency with other types of objects, let's do
> the same for bpf maps. Fortunately thanks to the objcg API, the
> required changes are minimal.
>
> Please, note that individual allocations (slabs, percpu and large
> kmallocs) already have the reparenting mechanism. This commit adds
> it to the saved map->memcg pointer by replacing it to map->objcg.
> Because dying cgroups are not visible for a user and all charges are
> recursive, this commit doesn't bring any behavior changes for a user.
>
> v2:
>   added a missing const qualifier
>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> ---
>  include/linux/bpf.h  |  2 +-
>  kernel/bpf/syscall.c | 35 +++++++++++++++++++++++++++--------
>  2 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 2b21f2a3452f..85a4db3e0536 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -221,7 +221,7 @@ struct bpf_map {
>         u32 btf_vmlinux_value_type_id;
>         struct btf *btf;
>  #ifdef CONFIG_MEMCG_KMEM
> -       struct mem_cgroup *memcg;
> +       struct obj_cgroup *objcg;
>  #endif
>         char name[BPF_OBJ_NAME_LEN];
>         struct bpf_map_off_arr *off_arr;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ab688d85b2c6..ef60dbc21b17 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -419,35 +419,52 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
>  #ifdef CONFIG_MEMCG_KMEM
>  static void bpf_map_save_memcg(struct bpf_map *map)
>  {
> -       map->memcg = get_mem_cgroup_from_mm(current->mm);
> +       /* Currently if a map is created by a process belonging to the root
> +        * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> +        * So we have to check map->objcg for being NULL each time it's
> +        * being used.
> +        */
> +       map->objcg = get_obj_cgroup_from_current();
>  }
>
>  static void bpf_map_release_memcg(struct bpf_map *map)
>  {
> -       mem_cgroup_put(map->memcg);
> +       if (map->objcg)
> +               obj_cgroup_put(map->objcg);
> +}
> +
> +static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map) {
> +       if (map->objcg)
> +               return get_mem_cgroup_from_objcg(map->objcg);
> +
> +       return root_mem_cgroup;
>  }
>
>  void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, gfp_t flags,
>                            int node)
>  {
> -       struct mem_cgroup *old_memcg;
> +       struct mem_cgroup *memcg, *old_memcg;
>         void *ptr;
>
> -       old_memcg = set_active_memcg(map->memcg);
> +       memcg = bpf_map_get_memcg(map);
> +       old_memcg = set_active_memcg(memcg);
>         ptr = kmalloc_node(size, flags | __GFP_ACCOUNT, node);
>         set_active_memcg(old_memcg);
> +       mem_cgroup_put(memcg);

Here we might css_put root_mem_cgroup.
Should we css_get it when returning or
it's marked as CSS_NO_REF ?
But mem_cgroup_alloc() doesn't seem to be doing that marking.
I'm lost at that code.

^ permalink raw reply

* Re: [PATCH 2/8] perf evsel: Do not request ptrauth sample field if not supported
From: Vince Weaver @ 2022-07-12 21:30 UTC (permalink / raw)
  To: James Clark
  Cc: Andrew Kilroy, linux-kernel, linux-perf-users, acme,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Tom Rix,
	linux-arm-kernel, netdev, bpf, llvm
In-Reply-To: <8d282e87-c1c8-f7a0-631b-8d569c2154a6@arm.com>

On Mon, 11 Jul 2022, James Clark wrote:
> On 06/07/2022 17:01, Vince Weaver wrote:
> > So in this case you are leaking ARM64-specific info into the generic 
> > perf_event_open() call?  Is there any way the kernel could implement this 
> > without userspace having to deal with the issue?
> 
> The alternative to this change is just to call it "PERF_SAMPLE_POINTER_AUTH_MASK"
> and then it's not Arm specific, it's just that only Arm implements it for now.
> This is definitely an option.
> 
> But if no platform ever implements something similar then that bit is wasted.
> The intention of adding "PERF_SAMPLE_ARCH_1" was to prevent wasting that bit.
> But as you say, maybe making it arch specific isn't the right way either.

I don't know what the current kernel policy is on this kind of thing, but 
in the past perf_event_open was meant to be a generic as possible.
Having architecture-specific magic bits is best avoided.
However I'm not the maintainer for this so really my opinion doesn't 
really matter.

I'm just speaking up as a userspace coder who is trying to write 
cross-platform tools, and having to maintain obscure arch-specific code 
paths in every single utility ends up being a huge pain.  And isn't the 
whole point of an operating system to abstract this away?

> > can tell there haven't been any documentation patches included for the 
> > Makefile.
> 
> We plan to update the docs for the syscall, but it's in another repo, and
> we'll wait for this change to be finalised first. I'm not sure what you
> mean about the Makefile?

sorry, that was a mis-type.  I meant "manpage" not Makefile.

Vince Weaver
vincent.weaver@maine.edu

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] selftests/bpf: Copy over libbpf configs
From: Alexei Starovoitov @ 2022-07-12 21:27 UTC (permalink / raw)
  To: Daniel Müller
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Kernel Team, Mykola Lysenko
In-Reply-To: <20220712212124.3180314-2-deso@posteo.net>

On Tue, Jul 12, 2022 at 2:21 PM Daniel Müller <deso@posteo.net> wrote:
>
> This change integrates the libbpf maintained configurations and
> black/white lists [0] into the repository, co-located with the BPF
> selftests themselves. The only differences from the source is that we
> replaced the terms blacklist & whitelist with denylist and allowlist,
> respectively.
>
> [0] https://github.com/libbpf/libbpf/tree/20f03302350a4143825cedcbd210c4d7112c1898/travis-ci/vmtest/configs
>
> Signed-off-by: Daniel Müller <deso@posteo.net>
> ---
>  .../bpf/configs/allowlist/ALLOWLIST-4.9.0     |    8 +
>  .../bpf/configs/allowlist/ALLOWLIST-5.5.0     |   55 +
>  .../selftests/bpf/configs/config-latest.s390x | 2711 +++++++++++++++
>  .../bpf/configs/config-latest.x86_64          | 3073 +++++++++++++++++

Instead of checking in the full config please trim it to
relevant dependencies like existing selftests/bpf/config.
Otherwise every update/addition would trigger massive patches.

^ permalink raw reply

* [PATCH bpf-next 3/3] selftests/bpf: Adjust vmtest.sh to use local kernel configuration
From: Daniel Müller @ 2022-07-12 21:21 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kernel-team; +Cc: mykolal
In-Reply-To: <20220712212124.3180314-1-deso@posteo.net>

So far the vmtest.sh script, which can be used as a convenient way to
run bpf selftests, has obtained the kernel config safe to use for
testing from the libbpf/libbpf GitHub repository [0].
Given that we now have included this configuration into this very
repository, we can just consume it from here as well, eliminating the
necessity of remote accesses.
With this change we adjust the logic in the script to use the
configuration from below tools/testing/selftests/bpf/configs/ instead of
pulling it over the network.

[0]: https://github.com/libbpf/libbpf

Signed-off-by: Daniel Müller <deso@posteo.net>
---
 tools/testing/selftests/bpf/vmtest.sh | 28 ++++++++++++---------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/bpf/vmtest.sh b/tools/testing/selftests/bpf/vmtest.sh
index e0bb04a..fdab48 100755
--- a/tools/testing/selftests/bpf/vmtest.sh
+++ b/tools/testing/selftests/bpf/vmtest.sh
@@ -30,8 +30,7 @@ DEFAULT_COMMAND="./test_progs"
 MOUNT_DIR="mnt"
 ROOTFS_IMAGE="root.img"
 OUTPUT_DIR="$HOME/.bpf_selftests"
-KCONFIG_URL="https://raw.githubusercontent.com/libbpf/libbpf/master/travis-ci/vmtest/configs/config-latest.${ARCH}"
-KCONFIG_API_URL="https://api.github.com/repos/libbpf/libbpf/contents/travis-ci/vmtest/configs/config-latest.${ARCH}"
+KCONFIG_REL_PATH="tools/testing/selftests/bpf/configs/config-latest.${ARCH}"
 INDEX_URL="https://raw.githubusercontent.com/libbpf/ci/master/INDEX"
 NUM_COMPILE_JOBS="$(nproc)"
 LOG_FILE_BASE="$(date +"bpf_selftests.%Y-%m-%d_%H-%M-%S")"
@@ -271,20 +270,17 @@ is_rel_path()
 
 update_kconfig()
 {
-	local kconfig_file="$1"
-	local update_command="curl -sLf ${KCONFIG_URL} -o ${kconfig_file}"
-	# Github does not return the "last-modified" header when retrieving the
-	# raw contents of the file. Use the API call to get the last-modified
-	# time of the kernel config and only update the config if it has been
-	# updated after the previously cached config was created. This avoids
-	# unnecessarily compiling the kernel and selftests.
+	local kernel_checkout="$1"
+	local kconfig_file="$2"
+	local kconfig_src="${kernel_checkout}/${KCONFIG_REL_PATH}"
+	local update_command="cp ${kconfig_src} ${kconfig_file}"
 	if [[ -f "${kconfig_file}" ]]; then
-		local last_modified_date="$(curl -sL -D - "${KCONFIG_API_URL}" -o /dev/null | \
-			grep "last-modified" | awk -F ': ' '{print $2}')"
-		local remote_modified_timestamp="$(date -d "${last_modified_date}" +"%s")"
-		local local_creation_timestamp="$(stat -c %Y "${kconfig_file}")"
-
-		if [[ "${remote_modified_timestamp}" -gt "${local_creation_timestamp}" ]]; then
+		local src_modified="$(stat -c %Y "${kconfig_src}")"
+		local local_modified="$(stat -c %Y "${kconfig_file}")"
+		# Only update the config if it has been updated after the
+		# previously cached config was created. This avoids
+		# unnecessarily compiling the kernel and selftests.
+		if [[ "${src_modified}" -gt "${local_modified}" ]]; then
 			${update_command}
 		fi
 	else
@@ -372,7 +368,7 @@ main()
 
 	mkdir -p "${OUTPUT_DIR}"
 	mkdir -p "${mount_dir}"
-	update_kconfig "${kconfig_file}"
+	update_kconfig "${kernel_checkout}" "${kconfig_file}"
 
 	recompile_kernel "${kernel_checkout}" "${make_command}"
 
-- 
2.30.2


^ permalink raw reply

* [PATCH bpf-next 2/3] selftests/bpf: Integrate vmtest configs
From: Daniel Müller @ 2022-07-12 21:21 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kernel-team; +Cc: mykolal
In-Reply-To: <20220712212124.3180314-1-deso@posteo.net>

This change integrates the configuration from the vmtest repository [0],
where it is currently used for testing kernel patches into the existing
configuration pulled in with an earlier patch. The result is a super set
of the configs from the two repositories.

[0]: https://github.com/kernel-patches/vmtest/tree/831ee8eb72ddb7e03babb8f7e050d52a451237aa/travis-ci/vmtest/configs

Signed-off-by: Daniel Müller <deso@posteo.net>
---
 tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest | 5 +++++
 .../selftests/bpf/configs/denylist/DENYLIST-latest.s390x     | 1 +
 2 files changed, 6 insertions(+)

diff --git a/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest b/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest
index 939de574..ddf8a0c5 100644
--- a/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest
+++ b/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest
@@ -4,3 +4,8 @@ stacktrace_build_id_nmi
 stacktrace_build_id
 task_fd_query_rawtp
 varlen
+btf_dump/btf_dump: syntax
+kprobe_multi_test/bench_attach
+core_reloc/enum64val
+core_reloc/size___diff_sz
+core_reloc/type_based___diff_sz
diff --git a/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest.s390x b/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest.s390x
index e33cab..36574b0 100644
--- a/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest.s390x
+++ b/tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest.s390x
@@ -63,5 +63,6 @@ bpf_cookie                               # failed to open_and_load program: -524
 xdp_do_redirect                          # prog_run_max_size unexpected error: -22 (errno 22)
 send_signal                              # intermittently fails to receive signal
 select_reuseport                         # intermittently fails on new s390x setup
+tc_redirect/tc_redirect_dtime            # very flaky
 xdp_synproxy                             # JIT does not support calling kernel function                                (kfunc)
 unpriv_bpf_disabled                      # fentry
-- 
2.30.2


^ permalink raw reply related

* [PATCH bpf-next 0/3] Maintain selftest configuration in-tree
From: Daniel Müller @ 2022-07-12 21:21 UTC (permalink / raw)
  To: bpf, ast, andrii, daniel, kernel-team; +Cc: mykolal

BPF selftests mandate certain kernel configuration options to be present in
order to pass. Currently the "reference" config files containing these options
are hosted in a separate repository [0]. From there they are picked up by the
BPF continuous integration system as well as the in-tree vmtest.sh helper
script, which allows for running tests in a VM-based setup locally.

But it gets worse, as "BPF CI" is really two CI systems: one for libbpf
(mentioned above) and one for the bpf-next kernel repository (or more precisely:
family of repositories, as bpf-rc is using the system). As such, we have an
additional -- and slightly divergent -- copy of these configurations.

This patch set proposes the merging of said configurations into this repository.
Doing so provides several benefits:
1) the vmtest.sh script is now self-contained, no longer requiring to pull
   configurations over the network
2) we can have a single copy of these configurations, eliminating the
   maintenance burden of keeping two versions in-sync
3) the kernel tree is the place where most development happens, so it is the
   most natural to adjust configurations as changes are proposed there, as
   opposed to out-of-tree, where they would always remain an afterthought

The patch set is structed in such a way that we first integrate the first set of
configurations [0], then we add the difference to the second one [1], and lastly
we adjust the vmtest.sh script to pick up the local configuration instead of
reaching out to GitHub.

[0] https://github.com/libbpf/libbpf/tree/20f03302350a4143825cedcbd210c4d7112c1898/travis-ci/vmtest/configs
[1] https://github.com/kernel-patches/vmtest/tree/831ee8eb72ddb7e03babb8f7e050d52a451237aa/travis-ci/vmtest/configs

Daniel Müller (3):
  selftests/bpf: Copy over libbpf configs
  selftests/bpf: Integrate vmtest configs
  selftests/bpf: Adjust vmtest.sh to use local kernel configuration

 .../bpf/configs/allowlist/ALLOWLIST-4.9.0     |    8 +
 .../bpf/configs/allowlist/ALLOWLIST-5.5.0     |   55 +
 .../selftests/bpf/configs/config-latest.s390x | 2711 +++++++++++++++
 .../bpf/configs/config-latest.x86_64          | 3073 +++++++++++++++++
 .../bpf/configs/denylist/DENYLIST-5.5.0       |  117 +
 .../bpf/configs/denylist/DENYLIST-latest      |   11 +
 .../configs/denylist/DENYLIST-latest.s390x    |   68 +
 tools/testing/selftests/bpf/vmtest.sh         |   28 +-
 8 files changed, 6055 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/configs/allowlist/ALLOWLIST-4.9.0
 create mode 100644 tools/testing/selftests/bpf/configs/allowlist/ALLOWLIST-5.5.0
 create mode 100644 tools/testing/selftests/bpf/configs/config-latest.s390x
 create mode 100644 tools/testing/selftests/bpf/configs/config-latest.x86_64
 create mode 100644 tools/testing/selftests/bpf/configs/denylist/DENYLIST-5.5.0
 create mode 100644 tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest
 create mode 100644 tools/testing/selftests/bpf/configs/denylist/DENYLIST-latest.s390x

-- 
2.30.2


^ permalink raw reply

* [PATCH bpf-next v1] bpf: Tidy up verifier check_func_arg()
From: Joanne Koong @ 2022-07-12 21:06 UTC (permalink / raw)
  To: bpf; +Cc: andrii, daniel, ast, Joanne Koong

This patch does two things:

1. For matching against the arg type, the match should be against the
base type of the arg type, since the arg type can have different
bpf_type_flags set on it.

2. Uses switch casing to improve readability + efficiency.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 kernel/bpf/verifier.c | 66 +++++++++++++++++++++++++------------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 328cfab3af60..26e7e787c20a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5533,17 +5533,6 @@ static bool arg_type_is_mem_size(enum bpf_arg_type type)
 	       type == ARG_CONST_SIZE_OR_ZERO;
 }
 
-static bool arg_type_is_alloc_size(enum bpf_arg_type type)
-{
-	return type == ARG_CONST_ALLOC_SIZE_OR_ZERO;
-}
-
-static bool arg_type_is_int_ptr(enum bpf_arg_type type)
-{
-	return type == ARG_PTR_TO_INT ||
-	       type == ARG_PTR_TO_LONG;
-}
-
 static bool arg_type_is_release(enum bpf_arg_type type)
 {
 	return type & OBJ_RELEASE;
@@ -5929,7 +5918,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		meta->ref_obj_id = reg->ref_obj_id;
 	}
 
-	if (arg_type == ARG_CONST_MAP_PTR) {
+	switch (base_type(arg_type)) {
+	case ARG_CONST_MAP_PTR:
 		/* bpf_map_xxx(map_ptr) call: remember that map_ptr */
 		if (meta->map_ptr) {
 			/* Use map_uid (which is unique id of inner map) to reject:
@@ -5954,7 +5944,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		}
 		meta->map_ptr = reg->map_ptr;
 		meta->map_uid = reg->map_uid;
-	} else if (arg_type == ARG_PTR_TO_MAP_KEY) {
+		break;
+	case ARG_PTR_TO_MAP_KEY:
 		/* bpf_map_xxx(..., map_ptr, ..., key) call:
 		 * check that [key, key + map->key_size) are within
 		 * stack limits and initialized
@@ -5971,7 +5962,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->key_size, false,
 					      NULL);
-	} else if (base_type(arg_type) == ARG_PTR_TO_MAP_VALUE) {
+		break;
+	case ARG_PTR_TO_MAP_VALUE:
 		if (type_may_be_null(arg_type) && register_is_null(reg))
 			return 0;
 
@@ -5987,14 +5979,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		err = check_helper_mem_access(env, regno,
 					      meta->map_ptr->value_size, false,
 					      meta);
-	} else if (arg_type == ARG_PTR_TO_PERCPU_BTF_ID) {
+		break;
+	case ARG_PTR_TO_PERCPU_BTF_ID:
 		if (!reg->btf_id) {
 			verbose(env, "Helper has invalid btf_id in R%d\n", regno);
 			return -EACCES;
 		}
 		meta->ret_btf = reg->btf;
 		meta->ret_btf_id = reg->btf_id;
-	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
+		break;
+	case ARG_PTR_TO_SPIN_LOCK:
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			if (process_spin_lock(env, regno, true))
 				return -EACCES;
@@ -6005,12 +5999,15 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "verifier internal error\n");
 			return -EFAULT;
 		}
-	} else if (arg_type == ARG_PTR_TO_TIMER) {
+		break;
+	case ARG_PTR_TO_TIMER:
 		if (process_timer_func(env, regno, meta))
 			return -EACCES;
-	} else if (arg_type == ARG_PTR_TO_FUNC) {
+		break;
+	case ARG_PTR_TO_FUNC:
 		meta->subprogno = reg->subprogno;
-	} else if (base_type(arg_type) == ARG_PTR_TO_MEM) {
+		break;
+	case ARG_PTR_TO_MEM:
 		/* The access to this pointer is only checked when we hit the
 		 * next is_mem_size argument below.
 		 */
@@ -6020,11 +6017,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 						      fn->arg_size[arg], false,
 						      meta);
 		}
-	} else if (arg_type_is_mem_size(arg_type)) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
-
-		err = check_mem_size_reg(env, reg, regno, zero_size_allowed, meta);
-	} else if (arg_type_is_dynptr(arg_type)) {
+		break;
+	case ARG_CONST_SIZE:
+		err = check_mem_size_reg(env, reg, regno, false, meta);
+		break;
+	case ARG_CONST_SIZE_OR_ZERO:
+		err = check_mem_size_reg(env, reg, regno, true, meta);
+		break;
+	case ARG_PTR_TO_DYNPTR:
 		if (arg_type & MEM_UNINIT) {
 			if (!is_dynptr_reg_valid_uninit(env, reg)) {
 				verbose(env, "Dynptr has to be an uninitialized dynptr\n");
@@ -6058,21 +6058,28 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 				err_extra, arg + 1);
 			return -EINVAL;
 		}
-	} else if (arg_type_is_alloc_size(arg_type)) {
+		break;
+	case ARG_CONST_ALLOC_SIZE_OR_ZERO:
 		if (!tnum_is_const(reg->var_off)) {
 			verbose(env, "R%d is not a known constant'\n",
 				regno);
 			return -EACCES;
 		}
 		meta->mem_size = reg->var_off.value;
-	} else if (arg_type_is_int_ptr(arg_type)) {
+		break;
+	case ARG_PTR_TO_INT:
+	case ARG_PTR_TO_LONG:
+	{
 		int size = int_ptr_type_to_size(arg_type);
 
 		err = check_helper_mem_access(env, regno, size, false, meta);
 		if (err)
 			return err;
 		err = check_ptr_alignment(env, reg, 0, size, true);
-	} else if (arg_type == ARG_PTR_TO_CONST_STR) {
+		break;
+	}
+	case ARG_PTR_TO_CONST_STR:
+	{
 		struct bpf_map *map = reg->map_ptr;
 		int map_off;
 		u64 map_addr;
@@ -6111,9 +6118,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
-	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		break;
+	}
+	case ARG_PTR_TO_KPTR:
 		if (process_kptr_func(env, regno, meta))
 			return -EACCES;
+		break;
 	}
 
 	return err;
-- 
2.30.2


^ permalink raw reply related

* Re: [PATCH bpf-next v2 1/1] libbpf: perfbuf: allow raw access to buffers
From: Andrii Nakryiko @ 2022-07-12 21:03 UTC (permalink / raw)
  To: Jon Doron
  Cc: Alexei Starovoitov, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Jon Doron
In-Reply-To: <CAP7QCojcQ0kGMDO2BZH+zea7tKiWQqx_qo0KZ028hfX_2WLs9A@mail.gmail.com>

On Mon, Jul 11, 2022 at 10:47 PM Jon Doron <arilou@gmail.com> wrote:
>
>
>
> On Tue, Jul 12, 2022, 07:50 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>
>> On Mon, Jul 11, 2022 at 9:19 PM Jon Doron <arilou@gmail.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jul 12, 2022, 07:01 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>> >>
>> >> On Sun, Jul 10, 2022 at 10:07 AM Jon Doron <arilou@gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Sun, Jul 10, 2022, 18:16 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >> >>
>> >> >> On Sat, Jul 9, 2022 at 10:43 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >
>> >> >> > I was referring to the following:
>> >> >> > https://github.com/libbpf/libbpf-rs/blob/master/libbpf-rs/src/perf_buffer.rs
>> >> >>
>> >> >> How does your patch help libbpf-rs?
>> >> >>
>> >> >> Please don't top post.
>> >> >
>> >> >
>> >> > You will be able to implement a custom perf buffer consumer, as it already has good bindings with libbpf-sys which is built from the C headers
>> >> >
>> >> > Sorry for the top posting I'm not home and replying from my phone
>> >> >
>> >>
>> >> I can see us exposing per-CPU buffers for (very) advanced users, something like:
>> >>
>> >> int perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, void
>> >> **buf, size_t buf_sz);
>> >
>> >
>> > Not sure I'm fully following what this API does, you will get a pointer to a message in the ring buffer?
>> > If so how do you consume without setting up a new tail?
>> >
>> > Or do you get a full copy of the current ring buffer (because that will mean you would have to alloc and copy which might hurt performance), but in that case you no longer a set tail or drain function.
>>
>> No, it returns a pointer to mmap()'ed per-CPU buffer memory, including
>> its header page which contains head/tail positions. As I said, it's
>> for an advanced user, you need to know the layout and how to consume
>> data.
>
>
> Oh I see well that sounds perfect to me, do you want me to send a patch? (I'm currently on vacation abroad so I don't have access to my PC but I can get to it later this week or during the weekend

Yes, please do when you get a chance.

>
>>
>> >
>> > Also perhaps regardless if this patchset will be approved or not it would probably be nice to have something like
>> > int perf_buffer__state(perf_buffer__buffer(struct perf_buffer *pb, int buf_idx, size_t *free_space, size_t *used_space);
>> >
>> > Cheers,
>> > --Jon.
>> >
>> >>
>> >> Then in combination with perf_buffer__buffer_fd() you can implement
>> >> your own polling and processing. So you just use libbpf logic to setup
>> >> buffers, but then don't call perf_buffer__poll() at all and read
>> >> records and update tail on your own.
>> >>
>> >> But this combination of perf_buffer__raw_ring_buf() and
>> >> perf_buffer__set_ring_buf_tail() seems like a bad API, sorry.
>> >>
>> >>
>> >> >>
>> >> >> > Thanks,
>> >> >> > -- Jon.
>> >> >> >
>> >> >> > On Sun, Jul 10, 2022, 08:23 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On Fri, Jul 8, 2022 at 7:54 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > On 08/07/2022, Andrii Nakryiko wrote:
>> >> >> >> > >On Thu, Jul 7, 2022 at 11:04 PM Jon Doron <arilou@gmail.com> wrote:
>> >> >> >> > >>
>> >> >> >> > >> From: Jon Doron <jond@wiz.io>
>> >> >> >> > >>
>> >> >> >> > >> Add support for writing a custom event reader, by exposing the ring
>> >> >> >> > >> buffer state, and allowing to set it's tail.
>> >> >> >> > >>
>> >> >> >> > >> Few simple examples where this type of needed:
>> >> >> >> > >> 1. perf_event_read_simple is allocating using malloc, perhaps you want
>> >> >> >> > >>    to handle the wrap-around in some other way.
>> >> >> >> > >> 2. Since perf buf is per-cpu then the order of the events is not
>> >> >> >> > >>    guarnteed, for example:
>> >> >> >> > >>    Given 3 events where each event has a timestamp t0 < t1 < t2,
>> >> >> >> > >>    and the events are spread on more than 1 CPU, then we can end
>> >> >> >> > >>    up with the following state in the ring buf:
>> >> >> >> > >>    CPU[0] => [t0, t2]
>> >> >> >> > >>    CPU[1] => [t1]
>> >> >> >> > >>    When you consume the events from CPU[0], you could know there is
>> >> >> >> > >>    a t1 missing, (assuming there are no drops, and your event data
>> >> >> >> > >>    contains a sequential index).
>> >> >> >> > >>    So now one can simply do the following, for CPU[0], you can store
>> >> >> >> > >>    the address of t0 and t2 in an array (without moving the tail, so
>> >> >> >> > >>    there data is not perished) then move on the CPU[1] and set the
>> >> >> >> > >>    address of t1 in the same array.
>> >> >> >> > >>    So you end up with something like:
>> >> >> >> > >>    void **arr[] = [&t0, &t1, &t2], now you can consume it orderely
>> >> >> >> > >>    and move the tails as you process in order.
>> >> >> >> > >> 3. Assuming there are multiple CPUs and we want to start draining the
>> >> >> >> > >>    messages from them, then we can "pick" with which one to start with
>> >> >> >> > >>    according to the remaining free space in the ring buffer.
>> >> >> >> > >>
>> >> >> >> > >
>> >> >> >> > >All the above use cases are sufficiently advanced that you as such an
>> >> >> >> > >advanced user should be able to write your own perfbuf consumer code.
>> >> >> >> > >There isn't a lot of code to set everything up, but then you get full
>> >> >> >> > >control over all the details.
>> >> >> >> > >
>> >> >> >> > >I don't see this API as a generally useful, it feels way too low-level
>> >> >> >> > >and special for inclusion in libbpf.
>> >> >> >> > >
>> >> >> >> >
>> >> >> >> > Hi Andrii,
>> >> >> >> >
>> >> >> >> > I understand, but I was still hoping you will be willing to expose this
>> >> >> >> > API.
>> >> >> >> > libbpf has very simple and nice binding to Rust and other languages,
>> >> >> >> > implementing one of those use cases in the bindings can make things much
>> >> >> >> > simpler than using some libc or syscall APIs, instead of enjoying all
>> >> >> >> > the simplicity that you get for free in libbpf.
>> >> >> >> >
>> >> >> >> > Hope you will be willing to reconsider :)
>> >> >> >>
>> >> >> >> The discussion would have been different if you mentioned that
>> >> >> >> motivation in the commit logs.
>> >> >> >> Please provide links to "Rust and other languages" code that
>> >> >> >> uses this api.

^ permalink raw reply

* Re: [RFC PATCH v1 1/1] bpftool: Add generating command to C dumped file.
From: sdf @ 2022-07-12 20:47 UTC (permalink / raw)
  To: Francis Laniel
  Cc: bpf, Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, open list
In-Reply-To: <20220712184225.52429-2-flaniel@linux.microsoft.com>

On 07/12, Francis Laniel wrote:
> This commit adds the following lines to file generated by dump:
> /*
>   * File generated by bpftool using:
>   * bpftool btf dump file /sys/kernel/btf/vmlinux format c
>   * DO NOT EDIT.
>   */
> This warns users to not edit the file and documents the command used to
> generate the file.

> Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
> ---
>   tools/bpf/bpftool/btf.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)

> diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
> index 7e6accb9d9f7..eecfc27370c3 100644
> --- a/tools/bpf/bpftool/btf.c
> +++ b/tools/bpf/bpftool/btf.c
> @@ -415,7 +415,8 @@ static void __printf(2, 0) btf_dump_printf(void *ctx,
>   }

>   static int dump_btf_c(const struct btf *btf,
> -		      __u32 *root_type_ids, int root_type_cnt)
> +		      __u32 *root_type_ids, int root_type_cnt,
> +		      int argc, char **argv)
>   {
>   	struct btf_dump *d;
>   	int err = 0, i;
> @@ -425,6 +426,14 @@ static int dump_btf_c(const struct btf *btf,
>   	if (err)
>   		return err;

> +	printf("/*\n");
> +	printf(" * File generated by bpftool using:\n");
> +	printf(" * bpftool btf dump");

[..]

> +	for (i = 0; i < argc; i++)
> +		printf(" %s", argv[i]);

Do we really need that complexity to preserve the arguments?
For skeletons we're simply doing:

	/* THIS FILE IS AUTOGENERATED BY BPFTOOL! */

So probably the same should be fine here?

Also, while at it, might be worth adding SPDX license comment? So let's
align with whatever we have in gen.c ?


> +	printf("\n");
> +	printf(" * DO NOT EDIT.\n");
> +	printf(" */\n");
>   	printf("#ifndef __VMLINUX_H__\n");
>   	printf("#define __VMLINUX_H__\n");
>   	printf("\n");
> @@ -507,8 +516,10 @@ static bool btf_is_kernel_module(__u32 btf_id)
>   static int do_dump(int argc, char **argv)
>   {
>   	struct btf *btf = NULL, *base = NULL;
> +	char **orig_argv = argv;
>   	__u32 root_type_ids[2];
>   	int root_type_cnt = 0;
> +	int orig_argc = argc;
>   	bool dump_c = false;
>   	__u32 btf_id = -1;
>   	const char *src;
> @@ -649,7 +660,8 @@ static int do_dump(int argc, char **argv)
>   			err = -ENOTSUP;
>   			goto done;
>   		}
> -		err = dump_btf_c(btf, root_type_ids, root_type_cnt);
> +		err = dump_btf_c(btf, root_type_ids, root_type_cnt,
> +				 orig_argc, orig_argv);
>   	} else {
>   		err = dump_btf_raw(btf, root_type_ids, root_type_cnt);
>   	}
> --
> 2.25.1


^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Don't redirect packets with invalid pkt_len
From: Daniel Borkmann @ 2022-07-12 20:12 UTC (permalink / raw)
  To: sdf, Zhengchao Shao
  Cc: bpf, netdev, linux-kernel, davem, edumazet, kuba, pabeni, hawk,
	ast, andrii, martin.lau, song, yhs, john.fastabend, kpsingh,
	weiyongjun1, yuehaibing
In-Reply-To: <Ys2oPzt7Yn1oMou8@google.com>

On 7/12/22 6:58 PM, sdf@google.com wrote:
> On 07/12, Zhengchao Shao wrote:
>> Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
>> skbs, that is, the flow->head is null.
>> The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
>> run a bpf prog which redirects empty skbs.
>> So we should determine whether the length of the packet modified by bpf
>> prog or others like bpf_prog_test is valid before forwarding it directly.
> 
>> LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
>> LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html
> 
>> Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>   net/core/filter.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 4ef77ec5255e..27801b314960 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -2122,6 +2122,11 @@ static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
>>   {
>>       unsigned int mlen = skb_network_offset(skb);
> 
>> +    if (unlikely(skb->len == 0)) {
>> +        kfree_skb(skb);
>> +        return -EINVAL;
>> +    }
>> +
>>       if (mlen) {
>>           __skb_pull(skb, mlen);
> 
>> @@ -2143,7 +2148,9 @@ static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
>>                    u32 flags)
>>   {
>>       /* Verify that a link layer header is carried */
>> -    if (unlikely(skb->mac_header >= skb->network_header)) {
>> +    if (unlikely(skb->mac_header >= skb->network_header) ||
>> +        (min_t(u32, skb_mac_header_len(skb), skb->len) <
>> +         (u32)dev->min_header_len)) {
> 
> Why check skb->len != 0 above but skb->len < dev->min_header_len here?
> I guess it doesn't make sense in __bpf_redirect_no_mac because we know
> that mac is empty, but why do we care in __bpf_redirect_common?
> Why not put this check in the common __bpf_redirect?
> 
> Also, it's still not clear to me whether we should bake it into the core
> stack vs having some special checks from test_prog_run only. I'm
> assuming the issue is that we can construct illegal skbs with that
> test_prog_run interface, so maybe start by fixing that?

Agree, ideally we can prevent it right at the source rather than adding
more tests into the fast-path.

> Did you have a chance to look at the reproducer more closely? What
> exactly is it doing?
> 
>>           kfree_skb(skb);
>>           return -ERANGE;
>>       }
>> -- 
>> 2.17.1
> 


^ permalink raw reply

* Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
From: Mina Almasry @ 2022-07-12 19:11 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tejun Heo, Michal Hocko, Yosry Ahmed, Muchun Song,
	Johannes Weiner, Yafang Shao, Alexei Starovoitov, Matthew Wilcox,
	Christoph Hellwig, David S. Miller, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Kernel Team, linux-mm,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka
In-Reply-To: <CALvZod6Y3p1NZwSQe6+UWpY88iaOBrZXS5c5+uzMb+9sY1ziwg@mail.gmail.com>

On Tue, Jul 12, 2022 at 11:11 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> Ccing Mina who actually worked on upstreaming this. See [1] for
> previous discussion and more use-cases.
>
> [1] https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/
>
> On Tue, Jul 12, 2022 at 10:36 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > Hello,
> >
> > On Tue, Jul 12, 2022 at 10:26:22AM -0700, Shakeel Butt wrote:
> > > One use-case we have is a build & test service which runs independent
> > > builds and tests but all the build utilities (compiler, linker,
> > > libraries) are shared between those builds and tests.
> > >
> > > In terms of topology, the service has a top level cgroup (P) and all
> > > independent builds and tests run in their own cgroup under P. These
> > > builds/tests continuously come and go.
> > >
> > > This service continuously monitors all the builds/tests running and
> > > may kill some based on some criteria which includes memory usage.
> > > However the memory usage is nondeterministic and killing a specific
> > > build/test may not really free memory if most of the memory charged to
> > > it is from shared build utilities.
> >
> > That doesn't sound too unusual. So, one saving grace here is that the memory
> > pressure in the stressed cgroup should trigger reclaim of the shared memory
> > which will be likely picked up by someone else, hopefully, under less memory
> > pressure. Can you give more concerete details? ie. describe a failing
> > scenario with actual ballpark memory numbers?
>
> Mina, can you please provide details requested by Tejun?
>

As far as I am aware the builds/tests service Shakeel mentioned is a
theoretical use case we're considering, but the actual use cases we're
running are the 3 I listed in my cover letter in my original proposal:

https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/

Still, the use case Shakeel is talking about is almost identical to
use case #2 in that proposal:
"Our infrastructure has large meta jobs such as kubernetes which spawn
multiple subtasks which share a tmpfs mount. These jobs and its
subtasks use that tmpfs mount for various purposes such as data
sharing or persistent data between the subtask restarts. In kubernetes
terminology, the meta job is similar to pods and subtasks are
containers under pods. We want the shared memory to be
deterministically charged to the kubernetes's pod and independent to
the lifetime of containers under the pod."

To run such a job we do the following:

- We setup a hierarchy like so:
                   pod_container
                  /           |                 \
container_a    container_b     container_c

- We set up a tmpfs mount with memcg= pod_container. This instructs
the kernel to charge all of this tmpfs user data to pod_container,
instead of the memcg of the task which faults in the shared memory.

- We set up the pod_container.max to be the maximum amount of memory
allowed to the _entire_ job.

- We set up container_a.max, container_b.max, and container_c.max to
be the limit of each of sub-tasks a, b, and c respectively, not
including the shared memory, which is allocated via the tmpfs mount
and charged directly to pod_container.


For some rough numbers, you can imagine a scenario:

tmpfs memcg=pod_container,size=100MB

                                 pod_container.max=130MB
                    /                           |
             \
container_a.max=10MB    container_b.max=20MB    container_c.max=30MB


Thanks to memcg=pod_container, neither tasks a, b, and c are charged
for the shared memory, so they can stay within their 10MB, 20MB, and
30MB limits respectively. This gives us fine grained control to
deterministically charge the shared memory and apply limits on the
memory usage of the individual sub-tasks and the overall amount of
memory the entire pod should consume.

For transparency's sake, this is Johannes's comments on the API:
https://lore.kernel.org/linux-mm/YZvppKvUPTIytM%2Fc@cmpxchg.org/

As Tejun puts it:

"it may make sense to have a way to escape certain resources to an ancestor for
shared resources provided that we can come up with a sane interface"

The interface Johannes has opted for is to reparent memory to the
common ancestor _when it is accessed by a task in another memcg_. This
doesn't work for us for a few reasons, one of which in the example
above container_a may get charged for all the 100MB of shared memory
if it's the unlucky task that faults in all the shared memory.


> >
> > FWIW, at least from generic resource constrol standpoint, I think it may
> > make sense to have a way to escape certain resources to an ancestor for
> > shared resources provided that we can come up with a sane interface.
> >
> > Thanks.
> >
> > --
> > tejun

^ permalink raw reply

* Re: [PATCH v6 bpf-next 0/5] bpf_prog_pack followup
From: Luis Chamberlain @ 2022-07-12 19:04 UTC (permalink / raw)
  To: Song Liu
  Cc: Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Masami Hiramatsu, Naveen N. Rao, David S. Miller,
	Anil S Keshavamurthy, Kees Cook, Song Liu, bpf, Christoph Hellwig,
	Davidlohr Bueso, lkml, Linux-MM, Daniel Borkmann, Kernel Team,
	x86@kernel.org, dave.hansen@linux.intel.com,
	rick.p.edgecombe@intel.com, linux-modules@vger.kernel.org
In-Reply-To: <E23B6EB1-AFFA-4B65-963E-B44BA0F2142D@fb.com>

On Tue, Jul 12, 2022 at 05:49:32AM +0000, Song Liu wrote:
> > On Jul 11, 2022, at 9:18 PM, Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> > I believe you are mentioning requiring text_poke() because the way
> > eBPF code uses the module_alloc() is different. Correct me if I'm
> > wrong, but from what I gather is you use the text_poke_copy() as the data
> > is already RO+X, contrary module_alloc() use cases. You do this since your
> > bpf_prog_pack_alloc() calls set_memory_ro() and set_memory_x() after
> > module_alloc() and before you can use this memory. This is a different type
> > of allocator. And, again please correct me if I'm wrong but now you want to
> > share *one* 2 MiB huge-page for multiple BPF programs to help with the
> > impact of TLB misses.
> 
> Yes, sharing 1x 2MiB huge page is the main reason to require text_poke. 
> OTOH, 2MiB huge pages without sharing is not really useful. Both kprobe
> and ftrace only uses a fraction of a 4kB page. Most BPF programs and 
> modules cannot use 2MiB either. Therefore, vmalloc_rw_exec() doesn't add
> much value on top of current module_alloc(). 

Thanks for the clarification.

> > A vmalloc_ro_exec() by definition would imply a text_poke().
> > 
> > Can kprobes, ftrace and modules use it too? It would be nice
> > so to not have to deal with the loose semantics on the user to
> > have to use set_vm_flush_reset_perms() on ro+x later, but
> > I think this can be addressed separately on a case by case basis.
> 
> I am pretty confident that kprobe and ftrace can share huge pages with 
> BPF programs.

Then wonderful, we know where to go in terms of a new API then as it
can be shared in the future for sure and there are gains.

> I haven't looked into all the details with modules, but 
> given CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC, I think it is also 
> possible.

Sure.

> Once this is done, a regular system (without huge BPF program or huge
> modules) will just use 1x 2MB page for text from module, ftrace, kprobe, 
> and bpf programs. 

That would be nice, if possible, however modules will require likely its
own thing, on my system I see about 57 MiB used on coresize alone.

lsmod | grep -v Module | cut -f1 -d ' ' | \
	xargs sudo modinfo | grep filename | \
	grep -o '/.*' | xargs stat -c "%s - %n" | \
	awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'
60001272

And so perhaps we need such a pool size to be configurable.

> > But a vmalloc_ro_exec() with a respective free can remove the
> > requirement to do set_vm_flush_reset_perms().
> 
> Removing the requirement to set_vm_flush_reset_perms() is the other
> reason to go directly to vmalloc_ro_exec(). 

Yes fantastic.

> My current version looks like this:
> 
> void *vmalloc_exec(unsigned long size);
> void vfree_exec(void *ptr, unsigned int size);
> 
> ro is eliminated as there is no rw version of the API. 

Alright.

I am not sure if 2 MiB will suffice given what I mentioned above, and
what to do to ensure this grows at a reasonable pace. Then, at least for
usage for all architectures since not all will support text_poke() we
will want to consider a way to make it easy to users to use non huge
page fallbacks, but that would be up to those users, so we can wait for
that.

> The ugly part is @size for vfree_exec(). We need it to share huge 
> pages. 

I suppose this will become evident during patch review.

> Under the hood, it looks similar to current bpf_prog_pack_alloc
> and bpf_prog_pack_free. 

Groovy.

  Luis

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
From: Alexei Starovoitov @ 2022-07-12 18:43 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Tejun Heo, Mina Almasry, Michal Hocko, Yosry Ahmed, Muchun Song,
	Johannes Weiner, Yafang Shao, Matthew Wilcox, Christoph Hellwig,
	David S. Miller, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Kernel Team, linux-mm, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka
In-Reply-To: <CALvZod6Y3p1NZwSQe6+UWpY88iaOBrZXS5c5+uzMb+9sY1ziwg@mail.gmail.com>

On Tue, Jul 12, 2022 at 11:11:25AM -0700, Shakeel Butt wrote:
> Ccing Mina who actually worked on upstreaming this. See [1] for
> previous discussion and more use-cases.
> 
> [1] https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/

Doesn't look like that it landed upstream?

For bpf side we're thinking of something similar.
We cannot do memcg= mount option, of course.
Instead memcg path or FD will passed to bpf side to be used later.
So the user can select a memcg instead of taking it from current.

Yafang,
I'm assuming you're working on something like this?

^ permalink raw reply

* [PATCH v7 7/7] selftests/bpf: Add test for bpf_verify_pkcs7_signature() helper
From: Roberto Sassu @ 2022-07-12 18:41 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, song, yhs,
	kpsingh, dhowells, jarkko, shuah
  Cc: bpf, keyrings, linux-security-module, linux-kselftest, netdev,
	linux-kernel, Roberto Sassu
In-Reply-To: <20220712184128.999301-1-roberto.sassu@huawei.com>

Perform several tests to ensure the correct implementation of the
bpf_verify_pkcs7_signature() helper.

Do the tests with data signed with a generated testing key (by using
sign-file from scripts/) and with the tcp_bic.ko kernel module if it is
found in the system. The test does not fail if tcp_bic.ko is not found.

First, ensure that bpf_verify_pkcs7_signature() rejects invalid parameters.

Then, perform a successful signature verification with the session keyring
and a new one created for testing.

Then, ensure that permission and validation checks are done properly on the
keyring provided to bpf_verify_pkcs7_signature(), despite those checks were
deferred at the time the keyring was retrieved with bpf_lookup_user_key().
The tests expect to encounter an error if the Search permission is removed
from the keyring, or the keyring is expired.

Finally, perform a successful and unsuccessful signature verification with
the keyrings with pre-determined IDs (the last test fails because the key
is not in the platform keyring).

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 tools/testing/selftests/bpf/Makefile          |  14 +-
 tools/testing/selftests/bpf/config            |   2 +
 .../bpf/prog_tests/verify_pkcs7_sig.c         | 410 ++++++++++++++++++
 .../bpf/progs/test_verify_pkcs7_sig.c         |  90 ++++
 .../testing/selftests/bpf/verify_sig_setup.sh | 104 +++++
 5 files changed, 617 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
 create mode 100755 tools/testing/selftests/bpf/verify_sig_setup.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8d59ec7f4c2d..5ae079e276b3 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -14,6 +14,7 @@ BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
 APIDIR := $(TOOLSINCDIR)/uapi
 GENDIR := $(abspath ../../../../include/generated)
 GENHDR := $(GENDIR)/autoconf.h
+HOSTPKG_CONFIG := pkg-config
 
 ifneq ($(wildcard $(GENHDR)),)
   GENFLAGS := -DHAVE_GENHDR
@@ -75,7 +76,7 @@ TEST_PROGS := test_kmod.sh \
 	test_xsk.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
-	with_tunnels.sh ima_setup.sh \
+	with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
 	test_xdp_vlan.sh test_bpftool.py
 
 # Compile but not part of 'make run_tests'
@@ -84,7 +85,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	test_lirc_mode2_user xdping test_cpp runqslower bench bpf_testmod.ko \
 	xskxceiver xdp_redirect_multi xdp_synproxy
 
-TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
+TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read $(OUTPUT)/sign-file
 
 # Emit succinct information message describing current building step
 # $1 - generic step name (e.g., CC, LINK, etc);
@@ -189,6 +190,12 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
 		     -fuse-ld=$(LLD) -Wl,-znoseparate-code		       \
 		     -Wl,-rpath=. -Wl,--build-id=sha1 -o $@
 
+$(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
+	$(call msg,SIGN-FILE,,$@)
+	$(Q)$(CC) $(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null) \
+		  $< -o $@ \
+		  $(shell $(HOSTPKG_CONFIG) --libs libcrypto 2> /dev/null || echo -lcrypto)
+
 $(OUTPUT)/bpf_testmod.ko: $(VMLINUX_BTF) $(wildcard bpf_testmod/Makefile bpf_testmod/*.[ch])
 	$(call msg,MOD,,$@)
 	$(Q)$(RM) bpf_testmod/bpf_testmod.ko # force re-compilation
@@ -514,7 +521,8 @@ TRUNNER_EXTRA_SOURCES := test_progs.c cgroup_helpers.c trace_helpers.c	\
 TRUNNER_EXTRA_FILES := $(OUTPUT)/urandom_read $(OUTPUT)/bpf_testmod.ko	\
 		       $(OUTPUT)/liburandom_read.so			\
 		       $(OUTPUT)/xdp_synproxy				\
-		       ima_setup.sh					\
+		       $(OUTPUT)/sign-file				\
+		       ima_setup.sh verify_sig_setup.sh			\
 		       $(wildcard progs/btf_dump_test_case_*.c)
 TRUNNER_BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
 TRUNNER_BPF_CFLAGS := $(BPF_CFLAGS) $(CLANG_CFLAGS) -DENABLE_ATOMICS_TESTS
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index c05904d631ec..76b65acd897e 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -63,3 +63,5 @@ CONFIG_NETFILTER_XT_MATCH_STATE=y
 CONFIG_IP_NF_FILTER=y
 CONFIG_IP_NF_TARGET_SYNPROXY=y
 CONFIG_IP_NF_RAW=y
+CONFIG_MODULE_SIG=y
+CONFIG_KEYS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c b/tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
new file mode 100644
index 000000000000..05f7580e34ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Roberto Sassu <roberto.sassu@huawei.com>
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <endian.h>
+#include <limits.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <sys/mman.h>
+#include <linux/keyctl.h>
+#include <test_progs.h>
+
+#include "test_verify_pkcs7_sig.skel.h"
+
+#define MAX_DATA_SIZE (1024 * 1024)
+#define MAX_SIG_SIZE 1024
+#define LOG_BUF_SIZE 16384
+
+#define VERIFY_USE_SECONDARY_KEYRING (1UL)
+#define VERIFY_USE_PLATFORM_KEYRING  (2UL)
+
+/* In stripped ARM and x86-64 modules, ~ is surprisingly rare. */
+#define MODULE_SIG_STRING "~Module signature appended~\n"
+
+/*
+ * Module signature information block.
+ *
+ * The constituents of the signature section are, in order:
+ *
+ *	- Signer's name
+ *	- Key identifier
+ *	- Signature data
+ *	- Information block
+ */
+struct module_signature {
+	u8	algo;		/* Public-key crypto algorithm [0] */
+	u8	hash;		/* Digest algorithm [0] */
+	u8	id_type;	/* Key identifier type [PKEY_ID_PKCS7] */
+	u8	signer_len;	/* Length of signer's name [0] */
+	u8	key_id_len;	/* Length of key identifier [0] */
+	u8	__pad[3];
+	__be32	sig_len;	/* Length of signature data */
+};
+
+struct data {
+	u8 data[MAX_DATA_SIZE];
+	u32 data_len;
+	u8 sig[MAX_SIG_SIZE];
+	u32 sig_len;
+};
+
+static int _run_setup_process(const char *setup_dir, const char *cmd)
+{
+	int child_pid, child_status;
+
+	child_pid = fork();
+	if (child_pid == 0) {
+		execlp("./verify_sig_setup.sh", "./verify_sig_setup.sh", cmd,
+		       setup_dir, NULL);
+		exit(errno);
+
+	} else if (child_pid > 0) {
+		waitpid(child_pid, &child_status, 0);
+		return WEXITSTATUS(child_status);
+	}
+
+	return -EINVAL;
+}
+
+static int populate_data_item_str(const char *tmp_dir, struct data *data_item)
+{
+	struct stat st;
+	char data_template[] = "/tmp/dataXXXXXX";
+	char path[PATH_MAX];
+	int ret, fd, child_status, child_pid;
+
+	data_item->data_len = 4;
+	memcpy(data_item->data, "test", data_item->data_len);
+
+	fd = mkstemp(data_template);
+	if (fd == -1)
+		return -errno;
+
+	ret = write(fd, data_item->data, data_item->data_len);
+
+	close(fd);
+
+	if (ret != data_item->data_len) {
+		ret = -EIO;
+		goto out;
+	}
+
+	child_pid = fork();
+
+	if (child_pid == -1) {
+		ret = -errno;
+		goto out;
+	}
+
+	if (child_pid == 0) {
+		snprintf(path, sizeof(path), "%s/signing_key.pem", tmp_dir);
+
+		return execlp("./sign-file", "./sign-file", "-d", "sha256",
+			      path, path, data_template, NULL);
+	}
+
+	waitpid(child_pid, &child_status, 0);
+
+	ret = WEXITSTATUS(child_status);
+	if (ret)
+		goto out;
+
+	snprintf(path, sizeof(path), "%s.p7s", data_template);
+
+	ret = stat(path, &st);
+	if (ret == -1) {
+		ret = -errno;
+		goto out;
+	}
+
+	if (st.st_size > sizeof(data_item->sig)) {
+		ret = -EINVAL;
+		goto out_sig;
+	}
+
+	data_item->sig_len = st.st_size;
+
+	fd = open(path, O_RDONLY);
+	if (fd == -1) {
+		ret = -errno;
+		goto out_sig;
+	}
+
+	ret = read(fd, data_item->sig, data_item->sig_len);
+
+	close(fd);
+
+	if (ret != data_item->sig_len) {
+		ret = -EIO;
+		goto out_sig;
+	}
+
+	ret = 0;
+out_sig:
+	unlink(path);
+out:
+	unlink(data_template);
+	return ret;
+}
+
+static int populate_data_item_mod(struct data *data_item)
+{
+	char mod_path[PATH_MAX], *mod_path_ptr;
+	struct stat st;
+	void *mod;
+	FILE *fp;
+	struct module_signature ms;
+	int ret, fd, modlen, marker_len, sig_len;
+
+	data_item->data_len = 0;
+
+	if (stat("/lib/modules", &st) == -1)
+		return 0;
+
+	/* Requires CONFIG_TCP_CONG_BIC=m. */
+	fp = popen("find /lib/modules/$(uname -r) -name tcp_bic.ko", "r");
+	if (!fp)
+		return 0;
+
+	mod_path_ptr = fgets(mod_path, sizeof(mod_path), fp);
+	pclose(fp);
+
+	if (!mod_path_ptr)
+		return 0;
+
+	mod_path_ptr = strchr(mod_path, '\n');
+	if (!mod_path_ptr)
+		return 0;
+
+	*mod_path_ptr = '\0';
+
+	if (stat(mod_path, &st) == -1)
+		return 0;
+
+	modlen = st.st_size;
+	marker_len = sizeof(MODULE_SIG_STRING) - 1;
+
+	fd = open(mod_path, O_RDONLY);
+	if (fd == -1)
+		return -errno;
+
+	mod = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+	close(fd);
+
+	if (mod == MAP_FAILED)
+		return -errno;
+
+	if (strncmp(mod + modlen - marker_len, MODULE_SIG_STRING, marker_len)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	modlen -= marker_len;
+
+	memcpy(&ms, mod + (modlen - sizeof(ms)), sizeof(ms));
+
+	sig_len = __be32_to_cpu(ms.sig_len);
+	modlen -= sig_len + sizeof(ms);
+
+	if (modlen > sizeof(data_item->data)) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	memcpy(data_item->data, mod, modlen);
+	data_item->data_len = modlen;
+
+	if (sig_len > sizeof(data_item->sig)) {
+		ret = -E2BIG;
+		goto out;
+	}
+
+	memcpy(data_item->sig, mod + modlen, sig_len);
+	data_item->sig_len = sig_len;
+	ret = 0;
+out:
+	munmap(mod, st.st_size);
+	return ret;
+}
+
+void test_verify_pkcs7_sig(void)
+{
+	char tmp_dir_template[] = "/tmp/verify_sigXXXXXX";
+	char *tmp_dir;
+	char *buf = NULL;
+	struct test_verify_pkcs7_sig *skel = NULL;
+	struct bpf_map *map;
+	struct data data;
+	int ret, zero = 0;
+
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+
+	/* Trigger creation of session keyring. */
+	syscall(__NR_request_key, "keyring", "_uid.0", NULL,
+		KEY_SPEC_SESSION_KEYRING);
+
+	tmp_dir = mkdtemp(tmp_dir_template);
+	if (!ASSERT_OK_PTR(tmp_dir, "mkdtemp"))
+		return;
+
+	ret = _run_setup_process(tmp_dir, "setup");
+	if (!ASSERT_OK(ret, "_run_setup_process"))
+		goto close_prog;
+
+	buf = malloc(LOG_BUF_SIZE);
+	if (!ASSERT_OK_PTR(buf, "malloc"))
+		goto close_prog;
+
+	opts.kernel_log_buf = buf;
+	opts.kernel_log_size = LOG_BUF_SIZE;
+	opts.kernel_log_level = 1;
+
+	skel = test_verify_pkcs7_sig__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "test_verify_pkcs7_sig__open_opts"))
+		goto close_prog;
+
+	ret = test_verify_pkcs7_sig__load(skel);
+
+	if (ret < 0 && strstr(buf, "unknown func bpf_verify_pkcs7_signature")) {
+		printf(
+		  "%s:SKIP:bpf_verify_pkcs7_signature() helper not supported\n",
+		  __func__);
+		test__skip();
+		goto close_prog;
+	}
+
+	if (!ASSERT_OK(ret, "test_verify_pkcs7_sig__load"))
+		goto close_prog;
+
+	ret = test_verify_pkcs7_sig__attach(skel);
+	if (!ASSERT_OK(ret, "test_verify_pkcs7_sig__attach"))
+		goto close_prog;
+
+	map = bpf_object__find_map_by_name(skel->obj, "data_input");
+	if (!ASSERT_OK_PTR(map, "data_input not found"))
+		goto close_prog;
+
+	skel->bss->monitored_pid = getpid();
+
+	/* Test incorrect parameters. */
+	skel->bss->user_keyring_serial = 0;
+	skel->bss->system_keyring = UINT64_MAX;
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	skel->bss->user_keyring_serial = KEY_SPEC_SESSION_KEYRING;
+	skel->bss->system_keyring = 0;
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	/* Test without data and signature. */
+	skel->bss->user_keyring_serial = KEY_SPEC_SESSION_KEYRING;
+	skel->bss->system_keyring = UINT64_MAX;
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	/* Test successful signature verification with session keyring. */
+	ret = populate_data_item_str(tmp_dir, &data);
+	if (!ASSERT_OK(ret, "populate_data_item_str"))
+		goto close_prog;
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_OK(ret, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	/* Test successful signature verification with testing keyring. */
+	skel->bss->user_keyring_serial = syscall(__NR_request_key, "keyring",
+						 "ebpf_testing_keyring", NULL,
+						 KEY_SPEC_SESSION_KEYRING);
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_OK(ret, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	/*
+	 * Ensure key_task_permission() is called and rejects the keyring
+	 * (no Search permission).
+	 */
+	syscall(__NR_keyctl, KEYCTL_SETPERM, skel->bss->user_keyring_serial,
+		0x37373737);
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	syscall(__NR_keyctl, KEYCTL_SETPERM, skel->bss->user_keyring_serial,
+		0x3f3f3f3f);
+
+	/*
+	 * Ensure key_validate() is called and rejects the keyring (key expired)
+	 */
+	syscall(__NR_keyctl, KEYCTL_SET_TIMEOUT,
+		skel->bss->user_keyring_serial, 1);
+	sleep(1);
+
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	skel->bss->user_keyring_serial = KEY_SPEC_SESSION_KEYRING;
+
+	/* Test with corrupted data (signature verification should fail). */
+	data.data[0] = 'a';
+	ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data, BPF_ANY);
+	if (!ASSERT_LT(ret, 0, "bpf_map_update_elem data_input"))
+		goto close_prog;
+
+	ret = populate_data_item_mod(&data);
+	if (!ASSERT_OK(ret, "populate_data_item_mod"))
+		goto close_prog;
+
+	/* Test signature verification with system keyrings. */
+	if (data.data_len) {
+		skel->bss->user_keyring_serial = 0;
+		skel->bss->system_keyring = 0;
+
+		ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data,
+					  BPF_ANY);
+		if (!ASSERT_OK(ret, "bpf_map_update_elem data_input"))
+			goto close_prog;
+
+		skel->bss->system_keyring = VERIFY_USE_SECONDARY_KEYRING;
+
+		ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data,
+					  BPF_ANY);
+		if (!ASSERT_OK(ret, "bpf_map_update_elem data_input"))
+			goto close_prog;
+
+		skel->bss->system_keyring = VERIFY_USE_PLATFORM_KEYRING;
+
+		ret = bpf_map_update_elem(bpf_map__fd(map), &zero, &data,
+					  BPF_ANY);
+		ASSERT_LT(ret, 0, "bpf_map_update_elem data_input");
+	}
+
+close_prog:
+	_run_setup_process(tmp_dir, "cleanup");
+	free(buf);
+
+	if (!skel)
+		return;
+
+	skel->bss->monitored_pid = 0;
+	test_verify_pkcs7_sig__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c b/tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
new file mode 100644
index 000000000000..421de5ed3fcb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Roberto Sassu <roberto.sassu@huawei.com>
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <limits.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define MAX_DATA_SIZE (1024 * 1024)
+#define MAX_SIG_SIZE 1024
+
+typedef __u8 u8;
+typedef __u16 u16;
+typedef __u32 u32;
+typedef __u64 u64;
+
+u32 monitored_pid;
+u32 user_keyring_serial;
+u64 system_keyring;
+
+struct data {
+	u8 data[MAX_DATA_SIZE];
+	u32 data_len;
+	u8 sig[MAX_SIG_SIZE];
+	u32 sig_len;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct data);
+} data_input SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+SEC("lsm.s/bpf")
+int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
+{
+	struct bpf_dynptr data_ptr, sig_ptr;
+	struct data *data_val;
+	struct key *user_keyring = NULL;
+	u32 pid;
+	u64 value;
+	int ret, zero = 0;
+
+	pid = bpf_get_current_pid_tgid() >> 32;
+	if (pid != monitored_pid)
+		return 0;
+
+	data_val = bpf_map_lookup_elem(&data_input, &zero);
+	if (!data_val)
+		return 0;
+
+	bpf_probe_read(&value, sizeof(value), &attr->value);
+
+	bpf_copy_from_user(data_val, sizeof(struct data),
+			   (void *)(unsigned long)value);
+
+	if (data_val->data_len > sizeof(data_val->data))
+		return -EINVAL;
+
+	bpf_dynptr_from_mem(data_val->data, data_val->data_len, 0, &data_ptr);
+
+	if (data_val->sig_len > sizeof(data_val->sig))
+		return -EINVAL;
+
+	bpf_dynptr_from_mem(data_val->sig, data_val->sig_len, 0, &sig_ptr);
+
+	if (user_keyring_serial) {
+		user_keyring = bpf_lookup_user_key(user_keyring_serial, 0);
+		if (!user_keyring)
+			return -ENOENT;
+	}
+
+	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, user_keyring,
+					 system_keyring);
+
+	if (user_keyring)
+		bpf_key_put(user_keyring);
+
+	return ret;
+}
diff --git a/tools/testing/selftests/bpf/verify_sig_setup.sh b/tools/testing/selftests/bpf/verify_sig_setup.sh
new file mode 100755
index 000000000000..ba08922b4a27
--- /dev/null
+++ b/tools/testing/selftests/bpf/verify_sig_setup.sh
@@ -0,0 +1,104 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+set -u
+set -o pipefail
+
+VERBOSE="${SELFTESTS_VERBOSE:=0}"
+LOG_FILE="$(mktemp /tmp/verify_sig_setup.log.XXXXXX)"
+
+x509_genkey_content="\
+[ req ]
+default_bits = 2048
+distinguished_name = req_distinguished_name
+prompt = no
+string_mask = utf8only
+x509_extensions = myexts
+
+[ req_distinguished_name ]
+CN = eBPF Signature Verification Testing Key
+
+[ myexts ]
+basicConstraints=critical,CA:FALSE
+keyUsage=digitalSignature
+subjectKeyIdentifier=hash
+authorityKeyIdentifier=keyid
+"
+
+usage()
+{
+	echo "Usage: $0 <setup|cleanup <existing_tmp_dir>"
+	exit 1
+}
+
+setup()
+{
+	local tmp_dir="$1"
+
+	echo "${x509_genkey_content}" > ${tmp_dir}/x509.genkey
+
+	openssl req -new -nodes -utf8 -sha256 -days 36500 \
+			-batch -x509 -config ${tmp_dir}/x509.genkey \
+			-outform PEM -out ${tmp_dir}/signing_key.pem \
+			-keyout ${tmp_dir}/signing_key.pem 2>&1
+
+	openssl x509 -in ${tmp_dir}/signing_key.pem -out \
+		${tmp_dir}/signing_key.der -outform der
+
+	key_id=$(cat ${tmp_dir}/signing_key.der | keyctl padd asymmetric ebpf_testing_key @s)
+
+	keyring_id=$(keyctl newring ebpf_testing_keyring @s)
+	keyctl link $key_id $keyring_id
+}
+
+cleanup() {
+	local tmp_dir="$1"
+
+	keyctl unlink $(keyctl search @s asymmetric ebpf_testing_key) @s
+	keyctl unlink $(keyctl search @s keyring ebpf_testing_keyring) @s
+	rm -rf ${tmp_dir}
+}
+
+catch()
+{
+	local exit_code="$1"
+	local log_file="$2"
+
+	if [[ "${exit_code}" -ne 0 ]]; then
+		cat "${log_file}" >&3
+	fi
+
+	rm -f "${log_file}"
+	exit ${exit_code}
+}
+
+main()
+{
+	[[ $# -ne 2 ]] && usage
+
+	local action="$1"
+	local tmp_dir="$2"
+
+	[[ ! -d "${tmp_dir}" ]] && echo "Directory ${tmp_dir} doesn't exist" && exit 1
+
+	if [[ "${action}" == "setup" ]]; then
+		setup "${tmp_dir}"
+	elif [[ "${action}" == "cleanup" ]]; then
+		cleanup "${tmp_dir}"
+	else
+		echo "Unknown action: ${action}"
+		exit 1
+	fi
+}
+
+trap 'catch "$?" "${LOG_FILE}"' EXIT
+
+if [[ "${VERBOSE}" -eq 0 ]]; then
+	# Save the stderr to 3 so that we can output back to
+	# it incase of an error.
+	exec 3>&2 1>"${LOG_FILE}" 2>&1
+fi
+
+main "$@"
+rm -f "${LOG_FILE}"
-- 
2.25.1


^ permalink raw reply related

* [PATCH v7 6/7] selftests/bpf: Add additional test for bpf_lookup_user_key()
From: Roberto Sassu @ 2022-07-12 18:41 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, song, yhs,
	kpsingh, dhowells, jarkko, shuah
  Cc: bpf, keyrings, linux-security-module, linux-kselftest, netdev,
	linux-kernel, Roberto Sassu
In-Reply-To: <20220712184128.999301-1-roberto.sassu@huawei.com>

Add an additional test to ensure that bpf_lookup_user_key() creates a
referenced special keyring when the KEY_LOOKUP_CREATE flag is passed to
this function.

Also ensure that the helper rejects invalid flags.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 .../bpf/prog_tests/lookup_user_key.c          | 94 +++++++++++++++++++
 .../bpf/progs/test_lookup_user_key.c          | 35 +++++++
 2 files changed, 129 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/lookup_user_key.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_lookup_user_key.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lookup_user_key.c b/tools/testing/selftests/bpf/prog_tests/lookup_user_key.c
new file mode 100644
index 000000000000..6487699d30f6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/lookup_user_key.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Roberto Sassu <roberto.sassu@huawei.com>
+ */
+
+#include <linux/keyctl.h>
+#include <test_progs.h>
+
+#include "test_lookup_user_key.skel.h"
+
+#define LOG_BUF_SIZE 16384
+
+#define KEY_LOOKUP_CREATE	0x01
+#define KEY_LOOKUP_PARTIAL	0x02
+
+void test_lookup_user_key(void)
+{
+	char *buf = NULL;
+	struct test_lookup_user_key *skel = NULL;
+	u32 next_id;
+	int ret;
+
+	LIBBPF_OPTS(bpf_object_open_opts, opts);
+
+	buf = malloc(LOG_BUF_SIZE);
+	if (!ASSERT_OK_PTR(buf, "malloc"))
+		goto close_prog;
+
+	opts.kernel_log_buf = buf;
+	opts.kernel_log_size = LOG_BUF_SIZE;
+	opts.kernel_log_level = 1;
+
+	skel = test_lookup_user_key__open_opts(&opts);
+	if (!ASSERT_OK_PTR(skel, "test_lookup_user_key__open_opts"))
+		goto close_prog;
+
+	ret = test_lookup_user_key__load(skel);
+
+	if (ret < 0 && strstr(buf, "unknown func bpf_lookup_user_key")) {
+		printf("%s:SKIP:bpf_lookup_user_key() helper not supported\n",
+		       __func__);
+		test__skip();
+		goto close_prog;
+	}
+
+	if (!ASSERT_OK(ret, "test_lookup_user_key__load"))
+		goto close_prog;
+
+	ret = test_lookup_user_key__attach(skel);
+	if (!ASSERT_OK(ret, "test_lookup_user_key__attach"))
+		goto close_prog;
+
+	skel->bss->monitored_pid = getpid();
+	skel->bss->key_serial = KEY_SPEC_THREAD_KEYRING;
+
+	/* The thread-specific keyring does not exist, this test fails. */
+	skel->bss->flags = 0;
+
+	ret = bpf_prog_get_next_id(0, &next_id);
+	if (!ASSERT_LT(ret, 0, "bpf_prog_get_next_id"))
+		goto close_prog;
+
+	/* Force creation of the thread-specific keyring, this test succeeds. */
+	skel->bss->flags = KEY_LOOKUP_CREATE;
+
+	ret = bpf_prog_get_next_id(0, &next_id);
+	if (!ASSERT_OK(ret, "bpf_prog_get_next_id"))
+		goto close_prog;
+
+	/* Pass both lookup flags for parameter validation. */
+	skel->bss->flags = KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL;
+
+	ret = bpf_prog_get_next_id(0, &next_id);
+	if (!ASSERT_OK(ret, "bpf_prog_get_next_id"))
+		goto close_prog;
+
+	/* Pass invalid flags. */
+	skel->bss->flags = UINT64_MAX;
+
+	ret = bpf_prog_get_next_id(0, &next_id);
+	ASSERT_LT(ret, 0, "bpf_prog_get_next_id");
+
+close_prog:
+	free(buf);
+
+	if (!skel)
+		return;
+
+	skel->bss->monitored_pid = 0;
+	test_lookup_user_key__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_lookup_user_key.c b/tools/testing/selftests/bpf/progs/test_lookup_user_key.c
new file mode 100644
index 000000000000..5b63753ba163
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_lookup_user_key.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2022 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Roberto Sassu <roberto.sassu@huawei.com>
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u32 monitored_pid;
+__u32 key_serial;
+__u64 flags;
+
+SEC("lsm.s/bpf")
+int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
+{
+	struct key *key;
+	__u32 pid;
+
+	pid = bpf_get_current_pid_tgid() >> 32;
+	if (pid != monitored_pid)
+		return 0;
+
+	key = bpf_lookup_user_key(key_serial, flags);
+	if (key)
+		bpf_key_put(key);
+
+	return (key) ? 0 : -ENOENT;
+}
-- 
2.25.1


^ permalink raw reply related

* [PATCH v7 5/7] selftests: Add verifier tests for bpf_lookup_user_key() and bpf_key_put()
From: Roberto Sassu @ 2022-07-12 18:41 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, song, yhs,
	kpsingh, dhowells, jarkko, shuah
  Cc: bpf, keyrings, linux-security-module, linux-kselftest, netdev,
	linux-kernel, Roberto Sassu
In-Reply-To: <20220712184128.999301-1-roberto.sassu@huawei.com>

Add verifier tests for bpf_lookup_user_key() and bpf_key_put(), to ensure
that acquired key references can be released, that a non-NULL pointer is
passed to bpf_key_put(), and that key references are not leaked.

Also, slightly modify test_verifier.c, to find the BTF ID of the attach
point for the LSM program type (currently, it is done only for TRACING).

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
---
 tools/testing/selftests/bpf/test_verifier.c   |  3 +-
 .../selftests/bpf/verifier/ref_tracking.c     | 66 +++++++++++++++++++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index f9d553fbf68a..2dbcbf363c18 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1498,7 +1498,8 @@ static void do_test_single(struct bpf_test *test, bool unpriv,
 		opts.log_level = DEFAULT_LIBBPF_LOG_LEVEL;
 	opts.prog_flags = pflags;
 
-	if (prog_type == BPF_PROG_TYPE_TRACING && test->kfunc) {
+	if ((prog_type == BPF_PROG_TYPE_TRACING ||
+	     prog_type == BPF_PROG_TYPE_LSM) && test->kfunc) {
 		int attach_btf_id;
 
 		attach_btf_id = libbpf_find_vmlinux_btf_id(test->kfunc,
diff --git a/tools/testing/selftests/bpf/verifier/ref_tracking.c b/tools/testing/selftests/bpf/verifier/ref_tracking.c
index 57a83d763ec1..36e0b9b342de 100644
--- a/tools/testing/selftests/bpf/verifier/ref_tracking.c
+++ b/tools/testing/selftests/bpf/verifier/ref_tracking.c
@@ -84,6 +84,72 @@
 	.errstr = "Unreleased reference",
 	.result = REJECT,
 },
+{
+	"reference tracking: release key reference",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_1, -3),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_EMIT_CALL(BPF_FUNC_lookup_user_key),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_EMIT_CALL(BPF_FUNC_key_put),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_LSM,
+	.kfunc = "bpf",
+	.expected_attach_type = BPF_LSM_MAC,
+	.flags = BPF_F_SLEEPABLE,
+	.result = ACCEPT,
+},
+{
+	"reference tracking: release key reference without check",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_1, -3),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_EMIT_CALL(BPF_FUNC_lookup_user_key),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_EMIT_CALL(BPF_FUNC_key_put),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_LSM,
+	.kfunc = "bpf",
+	.expected_attach_type = BPF_LSM_MAC,
+	.flags = BPF_F_SLEEPABLE,
+	.errstr = "type=ptr_or_null_ expected=ptr_",
+	.result = REJECT,
+},
+{
+	"reference tracking: release reference with NULL key pointer",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_1, 0),
+	BPF_EMIT_CALL(BPF_FUNC_key_put),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_LSM,
+	.kfunc = "bpf",
+	.expected_attach_type = BPF_LSM_MAC,
+	.flags = BPF_F_SLEEPABLE,
+	.errstr = "type=scalar expected=ptr_",
+	.result = REJECT,
+},
+{
+	"reference tracking: leak potential reference to key",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_1, -3),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_EMIT_CALL(BPF_FUNC_lookup_user_key),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_LSM,
+	.kfunc = "bpf",
+	.expected_attach_type = BPF_LSM_MAC,
+	.flags = BPF_F_SLEEPABLE,
+	.errstr = "Unreleased reference",
+	.result = REJECT,
+},
 {
 	"reference tracking: release reference without check",
 	.insns = {
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH RFC bpf-next 0/3] Execution context callbacks
From: Delyan Kratunov @ 2022-07-12 18:42 UTC (permalink / raw)
  To: sdf@google.com
  Cc: daniel@iogearbox.net, ast@kernel.org, andrii@kernel.org,
	bpf@vger.kernel.org
In-Reply-To: <Ys24W4RJS0BAfKzP@google.com>

Thanks for taking a look, Stanislav!

On Tue, 2022-07-12 at 11:07 -0700, sdf@google.com wrote:
> *snip*
> > 2. The callback arguments need to be in a map. We can currently express  
> > helper arguments taking a
> > pointer to a map value but not a pointer to _within_ a map value. Should  
> > we add a new argument
> > type or should we just pass the map value pointer to the callback?
> 
> Passing map value pointer (as you do in the selftest) seems fine; do
> you think we need more flexibility here?

I think it makes a cleaner and more familiar API - the pointer to my data that I give
to the submission function is the one I get in the callback. Requiring it to be a map
value is a little bit quirky (it's not really my data it's pointing to!). I don't
know if it's a lot of work in the verifier to iron out this quirk but if it's
reasonable, I'd be happy to make the developer experience a little more predictable.

> > 3. A lot of the map handling code is verbatim from bpf_timer. This feels  
> > icky but I'm not sure if it
> > justifies a refactor quite yet. Opinions welcome.
> 
> +1, it does seem very close to a timer with expiry time == 0.
> 
> I don't know what's the exact usecase you're trying to solve exactly,

The primary motivating examples are 1) GFP_ATOMIC usage is not safe in NMI aiui, so
switching allocations to hardirq helps and 2) copy_from_user in tracing programs (nmi
or softirq when using software clocks). The latter shows up in insidious ways like
build id not being reliable when retrieving stack traces ([1] is a thread from a
while ago about it).

> but have you though of maybe initially supporting something like:
> 
> bpf_timer_init(&timer, map, SOME_NEW_DEFERRED_NMI_ONLY_FLAG);
> bpf_timer_set_callback(&timer, cg);
> bpf_timer_start(&timer, 0, 0);
> 
> If you init a timer with that special flag, I'm assuming you can have
> special cases in the existing helpers to simulate the delayed work?

Potentially but I have some reservations about drawing this equivalence.

> Then, the verifier changes should be minimal it seems.
> 
> OTOH, having a separate set of helpers seems more clear API-wise :-/

The primary way this differs from timers is that timers already specify an execution
context - the callback will be called from a softirq. 

It doesn't make sense to me to have some "timers" (but only 0-delay, super-special
timers) run in hardirq or, more confusingly, user context. At that point, there's
little in the API to express these differences, (e.g., bpf_copy_from_user_task is
accessible in *this* callback) and the verifier work will be far more challenging (if
at all possible since the init and the set_callback would be split).

I think it's worth thinking about how to unify the handling of timer-like map value
members but I don't think it's worth it trying to shoehorn this functionality into
existing infra.

> *snip*

  [1]: https://lore.kernel.org/bpf/CA+khW7gh=vO8m-_SVnwWwj7kv+EDeUPcuWFqebf2Zmi9T_oEAQ@mail.gmail.com/


^ permalink raw reply

* [RFC PATCH v1 1/1] bpftool: Add generating command to C dumped file.
From: Francis Laniel @ 2022-07-12 18:42 UTC (permalink / raw)
  To: bpf
  Cc: Francis Laniel, Quentin Monnet, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, open list
In-Reply-To: <20220712184225.52429-1-flaniel@linux.microsoft.com>

This commit adds the following lines to file generated by dump:
/*
 * File generated by bpftool using:
 * bpftool btf dump file /sys/kernel/btf/vmlinux format c
 * DO NOT EDIT.
 */
This warns users to not edit the file and documents the command used to
generate the file.

Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
---
 tools/bpf/bpftool/btf.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 7e6accb9d9f7..eecfc27370c3 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -415,7 +415,8 @@ static void __printf(2, 0) btf_dump_printf(void *ctx,
 }
 
 static int dump_btf_c(const struct btf *btf,
-		      __u32 *root_type_ids, int root_type_cnt)
+		      __u32 *root_type_ids, int root_type_cnt,
+		      int argc, char **argv)
 {
 	struct btf_dump *d;
 	int err = 0, i;
@@ -425,6 +426,14 @@ static int dump_btf_c(const struct btf *btf,
 	if (err)
 		return err;
 
+	printf("/*\n");
+	printf(" * File generated by bpftool using:\n");
+	printf(" * bpftool btf dump");
+	for (i = 0; i < argc; i++)
+		printf(" %s", argv[i]);
+	printf("\n");
+	printf(" * DO NOT EDIT.\n");
+	printf(" */\n");
 	printf("#ifndef __VMLINUX_H__\n");
 	printf("#define __VMLINUX_H__\n");
 	printf("\n");
@@ -507,8 +516,10 @@ static bool btf_is_kernel_module(__u32 btf_id)
 static int do_dump(int argc, char **argv)
 {
 	struct btf *btf = NULL, *base = NULL;
+	char **orig_argv = argv;
 	__u32 root_type_ids[2];
 	int root_type_cnt = 0;
+	int orig_argc = argc;
 	bool dump_c = false;
 	__u32 btf_id = -1;
 	const char *src;
@@ -649,7 +660,8 @@ static int do_dump(int argc, char **argv)
 			err = -ENOTSUP;
 			goto done;
 		}
-		err = dump_btf_c(btf, root_type_ids, root_type_cnt);
+		err = dump_btf_c(btf, root_type_ids, root_type_cnt,
+				 orig_argc, orig_argv);
 	} else {
 		err = dump_btf_raw(btf, root_type_ids, root_type_cnt);
 	}
-- 
2.25.1


^ permalink raw reply related

* [RFC PATCH v1 0/1] bpftool: Add generating command to C dumped file.
From: Francis Laniel @ 2022-07-12 18:42 UTC (permalink / raw)
  To: bpf
  Cc: Francis Laniel, Quentin Monnet, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, open list

Hi.


First, I hope you are fine and the same for your relatives.

In this patch, I added the command used to generate a BTF dump at the top of the
dump when outputting to C:
/*
 * File generated by bpftool using:
 * bpftool btf dump file /sys/kernel/btf/vmlinux format c
 * DO NOT EDIT.
 */

The goal of this is to first warn users this file must not be edited and also
to document the command used to get it.
The idea was gathered from a message posted on iovisor/bcc repository and from
message written by bpf2go when it generates a file [1, 2].

This patch is clearly not a big change which impacts the future of bpftool but
I think it could be welcomed.
If you see any way to improve it or have any question, feel free to ask.

Francis Laniel (1):
  bpftool: Add generating command to dumped file.

 tools/bpf/bpftool/btf.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)


Best regards and thank you in advance.
---
[1] https://github.com/iovisor/bcc/pull/4088#pullrequestreview-1032543916
[2] https://github.com/cilium/ebpf/blob/951bb28908d23e50fca063a2d51098ca028352bf/cmd/bpf2go/output.go#L21
-- 
2.25.1


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox