Re: [PATCH bpf-next v5 03/25] bpf: Support bpf_list_head in map values

From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Dave Marchevsky <davemarchevsky@meta.com>,
	Delyan Kratunov <delyank@meta.com>
Subject: Re: [PATCH bpf-next v5 03/25] bpf: Support bpf_list_head in map values
Date: Wed, 9 Nov 2022 05:09:44 +0530
Message-ID: <20221108233944.o6ktnoinaggzir7t@apollo>
In-Reply-To: <CAEf4Bza6R67US05R6Oh-FY9Kit8abH6eiJ33Z6TnSSpC_n5FBA@mail.gmail.com>

On Wed, Nov 09, 2022 at 04:31:52AM IST, Andrii Nakryiko wrote:
> On Mon, Nov 7, 2022 at 3:10 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Add the support on the map side to parse, recognize, verify, and build
> > metadata table for a new special field of the type struct bpf_list_head.
> > To parameterize the bpf_list_head for a certain value type and the
> > list_node member it will accept in that value type, we use BTF
> > declaration tags.
> >
> > The definition of bpf_list_head in a map value will be done as follows:
> >
> > struct foo {
> >         struct bpf_list_node node;
> >         int data;
> > };
> >
> > struct map_value {
> >         struct bpf_list_head head __contains(foo, node);
> > };
> >
> > Then, the bpf_list_head only allows adding to the list 'head' using the
> > bpf_list_node 'node' for the type struct foo.
> >
> > The 'contains' annotation is a BTF declaration tag composed of three
> > parts, "contains:name:node", where the name is used to look up the
> > type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during
> > the lookup. The node part gives the name of the member in this type
> > that has the type struct bpf_list_node, which is actually used for
> > linking into the linked list.
> >
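
To make the "contains:name:node" format above concrete, here is a rough
userspace sketch of splitting such a tag string into its name and node
parts. split_contains_tag() is a hypothetical helper written only for
illustration; it is not the parser in kernel/bpf/btf.c, and the error
codes are my own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "contains:<name>:<node>" into two
 * NUL-terminated strings. Returns 0 on success or a negative errno.
 */
int split_contains_tag(const char *tag, char *name, size_t name_sz,
		       char *node, size_t node_sz)
{
	static const char prefix[] = "contains:";
	const char *value_type, *sep;
	size_t name_len, node_len;

	assert(tag && name && node);

	/* The tag must start with the fixed "contains:" prefix. */
	if (strncmp(tag, prefix, sizeof(prefix) - 1))
		return -EINVAL;
	value_type = tag + sizeof(prefix) - 1;

	/* Exactly one more ':' separates the type name from the member. */
	sep = strchr(value_type, ':');
	if (!sep || strchr(sep + 1, ':'))
		return -EINVAL;

	name_len = (size_t)(sep - value_type);
	node_len = strlen(sep + 1);
	if (!name_len || !node_len)
		return -EINVAL;
	if (name_len >= name_sz || node_len >= node_sz)
		return -ENAMETOOLONG;

	memcpy(name, value_type, name_len);
	name[name_len] = '\0';
	memcpy(node, sep + 1, node_len + 1); /* copies the trailing NUL too */
	return 0;
}
```

With the tag from the example, "contains:foo:node", this yields name
"foo" and node "node"; the name would then be looked up as a struct in
the map BTF.
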
> > This allows building intrusive linked lists in BPF, using container_of
> > to obtain pointer to entry, while being completely type safe from the
> > perspective of the verifier. The verifier knows exactly the type of the
> > nodes, and knows that list helpers return that type at some fixed offset
> > where the bpf_list_node member used for this list exists. The verifier
> > also uses this information to disallow adding types that are not
> > accepted by a certain list.
> >
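
For readers less familiar with intrusive lists, the container_of step
mentioned above looks roughly like this in plain C. The struct list_node
type and foo_from_node() are stand-ins made up for this sketch (the real
bpf_list_node is opaque), and the node is deliberately not the first
member so the offset subtraction is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the opaque struct bpf_list_node. */
struct list_node {
	struct list_node *next, *prev;
};

struct foo {
	int data;
	struct list_node node; /* the member named in the 'contains' tag */
};

/* Recover the enclosing object from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical accessor: map a list node back to its struct foo. */
struct foo *foo_from_node(struct list_node *n)
{
	assert(n);
	return container_of(n, struct foo, node);
}
```

Because the list only accepts struct foo through its 'node' member, the
offset used here is fixed and known, which is what lets the verifier
treat the result as a typed pointer.
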
> > For now, no elements can be added to such lists. Support for that is
> > coming in future patches, hence draining and freeing items is done with
> > a TODO that will be resolved in a future patch.
> >
> > Note that the bpf_list_head_free function moves the list out to a local
> > variable under the lock and releases it, doing the actual draining of
> > the list items outside the lock. While this avoids holding the lock
> > for too long and pessimizing other concurrent list operations, it is
> > also necessary for deadlock prevention: unless every function called in
> > the critical section is notrace, a fentry/fexit program could attach
> > and call bpf_map_update_elem again on the map, which would acquire the
> > same lock if the key matches and lead to a deadlock.
> > While this requires some special effort on part of the BPF programmer to
> > trigger and is highly unlikely to occur in practice, it is always better
> > if we can avoid such a condition.
> >
> > While notrace would prevent this, doing the draining outside the lock
> > has advantages of its own, hence it is used to also fix the deadlock
> > related problem.
> >
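
The "move out under the lock, drain outside it" pattern described above
can be sketched in userspace C as follows. This is not the
bpf_list_head_free code from the patch: struct toy_lock is a made-up
stand-in for bpf_spin_lock that only tracks whether it is held, and the
sketch assumes each node is its own heap allocation.

```c
#include <assert.h>
#include <stdlib.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical lock stand-in; it only records the held state. */
struct toy_lock {
	int held;
};

void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Empties 'head' and frees every node; returns how many were freed. */
size_t list_head_free(struct list_node *head, struct toy_lock *lock)
{
	struct list_node local, *pos, *next;
	size_t freed = 0;

	list_init(&local);

	/* Critical section: only pointer surgery, no per-item work. */
	toy_lock_acquire(lock);
	if (head->next != head) {
		local.next = head->next;
		local.prev = head->prev;
		local.next->prev = &local;
		local.prev->next = &local;
		list_init(head);
	}
	toy_lock_release(lock);

	/* Drain the private list; the lock is no longer held here. */
	for (pos = local.next; pos != &local; pos = next) {
		next = pos->next;
		assert(!lock->held);
		free(pos);
		freed++;
	}
	return freed;
}
```

Since nothing that frees an item runs inside the critical section, a
program that re-enters on the same map value during draining cannot
block on a lock that the drainer itself still holds.
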
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  include/linux/bpf.h            |  17 ++++
> >  include/uapi/linux/bpf.h       |  10 +++
> >  kernel/bpf/btf.c               | 143 ++++++++++++++++++++++++++++++++-
> >  kernel/bpf/helpers.c           |  32 ++++++++
> >  kernel/bpf/syscall.c           |  22 ++++-
> >  kernel/bpf/verifier.c          |   7 ++
> >  tools/include/uapi/linux/bpf.h |  10 +++
> >  7 files changed, 237 insertions(+), 4 deletions(-)
> >
>
> [...]
>
> >  struct bpf_offload_dev;
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 94659f6b3395..dd381086bad9 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -6887,6 +6887,16 @@ struct bpf_dynptr {
> >         __u64 :64;
> >  } __attribute__((aligned(8)));
> >
> > +struct bpf_list_head {
> > +       __u64 :64;
> > +       __u64 :64;
> > +} __attribute__((aligned(8)));
> > +
> > +struct bpf_list_node {
> > +       __u64 :64;
> > +       __u64 :64;
> > +} __attribute__((aligned(8)));
>
> Dave mentioned that this `__u64 :64` trick makes vmlinux.h lose the
> alignment information, as the struct itself is empty, and so there is
> nothing indicating that it has to be 8-byte aligned.
>
> So what if we have
>
> struct bpf_list_node {
>     __u64 __opaque[2];
> } __attribute__((aligned(8)));
>
> ?
>

Yep, can do that. Note that it's also potentially an issue for existing cases,
like bpf_spin_lock, bpf_timer, bpf_dynptr, etc. I'm not completely sure whether
changing them now is possible, but if it is, we should probably do it for all
of them?
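
To illustrate the difference, here is a small sketch comparing the two
layouts once the aligned attribute is gone, which as far as I understand
is what a header regenerated from BTF would look like. u64_t, the struct
names and the helper functions are placeholders for this example only.
On ABIs where unnamed bit-fields do not contribute to struct alignment
(x86-64 SysV, for instance) the first struct can end up with a smaller
alignment than 8, while the array member always carries the alignment of
its element type.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64_t; /* stand-in for __u64 */

/* Only unnamed bit-fields: nothing in the members requires alignment. */
struct node_bitfields {
	u64_t :64;
	u64_t :64;
};

/* A named array member keeps the alignment of its element type. */
struct node_opaque {
	u64_t __opaque[2];
};

size_t bitfields_align(void) { return _Alignof(struct node_bitfields); }
size_t opaque_align(void) { return _Alignof(struct node_opaque); }

/* The __opaque[2] layout keeps both size and alignment without help. */
void check_opaque_layout(void)
{
	assert(sizeof(struct node_opaque) == 2 * sizeof(u64_t));
	assert(_Alignof(struct node_opaque) == _Alignof(u64_t));
}
```

Printing bitfields_align() next to opaque_align() on a given target
shows whether that ABI is affected.
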

> > +
> >  struct bpf_sysctl {
> >         __u32   write;          /* Sysctl is being read (= 0) or written (= 1).
> >                                  * Allows 1,2,4-byte read, but no write.
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>
> [...]
>
> > @@ -3284,6 +3347,12 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
> >                         goto end;
> >                 }
> >         }
> > +       if (field_mask & BPF_LIST_HEAD) {
> > +               if (!strcmp(name, "bpf_list_head")) {
> > +                       type = BPF_LIST_HEAD;
> > +                       goto end;
> > +               }
> > +       }
> >         /* Only return BPF_KPTR when all other types with matchable names fail */
> >         if (field_mask & BPF_KPTR) {
> >                 type = BPF_KPTR_REF;
> > @@ -3317,6 +3386,8 @@ static int btf_find_struct_field(const struct btf *btf,
> >                         return field_type;
> >
> >                 off = __btf_member_bit_offset(t, member);
> > +               if (i && !off)
> > +                       return -EFAULT;
>
> Why? Why can't my struct have a zero-sized field at the beginning? This
> seems like a very incomplete and unnecessary check to me.
>

Right, I will drop it for the struct case.

> >                 if (off % 8)
> >                         /* valid C code cannot generate such BTF */
> >                         return -EINVAL;
> > @@ -3339,6 +3410,12 @@ static int btf_find_struct_field(const struct btf *btf,
> >                         if (ret < 0)
> >                                 return ret;
> >                         break;
> > +               case BPF_LIST_HEAD:
> > +                       ret = btf_find_list_head(btf, t, member_type, i, off, sz,
> > +                                                idx < info_cnt ? &info[idx] : &tmp);
> > +                       if (ret < 0)
> > +                               return ret;
> > +                       break;
> >                 default:
> >                         return -EFAULT;
> >                 }
> > @@ -3373,6 +3450,8 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> >                         return field_type;
> >
> >                 off = vsi->offset;
> > +               if (i && !off)
> > +                       return -EFAULT;
>
> similarly, I'd say that either we'd need to calculate the exact
> expected offset, or just not do anything here?
>

This thread is actually what prompted this check:
https://lore.kernel.org/bpf/CAEf4Bza7ga2hEQ4J7EtgRHz49p1vZtaT4d2RDiyGOKGK41Nt=Q@mail.gmail.com

Unless loaded using libbpf, all offsets are zero. So I think we need to reject
it here, but the same zero-sized field case might be an issue for this check as
well, so maybe we remember the last field's size and check whether it was zero?

I'll also include some more tests for these cases.
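
A rough sketch of what "remember the last field size" could look like,
written as standalone C rather than against btf_find_datasec_var. struct
field_span and check_field_offsets() are hypothetical names, and the rule
used here (a field may not start before the previous one ends) is just
one possible reading of the idea, not what the next revision will
necessarily do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one datasec variable (or struct member). */
struct field_span {
	uint32_t off;  /* byte offset within the section or struct */
	uint32_t size; /* byte size, may legitimately be zero */
};

/*
 * Returns 0 if no field starts before the end of the previous one,
 * -EFAULT otherwise. A zero offset after a zero-sized first field
 * is accepted; a zero offset after a sized field is not.
 */
int check_field_offsets(const struct field_span *fields, size_t cnt)
{
	uint64_t prev_end = 0;
	size_t i;

	assert(fields || !cnt);

	for (i = 0; i < cnt; i++) {
		if (i && fields[i].off < prev_end)
			return -EFAULT;
		prev_end = (uint64_t)fields[i].off + fields[i].size;
	}
	return 0;
}
```

With offsets left at zero (no libbpf fixup), two sized variables such as
{0, 4} and {0, 8} are rejected, while {0, 0} followed by {0, 8} passes.
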

