All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Dave Marchevsky <davemarchevsky@fb.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v1 bpf-next 0/6] Allow bpf_refcount_acquire of mapval
Date: Mon, 30 Oct 2023 19:38:29 -0700	[thread overview]
Message-ID: <9e8834b4-bad3-4f92-b699-6780b5410a6a@linux.dev> (raw)
In-Reply-To: <20231025214007.2920506-1-davemarchevsky@fb.com>


On 10/25/23 2:40 PM, Dave Marchevsky wrote:
> Consider this BPF program:
>
>    struct cgv_node {
>      int d;
>      struct bpf_refcount r;
>      struct bpf_rb_node rb;
>    };
>
>    struct val_stash {
>      struct cgv_node __kptr *v;
>    };
>
>    struct {
>      __uint(type, BPF_MAP_TYPE_ARRAY);
>      __type(key, int);
>      __type(value, struct val_stash);
>      __uint(max_entries, 10);
>    } array_map SEC(".maps");
>
>    long bpf_program(void *ctx)
>    {
>      struct val_stash *mapval;
>      struct cgv_node *p;
>      int idx = 0;
>
>      mapval = bpf_map_lookup_elem(&array_map, &idx);
>      if (!mapval || !mapval->v) { /* omitted */ }
>
>      p = bpf_refcount_acquire(mapval->v); /* Verification FAILs here */
>
>      /* Add p to some tree */
>      return 0;
>    }
>
> Verification fails on the refcount_acquire:
>
>    160: (79) r1 = *(u64 *)(r8 +8)        ; R1_w=untrusted_ptr_or_null_cgv_node(id=11,off=0,imm=0) R8_w=map_value(id=10,off=0,ks=8,vs=16,imm=0) refs=6
>    161: (b7) r2 = 0                      ; R2_w=0 refs=6
>    162: (85) call bpf_refcount_acquire_impl#117824
>    arg#0 is neither owning or non-owning ref
>
> The above verifier dump is actually from sched_ext's scx_flatcg [0],
> which is the motivating usecase for this series' changes. Specifically,
> scx_flatcg stashes a rb_node type w/ cgroup-specific info (struct
> cgv_node) in a map when the cgroup is created, then later puts that
> cgroup's node in a rbtree in .enqueue . Making struct cgv_node
> refcounted would simplify the code a bit by virtue of allowing us to
> remove the kptr_xchg's, but "later puts that cgroups node in a rbtree"
> is not possible without a refcount_acquire, which suffers from the above
> verification failure.
>
> If we get rid of PTR_UNTRUSTED flag, and add MEM_ALLOC | NON_OWN_REF,
> mapval->v would be a non-owning ref and verification would succeed. Due
> to the most recent set of refcount changes [1], which modified
> bpf_obj_drop behavior to not reuse refcounted graph node's underlying
> memory until after RCU grace period, this is safe to do. Once mapval->v
> has the correct flags it _is_ a non-owning reference and verification of
> the motivating example will succeed.
>
>    [0]: https://github.com/sched-ext/sched_ext/blob/52911e1040a0f94b9c426dddcc00be5364a7ae9f/tools/sched_ext/scx_flatcg.bpf.c#L275
>    [1]: https://lore.kernel.org/bpf/20230821193311.3290257-1-davemarchevsky@fb.com/
>
> Summary of patches:
>    * Patch 1 fixes an issue with bpf_refcount_acquire verification
>      letting MAYBE_NULL ptrs through
>      * Patch 2 tests Patch 1's fix
>    * Patch 3 broadens the use of "free only after RCU GP" to all
>      user-allocated types
>    * Patch 4 is a small nonfunctional refactoring
>    * Patch 5 changes verifier to mark direct LD of stashed graph node
>      kptr as non-owning ref
>      * Patch 6 tests Patch 5's verifier changes
>
> Dave Marchevsky (6):
>    bpf: Add KF_RCU flag to bpf_refcount_acquire_impl
>    selftests/bpf: Add test passing MAYBE_NULL reg to bpf_refcount_acquire
>    bpf: Use bpf_mem_free_rcu when bpf_obj_dropping non-refcounted nodes
>    bpf: Move GRAPH_{ROOT,NODE}_MASK macros into btf_field_type enum
>    bpf: Mark direct ld of stashed bpf_{rb,list}_node as non-owning ref
>    selftests/bpf: Test bpf_refcount_acquire of node obtained via direct
>      ld
>
>   include/linux/bpf.h                           |  4 +-
>   kernel/bpf/btf.c                              | 11 ++-
>   kernel/bpf/helpers.c                          |  7 +-
>   kernel/bpf/verifier.c                         | 36 ++++++++--
>   .../bpf/prog_tests/local_kptr_stash.c         | 33 +++++++++
>   .../selftests/bpf/progs/local_kptr_stash.c    | 68 +++++++++++++++++++
>   .../bpf/progs/refcounted_kptr_fail.c          | 19 ++++++
>   7 files changed, 159 insertions(+), 19 deletions(-)
>
The patch looks good to me from high level.

There is a test failure and I added some comment in Patch 5.

Please take a look and address the test failure. Thanks!


      parent reply	other threads:[~2023-10-31  2:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-25 21:40 [PATCH v1 bpf-next 0/6] Allow bpf_refcount_acquire of mapval Dave Marchevsky
2023-10-25 21:40 ` [PATCH v1 bpf-next 1/6] bpf: Add KF_RCU flag to bpf_refcount_acquire_impl Dave Marchevsky
2023-10-25 21:40 ` [PATCH v1 bpf-next 2/6] selftests/bpf: Add test passing MAYBE_NULL reg to bpf_refcount_acquire Dave Marchevsky
2023-10-25 21:40 ` [PATCH v1 bpf-next 3/6] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping non-refcounted nodes Dave Marchevsky
2023-10-25 21:40 ` [PATCH v1 bpf-next 4/6] bpf: Move GRAPH_{ROOT,NODE}_MASK macros into btf_field_type enum Dave Marchevsky
2023-10-25 21:40 ` [PATCH v1 bpf-next 5/6] bpf: Mark direct ld of stashed bpf_{rb,list}_node as non-owning ref Dave Marchevsky
2023-10-31  2:36   ` Yonghong Song
2023-10-25 21:40 ` [PATCH v1 bpf-next 6/6] selftests/bpf: Test bpf_refcount_acquire of node obtained via direct ld Dave Marchevsky
2023-10-25 21:48 ` [PATCH v1 bpf-next 0/6] Allow bpf_refcount_acquire of mapval David Marchevsky
2023-10-31  2:38 ` Yonghong Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e8834b4-bad3-4f92-b699-6780b5410a6a@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.