public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yonghong.song@linux.dev>
To: Dave Marchevsky <davemarchevsky@fb.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
Date: Tue, 1 Aug 2023 20:07:08 -0700	[thread overview]
Message-ID: <7b99d323-711a-9971-3138-186461fbd245@linux.dev> (raw)
In-Reply-To: <20230801203630.3581291-1-davemarchevsky@fb.com>



On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> This series is the third of three (or more) followups to address issues
> in the bpf_refcount shared ownership implementation discovered by Kumar.
> This series addresses the use-after-free scenario described in [0]. The
> first followup series ([1]) also attempted to address the same
> use-after-free, but only got rid of the splat without addressing the
> underlying issue. After this series the underyling issue is fixed and
> bpf_refcount_acquire can be re-enabled.
> 
> The main fix here is migration of bpf_obj_drop to use
> bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
> the example interleaving provided by Kumar in [0]:
> 
> CPU 0                                   CPU 1
> n = bpf_obj_new
> lock(lock1)
> bpf_rbtree_add(rbtree1, n)
> m = bpf_rbtree_acquire(n)
> unlock(lock1)
> 
> kptr_xchg(map, m) // move to map
> // at this point, refcount = 2
> 					m = kptr_xchg(map, NULL)
> 					lock(lock2)
> lock(lock1)				bpf_rbtree_add(rbtree2, m)

On the right column: bpf_rbtree_add(rbtree1, m) ?

> p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
> bpf_rbtree_remove(rbtree1, p)
> unlock(lock1)
> bpf_obj_drop(p) // B
> 					bpf_refcount_acquire(m) // use-after-free
> 					...
> 
> Before this series, bpf_obj_drop returns memory to the allocator using
> bpf_mem_free. At this point (B in the example) there might be some
> non-owning references to that memory which the verifier believes are valid,
> but where the underlying memory was reused for some other allocation.
> Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> non-owning refs") attempted to fix this by doing refcount_inc_non_zero
> on refcount_acquire in instead of refcount_inc under the assumption that
> preventing erroneous incr-on-0 would be sufficient. This isn't true,
> though: refcount_inc_non_zero must *check* if the refcount is zero, and
> the memory it's checking could have been reused, so the check may look
> at and incr random reused bytes.
> 
> If we wait to reuse this memory until all non-owning refs that could
> point to it are gone, there is no possibility of this scenario
> happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
> nodes accomplishes this.
> 
> For such nodes, the validity of their underlying memory is now tied to
> RCU Tasks Trace critical section. This matches MEM_RCU trustedness
> expectations, so the series takes the opportunity to more explicitly
> mark this trustedness state.
> 
> The functional effects of trustedness changes here are rather small.
> This is largely due to local kptrs having separate verifier handling -
> with implicit trustedness assumptions - than arbitrary kptrs.
> Regardless, let's take the opportunity to move towards a world where
> trustedness is more explictly handled.
> 
> Summary of patch contents, with sub-bullets being leading questions and
> comments I think are worth reviewer attention:
> 
>    * Patches 1 and 2 are moreso documententation - and
>      enforcement, in patch 1's case - of existing semantics / expectations
> 
>    * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
>      their underlying memory is not reused until RCU grace period elapses
>      * Perhaps it makes sense to move to mem_free_rcu for _all_
>        non-owning refs in the future, not just refcounted. This might
>        allow custom non-owning ref lifetime + invalidation logic to be
>        entirely subsumed by MEM_RCU handling. IMO this needs a bit more
>        thought and should be tackled outside of a fix series, so it's not
>        attempted here.
> 
>    * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
>      the remaining use-after-free
>      * One might expect this patch to be last in the series, or last
>        before selftest changes. Patches 5 and 6 don't change
>        verification or runtime behavior for existing BPF progs, though.
> 
>    * Patch 5 brings the verifier's understanding of refcounted node
>      trustedness in line with Patch 4's changes
> 
>    * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
>      progs. Marked RFC for a few reasons:
>      * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
>        since before the introduction of bpf linked list and rbtree. As
>        such this feels more like a new feature that may not belong in
>        this fixes series.
>      * If we do want to allow bpf_spin_{lock,unlock} in sleepable progs,
>        Alexei has expressed a preference for that do be done as part of a
>        broader set of changes to verifier determination of where those
>        helpers can be called, and what can be called inside the spin_lock
>        CS.
>      * I'm unsure whether preemption needs to be disabled in this patch
>        as well.
> 
>    * Patch 7 adds tests
> 
>    [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
>    [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
> 
> Dave Marchevsky (7):
>    bpf: Ensure kptr_struct_meta is non-NULL for collection insert and
>      refcount_acquire
>    bpf: Consider non-owning refs trusted
>    bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
>    bpf: Reenable bpf_refcount_acquire
>    bpf: Consider non-owning refs to refcounted nodes RCU protected
>    [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
>    selftests/bpf: Add tests for rbtree API interaction in sleepable progs
> 
>   include/linux/bpf.h                           |  3 +-
>   include/linux/bpf_verifier.h                  |  2 +-
>   kernel/bpf/helpers.c                          |  6 ++-
>   kernel/bpf/verifier.c                         | 45 ++++++++++++-----
>   .../bpf/prog_tests/refcounted_kptr.c          | 26 ++++++++++
>   .../selftests/bpf/progs/refcounted_kptr.c     | 37 ++++++++++++++
>   .../bpf/progs/refcounted_kptr_fail.c          | 48 +++++++++++++++++++
>   7 files changed, 153 insertions(+), 14 deletions(-)
> 

      parent reply	other threads:[~2023-08-02  3:07 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-02  3:57   ` Yonghong Song
2023-08-02 19:23     ` Dave Marchevsky
2023-08-02 21:41       ` Yonghong Song
2023-08-04  6:17         ` David Marchevsky
2023-08-04 15:37           ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-02  4:11   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-02  4:15   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-02  5:21   ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-02  5:59   ` Yonghong Song
2023-08-04  6:47     ` David Marchevsky
2023-08-04 15:43       ` Yonghong Song
2023-08-02 22:50   ` Alexei Starovoitov
2023-08-04  6:55     ` David Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
2023-08-02  6:33   ` Yonghong Song
2023-08-02 22:55   ` Alexei Starovoitov
2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
2023-08-02 23:07   ` Alexei Starovoitov
2023-08-02  3:07 ` Yonghong Song [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b99d323-711a-9971-3138-186461fbd245@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davemarchevsky@fb.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox