From: Yonghong Song <yonghong.song@linux.dev>
To: Dave Marchevsky <davemarchevsky@fb.com>, bpf@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
Date: Tue, 1 Aug 2023 20:07:08 -0700 [thread overview]
Message-ID: <7b99d323-711a-9971-3138-186461fbd245@linux.dev> (raw)
In-Reply-To: <20230801203630.3581291-1-davemarchevsky@fb.com>
On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> This series is the third of three (or more) followups to address issues
> in the bpf_refcount shared ownership implementation discovered by Kumar.
> This series addresses the use-after-free scenario described in [0]. The
> first followup series ([1]) also attempted to address the same
> use-after-free, but only got rid of the splat without addressing the
> underlying issue. After this series the underyling issue is fixed and
> bpf_refcount_acquire can be re-enabled.
>
> The main fix here is migration of bpf_obj_drop to use
> bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
> the example interleaving provided by Kumar in [0]:
>
> CPU 0 CPU 1
> n = bpf_obj_new
> lock(lock1)
> bpf_rbtree_add(rbtree1, n)
> m = bpf_rbtree_acquire(n)
> unlock(lock1)
>
> kptr_xchg(map, m) // move to map
> // at this point, refcount = 2
> m = kptr_xchg(map, NULL)
> lock(lock2)
> lock(lock1) bpf_rbtree_add(rbtree2, m)
On the right column: bpf_rbtree_add(rbtree1, m) ?
> p = bpf_rbtree_first(rbtree1) if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
> bpf_rbtree_remove(rbtree1, p)
> unlock(lock1)
> bpf_obj_drop(p) // B
> bpf_refcount_acquire(m) // use-after-free
> ...
>
> Before this series, bpf_obj_drop returns memory to the allocator using
> bpf_mem_free. At this point (B in the example) there might be some
> non-owning references to that memory which the verifier believes are valid,
> but where the underlying memory was reused for some other allocation.
> Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> non-owning refs") attempted to fix this by doing refcount_inc_non_zero
> on refcount_acquire in instead of refcount_inc under the assumption that
> preventing erroneous incr-on-0 would be sufficient. This isn't true,
> though: refcount_inc_non_zero must *check* if the refcount is zero, and
> the memory it's checking could have been reused, so the check may look
> at and incr random reused bytes.
>
> If we wait to reuse this memory until all non-owning refs that could
> point to it are gone, there is no possibility of this scenario
> happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
> nodes accomplishes this.
>
> For such nodes, the validity of their underlying memory is now tied to
> RCU Tasks Trace critical section. This matches MEM_RCU trustedness
> expectations, so the series takes the opportunity to more explicitly
> mark this trustedness state.
>
> The functional effects of trustedness changes here are rather small.
> This is largely due to local kptrs having separate verifier handling -
> with implicit trustedness assumptions - than arbitrary kptrs.
> Regardless, let's take the opportunity to move towards a world where
> trustedness is more explictly handled.
>
> Summary of patch contents, with sub-bullets being leading questions and
> comments I think are worth reviewer attention:
>
> * Patches 1 and 2 are moreso documententation - and
> enforcement, in patch 1's case - of existing semantics / expectations
>
> * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
> their underlying memory is not reused until RCU grace period elapses
> * Perhaps it makes sense to move to mem_free_rcu for _all_
> non-owning refs in the future, not just refcounted. This might
> allow custom non-owning ref lifetime + invalidation logic to be
> entirely subsumed by MEM_RCU handling. IMO this needs a bit more
> thought and should be tackled outside of a fix series, so it's not
> attempted here.
>
> * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
> the remaining use-after-free
> * One might expect this patch to be last in the series, or last
> before selftest changes. Patches 5 and 6 don't change
> verification or runtime behavior for existing BPF progs, though.
>
> * Patch 5 brings the verifier's understanding of refcounted node
> trustedness in line with Patch 4's changes
>
> * Patch 6 allows some bpf_spin_{lock, unlock} calls in sleepable
> progs. Marked RFC for a few reasons:
> * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
> since before the introduction of bpf linked list and rbtree. As
> such this feels more like a new feature that may not belong in
> this fixes series.
> * If we do want to allow bpf_spin_{lock,unlock} in sleepable progs,
> Alexei has expressed a preference for that do be done as part of a
> broader set of changes to verifier determination of where those
> helpers can be called, and what can be called inside the spin_lock
> CS.
> * I'm unsure whether preemption needs to be disabled in this patch
> as well.
>
> * Patch 7 adds tests
>
> [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
> [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
>
> Dave Marchevsky (7):
> bpf: Ensure kptr_struct_meta is non-NULL for collection insert and
> refcount_acquire
> bpf: Consider non-owning refs trusted
> bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
> bpf: Reenable bpf_refcount_acquire
> bpf: Consider non-owning refs to refcounted nodes RCU protected
> [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS
> selftests/bpf: Add tests for rbtree API interaction in sleepable progs
>
> include/linux/bpf.h | 3 +-
> include/linux/bpf_verifier.h | 2 +-
> kernel/bpf/helpers.c | 6 ++-
> kernel/bpf/verifier.c | 45 ++++++++++++-----
> .../bpf/prog_tests/refcounted_kptr.c | 26 ++++++++++
> .../selftests/bpf/progs/refcounted_kptr.c | 37 ++++++++++++++
> .../bpf/progs/refcounted_kptr_fail.c | 48 +++++++++++++++++++
> 7 files changed, 153 insertions(+), 14 deletions(-)
>
prev parent reply other threads:[~2023-08-02 3:07 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-01 20:36 [PATCH v1 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-02 3:57 ` Yonghong Song
2023-08-02 19:23 ` Dave Marchevsky
2023-08-02 21:41 ` Yonghong Song
2023-08-04 6:17 ` David Marchevsky
2023-08-04 15:37 ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-02 4:11 ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-02 4:15 ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-02 5:21 ` Yonghong Song
2023-08-01 20:36 ` [PATCH v1 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-02 5:59 ` Yonghong Song
2023-08-04 6:47 ` David Marchevsky
2023-08-04 15:43 ` Yonghong Song
2023-08-02 22:50 ` Alexei Starovoitov
2023-08-04 6:55 ` David Marchevsky
2023-08-01 20:36 ` [PATCH v1 bpf-next 6/7] [RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS Dave Marchevsky
2023-08-02 6:33 ` Yonghong Song
2023-08-02 22:55 ` Alexei Starovoitov
2023-08-01 20:36 ` [PATCH v1 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs Dave Marchevsky
2023-08-02 23:07 ` Alexei Starovoitov
2023-08-02 3:07 ` Yonghong Song [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7b99d323-711a-9971-3138-186461fbd245@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davemarchevsky@fb.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox