public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
@ 2023-08-21 19:33 Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

This series is the third of three (or more) followups to address issues
in the bpf_refcount shared ownership implementation discovered by Kumar.
This series addresses the use-after-free scenario described in [0]. The
first followup series ([1]) also attempted to address the same
use-after-free, but only got rid of the splat without addressing the
underlying issue. After this series the root cause is fixed and
bpf_refcount_acquire can be re-enabled.

The main fix here is migration of bpf_obj_drop to use
bpf_mem_free_rcu. To understand why this fixes the issue, let us consider
the example interleaving provided by Kumar in [0]:

CPU 0                                   CPU 1
n = bpf_obj_new
lock(lock1)
bpf_rbtree_add(rbtree1, n)
m = bpf_rbtree_acquire(n)
unlock(lock1)

kptr_xchg(map, m) // move to map
// at this point, refcount = 2
					m = kptr_xchg(map, NULL)
					lock(lock2)
lock(lock1)				bpf_rbtree_add(rbtree2, m)
p = bpf_rbtree_first(rbtree1)			if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
bpf_rbtree_remove(rbtree1, p)
unlock(lock1)
bpf_obj_drop(p) // B
					bpf_refcount_acquire(m) // use-after-free
					...

Before this series, bpf_obj_drop returns memory to the allocator using
bpf_mem_free. At this point (B in the example) there might be some
non-owning references to that memory which the verifier believes are valid,
but where the underlying memory was reused for some other allocation.
Commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
non-owning refs") attempted to fix this by using refcount_inc_not_zero
in refcount_acquire instead of refcount_inc, under the assumption that
preventing an erroneous incr-on-0 would be sufficient. That isn't true,
though: refcount_inc_not_zero must *check* whether the refcount is zero,
and the memory it checks could have been reused, so the check may read
and increment arbitrary reused bytes.

If we wait to reuse this memory until all non-owning refs that could
point to it are gone, there is no possibility of this scenario
happening. Migrating bpf_obj_drop to use bpf_mem_free_rcu for refcounted
nodes accomplishes this.

For such nodes, the validity of their underlying memory is now tied to
the RCU critical section. This matches MEM_RCU trustedness
expectations, so the series takes the opportunity to mark this
trustedness state more explicitly.

The functional effects of the trustedness changes here are rather
small. This is largely because local kptrs have separate verifier
handling - with implicit trustedness assumptions - from arbitrary
kptrs.
Regardless, let's take the opportunity to move towards a world where
trustedness is more explicitly handled.

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/

Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
  * Spent some time experimenting with a better approach as per convo w/
    Yonghong on v1's patch. It started getting too complex, so left unchanged
    for now. Yonghong was fine with this approach being shipped.

Patch 2 ("bpf: Consider non-owning refs trusted")
  * Add Yonghong ack
Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
  * Add Yonghong ack
Patch 4 ("bpf: Reenable bpf_refcount_acquire")
  * Add Yonghong ack

Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
  * Undo a nonfunctional whitespace change that shouldn't have been included
    (Yonghong)
  * Better logging message when complaining about rcu_read_{lock,unlock} in
    rbtree cb (Alexei)
  * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock
    (Yonghong, Alexei)

Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
  * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
    * Due to this we can consider spin_lock CS an RCU-sched read-side CS (per
      RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
  * no need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock}
    (Alexei)
  * RFC tag removed and renamed to "bpf: Allow bpf_spin_{lock,unlock} in
    sleepable progs"

Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
  * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success
    test (Alexei)

Summary of patch contents, with sub-bullets being leading questions and
comments I think are worth reviewer attention:

  * Patches 1 and 2 are more documentation - and enforcement, in
    patch 1's case - of existing semantics / expectations

  * Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that
    their underlying memory is not reused until RCU grace period elapses
    * Perhaps it makes sense to move to mem_free_rcu for _all_
      non-owning refs in the future, not just refcounted ones. This might
      allow custom non-owning ref lifetime + invalidation logic to be
      entirely subsumed by MEM_RCU handling. IMO this needs a bit more
      thought and should be tackled outside of a fix series, so it's not
      attempted here.

  * Patch 4 re-enables bpf_refcount_acquire as changes in patch 3 fix
    the remaining use-after-free
    * One might expect this patch to be last in the series, or last
      before selftest changes. Patches 5 and 6 don't change
      verification or runtime behavior for existing BPF progs, though.

  * Patch 5 brings the verifier's understanding of refcounted node
    trustedness in line with Patch 4's changes

  * Patch 6 allows some bpf_spin_{lock,unlock} calls in sleepable
    progs. It was marked RFC in v1 because:
    * bpf_spin_{lock,unlock} haven't been usable in sleepable progs
      since before the introduction of bpf linked list and rbtree. As
      such this feels more like a new feature that may not belong in
      this fixes series.

  * Patch 7 adds tests

  [0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
  [1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/

Dave Marchevsky (7):
  bpf: Ensure kptr_struct_meta is non-NULL for collection insert and
    refcount_acquire
  bpf: Consider non-owning refs trusted
  bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  bpf: Reenable bpf_refcount_acquire
  bpf: Consider non-owning refs to refcounted nodes RCU protected
  bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  selftests/bpf: Add tests for rbtree API interaction in sleepable progs

 include/linux/bpf.h                           |  3 +-
 include/linux/bpf_verifier.h                  |  2 +-
 kernel/bpf/helpers.c                          |  8 ++-
 kernel/bpf/verifier.c                         | 41 ++++++++---
 .../bpf/prog_tests/refcounted_kptr.c          | 26 +++++++
 .../selftests/bpf/progs/refcounted_kptr.c     | 71 +++++++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          | 28 ++++++++
 7 files changed, 165 insertions(+), 14 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-22  1:52   ` Yonghong Song
  2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

It's straightforward to prove that kptr_struct_meta must be non-NULL for
any valid call to these kfuncs:

  * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
    struct in user BTF with a special field (e.g. bpf_refcount,
    {rb,list}_node). These are stored in that BTF's struct_meta_tab.

  * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
    have {rb,list}_node field and that it's at the correct offset.
    Similarly, check_kfunc_args ensures bpf_refcount field existence for
    node param to bpf_refcount_acquire.

  * So a btf_struct_meta must have been created for the struct type of
    node param to these kfuncs

  * That BTF and its struct_meta_tab are guaranteed to still be around.
    Any arbitrary {rb,list} node the BPF program interacts with either:
    came from bpf_obj_new or a collection removal kfunc in the same
    program, in which case the BTF is associated with the program and
    still around; or came from bpf_kptr_xchg, in which case the BTF was
    associated with the map and is still around

Instead of silently continuing with NULL struct_meta, which caused
confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
kptr_struct_meta for node param to list and rbtree insert funcs"), let's
error out. Then, at runtime, we can confidently say that the
implementations of these kfuncs were given a non-NULL kptr_struct_meta,
meaning that special-field-specific functionality like
bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
series are guaranteed to execute.

This patch doesn't change functionality, just makes it easier to reason
about existing functionality.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c998..ebc1a44abb33 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18281,6 +18281,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
+		    !kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		insn_buf[0] = addr[0];
 		insn_buf[1] = addr[1];
 		insn_buf[2] = *insn;
@@ -18288,6 +18295,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		int struct_meta_reg = BPF_REG_3;
 		int node_offset_reg = BPF_REG_4;
 
@@ -18297,6 +18305,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			node_offset_reg = BPF_REG_5;
 		}
 
+		if (!kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
 						node_offset_reg, insn, insn_buf, cnt);
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

Recent discussions around default kptr "trustedness" led to changes such
as commit 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the
verifier."). One of the conclusions of those discussions, as expressed
in code and comments in that patch, is that we'd like to move away from
'raw' PTR_TO_BTF_ID without some type flag or other register state
indicating trustedness. Although PTR_TRUSTED and PTR_UNTRUSTED flags mark
this state explicitly, the verifier currently considers trustedness
implied by other register state. For example, owning refs to graph
collection nodes must have a nonzero ref_obj_id, so they pass the
is_trusted_reg check despite having no explicit PTR_{UN}TRUSTED flag.
This patch makes trustedness of non-owning refs to graph collection
nodes explicit as well.

By definition, non-owning refs are currently trusted. Although the ref
has no control over pointee lifetime, due to non-owning ref clobbering
rules (see invalidate_non_owning_refs) dereferencing a non-owning ref is
safe in the critical section controlled by bpf_spin_lock associated with
its owning collection.

Note that the previous statement does not hold true for nodes with shared
ownership due to the use-after-free issue that this series is
addressing. True shared ownership was disabled by commit 7deca5eae833
("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed"),
though, so the statement holds for now. Further patches in the series will change
the trustedness state of non-owning refs before re-enabling
bpf_refcount_acquire.

Let's add the NON_OWN_REF type flag to BPF_REG_TRUSTED_MODIFIERS such that a
non-owning ref reg state would pass is_trusted_reg check. Somewhat
surprisingly, this doesn't result in any change to user-visible
functionality elsewhere in the verifier: graph collection nodes are all
marked MEM_ALLOC, which tends to be handled in separate codepaths from
"raw" PTR_TO_BTF_ID. Regardless, let's be explicit here and document the
current state of things before changing it elsewhere in the series.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf_verifier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f70f9ac884d2..b6e58dab8e27 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -745,7 +745,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
 	}
 }
 
-#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
+#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED | NON_OWN_REF)
 
 static inline bool bpf_type_has_unsafe_modifiers(u32 type)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-23  6:26   ` Yonghong Song
  2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

This is the final fix for the use-after-free scenario described in
commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
non-owning refs"). That commit, by virtue of changing
bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
the "refcount incr on 0" splat. The not_zero check in
refcount_inc_not_zero, though, still occurs on memory that could have
been free'd and reused, so the commit didn't properly fix the root
cause.

This patch actually fixes the issue by freeing with the recently-added
bpf_mem_free_rcu, which ensures that the memory is not reused until an
RCU grace period has elapsed. Once that has happened, no non-owning
references pointing to the recently-freed memory can still be alive,
so it can be safely reused.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/helpers.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..945a85e25ac5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 
 	if (rec)
 		bpf_obj_free_fields(rec, p);
-	bpf_mem_free(&bpf_global_ma, p);
+
+	if (rec && rec->refcount_off >= 0)
+		bpf_mem_free_rcu(&bpf_global_ma, p);
+	else
+		bpf_mem_free(&bpf_global_ma, p);
 }
 
 __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (2 preceding siblings ...)
  2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

Now that all reported issues are fixed, bpf_refcount_acquire can be
turned back on. Also reenable all bpf_refcount-related tests which were
disabled.

This is a revert of:
 * commit f3514a5d6740 ("selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled")
 * commit 7deca5eae833 ("bpf: Disable bpf_refcount_acquire kfunc calls until race conditions are fixed")

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c                         |  5 +---
 .../bpf/prog_tests/refcounted_kptr.c          | 26 +++++++++++++++++++
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ebc1a44abb33..8db0afa5985c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11223,10 +11223,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d doesn't point to a type with bpf_refcount field\n", i);
 				return -EINVAL;
 			}
-			if (rec->refcount_off >= 0) {
-				verbose(env, "bpf_refcount_acquire calls are disabled for now\n");
-				return -EINVAL;
-			}
+
 			meta->arg_btf = reg->btf;
 			meta->arg_btf_id = reg->btf_id;
 			break;
diff --git a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
index 7423983472c7..d6bd5e16e637 100644
--- a/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c
@@ -9,12 +9,38 @@
 
 void test_refcounted_kptr(void)
 {
+	RUN_TESTS(refcounted_kptr);
 }
 
 void test_refcounted_kptr_fail(void)
 {
+	RUN_TESTS(refcounted_kptr_fail);
 }
 
 void test_refcounted_kptr_wrong_owner(void)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct refcounted_kptr *skel;
+	int ret;
+
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a1), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a1");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a1 retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_b), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_b");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_b retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a2), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a2");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a2 retval");
+	refcounted_kptr__destroy(skel);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (3 preceding siblings ...)
  2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-22  2:37   ` Yonghong Song
  2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

An earlier patch in the series ensures that the underlying memory of
nodes with bpf_refcount - which can have multiple owners - is not reused
until RCU grace period has elapsed. This prevents
use-after-free with non-owning references that may point to
recently-freed memory. While RCU read lock is held, it's safe to
dereference such a non-owning ref, as by definition RCU GP couldn't have
elapsed and therefore underlying memory couldn't have been reused.

From the perspective of verifier "trustedness" non-owning refs to
refcounted nodes are now trusted only in RCU CS and therefore should no
longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
MEM_RCU in order to reflect this new state.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 include/linux/bpf.h   |  3 ++-
 kernel/bpf/verifier.c | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eced6400f778..12596af59c00 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -653,7 +653,8 @@ enum bpf_type_flag {
 	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
 
 	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
-	 * Currently only valid for linked-list and rbtree nodes.
+	 * Currently only valid for linked-list and rbtree nodes. If the nodes
+	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
 	 */
 	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8db0afa5985c..55607ab30522 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_BTF_ID | PTR_TRUSTED:
 	case PTR_TO_BTF_ID | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
+	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * its fixed offset must be 0. In the other cases, fixed offset
 		 * can be non-zero. This was already checked above. So pass
@@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct bpf_verifier_state *state = env->cur_state;
+	struct btf_record *rec = reg_btf_record(reg);
 
 	if (!state->active_lock.ptr) {
 		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
@@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
 	}
 
 	reg->type |= NON_OWN_REF;
+	if (rec->refcount_off >= 0)
+		reg->type |= MEM_RCU;
+
 	return 0;
 }
 
@@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
 
+		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+			verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
+			return -EACCES;
+		}
+
 		if (rcu_lock) {
 			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
 			return -EINVAL;
@@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_rcu_lock) {
+				if (env->cur_state->active_rcu_lock &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_rcu_read_unlock is missing\n");
 					return -EINVAL;
 				}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (4 preceding siblings ...)
  2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-22  2:53   ` Yonghong Song
  2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
  2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
disabled bpf_spin_lock usage in sleepable progs, stating:

 Sleepable LSM programs can be preempted which means that allowing spin
 locks will need more work (disabling preemption and the verifier
 ensuring that no sleepable helpers are called when a spin lock is
 held).

This patch disables preemption before grabbing bpf_spin_lock. The
second requirement above - that no sleepable helpers are called while a
spin lock is held - is already implicitly enforced by the verifier,
which disallows helper calls in a spin_lock CS except for a few
exceptions, none of which can sleep.

Due to the above preemption changes, a bpf_spin_lock CS can also be
considered an RCU CS, so the verifier's in_rcu_cs check is modified to
account for this.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/helpers.c  | 2 ++
 kernel/bpf/verifier.c | 9 +++------
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 945a85e25ac5..8bd3812fb8df 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -286,6 +286,7 @@ static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
 	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
 	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
 	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
+	preempt_disable();
 	arch_spin_lock(l);
 }
 
@@ -294,6 +295,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
 	arch_spinlock_t *l = (void *)lock;
 
 	arch_spin_unlock(l);
+	preempt_enable();
 }
 
 #else
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 55607ab30522..33e4b854d2d4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5062,7 +5062,9 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
  */
 static bool in_rcu_cs(struct bpf_verifier_env *env)
 {
-	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
+	return env->cur_state->active_rcu_lock ||
+	       env->cur_state->active_lock.ptr ||
+	       !env->prog->aux->sleepable;
 }
 
 /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */
@@ -16980,11 +16982,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
-
-		if (prog->aux->sleepable) {
-			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
-			return -EINVAL;
-		}
 	}
 
 	if (btf_record_has_field(map->record, BPF_TIMER)) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (5 preceding siblings ...)
  2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
@ 2023-08-21 19:33 ` Dave Marchevsky
  2023-08-22  3:18   ` Yonghong Song
  2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Marchevsky @ 2023-08-21 19:33 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team, yonghong.song, Dave Marchevsky

Confirm that the following sleepable prog states fail verification:
  * bpf_rcu_read_unlock before bpf_spin_unlock
     * RCU CS will last at least as long as spin_lock CS

Also confirm that correct usage passes verification, specifically:
  * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
  * Implied RCU CS due to spin_lock CS

None of the selftest progs actually attach to bpf_testmod's
bpf_testmod_test_read.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 .../selftests/bpf/progs/refcounted_kptr.c     | 71 +++++++++++++++++++
 .../bpf/progs/refcounted_kptr_fail.c          | 28 ++++++++
 2 files changed, 99 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
index c55652fdc63a..893a4fdb4b6e 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
@@ -8,6 +8,9 @@
 #include "bpf_misc.h"
 #include "bpf_experimental.h"
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 struct node_data {
 	long key;
 	long list_data;
@@ -497,4 +500,72 @@ long rbtree_wrong_owner_remove_fail_a2(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__success
+int BPF_PROG(rbtree_sleepable_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct bpf_rb_node *rb;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&root, &n->r, less);
+	rb = bpf_rbtree_first(&root);
+	if (!rb)
+		goto err_out;
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb)
+		goto err_out;
+
+	m = container_of(rb, struct node_data, r);
+
+err_out:
+	bpf_spin_unlock(&lock);
+	bpf_rcu_read_unlock();
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
+SEC("?fentry.s/bpf_testmod_test_read")
+__success
+int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct bpf_rb_node *rb;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* No explicit bpf_rcu_read_lock */
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&root, &n->r, less);
+	rb = bpf_rbtree_first(&root);
+	if (!rb)
+		goto err_out;
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb)
+		goto err_out;
+
+	m = container_of(rb, struct node_data, r);
+
+err_out:
+	bpf_spin_unlock(&lock);
+	/* No explicit bpf_rcu_read_unlock */
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
index 0b09e5c915b1..1ef07f6ee580 100644
--- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
@@ -13,6 +13,9 @@ struct node_acquire {
 	struct bpf_refcount refcount;
 };
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
 private(A) struct bpf_spin_lock glock;
 private(A) struct bpf_rb_root groot __contains(node_acquire, node);
@@ -71,4 +74,29 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__failure __msg("function calls are not allowed while holding a lock")
+int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct node_acquire *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* spin_{lock,unlock} are in different RCU CS */
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rcu_read_unlock();
+
+	bpf_rcu_read_lock();
+	bpf_spin_unlock(&glock);
+	bpf_rcu_read_unlock();
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
  2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
@ 2023-08-22  1:52   ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22  1:52 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> It's straightforward to prove that kptr_struct_meta must be non-NULL for
> any valid call to these kfuncs:
> 
>    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>      struct in user BTF with a special field (e.g. bpf_refcount,
>      {rb,list}_node). These are stored in that BTF's struct_meta_tab.
> 
>    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>      have {rb,list}_node field and that it's at the correct offset.
>      Similarly, check_kfunc_args ensures bpf_refcount field existence for
>      node param to bpf_refcount_acquire.
> 
>    * So a btf_struct_meta must have been created for the struct type of
>      node param to these kfuncs
> 
>    * That BTF and its struct_meta_tab are guaranteed to still be around.
>      Any arbitrary {rb,list} node the BPF program interacts with either:
>      came from bpf_obj_new or a collection removal kfunc in the same
>      program, in which case the BTF is associated with the program and
>      still around; or came from bpf_kptr_xchg, in which case the BTF was
>      associated with the map and is still around
> 
> Instead of silently continuing with NULL struct_meta, which caused
> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
> error out. Then, at runtime, we can confidently say that the
> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
> meaning that special-field-specific functionality like
> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
> series are guaranteed to execute.
> 
> This patch doesn't change functionality, just makes it easier to reason
> about existing functionality.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>

Acked-by: Yonghong Song <yonghong.song@linux.dev>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
@ 2023-08-22  2:37   ` Yonghong Song
  2023-08-22  3:19     ` Yonghong Song
  2023-08-22  5:47     ` David Marchevsky
  0 siblings, 2 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22  2:37 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> An earlier patch in the series ensures that the underlying memory of
> nodes with bpf_refcount - which can have multiple owners - is not reused
> until RCU grace period has elapsed. This prevents
> use-after-free with non-owning references that may point to
> recently-freed memory. While RCU read lock is held, it's safe to
> dereference such a non-owning ref, as by definition RCU GP couldn't have
> elapsed and therefore underlying memory couldn't have been reused.
> 
>  From the perspective of verifier "trustedness" non-owning refs to
> refcounted nodes are now trusted only in RCU CS and therefore should no
> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> MEM_RCU in order to reflect this new state.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   include/linux/bpf.h   |  3 ++-
>   kernel/bpf/verifier.c | 13 ++++++++++++-
>   2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index eced6400f778..12596af59c00 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>   	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
>   
>   	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> -	 * Currently only valid for linked-list and rbtree nodes.
> +	 * Currently only valid for linked-list and rbtree nodes. If the nodes
> +	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>   	 */
>   	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),
>   
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8db0afa5985c..55607ab30522 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>   	case PTR_TO_BTF_ID | PTR_TRUSTED:
>   	case PTR_TO_BTF_ID | MEM_RCU:
>   	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> +	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>   		/* When referenced PTR_TO_BTF_ID is passed to release function,
>   		 * its fixed offset must be 0. In the other cases, fixed offset
>   		 * can be non-zero. This was already checked above. So pass
> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>   {
>   	struct bpf_verifier_state *state = env->cur_state;
> +	struct btf_record *rec = reg_btf_record(reg);
>   
>   	if (!state->active_lock.ptr) {
>   		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>   	}
>   
>   	reg->type |= NON_OWN_REF;
> +	if (rec->refcount_off >= 0)
> +		reg->type |= MEM_RCU;

Should the above MEM_RCU marking only be done when the reg access is
in an RCU critical section?

I think we still have issues for state resetting
with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
will try to convert the reg state to PTR_UNTRUSTED.

Let us say reg state is
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU

(1). If hitting bpf_spin_unlock(), since MEM_RCU is in
the reg state, the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
some additional code might be needed so we won't have
verifier complaints about ref_obj_id == 0.

(2). If hitting bpf_rcu_read_unlock(), the state should become
   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
since register access still in bpf_spin_lock() region.

Does this make sense?

> +
>   	return 0;
>   }
>   
> @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		struct bpf_func_state *state;
>   		struct bpf_reg_state *reg;
>   
> +		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
> +			verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
> +			return -EACCES;
> +		}
> +
>   		if (rcu_lock) {
>   			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>   			return -EINVAL;
> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
>   					return -EINVAL;
>   				}
>   
> -				if (env->cur_state->active_rcu_lock) {
> +				if (env->cur_state->active_rcu_lock &&
> +				    !in_rbtree_lock_required_cb(env)) {
>   					verbose(env, "bpf_rcu_read_unlock is missing\n");
>   					return -EINVAL;
>   				}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
@ 2023-08-22  2:53   ` Yonghong Song
  2023-08-22 19:46     ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-22  2:53 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
> disabled bpf_spin_lock usage in sleepable progs, stating:
> 
>   Sleepable LSM programs can be preempted which means that allowng spin
>   locks will need more work (disabling preemption and the verifier
>   ensuring that no sleepable helpers are called when a spin lock is
>   held).
> 
> This patch disables preemption before grabbing bpf_spin_lock. The second
> requirement above "no sleepable helpers are called when a spin lock is
> held" is implicitly enforced by current verifier logic due to helper
> calls in spin_lock CS being disabled except for a few exceptions, none
> of which sleep.
> 
> Due to above preemption changes, bpf_spin_lock CS can also be considered
> a RCU CS, so verifier's in_rcu_cs check is modified to account for this.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   kernel/bpf/helpers.c  | 2 ++
>   kernel/bpf/verifier.c | 9 +++------
>   2 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 945a85e25ac5..8bd3812fb8df 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -286,6 +286,7 @@ static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
>   	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
>   	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
>   	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
> +	preempt_disable();
>   	arch_spin_lock(l);
>   }
>   
> @@ -294,6 +295,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
>   	arch_spinlock_t *l = (void *)lock;
>   
>   	arch_spin_unlock(l);
> +	preempt_enable();
>   }

preempt_disable()/preempt_enable() is not needed for non-sleepable progs.
Is it possible we could have a different bpf_spin_lock proto, e.g.,
bpf_spin_lock_sleepable_proto, which implements the above with
preempt_disable()/preempt_enable()?
Not sure how much difference my proposal would make, since the current
bpf_spin_lock() region does not support func calls except some
graph API kfunc operations.

>   
>   #else
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 55607ab30522..33e4b854d2d4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5062,7 +5062,9 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>    */
>   static bool in_rcu_cs(struct bpf_verifier_env *env)
>   {
> -	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
> +	return env->cur_state->active_rcu_lock ||
> +	       env->cur_state->active_lock.ptr ||
> +	       !env->prog->aux->sleepable;
>   }
>   
>   /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */
> @@ -16980,11 +16982,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>   			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
>   			return -EINVAL;
>   		}
> -
> -		if (prog->aux->sleepable) {
> -			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
> -			return -EINVAL;
> -		}
>   	}
>   
>   	if (btf_record_has_field(map->record, BPF_TIMER)) {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
@ 2023-08-22  3:18   ` Yonghong Song
  2023-08-22  5:21     ` David Marchevsky
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-22  3:18 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> Confirm that the following sleepable prog states fail verification:
>    * bpf_rcu_read_unlock before bpf_spin_unlock
>       * RCU CS will last at least as long as spin_lock CS

I think the reason is bpf_spin_lock() does not allow any functions
in the spin lock region except some graph API kfuncs.

> 
> Also confirm that correct usage passes verification, specifically:
>    * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
>    * Implied RCU CS due to spin_lock CS
> 
> None of the selftest progs actually attach to bpf_testmod's
> bpf_testmod_test_read.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   .../selftests/bpf/progs/refcounted_kptr.c     | 71 +++++++++++++++++++
>   .../bpf/progs/refcounted_kptr_fail.c          | 28 ++++++++
>   2 files changed, 99 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
> index c55652fdc63a..893a4fdb4b6e 100644
[...]
> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
> index 0b09e5c915b1..1ef07f6ee580 100644
> --- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
> +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
> @@ -13,6 +13,9 @@ struct node_acquire {
>   	struct bpf_refcount refcount;
>   };
>   
> +extern void bpf_rcu_read_lock(void) __ksym;
> +extern void bpf_rcu_read_unlock(void) __ksym;
> +
>   #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
>   private(A) struct bpf_spin_lock glock;
>   private(A) struct bpf_rb_root groot __contains(node_acquire, node);
> @@ -71,4 +74,29 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
>   	return 0;
>   }
>   
> +SEC("?fentry.s/bpf_testmod_test_read")
> +__failure __msg("function calls are not allowed while holding a lock")
> +int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
> +	     struct file *file, struct kobject *kobj,
> +	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
> +{
> +	struct node_acquire *n;
> +
> +	n = bpf_obj_new(typeof(*n));
> +	if (!n)
> +		return 0;
> +
> +	/* spin_{lock,unlock} are in different RCU CS */
> +	bpf_rcu_read_lock();
> +	bpf_spin_lock(&glock);
> +	bpf_rbtree_add(&groot, &n->node, less);
> +	bpf_rcu_read_unlock();
> +
> +	bpf_rcu_read_lock();
> +	bpf_spin_unlock(&glock);
> +	bpf_rcu_read_unlock();
> +
> +	return 0;
> +}
> +
>   char _license[] SEC("license") = "GPL";

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-22  2:37   ` Yonghong Song
@ 2023-08-22  3:19     ` Yonghong Song
  2023-08-22  5:47     ` David Marchevsky
  1 sibling, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22  3:19 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 7:37 PM, Yonghong Song wrote:
> 
> 
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>> An earlier patch in the series ensures that the underlying memory of
>> nodes with bpf_refcount - which can have multiple owners - is not reused
>> until RCU grace period has elapsed. This prevents
>> use-after-free with non-owning references that may point to
>> recently-freed memory. While RCU read lock is held, it's safe to
>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>> elapsed and therefore underlying memory couldn't have been reused.
>>
>>  From the perspective of verifier "trustedness" non-owning refs to
>> refcounted nodes are now trusted only in RCU CS and therefore should no
>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>> MEM_RCU in order to reflect this new state.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   include/linux/bpf.h   |  3 ++-
>>   kernel/bpf/verifier.c | 13 ++++++++++++-
>>   2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index eced6400f778..12596af59c00 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>       /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are 
>> non-owning.
>> -     * Currently only valid for linked-list and rbtree nodes.
>> +     * Currently only valid for linked-list and rbtree nodes. If the 
>> nodes
>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>        */
>>       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 8db0afa5985c..55607ab30522 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct 
>> bpf_verifier_env *env,
>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>       case PTR_TO_BTF_ID | MEM_RCU:
>>       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>            * its fixed offset must be 0. In the other cases, fixed offset
>>            * can be non-zero. This was already checked above. So pass
>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct 
>> bpf_verifier_env *env,
>>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct 
>> bpf_reg_state *reg)
>>   {
>>       struct bpf_verifier_state *state = env->cur_state;
>> +    struct btf_record *rec = reg_btf_record(reg);
>>       if (!state->active_lock.ptr) {
>>           verbose(env, "verifier internal error: ref_set_non_owning 
>> w/o active lock\n");
>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct 
>> bpf_verifier_env *env, struct bpf_reg_state
>>       }
>>       reg->type |= NON_OWN_REF;
>> +    if (rec->refcount_off >= 0)
>> +        reg->type |= MEM_RCU;
> 
> Should the above MEM_RCU marking be done unless reg access is in
> rcu critical section?
> 
> I think we still have issues for state resetting
> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> will try to convert the reg state to PTR_UNTRUSTED.
> 
> Let us say reg state is
>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> 
> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> the reg state, the state should become
>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> some additional code might be needed so we wont have
> verifier complaints about ref_obj_id == 0.
> 
> (2). If hitting bpf_rcu_read_unlock(), the state should become
>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> since register access still in bpf_spin_lock() region.
> 
> Does this make sense?

Okay, seems scenario (2) is not possible as bpf_rcu_read_unlock()
is not allowed in bpf spin lock region.

> 
>> +
>>       return 0;
>>   }
>> @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct 
>> bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct bpf_func_state *state;
>>           struct bpf_reg_state *reg;
>> +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || 
>> rcu_unlock)) {
>> +            verbose(env, "Calling bpf_rcu_read_{lock,unlock} in 
>> unnecessary rbtree callback\n");
>> +            return -EACCES;
>> +        }
>> +
>>           if (rcu_lock) {
>>               verbose(env, "nested rcu read lock (kernel function 
>> %s)\n", func_name);
>>               return -EINVAL;
>> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
>>                       return -EINVAL;
>>                   }
>> -                if (env->cur_state->active_rcu_lock) {
>> +                if (env->cur_state->active_rcu_lock &&
>> +                    !in_rbtree_lock_required_cb(env)) {
>>                       verbose(env, "bpf_rcu_read_unlock is missing\n");
>>                       return -EINVAL;
>>                   }
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-22  3:18   ` Yonghong Song
@ 2023-08-22  5:21     ` David Marchevsky
  2023-08-22 15:00       ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: David Marchevsky @ 2023-08-22  5:21 UTC (permalink / raw)
  To: yonghong.song, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On 8/21/23 11:18 PM, Yonghong Song wrote:
> 
> 
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>> Confirm that the following sleepable prog states fail verification:
>>    * bpf_rcu_read_unlock before bpf_spin_unlock
>>       * RCU CS will last at least as long as spin_lock CS
> 
> I think the reason is bpf_spin_lock() does not allow any functions
> in spin lock region except some graph api kfunc's.
> 

Yeah, agreed, this test isn't really validating anything with current verifier
logic. But, given that spin_lock CS w/ disabled preemption is an RCU CS, do
you foresee wanting to allow rcu_read_unlock within spin_lock CS?

I'll delete the test if you think it should go, but maybe it's worth
keeping with a comment summarizing why it's an interesting example.

Also, the existing comment in that test is incorrect, will fix.

>>
>> Also confirm that correct usage passes verification, specifically:
>>    * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
>>    * Implied RCU CS due to spin_lock CS
>>
>> None of the selftest progs actually attach to bpf_testmod's
>> bpf_testmod_test_read.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   .../selftests/bpf/progs/refcounted_kptr.c     | 71 +++++++++++++++++++
>>   .../bpf/progs/refcounted_kptr_fail.c          | 28 ++++++++
>>   2 files changed, 99 insertions(+)
>>
>> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>> index c55652fdc63a..893a4fdb4b6e 100644
> [...]
>> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>> index 0b09e5c915b1..1ef07f6ee580 100644
>> --- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>> +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>> @@ -13,6 +13,9 @@ struct node_acquire {
>>       struct bpf_refcount refcount;
>>   };
>>   +extern void bpf_rcu_read_lock(void) __ksym;
>> +extern void bpf_rcu_read_unlock(void) __ksym;
>> +
>>   #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
>>   private(A) struct bpf_spin_lock glock;
>>   private(A) struct bpf_rb_root groot __contains(node_acquire, node);
>> @@ -71,4 +74,29 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
>>       return 0;
>>   }
>>   +SEC("?fentry.s/bpf_testmod_test_read")
>> +__failure __msg("function calls are not allowed while holding a lock")
>> +int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
>> +         struct file *file, struct kobject *kobj,
>> +         struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
>> +{
>> +    struct node_acquire *n;
>> +
>> +    n = bpf_obj_new(typeof(*n));
>> +    if (!n)
>> +        return 0;
>> +
>> +    /* spin_{lock,unlock} are in different RCU CS */
>> +    bpf_rcu_read_lock();
>> +    bpf_spin_lock(&glock);
>> +    bpf_rbtree_add(&groot, &n->node, less);
>> +    bpf_rcu_read_unlock();
>> +
>> +    bpf_rcu_read_lock();
>> +    bpf_spin_unlock(&glock);
>> +    bpf_rcu_read_unlock();
>> +
>> +    return 0;
>> +}
>> +
>>   char _license[] SEC("license") = "GPL";

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-22  2:37   ` Yonghong Song
  2023-08-22  3:19     ` Yonghong Song
@ 2023-08-22  5:47     ` David Marchevsky
  2023-08-22 16:02       ` Yonghong Song
  2023-08-22 23:45       ` Alexei Starovoitov
  1 sibling, 2 replies; 32+ messages in thread
From: David Marchevsky @ 2023-08-22  5:47 UTC (permalink / raw)
  To: yonghong.song, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team

On 8/21/23 10:37 PM, Yonghong Song wrote:
> 
> 
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>> An earlier patch in the series ensures that the underlying memory of
>> nodes with bpf_refcount - which can have multiple owners - is not reused
>> until RCU grace period has elapsed. This prevents
>> use-after-free with non-owning references that may point to
>> recently-freed memory. While RCU read lock is held, it's safe to
>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>> elapsed and therefore underlying memory couldn't have been reused.
>>
>>  From the perspective of verifier "trustedness" non-owning refs to
>> refcounted nodes are now trusted only in RCU CS and therefore should no
>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>> MEM_RCU in order to reflect this new state.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>>   include/linux/bpf.h   |  3 ++-
>>   kernel/bpf/verifier.c | 13 ++++++++++++-
>>   2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index eced6400f778..12596af59c00 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>         /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>> -     * Currently only valid for linked-list and rbtree nodes.
>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>        */
>>       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>   diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 8db0afa5985c..55607ab30522 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>       case PTR_TO_BTF_ID | MEM_RCU:
>>       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>            * its fixed offset must be 0. In the other cases, fixed offset
>>            * can be non-zero. This was already checked above. So pass
>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>   {
>>       struct bpf_verifier_state *state = env->cur_state;
>> +    struct btf_record *rec = reg_btf_record(reg);
>>         if (!state->active_lock.ptr) {
>>           verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>       }
>>         reg->type |= NON_OWN_REF;
>> +    if (rec->refcount_off >= 0)
>> +        reg->type |= MEM_RCU;
> 
> Should the above MEM_RCU marking be done unless reg access is in
> rcu critical section?

I think it is fine, since non-owning references currently exist only within
spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
disabled + spin_lock CS should imply RCU CS.

  [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/

> 
> I think we still have issues for state resetting
> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> will try to convert the reg state to PTR_UNTRUSTED.
> 
> Let us say reg state is
>   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> 
> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> the reg state, the state should become
>   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> some additional code might be needed so we wont have
> verifier complaints about ref_obj_id == 0.
> 
> (2). If hitting bpf_rcu_read_unlock(), the state should become
>   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> since register access still in bpf_spin_lock() region.

I agree w/ your comment in side reply stating that this
case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
is currently not allowed.

> 
> Does this make sense?
> 


IIUC the specific reg state flow you're recommending is based on the convos
we've had over the past few weeks re: getting rid of special non-owning ref
lifetime rules, instead using RCU as much as possible. Specifically, this
recommended change would remove non-owning ref clobbering, instead just removing
NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
to collection kfuncs (refcount_acquire, etc).

I agree that in general we'll be able to loosen the lifetime logic for
non-owning refs, and your specific suggestion sounds reasonable. IMO it's
better to ship that as a separate series, though, as this series was meant
to be the minimum changes necessary to re-enable bpf_refcount_acquire, and
it's expanded a bit past that already. Easier to reason about the rest
of this series' changes without having to account for clobbering changes.

>> +
>>       return 0;
>>   }
>>   @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct bpf_func_state *state;
>>           struct bpf_reg_state *reg;
>>   +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>> +            verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
>> +            return -EACCES;
>> +        }
>> +
>>           if (rcu_lock) {
>>               verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>               return -EINVAL;
>> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
>>                       return -EINVAL;
>>                   }
>>   -                if (env->cur_state->active_rcu_lock) {
>> +                if (env->cur_state->active_rcu_lock &&
>> +                    !in_rbtree_lock_required_cb(env)) {
>>                       verbose(env, "bpf_rcu_read_unlock is missing\n");
>>                       return -EINVAL;
>>                   }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
  2023-08-22  5:21     ` David Marchevsky
@ 2023-08-22 15:00       ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22 15:00 UTC (permalink / raw)
  To: David Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 10:21 PM, David Marchevsky wrote:
> On 8/21/23 11:18 PM, Yonghong Song wrote:
>>
>>
>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>> Confirm that the following sleepable prog states fail verification:
>>>     * bpf_rcu_read_unlock before bpf_spin_unlock
>>>        * RCU CS will last at least as long as spin_lock CS
>>
>> I think the reason is bpf_spin_lock() does not allow any functions
>> in spin lock region except some graph api kfuncs.
>>
> 
> Yeah, agreed, this test isn't really validating anything with current verifier
> logic. But, given that spin_lock CS w/ disabled preemption is an RCU CS, do
> you foresee wanting to allow rcu_read_unlock within spin_lock CS?
> 
> I'll delete the test if you think it should go, but maybe it's worth
> keeping with a comment summarizing why it's an interesting example.

Ya, it is an interesting case for interaction of rcu lock vs. spin lock.
I guess you can keep it with comments, unless there are some other
objections.

> 
> Also, the existing comment in that test is incorrect, will fix.
> 
>>>
>>> Also confirm that correct usage passes verification, specifically:
>>>     * Explicit use of bpf_rcu_read_{lock, unlock} in sleepable test prog
>>>     * Implied RCU CS due to spin_lock CS
>>>
>>> None of the selftest progs actually attach to bpf_testmod's
>>> bpf_testmod_test_read.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    .../selftests/bpf/progs/refcounted_kptr.c     | 71 +++++++++++++++++++
>>>    .../bpf/progs/refcounted_kptr_fail.c          | 28 ++++++++
>>>    2 files changed, 99 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr.c b/tools/testing/selftests/bpf/progs/refcounted_kptr.c
>>> index c55652fdc63a..893a4fdb4b6e 100644
>> [...]
>>> diff --git a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>>> index 0b09e5c915b1..1ef07f6ee580 100644
>>> --- a/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>>> +++ b/tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c
>>> @@ -13,6 +13,9 @@ struct node_acquire {
>>>        struct bpf_refcount refcount;
>>>    };
>>>    +extern void bpf_rcu_read_lock(void) __ksym;
>>> +extern void bpf_rcu_read_unlock(void) __ksym;
>>> +
>>>    #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
>>>    private(A) struct bpf_spin_lock glock;
>>>    private(A) struct bpf_rb_root groot __contains(node_acquire, node);
>>> @@ -71,4 +74,29 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
>>>        return 0;
>>>    }
>>>    +SEC("?fentry.s/bpf_testmod_test_read")
>>> +__failure __msg("function calls are not allowed while holding a lock")
>>> +int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
>>> +         struct file *file, struct kobject *kobj,
>>> +         struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
>>> +{
>>> +    struct node_acquire *n;
>>> +
>>> +    n = bpf_obj_new(typeof(*n));
>>> +    if (!n)
>>> +        return 0;
>>> +
>>> +    /* spin_{lock,unlock} are in different RCU CS */
>>> +    bpf_rcu_read_lock();
>>> +    bpf_spin_lock(&glock);
>>> +    bpf_rbtree_add(&groot, &n->node, less);
>>> +    bpf_rcu_read_unlock();
>>> +
>>> +    bpf_rcu_read_lock();
>>> +    bpf_spin_unlock(&glock);
>>> +    bpf_rcu_read_unlock();
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>    char _license[] SEC("license") = "GPL";

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-22  5:47     ` David Marchevsky
@ 2023-08-22 16:02       ` Yonghong Song
  2023-08-22 23:45       ` Alexei Starovoitov
  1 sibling, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22 16:02 UTC (permalink / raw)
  To: David Marchevsky, Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 10:47 PM, David Marchevsky wrote:
> On 8/21/23 10:37 PM, Yonghong Song wrote:
>>
>>
>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>> An earlier patch in the series ensures that the underlying memory of
>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>> until RCU grace period has elapsed. This prevents
>>> use-after-free with non-owning references that may point to
>>> recently-freed memory. While RCU read lock is held, it's safe to
>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>> elapsed and therefore underlying memory couldn't have been reused.
>>>
>>>   From the perspective of verifier "trustedness" non-owning refs to
>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>> MEM_RCU in order to reflect this new state.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    include/linux/bpf.h   |  3 ++-
>>>    kernel/bpf/verifier.c | 13 ++++++++++++-
>>>    2 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index eced6400f778..12596af59c00 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>>          /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>> -     * Currently only valid for linked-list and rbtree nodes.
>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>>         */
>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>>    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 8db0afa5985c..55607ab30522 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>        case PTR_TO_BTF_ID | MEM_RCU:
>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>             * its fixed offset must be 0. In the other cases, fixed offset
>>>             * can be non-zero. This was already checked above. So pass
>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>    {
>>>        struct bpf_verifier_state *state = env->cur_state;
>>> +    struct btf_record *rec = reg_btf_record(reg);
>>>          if (!state->active_lock.ptr) {
>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>        }
>>>          reg->type |= NON_OWN_REF;
>>> +    if (rec->refcount_off >= 0)
>>> +        reg->type |= MEM_RCU;
>>
>> Should the above MEM_RCU marking be done unless reg access is in
>> rcu critical section?
> 
> I think it is fine, since non-owning references currently exist only within
> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> disabled + spin_lock CS should imply RCU CS.
> 
>    [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/

Okay, I see. Thanks for the pointer.
If MEM_RCU is marked while not in an explicit rcu cs, it does make sense
to clobber everything when bpf_spin_unlock is hit.
But see below.

> 
>>
>> I think we still have issues for state resetting
>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
>> will try to convert the reg state to PTR_UNTRUSTED.
>>
>> Let us say reg state is
>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
>>
>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
>> the reg state, the state should become
>>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
>> some additional code might be needed so we won't have
>> verifier complaints about ref_obj_id == 0.
>>
>> (2). If hitting bpf_rcu_read_unlock(), the state should become
>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
>> since register access is still in bpf_spin_lock() region.
> 
> I agree w/ your comment in side reply stating that this
> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> is currently not allowed.
> 
>>
>> Does this make sense?
>>
> 
> 
> IIUC the specific reg state flow you're recommending is based on the convos
> we've had over the past few weeks re: getting rid of special non-owning ref
> lifetime rules, instead using RCU as much as possible. Specifically, this
> recommended change would remove non-owning ref clobbering, instead just removing
> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> to collection kfuncs (refcount_acquire, etc).
> 
> I agree that in general we'll be able to loosen the lifetime logic for
> non-owning refs, and your specific suggestion sounds reasonable. IMO it's
> better to ship that as a separate series, though, as this series was meant
> to be the minimum changes necessary to re-enable bpf_refcount_acquire, and
> it's expanded a bit past that already. Easier to reason about the rest
> of this series' changes without having to account for clobbering changes.

I removed
   >>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
and
   >>> +    if (rec->refcount_off >= 0)
   >>> +        reg->type |= MEM_RCU;

All 'refcount' selftests seem to work fine.
If this is indeed unnecessary for this patch set, maybe we can delay
this RCU marking thing for a future patch set?

> 
>>> +
>>>        return 0;
>>>    }
>>>    @@ -11328,6 +11333,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>            struct bpf_func_state *state;
>>>            struct bpf_reg_state *reg;
>>>    +        if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
>>> +            verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
>>> +            return -EACCES;
>>> +        }
>>> +
>>>            if (rcu_lock) {
>>>                verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
>>>                return -EINVAL;
>>> @@ -16689,7 +16699,8 @@ static int do_check(struct bpf_verifier_env *env)
>>>                        return -EINVAL;
>>>                    }
>>>    -                if (env->cur_state->active_rcu_lock) {
>>> +                if (env->cur_state->active_rcu_lock &&
>>> +                    !in_rbtree_lock_required_cb(env)) {
>>>                        verbose(env, "bpf_rcu_read_unlock is missing\n");
>>>                        return -EINVAL;
>>>                    }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  2023-08-22  2:53   ` Yonghong Song
@ 2023-08-22 19:46     ` Alexei Starovoitov
  2023-08-22 19:53       ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 19:46 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Mon, Aug 21, 2023 at 07:53:22PM -0700, Yonghong Song wrote:
> 
> 
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> > Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
> > disabled bpf_spin_lock usage in sleepable progs, stating:
> > 
> >   Sleepable LSM programs can be preempted which means that allowing spin
> >   locks will need more work (disabling preemption and the verifier
> >   ensuring that no sleepable helpers are called when a spin lock is
> >   held).
> > 
> > This patch disables preemption before grabbing bpf_spin_lock. The second
> > requirement above "no sleepable helpers are called when a spin lock is
> > held" is implicitly enforced by current verifier logic due to helper
> > calls in spin_lock CS being disabled except for a few exceptions, none
> > of which sleep.
> > 
> > Due to above preemption changes, bpf_spin_lock CS can also be considered
> > a RCU CS, so verifier's in_rcu_cs check is modified to account for this.
> > 
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > ---
> >   kernel/bpf/helpers.c  | 2 ++
> >   kernel/bpf/verifier.c | 9 +++------
> >   2 files changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 945a85e25ac5..8bd3812fb8df 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -286,6 +286,7 @@ static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
> >   	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
> >   	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
> >   	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
> > +	preempt_disable();
> >   	arch_spin_lock(l);
> >   }
> > @@ -294,6 +295,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
> >   	arch_spinlock_t *l = (void *)lock;
> >   	arch_spin_unlock(l);
> > +	preempt_enable();
> >   }
> 
> preempt_disable()/preempt_enable() is not needed. Is it possible we can

preempt_disable is needed in all cases. This mistake slipped in when
we converted preempt disabled bpf progs into migrate disabled.
For example, see how raw_spin_lock is doing it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
  2023-08-22 19:46     ` Alexei Starovoitov
@ 2023-08-22 19:53       ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-22 19:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/22/23 12:46 PM, Alexei Starovoitov wrote:
> On Mon, Aug 21, 2023 at 07:53:22PM -0700, Yonghong Song wrote:
>>
>>
>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>> Commit 9e7a4d9831e8 ("bpf: Allow LSM programs to use bpf spin locks")
>>> disabled bpf_spin_lock usage in sleepable progs, stating:
>>>
>>>    Sleepable LSM programs can be preempted which means that allowing spin
>>>    locks will need more work (disabling preemption and the verifier
>>>    ensuring that no sleepable helpers are called when a spin lock is
>>>    held).
>>>
>>> This patch disables preemption before grabbing bpf_spin_lock. The second
>>> requirement above "no sleepable helpers are called when a spin lock is
>>> held" is implicitly enforced by current verifier logic due to helper
>>> calls in spin_lock CS being disabled except for a few exceptions, none
>>> of which sleep.
>>>
>>> Due to above preemption changes, bpf_spin_lock CS can also be considered
>>> a RCU CS, so verifier's in_rcu_cs check is modified to account for this.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>>    kernel/bpf/helpers.c  | 2 ++
>>>    kernel/bpf/verifier.c | 9 +++------
>>>    2 files changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 945a85e25ac5..8bd3812fb8df 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -286,6 +286,7 @@ static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
>>>    	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
>>>    	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
>>>    	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
>>> +	preempt_disable();
>>>    	arch_spin_lock(l);
>>>    }
>>> @@ -294,6 +295,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
>>>    	arch_spinlock_t *l = (void *)lock;
>>>    	arch_spin_unlock(l);
>>> +	preempt_enable();
>>>    }
>>
>> preempt_disable()/preempt_enable() is not needed. Is it possible we can
> 
> preempt_disable is needed in all cases. This mistake slipped in when
> we converted preempt disabled bpf progs into migrate disabled.
> For example, see how raw_spin_lock is doing it.

Okay, a slipped bug. That explains the difference between our 
bpf_spin_lock and raw_spin_lock. The change then makes sense.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-22  5:47     ` David Marchevsky
  2023-08-22 16:02       ` Yonghong Song
@ 2023-08-22 23:45       ` Alexei Starovoitov
  2023-08-23  0:18         ` Yonghong Song
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-22 23:45 UTC (permalink / raw)
  To: David Marchevsky
  Cc: yonghong.song, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
> On 8/21/23 10:37 PM, Yonghong Song wrote:
> > 
> > 
> > On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >> An earlier patch in the series ensures that the underlying memory of
> >> nodes with bpf_refcount - which can have multiple owners - is not reused
> >> until RCU grace period has elapsed. This prevents
> >> use-after-free with non-owning references that may point to
> >> recently-freed memory. While RCU read lock is held, it's safe to
> >> dereference such a non-owning ref, as by definition RCU GP couldn't have
> >> elapsed and therefore underlying memory couldn't have been reused.
> >>
> >>  From the perspective of verifier "trustedness" non-owning refs to
> >> refcounted nodes are now trusted only in RCU CS and therefore should no
> >> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> >> MEM_RCU in order to reflect this new state.
> >>
> >> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >> ---
> >>   include/linux/bpf.h   |  3 ++-
> >>   kernel/bpf/verifier.c | 13 ++++++++++++-
> >>   2 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index eced6400f778..12596af59c00 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -653,7 +653,8 @@ enum bpf_type_flag {
> >>       MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
> >>         /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> >> -     * Currently only valid for linked-list and rbtree nodes.
> >> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
> >> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> >>        */
> >>       NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
> >>   diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 8db0afa5985c..55607ab30522 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >>       case PTR_TO_BTF_ID | PTR_TRUSTED:
> >>       case PTR_TO_BTF_ID | MEM_RCU:
> >>       case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> >> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
> >>           /* When referenced PTR_TO_BTF_ID is passed to release function,
> >>            * its fixed offset must be 0. In the other cases, fixed offset
> >>            * can be non-zero. This was already checked above. So pass
> >> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >>   static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >>   {
> >>       struct bpf_verifier_state *state = env->cur_state;
> >> +    struct btf_record *rec = reg_btf_record(reg);
> >>         if (!state->active_lock.ptr) {
> >>           verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> >> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
> >>       }
> >>         reg->type |= NON_OWN_REF;
> >> +    if (rec->refcount_off >= 0)
> >> +        reg->type |= MEM_RCU;
> > 
> > Should the above MEM_RCU marking be done unless reg access is in
> > rcu critical section?
> 
> I think it is fine, since non-owning references currently exist only within
> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> disabled + spin_lock CS should imply RCU CS.
> 
>   [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
> 
> > 
> > I think we still have issues for state resetting
> > with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> > will try to convert the reg state to PTR_UNTRUSTED.
> > 
> > Let us say reg state is
> >   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> > 
> > (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> > the reg state, the state should become
> >   PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> > some additional code might be needed so we won't have
> > verifier complaints about ref_obj_id == 0.
> > 
> > (2). If hitting bpf_rcu_read_unlock(), the state should become
> >   PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> > since register access is still in bpf_spin_lock() region.
> 
> I agree w/ your comment in side reply stating that this
> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> is currently not allowed.
> 
> > 
> > Does this make sense?
> > 
> 
> 
> IIUC the specific reg state flow you're recommending is based on the convos
> we've had over the past few weeks re: getting rid of special non-owning ref
> lifetime rules, instead using RCU as much as possible. Specifically, this
> recommended change would remove non-owning ref clobbering, instead just removing
> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> to collection kfuncs (refcount_acquire, etc).

Overall the patch set makes sense to me, but I want to clarify above.
My understanding is that after the patch set is applied bpf_spin_unlock()
will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
is not correct.
Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().

Re: (2) even if/when bpf_rcu_read_unlock() will be allowed inside spinlocked region
it will convert PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU to
PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
which is a buggy combination that we would need to address if rcu_unlock is allowed eventually.

Did I get it right?
If so I think the whole set is good to do.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-22 23:45       ` Alexei Starovoitov
@ 2023-08-23  0:18         ` Yonghong Song
  2023-08-23  0:21           ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-23  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, David Marchevsky
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
>> On 8/21/23 10:37 PM, Yonghong Song wrote:
>>>
>>>
>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>> An earlier patch in the series ensures that the underlying memory of
>>>> nodes with bpf_refcount - which can have multiple owners - is not reused
>>>> until RCU grace period has elapsed. This prevents
>>>> use-after-free with non-owning references that may point to
>>>> recently-freed memory. While RCU read lock is held, it's safe to
>>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
>>>> elapsed and therefore underlying memory couldn't have been reused.
>>>>
>>>>   From the perspective of verifier "trustedness" non-owning refs to
>>>> refcounted nodes are now trusted only in RCU CS and therefore should no
>>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
>>>> MEM_RCU in order to reflect this new state.
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>> ---
>>>>    include/linux/bpf.h   |  3 ++-
>>>>    kernel/bpf/verifier.c | 13 ++++++++++++-
>>>>    2 files changed, 14 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index eced6400f778..12596af59c00 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
>>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
>>>>          /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
>>>> -     * Currently only valid for linked-list and rbtree nodes.
>>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
>>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
>>>>         */
>>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
>>>>    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 8db0afa5985c..55607ab30522 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>>        case PTR_TO_BTF_ID | MEM_RCU:
>>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
>>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
>>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>>             * its fixed offset must be 0. In the other cases, fixed offset
>>>>             * can be non-zero. This was already checked above. So pass
>>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
>>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>>>>    {
>>>>        struct bpf_verifier_state *state = env->cur_state;
>>>> +    struct btf_record *rec = reg_btf_record(reg);
>>>>          if (!state->active_lock.ptr) {
>>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
>>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
>>>>        }
>>>>          reg->type |= NON_OWN_REF;
>>>> +    if (rec->refcount_off >= 0)
>>>> +        reg->type |= MEM_RCU;
>>>
>>> Should the above MEM_RCU marking be done unless reg access is in
>>> rcu critical section?
>>
>> I think it is fine, since non-owning references currently exist only within
>> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
>> disabled + spin_lock CS should imply RCU CS.
>>
>>    [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
>>
>>>
>>> I think we still have issues for state resetting
>>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
>>> will try to convert the reg state to PTR_UNTRUSTED.
>>>
>>> Let us say reg state is
>>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
>>>
>>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
>>> the reg state, the state should become
>>>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
>>> some additional code might be needed so we won't have
>>> verifier complaints about ref_obj_id == 0.
>>>
>>> (2). If hitting bpf_rcu_read_unlock(), the state should become
>>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
>>> since register access is still in bpf_spin_lock() region.
>>
>> I agree w/ your comment in side reply stating that this
>> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
>> is currently not allowed.
>>
>>>
>>> Does this make sense?
>>>
>>
>>
>> IIUC the specific reg state flow you're recommending is based on the convos
>> we've had over the past few weeks re: getting rid of special non-owning ref
>> lifetime rules, instead using RCU as much as possible. Specifically, this
>> recommended change would remove non-owning ref clobbering, instead just removing
>> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
>> to collection kfuncs (refcount_acquire, etc).
> 
> Overall the patch set makes sense to me, but I want to clarify above.
> My understanding is that after the patch set is applied bpf_spin_unlock()
> will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
> is not correct.
> Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().

I said it 'should become ...', but you are right. Right now, it will do
mark_reg_invalid(). So it is correct, just MAYBE a little conservative.

> 
> Re: (2) even if/when bpf_rcu_read_unlock() will be allowed inside spinlocked region
> it will convert PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU to
> PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | PTR_UNTRUSTED
> which is a buggy combination that we would need to address if rcu_unlock is allowed eventually.
> 
> Did I get it right?
> If so I think the whole set is good to do.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
  2023-08-23  0:18         ` Yonghong Song
@ 2023-08-23  0:21           ` Alexei Starovoitov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-23  0:21 UTC (permalink / raw)
  To: Yonghong Song
  Cc: David Marchevsky, Dave Marchevsky, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Tue, Aug 22, 2023 at 5:18 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/22/23 4:45 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 22, 2023 at 01:47:01AM -0400, David Marchevsky wrote:
> >> On 8/21/23 10:37 PM, Yonghong Song wrote:
> >>>
> >>>
> >>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>>> An earlier patch in the series ensures that the underlying memory of
> >>>> nodes with bpf_refcount - which can have multiple owners - is not reused
> >>>> until RCU grace period has elapsed. This prevents
> >>>> use-after-free with non-owning references that may point to
> >>>> recently-freed memory. While RCU read lock is held, it's safe to
> >>>> dereference such a non-owning ref, as by definition RCU GP couldn't have
> >>>> elapsed and therefore underlying memory couldn't have been reused.
> >>>>
> >>>>   From the perspective of verifier "trustedness" non-owning refs to
> >>>> refcounted nodes are now trusted only in RCU CS and therefore should no
> >>>> longer pass is_trusted_reg, but rather is_rcu_reg. Let's mark them
> >>>> MEM_RCU in order to reflect this new state.
> >>>>
> >>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >>>> ---
> >>>>    include/linux/bpf.h   |  3 ++-
> >>>>    kernel/bpf/verifier.c | 13 ++++++++++++-
> >>>>    2 files changed, 14 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >>>> index eced6400f778..12596af59c00 100644
> >>>> --- a/include/linux/bpf.h
> >>>> +++ b/include/linux/bpf.h
> >>>> @@ -653,7 +653,8 @@ enum bpf_type_flag {
> >>>>        MEM_RCU            = BIT(13 + BPF_BASE_TYPE_BITS),
> >>>>          /* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
> >>>> -     * Currently only valid for linked-list and rbtree nodes.
> >>>> +     * Currently only valid for linked-list and rbtree nodes. If the nodes
> >>>> +     * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
> >>>>         */
> >>>>        NON_OWN_REF        = BIT(14 + BPF_BASE_TYPE_BITS),
> >>>>    diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>>> index 8db0afa5985c..55607ab30522 100644
> >>>> --- a/kernel/bpf/verifier.c
> >>>> +++ b/kernel/bpf/verifier.c
> >>>> @@ -8013,6 +8013,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >>>>        case PTR_TO_BTF_ID | PTR_TRUSTED:
> >>>>        case PTR_TO_BTF_ID | MEM_RCU:
> >>>>        case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
> >>>> +    case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
> >>>>            /* When referenced PTR_TO_BTF_ID is passed to release function,
> >>>>             * its fixed offset must be 0. In the other cases, fixed offset
> >>>>             * can be non-zero. This was already checked above. So pass
> >>>> @@ -10479,6 +10480,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> >>>>    static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >>>>    {
> >>>>        struct bpf_verifier_state *state = env->cur_state;
> >>>> +    struct btf_record *rec = reg_btf_record(reg);
> >>>>          if (!state->active_lock.ptr) {
> >>>>            verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
> >>>> @@ -10491,6 +10493,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
> >>>>        }
> >>>>          reg->type |= NON_OWN_REF;
> >>>> +    if (rec->refcount_off >= 0)
> >>>> +        reg->type |= MEM_RCU;
> >>>
> >>> Should the above MEM_RCU marking be done unless reg access is in
> >>> rcu critical section?
> >>
> >> I think it is fine, since non-owning references currently exist only within
> >> spin_lock CS. Based on Alexei's comments on v1 of this series [0], preemption
> >> disabled + spin_lock CS should imply RCU CS.
> >>
> >>    [0]: https://lore.kernel.org/bpf/20230802230715.3ltalexaczbomvbu@MacBook-Pro-8.local/
> >>
> >>>
> >>> I think we still have issues for state resetting
> >>> with bpf_spin_unlock() and bpf_rcu_read_unlock(), both of which
> >>> will try to convert the reg state to PTR_UNTRUSTED.
> >>>
> >>> Let us say reg state is
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU
> >>>
> >>> (1). If hitting bpf_spin_unlock(), since MEM_RCU is in
> >>> the reg state, the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU
> >>> some additional code might be needed so we wont have
> >>> verifier complaints about ref_obj_id == 0.
> >>>
> >>> (2). If hitting bpf_rcu_read_unlock(), the state should become
> >>>    PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF
> >>> since register access still in bpf_spin_lock() region.
> >>
> >> I agree w/ your comment in side reply stating that this
> >> case isn't possible since bpf_rcu_read_{lock,unlock} in spin_lock CS
> >> is currently not allowed.
> >>
> >>>
> >>> Does this make sense?
> >>>
> >>
> >>
> >> IIUC the specific reg state flow you're recommending is based on the convos
> >> we've had over the past few weeks re: getting rid of special non-owning ref
> >> lifetime rules, instead using RCU as much as possible. Specifically, this
> >> recommended change would remove non-owning ref clobbering, instead just removing
> >> NON_OWN_REF flag on bpf_spin_unlock so that such nodes can no longer be passed
> >> to collection kfuncs (refcount_acquire, etc).
> >
> > Overall the patch set makes sense to me, but I want to clarify above.
> > My understanding that after the patch set applied bpf_spin_unlock()
> > will invalidate_non_owning_refs(), so what Yonghong is saying in (1)
> > is not correct.
> > Instead PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU will become mark_reg_invalid().
>
> I said it 'should become ...', but you are right. right now, it will do
> mark_reg_invalid(). So it is correct just MAYBE a little conservative.

Ahh. You mean that it should be fixed to do that. Got it.
non_own_ref after spin_unlock should become a pure mem_rcu pointer.
Need to think it through. Probably correct.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
@ 2023-08-23  6:26   ` Yonghong Song
  2023-08-23 16:20     ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-23  6:26 UTC (permalink / raw)
  To: Dave Marchevsky, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Kernel Team



On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> This is the final fix for the use-after-free scenario described in
> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> non-owning refs"). That commit, by virtue of changing
> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> the "refcount incr on 0" splat. The not_zero check in
> refcount_inc_not_zero, though, still occurs on memory that could have
> been free'd and reused, so the commit didn't properly fix the root
> cause.
> 
> This patch actually fixes the issue by free'ing using the recently-added
> bpf_mem_free_rcu, which ensures that the memory is not reused until
> RCU grace period has elapsed. If that has happened then
> there are no non-owning references alive that point to the
> recently-free'd memory, so it can be safely reused.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>   kernel/bpf/helpers.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae0612a..945a85e25ac5 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>   
>   	if (rec)
>   		bpf_obj_free_fields(rec, p);

During review of my percpu kptr patch,

https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7

Kumar mentioned that although the percpu memory itself is freed under
RCU, its record fields are freed immediately. This causes a problem:
there may still be active uses of those fields within an RCU CS, and
after bpf_obj_free_fields() some fields may be re-initialized with
new memory that then has no chance to be freed.

Do we have the same problem here as well?

I am thinking whether I could create another flavor of bpf_mem_free_rcu
with a pre_free_callback function, something like
   bpf_mem_free_rcu_cb2(struct bpf_mem_alloc *ma, void *ptr,
       void (*cb)(void *, void *), void *arg1, void *arg2)

The cb(arg1, arg2) will be called right before the real free of "ptr".

For example, for this patch, the callback function can be

static void bpf_obj_free_fields_cb(void *rec, void *p)
{
	if (rec)
		bpf_obj_free_fields(rec, p);
	/* Recursive freeing of fields must still be done immediately,
	 * which means adding a parameter to __bpf_obj_drop_impl() to
	 * indicate whether bpf_mem_free or bpf_mem_free_rcu should
	 * be called.
	 */
}

bpf_mem_free_rcu_cb2(&bpf_global_ma, p, bpf_obj_free_fields_cb, rec, p);

In __bpf_obj_drop_impl(), we need to ensure that recursive freeing of
fields is still done immediately, which means we will add a parameter
to __bpf_obj_drop_impl() to indicate whether bpf_mem_free or
bpf_mem_free_rcu should be called: if __bpf_obj_drop_impl() is called
from bpf_obj_drop_impl(), the rcu version should be used; otherwise,
the non-rcu version should be used.

Is this something we need to worry about here as well?
What do you think of the above interface?


> -	bpf_mem_free(&bpf_global_ma, p);
> +
> +	if (rec && rec->refcount_off >= 0)
> +		bpf_mem_free_rcu(&bpf_global_ma, p);
> +	else
> +		bpf_mem_free(&bpf_global_ma, p);
>   }
>   
>   __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-23  6:26   ` Yonghong Song
@ 2023-08-23 16:20     ` Alexei Starovoitov
  2023-08-23 20:29       ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-23 16:20 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> > This is the final fix for the use-after-free scenario described in
> > commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> > non-owning refs"). That commit, by virtue of changing
> > bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> > the "refcount incr on 0" splat. The not_zero check in
> > refcount_inc_not_zero, though, still occurs on memory that could have
> > been free'd and reused, so the commit didn't properly fix the root
> > cause.
> >
> > This patch actually fixes the issue by free'ing using the recently-added
> > bpf_mem_free_rcu, which ensures that the memory is not reused until
> > RCU grace period has elapsed. If that has happened then
> > there are no non-owning references alive that point to the
> > recently-free'd memory, so it can be safely reused.
> >
> > Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > ---
> >   kernel/bpf/helpers.c | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index eb91cae0612a..945a85e25ac5 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> >
> >       if (rec)
> >               bpf_obj_free_fields(rec, p);
>
> During reviewing my percpu kptr patch with link
>
> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> Kumar mentioned although percpu memory itself is freed under rcu.
> But its record fields are freed immediately. This will cause
> the problem since there may be some active uses of these fields
> within rcu cs and after bpf_obj_free_fields(), some fields may
> be re-initialized with new memory but they do not have chances
> to free any more.
>
> Do we have problem here as well?

I think it's not an issue here or in your percpu patch,
since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
call bpf_mem_free_rcu() (after this patch set lands).

In other words, all bpf pointers are either properly lifetime-checked
through the verifier and freed via the immediate bpf_mem_free(), or
they're freed via bpf_mem_free_rcu().


>
> I am thinking whether I could create another flavor of bpf_mem_free_rcu
> with a pre_free_callback function, something like
>    bpf_mem_free_rcu_cb2(struct bpf_mem_alloc *ma, void *ptr,
>        void (*cb)(void *, void *), void *arg1, void *arg2)
>
> The cb(arg1, arg2) will be called right before the real free of "ptr".
>
> For example, for this patch, the callback function can be
>
> static bpf_obj_free_fields_cb(void *rec, void *p)
> {
>         if (rec)
>                 bpf_obj_free_fields(rec, p);
>                 /* we need to ensure recursive freeing fields free
>                  * needs to be done immediately, which means we will
>                  * add a parameter to __bpf_obj_drop_impl() to
>                  * indicate whether bpf_mem_free or bpf_mem_free_rcu
>                  * should be called.
>                  */
> }
>
> bpf_mem_free_rcu_cb2(&bpf_global_ma, p, bpf_obj_free_fields_cb, rec, p);

It sounds nice in theory, but it will be difficult to implement.
bpf_ma would need to add two extra 8-byte pointers for every object,
but there is no room at present. So to free after an RCU GP with a
callback, it would need to allocate 16 bytes (at least), which might
fail, and so on. bpf_mem_free_rcu_cb2() would be the last resort.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-23 16:20     ` Alexei Starovoitov
@ 2023-08-23 20:29       ` Yonghong Song
  2023-08-24  1:38         ` Alexei Starovoitov
  0 siblings, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-23 20:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>> This is the final fix for the use-after-free scenario described in
>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
>>> non-owning refs"). That commit, by virtue of changing
>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
>>> the "refcount incr on 0" splat. The not_zero check in
>>> refcount_inc_not_zero, though, still occurs on memory that could have
>>> been free'd and reused, so the commit didn't properly fix the root
>>> cause.
>>>
>>> This patch actually fixes the issue by free'ing using the recently-added
>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
>>> RCU grace period has elapsed. If that has happened then
>>> there are no non-owning references alive that point to the
>>> recently-free'd memory, so it can be safely reused.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>> ---
>>>    kernel/bpf/helpers.c | 6 +++++-
>>>    1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index eb91cae0612a..945a85e25ac5 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>>
>>>        if (rec)
>>>                bpf_obj_free_fields(rec, p);
>>
>> During reviewing my percpu kptr patch with link
>>
>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
>> Kumar mentioned although percpu memory itself is freed under rcu.
>> But its record fields are freed immediately. This will cause
>> the problem since there may be some active uses of these fields
>> within rcu cs and after bpf_obj_free_fields(), some fields may
>> be re-initialized with new memory but they do not have chances
>> to free any more.
>>
>> Do we have problem here as well?
> 
> I think it's not an issue here or in your percpu patch,
> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> call bpf_mem_free_rcu() (after this patch set lands).

The following is my understanding.

void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
         const struct btf_field *fields;
         int i;

         if (IS_ERR_OR_NULL(rec))
                 return;
         fields = rec->fields;
         for (i = 0; i < rec->cnt; i++) {
                 struct btf_struct_meta *pointee_struct_meta;
                 const struct btf_field *field = &fields[i];
                 void *field_ptr = obj + field->offset;
                 void *xchgd_field;

                 switch (fields[i].type) {
                 case BPF_SPIN_LOCK:
                         break;
                 case BPF_TIMER:
                         bpf_timer_cancel_and_free(field_ptr);
                         break;
                 case BPF_KPTR_UNREF:
                         WRITE_ONCE(*(u64 *)field_ptr, 0);
                         break;
                 case BPF_KPTR_REF:
			......
                         break;
                 case BPF_LIST_HEAD:
                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
                                 continue;
                        bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
                         break;
                 case BPF_RB_ROOT:
                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
                                 continue;
                        bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
                         break;
                 case BPF_LIST_NODE:
                 case BPF_RB_NODE:
                 case BPF_REFCOUNT:
                         break;
                 default:
                         WARN_ON_ONCE(1);
                         continue;
                 }
         }
}

For percpu kptr, the remaining possible actionable fields are
	BPF_LIST_HEAD and BPF_RB_ROOT

So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
list/rb nodes to unlink them from the list_head/rb_root.

So yes, rb_nodes and list nodes will go through __bpf_obj_drop_impl().
Depending on whether the corresponding rec with the rb_node/list_node
has a refcount or not, it may call bpf_mem_free() or bpf_mem_free_rcu().
If bpf_mem_free() is called, the node is immediately freed but may
still be used by some bpf prog (under rcu) concurrently; could this
be an issue? Changing bpf_mem_free() in __bpf_obj_drop_impl() to
bpf_mem_free_rcu() should fix this problem.

Another thing is related to list_head/rb_root.
During bpf_obj_free_fields(), is it possible that another cpu
may allocate a list_node/rb_node and add it to the list_head/rb_root?
If that is possible, then we might have a memory leak,
but I don't know whether it is possible or not.

I think local kptr has the same issue as percpu kptr.

> 
> In other words all bpf pointers are either properly life time
> checked through the verifier and freed via immediate bpf_mem_free()
> or they're bpf_mem_free_rcu().
> 
> 
>>
>> I am thinking whether I could create another flavor of bpf_mem_free_rcu
>> with a pre_free_callback function, something like
>>     bpf_mem_free_rcu_cb2(struct bpf_mem_alloc *ma, void *ptr,
>>         void (*cb)(void *, void *), void *arg1, void *arg2)
>>
>> The cb(arg1, arg2) will be called right before the real free of "ptr".
>>
>> For example, for this patch, the callback function can be
>>
>> static bpf_obj_free_fields_cb(void *rec, void *p)
>> {
>>          if (rec)
>>                  bpf_obj_free_fields(rec, p);
>>                  /* we need to ensure recursive freeing fields free
>>                   * needs to be done immediately, which means we will
>>                   * add a parameter to __bpf_obj_drop_impl() to
>>                   * indicate whether bpf_mem_free or bpf_mem_free_rcu
>>                   * should be called.
>>                   */
>> }
>>
>> bpf_mem_free_rcu_cb2(&bpf_global_ma, p, bpf_obj_free_fields_cb, rec, p);
> 
> It sounds nice in theory, but will be difficult to implement.
> bpf_ma would need to add extra two 8 byte pointers for every object,
> but there is no room at present. So to free after rcu gp with a callback
> it would need to allocate 16 byte (at least) which might fail and so on.
> bpf_mem_free_rcu_cb2() would be the last resort.

I don't like this either. I hope we can find better solutions than this.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-23 20:29       ` Yonghong Song
@ 2023-08-24  1:38         ` Alexei Starovoitov
  2023-08-24  2:09           ` Alexei Starovoitov
  2023-08-24  3:52           ` Yonghong Song
  0 siblings, 2 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-24  1:38 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> > On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >>
> >>
> >> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>> This is the final fix for the use-after-free scenario described in
> >>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> >>> non-owning refs"). That commit, by virtue of changing
> >>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> >>> the "refcount incr on 0" splat. The not_zero check in
> >>> refcount_inc_not_zero, though, still occurs on memory that could have
> >>> been free'd and reused, so the commit didn't properly fix the root
> >>> cause.
> >>>
> >>> This patch actually fixes the issue by free'ing using the recently-added
> >>> bpf_mem_free_rcu, which ensures that the memory is not reused until
> >>> RCU grace period has elapsed. If that has happened then
> >>> there are no non-owning references alive that point to the
> >>> recently-free'd memory, so it can be safely reused.
> >>>
> >>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> >>> ---
> >>>    kernel/bpf/helpers.c | 6 +++++-
> >>>    1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index eb91cae0612a..945a85e25ac5 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> >>>
> >>>        if (rec)
> >>>                bpf_obj_free_fields(rec, p);
> >>
> >> During reviewing my percpu kptr patch with link
> >>
> >> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> >> Kumar mentioned although percpu memory itself is freed under rcu.
> >> But its record fields are freed immediately. This will cause
> >> the problem since there may be some active uses of these fields
> >> within rcu cs and after bpf_obj_free_fields(), some fields may
> >> be re-initialized with new memory but they do not have chances
> >> to free any more.
> >>
> >> Do we have problem here as well?
> >
> > I think it's not an issue here or in your percpu patch,
> > since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> > call bpf_mem_free_rcu() (after this patch set lands).
>
> The following is my understanding.
>
> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> {
>          const struct btf_field *fields;
>          int i;
>
>          if (IS_ERR_OR_NULL(rec))
>                  return;
>          fields = rec->fields;
>          for (i = 0; i < rec->cnt; i++) {
>                  struct btf_struct_meta *pointee_struct_meta;
>                  const struct btf_field *field = &fields[i];
>                  void *field_ptr = obj + field->offset;
>                  void *xchgd_field;
>
>                  switch (fields[i].type) {
>                  case BPF_SPIN_LOCK:
>                          break;
>                  case BPF_TIMER:
>                          bpf_timer_cancel_and_free(field_ptr);
>                          break;
>                  case BPF_KPTR_UNREF:
>                          WRITE_ONCE(*(u64 *)field_ptr, 0);
>                          break;
>                  case BPF_KPTR_REF:
>                         ......
>                          break;
>                  case BPF_LIST_HEAD:
>                          if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>                                  continue;
>                          bpf_list_head_free(field, field_ptr, obj +
> rec->spin_lock_off);
>                          break;
>                  case BPF_RB_ROOT:
>                          if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>                                  continue;
>                          bpf_rb_root_free(field, field_ptr, obj +
> rec->spin_lock_off);
>                          break;
>                  case BPF_LIST_NODE:
>                  case BPF_RB_NODE:
>                  case BPF_REFCOUNT:
>                          break;
>                  default:
>                          WARN_ON_ONCE(1);
>                          continue;
>                  }
>          }
> }
>
> For percpu kptr, the remaining possible actiionable fields are
>         BPF_LIST_HEAD and BPF_RB_ROOT
>
> So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
> list/rb nodes to unlink them from the list_head/rb_root.
>
> So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
> Depending on whether the correspondingrec
> with rb_node/list_node has ref count or not,
> it may call bpf_mem_free() or bpf_mem_free_rcu(). If
> bpf_mem_free() is called, then the field is immediately freed
> but it may be used by some bpf prog (under rcu) concurrently,
> could this be an issue?

I see. Yeah. Looks like percpu makes such fields refcount-like.
For non-percpu, non-refcount objects, only one bpf prog on one cpu
can observe the object. That's why we're doing plain bpf_mem_free()
for them.

So this patch is a good fix for refcounted, but you and Kumar are
correct that it's not sufficient for the case where a percpu struct
includes multiple rb_roots, one for each cpu.

> Changing bpf_mem_free() in
> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.

I guess we can do that when the obj is either refcounted or can be
inside a percpu struct, but it might not be enough. More below.

> Another thing is related to list_head/rb_root.
> During bpf_obj_free_fields(), is it possible that another cpu
> may allocate a list_node/rb_node and add to list_head/rb_root?

It's not an issue for the single-owner case or for refcounted.
Access to rb_root/list_head is always lock protected.
For refcounted, the obj needs to be acquired (from the verifier pov),
meaning it has refcount = 1, to be able to do spin_lock and
operate on the list_head.

But bpf_rb_root_free is indeed an issue for percpu, since each
percpu copy has its own rb root field with its own bpf_spin_lock, and
for_each_cpu() {bpf_obj_free_fields();} violates that access contract.

percpu and rb_root create such a maze of dependencies that
I think it's better to disallow rb_root-s and kptr-s inside percpu
for now.

> If this is true, then we might have a memory leak.
> But I don't whether this is possible or not.
>
> I think local kptr has the issue as percpu kptr.

Let's tackle one at a time.
I still think Dave's patch set is a good fix for refcounted,
while we need to think more about the percpu case.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-24  1:38         ` Alexei Starovoitov
@ 2023-08-24  2:09           ` Alexei Starovoitov
  2023-08-24  4:01             ` Yonghong Song
  2023-08-24  3:52           ` Yonghong Song
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-24  2:09 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Wed, Aug 23, 2023 at 6:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> >
> > On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> > > On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >>
> > >>
> > >>
> > >> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> > >>> This is the final fix for the use-after-free scenario described in
> > >>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> > >>> non-owning refs"). That commit, by virtue of changing
> > >>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> > >>> the "refcount incr on 0" splat. The not_zero check in
> > >>> refcount_inc_not_zero, though, still occurs on memory that could have
> > >>> been free'd and reused, so the commit didn't properly fix the root
> > >>> cause.
> > >>>
> > >>> This patch actually fixes the issue by free'ing using the recently-added
> > >>> bpf_mem_free_rcu, which ensures that the memory is not reused until
> > >>> RCU grace period has elapsed. If that has happened then
> > >>> there are no non-owning references alive that point to the
> > >>> recently-free'd memory, so it can be safely reused.
> > >>>
> > >>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> > >>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> > >>> ---
> > >>>    kernel/bpf/helpers.c | 6 +++++-
> > >>>    1 file changed, 5 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > >>> index eb91cae0612a..945a85e25ac5 100644
> > >>> --- a/kernel/bpf/helpers.c
> > >>> +++ b/kernel/bpf/helpers.c
> > >>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> > >>>
> > >>>        if (rec)
> > >>>                bpf_obj_free_fields(rec, p);
> > >>
> > >> During reviewing my percpu kptr patch with link
> > >>
> > >> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> > >> Kumar mentioned although percpu memory itself is freed under rcu.
> > >> But its record fields are freed immediately. This will cause
> > >> the problem since there may be some active uses of these fields
> > >> within rcu cs and after bpf_obj_free_fields(), some fields may
> > >> be re-initialized with new memory but they do not have chances
> > >> to free any more.
> > >>
> > >> Do we have problem here as well?
> > >
> > > I think it's not an issue here or in your percpu patch,
> > > since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> > > call bpf_mem_free_rcu() (after this patch set lands).
> >
> > The following is my understanding.
> >
> > void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> > {
> >          const struct btf_field *fields;
> >          int i;
> >
> >          if (IS_ERR_OR_NULL(rec))
> >                  return;
> >          fields = rec->fields;
> >          for (i = 0; i < rec->cnt; i++) {
> >                  struct btf_struct_meta *pointee_struct_meta;
> >                  const struct btf_field *field = &fields[i];
> >                  void *field_ptr = obj + field->offset;
> >                  void *xchgd_field;
> >
> >                  switch (fields[i].type) {
> >                  case BPF_SPIN_LOCK:
> >                          break;
> >                  case BPF_TIMER:
> >                          bpf_timer_cancel_and_free(field_ptr);
> >                          break;
> >                  case BPF_KPTR_UNREF:
> >                          WRITE_ONCE(*(u64 *)field_ptr, 0);
> >                          break;
> >                  case BPF_KPTR_REF:
> >                         ......
> >                          break;
> >                  case BPF_LIST_HEAD:
> >                          if (WARN_ON_ONCE(rec->spin_lock_off < 0))
> >                                  continue;
> >                          bpf_list_head_free(field, field_ptr, obj +
> > rec->spin_lock_off);
> >                          break;
> >                  case BPF_RB_ROOT:
> >                          if (WARN_ON_ONCE(rec->spin_lock_off < 0))
> >                                  continue;
> >                          bpf_rb_root_free(field, field_ptr, obj +
> > rec->spin_lock_off);
> >                          break;
> >                  case BPF_LIST_NODE:
> >                  case BPF_RB_NODE:
> >                  case BPF_REFCOUNT:
> >                          break;
> >                  default:
> >                          WARN_ON_ONCE(1);
> >                          continue;
> >                  }
> >          }
> > }
> >
> > For percpu kptr, the remaining possible actionable fields are
> >         BPF_LIST_HEAD and BPF_RB_ROOT
> >
> > So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
> > list/rb nodes to unlink them from the list_head/rb_root.
> >
> > So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
> > Depending on whether the corresponding rec
> > with rb_node/list_node has a ref count or not,
> > it may call bpf_mem_free() or bpf_mem_free_rcu(). If
> > bpf_mem_free() is called, then the field is immediately freed
> > but it may be used by some bpf prog (under rcu) concurrently,
> > could this be an issue?
>
> I see. Yeah. Looks like percpu makes such fields refcount-like.
> For non-percpu non-refcount only one bpf prog on one cpu can observe
> that object. That's why we're doing plain bpf_mem_free() for them.
>
> So this patch is a good fix for refcounted, but you and Kumar are
> correct that it's not sufficient for the case when percpu struct
> includes multiple rb_roots. One for each cpu.
>
> > Changing bpf_mem_free() in
> > __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.
>
> I guess we can do that when obj is either refcounted or can be
> inside percpu, but it might not be enough. More below.
>
> > Another thing is related to list_head/rb_root.
> > During bpf_obj_free_fields(), is it possible that another cpu
> > may allocate a list_node/rb_node and add to list_head/rb_root?
>
> It's not an issue for the single owner case and for refcounted.
> Access to rb_root/list_head is always lock protected.
> For refcounted the obj needs to be acquired (from the verifier pov)
> meaning to have refcount =1 to be able to do spin_lock and
> operate on list_head.
>
> But bpf_rb_root_free is indeed an issue for percpu, since each
> percpu has its own rb root field with its own bpf_spinlock, but
> for_each_cpu() {bpf_obj_free_fields();} violates access contract.
>
> percpu and rb_root create such a maze of dependencies that
> I think it's better to disallow rb_root-s and kptr-s inside percpu
> for now.
>
> > If this is true, then we might have a memory leak.
> > But I don't know whether this is possible or not.
> >
> > I think local kptr has the same issue as percpu kptr.
>
> Let's tackle one at a time.
> I still think Dave's patch set is a good fix for refcounted,
> while we need to think more about percpu case.

I'm planning to land the series even though there could still be bugs to fix.
At least they're fixing known issues.
Please yell if you think we have to wait for another release.

Like I think the following is needed regardless:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bb78212fa5b2..efa6482b1c2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7210,6 +7210,10 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
                        map ? map->name : "kptr");
                return -EINVAL;
        }
+       if (type_is_non_owning_ref(reg->type)) {
+               verbose(env, "Accessing locks in non-owning object is not allowed\n");
+               return -EINVAL;
+       }
        if (rec->spin_lock_off != val + reg->off) {
                verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",

atm I don't see where we enforce such a restriction.
Since refcounted non-owning refs can have refcount=0, it's not safe to access
non-scalar objects inside them.
Maybe stronger enforcement is needed?

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-24  1:38         ` Alexei Starovoitov
  2023-08-24  2:09           ` Alexei Starovoitov
@ 2023-08-24  3:52           ` Yonghong Song
  2023-08-24 22:03             ` Alexei Starovoitov
  1 sibling, 1 reply; 32+ messages in thread
From: Yonghong Song @ 2023-08-24  3:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/23/23 6:38 PM, Alexei Starovoitov wrote:
> On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
>>> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>>> This is the final fix for the use-after-free scenario described in
>>>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
>>>>> non-owning refs"). That commit, by virtue of changing
>>>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
>>>>> the "refcount incr on 0" splat. The not_zero check in
>>>>> refcount_inc_not_zero, though, still occurs on memory that could have
>>>>> been free'd and reused, so the commit didn't properly fix the root
>>>>> cause.
>>>>>
>>>>> This patch actually fixes the issue by free'ing using the recently-added
>>>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
>>>>> RCU grace period has elapsed. If that has happened then
>>>>> there are no non-owning references alive that point to the
>>>>> recently-free'd memory, so it can be safely reused.
>>>>>
>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>> ---
>>>>>     kernel/bpf/helpers.c | 6 +++++-
>>>>>     1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>>>> index eb91cae0612a..945a85e25ac5 100644
>>>>> --- a/kernel/bpf/helpers.c
>>>>> +++ b/kernel/bpf/helpers.c
>>>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>>>>
>>>>>         if (rec)
>>>>>                 bpf_obj_free_fields(rec, p);
>>>>
>>>> During reviewing my percpu kptr patch with link
>>>>
>>>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
>>>> Kumar mentioned that although percpu memory itself is freed under rcu,
>>>> its record fields are freed immediately. This will cause
>>>> problems since there may be some active uses of these fields
>>>> within an rcu cs, and after bpf_obj_free_fields(), some fields may
>>>> be re-initialized with new memory but there is no longer a chance
>>>> to free them.
>>>>
>>>> Do we have problem here as well?
>>>
>>> I think it's not an issue here or in your percpu patch,
>>> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
>>> call bpf_mem_free_rcu() (after this patch set lands).
>>
>> The following is my understanding.
>>
>> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>> {
>>           const struct btf_field *fields;
>>           int i;
>>
>>           if (IS_ERR_OR_NULL(rec))
>>                   return;
>>           fields = rec->fields;
>>           for (i = 0; i < rec->cnt; i++) {
>>                   struct btf_struct_meta *pointee_struct_meta;
>>                   const struct btf_field *field = &fields[i];
>>                   void *field_ptr = obj + field->offset;
>>                   void *xchgd_field;
>>
>>                   switch (fields[i].type) {
>>                   case BPF_SPIN_LOCK:
>>                           break;
>>                   case BPF_TIMER:
>>                           bpf_timer_cancel_and_free(field_ptr);
>>                           break;
>>                   case BPF_KPTR_UNREF:
>>                           WRITE_ONCE(*(u64 *)field_ptr, 0);
>>                           break;
>>                   case BPF_KPTR_REF:
>>                          ......
>>                           break;
>>                   case BPF_LIST_HEAD:
>>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>                                   continue;
>>                           bpf_list_head_free(field, field_ptr, obj +
>> rec->spin_lock_off);
>>                           break;
>>                   case BPF_RB_ROOT:
>>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>                                   continue;
>>                           bpf_rb_root_free(field, field_ptr, obj +
>> rec->spin_lock_off);
>>                           break;
>>                   case BPF_LIST_NODE:
>>                   case BPF_RB_NODE:
>>                   case BPF_REFCOUNT:
>>                           break;
>>                   default:
>>                           WARN_ON_ONCE(1);
>>                           continue;
>>                   }
>>           }
>> }
>>
>> For percpu kptr, the remaining possible actionable fields are
>>          BPF_LIST_HEAD and BPF_RB_ROOT
>>
>> So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
>> list/rb nodes to unlink them from the list_head/rb_root.
>>
>> So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
>> Depending on whether the corresponding rec
>> with rb_node/list_node has a ref count or not,
>> it may call bpf_mem_free() or bpf_mem_free_rcu(). If
>> bpf_mem_free() is called, then the field is immediately freed
>> but it may be used by some bpf prog (under rcu) concurrently,
>> could this be an issue?
> 
> I see. Yeah. Looks like percpu makes such fields refcount-like.
> For non-percpu non-refcount only one bpf prog on one cpu can observe
> that object. That's why we're doing plain bpf_mem_free() for them.
> 
> So this patch is a good fix for refcounted, but you and Kumar are
> correct that it's not sufficient for the case when percpu struct
> includes multiple rb_roots. One for each cpu.
> 
>> Changing bpf_mem_free() in
>> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.
> 
> I guess we can do that when obj is either refcounted or can be
> inside percpu, but it might not be enough. More below.
> 
>> Another thing is related to list_head/rb_root.
>> During bpf_obj_free_fields(), is it possible that another cpu
>> may allocate a list_node/rb_node and add to list_head/rb_root?
> 
> It's not an issue for the single owner case and for refcounted.
> Access to rb_root/list_head is always lock protected.
> For refcounted the obj needs to be acquired (from the verifier pov)
> meaning to have refcount =1 to be able to do spin_lock and
> operate on list_head.

Martin and I came up with the following example earlier today,
assuming the map value struct contains a list_head and a spin_lock.

          cpu 0                              cpu 1
       key = 1;
       v = bpf_map_lookup(&map, &key);
                                         key = 1;
                                         bpf_map_delete_elem(&map, &key);
                                         /* destruction continues and
                                          * bpf_obj_free_fields() are
                                          * called.
                                          */
                                         /* in bpf_list_head_free():
                                          * __bpf_spin_lock_irqsave(...)
                                          * ...
                                          * __bpf_spin_unlock_irqrestore();
                                          */

       n = bpf_obj_new(...)
       bpf_spin_lock(&v->lock);
       bpf_list_push_front(&v->head, &v->node);
       bpf_spin_unlock(&v->lock);

In cpu 1 'bpf_obj_free_fields', there is a list head, so
bpf_list_head_free() is called. In bpf_list_head_free(), we do

         __bpf_spin_lock_irqsave(spin_lock);
         if (!head->next || list_empty(head))
                 goto unlock;
         head = head->next;
unlock:
         INIT_LIST_HEAD(orig_head);
         __bpf_spin_unlock_irqrestore(spin_lock);

         while (head != orig_head) {
                 void *obj = head;

                 obj -= field->graph_root.node_offset;
                 head = head->next;
                 /* The contained type can also have resources, including a
                  * bpf_list_head which needs to be freed.
                  */
                 migrate_disable();
                 __bpf_obj_drop_impl(obj, field->graph_root.value_rec);
                 migrate_enable();
         }

So it is possible that cpu 0 may add an element to the list_head
which will never be freed.

This applies to both list_head and rb_root. I am aware that
this may be an existing issue for some maps, e.g. the hash map,
so it may not be a big problem. Just want to mention it though.

> 
> But bpf_rb_root_free is indeed an issue for percpu, since each
> percpu has its own rb root field with its own bpf_spinlock, but
> for_each_cpu() {bpf_obj_free_fields();} violates access contract.

Could you explain what 'access contract' means here? For non-percpu
case, 'x' special fields may be checked. For percpu case, it is
just 'x * nr_cpus' special fields to be checked.

> 
> percpu and rb_root create such a maze of dependencies that
> I think it's better to disallow rb_root-s and kptr-s inside percpu
> for now.

I can certainly disallow rb_root and list_head. Just want to
understand what kind of issues we face, especially for percpu kptr.

Current kptrs cannot be nested since the first argument of
bpf_kptr_xchg() must be a map_value. So only impactful fields
(w.r.t. bpf_obj_free_fields()) are rb_root and list_head.

> 
>> If this is true, then we might have a memory leak.
>> But I don't know whether this is possible or not.
>>
>> I think local kptr has the same issue as percpu kptr.
> 
> Let's tackle one at a time.
> I still think Dave's patch set is a good fix for refcounted,
> while we need to think more about percpu case.

I agree that Dave's patch indeed fixed an existing issue.
We can resolve other issues (like above) gradually.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-24  2:09           ` Alexei Starovoitov
@ 2023-08-24  4:01             ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-24  4:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/23/23 7:09 PM, Alexei Starovoitov wrote:
> On Wed, Aug 23, 2023 at 6:38 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>
>>>
>>>
>>> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
>>>> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>>>> This is the final fix for the use-after-free scenario described in
>>>>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
>>>>>> non-owning refs"). That commit, by virtue of changing
>>>>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
>>>>>> the "refcount incr on 0" splat. The not_zero check in
>>>>>> refcount_inc_not_zero, though, still occurs on memory that could have
>>>>>> been free'd and reused, so the commit didn't properly fix the root
>>>>>> cause.
>>>>>>
>>>>>> This patch actually fixes the issue by free'ing using the recently-added
>>>>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
>>>>>> RCU grace period has elapsed. If that has happened then
>>>>>> there are no non-owning references alive that point to the
>>>>>> recently-free'd memory, so it can be safely reused.
>>>>>>
>>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>>> ---
>>>>>>     kernel/bpf/helpers.c | 6 +++++-
>>>>>>     1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>>>>> index eb91cae0612a..945a85e25ac5 100644
>>>>>> --- a/kernel/bpf/helpers.c
>>>>>> +++ b/kernel/bpf/helpers.c
>>>>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>>>>>
>>>>>>         if (rec)
>>>>>>                 bpf_obj_free_fields(rec, p);
>>>>>
>>>>> During reviewing my percpu kptr patch with link
>>>>>
>>>>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> >>>> Kumar mentioned that although percpu memory itself is freed under rcu,
> >>>> its record fields are freed immediately. This will cause
> >>>> problems since there may be some active uses of these fields
> >>>> within an rcu cs, and after bpf_obj_free_fields(), some fields may
> >>>> be re-initialized with new memory but there is no longer a chance
> >>>> to free them.
>>>>>
>>>>> Do we have problem here as well?
>>>>
>>>> I think it's not an issue here or in your percpu patch,
>>>> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
>>>> call bpf_mem_free_rcu() (after this patch set lands).
>>>
>>> The following is my understanding.
>>>
>>> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>>> {
>>>           const struct btf_field *fields;
>>>           int i;
>>>
>>>           if (IS_ERR_OR_NULL(rec))
>>>                   return;
>>>           fields = rec->fields;
>>>           for (i = 0; i < rec->cnt; i++) {
>>>                   struct btf_struct_meta *pointee_struct_meta;
>>>                   const struct btf_field *field = &fields[i];
>>>                   void *field_ptr = obj + field->offset;
>>>                   void *xchgd_field;
>>>
>>>                   switch (fields[i].type) {
>>>                   case BPF_SPIN_LOCK:
>>>                           break;
>>>                   case BPF_TIMER:
>>>                           bpf_timer_cancel_and_free(field_ptr);
>>>                           break;
>>>                   case BPF_KPTR_UNREF:
>>>                           WRITE_ONCE(*(u64 *)field_ptr, 0);
>>>                           break;
>>>                   case BPF_KPTR_REF:
>>>                          ......
>>>                           break;
>>>                   case BPF_LIST_HEAD:
>>>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>>                                   continue;
>>>                           bpf_list_head_free(field, field_ptr, obj +
>>> rec->spin_lock_off);
>>>                           break;
>>>                   case BPF_RB_ROOT:
>>>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>>                                   continue;
>>>                           bpf_rb_root_free(field, field_ptr, obj +
>>> rec->spin_lock_off);
>>>                           break;
>>>                   case BPF_LIST_NODE:
>>>                   case BPF_RB_NODE:
>>>                   case BPF_REFCOUNT:
>>>                           break;
>>>                   default:
>>>                           WARN_ON_ONCE(1);
>>>                           continue;
>>>                   }
>>>           }
>>> }
>>>
>>> For percpu kptr, the remaining possible actionable fields are
>>>          BPF_LIST_HEAD and BPF_RB_ROOT
>>>
>>> So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
>>> list/rb nodes to unlink them from the list_head/rb_root.
>>>
>>> So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
>>> Depending on whether the corresponding rec
>>> with rb_node/list_node has a ref count or not,
>>> it may call bpf_mem_free() or bpf_mem_free_rcu(). If
>>> bpf_mem_free() is called, then the field is immediately freed
>>> but it may be used by some bpf prog (under rcu) concurrently,
>>> could this be an issue?
>>
>> I see. Yeah. Looks like percpu makes such fields refcount-like.
>> For non-percpu non-refcount only one bpf prog on one cpu can observe
>> that object. That's why we're doing plain bpf_mem_free() for them.
>>
>> So this patch is a good fix for refcounted, but you and Kumar are
>> correct that it's not sufficient for the case when percpu struct
>> includes multiple rb_roots. One for each cpu.
>>
>>> Changing bpf_mem_free() in
>>> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.
>>
>> I guess we can do that when obj is either refcounted or can be
>> inside percpu, but it might not be enough. More below.
>>
>>> Another thing is related to list_head/rb_root.
>>> During bpf_obj_free_fields(), is it possible that another cpu
>>> may allocate a list_node/rb_node and add to list_head/rb_root?
>>
>> It's not an issue for the single owner case and for refcounted.
>> Access to rb_root/list_head is always lock protected.
>> For refcounted the obj needs to be acquired (from the verifier pov)
>> meaning to have refcount =1 to be able to do spin_lock and
>> operate on list_head.
>>
>> But bpf_rb_root_free is indeed an issue for percpu, since each
>> percpu has its own rb root field with its own bpf_spinlock, but
>> for_each_cpu() {bpf_obj_free_fields();} violates access contract.
>>
>> percpu and rb_root create such a maze of dependencies that
>> I think it's better to disallow rb_root-s and kptr-s inside percpu
>> for now.
>>
>>> If this is true, then we might have a memory leak.
>>> But I don't know whether this is possible or not.
>>>
>>> I think local kptr has the same issue as percpu kptr.
>>
>> Let's tackle one at a time.
>> I still think Dave's patch set is a good fix for refcounted,
>> while we need to think more about percpu case.
> 
> I'm planning to land the series though there could be still bugs to fix.
> At least they're fixing known issues.
> Please yell if you think we have to wait for another release.
> 
> Like I think the following is needed regardless:
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index bb78212fa5b2..efa6482b1c2c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7210,6 +7210,10 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
>                          map ? map->name : "kptr");
>                  return -EINVAL;
>          }
> +       if (type_is_non_owning_ref(reg->type)) {
> +               verbose(env, "Accessing locks in non-owning object is not allowed\n");
> +               return -EINVAL;
> +       }
>          if (rec->spin_lock_off != val + reg->off) {
>                  verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
> 
> atm I don't see where we enforce such a restriction.
> Since refcounted non-owning refs can have refcount=0, it's not safe to access
> non-scalar objects inside them.
> Maybe stronger enforcement is needed?

Currently in bpf_obj_free_fields(), BPF_SPIN_LOCK is not touched, which
means it CAN be used for a non-owning reference. I do not have
strong preferences one way or another. Not sure whether Dave has
tests which will use spin_lock with a non-owning ref or not.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-24  3:52           ` Yonghong Song
@ 2023-08-24 22:03             ` Alexei Starovoitov
  2023-08-24 22:25               ` Yonghong Song
  0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-24 22:03 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team

On Wed, Aug 23, 2023 at 8:52 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> On 8/23/23 6:38 PM, Alexei Starovoitov wrote:
> > On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>
> >>
> >>
> >> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> >>> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>>>> This is the final fix for the use-after-free scenario described in
> >>>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> >>>>> non-owning refs"). That commit, by virtue of changing
> >>>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> >>>>> the "refcount incr on 0" splat. The not_zero check in
> >>>>> refcount_inc_not_zero, though, still occurs on memory that could have
> >>>>> been free'd and reused, so the commit didn't properly fix the root
> >>>>> cause.
> >>>>>
> >>>>> This patch actually fixes the issue by free'ing using the recently-added
> >>>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
> >>>>> RCU grace period has elapsed. If that has happened then
> >>>>> there are no non-owning references alive that point to the
> >>>>> recently-free'd memory, so it can be safely reused.
> >>>>>
> >>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> >>>>> ---
> >>>>>     kernel/bpf/helpers.c | 6 +++++-
> >>>>>     1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>>>> index eb91cae0612a..945a85e25ac5 100644
> >>>>> --- a/kernel/bpf/helpers.c
> >>>>> +++ b/kernel/bpf/helpers.c
> >>>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> >>>>>
> >>>>>         if (rec)
> >>>>>                 bpf_obj_free_fields(rec, p);
> >>>>
> >>>> During reviewing my percpu kptr patch with link
> >>>>
> >>>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> > >>>> Kumar mentioned that although percpu memory itself is freed under rcu,
> > >>>> its record fields are freed immediately. This will cause
> > >>>> problems since there may be some active uses of these fields
> > >>>> within an rcu cs, and after bpf_obj_free_fields(), some fields may
> > >>>> be re-initialized with new memory but there is no longer a chance
> > >>>> to free them.
> >>>>
> >>>> Do we have problem here as well?
> >>>
> >>> I think it's not an issue here or in your percpu patch,
> >>> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> >>> call bpf_mem_free_rcu() (after this patch set lands).
> >>
> >> The following is my understanding.
> >>
> >> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> >> {
> >>           const struct btf_field *fields;
> >>           int i;
> >>
> >>           if (IS_ERR_OR_NULL(rec))
> >>                   return;
> >>           fields = rec->fields;
> >>           for (i = 0; i < rec->cnt; i++) {
> >>                   struct btf_struct_meta *pointee_struct_meta;
> >>                   const struct btf_field *field = &fields[i];
> >>                   void *field_ptr = obj + field->offset;
> >>                   void *xchgd_field;
> >>
> >>                   switch (fields[i].type) {
> >>                   case BPF_SPIN_LOCK:
> >>                           break;
> >>                   case BPF_TIMER:
> >>                           bpf_timer_cancel_and_free(field_ptr);
> >>                           break;
> >>                   case BPF_KPTR_UNREF:
> >>                           WRITE_ONCE(*(u64 *)field_ptr, 0);
> >>                           break;
> >>                   case BPF_KPTR_REF:
> >>                          ......
> >>                           break;
> >>                   case BPF_LIST_HEAD:
> >>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
> >>                                   continue;
> >>                           bpf_list_head_free(field, field_ptr, obj +
> >> rec->spin_lock_off);
> >>                           break;
> >>                   case BPF_RB_ROOT:
> >>                           if (WARN_ON_ONCE(rec->spin_lock_off < 0))
> >>                                   continue;
> >>                           bpf_rb_root_free(field, field_ptr, obj +
> >> rec->spin_lock_off);
> >>                           break;
> >>                   case BPF_LIST_NODE:
> >>                   case BPF_RB_NODE:
> >>                   case BPF_REFCOUNT:
> >>                           break;
> >>                   default:
> >>                           WARN_ON_ONCE(1);
> >>                           continue;
> >>                   }
> >>           }
> >> }
> >>
> >> For percpu kptr, the remaining possible actionable fields are
> >>          BPF_LIST_HEAD and BPF_RB_ROOT
> >>
> >> So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
> >> list/rb nodes to unlink them from the list_head/rb_root.
> >>
> >> So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
> >> Depending on whether the corresponding rec
> >> with rb_node/list_node has a ref count or not,
> >> it may call bpf_mem_free() or bpf_mem_free_rcu(). If
> >> bpf_mem_free() is called, then the field is immediately freed
> >> but it may be used by some bpf prog (under rcu) concurrently,
> >> could this be an issue?
> >
> > I see. Yeah. Looks like percpu makes such fields refcount-like.
> > For non-percpu non-refcount only one bpf prog on one cpu can observe
> > that object. That's why we're doing plain bpf_mem_free() for them.
> >
> > So this patch is a good fix for refcounted, but you and Kumar are
> > correct that it's not sufficient for the case when percpu struct
> > includes multiple rb_roots. One for each cpu.
> >
> >> Changing bpf_mem_free() in
> >> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.
> >
> > I guess we can do that when obj is either refcounted or can be
> > inside percpu, but it might not be enough. More below.
> >
> >> Another thing is related to list_head/rb_root.
> >> During bpf_obj_free_fields(), is it possible that another cpu
> >> may allocate a list_node/rb_node and add to list_head/rb_root?
> >
> > It's not an issue for the single owner case and for refcounted.
> > Access to rb_root/list_head is always lock protected.
> > For refcounted the obj needs to be acquired (from the verifier pov)
> > meaning to have refcount =1 to be able to do spin_lock and
> > operate on list_head.
>
> Martin and I came up with the following example earlier today,
> assuming the map value struct contains a list_head and a spin_lock.
>
>           cpu 0                              cpu 1
>        key = 1;
>        v = bpf_map_lookup(&map, &key);
>                                          key = 1;
>                                          bpf_map_delete_elem(&map, &key);
>                                          /* destruction continues and
>                                           * bpf_obj_free_fields() are
>                                           * called.
>                                           */
>                                          /* in bpf_list_head_free():
>                                           * __bpf_spin_lock_irqsave(...)
>                                           * ...
>                                           * __bpf_spin_unlock_irqrestore();
>                                           */
>
>        n = bpf_obj_new(...)
>        bpf_spin_lock(&v->lock);
>        bpf_list_push_front(&v->head, &v->node);
>        bpf_spin_unlock(&v->lock);
>
> On cpu 1, in bpf_obj_free_fields(), there is a list_head, so
> bpf_list_head_free() is called. In bpf_list_head_free(), we do
>
>          __bpf_spin_lock_irqsave(spin_lock);
>          if (!head->next || list_empty(head))
>                  goto unlock;
>          head = head->next;
> unlock:
>          INIT_LIST_HEAD(orig_head);
>          __bpf_spin_unlock_irqrestore(spin_lock);
>
>          while (head != orig_head) {
>                  void *obj = head;
>
>                  obj -= field->graph_root.node_offset;
>                  head = head->next;
>                  /* The contained type can also have resources, including a
>                   * bpf_list_head which needs to be freed.
>                   */
>                  migrate_disable();
>                  __bpf_obj_drop_impl(obj, field->graph_root.value_rec);
>                  migrate_enable();
>          }
>
> So it is possible that cpu 0 may add an element to the list_head
> which will never be freed.
>
> This can happen with either list_head or rb_root. I am aware that
> this may be an existing issue for some maps, e.g. the hash map,
> so it may not be a big problem. Just want to mention it though.

argh. That is indeed a problem and it's not limited to rbtree.
cpu0 doing kptr_xchg into a map value that was just deleted by cpu1
will cause a leak.
It affects both sleepable and non-sleepable progs.
For sleepable progs I see no other option but to enforce that 'v' is used in an RCU CS.
rcu_unlock currently converts mem_rcu into untrusted.
I think to fix the above it should convert all ptr_to_map_value to
untrusted as well.
In addition we can extend the bpf_mem_alloc API with a dtor.
Register it at the time of bpf_mem_alloc_init() and use
bpf_mem_cache_free_rcu() in htab_elem_free() when kptr or
other special fields are present in the value.
For preallocated htab we'd need to reinstate call_rcu().
This would be a heavy fix. Better ideas?

>
> >
> > But bpf_rb_root_free is indeed an issue for percpu, since each
> > percpu has its own rb root field with its own bpf_spinlock, but
> > for_each_cpu() {bpf_obj_free_fields();} violates access contract.
>
> Could you explain what 'access contract' means here? For non-percpu
> case, 'x' special fields may be checked. For percpu case, it is
> just 'x * nr_cpus' special fields to be checked.

Right. That's what I mean by refcounted-like.

>
> >
> > percpu and rb_root create such a maze of dependencies that
> > I think it's better to disallow rb_root-s and kptr-s inside percpu
> > for now.
>
> I can certainly disallow rb_root and list_head. Just want to
> understand what kind of issues we face specifically for percpu kptrs.

Just a matter of priorities. Fixing the above is more urgent than
adding percpu kptrs with their own corner cases to analyze.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
  2023-08-24 22:03             ` Alexei Starovoitov
@ 2023-08-24 22:25               ` Yonghong Song
  0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-24 22:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Dave Marchevsky, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Kernel Team



On 8/24/23 3:03 PM, Alexei Starovoitov wrote:
> On Wed, Aug 23, 2023 at 8:52 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>>
>> On 8/23/23 6:38 PM, Alexei Starovoitov wrote:
>>> On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>
>>>>
>>>>
>>>> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
>>>>> On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
>>>>>>> This is the final fix for the use-after-free scenario described in
>>>>>>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
>>>>>>> non-owning refs"). That commit, by virtue of changing
>>>>>>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
>>>>>>> the "refcount incr on 0" splat. The not_zero check in
>>>>>>> refcount_inc_not_zero, though, still occurs on memory that could have
>>>>>>> been free'd and reused, so the commit didn't properly fix the root
>>>>>>> cause.
>>>>>>>
>>>>>>> This patch actually fixes the issue by free'ing using the recently-added
>>>>>>> bpf_mem_free_rcu, which ensures that the memory is not reused until
>>>>>>> RCU grace period has elapsed. If that has happened then
>>>>>>> there are no non-owning references alive that point to the
>>>>>>> recently-free'd memory, so it can be safely reused.
>>>>>>>
>>>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>>>> Acked-by: Yonghong Song <yonghong.song@linux.dev>
>>>>>>> ---
>>>>>>>      kernel/bpf/helpers.c | 6 +++++-
>>>>>>>      1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>>>>>> index eb91cae0612a..945a85e25ac5 100644
>>>>>>> --- a/kernel/bpf/helpers.c
>>>>>>> +++ b/kernel/bpf/helpers.c
>>>>>>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>>>>>>>
>>>>>>>          if (rec)
>>>>>>>                  bpf_obj_free_fields(rec, p);
>>>>>>
>>>>>> During reviewing my percpu kptr patch with link
>>>>>>
>>>>>> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@linux.dev/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
>>>>>> Kumar mentioned although percpu memory itself is freed under rcu.
>>>>>> But its record fields are freed immediately. This will cause
>>>>>> the problem since there may be some active uses of these fields
>>>>>> within rcu cs and after bpf_obj_free_fields(), some fields may
>>>>>> be re-initialized with new memory but they do not have chances
>>>>>> to free any more.
>>>>>>
>>>>>> Do we have problem here as well?
>>>>>
>>>>> I think it's not an issue here or in your percpu patch,
>>>>> since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
>>>>> call bpf_mem_free_rcu() (after this patch set lands).
>>>>
>>>> The following is my understanding.
>>>>
>>>> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>>>> {
>>>>            const struct btf_field *fields;
>>>>            int i;
>>>>
>>>>            if (IS_ERR_OR_NULL(rec))
>>>>                    return;
>>>>            fields = rec->fields;
>>>>            for (i = 0; i < rec->cnt; i++) {
>>>>                    struct btf_struct_meta *pointee_struct_meta;
>>>>                    const struct btf_field *field = &fields[i];
>>>>                    void *field_ptr = obj + field->offset;
>>>>                    void *xchgd_field;
>>>>
>>>>                    switch (fields[i].type) {
>>>>                    case BPF_SPIN_LOCK:
>>>>                            break;
>>>>                    case BPF_TIMER:
>>>>                            bpf_timer_cancel_and_free(field_ptr);
>>>>                            break;
>>>>                    case BPF_KPTR_UNREF:
>>>>                            WRITE_ONCE(*(u64 *)field_ptr, 0);
>>>>                            break;
>>>>                    case BPF_KPTR_REF:
>>>>                           ......
>>>>                            break;
>>>>                    case BPF_LIST_HEAD:
>>>>                            if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>>>                                    continue;
>>>>                            bpf_list_head_free(field, field_ptr, obj +
>>>> rec->spin_lock_off);
>>>>                            break;
>>>>                    case BPF_RB_ROOT:
>>>>                            if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>>>>                                    continue;
>>>>                            bpf_rb_root_free(field, field_ptr, obj +
>>>> rec->spin_lock_off);
>>>>                            break;
>>>>                    case BPF_LIST_NODE:
>>>>                    case BPF_RB_NODE:
>>>>                    case BPF_REFCOUNT:
>>>>                            break;
>>>>                    default:
>>>>                            WARN_ON_ONCE(1);
>>>>                            continue;
>>>>                    }
>>>>            }
>>>> }
>>>>
>>>> For percpu kptr, the remaining possible actionable fields are
>>>>           BPF_LIST_HEAD and BPF_RB_ROOT
>>>>
>>>> So BPF_LIST_HEAD and BPF_RB_ROOT will try to go through all
>>>> list/rb nodes to unlink them from the list_head/rb_root.
>>>>
>>>> So yes, rb_nodes and list nodes will call __bpf_obj_drop_impl().
>>>> Depending on whether the corresponding rec
>>>> with rb_node/list_node has a refcount or not,
>>>> it may call bpf_mem_free() or bpf_mem_free_rcu(). If
>>>> bpf_mem_free() is called, then the field is immediately freed
>>>> but it may be used by some bpf prog (under rcu) concurrently,
>>>> could this be an issue?
>>>
>>> I see. Yeah. Looks like percpu makes such fields refcount-like.
>>> For non-percpu non-refcount only one bpf prog on one cpu can observe
>>> that object. That's why we're doing plain bpf_mem_free() for them.
>>>
>>> So this patch is a good fix for refcounted, but you and Kumar are
>>> correct that it's not sufficient for the case when percpu struct
>>> includes multiple rb_roots. One for each cpu.
>>>
>>>> Changing bpf_mem_free() in
>>>> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.
>>>
>>> I guess we can do that when the obj is either refcounted or can be
>>> inside percpu, but it might not be enough. More below.
>>>
>>>> Another thing is related to list_head/rb_root.
>>>> During bpf_obj_free_fields(), is it possible that another cpu
>>>> may allocate a list_node/rb_node and add to list_head/rb_root?
>>>
>>> It's not an issue for the single owner case and for refcounted.
>>> Access to rb_root/list_head is always lock protected.
>>> For refcounted the obj needs to be acquired (from the verifier pov)
>>> meaning it must have refcount == 1 to be able to do spin_lock and
>>> operate on list_head.
>>
>> Martin and I came up with the following example earlier today,
>> assuming the map value struct contains a list_head and a spin_lock.
>>
>>            cpu 0                              cpu 1
>>         key = 1;
>>         v = bpf_map_lookup(&map, &key);
>>                                           key = 1;
>>                                           bpf_map_delete_elem(&map, &key);
>>                                           /* destruction continues and
>>                                            * bpf_obj_free_fields() are
>>                                            * called.
>>                                            */
>>                                           /* in bpf_list_head_free():
>>                                            * __bpf_spin_lock_irqsave(...)
>>                                            * ...
>>                                            * __bpf_spin_unlock_irqrestore();
>>                                            */
>>
>>         n = bpf_obj_new(...)
>>         bpf_spin_lock(&v->lock);
>>         bpf_list_push_front(&v->head, &n->node);
>>         bpf_spin_unlock(&v->lock);
>>
>> On cpu 1, in bpf_obj_free_fields(), there is a list_head, so
>> bpf_list_head_free() is called. In bpf_list_head_free(), we do
>>
>>           __bpf_spin_lock_irqsave(spin_lock);
>>           if (!head->next || list_empty(head))
>>                   goto unlock;
>>           head = head->next;
>> unlock:
>>           INIT_LIST_HEAD(orig_head);
>>           __bpf_spin_unlock_irqrestore(spin_lock);
>>
>>           while (head != orig_head) {
>>                   void *obj = head;
>>
>>                   obj -= field->graph_root.node_offset;
>>                   head = head->next;
>>                   /* The contained type can also have resources, including a
>>                    * bpf_list_head which needs to be freed.
>>                    */
>>                   migrate_disable();
>>                   __bpf_obj_drop_impl(obj, field->graph_root.value_rec);
>>                   migrate_enable();
>>           }
>>
>> So it is possible that cpu 0 may add an element to the list_head
>> which will never be freed.
>>
>> This can happen with either list_head or rb_root. I am aware that
>> this may be an existing issue for some maps, e.g. the hash map,
>> so it may not be a big problem. Just want to mention it though.
> 
> argh. That is indeed a problem and it's not limited to rbtree.
> cpu0 doing kptr_xchg into a map value that was just deleted by cpu1
> will cause a leak.
> It affects both sleepable and non-sleepable progs.
> For sleepable progs I see no other option but to enforce that 'v' is used in an RCU CS.
> rcu_unlock currently converts mem_rcu into untrusted.
> I think to fix the above it should convert all ptr_to_map_value to
> untrusted as well.
> In addition we can extend the bpf_mem_alloc API with a dtor.
> Register it at the time of bpf_mem_alloc_init() and use
> bpf_mem_cache_free_rcu() in htab_elem_free() when kptr or
> other special fields are present in the value.
> For preallocated htab we'd need to reinstate call_rcu().
> This would be a heavy fix. Better ideas?
> 
>>
>>>
>>> But bpf_rb_root_free is indeed an issue for percpu, since each
>>> percpu has its own rb root field with its own bpf_spinlock, but
>>> for_each_cpu() {bpf_obj_free_fields();} violates access contract.
>>
>> Could you explain what 'access contract' means here? For non-percpu
>> case, 'x' special fields may be checked. For percpu case, it is
>> just 'x * nr_cpus' special fields to be checked.
> 
> Right. That's what I mean by refcounted-like.
> 
>>
>>>
>>> percpu and rb_root create such a maze of dependencies that
>>> I think it's better to disallow rb_root-s and kptr-s inside percpu
>>> for now.
>>
>> I can certainly disallow rb_root and list_head. Just want to
>> understand what kind of issues we face specifically for percpu kptrs.
> 
> Just a matter of priorities. Fixing the above is more urgent than
> adding percpu kptrs with their own corner cases to analyze.

Okay, the percpu kptr patch set will still take some time,
so I agree that the existing refcount fix should have higher priority.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes
  2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
                   ` (6 preceding siblings ...)
  2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
@ 2023-08-25 16:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 32+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-25 16:40 UTC (permalink / raw)
  To: Dave Marchevsky
  Cc: bpf, ast, daniel, andrii, martin.lau, kernel-team, yonghong.song

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 21 Aug 2023 12:33:04 -0700 you wrote:
> This series is the third of three (or more) followups to address issues
> in the bpf_refcount shared ownership implementation discovered by Kumar.
> This series addresses the use-after-free scenario described in [0]. The
> first followup series ([1]) also attempted to address the same
> use-after-free, but only got rid of the splat without addressing the
> underlying issue. After this series the underlying issue is fixed and
> bpf_refcount_acquire can be re-enabled.
> 
> [...]

Here is the summary with links:
  - [v2,bpf-next,1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire
    https://git.kernel.org/bpf/bpf-next/c/f0d991a07075
  - [v2,bpf-next,2/7] bpf: Consider non-owning refs trusted
    https://git.kernel.org/bpf/bpf-next/c/2a6d50b50d6d
  - [v2,bpf-next,3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes
    https://git.kernel.org/bpf/bpf-next/c/7e26cd12ad1c
  - [v2,bpf-next,4/7] bpf: Reenable bpf_refcount_acquire
    https://git.kernel.org/bpf/bpf-next/c/ba2464c86f18
  - [v2,bpf-next,5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected
    https://git.kernel.org/bpf/bpf-next/c/0816b8c6bf7f
  - [v2,bpf-next,6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs
    https://git.kernel.org/bpf/bpf-next/c/5861d1e8dbc4
  - [v2,bpf-next,7/7] selftests/bpf: Add tests for rbtree API interaction in sleepable progs
    https://git.kernel.org/bpf/bpf-next/c/312aa5bde898

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-08-25 16:40 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 19:33 [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire Dave Marchevsky
2023-08-22  1:52   ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 2/7] bpf: Consider non-owning refs trusted Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 3/7] bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes Dave Marchevsky
2023-08-23  6:26   ` Yonghong Song
2023-08-23 16:20     ` Alexei Starovoitov
2023-08-23 20:29       ` Yonghong Song
2023-08-24  1:38         ` Alexei Starovoitov
2023-08-24  2:09           ` Alexei Starovoitov
2023-08-24  4:01             ` Yonghong Song
2023-08-24  3:52           ` Yonghong Song
2023-08-24 22:03             ` Alexei Starovoitov
2023-08-24 22:25               ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 4/7] bpf: Reenable bpf_refcount_acquire Dave Marchevsky
2023-08-21 19:33 ` [PATCH v2 bpf-next 5/7] bpf: Consider non-owning refs to refcounted nodes RCU protected Dave Marchevsky
2023-08-22  2:37   ` Yonghong Song
2023-08-22  3:19     ` Yonghong Song
2023-08-22  5:47     ` David Marchevsky
2023-08-22 16:02       ` Yonghong Song
2023-08-22 23:45       ` Alexei Starovoitov
2023-08-23  0:18         ` Yonghong Song
2023-08-23  0:21           ` Alexei Starovoitov
2023-08-21 19:33 ` [PATCH v2 bpf-next 6/7] bpf: Allow bpf_spin_{lock,unlock} in sleepable progs Dave Marchevsky
2023-08-22  2:53   ` Yonghong Song
2023-08-22 19:46     ` Alexei Starovoitov
2023-08-22 19:53       ` Yonghong Song
2023-08-21 19:33 ` [PATCH v2 bpf-next 7/7] selftests/bpf: Add tests for rbtree API interaction " Dave Marchevsky
2023-08-22  3:18   ` Yonghong Song
2023-08-22  5:21     ` David Marchevsky
2023-08-22 15:00       ` Yonghong Song
2023-08-25 16:40 ` [PATCH v2 bpf-next 0/7] BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox