* [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map
@ 2022-08-30 17:27 Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
` (17 more replies)
0 siblings, 18 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
[ RFCv2: This RFC focuses on locking improvements. Some feedback from RFCv1
and other discussions is not yet applied, e.g. conversion to kfuncs and
dropping open-coded iter. This isn't meant to imply that I disagree with this
feedback, but rather want to make it easier to compare locking changes to
RFCv1. Please see changelog at end of this cover letter for a list of
newly-added patches and major changes worth looking at ]
Introduce a new map type, bpf_rbtree_map, allowing BPF programs to
create and manipulate rbtree data structures. bpf_rbtree_map differs
from 'classic' BPF map patterns in a few important ways:
* The map does not allocate its own elements. Instead,
BPF programs must call bpf_rbtree_alloc_node helper to allocate and
bpf_rbtree_add to add map elem - referred to as 'node' from now on -
to the rbtree. This means that rbtree maps can grow dynamically, do
not preallocate, and that 'max_entries' has no meaning for rbtree
maps.
* Separating allocation and insertion allows alloc_node call to
occur in contexts where it's safe to allocate.
* It's possible to remove a node from a rbtree map with
bpf_rbtree_remove helper.
* Goal here is to allow a node to be removed from one rbtree and
added to another
[ NOTE: This functionality is still in progress ]
Helpers are added to manipulate nodes and trees:
* bpf_rbtree_{alloc,free}_node: Allocate / free node structs
* bpf_rbtree_{add,remove}: Add / remove nodes from rbtree maps
* A comparator function is passed to bpf_rbtree_add in order to
find the correct place to add the node.
* bpf_rbtree_find: Find a node matching some condition in the rbtree
* A comparator function is passed in order to determine whether a
node matches what's being searched for.
bpf_rbtree_add is very similar to the 'map_push_elem' builtin, but since
the verifier needs special logic to set up the comparator callback, a new
helper is added. The same goes for bpf_rbtree_find and the 'map_lookup_elem'
builtin.
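For illustration only, here is a standalone sketch of the two callback
shapes the helpers expect. The node struct, field names, and values are
hypothetical, not from this series, and struct rb_node is a stand-in for
the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct rb_node; the layout is irrelevant
 * to the callback shapes below.
 */
struct rb_node {
	unsigned long __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
};

/* Hypothetical rbtree map value type: struct rb_node at offset 0. */
struct node_data {
	struct rb_node node;
	unsigned int key;
};

/* Shape of the 'less' callback bpf_rbtree_add expects: true if the
 * first node sorts before the second.
 */
static bool node_less(struct rb_node *a, const struct rb_node *b)
{
	return ((struct node_data *)a)->key <
	       ((const struct node_data *)b)->key;
}

/* Shape of the comparator bpf_rbtree_find expects: memcmp-style int;
 * 'key' need not be a node struct.
 */
static int node_cmp(const void *key, const struct rb_node *n)
{
	unsigned int k = *(const unsigned int *)key;
	unsigned int nk = ((const struct node_data *)n)->key;

	return k < nk ? -1 : k > nk ? 1 : 0;
}
```

Casting rb_node pointers to the containing struct works here only
because rb_node sits at offset 0, which is exactly the constraint the
map places on its value type.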
In order to safely support separate allocation / insertion and passing
nodes between rbtrees, some invariants must hold:
* A node that is not in a rbtree map must either be free'd or added to
a rbtree map before the program terminates
* Nodes are in this state when returned from bpf_rbtree_alloc_node
or bpf_rbtree_remove.
If a node is in a rbtree map it is 'owned' by the map, otherwise it is
owned by the BPF program which holds a reference to it. The owner is
responsible for the lifetime of the node.
This matches existing acquire / release semantics well. node_alloc and
remove operations 'acquire' a node while add and node_free operations
'release' the node. The verifier enforces that acquired nodes are
released before the program terminates.
Some other implementation details:
* The value type of an rbtree map is expected to be a struct which
contains 'struct rb_node' at offset 0.
* BPF programs may not modify the node struct's rb_node field.
* Otherwise the tree could be left in corrupted state
* Rbtree map is value-only. Keys have no meaning
* Since the value type is not known until the rbtree map is
instantiated, helper protos have input and return type
'struct rb_node *' which verifier replaces with actual
map value type.
* [ TODO: Existing logic prevents any writes to PTR_TO_BTF_ID. This is
broadly turned off in this patch and replaced with "no writes to
struct rb_node if the PTR_TO_BTF_ID struct has one". This is a hack and
needs to be replaced. ]
Changelog:
RFCv1 -> RFCv2: lore.kernel.org/bpf/20220722183438.3319790-1-davemarchevsky@fb.com
Major changes:
* Add new patch ("bpf: Add verifier check for BPF_PTR_POISON retval and arg"),
use BPF_PTR_POISON as placeholder btf_id for helpers which return or
manipulate structs which contain rbtree node. (Alexei)
* Migrate all rbtree helpers to use BPF_PTR_POISON
* Testing this patch uncovered that rbtree_find was not in the
function_returns_rbtree_node check; add it. rbtree_add was not
setting a return btf_id type; fix that as well.
* Add new patch ("libbpf: Add support for private BSS map section")
allowing struct bpf_spin_lock declaration in special internal map
* Add new patches improving ergonomics of associating lock with rbtree(1),
teaching verifier to track rbtree locks(2), and teaching verifier to reject
programs calling rbtree helpers without holding the necessary lock (3)
* (1): "bpf: Support declarative association of lock with rbtree map"
* (2): "bpf: Verifier tracking of rbtree_spin_lock held"
* (3): "bpf: Check rbtree lock held during verification"
* These are the changes most worth a look. Each commit has associated test
change patch also added at the end of the series. Test patches are left
unsquashed to ease RFC review.
Minor changes:
* patch "bpf: Add rbtree map"
* Alloc nodes w/ GFP_NOWAIT instead of GFP_KERNEL (Yonghong, Alexei)
* Rename access_may_touch_field -> access_can_write_field (Alexei)
* patch "bpf: Add CONDITIONAL_RELEASE type flag"
* No need to return int from mark_ptr_or_null_regs, go back to returning
void (Alexei)
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Dave Marchevsky (18):
bpf: Add verifier support for custom callback return range
bpf: Add verifier check for BPF_PTR_POISON retval and arg
bpf: Add rb_node_off to bpf_map
bpf: Add rbtree map
libbpf: Add support for private BSS map section
bpf: Add bpf_spin_lock member to rbtree
bpf: Add bpf_rbtree_{lock,unlock} helpers
bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find}
bpf: Support declarative association of lock with rbtree map
bpf: Verifier tracking of rbtree_spin_lock held
bpf: Check rbtree lock held during verification
bpf: Add OBJ_NON_OWNING_REF type flag
bpf: Add CONDITIONAL_RELEASE type flag
bpf: Introduce PTR_ITER and PTR_ITER_END type flags
selftests/bpf: Add rbtree map tests
selftests/bpf: Declarative lock definition test changes
selftests/bpf: Lock tracking test changes
selftests/bpf: Rbtree static lock verification test changes
include/linux/bpf.h | 17 +
include/linux/bpf_types.h | 1 +
include/linux/bpf_verifier.h | 3 +
include/linux/btf.h | 1 +
include/linux/poison.h | 3 +
include/uapi/linux/bpf.h | 120 ++++
kernel/bpf/Makefile | 2 +-
kernel/bpf/btf.c | 21 +
kernel/bpf/helpers.c | 48 +-
kernel/bpf/rbtree.c | 442 ++++++++++++++
kernel/bpf/syscall.c | 11 +-
kernel/bpf/verifier.c | 546 ++++++++++++++++--
tools/include/uapi/linux/bpf.h | 120 ++++
tools/lib/bpf/libbpf.c | 164 +++++-
.../selftests/bpf/prog_tests/rbtree_map.c | 134 +++++
.../testing/selftests/bpf/progs/rbtree_map.c | 119 ++++
.../selftests/bpf/progs/rbtree_map_fail.c | 243 ++++++++
.../bpf/progs/rbtree_map_load_fail.c | 31 +
18 files changed, 1951 insertions(+), 75 deletions(-)
create mode 100644 kernel/bpf/rbtree.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree_map.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
--
2.30.2
^ permalink raw reply [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-09-01 21:01 ` Joanne Koong
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg Dave Marchevsky
` (16 subsequent siblings)
17 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Verifier logic to confirm that a callback function returns 0 or 1 was
added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
At the time, callback return value was only used to continue or stop
iteration.
In order to support callbacks with a broader return value range, such as
those added further in this series, add a callback_ret_range to
bpf_func_state. Verifier's helpers which set in_callback_fn will also
set the new field, which the verifier will later use to check return
value bounds.
Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
value, as the latter would prevent the valid range (0, U64_MAX) from
being used.
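tnum_range() lives in the verifier's internals (kernel/bpf/tnum.c); the
standalone model below mirrors its behavior to show why tnum_range(0, 1)
is a usable default while a tnum_unknown sentinel would collide with the
legitimate full range:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the verifier's tnum ("tracked number"): value
 * holds the known bits, mask the unknown bits.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Mirrors kernel tnum_range(): the smallest tnum containing every
 * value in [min, max].
 */
static struct tnum tnum_range(uint64_t min, uint64_t max)
{
	uint64_t chi = min ^ max, delta;
	int bits = chi ? 64 - __builtin_clzll(chi) : 0; /* fls64(chi) */

	if (bits > 63) /* 1ULL << 64 is undefined */
		return (struct tnum){ 0, UINT64_MAX }; /* tnum_unknown */
	delta = (1ULL << bits) - 1;
	return (struct tnum){ min & ~delta, delta };
}

/* True if concrete value v is representable by tnum t. */
static int tnum_contains(struct tnum t, uint64_t v)
{
	return (v & ~t.mask) == t.value;
}
```

Note that tnum_range(0, U64_MAX) already produces exactly tnum_unknown
(all bits unknown), so tnum_unknown cannot double as a "not set"
sentinel without losing that valid range.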
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 2e3bad8640dc..9c017575c034 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -237,6 +237,7 @@ struct bpf_func_state {
*/
u32 async_entry_cnt;
bool in_callback_fn;
+ struct tnum callback_ret_range;
bool in_async_callback_fn;
/* The following fields should be last. See copy_func_state() */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9bef8b41e737..68bfa7c28048 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
state->callsite = callsite;
state->frameno = frameno;
state->subprogno = subprogno;
+ state->callback_ret_range = tnum_range(0, 1);
init_reg_state(env, state);
mark_verifier_state_scratched(env);
}
@@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
callee->in_callback_fn = true;
+ callee->callback_ret_range = tnum_range(0, 1);
return 0;
}
@@ -6906,7 +6908,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
caller = state->frame[state->curframe];
if (callee->in_callback_fn) {
/* enforce R0 return value range [0, 1]. */
- struct tnum range = tnum_range(0, 1);
+ struct tnum range = callee->callback_ret_range;
if (r0->type != SCALAR_VALUE) {
verbose(env, "R0 not a scalar value\n");
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map Dave Marchevsky
` (15 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing
referenced kptr in map") to denote a bpf_func_proto btf_id which the
verifier will replace with a dynamically-determined one at verification
time.
This patch adds 'poison' functionality to BPF_PTR_POISON in order to
prepare for expanded use of the value to poison ret- and arg-btf_id in
further patches of this series. Specifically, when the verifier checks
helper calls, it assumes that BPF_PTR_POISON'ed ret type will be
replaced with a valid type before - or in lieu of - the default
ret_btf_id logic. Similarly for argument btf_id.
If a poisoned btf_id reaches the default handling block for either, consider
this a verifier internal error and fail verification. Otherwise a helper
with a poisoned btf_id but no verifier logic replacing the type would cause
a crash when the invalid pointer is dereferenced.
Also move BPF_PTR_POISON to the existing <linux/poison.h> header and replace
the hardcoded value with something that looks slightly like BPF (0xB4F).
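The check pattern being added can be sketched outside the verifier.
POISON_POINTER_DELTA is taken as 0 here purely for illustration (in the
kernel it is arch-dependent), and check_ret_btf_id is a hypothetical
stand-in for the logic added to the helper-call checks:

```c
#include <assert.h>
#include <errno.h>

/* Arch-dependent in the kernel; 0 here purely for illustration. */
#define POISON_POINTER_DELTA 0
#define BPF_PTR_POISON ((void *)(0xB4FUL + POISON_POINTER_DELTA))

/* If a helper proto's btf_id pointer still holds the poison value when
 * default handling is reached, no verifier logic overwrote it: treat
 * this as an internal error rather than dereferencing the pointer.
 */
static int check_ret_btf_id(const int *ret_btf_id)
{
	if ((const void *)ret_btf_id == BPF_PTR_POISON)
		return -EINVAL; /* verifier internal error */
	return 0; /* safe to dereference *ret_btf_id */
}
```

The point of the poison constant is exactly this: an unmistakable,
non-dereferenceable value that turns a missing type-replacement into a
clean verification failure instead of a crash.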
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/poison.h | 3 +++
kernel/bpf/helpers.c | 6 +++---
kernel/bpf/verifier.c | 28 ++++++++++++++++++++++------
3 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/include/linux/poison.h b/include/linux/poison.h
index d62ef5a6b4e9..599995d29fe4 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -81,4 +81,7 @@
/********** net/core/page_pool.c **********/
#define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA)
+/********** kernel/bpf/ **********/
+#define BPF_PTR_POISON ((void *)(0xB4FUL + POISON_POINTER_DELTA))
+
#endif
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3c1b9bbcf971..6de4dedf45c0 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -15,6 +15,7 @@
#include <linux/ctype.h>
#include <linux/jiffies.h>
#include <linux/pid_namespace.h>
+#include <linux/poison.h>
#include <linux/proc_ns.h>
#include <linux/security.h>
#include <linux/btf_ids.h>
@@ -1410,10 +1411,9 @@ BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
}
/* Unlike other PTR_TO_BTF_ID helpers the btf_id in bpf_kptr_xchg()
- * helper is determined dynamically by the verifier.
+ * helper is determined dynamically by the verifier. Use BPF_PTR_POISON to
+ * denote type that verifier will determine.
*/
-#define BPF_PTR_POISON ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA))
-
static const struct bpf_func_proto bpf_kptr_xchg_proto = {
.func = bpf_kptr_xchg,
.gpl_only = false,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 68bfa7c28048..ee5b57140c73 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23,6 +23,7 @@
#include <linux/error-injection.h>
#include <linux/bpf_lsm.h>
#include <linux/btf_ids.h>
+#include <linux/poison.h>
#include "disasm.h"
@@ -5764,13 +5765,21 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
if (meta->func_id == BPF_FUNC_kptr_xchg) {
if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
return -EACCES;
- } else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+ } else {
+ if (arg_btf_id == BPF_PTR_POISON) {
+ verbose(env, "verifier internal error: R%d has "
+ "non-overwritten BPF_PTR_POISON type\n", regno);
+ return -EACCES;
+ }
+
+ if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
btf_vmlinux, *arg_btf_id,
strict_type_match)) {
- verbose(env, "R%d is of type %s but %s is expected\n",
- regno, kernel_type_name(reg->btf, reg->btf_id),
- kernel_type_name(btf_vmlinux, *arg_btf_id));
- return -EACCES;
+ verbose(env, "R%d is of type %s but %s is expected\n",
+ regno, kernel_type_name(reg->btf, reg->btf_id),
+ kernel_type_name(btf_vmlinux, *arg_btf_id));
+ return -EACCES;
+ }
}
}
@@ -7454,7 +7463,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
ret_btf = meta.kptr_off_desc->kptr.btf;
ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
} else {
- ret_btf = btf_vmlinux;
+ if (fn->ret_btf_id == BPF_PTR_POISON) {
+ verbose(env, "verifier internal error: func %s "
+ "has non-overwritten "
+ "BPF_PTR_POISON return type\n",
+ func_id_name(func_id));
+ return -EINVAL;
+ }
+ ret_btf = btf_vmlinux;
ret_btf_id = *fn->ret_btf_id;
}
if (ret_btf_id == 0) {
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map Dave Marchevsky
` (14 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Similarly to other protected fields which might be in a map value -
bpf_spin_lock and bpf_timer - keep track of the existence and offset of
struct rb_node within the map value struct. This will allow later patches in
this series to prevent writes to rb_node field.
Allowing bpf programs to modify the rbtree node struct's rb_node field
would allow parent and children node pointers to be changed, which could
corrupt an otherwise-valid rbtree.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 6 ++++++
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 21 +++++++++++++++++++++
kernel/bpf/syscall.c | 3 +++
4 files changed, 31 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a627a02cf8ab..b4a44ffb0d6c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -215,6 +215,7 @@ struct bpf_map {
int spin_lock_off; /* >=0 valid offset, <0 error */
struct bpf_map_value_off *kptr_off_tab;
int timer_off; /* >=0 valid offset, <0 error */
+ int rb_node_off; /* >=0 valid offset, <0 error */
u32 id;
int numa_node;
u32 btf_key_type_id;
@@ -264,6 +265,11 @@ static inline bool map_value_has_kptrs(const struct bpf_map *map)
return !IS_ERR_OR_NULL(map->kptr_off_tab);
}
+static inline bool map_value_has_rb_node(const struct bpf_map *map)
+{
+ return map->rb_node_off >= 0;
+}
+
static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
if (unlikely(map_value_has_spin_lock(map)))
diff --git a/include/linux/btf.h b/include/linux/btf.h
index ad93c2d9cc1c..cbf10b6b4ada 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -154,6 +154,7 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
u32 expected_offset, u32 expected_size);
int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+int btf_find_rb_node(const struct btf *btf, const struct btf_type *t);
struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 903719b89238..f020efad8d9b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3195,6 +3195,7 @@ enum btf_field_type {
BTF_FIELD_SPIN_LOCK,
BTF_FIELD_TIMER,
BTF_FIELD_KPTR,
+ BTF_FIELD_RB_NODE,
};
enum {
@@ -3282,6 +3283,7 @@ static int btf_find_struct_field(const struct btf *btf, const struct btf_type *t
switch (field_type) {
case BTF_FIELD_SPIN_LOCK:
case BTF_FIELD_TIMER:
+ case BTF_FIELD_RB_NODE:
ret = btf_find_struct(btf, member_type, off, sz,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3332,6 +3334,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
switch (field_type) {
case BTF_FIELD_SPIN_LOCK:
case BTF_FIELD_TIMER:
+ case BTF_FIELD_RB_NODE:
ret = btf_find_struct(btf, var_type, off, sz,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3374,6 +3377,11 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
sz = sizeof(struct bpf_timer);
align = __alignof__(struct bpf_timer);
break;
+ case BTF_FIELD_RB_NODE:
+ name = "rb_node";
+ sz = sizeof(struct rb_node);
+ align = __alignof__(struct rb_node);
+ break;
case BTF_FIELD_KPTR:
name = NULL;
sz = sizeof(u64);
@@ -3420,6 +3428,19 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t)
return info.off;
}
+int btf_find_rb_node(const struct btf *btf, const struct btf_type *t)
+{
+ struct btf_field_info info;
+ int ret;
+
+ ret = btf_find_field(btf, t, BTF_FIELD_RB_NODE, &info, 1);
+ if (ret < 0)
+ return ret;
+ if (!ret)
+ return -ENOENT;
+ return info.off;
+}
+
struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
const struct btf_type *t)
{
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 83c7136c5788..3947fbd137af 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1052,6 +1052,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
}
}
+ map->rb_node_off = btf_find_rb_node(btf, value_type);
+
if (map->ops->map_check_btf) {
ret = map->ops->map_check_btf(map, btf, key_type, value_type);
if (ret < 0)
@@ -1115,6 +1117,7 @@ static int map_create(union bpf_attr *attr)
map->spin_lock_off = -EINVAL;
map->timer_off = -EINVAL;
+ map->rb_node_off = -EINVAL;
if (attr->btf_key_type_id || attr->btf_value_type_id ||
/* Even the map's value is a kernel's struct,
* the bpf_prog.o must have BTF to begin with
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (2 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section Dave Marchevsky
` (13 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Introduce a new map type, bpf_rbtree_map, allowing BPF programs to
create and manipulate rbtree data structures. bpf_rbtree_map differs
from 'classic' BPF map patterns in a few important ways:
* The map does not allocate its own elements. Instead,
BPF programs must call bpf_rbtree_alloc_node helper to allocate and
bpf_rbtree_add to add map elem - referred to as 'node' from now on -
to the rbtree. This means that rbtree maps can grow dynamically, do
not preallocate, and that 'max_entries' has no meaning for rbtree
maps.
* Separating allocation and insertion allows alloc_node call to
occur in contexts where it's safe to allocate.
* It's possible to remove a node from a rbtree map with
bpf_rbtree_remove helper.
* Goal here is to allow a node to be removed from one rbtree and
added to another
[ NOTE: This functionality is still in progress ]
Helpers are added to manipulate nodes and trees:
* bpf_rbtree_{alloc,free}_node: Allocate / free node structs
* bpf_rbtree_{add,remove}: Add / remove nodes from rbtree maps
* A comparator function is passed to bpf_rbtree_add in order to
find the correct place to add the node.
* bpf_rbtree_find: Find a node matching some condition in the rbtree
* A comparator function is passed in order to determine whether a
node matches what's being searched for.
bpf_rbtree_add is very similar to the 'map_push_elem' builtin, but since
the verifier needs special logic to set up the comparator callback, a new
helper is added. The same goes for bpf_rbtree_find and the 'map_lookup_elem'
builtin.
In order to safely support separate allocation / insertion and passing
nodes between rbtrees, some invariants must hold:
* A node that is not in a rbtree map must either be free'd or added to
a rbtree map before the program terminates
* Nodes are in this state when returned from bpf_rbtree_alloc_node
or bpf_rbtree_remove.
If a node is in a rbtree map it is 'owned' by the map, otherwise it is
owned by the BPF program which holds a reference to it. The owner is
responsible for the lifetime of the node.
This matches existing acquire / release semantics well. node_alloc and
remove operations 'acquire' a node while add and node_free operations
'release' the node. The verifier enforces that acquired nodes are
released before the program terminates.
Some other implementation details:
* The value type of an rbtree map is expected to be a struct which
contains 'struct rb_node' at offset 0.
* BPF programs may not modify the node struct's rb_node field.
* Otherwise the tree could be left in corrupted state
* Rbtree map is value-only. Keys have no meaning
* Since the value type is not known until the rbtree map is
instantiated, helper protos have input and return type
'struct rb_node *' which verifier replaces with actual
map value type.
* [ TODO: Existing logic prevents any writes to PTR_TO_BTF_ID. This is
broadly turned off in this patch and replaced with "no writes to
struct rb_node if the PTR_TO_BTF_ID struct has one". This is a hack and
needs to be replaced. ]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 61 ++++++++
kernel/bpf/Makefile | 2 +-
kernel/bpf/helpers.c | 15 ++
kernel/bpf/rbtree.c | 256 +++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 121 +++++++++++++++-
tools/include/uapi/linux/bpf.h | 61 ++++++++
7 files changed, 511 insertions(+), 6 deletions(-)
create mode 100644 kernel/bpf/rbtree.c
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..78e9b5253983 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_RBTREE, rbtree_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e174ad28aeb7..1af17b27d34f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -909,6 +909,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
+ BPF_MAP_TYPE_RBTREE,
};
/* Note that tracing related programs such as
@@ -5353,6 +5354,61 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz)
+ * Description
+ * Allocate a node of size *sz* bytes for use in rbtree *map*.
+ *
+ * *sz* must be >= sizeof(struct rb_node)
+ * Return
+ * A pointer to the allocated node if successful, otherwise NULL.
+ *
+ * The verifier considers the type of the returned pointer to be
+ * the BTF id of *map*'s value type.
+ *
+ * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb)
+ * Description
+ * Add *node* to rbtree *map* using *cb* comparator callback to
+ * walk the rbtree.
+ *
+ * *cb* must take (struct rb_node \*, const struct rb_node \*) as
+ * input and return a bool signifying whether the first rb_node's
+ * struct is less than the second.
+ *
+ * Return
+ * On success, returns a pointer to the added node in the rbtree.
+ *
+ * If add fails, returns NULL
+ *
+ * void *bpf_rbtree_find(struct bpf_map *map, void *key, void *cb)
+ * Description
+ * Find *key* in rbtree *map* using *cb* comparator callback to walk the
+ * rbtree.
+ *
+ * *cb* must take (const void \*key, const struct rb_node \*n) as
+ * input and return an int: 0 if *cb* determines *n* to match *key*,
+ * negative if *key* sorts before *n*, and positive if after.
+ *
+ * *key* does not need to be a rbtree node struct.
+ *
+ * Return
+ * Ptr to rbtree node if found, otherwise NULL.
+ *
+ * void *bpf_rbtree_remove(struct bpf_map *map, void *elem)
+ * Description
+ * Remove *elem* from rbtree *map*.
+ *
+ * Return
+ * On success, returns a pointer to the removed node.
+ *
+ * If remove fails, returns NULL
+ *
+ * long bpf_rbtree_free_node(struct bpf_map *map, void *elem)
+ * Description
+ * Free rb_node that isn't associated w/ a tree.
+ *
+ * Return
+ * 0
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5564,6 +5620,11 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(rbtree_alloc_node), \
+ FN(rbtree_add), \
+ FN(rbtree_find), \
+ FN(rbtree_remove), \
+ FN(rbtree_free_node), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 057ba8e01e70..00eedab3ad53 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -7,7 +7,7 @@ endif
CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o rbtree.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6de4dedf45c0..d18d4d8ca1e2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1598,6 +1598,11 @@ const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
const struct bpf_func_proto bpf_probe_read_kernel_proto __weak;
const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
+const struct bpf_func_proto bpf_rbtree_alloc_node_proto __weak;
+const struct bpf_func_proto bpf_rbtree_add_proto __weak;
+const struct bpf_func_proto bpf_rbtree_find_proto __weak;
+const struct bpf_func_proto bpf_rbtree_remove_proto __weak;
+const struct bpf_func_proto bpf_rbtree_free_node_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
@@ -1689,6 +1694,16 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_timer_cancel_proto;
case BPF_FUNC_kptr_xchg:
return &bpf_kptr_xchg_proto;
+ case BPF_FUNC_rbtree_alloc_node:
+ return &bpf_rbtree_alloc_node_proto;
+ case BPF_FUNC_rbtree_add:
+ return &bpf_rbtree_add_proto;
+ case BPF_FUNC_rbtree_find:
+ return &bpf_rbtree_find_proto;
+ case BPF_FUNC_rbtree_remove:
+ return &bpf_rbtree_remove_proto;
+ case BPF_FUNC_rbtree_free_node:
+ return &bpf_rbtree_free_node_proto;
default:
break;
}
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
new file mode 100644
index 000000000000..7d50574e4d57
--- /dev/null
+++ b/kernel/bpf/rbtree.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
+#include <linux/poison.h>
+
+struct bpf_rbtree {
+ struct bpf_map map;
+ struct rb_root_cached root;
+};
+
+static int rbtree_map_alloc_check(union bpf_attr *attr)
+{
+ if (attr->max_entries || !attr->btf_value_type_id)
+ return -EINVAL;
+
+ return 0;
+}
+
+static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_rbtree *tree;
+ int numa_node;
+
+ if (!bpf_capable())
+ return ERR_PTR(-EPERM);
+
+ if (attr->value_size == 0)
+ return ERR_PTR(-EINVAL);
+
+ numa_node = bpf_map_attr_numa_node(attr);
+ tree = bpf_map_area_alloc(sizeof(*tree), numa_node);
+ if (!tree)
+ return ERR_PTR(-ENOMEM);
+
+ tree->root = RB_ROOT_CACHED;
+ bpf_map_init_from_attr(&tree->map, attr);
+ return &tree->map;
+}
+
+static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz)
+{
+ struct rb_node *node;
+
+ node = bpf_map_kmalloc_node(map, sz, GFP_NOWAIT, map->numa_node);
+ if (!node)
+ return NULL;
+ RB_CLEAR_NODE(node);
+ return node;
+}
+
+BPF_CALL_2(bpf_rbtree_alloc_node, struct bpf_map *, map, u32, sz)
+{
+ struct rb_node *node;
+
+ if (map->map_type != BPF_MAP_TYPE_RBTREE)
+ return (u64)NULL;
+
+ if (sz < sizeof(*node))
+ return (u64)NULL;
+
+ node = rbtree_map_alloc_node(map, (size_t)sz);
+ if (!node)
+ return (u64)NULL;
+
+ return (u64)node;
+}
+
+const struct bpf_func_proto bpf_rbtree_alloc_node_proto = {
+ .func = bpf_rbtree_alloc_node,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_CONST_ALLOC_SIZE_OR_ZERO,
+};
+
+BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+ struct rb_node *node = (struct rb_node *)value;
+
+ if (WARN_ON_ONCE(!RB_EMPTY_NODE(node)))
+ return (u64)NULL;
+
+ rb_add_cached(node, &tree->root, (bool (*)(struct rb_node *, const struct rb_node *))cb);
+ return (u64)node;
+}
+
+const struct bpf_func_proto bpf_rbtree_add_proto = {
+ .func = bpf_rbtree_add,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
+ .arg2_btf_id = BPF_PTR_POISON,
+ .arg3_type = ARG_PTR_TO_FUNC,
+};
+
+BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ return (u64)rb_find(key, &tree->root.rb_root,
+ (int (*)(const void *key,
+ const struct rb_node *))cb);
+}
+
+const struct bpf_func_proto bpf_rbtree_find_proto = {
+ .func = bpf_rbtree_find,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_FUNC,
+};
+
+/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
+ * 'rb_node *', so field name of rb_node within containing struct is not
+ * needed.
+ *
+ * Since bpf_rb_tree's node always has 'struct rb_node' at offset 0 it's
+ * not necessary to know field name or type of node struct
+ */
+#define bpf_rbtree_postorder_for_each_entry_safe(pos, n, root) \
+ for (pos = rb_first_postorder(root); \
+ pos && ({ n = rb_next_postorder(pos); 1; }); \
+ pos = n)
+
+static void rbtree_map_free(struct bpf_map *map)
+{
+ struct rb_node *pos, *n;
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root)
+ kfree(pos);
+ bpf_map_area_free(tree);
+}
+
+static int rbtree_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ if (!map_value_has_rb_node(map))
+ return -EINVAL;
+
+ if (map->rb_node_off > 0)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int rbtree_map_push_elem(struct bpf_map *map, void *value, u64 flags)
+{
+ /* Use bpf_rbtree_add helper instead
+ */
+ return -ENOTSUPP;
+}
+
+static int rbtree_map_pop_elem(struct bpf_map *map, void *value)
+{
+ return -ENOTSUPP;
+}
+
+static int rbtree_map_peek_elem(struct bpf_map *map, void *value)
+{
+ return -ENOTSUPP;
+}
+
+static void *rbtree_map_lookup_elem(struct bpf_map *map, void *value)
+{
+ /* Use bpf_rbtree_find helper instead
+ */
+ return ERR_PTR(-ENOTSUPP);
+}
+
+static int rbtree_map_update_elem(struct bpf_map *map, void *key, void *value,
+ u64 flags)
+{
+ return -ENOTSUPP;
+}
+
+static int rbtree_map_delete_elem(struct bpf_map *map, void *value)
+{
+ return -ENOTSUPP;
+}
+
+BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+ struct rb_node *node = (struct rb_node *)value;
+
+ if (WARN_ON_ONCE(RB_EMPTY_NODE(node)))
+ return (u64)NULL;
+
+ rb_erase_cached(node, &tree->root);
+ RB_CLEAR_NODE(node);
+ return (u64)node;
+}
+
+const struct bpf_func_proto bpf_rbtree_remove_proto = {
+ .func = bpf_rbtree_remove,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg2_btf_id = BPF_PTR_POISON,
+};
+
+BPF_CALL_2(bpf_rbtree_free_node, struct bpf_map *, map, void *, value)
+{
+ struct rb_node *node = (struct rb_node *)value;
+
+ WARN_ON_ONCE(!RB_EMPTY_NODE(node));
+ kfree(node);
+ return 0;
+}
+
+const struct bpf_func_proto bpf_rbtree_free_node_proto = {
+ .func = bpf_rbtree_free_node,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
+ .arg2_btf_id = BPF_PTR_POISON,
+};
+
+static int rbtree_map_get_next_key(struct bpf_map *map, void *key,
+ void *next_key)
+{
+ return -ENOTSUPP;
+}
+
+BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree)
+const struct bpf_map_ops rbtree_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = rbtree_map_alloc_check,
+ .map_alloc = rbtree_map_alloc,
+ .map_free = rbtree_map_free,
+ .map_get_next_key = rbtree_map_get_next_key,
+ .map_push_elem = rbtree_map_push_elem,
+ .map_peek_elem = rbtree_map_peek_elem,
+ .map_pop_elem = rbtree_map_pop_elem,
+ .map_lookup_elem = rbtree_map_lookup_elem,
+ .map_update_elem = rbtree_map_update_elem,
+ .map_delete_elem = rbtree_map_delete_elem,
+ .map_check_btf = rbtree_map_check_btf,
+ .map_btf_id = &bpf_rbtree_map_btf_ids[0],
+};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ee5b57140c73..0a2e958ddca8 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -482,7 +482,9 @@ static bool is_acquire_function(enum bpf_func_id func_id,
func_id == BPF_FUNC_sk_lookup_udp ||
func_id == BPF_FUNC_skc_lookup_tcp ||
func_id == BPF_FUNC_ringbuf_reserve ||
- func_id == BPF_FUNC_kptr_xchg)
+ func_id == BPF_FUNC_kptr_xchg ||
+ func_id == BPF_FUNC_rbtree_alloc_node ||
+ func_id == BPF_FUNC_rbtree_remove)
return true;
if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -532,6 +534,21 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
insn->imm == BPF_CMPXCHG;
}
+static bool function_manipulates_rbtree_node(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_rbtree_add ||
+ func_id == BPF_FUNC_rbtree_remove ||
+ func_id == BPF_FUNC_rbtree_free_node;
+}
+
+static bool function_returns_rbtree_node(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_rbtree_alloc_node ||
+ func_id == BPF_FUNC_rbtree_find ||
+ func_id == BPF_FUNC_rbtree_add ||
+ func_id == BPF_FUNC_rbtree_remove;
+}
+
/* string representation of 'enum bpf_reg_type'
*
* Note that reg_type_str() can not appear more than once in a single verbose()
@@ -3785,6 +3802,13 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
return 0;
}
+static bool access_can_write_field(u32 access_off, size_t access_sz,
+ u32 field_off, size_t field_sz)
+{
+ return access_off < field_off + field_sz &&
+ field_off < access_off + access_sz;
+}
+
/* if any part of struct field can be touched by
* load/store reject this program.
* To check that [x1, x2) overlaps with [y1, y2)
@@ -4491,7 +4515,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
const char *tname = btf_name_by_offset(reg->btf, t->name_off);
enum bpf_type_flag flag = 0;
u32 btf_id;
- int ret;
+ int ret, rb_node_off;
if (off < 0) {
verbose(env,
@@ -4528,8 +4552,13 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
off, size, atype, &btf_id, &flag);
} else {
if (atype != BPF_READ) {
- verbose(env, "only read is supported\n");
- return -EACCES;
+ rb_node_off = btf_find_rb_node(reg->btf, t);
+ if (rb_node_off < 0 ||
+ access_can_write_field(off, size, rb_node_off,
+ sizeof(struct rb_node))) {
+ verbose(env, "only read is supported\n");
+ return -EACCES;
+ }
}
ret = btf_struct_access(&env->log, reg->btf, t, off, size,
@@ -5746,7 +5775,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
return -EACCES;
found:
- if (reg->type == PTR_TO_BTF_ID) {
+ if (base_type(reg->type) == PTR_TO_BTF_ID) {
/* For bpf_sk_release, it needs to match against first member
* 'struct sock_common', hence make an exception for it. This
* allows bpf_sk_release to work for multiple socket types.
@@ -5765,6 +5794,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
if (meta->func_id == BPF_FUNC_kptr_xchg) {
if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno))
return -EACCES;
+ } else if (function_manipulates_rbtree_node(meta->func_id)) {
+ if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+ meta->map_ptr->btf,
+ meta->map_ptr->btf_value_type_id,
+ strict_type_match)) {
+ verbose(env, "rbtree: R%d is of type %s but %s is expected\n",
+ regno, kernel_type_name(reg->btf, reg->btf_id),
+ kernel_type_name(meta->map_ptr->btf,
+ meta->map_ptr->btf_value_type_id));
+ return -EACCES;
+ }
} else {
if (arg_btf_id == BPF_PTR_POISON) {
verbose(env, "verifier internal error: R%d has "
@@ -6378,10 +6418,17 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
break;
case BPF_FUNC_map_pop_elem:
if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+ map->map_type != BPF_MAP_TYPE_RBTREE &&
map->map_type != BPF_MAP_TYPE_STACK)
goto error;
break;
case BPF_FUNC_map_peek_elem:
+ if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+ map->map_type != BPF_MAP_TYPE_STACK &&
+ map->map_type != BPF_MAP_TYPE_RBTREE &&
+ map->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+ goto error;
+ break;
case BPF_FUNC_map_push_elem:
if (map->map_type != BPF_MAP_TYPE_QUEUE &&
map->map_type != BPF_MAP_TYPE_STACK &&
@@ -6836,6 +6883,57 @@ static int set_loop_callback_state(struct bpf_verifier_env *env,
return 0;
}
+static int set_rbtree_add_callback_state(struct bpf_verifier_env *env,
+ struct bpf_func_state *caller,
+ struct bpf_func_state *callee,
+ int insn_idx)
+{
+ struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr;
+
+ /* bpf_rbtree_add(struct bpf_map *map, void *value, void *cb)
+ * cb(struct rb_node *a, const struct rb_node *b);
+ */
+ callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE;
+ __mark_reg_known_zero(&callee->regs[BPF_REG_1]);
+ callee->regs[BPF_REG_1].map_ptr = map_ptr;
+
+ callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE;
+ __mark_reg_known_zero(&callee->regs[BPF_REG_2]);
+ callee->regs[BPF_REG_2].map_ptr = map_ptr;
+
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+ callee->in_callback_fn = true;
+ return 0;
+}
+
+static int set_rbtree_find_callback_state(struct bpf_verifier_env *env,
+ struct bpf_func_state *caller,
+ struct bpf_func_state *callee,
+ int insn_idx)
+{
+ struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr;
+
+ /* bpf_rbtree_find(struct bpf_map *map, void *key, void *cb)
+ * cb(void *key, const struct rb_node *b);
+ */
+ callee->regs[BPF_REG_1].type = PTR_TO_MAP_VALUE;
+ __mark_reg_known_zero(&callee->regs[BPF_REG_1]);
+ callee->regs[BPF_REG_1].map_ptr = map_ptr;
+
+ callee->regs[BPF_REG_2].type = PTR_TO_MAP_VALUE;
+ __mark_reg_known_zero(&callee->regs[BPF_REG_2]);
+ callee->regs[BPF_REG_2].map_ptr = map_ptr;
+
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_3]);
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+ __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+ callee->in_callback_fn = true;
+ callee->callback_ret_range = tnum_range(0, U64_MAX);
+ return 0;
+}
+
static int set_timer_callback_state(struct bpf_verifier_env *env,
struct bpf_func_state *caller,
struct bpf_func_state *callee,
@@ -7318,6 +7416,14 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_loop_callback_state);
break;
+ case BPF_FUNC_rbtree_add:
+ err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+ set_rbtree_add_callback_state);
+ break;
+ case BPF_FUNC_rbtree_find:
+ err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+ set_rbtree_find_callback_state);
+ break;
case BPF_FUNC_dynptr_from_mem:
if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) {
verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n",
@@ -7462,6 +7568,9 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (func_id == BPF_FUNC_kptr_xchg) {
ret_btf = meta.kptr_off_desc->kptr.btf;
ret_btf_id = meta.kptr_off_desc->kptr.btf_id;
+ } else if (function_returns_rbtree_node(func_id)) {
+ ret_btf = meta.map_ptr->btf;
+ ret_btf_id = meta.map_ptr->btf_value_type_id;
} else {
if (fn->ret_btf_id == BPF_PTR_POISON) {
verbose(env, "verifier internal error: func %s "
@@ -13499,8 +13608,10 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
BPF_SIZE((insn)->code);
env->prog->aux->num_exentries++;
} else if (resolve_prog_type(env->prog) != BPF_PROG_TYPE_STRUCT_OPS) {
+ /*TODO: Not sure what to do here
verbose(env, "Writes through BTF pointers are not allowed\n");
return -EINVAL;
+ */
}
continue;
default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e174ad28aeb7..1af17b27d34f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -909,6 +909,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
+ BPF_MAP_TYPE_RBTREE,
};
/* Note that tracing related programs such as
@@ -5353,6 +5354,61 @@ union bpf_attr {
* Return
* Current *ktime*.
*
+ * void *bpf_rbtree_alloc_node(struct bpf_map *map, u32 sz)
+ * Description
+ * Allocate a node of size *sz* bytes for use in rbtree *map*.
+ *
+ * *sz* must be >= sizeof(struct rb_node)
+ * Return
+ * A pointer to the allocated node if successful, otherwise NULL.
+ *
+ * The verifier considers the type of the returned pointer to be
+ * the BTF id of *map*'s value type.
+ *
+ * void *bpf_rbtree_add(struct bpf_map *map, void *node, void *cb)
+ * Description
+ * Add *node* to rbtree *map* using *cb* comparator callback to
+ * walk the rbtree.
+ *
+ * *cb* must take (struct rb_node \*, const struct rb_node \*) as
+ * input and return a bool signifying whether the first rb_node's
+ * struct is less than the second.
+ *
+ * Return
+ * On success, returns a pointer to the added node in the rbtree.
+ *
+ * If the add fails, returns NULL.
+ *
+ * void *bpf_rbtree_find(struct bpf_map *map, void *key, void *cb)
+ * Description
+ * Find *key* in rbtree *map* using *cb* comparator callback to walk the
+ * rbtree.
+ *
+ * *cb* must take (const void \*key, const struct rb_node \*n) as
+ * input and return an int: 0 if *cb* determines *n* to match *key*,
+ * a positive int if *key* is greater, and a negative int if *key* is
+ * smaller.
+ *
+ * *key* does not need to be a rbtree node struct.
+ *
+ * Return
+ * Ptr to rbtree node if found, otherwise NULL.
+ *
+ * void *bpf_rbtree_remove(struct bpf_map *map, void *elem)
+ * Description
+ * Remove *elem* from rbtree *map*.
+ *
+ * Return
+ * On success, returns a pointer to the removed node.
+ *
+ * If the remove fails, returns NULL.
+ *
+ * long bpf_rbtree_free_node(struct bpf_map *map, void *elem)
+ * Description
+ * Free an rb_node that is not associated with any tree.
+ *
+ * Return
+ * 0
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5564,6 +5620,11 @@ union bpf_attr {
FN(tcp_raw_check_syncookie_ipv4), \
FN(tcp_raw_check_syncookie_ipv6), \
FN(ktime_get_tai_ns), \
+ FN(rbtree_alloc_node), \
+ FN(rbtree_add), \
+ FN(rbtree_find), \
+ FN(rbtree_remove), \
+ FN(rbtree_free_node), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.30.2
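As a sanity check of the two callback contracts documented in the helper descriptions above, they can be modeled in plain userspace C. The struct layout, field names, and the stub rb_node below are illustrative assumptions only; in a real BPF program the node struct is the map's BTF value type, with struct rb_node required at offset 0 (as enforced by rbtree_map_check_btf):

```c
#include <assert.h>

/* Stand-in for the kernel's struct rb_node; layout details don't matter
 * for modeling the comparator contracts. */
struct rb_node { struct rb_node *left, *right, *parent; };

/* Hypothetical map value type: rb_node must be the first field. */
struct node_data {
	struct rb_node node;
	unsigned int key;
	unsigned int val;
};

/* less() contract for bpf_rbtree_add: return true iff a sorts before b. */
static int node_less(struct rb_node *a, const struct rb_node *b)
{
	struct node_data *na = (struct node_data *)a;
	const struct node_data *nb = (const struct node_data *)b;

	return na->key < nb->key;
}

/* cmp() contract for bpf_rbtree_find: 0 on match; positive if *key is
 * greater than the node's key (search right); negative if smaller. */
static int node_cmp(const void *key, const struct rb_node *n)
{
	unsigned int k = *(const unsigned int *)key;
	const struct node_data *nd = (const struct node_data *)n;

	if (k == nd->key)
		return 0;
	return k > nd->key ? 1 : -1;
}
```

These match the conventions of the kernel's rb_add_cached() and rb_find(), which the helpers forward the callbacks to.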
* [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (3 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
` (12 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Currently libbpf does not allow declaring a struct bpf_spin_lock in
global scope. Attempting to do so results in a "failed to re-mmap" error,
as the .bss arraymap containing the spinlock is not allowed to be mmap'd.
This patch adds support for a .bss.private section. The maps contained
in this section will not be mmap'd into userspace by libbpf, nor will
they be exposed via the bpftool-generated skeleton.
The intent is to allow a more natural programming pattern for
global-scope spinlocks, which will be used by the rbtree locking
mechanism in later patches in this series.
[
RFC Notes:
* Initially I called the section .bss.no_mmap, but the broader
'private' term better indicates that skeleton shouldn't expose these
maps at all, IMO.
* bpftool/gen.c's is_internal_mmapable_map function checks whether the
map flags have BPF_F_MMAPABLE, so no bpftool changes were necessary
to remove .bss.private maps from skeleton
]
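The behavior this patch adds reduces to a small predicate; a userspace C model (function and enum names below are illustrative, not libbpf API):

```c
#include <assert.h>
#include <string.h>

enum model_sec_type { MODEL_SEC_BSS, MODEL_SEC_BSS_PRIVATE, MODEL_SEC_OTHER };

/* Model of bpf_object__elf_collect()'s new classification: an SHT_NOBITS
 * section named ".bss.private" gets its own sec_type. */
static enum model_sec_type model_classify(const char *name)
{
	if (strcmp(name, ".bss.private") == 0)
		return MODEL_SEC_BSS_PRIVATE;
	if (strcmp(name, ".bss") == 0)
		return MODEL_SEC_BSS;
	return MODEL_SEC_OTHER;
}

/* Mirrors the do_mmap flag in bpf_object__init_global_data_maps():
 * every global-data section is mmap'd except .bss.private. */
static int model_do_mmap(enum model_sec_type t)
{
	return t != MODEL_SEC_BSS_PRIVATE;
}
```

On the BPF program side, the pattern this enables would be something like a plain global declaration placed in the private section via SEC(".bss.private") (hypothetical usage).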
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
tools/lib/bpf/libbpf.c | 65 ++++++++++++++++++++++++++++--------------
1 file changed, 44 insertions(+), 21 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3f01f5cd8a4c..a6dd53e0c4b4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -463,6 +463,7 @@ struct bpf_struct_ops {
#define KCONFIG_SEC ".kconfig"
#define KSYMS_SEC ".ksyms"
#define STRUCT_OPS_SEC ".struct_ops"
+#define BSS_SEC_PRIVATE ".bss.private"
enum libbpf_map_type {
LIBBPF_MAP_UNSPEC,
@@ -576,6 +577,7 @@ enum sec_type {
SEC_BSS,
SEC_DATA,
SEC_RODATA,
+ SEC_BSS_PRIVATE,
};
struct elf_sec_desc {
@@ -1578,7 +1580,8 @@ bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map);
static int
bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
- const char *real_name, int sec_idx, void *data, size_t data_sz)
+ const char *real_name, int sec_idx, void *data,
+ size_t data_sz, bool do_mmap)
{
struct bpf_map_def *def;
struct bpf_map *map;
@@ -1606,27 +1609,31 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
def->max_entries = 1;
def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
? BPF_F_RDONLY_PROG : 0;
- def->map_flags |= BPF_F_MMAPABLE;
+ if (do_mmap)
+ def->map_flags |= BPF_F_MMAPABLE;
pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
map->name, map->sec_idx, map->sec_offset, def->map_flags);
- map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_ANONYMOUS, -1, 0);
- if (map->mmaped == MAP_FAILED) {
- err = -errno;
- map->mmaped = NULL;
- pr_warn("failed to alloc map '%s' content buffer: %d\n",
- map->name, err);
- zfree(&map->real_name);
- zfree(&map->name);
- return err;
+ map->mmaped = NULL;
+ if (do_mmap) {
+ map->mmaped = mmap(NULL, bpf_map_mmap_sz(map), PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (map->mmaped == MAP_FAILED) {
+ err = -errno;
+ map->mmaped = NULL;
+ pr_warn("failed to alloc map '%s' content buffer: %d\n",
+ map->name, err);
+ zfree(&map->real_name);
+ zfree(&map->name);
+ return err;
+ }
}
/* failures are fine because of maps like .rodata.str1.1 */
(void) bpf_map_find_btf_info(obj, map);
- if (data)
+ if (do_mmap && data)
memcpy(map->mmaped, data, data_sz);
pr_debug("map %td is \"%s\"\n", map - obj->maps, map->name);
@@ -1638,12 +1645,14 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
struct elf_sec_desc *sec_desc;
const char *sec_name;
int err = 0, sec_idx;
+ bool do_mmap;
/*
* Populate obj->maps with libbpf internal maps.
*/
for (sec_idx = 1; sec_idx < obj->efile.sec_cnt; sec_idx++) {
sec_desc = &obj->efile.secs[sec_idx];
+ do_mmap = true;
/* Skip recognized sections with size 0. */
if (sec_desc->data && sec_desc->data->d_size == 0)
@@ -1655,7 +1664,8 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_DATA,
sec_name, sec_idx,
sec_desc->data->d_buf,
- sec_desc->data->d_size);
+ sec_desc->data->d_size,
+ do_mmap);
break;
case SEC_RODATA:
obj->has_rodata = true;
@@ -1663,14 +1673,18 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_RODATA,
sec_name, sec_idx,
sec_desc->data->d_buf,
- sec_desc->data->d_size);
+ sec_desc->data->d_size,
+ do_mmap);
break;
+ case SEC_BSS_PRIVATE:
+ do_mmap = false;
case SEC_BSS:
sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_BSS,
sec_name, sec_idx,
NULL,
- sec_desc->data->d_size);
+ sec_desc->data->d_size,
+ do_mmap);
break;
default:
/* skip */
@@ -1984,7 +1998,7 @@ static int bpf_object__init_kconfig_map(struct bpf_object *obj)
map_sz = last_ext->kcfg.data_off + last_ext->kcfg.sz;
err = bpf_object__init_internal_map(obj, LIBBPF_MAP_KCONFIG,
".kconfig", obj->efile.symbols_shndx,
- NULL, map_sz);
+ NULL, map_sz, true);
if (err)
return err;
@@ -3428,6 +3442,10 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
sec_desc->sec_type = SEC_BSS;
sec_desc->shdr = sh;
sec_desc->data = data;
+ } else if (sh->sh_type == SHT_NOBITS && strcmp(name, BSS_SEC_PRIVATE) == 0) {
+ sec_desc->sec_type = SEC_BSS_PRIVATE;
+ sec_desc->shdr = sh;
+ sec_desc->data = data;
} else {
pr_info("elf: skipping section(%d) %s (size %zu)\n", idx, name,
(size_t)sh->sh_size);
@@ -3890,6 +3908,7 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
case SEC_BSS:
case SEC_DATA:
case SEC_RODATA:
+ case SEC_BSS_PRIVATE:
return true;
default:
return false;
@@ -3909,6 +3928,7 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
return LIBBPF_MAP_KCONFIG;
switch (obj->efile.secs[shndx].sec_type) {
+ case SEC_BSS_PRIVATE:
case SEC_BSS:
return LIBBPF_MAP_BSS;
case SEC_DATA:
@@ -4889,16 +4909,19 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
{
enum libbpf_map_type map_type = map->libbpf_type;
char *cp, errmsg[STRERR_BUFSIZE];
- int err, zero = 0;
+ int err = 0, zero = 0;
if (obj->gen_loader) {
- bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
- map->mmaped, map->def.value_size);
+ if (map->mmaped)
+ bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
+ map->mmaped, map->def.value_size);
if (map_type == LIBBPF_MAP_RODATA || map_type == LIBBPF_MAP_KCONFIG)
bpf_gen__map_freeze(obj->gen_loader, map - obj->maps);
return 0;
}
- err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
+
+ if (map->mmaped)
+ err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
if (err) {
err = -errno;
cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
--
2.30.2
* [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (4 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
` (11 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch adds a struct bpf_spin_lock *lock member to bpf_rbtree, as
well as a bpf_rbtree_get_lock helper which allows bpf programs to access
the lock.
Ideally the bpf_spin_lock would be created independently outside of the
tree and associated with it before the tree is used, either as part of
the map definition or via some call like rbtree_init(&rbtree, &lock).
Doing this in an ergonomic way is proving harder than expected, so for
now use this workaround.
Why is creating the bpf_spin_lock independently and associating it with
the tree preferable? Because we want to be able to transfer nodes
between trees atomically, and for this to work the same lock must be
associated with both trees.
Further locking-related patches will make it possible for the lock to be
used in BPF programs and add code which enforces that the lock is held
when doing any operation on the tree.
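The atomic-transfer motivation above can be sketched as a userspace C model: a node can only move between two trees in one critical section if both trees reference the same lock. All names here are illustrative, not the kernel API:

```c
#include <assert.h>

struct model_lock { int held; };

struct model_tree {
	struct model_lock *lock;  /* associated lock, possibly shared */
	int nr_nodes;
};

/* Transfer one node from src to dst. This is only atomic when a single
 * lock protects both trees, which is why the lock should be associated
 * with the tree rather than embedded in it. */
static int model_transfer(struct model_tree *src, struct model_tree *dst)
{
	if (src->lock != dst->lock)
		return -1;        /* no single lock covers both trees */
	src->lock->held = 1;      /* lock once ... */
	src->nr_nodes--;          /* ... remove from one tree ... */
	dst->nr_nodes++;          /* ... add to the other ... */
	src->lock->held = 0;      /* ... unlock once */
	return 0;
}
```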
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/uapi/linux/bpf.h | 7 +++++++
kernel/bpf/helpers.c | 3 +++
kernel/bpf/rbtree.c | 24 ++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 7 +++++++
4 files changed, 41 insertions(+)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1af17b27d34f..06d71207de0b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5409,6 +5409,12 @@ union bpf_attr {
* Return
* 0
*
+ * void *bpf_rbtree_get_lock(struct bpf_map *map)
+ * Description
+ * Return the bpf_spin_lock associated with the rbtree
+ *
+ * Return
+ * Ptr to lock
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5625,6 +5631,7 @@ union bpf_attr {
FN(rbtree_find), \
FN(rbtree_remove), \
FN(rbtree_free_node), \
+ FN(rbtree_get_lock), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d18d4d8ca1e2..ae974d0aa70d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1603,6 +1603,7 @@ const struct bpf_func_proto bpf_rbtree_add_proto __weak;
const struct bpf_func_proto bpf_rbtree_find_proto __weak;
const struct bpf_func_proto bpf_rbtree_remove_proto __weak;
const struct bpf_func_proto bpf_rbtree_free_node_proto __weak;
+const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
@@ -1704,6 +1705,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_rbtree_remove_proto;
case BPF_FUNC_rbtree_free_node:
return &bpf_rbtree_free_node_proto;
+ case BPF_FUNC_rbtree_get_lock:
+ return &bpf_rbtree_get_lock_proto;
default:
break;
}
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 7d50574e4d57..0cc495b7cb26 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -10,6 +10,7 @@
struct bpf_rbtree {
struct bpf_map map;
struct rb_root_cached root;
+ struct bpf_spin_lock *lock;
};
static int rbtree_map_alloc_check(union bpf_attr *attr)
@@ -38,6 +39,14 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
tree->root = RB_ROOT_CACHED;
bpf_map_init_from_attr(&tree->map, attr);
+
+ tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!tree->lock) {
+ bpf_map_area_free(tree);
+ return ERR_PTR(-ENOMEM);
+ }
+
return &tree->map;
}
@@ -139,6 +148,7 @@ static void rbtree_map_free(struct bpf_map *map)
bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root)
kfree(pos);
+ kfree(tree->lock);
bpf_map_area_free(tree);
}
@@ -238,6 +248,20 @@ static int rbtree_map_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
+BPF_CALL_1(bpf_rbtree_get_lock, struct bpf_map *, map)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ return (u64)tree->lock;
+}
+
+const struct bpf_func_proto bpf_rbtree_get_lock_proto = {
+ .func = bpf_rbtree_get_lock,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_MAP_VALUE,
+ .arg1_type = ARG_CONST_MAP_PTR,
+};
+
BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree)
const struct bpf_map_ops rbtree_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1af17b27d34f..06d71207de0b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5409,6 +5409,12 @@ union bpf_attr {
* Return
* 0
*
+ * void *bpf_rbtree_get_lock(struct bpf_map *map)
+ * Description
+ * Return the bpf_spin_lock associated with the rbtree
+ *
+ * Return
+ * Ptr to lock
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5625,6 +5631,7 @@ union bpf_attr {
FN(rbtree_find), \
FN(rbtree_remove), \
FN(rbtree_free_node), \
+ FN(rbtree_get_lock), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.30.2
* [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (5 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
` (10 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
These helpers are equivalent to bpf_spin_{lock,unlock}, but the verifier
doesn't try to enforce that no helper calls occur when there's an active
spin lock.
[ TODO: Currently the verifier doesn't do _anything_ spinlock related
when it sees one of these, including setting active_spin_lock. This is
probably too lenient. Also, EXPORT_SYMBOL for internal lock helpers
might not be the best code structure. ]
Future patches will add enforcement of "rbtree helpers must always be
called when lock is held" constraint.
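The save/restore shape of the lock pair can be modeled in plain C: state is captured when the lock is taken and restored when it is released, mirroring the __bpf_spin_lock_irqsave() / __bpf_spin_unlock_irqrestore() pair these helpers wrap. This is a stand-in model, not the kernel implementation:

```c
#include <assert.h>

static unsigned long irq_state = 1;  /* 1 == "interrupts enabled" */
static unsigned long saved_flags;    /* models the per-cpu irqsave_flags */
static int lock_word;                /* models the bpf_spin_lock word */

static void model_rbtree_lock(void)
{
	saved_flags = irq_state;     /* local_irq_save(): save ... */
	irq_state = 0;               /* ... and disable interrupts */
	lock_word = 1;               /* __bpf_spin_lock() */
}

static void model_rbtree_unlock(void)
{
	lock_word = 0;               /* __bpf_spin_unlock() */
	irq_state = saved_flags;     /* local_irq_restore() */
}
```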
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/uapi/linux/bpf.h | 20 ++++++++++++++++++++
kernel/bpf/helpers.c | 12 ++++++++++--
kernel/bpf/rbtree.c | 29 +++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 2 ++
tools/include/uapi/linux/bpf.h | 20 ++++++++++++++++++++
5 files changed, 81 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 06d71207de0b..f4c615fbf64f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5415,6 +5415,24 @@ union bpf_attr {
*
* Return
* Ptr to lock
+ *
+ * void *bpf_rbtree_lock(struct bpf_spin_lock *lock)
+ * Description
+ * Like bpf_spin_lock helper, but use separate helper for now
+ * as we don't want this helper to have special meaning to the verifier
+ * so that we can do rbtree helper calls between rbtree_lock/unlock
+ *
+ * Return
+ * 0
+ *
+ * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock)
+ * Description
+ * Like bpf_spin_unlock helper, but use separate helper for now
+ * as we don't want this helper to have special meaning to the verifier
+ * so that we can do rbtree helper calls between rbtree_lock/unlock
+ *
+ * Return
+ * 0
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5632,6 +5650,8 @@ union bpf_attr {
FN(rbtree_remove), \
FN(rbtree_free_node), \
FN(rbtree_get_lock), \
+ FN(rbtree_lock), \
+ FN(rbtree_unlock), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ae974d0aa70d..0ca5fed1013b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -316,7 +316,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
static DEFINE_PER_CPU(unsigned long, irqsave_flags);
-static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
+inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
{
unsigned long flags;
@@ -324,6 +324,7 @@ static inline void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock)
__bpf_spin_lock(lock);
__this_cpu_write(irqsave_flags, flags);
}
+EXPORT_SYMBOL(__bpf_spin_lock_irqsave);
notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
{
@@ -338,7 +339,7 @@ const struct bpf_func_proto bpf_spin_lock_proto = {
.arg1_type = ARG_PTR_TO_SPIN_LOCK,
};
-static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
+inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
{
unsigned long flags;
@@ -346,6 +347,7 @@ static inline void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock)
__bpf_spin_unlock(lock);
local_irq_restore(flags);
}
+EXPORT_SYMBOL(__bpf_spin_unlock_irqrestore);
notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
{
@@ -1604,6 +1606,8 @@ const struct bpf_func_proto bpf_rbtree_find_proto __weak;
const struct bpf_func_proto bpf_rbtree_remove_proto __weak;
const struct bpf_func_proto bpf_rbtree_free_node_proto __weak;
const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak;
+const struct bpf_func_proto bpf_rbtree_lock_proto __weak;
+const struct bpf_func_proto bpf_rbtree_unlock_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
@@ -1707,6 +1711,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_rbtree_free_node_proto;
case BPF_FUNC_rbtree_get_lock:
return &bpf_rbtree_get_lock_proto;
+ case BPF_FUNC_rbtree_lock:
+ return &bpf_rbtree_lock_proto;
+ case BPF_FUNC_rbtree_unlock:
+ return &bpf_rbtree_unlock_proto;
default:
break;
}
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 0cc495b7cb26..641821ee1a7f 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -262,6 +262,35 @@ const struct bpf_func_proto bpf_rbtree_get_lock_proto = {
.arg1_type = ARG_CONST_MAP_PTR,
};
+extern void __bpf_spin_unlock_irqrestore(struct bpf_spin_lock *lock);
+extern void __bpf_spin_lock_irqsave(struct bpf_spin_lock *lock);
+
+BPF_CALL_1(bpf_rbtree_lock, void *, lock)
+{
+ __bpf_spin_lock_irqsave((struct bpf_spin_lock *)lock);
+ return 0;
+}
+
+const struct bpf_func_proto bpf_rbtree_lock_proto = {
+ .func = bpf_rbtree_lock,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_SPIN_LOCK,
+};
+
+BPF_CALL_1(bpf_rbtree_unlock, void *, lock)
+{
+ __bpf_spin_unlock_irqrestore((struct bpf_spin_lock *)lock);
+ return 0;
+}
+
+const struct bpf_func_proto bpf_rbtree_unlock_proto = {
+ .func = bpf_rbtree_unlock,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_SPIN_LOCK,
+};
+
BTF_ID_LIST_SINGLE(bpf_rbtree_map_btf_ids, struct, bpf_rbtree)
const struct bpf_map_ops rbtree_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0a2e958ddca8..b9e5d87fe323 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6057,6 +6057,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
} else if (meta->func_id == BPF_FUNC_spin_unlock) {
if (process_spin_lock(env, regno, false))
return -EACCES;
+ } else if (meta->func_id == BPF_FUNC_rbtree_lock ||
+ meta->func_id == BPF_FUNC_rbtree_unlock) { // Do nothing for now
} else {
verbose(env, "verifier internal error\n");
return -EFAULT;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 06d71207de0b..f4c615fbf64f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5415,6 +5415,24 @@ union bpf_attr {
*
* Return
* Ptr to lock
+ *
+ * void *bpf_rbtree_lock(struct bpf_spin_lock *lock)
+ * Description
+ * Like bpf_spin_lock helper, but use separate helper for now
+ * as we don't want this helper to have special meaning to the verifier;
+ * this allows rbtree helper calls between rbtree_lock/unlock
+ *
+ * Return
+ * 0
+ *
+ * void *bpf_rbtree_unlock(struct bpf_spin_lock *lock)
+ * Description
+ * Like bpf_spin_unlock helper, but use separate helper for now
+ * as we don't want this helper to have special meaning to the verifier;
+ * this allows rbtree helper calls between rbtree_lock/unlock
+ *
+ * Return
+ * 0
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5632,6 +5650,8 @@ union bpf_attr {
FN(rbtree_remove), \
FN(rbtree_free_node), \
FN(rbtree_get_lock), \
+ FN(rbtree_lock), \
+ FN(rbtree_unlock), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.30.2
* [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find}
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (6 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map Dave Marchevsky
` (9 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
The bpf program calling these helpers must hold the spinlock associated
with the rbtree map when doing so. Otherwise, a concurrent add/remove
operation could corrupt the tree while {add,remove,find} are walking it
with a callback or pivoting after an update.
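The runtime check added here can be modeled in plain userspace C: every tree operation first tests whether the associated lock is held and bails out with NULL otherwise. The sketch below is a toy model with invented names (toy_tree, toy_lock), not the kernel implementation:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the rbtree map's spinlock and the map itself */
struct toy_lock { bool held; };
struct toy_tree {
	struct toy_lock *lock;
	int nr_nodes;
};

/* Mirrors the __rbtree_lock_held() gate: refuse to touch the tree
 * unless the caller already holds the associated lock, returning
 * NULL just like bpf_rbtree_add does on failure. */
static int *toy_tree_add(struct toy_tree *t, int *node)
{
	if (!t->lock->held)
		return NULL;	/* caller forgot to take the lock */
	t->nr_nodes++;
	return node;
}
```

A program that calls toy_tree_add without taking the lock gets NULL back, which is the dynamic analogue of the static enforcement added in later patches of this series.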
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
kernel/bpf/rbtree.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 641821ee1a7f..85a1d35818d0 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -13,6 +13,11 @@ struct bpf_rbtree {
struct bpf_spin_lock *lock;
};
+static bool __rbtree_lock_held(struct bpf_rbtree *tree)
+{
+ return spin_is_locked((spinlock_t *)tree->lock);
+}
+
static int rbtree_map_alloc_check(union bpf_attr *attr)
{
if (attr->max_entries || !attr->btf_value_type_id)
@@ -92,6 +97,9 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
struct rb_node *node = (struct rb_node *)value;
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
if (WARN_ON_ONCE(!RB_EMPTY_NODE(node)))
return (u64)NULL;
@@ -114,6 +122,9 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb)
{
struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
return (u64)rb_find(key, &tree->root.rb_root,
(int (*)(const void *key,
const struct rb_node *))cb);
@@ -206,6 +217,9 @@ BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value)
struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
struct rb_node *node = (struct rb_node *)value;
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
if (WARN_ON_ONCE(RB_EMPTY_NODE(node)))
return (u64)NULL;
--
2.30.2
* [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (7 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held Dave Marchevsky
` (8 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch adds support for association of a bpf_spin_lock in an
internal arraymap with an rbtree, using the following pattern:
struct bpf_spin_lock rbtree_lock SEC(".bss.private");
struct {
__uint(type, BPF_MAP_TYPE_RBTREE);
__type(value, struct node_data);
__array(lock, struct bpf_spin_lock);
} rbtree SEC(".maps") = {
.lock = {
[0] = &rbtree_lock,
},
};
There are a few benefits to this pattern over the existing "init lock as
part of map, use bpf_rbtree_get_lock" logic:
* Multiple rbtrees, potentially in different compilation units, may
share the same lock
* Lock lifetime does not need to be tied to map lifetime, aside from
needing to be strictly longer than the map's lifetime
* Can move from bpf_rbtree_{lock,unlock} to more generic
bpf_{lock,unlock} helpers while still retaining static verification
of locking behavior
* struct bpf_rbtree still keeps lock pointer and this declarative
association guarantees that the pointer address is known at rbtree
map creation time
Implementation notes:
The mechanics of declarative lock association are heavily inspired by
declarative map-in-map inner map definition using __array(values, ...).
Similarly to map-in-map "values", libbpf's bpf_map init_slots is used to
hold the target map, which in the above example is the .bss.private
internal arraymap. However, unlike "values" inner maps, the lock is a
map value within this map, so it's necessary to also pass the offset of
the lock symbol within the internal arraymap.
No uapi changes are necessary as existing BPF_MAP_CREATE map_extra is
used to pass the pair (fd of lock map, offset of symbol within lock map)
to the kernel when creating the rbtree map. rbtree_map_alloc can then
use this pair to find the address of the lock. Logic here is equivalent
to verifier's rewrite of BPF_PSEUDO_MAP_FD in check_ld_imm and
resolve_pseudo_ldimm64 - get actual struct bpf_map using fd, get address
of map's value region using map_direct_value_addr, add offset to get
actual lock addr.
This does introduce a dependency on the internal map containing the lock
("lock map") on both the libbpf and kernel sides. For the former, it's
now necessary to init internal global data maps before user BTF maps, as
the fd of the lock map must be known when the data relo on the rbtree
map is performed. For the latter, the rbtree now holds a refcount on the
lock map to ensure it cannot be freed until after the rbtree map is.
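Since map_extra is a single u64, the (fd, offset) pair travels as two 32-bit halves: libbpf ORs the lock map's fd into the high 32 bits, leaving the relocated symbol offset in the low bits, and rbtree_map_alloc splits them back apart. A standalone sketch of that packing, with illustrative helper names:

```c
#include <stdint.h>

/* Pack (lock map fd, offset of lock symbol within the map) into the
 * u64 map_extra field, mirroring how libbpf assembles it. */
static uint64_t pack_map_extra(uint32_t lock_map_fd, uint32_t lock_offset)
{
	return ((uint64_t)lock_map_fd << 32) | lock_offset;
}

/* Split map_extra back into its halves, as rbtree_map_alloc does. */
static void unpack_map_extra(uint64_t map_extra,
			     uint32_t *lock_map_fd, uint32_t *lock_offset)
{
	*lock_map_fd = (uint32_t)(map_extra >> 32);
	*lock_offset = (uint32_t)map_extra;
}
```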
[
RFC NOTE: This patch doesn't change verifier behavior, it's still doing
dynamic lock checking. Further patches in the series change this
behavior. This patch and the rest of the locking patches should be
squashed, leaving in this state for now to make it easier to include
or toss things piecemeal after feedback
]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
kernel/bpf/rbtree.c | 47 +++++++++++++++++---
kernel/bpf/syscall.c | 8 +++-
tools/lib/bpf/libbpf.c | 99 +++++++++++++++++++++++++++++++++++++-----
3 files changed, 136 insertions(+), 18 deletions(-)
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 85a1d35818d0..c61662822511 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -11,6 +11,7 @@ struct bpf_rbtree {
struct bpf_map map;
struct rb_root_cached root;
struct bpf_spin_lock *lock;
+ struct bpf_map *lock_map;
};
static bool __rbtree_lock_held(struct bpf_rbtree *tree)
@@ -26,10 +27,22 @@ static int rbtree_map_alloc_check(union bpf_attr *attr)
return 0;
}
+static void __rbtree_map_free(struct bpf_rbtree *tree)
+{
+ if (tree->lock_map)
+ bpf_map_put(tree->lock_map);
+ else if (tree->lock)
+ kfree(tree->lock);
+ bpf_map_area_free(tree);
+}
+
static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
{
+ u32 lock_map_ufd, lock_map_offset;
struct bpf_rbtree *tree;
+ u64 lock_map_addr;
int numa_node;
+ int err;
if (!bpf_capable())
return ERR_PTR(-EPERM);
@@ -45,14 +58,35 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
tree->root = RB_ROOT_CACHED;
bpf_map_init_from_attr(&tree->map, attr);
- tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock),
- GFP_KERNEL | __GFP_NOWARN);
- if (!tree->lock) {
- bpf_map_area_free(tree);
- return ERR_PTR(-ENOMEM);
+ if (!attr->map_extra) {
+ tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock),
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!tree->lock) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+ } else {
+ lock_map_ufd = (u32)(attr->map_extra >> 32);
+ lock_map_offset = (u32)attr->map_extra;
+ tree->lock_map = bpf_map_get(lock_map_ufd);
+ if (IS_ERR(tree->lock_map) || !tree->lock_map->ops->map_direct_value_addr) {
+ err = PTR_ERR(tree->lock_map);
+ tree->lock_map = NULL;
+ goto err_free;
+ }
+
+ err = tree->lock_map->ops->map_direct_value_addr(tree->lock_map, &lock_map_addr,
+ lock_map_offset);
+ if (err)
+ goto err_free;
+
+ tree->lock = (struct bpf_spin_lock *)(lock_map_addr + lock_map_offset);
}
return &tree->map;
+err_free:
+ __rbtree_map_free(tree);
+ return ERR_PTR(err);
}
static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz)
@@ -159,8 +193,7 @@ static void rbtree_map_free(struct bpf_map *map)
bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root)
kfree(pos);
- kfree(tree->lock);
- bpf_map_area_free(tree);
+ __rbtree_map_free(tree);
}
static int rbtree_map_check_btf(const struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3947fbd137af..fa1220394462 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1066,6 +1066,12 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
return ret;
}
+static bool map_uses_map_extra(enum bpf_map_type type)
+{
+ return type == BPF_MAP_TYPE_BLOOM_FILTER ||
+ type == BPF_MAP_TYPE_RBTREE;
+}
+
#define BPF_MAP_CREATE_LAST_FIELD map_extra
/* called via syscall */
static int map_create(union bpf_attr *attr)
@@ -1087,7 +1093,7 @@ static int map_create(union bpf_attr *attr)
return -EINVAL;
}
- if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
+ if (!map_uses_map_extra(attr->map_type) &&
attr->map_extra != 0)
return -EINVAL;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a6dd53e0c4b4..10c840137bac 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1429,6 +1429,11 @@ bpf_object__init_kversion(struct bpf_object *obj, void *data, size_t size)
return 0;
}
+static bool bpf_map_type__uses_lock_def(enum bpf_map_type type)
+{
+ return type == BPF_MAP_TYPE_RBTREE;
+}
+
static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
{
if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
@@ -1517,6 +1522,16 @@ static size_t bpf_map_mmap_sz(const struct bpf_map *map)
return map_sz;
}
+static bool internal_map_in_custom_section(const char *real_name)
+{
+ if (strchr(real_name + 1, '.') != NULL) {
+ if (strcmp(real_name, BSS_SEC_PRIVATE) == 0)
+ return false;
+ return true;
+ }
+ return false;
+}
+
static char *internal_map_name(struct bpf_object *obj, const char *real_name)
{
char map_name[BPF_OBJ_NAME_LEN], *p;
@@ -1559,7 +1574,7 @@ static char *internal_map_name(struct bpf_object *obj, const char *real_name)
sfx_len = BPF_OBJ_NAME_LEN - 1;
/* if there are two or more dots in map name, it's a custom dot map */
- if (strchr(real_name + 1, '.') != NULL)
+ if (internal_map_in_custom_section(real_name))
pfx_len = 0;
else
pfx_len = min((size_t)BPF_OBJ_NAME_LEN - sfx_len - 1, strlen(obj->name));
@@ -2331,10 +2346,23 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
} else if (strcmp(name, "map_extra") == 0) {
__u32 map_extra;
+ if (bpf_map_type__uses_lock_def(map_def->map_type)) {
+ pr_warn("map '%s': can't set map_extra for map using 'lock' def.\n",
+ map_name);
+ return -EINVAL;
+ }
+
if (!get_map_field_int(map_name, btf, m, &map_extra))
return -EINVAL;
map_def->map_extra = map_extra;
map_def->parts |= MAP_DEF_MAP_EXTRA;
+ } else if (strcmp(name, "lock") == 0) {
+ if (!bpf_map_type__uses_lock_def(map_def->map_type)) {
+ pr_warn("map '%s': can't set 'lock' for map.\n", map_name);
+ return -ENOTSUP;
+ }
+ /* TODO: More sanity checking
+ */
} else {
if (strict) {
pr_warn("map '%s': unknown field '%s'.\n", map_name, name);
@@ -2603,8 +2631,8 @@ static int bpf_object__init_maps(struct bpf_object *obj,
strict = !OPTS_GET(opts, relaxed_maps, false);
pin_root_path = OPTS_GET(opts, pin_root_path, NULL);
- err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path);
err = err ?: bpf_object__init_global_data_maps(obj);
+ err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path);
err = err ?: bpf_object__init_kconfig_map(obj);
err = err ?: bpf_object__init_struct_ops_maps(obj);
@@ -4865,6 +4893,25 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
map_info.map_extra == map->map_extra);
}
+static struct bpf_map *find_internal_map_by_shndx(struct bpf_object *obj,
+ int shndx)
+{
+ struct bpf_map *map;
+ int i;
+
+ for (i = 0; i < obj->nr_maps; i++) {
+ map = &obj->maps[i];
+
+ if (!bpf_map__is_internal(map))
+ continue;
+
+ if (map->sec_idx == shndx)
+ return map;
+ }
+
+ return NULL;
+}
+
static int
bpf_object__reuse_map(struct bpf_map *map)
{
@@ -4981,6 +5028,19 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
}
if (map->inner_map_fd >= 0)
create_attr.inner_map_fd = map->inner_map_fd;
+ } else if (bpf_map_type__uses_lock_def(def->type)) {
+ if (map->init_slots_sz != 1) {
+ pr_warn("map '%s': expecting single lock def, actual count %d\n",
+ map->name, map->init_slots_sz);
+ return -EINVAL;
+ }
+
+ if (bpf_map__fd(map->init_slots[0]) < 0) {
+ pr_warn("map '%s': failed to find lock map fd\n", map->name);
+ return -EINVAL;
+ }
+
+ create_attr.map_extra |= (__u64)bpf_map__fd(map->init_slots[0]) << 32;
}
switch (def->type) {
@@ -5229,8 +5289,7 @@ bpf_object__create_maps(struct bpf_object *obj)
goto err_out;
}
}
-
- if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) {
+ if (map->init_slots_sz && bpf_map_type__is_map_in_map(map->def.type)) {
err = init_map_in_map_slots(obj, map);
if (err < 0) {
zclose(map->fd);
@@ -6424,9 +6483,9 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
const struct btf_type *sec, *var, *def;
struct bpf_map *map = NULL, *targ_map = NULL;
struct bpf_program *targ_prog = NULL;
- bool is_prog_array, is_map_in_map;
const struct btf_member *member;
const char *name, *mname, *type;
+ bool is_prog_array;
unsigned int moff;
Elf64_Sym *sym;
Elf64_Rel *rel;
@@ -6474,10 +6533,12 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
return -EINVAL;
}
- is_map_in_map = bpf_map_type__is_map_in_map(map->def.type);
+ /* PROG_ARRAY passes prog pointers using init_slots, other map
+ * types pass map pointers
+ */
is_prog_array = map->def.type == BPF_MAP_TYPE_PROG_ARRAY;
- type = is_map_in_map ? "map" : "prog";
- if (is_map_in_map) {
+ type = is_prog_array ? "prog" : "map";
+ if (bpf_map_type__is_map_in_map(map->def.type)) {
if (sym->st_shndx != obj->efile.btf_maps_shndx) {
pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n",
i, name);
@@ -6509,6 +6570,24 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
i, name);
return -LIBBPF_ERRNO__RELOC;
}
+ } else if (bpf_map_type__uses_lock_def(map->def.type)) {
+ targ_map = find_internal_map_by_shndx(obj, sym->st_shndx);
+ if (!targ_map) {
+ pr_warn(".maps relo #%d: '%s' isn't a valid map reference\n",
+ i, name);
+ return -LIBBPF_ERRNO__RELOC;
+ }
+
+ /* This shouldn't happen, check in parse_btf_map_def
+ * should catch this, but to be safe let's prevent
+ * map_extra overwrite
+ */
+ if (map->map_extra) {
+ pr_warn(".maps rbtree relo #%d: map '%s' has ", i, map->name);
+ pr_warn("map_extra, can't relo lock, internal error.\n");
+ return -EINVAL;
+ }
+ map->map_extra = sym->st_value;
} else {
return -EINVAL;
}
@@ -6519,7 +6598,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
return -EINVAL;
member = btf_members(def) + btf_vlen(def) - 1;
mname = btf__name_by_offset(obj->btf, member->name_off);
- if (strcmp(mname, "values"))
+ if (strcmp(mname, "values") && strcmp(mname, "lock"))
return -EINVAL;
moff = btf_member_bit_offset(def, btf_vlen(def) - 1) / 8;
@@ -6543,7 +6622,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
(new_sz - map->init_slots_sz) * host_ptr_sz);
map->init_slots_sz = new_sz;
}
- map->init_slots[moff] = is_map_in_map ? (void *)targ_map : (void *)targ_prog;
+ map->init_slots[moff] = is_prog_array ? (void *)targ_prog : (void *)targ_map;
pr_debug(".maps relo #%d: map '%s' slot [%d] points to %s '%s'\n",
i, map->name, moff, type, name);
--
2.30.2
* [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (8 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification Dave Marchevsky
` (7 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch teaches the verifier that rbtree_{lock,unlock} interact with
locks and eases use of rbtree_lock(&some_lock) where &some_lock is in a
global internal map.
The verifier now tracks the lock id for rbtree_{lock,unlock} and
understands which lock can / must be held when calling the other rbtree
helpers.
Logic is also added to ease this pattern:
/* internal map */
struct bpf_spin_lock lock SEC(".bss.private");
/* In bpf prog */
rbtree_lock(&lock);
/* ... use all registers for other work */
rbtree_unlock(&lock);
In the above example, the prog compiles down to something like:
r1 = 0x12345
call rbtree_lock
/* begin other work
r1 = some_unrelated_value
r2 = r1 or similar never happens
*/
r1 = 0x12345
call rbtree_unlock
Each time "r1 = 0x12345" happens, the verifier's check_ld_imm will
assign a new id to the lock, which will result in it later complaining
that the incorrect lock is being unlocked when checking rbtree_unlock.
To help with this pattern, bpf_verifier_state now has a
maybe_active_spin_lock_addr field. If this field is nonzero and
bpf_verifier_state's active_spin_lock is also nonzero, then
maybe_active_spin_lock_addr contains the address of the active spin lock
(corresponding to active_spin_lock's id). This allows the verifier to
avoid assigning a new lock id when it sees the second "r1 = 0x12345",
since it can recognize that the address matches an existing lock id.
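Stripped of verifier plumbing, the dedup logic amounts to: while a lock is held and the remembered address matches the new ld_imm address, hand back the existing id; otherwise remember the address (if no lock is held) and mint a fresh id. A simplified userspace model of ld_imm_lock_id_gen, with invented struct names:

```c
#include <stdint.h>
#include <stddef.h>

struct toy_vstate {
	uint32_t active_spin_lock;	/* 0 means no lock currently held */
	const void *maybe_lock_addr;	/* address of that lock, if known */
	uint32_t id_gen;
};

/* Return the lock id for an ld_imm of address @imm: reuse the active
 * lock's id when the address matches, otherwise mint a new id. */
static uint32_t toy_ld_imm_lock_id(struct toy_vstate *st, const void *imm)
{
	if (st->active_spin_lock && st->maybe_lock_addr == imm)
		return st->active_spin_lock;

	if (!st->active_spin_lock)
		st->maybe_lock_addr = imm;
	return ++st->id_gen;
}
```

With this in place, the second "r1 = 0x12345" in the motivating example resolves to the same id as the first, so rbtree_unlock sees the lock it expects.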
[ RFC Notes:
* rbtree_process_spin_lock should be merged w/ normal
process_spin_lock, same with rbtree_lock and normal lock helpers.
Left separate for now to highlight the logic differences.
* The hacky maybe_active_spin_lock_addr logic can be improved by
adding support to a custom .lock section similar to existing use of
.bss.private. The new section type would function like .bss.private,
but the verifier would know that locks in .lock are likely to be
used like bpf_spin_lock(&lock), and could track the address of each
map value for deduping, instead of just tracking single address. For
multiple-lock scenario this is probably necessary.
]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 2 +
include/linux/bpf_verifier.h | 1 +
kernel/bpf/rbtree.c | 2 +-
kernel/bpf/verifier.c | 136 ++++++++++++++++++++++++++++++++---
4 files changed, 129 insertions(+), 12 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b4a44ffb0d6c..d6458aa7b79c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -497,6 +497,7 @@ enum bpf_return_type {
RET_PTR_TO_ALLOC_MEM, /* returns a pointer to dynamically allocated memory */
RET_PTR_TO_MEM_OR_BTF_ID, /* returns a pointer to a valid memory or a btf_id */
RET_PTR_TO_BTF_ID, /* returns a pointer to a btf_id */
+ RET_PTR_TO_SPIN_LOCK, /* returns a pointer to a struct bpf_spin_lock */
__BPF_RET_TYPE_MAX,
/* Extended ret_types. */
@@ -612,6 +613,7 @@ enum bpf_reg_type {
PTR_TO_MEM, /* reg points to valid memory region */
PTR_TO_BUF, /* reg points to a read/write buffer */
PTR_TO_FUNC, /* reg points to a bpf program function */
+ PTR_TO_SPIN_LOCK, /* reg points to a struct bpf_spin_lock */
__BPF_REG_TYPE_MAX,
/* Extended reg_types. */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 9c017575c034..f81638844a4d 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -313,6 +313,7 @@ struct bpf_verifier_state {
u32 insn_idx;
u32 curframe;
u32 active_spin_lock;
+ void *maybe_active_spin_lock_addr;
bool speculative;
/* first and last insn idx of this verifier state */
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index c61662822511..0821e841a518 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -305,7 +305,7 @@ BPF_CALL_1(bpf_rbtree_get_lock, struct bpf_map *, map)
const struct bpf_func_proto bpf_rbtree_get_lock_proto = {
.func = bpf_rbtree_get_lock,
.gpl_only = true,
- .ret_type = RET_PTR_TO_MAP_VALUE,
+ .ret_type = RET_PTR_TO_SPIN_LOCK,
.arg1_type = ARG_CONST_MAP_PTR,
};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b9e5d87fe323..f8ba381f1327 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -452,8 +452,9 @@ static bool reg_type_not_null(enum bpf_reg_type type)
static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
{
- return reg->type == PTR_TO_MAP_VALUE &&
- map_value_has_spin_lock(reg->map_ptr);
+ return (reg->type == PTR_TO_MAP_VALUE &&
+ map_value_has_spin_lock(reg->map_ptr)) ||
+ reg->type == PTR_TO_SPIN_LOCK;
}
static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type)
@@ -507,6 +508,34 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
func_id == BPF_FUNC_skc_to_tcp_request_sock;
}
+/* These functions can only be called when spinlock associated with rbtree
+ * is held. If they have a callback argument, that callback is not required
+ * to release active_spin_lock before exiting
+ */
+static bool is_rbtree_lock_required_function(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_rbtree_add ||
+ func_id == BPF_FUNC_rbtree_remove ||
+ func_id == BPF_FUNC_rbtree_find ||
+ func_id == BPF_FUNC_rbtree_unlock;
+}
+
+/* These functions are OK to call when spinlock associated with rbtree
+ * is held.
+ */
+static bool is_rbtree_lock_ok_function(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_rbtree_alloc_node ||
+ func_id == BPF_FUNC_rbtree_free_node ||
+ is_rbtree_lock_required_function(func_id);
+}
+
+static bool is_lock_allowed_function(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_spin_unlock ||
+ is_rbtree_lock_ok_function(func_id);
+}
+
static bool is_dynptr_ref_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_dynptr_data;
@@ -579,6 +608,7 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
[PTR_TO_BUF] = "buf",
[PTR_TO_FUNC] = "func",
[PTR_TO_MAP_KEY] = "map_key",
+ [PTR_TO_SPIN_LOCK] = "spin_lock",
};
if (type & PTR_MAYBE_NULL) {
@@ -1199,6 +1229,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
dst_state->speculative = src->speculative;
dst_state->curframe = src->curframe;
dst_state->active_spin_lock = src->active_spin_lock;
+ dst_state->maybe_active_spin_lock_addr = src->maybe_active_spin_lock_addr;
dst_state->branches = src->branches;
dst_state->parent = src->parent;
dst_state->first_insn_idx = src->first_insn_idx;
@@ -5471,6 +5502,35 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
return -EINVAL;
}
cur->active_spin_lock = 0;
+ cur->maybe_active_spin_lock_addr = 0;
+ }
+ return 0;
+}
+
+static int rbtree_process_spin_lock(struct bpf_verifier_env *env, int regno,
+ bool is_lock)
+{
+ struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+ struct bpf_verifier_state *cur = env->cur_state;
+
+ if (is_lock) {
+ if (cur->active_spin_lock) {
+ verbose(env,
+ "Locking two bpf_spin_locks are not allowed\n");
+ return -EINVAL;
+ }
+ cur->active_spin_lock = reg->id;
+ } else {
+ if (!cur->active_spin_lock) {
+ verbose(env, "rbtree_spin_unlock without taking a lock\n");
+ return -EINVAL;
+ }
+ if (cur->active_spin_lock != reg->id) {
+ verbose(env, "rbtree_spin_unlock of different lock\n");
+ return -EINVAL;
+ }
+ cur->active_spin_lock = 0;
+ cur->maybe_active_spin_lock_addr = 0;
}
return 0;
}
@@ -5686,12 +5746,18 @@ static const struct bpf_reg_types int_ptr_types = {
},
};
+static const struct bpf_reg_types spin_lock_types = {
+ .types = {
+ PTR_TO_MAP_VALUE,
+ PTR_TO_SPIN_LOCK
+ },
+};
+
static const struct bpf_reg_types fullsock_types = { .types = { PTR_TO_SOCKET } };
static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
-static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
@@ -6057,8 +6123,12 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
} else if (meta->func_id == BPF_FUNC_spin_unlock) {
if (process_spin_lock(env, regno, false))
return -EACCES;
- } else if (meta->func_id == BPF_FUNC_rbtree_lock ||
- meta->func_id == BPF_FUNC_rbtree_unlock) { // Do nothing for now
+ } else if (meta->func_id == BPF_FUNC_rbtree_lock) {
+ if (rbtree_process_spin_lock(env, regno, true))
+ return -EACCES;
+ } else if (meta->func_id == BPF_FUNC_rbtree_unlock) {
+ if (rbtree_process_spin_lock(env, regno, false))
+ return -EACCES;
} else {
verbose(env, "verifier internal error\n");
return -EFAULT;
@@ -6993,6 +7063,29 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
return 0;
}
+/* Are we currently verifying the callback for a rbtree helper that must
+ * be called with lock held? If so, no need to complain about unreleased
+ * lock
+ */
+static bool in_rbtree_lock_required_cb(struct bpf_verifier_env *env)
+{
+ struct bpf_verifier_state *state = env->cur_state;
+ struct bpf_insn *insn = env->prog->insnsi;
+ struct bpf_func_state *callee;
+ int func_id;
+
+ if (!state->curframe)
+ return false;
+
+ callee = state->frame[state->curframe];
+
+ if (!callee->in_callback_fn)
+ return false;
+
+ func_id = insn[callee->callsite].imm;
+ return is_rbtree_lock_required_function(func_id);
+}
+
static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
{
struct bpf_verifier_state *state = env->cur_state;
@@ -7508,6 +7601,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
regs[BPF_REG_0].id = ++env->id_gen;
}
break;
+ case RET_PTR_TO_SPIN_LOCK:
+ mark_reg_known_zero(env, regs, BPF_REG_0);
+ regs[BPF_REG_0].type = PTR_TO_SPIN_LOCK | ret_flag;
+ regs[BPF_REG_0].id = ++env->id_gen;
+ break;
case RET_PTR_TO_SOCKET:
mark_reg_known_zero(env, regs, BPF_REG_0);
regs[BPF_REG_0].type = PTR_TO_SOCKET | ret_flag;
@@ -10366,6 +10464,20 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
return 0;
}
+static unsigned int ld_imm_lock_id_gen(struct bpf_verifier_env *env,
+ void *imm)
+{
+ struct bpf_verifier_state *cur = env->cur_state;
+
+ if (cur->active_spin_lock && cur->maybe_active_spin_lock_addr &&
+ cur->maybe_active_spin_lock_addr == imm)
+ return cur->active_spin_lock;
+
+ if (!cur->active_spin_lock)
+ cur->maybe_active_spin_lock_addr = imm;
+ return ++env->id_gen;
+}
+
/* verify BPF_LD_IMM64 instruction */
static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
{
@@ -10373,6 +10485,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_reg_state *dst_reg;
struct bpf_map *map;
+ u64 imm;
int err;
if (BPF_SIZE(insn->code) != BPF_DW) {
@@ -10390,7 +10503,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
dst_reg = &regs[insn->dst_reg];
if (insn->src_reg == 0) {
- u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
+ imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
dst_reg->type = SCALAR_VALUE;
__mark_reg_known(&regs[insn->dst_reg], imm);
@@ -10441,13 +10554,14 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
map = env->used_maps[aux->map_index];
dst_reg->map_ptr = map;
-
if (insn->src_reg == BPF_PSEUDO_MAP_VALUE ||
insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
dst_reg->type = PTR_TO_MAP_VALUE;
dst_reg->off = aux->map_off;
- if (map_value_has_spin_lock(map))
- dst_reg->id = ++env->id_gen;
+ if (map_value_has_spin_lock(map)) {
+ imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
+ dst_reg->id = ld_imm_lock_id_gen(env, (void *)imm);
+ }
} else if (insn->src_reg == BPF_PSEUDO_MAP_FD ||
insn->src_reg == BPF_PSEUDO_MAP_IDX) {
dst_reg->type = CONST_PTR_TO_MAP;
@@ -12432,7 +12546,7 @@ static int do_check(struct bpf_verifier_env *env)
if (env->cur_state->active_spin_lock &&
(insn->src_reg == BPF_PSEUDO_CALL ||
- insn->imm != BPF_FUNC_spin_unlock)) {
+ !is_lock_allowed_function(insn->imm))) {
verbose(env, "function calls are not allowed while holding a lock\n");
return -EINVAL;
}
@@ -12467,7 +12581,7 @@ static int do_check(struct bpf_verifier_env *env)
return -EINVAL;
}
- if (env->cur_state->active_spin_lock) {
+ if (state->active_spin_lock && !in_rbtree_lock_required_cb(env)) {
verbose(env, "bpf_spin_unlock is missing\n");
return -EINVAL;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (9 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
` (6 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch builds on the previous few locking patches, teaching the
verifier to check whether the rbtree's lock is held.
[ RFC Notes:
* After this patch, helpers that can only fail the dynamic lock check
could be changed to always succeed, likely allowing us to get rid of
CONDITIONAL_RELEASE logic. But since dynamic lock checking is almost
certainly going to be necessary regardless, I didn't do so.
]
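The dynamic check added here boils down to comparing the lock address the
verifier saw being taken against the lock embedded in the rbtree map. A
plain-C sketch of that logic (struct layouts and names below are simplified
stand-ins for the kernel's, not the actual definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structs touched by this patch. */
struct bpf_map;

struct bpf_map_ops {
	bool (*map_lock_held)(struct bpf_map *map, void *current_lock);
};

struct bpf_map {
	const struct bpf_map_ops *ops;
	void *lock;	/* models the spinlock embedded in struct bpf_rbtree */
};

struct verifier_state {
	unsigned int active_spin_lock;		/* nonzero while a lock is held */
	void *maybe_active_spin_lock_addr;	/* address the program locked */
};

/* Models rbtree_map_lock_held(): the held lock must be this tree's lock. */
static bool rbtree_map_lock_held(struct bpf_map *map, void *current_lock)
{
	return map->lock == current_lock;
}

/* Models check_rbtree_lock_held(): reject rbtree_{add,remove,find} calls
 * unless some lock is held AND it is the one belonging to this map. */
static int check_rbtree_lock_held(struct verifier_state *cur,
				  struct bpf_map *map)
{
	if (!map || !map->ops || !map->ops->map_lock_held)
		return -1;
	if (!cur->active_spin_lock || !cur->maybe_active_spin_lock_addr)
		return -1;
	if (!map->ops->map_lock_held(map, cur->maybe_active_spin_lock_addr))
		return -1;
	return 0;
}
```

Holding no lock, or a lock from a different rbtree, fails the check even
though a spinlock is formally "active" in the second case.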
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/rbtree.c | 8 ++++++++
kernel/bpf/verifier.c | 43 +++++++++++++++++++++++++++++++++++++++----
3 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d6458aa7b79c..b762c6b3dcfb 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -155,6 +155,8 @@ struct bpf_map_ops {
bpf_callback_t callback_fn,
void *callback_ctx, u64 flags);
+ bool (*map_lock_held)(struct bpf_map *map, void *current_lock);
+
/* BTF id of struct allocated by map_alloc */
int *map_btf_id;
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 0821e841a518..b5d158254de6 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -245,6 +245,13 @@ static int rbtree_map_delete_elem(struct bpf_map *map, void *value)
return -ENOTSUPP;
}
+static bool rbtree_map_lock_held(struct bpf_map *map, void *current_lock)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ return tree->lock == current_lock;
+}
+
BPF_CALL_2(bpf_rbtree_remove, struct bpf_map *, map, void *, value)
{
struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
@@ -353,4 +360,5 @@ const struct bpf_map_ops rbtree_map_ops = {
.map_delete_elem = rbtree_map_delete_elem,
.map_check_btf = rbtree_map_check_btf,
.map_btf_id = &bpf_rbtree_map_btf_ids[0],
+ .map_lock_held = rbtree_map_lock_held,
};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f8ba381f1327..3c9af1047d80 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -508,16 +508,25 @@ static bool is_ptr_cast_function(enum bpf_func_id func_id)
func_id == BPF_FUNC_skc_to_tcp_request_sock;
}
+/* These functions can only be called when spinlock associated with rbtree
+ * is held. They all take struct bpf_map ptr to rbtree as their first argument,
+ * so we can verify that the correct lock is held before loading program
+ */
+static bool is_rbtree_lock_check_function(enum bpf_func_id func_id)
+{
+ return func_id == BPF_FUNC_rbtree_add ||
+ func_id == BPF_FUNC_rbtree_remove ||
+ func_id == BPF_FUNC_rbtree_find;
+}
+
/* These functions can only be called when spinlock associated with rbtree
* is held. If they have a callback argument, that callback is not required
* to release active_spin_lock before exiting
*/
static bool is_rbtree_lock_required_function(enum bpf_func_id func_id)
{
- return func_id == BPF_FUNC_rbtree_add ||
- func_id == BPF_FUNC_rbtree_remove ||
- func_id == BPF_FUNC_rbtree_find ||
- func_id == BPF_FUNC_rbtree_unlock;
+ return func_id == BPF_FUNC_rbtree_unlock ||
+ is_rbtree_lock_check_function(func_id);
}
/* These functions are OK to call when spinlock associated with rbtree
@@ -7354,6 +7363,26 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
state->callback_subprogno == subprogno);
}
+static int check_rbtree_lock_held(struct bpf_verifier_env *env,
+ struct bpf_map *map)
+{
+ struct bpf_verifier_state *cur = env->cur_state;
+
+ if (!map)
+ return -1;
+
+ if (!cur->active_spin_lock || !cur->maybe_active_spin_lock_addr ||
+ !map || !map->ops->map_lock_held)
+ return -1;
+
+ if (!map->ops->map_lock_held(map, cur->maybe_active_spin_lock_addr)) {
+ verbose(env, "active spin lock doesn't match rbtree's lock\n");
+ return -1;
+ }
+
+ return 0;
+}
+
static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
int *insn_idx_p)
{
@@ -7560,6 +7589,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (err)
return err;
+ if (is_rbtree_lock_check_function(func_id) &&
+ check_rbtree_lock_held(env, meta.map_ptr)) {
+ verbose(env, "lock associated with rbtree is not held\n");
+ return -EINVAL;
+ }
+
/* reset caller saved regs */
for (i = 0; i < CALLER_SAVED_REGS; i++) {
mark_reg_not_init(env, regs, caller_saved[i]);
--
2.30.2
* [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (10 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
` (5 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Consider a pointer to a type that would normally need acquire / release
semantics to be safely held. There may be scenarios where such a pointer
can be safely held without the need to acquire a reference.
For example, although a PTR_TO_BTF_ID for a rbtree_map node is released
via bpf_rbtree_add helper, the helper doesn't change the address of the
node and must be called with the rbtree_map's spinlock held. Since the
only way to remove a node from the rbtree - bpf_rbtree_remove helper -
requires the same lock, the newly-added node cannot be removed by a
concurrently-running program until the lock is released. Therefore it is
safe to hold a reference to this node until bpf_rbtree_unlock is called.
This patch introduces a new type flag and associated verifier logic to
handle such "non-owning" references.
Currently the only usecase I have is the rbtree example above, so the
verifier logic is straightforward:
* Tag return types of bpf_rbtree_{add,find} with OBJ_NON_OWNING_REF
* These both require the rbtree lock to be held to return anything
non-NULL
* Since ret type for both is PTR_TO_BTF_ID_OR_NULL, if lock is not
held and NULL is returned, existing mark_ptr_or_null_reg logic
will clear reg type.
* So if mark_ptr_or_null_reg logic turns the returned reg into a
PTR_TO_BTF_ID | OBJ_NON_OWNING_REF, verifier knows lock is held.
* When the lock is released the verifier invalidates any regs holding
non owning refs similarly to existing release_reference logic - but no
need to clear ref_obj_id as an 'owning' reference was never acquired.
[ TODO: Currently the invalidation logic in
clear_rbtree_node_non_owning_refs is not parametrized by map so
unlocking any rbtree lock will invalidate all non-owning refs ]
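The invalidation step on unlock can be modeled in a few lines of plain C.
The flag value and MAX_BPF_REG follow the patch, but the reg representation
is a simplified stand-in:

```c
#include <assert.h>
#include <stdint.h>

#define BPF_BASE_TYPE_BITS	8	/* assumed width for this sketch */
#define OBJ_NON_OWNING_REF	(1U << (11 + BPF_BASE_TYPE_BITS))
#define PTR_TO_BTF_ID		12	/* arbitrary base-type value here */
#define SCALAR_VALUE		1	/* stands in for mark_reg_unknown */
#define MAX_BPF_REG		11

struct reg_state {
	uint32_t type;
};

/* Models clear_non_owning_ref_regs(): once bpf_rbtree_unlock runs, any
 * register still tagged OBJ_NON_OWNING_REF is no longer safe to use, so
 * it is downgraded to an unknown scalar. Other regs are untouched. */
static void clear_non_owning_ref_regs(struct reg_state *regs)
{
	for (int i = 0; i < MAX_BPF_REG; i++)
		if (regs[i].type & OBJ_NON_OWNING_REF)
			regs[i].type = SCALAR_VALUE;
}
```

Note this sweep is exactly why no ref_obj_id bookkeeping is needed: there
was never an owning reference to release, only tagged pointer types.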
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 1 +
kernel/bpf/rbtree.c | 4 +--
kernel/bpf/verifier.c | 65 +++++++++++++++++++++++++++++++++++++++----
3 files changed, 62 insertions(+), 8 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b762c6b3dcfb..f164bd6e2f3a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -415,6 +415,7 @@ enum bpf_type_flag {
/* Size is known at compile time. */
MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
+ OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS),
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index b5d158254de6..cc89639df8a2 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -144,7 +144,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
const struct bpf_func_proto bpf_rbtree_add_proto = {
.func = bpf_rbtree_add,
.gpl_only = true,
- .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
.ret_btf_id = BPF_PTR_POISON,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
@@ -167,7 +167,7 @@ BPF_CALL_3(bpf_rbtree_find, struct bpf_map *, map, void *, key, void *, cb)
const struct bpf_func_proto bpf_rbtree_find_proto = {
.func = bpf_rbtree_find,
.gpl_only = true,
- .ret_type = RET_PTR_TO_BTF_ID_OR_NULL,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
.ret_btf_id = BPF_PTR_POISON,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_ANYTHING,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c9af1047d80..26aa228fa860 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -469,6 +469,11 @@ static bool type_is_rdonly_mem(u32 type)
return type & MEM_RDONLY;
}
+static bool type_is_non_owning_ref(u32 type)
+{
+ return type & OBJ_NON_OWNING_REF;
+}
+
static bool type_may_be_null(u32 type)
{
return type & PTR_MAYBE_NULL;
@@ -595,7 +600,9 @@ static bool function_returns_rbtree_node(enum bpf_func_id func_id)
static const char *reg_type_str(struct bpf_verifier_env *env,
enum bpf_reg_type type)
{
- char postfix[16] = {0}, prefix[32] = {0};
+ char postfix[32] = {0}, prefix[32] = {0};
+ unsigned int postfix_idx = 0;
+
static const char * const str[] = {
[NOT_INIT] = "?",
[SCALAR_VALUE] = "scalar",
@@ -620,11 +627,18 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
[PTR_TO_SPIN_LOCK] = "spin_lock",
};
- if (type & PTR_MAYBE_NULL) {
+ if (type_may_be_null(type)) {
if (base_type(type) == PTR_TO_BTF_ID)
- strncpy(postfix, "or_null_", 16);
+ postfix_idx += strlcpy(postfix + postfix_idx, "or_null_", 32 - postfix_idx);
else
- strncpy(postfix, "_or_null", 16);
+ postfix_idx += strlcpy(postfix + postfix_idx, "_or_null", 32 - postfix_idx);
+ }
+
+ if (type_is_non_owning_ref(type)) {
+ if (base_type(type) == PTR_TO_BTF_ID)
+ postfix_idx += strlcpy(postfix + postfix_idx, "non_own_", 32 - postfix_idx);
+ else
+ postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx);
}
if (type & MEM_RDONLY)
@@ -5758,7 +5772,14 @@ static const struct bpf_reg_types int_ptr_types = {
static const struct bpf_reg_types spin_lock_types = {
.types = {
PTR_TO_MAP_VALUE,
- PTR_TO_SPIN_LOCK
+ PTR_TO_SPIN_LOCK,
+ },
+};
+
+static const struct bpf_reg_types btf_ptr_types = {
+ .types = {
+ PTR_TO_BTF_ID,
+ PTR_TO_BTF_ID | OBJ_NON_OWNING_REF,
},
};
@@ -5767,7 +5788,6 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
@@ -6723,6 +6743,33 @@ static int release_reference(struct bpf_verifier_env *env,
return 0;
}
+static void clear_non_owning_ref_regs(struct bpf_verifier_env *env,
+ struct bpf_func_state *state)
+{
+ struct bpf_reg_state *regs = state->regs, *reg;
+ int i;
+
+ for (i = 0; i < MAX_BPF_REG; i++)
+ if (type_is_non_owning_ref(regs[i].type))
+ mark_reg_unknown(env, regs, i);
+
+ bpf_for_each_spilled_reg(i, state, reg) {
+ if (!reg)
+ continue;
+ if (type_is_non_owning_ref(reg->type))
+ __mark_reg_unknown(env, reg);
+ }
+}
+
+static void clear_rbtree_node_non_owning_refs(struct bpf_verifier_env *env)
+{
+ struct bpf_verifier_state *vstate = env->cur_state;
+ int i;
+
+ for (i = 0; i <= vstate->curframe; i++)
+ clear_non_owning_ref_regs(env, vstate->frame[i]);
+}
+
static void clear_caller_saved_regs(struct bpf_verifier_env *env,
struct bpf_reg_state *regs)
{
@@ -7584,6 +7631,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EFAULT;
}
break;
+ case BPF_FUNC_rbtree_unlock:
+ /* TODO clear_rbtree_node_non_owning_refs calls should be
+ * parametrized by base_type or ideally by owning map
+ */
+ clear_rbtree_node_non_owning_refs(env);
+ break;
}
if (err)
--
2.30.2
* [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE type flag
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (11 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Dave Marchevsky
` (4 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Currently if a helper proto has an arg with OBJ_RELEASE flag, the verifier
assumes the 'release' logic in the helper will always succeed, and
therefore always clears the arg reg, other regs w/ same
ref_obj_id, and releases the reference. This poses a problem for
'release' helpers which may not always succeed.
For example, bpf_rbtree_add will fail to add the passed-in node to a
bpf_rbtree if the rbtree's lock is not held when the helper is called.
In this case the helper returns NULL and the calling bpf program must
release the node in another way before terminating to avoid leaking
memory. An example of such logic:
1 struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...);
2 struct node_data *ret = bpf_rbtree_add(&rbtree, node);
3 if (!ret) {
4 bpf_rbtree_free_node(node);
5 return 0;
6 }
7 bpf_trace_printk("%d\n", ret->id);
However, current verifier logic does not allow this: after line 2, the
ref_obj_id of the reg holding 'node' (BPF_REG_2) will be released via
the release_reference function, which marks BPF_REG_2 and any other reg
with the same ref_obj_id as unknown. As a result, neither ret nor node
will point to anything useful if line 3's check passes. Additionally,
since the reference is unconditionally released, the program would pass
the verifier without the null check.
This patch adds 'conditional release' semantics so that the verifier can
handle the above example correctly. The CONDITIONAL_RELEASE type flag
works in concert with the existing OBJ_RELEASE flag - the latter is used
to tag an argument, while the new type flag is used to tag return type.
If a helper has an OBJ_RELEASE arg type and CONDITIONAL_RELEASE return
type, the helper is considered to use its return value to indicate
success or failure of the release operation. NULL is returned if release
fails, non-null otherwise.
For my concrete usecase - bpf_rbtree_add - CONDITIONAL_RELEASE works in
concert with OBJ_NON_OWNING_REF: successful release results in a non-owning
reference being returned, allowing line 7 in above example.
Instead of unconditionally releasing the OBJ_RELEASE reference when
doing check_helper_call, for CONDITIONAL_RELEASE helpers the verifier
will wait until the return value is checked for null.
If not null: the reference is released
If null: no reference is released. Since other regs w/ same ref_obj_id
were not marked unknown by check_helper_call, they can be
used to release the reference via other means (line 4 above).
It's necessary to prevent conditionally-released ref_obj_id regs from
being used between the release helper and null check. For example:
1 struct node_data *node = bpf_rbtree_alloc_node(&rbtree, ...);
2 struct node_data *ret = bpf_rbtree_add(&rbtree, node);
3 do_something_with_a_node(node);
4 if (!ret) {
5 bpf_rbtree_free_node(node);
6 return 0;
7 }
Line 3 shouldn't be allowed since node may have been released. The
verifier tags all regs with ref_obj_id of the conditionally-released arg
in the period between the helper call and null check for this reason.
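This tag-then-resolve flow can be sketched in plain C (flag value per the
patch; the reg layout is a simplified stand-in). Between the helper call
and the null check, every reg sharing the ref_obj_id carries the tag; the
null check then either releases the reference for real or merely drops the
tag:

```c
#include <assert.h>
#include <stdint.h>

#define BPF_BASE_TYPE_BITS	8	/* assumed width for this sketch */
#define CONDITIONAL_RELEASE	(1U << (12 + BPF_BASE_TYPE_BITS))
#define MAX_BPF_REG		11

struct reg_state {
	uint32_t type;
	uint32_t ref_obj_id;
};

/* Models tag_reference_cond_release(): after bpf_rbtree_add, every reg
 * sharing the released arg's ref_obj_id is tagged; using a tagged reg
 * before the null check would be rejected. */
static void tag_cond_release(struct reg_state *regs, uint32_t ref_obj_id)
{
	for (int i = 0; i < MAX_BPF_REG; i++)
		if (regs[i].ref_obj_id == ref_obj_id)
			regs[i].type |= CONDITIONAL_RELEASE;
}

/* Models the null-check resolution in mark_ptr_or_null_regs(): on the
 * non-NULL branch the reference is released for real (ref_obj_id is
 * cleared); on the NULL branch only the tag is dropped, so the node can
 * still be freed via e.g. bpf_rbtree_free_node. */
static void resolve_cond_release(struct reg_state *regs, uint32_t ref_obj_id,
				 int is_null)
{
	for (int i = 0; i < MAX_BPF_REG; i++) {
		if (regs[i].ref_obj_id != ref_obj_id)
			continue;
		regs[i].type &= ~CONDITIONAL_RELEASE;
		if (!is_null)
			regs[i].ref_obj_id = 0;
	}
}
```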
Why no matching CONDITIONAL_ACQUIRE type flag? Existing verifier logic
already treats acquire of an _OR_NULL type as a conditional acquire.
Consider this code:
1 struct thing *i = acquire_helper_that_returns_thing_or_null();
2 if (!i)
3 return 0;
4 manipulate_thing(i);
5 release_thing(i);
After line 1, BPF_REG_0 will have an _OR_NULL type and a ref_obj_id set.
When the verifier sees line 2's conditional jump, existing logic in
mark_ptr_or_null_regs, specifically the if:
if (ref_obj_id && ref_obj_id == id && is_null)
/* regs[regno] is in the " == NULL" branch.
* No one could have freed the reference state before
* doing the NULL check.
*/
WARN_ON_ONCE(release_reference_state(state, id));
will release the reference in the is_null state.
[ TODO: Either need to remove WARN_ON_ONCE there without adding
CONDITIONAL_ACQUIRE flag or add the flag and don't WARN_ON_ONCE if it's
set. Left out of first pass for simplicity's sake. ]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 3 +
include/linux/bpf_verifier.h | 1 +
kernel/bpf/rbtree.c | 2 +-
kernel/bpf/verifier.c | 114 +++++++++++++++++++++++++++++++----
4 files changed, 108 insertions(+), 12 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f164bd6e2f3a..83b8d63545e3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -416,6 +416,9 @@ enum bpf_type_flag {
MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
OBJ_NON_OWNING_REF = BIT(11 + BPF_BASE_TYPE_BITS),
+
+ CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS),
+
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f81638844a4d..e5f967d17bb9 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -314,6 +314,7 @@ struct bpf_verifier_state {
u32 curframe;
u32 active_spin_lock;
void *maybe_active_spin_lock_addr;
+ u32 active_cond_ref_obj_id;
bool speculative;
/* first and last insn idx of this verifier state */
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index cc89639df8a2..e6f51c27661c 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -144,7 +144,7 @@ BPF_CALL_3(bpf_rbtree_add, struct bpf_map *, map, void *, value, void *, cb)
const struct bpf_func_proto bpf_rbtree_add_proto = {
.func = bpf_rbtree_add,
.gpl_only = true,
- .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | OBJ_NON_OWNING_REF | CONDITIONAL_RELEASE,
.ret_btf_id = BPF_PTR_POISON,
.arg1_type = ARG_CONST_MAP_PTR,
.arg2_type = ARG_PTR_TO_BTF_ID | OBJ_RELEASE,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 26aa228fa860..3da8959e5f7a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -474,6 +474,11 @@ static bool type_is_non_owning_ref(u32 type)
return type & OBJ_NON_OWNING_REF;
}
+static bool type_is_cond_release(u32 type)
+{
+ return type & CONDITIONAL_RELEASE;
+}
+
static bool type_may_be_null(u32 type)
{
return type & PTR_MAYBE_NULL;
@@ -641,6 +646,15 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
postfix_idx += strlcpy(postfix + postfix_idx, "_non_own", 32 - postfix_idx);
}
+ if (type_is_cond_release(type)) {
+ if (base_type(type) == PTR_TO_BTF_ID)
+ postfix_idx += strlcpy(postfix + postfix_idx, "cond_rel_",
+ 32 - postfix_idx);
+ else
+ postfix_idx += strlcpy(postfix + postfix_idx, "_cond_rel",
+ 32 - postfix_idx);
+ }
+
if (type & MEM_RDONLY)
strncpy(prefix, "rdonly_", 32);
if (type & MEM_ALLOC)
@@ -1253,6 +1267,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
dst_state->curframe = src->curframe;
dst_state->active_spin_lock = src->active_spin_lock;
dst_state->maybe_active_spin_lock_addr = src->maybe_active_spin_lock_addr;
+ dst_state->active_cond_ref_obj_id = src->active_cond_ref_obj_id;
dst_state->branches = src->branches;
dst_state->parent = src->parent;
dst_state->first_insn_idx = src->first_insn_idx;
@@ -1460,6 +1475,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
return;
}
+ reg->type &= ~CONDITIONAL_RELEASE;
reg->type &= ~PTR_MAYBE_NULL;
}
@@ -6723,24 +6739,78 @@ static void release_reg_references(struct bpf_verifier_env *env,
}
}
+static int __release_reference(struct bpf_verifier_env *env, struct bpf_verifier_state *vstate,
+ int ref_obj_id)
+{
+ int err;
+ int i;
+
+ err = release_reference_state(vstate->frame[vstate->curframe], ref_obj_id);
+ if (err)
+ return err;
+
+ for (i = 0; i <= vstate->curframe; i++)
+ release_reg_references(env, vstate->frame[i], ref_obj_id);
+ return 0;
+}
+
/* The pointer with the specified id has released its reference to kernel
* resources. Identify all copies of the same pointer and clear the reference.
*/
static int release_reference(struct bpf_verifier_env *env,
int ref_obj_id)
{
- struct bpf_verifier_state *vstate = env->cur_state;
- int err;
+ return __release_reference(env, env->cur_state, ref_obj_id);
+}
+
+static void tag_reference_cond_release_regs(struct bpf_verifier_env *env,
+ struct bpf_func_state *state,
+ int ref_obj_id,
+ bool remove)
+{
+ struct bpf_reg_state *regs = state->regs, *reg;
int i;
- err = release_reference_state(cur_func(env), ref_obj_id);
- if (err)
- return err;
+ for (i = 0; i < MAX_BPF_REG; i++)
+ if (regs[i].ref_obj_id == ref_obj_id) {
+ if (remove)
+ regs[i].type &= ~CONDITIONAL_RELEASE;
+ else
+ regs[i].type |= CONDITIONAL_RELEASE;
+ }
+
+ bpf_for_each_spilled_reg(i, state, reg) {
+ if (!reg)
+ continue;
+ if (reg->ref_obj_id == ref_obj_id) {
+ if (remove)
+ reg->type &= ~CONDITIONAL_RELEASE;
+ else
+ reg->type |= CONDITIONAL_RELEASE;
+ }
+ }
+}
+
+static void tag_reference_cond_release(struct bpf_verifier_env *env,
+ int ref_obj_id)
+{
+ struct bpf_verifier_state *vstate = env->cur_state;
+ int i;
for (i = 0; i <= vstate->curframe; i++)
- release_reg_references(env, vstate->frame[i], ref_obj_id);
+ tag_reference_cond_release_regs(env, vstate->frame[i],
+ ref_obj_id, false);
+}
- return 0;
+static void untag_reference_cond_release(struct bpf_verifier_env *env,
+ struct bpf_verifier_state *vstate,
+ int ref_obj_id)
+{
+ int i;
+
+ for (i = 0; i <= vstate->curframe; i++)
+ tag_reference_cond_release_regs(env, vstate->frame[i],
+ ref_obj_id, true);
}
static void clear_non_owning_ref_regs(struct bpf_verifier_env *env,
@@ -7537,7 +7607,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (arg_type_is_dynptr(fn->arg_type[meta.release_regno - BPF_REG_1]))
err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
else if (meta.ref_obj_id)
- err = release_reference(env, meta.ref_obj_id);
+ if (type_is_cond_release(fn->ret_type)) {
+ if (env->cur_state->active_cond_ref_obj_id) {
+ verbose(env, "can't handle >1 cond_release\n");
+ return err;
+ }
+ env->cur_state->active_cond_ref_obj_id = meta.ref_obj_id;
+ tag_reference_cond_release(env, meta.ref_obj_id);
+ err = 0;
+ } else {
+ err = release_reference(env, meta.ref_obj_id);
+ }
/* meta.ref_obj_id can only be 0 if register that is meant to be
* released is NULL, which must be > R0.
*/
@@ -10212,7 +10292,7 @@ static void __mark_ptr_or_null_regs(struct bpf_func_state *state, u32 id,
* be folded together at some point.
*/
static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
- bool is_null)
+ bool is_null, struct bpf_verifier_env *env)
{
struct bpf_func_state *state = vstate->frame[vstate->curframe];
struct bpf_reg_state *regs = state->regs;
@@ -10227,6 +10307,15 @@ static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno,
*/
WARN_ON_ONCE(release_reference_state(state, id));
+ if (type_is_cond_release(regs[regno].type)) {
+ if (!is_null) {
+ __release_reference(env, vstate, vstate->active_cond_ref_obj_id);
+ vstate->active_cond_ref_obj_id = 0;
+ } else {
+ untag_reference_cond_release(env, vstate, vstate->active_cond_ref_obj_id);
+ vstate->active_cond_ref_obj_id = 0;
+ }
+ }
for (i = 0; i <= vstate->curframe; i++)
__mark_ptr_or_null_regs(vstate->frame[i], id, is_null);
}
@@ -10537,9 +10626,9 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
* safe or unknown depending R == 0 or R != 0 conditional.
*/
mark_ptr_or_null_regs(this_branch, insn->dst_reg,
- opcode == BPF_JNE);
+ opcode == BPF_JNE, env);
mark_ptr_or_null_regs(other_branch, insn->dst_reg,
- opcode == BPF_JEQ);
+ opcode == BPF_JEQ, env);
} else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg],
this_branch, other_branch) &&
is_pointer_value(env, insn->dst_reg)) {
@@ -11996,6 +12085,9 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_spin_lock != cur->active_spin_lock)
return false;
+ if (old->active_cond_ref_obj_id != cur->active_cond_ref_obj_id)
+ return false;
+
/* for states to be equal callsites have to be the same
* and all frame states need to be equivalent
*/
--
2.30.2
* [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (12 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests Dave Marchevsky
` (3 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Add two type flags, PTR_ITER and PTR_ITER_END, meant to support the
following pattern for iterating data structures:
struct node *elem = data_structure_iter_first(&some_map)
while (elem) {
do_something();
elem = data_structure_iter_next(&some_map, elem);
}
Currently the ret_type of both _first() and _next() helpers would be
PTR_TO_BTF_ID_OR_NULL and the verifier would consider the loop to be
infinite as it knows nothing about an arbitrary PTR_TO_BTF_ID value.
However, if we can express "this PTR_TO_BTF_ID will eventually be
NULL if used in iteration helpers" via a new type flag, the verifier can
use this information to determine that the loop will terminate while
still verifying the loop body.
So for our contrived example above, _first() now returns a
PTR_TO_BTF_ID_OR_NULL | PTR_ITER, which _next() expects as input,
itself returning PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END.
The verifier does nothing special for PTR_ITER, so the first iteration
of the example loop will result in both elem == NULL and elem != NULL
branches being verified. When verifying the latter branch, elem will
become a PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END.
When the verifier sees a reg holding PTR_ITER_END in a conditional jump
it knows the reg type and val can be replaced with SCALAR 0. So in the
example loop on 2nd iteration elem == NULL will be assumed and
verification will continue with that branch only.
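A toy model of how this bounds the verifier's loop analysis (flag values
per the patch, everything else simplified): _first() yields PTR_ITER, the
loop body is verified once, and the PTR_ITER_END returned by _next() is
treated as known-NULL at the next conditional jump, ending the walk:

```c
#include <assert.h>
#include <stdint.h>

#define BPF_BASE_TYPE_BITS	8	/* assumed width for this sketch */
#define PTR_ITER	(1U << (13 + BPF_BASE_TYPE_BITS))
#define PTR_ITER_END	(1U << (14 + BPF_BASE_TYPE_BITS))
#define PTR_TO_BTF_ID	12	/* arbitrary base-type value here */

/* Walks the example loop the way the verifier would: a reg tagged
 * PTR_ITER_END is replaced by SCALAR 0 at the conditional jump, so only
 * one pass over the loop body is ever analyzed. */
static int verified_body_passes(void)
{
	uint32_t elem_type = PTR_TO_BTF_ID | PTR_ITER;	/* from _first() */
	int passes = 0;

	while (!(elem_type & PTR_ITER_END)) {	/* the "while (elem)" check */
		passes++;			/* verify the loop body */
		elem_type = PTR_TO_BTF_ID | PTR_ITER_END;  /* from _next() */
	}
	return passes;
}
```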
[ TODO:
* PTR_ITER needs to be associated with the helper that produced it, to
prevent something like:
struct node *elem = data_structure_iter_last(&some_map)
while (elem) {
do_something();
elem = data_structure_iter_next(&some_map, elem);
}
i.e. _first() and _next() should be used together, same with _last()
and _prev().
* This is currently very unsafe. Per multiple conversations w/ Alexei
and Andrii, there are probably some ways forward here, but may be more
complex than it's worth. If so, can migrate to a callback-based
approach with similar functionality .
]
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
include/linux/bpf.h | 3 ++
include/uapi/linux/bpf.h | 32 ++++++++++++++
kernel/bpf/helpers.c | 12 ++++++
kernel/bpf/rbtree.c | 78 ++++++++++++++++++++++++++++++++++
kernel/bpf/verifier.c | 53 +++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 32 ++++++++++++++
6 files changed, 207 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 83b8d63545e3..f4aabfa943c1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -419,6 +419,9 @@ enum bpf_type_flag {
CONDITIONAL_RELEASE = BIT(12 + BPF_BASE_TYPE_BITS),
+ PTR_ITER = BIT(13 + BPF_BASE_TYPE_BITS),
+ PTR_ITER_END = BIT(14 + BPF_BASE_TYPE_BITS),
+
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f4c615fbf64f..bb556c449cf0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5433,6 +5433,34 @@ union bpf_attr {
*
* Return
* 0
+ *
+ * long bpf_rbtree_first(struct bpf_map *map)
+ * Description
+ * Return the first node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_last(struct bpf_map *map)
+ * Description
+ * Return the last node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_next(struct bpf_map *map, void *cur)
+ * Description
+ * Return the next node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_prev(struct bpf_map *map, void *cur)
+ * Description
+ * Return the previous node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5652,6 +5680,10 @@ union bpf_attr {
FN(rbtree_get_lock), \
FN(rbtree_lock), \
FN(rbtree_unlock), \
+ FN(rbtree_first), \
+ FN(rbtree_last), \
+ FN(rbtree_next), \
+ FN(rbtree_prev), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 0ca5fed1013b..32e245c559c4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1608,6 +1608,10 @@ const struct bpf_func_proto bpf_rbtree_free_node_proto __weak;
const struct bpf_func_proto bpf_rbtree_get_lock_proto __weak;
const struct bpf_func_proto bpf_rbtree_lock_proto __weak;
const struct bpf_func_proto bpf_rbtree_unlock_proto __weak;
+const struct bpf_func_proto bpf_rbtree_first_proto __weak;
+const struct bpf_func_proto bpf_rbtree_last_proto __weak;
+const struct bpf_func_proto bpf_rbtree_next_proto __weak;
+const struct bpf_func_proto bpf_rbtree_prev_proto __weak;
const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
@@ -1715,6 +1719,14 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_rbtree_lock_proto;
case BPF_FUNC_rbtree_unlock:
return &bpf_rbtree_unlock_proto;
+ case BPF_FUNC_rbtree_first:
+ return &bpf_rbtree_first_proto;
+ case BPF_FUNC_rbtree_last:
+ return &bpf_rbtree_last_proto;
+ case BPF_FUNC_rbtree_next:
+ return &bpf_rbtree_next_proto;
+ case BPF_FUNC_rbtree_prev:
+ return &bpf_rbtree_prev_proto;
default:
break;
}
diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index e6f51c27661c..4794a50adbca 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -174,6 +174,84 @@ const struct bpf_func_proto bpf_rbtree_find_proto = {
.arg3_type = ARG_PTR_TO_FUNC,
};
+BPF_CALL_1(bpf_rbtree_first, struct bpf_map *, map)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_first_cached(&tree->root);
+}
+
+const struct bpf_func_proto bpf_rbtree_first_proto = {
+ .func = bpf_rbtree_first,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+};
+
+BPF_CALL_1(bpf_rbtree_last, struct bpf_map *, map)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_last(&tree->root.rb_root);
+}
+
+const struct bpf_func_proto bpf_rbtree_last_proto = {
+ .func = bpf_rbtree_last,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER | OBJ_NON_OWNING_REF,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+};
+
+BPF_CALL_2(bpf_rbtree_next, struct bpf_map *, map, void *, cur)
+{
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ /* Only walk from the node once the tree's lock is confirmed held */
+ return (u64)rb_next((struct rb_node *)cur);
+}
+
+const struct bpf_func_proto bpf_rbtree_next_proto = {
+ .func = bpf_rbtree_next,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER,
+ .arg2_btf_id = BPF_PTR_POISON,
+};
+
+BPF_CALL_2(bpf_rbtree_prev, struct bpf_map *, map, void *, cur)
+{
+ struct rb_node *node = (struct rb_node *)cur;
+ struct bpf_rbtree *tree = container_of(map, struct bpf_rbtree, map);
+
+ if (!__rbtree_lock_held(tree))
+ return (u64)NULL;
+
+ return (u64)rb_prev(node);
+}
+
+const struct bpf_func_proto bpf_rbtree_prev_proto = {
+ .func = bpf_rbtree_prev,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID_OR_NULL | PTR_ITER_END | OBJ_NON_OWNING_REF,
+ .ret_btf_id = BPF_PTR_POISON,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID | PTR_ITER,
+ .arg2_btf_id = BPF_PTR_POISON,
+};
+
/* Like rbtree_postorder_for_each_entry_safe, but 'pos' and 'n' are
* 'rb_node *', so field name of rb_node within containing struct is not
* needed.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3da8959e5f7a..6354d3a2217d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -484,6 +484,11 @@ static bool type_may_be_null(u32 type)
return type & PTR_MAYBE_NULL;
}
+static bool type_is_iter(u32 type)
+{
+ return type & PTR_ITER || type & PTR_ITER_END;
+}
+
static bool is_acquire_function(enum bpf_func_id func_id,
const struct bpf_map *map)
{
@@ -586,7 +591,9 @@ static bool function_manipulates_rbtree_node(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_rbtree_add ||
func_id == BPF_FUNC_rbtree_remove ||
- func_id == BPF_FUNC_rbtree_free_node;
+ func_id == BPF_FUNC_rbtree_free_node ||
+ func_id == BPF_FUNC_rbtree_next ||
+ func_id == BPF_FUNC_rbtree_prev;
}
static bool function_returns_rbtree_node(enum bpf_func_id func_id)
@@ -594,7 +601,11 @@ static bool function_returns_rbtree_node(enum bpf_func_id func_id)
return func_id == BPF_FUNC_rbtree_alloc_node ||
func_id == BPF_FUNC_rbtree_find ||
func_id == BPF_FUNC_rbtree_add ||
- func_id == BPF_FUNC_rbtree_remove;
+ func_id == BPF_FUNC_rbtree_remove ||
+ func_id == BPF_FUNC_rbtree_first ||
+ func_id == BPF_FUNC_rbtree_last ||
+ func_id == BPF_FUNC_rbtree_next ||
+ func_id == BPF_FUNC_rbtree_prev;
}
/* string representation of 'enum bpf_reg_type'
@@ -655,6 +666,12 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
32 - postfix_idx);
}
+ if (type_is_iter(type)) {
+ postfix_idx += strlcpy(postfix + postfix_idx, "_iter", 32 - postfix_idx);
+ if (type & PTR_ITER_END)
+ postfix_idx += strlcpy(postfix + postfix_idx, "_end", 32 - postfix_idx);
+ }
+
if (type & MEM_RDONLY)
strncpy(prefix, "rdonly_", 32);
if (type & MEM_ALLOC)
@@ -1470,7 +1487,7 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
map->map_type == BPF_MAP_TYPE_SOCKHASH) {
reg->type = PTR_TO_SOCKET;
} else {
- reg->type = PTR_TO_MAP_VALUE;
+ reg->type &= ~PTR_MAYBE_NULL;
}
return;
}
@@ -3063,6 +3080,11 @@ static bool __is_pointer_value(bool allow_ptr_leaks,
return reg->type != SCALAR_VALUE;
}
+static bool __is_iter_end(const struct bpf_reg_state *reg)
+{
+ return reg->type & PTR_ITER_END;
+}
+
static void save_register_state(struct bpf_func_state *state,
int spi, struct bpf_reg_state *reg,
int size)
@@ -5737,6 +5759,8 @@ static const struct bpf_reg_types map_key_value_types = {
PTR_TO_PACKET_META,
PTR_TO_MAP_KEY,
PTR_TO_MAP_VALUE,
+ PTR_TO_MAP_VALUE | PTR_ITER,
+ PTR_TO_MAP_VALUE | PTR_ITER_END,
},
};
@@ -5870,6 +5894,17 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
if (arg_type & PTR_MAYBE_NULL)
type &= ~PTR_MAYBE_NULL;
+ /* TYPE | PTR_ITER is valid input for helpers that expect TYPE
+ * TYPE is not valid input for helpers that expect TYPE | PTR_ITER
+ */
+ if (type_is_iter(arg_type)) {
+ if (!type_is_iter(type))
+ goto not_found;
+
+ type &= ~PTR_ITER;
+ type &= ~PTR_ITER_END;
+ }
+
for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
expected = compatible->types[i];
if (expected == NOT_INIT)
@@ -5879,6 +5914,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
goto found;
}
+not_found:
verbose(env, "R%d type=%s expected=", regno, reg_type_str(env, reg->type));
for (j = 0; j + 1 < i; j++)
verbose(env, "%s, ", reg_type_str(env, compatible->types[j]));
@@ -9933,6 +9969,17 @@ static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode,
bool is_jmp32)
{
if (__is_pointer_value(false, reg)) {
+ if (__is_iter_end(reg) && val == 0) {
+ __mark_reg_const_zero(reg);
+ switch (opcode) {
+ case BPF_JEQ:
+ return 1;
+ case BPF_JNE:
+ return 0;
+ default:
+ return -1;
+ }
+ }
if (!reg_type_not_null(reg->type))
return -1;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f4c615fbf64f..bb556c449cf0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5433,6 +5433,34 @@ union bpf_attr {
*
* Return
* 0
+ *
+ * long bpf_rbtree_first(struct bpf_map *map)
+ * Description
+ * Return the first node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_last(struct bpf_map *map)
+ * Description
+ * Return the last node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_next(struct bpf_map *map, void *cur)
+ * Description
+ * Return the next node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
+ *
+ * long bpf_rbtree_prev(struct bpf_map *map, void *cur)
+ * Description
+ * Return the previous node in the tree according to sort order
+ *
+ * Return
+ * If found, ptr to node, otherwise NULL
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5652,6 +5680,10 @@ union bpf_attr {
FN(rbtree_get_lock), \
FN(rbtree_lock), \
FN(rbtree_unlock), \
+ FN(rbtree_first), \
+ FN(rbtree_last), \
+ FN(rbtree_next), \
+ FN(rbtree_prev), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (13 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes Dave Marchevsky
` (2 subsequent siblings)
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
Add tests demonstrating the happy path of rbtree map usage, as well as
exercising numerous failure paths and conditions. The structure of the
failing-test runner is based on the dynptr tests.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
.../selftests/bpf/prog_tests/rbtree_map.c | 164 ++++++++++++
.../testing/selftests/bpf/progs/rbtree_map.c | 111 ++++++++
.../selftests/bpf/progs/rbtree_map_fail.c | 236 ++++++++++++++++++
.../bpf/progs/rbtree_map_load_fail.c | 24 ++
4 files changed, 535 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/rbtree_map.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
new file mode 100644
index 000000000000..17cadcd05ee4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
@@ -0,0 +1,164 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include "rbtree_map.skel.h"
+#include "rbtree_map_fail.skel.h"
+#include "rbtree_map_load_fail.skel.h"
+
+static size_t log_buf_sz = 1048576; /* 1 MB */
+static char obj_log_buf[1048576];
+
+static struct {
+ const char *prog_name;
+ const char *expected_err_msg;
+} rbtree_prog_load_fail_tests[] = {
+ {"rb_node__field_store", "only read is supported"},
+ {"rb_node__alloc_no_add", "Unreleased reference id=2 alloc_insn=3"},
+ {"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"},
+ {"rb_node__remove_no_free", "Unreleased reference id=5 alloc_insn=28"},
+ {"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"},
+ {"rb_tree__conditional_release_helper_usage",
+ "R2 type=ptr_cond_rel_ expected=ptr_"},
+};
+
+void test_rbtree_map_load_fail(void)
+{
+ struct rbtree_map_load_fail *skel;
+
+ skel = rbtree_map_load_fail__open_and_load();
+ if (!ASSERT_ERR_PTR(skel, "rbtree_map_load_fail__open_and_load"))
+ rbtree_map_load_fail__destroy(skel);
+}
+
+static void verify_fail(const char *prog_name, const char *expected_err_msg)
+{
+ LIBBPF_OPTS(bpf_object_open_opts, opts);
+ struct rbtree_map_fail *skel;
+ struct bpf_program *prog;
+ int err;
+
+ opts.kernel_log_buf = obj_log_buf;
+ opts.kernel_log_size = log_buf_sz;
+ opts.kernel_log_level = 1;
+
+ skel = rbtree_map_fail__open_opts(&opts);
+ if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open_opts"))
+ goto cleanup;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+
+ bpf_program__set_autoload(prog, true);
+ err = rbtree_map_fail__load(skel);
+ if (!ASSERT_ERR(err, "unexpected load success"))
+ goto cleanup;
+
+ if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
+ fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
+ fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
+ }
+
+cleanup:
+ rbtree_map_fail__destroy(skel);
+}
+
+void test_rbtree_map_alloc_node__size_too_small(void)
+{
+ struct rbtree_map_fail *skel;
+ struct bpf_program *prog;
+ struct bpf_link *link;
+ int err;
+
+ skel = rbtree_map_fail__open();
+ if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open"))
+ goto cleanup;
+
+ prog = skel->progs.alloc_node__size_too_small;
+ bpf_program__set_autoload(prog, true);
+
+ err = rbtree_map_fail__load(skel);
+ if (!ASSERT_OK(err, "unexpected load fail"))
+ goto cleanup;
+
+ link = bpf_program__attach(skel->progs.alloc_node__size_too_small);
+ if (!ASSERT_OK_PTR(link, "link"))
+ goto cleanup;
+
+ syscall(SYS_getpgid);
+
+ ASSERT_EQ(skel->bss->size_too_small__alloc_fail, 1, "alloc_fail");
+
+ bpf_link__destroy(link);
+cleanup:
+ rbtree_map_fail__destroy(skel);
+}
+
+void test_rbtree_map_add_node__no_lock(void)
+{
+ struct rbtree_map_fail *skel;
+ struct bpf_program *prog;
+ struct bpf_link *link;
+ int err;
+
+ skel = rbtree_map_fail__open();
+ if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open"))
+ goto cleanup;
+
+ prog = skel->progs.add_node__no_lock;
+ bpf_program__set_autoload(prog, true);
+
+ err = rbtree_map_fail__load(skel);
+ if (!ASSERT_OK(err, "unexpected load fail"))
+ goto cleanup;
+
+ link = bpf_program__attach(skel->progs.add_node__no_lock);
+ if (!ASSERT_OK_PTR(link, "link"))
+ goto cleanup;
+
+ syscall(SYS_getpgid);
+
+ ASSERT_EQ(skel->bss->no_lock_add__fail, 1, "add_fail");
+
+ bpf_link__destroy(link);
+cleanup:
+ rbtree_map_fail__destroy(skel);
+}
+
+void test_rbtree_map_prog_load_fail(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(rbtree_prog_load_fail_tests); i++) {
+ if (!test__start_subtest(rbtree_prog_load_fail_tests[i].prog_name))
+ continue;
+
+ verify_fail(rbtree_prog_load_fail_tests[i].prog_name,
+ rbtree_prog_load_fail_tests[i].expected_err_msg);
+ }
+}
+
+void test_rbtree_map(void)
+{
+ struct rbtree_map *skel;
+ struct bpf_link *link;
+
+ skel = rbtree_map__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "rbtree_map__open_and_load"))
+ goto cleanup;
+
+ link = bpf_program__attach(skel->progs.check_rbtree);
+ if (!ASSERT_OK_PTR(link, "link"))
+ goto cleanup;
+
+ for (int i = 0; i < 100; i++)
+ syscall(SYS_getpgid);
+
+ ASSERT_EQ(skel->bss->calls, 100, "calls_equal");
+
+ bpf_link__destroy(link);
+cleanup:
+ rbtree_map__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c
new file mode 100644
index 000000000000..0cd467838f6e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_map.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+struct node_data {
+ struct rb_node node;
+ __u32 one;
+ __u32 two;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RBTREE);
+ __type(value, struct node_data);
+} rbtree SEC(".maps");
+
+long calls;
+
+static bool less(struct rb_node *a, const struct rb_node *b)
+{
+ struct node_data *node_a;
+ struct node_data *node_b;
+
+ node_a = container_of(a, struct node_data, node);
+ node_b = container_of(b, struct node_data, node);
+
+ return node_a->one < node_b->one;
+}
+
+// Key = node_data
+static int cmp(const void *key, const struct rb_node *b)
+{
+ struct node_data *node_a;
+ struct node_data *node_b;
+
+ node_a = container_of(key, struct node_data, node);
+ node_b = container_of(b, struct node_data, node);
+
+ return node_b->one - node_a->one;
+}
+
+// Key = just node_data.one
+static int cmp2(const void *key, const struct rb_node *b)
+{
+ __u32 one;
+ struct node_data *node_b;
+
+ one = *(__u32 *)key;
+ node_b = container_of(b, struct node_data, node);
+
+ return node_b->one - one;
+}
+
+SEC("fentry/" SYS_PREFIX "sys_getpgid")
+int check_rbtree(void *ctx)
+{
+ struct node_data *node, *found, *ret;
+ struct node_data popped;
+ struct node_data search;
+ __u32 search2;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+
+ node->one = calls;
+ node->two = 6;
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
+ if (!ret) {
+ bpf_rbtree_free_node(&rbtree, node);
+ goto unlock_ret;
+ }
+
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ search.one = calls;
+ found = (struct node_data *)bpf_rbtree_find(&rbtree, &search, cmp);
+ if (!found)
+ goto unlock_ret;
+
+ int node_ct = 0;
+ struct node_data *iter = (struct node_data *)bpf_rbtree_first(&rbtree);
+
+ while (iter) {
+ node_ct++;
+ iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter);
+ }
+
+ ret = (struct node_data *)bpf_rbtree_remove(&rbtree, found);
+ if (!ret)
+ goto unlock_ret;
+
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+
+ bpf_rbtree_free_node(&rbtree, ret);
+
+ __sync_fetch_and_add(&calls, 1);
+ return 0;
+
+unlock_ret:
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
new file mode 100644
index 000000000000..287c8db080d8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+
+struct node_data {
+ struct rb_node node;
+ __u32 one;
+ __u32 two;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RBTREE);
+ __type(value, struct node_data);
+} rbtree SEC(".maps");
+
+long calls;
+
+static bool less(struct rb_node *a, const struct rb_node *b)
+{
+ struct node_data *node_a;
+ struct node_data *node_b;
+
+ node_a = container_of(a, struct node_data, node);
+ node_b = container_of(b, struct node_data, node);
+
+ return node_a->one < node_b->one;
+}
+
+// Key = node_data
+static int cmp(const void *key, const struct rb_node *b)
+{
+ struct node_data *node_a;
+ struct node_data *node_b;
+
+ node_a = container_of(key, struct node_data, node);
+ node_b = container_of(b, struct node_data, node);
+
+ return node_b->one - node_a->one;
+}
+
+long size_too_small__alloc_fail;
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int alloc_node__size_too_small(void *ctx)
+{
+ struct node_data *node, *ret;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(char));
+ if (!node) {
+ size_too_small__alloc_fail++;
+ return 0;
+ }
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ /* will never execute, alloc_node should fail */
+ node->one = 1;
+ ret = bpf_rbtree_add(&rbtree, node, less);
+ if (!ret) {
+ bpf_rbtree_free_node(&rbtree, node);
+ goto unlock_ret;
+ }
+
+unlock_ret:
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+long no_lock_add__fail;
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int add_node__no_lock(void *ctx)
+{
+ struct node_data *node, *ret;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+
+ node->one = 1;
+ ret = bpf_rbtree_add(&rbtree, node, less);
+ if (!ret) {
+ no_lock_add__fail++;
+ /* will always execute, rbtree_add should fail
+ * because no lock held
+ */
+ bpf_rbtree_free_node(&rbtree, node);
+ }
+
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_node__field_store(void *ctx)
+{
+ struct node_data *node;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+
+ /* Only rbtree_map helpers can modify rb_node field */
+ node->node.rb_left = NULL;
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_node__alloc_no_add(void *ctx)
+{
+ struct node_data *node;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+ /* The node alloc'd above is never added to the rbtree. It must be
+ * added or free'd before prog terminates.
+ */
+
+ node->one = 42;
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_node__two_alloc_one_add(void *ctx)
+{
+ struct node_data *node, *ret;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+ node->one = 1;
+ /* The node alloc'd above is never added to the rbtree. It must be
+ * added or free'd before prog terminates.
+ */
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+ node->one = 42;
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ ret = bpf_rbtree_add(&rbtree, node, less);
+ if (!ret) {
+ bpf_rbtree_free_node(&rbtree, node);
+ goto unlock_ret;
+ }
+
+unlock_ret:
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_node__remove_no_free(void *ctx)
+{
+ struct node_data *node, *ret;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+ node->one = 42;
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ ret = bpf_rbtree_add(&rbtree, node, less);
+ if (!ret) {
+ bpf_rbtree_free_node(&rbtree, node);
+ goto unlock_ret;
+ }
+
+ ret = bpf_rbtree_remove(&rbtree, ret);
+ if (!ret)
+ goto unlock_ret;
+ /* At this point we've successfully acquired a reference from
+ * bpf_rbtree_remove. It must be released via rbtree_add or
+ * rbtree_free_node before prog terminates.
+ */
+
+unlock_ret:
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_tree__add_wrong_type(void *ctx)
+{
+ /* Can't add a task_struct to rbtree
+ */
+ struct task_struct *task;
+ struct node_data *ret;
+
+ task = bpf_get_current_task_btf();
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ ret = bpf_rbtree_add(&rbtree, task, less);
+ /* Verifier should fail at bpf_rbtree_add, so don't bother handling
+ * failure.
+ */
+
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int rb_tree__conditional_release_helper_usage(void *ctx)
+{
+ struct node_data *node, *ret;
+
+ node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
+ if (!node)
+ return 0;
+ node->one = 42;
+
+ bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+
+ ret = bpf_rbtree_add(&rbtree, node, less);
+ /* Verifier should fail when trying to use CONDITIONAL_RELEASE
+ * type in a helper
+ */
+ bpf_rbtree_free_node(&rbtree, node);
+ if (!ret) {
+ bpf_rbtree_free_node(&rbtree, node);
+ goto unlock_ret;
+ }
+
+unlock_ret:
+ bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
new file mode 100644
index 000000000000..036558eedd66
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+struct node_data_no_rb_node {
+ __u64 one;
+ __u64 two;
+ __u64 three;
+ __u64 four;
+ __u64 five;
+ __u64 six;
+ __u64 seven;
+};
+
+/* Should fail because value struct has no rb_node
+ */
+struct {
+ __uint(type, BPF_MAP_TYPE_RBTREE);
+ __type(value, struct node_data_no_rb_node);
+} rbtree_fail_no_rb_node SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
--
2.30.2
* [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (14 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking " Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification " Dave Marchevsky
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch contains test changes corresponding to the functional changes
in the patch "bpf: Support declarative association of lock with rbtree map".
It will be squashed with the other test patches; it is left in this state
to ease RFCv2 feedback.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
tools/testing/selftests/bpf/progs/rbtree_map.c | 12 +++++++++---
tools/testing/selftests/bpf/progs/rbtree_map_fail.c | 9 ++++++++-
.../selftests/bpf/progs/rbtree_map_load_fail.c | 9 ++++++++-
3 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c
index 0cd467838f6e..50f29b9a5b82 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_map.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_map.c
@@ -11,12 +11,18 @@ struct node_data {
__u32 two;
};
+long calls;
+struct bpf_spin_lock rbtree_lock SEC(".bss.private");
+
struct {
__uint(type, BPF_MAP_TYPE_RBTREE);
__type(value, struct node_data);
-} rbtree SEC(".maps");
-
-long calls;
+ __array(lock, struct bpf_spin_lock);
+} rbtree SEC(".maps") = {
+ .lock = {
+ [0] = &rbtree_lock,
+ },
+};
static bool less(struct rb_node *a, const struct rb_node *b)
{
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
index 287c8db080d8..ab4002a8211c 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
@@ -11,10 +11,17 @@ struct node_data {
__u32 two;
};
+struct bpf_spin_lock rbtree_lock SEC(".bss.private");
+
struct {
__uint(type, BPF_MAP_TYPE_RBTREE);
__type(value, struct node_data);
-} rbtree SEC(".maps");
+ __array(lock, struct bpf_spin_lock);
+} rbtree SEC(".maps") = {
+ .lock = {
+ [0] = &rbtree_lock,
+ },
+};
long calls;
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
index 036558eedd66..5578769efa2f 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_map_load_fail.c
@@ -14,11 +14,18 @@ struct node_data_no_rb_node {
__u64 seven;
};
+struct bpf_spin_lock rbtree_lock SEC(".bss.private");
+
/* Should fail because value struct has no rb_node
*/
struct {
__uint(type, BPF_MAP_TYPE_RBTREE);
__type(value, struct node_data_no_rb_node);
-} rbtree_fail_no_rb_node SEC(".maps");
+ __array(lock, struct bpf_spin_lock);
+} rbtree_fail_no_rb_node SEC(".maps") = {
+ .lock = {
+ [0] = &rbtree_lock,
+ },
+};
char _license[] SEC("license") = "GPL";
--
2.30.2
* [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking test changes
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (15 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification " Dave Marchevsky
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
This patch contains test changes corresponding to the functional changes
in "bpf: Verifier tracking of rbtree_spin_lock held". It will be squashed
with the other test patches; it is left in this state to ease RFCv2
feedback.
The iter section of the rbtree_map.c prog is commented out because the
iter helpers will be dropped anyway.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
.../selftests/bpf/prog_tests/rbtree_map.c | 2 +-
.../testing/selftests/bpf/progs/rbtree_map.c | 16 ++++++++-------
.../selftests/bpf/progs/rbtree_map_fail.c | 20 +++++++++----------
3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
index 17cadcd05ee4..7634a2d93f0b 100644
--- a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
@@ -17,7 +17,7 @@ static struct {
{"rb_node__field_store", "only read is supported"},
{"rb_node__alloc_no_add", "Unreleased reference id=2 alloc_insn=3"},
{"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"},
- {"rb_node__remove_no_free", "Unreleased reference id=5 alloc_insn=28"},
+ {"rb_node__remove_no_free", "Unreleased reference id=6 alloc_insn=26"},
{"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"},
{"rb_tree__conditional_release_helper_usage",
"R2 type=ptr_cond_rel_ expected=ptr_"},
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map.c b/tools/testing/selftests/bpf/progs/rbtree_map.c
index 50f29b9a5b82..957672cce82a 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_map.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_map.c
@@ -65,6 +65,7 @@ int check_rbtree(void *ctx)
struct node_data *node, *found, *ret;
struct node_data popped;
struct node_data search;
+ struct bpf_spin_lock *lock;
__u32 search2;
node = bpf_rbtree_alloc_node(&rbtree, sizeof(struct node_data));
@@ -73,7 +74,8 @@ int check_rbtree(void *ctx)
node->one = calls;
node->two = 6;
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ lock = &rbtree_lock;
+ bpf_rbtree_lock(lock);
ret = (struct node_data *)bpf_rbtree_add(&rbtree, node, less);
if (!ret) {
@@ -81,28 +83,28 @@ int check_rbtree(void *ctx)
goto unlock_ret;
}
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(lock);
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(lock);
search.one = calls;
found = (struct node_data *)bpf_rbtree_find(&rbtree, &search, cmp);
if (!found)
goto unlock_ret;
- int node_ct = 0;
+ /*int node_ct = 0;
struct node_data *iter = (struct node_data *)bpf_rbtree_first(&rbtree);
while (iter) {
node_ct++;
iter = (struct node_data *)bpf_rbtree_next(&rbtree, iter);
- }
+ }*/
ret = (struct node_data *)bpf_rbtree_remove(&rbtree, found);
if (!ret)
goto unlock_ret;
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(lock);
bpf_rbtree_free_node(&rbtree, ret);
@@ -110,7 +112,7 @@ int check_rbtree(void *ctx)
return 0;
unlock_ret:
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
diff --git a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
index ab4002a8211c..779b85294f37 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_map_fail.c
@@ -61,7 +61,7 @@ int alloc_node__size_too_small(void *ctx)
return 0;
}
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(&rbtree_lock);
/* will never execute, alloc_node should fail */
node->one = 1;
ret = bpf_rbtree_add(&rbtree, node, less);
@@ -71,7 +71,7 @@ int alloc_node__size_too_small(void *ctx)
}
unlock_ret:
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
@@ -148,7 +148,7 @@ int rb_node__two_alloc_one_add(void *ctx)
return 0;
node->one = 42;
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(&rbtree_lock);
ret = bpf_rbtree_add(&rbtree, node, less);
if (!ret) {
@@ -157,7 +157,7 @@ int rb_node__two_alloc_one_add(void *ctx)
}
unlock_ret:
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
@@ -171,7 +171,7 @@ int rb_node__remove_no_free(void *ctx)
return 0;
node->one = 42;
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(&rbtree_lock);
ret = bpf_rbtree_add(&rbtree, node, less);
if (!ret) {
@@ -188,7 +188,7 @@ int rb_node__remove_no_free(void *ctx)
*/
unlock_ret:
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
@@ -202,14 +202,14 @@ int rb_tree__add_wrong_type(void *ctx)
task = bpf_get_current_task_btf();
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(&rbtree_lock);
ret = bpf_rbtree_add(&rbtree, task, less);
/* Verifier should fail at bpf_rbtree_add, so don't bother handling
* failure.
*/
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
@@ -223,7 +223,7 @@ int rb_tree__conditional_release_helper_usage(void *ctx)
return 0;
node->one = 42;
- bpf_rbtree_lock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_lock(&rbtree_lock);
ret = bpf_rbtree_add(&rbtree, node, less);
/* Verifier should fail when trying to use CONDITIONAL_RELEASE
@@ -236,7 +236,7 @@ int rb_tree__conditional_release_helper_usage(void *ctx)
}
unlock_ret:
- bpf_rbtree_unlock(bpf_rbtree_get_lock(&rbtree));
+ bpf_rbtree_unlock(&rbtree_lock);
return 0;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification test changes
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
` (16 preceding siblings ...)
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking " Dave Marchevsky
@ 2022-08-30 17:27 ` Dave Marchevsky
17 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-08-30 17:27 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Kernel Team,
Dave Marchevsky
These are test changes corresponding to commit "bpf: Check rbtree lock
held during verification". They should be squashed with other test
changes, but are left unsquashed as part of RFCv2 to ease tossing them
piecemeal if associated functionality is dropped.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
.../selftests/bpf/prog_tests/rbtree_map.c | 32 +------------------
1 file changed, 1 insertion(+), 31 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
index 7634a2d93f0b..07bc2780854c 100644
--- a/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree_map.c
@@ -19,6 +19,7 @@ static struct {
{"rb_node__two_alloc_one_add", "Unreleased reference id=2 alloc_insn=3"},
{"rb_node__remove_no_free", "Unreleased reference id=6 alloc_insn=26"},
{"rb_tree__add_wrong_type", "rbtree: R2 is of type task_struct but node_data is expected"},
+ {"add_node__no_lock", "lock associated with rbtree is not held"},
{"rb_tree__conditional_release_helper_usage",
"R2 type=ptr_cond_rel_ expected=ptr_"},
};
@@ -96,37 +97,6 @@ void test_rbtree_map_alloc_node__size_too_small(void)
rbtree_map_fail__destroy(skel);
}
-void test_rbtree_map_add_node__no_lock(void)
-{
- struct rbtree_map_fail *skel;
- struct bpf_program *prog;
- struct bpf_link *link;
- int err;
-
- skel = rbtree_map_fail__open();
- if (!ASSERT_OK_PTR(skel, "rbtree_map_fail__open"))
- goto cleanup;
-
- prog = skel->progs.add_node__no_lock;
- bpf_program__set_autoload(prog, true);
-
- err = rbtree_map_fail__load(skel);
- if (!ASSERT_OK(err, "unexpected load fail"))
- goto cleanup;
-
- link = bpf_program__attach(skel->progs.add_node__no_lock);
- if (!ASSERT_OK_PTR(link, "link"))
- goto cleanup;
-
- syscall(SYS_getpgid);
-
- ASSERT_EQ(skel->bss->no_lock_add__fail, 1, "alloc_fail");
-
- bpf_link__destroy(link);
-cleanup:
- rbtree_map_fail__destroy(skel);
-}
-
void test_rbtree_map_prog_load_fail(void)
{
int i;
--
2.30.2
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
@ 2022-09-01 21:01 ` Joanne Koong
2022-09-06 23:42 ` Dave Marchevsky
0 siblings, 1 reply; 25+ messages in thread
From: Joanne Koong @ 2022-09-01 21:01 UTC (permalink / raw)
To: Dave Marchevsky
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Kernel Team
On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> Verifier logic to confirm that a callback function returns 0 or 1 was
> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
> At the time, callback return value was only used to continue or stop
> iteration.
>
> In order to support callbacks with a broader return value range, such as
> those added further in this series, add a callback_ret_range to
> bpf_func_state. Verifier's helpers which set in_callback_fn will also
> set the new field, which the verifier will later use to check return
> value bounds.
>
> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
> value as the latter would prevent the valid range (0, U64_MAX) being
> used.
>
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
> include/linux/bpf_verifier.h | 1 +
> kernel/bpf/verifier.c | 4 +++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 2e3bad8640dc..9c017575c034 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -237,6 +237,7 @@ struct bpf_func_state {
> */
> u32 async_entry_cnt;
> bool in_callback_fn;
> + struct tnum callback_ret_range;
> bool in_async_callback_fn;
>
> /* The following fields should be last. See copy_func_state() */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 9bef8b41e737..68bfa7c28048 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
> state->callsite = callsite;
> state->frameno = frameno;
> state->subprogno = subprogno;
> + state->callback_ret_range = tnum_range(0, 1);
> init_reg_state(env, state);
> mark_verifier_state_scratched(env);
> }
> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
> callee->in_callback_fn = true;
> + callee->callback_ret_range = tnum_range(0, 1);
Thanks for removing this restriction for callback functions!
One quick question: is this line above needed? I think in
__check_func_call, we always call init_func_state() first before
calling set_find_vma_callback_state(), so after the init_func_state()
call, the callee->callback_ret_range will already be set to
tnum_range(0,1).
> return 0;
> }
>
> @@ -6906,7 +6908,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
> caller = state->frame[state->curframe];
> if (callee->in_callback_fn) {
> /* enforce R0 return value range [0, 1]. */
> - struct tnum range = tnum_range(0, 1);
> + struct tnum range = callee->callback_ret_range;
>
> if (r0->type != SCALAR_VALUE) {
> verbose(env, "R0 not a scalar value\n");
> --
> 2.30.2
>
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-09-01 21:01 ` Joanne Koong
@ 2022-09-06 23:42 ` Dave Marchevsky
2022-09-07 1:53 ` Alexei Starovoitov
0 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2022-09-06 23:42 UTC (permalink / raw)
To: Joanne Koong
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Kernel Team
On 9/1/22 5:01 PM, Joanne Koong wrote:
> On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> Verifier logic to confirm that a callback function returns 0 or 1 was
>> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
>> At the time, callback return value was only used to continue or stop
>> iteration.
>>
>> In order to support callbacks with a broader return value range, such as
>> those added further in this series, add a callback_ret_range to
>> bpf_func_state. Verifier's helpers which set in_callback_fn will also
>> set the new field, which the verifier will later use to check return
>> value bounds.
>>
>> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
>> value as the latter would prevent the valid range (0, U64_MAX) being
>> used.
>>
>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>> ---
>> include/linux/bpf_verifier.h | 1 +
>> kernel/bpf/verifier.c | 4 +++-
>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 2e3bad8640dc..9c017575c034 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -237,6 +237,7 @@ struct bpf_func_state {
>> */
>> u32 async_entry_cnt;
>> bool in_callback_fn;
>> + struct tnum callback_ret_range;
>> bool in_async_callback_fn;
>>
>> /* The following fields should be last. See copy_func_state() */
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 9bef8b41e737..68bfa7c28048 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
>> state->callsite = callsite;
>> state->frameno = frameno;
>> state->subprogno = subprogno;
>> + state->callback_ret_range = tnum_range(0, 1);
>> init_reg_state(env, state);
>> mark_verifier_state_scratched(env);
>> }
>> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
>> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
>> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
>> callee->in_callback_fn = true;
>> + callee->callback_ret_range = tnum_range(0, 1);
>
> Thanks for removing this restriction for callback functions!
>
> One quick question: is this line above needed? I think in
> __check_func_call, we always call init_func_state() first before
> calling set_find_vma_callback_state(), so after the init_func_state()
> call, the callee->callback_ret_range will already be set to
> tnum_range(0,1).
>
You're right, it's not strictly necessary. I think that the default range being
tnum_range(0, 1) - although necessary for backwards compat - is unintuitive. So
decided to be explicit with existing callbacks so folks didn't have to go
searching for the default to understand what the ret_range is, and it's more
obvious that callback_ret_range should be changed if existing helper code is
reused.
>> return 0;
>> }
>>
>> @@ -6906,7 +6908,7 @@ static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
>> caller = state->frame[state->curframe];
>> if (callee->in_callback_fn) {
>> /* enforce R0 return value range [0, 1]. */
>> - struct tnum range = tnum_range(0, 1);
>> + struct tnum range = callee->callback_ret_range;
>>
>> if (r0->type != SCALAR_VALUE) {
>> verbose(env, "R0 not a scalar value\n");
>> --
>> 2.30.2
>>
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-09-06 23:42 ` Dave Marchevsky
@ 2022-09-07 1:53 ` Alexei Starovoitov
2022-09-08 21:36 ` Dave Marchevsky
0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-09-07 1:53 UTC (permalink / raw)
To: Dave Marchevsky, Joanne Koong
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Kernel Team
On 9/6/22 4:42 PM, Dave Marchevsky wrote:
> On 9/1/22 5:01 PM, Joanne Koong wrote:
>> On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>>
>>> Verifier logic to confirm that a callback function returns 0 or 1 was
>>> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
>>> At the time, callback return value was only used to continue or stop
>>> iteration.
>>>
>>> In order to support callbacks with a broader return value range, such as
>>> those added further in this series, add a callback_ret_range to
>>> bpf_func_state. Verifier's helpers which set in_callback_fn will also
>>> set the new field, which the verifier will later use to check return
>>> value bounds.
>>>
>>> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
>>> value as the latter would prevent the valid range (0, U64_MAX) being
>>> used.
>>>
>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>> ---
>>> include/linux/bpf_verifier.h | 1 +
>>> kernel/bpf/verifier.c | 4 +++-
>>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>> index 2e3bad8640dc..9c017575c034 100644
>>> --- a/include/linux/bpf_verifier.h
>>> +++ b/include/linux/bpf_verifier.h
>>> @@ -237,6 +237,7 @@ struct bpf_func_state {
>>> */
>>> u32 async_entry_cnt;
>>> bool in_callback_fn;
>>> + struct tnum callback_ret_range;
>>> bool in_async_callback_fn;
>>>
>>> /* The following fields should be last. See copy_func_state() */
>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>> index 9bef8b41e737..68bfa7c28048 100644
>>> --- a/kernel/bpf/verifier.c
>>> +++ b/kernel/bpf/verifier.c
>>> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
>>> state->callsite = callsite;
>>> state->frameno = frameno;
>>> state->subprogno = subprogno;
>>> + state->callback_ret_range = tnum_range(0, 1);
>>> init_reg_state(env, state);
>>> mark_verifier_state_scratched(env);
>>> }
>>> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
>>> callee->in_callback_fn = true;
>>> + callee->callback_ret_range = tnum_range(0, 1);
>>
>> Thanks for removing this restriction for callback functions!
>>
>> One quick question: is this line above needed? I think in
>> __check_func_call, we always call init_func_state() first before
>> calling set_find_vma_callback_state(), so after the init_func_state()
>> call, the callee->callback_ret_range will already be set to
>> tnum_range(0,1).
>>
>
> You're right, it's not strictly necessary. I think that the default range being
> tnum_range(0, 1) - although necessary for backwards compat - is unintuitive. So
> decided to be explicit with existing callbacks so folks didn't have to go
> searching for the default to understand what the ret_range is, and it's more
> obvious that callback_ret_range should be changed if existing helper code is
> reused.
Maybe then it's better to keep callback_ret_range as range(0,0)
in init_func_state() to nudge/force other places to set it explicitly ?
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-09-07 1:53 ` Alexei Starovoitov
@ 2022-09-08 21:36 ` Dave Marchevsky
2022-09-08 21:40 ` Alexei Starovoitov
0 siblings, 1 reply; 25+ messages in thread
From: Dave Marchevsky @ 2022-09-08 21:36 UTC (permalink / raw)
To: Alexei Starovoitov, Joanne Koong
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Kernel Team
On 9/6/22 9:53 PM, Alexei Starovoitov wrote:
> On 9/6/22 4:42 PM, Dave Marchevsky wrote:
>> On 9/1/22 5:01 PM, Joanne Koong wrote:
>>> On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>>>
>>>> Verifier logic to confirm that a callback function returns 0 or 1 was
>>>> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
>>>> At the time, callback return value was only used to continue or stop
>>>> iteration.
>>>>
>>>> In order to support callbacks with a broader return value range, such as
>>>> those added further in this series, add a callback_ret_range to
>>>> bpf_func_state. Verifier's helpers which set in_callback_fn will also
>>>> set the new field, which the verifier will later use to check return
>>>> value bounds.
>>>>
>>>> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
>>>> value as the latter would prevent the valid range (0, U64_MAX) being
>>>> used.
>>>>
>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>> ---
>>>> include/linux/bpf_verifier.h | 1 +
>>>> kernel/bpf/verifier.c | 4 +++-
>>>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>> index 2e3bad8640dc..9c017575c034 100644
>>>> --- a/include/linux/bpf_verifier.h
>>>> +++ b/include/linux/bpf_verifier.h
>>>> @@ -237,6 +237,7 @@ struct bpf_func_state {
>>>> */
>>>> u32 async_entry_cnt;
>>>> bool in_callback_fn;
>>>> + struct tnum callback_ret_range;
>>>> bool in_async_callback_fn;
>>>>
>>>> /* The following fields should be last. See copy_func_state() */
>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>> index 9bef8b41e737..68bfa7c28048 100644
>>>> --- a/kernel/bpf/verifier.c
>>>> +++ b/kernel/bpf/verifier.c
>>>> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
>>>> state->callsite = callsite;
>>>> state->frameno = frameno;
>>>> state->subprogno = subprogno;
>>>> + state->callback_ret_range = tnum_range(0, 1);
>>>> init_reg_state(env, state);
>>>> mark_verifier_state_scratched(env);
>>>> }
>>>> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
>>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
>>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
>>>> callee->in_callback_fn = true;
>>>> + callee->callback_ret_range = tnum_range(0, 1);
>>>
>>> Thanks for removing this restriction for callback functions!
>>>
>>> One quick question: is this line above needed? I think in
>>> __check_func_call, we always call init_func_state() first before
>>> calling set_find_vma_callback_state(), so after the init_func_state()
>>> call, the callee->callback_ret_range will already be set to
>>> tnum_range(0,1).
>>>
>>
>> You're right, it's not strictly necessary. I think that the default range being
>> tnum_range(0, 1) - although necessary for backwards compat - is unintuitive. So
>> decided to be explicit with existing callbacks so folks didn't have to go
>> searching for the default to understand what the ret_range is, and it's more
>> obvious that callback_ret_range should be changed if existing helper code is
>> reused.
>
> Maybe then it's better to keep callback_ret_range as range(0,0)
> in init_func_state() to nudge/force other places to set it explicitly ?
tnum_range(0, 0) sounds good to me.
Would you like me to send this separately with that change, so it can be applied
independently of rest of these changes?
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-09-08 21:36 ` Dave Marchevsky
@ 2022-09-08 21:40 ` Alexei Starovoitov
2022-09-08 23:10 ` Dave Marchevsky
0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2022-09-08 21:40 UTC (permalink / raw)
To: Dave Marchevsky
Cc: Alexei Starovoitov, Joanne Koong, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Kernel Team
On Thu, Sep 8, 2022 at 2:37 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> On 9/6/22 9:53 PM, Alexei Starovoitov wrote:
> > On 9/6/22 4:42 PM, Dave Marchevsky wrote:
> >> On 9/1/22 5:01 PM, Joanne Koong wrote:
> >>> On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
> >>>>
> >>>> Verifier logic to confirm that a callback function returns 0 or 1 was
> >>>> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
> >>>> At the time, callback return value was only used to continue or stop
> >>>> iteration.
> >>>>
> >>>> In order to support callbacks with a broader return value range, such as
> >>>> those added further in this series, add a callback_ret_range to
> >>>> bpf_func_state. Verifier's helpers which set in_callback_fn will also
> >>>> set the new field, which the verifier will later use to check return
> >>>> value bounds.
> >>>>
> >>>> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
> >>>> value as the latter would prevent the valid range (0, U64_MAX) being
> >>>> used.
> >>>>
> >>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> >>>> ---
> >>>> include/linux/bpf_verifier.h | 1 +
> >>>> kernel/bpf/verifier.c | 4 +++-
> >>>> 2 files changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >>>> index 2e3bad8640dc..9c017575c034 100644
> >>>> --- a/include/linux/bpf_verifier.h
> >>>> +++ b/include/linux/bpf_verifier.h
> >>>> @@ -237,6 +237,7 @@ struct bpf_func_state {
> >>>> */
> >>>> u32 async_entry_cnt;
> >>>> bool in_callback_fn;
> >>>> + struct tnum callback_ret_range;
> >>>> bool in_async_callback_fn;
> >>>>
> >>>> /* The following fields should be last. See copy_func_state() */
> >>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>>> index 9bef8b41e737..68bfa7c28048 100644
> >>>> --- a/kernel/bpf/verifier.c
> >>>> +++ b/kernel/bpf/verifier.c
> >>>> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
> >>>> state->callsite = callsite;
> >>>> state->frameno = frameno;
> >>>> state->subprogno = subprogno;
> >>>> + state->callback_ret_range = tnum_range(0, 1);
> >>>> init_reg_state(env, state);
> >>>> mark_verifier_state_scratched(env);
> >>>> }
> >>>> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
> >>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
> >>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
> >>>> callee->in_callback_fn = true;
> >>>> + callee->callback_ret_range = tnum_range(0, 1);
> >>>
> >>> Thanks for removing this restriction for callback functions!
> >>>
> >>> One quick question: is this line above needed? I think in
> >>> __check_func_call, we always call init_func_state() first before
> >>> calling set_find_vma_callback_state(), so after the init_func_state()
> >>> call, the callee->callback_ret_range will already be set to
> >>> tnum_range(0,1).
> >>>
> >>
> >> You're right, it's not strictly necessary. I think that the default range being
> >> tnum_range(0, 1) - although necessary for backwards compat - is unintuitive. So
> >> decided to be explicit with existing callbacks so folks didn't have to go
> >> searching for the default to understand what the ret_range is, and it's more
> >> obvious that callback_ret_range should be changed if existing helper code is
> >> reused.
> >
> > Maybe then it's better to keep callback_ret_range as range(0,0)
> > in init_func_state() to nudge/force other places to set it explicitly ?
>
> tnum_range(0, 0) sounds good to me.
>
> Would you like me to send this separately with that change, so it can be applied
> independently of rest of these changes?
Whichever way is faster.
We can always apply a patch or a few patches out of a bigger set.
* Re: [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range
2022-09-08 21:40 ` Alexei Starovoitov
@ 2022-09-08 23:10 ` Dave Marchevsky
0 siblings, 0 replies; 25+ messages in thread
From: Dave Marchevsky @ 2022-09-08 23:10 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Joanne Koong, bpf, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Kernel Team
On 9/8/22 5:40 PM, Alexei Starovoitov wrote:
> On Thu, Sep 8, 2022 at 2:37 PM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>
>> On 9/6/22 9:53 PM, Alexei Starovoitov wrote:
>>> On 9/6/22 4:42 PM, Dave Marchevsky wrote:
>>>> On 9/1/22 5:01 PM, Joanne Koong wrote:
>>>>> On Tue, Aug 30, 2022 at 11:03 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>>>>>>
>>>>>> Verifier logic to confirm that a callback function returns 0 or 1 was
>>>>>> added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper").
>>>>>> At the time, callback return value was only used to continue or stop
>>>>>> iteration.
>>>>>>
>>>>>> In order to support callbacks with a broader return value range, such as
>>>>>> those added further in this series, add a callback_ret_range to
>>>>>> bpf_func_state. Verifier's helpers which set in_callback_fn will also
>>>>>> set the new field, which the verifier will later use to check return
>>>>>> value bounds.
>>>>>>
>>>>>> Default to tnum_range(0, 1) instead of using tnum_unknown as a sentinel
>>>>>> value as the latter would prevent the valid range (0, U64_MAX) being
>>>>>> used.
>>>>>>
>>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>>> ---
>>>>>> include/linux/bpf_verifier.h | 1 +
>>>>>> kernel/bpf/verifier.c | 4 +++-
>>>>>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>>>>> index 2e3bad8640dc..9c017575c034 100644
>>>>>> --- a/include/linux/bpf_verifier.h
>>>>>> +++ b/include/linux/bpf_verifier.h
>>>>>> @@ -237,6 +237,7 @@ struct bpf_func_state {
>>>>>> */
>>>>>> u32 async_entry_cnt;
>>>>>> bool in_callback_fn;
>>>>>> + struct tnum callback_ret_range;
>>>>>> bool in_async_callback_fn;
>>>>>>
>>>>>> /* The following fields should be last. See copy_func_state() */
>>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>>> index 9bef8b41e737..68bfa7c28048 100644
>>>>>> --- a/kernel/bpf/verifier.c
>>>>>> +++ b/kernel/bpf/verifier.c
>>>>>> @@ -1745,6 +1745,7 @@ static void init_func_state(struct bpf_verifier_env *env,
>>>>>> state->callsite = callsite;
>>>>>> state->frameno = frameno;
>>>>>> state->subprogno = subprogno;
>>>>>> + state->callback_ret_range = tnum_range(0, 1);
>>>>>> init_reg_state(env, state);
>>>>>> mark_verifier_state_scratched(env);
>>>>>> }
>>>>>> @@ -6879,6 +6880,7 @@ static int set_find_vma_callback_state(struct bpf_verifier_env *env,
>>>>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
>>>>>> __mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
>>>>>> callee->in_callback_fn = true;
>>>>>> + callee->callback_ret_range = tnum_range(0, 1);
>>>>>
>>>>> Thanks for removing this restriction for callback functions!
>>>>>
>>>>> One quick question: is this line above needed? I think in
>>>>> __check_func_call, we always call init_func_state() first before
>>>>> calling set_find_vma_callback_state(), so after the init_func_state()
>>>>> call, the callee->callback_ret_range will already be set to
>>>>> tnum_range(0,1).
>>>>>
>>>>
>>>> You're right, it's not strictly necessary. I think that the default range being
>>>> tnum_range(0, 1) - although necessary for backwards compat - is unintuitive. So
>>>> decided to be explicit with existing callbacks so folks didn't have to go
>>>> searching for the default to understand what the ret_range is, and it's more
>>>> obvious that callback_ret_range should be changed if existing helper code is
>>>> reused.
>>>
>>> Maybe then it's better to keep callback_ret_range as range(0,0)
>>> in init_func_state() to nudge/force other places to set it explicitly ?
>>
>> tnum_range(0, 0) sounds good to me.
>>
>> Would you like me to send this separately with that change, so it can be applied
>> independently of rest of these changes?
>
> Whichever way is faster.
> We can always apply a patch or a few patches out of a bigger set.
Updated + rebased and sent as https://lore.kernel.org/bpf/20220908230716.2751723-1-davemarchevsky@fb.com/
end of thread, other threads: [~2022-09-08 23:10 UTC | newest]
Thread overview: 25+ messages
2022-08-30 17:27 [RFCv2 PATCH bpf-next 00/18] bpf: Introduce rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 01/18] bpf: Add verifier support for custom callback return range Dave Marchevsky
2022-09-01 21:01 ` Joanne Koong
2022-09-06 23:42 ` Dave Marchevsky
2022-09-07 1:53 ` Alexei Starovoitov
2022-09-08 21:36 ` Dave Marchevsky
2022-09-08 21:40 ` Alexei Starovoitov
2022-09-08 23:10 ` Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 02/18] bpf: Add verifier check for BPF_PTR_POISON retval and arg Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 03/18] bpf: Add rb_node_off to bpf_map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 04/18] bpf: Add rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 05/18] libbpf: Add support for private BSS map section Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 06/18] bpf: Add bpf_spin_lock member to rbtree Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 07/18] bpf: Add bpf_rbtree_{lock,unlock} helpers Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 08/18] bpf: Enforce spinlock hold for bpf_rbtree_{add,remove,find} Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 10/18] bpf: Verifier tracking of rbtree_spin_lock held Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 11/18] bpf: Check rbtree lock held during verification Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 12/18] bpf: Add OBJ_NON_OWNING_REF type flag Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 13/18] bpf: Add CONDITIONAL_RELEASE " Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 14/18] bpf: Introduce PTR_ITER and PTR_ITER_END type flags Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 15/18] selftests/bpf: Add rbtree map tests Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 16/18] selftests/bpf: Declarative lock definition test changes Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 17/18] selftests/bpf: Lock tracking " Dave Marchevsky
2022-08-30 17:27 ` [RFCv2 PATCH bpf-next 18/18] selftests/bpf: Rbtree static lock verification " Dave Marchevsky