From: Dave Marchevsky <davemarchevsky@fb.com>
To: <bpf@vger.kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kernel Team <kernel-team@fb.com>,
	Dave Marchevsky <davemarchevsky@fb.com>
Subject: [RFCv2 PATCH bpf-next 09/18] bpf: Support declarative association of lock with rbtree map
Date: Tue, 30 Aug 2022 10:27:50 -0700
Message-ID: <20220830172759.4069786-10-davemarchevsky@fb.com>
In-Reply-To: <20220830172759.4069786-1-davemarchevsky@fb.com>

This patch adds support for association of a bpf_spin_lock in an
internal arraymap with an rbtree, using the following pattern:

  struct bpf_spin_lock rbtree_lock SEC(".bss.private");

  struct {
          __uint(type, BPF_MAP_TYPE_RBTREE);
          __type(value, struct node_data);
          __array(lock, struct bpf_spin_lock);
  } rbtree SEC(".maps") = {
          .lock = {
                  [0] = &rbtree_lock,
          },
  };

There are a few benefits of this pattern over the existing "init lock as
part of map, use bpf_rbtree_get_lock" logic:

  * Multiple rbtrees, potentially in different compilation units, may
    share the same lock

  * The lock's lifetime does not need to be tied to the map's lifetime,
    aside from needing to strictly exceed it

  * Can move from bpf_rbtree_{lock,unlock} to more generic
    bpf_{lock,unlock} helpers while still retaining static verification
    of locking behavior
    * struct bpf_rbtree still keeps lock pointer and this declarative
      association guarantees that the pointer address is known at rbtree
      map creation time
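
The shared-lock benefit above can be sketched as two rbtree maps,
potentially in different compilation units, pointing at the same lock
symbol. This is an illustrative fragment built on the SEC(".bss.private")
and __array(lock, ...) conventions introduced earlier in this series; the
node type and map names are hypothetical:

  /* One lock, visible to both map definitions */
  struct bpf_spin_lock shared_lock SEC(".bss.private");

  struct node_data {
          struct rb_node node;
          __u32 key;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_RBTREE);
          __type(value, struct node_data);
          __array(lock, struct bpf_spin_lock);
  } rbtree_a SEC(".maps") = {
          .lock = { [0] = &shared_lock, },
  };

  struct {
          __uint(type, BPF_MAP_TYPE_RBTREE);
          __type(value, struct node_data);
          __array(lock, struct bpf_spin_lock);
  } rbtree_b SEC(".maps") = {
          .lock = { [0] = &shared_lock, },
  };

A program holding shared_lock may then manipulate both trees under a
single critical section.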

Implementation notes:

The mechanics of declarative lock association are heavily inspired by
declarative map-in-map inner map definition using __array(values, ...).
Similarly to map-in-map "values", libbpf's bpf_map init_slots is used to
hold the target map, which in the above example is the .bss.private
internal arraymap. However, unlike "values" inner maps, the lock is a
map value within this map, so it's necessary to also pass the offset of
the lock symbol within the internal arraymap.

No uapi changes are necessary, as the existing BPF_MAP_CREATE map_extra
field is used to pass the pair (fd of lock map, offset of lock symbol
within lock map) to the kernel when creating the rbtree map.
rbtree_map_alloc can then use this pair to find the address of the lock.
The logic here is equivalent to the verifier's rewrite of
BPF_PSEUDO_MAP_FD in check_ld_imm and resolve_pseudo_ldimm64: get the
actual struct bpf_map using the fd, get the address of the map's value
region using map_direct_value_addr, and add the offset to get the actual
lock address.

This does introduce a dependency on the internal map containing the lock
("lock map") on both the libbpf and kernel sides. For the former, it's
now necessary to initialize internal global data maps before user BTF
maps, as the fd of the lock map must be known when the data relocation
on the rbtree map is performed. For the latter, the rbtree now holds a
refcount on the lock map to ensure it cannot be freed until after the
rbtree map is.

[
  RFC NOTE: This patch doesn't change verifier behavior; it's still doing
  dynamic lock checking. Further patches in the series change this
  behavior. This patch and the rest of the locking patches should be
  squashed; it is left in this state for now to make it easier to include
  or toss things piecemeal after feedback.
]

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/rbtree.c    | 47 +++++++++++++++++---
 kernel/bpf/syscall.c   |  8 +++-
 tools/lib/bpf/libbpf.c | 99 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 136 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/rbtree.c b/kernel/bpf/rbtree.c
index 85a1d35818d0..c61662822511 100644
--- a/kernel/bpf/rbtree.c
+++ b/kernel/bpf/rbtree.c
@@ -11,6 +11,7 @@ struct bpf_rbtree {
 	struct bpf_map map;
 	struct rb_root_cached root;
 	struct bpf_spin_lock *lock;
+	struct bpf_map *lock_map;
 };
 
 static bool __rbtree_lock_held(struct bpf_rbtree *tree)
@@ -26,10 +27,22 @@ static int rbtree_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }
 
+static void __rbtree_map_free(struct bpf_rbtree *tree)
+{
+	if (tree->lock_map)
+		bpf_map_put(tree->lock_map);
+	else if (tree->lock)
+		kfree(tree->lock);
+	bpf_map_area_free(tree);
+}
+
 static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
 {
+	u32 lock_map_ufd, lock_map_offset;
 	struct bpf_rbtree *tree;
+	u64 lock_map_addr;
 	int numa_node;
+	int err;
 
 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
@@ -45,14 +58,35 @@ static struct bpf_map *rbtree_map_alloc(union bpf_attr *attr)
 	tree->root = RB_ROOT_CACHED;
 	bpf_map_init_from_attr(&tree->map, attr);
 
-	tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock),
-				     GFP_KERNEL | __GFP_NOWARN);
-	if (!tree->lock) {
-		bpf_map_area_free(tree);
-		return ERR_PTR(-ENOMEM);
+	if (!attr->map_extra) {
+		tree->lock = bpf_map_kzalloc(&tree->map, sizeof(struct bpf_spin_lock),
+					     GFP_KERNEL | __GFP_NOWARN);
+		if (!tree->lock) {
+			err = -ENOMEM;
+			goto err_free;
+		}
+	} else {
+		lock_map_ufd = (u32)(attr->map_extra >> 32);
+		lock_map_offset = (u32)attr->map_extra;
+		tree->lock_map = bpf_map_get(lock_map_ufd);
+		if (IS_ERR(tree->lock_map) || !tree->lock_map->ops->map_direct_value_addr) {
+			err = PTR_ERR(tree->lock_map);
+			tree->lock_map = NULL;
+			goto err_free;
+		}
+
+		err = tree->lock_map->ops->map_direct_value_addr(tree->lock_map, &lock_map_addr,
+								 lock_map_offset);
+		if (err)
+			goto err_free;
+
+		tree->lock = (struct bpf_spin_lock *)(lock_map_addr + lock_map_offset);
 	}
 
 	return &tree->map;
+err_free:
+	__rbtree_map_free(tree);
+	return ERR_PTR(err);
 }
 
 static struct rb_node *rbtree_map_alloc_node(struct bpf_map *map, size_t sz)
@@ -159,8 +193,7 @@ static void rbtree_map_free(struct bpf_map *map)
 
 	bpf_rbtree_postorder_for_each_entry_safe(pos, n, &tree->root.rb_root)
 		kfree(pos);
-	kfree(tree->lock);
-	bpf_map_area_free(tree);
+	__rbtree_map_free(tree);
 }
 
 static int rbtree_map_check_btf(const struct bpf_map *map,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3947fbd137af..fa1220394462 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1066,6 +1066,12 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
+static bool map_uses_map_extra(enum bpf_map_type type)
+{
+	return type == BPF_MAP_TYPE_BLOOM_FILTER ||
+	       type == BPF_MAP_TYPE_RBTREE;
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -1087,7 +1093,7 @@ static int map_create(union bpf_attr *attr)
 		return -EINVAL;
 	}
 
-	if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
+	if (!map_uses_map_extra(attr->map_type) &&
 	    attr->map_extra != 0)
 		return -EINVAL;
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a6dd53e0c4b4..10c840137bac 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1429,6 +1429,11 @@ bpf_object__init_kversion(struct bpf_object *obj, void *data, size_t size)
 	return 0;
 }
 
+static bool bpf_map_type__uses_lock_def(enum bpf_map_type type)
+{
+	return type == BPF_MAP_TYPE_RBTREE;
+}
+
 static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
 {
 	if (type == BPF_MAP_TYPE_ARRAY_OF_MAPS ||
@@ -1517,6 +1522,16 @@ static size_t bpf_map_mmap_sz(const struct bpf_map *map)
 	return map_sz;
 }
 
+static bool internal_map_in_custom_section(const char *real_name)
+{
+	if (strchr(real_name + 1, '.') != NULL) {
+		if (strcmp(real_name, BSS_SEC_PRIVATE) == 0)
+			return false;
+		return true;
+	}
+	return false;
+}
+
 static char *internal_map_name(struct bpf_object *obj, const char *real_name)
 {
 	char map_name[BPF_OBJ_NAME_LEN], *p;
@@ -1559,7 +1574,7 @@ static char *internal_map_name(struct bpf_object *obj, const char *real_name)
 		sfx_len = BPF_OBJ_NAME_LEN - 1;
 
 	/* if there are two or more dots in map name, it's a custom dot map */
-	if (strchr(real_name + 1, '.') != NULL)
+	if (internal_map_in_custom_section(real_name))
 		pfx_len = 0;
 	else
 		pfx_len = min((size_t)BPF_OBJ_NAME_LEN - sfx_len - 1, strlen(obj->name));
@@ -2331,10 +2346,23 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
 		} else if (strcmp(name, "map_extra") == 0) {
 			__u32 map_extra;
 
+			if (bpf_map_type__uses_lock_def(map_def->map_type)) {
+				pr_warn("map '%s': can't set map_extra for map using 'lock' def.\n",
+					map_name);
+				return -EINVAL;
+			}
+
 			if (!get_map_field_int(map_name, btf, m, &map_extra))
 				return -EINVAL;
 			map_def->map_extra = map_extra;
 			map_def->parts |= MAP_DEF_MAP_EXTRA;
+		} else if (strcmp(name, "lock") == 0) {
+			if (!bpf_map_type__uses_lock_def(map_def->map_type)) {
+				pr_warn("map '%s': can't set 'lock' for map.\n", map_name);
+				return -ENOTSUP;
+			}
+			/* TODO: More sanity checking
+			 */
 		} else {
 			if (strict) {
 				pr_warn("map '%s': unknown field '%s'.\n", map_name, name);
@@ -2603,8 +2631,8 @@ static int bpf_object__init_maps(struct bpf_object *obj,
 	strict = !OPTS_GET(opts, relaxed_maps, false);
 	pin_root_path = OPTS_GET(opts, pin_root_path, NULL);
 
-	err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path);
 	err = err ?: bpf_object__init_global_data_maps(obj);
+	err = err ?: bpf_object__init_user_btf_maps(obj, strict, pin_root_path);
 	err = err ?: bpf_object__init_kconfig_map(obj);
 	err = err ?: bpf_object__init_struct_ops_maps(obj);
 
@@ -4865,6 +4893,25 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
 		map_info.map_extra == map->map_extra);
 }
 
+static struct bpf_map *find_internal_map_by_shndx(struct bpf_object *obj,
+						  int shndx)
+{
+	struct bpf_map *map;
+	int i;
+
+	for (i = 0; i < obj->nr_maps; i++) {
+		map = &obj->maps[i];
+
+		if (!bpf_map__is_internal(map))
+			continue;
+
+		if (map->sec_idx == shndx)
+			return map;
+	}
+
+	return NULL;
+}
+
 static int
 bpf_object__reuse_map(struct bpf_map *map)
 {
@@ -4981,6 +5028,19 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 		}
 		if (map->inner_map_fd >= 0)
 			create_attr.inner_map_fd = map->inner_map_fd;
+	} else if (bpf_map_type__uses_lock_def(def->type)) {
+		if (map->init_slots_sz != 1) {
+			pr_warn("map '%s': expecting single lock def, actual count %d\n",
+				map->name, map->init_slots_sz);
+			return -EINVAL;
+		}
+
+		if (bpf_map__fd(map->init_slots[0]) < 0) {
+			pr_warn("map '%s': failed to find lock map fd\n", map->name);
+			return -EINVAL;
+		}
+
+		create_attr.map_extra |= (__u64)bpf_map__fd(map->init_slots[0]) << 32;
 	}
 
 	switch (def->type) {
@@ -5229,8 +5289,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 					goto err_out;
 				}
 			}
-
-			if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) {
+			if (map->init_slots_sz && bpf_map_type__is_map_in_map(map->def.type)) {
 				err = init_map_in_map_slots(obj, map);
 				if (err < 0) {
 					zclose(map->fd);
@@ -6424,9 +6483,9 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 	const struct btf_type *sec, *var, *def;
 	struct bpf_map *map = NULL, *targ_map = NULL;
 	struct bpf_program *targ_prog = NULL;
-	bool is_prog_array, is_map_in_map;
 	const struct btf_member *member;
 	const char *name, *mname, *type;
+	bool is_prog_array;
 	unsigned int moff;
 	Elf64_Sym *sym;
 	Elf64_Rel *rel;
@@ -6474,10 +6533,12 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 			return -EINVAL;
 		}
 
-		is_map_in_map = bpf_map_type__is_map_in_map(map->def.type);
+		/* PROG_ARRAY passes prog pointers using init_slots, other map
+		 * types pass map pointers
+		 */
 		is_prog_array = map->def.type == BPF_MAP_TYPE_PROG_ARRAY;
-		type = is_map_in_map ? "map" : "prog";
-		if (is_map_in_map) {
+		type = is_prog_array ? "prog" : "map";
+		if (bpf_map_type__is_map_in_map(map->def.type)) {
 			if (sym->st_shndx != obj->efile.btf_maps_shndx) {
 				pr_warn(".maps relo #%d: '%s' isn't a BTF-defined map\n",
 					i, name);
@@ -6509,6 +6570,24 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 					i, name);
 				return -LIBBPF_ERRNO__RELOC;
 			}
+		} else if (bpf_map_type__uses_lock_def(map->def.type)) {
+			targ_map = find_internal_map_by_shndx(obj, sym->st_shndx);
+			if (!targ_map) {
+				pr_warn(".maps relo #%d: '%s' isn't a valid map reference\n",
+					i, name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+
+			/* This shouldn't happen, check in parse_btf_map_def
+			 * should catch this, but to be safe let's prevent
+			 * map_extra overwrite
+			 */
+			if (map->map_extra) {
+				pr_warn(".maps rbtree relo #%d: map '%s' has ", i, map->name);
+				pr_warn("map_extra, can't relo lock, internal error.\n");
+				return -EINVAL;
+			}
+			map->map_extra = sym->st_value;
 		} else {
 			return -EINVAL;
 		}
@@ -6519,7 +6598,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 			return -EINVAL;
 		member = btf_members(def) + btf_vlen(def) - 1;
 		mname = btf__name_by_offset(obj->btf, member->name_off);
-		if (strcmp(mname, "values"))
+		if (strcmp(mname, "values") && strcmp(mname, "lock"))
 			return -EINVAL;
 
 		moff = btf_member_bit_offset(def, btf_vlen(def) - 1) / 8;
@@ -6543,7 +6622,7 @@ static int bpf_object__collect_map_relos(struct bpf_object *obj,
 			       (new_sz - map->init_slots_sz) * host_ptr_sz);
 			map->init_slots_sz = new_sz;
 		}
-		map->init_slots[moff] = is_map_in_map ? (void *)targ_map : (void *)targ_prog;
+		map->init_slots[moff] = is_prog_array ? (void *)targ_prog : (void *)targ_map;
 
 		pr_debug(".maps relo #%d: map '%s' slot [%d] points to %s '%s'\n",
 			 i, map->name, moff, type, name);
-- 
2.30.2

