BPF List
* [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
@ 2024-04-12 21:08 Kui-Feng Lee
  2024-04-12 21:08 ` [PATCH bpf-next v2 01/11] bpf: Remove unnecessary checks on the offset of btf_field Kui-Feng Lee
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Arrays of kptr, bpf_rb_root, and bpf_list_head did not work as
global variables, because these types are initialized and verified in
a special manner by the kernel. This patchset allows BPF programs to
declare arrays of kptr, bpf_rb_root, and bpf_list_head in the global
namespace.

The main change is to add "nelems" to btf_field. If a btf_field
represents an array, the value of "nelems" is the number of elements
in the array; otherwise, "nelems" is 1. The verifier verifies these
types based on the information provided by the btf_field.

If a btf_field represents an array, the value of "size" is the size
of the entire array; dividing "size" by "nelems" gives the size of
one element. The value of "offset" is the offset of the beginning of
the array. Putting this together, we can determine the offset of each
element in an array. For example,

    struct bpf_cpumask __kptr * global_mask_array[2];

the declaration above describes an array containing two kptrs. The
"size" in the corresponding btf_field will be 16 (the size of the
array), and "nelems" will be 2. If the "offset" of the btf_field is
0xff00 from the beginning of the data section, the first kptr is at
0xff00 and the second kptr is at 0xff08.

All arrays are flattened to get the value of "nelems". For example,

    struct bpf_cpumask __kptr * global_mask_array[2][3];

the array above is flattened to a one-dimensional array with six
elements.

---
Changes from v1:

 - Move the check of element alignment out of btf_field_cmp() to
   btf_record_find().

 - Change the order of the previous patch 4 "bpf:
   check_map_kptr_access() compute the offset from the reg state" as
   the patch 7 now.

 - Reject BPF_RB_NODE and BPF_LIST_NODE with nelems > 1.

 - Rephrase the commit log of the patch "bpf: check_map_access() with
   the knowledge of arrays" to clarify the alignment on elements.

v1: https://lore.kernel.org/bpf/20240410004150.2917641-1-thinker.li@gmail.com/

Kui-Feng Lee (11):
  bpf: Remove unnecessary checks on the offset of btf_field.
  bpf: Remove unnecessary call to btf_field_type_size().
  bpf: Add nelems to struct btf_field_info and btf_field.
  bpf: initialize/free array of btf_field(s).
  bpf: Find btf_field with the knowledge of arrays.
  bpf: check_map_access() with the knowledge of arrays.
  bpf: check_map_kptr_access() compute the offset from the reg state.
  bpf: Enable and verify btf_field arrays.
  selftests/bpf: Test global kptr arrays.
  selftests/bpf: Test global bpf_rb_root arrays.
  selftests/bpf: Test global bpf_list_head arrays.

 include/linux/bpf.h                           |   8 +
 kernel/bpf/btf.c                              |  28 +++-
 kernel/bpf/syscall.c                          |  59 ++++---
 kernel/bpf/verifier.c                         |  26 ++--
 .../selftests/bpf/prog_tests/cpumask.c        |   3 +
 .../selftests/bpf/prog_tests/linked_list.c    |   6 +
 .../testing/selftests/bpf/prog_tests/rbtree.c |  23 +++
 .../selftests/bpf/progs/cpumask_success.c     | 147 ++++++++++++++++++
 .../testing/selftests/bpf/progs/linked_list.c |  24 +++
 tools/testing/selftests/bpf/progs/rbtree.c    |  63 ++++++++
 10 files changed, 353 insertions(+), 34 deletions(-)

-- 
2.34.1



* [PATCH bpf-next v2 01/11] bpf: Remove unnecessary checks on the offset of btf_field.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee, Eduard Zingerman

reg_find_field_offset() always returns a btf_field with a matching offset
value, so checking the offset of the returned btf_field is unnecessary.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2aad6d90550f..0d44940c12d2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11529,7 +11529,7 @@ __process_kf_arg_ptr_to_graph_node(struct bpf_verifier_env *env,
 
 	node_off = reg->off + reg->var_off.value;
 	field = reg_find_field_offset(reg, node_off, node_field_type);
-	if (!field || field->offset != node_off) {
+	if (!field) {
 		verbose(env, "%s not found at offset=%u\n", node_type_name, node_off);
 		return -EINVAL;
 	}
-- 
2.34.1



* [PATCH bpf-next v2 02/11] bpf: Remove unnecessary call to btf_field_type_size().
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee, Eduard Zingerman

field->size has been initialized by btf_parse_fields() with the value
returned by btf_field_type_size(). Use it instead of calling
btf_field_type_size() again.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/btf.c      | 2 +-
 kernel/bpf/verifier.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 90c4a32d89ff..e71ea78a4db9 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6672,7 +6672,7 @@ int btf_struct_access(struct bpf_verifier_log *log,
 		for (i = 0; i < rec->cnt; i++) {
 			struct btf_field *field = &rec->fields[i];
 			u32 offset = field->offset;
-			if (off < offset + btf_field_type_size(field->type) && offset < off + size) {
+			if (off < offset + field->size && offset < off + size) {
 				bpf_log(log,
 					"direct access to %s is disallowed\n",
 					btf_field_type_name(field->type));
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0d44940c12d2..86adacc5f76c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5432,7 +5432,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 		 * this program. To check that [x1, x2) overlaps with [y1, y2),
 		 * it is sufficient to check x1 < y2 && y1 < x2.
 		 */
-		if (reg->smin_value + off < p + btf_field_type_size(field->type) &&
+		if (reg->smin_value + off < p + field->size &&
 		    p < reg->umax_value + off + size) {
 			switch (field->type) {
 			case BPF_KPTR_UNREF:
-- 
2.34.1



* [PATCH bpf-next v2 03/11] bpf: Add nelems to struct btf_field_info and btf_field.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

To support global field arrays for bpf_rb_root, bpf_list_head, and kptr,
add nelems (the number of elements) to btf_field_info and btf_field. The
value of nelems is the length of a field array; nested arrays are
flattened to compute it.

If a field is not an array, the value of nelems is 1. In other words, an
array with only one element cannot be distinguished from a field whose
type is the array's element type. This is not a problem, however, given
the offset and size in btf_field.

field->size will be the size of the entire array if the field is an array.

For now, the nelems of btf_field is always set to 1 to keep arrays
deactivated, while the nelems of btf_field_info holds the number of
elements found in data sections. A later patch will activate arrays.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 include/linux/bpf.h |  1 +
 kernel/bpf/btf.c    | 24 +++++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5034c1b4ded7..cab479925dfd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -226,6 +226,7 @@ struct btf_field_graph_root {
 struct btf_field {
 	u32 offset;
 	u32 size;
+	u32 nelems;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index e71ea78a4db9..6fb482789f8e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3289,6 +3289,7 @@ enum {
 struct btf_field_info {
 	enum btf_field_type type;
 	u32 off;
+	u32 nelems;
 	union {
 		struct {
 			u32 type_id;
@@ -3548,6 +3549,7 @@ static int btf_find_struct_field(const struct btf *btf,
 			continue;
 		if (idx >= info_cnt)
 			return -E2BIG;
+		info[idx].nelems = 1;
 		++idx;
 	}
 	return idx;
@@ -3565,6 +3567,19 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 	for_each_vsi(i, t, vsi) {
 		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
 		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(var_type); j++) {
+			array = btf_array(var_type);
+			nelems *= array->nelems;
+			var_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;
 
 		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
 						field_mask, &seen_mask, &align, &sz);
@@ -3574,7 +3589,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			return field_type;
 
 		off = vsi->offset;
-		if (vsi->size != sz)
+		if (vsi->size != sz * nelems)
 			continue;
 		if (off % align)
 			continue;
@@ -3582,9 +3597,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 		switch (field_type) {
 		case BPF_SPIN_LOCK:
 		case BPF_TIMER:
+		case BPF_REFCOUNT:
 		case BPF_LIST_NODE:
 		case BPF_RB_NODE:
-		case BPF_REFCOUNT:
+			if (nelems != 1)
+				continue;
 			ret = btf_find_struct(btf, var_type, off, sz, field_type,
 					      idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3615,7 +3632,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			continue;
 		if (idx >= info_cnt)
 			return -E2BIG;
-		++idx;
+		info[idx++].nelems = nelems;
 	}
 	return idx;
 }
@@ -3834,6 +3851,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->fields[i].offset = info_arr[i].off;
 		rec->fields[i].type = info_arr[i].type;
 		rec->fields[i].size = field_type_size;
+		rec->fields[i].nelems = 1;
 
 		switch (info_arr[i].type) {
 		case BPF_SPIN_LOCK:
-- 
2.34.1



* [PATCH bpf-next v2 04/11] bpf: initialize/free array of btf_field(s).
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Initialize and free each element in a btf_field array based on the values
of nelems and size in btf_field. For nested arrays, nelems is the length
of the flattened array.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 include/linux/bpf.h  |  7 +++++++
 kernel/bpf/syscall.c | 39 ++++++++++++++++++++++++---------------
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cab479925dfd..b25dd498b737 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -390,6 +390,9 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 
 static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 {
+	u32 elem_size;
+	int i;
+
 	memset(addr, 0, field->size);
 
 	switch (field->type) {
@@ -400,6 +403,10 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 		RB_CLEAR_NODE((struct rb_node *)addr);
 		break;
 	case BPF_LIST_HEAD:
+		elem_size = field->size / field->nelems;
+		for (i = 0; i < field->nelems; i++, addr += elem_size)
+			INIT_LIST_HEAD((struct list_head *)addr);
+		break;
 	case BPF_LIST_NODE:
 		INIT_LIST_HEAD((struct list_head *)addr);
 		break;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7d392ec83655..cdabb673d358 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -672,6 +672,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 		const struct btf_field *field = &fields[i];
 		void *field_ptr = obj + field->offset;
 		void *xchgd_field;
+		u32 elem_size = field->size / field->nelems;
+		int j;
 
 		switch (fields[i].type) {
 		case BPF_SPIN_LOCK:
@@ -680,35 +682,42 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			bpf_timer_cancel_and_free(field_ptr);
 			break;
 		case BPF_KPTR_UNREF:
-			WRITE_ONCE(*(u64 *)field_ptr, 0);
+			for (j = 0; j < field->nelems; j++, field_ptr += elem_size)
+				WRITE_ONCE(*(u64 *)field_ptr, 0);
 			break;
 		case BPF_KPTR_REF:
 		case BPF_KPTR_PERCPU:
-			xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
-			if (!xchgd_field)
-				break;
-
-			if (!btf_is_kernel(field->kptr.btf)) {
+			if (!btf_is_kernel(field->kptr.btf))
 				pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
 									   field->kptr.btf_id);
-				migrate_disable();
-				__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
-								 pointee_struct_meta->record : NULL,
-								 fields[i].type == BPF_KPTR_PERCPU);
-				migrate_enable();
-			} else {
-				field->kptr.dtor(xchgd_field);
+
+			for (j = 0; j < field->nelems; j++, field_ptr += elem_size) {
+				xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
+				if (!xchgd_field)
+					continue;
+
+				if (!btf_is_kernel(field->kptr.btf)) {
+					migrate_disable();
+					__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
+							    pointee_struct_meta->record : NULL,
+							    fields[i].type == BPF_KPTR_PERCPU);
+					migrate_enable();
+				} else {
+					field->kptr.dtor(xchgd_field);
+				}
 			}
 			break;
 		case BPF_LIST_HEAD:
 			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
 				continue;
-			bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
+			for (j = 0; j < field->nelems; j++, field_ptr += elem_size)
+				bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
 		case BPF_RB_ROOT:
 			if (WARN_ON_ONCE(rec->spin_lock_off < 0))
 				continue;
-			bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
+			for (j = 0; j < field->nelems; j++, field_ptr += elem_size)
+				bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
 			break;
 		case BPF_LIST_NODE:
 		case BPF_RB_NODE:
-- 
2.34.1



* [PATCH bpf-next v2 05/11] bpf: Find btf_field with the knowledge of arrays.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Modify btf_record_find() so that it locates a btf_field by comparing the
provided offset with the offsets of elements, instead of the offset of the
entire array, when a btf_field represents an array. This update is crucial
for accommodating btf_field arrays in upcoming patches.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/syscall.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdabb673d358..1c8a9bc00d17 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -516,11 +516,16 @@ int bpf_map_alloc_pages(const struct bpf_map *map, gfp_t gfp, int nid,
 static int btf_field_cmp(const void *a, const void *b)
 {
 	const struct btf_field *f1 = a, *f2 = b;
+	int gt = 1, lt = -1;
 
+	if (f2->nelems == 0) {
+		swap(f1, f2);
+		swap(gt, lt);
+	}
 	if (f1->offset < f2->offset)
-		return -1;
-	else if (f1->offset > f2->offset)
-		return 1;
+		return lt;
+	else if (f1->offset >= f2->offset + f2->size)
+		return gt;
 	return 0;
 }
 
@@ -528,12 +533,19 @@ struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset,
 				  u32 field_mask)
 {
 	struct btf_field *field;
+	struct btf_field key = {
+		.offset = offset,
+		.size = 0,	/* as a label for this key */
+	};
 
 	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask))
 		return NULL;
-	field = bsearch(&offset, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp);
+	field = bsearch(&key, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp);
 	if (!field || !(field->type & field_mask))
 		return NULL;
+	if ((offset - field->offset) % (field->size / field->nelems))
+		/* not aligned with an array element */
+		return NULL;
 	return field;
 }
 
-- 
2.34.1



* [PATCH bpf-next v2 06/11] bpf: check_map_access() with the knowledge of arrays.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Ensure that a map access is aligned with an array element if the
corresponding btf_field, if any, represents an array.

Without this change, an access would have to align with the beginning of
the array; otherwise, it would be rejected.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/verifier.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 86adacc5f76c..302aad33a7f4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5426,7 +5426,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 	rec = map->record;
 	for (i = 0; i < rec->cnt; i++) {
 		struct btf_field *field = &rec->fields[i];
-		u32 p = field->offset;
+		u32 p = field->offset, var_p, elem_size;
 
 		/* If any part of a field  can be touched by load/store, reject
 		 * this program. To check that [x1, x2) overlaps with [y1, y2),
@@ -5446,7 +5446,10 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 					verbose(env, "kptr access cannot have variable offset\n");
 					return -EACCES;
 				}
-				if (p != off + reg->var_off.value) {
+				var_p = off + reg->var_off.value;
+				elem_size = field->size / field->nelems;
+				if (var_p < p || var_p >= p + field->size ||
+				    (var_p - p) % elem_size) {
 					verbose(env, "kptr access misaligned expected=%u off=%llu\n",
 						p, off + reg->var_off.value);
 					return -EACCES;
-- 
2.34.1



* [PATCH bpf-next v2 07/11] bpf: check_map_kptr_access() compute the offset from the reg state.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee, Eduard Zingerman

Previously, check_map_kptr_access() assumed that the accessed offset was
identical to the offset in the btf_field. However, once field arrays are
supported, the accessed offset no longer necessarily matches the offset
in the btf_field: it may refer to an element in an array, while the
offset in the btf_field refers to the beginning of the array.

To handle arrays, compute the offset from the reg state instead.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/verifier.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 302aad33a7f4..67b89d4ea1ba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5349,18 +5349,19 @@ static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr
 }
 
 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
-				 int value_regno, int insn_idx,
+				 u32 offset, int value_regno, int insn_idx,
 				 struct btf_field *kptr_field)
 {
 	struct bpf_insn *insn = &env->prog->insnsi[insn_idx];
 	int class = BPF_CLASS(insn->code);
-	struct bpf_reg_state *val_reg;
+	struct bpf_reg_state *val_reg, *reg;
 
 	/* Things we already checked for in check_map_access and caller:
 	 *  - Reject cases where variable offset may touch kptr
 	 *  - size of access (must be BPF_DW)
 	 *  - tnum_is_const(reg->var_off)
-	 *  - kptr_field->offset == off + reg->var_off.value
+	 *  - kptr_field->offset + kptr_field->size * i / kptr_field->nelems
+	 *    == off + reg->var_off.value, where i is an index into the array
 	 */
 	/* Only BPF_[LDX,STX,ST] | BPF_MEM | BPF_DW is supported */
 	if (BPF_MODE(insn->code) != BPF_MEM) {
@@ -5393,8 +5394,9 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 			return -EACCES;
 	} else if (class == BPF_ST) {
 		if (insn->imm) {
-			verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%u\n",
-				kptr_field->offset);
+			reg = reg_state(env, regno);
+			verbose(env, "BPF_ST imm must be 0 when storing to kptr at off=%llu\n",
+				reg->var_off.value + offset);
 			return -EACCES;
 		}
 	} else {
@@ -6784,7 +6786,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			kptr_field = btf_record_find(reg->map_ptr->record,
 						     off + reg->var_off.value, BPF_KPTR);
 		if (kptr_field) {
-			err = check_map_kptr_access(env, regno, value_regno, insn_idx, kptr_field);
+			err = check_map_kptr_access(env, regno, off, value_regno,
+						    insn_idx, kptr_field);
 		} else if (t == BPF_READ && value_regno >= 0) {
 			struct bpf_map *map = reg->map_ptr;
 
-- 
2.34.1



* [PATCH bpf-next v2 08/11] bpf: Enable and verify btf_field arrays.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Set the value of "nelems" of a btf_field to the length of the flattened
array if the btf_field represents an array, and set "size" to the size of
the entire array rather than the size of an individual element.

Once "nelems" and "size" reflect the length and size of arrays
respectively, BPF programs can easily declare multiple kptr,
bpf_rb_root, or bpf_list_head fields in a global array.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 kernel/bpf/btf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 6fb482789f8e..ae17d3996843 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3835,7 +3835,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 	rec->timer_off = -EINVAL;
 	rec->refcount_off = -EINVAL;
 	for (i = 0; i < cnt; i++) {
-		field_type_size = btf_field_type_size(info_arr[i].type);
+		field_type_size = btf_field_type_size(info_arr[i].type) * info_arr[i].nelems;
 		if (info_arr[i].off + field_type_size > value_size) {
 			WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
 			ret = -EFAULT;
@@ -3851,7 +3851,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 		rec->fields[i].offset = info_arr[i].off;
 		rec->fields[i].type = info_arr[i].type;
 		rec->fields[i].size = field_type_size;
-		rec->fields[i].nelems = 1;
+		rec->fields[i].nelems = info_arr[i].nelems;
 
 		switch (info_arr[i].type) {
 		case BPF_SPIN_LOCK:
-- 
2.34.1



* [PATCH bpf-next v2 09/11] selftests/bpf: Test global kptr arrays.
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Make sure that BPF programs can declare global kptr arrays.

An array with only one element is a special case: its element is treated
like a non-array kptr field. Nested arrays are also tested to ensure they
are handled properly.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 .../selftests/bpf/prog_tests/cpumask.c        |   3 +
 .../selftests/bpf/progs/cpumask_success.c     | 147 ++++++++++++++++++
 2 files changed, 150 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/cpumask.c b/tools/testing/selftests/bpf/prog_tests/cpumask.c
index ecf89df78109..bba601e235f6 100644
--- a/tools/testing/selftests/bpf/prog_tests/cpumask.c
+++ b/tools/testing/selftests/bpf/prog_tests/cpumask.c
@@ -18,6 +18,9 @@ static const char * const cpumask_success_testcases[] = {
 	"test_insert_leave",
 	"test_insert_remove_release",
 	"test_global_mask_rcu",
+	"test_global_mask_array_one_rcu",
+	"test_global_mask_array_rcu",
+	"test_global_mask_array_l2_rcu",
 	"test_cpumask_weight",
 };
 
diff --git a/tools/testing/selftests/bpf/progs/cpumask_success.c b/tools/testing/selftests/bpf/progs/cpumask_success.c
index 7a1e64c6c065..9d76d85680d7 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_success.c
+++ b/tools/testing/selftests/bpf/progs/cpumask_success.c
@@ -11,6 +11,9 @@
 char _license[] SEC("license") = "GPL";
 
 int pid, nr_cpus;
+private(MASK) static struct bpf_cpumask __kptr * global_mask_array[2];
+private(MASK) static struct bpf_cpumask __kptr * global_mask_array_l2[2][1];
+private(MASK) static struct bpf_cpumask __kptr * global_mask_array_one[1];
 
 static bool is_test_task(void)
 {
@@ -460,6 +463,150 @@ int BPF_PROG(test_global_mask_rcu, struct task_struct *task, u64 clone_flags)
 	return 0;
 }
 
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_global_mask_array_one_rcu, struct task_struct *task, u64 clone_flags)
+{
+	struct bpf_cpumask *local, *prev;
+
+	if (!is_test_task())
+		return 0;
+
+	/* Kptr arrays with one element are special cased, being treated
+	 * just like a single pointer.
+	 */
+
+	local = create_cpumask();
+	if (!local)
+		return 0;
+
+	prev = bpf_kptr_xchg(&global_mask_array_one[0], local);
+	if (prev) {
+		bpf_cpumask_release(prev);
+		err = 3;
+		return 0;
+	}
+
+	bpf_rcu_read_lock();
+	local = global_mask_array_one[0];
+	if (!local) {
+		err = 4;
+		bpf_rcu_read_unlock();
+		return 0;
+	}
+
+	bpf_rcu_read_unlock();
+
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_global_mask_array_rcu, struct task_struct *task, u64 clone_flags)
+{
+	struct bpf_cpumask *local;
+
+	if (!is_test_task())
+		return 0;
+
+	/* Check that the two kptrs in the array work independently */
+
+	local = create_cpumask();
+	if (!local)
+		return 0;
+
+	bpf_rcu_read_lock();
+
+	local = bpf_kptr_xchg(&global_mask_array[0], local);
+	if (local) {
+		err = 1;
+		goto err_exit;
+	}
+
+	/* global_mask_array => [<mask>, NULL] */
+	if (!global_mask_array[0] || global_mask_array[1]) {
+		err = 2;
+		goto err_exit;
+	}
+
+	local = create_cpumask();
+	if (!local) {
+		err = 9;
+		goto err_exit;
+	}
+
+	local = bpf_kptr_xchg(&global_mask_array[1], local);
+	if (local) {
+		err = 10;
+		goto err_exit;
+	}
+
+	/* global_mask_array => [<mask 1>, <mask 2>] */
+	if (!global_mask_array[0] || !global_mask_array[1] ||
+	    global_mask_array[0] == global_mask_array[1]) {
+		err = 11;
+		goto err_exit;
+	}
+
+err_exit:
+	if (local)
+		bpf_cpumask_release(local);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_global_mask_array_l2_rcu, struct task_struct *task, u64 clone_flags)
+{
+	struct bpf_cpumask *local;
+
+	if (!is_test_task())
+		return 0;
+
+	/* Check that two kptrs in the array work independently */
+
+	local = create_cpumask();
+	if (!local)
+		return 0;
+
+	bpf_rcu_read_lock();
+
+	local = bpf_kptr_xchg(&global_mask_array_l2[0][0], local);
+	if (local) {
+		err = 1;
+		goto err_exit;
+	}
+
+	/* global_mask_array => [[<mask>], [NULL]] */
+	if (!global_mask_array_l2[0][0] || global_mask_array_l2[1][0]) {
+		err = 2;
+		goto err_exit;
+	}
+
+	local = create_cpumask();
+	if (!local) {
+		err = 9;
+		goto err_exit;
+	}
+
+	local = bpf_kptr_xchg(&global_mask_array_l2[1][0], local);
+	if (local) {
+		err = 10;
+		goto err_exit;
+	}
+
+	/* global_mask_array => [[<mask 1>], [<mask 2>]] */
+	if (!global_mask_array_l2[0][0] || !global_mask_array_l2[1][0] ||
+	    global_mask_array_l2[0][0] == global_mask_array_l2[1][0]) {
+		err = 11;
+		goto err_exit;
+	}
+
+err_exit:
+	if (local)
+		bpf_cpumask_release(local);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
 SEC("tp_btf/task_newtask")
 int BPF_PROG(test_cpumask_weight, struct task_struct *task, u64 clone_flags)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH bpf-next v2 10/11] selftests/bpf: Test global bpf_rb_root arrays.
  2024-04-12 21:08 [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Kui-Feng Lee
                   ` (8 preceding siblings ...)
  2024-04-12 21:08 ` [PATCH bpf-next v2 09/11] selftests/bpf: Test global kptr arrays Kui-Feng Lee
@ 2024-04-12 21:08 ` Kui-Feng Lee
  2024-04-12 21:08 ` [PATCH bpf-next v2 11/11] selftests/bpf: Test global bpf_list_head arrays Kui-Feng Lee
  2024-04-18  3:30 ` [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Alexei Starovoitov
  11 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Make sure global arrays of bpf_rb_root work correctly.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 .../testing/selftests/bpf/prog_tests/rbtree.c | 23 +++++++
 tools/testing/selftests/bpf/progs/rbtree.c    | 63 +++++++++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/rbtree.c b/tools/testing/selftests/bpf/prog_tests/rbtree.c
index e9300c96607d..399d61b8a9a4 100644
--- a/tools/testing/selftests/bpf/prog_tests/rbtree.c
+++ b/tools/testing/selftests/bpf/prog_tests/rbtree.c
@@ -53,6 +53,27 @@ static void test_rbtree_add_and_remove(void)
 	rbtree__destroy(skel);
 }
 
+static void test_rbtree_add_and_remove_array(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct rbtree *skel;
+	int ret;
+
+	skel = rbtree__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "rbtree__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_add_and_remove_array), &opts);
+	ASSERT_OK(ret, "rbtree_add_and_remove_array");
+	ASSERT_OK(opts.retval, "rbtree_add_and_remove_array retval");
+
+	rbtree__destroy(skel);
+}
+
 static void test_rbtree_first_and_remove(void)
 {
 	LIBBPF_OPTS(bpf_test_run_opts, opts,
@@ -106,6 +127,8 @@ void test_rbtree_success(void)
 		test_rbtree_add_nodes();
 	if (test__start_subtest("rbtree_add_and_remove"))
 		test_rbtree_add_and_remove();
+	if (test__start_subtest("rbtree_add_and_remove_array"))
+		test_rbtree_add_and_remove_array();
 	if (test__start_subtest("rbtree_first_and_remove"))
 		test_rbtree_first_and_remove();
 	if (test__start_subtest("rbtree_api_release_aliasing"))
diff --git a/tools/testing/selftests/bpf/progs/rbtree.c b/tools/testing/selftests/bpf/progs/rbtree.c
index b09f4fffe57c..6bf7285944d6 100644
--- a/tools/testing/selftests/bpf/progs/rbtree.c
+++ b/tools/testing/selftests/bpf/progs/rbtree.c
@@ -20,6 +20,8 @@ long first_data[2] = {-1, -1};
 #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
 private(A) struct bpf_spin_lock glock;
 private(A) struct bpf_rb_root groot __contains(node_data, node);
+private(A) struct bpf_rb_root groot_array[2] __contains(node_data, node);
+private(A) struct bpf_rb_root groot_array_one[1] __contains(node_data, node);
 
 static bool less(struct bpf_rb_node *a, const struct bpf_rb_node *b)
 {
@@ -109,6 +111,67 @@ long rbtree_add_and_remove(void *ctx)
 	return 1;
 }
 
+SEC("tc")
+long rbtree_add_and_remove_array(void *ctx)
+{
+	struct bpf_rb_node *res1 = NULL, *res2 = NULL, *res3 = NULL;
+	struct node_data *nodes[3][2] = {{NULL, NULL}, {NULL, NULL}, {NULL, NULL}};
+	struct node_data *n;
+	long k1 = -1, k2 = -1, k3 = -1;
+	int i, j;
+
+	for (i = 0; i < 3; i++) {
+		for (j = 0; j < 2; j++) {
+			nodes[i][j] = bpf_obj_new(typeof(*nodes[i][j]));
+			if (!nodes[i][j])
+				goto err_out;
+			nodes[i][j]->key = i * 2 + j;
+		}
+	}
+
+	bpf_spin_lock(&glock);
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			bpf_rbtree_add(&groot_array[i], &nodes[i][j]->node, less);
+	for (j = 0; j < 2; j++)
+		bpf_rbtree_add(&groot_array_one[0], &nodes[2][j]->node, less);
+	res1 = bpf_rbtree_remove(&groot_array[0], &nodes[0][0]->node);
+	res2 = bpf_rbtree_remove(&groot_array[1], &nodes[1][0]->node);
+	res3 = bpf_rbtree_remove(&groot_array_one[0], &nodes[2][0]->node);
+	bpf_spin_unlock(&glock);
+
+	if (res1) {
+		n = container_of(res1, struct node_data, node);
+		k1 = n->key;
+		bpf_obj_drop(n);
+	}
+	if (res2) {
+		n = container_of(res2, struct node_data, node);
+		k2 = n->key;
+		bpf_obj_drop(n);
+	}
+	if (res3) {
+		n = container_of(res3, struct node_data, node);
+		k3 = n->key;
+		bpf_obj_drop(n);
+	}
+	if (k1 != 0 || k2 != 2)
+		return 2;
+	if (k3 != 4)
+		return 3;
+
+	return 0;
+
+err_out:
+	for (i = 0; i < 3; i++) {
+		for (j = 0; j < 2; j++) {
+			if (nodes[i][j])
+				bpf_obj_drop(nodes[i][j]);
+		}
+	}
+	return 1;
+}
+
 SEC("tc")
 long rbtree_first_and_remove(void *ctx)
 {
-- 
2.34.1



* [PATCH bpf-next v2 11/11] selftests/bpf: Test global bpf_list_head arrays.
  2024-04-12 21:08 [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Kui-Feng Lee
                   ` (9 preceding siblings ...)
  2024-04-12 21:08 ` [PATCH bpf-next v2 10/11] selftests/bpf: Test global bpf_rb_root arrays Kui-Feng Lee
@ 2024-04-12 21:08 ` Kui-Feng Lee
  2024-04-18  3:30 ` [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Alexei Starovoitov
  11 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-12 21:08 UTC (permalink / raw)
  To: bpf, ast, martin.lau, song, kernel-team, andrii
  Cc: sinquersw, kuifeng, Kui-Feng Lee

Make sure global arrays of bpf_list_head(s) work correctly.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
---
 .../selftests/bpf/prog_tests/linked_list.c    |  6 +++++
 .../testing/selftests/bpf/progs/linked_list.c | 24 +++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 2fb89de63bd2..0f0ccee688c5 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -183,6 +183,12 @@ static void test_linked_list_success(int mode, bool leave_in_map)
 	if (!leave_in_map)
 		clear_fields(skel->maps.bss_A);
 
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.global_list_array_push_pop), &opts);
+	ASSERT_OK(ret, "global_list_array_push_pop");
+	ASSERT_OK(opts.retval, "global_list_array_push_pop retval");
+	if (!leave_in_map)
+		clear_fields(skel->maps.bss_A);
+
 	if (mode == PUSH_POP)
 		goto end;
 
diff --git a/tools/testing/selftests/bpf/progs/linked_list.c b/tools/testing/selftests/bpf/progs/linked_list.c
index 26205ca80679..7c203b5367a7 100644
--- a/tools/testing/selftests/bpf/progs/linked_list.c
+++ b/tools/testing/selftests/bpf/progs/linked_list.c
@@ -11,6 +11,10 @@
 
 #include "linked_list.h"
 
+private(C) struct bpf_spin_lock glock_c;
+private(C) struct bpf_list_head ghead_array[2] __contains(foo, node2);
+private(C) struct bpf_list_head ghead_array_one[1] __contains(foo, node2);
+
 static __always_inline
 int list_push_pop(struct bpf_spin_lock *lock, struct bpf_list_head *head, bool leave_in_map)
 {
@@ -309,6 +313,26 @@ int global_list_push_pop(void *ctx)
 	return test_list_push_pop(&glock, &ghead);
 }
 
+SEC("tc")
+int global_list_array_push_pop(void *ctx)
+{
+	int r;
+
+	r = test_list_push_pop(&glock_c, &ghead_array[0]);
+	if (r)
+		return r;
+
+	r = test_list_push_pop(&glock_c, &ghead_array[1]);
+	if (r)
+		return r;
+
+	/* Arrays with only one element are a special case, being treated
+	 * just like a bpf_list_head variable by the verifier, not an
+	 * array.
+	 */
+	return test_list_push_pop(&glock_c, &ghead_array_one[0]);
+}
+
 SEC("tc")
 int map_list_push_pop_multiple(void *ctx)
 {
-- 
2.34.1



* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-12 21:08 [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Kui-Feng Lee
                   ` (10 preceding siblings ...)
  2024-04-12 21:08 ` [PATCH bpf-next v2 11/11] selftests/bpf: Test global bpf_list_head arrays Kui-Feng Lee
@ 2024-04-18  3:30 ` Alexei Starovoitov
  2024-04-18  4:31   ` Kui-Feng Lee
  11 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2024-04-18  3:30 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu, Kernel Team,
	Andrii Nakryiko, Kui-Feng Lee, Kui-Feng Lee

On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>
> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
> global variables. This was due to these types being initialized and
> verified in a special manner in the kernel. This patchset allows BPF
> programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head in
> the global namespace.
>
> The main change is to add "nelems" to btf_fields. The value of
> "nelems" represents the number of elements in the array if a btf_field
> represents an array. Otherwise, "nelem" will be 1. The verifier
> verifies these types based on the information provided by the
> btf_field.
>
> The value of "size" will be the size of the entire array if a
> btf_field represents an array. Dividing "size" by "nelems" gives the
> size of an element. The value of "offset" will be the offset of the
> beginning for an array. By putting this together, we can determine the
> offset of each element in an array. For example,
>
>     struct bpf_cpumask __kptr * global_mask_array[2];

Looks like this patch set enables arrays only.
Meaning the following is supported already:

+private(C) struct bpf_spin_lock glock_c;
+private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
+private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);

while this support is added:

+private(C) struct bpf_spin_lock glock_c;
+private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
+private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);

Am I right?

What about the case when bpf_list_head is wrapped in a struct?
private(C) struct foo {
  struct bpf_list_head ghead;
} ghead;

that's not enabled in this patch. I think.

And the following:
private(C) struct foo {
  struct bpf_list_head ghead;
} ghead[2];


or

private(C) struct foo {
  struct bpf_list_head ghead[2];
} ghead;

Won't work either.

I think eventually we want to support all such combinations and
the approach proposed in this patch with 'nelems'
won't work for wrapper structs.

I think it's better to unroll/flatten all structs and arrays
and represent them as individual elements in the flattened
structure. Then there will be no need to special-case arrays with 'nelems'.
All special BTF types will be individual elements with unique offset.

Does this make sense?

pw-bot: cr


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18  3:30 ` [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Alexei Starovoitov
@ 2024-04-18  4:31   ` Kui-Feng Lee
  2024-04-18  5:11     ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-18  4:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Kui-Feng Lee
  Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu, Kernel Team,
	Andrii Nakryiko, Kui-Feng Lee



On 4/17/24 20:30, Alexei Starovoitov wrote:
> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>>
>> [...]
> 
> Looks like this patch set enables arrays only.
> Meaning the following is supported already:
> 
> +private(C) struct bpf_spin_lock glock_c;
> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> 
> while this support is added:
> 
> +private(C) struct bpf_spin_lock glock_c;
> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
> 
> Am I right?
> 
> What about the case when bpf_list_head is wrapped in a struct?
> private(C) struct foo {
>    struct bpf_list_head ghead;
> } ghead;
> 
> that's not enabled in this patch. I think.
> 
> And the following:
> private(C) struct foo {
>    struct bpf_list_head ghead;
> } ghead[2];
> 
> 
> or
> 
> private(C) struct foo {
>    struct bpf_list_head ghead[2];
> } ghead;
> 
> Won't work either.

No, they don't work.
We had a discussion about this the other day.
I proposed handling struct types in another patch set.
Do you prefer to handle it in this patch set?

> 
> I think eventually we want to support all such combinations and
> the approach proposed in this patch with 'nelems'
> won't work for wrapper structs.
> 
> I think it's better to unroll/flatten all structs and arrays
> and represent them as individual elements in the flattened
> structure. Then there will be no need to special case array with 'nelems'.
> All special BTF types will be individual elements with unique offset.
> 
> Does this make sense?

That means it will create 10 btf_field(s) for an array of 10
elements. The purpose of adding "nelems" is to avoid that
repetition. Do you prefer to expand them?


> 
> pw-bot: cr


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18  4:31   ` Kui-Feng Lee
@ 2024-04-18  5:11     ` Alexei Starovoitov
  2024-04-18  6:07       ` Kui-Feng Lee
  0 siblings, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2024-04-18  5:11 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee

On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/17/24 20:30, Alexei Starovoitov wrote:
> > On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
> >>
> >> [...]
> >
> > Looks like this patch set enables arrays only.
> > Meaning the following is supported already:
> >
> > +private(C) struct bpf_spin_lock glock_c;
> > +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> > +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> >
> > while this support is added:
> >
> > +private(C) struct bpf_spin_lock glock_c;
> > +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
> > +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
> >
> > Am I right?
> >
> > What about the case when bpf_list_head is wrapped in a struct?
> > private(C) struct foo {
> >    struct bpf_list_head ghead;
> > } ghead;
> >
> > that's not enabled in this patch. I think.
> >
> > And the following:
> > private(C) struct foo {
> >    struct bpf_list_head ghead;
> > } ghead[2];
> >
> >
> > or
> >
> > private(C) struct foo {
> >    struct bpf_list_head ghead[2];
> > } ghead;
> >
> > Won't work either.
>
> No, they don't work.
> We had a discussion about this in the other day.
> I proposed to have another patch set to work on struct types.
> Do you prefer to handle it in this patch set?
>
> >
> > I think eventually we want to support all such combinations and
> > the approach proposed in this patch with 'nelems'
> > won't work for wrapper structs.
> >
> > I think it's better to unroll/flatten all structs and arrays
> > and represent them as individual elements in the flattened
> > structure. Then there will be no need to special case array with 'nelems'.
> > All special BTF types will be individual elements with unique offset.
> >
> > Does this make sense?
>
> That means it will creates 10 btf_field(s) for an array having 10
> elements. The purpose of adding "nelems" is to avoid the repetition. Do
> you prefer to expand them?

It's not just expansion, but a common way to handle nested structs too.

I suspect that by delaying nested-struct support to another patchset
this approach will become useless.

So try adding nested structs in all combinations as a follow-up and
I suspect you'll realize that the "nelems" approach doesn't really help.
You'd need to flatten them all.
And once you do there is no need for "nelems".


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18  5:11     ` Alexei Starovoitov
@ 2024-04-18  6:07       ` Kui-Feng Lee
  2024-04-18 14:53         ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-18  6:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/17/24 22:11, Alexei Starovoitov wrote:
> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>>>>
>>>> [...]
>>>
>>> Looks like this patch set enables arrays only.
>>> Meaning the following is supported already:
>>>
>>> +private(C) struct bpf_spin_lock glock_c;
>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>
>>> while this support is added:
>>>
>>> +private(C) struct bpf_spin_lock glock_c;
>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
>>>
>>> Am I right?
>>>
>>> What about the case when bpf_list_head is wrapped in a struct?
>>> private(C) struct foo {
>>>     struct bpf_list_head ghead;
>>> } ghead;
>>>
>>> that's not enabled in this patch. I think.
>>>
>>> And the following:
>>> private(C) struct foo {
>>>     struct bpf_list_head ghead;
>>> } ghead[2];
>>>
>>>
>>> or
>>>
>>> private(C) struct foo {
>>>     struct bpf_list_head ghead[2];
>>> } ghead;
>>>
>>> Won't work either.
>>
>> No, they don't work.
>> We had a discussion about this in the other day.
>> I proposed to have another patch set to work on struct types.
>> Do you prefer to handle it in this patch set?
>>
>>>
>>> I think eventually we want to support all such combinations and
>>> the approach proposed in this patch with 'nelems'
>>> won't work for wrapper structs.
>>>
>>> I think it's better to unroll/flatten all structs and arrays
>>> and represent them as individual elements in the flattened
>>> structure. Then there will be no need to special case array with 'nelems'.
>>> All special BTF types will be individual elements with unique offset.
>>>
>>> Does this make sense?
>>
>> That means it will creates 10 btf_field(s) for an array having 10
>> elements. The purpose of adding "nelems" is to avoid the repetition. Do
>> you prefer to expand them?
> 
> It's not just expansion, but a common way to handle nested structs too.
> 
> I suspect by delaying nested into another patchset this approach
> will become useless.
> 
> So try adding nested structs in all combinations as a follow up and
> I suspect you're realize that "nelems" approach doesn't really help.
> You'd need to flatten them all.
> And once you do there is no need for "nelems".

For me, "nelems" is more a way of avoiding repetition of
information than a necessity. Before adding "nelems", I had considered
expanding them as well, but eventually chose to add "nelems".

Since you think this repetition is not a problem, I will expand arrays as
individual elements.


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18  6:07       ` Kui-Feng Lee
@ 2024-04-18 14:53         ` Alexei Starovoitov
  2024-04-18 18:27           ` Kui-Feng Lee
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2024-04-18 14:53 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee

On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/17/24 22:11, Alexei Starovoitov wrote:
> > On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/17/24 20:30, Alexei Starovoitov wrote:
> >>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
> >>>>
> >>>> [...]
> >>>
> >>> Looks like this patch set enables arrays only.
> >>> Meaning the following is supported already:
> >>>
> >>> +private(C) struct bpf_spin_lock glock_c;
> >>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> >>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> >>>
> >>> while this support is added:
> >>>
> >>> +private(C) struct bpf_spin_lock glock_c;
> >>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
> >>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
> >>>
> >>> Am I right?
> >>>
> >>> What about the case when bpf_list_head is wrapped in a struct?
> >>> private(C) struct foo {
> >>>     struct bpf_list_head ghead;
> >>> } ghead;
> >>>
> >>> that's not enabled in this patch. I think.
> >>>
> >>> And the following:
> >>> private(C) struct foo {
> >>>     struct bpf_list_head ghead;
> >>> } ghead[2];
> >>>
> >>>
> >>> or
> >>>
> >>> private(C) struct foo {
> >>>     struct bpf_list_head ghead[2];
> >>> } ghead;
> >>>
> >>> Won't work either.
> >>
> >> No, they don't work.
> >> We had a discussion about this in the other day.
> >> I proposed to have another patch set to work on struct types.
> >> Do you prefer to handle it in this patch set?
> >>
> >>>
> >>> I think eventually we want to support all such combinations and
> >>> the approach proposed in this patch with 'nelems'
> >>> won't work for wrapper structs.
> >>>
> >>> I think it's better to unroll/flatten all structs and arrays
> >>> and represent them as individual elements in the flattened
> >>> structure. Then there will be no need to special case array with 'nelems'.
> >>> All special BTF types will be individual elements with unique offset.
> >>>
> >>> Does this make sense?
> >>
> >> That means it will creates 10 btf_field(s) for an array having 10
> >> elements. The purpose of adding "nelems" is to avoid the repetition. Do
> >> you prefer to expand them?
> >
> > It's not just expansion, but a common way to handle nested structs too.
> >
> > I suspect by delaying nested into another patchset this approach
> > will become useless.
> >
> > So try adding nested structs in all combinations as a follow up and
> > I suspect you're realize that "nelems" approach doesn't really help.
> > You'd need to flatten them all.
> > And once you do there is no need for "nelems".
>
> For me, "nelems" is more like a choice of avoiding repetition of
> information, not a necessary. Before adding "nelems", I had considered
> to expand them as well. But, eventually, I chose to add "nelems".
>
> Since you think this repetition is not a problem, I will expand array as
> individual elements.

You don't sound convinced :)
Please add support for nested structs on top of your "nelems" approach
and prototype the same without "nelems" and let's compare the two.


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18 14:53         ` Alexei Starovoitov
@ 2024-04-18 18:27           ` Kui-Feng Lee
  2024-04-19 18:36           ` Kui-Feng Lee
  2024-04-23  2:45           ` Kui-Feng Lee
  2 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-18 18:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/18/24 07:53, Alexei Starovoitov wrote:
> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>>>>>>
>>>>>> [...]
>>>>>
>>>>> Looks like this patch set enables arrays only.
>>>>> Meaning the following is supported already:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>
>>>>> while this support is added:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
>>>>>
>>>>> Am I right?
>>>>>
>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead;
>>>>>
>>>>> that's not enabled in this patch. I think.
>>>>>
>>>>> And the following:
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead[2];
>>>>>
>>>>>
>>>>> or
>>>>>
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead[2];
>>>>> } ghead;
>>>>>
>>>>> Won't work either.
>>>>
>>>> No, they don't work.
>>>> We had a discussion about this the other day.
>>>> I proposed to have another patch set to work on struct types.
>>>> Do you prefer to handle it in this patch set?
>>>>
>>>>>
>>>>> I think eventually we want to support all such combinations and
>>>>> the approach proposed in this patch with 'nelems'
>>>>> won't work for wrapper structs.
>>>>>
>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>> and represent them as individual elements in the flattened
>>>>> structure. Then there will be no need to special case array with 'nelems'.
>>>>> All special BTF types will be individual elements with unique offset.
>>>>>
>>>>> Does this make sense?
>>>>
>>>> That means it will create 10 btf_field(s) for an array having 10
>>>> elements. The purpose of adding "nelems" is to avoid the repetition. Do
>>>> you prefer to expand them?
>>>
>>> It's not just expansion, but a common way to handle nested structs too.
>>>
>>> I suspect by delaying nested into another patchset this approach
>>> will become useless.
>>>
>>> So try adding nested structs in all combinations as a follow up and
>>> I suspect you'll realize that the "nelems" approach doesn't really help.
>>> You'd need to flatten them all.
>>> And once you do there is no need for "nelems".
>>
>> For me, "nelems" is more like a choice to avoid repetition of
>> information, not a necessity. Before adding "nelems", I had considered
>> expanding them as well. But, eventually, I chose to add "nelems".
>>
>> Since you think this repetition is not a problem, I will expand arrays as
>> individual elements.
> 
> You don't sound convinced :)
> Please add support for nested structs on top of your "nelems" approach
> and prototype the same without "nelems" and let's compare the two.

Flattening is definitely easier to implement.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18 14:53         ` Alexei Starovoitov
  2024-04-18 18:27           ` Kui-Feng Lee
@ 2024-04-19 18:36           ` Kui-Feng Lee
  2024-04-23  2:45           ` Kui-Feng Lee
  2 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-19 18:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/18/24 07:53, Alexei Starovoitov wrote:
> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>>>>>>
>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>> global variables. This was due to these types being initialized and
>>>>>> verified in a special manner in the kernel. This patchset allows BPF
>>>>>> programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head in
>>>>>> the global namespace.
>>>>>>
>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>> "nelems" represents the number of elements in the array if a btf_field
>>>>>> represents an array. Otherwise, "nelems" will be 1. The verifier
>>>>>> verifies these types based on the information provided by the
>>>>>> btf_field.
>>>>>>
>>>>>> The value of "size" will be the size of the entire array if a
>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
>>>>>> size of an element. The value of "offset" will be the offset of the
>>>>>> beginning of the array. By putting this together, we can determine the
>>>>>> offset of each element in an array. For example,
>>>>>>
>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>
>>>>> Looks like this patch set enables arrays only.
>>>>> Meaning the following is supported already:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>
>>>>> while this support is added:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
>>>>>
>>>>> Am I right?
>>>>>
>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead;
>>>>>
>>>>> that's not enabled in this patch. I think.
>>>>>
>>>>> And the following:
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead[2];
>>>>>
>>>>>
>>>>> or
>>>>>
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead[2];
>>>>> } ghead;
>>>>>
>>>>> Won't work either.
>>>>
>>>> No, they don't work.
>>>> We had a discussion about this the other day.
>>>> I proposed to have another patch set to work on struct types.
>>>> Do you prefer to handle it in this patch set?
>>>>
>>>>>
>>>>> I think eventually we want to support all such combinations and
>>>>> the approach proposed in this patch with 'nelems'
>>>>> won't work for wrapper structs.
>>>>>
>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>> and represent them as individual elements in the flattened
>>>>> structure. Then there will be no need to special case array with 'nelems'.
>>>>> All special BTF types will be individual elements with unique offset.
>>>>>
>>>>> Does this make sense?
>>>>
>>>> That means it will create 10 btf_field(s) for an array having 10
>>>> elements. The purpose of adding "nelems" is to avoid the repetition. Do
>>>> you prefer to expand them?
>>>
>>> It's not just expansion, but a common way to handle nested structs too.
>>>
>>> I suspect by delaying nested into another patchset this approach
>>> will become useless.
>>>
>>> So try adding nested structs in all combinations as a follow up and
>>> I suspect you'll realize that the "nelems" approach doesn't really help.
>>> You'd need to flatten them all.
>>> And once you do there is no need for "nelems".
>>
>> For me, "nelems" is more like a choice to avoid repetition of
>> information, not a necessity. Before adding "nelems", I had considered
>> expanding them as well. But, eventually, I chose to add "nelems".
>>
>> Since you think this repetition is not a problem, I will expand arrays as
>> individual elements.
> 
> You don't sound convinced :)
> Please add support for nested structs on top of your "nelems" approach
> and prototype the same without "nelems" and let's compare the two.


The following is an implementation with "nelems".
The basic idea is to introduce a field type, BPF_REPEAT_FIELDS, that
repeats the btf_field(s) immediately before it when necessary.

For example,

   struct foo {
     struct bpf_cpumask __kptr *a;
     struct bpf_cpumask __kptr *b;
   };

   struct foo fooptrs[10];

it will create two btf_fields for a & b, like

   [kptr_a, kptr_b]

However, fooptrs is an array of size 10, so another field of type
BPF_REPEAT_FIELDS is created to repeat the two fields immediately
before it 9 times.

   [kptr_a, kptr_b, repeat_fields(nelems=9, repeated_cnt=2)]

The size of the repeat_fields field will be the size of an element
times 9, and its offset will be the offset of &fooptrs[1].

Even if struct foo is nested in another struct, it would still produce
the same or a similar result. For example,

   struct foo_deep {
     int dummy;
     struct foo inner;
   };

   struct foo_deep deep_ptrs[10];

it will create similar fields, but with different offsets.

   [kptr_a, kptr_b, repeat_fields(nelems=9, repeated_cnt=2)]

What if it is nested with an array?

   struct foo_deep_arr {
     int dummy;
     struct foo inner[4];
   }

it will create fields like,

   [kptr_a, kptr_b, repeat_fields(nelems=3, repeated_cnt=2),
    repeat_fields(nelems=9, repeated_cnt=3)]


diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b25dd498b737..bfd31d2a9770 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -202,6 +202,7 @@ enum btf_field_type {
  	BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
  	BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
  	BPF_REFCOUNT   = (1 << 9),
+	BPF_REPEAT_FIELDS = (1 << 10),
  };

  typedef void (*btf_dtor_kfunc_t)(void *);
@@ -231,6 +232,7 @@ struct btf_field {
  	union {
  		struct btf_field_kptr kptr;
  		struct btf_field_graph_root graph_root;
+		u32 repeated_cnt;
  	};
  };

@@ -489,6 +491,21 @@ static inline void bpf_obj_memcpy(struct btf_record *rec,
  		u32 next_off = rec->fields[i].offset;
  		u32 sz = next_off - curr_off;

+		if (rec->fields[i].type == BPF_REPEAT_FIELDS) {
+			int cnt = rec->fields[i].repeated_cnt;
+			int elem_size = rec->fields[i].size / rec->fields[i].nelems;
+			int j, k;
+			for (j = 0; j < rec->fields[i].nelems; j++) {
+				for (k = i - cnt; k < i; k++) {
+				/* Use repeated fields to copy */
+					next_off = rec->fields[k].offset + elem_size + elem_size * j;
+					sz = next_off - curr_off;
+					memcpy(dst + curr_off, src + curr_off, sz);
+					curr_off += rec->fields[k].size + sz;
+				}
+			}
+			continue;
+		}
  		memcpy(dst + curr_off, src + curr_off, sz);
  		curr_off += rec->fields[i].size + sz;
  	}
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ae17d3996843..b8acec702557 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3298,6 +3298,10 @@ struct btf_field_info {
  			const char *node_name;
  			u32 value_btf_id;
  		} graph_root;
+		struct {
+			u32 cnt;
+			u32 elem_size;
+		} repeat;
  	};
  };

@@ -3485,6 +3489,39 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,

  #undef field_mask_test_name

+static int btf_find_struct_field(const struct btf *btf,
+				 const struct btf_type *t, u32 field_mask,
+				 struct btf_field_info *info, int info_cnt);
+
+static int btf_find_nested_struct(const struct btf *btf, const struct btf_type *t,
+				  u32 off, u32 nelems,
+				  u32 field_mask, struct btf_field_info *info,
+				  int info_cnt)
+{
+	int ret, i;
+
+	ret = btf_find_struct_field(btf, t, field_mask, info, info_cnt);
+
+	if (ret <= 0)
+		return ret;
+
+	for (i = 0; i < ret; i++)
+		info[i].off += off;
+
+	if (nelems > 1) {
+		if (ret >= info_cnt)
+			return -E2BIG;
+		info[ret].type = BPF_REPEAT_FIELDS;
+		info[ret].off = off + t->size;
+		info[ret].nelems = nelems - 1;
+		info[ret].repeat.cnt = ret;
+		info[ret].repeat.elem_size = t->size;
+		ret += 1;
+	}
+
+	return ret;
+}
+
  static int btf_find_struct_field(const struct btf *btf,
  				 const struct btf_type *t, u32 field_mask,
  				 struct btf_field_info *info, int info_cnt)
@@ -3497,9 +3534,26 @@ static int btf_find_struct_field(const struct btf *btf,
  	for_each_member(i, t, member) {
  		const struct btf_type *member_type = btf_type_by_id(btf,
  								    member->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(member_type); j++) {
+			array = btf_array(member_type);
+			nelems *= array->nelems;
+			member_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;

-		field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off),
-						field_mask, &seen_mask, &align, &sz);
+		if ((field_mask & BPF_REPEAT_FIELDS) &&
+		    __btf_type_is_struct(member_type))
+			field_type = BPF_REPEAT_FIELDS;
+		else
+			field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off),
+							field_mask, &seen_mask, &align, &sz);
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
@@ -3541,6 +3595,15 @@ static int btf_find_struct_field(const struct btf *btf,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, member_type, off, nelems, field_mask,
+						    &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			if (idx >= info_cnt)
+				return -E2BIG;
+			continue;
  		default:
  			return -EFAULT;
  		}
@@ -3549,7 +3612,7 @@ static int btf_find_struct_field(const struct btf *btf,
  			continue;
  		if (idx >= info_cnt)
  			return -E2BIG;
-		info[idx].nelems = 1;
+		info[idx].nelems = nelems;
  		++idx;
  	}
  	return idx;
@@ -3581,8 +3644,13 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  		if (nelems == 0)
  			continue;

-		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
-						field_mask, &seen_mask, &align, &sz);
+		if ((field_mask & BPF_REPEAT_FIELDS) &&
+		    __btf_type_is_struct(var_type)) {
+			field_type = BPF_REPEAT_FIELDS;
+			sz = var_type->size;
+		} else
+		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
+							field_mask, &seen_mask, &align, &sz);
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
@@ -3624,6 +3692,15 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, var_type, off, nelems, field_mask,
+						     &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			if (idx >= info_cnt)
+				return -E2BIG;
+			continue;
  		default:
  			return -EFAULT;
  		}
@@ -3634,6 +3711,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  			return -E2BIG;
  		info[idx++].nelems = nelems;
  	}
+
  	return idx;
  }

@@ -3835,19 +3913,24 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
  	rec->timer_off = -EINVAL;
  	rec->refcount_off = -EINVAL;
  	for (i = 0; i < cnt; i++) {
-		field_type_size = btf_field_type_size(info_arr[i].type) * info_arr[i].nelems;
+		if (info_arr[i].type == BPF_REPEAT_FIELDS)
+			field_type_size = info_arr[i].repeat.elem_size * info_arr[i].nelems;
+		else
+			field_type_size = btf_field_type_size(info_arr[i].type) * info_arr[i].nelems;
  		if (info_arr[i].off + field_type_size > value_size) {
  			WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
  			ret = -EFAULT;
  			goto end;
  		}
-		if (info_arr[i].off < next_off) {
+		if (info_arr[i].off < next_off &&
+		    info_arr[i].type != BPF_REPEAT_FIELDS) {
  			ret = -EEXIST;
  			goto end;
  		}
  		next_off = info_arr[i].off + field_type_size;

-		rec->field_mask |= info_arr[i].type;
+		if (info_arr[i].type != BPF_REPEAT_FIELDS)
+			rec->field_mask |= info_arr[i].type;
  		rec->fields[i].offset = info_arr[i].off;
  		rec->fields[i].type = info_arr[i].type;
  		rec->fields[i].size = field_type_size;
@@ -3889,6 +3972,10 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
  		case BPF_LIST_NODE:
  		case BPF_RB_NODE:
  			break;
+
+		case BPF_REPEAT_FIELDS:
+			rec->fields[i].repeated_cnt = info_arr[i].repeat.cnt;
+			break;
  		default:
  			ret = -EFAULT;
  			goto end;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1c8a9bc00d17..0effa1daf2ca 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -532,15 +532,27 @@ static int btf_field_cmp(const void *a, const void *b)
  struct btf_field *btf_record_find(const struct btf_record *rec, u32 offset,
  				  u32 field_mask)
  {
+	const struct btf_field *fields;
  	struct btf_field *field;
  	struct btf_field key = {
  		.offset = offset,
  		.size = 0,	/* as a label for this key */
  	};
+	u32 cnt, elem_size;

  	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & field_mask))
  		return NULL;
-	field = bsearch(&key, rec->fields, rec->cnt, sizeof(rec->fields[0]), btf_field_cmp);
+	fields = rec->fields;
+	cnt = rec->cnt;
+	while ((field = bsearch(&key, fields, cnt, sizeof(rec->fields[0]), btf_field_cmp)) && field->type == BPF_REPEAT_FIELDS) {
+		cnt = field->repeated_cnt;
+		fields = field - cnt;
+		elem_size = field->size / field->nelems;
+		/* Redirect to the offset of repeated fields */
+		offset = offset - field->offset;
+		offset = field->offset + (offset % elem_size) - elem_size;
+		key.offset = offset;
+	}
  	if (!field || !(field->type & field_mask))
  		return NULL;
  	if ((offset - field->offset) % (field->size / field->nelems))
@@ -1106,7 +1118,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,

  	map->record = btf_parse_fields(btf, value_type,
  				       BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
-				       BPF_RB_ROOT | BPF_REFCOUNT,
+				       BPF_RB_ROOT | BPF_REFCOUNT | BPF_REPEAT_FIELDS,
  				       map->value_size);
  	if (!IS_ERR_OR_NULL(map->record)) {
  		int i;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 67b89d4ea1ba..45b2da8a00d1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5416,6 +5416,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
  	struct bpf_reg_state *reg = &state->regs[regno];
  	struct bpf_map *map = reg->map_ptr;
  	struct btf_record *rec;
+	u32 cnt;
  	int err, i;

  	err = check_mem_region_access(env, regno, off, size, map->value_size,
@@ -5426,7 +5427,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
  	if (IS_ERR_OR_NULL(map->record))
  		return 0;
  	rec = map->record;
-	for (i = 0; i < rec->cnt; i++) {
+	cnt = rec->cnt;
+	for (i = 0; i < cnt; i++) {
  		struct btf_field *field = &rec->fields[i];
  		u32 p = field->offset, var_p, elem_size;

@@ -5461,6 +5463,18 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
  					return -EACCES;
  				}
  				break;
+			case BPF_REPEAT_FIELDS:
+				var_p = off + reg->var_off.value;
+				if (var_p < p || var_p >= p + field->size)
+					break;
+				elem_size = field->size / field->nelems;
+				/* Redirect to the offset of repeated
+				 * fields
+				 */
+				off = p + ((var_p - p) % elem_size) - reg->var_off.value - elem_size;
+				cnt = i;
+				i -= field->repeated_cnt + 1;
+				break;
  			default:
  				verbose(env, "%s cannot be accessed directly by load/store\n",
  					btf_field_type_name(field->type));
diff --git a/tools/testing/selftests/bpf/prog_tests/cpumask.c b/tools/testing/selftests/bpf/prog_tests/cpumask.c
index bba601e235f6..2570bd4b0cb2 100644
--- a/tools/testing/selftests/bpf/prog_tests/cpumask.c
+++ b/tools/testing/selftests/bpf/prog_tests/cpumask.c
@@ -21,6 +21,8 @@ static const char * const cpumask_success_testcases[] = {
  	"test_global_mask_array_one_rcu",
  	"test_global_mask_array_rcu",
  	"test_global_mask_array_l2_rcu",
+	"test_global_mask_nested_rcu",
+	"test_global_mask_nested_deep_rcu",
  	"test_cpumask_weight",
  };

diff --git a/tools/testing/selftests/bpf/progs/cpumask_success.c b/tools/testing/selftests/bpf/progs/cpumask_success.c
index 9d76d85680d7..0de6bc115f55 100644
--- a/tools/testing/selftests/bpf/progs/cpumask_success.c
+++ b/tools/testing/selftests/bpf/progs/cpumask_success.c
@@ -11,9 +11,21 @@
  char _license[] SEC("license") = "GPL";

  int pid, nr_cpus;
+struct kptr_nested {
+	struct bpf_cpumask __kptr * mask;
+};
+struct kptr_nested_mid {
+	int dummy;
+	struct kptr_nested m;
+};
+struct kptr_nested_deep {
+	struct kptr_nested_mid ptrs[2];
+};
  private(MASK) static struct bpf_cpumask __kptr * global_mask_array[2];
  private(MASK) static struct bpf_cpumask __kptr * global_mask_array_l2[2][1];
  private(MASK) static struct bpf_cpumask __kptr * global_mask_array_one[1];
+private(MASK) static struct kptr_nested global_mask_nested[2];
+private(MASK) static struct kptr_nested_deep global_mask_nested_deep;

  static bool is_test_task(void)
  {
@@ -553,6 +565,71 @@ int BPF_PROG(test_global_mask_array_rcu, struct task_struct *task, u64 clone_fla
  	return 0;
  }

+static int _global_mask_nested_rcu(struct bpf_cpumask **mask0,
+				       struct bpf_cpumask **mask1)
+{
+	struct bpf_cpumask *local;
+
+	if (!is_test_task())
+		return 0;
+
+	/* Check if the two kptrs in the array work independently */
+
+	local = create_cpumask();
+	if (!local)
+		return 0;
+
+	bpf_rcu_read_lock();
+
+	local = bpf_kptr_xchg(mask0, local);
+	if (local) {
+		err = 1;
+		goto err_exit;
+	}
+
+	/* [<mask 0>, NULL] */
+	if (!*mask0 || *mask1) {
+		err = 2;
+		goto err_exit;
+	}
+
+	local = create_cpumask();
+	if (!local) {
+		err = 9;
+		goto err_exit;
+	}
+
+	local = bpf_kptr_xchg(mask1, local);
+	if (local) {
+		err = 10;
+		goto err_exit;
+	}
+
+	/* [<mask 0>, <mask 1>] */
+	if (!*mask0 || !*mask1 || *mask0 == *mask1) {
+		err = 11;
+		goto err_exit;
+	}
+
+err_exit:
+	if (local)
+		bpf_cpumask_release(local);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_global_mask_nested_rcu, struct task_struct *task, u64 clone_flags)
+{
+	return _global_mask_nested_rcu(&global_mask_nested[0].mask, &global_mask_nested[1].mask);
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_global_mask_nested_deep_rcu, struct task_struct *task, u64 clone_flags)
+{
+	return _global_mask_nested_rcu(&global_mask_nested_deep.ptrs[0].m.mask, &global_mask_nested_deep.ptrs[1].m.mask);
+}
+
  SEC("tp_btf/task_newtask")
  int BPF_PROG(test_global_mask_array_l2_rcu, struct task_struct *task, u64 clone_flags)
  {

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-18 14:53         ` Alexei Starovoitov
  2024-04-18 18:27           ` Kui-Feng Lee
  2024-04-19 18:36           ` Kui-Feng Lee
@ 2024-04-23  2:45           ` Kui-Feng Lee
  2024-04-23  2:54             ` Kui-Feng Lee
  2 siblings, 1 reply; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-23  2:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/18/24 07:53, Alexei Starovoitov wrote:
> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee <thinker.li@gmail.com> wrote:
>>>>>>
>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>> global variables. This was due to these types being initialized and
>>>>>> verified in a special manner in the kernel. This patchset allows BPF
>>>>>> programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head in
>>>>>> the global namespace.
>>>>>>
>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>> "nelems" represents the number of elements in the array if a btf_field
>>>>>> represents an array. Otherwise, "nelems" will be 1. The verifier
>>>>>> verifies these types based on the information provided by the
>>>>>> btf_field.
>>>>>>
>>>>>> The value of "size" will be the size of the entire array if a
>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
>>>>>> size of an element. The value of "offset" will be the offset of the
>>>>>> beginning of the array. By putting this together, we can determine the
>>>>>> offset of each element in an array. For example,
>>>>>>
>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>
>>>>> Looks like this patch set enables arrays only.
>>>>> Meaning the following is supported already:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>
>>>>> while this support is added:
>>>>>
>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, node2);
>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, node2);
>>>>>
>>>>> Am I right?
>>>>>
>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead;
>>>>>
>>>>> that's not enabled in this patch. I think.
>>>>>
>>>>> And the following:
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead;
>>>>> } ghead[2];
>>>>>
>>>>>
>>>>> or
>>>>>
>>>>> private(C) struct foo {
>>>>>      struct bpf_list_head ghead[2];
>>>>> } ghead;
>>>>>
>>>>> Won't work either.
>>>>
>>>> No, they don't work.
>>>> We had a discussion about this the other day.
>>>> I proposed to have another patch set to work on struct types.
>>>> Do you prefer to handle it in this patch set?
>>>>
>>>>>
>>>>> I think eventually we want to support all such combinations and
>>>>> the approach proposed in this patch with 'nelems'
>>>>> won't work for wrapper structs.
>>>>>
>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>> and represent them as individual elements in the flattened
>>>>> structure. Then there will be no need to special case array with 'nelems'.
>>>>> All special BTF types will be individual elements with unique offset.
>>>>>
>>>>> Does this make sense?
>>>>
>>>> That means it will create 10 btf_field(s) for an array having 10
>>>> elements. The purpose of adding "nelems" is to avoid the repetition. Do
>>>> you prefer to expand them?
>>>
>>> It's not just expansion, but a common way to handle nested structs too.
>>>
>>> I suspect by delaying nested into another patchset this approach
>>> will become useless.
>>>
>>> So try adding nested structs in all combinations as a follow up and
>>> I suspect you'll realize that the "nelems" approach doesn't really help.
>>> You'd need to flatten them all.
>>> And once you do there is no need for "nelems".
>>
>> For me, "nelems" is more like a choice to avoid repetition of
>> information, not a necessity. Before adding "nelems", I had considered
>> expanding them as well. But, eventually, I chose to add "nelems".
>>
>> Since you think this repetition is not a problem, I will expand arrays as
>> individual elements.
> 
> You don't sound convinced :)
> Please add support for nested structs on top of your "nelems" approach
> and prototype the same without "nelems" and let's compare the two.


The following is the prototype that flattens arrays and struct types.
This approach is definitely simpler than the "nelems" one.  However,
it will repeat the same information as many times as the size of an
array.  For now, we have a limitation on the number of btf_fields (<= 10).

The core part of the "nelems" approach is quite similar to this
"flatten" version.  However, the following functions have to be
modified to handle "nelems" and fields of the "BPF_REPEAT_FIELDS" type:

  - bpf_obj_init_field() & bpf_obj_free_fields()
  - btf_record_find()
  - check_map_access()
  - bpf_obj_memcpy()
  - bpf_obj_memzero()


---
  include/linux/bpf.h |   1 +
  kernel/bpf/btf.c    | 125 +++++++++++++++++++++++++++++++++++++++++---
  2 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5034c1b4ded7..b5d3d5e39d48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -202,6 +202,7 @@ enum btf_field_type {
  	BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
  	BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
  	BPF_REFCOUNT   = (1 << 9),
+	BPF_REPEAT_FIELDS = (1 << 10),
  };

  typedef void (*btf_dtor_kfunc_t)(void *);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 3233832f064f..0cc91f00d872 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3484,6 +3484,50 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,

  #undef field_mask_test_name

+static int btf_find_struct_field(const struct btf *btf,
+				 const struct btf_type *t, u32 field_mask,
+				 struct btf_field_info *info, int info_cnt);
+
+static void btf_repeat_fields(struct btf_field_info *info, u32 first_field,
+			      u32 field_cnt, u32 repeat_cnt, u32 elem_size)
+{
+	u32 i, j;
+	u32 cur;
+
+	cur = first_field + field_cnt;
+	for (i = 0; i < repeat_cnt; i++) {
+		memcpy(&info[cur], &info[first_field], field_cnt * sizeof(info[0]));
+		for (j = 0; j < field_cnt; j++)
+			info[cur++].off += (i + 1) * elem_size;
+	}
+}
+
+static int btf_find_nested_struct(const struct btf *btf, const struct btf_type *t,
+				  u32 off, u32 nelems,
+				  u32 field_mask, struct btf_field_info *info,
+				  int info_cnt)
+{
+	int ret, i;
+
+	ret = btf_find_struct_field(btf, t, field_mask, info, info_cnt);
+
+	if (ret <= 0)
+		return ret;
+
+	/* Shift the offsets of the nested struct fields to the offsets
+	 * related to the container.
+	 */
+	for (i = 0; i < ret; i++)
+		info[i].off += off;
+
+	if (nelems > 1) {
+		btf_repeat_fields(info, 0, ret, nelems - 1, t->size);
+		ret *= nelems;
+	}
+
+	return ret;
+}
+
  static int btf_find_struct_field(const struct btf *btf,
  				 const struct btf_type *t, u32 field_mask,
  				 struct btf_field_info *info, int info_cnt)
@@ -3496,9 +3540,26 @@ static int btf_find_struct_field(const struct btf *btf,
  	for_each_member(i, t, member) {
  		const struct btf_type *member_type = btf_type_by_id(btf,
  								    member->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(member_type); j++) {
+			array = btf_array(member_type);
+			nelems *= array->nelems;
+			member_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;

  		field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off),
-						field_mask, &seen_mask, &align, &sz);
+							field_mask, &seen_mask, &align, &sz);
+		if ((field_type == BPF_KPTR_REF || !field_type) &&
+		    __btf_type_is_struct(member_type))
+			field_type = BPF_REPEAT_FIELDS;
+
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
@@ -3522,7 +3583,12 @@ static int btf_find_struct_field(const struct btf *btf,
  					      idx < info_cnt ? &info[idx] : &tmp);
  			if (ret < 0)
  				return ret;
-			break;
+			if (ret == BTF_FIELD_IGNORE)
+				continue;
+			if (idx >= info_cnt)
+				return -E2BIG;
+			idx++;
+			continue;
  		case BPF_KPTR_UNREF:
  		case BPF_KPTR_REF:
  		case BPF_KPTR_PERCPU:
@@ -3540,15 +3606,24 @@ static int btf_find_struct_field(const struct btf *btf,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, member_type, off, nelems, field_mask,
+						    &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			continue;
  		default:
  			return -EFAULT;
  		}

  		if (ret == BTF_FIELD_IGNORE)
  			continue;
-		if (idx >= info_cnt)
+		if (idx + nelems > info_cnt)
  			return -E2BIG;
-		++idx;
+		if (nelems > 1)
+			btf_repeat_fields(info, idx, 1, nelems - 1, sz);
+		idx += nelems;
  	}
  	return idx;
  }
@@ -3565,16 +3640,35 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  	for_each_vsi(i, t, vsi) {
  		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
  		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(var_type); j++) {
+			array = btf_array(var_type);
+			nelems *= array->nelems;
+			var_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;

  		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
  						field_mask, &seen_mask, &align, &sz);
+		if ((field_type == BPF_KPTR_REF || !field_type) &&
+		    __btf_type_is_struct(var_type)) {
+			field_type = BPF_REPEAT_FIELDS;
+			sz = var_type->size;
+		}
+
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
  			return field_type;

  		off = vsi->offset;
-		if (vsi->size != sz)
+		if (vsi->size != sz * nelems)
  			continue;
  		if (off % align)
  			continue;
@@ -3589,7 +3683,12 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  					      idx < info_cnt ? &info[idx] : &tmp);
  			if (ret < 0)
  				return ret;
-			break;
+			if (ret == BTF_FIELD_IGNORE)
+				continue;
+			if (idx >= info_cnt)
+				return -E2BIG;
+			idx++;
+			continue;
  		case BPF_KPTR_UNREF:
  		case BPF_KPTR_REF:
  		case BPF_KPTR_PERCPU:
@@ -3607,16 +3706,26 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, var_type, off, nelems, field_mask,
+						     &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			continue;
  		default:
  			return -EFAULT;
  		}

  		if (ret == BTF_FIELD_IGNORE)
  			continue;
-		if (idx >= info_cnt)
+		if (idx + nelems > info_cnt)
  			return -E2BIG;
-		++idx;
+		if (nelems > 1)
+			btf_repeat_fields(info, idx, 1, nelems - 1, sz);
+		idx += nelems;
  	}
+
  	return idx;
  }

-- 
2.34.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-23  2:45           ` Kui-Feng Lee
@ 2024-04-23  2:54             ` Kui-Feng Lee
  2024-04-24 20:09               ` Alexei Starovoitov
  0 siblings, 1 reply; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-23  2:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/22/24 19:45, Kui-Feng Lee wrote:
> 
> 
> On 4/18/24 07:53, Alexei Starovoitov wrote:
>> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com> 
>> wrote:
>>>
>>>
>>>
>>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com> 
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee 
>>>>>> <thinker.li@gmail.com> wrote:
>>>>>>>
>>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>>> global variables. This was due to these types being initialized and
>>>>>>> verified in a special manner in the kernel. This patchset allows BPF
>>>>>>> programs to declare arrays of kptr, bpf_rb_root, and 
>>>>>>> bpf_list_head in
>>>>>>> the global namespace.
>>>>>>>
>>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>>> "nelems" represents the number of elements in the array if a 
>>>>>>> btf_field
>>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
>>>>>>> verifies these types based on the information provided by the
>>>>>>> btf_field.
>>>>>>>
>>>>>>> The value of "size" will be the size of the entire array if a
>>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
>>>>>>> size of an element. The value of "offset" will be the offset of the
>>>>>>> beginning for an array. By putting this together, we can 
>>>>>>> determine the
>>>>>>> offset of each element in an array. For example,
>>>>>>>
>>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>>
>>>>>> Looks like this patch set enables arrays only.
>>>>>> Meaning the following is supported already:
>>>>>>
>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>>
>>>>>> while this support is added:
>>>>>>
>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo, 
>>>>>> node2);
>>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo, 
>>>>>> node2);
>>>>>>
>>>>>> Am I right?
>>>>>>
>>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>>> private(C) struct foo {
>>>>>>      struct bpf_list_head ghead;
>>>>>> } ghead;
>>>>>>
>>>>>> that's not enabled in this patch. I think.
>>>>>>
>>>>>> And the following:
>>>>>> private(C) struct foo {
>>>>>>      struct bpf_list_head ghead;
>>>>>> } ghead[2];
>>>>>>
>>>>>>
>>>>>> or
>>>>>>
>>>>>> private(C) struct foo {
>>>>>>      struct bpf_list_head ghead[2];
>>>>>> } ghead;
>>>>>>
>>>>>> Won't work either.
>>>>>
>>>>> No, they don't work.
>>>>> We had a discussion about this in the other day.
>>>>> I proposed to have another patch set to work on struct types.
>>>>> Do you prefer to handle it in this patch set?
>>>>>
>>>>>>
>>>>>> I think eventually we want to support all such combinations and
>>>>>> the approach proposed in this patch with 'nelems'
>>>>>> won't work for wrapper structs.
>>>>>>
>>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>>> and represent them as individual elements in the flattened
>>>>>> structure. Then there will be no need to special case array with 
>>>>>> 'nelems'.
>>>>>> All special BTF types will be individual elements with unique offset.
>>>>>>
>>>>>> Does this make sense?
>>>>>
>>>>> That means it will creates 10 btf_field(s) for an array having 10
>>>>> elements. The purpose of adding "nelems" is to avoid the 
>>>>> repetition. Do
>>>>> you prefer to expand them?
>>>>
>>>> It's not just expansion, but a common way to handle nested structs too.
>>>>
>>>> I suspect by delaying nested into another patchset this approach
>>>> will become useless.
>>>>
>>>> So try adding nested structs in all combinations as a follow up and
>>>> I suspect you're realize that "nelems" approach doesn't really help.
>>>> You'd need to flatten them all.
>>>> And once you do there is no need for "nelems".
>>>
>>> For me, "nelems" is more like a choice of avoiding repetition of
>>> information, not a necessary. Before adding "nelems", I had considered
>>> to expand them as well. But, eventually, I chose to add "nelems".
>>>
>>> Since you think this repetition is not a problem, I will expand array as
>>> individual elements.
>>
>> You don't sound convinced :)
>> Please add support for nested structs on top of your "nelems" approach
>> and prototype the same without "nelems" and let's compare the two.
> 
> 
> The following is the prototype that flattens arrays and struct types.
> This approach is definitely simpler than the "nelems" one.  However,
> it will repeat the same information as many times as the size of an array.
> For now, we have a limitation on the number of btf_fields (<= 10).
> 
> The core part of the "nelems" approach is quite similar to this
> "flatten" version.  However, the following functions have to be modified
> to handle "nelems" and fields of the "BPF_REPEAT_FIELDS" type:
> 
>   - bpf_obj_init_field() & bpf_obj_free_fields()
>   - btf_record_find()
>   - check_map_access()
>   - bpf_obj_memcpy()
>   - bpf_obj_memzero()
> 
> 


The following is the core part that I extracted from the patchset.
It doesn't include the changes to the functions mentioned above.


---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index caea4e560eb3..bd9d56b9b6e4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -202,6 +202,7 @@ enum btf_field_type {
  	BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
  	BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
  	BPF_REFCOUNT   = (1 << 9),
+	BPF_REPEAT_FIELDS = (1 << 10),
  };

  typedef void (*btf_dtor_kfunc_t)(void *);
@@ -226,10 +227,12 @@ struct btf_field_graph_root {
  struct btf_field {
  	u32 offset;
  	u32 size;
+	u32 nelems;
  	enum btf_field_type type;
  	union {
  		struct btf_field_kptr kptr;
  		struct btf_field_graph_root graph_root;
+		u32 repeated_cnt;
  	};
  };

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 3233832f064f..005e530bf7e5 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3289,6 +3289,7 @@ enum {
  struct btf_field_info {
  	enum btf_field_type type;
  	u32 off;
+	u32 nelems;
  	union {
  		struct {
  			u32 type_id;
@@ -3297,6 +3298,10 @@ struct btf_field_info {
  			const char *node_name;
  			u32 value_btf_id;
  		} graph_root;
+		struct {
+			u32 cnt;
+			u32 elem_size;
+		} repeat;
  	};
  };

@@ -3484,6 +3489,43 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,

  #undef field_mask_test_name

+static int btf_find_struct_field(const struct btf *btf,
+				 const struct btf_type *t, u32 field_mask,
+				 struct btf_field_info *info, int info_cnt);
+
+static int btf_find_nested_struct(const struct btf *btf, const struct btf_type *t,
+				  u32 off, u32 nelems,
+				  u32 field_mask, struct btf_field_info *info,
+				  int info_cnt)
+{
+	int ret, i;
+
+	ret = btf_find_struct_field(btf, t, field_mask, info, info_cnt);
+
+	if (ret <= 0)
+		return ret;
+
+	/* Shift the offsets of the nested struct fields to the offsets
+	 * related to the container.
+	 */
+	for (i = 0; i < ret; i++)
+		info[i].off += off;
+
+	if (nelems > 1) {
+		/* Repeat fields created for nested struct */
+		if (ret >= info_cnt)
+			return -E2BIG;
+		info[ret].type = BPF_REPEAT_FIELDS;
+		info[ret].off = off + t->size;
+		info[ret].nelems = nelems - 1;
+		info[ret].repeat.cnt = ret;
+		info[ret].repeat.elem_size = t->size;
+		ret += 1;
+	}
+
+	return ret;
+}
+
  static int btf_find_struct_field(const struct btf *btf,
  				 const struct btf_type *t, u32 field_mask,
  				 struct btf_field_info *info, int info_cnt)
@@ -3496,9 +3538,26 @@ static int btf_find_struct_field(const struct btf *btf,
  	for_each_member(i, t, member) {
  		const struct btf_type *member_type = btf_type_by_id(btf,
  								    member->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(member_type); j++) {
+			array = btf_array(member_type);
+			nelems *= array->nelems;
+			member_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;

  		field_type = btf_get_field_type(__btf_name_by_offset(btf, member_type->name_off),
-						field_mask, &seen_mask, &align, &sz);
+							field_mask, &seen_mask, &align, &sz);
+		if ((field_type == BPF_KPTR_REF || !field_type) &&
+		    __btf_type_is_struct(member_type))
+			field_type = BPF_REPEAT_FIELDS;
+
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
@@ -3540,6 +3599,13 @@ static int btf_find_struct_field(const struct btf *btf,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, member_type, off, nelems, field_mask,
+						    &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			continue;
  		default:
  			return -EFAULT;
  		}
@@ -3548,6 +3614,7 @@ static int btf_find_struct_field(const struct btf *btf,
  			continue;
  		if (idx >= info_cnt)
  			return -E2BIG;
+		info[idx].nelems = nelems;
  		++idx;
  	}
  	return idx;
@@ -3565,16 +3632,35 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  	for_each_vsi(i, t, vsi) {
  		const struct btf_type *var = btf_type_by_id(btf, vsi->type);
  		const struct btf_type *var_type = btf_type_by_id(btf, var->type);
+		const struct btf_array *array;
+		u32 j, nelems = 1;
+
+		/* Walk into array types to find the element type and the
+		 * number of elements in the (flattened) array.
+		 */
+		for (j = 0; j < MAX_RESOLVE_DEPTH && btf_type_is_array(var_type); j++) {
+			array = btf_array(var_type);
+			nelems *= array->nelems;
+			var_type = btf_type_by_id(btf, array->type);
+		}
+		if (nelems == 0)
+			continue;

  		field_type = btf_get_field_type(__btf_name_by_offset(btf, var_type->name_off),
  						field_mask, &seen_mask, &align, &sz);
+		if ((field_type == BPF_KPTR_REF || !field_type) &&
+		    __btf_type_is_struct(var_type)) {
+			field_type = BPF_REPEAT_FIELDS;
+			sz = var_type->size;
+		}
+
  		if (field_type == 0)
  			continue;
  		if (field_type < 0)
  			return field_type;

  		off = vsi->offset;
-		if (vsi->size != sz)
+		if (vsi->size != sz * nelems)
  			continue;
  		if (off % align)
  			continue;
@@ -3582,9 +3668,11 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  		switch (field_type) {
  		case BPF_SPIN_LOCK:
  		case BPF_TIMER:
+		case BPF_REFCOUNT:
  		case BPF_LIST_NODE:
  		case BPF_RB_NODE:
-		case BPF_REFCOUNT:
+			if (nelems != 1)
+				continue;
  			ret = btf_find_struct(btf, var_type, off, sz, field_type,
  					      idx < info_cnt ? &info[idx] : &tmp);
  			if (ret < 0)
@@ -3607,6 +3695,13 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  			if (ret < 0)
  				return ret;
  			break;
+		case BPF_REPEAT_FIELDS:
+			ret = btf_find_nested_struct(btf, var_type, off, nelems, field_mask,
+						     &info[idx], info_cnt - idx);
+			if (ret < 0)
+				return ret;
+			idx += ret;
+			continue;
  		default:
  			return -EFAULT;
  		}
@@ -3615,8 +3710,9 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
  			continue;
  		if (idx >= info_cnt)
  			return -E2BIG;
-		++idx;
+		info[idx++].nelems = nelems;
  	}
+
  	return idx;
  }

@@ -3818,7 +3914,10 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
  	rec->timer_off = -EINVAL;
  	rec->refcount_off = -EINVAL;
  	for (i = 0; i < cnt; i++) {
-		field_type_size = btf_field_type_size(info_arr[i].type);
+		if (info_arr[i].type == BPF_REPEAT_FIELDS)
+			field_type_size = info_arr[i].repeat.elem_size * info_arr[i].nelems;
+		else
+			field_type_size = btf_field_type_size(info_arr[i].type) * info_arr[i].nelems;
  		if (info_arr[i].off + field_type_size > value_size) {
  			WARN_ONCE(1, "verifier bug off %d size %d", info_arr[i].off, value_size);
  			ret = -EFAULT;
@@ -3830,10 +3929,12 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
  		}
  		next_off = info_arr[i].off + field_type_size;

-		rec->field_mask |= info_arr[i].type;
+		if (info_arr[i].type != BPF_REPEAT_FIELDS)
+			rec->field_mask |= info_arr[i].type;
  		rec->fields[i].offset = info_arr[i].off;
  		rec->fields[i].type = info_arr[i].type;
  		rec->fields[i].size = field_type_size;
+		rec->fields[i].nelems = info_arr[i].nelems;

  		switch (info_arr[i].type) {
  		case BPF_SPIN_LOCK:
@@ -3871,6 +3972,10 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
  		case BPF_LIST_NODE:
  		case BPF_RB_NODE:
  			break;
+
+		case BPF_REPEAT_FIELDS:
+			rec->fields[i].repeated_cnt = info_arr[i].repeat.cnt;
+			break;
  		default:
  			ret = -EFAULT;
  			goto end;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-23  2:54             ` Kui-Feng Lee
@ 2024-04-24 20:09               ` Alexei Starovoitov
  2024-04-24 22:32                 ` Kui-Feng Lee
  2024-04-25  0:48                 ` Andrii Nakryiko
  0 siblings, 2 replies; 29+ messages in thread
From: Alexei Starovoitov @ 2024-04-24 20:09 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee

On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/22/24 19:45, Kui-Feng Lee wrote:
> >
> >
> > On 4/18/24 07:53, Alexei Starovoitov wrote:
> >> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 4/17/24 22:11, Alexei Starovoitov wrote:
> >>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
> >>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
> >>>>>> <thinker.li@gmail.com> wrote:
> >>>>>>>
> >>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
> >>>>>>> global variables. This was due to these types being initialized and
> >>>>>>> verified in a special manner in the kernel. This patchset allows BPF
> >>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
> >>>>>>> bpf_list_head in
> >>>>>>> the global namespace.
> >>>>>>>
> >>>>>>> The main change is to add "nelems" to btf_fields. The value of
> >>>>>>> "nelems" represents the number of elements in the array if a
> >>>>>>> btf_field
> >>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
> >>>>>>> verifies these types based on the information provided by the
> >>>>>>> btf_field.
> >>>>>>>
> >>>>>>> The value of "size" will be the size of the entire array if a
> >>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
> >>>>>>> size of an element. The value of "offset" will be the offset of the
> >>>>>>> beginning for an array. By putting this together, we can
> >>>>>>> determine the
> >>>>>>> offset of each element in an array. For example,
> >>>>>>>
> >>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
> >>>>>>
> >>>>>> Looks like this patch set enables arrays only.
> >>>>>> Meaning the following is supported already:
> >>>>>>
> >>>>>> +private(C) struct bpf_spin_lock glock_c;
> >>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> >>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> >>>>>>
> >>>>>> while this support is added:
> >>>>>>
> >>>>>> +private(C) struct bpf_spin_lock glock_c;
> >>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
> >>>>>> node2);
> >>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
> >>>>>> node2);
> >>>>>>
> >>>>>> Am I right?
> >>>>>>
> >>>>>> What about the case when bpf_list_head is wrapped in a struct?
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead;
> >>>>>> } ghead;
> >>>>>>
> >>>>>> that's not enabled in this patch. I think.
> >>>>>>
> >>>>>> And the following:
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead;
> >>>>>> } ghead[2];
> >>>>>>
> >>>>>>
> >>>>>> or
> >>>>>>
> >>>>>> private(C) struct foo {
> >>>>>>      struct bpf_list_head ghead[2];
> >>>>>> } ghead;
> >>>>>>
> >>>>>> Won't work either.
> >>>>>
> >>>>> No, they don't work.
> >>>>> We had a discussion about this in the other day.
> >>>>> I proposed to have another patch set to work on struct types.
> >>>>> Do you prefer to handle it in this patch set?
> >>>>>
> >>>>>>
> >>>>>> I think eventually we want to support all such combinations and
> >>>>>> the approach proposed in this patch with 'nelems'
> >>>>>> won't work for wrapper structs.
> >>>>>>
> >>>>>> I think it's better to unroll/flatten all structs and arrays
> >>>>>> and represent them as individual elements in the flattened
> >>>>>> structure. Then there will be no need to special case array with
> >>>>>> 'nelems'.
> >>>>>> All special BTF types will be individual elements with unique offset.
> >>>>>>
> >>>>>> Does this make sense?
> >>>>>
> >>>>> That means it will creates 10 btf_field(s) for an array having 10
> >>>>> elements. The purpose of adding "nelems" is to avoid the
> >>>>> repetition. Do
> >>>>> you prefer to expand them?
> >>>>
> >>>> It's not just expansion, but a common way to handle nested structs too.
> >>>>
> >>>> I suspect by delaying nested into another patchset this approach
> >>>> will become useless.
> >>>>
> >>>> So try adding nested structs in all combinations as a follow up and
> >>>> I suspect you're realize that "nelems" approach doesn't really help.
> >>>> You'd need to flatten them all.
> >>>> And once you do there is no need for "nelems".
> >>>
> >>> For me, "nelems" is more like a choice of avoiding repetition of
> >>> information, not a necessary. Before adding "nelems", I had considered
> >>> to expand them as well. But, eventually, I chose to add "nelems".
> >>>
> >>> Since you think this repetition is not a problem, I will expand array as
> >>> individual elements.
> >>
> >> You don't sound convinced :)
> >> Please add support for nested structs on top of your "nelems" approach
> >> and prototype the same without "nelems" and let's compare the two.
> >
> >
> > The following is the prototype that flatten arrays and struct types.
> > This approach is definitely simpler than "nelems" one.  However,
> > it will repeat same information as many times as the size of an array.
> > For now, we have a limitation on the number of btf_fields (<= 10).

I understand the concern and desire to minimize duplication,
but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
From btf_parse_fields() pov it becomes one giant opaque field
that sort_r() processes as a blob.

How
btf_record_find(reg->map_ptr->record,
                off + reg->var_off.value, BPF_KPTR);

is going to find anything in there?
Are you making a restriction that arrays and nested structs
will only have kptrs in there ?
So BPF_REPEAT_FIELDS can only wrap kptrs ?
But even then these kptrs might have different btf_ids.
So
struct map_value {
   struct {
      struct task __kptr *p1;
      struct thread __kptr *p2;
   } arr[10];
};

won't be able to be represented as BPF_REPEAT_FIELDS?

I think that simple flattening without repeat/nelems optimization
is much easier to reason about.
BTF_FIELDS_MAX is just a constant.
Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-24 20:09               ` Alexei Starovoitov
@ 2024-04-24 22:32                 ` Kui-Feng Lee
  2024-04-24 22:34                   ` Kui-Feng Lee
  2024-04-25  0:49                   ` Alexei Starovoitov
  2024-04-25  0:48                 ` Andrii Nakryiko
  1 sibling, 2 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-24 22:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/24/24 13:09, Alexei Starovoitov wrote:
> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/22/24 19:45, Kui-Feng Lee wrote:
>>>
>>>
>>> On 4/18/24 07:53, Alexei Starovoitov wrote:
>>>> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>>>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
>>>>>>>> <thinker.li@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>>>>> global variables. This was due to these types being initialized and
>>>>>>>>> verified in a special manner in the kernel. This patchset allows BPF
>>>>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
>>>>>>>>> bpf_list_head in
>>>>>>>>> the global namespace.
>>>>>>>>>
>>>>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>>>>> "nelems" represents the number of elements in the array if a
>>>>>>>>> btf_field
>>>>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
>>>>>>>>> verifies these types based on the information provided by the
>>>>>>>>> btf_field.
>>>>>>>>>
>>>>>>>>> The value of "size" will be the size of the entire array if a
>>>>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
>>>>>>>>> size of an element. The value of "offset" will be the offset of the
>>>>>>>>> beginning for an array. By putting this together, we can
>>>>>>>>> determine the
>>>>>>>>> offset of each element in an array. For example,
>>>>>>>>>
>>>>>>>>>         struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>>>>
>>>>>>>> Looks like this patch set enables arrays only.
>>>>>>>> Meaning the following is supported already:
>>>>>>>>
>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>>>>
>>>>>>>> while this support is added:
>>>>>>>>
>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
>>>>>>>> node2);
>>>>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
>>>>>>>> node2);
>>>>>>>>
>>>>>>>> Am I right?
>>>>>>>>
>>>>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>>>>> private(C) struct foo {
>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>> } ghead;
>>>>>>>>
>>>>>>>> that's not enabled in this patch. I think.
>>>>>>>>
>>>>>>>> And the following:
>>>>>>>> private(C) struct foo {
>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>> } ghead[2];
>>>>>>>>
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> private(C) struct foo {
>>>>>>>>       struct bpf_list_head ghead[2];
>>>>>>>> } ghead;
>>>>>>>>
>>>>>>>> Won't work either.
>>>>>>>
>>>>>>> No, they don't work.
>>>>>>> We had a discussion about this in the other day.
>>>>>>> I proposed to have another patch set to work on struct types.
>>>>>>> Do you prefer to handle it in this patch set?
>>>>>>>
>>>>>>>>
>>>>>>>> I think eventually we want to support all such combinations and
>>>>>>>> the approach proposed in this patch with 'nelems'
>>>>>>>> won't work for wrapper structs.
>>>>>>>>
>>>>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>>>>> and represent them as individual elements in the flattened
>>>>>>>> structure. Then there will be no need to special case array with
>>>>>>>> 'nelems'.
>>>>>>>> All special BTF types will be individual elements with unique offset.
>>>>>>>>
>>>>>>>> Does this make sense?
>>>>>>>
>>>>>>> That means it will creates 10 btf_field(s) for an array having 10
>>>>>>> elements. The purpose of adding "nelems" is to avoid the
>>>>>>> repetition. Do
>>>>>>> you prefer to expand them?
>>>>>>
>>>>>> It's not just expansion, but a common way to handle nested structs too.
>>>>>>
>>>>>> I suspect by delaying nested into another patchset this approach
>>>>>> will become useless.
>>>>>>
>>>>>> So try adding nested structs in all combinations as a follow up and
>>>>>> I suspect you're realize that "nelems" approach doesn't really help.
>>>>>> You'd need to flatten them all.
>>>>>> And once you do there is no need for "nelems".
>>>>>
>>>>> For me, "nelems" is more like a choice of avoiding repetition of
>>>>> information, not a necessary. Before adding "nelems", I had considered
>>>>> to expand them as well. But, eventually, I chose to add "nelems".
>>>>>
>>>>> Since you think this repetition is not a problem, I will expand array as
>>>>> individual elements.
>>>>
>>>> You don't sound convinced :)
>>>> Please add support for nested structs on top of your "nelems" approach
>>>> and prototype the same without "nelems" and let's compare the two.
>>>
>>>
>>> The following is the prototype that flatten arrays and struct types.
>>> This approach is definitely simpler than "nelems" one.  However,
>>> it will repeat same information as many times as the size of an array.
>>> For now, we have a limitation on the number of btf_fields (<= 10).
> 
> I understand the concern and desire to minimize duplication,
> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
>  From btf_parse_fields() pov it becomes one giant opaque field
> that sort_r() processes as a blob.
> 
> How
> btf_record_find(reg->map_ptr->record,
>                  off + reg->var_off.value, BPF_KPTR);
> 
> is going to find anything in there?
> Are you making a restriction that arrays and nested structs
> will only have kptrs in there ?
> So BPF_REPEAT_FIELDS can only wrap kptrs ?
> But even then these kptrs might have different btf_ids.
> So
> struct map_value {
>     struct {
>        struct task __kptr *p1;
>        struct thread __kptr *p2;
>     } arr[10];
> };
> 
> won't be able to be represented as BPF_REPEAT_FIELDS?


BPF_REPEAT_FIELDS can handle it. In this case, btf_parse_fields() will
create a list of btf_fields like this:

     [ btf_field(type=BPF_KPTR_..., offset=0, ...),
       btf_field(type=BPF_KPTR_..., offset=8, ...),
       btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2, nelems=9, size=16)]

You might have missed the explanation in [1].

btf_record_find() still does a binary search. Looking for p2 in
obj->arr[1], the offset will be 24.  btf_record_find() will find the
BPF_REPEAT_FIELDS one, and redirect the offset to

   (field->offset - field->size + (24 - field->offset) % field->size) == 8

Then, it will return the btf_field whose offset is 8.


[1] https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@gmail.com/

> 
> I think that simple flattening without repeat/nelems optimization
> is much easier to reason about.
> BTF_FIELDS_MAX is just a constant.
> Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.

I will switch to the flattened one if you think "nelems" &
"BPF_REPEAT_FIELDS" are too complicated after reading the explanation in
[1].


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-24 22:32                 ` Kui-Feng Lee
@ 2024-04-24 22:34                   ` Kui-Feng Lee
  2024-04-24 22:36                     ` Kui-Feng Lee
  2024-04-25  0:49                   ` Alexei Starovoitov
  1 sibling, 1 reply; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-24 22:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/24/24 15:32, Kui-Feng Lee wrote:
> 
> 
> On 4/24/24 13:09, Alexei Starovoitov wrote:
>> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>
>>>
>>>
>>> On 4/22/24 19:45, Kui-Feng Lee wrote:
>>>>
>>>>
>>>> On 4/18/24 07:53, Alexei Starovoitov wrote:
>>>>> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>>>>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
>>>>>>>>> <thinker.li@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>>>>>> global variables. This was due to these types being 
>>>>>>>>>> initialized and
>>>>>>>>>> verified in a special manner in the kernel. This patchset 
>>>>>>>>>> allows BPF
>>>>>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
>>>>>>>>>> bpf_list_head in
>>>>>>>>>> the global namespace.
>>>>>>>>>>
>>>>>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>>>>>> "nelems" represents the number of elements in the array if a
>>>>>>>>>> btf_field
>>>>>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
>>>>>>>>>> verifies these types based on the information provided by the
>>>>>>>>>> btf_field.
>>>>>>>>>>
>>>>>>>>>> The value of "size" will be the size of the entire array if a
>>>>>>>>>> btf_field represents an array. Dividing "size" by "nelems" 
>>>>>>>>>> gives the
>>>>>>>>>> size of an element. The value of "offset" will be the offset 
>>>>>>>>>> of the
>>>>>>>>>> beginning for an array. By putting this together, we can
>>>>>>>>>> determine the
>>>>>>>>>> offset of each element in an array. For example,
>>>>>>>>>>
>>>>>>>>>>         struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>>>>>
>>>>>>>>> Looks like this patch set enables arrays only.
>>>>>>>>> Meaning the following is supported already:
>>>>>>>>>
>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, 
>>>>>>>>> node2);
>>>>>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, 
>>>>>>>>> node2);
>>>>>>>>>
>>>>>>>>> while this support is added:
>>>>>>>>>
>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
>>>>>>>>> node2);
>>>>>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
>>>>>>>>> node2);
>>>>>>>>>
>>>>>>>>> Am I right?
>>>>>>>>>
>>>>>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>> } ghead;
>>>>>>>>>
>>>>>>>>> that's not enabled in this patch. I think.
>>>>>>>>>
>>>>>>>>> And the following:
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>> } ghead[2];
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead[2];
>>>>>>>>> } ghead;
>>>>>>>>>
>>>>>>>>> Won't work either.
>>>>>>>>
>>>>>>>> No, they don't work.
>>>>>>>> We had a discussion about this the other day.
>>>>>>>> I proposed to have another patch set to work on struct types.
>>>>>>>> Do you prefer to handle it in this patch set?
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think eventually we want to support all such combinations and
>>>>>>>>> the approach proposed in this patch with 'nelems'
>>>>>>>>> won't work for wrapper structs.
>>>>>>>>>
>>>>>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>>>>>> and represent them as individual elements in the flattened
>>>>>>>>> structure. Then there will be no need to special case array with
>>>>>>>>> 'nelems'.
>>>>>>>>> All special BTF types will be individual elements with unique 
>>>>>>>>> offset.
>>>>>>>>>
>>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>>> That means it will create 10 btf_field(s) for an array having 10
>>>>>>>> elements. The purpose of adding "nelems" is to avoid the
>>>>>>>> repetition. Do
>>>>>>>> you prefer to expand them?
>>>>>>>
>>>>>>> It's not just expansion, but a common way to handle nested 
>>>>>>> structs too.
>>>>>>>
>>>>>>> I suspect by delaying nested into another patchset this approach
>>>>>>> will become useless.
>>>>>>>
>>>>>>> So try adding nested structs in all combinations as a follow up and
>>>>>>> I suspect you'll realize that the "nelems" approach doesn't really help.
>>>>>>> You'd need to flatten them all.
>>>>>>> And once you do there is no need for "nelems".
>>>>>>
>>>>>> For me, "nelems" is more like a choice of avoiding repetition of
>>>>>> information, not a necessity. Before adding "nelems", I had 
>>>>>> considered
>>>>>> to expand them as well. But, eventually, I chose to add "nelems".
>>>>>>
>>>>>> Since you think this repetition is not a problem, I will expand 
>>>>>> array as
>>>>>> individual elements.
>>>>>
>>>>> You don't sound convinced :)
>>>>> Please add support for nested structs on top of your "nelems" approach
>>>>> and prototype the same without "nelems" and let's compare the two.
>>>>
>>>>
>>>> The following is the prototype that flattens arrays and struct types.
>>>> This approach is definitely simpler than the "nelems" one.  However,
>>>> it will repeat the same information as many times as the size of an array.
>>>> For now, we have a limitation on the number of btf_fields (<= 10).
>>
>> I understand the concern and desire to minimize duplication,
>> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
>>  From btf_parse_fields() pov it becomes one giant opaque field
>> that sort_r() processes as a blob.
>>
>> How
>> btf_record_find(reg->map_ptr->record,
>>                  off + reg->var_off.value, BPF_KPTR);
>>
>> is going to find anything in there?
>> Are you making a restriction that arrays and nested structs
>> will only have kptrs in there ?
>> So BPF_REPEAT_FIELDS can only wrap kptrs ?
>> But even then these kptrs might have different btf_ids.
>> So
>> struct map_value {
>>     struct {
>>        struct task __kptr *p1;
>>        struct thread __kptr *p2;
>>     } arr[10];
>> };
>>
>> won't be able to be represented as BPF_REPEAT_FIELDS?
> 
> 
> BPF_REPEAT_FIELDS can handle it. In this case, btf_parse_fields() will 
> create a list of btf_fields like this:
> 
>      [ btf_field(type=BPF_KPTR_..., offset=0, ...),
>        btf_field(type=BPF_KPTR_..., offset=8, ...),
>        btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2, 
> nelems=9, size=16)]

  An error here: "size" should be 16 * 9, the size of the whole repeated region.

> 
> You might have missed the explanation in [1].
> 
> btf_record_find() is still doing binary search. Looking for p2 in
> obj->arr[1], the offset will be 24.  btf_record_find() will find the
> BPF_REPEATED_FIELDS one, and redirect the offset to
> 
>    (field->offset - field->size + (16 - field->offset) % field->size) == 8

field->size should be replaced by (field->size / field->nelems).

> 
> Then, it will return the btf_field whose offset is 8.
> 
> 
> [1] 
> https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@gmail.com/
> 
>>
>> I think that simple flattening without repeat/nelems optimization
>> is much easier to reason about.
>> BTF_FIELDS_MAX is just a constant.
>> Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.
> 
> I will switch to the flattened one if you think "nelems" &
> "BPF_REPEAT_FIELDS" are too complicated after reading the explanation in
> [1].
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-24 22:34                   ` Kui-Feng Lee
@ 2024-04-24 22:36                     ` Kui-Feng Lee
  0 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-24 22:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/24/24 15:34, Kui-Feng Lee wrote:
> 
> 
> On 4/24/24 15:32, Kui-Feng Lee wrote:
>>
>>
>> On 4/24/24 13:09, Alexei Starovoitov wrote:
>>> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> 
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 4/22/24 19:45, Kui-Feng Lee wrote:
>>>>>
>>>>>
>>>>> On 4/18/24 07:53, Alexei Starovoitov wrote:
>>>>>> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>>>>>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>>>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
>>>>>>>>>> <thinker.li@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't 
>>>>>>>>>>> work as
>>>>>>>>>>> global variables. This was due to these types being 
>>>>>>>>>>> initialized and
>>>>>>>>>>> verified in a special manner in the kernel. This patchset 
>>>>>>>>>>> allows BPF
>>>>>>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
>>>>>>>>>>> bpf_list_head in
>>>>>>>>>>> the global namespace.
>>>>>>>>>>>
>>>>>>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>>>>>>> "nelems" represents the number of elements in the array if a
>>>>>>>>>>> btf_field
>>>>>>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
>>>>>>>>>>> verifies these types based on the information provided by the
>>>>>>>>>>> btf_field.
>>>>>>>>>>>
>>>>>>>>>>> The value of "size" will be the size of the entire array if a
>>>>>>>>>>> btf_field represents an array. Dividing "size" by "nelems" 
>>>>>>>>>>> gives the
>>>>>>>>>>> size of an element. The value of "offset" will be the offset 
>>>>>>>>>>> of the
>>>>>>>>>>> beginning for an array. By putting this together, we can
>>>>>>>>>>> determine the
>>>>>>>>>>> offset of each element in an array. For example,
>>>>>>>>>>>
>>>>>>>>>>>         struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>>>>>>
>>>>>>>>>> Looks like this patch set enables arrays only.
>>>>>>>>>> Meaning the following is supported already:
>>>>>>>>>>
>>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, 
>>>>>>>>>> node2);
>>>>>>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, 
>>>>>>>>>> node2);
>>>>>>>>>>
>>>>>>>>>> while this support is added:
>>>>>>>>>>
>>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
>>>>>>>>>> node2);
>>>>>>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
>>>>>>>>>> node2);
>>>>>>>>>>
>>>>>>>>>> Am I right?
>>>>>>>>>>
>>>>>>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>>>>>>> private(C) struct foo {
>>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>>> } ghead;
>>>>>>>>>>
>>>>>>>>>> that's not enabled in this patch. I think.
>>>>>>>>>>
>>>>>>>>>> And the following:
>>>>>>>>>> private(C) struct foo {
>>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>>> } ghead[2];
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> or
>>>>>>>>>>
>>>>>>>>>> private(C) struct foo {
>>>>>>>>>>       struct bpf_list_head ghead[2];
>>>>>>>>>> } ghead;
>>>>>>>>>>
>>>>>>>>>> Won't work either.
>>>>>>>>>
>>>>>>>>> No, they don't work.
>>>>>>>>> We had a discussion about this the other day.
>>>>>>>>> I proposed to have another patch set to work on struct types.
>>>>>>>>> Do you prefer to handle it in this patch set?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think eventually we want to support all such combinations and
>>>>>>>>>> the approach proposed in this patch with 'nelems'
>>>>>>>>>> won't work for wrapper structs.
>>>>>>>>>>
>>>>>>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>>>>>>> and represent them as individual elements in the flattened
>>>>>>>>>> structure. Then there will be no need to special case array with
>>>>>>>>>> 'nelems'.
>>>>>>>>>> All special BTF types will be individual elements with unique 
>>>>>>>>>> offset.
>>>>>>>>>>
>>>>>>>>>> Does this make sense?
>>>>>>>>>
>>>>>>>>> That means it will create 10 btf_field(s) for an array having 10
>>>>>>>>> elements. The purpose of adding "nelems" is to avoid the
>>>>>>>>> repetition. Do
>>>>>>>>> you prefer to expand them?
>>>>>>>>
>>>>>>>> It's not just expansion, but a common way to handle nested 
>>>>>>>> structs too.
>>>>>>>>
>>>>>>>> I suspect by delaying nested into another patchset this approach
>>>>>>>> will become useless.
>>>>>>>>
>>>>>>>> So try adding nested structs in all combinations as a follow up and
>>>>>>>> I suspect you'll realize that the "nelems" approach doesn't really 
>>>>>>>> help.
>>>>>>>> You'd need to flatten them all.
>>>>>>>> And once you do there is no need for "nelems".
>>>>>>>
>>>>>>> For me, "nelems" is more like a choice of avoiding repetition of
>>>>>>> information, not a necessity. Before adding "nelems", I had 
>>>>>>> considered
>>>>>>> to expand them as well. But, eventually, I chose to add "nelems".
>>>>>>>
>>>>>>> Since you think this repetition is not a problem, I will expand 
>>>>>>> array as
>>>>>>> individual elements.
>>>>>>
>>>>>> You don't sound convinced :)
>>>>>> Please add support for nested structs on top of your "nelems" 
>>>>>> approach
>>>>>> and prototype the same without "nelems" and let's compare the two.
>>>>>
>>>>>
>>>>> The following is the prototype that flattens arrays and struct types.
>>>>> This approach is definitely simpler than the "nelems" one.  However,
>>>>> it will repeat the same information as many times as the size of an array.
>>>>> For now, we have a limitation on the number of btf_fields (<= 10).
>>>
>>> I understand the concern and desire to minimize duplication,
>>> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
>>>  From btf_parse_fields() pov it becomes one giant opaque field
>>> that sort_r() processes as a blob.
>>>
>>> How
>>> btf_record_find(reg->map_ptr->record,
>>>                  off + reg->var_off.value, BPF_KPTR);
>>>
>>> is going to find anything in there?
>>> Are you making a restriction that arrays and nested structs
>>> will only have kptrs in there ?
>>> So BPF_REPEAT_FIELDS can only wrap kptrs ?
>>> But even then these kptrs might have different btf_ids.
>>> So
>>> struct map_value {
>>>     struct {
>>>        struct task __kptr *p1;
>>>        struct thread __kptr *p2;
>>>     } arr[10];
>>> };
>>>
>>> won't be able to be represented as BPF_REPEAT_FIELDS?
>>
>>
>> BPF_REPEAT_FIELDS can handle it. In this case, btf_parse_fields() 
>> will create a list of btf_fields like this:
>>
>>      [ btf_field(type=BPF_KPTR_..., offset=0, ...),
>>        btf_field(type=BPF_KPTR_..., offset=8, ...),
>>        btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2, 
>> nelems=9, size=16)]
> 
>   An error here: "size" should be 16 * 9, the size of the whole repeated region.
> 
>>
>> You might have missed the explanation in [1].
>>
>> btf_record_find() is still doing binary search. Looking for p2 in
>> obj->arr[1], the offset will be 24.  btf_record_find() will find the
>> BPF_REPEATED_FIELDS one, and redirect the offset to
>>
>>    (field->offset - field->size + (16 - field->offset) % field->size) 
                                       ^^^ should be 24
>> == 8
> 
> field->size should be replaced by (field->size / field->nelems).
> 
>>
>> Then, it will return the btf_field whose offset is 8.
>>
>>
>> [1] 
>> https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@gmail.com/
>>
>>>
>>> I think that simple flattening without repeat/nelems optimization
>>> is much easier to reason about.
>>> BTF_FIELDS_MAX is just a constant.
>>> Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.
>>
>> I will switch to the flattened one if you think "nelems" &
>> "BPF_REPEAT_FIELDS" are too complicated after reading the explanation in
>> [1].
>>


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-24 20:09               ` Alexei Starovoitov
  2024-04-24 22:32                 ` Kui-Feng Lee
@ 2024-04-25  0:48                 ` Andrii Nakryiko
  2024-04-25 17:09                   ` Kui-Feng Lee
  1 sibling, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-04-25  0:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, Kui-Feng Lee, bpf, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Kernel Team, Andrii Nakryiko,
	Kui-Feng Lee

On Wed, Apr 24, 2024 at 1:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >
> >
> >
> > On 4/22/24 19:45, Kui-Feng Lee wrote:
> > >
> > >
> > > On 4/18/24 07:53, Alexei Starovoitov wrote:
> > >> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
> > >> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 4/17/24 22:11, Alexei Starovoitov wrote:
> > >>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
> > >>>> wrote:
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
> > >>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
> > >>>>>> <thinker.li@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
> > >>>>>>> global variables. This was due to these types being initialized and
> > >>>>>>> verified in a special manner in the kernel. This patchset allows BPF
> > >>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
> > >>>>>>> bpf_list_head in
> > >>>>>>> the global namespace.
> > >>>>>>>
> > >>>>>>> The main change is to add "nelems" to btf_fields. The value of
> > >>>>>>> "nelems" represents the number of elements in the array if a
> > >>>>>>> btf_field
> > >>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
> > >>>>>>> verifies these types based on the information provided by the
> > >>>>>>> btf_field.
> > >>>>>>>
> > >>>>>>> The value of "size" will be the size of the entire array if a
> > >>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
> > >>>>>>> size of an element. The value of "offset" will be the offset of the
> > >>>>>>> beginning for an array. By putting this together, we can
> > >>>>>>> determine the
> > >>>>>>> offset of each element in an array. For example,
> > >>>>>>>
> > >>>>>>>        struct bpf_cpumask __kptr * global_mask_array[2];
> > >>>>>>
> > >>>>>> Looks like this patch set enables arrays only.
> > >>>>>> Meaning the following is supported already:
> > >>>>>>
> > >>>>>> +private(C) struct bpf_spin_lock glock_c;
> > >>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
> > >>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
> > >>>>>>
> > >>>>>> while this support is added:
> > >>>>>>
> > >>>>>> +private(C) struct bpf_spin_lock glock_c;
> > >>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
> > >>>>>> node2);
> > >>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
> > >>>>>> node2);
> > >>>>>>
> > >>>>>> Am I right?
> > >>>>>>
> > >>>>>> What about the case when bpf_list_head is wrapped in a struct?
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead;
> > >>>>>> } ghead;
> > >>>>>>
> > >>>>>> that's not enabled in this patch. I think.
> > >>>>>>
> > >>>>>> And the following:
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead;
> > >>>>>> } ghead[2];
> > >>>>>>
> > >>>>>>
> > >>>>>> or
> > >>>>>>
> > >>>>>> private(C) struct foo {
> > >>>>>>      struct bpf_list_head ghead[2];
> > >>>>>> } ghead;
> > >>>>>>
> > >>>>>> Won't work either.
> > >>>>>
> > >>>>> No, they don't work.
> > >>>>> We had a discussion about this the other day.
> > >>>>> I proposed to have another patch set to work on struct types.
> > >>>>> Do you prefer to handle it in this patch set?
> > >>>>>
> > >>>>>>
> > >>>>>> I think eventually we want to support all such combinations and
> > >>>>>> the approach proposed in this patch with 'nelems'
> > >>>>>> won't work for wrapper structs.
> > >>>>>>
> > >>>>>> I think it's better to unroll/flatten all structs and arrays
> > >>>>>> and represent them as individual elements in the flattened
> > >>>>>> structure. Then there will be no need to special case array with
> > >>>>>> 'nelems'.
> > >>>>>> All special BTF types will be individual elements with unique offset.
> > >>>>>>
> > >>>>>> Does this make sense?
> > >>>>>
> > >>>> That means it will create 10 btf_field(s) for an array having 10
> > >>>>> elements. The purpose of adding "nelems" is to avoid the
> > >>>>> repetition. Do
> > >>>>> you prefer to expand them?
> > >>>>
> > >>>> It's not just expansion, but a common way to handle nested structs too.
> > >>>>
> > >>>> I suspect by delaying nested into another patchset this approach
> > >>>> will become useless.
> > >>>>
> > >>>> So try adding nested structs in all combinations as a follow up and
> > >>>> I suspect you'll realize that the "nelems" approach doesn't really help.
> > >>>> You'd need to flatten them all.
> > >>>> And once you do there is no need for "nelems".
> > >>>
> > >>> For me, "nelems" is more like a choice of avoiding repetition of
> > >>> information, not a necessity. Before adding "nelems", I had considered
> > >>> to expand them as well. But, eventually, I chose to add "nelems".
> > >>>
> > >>> Since you think this repetition is not a problem, I will expand array as
> > >>> individual elements.
> > >>
> > >> You don't sound convinced :)
> > >> Please add support for nested structs on top of your "nelems" approach
> > >> and prototype the same without "nelems" and let's compare the two.
> > >
> > >
> > > The following is the prototype that flattens arrays and struct types.
> > > This approach is definitely simpler than the "nelems" one.  However,
> > > it will repeat the same information as many times as the size of an array.
> > > For now, we have a limitation on the number of btf_fields (<= 10).
>
> I understand the concern and desire to minimize duplication,
> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
> From btf_parse_fields() pov it becomes one giant opaque field
> that sort_r() processes as a blob.
>
> How
> btf_record_find(reg->map_ptr->record,
>                 off + reg->var_off.value, BPF_KPTR);
>
> is going to find anything in there?
> Are you making a restriction that arrays and nested structs
> will only have kptrs in there ?
> So BPF_REPEAT_FIELDS can only wrap kptrs ?
> But even then these kptrs might have different btf_ids.
> So
> struct map_value {
>    struct {
>       struct task __kptr *p1;
>       struct thread __kptr *p2;
>    } arr[10];
> };
>
> won't be able to be represented as BPF_REPEAT_FIELDS?
>
> I think that simple flattening without repeat/nelems optimization
> is much easier to reason about.

+100 to this, BPF_REPEAT_FIELDS will just add an extra layer of
cognitive overload. Even if it can handle all conceivable situations,
let's just have a list of all "unique fields". We already do dynamic
memory allocation for struct btf_record, so one allocation more or less
doesn't matter all that much. We seem to be doing this once per map,
not per instruction or per state.

Let's keep it simple.

> BTF_FIELDS_MAX is just a constant.
> Just don't do struct btf_field_info info_arr[BTF_FIELDS_MAX]; on stack.


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-24 22:32                 ` Kui-Feng Lee
  2024-04-24 22:34                   ` Kui-Feng Lee
@ 2024-04-25  0:49                   ` Alexei Starovoitov
  2024-04-25 17:08                     ` Kui-Feng Lee
  1 sibling, 1 reply; 29+ messages in thread
From: Alexei Starovoitov @ 2024-04-25  0:49 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee

On Wed, Apr 24, 2024 at 3:32 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
> > struct map_value {
> >     struct {
> >        struct task __kptr *p1;
> >        struct thread __kptr *p2;
> >     } arr[10];
> > };
> >
> > won't be able to be represented as BPF_REPEAT_FIELDS?
>
>
> BPF_REPEAT_FIELDS can handle it. In this case, btf_parse_fields() will
> create a list of btf_fields like this:
>
>      [ btf_field(type=BPF_KPTR_..., offset=0, ...),
>        btf_field(type=BPF_KPTR_..., offset=8, ...),
>        btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2,
> nelems=9, size=16)]
>
> You might have missed the explanation in [1].
>
> btf_record_find() is still doing binary search. Looking for p2 in
> obj->arr[1], the offset will be 24.  btf_record_find() will find the
> BPF_REPEATED_FIELDS one, and redirect the offset to
>
>    (field->offset - field->size + (16 - field->offset) % field->size) == 8
>
> Then, it will return the btf_field whose offset is 8.
>
>
> [1]
> https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@gmail.com/

I somehow completely missed that email.
Just read it and tbh it looks very unnatural and convoluted.

> [kptr_a, kptr_b, repeated_fields(nelems=3, repeated_cnt=2),
>    repeated_fields(nelems=9, repeated_cnt=3)]

is kinda an inverted array description, where the elements come first
and the array type comes after. I have a hard time imagining how a
search in such a thing will work.

Also consider that arrays won't be huge, since bpf prog
can only access them with a constant offset.
Even array[NR_CPUS] is unlikely, since indexing into it
with a variable index won't be possible.


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-25  0:49                   ` Alexei Starovoitov
@ 2024-04-25 17:08                     ` Kui-Feng Lee
  0 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-25 17:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/24/24 17:49, Alexei Starovoitov wrote:
> On Wed, Apr 24, 2024 at 3:32 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>> struct map_value {
>>>      struct {
>>>         struct task __kptr *p1;
>>>         struct thread __kptr *p2;
>>>      } arr[10];
>>> };
>>>
>>> won't be able to be represented as BPF_REPEAT_FIELDS?
>>
>>
>> BPF_REPEAT_FIELDS can handle it. In this case, btf_parse_fields() will
>> create a list of btf_fields like this:
>>
>>       [ btf_field(type=BPF_KPTR_..., offset=0, ...),
>>         btf_field(type=BPF_KPTR_..., offset=8, ...),
>>         btf_field(type=BPF_REPEAT_FIELDS, offset=16, repeated_fields=2,
>> nelems=9, size=16)]
>>
>> You might have missed the explanation in [1].
>>
>> btf_record_find() is still doing binary search. Looking for p2 in
>> obj->arr[1], the offset will be 24.  btf_record_find() will find the
>> BPF_REPEATED_FIELDS one, and redirect the offset to
>>
>>     (field->offset - field->size + (16 - field->offset) % field->size) == 8
>>
>> Then, it will return the btf_field whose offset is 8.
>>
>>
>> [1]
>> https://lore.kernel.org/all/4d3dc24f-fb50-4674-8eec-4c38e4d4b2c1@gmail.com/
> 
> I somehow completely missed that email.
> Just read it and tbh it looks very unnatural and convoluted.
> 
>> [kptr_a, kptr_b, repeated_fields(nelems=3, repeated_cnt=2),
>>     repeated_fields(nelems=9, repeated_cnt=3)]
> 
> is kinda an inverted array description where elements come first
> and then array type. I have a hard time imagining how search
> in such thing will work.

About searching: for index 0, the lookup finds the recorded elements
directly. For index >= 1, it finds the repeated_fields() entry and
redirects the offset to the corresponding offset at index 0. The
pseudo code looks like

   field = bsearch(all_fields, offset..);
   while (field && is_repeated_fields(field)) {
      offset = redirect_offset(offset, field);
      field = bsearch(&all_fields[field.index - field.repeated_cnt ..
                                  field.index], offset);
   }


> 
> Also consider that arrays won't be huge, since bpf prog
> can only access them with a constant offset.
> Even array[NR_CPUS] is unlikely, since indexing into it
> with a variable index won't be possible.

I also got a similar opinion from Andrii in another message.
So, I will move to the flattened solution.
Thank you for your feedback.


* Re: [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head.
  2024-04-25  0:48                 ` Andrii Nakryiko
@ 2024-04-25 17:09                   ` Kui-Feng Lee
  0 siblings, 0 replies; 29+ messages in thread
From: Kui-Feng Lee @ 2024-04-25 17:09 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: Kui-Feng Lee, bpf, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
	Kernel Team, Andrii Nakryiko, Kui-Feng Lee



On 4/24/24 17:48, Andrii Nakryiko wrote:
> On Wed, Apr 24, 2024 at 1:09 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Mon, Apr 22, 2024 at 7:54 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>>
>>>
>>>
>>> On 4/22/24 19:45, Kui-Feng Lee wrote:
>>>>
>>>>
>>>> On 4/18/24 07:53, Alexei Starovoitov wrote:
>>>>> On Wed, Apr 17, 2024 at 11:07 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/17/24 22:11, Alexei Starovoitov wrote:
>>>>>>> On Wed, Apr 17, 2024 at 9:31 PM Kui-Feng Lee <sinquersw@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/17/24 20:30, Alexei Starovoitov wrote:
>>>>>>>>> On Fri, Apr 12, 2024 at 2:08 PM Kui-Feng Lee
>>>>>>>>> <thinker.li@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> The arrays of kptr, bpf_rb_root, and bpf_list_head didn't work as
>>>>>>>>>> global variables. This was due to these types being initialized and
>>>>>>>>>> verified in a special manner in the kernel. This patchset allows BPF
>>>>>>>>>> programs to declare arrays of kptr, bpf_rb_root, and
>>>>>>>>>> bpf_list_head in
>>>>>>>>>> the global namespace.
>>>>>>>>>>
>>>>>>>>>> The main change is to add "nelems" to btf_fields. The value of
>>>>>>>>>> "nelems" represents the number of elements in the array if a
>>>>>>>>>> btf_field
>>>>>>>>>> represents an array. Otherwise, "nelem" will be 1. The verifier
>>>>>>>>>> verifies these types based on the information provided by the
>>>>>>>>>> btf_field.
>>>>>>>>>>
>>>>>>>>>> The value of "size" will be the size of the entire array if a
>>>>>>>>>> btf_field represents an array. Dividing "size" by "nelems" gives the
>>>>>>>>>> size of an element. The value of "offset" will be the offset of the
>>>>>>>>>> beginning for an array. By putting this together, we can
>>>>>>>>>> determine the
>>>>>>>>>> offset of each element in an array. For example,
>>>>>>>>>>
>>>>>>>>>>         struct bpf_cpumask __kptr * global_mask_array[2];
>>>>>>>>>
>>>>>>>>> Looks like this patch set enables arrays only.
>>>>>>>>> Meaning the following is supported already:
>>>>>>>>>
>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>> +private(C) struct bpf_list_head ghead_array1 __contains(foo, node2);
>>>>>>>>> +private(C) struct bpf_list_head ghead_array2 __contains(foo, node2);
>>>>>>>>>
>>>>>>>>> while this support is added:
>>>>>>>>>
>>>>>>>>> +private(C) struct bpf_spin_lock glock_c;
>>>>>>>>> +private(C) struct bpf_list_head ghead_array1[3] __contains(foo,
>>>>>>>>> node2);
>>>>>>>>> +private(C) struct bpf_list_head ghead_array2[2] __contains(foo,
>>>>>>>>> node2);
>>>>>>>>>
>>>>>>>>> Am I right?
>>>>>>>>>
>>>>>>>>> What about the case when bpf_list_head is wrapped in a struct?
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>> } ghead;
>>>>>>>>>
>>>>>>>>> that's not enabled in this patch. I think.
>>>>>>>>>
>>>>>>>>> And the following:
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead;
>>>>>>>>> } ghead[2];
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> or
>>>>>>>>>
>>>>>>>>> private(C) struct foo {
>>>>>>>>>       struct bpf_list_head ghead[2];
>>>>>>>>> } ghead;
>>>>>>>>>
>>>>>>>>> Won't work either.
>>>>>>>>
>>>>>>>> No, they don't work.
>>>>>>>> We had a discussion about this the other day.
>>>>>>>> I proposed to have another patch set to work on struct types.
>>>>>>>> Do you prefer to handle it in this patch set?
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think eventually we want to support all such combinations and
>>>>>>>>> the approach proposed in this patch with 'nelems'
>>>>>>>>> won't work for wrapper structs.
>>>>>>>>>
>>>>>>>>> I think it's better to unroll/flatten all structs and arrays
>>>>>>>>> and represent them as individual elements in the flattened
>>>>>>>>> structure. Then there will be no need to special case array with
>>>>>>>>> 'nelems'.
>>>>>>>>> All special BTF types will be individual elements with unique offset.
>>>>>>>>>
>>>>>>>>> Does this make sense?
>>>>>>>>
>>>>>>> That means it will create 10 btf_field(s) for an array having 10
>>>>>>>> elements. The purpose of adding "nelems" is to avoid the
>>>>>>>> repetition. Do
>>>>>>>> you prefer to expand them?
>>>>>>>
>>>>>>> It's not just expansion, but a common way to handle nested structs too.
>>>>>>>
>>>>>>> I suspect that by delaying nested structs to another patch set,
>>>>>>> this approach will become useless.
>>>>>>>
>>>>>>> So try adding nested structs in all combinations as a follow-up, and
>>>>>>> I suspect you'll realize that the "nelems" approach doesn't really help.
>>>>>>> You'd need to flatten them all.
>>>>>>> And once you do there is no need for "nelems".
>>>>>>
>>>>>> For me, "nelems" is a way to avoid repeating information, not a
>>>>>> necessity. Before adding "nelems", I had considered expanding them
>>>>>> as well, but eventually I chose to add "nelems".
>>>>>>
>>>>>> Since you think this repetition is not a problem, I will expand
>>>>>> arrays as individual elements.
>>>>>
>>>>> You don't sound convinced :)
>>>>> Please add support for nested structs on top of your "nelems" approach
>>>>> and prototype the same without "nelems" and let's compare the two.
>>>>
>>>>
>>>> The following is a prototype that flattens arrays and struct types.
>>>> This approach is definitely simpler than the "nelems" one. However,
>>>> it repeats the same information as many times as the size of the array.
>>>> For now, we have a limit on the number of btf_fields (<= 10).
>>
>> I understand the concern and desire to minimize duplication,
>> but I don't see how this BPF_REPEAT_FIELDS approach is going to work.
>> From btf_parse_fields()'s point of view, it becomes one giant opaque
>> field that sort_r() processes as a blob.
>>
>> How is
>> btf_record_find(reg->map_ptr->record,
>>                 off + reg->var_off.value, BPF_KPTR);
>> going to find anything in there?
>> Are you making a restriction that arrays and nested structs
>> will only have kptrs in there?
>> So BPF_REPEAT_FIELDS can only wrap kptrs?
>> But even then these kptrs might have different btf_ids.
>> So
>> struct map_value {
>>     struct {
>>        struct task __kptr *p1;
>>        struct thread __kptr *p2;
>>     } arr[10];
>> };
>>
>> won't be representable as BPF_REPEAT_FIELDS?
>>
>> I think that simple flattening without repeat/nelems optimization
>> is much easier to reason about.
> 
> +100 to this, BPF_REPEAT_FIELDS will just add an extra layer of
> cognitive overload. Even if it can handle all conceivable situations,
> let's just have a list of all "unique fields". We already do dynamic
> memory allocation for struct btf_record; one more or less doesn't
> matter all that much. We seem to be doing this once per map, not per
> instruction or per state.
> 
> Let's keep it simple.
> 


Thank you for the feedback.
I will move to the flattening approach.

>> BTF_FIELDS_MAX is just a constant.
>> Just don't put struct btf_field_info info_arr[BTF_FIELDS_MAX]; on the stack.


end of thread, other threads: [~2024-04-25 17:09 UTC | newest]

Thread overview: 29+ messages -- links below jump to the message on this page --
2024-04-12 21:08 [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 01/11] bpf: Remove unnecessary checks on the offset of btf_field Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 02/11] bpf: Remove unnecessary call to btf_field_type_size() Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 03/11] bpf: Add nelems to struct btf_field_info and btf_field Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 04/11] bpf: initialize/free array of btf_field(s) Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 05/11] bpf: Find btf_field with the knowledge of arrays Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 06/11] bpf: check_map_access() " Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 07/11] bpf: check_map_kptr_access() compute the offset from the reg state Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 08/11] bpf: Enable and verify btf_field arrays Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 09/11] selftests/bpf: Test global kptr arrays Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 10/11] selftests/bpf: Test global bpf_rb_root arrays Kui-Feng Lee
2024-04-12 21:08 ` [PATCH bpf-next v2 11/11] selftests/bpf: Test global bpf_list_head arrays Kui-Feng Lee
2024-04-18  3:30 ` [PATCH bpf-next v2 00/11] Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head Alexei Starovoitov
2024-04-18  4:31   ` Kui-Feng Lee
2024-04-18  5:11     ` Alexei Starovoitov
2024-04-18  6:07       ` Kui-Feng Lee
2024-04-18 14:53         ` Alexei Starovoitov
2024-04-18 18:27           ` Kui-Feng Lee
2024-04-19 18:36           ` Kui-Feng Lee
2024-04-23  2:45           ` Kui-Feng Lee
2024-04-23  2:54             ` Kui-Feng Lee
2024-04-24 20:09               ` Alexei Starovoitov
2024-04-24 22:32                 ` Kui-Feng Lee
2024-04-24 22:34                   ` Kui-Feng Lee
2024-04-24 22:36                     ` Kui-Feng Lee
2024-04-25  0:49                   ` Alexei Starovoitov
2024-04-25 17:08                     ` Kui-Feng Lee
2024-04-25  0:48                 ` Andrii Nakryiko
2024-04-25 17:09                   ` Kui-Feng Lee
