* [PATCH bpf-next 00/15] Add support for local percpu kptr
@ 2023-08-14 17:28 Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
` (14 more replies)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE,
similar to sk/task/inode local storage, and the old BPF_MAP_TYPE_CGROUP_STORAGE
map was marked as deprecated since it can only work with the current cgroup.
Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map
is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and also only works
with the current cgroup. But there is no replacement which can work
with an arbitrary cgroup.
This patch set solves this problem by adding support for local
percpu kptrs. The map value can have a percpu kptr field which holds
a bpf program allocated percpu data. Below is an example:
  struct percpu_val_t {
    ... fields ...
  };

  struct map_value_t {
    struct percpu_val_t __percpu *percpu_data_ptr;
  };
In the above, 'map_value_t' is the map value type for a
BPF_MAP_TYPE_CGRP_STORAGE map. Users can access 'percpu_data_ptr'
and then read/write the percpu data. This covers what
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE provides and more, so the
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type is marked as deprecated.
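For example, assuming 'percpu_val_t' has a hypothetical scalar field
'cnt' and 'map_value' points to a 'struct map_value_t' obtained from a
map lookup, a program could update the current cpu's copy roughly as
sketched below, using the kfuncs added later in this series:

  struct percpu_val_t __percpu *p;
  struct percpu_val_t *pv;

  bpf_rcu_read_lock();
  p = map_value->percpu_data_ptr;
  if (p) {
    /* bpf_this_cpu_ptr() returns this cpu's copy of the percpu data */
    pv = bpf_this_cpu_ptr(p);
    pv->cnt += 1;
  }
  bpf_rcu_read_unlock();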
In addition, local percpu kptrs support the same map types
as other kptrs, including hash, lru_hash, array, and sk/inode/task/cgrp
local storage. Please see the individual patches for details.
[1] https://lore.kernel.org/all/20221026042835.672317-1-yhs@fb.com/
Yonghong Song (15):
bpf: Add support for non-fix-size percpu mem allocation
bpf: Add BPF_KPTR_PERCPU_REF as a field type
bpf: Add alloc/xchg/direct_access support for local percpu kptr
bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu
obj
selftests/bpf: Update error message in negative linked_list test
libbpf: Add __percpu macro definition
selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in
bpf_experimental.h
selftests/bpf: Add tests for array map with local percpu kptr
bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
selftests/bpf: Remove unnecessary direct read of local percpu kptr
selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr
bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data
structure
selftests/bpf: Add tests for percpu struct with bpf list head
selftests/bpf: Add some negative tests
bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
include/linux/bpf.h | 25 ++-
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 9 +-
kernel/bpf/btf.c | 5 +
kernel/bpf/core.c | 8 +-
kernel/bpf/helpers.c | 49 +++++
kernel/bpf/memalloc.c | 14 +-
kernel/bpf/syscall.c | 26 ++-
kernel/bpf/verifier.c | 164 +++++++++++++---
tools/include/uapi/linux/bpf.h | 9 +-
tools/lib/bpf/bpf_helpers.h | 1 +
.../testing/selftests/bpf/bpf_experimental.h | 31 +++
.../selftests/bpf/prog_tests/linked_list.c | 4 +-
.../selftests/bpf/prog_tests/percpu_alloc.c | 165 ++++++++++++++++
.../selftests/bpf/progs/percpu_alloc_array.c | 183 ++++++++++++++++++
.../progs/percpu_alloc_cgrp_local_storage.c | 105 ++++++++++
.../selftests/bpf/progs/percpu_alloc_fail.c | 100 ++++++++++
.../percpu_alloc_nested_special_fields.c | 121 ++++++++++++
18 files changed, 958 insertions(+), 62 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_array.c
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c
--
2.34.1
* [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type Yonghong Song
` (13 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
This is needed for later percpu memory allocation when the
allocation is done by a bpf program. For such cases, a global
bpf_global_percpu_ma is added; since programs may allocate percpu
objects of various sizes, it must support flexible (non-fixed)
allocation sizes.
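As a rough sketch, a later patch in this series allocates percpu
objects from this allocator like below, with 'size' being the BTF type
size of the requested struct:

  /* bpf_global_percpu_ma is initialized with size == 0 (flexible size)
   * and percpu == true, so any allowed size can be requested here.
   */
  void *p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
  if (!p)
    return NULL;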
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/linux/bpf.h | 4 ++--
kernel/bpf/core.c | 8 +++++---
kernel/bpf/memalloc.c | 14 ++++++--------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cfabbcf47bdb..60e80e90c37d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -55,8 +55,8 @@ struct cgroup;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
-extern struct bpf_mem_alloc bpf_global_ma;
-extern bool bpf_global_ma_set;
+extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0f8f036d8bd1..95599df82ee4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -64,8 +64,8 @@
#define OFF insn->off
#define IMM insn->imm
-struct bpf_mem_alloc bpf_global_ma;
-bool bpf_global_ma_set;
+struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+bool bpf_global_ma_set, bpf_global_percpu_ma_set;
/* No hurry in this branch
*
@@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)
ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
bpf_global_ma_set = !ret;
- return ret;
+ ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
+ bpf_global_percpu_ma_set = !ret;
+ return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
}
late_initcall(bpf_global_ma_init);
#endif
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9c49ae53deaf..cb60445de98a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
struct obj_cgroup *objcg = NULL;
int cpu, i, unit_size, percpu_size = 0;
+ /* room for llist_node and per-cpu pointer */
+ if (percpu)
+ percpu_size = LLIST_NODE_SZ + sizeof(void *);
+
if (size) {
pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
if (!pc)
return -ENOMEM;
- if (percpu)
- /* room for llist_node and per-cpu pointer */
- percpu_size = LLIST_NODE_SZ + sizeof(void *);
- else
+ if (!percpu)
size += LLIST_NODE_SZ; /* room for llist_node */
unit_size = size;
@@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
return 0;
}
- /* size == 0 && percpu is an invalid combination */
- if (WARN_ON_ONCE(percpu))
- return -EINVAL;
-
pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
if (!pcc)
return -ENOMEM;
@@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
c = &cc->cache[i];
c->unit_size = sizes[i];
c->objcg = objcg;
+ c->percpu_size = percpu_size;
c->tgt = c;
prefill_mem_cache(c, cpu);
}
--
2.34.1
* [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-18 18:37 ` David Marchevsky
2023-08-14 17:28 ` [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
` (12 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
BPF_KPTR_PERCPU_REF represents a percpu field type like the one below:
  struct val_t {
    ... fields ...
  };

  struct t {
    ...
    struct val_t __percpu *percpu_data_ptr;
    ...
  };

where

  #define __percpu __attribute__((btf_type_tag("percpu")))
While BPF_KPTR_REF points to a trusted kernel object or a trusted
local object, BPF_KPTR_PERCPU_REF points to a trusted local
percpu object.
This patch adds basic support for BPF_KPTR_PERCPU_REF:
percpu field parsing, recording and free operations.
BPF_KPTR_PERCPU_REF also supports the same map types
as BPF_KPTR_REF does.
Note that unlike other local kptrs, a BPF_KPTR_PERCPU_REF
struct might not contain any special fields
like other kptrs, bpf_spin_lock, bpf_list_head, etc.
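For illustration, given the btf_type_tag based __percpu and __kptr
macros, field parsing distinguishes the two kinds of fields roughly as
below ('val_t' is a placeholder local struct):

  struct map_value {
    struct val_t __percpu *pc; /* "percpu" tag -> BPF_KPTR_PERCPU_REF */
    struct val_t __kptr *kp;   /* "kptr" tag   -> BPF_KPTR_REF */
  };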
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/linux/bpf.h | 18 ++++++++++++------
kernel/bpf/btf.c | 5 +++++
kernel/bpf/syscall.c | 7 ++++++-
3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 60e80e90c37d..e6348fd0a785 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,14 +180,15 @@ enum btf_field_type {
BPF_TIMER = (1 << 1),
BPF_KPTR_UNREF = (1 << 2),
BPF_KPTR_REF = (1 << 3),
- BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
- BPF_LIST_HEAD = (1 << 4),
- BPF_LIST_NODE = (1 << 5),
- BPF_RB_ROOT = (1 << 6),
- BPF_RB_NODE = (1 << 7),
+ BPF_KPTR_PERCPU_REF = (1 << 4),
+ BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_PERCPU_REF,
+ BPF_LIST_HEAD = (1 << 5),
+ BPF_LIST_NODE = (1 << 6),
+ BPF_RB_ROOT = (1 << 7),
+ BPF_RB_NODE = (1 << 8),
BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
BPF_RB_NODE | BPF_RB_ROOT,
- BPF_REFCOUNT = (1 << 8),
+ BPF_REFCOUNT = (1 << 9),
};
typedef void (*btf_dtor_kfunc_t)(void *);
@@ -300,6 +301,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return "kptr";
+ case BPF_KPTR_PERCPU_REF:
+ return "kptr_percpu";
case BPF_LIST_HEAD:
return "bpf_list_head";
case BPF_LIST_NODE:
@@ -325,6 +328,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
return sizeof(struct bpf_timer);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
return sizeof(u64);
case BPF_LIST_HEAD:
return sizeof(struct bpf_list_head);
@@ -351,6 +355,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
return __alignof__(struct bpf_timer);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
return __alignof__(u64);
case BPF_LIST_HEAD:
return __alignof__(struct bpf_list_head);
@@ -389,6 +394,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
case BPF_TIMER:
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
break;
default:
WARN_ON_ONCE(1);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 249657c466dd..bc1792edc64e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3293,6 +3293,8 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
type = BPF_KPTR_UNREF;
else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
type = BPF_KPTR_REF;
+ else if (!strcmp("percpu", __btf_name_by_offset(btf, t->name_off)))
+ type = BPF_KPTR_PERCPU_REF;
else
return -EINVAL;
@@ -3457,6 +3459,7 @@ static int btf_find_struct_field(const struct btf *btf,
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
ret = btf_find_kptr(btf, member_type, off, sz,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3523,6 +3526,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
ret = btf_find_kptr(btf, var_type, off, sz,
idx < info_cnt ? &info[idx] : &tmp);
if (ret < 0)
@@ -3783,6 +3787,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
if (ret < 0)
goto end;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7f4e8c357a6a..1c30b6ee84d4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -514,6 +514,7 @@ void btf_record_free(struct btf_record *rec)
switch (rec->fields[i].type) {
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
if (rec->fields[i].kptr.module)
module_put(rec->fields[i].kptr.module);
btf_put(rec->fields[i].kptr.btf);
@@ -560,6 +561,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
switch (fields[i].type) {
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
btf_get(fields[i].kptr.btf);
if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
ret = -ENXIO;
@@ -650,6 +652,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
WRITE_ONCE(*(u64 *)field_ptr, 0);
break;
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
if (!xchgd_field)
break;
@@ -657,7 +660,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
if (!btf_is_kernel(field->kptr.btf)) {
pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
field->kptr.btf_id);
- WARN_ON_ONCE(!pointee_struct_meta);
+ if (field->type != BPF_KPTR_PERCPU_REF)
+ WARN_ON_ONCE(!pointee_struct_meta);
migrate_disable();
__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
pointee_struct_meta->record :
@@ -1046,6 +1050,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
break;
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
case BPF_REFCOUNT:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
--
2.34.1
* [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 01/15] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-19 0:29 ` Alexei Starovoitov
2023-08-19 1:24 ` Kumar Kartikeya Dwivedi
2023-08-14 17:28 ` [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
` (11 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Add two new kfuncs, bpf_percpu_obj_new_impl() and
bpf_percpu_obj_drop_impl(), to allocate and free a percpu obj.
The two functions are very similar to bpf_obj_new_impl()
and bpf_obj_drop_impl(). The major difference is related
to percpu handling.
  bpf_rcu_read_lock();
  struct val_t __percpu *v = map_val->percpu_data;
  ...
  bpf_rcu_read_unlock();
For a percpu kptr 'v' loaded from a map value like above, the
reg->type is set as

  PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU

if inside an rcu critical section.
The MEM_RCU marking here is similar to NON_OWN_REF, as 'v'
is not an owning reference. But NON_OWN_REF is
trusted and typically used inside a spinlock critical section, while
MEM_RCU is under an rcu read lock. RCU is preferred here
since percpu data structures imply potential concurrent
access to their contents.
Also, bpf_percpu_obj_new_impl() is restricted to only accept a
scalar struct, which means nested kptrs are not allowed.
Some other special fields, e.g., bpf_list_head, bpf_spin_lock, etc.,
could reasonably be nested (in a nested 'struct'); a later patch will
improve the verifier to handle such nested special fields.
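As a sketch of the intended usage (with the convenience macros added
in a later selftests patch of this series), a program allocates a
percpu object and publishes it into a map value like below:

  struct val_t __percpu *p;

  p = bpf_percpu_obj_new(struct val_t); /* KF_ACQUIRE | KF_RET_NULL */
  if (!p)
    return 0;

  /* publish; the old value, if any, must be dropped */
  p = bpf_kptr_xchg(&map_val->percpu_data, p);
  if (p)
    bpf_percpu_obj_drop(p);             /* KF_RELEASE */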
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/linux/bpf.h | 3 +-
kernel/bpf/helpers.c | 49 +++++++++++++++++++++++
kernel/bpf/syscall.c | 21 +++++++---
kernel/bpf/verifier.c | 90 ++++++++++++++++++++++++++++++++++---------
4 files changed, 137 insertions(+), 26 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e6348fd0a785..a2cb380c43c7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -197,7 +197,8 @@ struct btf_field_kptr {
struct btf *btf;
struct module *module;
/* dtor used if btf_is_kernel(btf), otherwise the type is
- * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
+ * program-allocated, dtor is NULL, and __bpf_obj_drop_impl
+ * or __bpf_percpu_obj_drop_impl is used
*/
btf_dtor_kfunc_t dtor;
u32 btf_id;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index eb91cae0612a..dd14cb7da4af 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1900,6 +1900,29 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
return p;
}
+__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
+{
+ struct btf_struct_meta *meta = meta__ign;
+ const struct btf_record *rec;
+ u64 size = local_type_id__k;
+ void __percpu *pptr;
+ void *p;
+ int cpu;
+
+ p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
+ if (!p)
+ return NULL;
+ if (meta) {
+ pptr = *((void __percpu **)p);
+ rec = meta->record;
+ for_each_possible_cpu(cpu) {
+ bpf_obj_init(rec, per_cpu_ptr(pptr, cpu));
+ }
+ }
+
+ return p;
+}
+
/* Must be called under migrate_disable(), as required by bpf_mem_free */
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
{
@@ -1924,6 +1947,30 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
}
+/* Must be called under migrate_disable(), as required by bpf_mem_free_rcu */
+void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec)
+{
+ void __percpu *pptr;
+ int cpu;
+
+ if (rec) {
+ pptr = *((void __percpu **)p);
+ for_each_possible_cpu(cpu) {
+ bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
+ }
+ }
+
+ bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
+}
+
+__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
+{
+ struct btf_struct_meta *meta = meta__ign;
+ void *p = p__alloc;
+
+ __bpf_percpu_obj_drop_impl(p, meta ? meta->record : NULL);
+}
+
__bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
{
struct btf_struct_meta *meta = meta__ign;
@@ -2436,7 +2483,9 @@ BTF_SET8_START(generic_btf_ids)
BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
#endif
BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_list_push_front_impl)
BTF_ID_FLAGS(func, bpf_list_push_back_impl)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1c30b6ee84d4..9ceb6fd9a0e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -627,6 +627,7 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
}
extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
+extern void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec);
void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
@@ -660,13 +661,21 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
if (!btf_is_kernel(field->kptr.btf)) {
pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
field->kptr.btf_id);
- if (field->type != BPF_KPTR_PERCPU_REF)
+
+ if (field->type == BPF_KPTR_PERCPU_REF) {
+ migrate_disable();
+ __bpf_percpu_obj_drop_impl(xchgd_field, pointee_struct_meta ?
+ pointee_struct_meta->record :
+ NULL);
+ migrate_enable();
+ } else {
WARN_ON_ONCE(!pointee_struct_meta);
- migrate_disable();
- __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
- pointee_struct_meta->record :
- NULL);
- migrate_enable();
+ migrate_disable();
+ __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
+ pointee_struct_meta->record :
+ NULL);
+ migrate_enable();
+ }
} else {
field->kptr.dtor(xchgd_field);
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4ccca1f6c998..a985fbf18a11 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
/* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
* generally to pass info about user-defined local kptr types to later
* verification logic
- * bpf_obj_drop
+ * bpf_obj_drop/bpf_percpu_obj_drop
* Record the local kptr type to be drop'd
* bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
* Record the local kptr type to be refcount_incr'd and use
@@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
if (kptr_field->type == BPF_KPTR_UNREF)
perm_flags |= PTR_UNTRUSTED;
+ if (kptr_field->type == BPF_KPTR_PERCPU_REF)
+ perm_flags |= MEM_PERCPU | MEM_ALLOC;
+
if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
goto bad_type;
- if (!btf_is_kernel(reg->btf)) {
+ if (kptr_field->type != BPF_KPTR_PERCPU_REF && !btf_is_kernel(reg->btf)) {
verbose(env, "R%d must point to kernel BTF\n", regno);
return -EINVAL;
}
+ if (kptr_field->type == BPF_KPTR_PERCPU_REF && btf_is_kernel(reg->btf)) {
+ verbose(env, "R%d must point to prog BTF\n", regno);
+ return -EINVAL;
+ }
/* We need to verify reg->type and reg->btf, before accessing reg->btf */
reg_name = btf_type_name(reg->btf, reg->btf_id);
@@ -5084,7 +5091,17 @@ static bool rcu_safe_kptr(const struct btf_field *field)
{
const struct btf_field_kptr *kptr = &field->kptr;
- return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
+ return field->type == BPF_KPTR_PERCPU_REF ||
+ (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
+}
+
+static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
+{
+ if (!rcu_safe_kptr(kptr_field) || !in_rcu_cs(env))
+ return PTR_MAYBE_NULL | PTR_UNTRUSTED;
+ if (kptr_field->type != BPF_KPTR_PERCPU_REF)
+ return PTR_MAYBE_NULL | MEM_RCU;
+ return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
}
static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
@@ -5110,7 +5127,8 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
/* We only allow loading referenced kptr, since it will be marked as
* untrusted, similar to unreferenced kptr.
*/
- if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
+ if (class != BPF_LDX &&
+ (kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_PERCPU_REF)) {
verbose(env, "store to referenced kptr disallowed\n");
return -EACCES;
}
@@ -5121,10 +5139,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
* value from map as PTR_TO_BTF_ID, with the correct type.
*/
mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
- kptr_field->kptr.btf_id,
- rcu_safe_kptr(kptr_field) && in_rcu_cs(env) ?
- PTR_MAYBE_NULL | MEM_RCU :
- PTR_MAYBE_NULL | PTR_UNTRUSTED);
+ kptr_field->kptr.btf_id, btf_ld_kptr_type(env, kptr_field));
/* For mark_ptr_or_null_reg */
val_reg->id = ++env->id_gen;
} else if (class == BPF_STX) {
@@ -5178,6 +5193,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
switch (field->type) {
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
+ case BPF_KPTR_PERCPU_REF:
if (src != ACCESS_DIRECT) {
verbose(env, "kptr cannot be accessed indirectly by helper\n");
return -EACCES;
@@ -7316,7 +7332,7 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
return -EACCES;
}
- if (kptr_field->type != BPF_KPTR_REF) {
+ if (kptr_field->type != BPF_KPTR_REF && kptr_field->type != BPF_KPTR_PERCPU_REF) {
verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
return -EACCES;
}
@@ -7827,8 +7843,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
if (base_type(arg_type) == ARG_PTR_TO_MEM)
type &= ~DYNPTR_TYPE_FLAG_MASK;
- if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type))
+ if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type)) {
type &= ~MEM_ALLOC;
+ type &= ~MEM_PERCPU;
+ }
for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
expected = compatible->types[i];
@@ -7918,6 +7936,14 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
}
/* Handled by helper specific checks */
break;
+ case PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC:
+ if (meta->func_id != BPF_FUNC_kptr_xchg) {
+ verbose(env, "verifier internal error: unimplemented handling of MEM_PERCPU | MEM_ALLOC\n");
+ return -EFAULT;
+ }
+ if (map_kptr_match_type(env, meta->kptr_field, reg, regno))
+ return -EACCES;
+ break;
case PTR_TO_BTF_ID | MEM_PERCPU:
case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
/* Handled by helper specific checks */
@@ -9885,8 +9911,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (func_id == BPF_FUNC_kptr_xchg) {
ret_btf = meta.kptr_field->kptr.btf;
ret_btf_id = meta.kptr_field->kptr.btf_id;
- if (!btf_is_kernel(ret_btf))
+ if (!btf_is_kernel(ret_btf)) {
regs[BPF_REG_0].type |= MEM_ALLOC;
+ if (meta.kptr_field->type == BPF_KPTR_PERCPU_REF)
+ regs[BPF_REG_0].type |= MEM_PERCPU;
+ }
} else {
if (fn->ret_btf_id == BPF_PTR_POISON) {
verbose(env, "verifier internal error:");
@@ -10271,6 +10300,8 @@ enum special_kfunc_type {
KF_bpf_dynptr_slice,
KF_bpf_dynptr_slice_rdwr,
KF_bpf_dynptr_clone,
+ KF_bpf_percpu_obj_new_impl,
+ KF_bpf_percpu_obj_drop_impl,
};
BTF_SET_START(special_kfunc_set)
@@ -10291,6 +10322,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
BTF_ID(func, bpf_dynptr_slice)
BTF_ID(func, bpf_dynptr_slice_rdwr)
BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)
BTF_SET_END(special_kfunc_set)
BTF_ID_LIST(special_kfunc_list)
@@ -10313,6 +10346,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
BTF_ID(func, bpf_dynptr_slice)
BTF_ID(func, bpf_dynptr_slice_rdwr)
BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)
static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -11003,7 +11038,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
}
break;
case KF_ARG_PTR_TO_ALLOC_BTF_ID:
- if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+ if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) &&
+ reg->type != (PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC)) {
verbose(env, "arg#%d expected pointer to allocated object\n", i);
return -EINVAL;
}
@@ -11012,7 +11048,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
return -EINVAL;
}
if (meta->btf == btf_vmlinux &&
- meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+ (meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+ meta->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl])) {
meta->arg_btf = reg->btf;
meta->arg_btf_id = reg->btf_id;
}
@@ -11410,6 +11447,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
/* Only exception is bpf_obj_new_impl */
if (meta.btf != btf_vmlinux ||
(meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl] &&
+ meta.func_id != special_kfunc_list[KF_bpf_percpu_obj_new_impl] &&
meta.func_id != special_kfunc_list[KF_bpf_refcount_acquire_impl])) {
verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
return -EINVAL;
@@ -11423,11 +11461,15 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
- if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+ if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+ meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
struct btf *ret_btf;
u32 ret_btf_id;
- if (unlikely(!bpf_global_ma_set))
+ if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
+ return -ENOMEM;
+
+ if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
return -ENOMEM;
if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
@@ -11440,13 +11482,18 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
/* This may be NULL due to user not supplying a BTF */
if (!ret_btf) {
- verbose(env, "bpf_obj_new requires prog BTF\n");
+ verbose(env, "bpf_obj_new/bpf_percpu_obj_new requires prog BTF\n");
return -EINVAL;
}
ret_t = btf_type_by_id(ret_btf, ret_btf_id);
if (!ret_t || !__btf_type_is_struct(ret_t)) {
- verbose(env, "bpf_obj_new type ID argument must be of a struct\n");
+ verbose(env, "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct\n");
+ return -EINVAL;
+ }
+ if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] &&
+ !__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
+ verbose(env, "bpf_percpu_obj_new type ID argument must be of a struct of scalars\n");
return -EINVAL;
}
@@ -11454,6 +11501,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
regs[BPF_REG_0].btf = ret_btf;
regs[BPF_REG_0].btf_id = ret_btf_id;
+ if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl])
+ regs[BPF_REG_0].type |= MEM_PERCPU;
insn_aux->obj_new_size = ret_t->size;
insn_aux->kptr_struct_meta =
@@ -11594,7 +11643,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
regs[BPF_REG_0].id = ++env->id_gen;
} else if (btf_type_is_void(t)) {
if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
- if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+ if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+ meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) {
insn_aux->kptr_struct_meta =
btf_find_struct_meta(meta.arg_btf,
meta.arg_btf_id);
@@ -18266,7 +18316,8 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
insn->imm = BPF_CALL_IMM(desc->addr);
if (insn->off)
return 0;
- if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+ if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+ desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size;
@@ -18277,6 +18328,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
insn_buf[3] = *insn;
*cnt = 4;
} else if (desc->func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+ desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl] ||
desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
--
2.34.1
* [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (2 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-19 1:01 ` Alexei Starovoitov
2023-08-14 17:28 ` [PATCH bpf-next 05/15] selftests/bpf: Update error message in negative linked_list test Yonghong Song
` (10 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
for allocated percpu objects. For an allocated percpu obj,
the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'.
The return type for these two re-purposed helpers is
'PTR_TO_BTF_ID | MEM_RCU | MEM_ALLOC'.
The MEM_ALLOC flag allows the per-cpu data to be read and written.
Since the memory allocator bpf_mem_alloc() returns
a ptr to a percpu ptr for percpu data, the first argument
of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched
with a dereference before being passed to the helper func.
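Sketch of what this enables, assuming 'struct val_t' is a local struct
with a scalar field 'a':

  /* inside bpf_rcu_read_lock()/bpf_rcu_read_unlock() */
  struct val_t __percpu *p = map_val->percpu_data;

  if (p) {
    /* ret type: PTR_TO_BTF_ID | MEM_RCU | MEM_ALLOC */
    struct val_t *v = bpf_this_cpu_ptr(p);
    v->a = 1;
  }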
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c | 51 ++++++++++++++++++++++++++++++------
2 files changed, 44 insertions(+), 8 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f70f9ac884d2..e23480db37ec 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -480,6 +480,7 @@ struct bpf_insn_aux_data {
bool zext_dst; /* this insn zero extends dst reg */
bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
+ bool percpu_ptr_prog_alloc; /* {this,per}_cpu_ptr() with prog alloc */
u8 alu_state; /* used in combination with alu_limit */
/* below fields are initialized once */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a985fbf18a11..6fc200cb68b6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6221,7 +6221,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
}
if (type_is_alloc(reg->type) && !type_is_non_owning_ref(reg->type) &&
- !reg->ref_obj_id) {
+ !(reg->type & MEM_RCU) && !reg->ref_obj_id) {
verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
return -EFAULT;
}
@@ -7765,6 +7765,7 @@ static const struct bpf_reg_types btf_ptr_types = {
static const struct bpf_reg_types percpu_btf_ptr_types = {
.types = {
PTR_TO_BTF_ID | MEM_PERCPU,
+ PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU,
PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
}
};
@@ -7945,6 +7946,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
return -EACCES;
break;
case PTR_TO_BTF_ID | MEM_PERCPU:
+ case PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU:
case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
/* Handled by helper specific checks */
break;
@@ -8287,6 +8289,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
verbose(env, "Helper has invalid btf_id in R%d\n", regno);
return -EACCES;
}
+ if (reg->type & MEM_RCU) {
+ const struct btf_type *type = btf_type_by_id(reg->btf, reg->btf_id);
+
+ if (!type || !btf_type_is_struct(type)) {
+ verbose(env, "Helper has invalid btf/btf_id in R%d\n", regno);
+ return -EFAULT;
+ }
+ env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc = true;
+ }
+
meta->ret_btf = reg->btf;
meta->ret_btf_id = reg->btf_id;
break;
@@ -9888,14 +9900,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
regs[BPF_REG_0].mem_size = tsize;
} else {
- /* MEM_RDONLY may be carried from ret_flag, but it
- * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
- * it will confuse the check of PTR_TO_BTF_ID in
- * check_mem_access().
- */
- ret_flag &= ~MEM_RDONLY;
+ if (env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc) {
+ regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU;
+ } else {
+ /* MEM_RDONLY may be carried from ret_flag, but it
+ * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
+ * it will confuse the check of PTR_TO_BTF_ID in
+ * check_mem_access().
+ */
+ ret_flag &= ~MEM_RDONLY;
+ regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
+ }
- regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
regs[BPF_REG_0].btf = meta.ret_btf;
regs[BPF_REG_0].btf_id = meta.ret_btf_id;
}
@@ -18646,6 +18662,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
goto patch_call_imm;
}
+ /* bpf_per_cpu_ptr() and bpf_this_cpu_ptr() */
+ if (env->insn_aux_data[i + delta].percpu_ptr_prog_alloc) {
+ /* patch with 'r1 = *(u64 *)(r1 + 0)' since for percpu data,
+ * bpf_mem_alloc() returns a ptr to the percpu data ptr.
+ */
+ insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
+ insn_buf[1] = *insn;
+ cnt = 2;
+
+ new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+ if (!new_prog)
+ return -ENOMEM;
+
+ delta += cnt - 1;
+ env->prog = prog = new_prog;
+ insn = new_prog->insnsi + i + delta;
+ goto patch_call_imm;
+ }
+
/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
* and other inlining handlers are currently limited to 64 bit
* only.
--
2.34.1
* [PATCH bpf-next 05/15] selftests/bpf: Update error message in negative linked_list test
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (3 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 06/15] libbpf: Add __percpu macro definition Yonghong Song
` (9 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Some error messages changed due to the addition of
percpu kptr support. Update the linked_list test to match
the changed error messages.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
tools/testing/selftests/bpf/prog_tests/linked_list.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 18cf7b17463d..db3bf6bbe01a 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -65,8 +65,8 @@ static struct {
{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
- { "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
- { "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
+ { "obj_new_no_composite", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
+ { "obj_new_no_struct", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
{ "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" },
{ "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" },
{ "obj_new_acq", "Unreleased reference id=" },
--
2.34.1
* [PATCH bpf-next 06/15] libbpf: Add __percpu macro definition
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (4 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 05/15] selftests/bpf: Update error message in negative linked_list test Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 07/15] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
` (8 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Add __percpu macro definition in bpf_helpers.h.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
tools/lib/bpf/bpf_helpers.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index bbab9ad9dc5a..30f050c6b106 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -181,6 +181,7 @@ enum libbpf_tristate {
#define __ksym __attribute__((section(".ksyms")))
#define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
#define __kptr __attribute__((btf_type_tag("kptr")))
+#define __percpu __attribute__((btf_type_tag("percpu")))
#define bpf_ksym_exists(sym) ({ \
_Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak"); \
--
2.34.1
* [PATCH bpf-next 07/15] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (5 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 06/15] libbpf: Add __percpu macro definition Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 08/15] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
` (7 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
The new macros bpf_percpu_obj_{new,drop}() are very similar to bpf_obj_{new,drop}()
as they both take a type as the argument.
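For example, a program could allocate and later free a percpu object
of a (hypothetical) local type 'struct val_t' like below:

  struct val_t __percpu *p;

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
    return 0;
  /* ... publish p into a map value with bpf_kptr_xchg(), or ... */
  bpf_percpu_obj_drop(p);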
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../testing/selftests/bpf/bpf_experimental.h | 31 +++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 209811b1993a..9c5fe11520ae 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -131,4 +131,35 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod
*/
extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
+/* Description
+ * Allocates a percpu object of the type represented by 'local_type_id' in
+ * program BTF. User may use the bpf_core_type_id_local macro to pass the
+ * type ID of a struct in program BTF.
+ *
+ * The 'local_type_id' parameter must be a known constant.
+ * The 'meta' parameter is rewritten by the verifier, no need for BPF
+ * program to set it.
+ * Returns
+ * A pointer to a percpu object of the type corresponding to the passed in
+ * 'local_type_id', or NULL on failure.
+ */
+extern void *bpf_percpu_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_new_impl */
+#define bpf_percpu_obj_new(type) ((type __percpu *)bpf_percpu_obj_new_impl(bpf_core_type_id_local(type), NULL))
+
+/* Description
+ * Free an allocated percpu object. All fields of the object that require
+ * destruction will be destructed before the storage is freed.
+ *
+ * The 'meta' parameter is rewritten by the verifier, no need for BPF
+ * program to set it.
+ * Returns
+ * Void.
+ */
+extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_drop_impl */
+#define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
+
#endif
--
2.34.1
* [PATCH bpf-next 08/15] selftests/bpf: Add tests for array map with local percpu kptr
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (6 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 07/15] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-14 17:28 ` [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
` (6 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Add non-sleepable and sleepable tests with percpu kptr. For the
non-sleepable test, four programs are executed in the order of:
  1. allocate percpu data.
  2. assign values to percpu data.
  3. retrieve percpu data.
  4. de-allocate percpu data.
The sleepable prog tries to exercise all of the above 4 steps in a
single prog. Also, for the sleepable prog, rcu_read_lock is needed
to protect direct percpu ptr access (from the map value) and the
following bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 78 ++++++++
.../selftests/bpf/progs/percpu_alloc_array.c | 187 ++++++++++++++++++
2 files changed, 265 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_array.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
new file mode 100644
index 000000000000..0fb536822f14
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "percpu_alloc_array.skel.h"
+
+static void test_array(void)
+{
+ struct percpu_alloc_array *skel;
+ int err, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = percpu_alloc_array__open();
+ if (!ASSERT_OK_PTR(skel, "percpu_alloc_array__open"))
+ return;
+
+ bpf_program__set_autoload(skel->progs.test_array_map_1, true);
+ bpf_program__set_autoload(skel->progs.test_array_map_2, true);
+ bpf_program__set_autoload(skel->progs.test_array_map_3, true);
+ bpf_program__set_autoload(skel->progs.test_array_map_4, true);
+
+ skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+ err = percpu_alloc_array__load(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+ goto out;
+
+ err = percpu_alloc_array__attach(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+ goto out;
+
+ prog_fd = bpf_program__fd(skel->progs.test_array_map_1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run array_map 1-4");
+ ASSERT_EQ(topts.retval, 0, "test_run array_map 1-4");
+ ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+ ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+ percpu_alloc_array__destroy(skel);
+}
+
+static void test_array_sleepable(void)
+{
+ struct percpu_alloc_array *skel;
+ int err, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = percpu_alloc_array__open();
+ if (!ASSERT_OK_PTR(skel, "percpu_alloc__open"))
+ return;
+
+ bpf_program__set_autoload(skel->progs.test_array_map_10, true);
+
+ skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+ err = percpu_alloc_array__load(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+ goto out;
+
+ err = percpu_alloc_array__attach(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+ goto out;
+
+ prog_fd = bpf_program__fd(skel->progs.test_array_map_10);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run array_map_10");
+ ASSERT_EQ(topts.retval, 0, "test_run array_map_10");
+ ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+ ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+ percpu_alloc_array__destroy(skel);
+}
+
+void test_percpu_alloc(void)
+{
+ if (test__start_subtest("array"))
+ test_array();
+ if (test__start_subtest("array_sleepable"))
+ test_array_sleepable();
+}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
new file mode 100644
index 000000000000..24903ea565a0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -0,0 +1,187 @@
+#include "bpf_experimental.h"
+
+struct val_t {
+ long b, c, d;
+};
+
+struct elem {
+ long sum;
+ struct val_t __percpu *pc;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 1);
+ __type(key, int);
+ __type(value, struct elem);
+} array SEC(".maps");
+
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("?fentry/bpf_fentry_test1")
+int BPF_PROG(test_array_map_1)
+{
+ struct val_t __percpu *p;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ return 0;
+
+ p = bpf_kptr_xchg(&e->pc, p);
+ if (p)
+ bpf_percpu_obj_drop(p);
+
+ return 0;
+}
+
+/* Update percpu data */
+SEC("?fentry/bpf_fentry_test2")
+int BPF_PROG(test_array_map_2)
+{
+ struct val_t __percpu *p;
+ struct val_t *v;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ v = bpf_per_cpu_ptr(p, 0);
+ if (!v)
+ return 0;
+ v->c = 1;
+ v->d = 2;
+
+ return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data */
+SEC("?fentry/bpf_fentry_test3")
+int BPF_PROG(test_array_map_3)
+{
+ struct val_t __percpu *p;
+ int i, index = 0;
+ struct val_t *v;
+ struct elem *e;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ bpf_for(i, 0, nr_cpus) {
+ v = bpf_per_cpu_ptr(p, i);
+ if (v) {
+ if (i == 0)
+ cpu0_field_d = v->d;
+ sum_field_c += v->c;
+ }
+ }
+
+ return 0;
+}
+
+/* Explicitly free allocated percpu data */
+SEC("?fentry/bpf_fentry_test4")
+int BPF_PROG(test_array_map_4)
+{
+ struct val_t __percpu *p;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ /* delete */
+ p = bpf_kptr_xchg(&e->pc, NULL);
+ if (p) {
+ bpf_percpu_obj_drop(p);
+ }
+
+ return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+int BPF_PROG(test_array_map_10)
+{
+ struct val_t __percpu *p, *p1;
+ int i, index = 0;
+ struct val_t *v;
+ struct elem *e;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ bpf_rcu_read_lock();
+ p = e->pc;
+ if (!p) {
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ goto out;
+
+ p1 = bpf_kptr_xchg(&e->pc, p);
+ if (p1) {
+ /* race condition */
+ bpf_percpu_obj_drop(p1);
+ }
+
+ p = e->pc;
+ if (!p)
+ goto out;
+ }
+
+ v = bpf_this_cpu_ptr(p);
+ v->c = 3;
+ v = bpf_this_cpu_ptr(p);
+ v->c = 0;
+
+ v = bpf_per_cpu_ptr(p, 0);
+ if (!v)
+ goto out;
+ v->c = 1;
+ v->d = 2;
+
+ /* delete */
+ p1 = bpf_kptr_xchg(&e->pc, NULL);
+ if (!p1)
+ goto out;
+
+ bpf_for(i, 0, nr_cpus) {
+ v = bpf_per_cpu_ptr(p, i);
+ if (v) {
+ if (i == 0)
+ cpu0_field_d = v->d;
+ sum_field_c += v->c;
+ }
+ }
+
+ /* finally release p */
+ bpf_percpu_obj_drop(p1);
+out:
+ bpf_rcu_read_unlock();
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
* [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (7 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 08/15] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
@ 2023-08-14 17:28 ` Yonghong Song
2023-08-19 1:44 ` Kumar Kartikeya Dwivedi
2023-08-14 17:29 ` [PATCH bpf-next 10/15] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
` (5 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:28 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
In the previous selftests/bpf patch, we have

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
    goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1) {
    /* race condition */
    bpf_percpu_obj_drop(p1);
  }

  p = e->pc;
  if (!p)
    goto out;
After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
This is because the second argument of bpf_kptr_xchg() is marked
OBJ_RELEASE, and the register will be marked as invalid after the call.
So after bpf_kptr_xchg(), 'p' is an unknown scalar,
and the bpf program needs to re-read it from the map value.
This patch checks whether 'p' has type MEM_ALLOC and MEM_PERCPU,
and whether 'p' is RCU protected. If this is the case, 'p' can be marked
as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
an owning reference. Such a change makes the re-read
from the map value unnecessary.
Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
a different value from 'p' if, immediately before 'p = e->pc',
another cpu does another bpf_kptr_xchg() and swaps another value
into 'e->pc'. If this is the case, then 'p = e->pc' may
get either 'p' or the other value, so a race condition already exists.
Removing the direct re-read therefore seems fine too.
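With this change, the earlier example can be simplified as sketched
below; 'p' stays usable (as MEM_RCU) after the xchg while still inside
the rcu read-side critical section:

  struct val_t __percpu *p, *p1;
  struct val_t *v;

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
    goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1)
    bpf_percpu_obj_drop(p1);

  /* no re-read of e->pc needed: 'p' is now MEM_RCU | MEM_PERCPU */
  v = bpf_this_cpu_ptr(p);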
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
kernel/bpf/verifier.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6fc200cb68b6..6fa458e13bfc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
return err;
bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
- if (reg->ref_obj_id == ref_obj_id)
- mark_reg_invalid(env, reg);
+ if (reg->ref_obj_id == ref_obj_id) {
+ if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
+ reg->ref_obj_id = 0;
+ reg->type &= ~MEM_ALLOC;
+ reg->type |= MEM_RCU;
+ } else {
+ mark_reg_invalid(env, reg);
+ }
+ }
}));
return 0;
--
2.34.1
* [PATCH bpf-next 10/15] selftests/bpf: Remove unnecessary direct read of local percpu kptr
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (8 preceding siblings ...)
2023-08-14 17:28 ` [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 11/15] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
` (4 subsequent siblings)
From: Yonghong Song @ 2023-08-14 17:29 UTC
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
For the second argument of bpf_kptr_xchg(), if the reg type contains
MEM_ALLOC and MEM_PERCPU, which means a percpu allocation,
then after bpf_kptr_xchg() the argument is marked as MEM_RCU and MEM_PERCPU
if in an rcu critical section. This way, re-reading from the map value
is not needed. Remove the re-read from the percpu_alloc_array.c selftest.
Without the previous kernel change, the test fails as below:
0: R1=ctx(off=0,imm=0) R10=fp0
; int BPF_PROG(test_array_map_10, int a)
0: (b4) w1 = 0 ; R1_w=0
; int i, index = 0;
1: (63) *(u32 *)(r10 -4) = r1 ; R1_w=0 R10=fp0 fp-8=0000????
2: (bf) r2 = r10 ; R2_w=fp0 R10=fp0
;
3: (07) r2 += -4 ; R2_w=fp-4
; e = bpf_map_lookup_elem(&array, &index);
4: (18) r1 = 0xffff88810e771800 ; R1_w=map_ptr(off=0,ks=4,vs=16,imm=0)
6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
7: (bf) r6 = r0 ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
; if (!e)
8: (15) if r6 == 0x0 goto pc+81 ; R6_w=map_value(off=0,ks=4,vs=16,imm=0)
; bpf_rcu_read_lock();
9: (85) call bpf_rcu_read_lock#87892 ;
; p = e->pc;
10: (bf) r7 = r6 ; R6=map_value(off=0,ks=4,vs=16,imm=0) R7_w=map_value(off=0,ks=4,vs=16,imm=0)
11: (07) r7 += 8 ; R7_w=map_value(off=8,ks=4,vs=16,imm=0)
12: (79) r6 = *(u64 *)(r6 +8) ; R6_w=percpu_rcu_ptr_or_null_val_t(id=2,off=0,imm=0)
; if (!p) {
13: (55) if r6 != 0x0 goto pc+13 ; R6_w=0
; p = bpf_percpu_obj_new(struct val_t);
14: (18) r1 = 0x12 ; R1_w=18
16: (b7) r2 = 0 ; R2_w=0
17: (85) call bpf_percpu_obj_new_impl#87883 ; R0_w=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
18: (bf) r6 = r0 ; R0=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
; if (!p)
19: (15) if r6 == 0x0 goto pc+69 ; R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
; p1 = bpf_kptr_xchg(&e->pc, p);
20: (bf) r1 = r7 ; R1_w=map_value(off=8,ks=4,vs=16,imm=0) R7=map_value(off=8,ks=4,vs=16,imm=0) refs=4
21: (bf) r2 = r6 ; R2_w=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
22: (85) call bpf_kptr_xchg#194 ; R0_w=percpu_ptr_or_null_val_t(id=6,ref_obj_id=6,off=0,imm=0) refs=6
; if (p1) {
23: (15) if r0 == 0x0 goto pc+3 ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
; bpf_percpu_obj_drop(p1);
24: (bf) r1 = r0 ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) R1_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
25: (b7) r2 = 0 ; R2_w=0 refs=6
26: (85) call bpf_percpu_obj_drop_impl#87882 ;
; v = bpf_this_cpu_ptr(p);
27: (bf) r1 = r6 ; R1_w=scalar(id=7) R6=scalar(id=7)
28: (85) call bpf_this_cpu_ptr#154
R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_
R1, which gets its value from R6, is a scalar. But before insn 22, R6 is
R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0)
Without the previous patch, its type is changed to a scalar at insn 22.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
tools/testing/selftests/bpf/progs/percpu_alloc_array.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
index 24903ea565a0..cde23eb7468e 100644
--- a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -146,10 +146,6 @@ int BPF_PROG(test_array_map_10)
/* race condition */
bpf_percpu_obj_drop(p1);
}
-
- p = e->pc;
- if (!p)
- goto out;
}
v = bpf_this_cpu_ptr(p);
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH bpf-next 11/15] selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (9 preceding siblings ...)
2023-08-14 17:29 ` [PATCH bpf-next 10/15] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 12/15] bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data structure Yonghong Song
` (3 subsequent siblings)
14 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-14 17:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Add a non-sleepable cgrp_local_storage test with a percpu kptr. The
test allocates percpu data, assigns values to the percpu data and
reads the values back. The percpu data is deallocated when the map
is freed.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 40 +++++++
.../progs/percpu_alloc_cgrp_local_storage.c | 105 ++++++++++++++++++
2 files changed, 145 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 0fb536822f14..41bf784a4bb3 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
#include "percpu_alloc_array.skel.h"
+#include "percpu_alloc_cgrp_local_storage.skel.h"
static void test_array(void)
{
@@ -69,10 +70,49 @@ static void test_array_sleepable(void)
percpu_alloc_array__destroy(skel);
}
+static void test_cgrp_local_storage(void)
+{
+ struct percpu_alloc_cgrp_local_storage *skel;
+ int err, cgroup_fd, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ cgroup_fd = test__join_cgroup("/percpu_alloc");
+ if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /percpu_alloc"))
+ return;
+
+ skel = percpu_alloc_cgrp_local_storage__open();
+ if (!ASSERT_OK_PTR(skel, "percpu_alloc_cgrp_local_storage__open"))
+ goto close_fd;
+
+ skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+ err = percpu_alloc_cgrp_local_storage__load(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__load"))
+ goto destroy_skel;
+
+ err = percpu_alloc_cgrp_local_storage__attach(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__attach"))
+ goto destroy_skel;
+
+ prog_fd = bpf_program__fd(skel->progs.test_cgrp_local_storage_1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run cgrp_local_storage 1-3");
+ ASSERT_EQ(topts.retval, 0, "test_run cgrp_local_storage 1-3");
+ ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+ ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+
+destroy_skel:
+ percpu_alloc_cgrp_local_storage__destroy(skel);
+close_fd:
+ close(cgroup_fd);
+}
+
void test_percpu_alloc(void)
{
if (test__start_subtest("array"))
test_array();
if (test__start_subtest("array_sleepable"))
test_array_sleepable();
+ if (test__start_subtest("cgrp_local_storage"))
+ test_cgrp_local_storage();
}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
new file mode 100644
index 000000000000..b79285ac770e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
@@ -0,0 +1,105 @@
+#include "bpf_experimental.h"
+
+struct val_t {
+ long b, c, d;
+};
+
+struct elem {
+ long sum;
+ struct val_t __percpu *pc;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct elem);
+} cgrp SEC(".maps");
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test_cgrp_local_storage_1)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct elem *e;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!e)
+ return 0;
+
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ return 0;
+
+ p = bpf_kptr_xchg(&e->pc, p);
+ if (p)
+ bpf_percpu_obj_drop(p);
+
+ return 0;
+}
+
+/* Percpu data collection */
+SEC("fentry/bpf_fentry_test2")
+int BPF_PROG(test_cgrp_local_storage_2)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct val_t *v;
+ struct elem *e;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ v = bpf_per_cpu_ptr(p, 0);
+ if (!v)
+ return 0;
+ v->c = 1;
+ v->d = 2;
+ return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data collection */
+SEC("fentry/bpf_fentry_test3")
+int BPF_PROG(test_cgrp_local_storage_3)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct val_t *v;
+ struct elem *e;
+ int i;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ bpf_for(i, 0, nr_cpus) {
+ v = bpf_per_cpu_ptr(p, i);
+ if (v) {
+ if (i == 0)
+ cpu0_field_d = v->d;
+ sum_field_c += v->c;
+ }
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH bpf-next 12/15] bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data structure
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (10 preceding siblings ...)
2023-08-14 17:29 ` [PATCH bpf-next 11/15] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 13/15] selftests/bpf: Add tests for percpu struct with bpf list head Yonghong Song
` (2 subsequent siblings)
14 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-14 17:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Currently, bpf_percpu_obj_new() only allows a scalar struct, i.e., a
struct containing only scalar members or structs of scalar members,
up to three nested struct levels. For example, below is a linked list
in the percpu object:
struct val_t {
long b, c, d;
struct bpf_list_head head __contains(foo, node);
struct bpf_spin_lock lock;
};
struct map_val_t {
...
struct val_t __percpu *pc;
...
};
This patch adds verifier support for the above percpu struct with
bpf_spin_lock and bpf_list_head. The change is mostly to add MEM_RCU
to various type-checking places, since the eventual percpu object is
marked as MEM_RCU and is used as the input to various helpers/kfuncs
like bpf_spin_lock(), bpf_spin_unlock(), bpf_list_push_back(), etc.
Another possible use case is bpf_rb_root, where an rb tree sits in a
percpu struct. That support can be added similarly in the future if
needed.
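For illustration, a hedged usage sketch (mirroring the selftest added
later in this series; 'p' is the percpu kptr read from the map value
inside an rcu critical section):

	v = bpf_per_cpu_ptr(p, 0);
	if (!v)
		return 0;
	f = bpf_obj_new(struct foo);
	if (!f)
		return 0;
	bpf_spin_lock(&v->lock);
	bpf_list_push_back(&v->head, &f->node);
	bpf_spin_unlock(&v->lock);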
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
kernel/bpf/verifier.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6fa458e13bfc..4c3045c8d25f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7848,6 +7848,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
type &= ~MEM_ALLOC;
type &= ~MEM_PERCPU;
}
+ if (meta->func_id == BPF_FUNC_spin_lock || meta->func_id == BPF_FUNC_spin_unlock)
+ type &= ~MEM_RCU;
for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
expected = compatible->types[i];
@@ -7929,6 +7931,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
}
break;
}
+ case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU:
case PTR_TO_BTF_ID | MEM_ALLOC:
if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock &&
meta->func_id != BPF_FUNC_kptr_xchg) {
@@ -8040,6 +8043,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
case PTR_TO_BTF_ID | MEM_ALLOC:
case PTR_TO_BTF_ID | PTR_TRUSTED:
case PTR_TO_BTF_ID | MEM_RCU:
+ case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU:
case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
/* When referenced PTR_TO_BTF_ID is passed to release function,
* its fixed offset must be 0. In the other cases, fixed offset
@@ -10604,8 +10608,8 @@ static int ref_convert_owning_non_owning(struct bpf_verifier_env *env, u32 ref_o
* active_lock.ptr = Register's type specific pointer
* active_lock.id = A unique ID for each register pointer value
*
- * Currently, PTR_TO_MAP_VALUE and PTR_TO_BTF_ID | MEM_ALLOC are the two
- * supported register types.
+ * Currently, PTR_TO_MAP_VALUE, PTR_TO_BTF_ID | MEM_ALLOC and
+ * PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU are the three supported register types.
*
* The active_lock.ptr in case of map values is the reg->map_ptr, and in case of
* allocated objects is the reg->btf pointer.
@@ -10640,6 +10644,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
ptr = reg->map_ptr;
break;
case PTR_TO_BTF_ID | MEM_ALLOC:
+ case PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU:
ptr = reg->btf;
break;
default:
@@ -11140,7 +11145,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
break;
case KF_ARG_PTR_TO_LIST_HEAD:
if (reg->type != PTR_TO_MAP_VALUE &&
- reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+ reg->type != (PTR_TO_BTF_ID | MEM_ALLOC) &&
+ reg->type != (PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU)) {
verbose(env, "arg#%d expected pointer to map value or allocated object\n", i);
return -EINVAL;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH bpf-next 13/15] selftests/bpf: Add tests for percpu struct with bpf list head
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (11 preceding siblings ...)
2023-08-14 17:29 ` [PATCH bpf-next 12/15] bpf: Allow bpf_spin_lock and bpf_list_head in allocated percpu data structure Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 14/15] selftests/bpf: Add some negative tests Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
14 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-14 17:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
This is to test nested special fields with the following pattern:
struct val_t {
long b, c, d;
struct bpf_list_head head __contains(foo, node);
struct bpf_spin_lock lock;
};
struct map_val_t {
...
struct val_t __percpu *pc;
...
};
That is, the percpu data struct can hold a linked list.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 40 ++++++
.../percpu_alloc_nested_special_fields.c | 121 ++++++++++++++++++
2 files changed, 161 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 41bf784a4bb3..ee9dd495db7b 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -2,6 +2,7 @@
#include <test_progs.h>
#include "percpu_alloc_array.skel.h"
#include "percpu_alloc_cgrp_local_storage.skel.h"
+#include "percpu_alloc_nested_special_fields.skel.h"
static void test_array(void)
{
@@ -107,6 +108,43 @@ static void test_cgrp_local_storage(void)
close(cgroup_fd);
}
+static void test_nested_special_fields(void)
+{
+ struct percpu_alloc_nested_special_fields *skel;
+ int err, cgroup_fd, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ cgroup_fd = test__join_cgroup("/percpu_alloc");
+ if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /percpu_alloc"))
+ return;
+
+ skel = percpu_alloc_nested_special_fields__open();
+ if (!ASSERT_OK_PTR(skel, "percpu_alloc_nested_special_fields__open"))
+ goto close_fd;
+
+ skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+ err = percpu_alloc_nested_special_fields__load(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_nested_special_fields__load"))
+ goto destroy_skel;
+
+ err = percpu_alloc_nested_special_fields__attach(skel);
+ if (!ASSERT_OK(err, "percpu_alloc_nested_special_fields__attach"))
+ goto destroy_skel;
+
+ prog_fd = bpf_program__fd(skel->progs.test_cgrp_local_storage_1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run nested_special_fields 1-3");
+ ASSERT_EQ(topts.retval, 0, "test_run nested_special_fields 1-3");
+ ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+ ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+
+destroy_skel:
+ percpu_alloc_nested_special_fields__destroy(skel);
+close_fd:
+ close(cgroup_fd);
+}
+
void test_percpu_alloc(void)
{
if (test__start_subtest("array"))
@@ -115,4 +153,6 @@ void test_percpu_alloc(void)
test_array_sleepable();
if (test__start_subtest("cgrp_local_storage"))
test_cgrp_local_storage();
+ if (test__start_subtest("nested_special_fields"))
+ test_nested_special_fields();
}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c b/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c
new file mode 100644
index 000000000000..d1a8e9b6b472
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_nested_special_fields.c
@@ -0,0 +1,121 @@
+#include "bpf_experimental.h"
+
+struct foo {
+ long key, data;
+ struct bpf_list_node node;
+};
+
+struct val_t {
+ long b, c, d;
+ struct bpf_list_head head __contains(foo, node);
+ struct bpf_spin_lock lock;
+};
+
+struct elem {
+ long sum;
+ struct val_t __percpu *pc;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct elem);
+} cgrp SEC(".maps");
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test_cgrp_local_storage_1)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct elem *e;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!e)
+ return 0;
+
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ return 0;
+
+ p = bpf_kptr_xchg(&e->pc, p);
+ if (p)
+ bpf_percpu_obj_drop(p);
+
+ return 0;
+}
+
+/* Percpu data collection */
+SEC("fentry/bpf_fentry_test2")
+int BPF_PROG(test_cgrp_local_storage_2)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct val_t *v;
+ struct elem *e;
+ struct foo *f;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ v = bpf_per_cpu_ptr(p, 0);
+ if (!v)
+ return 0;
+ v->c = 1;
+ v->d = 2;
+
+ f = bpf_obj_new(struct foo);
+ if (!f)
+ return 0;
+ bpf_spin_lock(&v->lock);
+ bpf_list_push_back(&v->head, &f->node);
+ bpf_spin_unlock(&v->lock);
+
+ return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data collection */
+SEC("fentry/bpf_fentry_test3")
+int BPF_PROG(test_cgrp_local_storage_3)
+{
+ struct task_struct *task;
+ struct val_t __percpu *p;
+ struct val_t *v;
+ struct elem *e;
+ int i;
+
+ task = bpf_get_current_task_btf();
+ e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+ if (!e)
+ return 0;
+
+ p = e->pc;
+ if (!p)
+ return 0;
+
+ bpf_for(i, 0, nr_cpus) {
+ v = bpf_per_cpu_ptr(p, i);
+ if (v) {
+ if (i == 0)
+ cpu0_field_d = v->d;
+ sum_field_c += v->c;
+ }
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH bpf-next 14/15] selftests/bpf: Add some negative tests
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (12 preceding siblings ...)
2023-08-14 17:29 ` [PATCH bpf-next 13/15] selftests/bpf: Add tests for percpu struct with bpf list head Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-14 17:29 ` [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
14 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-14 17:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Add a few negative tests for common mistakes when using percpu kptrs.
Other similar error cases occurring with bpf_obj_new/drop() are not
included here.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
.../selftests/bpf/prog_tests/percpu_alloc.c | 7 ++
.../selftests/bpf/progs/percpu_alloc_fail.c | 100 ++++++++++++++++++
2 files changed, 107 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index ee9dd495db7b..17ba3159cea7 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -3,6 +3,7 @@
#include "percpu_alloc_array.skel.h"
#include "percpu_alloc_cgrp_local_storage.skel.h"
#include "percpu_alloc_nested_special_fields.skel.h"
+#include "percpu_alloc_fail.skel.h"
static void test_array(void)
{
@@ -145,6 +146,10 @@ static void test_nested_special_fields(void)
close(cgroup_fd);
}
+static void test_failure(void) {
+ RUN_TESTS(percpu_alloc_fail);
+}
+
void test_percpu_alloc(void)
{
if (test__start_subtest("array"))
@@ -155,4 +160,6 @@ void test_percpu_alloc(void)
test_cgrp_local_storage();
if (test__start_subtest("nested_special_fields"))
test_nested_special_fields();
+ if (test__start_subtest("failure_tests"))
+ test_failure();
}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
new file mode 100644
index 000000000000..47bdf6f857cb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
@@ -0,0 +1,100 @@
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+struct val_t {
+ long b, c, d;
+};
+
+struct val2_t {
+ long b;
+};
+
+struct elem {
+ long sum;
+ struct val_t __percpu *pc;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 1);
+ __type(key, int);
+ __type(value, struct elem);
+} array SEC(".maps");
+
+long ret;
+
+SEC("?fentry/bpf_fentry_test1")
+__failure __msg("store to referenced kptr disallowed")
+int BPF_PROG(test_array_map_1)
+{
+ struct val_t __percpu *p;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ return 0;
+
+ p = bpf_kptr_xchg(&e->pc, p);
+ if (p)
+ bpf_percpu_obj_drop(p);
+
+ e->pc = (struct val_t __percpu *)ret;
+ return 0;
+}
+
+SEC("?fentry/bpf_fentry_test1")
+__failure __msg("invalid kptr access, R2 type=percpu_ptr_val2_t expected=ptr_val_t")
+int BPF_PROG(test_array_map_2)
+{
+ struct val2_t __percpu *p2;
+ struct val_t __percpu *p;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p2 = bpf_percpu_obj_new(struct val2_t);
+ if (!p2)
+ return 0;
+
+ p = bpf_kptr_xchg(&e->pc, p2);
+ if (p)
+ bpf_percpu_obj_drop(p);
+
+ return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_")
+int BPF_PROG(test_array_map_3)
+{
+ struct val_t __percpu *p, *p1;
+ struct val_t *v;
+ struct elem *e;
+ int index = 0;
+
+ e = bpf_map_lookup_elem(&array, &index);
+ if (!e)
+ return 0;
+
+ p = bpf_percpu_obj_new(struct val_t);
+ if (!p)
+ return 0;
+
+ p1 = bpf_kptr_xchg(&e->pc, p);
+ if (p1)
+ bpf_percpu_obj_drop(p1);
+
+ v = bpf_this_cpu_ptr(p);
+ ret = v->b;
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
2023-08-14 17:28 [PATCH bpf-next 00/15] Add support for local percpu kptr Yonghong Song
` (13 preceding siblings ...)
2023-08-14 17:29 ` [PATCH bpf-next 14/15] selftests/bpf: Add some negative tests Yonghong Song
@ 2023-08-14 17:29 ` Yonghong Song
2023-08-18 15:54 ` Daniel Borkmann
2023-08-18 18:26 ` Zvi Effron
14 siblings, 2 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-14 17:29 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
Now 'BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + local percpu ptr'
can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
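As a hedged migration sketch (struct and field names hypothetical,
following the selftest patterns earlier in this series), the
replacement is a BPF_MAP_TYPE_CGRP_STORAGE map whose value carries a
local percpu kptr:

	struct pcpu_t {
		long cnt;
	};

	struct elem {
		struct pcpu_t __percpu *pc;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
		__uint(map_flags, BPF_F_NO_PREALLOC);
		__type(key, int);
		__type(value, struct elem);
	} cgrp SEC(".maps");

The program allocates the percpu object with bpf_percpu_obj_new() and
publishes it with bpf_kptr_xchg(&e->pc, p), and this works against an
arbitrary cgroup rather than only the current one.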
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
include/uapi/linux/bpf.h | 9 ++++++++-
tools/include/uapi/linux/bpf.h | 9 ++++++++-
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d21deb46f49f..5d1bb6b42ea2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -932,7 +932,14 @@ enum bpf_map_type {
*/
BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
- BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+ BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
+ /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
+ * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
+ * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+ * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+ * deprecated.
+ */
+ BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_QUEUE,
BPF_MAP_TYPE_STACK,
BPF_MAP_TYPE_SK_STORAGE,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d21deb46f49f..5d1bb6b42ea2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -932,7 +932,14 @@ enum bpf_map_type {
*/
BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
- BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+ BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
+ /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
+ * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
+ * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+ * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+ * deprecated.
+ */
+ BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
BPF_MAP_TYPE_QUEUE,
BPF_MAP_TYPE_STACK,
BPF_MAP_TYPE_SK_STORAGE,
--
2.34.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
2023-08-14 17:29 ` [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
@ 2023-08-18 15:54 ` Daniel Borkmann
2023-08-18 17:17 ` Yonghong Song
2023-08-18 18:26 ` Zvi Effron
1 sibling, 1 reply; 32+ messages in thread
From: Daniel Borkmann @ 2023-08-18 15:54 UTC (permalink / raw)
To: Yonghong Song, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, kernel-team,
Martin KaFai Lau
On 8/14/23 7:29 PM, Yonghong Song wrote:
> Now 'BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + local percpu ptr'
> can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
> and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> include/uapi/linux/bpf.h | 9 ++++++++-
> tools/include/uapi/linux/bpf.h | 9 ++++++++-
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..5d1bb6b42ea2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -932,7 +932,14 @@ enum bpf_map_type {
> */
> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * deprecated.
> + */
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_QUEUE,
> BPF_MAP_TYPE_STACK,
> BPF_MAP_TYPE_SK_STORAGE,
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d21deb46f49f..5d1bb6b42ea2 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -932,7 +932,14 @@ enum bpf_map_type {
> */
> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * deprecated.
> + */
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
This breaks bpftool tests in BPF CI, presumably it thinks doc is missing here:
[...]
bpftool_checks - Running bpftool checks...
Comparing /tmp/work/bpf/bpf/tools/include/uapi/linux/bpf.h (bpf_map_type) and /tmp/work/bpf/bpf/tools/bpf/bpftool/map.c (do_help() TYPE): {'percpu_cgroup_storage_deprecated', 'percpu_cgroup_storage'}
Comparing /tmp/work/bpf/bpf/tools/include/uapi/linux/bpf.h (bpf_map_type) and /tmp/work/bpf/bpf/tools/bpf/bpftool/Documentation/bpftool-map.rst (TYPE): {'percpu_cgroup_storage_deprecated', 'percpu_cgroup_storage'}
bpftool checks returned 1.
[...]
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
2023-08-18 15:54 ` Daniel Borkmann
@ 2023-08-18 17:17 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-18 17:17 UTC (permalink / raw)
To: Daniel Borkmann, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, kernel-team,
Martin KaFai Lau
On 8/18/23 8:54 AM, Daniel Borkmann wrote:
> On 8/14/23 7:29 PM, Yonghong Song wrote:
>> Now 'BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + local percpu ptr'
>> can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
>> and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/uapi/linux/bpf.h | 9 ++++++++-
>> tools/include/uapi/linux/bpf.h | 9 ++++++++-
>> 2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index d21deb46f49f..5d1bb6b42ea2 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -932,7 +932,14 @@ enum bpf_map_type {
>> */
>> BPF_MAP_TYPE_CGROUP_STORAGE =
>> BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
>> + * attaching to a cgroup. The new mechanism
>> (BPF_MAP_TYPE_CGRP_STORAGE +
>> + * local percpu kptr) supports all
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * functionality and more. So mark *
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * deprecated.
>> + */
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE =
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_QUEUE,
>> BPF_MAP_TYPE_STACK,
>> BPF_MAP_TYPE_SK_STORAGE,
>> diff --git a/tools/include/uapi/linux/bpf.h
>> b/tools/include/uapi/linux/bpf.h
>> index d21deb46f49f..5d1bb6b42ea2 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -932,7 +932,14 @@ enum bpf_map_type {
>> */
>> BPF_MAP_TYPE_CGROUP_STORAGE =
>> BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
>> + * attaching to a cgroup. The new mechanism
>> (BPF_MAP_TYPE_CGRP_STORAGE +
>> + * local percpu kptr) supports all
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * functionality and more. So mark *
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * deprecated.
>> + */
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE =
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>
> This breaks bpftool tests in BPF CI, presumably it thinks doc is missing
> here:
>
> [...]
> bpftool_checks - Running bpftool checks...
> Comparing /tmp/work/bpf/bpf/tools/include/uapi/linux/bpf.h
> (bpf_map_type) and /tmp/work/bpf/bpf/tools/bpf/bpftool/map.c (do_help()
> TYPE): {'percpu_cgroup_storage_deprecated', 'percpu_cgroup_storage'}
> Comparing /tmp/work/bpf/bpf/tools/include/uapi/linux/bpf.h
> (bpf_map_type) and
> /tmp/work/bpf/bpf/tools/bpf/bpftool/Documentation/bpftool-map.rst
> (TYPE): {'percpu_cgroup_storage_deprecated', 'percpu_cgroup_storage'}
> bpftool checks returned 1.
> [...]
Thanks, Daniel. Will take a look!
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
2023-08-14 17:29 ` [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
2023-08-18 15:54 ` Daniel Borkmann
@ 2023-08-18 18:26 ` Zvi Effron
2023-08-18 18:58 ` Yonghong Song
1 sibling, 1 reply; 32+ messages in thread
From: Zvi Effron @ 2023-08-18 18:26 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On Mon, Aug 14, 2023 at 10:30 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Now 'BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + local percpu ptr'
I found this commit message very confusing until I realized there was a typo
here. Shouldn't this be "Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'"?
> can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
> and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> include/uapi/linux/bpf.h | 9 ++++++++-
> tools/include/uapi/linux/bpf.h | 9 ++++++++-
> 2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d21deb46f49f..5d1bb6b42ea2 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -932,7 +932,14 @@ enum bpf_map_type {
> */
> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * deprecated.
> + */
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_QUEUE,
> BPF_MAP_TYPE_STACK,
> BPF_MAP_TYPE_SK_STORAGE,
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d21deb46f49f..5d1bb6b42ea2 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -932,7 +932,14 @@ enum bpf_map_type {
> */
> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
> + * deprecated.
> + */
> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
> BPF_MAP_TYPE_QUEUE,
> BPF_MAP_TYPE_STACK,
> BPF_MAP_TYPE_SK_STORAGE,
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
2023-08-14 17:28 ` [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type Yonghong Song
@ 2023-08-18 18:37 ` David Marchevsky
2023-08-18 23:24 ` Alexei Starovoitov
2023-08-20 3:45 ` Yonghong Song
0 siblings, 2 replies; 32+ messages in thread
From: David Marchevsky @ 2023-08-18 18:37 UTC (permalink / raw)
To: Yonghong Song, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
On 8/14/23 1:28 PM, Yonghong Song wrote:
> BPF_KPTR_PERCPU_REF represents a percpu field type like below
>
> struct val_t {
> ... fields ...
> };
> struct t {
> ...
> struct val_t __percpu *percpu_data_ptr;
> ...
> };
>
> where
> #define __percpu __attribute__((btf_type_tag("percpu")))
nit: Maybe this should be __percpu_kptr (and similar for the actual tag)?
I don't feel strongly about this. It's certainly less concise, but given that
existing docs mention kptrs a few times, and anyone using percpu kptrs can
probably be expected to have some familiarity with "normal" kptrs, making
it clearer to BPF program writers that their existing mental model of
what a kptr is and how it should be used still applies seems beneficial.
>
> While BPF_KPTR_REF points to a trusted kernel object or a trusted
> local object, BPF_KPTR_PERCPU_REF points to a trusted local
> percpu object.
>
> This patch added basic support for BPF_KPTR_PERCPU_REF
> related to percpu field parsing, recording and free operations.
> BPF_KPTR_PERCPU_REF also supports the same map types
> as BPF_KPTR_REF does.
>
> Note that unlike a local kptr, it is possible that
> a BPF_KTPR_PERCUP_REF struct may not contain any
nit: typo here ("BPF_KTPR_PERCUP_REF" -> "BPF_KPTR_PERCPU_REF")
> special fields like other kptr, bpf_spin_lock, bpf_list_head, etc.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> include/linux/bpf.h | 18 ++++++++++++------
> kernel/bpf/btf.c | 5 +++++
> kernel/bpf/syscall.c | 7 ++++++-
> 3 files changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 60e80e90c37d..e6348fd0a785 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -180,14 +180,15 @@ enum btf_field_type {
> BPF_TIMER = (1 << 1),
> BPF_KPTR_UNREF = (1 << 2),
> BPF_KPTR_REF = (1 << 3),
> - BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
> - BPF_LIST_HEAD = (1 << 4),
> - BPF_LIST_NODE = (1 << 5),
> - BPF_RB_ROOT = (1 << 6),
> - BPF_RB_NODE = (1 << 7),
> + BPF_KPTR_PERCPU_REF = (1 << 4),
> + BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_PERCPU_REF,
> + BPF_LIST_HEAD = (1 << 5),
> + BPF_LIST_NODE = (1 << 6),
> + BPF_RB_ROOT = (1 << 7),
> + BPF_RB_NODE = (1 << 8),
> BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
> BPF_RB_NODE | BPF_RB_ROOT,
> - BPF_REFCOUNT = (1 << 8),
> + BPF_REFCOUNT = (1 << 9),
> };
>
> typedef void (*btf_dtor_kfunc_t)(void *);
> @@ -300,6 +301,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> return "kptr";
> + case BPF_KPTR_PERCPU_REF:
> + return "kptr_percpu";
> case BPF_LIST_HEAD:
> return "bpf_list_head";
> case BPF_LIST_NODE:
> @@ -325,6 +328,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
> return sizeof(struct bpf_timer);
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> return sizeof(u64);
> case BPF_LIST_HEAD:
> return sizeof(struct bpf_list_head);
> @@ -351,6 +355,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
> return __alignof__(struct bpf_timer);
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> return __alignof__(u64);
> case BPF_LIST_HEAD:
> return __alignof__(struct bpf_list_head);
> @@ -389,6 +394,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
> case BPF_TIMER:
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> break;
> default:
> WARN_ON_ONCE(1);
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 249657c466dd..bc1792edc64e 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3293,6 +3293,8 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
> type = BPF_KPTR_UNREF;
> else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
> type = BPF_KPTR_REF;
> + else if (!strcmp("percpu", __btf_name_by_offset(btf, t->name_off)))
> + type = BPF_KPTR_PERCPU_REF;
> else
> return -EINVAL;
>
> @@ -3457,6 +3459,7 @@ static int btf_find_struct_field(const struct btf *btf,
> break;
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> ret = btf_find_kptr(btf, member_type, off, sz,
> idx < info_cnt ? &info[idx] : &tmp);
> if (ret < 0)
> @@ -3523,6 +3526,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
> break;
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> ret = btf_find_kptr(btf, var_type, off, sz,
> idx < info_cnt ? &info[idx] : &tmp);
> if (ret < 0)
> @@ -3783,6 +3787,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
> break;
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
> if (ret < 0)
> goto end;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7f4e8c357a6a..1c30b6ee84d4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -514,6 +514,7 @@ void btf_record_free(struct btf_record *rec)
> switch (rec->fields[i].type) {
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> if (rec->fields[i].kptr.module)
> module_put(rec->fields[i].kptr.module);
> btf_put(rec->fields[i].kptr.btf);
> @@ -560,6 +561,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
> switch (fields[i].type) {
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> btf_get(fields[i].kptr.btf);
> if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
> ret = -ENXIO;
> @@ -650,6 +652,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> WRITE_ONCE(*(u64 *)field_ptr, 0);
> break;
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
> if (!xchgd_field)
> break;
> @@ -657,7 +660,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> if (!btf_is_kernel(field->kptr.btf)) {
> pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
> field->kptr.btf_id);
> - WARN_ON_ONCE(!pointee_struct_meta);
> + if (field->type != BPF_KPTR_PERCPU_REF)
> + WARN_ON_ONCE(!pointee_struct_meta);
> migrate_disable();
> __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
> pointee_struct_meta->record :
> @@ -1046,6 +1050,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
> break;
> case BPF_KPTR_UNREF:
> case BPF_KPTR_REF:
> + case BPF_KPTR_PERCPU_REF:
> case BPF_REFCOUNT:
> if (map->map_type != BPF_MAP_TYPE_HASH &&
> map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 15/15] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
2023-08-18 18:26 ` Zvi Effron
@ 2023-08-18 18:58 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-18 18:58 UTC (permalink / raw)
To: Zvi Effron
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 11:26 AM, Zvi Effron wrote:
> On Mon, Aug 14, 2023 at 10:30 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> Now 'BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE + local percpu ptr'
>
> I found this commit message very confusing until I realized there was a typo
> here. Shouldn't this be "Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr'"?
You are right. Thanks for pointing this out. Will fix in the
next revision.
>
>> can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
>> and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/uapi/linux/bpf.h | 9 ++++++++-
>> tools/include/uapi/linux/bpf.h | 9 ++++++++-
>> 2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index d21deb46f49f..5d1bb6b42ea2 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -932,7 +932,14 @@ enum bpf_map_type {
>> */
>> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
>> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
>> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * deprecated.
>> + */
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_QUEUE,
>> BPF_MAP_TYPE_STACK,
>> BPF_MAP_TYPE_SK_STORAGE,
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index d21deb46f49f..5d1bb6b42ea2 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -932,7 +932,14 @@ enum bpf_map_type {
>> */
>> BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>> - BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> + /* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
>> + * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
>> + * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * functionality and more. So mark * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
>> + * deprecated.
>> + */
>> + BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_QUEUE,
>> BPF_MAP_TYPE_STACK,
>> BPF_MAP_TYPE_SK_STORAGE,
>> --
>> 2.34.1
>>
>>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
2023-08-18 18:37 ` David Marchevsky
@ 2023-08-18 23:24 ` Alexei Starovoitov
2023-08-20 3:46 ` Yonghong Song
2023-08-20 3:45 ` Yonghong Song
1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-18 23:24 UTC (permalink / raw)
To: David Marchevsky
Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
Daniel Borkmann, kernel-team, Martin KaFai Lau
On Fri, Aug 18, 2023 at 02:37:41PM -0400, David Marchevsky wrote:
> On 8/14/23 1:28 PM, Yonghong Song wrote:
> > BPF_KPTR_PERCPU_REF represents a percpu field type like below
> >
> > struct val_t {
> > ... fields ...
> > };
> > struct t {
> > ...
> > struct val_t __percpu *percpu_data_ptr;
> > ...
> > };
> >
> > where
> > #define __percpu __attribute__((btf_type_tag("percpu")))
>
> nit: Maybe this should be __percpu_kptr (and similar for the actual tag)?
+1.
I think it might conflict with kernel:
include/linux/compiler_types.h:# define __percpu BTF_TYPE_TAG(percpu)
It's the same tag name, but the kernel semantics are different from our kptr
semantics inside bpf prog.
I think we have to use a different tag like:
#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
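For illustration (a hedged sketch of what a map value field would look
like under that rename):

	struct val_t __percpu_kptr *percpu_data_ptr;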
> > index 60e80e90c37d..e6348fd0a785 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -180,14 +180,15 @@ enum btf_field_type {
> > BPF_TIMER = (1 << 1),
> > BPF_KPTR_UNREF = (1 << 2),
> > BPF_KPTR_REF = (1 << 3),
> > - BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
> > - BPF_LIST_HEAD = (1 << 4),
> > - BPF_LIST_NODE = (1 << 5),
> > - BPF_RB_ROOT = (1 << 6),
> > - BPF_RB_NODE = (1 << 7),
> > + BPF_KPTR_PERCPU_REF = (1 << 4),
I think _REF is redundant here. _UNREF is obsolete. We might remove it and
rename BPF_KPTR_REF to just BPF_KPTR.
BPF_KPTR_PERCPU should be clear enough.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
2023-08-14 17:28 ` [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
@ 2023-08-19 0:29 ` Alexei Starovoitov
2023-08-20 3:47 ` Yonghong Song
2023-08-19 1:24 ` Kumar Kartikeya Dwivedi
1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-19 0:29 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On Mon, Aug 14, 2023 at 10:28:25AM -0700, Yonghong Song wrote:
> @@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> if (kptr_field->type == BPF_KPTR_UNREF)
> perm_flags |= PTR_UNTRUSTED;
>
> + if (kptr_field->type == BPF_KPTR_PERCPU_REF)
> + perm_flags |= MEM_PERCPU | MEM_ALLOC;
this bit doesn't look right and ...
> +
> if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
> goto bad_type;
>
> - if (!btf_is_kernel(reg->btf)) {
> + if (kptr_field->type != BPF_KPTR_PERCPU_REF && !btf_is_kernel(reg->btf)) {
> verbose(env, "R%d must point to kernel BTF\n", regno);
> return -EINVAL;
> }
> + if (kptr_field->type == BPF_KPTR_PERCPU_REF && btf_is_kernel(reg->btf)) {
> + verbose(env, "R%d must point to prog BTF\n", regno);
> + return -EINVAL;
> + }
.. here it really doesn't look right.
The map_kptr_match_type() should have been used for kptrs pointing to kernel objects only.
But you're calling it for MEM_ALLOC object with prog's BTF...
> + case PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC:
> + if (meta->func_id != BPF_FUNC_kptr_xchg) {
> + verbose(env, "verifier internal error: unimplemented handling of MEM_PERCPU | MEM_ALLOC\n");
> + return -EFAULT;
> + }
this part should be handling it, but ...
> + if (map_kptr_match_type(env, meta->kptr_field, reg, regno))
> + return -EACCES;
why call this here?
Existing:
case PTR_TO_BTF_ID | MEM_ALLOC:
if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock &&
meta->func_id != BPF_FUNC_kptr_xchg) {
verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n");
return -EFAULT;
}
doesn't call map_kptr_match_type().
Where do we check that btf of arg1 and arg2 matches for kptr_xchg of MEM_ALLOC objs? Do we have a bug?
Yep. We do :(
diff --git a/tools/testing/selftests/bpf/progs/local_kptr_stash.c b/tools/testing/selftests/bpf/progs/local_kptr_stash.c
index 06838083079c..a6f546f4da9a 100644
--- a/tools/testing/selftests/bpf/progs/local_kptr_stash.c
+++ b/tools/testing/selftests/bpf/progs/local_kptr_stash.c
@@ -14,10 +14,12 @@ struct node_data {
struct bpf_rb_node node;
};
+struct node_data2 { long foo[4];};
+
struct map_value {
struct prog_test_ref_kfunc *not_kptr;
struct prog_test_ref_kfunc __kptr *val;
- struct node_data __kptr *node;
+ struct node_data2 __kptr *node;
};
/* This is necessary so that LLVM generates BTF for node_data struct
@@ -32,6 +34,7 @@ struct map_value {
* Had to do the same w/ bpf_kfunc_call_test_release below
*/
struct node_data *just_here_because_btf_bug;
+struct node_data2 *just_here_because_btf_bug2;
passes the verifier and runs into kernel WARN_ONCE.
Let's fix this issue first before proceeding with this series.
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
2023-08-14 17:28 ` [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
@ 2023-08-19 1:01 ` Alexei Starovoitov
2023-08-20 4:16 ` Yonghong Song
0 siblings, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2023-08-19 1:01 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On Mon, Aug 14, 2023 at 10:28:30AM -0700, Yonghong Song wrote:
> The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
> for allocated percpu objects. For an allocated percpu obj,
> the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'.
>
> The return type for these two re-purposed helpera is
> 'PTR_TO_MEM | MEM_RCU | MEM_ALLOC'.
> The MEM_ALLOC allows that the per-cpu data can be read and written.
>
> Since the memory allocator bpf_mem_alloc() returns
> a ptr to a percpu ptr for percpu data, the first argument
> of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched
> with a dereference before passing to the helper func.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> include/linux/bpf_verifier.h | 1 +
> kernel/bpf/verifier.c | 51 ++++++++++++++++++++++++++++++------
> 2 files changed, 44 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index f70f9ac884d2..e23480db37ec 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -480,6 +480,7 @@ struct bpf_insn_aux_data {
> bool zext_dst; /* this insn zero extends dst reg */
> bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
> bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
> + bool percpu_ptr_prog_alloc; /* {this,per}_cpu_ptr() with prog alloc */
> u8 alu_state; /* used in combination with alu_limit */
>
> /* below fields are initialized once */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a985fbf18a11..6fc200cb68b6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6221,7 +6221,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> }
>
> if (type_is_alloc(reg->type) && !type_is_non_owning_ref(reg->type) &&
> - !reg->ref_obj_id) {
> + !(reg->type & MEM_RCU) && !reg->ref_obj_id) {
> verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
> return -EFAULT;
> }
> @@ -7765,6 +7765,7 @@ static const struct bpf_reg_types btf_ptr_types = {
> static const struct bpf_reg_types percpu_btf_ptr_types = {
> .types = {
> PTR_TO_BTF_ID | MEM_PERCPU,
> + PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU,
> PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
> }
> };
> @@ -7945,6 +7946,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> return -EACCES;
> break;
> case PTR_TO_BTF_ID | MEM_PERCPU:
> + case PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU:
> case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
> /* Handled by helper specific checks */
> break;
> @@ -8287,6 +8289,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
> verbose(env, "Helper has invalid btf_id in R%d\n", regno);
> return -EACCES;
> }
> + if (reg->type & MEM_RCU) {
> + const struct btf_type *type = btf_type_by_id(reg->btf, reg->btf_id);
> +
> + if (!type || !btf_type_is_struct(type)) {
> + verbose(env, "Helper has invalid btf/btf_id in R%d\n", regno);
> + return -EFAULT;
> + }
> + env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc = true;
> + }
Let's move this check out of check_func_arg() and do it in check_helper_call() after the loop.
meta has what we need.
Also I would do it only for specific helpers like:
case BPF_FUNC_per_cpu_ptr:
case BPF_FUNC_this_cpu_ptr:
if (reg[R1]->type & MEM_RCU) {
const struct btf_type *type = btf_type_by_id(meta->ret_btf, ...)
...
returns_cpu_specific_alloc_ptr = true;
insn_aux_datap[].call_with_percpu_alloc_ptr = true;
}
> +
> meta->ret_btf = reg->btf;
> meta->ret_btf_id = reg->btf_id;
> break;
> @@ -9888,14 +9900,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> regs[BPF_REG_0].mem_size = tsize;
> } else {
> - /* MEM_RDONLY may be carried from ret_flag, but it
> - * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
> - * it will confuse the check of PTR_TO_BTF_ID in
> - * check_mem_access().
> - */
> - ret_flag &= ~MEM_RDONLY;
> + if (env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc) {
and here I would only check the local bool returns_percpu_alloc_ptr.
Because returns_cpu_specific_alloc_ptr is not the same as call_with_percpu_alloc_ptr.
Like this_cpu_read() is a call_with_percpu_alloc_ptr, but it doesn't
return a cpu-specific pointer; it derefs the cpu-specific pointer and
returns the value.
Of course, we don't have such helper today, but worth thinking ahead.
> + regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU;
> + } else {
> + /* MEM_RDONLY may be carried from ret_flag, but it
> + * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
> + * it will confuse the check of PTR_TO_BTF_ID in
> + * check_mem_access().
> + */
> + ret_flag &= ~MEM_RDONLY;
> + regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
> + }
>
> - regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
> regs[BPF_REG_0].btf = meta.ret_btf;
> regs[BPF_REG_0].btf_id = meta.ret_btf_id;
> }
> @@ -18646,6 +18662,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> goto patch_call_imm;
> }
>
> + /* bpf_per_cpu_ptr() and bpf_this_cpu_ptr() */
> + if (env->insn_aux_data[i + delta].percpu_ptr_prog_alloc) {
call_with_percpu_alloc_ptr
> + /* patch with 'r1 = *(u64 *)(r1 + 0)' since for percpu data,
> + * bpf_mem_alloc() returns a ptr to the percpu data ptr.
> + */
> + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
here R1 is no longer a concern, since we check R1 only in check_helper_call.
> + insn_buf[1] = *insn;
> + cnt = 2;
> +
> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
> + if (!new_prog)
> + return -ENOMEM;
> +
> + delta += cnt - 1;
> + env->prog = prog = new_prog;
> + insn = new_prog->insnsi + i + delta;
> + goto patch_call_imm;
> + }
> +
> /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
> * and other inlining handlers are currently limited to 64 bit
> * only.
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
2023-08-14 17:28 ` [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
2023-08-19 0:29 ` Alexei Starovoitov
@ 2023-08-19 1:24 ` Kumar Kartikeya Dwivedi
2023-08-20 4:04 ` Yonghong Song
1 sibling, 1 reply; 32+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-19 1:24 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On Mon, 14 Aug 2023 at 22:59, Yonghong Song <yonghong.song@linux.dev> wrote:
>
> Add two new kfunc's, bpf_percpu_obj_new_impl() and
> bpf_percpu_obj_drop_impl(), to allocate a percpu obj.
> Two functions are very similar to bpf_obj_new_impl()
> and bpf_obj_drop_impl(). The major difference is related
> to percpu handling.
>
> bpf_rcu_read_lock()
> struct val_t __percpu *v = map_val->percpu_data;
> ...
> bpf_rcu_read_unlock()
>
> For a percpu data map_val like above 'v', the reg->type
> is set as
> PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU
> if inside rcu critical section.
>
> MEM_RCU marking here is similar to NON_OWN_REF as 'v'
> is not a owning referenace. But NON_OWN_REF is
typo: reference
> trusted and typically inside the spinlock while
> MEM_RCU is under rcu read lock. RCU is preferred here
> since percpu data structures mean potential concurrent
> access into its contents.
>
> Also, bpf_percpu_obj_new_impl() is restricted to only accept
> a scalar struct, which means nested kptrs are not allowed,
> but some other special fields, e.g., bpf_list_head, bpf_spin_lock, etc.,
> could be nested (in a nested 'struct'). A later patch will improve the
> verifier to handle such nested special fields.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> include/linux/bpf.h | 3 +-
> kernel/bpf/helpers.c | 49 +++++++++++++++++++++++
> kernel/bpf/syscall.c | 21 +++++++---
> kernel/bpf/verifier.c | 90 ++++++++++++++++++++++++++++++++++---------
> 4 files changed, 137 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e6348fd0a785..a2cb380c43c7 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -197,7 +197,8 @@ struct btf_field_kptr {
> struct btf *btf;
> struct module *module;
> /* dtor used if btf_is_kernel(btf), otherwise the type is
> - * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
> + * program-allocated, dtor is NULL, and __bpf_obj_drop_impl
> + * or __bpf_percpu_drop_impl is used
> */
> btf_dtor_kfunc_t dtor;
> u32 btf_id;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index eb91cae0612a..dd14cb7da4af 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1900,6 +1900,29 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
> return p;
> }
>
> +__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
> +{
> + struct btf_struct_meta *meta = meta__ign;
> + const struct btf_record *rec;
> + u64 size = local_type_id__k;
> + void __percpu *pptr;
> + void *p;
> + int cpu;
> +
> + p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
> + if (!p)
> + return NULL;
> + if (meta) {
> + pptr = *((void __percpu **)p);
> + rec = meta->record;
> + for_each_possible_cpu(cpu) {
> + bpf_obj_init(rec, per_cpu_ptr(pptr, cpu));
> + }
> + }
> +
> + return p;
> +}
> +
> /* Must be called under migrate_disable(), as required by bpf_mem_free */
> void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> {
> @@ -1924,6 +1947,30 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
> __bpf_obj_drop_impl(p, meta ? meta->record : NULL);
> }
>
> +/* Must be called under migrate_disable(), as required by bpf_mem_free_rcu */
> +void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec)
> +{
> + void __percpu *pptr;
> + int cpu;
> +
> + if (rec) {
> + pptr = *((void __percpu **)p);
> + for_each_possible_cpu(cpu) {
> + bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
Should this loop be done after we have waited for the RCU grace period?
Otherwise any other CPU can reinitialize a field after this is done,
move objects into lists/rbtree, and leak memory.
Please correct me if I'm mistaken.
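To make the scenario concrete (sketch; 'map_val', 'n' and the field names
are made up):

  bpf_rcu_read_lock();
  v = map_val->percpu_data;       /* MEM_RCU ptr, stays valid until the GP ends */
  if (v) {
          struct val_t *p = bpf_this_cpu_ptr(v);

          bpf_spin_lock(&p->lock);
          /* re-populates the bpf_list_head that the free loop just
           * emptied; once the object is freed after the GP, this
           * node leaks
           */
          bpf_list_push_back(&p->head, &n->node);
          bpf_spin_unlock(&p->lock);
  }
  bpf_rcu_read_unlock();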
> + }
> + }
> +
> + bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
> +}
> +
> +__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
> +{
> + struct btf_struct_meta *meta = meta__ign;
> + void *p = p__alloc;
> +
> + __bpf_percpu_obj_drop_impl(p, meta ? meta->record : NULL);
> +}
> +
> __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
> {
> struct btf_struct_meta *meta = meta__ign;
> @@ -2436,7 +2483,9 @@ BTF_SET8_START(generic_btf_ids)
> BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
> #endif
> BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
> +BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
> BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_list_push_front_impl)
> BTF_ID_FLAGS(func, bpf_list_push_back_impl)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1c30b6ee84d4..9ceb6fd9a0e2 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -627,6 +627,7 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
> }
>
> extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
> +extern void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec);
>
> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> {
> @@ -660,13 +661,21 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> if (!btf_is_kernel(field->kptr.btf)) {
> pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
> field->kptr.btf_id);
> - if (field->type != BPF_KPTR_PERCPU_REF)
> +
> + if (field->type == BPF_KPTR_PERCPU_REF) {
> + migrate_disable();
> + __bpf_percpu_obj_drop_impl(xchgd_field, pointee_struct_meta ?
> + pointee_struct_meta->record :
> + NULL);
> + migrate_enable();
> + } else {
> WARN_ON_ONCE(!pointee_struct_meta);
> - migrate_disable();
> - __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
> - pointee_struct_meta->record :
> - NULL);
> - migrate_enable();
> + migrate_disable();
> + __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
> + pointee_struct_meta->record :
> + NULL);
> + migrate_enable();
> + }
> } else {
> field->kptr.dtor(xchgd_field);
> }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4ccca1f6c998..a985fbf18a11 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
> /* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
> * generally to pass info about user-defined local kptr types to later
> * verification logic
> - * bpf_obj_drop
> + * bpf_obj_drop/bpf_percpu_obj_drop
> * Record the local kptr type to be drop'd
> * bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
> * Record the local kptr type to be refcount_incr'd and use
> @@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> if (kptr_field->type == BPF_KPTR_UNREF)
> perm_flags |= PTR_UNTRUSTED;
>
> + if (kptr_field->type == BPF_KPTR_PERCPU_REF)
> + perm_flags |= MEM_PERCPU | MEM_ALLOC;
> +
I think just this would permit PTR_TO_BTF_ID | MEM_ALLOC for percpu kptr?
It would probably be good to include negative selftests for kptr_xchg
type matching with percpu_kptr to prevent things like these.
Alexei already said map_kptr_match_type is not being invoked for
MEM_ALLOC kptr_xchg, so that is also an existing bug.
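E.g., a negative test along these lines (hypothetical; type/map names made
up, using the __percpu tag from this series) should be rejected by the
verifier:

  #define __percpu __attribute__((btf_type_tag("percpu")))

  struct val_t { int a; };

  struct map_value {
          struct val_t __percpu *pc;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, int);
          __type(value, struct map_value);
  } amap SEC(".maps");

  SEC("tc")
  int percpu_kptr_xchg_wrong_type(void *ctx)
  {
          struct map_value *v;
          struct val_t *p;
          int key = 0;

          v = bpf_map_lookup_elem(&amap, &key);
          if (!v)
                  return 0;
          p = bpf_obj_new(typeof(*p));    /* plain local obj, not percpu */
          if (!p)
                  return 0;
          /* should fail: non-percpu alloc obj into a percpu kptr field */
          p = bpf_kptr_xchg(&v->pc, p);
          if (p)
                  bpf_obj_drop(p);
          return 0;
  }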
> if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
> goto bad_type;
>
> [...]
> /* We need to verify reg->type and reg->btf, before accessing reg->btf */
> reg_name = btf_type_name(reg->btf, reg->btf_id);
>
> @@ -5084,7 +5091,17 @@ static bool rcu_safe_kptr(const struct btf_field *field)
> {
> const struct btf_field_kptr *kptr = &field->kptr;
>
> - return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
> + return field->type == BPF_KPTR_PERCPU_REF ||
> + (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
> +}
> +
> +static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
> +{
> + if (!rcu_safe_kptr(kptr_field) || !in_rcu_cs(env))
> + return PTR_MAYBE_NULL | PTR_UNTRUSTED;
> + if (kptr_field->type != BPF_KPTR_PERCPU_REF)
> + return PTR_MAYBE_NULL | MEM_RCU;
> + return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
The inverted conditions are a bit hard to follow. Maybe better to
explicitly check for both RCU cases, and default to untrusted
otherwise?
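Roughly (untested sketch):

  static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
  {
          if (rcu_safe_kptr(kptr_field) && in_rcu_cs(env)) {
                  /* local percpu kptr: additionally mark MEM_PERCPU */
                  if (kptr_field->type == BPF_KPTR_PERCPU_REF)
                          return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
                  return PTR_MAYBE_NULL | MEM_RCU;
          }
          /* default to untrusted */
          return PTR_MAYBE_NULL | PTR_UNTRUSTED;
  }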
> }
>
> [...]
>
* Re: [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
2023-08-14 17:28 ` [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
@ 2023-08-19 1:44 ` Kumar Kartikeya Dwivedi
2023-08-20 4:19 ` Yonghong Song
0 siblings, 1 reply; 32+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2023-08-19 1:44 UTC (permalink / raw)
To: Yonghong Song
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>
> In previous selftests/bpf patch, we have
> p = bpf_percpu_obj_new(struct val_t);
> if (!p)
> goto out;
>
> p1 = bpf_kptr_xchg(&e->pc, p);
> if (p1) {
> /* race condition */
> bpf_percpu_obj_drop(p1);
> }
>
> p = e->pc;
> if (!p)
> goto out;
>
> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
> This is because the second argument of bpf_kptr_xchg() is marked
> OBJ_RELEASE and will be marked as invalid after the call.
> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
> and the bpf program needs to reread from the map value.
>
> This patch checks if the 'p' has type MEM_ALLOC and MEM_PERCPU,
> and if 'p' is RCU protected. If this is the case, 'p' can be marked
> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is not
> an owning reference any more. Such a change makes re-read
> from the map value unnecessary.
>
> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
> a different value from 'p' if, immediately before 'p = e->pc',
> another cpu does another bpf_kptr_xchg() and swaps another value
> into 'e->pc'. If this is the case, then 'p = e->pc' may
> get either 'p' or another value, and race condition already exists.
> So removing direct re-reading seems fine too.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
> kernel/bpf/verifier.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6fc200cb68b6..6fa458e13bfc 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
> return err;
>
> bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> - if (reg->ref_obj_id == ref_obj_id)
> - mark_reg_invalid(env, reg);
> + if (reg->ref_obj_id == ref_obj_id) {
> + if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
inside RCU CS/non-sleepable prog?
Do we want to permit access to p after drop in that case? I think it
will be a bit unintuitive.
I think we should preserve normal behavior for everything except for
kptr_xchg of a percpu_kptr.
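Otherwise, inside an RCU CS something like this would become
verifier-legal (sketch):

  struct val_t *p, *v;

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
          return 0;
  bpf_percpu_obj_drop(p);
  /* with this change 'p' is downgraded to MEM_RCU instead of being
   * marked invalid, so this use-after-drop would be accepted
   */
  v = bpf_this_cpu_ptr(p);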
> + reg->ref_obj_id = 0;
> + reg->type &= ~MEM_ALLOC;
> + reg->type |= MEM_RCU;
> + } else {
> + mark_reg_invalid(env, reg);
> + }
> + }
> }));
>
> return 0;
> --
> 2.34.1
>
>
* Re: [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
2023-08-18 18:37 ` David Marchevsky
2023-08-18 23:24 ` Alexei Starovoitov
@ 2023-08-20 3:45 ` Yonghong Song
1 sibling, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 3:45 UTC (permalink / raw)
To: David Marchevsky, bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team,
Martin KaFai Lau
On 8/18/23 11:37 AM, David Marchevsky wrote:
> On 8/14/23 1:28 PM, Yonghong Song wrote:
>> BPF_KPTR_PERCPU_REF represents a percpu field type like below
>>
>> struct val_t {
>> ... fields ...
>> };
>> struct t {
>> ...
>> struct val_t __percpu *percpu_data_ptr;
>> ...
>> };
>>
>> where
>> #define __percpu __attribute__((btf_type_tag("percpu")))
>
> nit: Maybe this should be __percpu_kptr (and similar for the actual tag)?
>
> I don't feel strongly about this. It's certainly less concise, but given that
> existing docs mention kptrs a few times, and anyone using percpu kptrs can
> probably be expected to have some familiarity with "normal" kptrs, making
> it clearer to BPF program writers that their existing mental model of
> what a kptr is and how it should be used still applies seems beneficial.
Thanks for the suggestion. As Alexei also suggested later,
__percpu_kptr is better than __percpu so users won't confuse it
with the kernel's __percpu tag.
>
>>
>> While BPF_KPTR_REF points to a trusted kernel object or a trusted
>> local object, BPF_KPTR_PERCPU_REF points to a trusted local
>> percpu object.
>>
>> This patch added basic support for BPF_KPTR_PERCPU_REF
>> related to percpu field parsing, recording and free operations.
>> BPF_KPTR_PERCPU_REF also supports the same map types
>> as BPF_KPTR_REF does.
>>
>> Note that unlike a local kptr, it is possible that
>> a BPF_KTPR_PERCUP_REF struct may not contain any
>
> nit: typo here ("BPF_KTPR_PERCUP_REF" -> "BPF_KPTR_PERCPU_REF")
Ack. Thanks!
>
>> special fields like other kptr, bpf_spin_lock, bpf_list_head, etc.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/linux/bpf.h | 18 ++++++++++++------
>> kernel/bpf/btf.c | 5 +++++
>> kernel/bpf/syscall.c | 7 ++++++-
>> 3 files changed, 23 insertions(+), 7 deletions(-)
>>
[...]
* Re: [PATCH bpf-next 02/15] bpf: Add BPF_KPTR_PERCPU_REF as a field type
2023-08-18 23:24 ` Alexei Starovoitov
@ 2023-08-20 3:46 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 3:46 UTC (permalink / raw)
To: Alexei Starovoitov, David Marchevsky
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 4:24 PM, Alexei Starovoitov wrote:
> On Fri, Aug 18, 2023 at 02:37:41PM -0400, David Marchevsky wrote:
>> On 8/14/23 1:28 PM, Yonghong Song wrote:
>>> BPF_KPTR_PERCPU_REF represents a percpu field type like below
>>>
>>> struct val_t {
>>> ... fields ...
>>> };
>>> struct t {
>>> ...
>>> struct val_t __percpu *percpu_data_ptr;
>>> ...
>>> };
>>>
>>> where
>>> #define __percpu __attribute__((btf_type_tag("percpu")))
>>
>> nit: Maybe this should be __percpu_kptr (and similar for the actual tag)?
>
> +1.
>
> I think it might conflict with kernel:
> include/linux/compiler_types.h:# define __percpu BTF_TYPE_TAG(percpu)
> It's the same tag name, but the kernel semantics are different from our kptr
> semantics inside bpf prog.
> I think we have to use a different tag like:
> #define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
Agree. Will use __percpu_kptr in the next revision.
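For example (type names are just for illustration):

  #define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))

  struct val_t {
          int a;
  };

  struct map_value {
          struct val_t __percpu_kptr *pc;
  };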
>
>>> index 60e80e90c37d..e6348fd0a785 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -180,14 +180,15 @@ enum btf_field_type {
>>> BPF_TIMER = (1 << 1),
>>> BPF_KPTR_UNREF = (1 << 2),
>>> BPF_KPTR_REF = (1 << 3),
>>> - BPF_KPTR = BPF_KPTR_UNREF | BPF_KPTR_REF,
>>> - BPF_LIST_HEAD = (1 << 4),
>>> - BPF_LIST_NODE = (1 << 5),
>>> - BPF_RB_ROOT = (1 << 6),
>>> - BPF_RB_NODE = (1 << 7),
>>> + BPF_KPTR_PERCPU_REF = (1 << 4),
>
> I think _REF is redundant here. _UNREF is obsolete. We might remove it and
> rename BPF_KPTR_REF to just BPF_KPTR.
> BPF_KPTR_PERCPU should be clear enough.
Okay, will use BPF_KPTR_PERCPU in the next revision.
* Re: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
2023-08-19 0:29 ` Alexei Starovoitov
@ 2023-08-20 3:47 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 3:47 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 5:29 PM, Alexei Starovoitov wrote:
> On Mon, Aug 14, 2023 at 10:28:25AM -0700, Yonghong Song wrote:
>> @@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>> if (kptr_field->type == BPF_KPTR_UNREF)
>> perm_flags |= PTR_UNTRUSTED;
>>
>> + if (kptr_field->type == BPF_KPTR_PERCPU_REF)
>> + perm_flags |= MEM_PERCPU | MEM_ALLOC;
>
> this bit doesn't look right and ...
>
>> +
>> if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
>> goto bad_type;
>>
>> - if (!btf_is_kernel(reg->btf)) {
>> + if (kptr_field->type != BPF_KPTR_PERCPU_REF && !btf_is_kernel(reg->btf)) {
>> verbose(env, "R%d must point to kernel BTF\n", regno);
>> return -EINVAL;
>> }
>> + if (kptr_field->type == BPF_KPTR_PERCPU_REF && btf_is_kernel(reg->btf)) {
>> + verbose(env, "R%d must point to prog BTF\n", regno);
>> + return -EINVAL;
>> + }
>
> .. here it really doesn't look right.
> The map_kptr_match_type() should have been used for kptrs pointing to kernel objects only.
> But you're calling it for MEM_ALLOC object with prog's BTF...
>
>> + case PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC:
>> + if (meta->func_id != BPF_FUNC_kptr_xchg) {
>> + verbose(env, "verifier internal error: unimplemented handling of MEM_PERCPU | MEM_ALLOC\n");
>> + return -EFAULT;
>> + }
>
> this part should be handling it, but ...
>
>> + if (map_kptr_match_type(env, meta->kptr_field, reg, regno))
>> + return -EACCES;
>
> why call this here?
>
> Existing:
> case PTR_TO_BTF_ID | MEM_ALLOC:
> if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock &&
> meta->func_id != BPF_FUNC_kptr_xchg) {
> verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n");
> return -EFAULT;
> }
> doesn't call map_kptr_match_type().
> Where do we check that btf of arg1 and arg2 matches for kptr_xchg of MEM_ALLOC objs? Do we have a bug?
>
> Yep. We do :(
>
> diff --git a/tools/testing/selftests/bpf/progs/local_kptr_stash.c b/tools/testing/selftests/bpf/progs/local_kptr_stash.c
> index 06838083079c..a6f546f4da9a 100644
> --- a/tools/testing/selftests/bpf/progs/local_kptr_stash.c
> +++ b/tools/testing/selftests/bpf/progs/local_kptr_stash.c
> @@ -14,10 +14,12 @@ struct node_data {
> struct bpf_rb_node node;
> };
>
> +struct node_data2 { long foo[4];};
> +
> struct map_value {
> struct prog_test_ref_kfunc *not_kptr;
> struct prog_test_ref_kfunc __kptr *val;
> - struct node_data __kptr *node;
> + struct node_data2 __kptr *node;
> };
>
> /* This is necessary so that LLVM generates BTF for node_data struct
> @@ -32,6 +34,7 @@ struct map_value {
> * Had to do the same w/ bpf_kfunc_call_test_release below
> */
> struct node_data *just_here_because_btf_bug;
> +struct node_data2 *just_here_because_btf_bug2;
>
> passes the verifier and runs into kernel WARN_ONCE.
>
> Let's fix this issue first before proceeding with this series.
Sounds good. I will investigate and fix this issue before sending
out v2.
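From a first look, the missing piece is probably a btf/btf_id match in that
existing MEM_ALLOC case, roughly (a sketch only, not the final fix):

  case PTR_TO_BTF_ID | MEM_ALLOC:
          if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock &&
              meta->func_id != BPF_FUNC_kptr_xchg) {
                  verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n");
                  return -EFAULT;
          }
          /* missing check: for kptr_xchg, the arg's prog-BTF type must
           * match the destination kptr field's declared type
           */
          if (meta->func_id == BPF_FUNC_kptr_xchg && meta->kptr_field &&
              (reg->btf != meta->kptr_field->kptr.btf ||
               reg->btf_id != meta->kptr_field->kptr.btf_id)) {
                  verbose(env, "R%d alloc obj type does not match kptr field type\n", regno);
                  return -EACCES;
          }
          break;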
* Re: [PATCH bpf-next 03/15] bpf: Add alloc/xchg/direct_access support for local percpu kptr
2023-08-19 1:24 ` Kumar Kartikeya Dwivedi
@ 2023-08-20 4:04 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 4:04 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 6:24 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 14 Aug 2023 at 22:59, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> Add two new kfunc's, bpf_percpu_obj_new_impl() and
>> bpf_percpu_obj_drop_impl(), to allocate a percpu obj.
>> The two functions are very similar to bpf_obj_new_impl()
>> and bpf_obj_drop_impl(). The major difference is related
>> to percpu handling.
>>
>> bpf_rcu_read_lock()
>> struct val_t __percpu *v = map_val->percpu_data;
>> ...
>> bpf_rcu_read_unlock()
>>
>> For a percpu data map_val like above 'v', the reg->type
>> is set as
>> PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU
>> if inside rcu critical section.
>>
>> MEM_RCU marking here is similar to NON_OWN_REF as 'v'
>> is not a owning referenace. But NON_OWN_REF is
>
> typo: reference
Ack.
>
>> trusted and typically inside the spinlock while
>> MEM_RCU is under rcu read lock. RCU is preferred here
>> since percpu data structures mean potential concurrent
>> access into its contents.
>>
>> Also, bpf_percpu_obj_new_impl() is restricted to only accept
>> a scalar struct, which means nested kptrs are not allowed,
>> but some other special fields, e.g., bpf_list_head, bpf_spin_lock, etc.,
>> could be nested (in a nested 'struct'). A later patch will improve the
>> verifier to handle such nested special fields.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/linux/bpf.h | 3 +-
>> kernel/bpf/helpers.c | 49 +++++++++++++++++++++++
>> kernel/bpf/syscall.c | 21 +++++++---
>> kernel/bpf/verifier.c | 90 ++++++++++++++++++++++++++++++++++---------
>> 4 files changed, 137 insertions(+), 26 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index e6348fd0a785..a2cb380c43c7 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -197,7 +197,8 @@ struct btf_field_kptr {
>> struct btf *btf;
>> struct module *module;
>> /* dtor used if btf_is_kernel(btf), otherwise the type is
>> - * program-allocated, dtor is NULL, and __bpf_obj_drop_impl is used
>> + * program-allocated, dtor is NULL, and __bpf_obj_drop_impl
>> + * or __bpf_percpu_drop_impl is used
>> */
>> btf_dtor_kfunc_t dtor;
>> u32 btf_id;
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index eb91cae0612a..dd14cb7da4af 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1900,6 +1900,29 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
>> return p;
>> }
>>
>> +__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
>> +{
>> + struct btf_struct_meta *meta = meta__ign;
>> + const struct btf_record *rec;
>> + u64 size = local_type_id__k;
>> + void __percpu *pptr;
>> + void *p;
>> + int cpu;
>> +
>> + p = bpf_mem_alloc(&bpf_global_percpu_ma, size);
>> + if (!p)
>> + return NULL;
>> + if (meta) {
>> + pptr = *((void __percpu **)p);
>> + rec = meta->record;
>> + for_each_possible_cpu(cpu) {
>> + bpf_obj_init(rec, per_cpu_ptr(pptr, cpu));
>> + }
>> + }
>> +
>> + return p;
>> +}
>> +
>> /* Must be called under migrate_disable(), as required by bpf_mem_free */
>> void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
>> {
>> @@ -1924,6 +1947,30 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
>> __bpf_obj_drop_impl(p, meta ? meta->record : NULL);
>> }
>>
>> +/* Must be called under migrate_disable(), as required by bpf_mem_free_rcu */
>> +void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec)
>> +{
>> + void __percpu *pptr;
>> + int cpu;
>> +
>> + if (rec) {
>> + pptr = *((void __percpu **)p);
>> + for_each_possible_cpu(cpu) {
>> + bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
>
> Should this loop be done after we have waited for the RCU grace period?
> Otherwise any other CPU can reinitialize a field after this is done,
> move objects into lists/rbtree, and leak memory.
> Please correct me if I'm mistaken.
Thanks for spotting this. I think you are correct; the above scenario is
indeed possible: one cpu takes a direct reference to the __percpu_kptr and
does a bunch of stuff, while another cpu does a bpf_kptr_xchg to
get the __percpu_kptr and drops it. We should really drop the
__percpu_kptr itself and free the fields in its record only after an rcu
grace period so the existing direct reference operations won't be
affected.
Will fix it in v2.
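One possible direction (a rough sketch only, the actual v2 may differ;
allocation/freeing of the ctx itself is elided):

  /* defer the per-cpu field cleanup into an RCU callback, so it runs
   * after the grace period, and only then hand the memory back
   */
  struct percpu_drop_ctx {
          struct rcu_head rcu;
          const struct btf_record *rec;
          void *p;
  };

  static void percpu_obj_drop_rcu(struct rcu_head *head)
  {
          struct percpu_drop_ctx *ctx = container_of(head, struct percpu_drop_ctx, rcu);
          void __percpu *pptr = *((void __percpu **)ctx->p);
          int cpu;

          for_each_possible_cpu(cpu)
                  bpf_obj_free_fields(ctx->rec, per_cpu_ptr(pptr, cpu));
          /* now safe to free ctx->p itself, e.g. via the allocator's
           * non-rcu free path
           */
  }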
>
>> + }
>> + }
>> +
>> + bpf_mem_free_rcu(&bpf_global_percpu_ma, p);
>> +}
>> +
>> +__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
>> +{
>> + struct btf_struct_meta *meta = meta__ign;
>> + void *p = p__alloc;
>> +
>> + __bpf_percpu_obj_drop_impl(p, meta ? meta->record : NULL);
>> +}
>> +
>> __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
>> {
>> struct btf_struct_meta *meta = meta__ign;
>> @@ -2436,7 +2483,9 @@ BTF_SET8_START(generic_btf_ids)
>> BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
>> #endif
>> BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
>> +BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
>> BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
>> BTF_ID_FLAGS(func, bpf_list_push_front_impl)
>> BTF_ID_FLAGS(func, bpf_list_push_back_impl)
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 1c30b6ee84d4..9ceb6fd9a0e2 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -627,6 +627,7 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
>> }
>>
>> extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
>> +extern void __bpf_percpu_obj_drop_impl(void *p, const struct btf_record *rec);
>>
>> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>> {
>> @@ -660,13 +661,21 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
>> if (!btf_is_kernel(field->kptr.btf)) {
>> pointee_struct_meta = btf_find_struct_meta(field->kptr.btf,
>> field->kptr.btf_id);
>> - if (field->type != BPF_KPTR_PERCPU_REF)
>> +
>> + if (field->type == BPF_KPTR_PERCPU_REF) {
>> + migrate_disable();
>> + __bpf_percpu_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> + pointee_struct_meta->record :
>> + NULL);
>> + migrate_enable();
>> + } else {
>> WARN_ON_ONCE(!pointee_struct_meta);
>> - migrate_disable();
>> - __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> - pointee_struct_meta->record :
>> - NULL);
>> - migrate_enable();
>> + migrate_disable();
>> + __bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
>> + pointee_struct_meta->record :
>> + NULL);
>> + migrate_enable();
>> + }
>> } else {
>> field->kptr.dtor(xchgd_field);
>> }
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 4ccca1f6c998..a985fbf18a11 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
>> /* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
>> * generally to pass info about user-defined local kptr types to later
>> * verification logic
>> - * bpf_obj_drop
>> + * bpf_obj_drop/bpf_percpu_obj_drop
>> * Record the local kptr type to be drop'd
>> * bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
>> * Record the local kptr type to be refcount_incr'd and use
>> @@ -4997,13 +4997,20 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>> if (kptr_field->type == BPF_KPTR_UNREF)
>> perm_flags |= PTR_UNTRUSTED;
>>
>> + if (kptr_field->type == BPF_KPTR_PERCPU_REF)
>> + perm_flags |= MEM_PERCPU | MEM_ALLOC;
>> +
>
> I think just this would permit PTR_TO_BTF_ID | MEM_ALLOC for percpu kptr?
> It would probably be good to include negative selftests for kptr_xchg
> type matching with percpu_kptr to prevent things like these.
>
> Alexei already said map_kptr_match_type is not being invoked for
> MEM_ALLOC kptr_xchg, so that is also an existing bug.
I will fix that bug first; this part of the change will probably not be
needed any more.
>
>> if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
>> goto bad_type;
>>
>> [...]
>> /* We need to verify reg->type and reg->btf, before accessing reg->btf */
>> reg_name = btf_type_name(reg->btf, reg->btf_id);
>>
>> @@ -5084,7 +5091,17 @@ static bool rcu_safe_kptr(const struct btf_field *field)
>> {
>> const struct btf_field_kptr *kptr = &field->kptr;
>>
>> - return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
>> + return field->type == BPF_KPTR_PERCPU_REF ||
>> + (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
>> +}
>> +
>> +static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
>> +{
>> + if (!rcu_safe_kptr(kptr_field) || !in_rcu_cs(env))
>> + return PTR_MAYBE_NULL | PTR_UNTRUSTED;
>> + if (kptr_field->type != BPF_KPTR_PERCPU_REF)
>> + return PTR_MAYBE_NULL | MEM_RCU;
>> + return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
>
> The inverted conditions are a bit hard to follow. Maybe better to
> explicitly check for both RCU cases, and default to untrusted
> otherwise?
Okay. Will do.
>
>> }
>>
>> [...]
>>
* Re: [PATCH bpf-next 04/15] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
2023-08-19 1:01 ` Alexei Starovoitov
@ 2023-08-20 4:16 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 4:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 6:01 PM, Alexei Starovoitov wrote:
> On Mon, Aug 14, 2023 at 10:28:30AM -0700, Yonghong Song wrote:
>> The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
>> for allocated percpu objects. For an allocated percpu obj,
>> the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'.
>>
>> The return type for these two re-purposed helpers is
>> 'PTR_TO_MEM | MEM_RCU | MEM_ALLOC'.
>> The MEM_ALLOC flag allows the per-cpu data to be read and written.
>>
>> Since the memory allocator bpf_mem_alloc() returns
>> a ptr to a percpu ptr for percpu data, the first argument
>> of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched
>> with a dereference before passing to the helper func.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> include/linux/bpf_verifier.h | 1 +
>> kernel/bpf/verifier.c | 51 ++++++++++++++++++++++++++++++------
>> 2 files changed, 44 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index f70f9ac884d2..e23480db37ec 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -480,6 +480,7 @@ struct bpf_insn_aux_data {
>> bool zext_dst; /* this insn zero extends dst reg */
>> bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
>> bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
>> + bool percpu_ptr_prog_alloc; /* {this,per}_cpu_ptr() with prog alloc */
>> u8 alu_state; /* used in combination with alu_limit */
>>
>> /* below fields are initialized once */
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a985fbf18a11..6fc200cb68b6 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -6221,7 +6221,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>> }
>>
>> if (type_is_alloc(reg->type) && !type_is_non_owning_ref(reg->type) &&
>> - !reg->ref_obj_id) {
>> + !(reg->type & MEM_RCU) && !reg->ref_obj_id) {
>> verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
>> return -EFAULT;
>> }
>> @@ -7765,6 +7765,7 @@ static const struct bpf_reg_types btf_ptr_types = {
>> static const struct bpf_reg_types percpu_btf_ptr_types = {
>> .types = {
>> PTR_TO_BTF_ID | MEM_PERCPU,
>> + PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU,
>> PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
>> }
>> };
>> @@ -7945,6 +7946,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>> return -EACCES;
>> break;
>> case PTR_TO_BTF_ID | MEM_PERCPU:
>> + case PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU:
>> case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
>> /* Handled by helper specific checks */
>> break;
>> @@ -8287,6 +8289,16 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
>> verbose(env, "Helper has invalid btf_id in R%d\n", regno);
>> return -EACCES;
>> }
>> + if (reg->type & MEM_RCU) {
>> + const struct btf_type *type = btf_type_by_id(reg->btf, reg->btf_id);
>> +
>> + if (!type || !btf_type_is_struct(type)) {
>> + verbose(env, "Helper has invalid btf/btf_id in R%d\n", regno);
>> + return -EFAULT;
>> + }
>> + env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc = true;
>> + }
>
> Let's move this check out of check_func_arg() and do it in check_helper_call() after the loop.
> meta has what we need.
> Also I would do it only for specific helpers like:
> case BPF_FUNC_per_cpu_ptr:
> case BPF_FUNC_this_cpu_ptr:
> if (reg[R1]->type & MEM_RCU) {
> const struct btf_type *type = btf_type_by_id(meta->ret_btf, ...)
> ...
> returns_cpu_specific_alloc_ptr = true;
> insn_aux_data[].call_with_percpu_alloc_ptr = true;
> }
This is more explicit and easier to understand. Will use the above
in v2.
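I.e., roughly like this after the argument-checking loop (a sketch of what
I plan for v2):

  switch (func_id) {
  case BPF_FUNC_per_cpu_ptr:
  case BPF_FUNC_this_cpu_ptr:
          reg = &regs[BPF_REG_1];
          if (reg->type & MEM_RCU) {
                  const struct btf_type *type = btf_type_by_id(reg->btf, reg->btf_id);

                  if (!type || !btf_type_is_struct(type)) {
                          verbose(env, "Helper has invalid btf/btf_id in R1\n");
                          return -EFAULT;
                  }
                  returns_cpu_specific_alloc_ptr = true;
                  env->insn_aux_data[insn_idx].call_with_percpu_alloc_ptr = true;
          }
          break;
  default:
          break;
  }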
>> +
>> meta->ret_btf = reg->btf;
>> meta->ret_btf_id = reg->btf_id;
>> break;
>> @@ -9888,14 +9900,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>> regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
>> regs[BPF_REG_0].mem_size = tsize;
>> } else {
>> - /* MEM_RDONLY may be carried from ret_flag, but it
>> - * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
>> - * it will confuse the check of PTR_TO_BTF_ID in
>> - * check_mem_access().
>> - */
>> - ret_flag &= ~MEM_RDONLY;
>> + if (env->insn_aux_data[insn_idx].percpu_ptr_prog_alloc) {
>
> and here I would only check the local bool returns_cpu_specific_alloc_ptr,
>
> because returns_cpu_specific_alloc_ptr is not the same as call_with_percpu_alloc_ptr.
>
> A this_cpu_read(), for example, would be a call_with_percpu_alloc_ptr, but it
> wouldn't return a cpu-specific pointer; it would deref the cpu-specific pointer
> and return the value.
> Of course, we don't have such a helper today, but it's worth thinking ahead.
Agree. percpu_ptr_prog_alloc is a bad name. I actually spent some time
on how to name it but did not come up with a good one. Will use
returns_cpu_specific_alloc_ptr as you suggested.
>
>> + regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU;
>> + } else {
>> + /* MEM_RDONLY may be carried from ret_flag, but it
>> + * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
>> + * it will confuse the check of PTR_TO_BTF_ID in
>> + * check_mem_access().
>> + */
>> + ret_flag &= ~MEM_RDONLY;
>> + regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
>> + }
>>
>> - regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
>> regs[BPF_REG_0].btf = meta.ret_btf;
>> regs[BPF_REG_0].btf_id = meta.ret_btf_id;
>> }
>> @@ -18646,6 +18662,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>> goto patch_call_imm;
>> }
>>
>> + /* bpf_per_cpu_ptr() and bpf_this_cpu_ptr() */
>> + if (env->insn_aux_data[i + delta].percpu_ptr_prog_alloc) {
>
> call_with_percpu_alloc_ptr
>
>> + /* patch with 'r1 = *(u64 *)(r1 + 0)' since for percpu data,
>> + * bpf_mem_alloc() returns a ptr to the percpu data ptr.
>> + */
>> + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
>
> here R1 is no longer a concern, since we check R1 only in check_helper_call.
>
>> + insn_buf[1] = *insn;
>> + cnt = 2;
>> +
>> + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>> + if (!new_prog)
>> + return -ENOMEM;
>> +
>> + delta += cnt - 1;
>> + env->prog = prog = new_prog;
>> + insn = new_prog->insnsi + i + delta;
>> + goto patch_call_imm;
>> + }
>> +
>> /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
>> * and other inlining handlers are currently limited to 64 bit
>> * only.
>> --
>> 2.34.1
>>
>
* Re: [PATCH bpf-next 09/15] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
2023-08-19 1:44 ` Kumar Kartikeya Dwivedi
@ 2023-08-20 4:19 ` Yonghong Song
0 siblings, 0 replies; 32+ messages in thread
From: Yonghong Song @ 2023-08-20 4:19 UTC (permalink / raw)
To: Kumar Kartikeya Dwivedi
Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
kernel-team, Martin KaFai Lau
On 8/18/23 6:44 PM, Kumar Kartikeya Dwivedi wrote:
> On Mon, 14 Aug 2023 at 23:00, Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>> In previous selftests/bpf patch, we have
>> p = bpf_percpu_obj_new(struct val_t);
>> if (!p)
>> goto out;
>>
>> p1 = bpf_kptr_xchg(&e->pc, p);
>> if (p1) {
>> /* race condition */
>> bpf_percpu_obj_drop(p1);
>> }
>>
>> p = e->pc;
>> if (!p)
>> goto out;
>>
>> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
>> This is because the second argument of bpf_kptr_xchg() is marked
>> OBJ_RELEASE and will be marked as invalid after the call.
>> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
>> and the bpf program needs to reread from the map value.
>>
>> This patch checks if the 'p' has type MEM_ALLOC and MEM_PERCPU,
>> and if 'p' is RCU protected. If this is the case, 'p' can be marked
>> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is not
>> an owning reference any more. Such a change makes re-read
>> from the map value unnecessary.
>>
>> Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
>> a different value from 'p' if, immediately before 'p = e->pc',
>> another cpu does another bpf_kptr_xchg() and swaps another value
>> into 'e->pc'. If this is the case, then 'p = e->pc' may
>> get either 'p' or another value, and race condition already exists.
>> So removing direct re-reading seems fine too.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>> kernel/bpf/verifier.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6fc200cb68b6..6fa458e13bfc 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8854,8 +8854,15 @@ static int release_reference(struct bpf_verifier_env *env,
>> return err;
>>
>> bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>> - if (reg->ref_obj_id == ref_obj_id)
>> - mark_reg_invalid(env, reg);
>> + if (reg->ref_obj_id == ref_obj_id) {
>> + if (in_rcu_cs(env) && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
>
> Wouldn't this check also be true in case of bpf_percpu_obj_drop(p)
> inside RCU CS/non-sleepable prog?
> Do we want to permit access to p after drop in that case? I think it
> will be a bit unintuitive.
> I think we should preserve normal behavior for everything except for
> kptr_xchg of a percpu_kptr.
You are correct. The above condition also applies to bpf_percpu_obj_drop(),
and we should change MEM_ALLOC to MEM_RCU only for
bpf_kptr_xchg() of a percpu kptr. Will fix.
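With the conversion restricted like that, the pattern from the commit log
can then drop the re-read, e.g.:

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
          goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1) {
          /* race condition */
          bpf_percpu_obj_drop(p1);
  }
  /* 'p' is now a non-owning MEM_RCU pointer instead of an invalid
   * scalar, so it can be used directly; no 'p = e->pc;' re-read
   */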
>
>> + reg->ref_obj_id = 0;
>> + reg->type &= ~MEM_ALLOC;
>> + reg->type |= MEM_RCU;
>> + } else {
>> + mark_reg_invalid(env, reg);
>> + }
>> + }
>> }));
>>
>> return 0;
>> --
>> 2.34.1
>>
>>