bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
@ 2025-02-20 13:45 Maciej Fijalkowski
  2025-02-20 13:45 ` [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally Maciej Fijalkowski
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-20 13:45 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau, Maciej Fijalkowski

Hi!

This patchset provides what is needed to store skbs as kptrs in bpf
maps. We start with the necessary kernel change, as discussed with
Martin at [0]; the next patch adds kfuncs for handling the skb
refcount, and on top of that a test case is added where one program
stores skbs and a second program retrieves them from the map.

Martin, regarding the kernel change, I decided to go with a boolean
approach instead of what you initially suggested. Let me know if it
works for you.

Thanks,
Maciej

[0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/

Maciej Fijalkowski (3):
  bpf: call btf_is_projection_of() conditionally
  bpf: add kfunc for skb refcounting
  selftests: bpf: implement test case for skb kptr map storage

 include/linux/btf.h                           |  4 +-
 kernel/bpf/btf.c                              | 11 +--
 kernel/bpf/verifier.c                         |  3 +-
 net/core/filter.c                             | 62 +++++++++++++++
 .../selftests/bpf/prog_tests/skb_map_kptrs.c  | 75 ++++++++++++++++++
 .../selftests/bpf/progs/skb_map_kptrs.c       | 77 +++++++++++++++++++
 6 files changed, 224 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/skb_map_kptrs.c
 create mode 100644 tools/testing/selftests/bpf/progs/skb_map_kptrs.c

-- 
2.43.0



* [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally
  2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
@ 2025-02-20 13:45 ` Maciej Fijalkowski
  2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-20 13:45 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau, Maciej Fijalkowski

Currently, when sk_buff is used as a kfunc argument, the verifier
treats it as a pointer to context (KF_ARG_PTR_TO_CTX), but in the
kfuncs that are going to be introduced for sk_buff refcount handling we
want it to be interpreted as KF_ARG_PTR_TO_BTF_ID.

Make this possible by calling btf_is_projection_of() conditionally.
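For illustration, the projection check that this patch makes conditional can be modeled in plain userspace C. This is a simplified sketch, not the kernel code: the kernel's btf_is_projection_of() compares BTF type names, and only the __sk_buff/sk_buff and __xdp_md/xdp_md UAPI-projection pairs are shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Userspace model (assumption: mirrors the spirit of the kernel's
 * btf_is_projection_of(), which recognizes the UAPI context structs
 * __sk_buff and __xdp_md as "projections" of the kernel-internal
 * types sk_buff and xdp_md). */
static bool is_projection_of(const char *pname, const char *tname)
{
	if (strcmp(pname, "__sk_buff") == 0)
		return strcmp(tname, "sk_buff") == 0;
	if (strcmp(pname, "__xdp_md") == 0)
		return strcmp(tname, "xdp_md") == 0;
	return false;
}
```

With check_proj == false, the verifier skips this name-projection shortcut, so a kfunc taking struct sk_buff * is resolved as KF_ARG_PTR_TO_BTF_ID rather than as the program context.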

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 include/linux/btf.h   |  4 ++--
 kernel/bpf/btf.c      | 11 ++++++-----
 kernel/bpf/verifier.c |  3 ++-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index ebc0c0c9b944..1307ea17542a 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -585,7 +585,7 @@ struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf, u32 btf_id);
 bool btf_is_projection_of(const char *pname, const char *tname);
 bool btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 			   const struct btf_type *t, enum bpf_prog_type prog_type,
-			   int arg);
+			   int arg, bool check_proj);
 int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type);
 bool btf_types_are_same(const struct btf *btf1, u32 id1,
 			const struct btf *btf2, u32 id2);
@@ -661,7 +661,7 @@ static inline struct btf_struct_meta *btf_find_struct_meta(const struct btf *btf
 static inline bool
 btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 		     const struct btf_type *t, enum bpf_prog_type prog_type,
-		     int arg)
+		     int arg, bool check_proj)
 {
 	return false;
 }
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 69f5752e880b..62bdf6980cfb 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5909,7 +5909,7 @@ bool btf_is_projection_of(const char *pname, const char *tname)
 
 bool btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 			  const struct btf_type *t, enum bpf_prog_type prog_type,
-			  int arg)
+			  int arg, bool check_proj)
 {
 	const struct btf_type *ctx_type;
 	const char *tname, *ctx_tname;
@@ -5969,8 +5969,9 @@ bool btf_is_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 	 * int socket_filter_bpf_prog(struct __sk_buff *skb)
 	 * { // no fields of skb are ever used }
 	 */
-	if (btf_is_projection_of(ctx_tname, tname))
-		return true;
+	if (check_proj)
+		if (btf_is_projection_of(ctx_tname, tname))
+			return true;
 	if (strcmp(ctx_tname, tname)) {
 		/* bpf_user_pt_regs_t is a typedef, so resolve it to
 		 * underlying struct and check name again
@@ -6133,7 +6134,7 @@ static int btf_translate_to_vmlinux(struct bpf_verifier_log *log,
 				     enum bpf_prog_type prog_type,
 				     int arg)
 {
-	if (!btf_is_prog_ctx_type(log, btf, t, prog_type, arg))
+	if (!btf_is_prog_ctx_type(log, btf, t, prog_type, arg, true))
 		return -ENOENT;
 	return find_kern_ctx_type_id(prog_type);
 }
@@ -7739,7 +7740,7 @@ int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog)
 		if (!btf_type_is_ptr(t))
 			goto skip_pointer;
 
-		if ((tags & ARG_TAG_CTX) || btf_is_prog_ctx_type(log, btf, t, prog_type, i)) {
+		if ((tags & ARG_TAG_CTX) || btf_is_prog_ctx_type(log, btf, t, prog_type, i, true)) {
 			if (tags & ~ARG_TAG_CTX) {
 				bpf_log(log, "arg#%d has invalid combination of tags\n", i);
 				return -EINVAL;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e57b7c949860..6492bfa4bc7a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11997,7 +11997,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	 * type to our caller. When a set of conditions hold in the BTF type of
 	 * arguments, we resolve it to a known kfunc_ptr_arg_type.
 	 */
-	if (btf_is_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog), argno))
+	if (btf_is_prog_ctx_type(&env->log, meta->btf, t, resolve_prog_type(env->prog),
+				 argno, false))
 		return KF_ARG_PTR_TO_CTX;
 
 	if (is_kfunc_arg_nullable(meta->btf, &args[argno]) && register_is_null(reg))
-- 
2.43.0



* [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
  2025-02-20 13:45 ` [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally Maciej Fijalkowski
@ 2025-02-20 13:45 ` Maciej Fijalkowski
  2025-02-20 23:25   ` Amery Hung
  2025-02-21 23:57   ` Amery Hung
  2025-02-20 13:45 ` [PATCH bpf-next 3/3] selftests: bpf: implement test case for skb kptr map storage Maciej Fijalkowski
  2025-02-22  1:55 ` [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Alexei Starovoitov
  3 siblings, 2 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-20 13:45 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau, Maciej Fijalkowski

These have mostly been taken from Amery Hung's work on the bpf qdisc
implementation. bpf_skb_{acquire,release}() increment/decrement
sk_buff::users, whereas bpf_skb_destroy() is called for map entries
that have not been released when the map is being wiped from the
system.
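The intended refcount contract can be sketched with a small userspace model. This is an illustration only: struct fake_skb and the helper names are made up, and the real kfuncs operate on sk_buff::users via refcount_inc_not_zero()/skb_unref(), which are atomic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace model of the refcount contract (illustration only;
 * fake_skb is a made-up, non-atomic stand-in for struct sk_buff). */
struct fake_skb {
	int users;   /* models sk_buff::users */
	bool freed;  /* set once the last reference is dropped */
};

/* models bpf_skb_acquire(): take a reference only if one is still held */
static struct fake_skb *skb_acquire(struct fake_skb *skb)
{
	if (skb->users == 0)
		return NULL;  /* refcount_inc_not_zero() would fail */
	skb->users++;
	return skb;
}

/* models bpf_skb_release()/skb_unref(): drop one reference */
static void skb_release(struct fake_skb *skb)
{
	if (--skb->users == 0)
		skb->freed = true;  /* last user gone; skb may be freed */
}
```

A program that acquires an skb and does not stash it in a map must pair the acquire with a release, which is exactly what the KF_ACQUIRE/KF_RELEASE flags below let the verifier enforce.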

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 2ec162dd83c4..9bd2701be088 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
 
 __bpf_kfunc_end_defs();
 
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
+ * kfunc that is not stored in a map as a kptr must be released by calling
+ * bpf_skb_release().
+ * @skb: The skb on which a reference is being acquired.
+ */
+__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
+{
+	if (refcount_inc_not_zero(&skb->users))
+		return skb;
+	return NULL;
+}
+
+/* bpf_skb_release - Release the reference acquired on an skb.
+ * @skb: The skb on which a reference is being released.
+ */
+__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
+{
+	skb_unref(skb);
+}
+
+/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
+ * an allocated object or a map.
+ * @skb: The skb on which a reference is being released.
+ */
+__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
+{
+	(void)skb_unref(skb);
+	consume_skb(skb);
+}
+
+__diag_pop();
+
+BTF_KFUNCS_START(skb_kfunc_btf_ids)
+BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
+BTF_KFUNCS_END(skb_kfunc_btf_ids)
+
+static const struct btf_kfunc_id_set skb_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &skb_kfunc_btf_ids,
+};
+
+BTF_ID_LIST(skb_kfunc_dtor_ids)
+BTF_ID(struct, sk_buff)
+BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
+
 int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
 			       struct bpf_dynptr *ptr__uninit)
 {
@@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
 
 static int __init bpf_kfunc_init(void)
 {
+	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
+		{
+			.btf_id       = skb_kfunc_dtor_ids[0],
+			.kfunc_btf_id = skb_kfunc_dtor_ids[1]
+		},
+	};
+
 	int ret;
 
 	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
@@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
+
+	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
+						 ARRAY_SIZE(skb_kfunc_dtors),
+						 THIS_MODULE);
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
 }
 late_initcall(bpf_kfunc_init);
-- 
2.43.0



* [PATCH bpf-next 3/3] selftests: bpf: implement test case for skb kptr map storage
  2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
  2025-02-20 13:45 ` [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally Maciej Fijalkowski
  2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
@ 2025-02-20 13:45 ` Maciej Fijalkowski
  2025-02-22  1:55 ` [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Alexei Starovoitov
  3 siblings, 0 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-20 13:45 UTC (permalink / raw)
  To: bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau, Maciej Fijalkowski

Add a test case to exercise kptr behavior against sk_buff structs
stored in an eBPF map.

There are two programs, one that stores sk_buffs and one that retrieves
them from the pinned map. Load the first prog and run it as many times
as the map size, so that the second program sees a full map and is able
to retrieve each sk_buff that bpf_prog_test_run_skb() created for us.

The reason for running the progs MAX_ENTRIES times from user space,
instead of utilizing the @repeat argument, is that we would like unique
skbs to be handled in the map. With @repeat, the same skb would be
stored every time.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 .../selftests/bpf/prog_tests/skb_map_kptrs.c  | 75 ++++++++++++++++++
 .../selftests/bpf/progs/skb_map_kptrs.c       | 77 +++++++++++++++++++
 2 files changed, 152 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/skb_map_kptrs.c
 create mode 100644 tools/testing/selftests/bpf/progs/skb_map_kptrs.c

diff --git a/tools/testing/selftests/bpf/prog_tests/skb_map_kptrs.c b/tools/testing/selftests/bpf/prog_tests/skb_map_kptrs.c
new file mode 100644
index 000000000000..993beac6c344
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/skb_map_kptrs.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+#define SKB_KPTR_MAP_PATH "/sys/fs/bpf/skb_kptr_map"
+
+static void skb_map_kptrs(void)
+{
+	int err, prog_fd, store_fd, get_fd, map_fd;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	char buff[128] = {};
+	struct bpf_map *map;
+	int i;
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = buff,
+		.data_size_in = sizeof(buff),
+		.repeat = 1,
+	);
+
+	err = bpf_prog_test_load("skb_map_kptrs.bpf.o", BPF_PROG_TYPE_SCHED_CLS, &obj,
+				 &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	map = bpf_object__find_map_by_name(obj, "skb_map");
+	if (CHECK_FAIL(!map))
+		goto map_err;
+
+	map_fd = bpf_map__fd(map);
+	if (map_fd < 0)
+		goto map_err;
+
+	err = bpf_obj_pin(map_fd, SKB_KPTR_MAP_PATH);
+	if (err < 0)
+		goto map_err;
+
+	prog = bpf_object__find_program_by_name(obj, "tc_skb_map_store");
+	if (CHECK_FAIL(!prog))
+		goto out;
+
+	store_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(store_fd < 0))
+		goto out;
+
+	// store skbs
+	for (i = 0; i < bpf_map__max_entries(map); i++) {
+		err = bpf_prog_test_run_opts(store_fd, &topts);
+		ASSERT_OK(err, "skb kptr store");
+	}
+
+	prog = bpf_object__find_program_by_name(obj, "tc_skb_map_get");
+	if (CHECK_FAIL(!prog))
+		goto out;
+
+	get_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(get_fd < 0))
+		goto out;
+
+	// get skbs
+	for (i = 0; i < bpf_map__max_entries(map); i++) {
+		err = bpf_prog_test_run_opts(get_fd, &topts);
+		ASSERT_OK(err, "skb kptr get");
+	}
+
+out:
+	unlink(SKB_KPTR_MAP_PATH);
+map_err:
+	bpf_object__close(obj);
+}
+
+void test_skb_map_kptrs(void)
+{
+	skb_map_kptrs();
+}
+
diff --git a/tools/testing/selftests/bpf/progs/skb_map_kptrs.c b/tools/testing/selftests/bpf/progs/skb_map_kptrs.c
new file mode 100644
index 000000000000..f4972978cb04
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/skb_map_kptrs.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+void *bpf_cast_to_kern_ctx(void *) __ksym;
+struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym;
+void bpf_skb_release(struct sk_buff *skb) __ksym;
+
+struct skb_map_val {
+	struct sk_buff __kptr * skb;
+};
+
+static __u32 get_idx;
+static __u32 store_idx;
+
+#define MAX_ENTRIES 100
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, __u32);
+	__type(value, struct skb_map_val);
+	__uint(max_entries, MAX_ENTRIES);
+} skb_map SEC(".maps");
+
+static __always_inline __u32 idx_bump(__u32 idx)
+{
+	return idx + 1 >= MAX_ENTRIES ? 0 : idx + 1;
+}
+
+SEC("tc") int tc_skb_map_store(struct __sk_buff *ctx)
+{
+	struct sk_buff *skbk = bpf_cast_to_kern_ctx(ctx);
+	struct skb_map_val *map_entry, tmp_entry;
+	struct sk_buff *tmp;
+
+	tmp_entry.skb = NULL;
+	bpf_map_update_elem(&skb_map, &store_idx, &tmp_entry, BPF_ANY);
+	map_entry = bpf_map_lookup_elem(&skb_map, &store_idx);
+	if (!map_entry)
+		return -1;
+
+	skbk = bpf_skb_acquire(skbk);
+	if (!skbk)
+		return -2;
+
+	tmp = bpf_kptr_xchg(&map_entry->skb, skbk);
+	if (tmp)
+		bpf_skb_release(tmp);
+
+	store_idx = idx_bump(store_idx);
+
+	return 0;
+}
+
+SEC("tc") int tc_skb_map_get(struct __sk_buff *ctx)
+{
+	struct sk_buff *stored_skb = NULL;
+	struct skb_map_val *map_entry;
+	struct sk_buff *tmp = NULL;
+
+	(void)ctx;
+
+	map_entry = bpf_map_lookup_elem(&skb_map, &get_idx);
+	if (!map_entry)
+		return -1;
+
+	stored_skb = bpf_kptr_xchg(&map_entry->skb, tmp);
+	if (!stored_skb)
+		return -2;
+
+	bpf_skb_release(stored_skb);
+	get_idx = idx_bump(get_idx);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.43.0



* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
@ 2025-02-20 23:25   ` Amery Hung
  2025-02-21 14:56     ` Maciej Fijalkowski
  2025-02-21 23:57   ` Amery Hung
  1 sibling, 1 reply; 14+ messages in thread
From: Amery Hung @ 2025-02-20 23:25 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau



On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> These have been mostly taken from Amery Hung's work related to bpf qdisc
> implementation. bpf_skb_{acquire,release}() are for increment/decrement
> sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> have not been released and map is being wiped out from system.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2ec162dd83c4..9bd2701be088 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
>   
>   __bpf_kfunc_end_defs();
>   
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in vmlinux BTF");
> +
> +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> + * kfunc which is not stored in a map as a kptr, must be released by calling
> + * bpf_skb_release().
> + * @skb: The skb on which a reference is being acquired.
> + */
> +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> +{
> +	if (refcount_inc_not_zero(&skb->users))
> +		return skb;
> +	return NULL;
> +}
> +
> +/* bpf_skb_release - Release the reference acquired on an skb.
> + * @skb: The skb on which a reference is being released.
> + */
> +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> +{
> +	skb_unref(skb);
> +}
> +
> +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> + * an allocated object or a map.
> + * @skb: The skb on which a reference is being released.
> + */
> +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> +{
> +	(void)skb_unref(skb);
> +	consume_skb(skb);
> +}
> +
> +__diag_pop();
> +
> +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> +
> +static const struct btf_kfunc_id_set skb_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &skb_kfunc_btf_ids,
> +};
> +
> +BTF_ID_LIST(skb_kfunc_dtor_ids)
> +BTF_ID(struct, sk_buff)
> +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> +
>   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
>   			       struct bpf_dynptr *ptr__uninit)
>   {
> @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
>   
>   static int __init bpf_kfunc_init(void)
>   {
> +	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> +		{
> +			.btf_id       = skb_kfunc_dtor_ids[0],
> +			.kfunc_btf_id = skb_kfunc_dtor_ids[1]
> +		},
> +	};
> +
>   	int ret;
>   
>   	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
>   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
>   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
>   					       &bpf_kfunc_set_sock_addr);
> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> +
> +	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> +						 ARRAY_SIZE(skb_kfunc_dtors),
> +						 THIS_MODULE);

I think we will need to deal with two versions of the skb dtor here.
Both qdisc and cls will register a dtor associated with skb. The qdisc
one just calls kfree_skb(). However, only one dtor can exist for a
specific btf id in the kernel, if I understand correctly. Is it
possible to have one that works for both use cases?

>   	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
>   }
>   late_initcall(bpf_kfunc_init);



* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-20 23:25   ` Amery Hung
@ 2025-02-21 14:56     ` Maciej Fijalkowski
  2025-02-21 19:11       ` Amery Hung
  0 siblings, 1 reply; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-21 14:56 UTC (permalink / raw)
  To: Amery Hung; +Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, martin.lau

On Thu, Feb 20, 2025 at 03:25:03PM -0800, Amery Hung wrote:
> 
> 
> On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> > These have been mostly taken from Amery Hung's work related to bpf qdisc
> > implementation. bpf_skb_{acquire,release}() are for increment/decrement
> > sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> > have not been released and map is being wiped out from system.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 62 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 2ec162dd83c4..9bd2701be088 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
> >   __bpf_kfunc_end_defs();
> > +__diag_push();
> > +__diag_ignore_all("-Wmissing-prototypes",
> > +		  "Global functions as their definitions will be in vmlinux BTF");
> > +
> > +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> > + * kfunc which is not stored in a map as a kptr, must be released by calling
> > + * bpf_skb_release().
> > + * @skb: The skb on which a reference is being acquired.
> > + */
> > +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> > +{
> > +	if (refcount_inc_not_zero(&skb->users))
> > +		return skb;
> > +	return NULL;
> > +}
> > +
> > +/* bpf_skb_release - Release the reference acquired on an skb.
> > + * @skb: The skb on which a reference is being released.
> > + */
> > +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> > +{
> > +	skb_unref(skb);
> > +}
> > +
> > +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> > + * an allocated object or a map.
> > + * @skb: The skb on which a reference is being released.
> > + */
> > +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> > +{
> > +	(void)skb_unref(skb);
> > +	consume_skb(skb);
> > +}
> > +
> > +__diag_pop();
> > +
> > +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> > +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> > +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> > +
> > +static const struct btf_kfunc_id_set skb_kfunc_set = {
> > +	.owner = THIS_MODULE,
> > +	.set   = &skb_kfunc_btf_ids,
> > +};
> > +
> > +BTF_ID_LIST(skb_kfunc_dtor_ids)
> > +BTF_ID(struct, sk_buff)
> > +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> > +
> >   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> >   			       struct bpf_dynptr *ptr__uninit)
> >   {
> > @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
> >   static int __init bpf_kfunc_init(void)
> >   {
> > +	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> > +		{
> > +			.btf_id       = skb_kfunc_dtor_ids[0],
> > +			.kfunc_btf_id = skb_kfunc_dtor_ids[1]
> > +		},
> > +	};
> > +
> >   	int ret;
> >   	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> > @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
> >   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
> >   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> >   					       &bpf_kfunc_set_sock_addr);
> > +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> > +
> > +	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> > +						 ARRAY_SIZE(skb_kfunc_dtors),
> > +						 THIS_MODULE);
> 
> I think we will need to deal with two versions of skb dtors here. Both qdisc
> and cls will register dtor associated for skb. The qdisc one just call
> kfree_skb(). While only one can exist for a specific btf id in the kernel if
> I understand correctly. Is it possible to have one that work
> for both use cases?

Looking at the current code, it seems bpf_find_btf_id() (which
btf_parse_kptr() calls) will go through modules and return the first
match against the sk_buff btf, but that's currently a wild guess from
my side. So your concern stands, as we have no mechanism that would
distinguish the dtors for the same btf id.

I would have to take a deeper look at btf_parse_kptr() and find some
way to associate the dtor with its module during registration and then
use it within btf_find_dtor_kfunc(). Would that be sufficient?

> 
> >   	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> >   }
> >   late_initcall(bpf_kfunc_init);
> 


* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-21 14:56     ` Maciej Fijalkowski
@ 2025-02-21 19:11       ` Amery Hung
  2025-02-21 20:17         ` Maciej Fijalkowski
  0 siblings, 1 reply; 14+ messages in thread
From: Amery Hung @ 2025-02-21 19:11 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, martin.lau

On Fri, Feb 21, 2025 at 6:57 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Thu, Feb 20, 2025 at 03:25:03PM -0800, Amery Hung wrote:
> >
> >
> > On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> > > These have been mostly taken from Amery Hung's work related to bpf qdisc
> > > implementation. bpf_skb_{acquire,release}() are for increment/decrement
> > > sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> > > have not been released and map is being wiped out from system.
> > >
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 62 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 2ec162dd83c4..9bd2701be088 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
> > >   __bpf_kfunc_end_defs();
> > > +__diag_push();
> > > +__diag_ignore_all("-Wmissing-prototypes",
> > > +             "Global functions as their definitions will be in vmlinux BTF");
> > > +
> > > +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> > > + * kfunc which is not stored in a map as a kptr, must be released by calling
> > > + * bpf_skb_release().
> > > + * @skb: The skb on which a reference is being acquired.
> > > + */
> > > +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> > > +{
> > > +   if (refcount_inc_not_zero(&skb->users))
> > > +           return skb;
> > > +   return NULL;
> > > +}
> > > +
> > > +/* bpf_skb_release - Release the reference acquired on an skb.
> > > + * @skb: The skb on which a reference is being released.
> > > + */
> > > +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> > > +{
> > > +   skb_unref(skb);
> > > +}
> > > +
> > > +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> > > + * an allocated object or a map.
> > > + * @skb: The skb on which a reference is being released.
> > > + */
> > > +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> > > +{
> > > +   (void)skb_unref(skb);
> > > +   consume_skb(skb);
> > > +}
> > > +
> > > +__diag_pop();
> > > +
> > > +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> > > +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> > > +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> > > +
> > > +static const struct btf_kfunc_id_set skb_kfunc_set = {
> > > +   .owner = THIS_MODULE,
> > > +   .set   = &skb_kfunc_btf_ids,
> > > +};
> > > +
> > > +BTF_ID_LIST(skb_kfunc_dtor_ids)
> > > +BTF_ID(struct, sk_buff)
> > > +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> > > +
> > >   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > >                            struct bpf_dynptr *ptr__uninit)
> > >   {
> > > @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
> > >   static int __init bpf_kfunc_init(void)
> > >   {
> > > +   const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> > > +           {
> > > +                   .btf_id       = skb_kfunc_dtor_ids[0],
> > > +                   .kfunc_btf_id = skb_kfunc_dtor_ids[1]
> > > +           },
> > > +   };
> > > +
> > >     int ret;
> > >     ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> > > @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
> > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
> > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > >                                            &bpf_kfunc_set_sock_addr);
> > > +   ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> > > +
> > > +   ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> > > +                                            ARRAY_SIZE(skb_kfunc_dtors),
> > > +                                            THIS_MODULE);
> >
> > I think we will need to deal with two versions of skb dtors here. Both qdisc
> > and cls will register dtor associated for skb. The qdisc one just call
> > kfree_skb(). While only one can exist for a specific btf id in the kernel if
> > I understand correctly. Is it possible to have one that work
> > for both use cases?
>
> Looking at the current code it seems bpf_find_btf_id() (which
> btf_parse_kptr() calls) will go through modules and return the first match
> against sk_buff btf but that's currently a wild guess from my side. So
> your concern stands as we have no mechanism that would distinguish the
> dtors for same btf id.
>
> I would have to take a deeper look at btf_parse_kptr() and find some way
> to associate dtor with its module during registering and then use it
> within btf_find_dtor_kfunc(). Would this be sufficient?
>

That might not be enough. Ultimately, if the user configures both
modules to be built-in, then there is no way to tell where a trusted
skb kptr in a bpf program came from.

Two possible ways to solve this:

1. Make the dtor able to tell whether the skb is from qdisc or cls.
Since we are both in the TC layer, maybe we can use skb->cb for this?

2. Associate KF_ACQUIRE kfuncs with the corresponding KF_RELEASE
kfuncs somehow, and carry this additional information as the kptr
propagates through the bpf world so that we know which dtor to call.
This seems overly complicated.
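For what it's worth, option 1 could be sketched roughly as below, modeled in userspace. This is purely hypothetical: the owner enum, the cb layout, and the helper names are all invented for illustration, and whether skb->cb is actually free at dtor time would need to be verified.

```c
#include <string.h>

/* Hypothetical sketch of option 1: a tag written into the skb->cb
 * scratch area at acquire time lets a single shared dtor pick the
 * right free path. All names here are made up. */
enum skb_bpf_owner { SKB_OWNER_CLS = 1, SKB_OWNER_QDISC = 2 };

struct model_skb {
	char cb[48];   /* models sk_buff::cb */
	int consumed;  /* counts consume_skb()-style frees (model only) */
	int kfreed;    /* counts kfree_skb()-style frees (model only) */
};

static void tag_owner(struct model_skb *skb, enum skb_bpf_owner who)
{
	memcpy(skb->cb, &who, sizeof(who));
}

/* one dtor for both users, dispatching on the stored tag */
static void skb_shared_dtor(struct model_skb *skb)
{
	enum skb_bpf_owner who;

	memcpy(&who, skb->cb, sizeof(who));
	if (who == SKB_OWNER_QDISC)
		skb->kfreed++;    /* qdisc path: kfree_skb() */
	else
		skb->consumed++;  /* cls path: consume_skb() */
}
```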


> >
> > >     return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > >   }
> > >   late_initcall(bpf_kfunc_init);
> >


* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-21 19:11       ` Amery Hung
@ 2025-02-21 20:17         ` Maciej Fijalkowski
  2025-02-21 23:40           ` Amery Hung
  0 siblings, 1 reply; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-21 20:17 UTC (permalink / raw)
  To: Amery Hung; +Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, martin.lau

On Fri, Feb 21, 2025 at 11:11:27AM -0800, Amery Hung wrote:
> On Fri, Feb 21, 2025 at 6:57 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Thu, Feb 20, 2025 at 03:25:03PM -0800, Amery Hung wrote:
> > >
> > >
> > > On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> > > > These have been mostly taken from Amery Hung's work related to bpf qdisc
> > > > implementation. bpf_skb_{acquire,release}() are for increment/decrement
> > > > sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> > > > have not been released and map is being wiped out from system.
> > > >
> > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > ---
> > > >   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> > > >   1 file changed, 62 insertions(+)
> > > >
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 2ec162dd83c4..9bd2701be088 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
> > > >   __bpf_kfunc_end_defs();
> > > > +__diag_push();
> > > > +__diag_ignore_all("-Wmissing-prototypes",
> > > > +             "Global functions as their definitions will be in vmlinux BTF");
> > > > +
> > > > +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> > > > + * kfunc which is not stored in a map as a kptr, must be released by calling
> > > > + * bpf_skb_release().
> > > > + * @skb: The skb on which a reference is being acquired.
> > > > + */
> > > > +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> > > > +{
> > > > +   if (refcount_inc_not_zero(&skb->users))
> > > > +           return skb;
> > > > +   return NULL;
> > > > +}
> > > > +
> > > > +/* bpf_skb_release - Release the reference acquired on an skb.
> > > > + * @skb: The skb on which a reference is being released.
> > > > + */
> > > > +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> > > > +{
> > > > +   skb_unref(skb);
> > > > +}
> > > > +
> > > > +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> > > > + * an allocated object or a map.
> > > > + * @skb: The skb on which a reference is being released.
> > > > + */
> > > > +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> > > > +{
> > > > +   (void)skb_unref(skb);
> > > > +   consume_skb(skb);
> > > > +}
> > > > +
> > > > +__diag_pop();
> > > > +
> > > > +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> > > > +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> > > > +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> > > > +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> > > > +
> > > > +static const struct btf_kfunc_id_set skb_kfunc_set = {
> > > > +   .owner = THIS_MODULE,
> > > > +   .set   = &skb_kfunc_btf_ids,
> > > > +};
> > > > +
> > > > +BTF_ID_LIST(skb_kfunc_dtor_ids)
> > > > +BTF_ID(struct, sk_buff)
> > > > +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> > > > +
> > > >   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > > >                            struct bpf_dynptr *ptr__uninit)
> > > >   {
> > > > @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
> > > >   static int __init bpf_kfunc_init(void)
> > > >   {
> > > > +   const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> > > > +           {
> > > > +                   .btf_id       = skb_kfunc_dtor_ids[0],
> > > > +                   .kfunc_btf_id = skb_kfunc_dtor_ids[1]
> > > > +           },
> > > > +   };
> > > > +
> > > >     int ret;
> > > >     ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> > > > @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
> > > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
> > > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > > >                                            &bpf_kfunc_set_sock_addr);
> > > > +   ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> > > > +
> > > > +   ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> > > > +                                            ARRAY_SIZE(skb_kfunc_dtors),
> > > > +                                            THIS_MODULE);
> > >
> > > I think we will need to deal with two versions of skb dtors here. Both qdisc
> > > and cls will register dtor associated for skb. The qdisc one just call
> > > kfree_skb(). While only one can exist for a specific btf id in the kernel if
> > > I understand correctly. Is it possible to have one that work
> > > for both use cases?
> >
> > Looking at the current code it seems bpf_find_btf_id() (which
> > btf_parse_kptr() calls) will go through modules and return the first match
> > against sk_buff btf but that's currently a wild guess from my side. So
> > your concern stands as we have no mechanism that would distinguish the
> > dtors for same btf id.
> >
> > I would have to take a deeper look at btf_parse_kptr() and find some way
> > to associate dtor with its module during registering and then use it
> > within btf_find_dtor_kfunc(). Would this be sufficient?
> >
> 
> That might not be enough. Ultimately, if the user configures both
> modules to be built-in, then there is no way to tell where a trusted
> skb kptr in a bpf program is from.

Am I missing something, or how are you going to use the kfuncs that are
required for storing an skb in a map as a kptr without the module loaded?
Dtors are owned by the same module as the corresponding acquire/release
kfuncs.

> 
> Two possible ways to solve this:
> 
> 1. Make the dtor be able to tell whether the skb is from qdisc or cls.
> Since we are both in the TC layer, maybe we can use skb->cb for this?
> 
> 2. Associate KF_ACQUIRE kfuncs with the corresponding KF_RELEASE
> kfuncs somehow. Carry this additional information as the kptr
> propagates in the bpf world so that we know which dtor to call. This
> seems to be overly complicated.
> 
> 
> > >
> > > >     return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > > >   }
> > > >   late_initcall(bpf_kfunc_init);
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-21 20:17         ` Maciej Fijalkowski
@ 2025-02-21 23:40           ` Amery Hung
  0 siblings, 0 replies; 14+ messages in thread
From: Amery Hung @ 2025-02-21 23:40 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, ast, daniel, andrii, netdev, magnus.karlsson, martin.lau

On Fri, Feb 21, 2025 at 12:17 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, Feb 21, 2025 at 11:11:27AM -0800, Amery Hung wrote:
> > On Fri, Feb 21, 2025 at 6:57 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Thu, Feb 20, 2025 at 03:25:03PM -0800, Amery Hung wrote:
> > > >
> > > >
> > > > On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> > > > > These have been mostly taken from Amery Hung's work related to bpf qdisc
> > > > > implementation. bpf_skb_{acquire,release}() are for increment/decrement
> > > > > sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> > > > > have not been released and map is being wiped out from system.
> > > > >
> > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > > ---
> > > > >   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> > > > >   1 file changed, 62 insertions(+)
> > > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 2ec162dd83c4..9bd2701be088 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
> > > > >   __bpf_kfunc_end_defs();
> > > > > +__diag_push();
> > > > > +__diag_ignore_all("-Wmissing-prototypes",
> > > > > +             "Global functions as their definitions will be in vmlinux BTF");
> > > > > +
> > > > > +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> > > > > + * kfunc which is not stored in a map as a kptr, must be released by calling
> > > > > + * bpf_skb_release().
> > > > > + * @skb: The skb on which a reference is being acquired.
> > > > > + */
> > > > > +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> > > > > +{
> > > > > +   if (refcount_inc_not_zero(&skb->users))
> > > > > +           return skb;
> > > > > +   return NULL;
> > > > > +}
> > > > > +
> > > > > +/* bpf_skb_release - Release the reference acquired on an skb.
> > > > > + * @skb: The skb on which a reference is being released.
> > > > > + */
> > > > > +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> > > > > +{
> > > > > +   skb_unref(skb);
> > > > > +}
> > > > > +
> > > > > +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> > > > > + * an allocated object or a map.
> > > > > + * @skb: The skb on which a reference is being released.
> > > > > + */
> > > > > +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> > > > > +{
> > > > > +   (void)skb_unref(skb);
> > > > > +   consume_skb(skb);
> > > > > +}
> > > > > +
> > > > > +__diag_pop();
> > > > > +
> > > > > +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> > > > > +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> > > > > +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> > > > > +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> > > > > +
> > > > > +static const struct btf_kfunc_id_set skb_kfunc_set = {
> > > > > +   .owner = THIS_MODULE,
> > > > > +   .set   = &skb_kfunc_btf_ids,
> > > > > +};
> > > > > +
> > > > > +BTF_ID_LIST(skb_kfunc_dtor_ids)
> > > > > +BTF_ID(struct, sk_buff)
> > > > > +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> > > > > +
> > > > >   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > > > >                            struct bpf_dynptr *ptr__uninit)
> > > > >   {
> > > > > @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
> > > > >   static int __init bpf_kfunc_init(void)
> > > > >   {
> > > > > +   const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> > > > > +           {
> > > > > +                   .btf_id       = skb_kfunc_dtor_ids[0],
> > > > > +                   .kfunc_btf_id = skb_kfunc_dtor_ids[1]
> > > > > +           },
> > > > > +   };
> > > > > +
> > > > >     int ret;
> > > > >     ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> > > > > @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
> > > > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
> > > > >     ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> > > > >                                            &bpf_kfunc_set_sock_addr);
> > > > > +   ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> > > > > +
> > > > > +   ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> > > > > +                                            ARRAY_SIZE(skb_kfunc_dtors),
> > > > > +                                            THIS_MODULE);
> > > >
> > > > I think we will need to deal with two versions of skb dtors here. Both qdisc
> > > > and cls will register dtor associated for skb. The qdisc one just call
> > > > kfree_skb(). While only one can exist for a specific btf id in the kernel if
> > > > I understand correctly. Is it possible to have one that work
> > > > for both use cases?
> > >
> > > Looking at the current code it seems bpf_find_btf_id() (which
> > > btf_parse_kptr() calls) will go through modules and return the first match
> > > against sk_buff btf but that's currently a wild guess from my side. So
> > > your concern stands as we have no mechanism that would distinguish the
> > > dtors for same btf id.
> > >
> > > I would have to take a deeper look at btf_parse_kptr() and find some way
> > > to associate dtor with its module during registering and then use it
> > > within btf_find_dtor_kfunc(). Would this be sufficient?
> > >
> >
> > That might not be enough. Ultimately, if the user configures both
> > modules to be built-in, then there is no way to tell where a trusted
> > skb kptr in a bpf program is from.
>
> Am I missing something or how are you going to use the kfuncs that are
> required for loading skb onto map as kptr without loaded module? Dtors are
> owned by same module as corresponding acquire/release kfuncs.
>

bpf qdisc will be either built-in (CONFIG_NET_SCH_BPF=y) or not
enabled (=n). The classifier can be either y or m.

Dtors are associated with (btf, btf_id). So in case both are built-in,
only one of them should be registered to (btf_vmlinux, struct sk_buff).
The current implementation does not check whether a dtor for a btf id is
already registered in register_btf_id_dtor_kfunc(), but I don't think we
should do it in the first place.



> >
> > Two possible ways to solve this:
> >
> > 1. Make the dtor be able to tell whether the skb is from qdisc or cls.
> > Since we are both in the TC layer, maybe we can use skb->cb for this?
> >
> > 2. Associate KF_ACQUIRE kfuncs with the corresponding KF_RELEASE
> > kfuncs somehow. Carry this additional information as the kptr
> > propagates in the bpf world so that we know which dtor to call. This
> > seems to be overly complicated.
> >
> >
> > > >
> > > > >     return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > > > >   }
> > > > >   late_initcall(bpf_kfunc_init);
> > > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting
  2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
  2025-02-20 23:25   ` Amery Hung
@ 2025-02-21 23:57   ` Amery Hung
  1 sibling, 0 replies; 14+ messages in thread
From: Amery Hung @ 2025-02-21 23:57 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel, andrii
  Cc: netdev, magnus.karlsson, martin.lau



On 2/20/2025 5:45 AM, Maciej Fijalkowski wrote:
> These have been mostly taken from Amery Hung's work related to bpf qdisc
> implementation. bpf_skb_{acquire,release}() are for increment/decrement
> sk_buff::users whereas bpf_skb_destroy() is called for map entries that
> have not been released and map is being wiped out from system.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 62 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2ec162dd83c4..9bd2701be088 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -12064,6 +12064,56 @@ __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct __sk_buff *s, struct sock *sk,
>   
>   __bpf_kfunc_end_defs();
>   
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in vmlinux BTF");
> +
> +/* bpf_skb_acquire - Acquire a reference to an skb. An skb acquired by this
> + * kfunc which is not stored in a map as a kptr, must be released by calling
> + * bpf_skb_release().
> + * @skb: The skb on which a reference is being acquired.
> + */
> +__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb)
> +{
> +	if (refcount_inc_not_zero(&skb->users))

Any reason to use refcount_inc_not_zero instead of refcount_inc here?

> +		return skb;
> +	return NULL;
> +}
> +
> +/* bpf_skb_release - Release the reference acquired on an skb.
> + * @skb: The skb on which a reference is being released.
> + */
> +__bpf_kfunc void bpf_skb_release(struct sk_buff *skb)
> +{
> +	skb_unref(skb);
> +}
> +
> +/* bpf_skb_destroy - Release an skb reference acquired and exchanged into
> + * an allocated object or a map.
> + * @skb: The skb on which a reference is being released.
> + */
> +__bpf_kfunc void bpf_skb_destroy(struct sk_buff *skb)
> +{
> +	(void)skb_unref(skb);
Actually, there might be a dtor that works for both cls and qdisc. This
skb_unref() seems redundant; consume_skb() already drops one reference.

> +	consume_skb(skb);

consume_skb() indicates that the skb was consumed, but the skb here is
being dropped by the dtor. kfree_skb(), or maybe kfree_skb_reason() with a
proper reason, would be better.

Here is the comment for consume_skb() in skbuff.c:

 *      Functions identically to kfree_skb, but kfree_skb assumes that the
 *      frame is being dropped after a failure and notes that
So the dtor would be basically the same as the qdisc one.

> +}
> +
> +__diag_pop();
> +
> +BTF_KFUNCS_START(skb_kfunc_btf_ids)
> +BTF_ID_FLAGS(func, bpf_skb_acquire, KF_ACQUIRE | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_skb_release, KF_RELEASE)
> +BTF_KFUNCS_END(skb_kfunc_btf_ids)
> +
> +static const struct btf_kfunc_id_set skb_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &skb_kfunc_btf_ids,
> +};
> +
> +BTF_ID_LIST(skb_kfunc_dtor_ids)
> +BTF_ID(struct, sk_buff)
> +BTF_ID_FLAGS(func, bpf_skb_destroy, KF_RELEASE)
> +
>   int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
>   			       struct bpf_dynptr *ptr__uninit)
>   {
> @@ -12117,6 +12167,13 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = {
>   
>   static int __init bpf_kfunc_init(void)
>   {
> +	const struct btf_id_dtor_kfunc skb_kfunc_dtors[] = {
> +		{
> +			.btf_id       = skb_kfunc_dtor_ids[0],
> +			.kfunc_btf_id = skb_kfunc_dtor_ids[1]
> +		},
> +	};
> +
>   	int ret;
>   
>   	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb);
> @@ -12133,6 +12190,11 @@ static int __init bpf_kfunc_init(void)
>   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
>   	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
>   					       &bpf_kfunc_set_sock_addr);
> +	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &skb_kfunc_set);
> +
> +	ret = ret ?: register_btf_id_dtor_kfuncs(skb_kfunc_dtors,
> +						 ARRAY_SIZE(skb_kfunc_dtors),
> +						 THIS_MODULE);
>   	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
>   }
>   late_initcall(bpf_kfunc_init);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
  2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
                   ` (2 preceding siblings ...)
  2025-02-20 13:45 ` [PATCH bpf-next 3/3] selftests: bpf: implement test case for skb kptr map storage Maciej Fijalkowski
@ 2025-02-22  1:55 ` Alexei Starovoitov
  2025-02-25 14:27   ` Maciej Fijalkowski
  3 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2025-02-22  1:55 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Network Development, Karlsson, Magnus, Martin KaFai Lau

On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> Hi!
>
> This patchset provides what is needed for storing skbs as kptrs in bpf
> maps. We start with necessary kernel change as discussed at [0] with
> Martin, then next patch adds kfuncs for handling skb refcount and on top
> of that a test case is added where one program stores skbs and then next
> program is able to retrieve them from map.
>
> Martin, regarding the kernel change I decided to go with boolean
> approach instead of what you initially suggested. Let me know if it
> works for you.
>
> Thanks,
> Maciej
>
> [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/

Before we go further we need a lot more details on the "why" part.
In the old thread I was able to find:

> On TC egress hook skb is stored in a map ...
> During TC ingress hook on the same interface, the skb that was previously
stored in map is retrieved ...

This is too cryptic. I see several concerns with such use case
including netns crossing, L2/L3 mismatch, skb_scrub.

I doubt we can make such "skb stash in a map" safe without
restricting the usage, so please provide detailed
description of the use case.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
  2025-02-22  1:55 ` [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Alexei Starovoitov
@ 2025-02-25 14:27   ` Maciej Fijalkowski
  2025-02-26 17:30     ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-25 14:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Network Development, Karlsson, Magnus, Martin KaFai Lau

On Fri, Feb 21, 2025 at 05:55:57PM -0800, Alexei Starovoitov wrote:
> On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > Hi!
> >
> > This patchset provides what is needed for storing skbs as kptrs in bpf
> > maps. We start with necessary kernel change as discussed at [0] with
> > Martin, then next patch adds kfuncs for handling skb refcount and on top
> > of that a test case is added where one program stores skbs and then next
> > program is able to retrieve them from map.
> >
> > Martin, regarding the kernel change I decided to go with boolean
> > approach instead of what you initially suggested. Let me know if it
> > works for you.
> >
> > Thanks,
> > Maciej
> >
> > [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/
> 
> Before we go further we need a lot more details on "why" part.
> In the old thread I was able to find:
> 
> > On TC egress hook skb is stored in a map ...
> > During TC ingress hook on the same interface, the skb that was previously
> stored in map is retrieved ...
> 
> This is too cryptic. I see several concerns with such use case
> including netns crossing, L2/L3 mismatch, skb_scrub.
> 
> I doubt we can make such "skb stash in a map" safe without
> restricting the usage, so please provide detailed
> description of the use case.

Hi Alexei,

We have a system with two nodes: one is a fully fledged Linux system (big node)
and the other a smaller embedded system. The big node runs Linux PTP for
time synchronization; the smaller node we have no control over (it might run
Linux or something else). The problem is that we would like to use the Tx
timestamps from the small node in the Linux PTP application on the big node.
When a packet is sent out from the big node it arrives at the small node,
which sends it out one of its interfaces. It then replies with another packet
back to the big node with the Tx timestamp in it.

Our current PoC for attacking this is to store the skb in a map (using this
patch set) when it is sent out from the big node, then retrieve it from the
map when the reply from the small node is received. We then take the
timestamp from the packet, put it in the skb, and send it up to the socket
error queue so that Linux PTP works out of the box.

Note that for the purpose of setting skb->hwtstamp and sending it up to the
error queue we are adding a kfunc (bpf_tx_tstamp) responsible for it, which is
not included in this set, so I understand it is not obvious what we achieved
with the current form of this patch set being discussed.

We did not consider the restrictions that should be implemented from a netns
POV, so that is a good point. How would you see this being fixed?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
  2025-02-25 14:27   ` Maciej Fijalkowski
@ 2025-02-26 17:30     ` Alexei Starovoitov
  2025-02-27 18:40       ` Maciej Fijalkowski
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2025-02-26 17:30 UTC (permalink / raw)
  To: Maciej Fijalkowski, Jakub Kicinski
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Network Development, Karlsson, Magnus, Martin KaFai Lau,
	Amery Hung

On Tue, Feb 25, 2025 at 6:27 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, Feb 21, 2025 at 05:55:57PM -0800, Alexei Starovoitov wrote:
> > On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > Hi!
> > >
> > > This patchset provides what is needed for storing skbs as kptrs in bpf
> > > maps. We start with necessary kernel change as discussed at [0] with
> > > Martin, then next patch adds kfuncs for handling skb refcount and on top
> > > of that a test case is added where one program stores skbs and then next
> > > program is able to retrieve them from map.
> > >
> > > Martin, regarding the kernel change I decided to go with boolean
> > > approach instead of what you initially suggested. Let me know if it
> > > works for you.
> > >
> > > Thanks,
> > > Maciej
> > >
> > > [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/
> >
> > Before we go further we need a lot more details on "why" part.
> > In the old thread I was able to find:
> >
> > > On TC egress hook skb is stored in a map ...
> > > During TC ingress hook on the same interface, the skb that was previously
> > stored in map is retrieved ...
> >
> > This is too cryptic. I see several concerns with such use case
> > including netns crossing, L2/L3 mismatch, skb_scrub.
> >
> > I doubt we can make such "skb stash in a map" safe without
> > restricting the usage, so please provide detailed
> > description of the use case.
>
> Hi Alexei,
>
> We have a system with two nodes: one is a fully fledged Linux system (big node)
> and the other one a smaller embedded system. The big node runs Linux PTP for
> time synchronization, the smaller node we have no control over (might run Linux
> or something else). The problem is that we would like to use the Tx timestamps
> from the small node in the Linux PTP application on the big node. When a packet
> is sent out from the big node it arrives at the small node that send it out one
> of its interfaces. It then replies with another packet back to the big node
> with the Tx timestamp in it.
>
> Our current PoC for attacking this is to store the skb in a map (using this
> patch set) when it is sent out from the big node then retrieve it from the map
> when the reply from the small node is received. We then take the timestamp from
> the packet and put it in the skb and send it up to the socket error queue so
> that Linux PTP works out of the box.

This sounds like you're actually xmit-ing the skb out of the big node
and storing it in a map via simple refcnt++.
That may work in some setups, but in general is not quite correct
from networking stack pov.
You need to skb_clone() it and keep the clone, so only cloned skb
can go into the socket error queue and up to user space.
xmit-ing the same skb and sending to user space
is going to cause issues.

Cleaner design would probably involve bpf_clone_redirect()
and maybe some form of bpf-qdisc where the packet waits in the queue
until its hwtimestamp is adjusted and it's released from the queue
into user space.

What happens when the small node doesn't send that timestamped packet?
The map will eventually overflow, right?
Overall it feels that stashing skb-s in a map isn't the right approach.

> Note that for the purpose of setting skb->hwtstamp and sending it up to the
> error queue we are adding a kfunc (bpf_tx_tstamp) responsible for it, which is
> not included in this set, so I understand it is not obvious what we achieved
> with the current form of this patch set being discussed.
>
> We did not consider the restrictions that should be implemented from netns POV,
> so that is a good point. How would you see this being fixed?
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
  2025-02-26 17:30     ` Alexei Starovoitov
@ 2025-02-27 18:40       ` Maciej Fijalkowski
  0 siblings, 0 replies; 14+ messages in thread
From: Maciej Fijalkowski @ 2025-02-27 18:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jakub Kicinski, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Network Development, Karlsson, Magnus,
	Martin KaFai Lau, Amery Hung

On Wed, Feb 26, 2025 at 09:30:18AM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 25, 2025 at 6:27 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Fri, Feb 21, 2025 at 05:55:57PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski
> > > <maciej.fijalkowski@intel.com> wrote:
> > > >
> > > > Hi!
> > > >
> > > > This patchset provides what is needed for storing skbs as kptrs in bpf
> > > > maps. We start with necessary kernel change as discussed at [0] with
> > > > Martin, then next patch adds kfuncs for handling skb refcount and on top
> > > > of that a test case is added where one program stores skbs and then next
> > > > program is able to retrieve them from map.
> > > >
> > > > Martin, regarding the kernel change I decided to go with boolean
> > > > approach instead of what you initially suggested. Let me know if it
> > > > works for you.
> > > >
> > > > Thanks,
> > > > Maciej
> > > >
> > > > [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/
> > >
> > > Before we go further we need a lot more details on "why" part.
> > > In the old thread I was able to find:
> > >
> > > > On TC egress hook skb is stored in a map ...
> > > > During TC ingress hook on the same interface, the skb that was previously
> > > stored in map is retrieved ...
> > >
> > > This is too cryptic. I see several concerns with such use case
> > > including netns crossing, L2/L3 mismatch, skb_scrub.
> > >
> > > I doubt we can make such "skb stash in a map" safe without
> > > restricting the usage, so please provide detailed
> > > description of the use case.
> >
> > Hi Alexei,
> >
> > We have a system with two nodes: one is a fully fledged Linux system (big node)
> > and the other one a smaller embedded system. The big node runs Linux PTP for
> > time synchronization, the smaller node we have no control over (might run Linux
> > or something else). The problem is that we would like to use the Tx timestamps
> > from the small node in the Linux PTP application on the big node. When a packet
> > is sent out from the big node it arrives at the small node that send it out one
> > of its interfaces. It then replies with another packet back to the big node
> > with the Tx timestamp in it.
> >
> > Our current PoC for attacking this is to store the skb in a map (using this
> > patch set) when it is sent out from the big node then retrieve it from the map
> > when the reply from the small node is received. We then take the timestamp from
> > the packet and put it in the skb and send it up to the socket error queue so
> > that Linux PTP works out of the box.
> 
> This sounds like you're actually xmit-ing the skb out of the big node
> and storing it in a map via simple refcnt++.
> That may work in some setups, but in general is not quite correct
> from networking stack pov.
> You need to skb_clone() it and keep the clone, so only cloned skb
> can go into the socket error queue and up to user space.
> xmit-ing the same skb and sending to user space
> is going to cause issues.

This skb only goes to the errqueue and then gets its refcount decremented
and dropped; we do not xmit that skb per se.

> 
> Cleaner design would probably involve bpf_clone_redirect()
> and may be some form of bpf-qdisc where packet is waiting in the queue
> until its hwtimestamp is adjusted and its released from the queue
> into user space.

We were working on a clone initially but map storage turned out to be a much
cleaner approach; we can re-iterate on that in case the explanation
provided above regarding not xmitting the stored skb still feels off to you.

> 
> What happens when small node doesn't send that timestamped packet?
> The map will eventually overflow, right?
> Overall it feels that stashing skb-s in a map isn't the right approach.

We did not handle that in our demo, but I believe BPF timers would be
useful here, wouldn't they?

> 
> > Note that for the purpose of setting skb->hwtstamp and sending it up to the
> > error queue we are adding a kfunc (bpf_tx_tstamp) responsible for it, which is
> > not included in this set, so I understand it is not obvious what we achieved
> > with the current form of this patch set being discussed.
> >
> > We did not consider the restrictions that should be implemented from netns POV,
> > so that is a good point. How would you see this being fixed?
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-02-27 18:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
2025-02-20 13:45 ` [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally Maciej Fijalkowski
2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
2025-02-20 23:25   ` Amery Hung
2025-02-21 14:56     ` Maciej Fijalkowski
2025-02-21 19:11       ` Amery Hung
2025-02-21 20:17         ` Maciej Fijalkowski
2025-02-21 23:40           ` Amery Hung
2025-02-21 23:57   ` Amery Hung
2025-02-20 13:45 ` [PATCH bpf-next 3/3] selftests: bpf: implement test case for skb kptr map storage Maciej Fijalkowski
2025-02-22  1:55 ` [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Alexei Starovoitov
2025-02-25 14:27   ` Maciej Fijalkowski
2025-02-26 17:30     ` Alexei Starovoitov
2025-02-27 18:40       ` Maciej Fijalkowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).