* [PATCH RFC bpf-next 1/5] bpf: Introduce local storage for sk_buff
2026-02-26 21:12 [PATCH RFC bpf-next 0/5] skb extension for BPF local storage Jakub Sitnicki
@ 2026-02-26 21:12 ` Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 2/5] bpf: Allow passing kernel context pointer to kfuncs Jakub Sitnicki
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team
BPF local storage exists for sk, task, cgroup, and inode objects, but not
for sk_buff. Add per-packet local storage so BPF programs can associate
key-value data with individual packets as they traverse the network stack.
Back the storage with a new skb extension (SKB_EXT_BPF_STORAGE) that holds
a bpf_local_storage pointer. The storage is automatically freed when the
sk_buff is released.
Expose the storage through a new map type BPF_MAP_TYPE_SKB_STORAGE (gated
by CONFIG_BPF_SKB_STORAGE) and two kfuncs modeled after existing local
storage helpers:
- bpf_skb_storage_get() - look up or create local storage for an skb
- bpf_skb_storage_delete() - delete local storage for an skb
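The get-or-create semantics can be sketched as a small userspace model. This is illustrative only: `struct skb_model`, `storage_get()`, `storage_delete()`, and `GET_F_CREATE` are stand-ins invented here, not the kernel's types or the kfuncs themselves.

```c
/* Hypothetical userspace model of the bpf_skb_storage_get()/_delete()
 * semantics described above. All names here are illustrative; the real
 * kfuncs operate on kernel objects and BPF local storage maps.
 */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define GET_F_CREATE 1 /* stands in for BPF_LOCAL_STORAGE_GET_F_CREATE */

struct skb_model {
	int *storage; /* per-packet slot for one map; NULL if absent */
};

/* Look up, or optionally create, the storage for one (map, skb) pair. */
static int *storage_get(struct skb_model *skb, const int *value,
			unsigned int flags)
{
	if (flags & ~GET_F_CREATE)
		return NULL; /* EINVAL: unknown flag */
	if (skb->storage)
		return skb->storage;
	if (!(flags & GET_F_CREATE))
		return NULL; /* ENOENT: no storage and no create requested */
	skb->storage = malloc(sizeof(int));
	if (!skb->storage)
		return NULL; /* ENOMEM */
	/* Initialize with @value, or zero the storage when @value is NULL. */
	*skb->storage = value ? *value : 0;
	return skb->storage;
}

static int storage_delete(struct skb_model *skb)
{
	if (!skb->storage)
		return -ENOENT;
	free(skb->storage);
	skb->storage = NULL;
	return 0;
}
```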
Register the kfuncs for all prog types with access to a trusted pointer to
sk_buff. For LSM progs, reject sleepable programs for now; supporting them
requires further verifier changes to pass gfp_flags to kfuncs. Skip
sk_reuseport and flow_dissector, which lack an associated kfunc hook.
Impose the following limitations, with the intention of lifting them later:
1. skb clones can't access storage created by the original skb
Refuse storage access for cloned skbs when the skb extension is shared
between the original and its clones. This avoids corrupting state visible
to other skb owners. In the future, local storage can be copied on clone
when the user requests it with a BPF_F_CLONE map flag.
2. BPF local storage is not copied on skb_ext copy-on-write
On skb_ext COW — triggered when a new extension is activated while skb_ext
has multiple owners — reset the local storage pointer to NULL to avoid
sharing state between the original and the copy for now. In the future, a
copy of BPF local storage can be made as part of the skb_ext COW process.
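Both limitations can be modeled in a few lines of standalone C. This is a toy sketch, not the kernel code: the structs and function names below are invented stand-ins for the skb extension refcounting and the COW path in skbuff.c.

```c
/* Toy model of the two limitations above: a cloned skb whose extension
 * area is still shared refuses storage access, and an skb_ext
 * copy-on-write leaves the new copy without local storage. All names
 * are illustrative.
 */
#include <assert.h>
#include <stddef.h>

struct ext_model {
	int refcnt;	/* how many skbs share this extension area */
	void *storage;	/* BPF local storage pointer */
};

struct skb_model {
	int cloned;
	struct ext_model *ext;
};

/* Limitation 1: refuse access while the extension area is shared. */
static int storage_accessible(const struct skb_model *skb)
{
	if (skb->cloned && skb->ext && skb->ext->refcnt > 1)
		return 0; /* would corrupt state visible to other skb owners */
	return 1;
}

/* Limitation 2: on skb_ext COW, the new copy starts without storage. */
static void ext_cow(struct skb_model *skb, struct ext_model *copy)
{
	*copy = *skb->ext;
	copy->refcnt = 1;
	copy->storage = NULL; /* storage is not carried over to the copy */
	skb->ext->refcnt--;
	skb->ext = copy;
}
```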
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
include/linux/bpf_types.h | 3 +
include/linux/skbuff.h | 3 +
include/net/bpf_skb_storage.h | 21 ++++
include/uapi/linux/bpf.h | 1 +
kernel/bpf/syscall.c | 1 +
kernel/bpf/verifier.c | 38 ++++++
net/Kconfig | 10 ++
net/core/Makefile | 1 +
net/core/bpf_skb_storage.c | 264 ++++++++++++++++++++++++++++++++++++++++++
net/core/skbuff.c | 15 +++
10 files changed, 357 insertions(+)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b13de31e163f..8c3038a93b35 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -134,6 +134,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_ARENA, arena_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_INSN_ARRAY, insn_array_map_ops)
+#ifdef CONFIG_BPF_SKB_STORAGE
+BPF_MAP_TYPE(BPF_MAP_TYPE_SKB_STORAGE, skb_storage_map_ops)
+#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index daa4e4944ce3..5f6d721bc075 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -5004,6 +5004,9 @@ enum skb_ext_id {
#endif
#if IS_ENABLED(CONFIG_CAN)
SKB_EXT_CAN,
+#endif
+#if IS_ENABLED(CONFIG_BPF_SKB_STORAGE)
+ SKB_EXT_BPF_STORAGE,
#endif
SKB_EXT_NUM, /* must be last */
};
diff --git a/include/net/bpf_skb_storage.h b/include/net/bpf_skb_storage.h
new file mode 100644
index 000000000000..982467f61e7e
--- /dev/null
+++ b/include/net/bpf_skb_storage.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2026 Cloudflare, Inc. */
+
+#ifndef _BPF_SKB_STORAGE_H
+#define _BPF_SKB_STORAGE_H
+
+#ifdef CONFIG_BPF_SKB_STORAGE
+
+#include <linux/compiler_types.h>
+
+struct bpf_local_storage;
+
+struct bpf_skb_storage_ext {
+ struct bpf_local_storage __rcu *storage;
+};
+
+void bpf_skb_storage_free(struct bpf_skb_storage_ext *ext);
+
+#endif /* CONFIG_BPF_SKB_STORAGE */
+
+#endif /* _BPF_SKB_STORAGE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c8d400b7680a..2b2c22c7992c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1046,6 +1046,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_CGRP_STORAGE,
BPF_MAP_TYPE_ARENA,
BPF_MAP_TYPE_INSN_ARRAY,
+ BPF_MAP_TYPE_SKB_STORAGE,
__MAX_BPF_MAP_TYPE
};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dd89bf809772..e26e24481e19 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1488,6 +1488,7 @@ static int map_create(union bpf_attr *attr, bpfptr_t uattr)
case BPF_MAP_TYPE_CPUMAP:
case BPF_MAP_TYPE_ARENA:
case BPF_MAP_TYPE_INSN_ARRAY:
+ case BPF_MAP_TYPE_SKB_STORAGE:
if (!bpf_token_capable(token, CAP_BPF))
goto put_token;
break;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0162f946032f..113e2eaec4db 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10207,6 +10207,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_map_push_elem)
goto error;
break;
+ case BPF_MAP_TYPE_SKB_STORAGE:
+ if (func_id != BPF_FUNC_kptr_xchg)
+ goto error;
+ break;
case BPF_MAP_TYPE_INSN_ARRAY:
goto error;
default:
@@ -12461,6 +12465,8 @@ enum special_kfunc_type {
KF_bpf_session_is_return,
KF_bpf_stream_vprintk,
KF_bpf_stream_print_stack,
+ KF_bpf_skb_storage_get,
+ KF_bpf_skb_storage_delete,
};
BTF_ID_LIST(special_kfunc_list)
@@ -12541,6 +12547,13 @@ BTF_ID(func, bpf_arena_reserve_pages)
BTF_ID(func, bpf_session_is_return)
BTF_ID(func, bpf_stream_vprintk)
BTF_ID(func, bpf_stream_print_stack)
+#ifdef CONFIG_BPF_SKB_STORAGE
+BTF_ID(func, bpf_skb_storage_get)
+BTF_ID(func, bpf_skb_storage_delete)
+#else
+BTF_ID_UNUSED
+BTF_ID_UNUSED
+#endif
static bool is_task_work_add_kfunc(u32 func_id)
{
@@ -13259,6 +13272,22 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
}
}
+static int check_kfunc_map_compatibility(struct bpf_verifier_env *env,
+ struct bpf_kfunc_call_arg_meta *meta,
+ struct bpf_map *map)
+{
+ if ((meta->func_id == special_kfunc_list[KF_bpf_skb_storage_get] ||
+ meta->func_id == special_kfunc_list[KF_bpf_skb_storage_delete]) &&
+ map->map_type != BPF_MAP_TYPE_SKB_STORAGE)
+ goto error;
+
+ return 0;
+error:
+ verbose(env, "cannot pass map_type %d into func %s#%d\n",
+ map->map_type, meta->func_name, meta->func_id);
+ return -EINVAL;
+}
+
static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
int insn_idx)
{
@@ -13418,6 +13447,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
}
meta->map.ptr = reg->map_ptr;
meta->map.uid = reg->map_uid;
+ ret = check_kfunc_map_compatibility(env, meta, meta->map.ptr);
+ if (ret < 0)
+ return ret;
fallthrough;
case KF_ARG_PTR_TO_ALLOC_BTF_ID:
case KF_ARG_PTR_TO_BTF_ID:
@@ -14005,6 +14037,12 @@ static int check_special_kfunc(struct bpf_verifier_env *env, struct bpf_kfunc_ca
* because packet slices are not refcounted (see
* dynptr_type_refcounted)
*/
+ } else if (meta->func_id == special_kfunc_list[KF_bpf_skb_storage_get]) {
+ mark_reg_known_zero(env, regs, BPF_REG_0);
+ regs[BPF_REG_0].type = PTR_TO_MAP_VALUE;
+ regs[BPF_REG_0].map_ptr = meta->map.ptr;
+ regs[BPF_REG_0].map_uid = meta->map.uid;
+ /* PTR_MAYBE_NULL will be added when is_kfunc_ret_null is checked */
} else {
return 0;
}
diff --git a/net/Kconfig b/net/Kconfig
index 62266eaf0e95..68ad0592c134 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -543,4 +543,14 @@ config NET_TEST
If unsure, say N.
+config BPF_SKB_STORAGE
+ bool "BPF local storage for sk_buff"
+ depends on BPF_SYSCALL
+ select SKB_EXTENSIONS
+ help
+ Enable an sk_buff extension for BPF local storage. This allows BPF
+ programs to associate arbitrary data with individual packets as they
+ traverse the network stack. The storage is automatically freed when
+ the sk_buff is freed.
+
endif # if NET
diff --git a/net/core/Makefile b/net/core/Makefile
index dc17c5a61e9a..32f7335236d0 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -49,3 +49,4 @@ obj-$(CONFIG_NET_TEST) += net_test.o
obj-$(CONFIG_NET_DEVMEM) += devmem.o
obj-$(CONFIG_DEBUG_NET) += lock_debug.o
obj-$(CONFIG_FAIL_SKB_REALLOC) += skb_fault_injection.o
+obj-$(CONFIG_BPF_SKB_STORAGE) += bpf_skb_storage.o
diff --git a/net/core/bpf_skb_storage.c b/net/core/bpf_skb_storage.c
new file mode 100644
index 000000000000..977d5c92c0ff
--- /dev/null
+++ b/net/core/bpf_skb_storage.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare, Inc. */
+
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+
+#include <net/bpf_skb_storage.h>
+
+DEFINE_BPF_STORAGE_CACHE(skb_cache);
+
+static int skb_storage_map_alloc_check(union bpf_attr *attr)
+{
+ /* Don't allow BPF_F_CLONE yet. Requires skb_ext copy on clone. */
+ if (attr->map_flags & ~BPF_F_NO_PREALLOC)
+ return -EINVAL;
+
+ return bpf_local_storage_map_alloc_check(attr);
+}
+
+static struct bpf_map *skb_storage_map_alloc(union bpf_attr *attr)
+{
+ return bpf_local_storage_map_alloc(attr, &skb_cache, false);
+}
+
+static void skb_storage_map_free(struct bpf_map *map)
+{
+ bpf_local_storage_map_free(map, &skb_cache);
+}
+
+static struct bpf_local_storage __rcu **skb_storage_ptr(void *owner)
+{
+ struct bpf_skb_storage_ext *ext = owner;
+
+ return &ext->storage;
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key,
+ void *next_key)
+{
+ return -EOPNOTSUPP;
+}
+
+static void *notsupp_lookup_elem(struct bpf_map *map, void *key)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static long notsupp_update_elem(struct bpf_map *map, void *key,
+ void *value, u64 flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static long notsupp_delete_elem(struct bpf_map *map, void *key)
+{
+ return -EOPNOTSUPP;
+}
+
+const struct bpf_map_ops skb_storage_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = skb_storage_map_alloc_check,
+ .map_alloc = skb_storage_map_alloc,
+ .map_free = skb_storage_map_free,
+ .map_get_next_key = notsupp_get_next_key,
+ .map_lookup_elem = notsupp_lookup_elem,
+ .map_update_elem = notsupp_update_elem,
+ .map_delete_elem = notsupp_delete_elem,
+ .map_check_btf = bpf_local_storage_map_check_btf,
+ .map_mem_usage = bpf_local_storage_map_mem_usage,
+ .map_owner_storage_ptr = skb_storage_ptr,
+ .map_btf_id = &bpf_local_storage_map_btf_id[0],
+};
+
+static struct bpf_local_storage_data *
+skb_storage_lookup(struct sk_buff *skb, struct bpf_map *map, bool cacheit_lockit)
+{
+ struct bpf_local_storage_map *smap = (typeof(smap))map;
+ struct bpf_local_storage *storage;
+ struct bpf_skb_storage_ext *ext;
+
+ ext = skb_ext_find(skb, SKB_EXT_BPF_STORAGE);
+ if (!ext)
+ return NULL;
+
+ storage = rcu_dereference_check(ext->storage, bpf_rcu_lock_held());
+ if (!storage)
+ return NULL;
+
+ return bpf_local_storage_lookup(storage, smap, cacheit_lockit);
+}
+
+void bpf_skb_storage_free(struct bpf_skb_storage_ext *ext)
+{
+ struct bpf_local_storage *storage;
+
+ rcu_read_lock_dont_migrate();
+ storage = rcu_dereference(ext->storage);
+ if (storage)
+ bpf_local_storage_destroy(storage);
+ rcu_read_unlock_migrate();
+}
+
+static bool is_skb_storage_shared(const struct sk_buff *skb)
+{
+ return skb_ext_exist(skb, SKB_EXT_BPF_STORAGE) &&
+ refcount_read(&skb->extensions->refcnt) != 1;
+}
+
+__bpf_kfunc_start_defs();
+
+/**
+ * bpf_skb_storage_get() - Get or create local storage for an skb
+ * @map: BPF map of type BPF_MAP_TYPE_SKB_STORAGE
+ * @skb: Socket buffer to get storage for
+ * @value: Initial value to set if creating new storage (can be NULL)
+ * @flags: BPF_LOCAL_STORAGE_GET_F_CREATE to create if not exists
+ *
+ * Get the local storage associated with @skb for @map. If @flags contains
+ * BPF_LOCAL_STORAGE_GET_F_CREATE and no storage exists, create new storage
+ * initialized with @value (or zeroed if @value is NULL).
+ *
+ * Return: Pointer to storage value on success, NULL on error
+ */
+__bpf_kfunc void *bpf_skb_storage_get(struct bpf_map *map__map,
+ struct sk_buff *skb,
+ void *value__nullable, u32 value__szk,
+ u64 flags)
+{
+ struct bpf_local_storage_map *map = (typeof(map))map__map;
+ struct bpf_local_storage_data *sdata;
+ struct bpf_skb_storage_ext *ext;
+
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+
+ if (!skb || flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE)
+ goto fail; /* EINVAL */
+ if (in_hardirq() || in_nmi())
+ goto fail; /* EPERM - requires kmalloc_nolock */
+ if (skb->cloned && is_skb_storage_shared(skb))
+ goto fail; /* EOPNOTSUPP */
+
+ sdata = skb_storage_lookup(skb, map__map, true);
+ if (sdata)
+ return sdata->data;
+ if (!(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
+ goto fail; /* ENOENT */
+
+ ext = skb_ext_find(skb, SKB_EXT_BPF_STORAGE);
+ if (!ext) {
+ ext = skb_ext_add(skb, SKB_EXT_BPF_STORAGE);
+ if (!ext)
+ goto fail; /* ENOMEM */
+ RCU_INIT_POINTER(ext->storage, NULL);
+ }
+
+ sdata = bpf_local_storage_update(ext, map, value__nullable, BPF_NOEXIST,
+ false, GFP_ATOMIC);
+ if (IS_ERR(sdata))
+ goto fail;
+
+ return sdata->data;
+fail:
+ return NULL;
+}
+
+/**
+ * bpf_skb_storage_delete() - Delete local storage for an skb
+ * @map: BPF map of type BPF_MAP_TYPE_SKB_STORAGE
+ * @skb: Socket buffer to delete storage from
+ *
+ * Delete the local storage associated with @skb for @map.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+__bpf_kfunc int bpf_skb_storage_delete(struct bpf_map *map__map,
+ struct sk_buff *skb)
+{
+ struct bpf_local_storage_data *sdata;
+
+ WARN_ON_ONCE(!bpf_rcu_lock_held());
+ if (!skb)
+ return -EINVAL;
+ if (in_hardirq() || in_nmi())
+ return -EPERM;
+ if (skb->cloned && is_skb_storage_shared(skb))
+ return -EOPNOTSUPP;
+
+ sdata = skb_storage_lookup(skb, map__map, false);
+ if (!sdata)
+ return -ENOENT;
+
+ return bpf_selem_unlink(SELEM(sdata));
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(skb_storage_kfunc_ids)
+BTF_ID_FLAGS(func, bpf_skb_storage_get, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_skb_storage_delete)
+BTF_KFUNCS_END(skb_storage_kfunc_ids)
+
+static int skb_storage_tracing_kfunc_filter(const struct bpf_prog *prog,
+ u32 kfunc_id)
+{
+ /* Disabled until verifier can pass gfp_flags to kfuncs */
+ if (prog->sleepable)
+ return -EACCES;
+ /* Allow only progs with trusted pointers */
+ if (prog->type != BPF_PROG_TYPE_LSM &&
+ prog->type != BPF_PROG_TYPE_TRACING)
+ return -EACCES;
+ return 0;
+}
+
+static const struct btf_kfunc_id_set skb_storage_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &skb_storage_kfunc_ids,
+};
+
+static const struct btf_kfunc_id_set skb_storage_tracing_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &skb_storage_kfunc_ids,
+ .filter = skb_storage_tracing_kfunc_filter,
+};
+
+static int __init bpf_skb_storage_kfunc_init(void)
+{
+ int ret = 0;
+
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCKET_FILTER,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SK_SKB,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_OUT,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_IN,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_XMIT,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER,
+ &skb_storage_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
+ &skb_storage_kfunc_set);
+
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LSM,
+ &skb_storage_tracing_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+ &skb_storage_tracing_kfunc_set);
+
+ return ret;
+}
+late_initcall(bpf_skb_storage_kfunc_init);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 699c401a5eae..9d3d680441ad 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -83,6 +83,7 @@
#include <net/psp/types.h>
#include <net/dropreason.h>
#include <net/xdp_sock.h>
+#include <net/bpf_skb_storage.h>
#include <linux/uaccess.h>
#include <trace/events/skb.h>
@@ -5143,6 +5144,9 @@ static const u8 skb_ext_type_len[] = {
#if IS_ENABLED(CONFIG_CAN)
[SKB_EXT_CAN] = SKB_EXT_CHUNKSIZEOF(struct can_skb_ext),
#endif
+#if IS_ENABLED(CONFIG_BPF_SKB_STORAGE)
+ [SKB_EXT_BPF_STORAGE] = SKB_EXT_CHUNKSIZEOF(struct bpf_skb_storage_ext),
+#endif
};
static __always_inline unsigned int skb_ext_total_length(void)
@@ -7099,6 +7103,13 @@ static struct skb_ext *skb_ext_maybe_cow(struct skb_ext *old,
if (flow->key)
refcount_inc(&flow->key->refs);
}
+#endif
+#ifdef CONFIG_BPF_SKB_STORAGE
+ if (old_active & (1 << SKB_EXT_BPF_STORAGE)) {
+ struct bpf_skb_storage_ext *ext = skb_ext_get_ptr(new, SKB_EXT_BPF_STORAGE);
+
+ RCU_INIT_POINTER(ext->storage, NULL);
+ }
#endif
__skb_ext_put(old);
return new;
@@ -7235,6 +7246,10 @@ void __skb_ext_put(struct skb_ext *ext)
if (__skb_ext_exist(ext, SKB_EXT_MCTP))
skb_ext_put_mctp(skb_ext_get_ptr(ext, SKB_EXT_MCTP));
#endif
+#ifdef CONFIG_BPF_SKB_STORAGE
+ if (__skb_ext_exist(ext, SKB_EXT_BPF_STORAGE))
+ bpf_skb_storage_free(skb_ext_get_ptr(ext, SKB_EXT_BPF_STORAGE));
+#endif
kmem_cache_free(skbuff_ext_cache, ext);
}
--
2.43.0
* [PATCH RFC bpf-next 2/5] bpf: Allow passing kernel context pointer to kfuncs
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team
bpf_cast_to_kern_ctx() returns a trusted PTR_TO_BTF_ID pointing to the
kernel-side context struct (e.g., sk_buff for TC programs). Passing this
pointer to a kfunc that expects KF_ARG_PTR_TO_CTX fails verification,
because the check accepts only PTR_TO_CTX.
Relax the check to also accept a trusted PTR_TO_BTF_ID whose btf_id matches
the kernel context BTF ID for the program type. Introduce is_kern_ctx_ptr()
to perform this check.
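The relaxed argument check can be sketched with simplified stand-ins for the verifier's register state. The enum values, struct fields, and the `ctx_arg_ok()` predicate below are invented for illustration; the real check additionally verifies a constant offset and full BTF struct matching.

```c
/* Sketch of the relaxed KF_ARG_PTR_TO_CTX check described above, using
 * simplified stand-ins for the verifier's register state. Accept either
 * the program's ctx pointer, or a trusted PTR_TO_BTF_ID whose BTF type
 * matches the kernel-side context struct, e.g. the result of
 * bpf_cast_to_kern_ctx().
 */
#include <assert.h>
#include <stdbool.h>

enum reg_type { PTR_TO_CTX, PTR_TO_BTF_ID, PTR_TO_MAP_VALUE };

struct reg_model {
	enum reg_type type;
	bool trusted;	/* trusted pointer, no unsafe type modifiers */
	int btf_id;	/* BTF type the pointer refers to */
};

static bool ctx_arg_ok(const struct reg_model *reg, int kctx_id)
{
	if (reg->type == PTR_TO_CTX)
		return true;
	if (reg->type != PTR_TO_BTF_ID || !reg->trusted)
		return false;
	return reg->btf_id == kctx_id;
}
```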
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
kernel/bpf/verifier.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 113e2eaec4db..017071197466 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6474,6 +6474,27 @@ static bool is_trusted_reg(const struct bpf_reg_state *reg)
!bpf_type_has_unsafe_modifiers(reg->type);
}
+static bool is_kern_ctx_ptr(struct bpf_verifier_env *env,
+ const struct bpf_reg_state *reg)
+{
+ int kctx_id;
+
+ if (base_type(reg->type) != PTR_TO_BTF_ID)
+ return false;
+ if (!tnum_is_const(reg->var_off))
+ return false;
+ if (!is_trusted_reg(reg))
+ return false;
+
+ kctx_id = get_kern_ctx_btf_id(&env->log, resolve_prog_type(env->prog));
+ if (kctx_id < 0)
+ return false;
+
+ return btf_struct_ids_match(&env->log, reg->btf, reg->btf_id,
+ reg->var_off.value, btf_vmlinux, kctx_id,
+ true);
+}
+
static bool is_rcu_reg(const struct bpf_reg_state *reg)
{
return reg->type & MEM_RCU;
@@ -13495,7 +13516,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
switch (kf_arg_type) {
case KF_ARG_PTR_TO_CTX:
- if (reg->type != PTR_TO_CTX) {
+ if (reg->type != PTR_TO_CTX &&
+ !is_kern_ctx_ptr(env, reg)) {
verbose(env, "arg#%d expected pointer to ctx, but got %s\n",
i, reg_type_str(env, reg->type));
return -EINVAL;
--
2.43.0
* [PATCH RFC bpf-next 3/5] bpf: Allow access to bpf_sock_ops_kern->skb
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team
sock_ops programs receive bpf_sock_ops_kern as their kernel context, which
holds a pointer to the sk_buff being processed. Mark bpf_sock_ops_kern->skb
as BTF_TYPE_SAFE_TRUSTED_OR_NULL so that BPF programs can dereference it
and pass it to kfuncs expecting a trusted sk_buff pointer, such as
bpf_skb_storage_get().
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
kernel/bpf/verifier.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 017071197466..378ff9dd450f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7257,6 +7257,10 @@ BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) {
struct file *vm_file;
};
+BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct bpf_sock_ops_kern) {
+ struct sk_buff *skb;
+};
+
static bool type_is_rcu(struct bpf_verifier_env *env,
struct bpf_reg_state *reg,
const char *field_name, u32 btf_id)
@@ -7299,6 +7303,7 @@ static bool type_is_trusted_or_null(struct bpf_verifier_env *env,
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct dentry));
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct));
+ BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct bpf_sock_ops_kern));
return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
"__safe_trusted_or_null");
--
2.43.0
* [PATCH RFC bpf-next 4/5] selftests/bpf: Add verifier tests for skb local storage
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team
Cover verifier checks for the bpf_skb_storage_get and
bpf_skb_storage_delete kfuncs.
Negative tests verify argument validation:
- not_a_map_on_{get,delete}: a non-map pointer as the first argument is
rejected
- wrong_map_type_on_{get,delete}: passing a BPF_MAP_TYPE_HASH map instead
of BPF_MAP_TYPE_SKB_STORAGE is rejected
Positive tests verify the kfuncs are callable from all registered program
types: socket filter, TC classifier, TC action, cgroup skb, sock_ops,
sk_skb, LWT (in/out/xmit/seg6local), netfilter, struct_ops, LSM, and raw
tracepoint.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
tools/testing/selftests/bpf/prog_tests/verifier.c | 2 +
.../selftests/bpf/progs/verifier_skb_storage.c | 209 +++++++++++++++++++++
2 files changed, 211 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 8cdfd74c95d7..c82e0d48ba34 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -114,6 +114,7 @@
#include "verifier_bits_iter.skel.h"
#include "verifier_lsm.skel.h"
#include "verifier_jit_inline.skel.h"
+#include "verifier_skb_storage.skel.h"
#include "irq.skel.h"
#define MAX_ENTRIES 11
@@ -259,6 +260,7 @@ void test_verifier_lsm(void) { RUN(verifier_lsm); }
void test_irq(void) { RUN(irq); }
void test_verifier_mtu(void) { RUN(verifier_mtu); }
void test_verifier_jit_inline(void) { RUN(verifier_jit_inline); }
+void test_verifier_skb_storage(void) { RUN(verifier_skb_storage); }
static int init_test_val_map(struct bpf_object *obj, char *map_name)
{
diff --git a/tools/testing/selftests/bpf/progs/verifier_skb_storage.c b/tools/testing/selftests/bpf/progs/verifier_skb_storage.c
new file mode 100644
index 000000000000..cd4d817232de
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_skb_storage.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare, Inc. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __type(key, int);
+ __type(value, int);
+ __uint(max_entries, 1);
+} hash_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SKB_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, int);
+} storage_map SEC(".maps");
+
+SEC("socket")
+__failure __msg("pointer in R1 isn't map pointer")
+int not_a_map_on_get(struct __sk_buff *ctx)
+{
+ (void)bpf_skb_storage_get((void *)ctx, (void *)&storage_map, NULL, 0, 0);
+ return 0;
+}
+
+SEC("socket")
+__failure __msg("pointer in R1 isn't map pointer")
+int not_a_map_on_delete(struct __sk_buff *ctx)
+{
+ (void)bpf_skb_storage_delete((void *)ctx, (void *)&storage_map);
+ return 0;
+}
+
+SEC("socket")
+__failure __msg("cannot pass map_type 1 into func bpf_skb_storage_get")
+int wrong_map_type_on_get(struct __sk_buff *ctx)
+{
+ (void)bpf_skb_storage_get((struct bpf_map *)&hash_map,
+ bpf_cast_to_kern_ctx(ctx), NULL, 0, 0);
+ return 0;
+}
+
+SEC("socket")
+__failure __msg("cannot pass map_type 1 into func bpf_skb_storage_delete")
+int wrong_map_type_on_delete(struct __sk_buff *ctx)
+{
+ (void)bpf_skb_storage_delete((struct bpf_map *)&hash_map,
+ bpf_cast_to_kern_ctx(ctx));
+ return 0;
+}
+
+static __always_inline int call_skb_storage_kfuncs(struct sk_buff *skb)
+{
+ struct bpf_map *map = (typeof(map))&storage_map;
+
+ (void)bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ (void)bpf_skb_storage_delete(map, skb);
+ return 0;
+}
+
+SEC("socket")
+__success
+int access_socket_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("classifier")
+__success
+int access_tc_cls_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("action")
+__success
+int access_tc_act_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("cgroup_skb/egress")
+__success
+int access_cgroup_skb_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("sockops")
+__success
+int access_sockops_prog(struct bpf_sock_ops *ctx)
+{
+ struct bpf_sock_ops_kern *kctx = bpf_cast_to_kern_ctx(ctx);
+ struct sk_buff *skb = kctx->skb;
+
+ return skb ? call_skb_storage_kfuncs(skb) : 0;
+}
+
+SEC("sk_skb")
+__success
+int access_sk_skb_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("lwt_in")
+__success
+int access_lwt_in_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("lwt_out")
+__success
+int access_lwt_out_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("lwt_seg6local")
+__success
+int access_lwt_seg6local_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("lwt_xmit")
+__success
+int access_lwt_xmit_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("netfilter")
+__success
+int access_netfilter_prog(struct bpf_nf_ctx *ctx)
+{
+ return call_skb_storage_kfuncs(ctx->skb);
+}
+
+SEC("struct_ops/bpf_fq_enqueue")
+__success
+int BPF_PROG(access_struct_ops_prog, struct sk_buff *skb, struct Qdisc *sch,
+ struct bpf_sk_buff_ptr *to_free)
+{
+ call_skb_storage_kfuncs(skb);
+ bpf_qdisc_skb_drop(skb, to_free);
+ return 0;
+}
+
+SEC(".struct_ops")
+struct Qdisc_ops qdisc_ops = {
+ .enqueue = (void *)access_struct_ops_prog,
+ .id = "qdisc_ops",
+};
+
+SEC("lsm/inet_conn_established")
+__success
+int BPF_PROG(access_lsm_prog, struct sock *sk, struct sk_buff *skb)
+{
+ return call_skb_storage_kfuncs(skb);
+}
+
+SEC("tp_btf/kfree_skb")
+__success
+int BPF_PROG(access_tracing_raw_tp_prog, struct sk_buff *skb, void *location,
+ enum skb_drop_reason reason)
+{
+ return call_skb_storage_kfuncs(skb);
+}
+
+SEC("sk_reuseport")
+__failure /* FIXME */
+int access_sk_reuseport_prog(struct sk_reuseport_md *ctx)
+{
+ struct sk_reuseport_kern *kctx = bpf_cast_to_kern_ctx(ctx);
+
+ return call_skb_storage_kfuncs(kctx->skb);
+}
+
+SEC("flow_dissector")
+__failure /* FIXME */
+int access_flow_dissector_prog(struct __sk_buff *ctx)
+{
+ return call_skb_storage_kfuncs(bpf_cast_to_kern_ctx(ctx));
+}
+
+SEC("lsm.s/inet_conn_established")
+__failure /* FIXME */
+int BPF_PROG(access_lsm_sleepable_prog, struct sock *sk, struct sk_buff *skb)
+{
+ return call_skb_storage_kfuncs(skb);
+}
+
+SEC("fentry/kfree_skb")
+__failure
+int BPF_PROG(access_tracing_fentry_prog, struct sk_buff *skb, void *location,
+ enum skb_drop_reason reason)
+{
+ return call_skb_storage_kfuncs(skb);
+}
--
2.43.0
* [PATCH RFC bpf-next 5/5] selftests/bpf: Add functional tests for skb local storage
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team
Exercise skb local storage across program types and hook boundaries.
1. skb_storage_ops: Unit test for the kfunc API - create, read,
delete, and re-create with an initial value. Runs as a socket
filter via BPF_PROG_TEST_RUN.
2. tc_clone_redirect: Store from TC egress on lo, bpf_clone_redirect
to a dummy device, read from TC egress on the dummy - verifies
storage is not carried over to the clone (BPF_F_CLONE is not
supported).
3. tc_ingress_to_cgrp_ingress: Store from TC ingress, read from
cgroup/skb ingress - verifies storage survives across hooks on
the same skb.
4. tc_ingress_to_sk_filter: Store from TC ingress, read from a
socket filter - verifies storage is accessible from SO_ATTACH_BPF.
5. cgrp_egress_to_tp_kfree_skb: Store from cgroup/skb egress, read
from tp_btf/kfree_skb after a blackhole qdisc drops the packet -
verifies storage survives until skb free.
6. tc_ingress_to_lsm_inet_conn_estab: Store from TC ingress, read
from LSM inet_conn_established hook - verifies storage is
accessible from LSM programs.
7. tc_ingress_to_skops_passive_estab: Store from TC ingress, read
from sock_ops BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB via
bpf_sock_ops_kern.skb - verifies the sock_ops access path.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
.../testing/selftests/bpf/prog_tests/skb_storage.c | 405 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/skb_storage.c | 312 ++++++++++++++++
2 files changed, 717 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/skb_storage.c b/tools/testing/selftests/bpf/prog_tests/skb_storage.c
new file mode 100644
index 000000000000..32b5669de9f5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/skb_storage.c
@@ -0,0 +1,405 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare, Inc. */
+
+#include <test_progs.h>
+#include <network_helpers.h>
+#include "skb_storage.skel.h"
+
+#define CGROUP_PATH "/skb_storage"
+#define DUMMY_DEV "dum0"
+#define IFINDEX_LO 1
+#define MAGIC_VALUE 0xdeadbeef
+#define TEST_PORT 4242
+
+static int send_udp_packet(__be16 port)
+{
+ struct sockaddr_in addr = {
+ .sin_family = AF_INET,
+ .sin_port = port,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ };
+ int fd, ret = -1;
+
+ fd = socket(AF_INET, SOCK_DGRAM, 0);
+ if (fd < 0)
+ return -1;
+
+ ret = sendto(fd, "x", 1, 0, (struct sockaddr *)&addr, sizeof(addr));
+ if (ret < 0)
+ goto out;
+
+ ret = 0;
+out:
+ close(fd);
+ return ret;
+}
+
+static int recv_udp_packet(int server_fd)
+{
+ char buf[64];
+ struct sockaddr_in addr;
+ socklen_t len = sizeof(addr);
+
+ return recvfrom(server_fd, buf, sizeof(buf), 0,
+ (struct sockaddr *)&addr, &len);
+}
+
+static void test_skb_storage_ops(struct skb_storage *skel)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts,
+ .data_in = &pkt_v4,
+ .data_size_in = sizeof(pkt_v4),
+ );
+ int err;
+
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.skb_storage_ops_test),
+ &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(skel->bss->test_result, 0, "test_result");
+}
+
+static void test_tc_clone_redirect(struct skb_storage *skel)
+{
+ const __be16 test_port = htons(TEST_PORT);
+ struct bpf_link *dummy_link = NULL;
+ struct bpf_link *lo_link = NULL;
+ struct netns_obj *netns;
+ int dummy_ifindex;
+
+ netns = netns_new(__func__, true);
+ if (!ASSERT_OK_PTR(netns, "netns_new"))
+ goto cleanup;
+
+ SYS(cleanup, "ip link add " DUMMY_DEV " type dummy");
+ SYS(cleanup, "ip link set " DUMMY_DEV " up");
+
+ dummy_ifindex = if_nametoindex(DUMMY_DEV);
+ if (!ASSERT_GT(dummy_ifindex, 0, "dummy_ifindex"))
+ goto cleanup;
+
+ skel->bss->target_port = test_port;
+ skel->bss->redirect_ifindex = dummy_ifindex;
+
+ lo_link = bpf_program__attach_tcx(skel->progs.tc_clone_redirect_store,
+ IFINDEX_LO, NULL);
+ if (!ASSERT_OK_PTR(lo_link, "attach_ingress"))
+ goto cleanup;
+
+ dummy_link = bpf_program__attach_tcx(skel->progs.tc_clone_redirect_load,
+ dummy_ifindex, NULL);
+ if (!ASSERT_OK_PTR(dummy_link, "attach_egress"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->redir_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ if (!ASSERT_OK(send_udp_packet(test_port), "send_udp"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->store_seen, 1, "store_seen");
+ ASSERT_EQ(skel->bss->redir_seen, 1, "redir_seen");
+ ASSERT_EQ(skel->bss->load_seen, 0, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, 0, "load_value");
+
+cleanup:
+ bpf_link__destroy(dummy_link);
+ bpf_link__destroy(lo_link);
+ netns_free(netns);
+}
+
+static void test_tc_ingress_to_cgrp_ingress(struct skb_storage *skel)
+{
+ struct bpf_link *tc_link = NULL;
+ struct bpf_link *cg_link = NULL;
+ int cgroup_fd = -1;
+ int server_fd = -1;
+ int port;
+
+ cgroup_fd = test__join_cgroup(CGROUP_PATH);
+ if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+ return;
+
+ server_fd = start_server(AF_INET, SOCK_DGRAM, "127.0.0.1", 0, 0);
+ if (!ASSERT_GE(server_fd, 0, "start_server"))
+ goto cleanup;
+
+ port = get_socket_local_port(server_fd);
+ if (!ASSERT_GE(port, 0, "get_port"))
+ goto cleanup;
+
+ skel->bss->target_port = port;
+
+ tc_link = bpf_program__attach_tcx(skel->progs.tc_ingress_store,
+ IFINDEX_LO, NULL);
+ if (!ASSERT_OK_PTR(tc_link, "attach_tc"))
+ goto cleanup;
+
+ cg_link = bpf_program__attach_cgroup(skel->progs.cgrp_ingress_load,
+ cgroup_fd);
+ if (!ASSERT_OK_PTR(cg_link, "attach_cgroup"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ if (!ASSERT_OK(send_udp_packet(port), "send_udp"))
+ goto cleanup;
+
+ if (!ASSERT_GE(recv_udp_packet(server_fd), 0, "recv_udp"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->store_seen, 1, "store_seen");
+ ASSERT_EQ(skel->bss->load_seen, 1, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, MAGIC_VALUE, "load_value");
+
+cleanup:
+ bpf_link__destroy(cg_link);
+ bpf_link__destroy(tc_link);
+ if (server_fd >= 0)
+ close(server_fd);
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
+}
+
+static void test_tc_ingress_to_sk_filter(struct skb_storage *skel)
+{
+ struct bpf_link *tc_link = NULL;
+ int server_fd = -1;
+ int filter_fd = -1;
+ int port;
+ int ret;
+
+ server_fd = start_server(AF_INET, SOCK_DGRAM, "127.0.0.1", 0, 0);
+ if (!ASSERT_GE(server_fd, 0, "start_server"))
+ goto cleanup;
+
+ port = get_socket_local_port(server_fd);
+ if (!ASSERT_GE(port, 0, "get_port"))
+ goto cleanup;
+
+ skel->bss->target_port = port;
+
+ tc_link = bpf_program__attach_tcx(skel->progs.tc_ingress_store,
+ IFINDEX_LO, NULL);
+ if (!ASSERT_OK_PTR(tc_link, "attach_tcx"))
+ goto cleanup;
+
+ filter_fd = bpf_program__fd(skel->progs.sk_filter_load);
+ if (!ASSERT_GE(filter_fd, 0, "get_prog_fd"))
+ goto cleanup;
+
+ ret = setsockopt(server_fd, SOL_SOCKET, SO_ATTACH_BPF,
+ &filter_fd, sizeof(filter_fd));
+ if (!ASSERT_OK(ret, "attach_socket_filter"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ if (!ASSERT_OK(send_udp_packet(port), "send_udp"))
+ goto cleanup;
+
+ if (!ASSERT_GE(recv_udp_packet(server_fd), 0, "recv_udp"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->store_seen, 1, "store_seen");
+ ASSERT_EQ(skel->bss->load_seen, 1, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, MAGIC_VALUE, "load_value");
+
+cleanup:
+ bpf_link__destroy(tc_link);
+ if (server_fd >= 0)
+ close(server_fd);
+}
+
+static void test_cgrp_egress_to_tp_kfree_skb(struct skb_storage *skel)
+{
+ struct bpf_link *cg_link = NULL;
+ struct bpf_link *tp_link = NULL;
+ struct netns_obj *netns = NULL;
+ int cgroup_fd = -1;
+ __be16 port;
+
+ cgroup_fd = test__join_cgroup(CGROUP_PATH);
+ if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+ return;
+
+ netns = netns_new(__func__, true);
+ if (!ASSERT_OK_PTR(netns, "netns_new"))
+ goto cleanup;
+
+ port = htons(TEST_PORT);
+ skel->bss->target_port = port;
+
+ cg_link = bpf_program__attach_cgroup(skel->progs.cgrp_egress_store,
+ cgroup_fd);
+ if (!ASSERT_OK_PTR(cg_link, "attach_cgroup"))
+ goto cleanup;
+
+ tp_link = bpf_program__attach_trace(skel->progs.tp_kfree_skb_load);
+ if (!ASSERT_OK_PTR(tp_link, "attach_tp"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ SYS(cleanup, "tc qdisc add dev lo root handle 1:0 blackhole");
+
+ if (!ASSERT_OK(send_udp_packet(port), "send_udp"))
+ goto cleanup;
+
+ ASSERT_EQ(skel->bss->store_seen, 1, "store_seen");
+ ASSERT_EQ(skel->bss->load_seen, 1, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, MAGIC_VALUE, "load_value");
+
+cleanup:
+ bpf_link__destroy(tp_link);
+ bpf_link__destroy(cg_link);
+ netns_free(netns);
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
+}
+
+static void test_tc_ingress_to_lsm_inet_conn_estab(struct skb_storage *skel)
+{
+ struct bpf_link *lsm_link = NULL;
+ struct bpf_link *tc_link = NULL;
+ int server_fd = -1;
+ int client_fd = -1;
+ int conn_fd = -1;
+ int port;
+
+ server_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, 0);
+ if (!ASSERT_GE(server_fd, 0, "start_server"))
+ goto cleanup;
+
+ port = get_socket_local_port(server_fd);
+ if (!ASSERT_GE(port, 0, "get_port"))
+ goto cleanup;
+
+ skel->bss->target_port = port;
+
+ tc_link = bpf_program__attach_tcx(skel->progs.tc_ingress_store, IFINDEX_LO, NULL);
+ if (!ASSERT_OK_PTR(tc_link, "attach_tcx"))
+ goto cleanup;
+
+ lsm_link = bpf_program__attach_lsm(skel->progs.lsm_inet_conn_estab_load);
+ if (!ASSERT_OK_PTR(lsm_link, "attach_lsm"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ client_fd = connect_to_fd(server_fd, 0);
+ if (!ASSERT_GE(client_fd, 0, "connect"))
+ goto cleanup;
+
+ conn_fd = accept(server_fd, NULL, NULL);
+ if (!ASSERT_GE(conn_fd, 0, "accept"))
+ goto cleanup;
+ close(conn_fd);
+
+ ASSERT_GT(skel->bss->store_seen, 0, "store_seen");
+ ASSERT_GT(skel->bss->load_seen, 0, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, MAGIC_VALUE, "load_value");
+
+cleanup:
+ bpf_link__destroy(lsm_link);
+ bpf_link__destroy(tc_link);
+ if (client_fd >= 0)
+ close(client_fd);
+ if (server_fd >= 0)
+ close(server_fd);
+}
+
+static void test_tc_ingress_to_skops_passive_estab(struct skb_storage *skel)
+{
+ struct bpf_link *cg_link = NULL;
+ struct bpf_link *tc_link = NULL;
+ int cgroup_fd = -1;
+ int server_fd = -1;
+ int client_fd = -1;
+ int conn_fd = -1;
+ int port;
+
+ cgroup_fd = test__join_cgroup(CGROUP_PATH);
+ if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup"))
+ return;
+
+ server_fd = start_server(AF_INET, SOCK_STREAM, "127.0.0.1", 0, 0);
+ if (!ASSERT_GE(server_fd, 0, "start_server"))
+ goto cleanup;
+
+ port = get_socket_local_port(server_fd);
+ if (!ASSERT_GE(port, 0, "get_port"))
+ goto cleanup;
+
+ skel->bss->target_port = port;
+
+ tc_link = bpf_program__attach_tcx(skel->progs.tc_ingress_store, IFINDEX_LO, NULL);
+ if (!ASSERT_OK_PTR(tc_link, "attach_tcx"))
+ goto cleanup;
+
+ cg_link = bpf_program__attach_cgroup(skel->progs.skops_passive_estab_load, cgroup_fd);
+ if (!ASSERT_OK_PTR(cg_link, "attach_cgroup"))
+ goto cleanup;
+
+ skel->bss->store_seen = 0;
+ skel->bss->load_seen = 0;
+ skel->bss->load_value = 0;
+
+ client_fd = connect_to_fd(server_fd, 0);
+ if (!ASSERT_GE(client_fd, 0, "connect"))
+ goto cleanup;
+
+ conn_fd = accept(server_fd, NULL, NULL);
+ if (!ASSERT_GE(conn_fd, 0, "accept"))
+ goto cleanup;
+ close(conn_fd);
+
+ ASSERT_GT(skel->bss->store_seen, 0, "store_seen");
+ ASSERT_GT(skel->bss->load_seen, 0, "load_seen");
+ ASSERT_EQ(skel->bss->load_value, MAGIC_VALUE, "load_value");
+
+cleanup:
+ bpf_link__destroy(cg_link);
+ bpf_link__destroy(tc_link);
+ if (client_fd >= 0)
+ close(client_fd);
+ if (server_fd >= 0)
+ close(server_fd);
+ if (cgroup_fd >= 0)
+ close(cgroup_fd);
+}
+
+void test_skb_storage(void)
+{
+ struct skb_storage *skel;
+
+ skel = skb_storage__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ if (test__start_subtest("skb_storage_ops"))
+ test_skb_storage_ops(skel);
+ if (test__start_subtest("tc_clone_redirect"))
+ test_tc_clone_redirect(skel);
+ if (test__start_subtest("tc_ingress_to_cgrp_ingress"))
+ test_tc_ingress_to_cgrp_ingress(skel);
+ if (test__start_subtest("tc_ingress_to_sk_filter"))
+ test_tc_ingress_to_sk_filter(skel);
+ if (test__start_subtest("cgrp_egress_to_tp_kfree_skb"))
+ test_cgrp_egress_to_tp_kfree_skb(skel);
+ if (test__start_subtest("tc_ingress_to_lsm_inet_conn_estab"))
+ test_tc_ingress_to_lsm_inet_conn_estab(skel);
+ if (test__start_subtest("tc_ingress_to_skops_passive_estab"))
+ test_tc_ingress_to_skops_passive_estab(skel);
+
+ skb_storage__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/skb_storage.c b/tools/testing/selftests/bpf/progs/skb_storage.c
new file mode 100644
index 000000000000..57628a6d26ae
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/skb_storage.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Cloudflare, Inc. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_tracing.h>
+#include <errno.h>
+
+#define fallthrough __attribute__((__fallthrough__))
+
+#define ETH_P_IP 0x0800
+#define ETH_HLEN 14
+#define MAGIC_VALUE 0xdeadbeef
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SKB_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, __u32);
+} skb_storage SEC(".maps");
+
+volatile __be16 target_port;
+volatile __u32 redirect_ifindex;
+
+volatile int test_result;
+volatile int store_seen;
+volatile int redir_seen;
+volatile int load_seen;
+volatile int load_value;
+
+enum {
+ CG_DROP = 0,
+ CG_PASS,
+};
+
+enum layer { L2, L3, L4 };
+
+static bool is_test_packet(struct __sk_buff *skb, enum layer layer)
+{
+ __u32 off = 0;
+ __be16 src;
+ __be16 dst;
+ __u8 ihl;
+
+ if (skb->protocol != bpf_htons(ETH_P_IP))
+ return false;
+
+ switch (layer) {
+ case L2:
+ off += ETH_HLEN;
+ fallthrough;
+ case L3:
+ if (bpf_skb_load_bytes(skb, off, &ihl, 1))
+ return false;
+ off += (ihl & 0xf) * 4;
+ fallthrough;
+ case L4:
+ if (bpf_skb_load_bytes(skb, off, &src, 2))
+ return false;
+ if (bpf_skb_load_bytes(skb, off + 2, &dst, 2))
+ return false;
+ }
+
+ return src == target_port || dst == target_port;
+}
+
+SEC("socket")
+int skb_storage_ops_test(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 init_value = MAGIC_VALUE;
+ __u32 *value;
+ int ret;
+
+ /* Get non-existent storage */
+ test_result = 1;
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (value)
+ goto out;
+
+ /* Create storage and write to it */
+ test_result = 2;
+ value = bpf_skb_storage_get(map, skb, NULL, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!value)
+ goto out;
+ if (*value)
+ goto out;
+ *value = MAGIC_VALUE;
+
+ /* Get existing storage and read from it */
+ test_result = 3;
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+ if (*value != MAGIC_VALUE)
+ goto out;
+
+ /* Delete existing storage */
+ test_result = 4;
+ ret = bpf_skb_storage_delete(map, skb);
+ if (ret)
+ goto out;
+
+ /* Delete non-existent storage */
+ test_result = 5;
+ ret = bpf_skb_storage_delete(map, skb);
+ if (ret != -ENOENT)
+ goto out;
+
+ /* Re-create storage with initial value */
+ test_result = 6;
+ value = bpf_skb_storage_get(map, skb, &init_value, sizeof(init_value),
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!value)
+ goto out;
+ if (*value != MAGIC_VALUE)
+ goto out;
+
+ test_result = 0;
+out:
+ return ctx->len;
+}
+
+SEC("tcx/egress")
+int tc_clone_redirect_store(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!redirect_ifindex)
+ goto out;
+ if (!is_test_packet(ctx, L2))
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!value)
+ goto out;
+
+ *value = MAGIC_VALUE;
+ store_seen++;
+
+ bpf_clone_redirect(ctx, redirect_ifindex, 0);
+out:
+ return TCX_DROP;
+}
+
+SEC("tcx/egress")
+int tc_clone_redirect_load(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!is_test_packet(ctx, L2))
+ goto out;
+
+ redir_seen++;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return TCX_DROP;
+}
+
+SEC("tcx/ingress")
+int tc_ingress_store(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!is_test_packet(ctx, L2))
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!value)
+ goto out;
+
+ *value = MAGIC_VALUE;
+ store_seen++;
+out:
+ return TCX_PASS;
+}
+
+SEC("cgroup_skb/ingress")
+int cgrp_ingress_load(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!is_test_packet(ctx, L3))
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return CG_PASS;
+}
+
+SEC("cgroup_skb/egress")
+int cgrp_egress_store(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!is_test_packet(ctx, L3))
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!value)
+ goto out;
+
+ *value = MAGIC_VALUE;
+ store_seen++;
+out:
+ return CG_PASS;
+}
+
+SEC("socket")
+int sk_filter_load(struct __sk_buff *ctx)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
+ __u32 *value;
+
+ if (!is_test_packet(ctx, L4))
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return skb->len;
+}
+
+SEC("tp_btf/kfree_skb")
+int BPF_PROG(tp_kfree_skb_load, struct sk_buff *skb, void *location,
+ enum skb_drop_reason reason)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ __u32 *value;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return 0;
+}
+
+SEC("lsm/inet_conn_established")
+int BPF_PROG(lsm_inet_conn_estab_load, struct sock *sk, struct sk_buff *skb)
+{
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ __u32 *value;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return 0;
+}
+
+SEC("sockops")
+int skops_passive_estab_load(struct bpf_sock_ops *ctx)
+{
+ struct bpf_sock_ops_kern *kctx = bpf_cast_to_kern_ctx(ctx);
+ struct bpf_map *map = (typeof(map))&skb_storage;
+ struct sk_buff *skb = kctx->skb;
+ __u32 *value;
+
+ if (ctx->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
+ goto out;
+ if (!skb)
+ goto out;
+
+ value = bpf_skb_storage_get(map, skb, NULL, 0, 0);
+ if (!value)
+ goto out;
+
+ load_value = *value;
+ load_seen++;
+out:
+ return CG_PASS;
+}
--
2.43.0
* Re: [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
2026-02-26 21:12 [PATCH RFC bpf-next 0/5] skb extension for BPF local storage Jakub Sitnicki
` (4 preceding siblings ...)
2026-02-26 21:12 ` [PATCH RFC bpf-next 5/5] selftests/bpf: Add functional " Jakub Sitnicki
@ 2026-02-26 21:56 ` Alexei Starovoitov
2026-02-27 20:11 ` Jakub Sitnicki
5 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2026-02-26 21:56 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: bpf, Jakub Kicinski, Martin KaFai Lau, Network Development,
kernel-team
On Thu, Feb 26, 2026 at 1:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Previously we have attempted to allow BPF users to attach tens of bytes of
> arbitrary data to packets by making XDP/skb metadata area persist across
> netstack layers [1].
>
> This approach turned out to be unsuccessful. It would require us to
> restrict the layout of skb headroom and patch call sites which modify the
> headroom by pushing/pulling the skb->data.
>
> As per Jakub's feedback [2] we're turning our attention to skb extensions
> as the new vehicle for passing BPF metadata. skb extensions avoid these
> problems by being a separate, opt-in side allocation that doesn't interfere
> with skb headroom layout.
>
> With the switch to skb extensions, we are no longer restricted by the
> features of XDP metadata, and hence we propose to extend the concept of BPF
> local storage to socket buffers - skb local storage.
>
> BPF local storage is an established pattern of attaching arbitrary data
> from BPF context to various common kernel entities (sk, task, cgroup,
> inode).
And that list of local storages ends with a solid period.
We're not going to add new local storages.
Not for skb and not for anything else.
We rejected it for cred, bdev and other things.
The path forward for such "local storage" like use cases is
to optimize hash, trie, rhashtable, whatever map, so
it's super fast for key == sizeof(void *) and use that
when you need it.
The life cycle of skb already has a tracepoint in the free path.
So do map_update(key=skb, ...) when you need to create such "skb local storage"
and free it from trace_consume/kfree_skb.
Potentially we can add a tracepoint in alloc_skb,
so bpf prog can alloc "skb local storage" there,
and to clone skb, so you can track the storage through clones
if you need to.
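The suggested workaround can be sketched as a pair of BPF programs (a rough sketch only: the map type, sizing, and program names are assumptions, and a matching tp_btf/consume_skb program would be needed alongside kfree_skb, since the latter only covers dropped packets):

```c
/* Emulate "skb local storage" with a hashmap keyed by the skb pointer,
 * freed from the skb free-path tracepoints. Illustrative sketch only.
 */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 400000);	/* must cover worst-case in-flight skbs */
	__type(key, __u64);		/* skb pointer used as an opaque key */
	__type(value, __u32);
} skb_meta SEC(".maps");

SEC("tcx/ingress")
int store_meta(struct __sk_buff *ctx)
{
	struct sk_buff *skb = bpf_cast_to_kern_ctx(ctx);
	__u64 key = (__u64)skb;
	__u32 val = 0xdeadbeef;

	/* Attach per-packet data: key the map entry by the skb pointer */
	bpf_map_update_elem(&skb_meta, &key, &val, BPF_ANY);
	return TCX_PASS;
}

SEC("tp_btf/kfree_skb")
int BPF_PROG(free_meta_drop, struct sk_buff *skb)
{
	__u64 key = (__u64)skb;

	/* Drop the entry when the skb is freed on the drop path */
	bpf_map_delete_elem(&skb_meta, &key);
	return 0;
}
```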
* Re: [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
2026-02-26 21:56 ` [PATCH RFC bpf-next 0/5] skb extension for BPF " Alexei Starovoitov
@ 2026-02-27 20:11 ` Jakub Sitnicki
2026-02-28 23:50 ` Jakub Kicinski
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Sitnicki @ 2026-02-27 20:11 UTC (permalink / raw)
To: Alexei Starovoitov, Jakub Kicinski
Cc: bpf, Martin KaFai Lau, Network Development, kernel-team
On Thu, Feb 26, 2026 at 01:56 PM -08, Alexei Starovoitov wrote:
> On Thu, Feb 26, 2026 at 1:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Previously we have attempted to allow BPF users to attach tens of bytes of
>> arbitrary data to packets by making XDP/skb metadata area persist across
>> netstack layers [1].
>>
>> This approach turned out to be unsuccessful. It would require us to
>> restrict the layout of skb headroom and patch call sites which modify the
>> headroom by pushing/pulling the skb->data.
>>
>> As per Jakub's feedback [2] we're turning our attention to skb extensions
>> as the new vehicle for passing BPF metadata. skb extensions avoid these
>> problems by being a separate, opt-in side allocation that doesn't interfere
>> with skb headroom layout.
>>
>> With the switch to skb extensions, we are no longer restricted by the
>> features of XDP metadata, and hence we propose to extend the concept of BPF
>> local storage to socket buffers - skb local storage.
>>
>> BPF local storage is an established pattern of attaching arbitrary data
>> from BPF context to various common kernel entities (sk, task, cgroup,
>> inode).
>
> And that list of local storages ends with a solid period.
> We're not going to add new local storages.
> Not for skb and not for anything else.
> We rejected it for cred, bdev and other things.
Thanks for the concrete feedback. I appreciate it.
This saves us from going down a dead-end road.
> The path forward for such "local storage" like use cases is
> to optimize hash, trie, rhashtable, whatever map, so
> it's super fast for key == sizeof(void *) and use that
> when you need it.
> The life cycle of skb already has a tracepoint in the free path.
> So do map_update(key=skb, ...) when you need to create such "skb local storage"
> and free it from trace_consume/kfree_skb.
> Potentially we can add a tracepoint in alloc_skb,
> so bpf prog can alloc "skb local storage" there,
> and to clone skb, so you can track the storage through clones
> if you need to.
That is similar to the workaround we have in place (mentioned at LPC
[1]). And it was always our "plan C" to string it together with BPF
maps. But we wanted to go this way only as a last resort because:
1) consume_skb is a very frequent event spread across all CPUs
On the happy path it gets hit 1M+ times/second, by every kind of skb
(UNIX, Netlink), not just the ones we care about. Even if we can keep the
runtime overhead low, that's wasted effort and a potential source of
cache-line bouncing across CPUs.
$ sudo perf stat -a -e skb:consume_skb -e skb:kfree_skb -- sleep 1

 Performance counter stats for 'system wide':

         1,132,924      skb:consume_skb
           410,186      skb:kfree_skb

       1.034636263 seconds time elapsed
$
2) Sizing the "skb storage" maps is tricky
We need to size for the worst case, but the worst case is
workload-dependent and can change at runtime. IOW, predicting the
in-flight skb count is hard to get right. Skbs sit queued in TCP
retransmit queues and qdisc backlogs, so we'd need to factor in RTT and
queue depth to estimate the skb lifetime.
We'd probably have to arrive at the "right size" empirically.
So to exhaust all alternatives I gotta ask - would you and Jakub be open
to the idea of a plain byte buffer embedded in skb_ext and exposed as a
bpf_dynptr?
#define BPF_SKB_META_DATA_SIZE 64 /* make it build-time configurable */
struct bpf_skb_meta_ext {
char data[BPF_SKB_META_DATA_SIZE] __aligned(8);
};
Perhaps by reusing the existing bpf_dynptr_from_skb_meta to give access
to a "secondary metadata" storage backed by skb_ext.
bpf_dynptr_from_skb_meta(ctx, BPF_DYNPTR_SKB_EXT_F, &meta);
To be fair, the whole BPF local storage approach was never suggested by
Jakub, only skb extensions. That missed idea is on me.
IOW, what I'm wondering is if you're against a side storage in skb_ext
in general or just plugging BPF local storage there in particular?
Thanks,
-jkbs
[1] slides 57, 62 in https://lpc.events/event/19/contributions/2269/
* Re: [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
2026-02-27 20:11 ` Jakub Sitnicki
@ 2026-02-28 23:50 ` Jakub Kicinski
2026-03-01 17:59 ` Jakub Sitnicki
0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2026-02-28 23:50 UTC (permalink / raw)
To: Jakub Sitnicki, Paolo Abeni
Cc: Alexei Starovoitov, bpf, Martin KaFai Lau, Network Development,
kernel-team
On Fri, 27 Feb 2026 21:11:05 +0100 Jakub Sitnicki wrote:
> So to exhaust all alternatives I gotta ask - would you and Jakub be open
> to the idea of a plain byte buffer embedded in skb_ext and exposed as a
> bpf_dynptr?
I'm fine with that, but admittedly I (like you?) live in
the comfortable universe of controlling which .config options
are suitable for the narrow use cases I care about.
So I don't really trust my judgment. Adding Paolo..
* Re: [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
2026-02-28 23:50 ` Jakub Kicinski
@ 2026-03-01 17:59 ` Jakub Sitnicki
0 siblings, 0 replies; 10+ messages in thread
From: Jakub Sitnicki @ 2026-03-01 17:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Paolo Abeni, Alexei Starovoitov, bpf, Martin KaFai Lau,
Network Development, kernel-team
On Sat, Feb 28, 2026 at 03:50 PM -08, Jakub Kicinski wrote:
> On Fri, 27 Feb 2026 21:11:05 +0100 Jakub Sitnicki wrote:
>> So to exhaust all alternatives I gotta ask - would you and Jakub be open
>> to the idea of a plain byte buffer embedded in skb_ext and exposed as a
>> bpf_dynptr?
>
> I'm fine with that, but admittedly I (like you?) live in
> the comfortable universe of controlling which .config options
> are suitable for the narrow use cases I care about.
> So I don't really trust my judgment. Adding Paolo..
Good point. Distros would have to pick a one-size-fits-all value, which
won't work. This needs to be at least boot-time configurable, which looks
doable with minimal changes to the skb_ext setup code.
Thanks for the feedback.