* [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
@ 2026-02-26 21:12 Jakub Sitnicki
From: Jakub Sitnicki @ 2026-02-26 21:12 UTC (permalink / raw)
  To: bpf; +Cc: Jakub Kicinski, Martin KaFai Lau, netdev, kernel-team

We previously attempted to let BPF users attach tens of bytes of arbitrary
data to packets by making the XDP/skb metadata area persist across
netstack layers [1].

That approach proved impractical. It would require us to restrict the
layout of the skb headroom and to patch the call sites which modify the
headroom by pushing/pulling skb->data.

As per Jakub's feedback [2] we're turning our attention to skb extensions
as the new vehicle for passing BPF metadata. skb extensions avoid these
problems by being a separate, opt-in side allocation that doesn't interfere
with skb headroom layout.

With the switch to skb extensions, we are no longer restricted by the
features of XDP metadata, and hence we propose to extend the concept of BPF
local storage to socket buffers - skb local storage.

BPF local storage is an established pattern of attaching arbitrary data
from BPF context to various common kernel entities (sk, task, cgroup,
inode). It avoids some of the limitations of XDP metadata, namely:

1. Multiple users can allocate space for their data without the need to
coordinate. BPF local storage solves this by allocating space for each
user's BPF map and its elements separately. This matters when independent
BPF programs owned by different parties (e.g. a traffic policy and an
observability tool) both need to annotate the same packets.

2. Lifetime of metadata is well-defined and can be precisely scoped. By
default, skb local storage is scrubbed on clone, tunnel encap/decap, and
netns crossing - matching the skb extension defaults. In later iterations
we plan to let users relax these defaults through BPF map flags for packet
tracking use cases (see Future Work below).
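
To illustrate the two properties above, here is a toy user-space model in
plain C. It is not the code from this series and not how the kernel
implements BPF local storage. Every name in it (toy_pkt, toy_elem,
toy_storage_get, toy_pkt_clone, toy_storage_free) is hypothetical. It only
sketches the idea that each map gets its own separately allocated element
on the object, and that a clone starts with no storage by default.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the kernel data structures. */
struct toy_elem {
	const void *owner_map;	/* which "map" owns this element */
	struct toy_elem *next;
	size_t size;
	unsigned char data[];	/* value bytes, zeroed on creation */
};

struct toy_pkt {
	struct toy_elem *storage;	/* NULL until a user creates storage */
};

/* Return the value owned by `map` on `pkt`, optionally creating it. */
static void *toy_storage_get(struct toy_pkt *pkt, const void *map,
			     size_t size, int create)
{
	struct toy_elem *e;

	for (e = pkt->storage; e; e = e->next)
		if (e->owner_map == map)
			return e->data;
	if (!create)
		return NULL;

	/* Each user gets a separate allocation; no shared layout to agree on. */
	e = calloc(1, sizeof(*e) + size);
	if (!e)
		return NULL;
	e->owner_map = map;
	e->size = size;
	e->next = pkt->storage;
	pkt->storage = e;
	return e->data;
}

/* Storage lifetime ends with the packet. */
static void toy_storage_free(struct toy_pkt *pkt)
{
	while (pkt->storage) {
		struct toy_elem *e = pkt->storage;

		pkt->storage = e->next;
		free(e);
	}
}

/* Default policy modeled here: storage is scrubbed on clone. */
static struct toy_pkt toy_pkt_clone(const struct toy_pkt *orig)
{
	(void)orig;	/* nothing is copied from the original */
	return (struct toy_pkt){ .storage = NULL };
}
```

Two independent users would each pass the address of their own map object
as the key, so neither needs to know about the other's value size or
layout.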

However, with flexibility also come downsides:

BPF local storage is not allocation-free like the skb->data_meta area.
Creating the storage imposes additional overhead, which translates to skb
processing latency. This is especially painful considering the relatively
short lifetime of sk_buff objects compared to other entities like sockets.

Whether the overhead of this naive skb local storage implementation is
tolerable depends on the pps rate and on whether skb local storage gets
created for every packet or just some of them, for example, when sampling
or tagging the first packet in an L4 connection.

Our initial rough benchmarks on a VM with kernel.bpf_stats_enabled=1 [3]
show that running a tc/ingress prog that creates skb local storage and
writes to it amounts to ~330 nsec of per-packet overhead.  Retrieving skb
local storage and reading from it in a cgroup_skb/ingress hook contributes
an additional ~115 nsec.

Rounding the combined ~445 nsec up to ~500 nsec per packet:
- at 100k pps, that's 5% of the 10 usec per-packet budget, but
- at 1 Mpps, that's already 50% of the 1 usec budget, which is not
  acceptable.
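
The arithmetic behind these percentages can be sketched in a few lines of
C. This assumes, as the figures above imply, that the per-packet budget is
simply one second divided by the packet rate. The helper names
(per_packet_budget_ns, overhead_share_bp) are made up for illustration
and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Time available per packet, in nsec, at a rate of `pps` packets/sec. */
uint64_t per_packet_budget_ns(uint64_t pps)
{
	assert(pps > 0);
	return NSEC_PER_SEC / pps;
}

/*
 * Share of the per-packet budget consumed by `overhead_ns`, in basis
 * points (1 bp = 0.01%), so that fractional percentages survive the
 * integer division.
 */
uint64_t overhead_share_bp(uint64_t overhead_ns, uint64_t pps)
{
	assert(pps > 0);
	return overhead_ns * pps * 10000 / NSEC_PER_SEC;
}
```

With these helpers, overhead_share_bp(500, 100000) evaluates to 500 (5%)
and overhead_share_bp(500, 1000000) to 5000 (50%). Using the unrounded
330 + 115 nsec at 1 Mpps gives 4450 (44.5%).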

While definitely not suitable for high-pps flows, the naive skb local
storage implementation is arguably acceptable at low rates, for example
when you need to attach metadata only to the first packet of a TCP/QUIC
connection or sample packets at very low rates for tracing.

From this initial implementation, fit for the low-pps use cases, we would
like to work towards lowering the overhead to enable use at higher packet
rates as proposed in the LSF/MM/BPF topic [4].

Future work - in the next iterations on the RFC I'm planning to address:

1. skb local storage copying/uncloning when user opts in with BPF_F_CLONE,
2. opt out from scrubbing BPF local storage on tunnel decap/encap,
3. opt out from scrubbing BPF local storage on crossing netns boundary.

Items (2) and (3) are needed to support packet tracking use cases.

With this early posting I'm looking for feedback - is this going in the
direction that aligns with the maintainers' and reviewers' expectations for
the intended use of skb extensions and BPF local storage?

Thanks,
-jkbs

[1] https://lore.kernel.org/all/20260107-skb-meta-safeproof-netdevs-rx-only-v3-0-0d461c5e4764@cloudflare.com/
[2] https://lore.kernel.org/all/20260108174903.59323f72@kernel.org/
[3] https://github.com/jsitnicki/skb-metadata-tests/tree/main/skb-storage-bench
[4] https://msgid.link/87ecmffopy.fsf@cloudflare.com

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Jakub Sitnicki (5):
      bpf: Introduce local storage for sk_buff
      bpf: Allow passing kernel context pointer to kfuncs
      bpf: Allow access to bpf_sock_ops_kern->skb
      selftests/bpf: Add verifier tests for skb local storage
      selftests/bpf: Add functional tests for skb local storage

 include/linux/bpf_types.h                          |   3 +
 include/linux/skbuff.h                             |   3 +
 include/net/bpf_skb_storage.h                      |  21 ++
 include/uapi/linux/bpf.h                           |   1 +
 kernel/bpf/syscall.c                               |   1 +
 kernel/bpf/verifier.c                              |  67 +++-
 net/Kconfig                                        |  10 +
 net/core/Makefile                                  |   1 +
 net/core/bpf_skb_storage.c                         | 264 ++++++++++++++
 net/core/skbuff.c                                  |  15 +
 .../testing/selftests/bpf/prog_tests/skb_storage.c | 405 +++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/verifier.c  |   2 +
 tools/testing/selftests/bpf/progs/skb_storage.c    | 312 ++++++++++++++++
 .../selftests/bpf/progs/verifier_skb_storage.c     | 209 +++++++++++
 14 files changed, 1313 insertions(+), 1 deletion(-)

