From: Jakub Sitnicki <jakub@cloudflare.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Jakub Kicinski <kuba@kernel.org>
Cc: bpf <bpf@vger.kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Network Development <netdev@vger.kernel.org>,
kernel-team <kernel-team@cloudflare.com>
Subject: Re: [PATCH RFC bpf-next 0/5] skb extension for BPF local storage
Date: Fri, 27 Feb 2026 21:11:05 +0100
Message-ID: <87wlzydk12.fsf@cloudflare.com>
In-Reply-To: <CAADnVQKVfyh3_OZshvYf7GJUF-ph2eMfmaQsxNgwBJd1AJgXTQ@mail.gmail.com> (Alexei Starovoitov's message of "Thu, 26 Feb 2026 13:56:12 -0800")
On Thu, Feb 26, 2026 at 01:56 PM -08, Alexei Starovoitov wrote:
> On Thu, Feb 26, 2026 at 1:16 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Previously we have attempted to allow BPF users to attach tens of bytes of
>> arbitrary data to packets by making XDP/skb metadata area persist across
>> netstack layers [1].
>>
>> This approach turned out to be unsuccessful. It would require us to
>> restrict the layout of skb headroom and patch call sites which modify the
>> headroom by pushing/pulling the skb->data.
>>
>> As per Jakub's feedback [2] we're turning our attention to skb extensions
>> as the new vehicle for passing BPF metadata. skb extensions avoid these
>> problems by being a separate, opt-in side allocation that doesn't interfere
>> with skb headroom layout.
>>
>> With the switch to skb extensions, we are no longer restricted by the
>> features of XDP metadata, and hence we propose to extend the concept of BPF
>> local storage to socket buffers - skb local storage.
>>
>> BPF local storage is an established pattern of attaching arbitrary data
>> from BPF context to various common kernel entities (sk, task, cgroup,
>> inode).
>
> And that list of local storages ends with a solid period.
> We're not going to add new local storages.
> Not for skb and not for anything else.
> We rejected it for cred, bdev and other things.
Thanks for the concrete feedback. I appreciate it.
This saves us from going down a dead-end road.
> The path forward for such "local storage" like use cases is
> to optimize hash, trie, rhashtable, whatever map, so
> it's super fast for key == sizeof(void *) and use that
> when you need it.
> The life cycle of skb already has a tracepoint in the free path.
> So do map_update(key=skb, ...) when you need to create such "skb local storage"
> and free it from trace_consume/kfree_skb.
> Potentially we can add a tracepoint in alloc_skb,
> so bpf prog can alloc "skb local storage" there,
> and to clone skb, so you can track the storage through clones
> if you need to.
That is similar to the workaround we have in place (mentioned at LPC
[1]). And it was always our "plan C" to string it together with BPF
maps. But we wanted to go this way only as a last resort because:
1) consume_skb is a very frequent event spread across all CPUs
As the happy-path free, it gets hit 1M+ times/second, and by every kind
of skb (UNIX, Netlink), not just those we care about. Even if we can keep
the runtime overhead low, that's wasted effort and potential data
bouncing across CPUs.
  $ sudo perf stat -a -e skb:consume_skb -e skb:kfree_skb -- sleep 1
   Performance counter stats for 'system wide':
           1,132,924      skb:consume_skb
             410,186      skb:kfree_skb
         1.034636263 seconds time elapsed
  $
2) Sizing the "skb storage" maps is tricky
We need to size for the worst case, but the worst case is
workload-dependent and can change at runtime. IOW, the in-flight skb
count is hard to predict. We've got skbs sitting in TCP retransmit
queues and qdisc backlogs, and we'd need to factor in RTT and queue
depth to estimate the skb lifetime.
We'd probably have to arrive at the "right size" empirically.
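For comparison, the map-based "plan C" can be modeled in plain userspace C to make both points concrete. This is a sketch, not BPF code: skb_tab, skb_storage_get() and on_skb_free() are hypothetical names standing in for a fixed-size hash map keyed by the skb pointer, a map_update done on first use, and a program attached to the consume_skb/kfree_skb tracepoints. It shows the free hook running for every skb, tracked or not, and the capacity having to be picked up front, after which new entries are refused.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity; in BPF this is max_entries, fixed at map creation. */
#define SKB_TAB_CAP 4

struct skb_storage {
	uint8_t data[64];
};

struct skb_slot {
	uintptr_t key;            /* skb address, 0 marks a free slot */
	struct skb_storage val;
};

static struct skb_slot skb_tab[SKB_TAB_CAP];
static unsigned long free_hook_calls, free_hook_hits;

/* Probe the whole table for a pointer-sized key (key 0 finds a free slot). */
static struct skb_slot *skb_tab_find(uintptr_t key)
{
	/* Low bits of aligned pointers vary little, so skip them for the start slot. */
	size_t start = (size_t)(key >> 6) % SKB_TAB_CAP;

	for (size_t i = 0; i < SKB_TAB_CAP; i++) {
		struct skb_slot *s = &skb_tab[(start + i) % SKB_TAB_CAP];

		if (s->key == key)
			return s;
	}
	return NULL;
}

/* Models map_update(key=skb) on first use; returns NULL once the table is full. */
static struct skb_storage *skb_storage_get(const void *skb)
{
	uintptr_t key = (uintptr_t)skb;
	struct skb_slot *s;

	if (!key)
		return NULL;
	s = skb_tab_find(key);
	if (s)
		return &s->val;
	s = skb_tab_find(0);
	if (!s)
		return NULL;      /* no room left: this skb gets no storage */
	s->key = key;
	memset(&s->val, 0, sizeof(s->val));
	return &s->val;
}

/* Models a prog on consume_skb/kfree_skb: invoked for every freed skb. */
static void on_skb_free(const void *skb)
{
	uintptr_t key = (uintptr_t)skb;
	struct skb_slot *s;

	free_hook_calls++;
	if (!key)
		return;
	s = skb_tab_find(key);
	if (s) {
		free_hook_hits++;
		s->key = 0;       /* release the slot */
	}
}
```

In a real deployment SKB_TAB_CAP becomes max_entries, and choosing it means estimating the number of skbs in flight, roughly the rate of tracked skbs times their mean lifetime (Little's law), which is the RTT- and queue-depth-dependent quantity described above.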
So to exhaust all alternatives I gotta ask - would you and Jakub be open
to the idea of a plain byte buffer embedded in skb_ext and exposed as a
bpf_dynptr?
  #define BPF_SKB_META_DATA_SIZE 64 /* make it build-time configurable */
  struct bpf_skb_meta_ext {
  	char data[BPF_SKB_META_DATA_SIZE] __aligned(8);
  };
Perhaps by reusing the existing bpf_dynptr_from_skb_meta, with a new
flag, to give access to a "secondary metadata" storage backed by skb_ext:
  bpf_dynptr_from_skb_meta(ctx, BPF_DYNPTR_SKB_EXT_F, &meta);
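A userspace model of the intended semantics may help pin the idea down. Everything in it is hypothetical: struct fake_skb, struct fake_dynptr, model_dynptr_from_skb_meta(), the model_dynptr_read/write() accessors (loosely modeled on bpf_dynptr_read/write), the BPF_DYNPTR_SKB_EXT_F bit value, and the lazy allocation on first access, which stands in for what skb_ext_add() would do in the kernel. It illustrates a fixed-size, bounds-checked window into a side buffer whose lifetime follows the skb, with the headroom left untouched.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BPF_SKB_META_DATA_SIZE 64
#define BPF_DYNPTR_SKB_EXT_F (1ULL << 0)  /* hypothetical flag value */

struct bpf_skb_meta_ext {
	_Alignas(8) char data[BPF_SKB_META_DATA_SIZE];
};

/* Stand-in for the parts of struct sk_buff that matter here (hypothetical). */
struct fake_skb {
	unsigned char *meta;            /* XDP-style metadata in the headroom */
	size_t meta_len;
	struct bpf_skb_meta_ext *ext;   /* side allocation, NULL until first use */
};

struct fake_dynptr {
	void *data;
	size_t size;
};

static int model_dynptr_from_skb_meta(struct fake_skb *skb, uint64_t flags,
				      struct fake_dynptr *ptr)
{
	ptr->data = NULL;
	ptr->size = 0;
	if (flags & ~BPF_DYNPTR_SKB_EXT_F)
		return -EINVAL;                 /* reject unknown flag bits */
	if (!(flags & BPF_DYNPTR_SKB_EXT_F)) {
		ptr->data = skb->meta;          /* existing case: headroom metadata */
		ptr->size = skb->meta_len;
		return 0;
	}
	if (!skb->ext) {
		/* Opt-in: only skbs that ask for the buffer pay for it. */
		skb->ext = calloc(1, sizeof(*skb->ext));
		if (!skb->ext)
			return -ENOMEM;
	}
	ptr->data = skb->ext->data;
	ptr->size = sizeof(skb->ext->data);
	return 0;
}

static int model_dynptr_write(struct fake_dynptr *ptr, size_t off,
			      const void *src, size_t len)
{
	if (!ptr->data)
		return -EINVAL;
	if (off > ptr->size || len > ptr->size - off)
		return -E2BIG;                  /* out of bounds */
	memcpy((char *)ptr->data + off, src, len);
	return 0;
}

static int model_dynptr_read(void *dst, size_t len,
			     const struct fake_dynptr *ptr, size_t off)
{
	if (!ptr->data)
		return -EINVAL;
	if (off > ptr->size || len > ptr->size - off)
		return -E2BIG;                  /* out of bounds */
	memcpy(dst, (const char *)ptr->data + off, len);
	return 0;
}

/* Freeing the skb drops the extension with it: no tracepoint hook needed. */
static void model_skb_release(struct fake_skb *skb)
{
	free(skb->ext);
	skb->ext = NULL;
}
```

The design point is that storage lifetime is tied to the skb itself, so there is no map to size and no free-path hook to run for unrelated skbs.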
To be fair, the whole BPF local storage approach was never suggested by
Jakub, only skb extensions. That misguided idea is on me.
IOW, what I'm wondering is whether you're against side storage in skb_ext
in general, or just against plugging BPF local storage into it?
Thanks,
-jkbs
[1] slides 57, 62 in https://lpc.events/event/19/contributions/2269/
Thread overview: 10+ messages
2026-02-26 21:12 [PATCH RFC bpf-next 0/5] skb extension for BPF local storage Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 1/5] bpf: Introduce local storage for sk_buff Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 2/5] bpf: Allow passing kernel context pointer to kfuncs Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 3/5] bpf: Allow access to bpf_sock_ops_kern->skb Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 4/5] selftests/bpf: Add verifier tests for skb local storage Jakub Sitnicki
2026-02-26 21:12 ` [PATCH RFC bpf-next 5/5] selftests/bpf: Add functional " Jakub Sitnicki
2026-02-26 21:56 ` [PATCH RFC bpf-next 0/5] skb extension for BPF " Alexei Starovoitov
2026-02-27 20:11 ` Jakub Sitnicki [this message]
2026-02-28 23:50 ` Jakub Kicinski
2026-03-01 17:59 ` Jakub Sitnicki