From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org, kernel-team@cloudflare.com,
lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] BPF local storage for every packet
Date: Sat, 21 Feb 2026 14:42:10 +0100 [thread overview]
Message-ID: <877bs6fc25.fsf@cloudflare.com> (raw)
In-Reply-To: <5fdee5fd-aff1-4764-820e-3b1f3ad00941@linux.dev> (Martin KaFai Lau's message of "Fri, 20 Feb 2026 10:34:09 -0800")
On Fri, Feb 20, 2026 at 10:34 AM -08, Martin KaFai Lau wrote:
> On 2/20/26 6:56 AM, Jakub Sitnicki wrote:
>> In the upcoming days we are going to post an RFC which proposes to
>> extend the concept of BPF local storage to socket buffers (sk_buff, skb)
>> as means to attach arbitrary metadata to packets from BPF programs [1]
>> (slides 41-55).
>> Design-wise, BPF local storage is a great fit for a packet metadata
>> container, as it avoids some of the shortcomings of the XDP metadata
>> interface:
>> 1. Users interact with storage through BPF maps and can take advantage
>> of existing built-in BPF map types, while still being able to
>> implement a custom data format,
>> 2. Maps within local storage can have different properties controlled by
>> map flags. For example, maps with BPF_F_CLONE set can survive packet
>> cloning. Other flags could allow map contents to survive sk_buff
>> scrubbing during encapsulation/decapsulation or pass across network
>> namespace boundaries.
>> 3. Local storage supports multiple users out of the box - each user
>> creates their own map, eliminating the need to coordinate data
>> layout,
>> 4. Local storage has its own backing memory, so persisting it across
>> network stack layers requires no changes to the network stack.
>> However, this flexibility comes at a cost. While XDP metadata requires
>> no allocations [2], an initial write to BPF local storage requires two:
>> one for bpf_local_storage_elem, and one for bpf_local_storage itself.
>> We would like to align this work with the needs of other BPF local
>> storage users (socks, cgroups, tasks, inodes), where allocation overhead
>> has been a concern as well [2].
>> Optimization ideas we would like to put up for discussion:
>> - slimming down bpf_local_storage so it can be embedded as an skb
>> extension chunk,
>> - making the bpf_local_storage cache size configurable,
>> - allowing bpf_local_storage to be pre-allocated,
>> - co-allocating bpf_local_storage and bpf_local_storage_elem for the
>> single-map case.
>
> The sk/cgroup/task storage has a much longer lifetime. Meaning once allocation
> is done, the storage stays in the sk until the sk is closed. The length of
> lifetime is quite different from the skb. I am afraid we are re-purposing
> bpf_local_storage for a very different use case where skb lifecycle is much
> shorter.
>
> We are planning to increase 'sizeof(struct sock)' for perf reasons. Saving an
> allocation is an upside, but not the major one we are looking for (or care
> about) for sk. We are more interested in cacheline efficiency, and probably in
> removing the need for bpf_local_storage[_elem] altogether if the user chooses
> to use the in-place space of a sk.
>
> If 'sizeof(struct sk_buff)' can be increased, this should align with where sk
> local storage is going. If skb will solely depend on the existing
> bpf_local_storage and there is no plan to raise sizeof(struct sk_buff) for
> perf purposes, the existing bpf_local_storage may be the wrong place to
> repurpose/optimize because the lifecycle of skb is very different.
The lifetime difference is undeniable, but I still see common ground.
To make it more concrete:
1. IIRC you've mentioned wanting more bpf_local_storage->cache entries
for socks in some scenarios, while for skbs I'd expect we need
fewer. We could make the cache size configurable via a flexible
array.
2. Embedding bpf_local_storage is another overlap I had in mind. For
socks it would live within the same memory blob as struct sock, while
for skbs we'd want to embed it in skb_ext (once it's small enough). This
depends on whether you end up dropping bpf_local_storage for
sk_local_storage entirely, which I didn't know about until now.
3. I've heard that the idea of allocating skb_ext memory together with
sk_buff was floated in the past. While trimming skb_ext at build time is
hard today (say I need XFRM but don't care about crypto offloads keeping
state in skb_ext), the idea is similar to what you're proposing for
struct sock.
Thanks,
-jkbs
Thread overview: 8+ messages
2026-02-20 14:56 [LSF/MM/BPF TOPIC] BPF local storage for every packet Jakub Sitnicki
2026-02-20 18:34 ` Martin KaFai Lau
2026-02-21 13:42 ` Jakub Sitnicki [this message]
2026-02-23 19:26 ` Martin KaFai Lau
2026-02-24 11:58 ` Jakub Sitnicki
2026-03-03 15:06 ` Zhu Yanjun
2026-03-03 21:07 ` Jakub Sitnicki
2026-03-16 3:02 ` Zhu Yanjun