public inbox for bpf@vger.kernel.org
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org,  kernel-team@cloudflare.com,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] BPF local storage for every packet
Date: Sat, 21 Feb 2026 14:42:10 +0100
Message-ID: <877bs6fc25.fsf@cloudflare.com>
In-Reply-To: <5fdee5fd-aff1-4764-820e-3b1f3ad00941@linux.dev> (Martin KaFai Lau's message of "Fri, 20 Feb 2026 10:34:09 -0800")

On Fri, Feb 20, 2026 at 10:34 AM -08, Martin KaFai Lau wrote:
> On 2/20/26 6:56 AM, Jakub Sitnicki wrote:
>> In the upcoming days we are going to post an RFC which proposes to
>> extend the concept of BPF local storage to socket buffers (sk_buff, skb)
>> as a means to attach arbitrary metadata to packets from BPF programs [1]
>> (slides 41-55).
>> Design-wise, BPF local storage is a great fit for a packet metadata
>> container, as it avoids some of the shortcomings of the XDP
>> metadata interface:
>> 1. Users interact with storage through BPF maps and can take advantage
>>     of existing built-in BPF map types, while still being able to
>>     implement a custom data format.
>> 2. Maps within local storage can have different properties controlled by
>>     map flags. For example, maps with BPF_F_CLONE set can survive packet
>>     cloning. Other flags could allow map contents to survive sk_buff
>>     scrubbing during encapsulation/decapsulation or pass across network
>>     namespace boundaries.
>> 3. Local storage supports multiple users out of the box - each user
>>     creates their own map, eliminating the need to coordinate data
>>     layout.
>> 4. Local storage has its own backing memory, so persisting it across
>>     network stack layers requires no changes to the network stack.
>> However, this flexibility comes at a cost. While XDP metadata requires
>> no allocations [2], an initial write to BPF local storage requires two:
>> one for bpf_local_storage_elem, and one for bpf_local_storage itself.
>> We would like to align this work with the needs of other BPF local
>> storage users (socks, cgroups, tasks, inodes), where allocation overhead
>> has been a concern as well [2].
>> Optimization ideas we would like to put up for discussion:
>> - slimming down bpf_local_storage so it can be embedded as an skb
>>    extension chunk,
>> - making the bpf_local_storage cache size configurable,
>> - allowing bpf_local_storage to be pre-allocated,
>> - co-allocating bpf_local_storage and bpf_local_storage_elem for the
>>    single-map case.
>
> The sk/cgroup/task storage has a much longer lifetime, meaning that once the
> allocation is done, the storage stays in the sk until the sk is closed. That
> lifetime is quite different from an skb's. I am afraid we are re-purposing
> bpf_local_storage for a very different use case where the skb lifecycle is
> much shorter.
>
> We are planning to increase 'sizeof(struct sock)' for perf reasons. Saving an
> allocation is an upside but not the major one we are looking for (or care
> about) for sk. We are more interested in cacheline efficiency and in probably
> removing the need for bpf_local_storage[_elem] if the user chooses to use the
> in-place space of a sk.
>
> If 'sizeof(struct sk_buff)' can be increased, this would align with where sk
> local storage is going. If skb will depend solely on the existing
> bpf_local_storage and there is no plan to raise sizeof(struct sk_buff) for
> perf purposes, the existing bpf_local_storage may be the wrong place to
> repurpose/optimize because the lifecycle of an skb is very different.

The lifetime difference is undeniable, but I still see common ground.
To make it more concrete:

1. IIRC you've mentioned wanting more bpf_local_storage->cache entries
   for socks in some scenarios, while for skbs I'd expect we need
   fewer. We could make the cache size configurable via a flexible
   array (rough sketch after this list).

2. Embedding bpf_local_storage is another overlap I had in mind. For
   socks that would be within the same memory blob as struct sock,
   while for skbs we'd want to embed it in skb_ext once it's small
   enough (second sketch below). This depends on whether you end up
   dropping bpf_local_storage for sk_local_storage entirely, which I
   didn't know about until now.

3. I've heard that the idea of allocating skb_ext memory together with
   sk_buff has been floated in the past. While trimming skb_ext at
   build time is hard today (say I need XFRM but don't care about
   crypto offloads keeping state in skb_ext), the idea is similar to
   what you're proposing for struct sock.
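
To make (1) a bit more concrete, here is a rough, untested sketch of
what I have in mind. The field names are borrowed from today's struct
bpf_local_storage; the cache_size field and the alloc helper are made
up for illustration and are not the existing kernel API:

struct bpf_local_storage {
	struct bpf_local_storage_map __rcu *smap;
	struct hlist_head list;		/* list of bpf_local_storage_elem */
	void *owner;			/* sk, task, cgroup, ... or skb */
	struct rcu_head rcu;
	raw_spinlock_t lock;
	u8 cache_size;			/* number of entries in cache[] */
	/* inline map->data cache, now a flex array sized per owner type */
	struct bpf_local_storage_data __rcu *cache[];
};

/* Hypothetical allocation path (stand-in, not the real
 * bpf_local_storage_alloc()): size the cache per owner type,
 * e.g. 16 for sk, something small like 2 for skb.
 */
static struct bpf_local_storage *
storage_alloc_sketch(u8 cache_size, gfp_t gfp_flags)
{
	struct bpf_local_storage *storage;

	storage = kzalloc(struct_size(storage, cache, cache_size),
			  gfp_flags);
	if (storage)
		storage->cache_size = cache_size;
	return storage;
}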
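
And for (2), assuming bpf_local_storage can be shrunk enough to live in
the extension area, the skb side could look roughly like this.
SKB_EXT_BPF_STORAGE is hypothetical; skb_ext_add() is the existing
helper that (re)allocates the extension area and returns a pointer to
the requested chunk:

/* Hypothetical new extension id, not in the tree today: */
enum skb_ext_id {
	/* ... existing ids elided ... */
	SKB_EXT_BPF_STORAGE,
	SKB_EXT_NUM,
};

/* Sketch of a first-write path from a BPF helper/kfunc: carve the
 * storage head out of the skb extension area instead of kmalloc'ing it.
 */
static struct bpf_local_storage *skb_storage_get_sketch(struct sk_buff *skb)
{
	struct bpf_local_storage *storage;

	storage = skb_ext_add(skb, SKB_EXT_BPF_STORAGE);
	if (!storage)
		return NULL;	/* extension allocation failed */
	return storage;
}

One consequence is that the chunk length has to be known when the
extension type is registered, which ties back to how small the cache
from (1) can get.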

Thanks,
-jkbs

Thread overview: 8+ messages
2026-02-20 14:56 [LSF/MM/BPF TOPIC] BPF local storage for every packet Jakub Sitnicki
2026-02-20 18:34 ` Martin KaFai Lau
2026-02-21 13:42   ` Jakub Sitnicki [this message]
2026-02-23 19:26     ` Martin KaFai Lau
2026-02-24 11:58       ` Jakub Sitnicki
2026-03-03 15:06 ` Zhu Yanjun
2026-03-03 21:07   ` Jakub Sitnicki
2026-03-16  3:02     ` Zhu Yanjun
