Re: [LSF/MM/BPF TOPIC] BPF local storage for every packet

From: Jakub Sitnicki <jakub@cloudflare.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: bpf@vger.kernel.org,  kernel-team@cloudflare.com,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] BPF local storage for every packet
Date: Sat, 21 Feb 2026 14:42:10 +0100
Message-ID: <877bs6fc25.fsf@cloudflare.com>
In-Reply-To: <5fdee5fd-aff1-4764-820e-3b1f3ad00941@linux.dev> (Martin KaFai Lau's message of "Fri, 20 Feb 2026 10:34:09 -0800")

On Fri, Feb 20, 2026 at 10:34 AM -08, Martin KaFai Lau wrote:
> On 2/20/26 6:56 AM, Jakub Sitnicki wrote:
>> In the upcoming days we are going to post an RFC which proposes to
>> extend the concept of BPF local storage to socket buffers (sk_buff, skb)
>> as a means to attach arbitrary metadata to packets from BPF programs [1]
>> (slides 41-55).
>> Design-wise, BPF local storage is a great fit for a packet metadata
>> container, as it avoids some of the shortcomings of the XDP
>> metadata interface:
>> 1. Users interact with storage through BPF maps and can take advantage
>>     of existing built-in BPF map types, while still being able to
>>     implement a custom data format,
>> 2. Maps within local storage can have different properties controlled by
>>     map flags. For example, maps with BPF_F_CLONE set can survive packet
>>     cloning. Other flags could allow map contents to survive sk_buff
>>     scrubbing during encapsulation/decapsulation or pass across network
>>     namespace boundaries.
>> 3. Local storage supports multiple users out of the box - each user
>>     creates their own map, eliminating the need to coordinate data
>>     layout,
>> 4. Local storage has its own backing memory, so persisting it across
>>     network stack layers requires no changes to the network stack.
>> However, this flexibility comes at a cost. While XDP metadata requires
>> no allocations [2], an initial write to BPF local storage requires two:
>> one for bpf_local_storage_elem, and one for bpf_local_storage itself.
>> We would like to align this work with the needs of other BPF local
>> storage users (socks, cgroups, tasks, inodes), where allocation overhead
>> has been a concern as well [2].
>> Optimization ideas we would like to put up for discussion:
>> - slimming down bpf_local_storage so it can be embedded as an skb
>>    extension chunk,
>> - making the bpf_local_storage cache size configurable,
>> - allowing bpf_local_storage to be pre-allocated,
>> - co-allocating bpf_local_storage and bpf_local_storage_elem for the
>>    single-map case.
>
> The sk/cgroup/task storage has a much longer lifetime. Meaning once allocation
> is done, the storage stays in the sk until the sk is closed. The length of
> lifetime is quite different from the skb. I am afraid we are re-purposing
> bpf_local_storage for a very different use case where skb lifecycle is much
> shorter.
>
> We are planning to increase the 'sizeof(struct sock)' for perf reasons. Saving an
> allocation is an upside but not the major one we are looking for (or care about)
> for sk. We are looking more for cacheline efficiency and probably removing the need
> for bpf_local_storage[_elem] if the user chooses to use the in-place spaces of a
> sk.
>
> If 'sizeof(struct sk_buff)' can be increased, this should align on where sk
> local storage is going. If skb will solely depend on the existing
> bpf_local_storage and has no plan to raise sizeof(struct sk_buff) for perf
> purpose, the existing bpf_local_storage may be the wrong place to
> repurpose/optimize because the lifecycle of skb is very different.

The lifetime difference is undeniable, but I still see common ground.
To make it more concrete:

1. IIRC you've mentioned wanting more bpf_local_storage->cache entries
   for socks in some scenarios, while for skbs I'd expect we need
   fewer. We could make the cache size configurable via a flexible
   array.

2. Embedding bpf_local_storage is another overlap I had in mind. For
   socks that is within the same memory blob as struct sock, while for
   skbs we'd want to embed it in skb_ext (once it's small enough). This
   depends on whether you end up dropping bpf_local_storage for
   sk_local_storage entirely, which I didn't know about until now.

3. I've heard the idea of allocating skb_ext memory together with
   sk_buff was floated in the past. While trimming skb_ext at build-time
   is hard today — say I need XFRM but don't care about crypto offloads
   keeping state in skb_ext — the idea is similar to what you're
   proposing for struct sock.
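
To illustrate point 1, here is a minimal userspace sketch of a storage
header whose cache is a flexible array member sized at allocation time.
It is not kernel code: struct storage_sketch and the storage_sketch_*()
helpers are hypothetical names made up for this example, and the real
struct bpf_local_storage has more fields (list head, owner pointer,
lock) that are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct bpf_local_storage. */
struct storage_sketch {
	unsigned short cache_size;	/* number of cache slots, fixed at allocation */
	void *cache[];			/* flexible array member, one slot per cached map */
};

/* Bytes needed for a header with the given number of cache slots. */
static size_t storage_sketch_size(unsigned short cache_size)
{
	return sizeof(struct storage_sketch) +
	       (size_t)cache_size * sizeof(void *);
}

/* Allocate a zeroed header; the owner type picks the cache size. */
static struct storage_sketch *storage_sketch_alloc(unsigned short cache_size)
{
	struct storage_sketch *s = calloc(1, storage_sketch_size(cache_size));

	if (!s)
		return NULL;
	s->cache_size = cache_size;
	return s;
}

/* Bounds-checked cache read; NULL means "take the slow path". */
static void *storage_sketch_cache_lookup(const struct storage_sketch *s,
					 unsigned int idx)
{
	assert(s != NULL);
	return idx < s->cache_size ? s->cache[idx] : NULL;
}
```

With this layout an skb owner could ask for a couple of slots, e.g.
storage_sketch_alloc(2), and a sock owner for more, with the header
growing by sizeof(void *) per slot. The trade-off is that every cache
access now needs a bounds check against cache_size, where a fixed-size
array would not.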

Thanks,
-jkbs
