public inbox for bpf@vger.kernel.org
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jakub Sitnicki <jakub@cloudflare.com>
Cc: bpf@vger.kernel.org, kernel-team@cloudflare.com,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] BPF local storage for every packet
Date: Mon, 23 Feb 2026 11:26:13 -0800
Message-ID: <e0f16c6c-cd9f-4881-b638-c52c8a83104e@linux.dev>
In-Reply-To: <877bs6fc25.fsf@cloudflare.com>

On 2/21/26 5:42 AM, Jakub Sitnicki wrote:
> On Fri, Feb 20, 2026 at 10:34 AM -08, Martin KaFai Lau wrote:
>> On 2/20/26 6:56 AM, Jakub Sitnicki wrote:
>>> In the upcoming days we are going to post an RFC which proposes to
>>> extend the concept of BPF local storage to socket buffers (sk_buff, skb)
>>> as means to attach arbitrary metadata to packets from BPF programs [1]
>>> (slides 41-55).
>>> Design-wise, BPF local storage is a great fit for a packet metadata
>>> container, as it avoids some of the shortcomings of the XDP
>>> metadata interface:
>>> 1. Users interact with storage through BPF maps and can take advantage
>>>      of existing built-in BPF map types, while still being able to
>>>      implement a custom data format,
>>> 2. Maps within local storage can have different properties controlled by
>>>      map flags. For example, maps with BPF_F_CLONE set can survive packet
>>>      cloning. Other flags could allow map contents to survive sk_buff
>>>      scrubbing during encapsulation/decapsulation or pass across network
>>>      namespace boundaries.
>>> 3. Local storage supports multiple users out of the box - each user
>>>      creates their own map, eliminating the need to coordinate data
>>>      layout,
>>> 4. Local storage has its own backing memory, so persisting it across
>>>      network stack layers requires no changes to the network stack.
>>> However, this flexibility comes at a cost. While XDP metadata requires
>>> no allocations [2], an initial write to BPF local storage requires two:
>>> one for bpf_local_storage_elem, and one for bpf_local_storage itself.
>>> We would like to align this work with the needs of other BPF local
>>> storage users (socks, cgroups, tasks, inodes), where allocation overhead
>>> has been a concern as well [2].
>>> Optimization ideas we would like to put up for discussion:
>>> - slimming down bpf_local_storage so it can be embedded as an skb
>>>     extension chunk,
>>> - making the bpf_local_storage cache size configurable,
>>> - allowing bpf_local_storage to be pre-allocated,
>>> - co-allocating bpf_local_storage and bpf_local_storage_elem for the
>>>     single-map case.
>>
>> The sk/cgroup/task storage has a much longer lifetime: once the
>> allocation is done, the storage stays in the sk until the sk is closed.
>> That lifetime is quite different from an skb's. I am afraid we are
>> re-purposing bpf_local_storage for a very different use case, where the
>> skb lifecycle is much shorter.
>>
>> We are planning to increase 'sizeof(struct sock)' for perf reasons.
>> Saving an allocation is an upside, but not the major one we are looking
>> for (or care about) for sk. We are more interested in cacheline
>> efficiency, and in probably removing the need for
>> bpf_local_storage[_elem] if the user chooses to use the in-place space
>> of a sk.
>>
>> If 'sizeof(struct sk_buff)' can be increased, this should align with
>> where sk local storage is going. If skb will depend solely on the
>> existing bpf_local_storage and there is no plan to raise sizeof(struct
>> sk_buff) for perf purposes, the existing bpf_local_storage may be the
>> wrong place to repurpose/optimize, because the lifecycle of an skb is
>> very different.
> 
> The lifetime difference is undeniable, but I still see common ground.
> To make it more concrete:
> 
> 1. IIRC you've mentioned wanting more bpf_local_storage->cache entries
>     for socks in some scenarios, while for skbs I'd expect we need
>     fewer. We could make the cache size configurable via a flexible
>     array.
> 
> 2. Embedding bpf_local_storage is another overlap I had in mind. For
>     socks that'd be within the same memory blob as struct sock, while for
>     skbs we'd want to embed it in skb_ext (once it's small enough). This
>     depends on whether you end up dropping bpf_local_storage for
>     sk_local_storage entirely, which I didn't know about until now.

For the in-place sk storage, it should not need the bpf_local_storage
or the bpf_local_storage_elem. A stable map_xyz->sk_offset should be
enough. If storage is needed for every sk, the bpf prog should use the
in-place sk storage instead of going through bpf_local_storage[_elem].
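
A rough sketch of the in-place idea, purely illustrative (every name
below is made up; nothing like this exists in the tree today):

#include <linux/types.h>

/* Hypothetical sketch: struct sock reserves an in-place region at
 * build time, and each sk storage map resolves a stable offset into
 * that region at map creation, so a lookup is a direct dereference
 * with no bpf_local_storage[_elem] and no allocation.
 */
struct sock {                           /* heavily elided */
        /* ... existing fields ... */
        __u8    sk_bpf_inplace[64];     /* made-up reserved space */
};

struct bpf_sk_inplace_map {             /* made-up map state */
        __u32   sk_offset;              /* stable map_xyz->sk_offset */
        __u32   value_size;
};

static void *sk_inplace_lookup(struct sock *sk,
                               const struct bpf_sk_inplace_map *map)
{
        /* One cacheline-friendly dereference, no elem walk. */
        return (void *)sk + map->sk_offset;
}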

imo, if we manage to pull together a new solution (whatever that is)
for skb but it does not perform close to skb->data_meta, it will
probably be hard to use in production. I could be wrong, but I don't
see how embedding local_storage and/or shrinking the cache can get
there. I think we need another solution/design.
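
For reference, that skb->data_meta baseline is just pointer arithmetic
on packet headroom. A minimal XDP example using the existing API (no
hypothetical names here):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Reserving and writing XDP metadata costs no allocation, only a
 * headroom pointer adjustment plus the usual verifier bounds checks.
 */
SEC("xdp")
int xdp_set_meta(struct xdp_md *ctx)
{
        __u32 *meta;
        void *data;

        /* Grow the metadata area by 4 bytes in front of the packet. */
        if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
                return XDP_PASS;

        /* Pointers must be reloaded after an adjust call. */
        data = (void *)(long)ctx->data;
        meta = (void *)(long)ctx->data_meta;
        if ((void *)(meta + 1) > data)
                return XDP_PASS;

        *meta = 0xdeadbeef;     /* travels with the frame to TC */
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";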

> 
> 3. I've heard that the idea of allocating skb_ext memory together with
>     sk_buff was floated in the past. While trimming skb_ext at build
>     time is hard today (say I need XFRM but don't care about crypto
>     offloads keeping state in skb_ext), the idea is similar to what
>     you're proposing for struct sock.
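
To make the skb_ext direction above concrete, a minimal sketch
(SKB_EXT_BPF_STORAGE and struct bpf_skb_storage are made-up names;
skb_ext_find()/skb_ext_add() are the existing API from
include/linux/skbuff.h):

#include <linux/skbuff.h>

/* Sketch: a slimmed-down bpf_local_storage embedded as an skb
 * extension. The struct and the SKB_EXT_BPF_STORAGE id are made up;
 * a real version would register the id in enum skb_ext_id and its
 * size in skb_ext_type_len[].
 */
struct bpf_skb_storage {
        struct hlist_head list;         /* bpf_local_storage_elem list */
        /* no owner pointer, no rcu_head: lifetime is the skb's own */
};

static struct bpf_skb_storage *skb_bpf_storage(struct sk_buff *skb)
{
        struct bpf_skb_storage *st;

        st = skb_ext_find(skb, SKB_EXT_BPF_STORAGE);
        if (!st)
                st = skb_ext_add(skb, SKB_EXT_BPF_STORAGE);
        return st;
}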


Thread overview: 8+ messages
2026-02-20 14:56 [LSF/MM/BPF TOPIC] BPF local storage for every packet Jakub Sitnicki
2026-02-20 18:34 ` Martin KaFai Lau
2026-02-21 13:42   ` Jakub Sitnicki
2026-02-23 19:26     ` Martin KaFai Lau [this message]
2026-02-24 11:58       ` Jakub Sitnicki
2026-03-03 15:06 ` Zhu Yanjun
2026-03-03 21:07   ` Jakub Sitnicki
2026-03-16  3:02     ` Zhu Yanjun
