From: Martin KaFai Lau <martin.lau@linux.dev>
To: Jakub Sitnicki <jakub@cloudflare.com>
Cc: netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Michael Chan <michael.chan@broadcom.com>,
Pavan Chebbi <pavan.chebbi@broadcom.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Tony Nguyen <anthony.l.nguyen@intel.com>,
Przemek Kitszel <przemyslaw.kitszel@intel.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jesper Dangaard Brouer <hawk@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
intel-wired-lan@lists.osuosl.org, bpf@vger.kernel.org,
kernel-team@cloudflare.com, Jakub Kicinski <kuba@kernel.org>,
Amery Hung <ameryhung@gmail.com>
Subject: Re: [PATCH net-next 00/10] Call skb_metadata_set when skb->data points past metadata
Date: Tue, 27 Jan 2026 11:33:30 -0800
Message-ID: <243ea894-3bf3-4c10-b012-d4451e7ec17e@linux.dev>
In-Reply-To: <878qdltsg0.fsf@cloudflare.com>

On 1/25/26 11:15 AM, Jakub Sitnicki wrote:
> On Thu, Jan 22, 2026 at 12:21 PM -08, Martin KaFai Lau wrote:
>> On 1/13/26 4:33 AM, Jakub Sitnicki wrote:
>>> Good point. I'm hoping we don't have to allocate from
>>> skb_metadata_set(), which does sound prohibitively expensive. Instead
>>> we'd allocate the extension together with the skb if we know upfront
>>> that metadata will be used.
>>
>> [ Sorry for being late. Have been catching up after holidays. ]
>>
>> For the sk local storage (which was mentioned in other replies as making skb
>> metadata look more like sk local storage), there is a plan (Amery has been
>> looking into it) to allocate the storage together with sk for performance
>> reasons. This means allocating a larger 'struct sock'. The extra space will be at
>> the front of sk instead of the end of sk because of how the 'struct sock' is
>> embedded in tcp_sock/udp_sock/... If skb is going in the same direction, it
>> should be useful to have a similar scheme on: upfront allocation and then shared
>> by multiple BPF progs.
>>
>> The current thinking is to build upon the existing bpf_sk_local_storage usage. A
>> boot param decides how much BPF space should be allocated for 'struct
>> sock'. When a bpf_sk_storage_map is created (with a new use_reserve flag), the
>> space will be allocated permanently from the head space of every sk for this
>> map. The read (from a BPF prog) will be at one stable offset before a sk. If
>> there is no more head space left, the map creation will fail. User can decide if
>> it wants to retry without the 'use_reserve' flag.
>
> Thanks for sharing the plans.
>
> We will definitely be looking into ways of eliminating allocations in
> the long run. With one allocation for skb_ext, one for
> bpf_local_storage, and one for the actual map, it seems unlikely we will
> be able to attach metadata this way to every packet. Which is something
> we wanted for our "label packet once, use label everywhere" use case.
>
> I'm not sure how much we can squeeze in together with the sk_buff.
> Hopefully at least skb_ext plus a pointer to bpf_local_storage.

Yeah, only a bpf_local_storage pointer is needed in skb (or in skb_ext).
It is the same for the bpf sk/task/... storage.

To be clear, for allocation in skb, I was thinking more about Paolo's
comment on "...increasing struct sk_buff size as an alternative to the
mptcp skb extension...".

>
> I'm also hoping we can allocate memory for bpf_local_storage together
> with the backing space for the map, whose update triggers the skb
> extension activation.

Allocate the actual storage at the end of bpf_local_storage? Hmm... off
the top of my head, I don't have a good idea how to do it without
trading off flexibility. If we are trading off flexibility anyway, we
may as well allocate fixed extra space in the sk (/skb) and get a
performance benefit (which would need to be measured).

>
> Finally, bpf_local_storage itself has a pretty generous cache which
> blows it up. Maybe the cache could be a flexible array, which could be
> smaller for skb local storage.

For our usage, the cache has been slowly filling up, so we actually see
the other side of the issue. Improvements to bpf_local_storage are
always welcome.

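To make the flexible-array idea concrete, here is a minimal userspace
sketch. It is not the kernel's struct bpf_local_storage; the struct and
helper names (storage_sketch, storage_sketch_alloc, ...) are made up for
illustration. It only shows how the number of cache slots could be
picked per owner type at allocation time instead of being a fixed-size
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in, not the kernel's struct bpf_local_storage. */
struct storage_sketch {
	void *owner;              /* object the storage hangs off (sk, skb, ...) */
	unsigned int cache_slots; /* number of entries in cache[] */
	void *cache[];            /* flexible array member, sized at allocation */
};

/* Bytes needed for a storage with the given number of cache slots. */
static size_t storage_sketch_size(unsigned int cache_slots)
{
	return sizeof(struct storage_sketch) + cache_slots * sizeof(void *);
}

/* Allocate a zeroed storage; returns NULL on allocation failure. */
static struct storage_sketch *storage_sketch_alloc(void *owner,
						   unsigned int cache_slots)
{
	struct storage_sketch *s = calloc(1, storage_sketch_size(cache_slots));

	if (!s)
		return NULL;
	s->owner = owner;
	s->cache_slots = cache_slots;
	return s;
}

/* Return the cached entry for a slot, or NULL if out of range or empty. */
static void *storage_sketch_cached(const struct storage_sketch *s,
				   unsigned int slot)
{
	assert(s != NULL);
	return slot < s->cache_slots ? s->cache[slot] : NULL;
}
```

With this layout a per-skb storage could ask for a couple of slots
while a per-sk storage asks for more. The cost is that every lookup has
to check the slot against cache_slots.
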
I am currently more interested in getting the extra memory/headroom
allocated for an sk. Eventually, the storage(s) needed by all (or most)
sk will use the extra headroom of the sk. The current bpf_local_storage
(pointer) in sk will be more for testing/ad-hoc purposes or for
performance-insensitive usage.

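As a rough illustration of the headroom scheme quoted earlier in this
thread (a fixed reserve in front of the object, a permanent slice per
map at a stable offset, and a failed reservation once the reserve runs
out), here is a userspace sketch. Every name in it is hypothetical, and
RESERVE_BYTES merely stands in for whatever the boot param would
configure. It is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names are hypothetical. RESERVE_BYTES stands in for the boot param. */
#define RESERVE_BYTES 64

struct sock_sketch {
	int dummy_state;
};

static size_t reserve_used; /* headroom bytes already handed out to maps */

/*
 * Carve a permanent slice for a new map. Returns the slice's distance in
 * bytes before the object, or -1 if the headroom is exhausted.
 */
static long reserve_map_slice(size_t size)
{
	/* Round up so that every slice stays pointer-aligned. */
	size = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
	if (size == 0 || size > RESERVE_BYTES - reserve_used)
		return -1;
	reserve_used += size;
	return (long)reserve_used;
}

/* Allocate the object with zeroed headroom placed in front of it. */
static struct sock_sketch *sock_sketch_alloc(void)
{
	unsigned char *base = calloc(1, RESERVE_BYTES + sizeof(struct sock_sketch));

	return base ? (struct sock_sketch *)(base + RESERVE_BYTES) : NULL;
}

static void sock_sketch_free(struct sock_sketch *sk)
{
	if (sk)
		free((unsigned char *)sk - RESERVE_BYTES);
}

/* A map's data lives at the same negative offset from every object. */
static void *map_slice(struct sock_sketch *sk, long off)
{
	assert(sk != NULL && off > 0 && (size_t)off <= RESERVE_BYTES);
	return (unsigned char *)sk - off;
}
```

A caller that gets -1 from reserve_map_slice() would fall back to the
regular pointer-based storage, which mirrors retrying map creation
without the 'use_reserve' flag.
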
This is probably off topic now. It seems having extra tail space in an
skb is not in your current plan for the next respin.