* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
@ 2026-05-15 0:39 ` Chris Arges
2026-05-15 10:47 ` sashiko-bot
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Chris Arges @ 2026-05-15 0:39 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau,
netdev, bpf, kernel-team
On 2026-05-14 12:47:09, Jakub Sitnicki wrote:
> skb metadata is meant for passing information between XDP and TC.
> LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.
>
> However, LWT_XMIT programs can call helpers that move packet data.
> One such helper is bpf_skb_change_head(), which uses skb_data_move() to
> move both packet data and metadata after skb_push(). skb_data_move()
> expects metadata to sit immediately before skb->data. This is the layout
> that commit 8989d328dfe7 ("net: Helper to move packet data and metadata
> after skb_push/pull") preserves for TC(X) programs when skb_push() or
> skb_pull() moves packet data.
>
> The IP output path can run LWT xmit before neighbour output has built the
> outgoing L2 header. For forwarded, or otherwise RX-originated, packets
> skb->data points at the network header while skb_mac_header() can still
> point at the old L2 header. If such an skb still carries XDP metadata,
> skb_data_move() sees metadata ending at skb_mac_header(), not immediately
> before skb->data, and warns:
>
> WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
> CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G O 6.18.21 #1
> RIP: 0010:skb_data_move+0x47/0x90
> Call Trace:
> <IRQ>
> bpf_skb_change_head+0xe6/0x1a0
> bpf_prog_...+0x213/0x2e3
> run_lwt_bpf.isra.0+0x1d3/0x360
> bpf_xmit+0x46/0xe0
> lwtunnel_xmit+0xa1/0xf0
> ip_finish_output2+0x1e7/0x5e0
> ip_output+0x63/0x100
> __netif_receive_skb_one_core+0x85/0xa0
> process_backlog+0x9c/0x150
> __napi_poll+0x2b/0x190
> net_rx_action+0x40b/0x7f0
> handle_softirqs+0xd2/0x270
> do_softirq+0x3f/0x60
> </IRQ>
>
> Drop skb metadata in the LWT BPF xmit path, just before invoking the
> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
> metadata for tc egress on the outgoing device when no LWT BPF program is
> attached - the only scenario where carrying tc-layer metadata through the
> IP output path could be meaningful. Non-BPF LWT encap types do not run
> helpers that touch metadata, so the BPF xmit op is the only site that needs
> the clear.
>
> Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> Changes in v2:
> - Clear metadata in bpf_xmit to allow access from tc(x) egress (Daniel)
> - Add WARNING snippet to the description
> - Link to v1: https://lore.kernel.org/r/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com
I was able to reproduce the problem with a C program. After applying your patch
I could no longer repro.
Tested-by: Chris J Arges <carges@cloudflare.com>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
2026-05-15 0:39 ` Chris Arges
@ 2026-05-15 10:47 ` sashiko-bot
2026-05-18 10:00 ` Daniel Borkmann
2026-05-18 23:18 ` Jakub Kicinski
3 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-05-15 10:47 UTC (permalink / raw)
To: Jakub Sitnicki; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unconditionally calling `skb_metadata_clear()` in `bpf_xmit` causes a data race and corrupts metadata for cloned SKBs.
- [High] Uninitialized memory read and metadata corruption in non-BPF encapsulations and `bpf_input`.
--
commit 2265e07ff5cfd9680de6b565a2b5679d5adafe20
Author: Jakub Sitnicki <jakub@cloudflare.com>
net: Clear skb metadata in LWT BPF xmit
This commit clears the SKB metadata in the LWT BPF xmit path before
invoking the BPF program. It aims to prevent a warning in skb_data_move()
when bpf_skb_change_head() is called on packets with existing metadata and a
non-matching MAC header.
> Drop skb metadata in the LWT BPF xmit path, just before invoking the
> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
> metadata for tc egress on the outgoing device when no LWT BPF program is
> attached - the only scenario where carrying tc-layer metadata through the
> IP output path could be meaningful. Non-BPF LWT encap types do not run
> helpers that touch metadata, so the BPF xmit op is the only site that needs
> the clear.
Is this assumption accurate for non-BPF encapsulations and LWT_IN programs?
When non-BPF encapsulations (like IPIP or SEG6) or LWT_IN programs use
skb_push() to prepend outer headers, they shift skb->data downwards but do
not migrate the metadata block.
Because the metadata pointer is implicitly calculated as skb->data - meta_len,
the new encapsulation header silently overwrites the original metadata memory.
The implicit metadata pointer shifts further into the uninitialized SKB
headroom.
When the packet subsequently reaches TC egress, won't BPF programs accessing
data_meta read uninitialized headroom memory instead of the preserved
metadata?
> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d3..d26cb9bb81893 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -284,6 +284,11 @@ static int bpf_xmit(struct sk_buff *skb)
> __be16 proto = skb->protocol;
> int ret;
>
> + /* LWT BPF programs cannot access skb metadata but can corrupt
> + * it with helpers that operate on skb headroom. Drop it.
> + */
> + skb_metadata_clear(skb);
Since bpf_xmit() operates in the LWT IP output path which frequently processes
cloned SKBs (e.g., from multicast forwarding or TCP retransmits), is it safe
to call skb_metadata_clear() here without unsharing the SKB?
skb_metadata_clear() directly modifies skb_shinfo(skb)->meta_len. For cloned
SKBs, the skb_shinfo structure is shared across all clones. Modifying it
without gaining exclusive ownership first (such as via skb_unclone()) will
silently zero out the metadata length for all other clones of the same
packet, causing them to unexpectedly lose their metadata when processed on
concurrent, unrelated network paths.
> +
> ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
> switch (ret) {
> case BPF_OK:
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260514-bpf-lwt-drop-skb-metadata-v2-1-458664edc2b5@cloudflare.com?part=1
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
2026-05-15 0:39 ` Chris Arges
2026-05-15 10:47 ` sashiko-bot
@ 2026-05-18 10:00 ` Daniel Borkmann
2026-05-18 23:18 ` Jakub Kicinski
3 siblings, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2026-05-18 10:00 UTC (permalink / raw)
To: Jakub Sitnicki, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau
Cc: netdev, bpf, kernel-team
On 5/14/26 12:47 PM, Jakub Sitnicki wrote:
> skb metadata is meant for passing information between XDP and TC.
> LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.
>
> However, LWT_XMIT programs can call helpers that move packet data.
> One such helper is bpf_skb_change_head(), which uses skb_data_move() to
> move both packet data and metadata after skb_push(). skb_data_move()
> expects metadata to sit immediately before skb->data. This is the layout
> that commit 8989d328dfe7 ("net: Helper to move packet data and metadata
> after skb_push/pull") preserves for TC(X) programs when skb_push() or
> skb_pull() moves packet data.
>
> The IP output path can run LWT xmit before neighbour output has built the
> outgoing L2 header. For forwarded, or otherwise RX-originated, packets
> skb->data points at the network header while skb_mac_header() can still
> point at the old L2 header. If such an skb still carries XDP metadata,
> skb_data_move() sees metadata ending at skb_mac_header(), not immediately
> before skb->data, and warns:
>
> WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
> CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G O 6.18.21 #1
> RIP: 0010:skb_data_move+0x47/0x90
> Call Trace:
> <IRQ>
> bpf_skb_change_head+0xe6/0x1a0
> bpf_prog_...+0x213/0x2e3
> run_lwt_bpf.isra.0+0x1d3/0x360
> bpf_xmit+0x46/0xe0
> lwtunnel_xmit+0xa1/0xf0
> ip_finish_output2+0x1e7/0x5e0
> ip_output+0x63/0x100
> __netif_receive_skb_one_core+0x85/0xa0
> process_backlog+0x9c/0x150
> __napi_poll+0x2b/0x190
> net_rx_action+0x40b/0x7f0
> handle_softirqs+0xd2/0x270
> do_softirq+0x3f/0x60
> </IRQ>
>
> Drop skb metadata in the LWT BPF xmit path, just before invoking the
> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
> metadata for tc egress on the outgoing device when no LWT BPF program is
> attached - the only scenario where carrying tc-layer metadata through the
> IP output path could be meaningful. Non-BPF LWT encap types do not run
> helpers that touch metadata, so the BPF xmit op is the only site that needs
> the clear.
>
> Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> Changes in v2:
> - Clear metadata in bpf_xmit to allow access from tc(x) egress (Daniel)
> - Add WARNING snippet to the description
> - Link to v1: https://lore.kernel.org/r/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
` (2 preceding siblings ...)
2026-05-18 10:00 ` Daniel Borkmann
@ 2026-05-18 23:18 ` Jakub Kicinski
2026-05-19 9:47 ` Jakub Sitnicki
3 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2026-05-18 23:18 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
Paolo Abeni, Simon Horman, Martin KaFai Lau, netdev, bpf,
kernel-team
On Thu, 14 May 2026 12:47:09 +0200 Jakub Sitnicki wrote:
> + /* LWT BPF programs cannot access skb metadata but can corrupt
> + * it with helpers that operate on skb headroom. Drop it.
> + */
> + skb_metadata_clear(skb);
Sashiko has a point, right? Is there something in LWT which would
prevent us from working on clones?
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
2026-05-18 23:18 ` Jakub Kicinski
@ 2026-05-19 9:47 ` Jakub Sitnicki
0 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2026-05-19 9:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
Paolo Abeni, Simon Horman, Martin KaFai Lau, netdev, bpf,
kernel-team
On Mon, May 18, 2026 at 04:18 PM -07, Jakub Kicinski wrote:
> On Thu, 14 May 2026 12:47:09 +0200 Jakub Sitnicki wrote:
>> + /* LWT BPF programs cannot access skb metadata but can corrupt
>> + * it with helpers that operate on skb headroom. Drop it.
>> + */
>> + skb_metadata_clear(skb);
>
> Sashiko has a point, right? Is there something in LWT which would
> prevent us from working on clones?
Totally. I've just been spread a little thin lately.
We have a bpf/selftest that uses tc act mirred that exercises this
scenario that I need to extend.
In general, anything high/med from LLMs won't go unaddressed.
^ permalink raw reply [flat|nested] 6+ messages in thread