BPF List
 help / color / mirror / Atom feed
* [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
@ 2026-05-14 10:47 Jakub Sitnicki
  2026-05-15  0:39 ` Chris Arges
  2026-05-15 10:47 ` sashiko-bot
  0 siblings, 2 replies; 3+ messages in thread
From: Jakub Sitnicki @ 2026-05-14 10:47 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau
  Cc: netdev, bpf, kernel-team

skb metadata is meant for passing information between XDP and TC.
LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.

However, LWT_XMIT programs can call helpers that move packet data.
One such helper is bpf_skb_change_head(), which uses skb_data_move() to
move both packet data and metadata after skb_push(). skb_data_move()
expects metadata to sit immediately before skb->data. This is the layout
that commit 8989d328dfe7 ("net: Helper to move packet data and metadata
after skb_push/pull") preserves for TC(X) programs when skb_push() or
skb_pull() moves packet data.

The IP output path can run LWT xmit before neighbour output has built the
outgoing L2 header. For forwarded, or otherwise RX-originated, packets
skb->data points at the network header while skb_mac_header() can still
point at the old L2 header. If such an skb still carries XDP metadata,
skb_data_move() sees metadata ending at skb_mac_header(), not immediately
before skb->data, and warns:

  WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
  CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G           O        6.18.21 #1
  RIP: 0010:skb_data_move+0x47/0x90
  Call Trace:
   <IRQ>
   bpf_skb_change_head+0xe6/0x1a0
   bpf_prog_...+0x213/0x2e3
   run_lwt_bpf.isra.0+0x1d3/0x360
   bpf_xmit+0x46/0xe0
   lwtunnel_xmit+0xa1/0xf0
   ip_finish_output2+0x1e7/0x5e0
   ip_output+0x63/0x100
   __netif_receive_skb_one_core+0x85/0xa0
   process_backlog+0x9c/0x150
   __napi_poll+0x2b/0x190
   net_rx_action+0x40b/0x7f0
   handle_softirqs+0xd2/0x270
   do_softirq+0x3f/0x60
   </IRQ>

Drop skb metadata in the LWT BPF xmit path, just before invoking the
program. Doing it late, rather than in ip[6]_finish_output2(), preserves
metadata for tc egress on the outgoing device when no LWT BPF program is
attached - the only scenario where carrying tc-layer metadata through the
IP output path could be meaningful. Non-BPF LWT encap types do not run
helpers that touch metadata, so the BPF xmit op is the only site that needs
the clear.

Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Changes in v2:
- Clear metadata in bpf_xmit to allow access from tc(x) egress (Daniel)
- Add WARNING snippet to the description
- Link to v1: https://lore.kernel.org/r/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com
---
 net/core/lwt_bpf.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index f71ef82a5f3d..d26cb9bb8189 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -284,6 +284,11 @@ static int bpf_xmit(struct sk_buff *skb)
 		__be16 proto = skb->protocol;
 		int ret;
 
+		/* LWT BPF programs cannot access skb metadata but can corrupt
+		 * it with helpers that operate on skb headroom. Drop it.
+		 */
+		skb_metadata_clear(skb);
+
 		ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
 		switch (ret) {
 		case BPF_OK:




^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
  2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
@ 2026-05-15  0:39 ` Chris Arges
  2026-05-15 10:47 ` sashiko-bot
  1 sibling, 0 replies; 3+ messages in thread
From: Chris Arges @ 2026-05-15  0:39 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau,
	netdev, bpf, kernel-team

On 2026-05-14 12:47:09, Jakub Sitnicki wrote:
> skb metadata is meant for passing information between XDP and TC.
> LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.
> 
> However, LWT_XMIT programs can call helpers that move packet data.
> One such helper is bpf_skb_change_head(), which uses skb_data_move() to
> move both packet data and metadata after skb_push(). skb_data_move()
> expects metadata to sit immediately before skb->data. This is the layout
> that commit 8989d328dfe7 ("net: Helper to move packet data and metadata
> after skb_push/pull") preserves for TC(X) programs when skb_push() or
> skb_pull() moves packet data.
> 
> The IP output path can run LWT xmit before neighbour output has built the
> outgoing L2 header. For forwarded, or otherwise RX-originated, packets
> skb->data points at the network header while skb_mac_header() can still
> point at the old L2 header. If such an skb still carries XDP metadata,
> skb_data_move() sees metadata ending at skb_mac_header(), not immediately
> before skb->data, and warns:
> 
>   WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
>   CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G           O        6.18.21 #1
>   RIP: 0010:skb_data_move+0x47/0x90
>   Call Trace:
>    <IRQ>
>    bpf_skb_change_head+0xe6/0x1a0
>    bpf_prog_...+0x213/0x2e3
>    run_lwt_bpf.isra.0+0x1d3/0x360
>    bpf_xmit+0x46/0xe0
>    lwtunnel_xmit+0xa1/0xf0
>    ip_finish_output2+0x1e7/0x5e0
>    ip_output+0x63/0x100
>    __netif_receive_skb_one_core+0x85/0xa0
>    process_backlog+0x9c/0x150
>    __napi_poll+0x2b/0x190
>    net_rx_action+0x40b/0x7f0
>    handle_softirqs+0xd2/0x270
>    do_softirq+0x3f/0x60
>    </IRQ>
> 
> Drop skb metadata in the LWT BPF xmit path, just before invoking the
> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
> metadata for tc egress on the outgoing device when no LWT BPF program is
> attached - the only scenario where carrying tc-layer metadata through the
> IP output path could be meaningful. Non-BPF LWT encap types do not run
> helpers that touch metadata, so the BPF xmit op is the only site that needs
> the clear.
> 
> Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> Changes in v2:
> - Clear metadata in bpf_xmit to allow access from tc(x) egress (Daniel)
> - Add WARNING snippet to the description
> - Link to v1: https://lore.kernel.org/r/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com

I was able to reproduce the problem with a C program. After applying your patch
I could no longer repro.

Tested-by: Chris J Arges <carges@cloudflare.com>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
  2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
  2026-05-15  0:39 ` Chris Arges
@ 2026-05-15 10:47 ` sashiko-bot
  1 sibling, 0 replies; 3+ messages in thread
From: sashiko-bot @ 2026-05-15 10:47 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unconditionally calling `skb_metadata_clear()` in `bpf_xmit` causes a data race and corrupts metadata for cloned SKBs.
- [High] Uninitialized memory read and metadata corruption in non-BPF encapsulations and `bpf_input`.
--

commit 2265e07ff5cfd9680de6b565a2b5679d5adafe20
Author: Jakub Sitnicki <jakub@cloudflare.com>

net: Clear skb metadata in LWT BPF xmit

This commit clears the SKB metadata in the LWT BPF xmit path before
invoking the BPF program. It aims to prevent a warning in skb_data_move()
when bpf_skb_change_head() is called on packets with existing metadata and a
non-matching MAC header.

> Drop skb metadata in the LWT BPF xmit path, just before invoking the
> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
> metadata for tc egress on the outgoing device when no LWT BPF program is
> attached - the only scenario where carrying tc-layer metadata through the
> IP output path could be meaningful. Non-BPF LWT encap types do not run
> helpers that touch metadata, so the BPF xmit op is the only site that needs
> the clear.

Is this assumption accurate for non-BPF encapsulations and LWT_IN programs?

When non-BPF encapsulations (like IPIP or SEG6) or LWT_IN programs use
skb_push() to prepend outer headers, they shift skb->data downwards but do
not migrate the metadata block.

Because the metadata pointer is implicitly calculated as skb->data - meta_len,
the new encapsulation header silently overwrites the original metadata memory.
The implicit metadata pointer shifts further into the uninitialized SKB
headroom.

When the packet subsequently reaches TC egress, won't BPF programs accessing
data_meta read uninitialized headroom memory instead of the preserved
metadata?

> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d3..d26cb9bb81893 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -284,6 +284,11 @@ static int bpf_xmit(struct sk_buff *skb)
>  		__be16 proto = skb->protocol;
>  		int ret;
>  
> +		/* LWT BPF programs cannot access skb metadata but can corrupt
> +		 * it with helpers that operate on skb headroom. Drop it.
> +		 */
> +		skb_metadata_clear(skb);

Since bpf_xmit() operates in the LWT IP output path which frequently processes
cloned SKBs (e.g., from multicast forwarding or TCP retransmits), is it safe
to call skb_metadata_clear() here without unsharing the SKB?

skb_metadata_clear() directly modifies skb_shinfo(skb)->meta_len. For cloned
SKBs, the skb_shinfo structure is shared across all clones. Modifying it
without gaining exclusive ownership first (such as via skb_unclone()) will
silently zero out the metadata length for all other clones of the same
packet, causing them to unexpectedly lose their metadata when processed on
concurrent, unrelated network paths.

> +
>  		ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
>  		switch (ret) {
>  		case BPF_OK:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260514-bpf-lwt-drop-skb-metadata-v2-1-458664edc2b5@cloudflare.com?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-15 10:47 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
2026-05-15  0:39 ` Chris Arges
2026-05-15 10:47 ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox