Netdev List
 help / color / mirror / Atom feed
* [PATCH net] net: Clear skb metadata before LWT xmit
@ 2026-04-28 19:44 Jakub Sitnicki
  2026-05-01  0:31 ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2026-04-28 19:44 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Martin KaFai Lau
  Cc: netdev, bpf, kernel-team

skb metadata is meant for passing information between XDP and TC.
LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.

However, LWT_XMIT programs can call helpers that move packet data.
One such helper is bpf_skb_change_head(), which uses skb_data_move() to
move both packet data and metadata after skb_push(). skb_data_move()
expects metadata to sit immediately before skb->data. This is the layout
that commit 8989d328dfe7 ("net: Helper to move packet data and metadata
after skb_push/pull") preserves for TC(X) programs when skb_push() or
skb_pull() moves packet data.

The IP output path can run LWT xmit before neighbour output has built the
outgoing L2 header. For forwarded, or otherwise RX-originated, packets
skb->data points at the network header while skb_mac_header() can still
point at the old L2 header. If such an skb still carries XDP metadata,
skb_data_move() sees metadata ending at skb_mac_header(), not immediately
before skb->data, and warns.

Drop skb metadata before calling into LWT xmit from the IPv4 and IPv6
output paths. Metadata is not part of the LWT BPF interface, and keeping it
attached lets hidden RX metadata affect LWT skb mutation helpers.

Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/ip_output.c  | 2 ++
 net/ipv6/ip6_output.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e4790cc7b5c2..3295e043310f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -220,6 +220,8 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 			return -ENOMEM;
 	}
 
+	skb_metadata_clear(skb);
+
 	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
 		int res = lwtunnel_xmit(skb);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7e92909ab5be..734818463e07 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -111,6 +111,8 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
 		}
 	}
 
+	skb_metadata_clear(skb);
+
 	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
 		int res = lwtunnel_xmit(skb);
 




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-04-28 19:44 [PATCH net] net: Clear skb metadata before LWT xmit Jakub Sitnicki
@ 2026-05-01  0:31 ` Jakub Kicinski
  2026-05-13  9:17   ` Jakub Sitnicki
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2026-05-01  0:31 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Martin KaFai Lau, netdev, bpf, kernel-team

On Tue, 28 Apr 2026 21:44:37 +0200 Jakub Sitnicki wrote:
> skb metadata is meant for passing information between XDP and TC.
> LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.

netdev wasn't CCed on the sashiko reply but I think it now auto-marks
the patches as Changes Requested :| The Sashiko comments didn't seem
actionable to me. WDYT?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-05-01  0:31 ` Jakub Kicinski
@ 2026-05-13  9:17   ` Jakub Sitnicki
  2026-05-14  0:26     ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2026-05-13  9:17 UTC (permalink / raw)
  To: Jakub Kicinski, Daniel Borkmann
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Martin KaFai Lau, netdev, bpf, kernel-team

On Thu, Apr 30, 2026 at 05:31 PM -07, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 21:44:37 +0200 Jakub Sitnicki wrote:
>> skb metadata is meant for passing information between XDP and TC.
>> LWT programs cannot access the __sk_buff->data_meta pseudo-pointer.
>
> netdev wasn't CCed on the sashiko reply but I think it now auto-marks
> the patches as Changes Requested :| The Sashiko comments didn't seem
> actionable to me. WDYT?

Apologies, I completely missed your message in the chaos of the
conference.

I actually wanted to double-check with you and Daniel - is consuming
metadata from TC egress supposed to be supported?

If not, then I agree - nothing left to do about the Sashiko feedback
[1].

[1] https://sashiko.dev/#/patchset/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com?part=1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-05-13  9:17   ` Jakub Sitnicki
@ 2026-05-14  0:26     ` Jakub Kicinski
  2026-05-14  7:04       ` Daniel Borkmann
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2026-05-14  0:26 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Martin KaFai Lau, netdev, bpf,
	kernel-team

On Wed, 13 May 2026 11:17:12 +0200 Jakub Sitnicki wrote:
> I actually wanted to double-check with you and Daniel - is consuming
> metadata from TC egress supposed to be supported?

Not AFAIK, let's see if Daniel has a use case in mind.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-05-14  0:26     ` Jakub Kicinski
@ 2026-05-14  7:04       ` Daniel Borkmann
  2026-05-14  7:18         ` Daniel Borkmann
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2026-05-14  7:04 UTC (permalink / raw)
  To: Jakub Kicinski, Jakub Sitnicki
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Martin KaFai Lau, netdev, bpf, kernel-team

On 5/14/26 2:26 AM, Jakub Kicinski wrote:
> On Wed, 13 May 2026 11:17:12 +0200 Jakub Sitnicki wrote:
>> I actually wanted to double-check with you and Daniel - is consuming
>> metadata from TC egress supposed to be supported?
> 
> Not AFAIK, let's see if Daniel has a use case in mind.


The only case I can think of would be if someone uses metadata
within the tc(x) layer, so from either tc ingress then xmit totc egress or vice versa in order to store BPF-related data in
front, but iiuc that should still work with your patch.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-05-14  7:04       ` Daniel Borkmann
@ 2026-05-14  7:18         ` Daniel Borkmann
  2026-05-14 10:49           ` Jakub Sitnicki
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2026-05-14  7:18 UTC (permalink / raw)
  To: Jakub Kicinski, Jakub Sitnicki
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Martin KaFai Lau, netdev, bpf, kernel-team

On 5/14/26 9:04 AM, Daniel Borkmann wrote:
> On 5/14/26 2:26 AM, Jakub Kicinski wrote:
>> On Wed, 13 May 2026 11:17:12 +0200 Jakub Sitnicki wrote:
>>> I actually wanted to double-check with you and Daniel - is consuming
>>> metadata from TC egress supposed to be supported?
>>
>> Not AFAIK, let's see if Daniel has a use case in mind.
> 
> The only case I can think of would be if someone uses metadata
> within the tc(x) layer, so from either tc ingress then xmit to
> tc egress or vice versa in order to store BPF-related data in> front, but iiuc that should still work with your patch.

Maybe the one case where it could break down is if you would
try to transport data from container-related iface (e.g. tc @
host-facing veth, or netkit) all the way to phys dev when there
is tunnel in between. Not sure if it would survive today,
potentially yes. In such case maybe better to move this further
into lwt related paths?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: Clear skb metadata before LWT xmit
  2026-05-14  7:18         ` Daniel Borkmann
@ 2026-05-14 10:49           ` Jakub Sitnicki
  0 siblings, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2026-05-14 10:49 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jakub Kicinski, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Martin KaFai Lau, netdev, bpf,
	kernel-team

On Thu, May 14, 2026 at 09:18:07AM +0200, 'Daniel Borkmann' via kernel-team wrote:
> Maybe the one case where it could break down is if you would
> try to transport data from container-related iface (e.g. tc @
> host-facing veth, or netkit) all the way to phys dev when there
> is tunnel in between. Not sure if it would survive today,
> potentially yes. In such case maybe better to move this further
> into lwt related paths?

That sounds good:

https://lore.kernel.org/r/20260514-bpf-lwt-drop-skb-metadata-v2-1-458664edc2b5@cloudflare.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-14 10:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 19:44 [PATCH net] net: Clear skb metadata before LWT xmit Jakub Sitnicki
2026-05-01  0:31 ` Jakub Kicinski
2026-05-13  9:17   ` Jakub Sitnicki
2026-05-14  0:26     ` Jakub Kicinski
2026-05-14  7:04       ` Daniel Borkmann
2026-05-14  7:18         ` Daniel Borkmann
2026-05-14 10:49           ` Jakub Sitnicki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox