Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit

BPF List
 help / color / mirror / Atom feed

From: Jakub Sitnicki <jakub@cloudflare.com>
To: sashiko-bot@kernel.org
Cc: sashiko-reviews@lists.linux.dev,  bpf@vger.kernel.org
Subject: Re: [PATCH net v2] net: Clear skb metadata in LWT BPF xmit
Date: Wed, 17 Jun 2026 17:20:46 +0200	[thread overview]
Message-ID: <877bnxb3ch.fsf@cloudflare.com> (raw)
In-Reply-To: <20260515104741.DF6A9C2BCB0@smtp.kernel.org> (sashiko-bot@kernel.org's message of "Fri, 15 May 2026 10:47:40 +0000")

Circling back to this.

On Fri, May 15, 2026 at 10:47 AM GMT, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Unconditionally calling `skb_metadata_clear()` in `bpf_xmit` causes a data race and corrupts metadata for cloned SKBs.
> - [High] Uninitialized memory read and metadata corruption in non-BPF encapsulations and `bpf_input`.
> --
>
> commit 2265e07ff5cfd9680de6b565a2b5679d5adafe20
> Author: Jakub Sitnicki <jakub@cloudflare.com>
>
> net: Clear skb metadata in LWT BPF xmit
>
> This commit clears the SKB metadata in the LWT BPF xmit path before
> invoking the BPF program. It aims to prevent a warning in skb_data_move()
> when bpf_skb_change_head() is called on packets with existing metadata and a
> non-matching MAC header.
>
>> Drop skb metadata in the LWT BPF xmit path, just before invoking the
>> program. Doing it late, rather than in ip[6]_finish_output2(), preserves
>> metadata for tc egress on the outgoing device when no LWT BPF program is
>> attached - the only scenario where carrying tc-layer metadata through the
>> IP output path could be meaningful. Non-BPF LWT encap types do not run
>> helpers that touch metadata, so the BPF xmit op is the only site that needs
>> the clear.
>
> Is this assumption accurate for non-BPF encapsulations and LWT_IN programs?
>
> When non-BPF encapsulations (like IPIP or SEG6) or LWT_IN programs use
> skb_push() to prepend outer headers, they shift skb->data downwards but do
> not migrate the metadata block.
>
> Because the metadata pointer is implicitly calculated as skb->data - meta_len,
> the new encapsulation header silently overwrites the original metadata memory.
> The implicit metadata pointer shifts further into the uninitialized SKB
> headroom.
>
> When the packet subsequently reaches TC egress, won't BPF programs accessing
> data_meta read uninitialized headroom memory instead of the preserved
> metadata?

I initially focused on the case where a warning is generated by
skb_data_move(). This can be triggered only from BPF LWT because the BPF
helpers have been converted to call skb_postpush_data_move() after
skb_push(), while non-BPF LWT encaps have not.

However, non-BPF LWT enaps mangle the metadata as well, as it was
pointed out. So this needs to be addressed.

>
>> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
>> index f71ef82a5f3d3..d26cb9bb81893 100644
>> --- a/net/core/lwt_bpf.c
>> +++ b/net/core/lwt_bpf.c
>> @@ -284,6 +284,11 @@ static int bpf_xmit(struct sk_buff *skb)
>>  		__be16 proto = skb->protocol;
>>  		int ret;
>>  
>> +		/* LWT BPF programs cannot access skb metadata but can corrupt
>> +		 * it with helpers that operate on skb headroom. Drop it.
>> +		 */
>> +		skb_metadata_clear(skb);
>
> Since bpf_xmit() operates in the LWT IP output path which frequently processes
> cloned SKBs (e.g., from multicast forwarding or TCP retransmits), is it safe
> to call skb_metadata_clear() here without unsharing the SKB?
>
> skb_metadata_clear() directly modifies skb_shinfo(skb)->meta_len. For cloned
> SKBs, the skb_shinfo structure is shared across all clones. Modifying it
> without gaining exclusive ownership first (such as via skb_unclone()) will
> silently zero out the metadata length for all other clones of the same
> packet, causing them to unexpectedly lose their metadata when processed on
> concurrent, unrelated network paths.

This is not an issue because:

1. Only RX-originated skbs can carry metadata.
2. RX-originated skbs must be forwarded (routed) to hit LWT.
3. ip_forward() will skb_cow() the packet to unclone it before LWT..

>
>> +
>>  		ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
>>  		switch (ret) {
>>  		case BPF_OK:

next prev parent reply	other threads:[~2026-06-17 15:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14 10:47 [PATCH net v2] net: Clear skb metadata in LWT BPF xmit Jakub Sitnicki
2026-05-15  0:39 ` Chris Arges
2026-05-15 10:47 ` sashiko-bot
2026-06-17 15:20   ` Jakub Sitnicki [this message]
2026-05-18 10:00 ` Daniel Borkmann
2026-05-18 23:18 ` Jakub Kicinski
2026-05-19  9:47   ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877bnxb3ch.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=bpf@vger.kernel.org \
    --cc=sashiko-bot@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox