From: Martin KaFai Lau <martin.lau@linux.dev>
To: Paul Chaignon <paul.chaignon@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH bpf v2 1/2] bpf: Scrub packet on bpf_redirect_peer
Date: Tue, 6 May 2025 12:17:13 -0700
Message-ID: <73cccd5e-de7d-404b-910d-c6a799c28c57@linux.dev>
In-Reply-To: <1728ead5e0fe45e7a6542c36bd4e3ca07a73b7d6.1746460653.git.paul.chaignon@gmail.com>

On 5/5/25 12:58 PM, Paul Chaignon wrote:
> When bpf_redirect_peer is used to redirect packets to a device in
> another network namespace, the skb isn't scrubbed. That can lead to skb
> information from one namespace being "misused" in another namespace.
> 
> As one example, this is causing Cilium to drop traffic when using
> bpf_redirect_peer to redirect packets that just went through IPsec
> decryption to a container namespace. The following pwru trace shows (1)
> the packet path from the host's XFRM layer to the container's XFRM
> layer where it's dropped and (2) the number of active skb extensions at
> each function.
> 
>      NETNS       MARK  IFACE  TUPLE                                FUNC
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
>                               .active_extensions = (__u8)2,
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
>                               .active_extensions = (__u8)2,
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
>                               .active_extensions = (__u8)2,
>      [...]
>      4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
>                               .active_extensions = (__u8)2,
>      [...]
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
>                               .active_extensions = (__u8)2,
> 
> In this case, there are no XFRM policies in the container's network
> namespace so the drop is unexpected. When we decrypt the IPsec packet,
> the XFRM state used for decryption is set in the skb extensions. This
> information is preserved across the netns switch. When we reach the
> XFRM policy check in the container's netns, __xfrm_policy_check drops
> the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
> policy can't be found that matches the (host-side) XFRM state used for
> decryption.
> 
> This patch fixes the issue by scrubbing the packet when using
> bpf_redirect_peer, as is done on typical netns switches via veth
> devices, except that skb->mark and skb->tstamp are not zeroed.
> 
> Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
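
For readers following the description above, here is a minimal userspace
sketch of the behavior it describes. Every name in it (struct toy_skb,
toy_scrub, toy_redirect_peer, toy_policy_check) is hypothetical and only
models the handful of fields mentioned in the commit message; it is not the
kernel code and not the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the few skb fields discussed in this
 * thread; this is NOT the kernel's struct sk_buff. */
struct toy_skb {
	uint32_t netns;             /* namespace the packet is currently in */
	uint32_t mark;              /* stands in for skb->mark */
	uint64_t tstamp;            /* stands in for skb->tstamp */
	uint8_t  active_extensions; /* nonzero: extensions (e.g. XFRM state) attached */
};

/* Toy scrub: always drop the extensions; zero mark and tstamp only for a
 * full cross-netns scrub (full == true). */
static void toy_scrub(struct toy_skb *skb, bool full)
{
	skb->active_extensions = 0;
	if (full) {
		skb->mark = 0;
		skb->tstamp = 0;
	}
}

/* Toy redirect into the peer's netns; `scrub` selects the pre-fix (false)
 * or post-fix (true) behavior as described in the commit message. */
static void toy_redirect_peer(struct toy_skb *skb, uint32_t peer_netns, bool scrub)
{
	if (scrub)
		toy_scrub(skb, false); /* mark and tstamp are kept */
	skb->netns = peer_netns;
}

/* Toy policy check: a packet still carrying XFRM state is dropped when the
 * receiving netns has no policy matching it. Returns true if accepted. */
static bool toy_policy_check(const struct toy_skb *skb, unsigned int nr_policies)
{
	if (skb->active_extensions != 0 && nr_policies == 0)
		return false;
	return true;
}
```

In this model, redirecting with scrub == false leaves active_extensions set,
so toy_policy_check() rejects the packet in a namespace with zero policies,
which mirrors the drop in the trace. With scrub == true the extensions are
cleared while mark and tstamp survive, and the same check accepts the packet.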


Thread overview: 7+ messages
2025-05-05 19:58 [PATCH bpf v2 1/2] bpf: Scrub packet on bpf_redirect_peer Paul Chaignon
2025-05-05 19:58 ` [PATCH bpf v2 2/2] bpf: Clarify handling of mark and tstamp by redirect_peer Paul Chaignon
2025-05-05 21:29   ` Daniel Borkmann
2025-05-06 19:19   ` Martin KaFai Lau
2025-05-05 21:28 ` [PATCH bpf v2 1/2] bpf: Scrub packet on bpf_redirect_peer Daniel Borkmann
2025-05-06 19:17 ` Martin KaFai Lau [this message]
2025-05-08  1:40 ` patchwork-bot+netdevbpf
