From: Paul Chaignon <paul.chaignon@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, Martin KaFai Lau <martin.lau@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer
Date: Mon, 5 May 2025 22:01:38 +0200 [thread overview]
Message-ID: <aBkZIoqBaClS71tn@mail.gmail.com> (raw)
In-Reply-To: <11ea4e28-4653-4061-9226-fbee3d5b9f32@iogearbox.net>
On Mon, May 05, 2025 at 12:01:30PM +0200, Daniel Borkmann wrote:
> On 5/5/25 11:48 AM, Paul Chaignon wrote:
> > When bpf_redirect_peer is used to redirect packets to a device in
> > another network namespace, the skb isn't scrubbed. That can lead skb
> > information from one namespace to be "misused" in another namespace.
> >
> > As one example, this is causing Cilium to drop traffic when using
> > bpf_redirect_peer to redirect packets that just went through IPsec
> > decryption to a container namespace. The following pwru trace shows (1)
> > the packet path from the host's XFRM layer to the container's XFRM
> > layer where it's dropped and (2) the number of active skb extensions at
> > each function.
> >
> > NETNS MARK IFACE TUPLE FUNC
> > 4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm_rcv_cb
> > .active_extensions = (__u8)2,
> > 4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 xfrm4_rcv_cb
> > .active_extensions = (__u8)2,
> > 4026533547 d00 eth0 10.244.3.124:35473->10.244.2.158:53 gro_cells_receive
> > .active_extensions = (__u8)2,
> > [...]
> > 4026533547 0 eth0 10.244.3.124:35473->10.244.2.158:53 skb_do_redirect
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 ip_rcv_core
> > .active_extensions = (__u8)2,
> > [...]
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 udp_queue_rcv_one_skb
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_policy_check
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 __xfrm_decode_session
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 security_xfrm_decode_session
> > .active_extensions = (__u8)2,
> > 4026534999 0 eth0 10.244.3.124:35473->10.244.2.158:53 kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
> > .active_extensions = (__u8)2,
> >
> > In this case, there are no XFRM policies in the container's network
> > namespace so the drop is unexpected. When we decrypt the IPsec packet,
> > the XFRM state used for decryption is set in the skb extensions. This
> > information is preserved across the netns switch. When we reach the
> > XFRM policy check in the container's netns, __xfrm_policy_check drops
> > the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
> > policy can't be found that matches the (host-side) XFRM state used for
> > decryption.
> >
> > This patch fixes this by scrubbing the packet when using
> > bpf_redirect_peer, as is done on typical netns switches via veth
> > devices.
> >
> > Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
> > Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
> > ---
> > net/core/filter.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 79cab4d78dc3..12b6b8dbeb51 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2509,6 +2509,7 @@ int skb_do_redirect(struct sk_buff *skb)
> > goto out_drop;
> > skb->dev = dev;
> > dev_sw_netstats_rx_add(dev, skb->len);
> > + skb_scrub_packet(skb, true);
>
> This should be set to false, and defer the clearing of mark/tstamp
> iff needed to the BPF program in the host given this has been out
> for quite some time to avoid breakage if someone actively has been
> using BPF like that to transfer this information to apps in the
> target namespace.
Thanks for the review! I've sent a v2 with the suggested change and
added a commit to explain in the helper description that those two
fields are not cleared.
>
> > return -EAGAIN;
> > }
> > return flags & BPF_F_NEIGH ?
>
prev parent reply other threads:[~2025-05-05 20:01 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-05 9:48 [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer Paul Chaignon
2025-05-05 10:01 ` Daniel Borkmann
2025-05-05 20:01 ` Paul Chaignon [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBkZIoqBaClS71tn@mail.gmail.com \
--to=paul.chaignon@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=martin.lau@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.