All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer
@ 2025-05-05  9:48 Paul Chaignon
  2025-05-05 10:01 ` Daniel Borkmann
  0 siblings, 1 reply; 3+ messages in thread
From: Paul Chaignon @ 2025-05-05  9:48 UTC (permalink / raw)
  To: bpf; +Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko

When bpf_redirect_peer is used to redirect packets to a device in
another network namespace, the skb isn't scrubbed. That can lead skb
information from one namespace to be "misused" in another namespace.

As one example, this is causing Cilium to drop traffic when using
bpf_redirect_peer to redirect packets that just went through IPsec
decryption to a container namespace. The following pwru trace shows (1)
the packet path from the host's XFRM layer to the container's XFRM
layer where it's dropped and (2) the number of active skb extensions at
each function.

    NETNS       MARK  IFACE  TUPLE                                FUNC
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
                             .active_extensions = (__u8)2,
    4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
                             .active_extensions = (__u8)2,
    [...]
    4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
                             .active_extensions = (__u8)2,
    [...]
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
                             .active_extensions = (__u8)2,
    4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
                             .active_extensions = (__u8)2,

In this case, there are no XFRM policies in the container's network
namespace so the drop is unexpected. When we decrypt the IPsec packet,
the XFRM state used for decryption is set in the skb extensions. This
information is preserved across the netns switch. When we reach the
XFRM policy check in the container's netns, __xfrm_policy_check drops
the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
policy can't be found that matches the (host-side) XFRM state used for
decryption.

This patch fixes this by scrubbing the packet when using
bpf_redirect_peer, as is done on typical netns switches via veth
devices.

Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
---
 net/core/filter.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 79cab4d78dc3..12b6b8dbeb51 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2509,6 +2509,7 @@ int skb_do_redirect(struct sk_buff *skb)
 			goto out_drop;
 		skb->dev = dev;
 		dev_sw_netstats_rx_add(dev, skb->len);
+		skb_scrub_packet(skb, true);
 		return -EAGAIN;
 	}
 	return flags & BPF_F_NEIGH ?
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer
  2025-05-05  9:48 [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer Paul Chaignon
@ 2025-05-05 10:01 ` Daniel Borkmann
  2025-05-05 20:01   ` Paul Chaignon
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Borkmann @ 2025-05-05 10:01 UTC (permalink / raw)
  To: Paul Chaignon, bpf; +Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko

On 5/5/25 11:48 AM, Paul Chaignon wrote:
> When bpf_redirect_peer is used to redirect packets to a device in
> another network namespace, the skb isn't scrubbed. That can lead skb
> information from one namespace to be "misused" in another namespace.
> 
> As one example, this is causing Cilium to drop traffic when using
> bpf_redirect_peer to redirect packets that just went through IPsec
> decryption to a container namespace. The following pwru trace shows (1)
> the packet path from the host's XFRM layer to the container's XFRM
> layer where it's dropped and (2) the number of active skb extensions at
> each function.
> 
>      NETNS       MARK  IFACE  TUPLE                                FUNC
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
>                               .active_extensions = (__u8)2,
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
>                               .active_extensions = (__u8)2,
>      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
>                               .active_extensions = (__u8)2,
>      [...]
>      4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
>                               .active_extensions = (__u8)2,
>      [...]
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
>                               .active_extensions = (__u8)2,
>      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
>                               .active_extensions = (__u8)2,
> 
> In this case, there are no XFRM policies in the container's network
> namespace so the drop is unexpected. When we decrypt the IPsec packet,
> the XFRM state used for decryption is set in the skb extensions. This
> information is preserved across the netns switch. When we reach the
> XFRM policy check in the container's netns, __xfrm_policy_check drops
> the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
> policy can't be found that matches the (host-side) XFRM state used for
> decryption.
> 
> This patch fixes this by scrubbing the packet when using
> bpf_redirect_peer, as is done on typical netns switches via veth
> devices.
> 
> Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
> ---
>   net/core/filter.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 79cab4d78dc3..12b6b8dbeb51 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2509,6 +2509,7 @@ int skb_do_redirect(struct sk_buff *skb)
>   			goto out_drop;
>   		skb->dev = dev;
>   		dev_sw_netstats_rx_add(dev, skb->len);
> +		skb_scrub_packet(skb, true);

This should be set to false, and defer the clearing of mark/tstamp
iff needed to the BPF program in the host given this has been out
for quite some time to avoid breakage if someone actively has been
using BPF like that to transfer this information to apps in the
target namespace.

>   		return -EAGAIN;
>   	}
>   	return flags & BPF_F_NEIGH ?


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer
  2025-05-05 10:01 ` Daniel Borkmann
@ 2025-05-05 20:01   ` Paul Chaignon
  0 siblings, 0 replies; 3+ messages in thread
From: Paul Chaignon @ 2025-05-05 20:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: bpf, Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko

On Mon, May 05, 2025 at 12:01:30PM +0200, Daniel Borkmann wrote:
> On 5/5/25 11:48 AM, Paul Chaignon wrote:
> > When bpf_redirect_peer is used to redirect packets to a device in
> > another network namespace, the skb isn't scrubbed. That can lead skb
> > information from one namespace to be "misused" in another namespace.
> > 
> > As one example, this is causing Cilium to drop traffic when using
> > bpf_redirect_peer to redirect packets that just went through IPsec
> > decryption to a container namespace. The following pwru trace shows (1)
> > the packet path from the host's XFRM layer to the container's XFRM
> > layer where it's dropped and (2) the number of active skb extensions at
> > each function.
> > 
> >      NETNS       MARK  IFACE  TUPLE                                FUNC
> >      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm_rcv_cb
> >                               .active_extensions = (__u8)2,
> >      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  xfrm4_rcv_cb
> >                               .active_extensions = (__u8)2,
> >      4026533547  d00   eth0   10.244.3.124:35473->10.244.2.158:53  gro_cells_receive
> >                               .active_extensions = (__u8)2,
> >      [...]
> >      4026533547  0     eth0   10.244.3.124:35473->10.244.2.158:53  skb_do_redirect
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  ip_rcv_core
> >                               .active_extensions = (__u8)2,
> >      [...]
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  udp_queue_rcv_one_skb
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_policy_check
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  __xfrm_decode_session
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  security_xfrm_decode_session
> >                               .active_extensions = (__u8)2,
> >      4026534999  0     eth0   10.244.3.124:35473->10.244.2.158:53  kfree_skb_reason(SKB_DROP_REASON_XFRM_POLICY)
> >                               .active_extensions = (__u8)2,
> > 
> > In this case, there are no XFRM policies in the container's network
> > namespace so the drop is unexpected. When we decrypt the IPsec packet,
> > the XFRM state used for decryption is set in the skb extensions. This
> > information is preserved across the netns switch. When we reach the
> > XFRM policy check in the container's netns, __xfrm_policy_check drops
> > the packet with LINUX_MIB_XFRMINNOPOLS because a (container-side) XFRM
> > policy can't be found that matches the (host-side) XFRM state used for
> > decryption.
> > 
> > This patch fixes this by scrubbing the packet when using
> > bpf_redirect_peer, as is done on typical netns switches via veth
> > devices.
> > 
> > Fixes: 9aa1206e8f482 ("bpf: Add redirect_peer helper")
> > Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
> > ---
> >   net/core/filter.c | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 79cab4d78dc3..12b6b8dbeb51 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2509,6 +2509,7 @@ int skb_do_redirect(struct sk_buff *skb)
> >   			goto out_drop;
> >   		skb->dev = dev;
> >   		dev_sw_netstats_rx_add(dev, skb->len);
> > +		skb_scrub_packet(skb, true);
> 
> This should be set to false, and defer the clearing of mark/tstamp
> iff needed to the BPF program in the host given this has been out
> for quite some time to avoid breakage if someone actively has been
> using BPF like that to transfer this information to apps in the
> target namespace.

Thanks for the review! I've sent a v2 with the suggested change and
added a commit to explain in the helper description that those two
fields are not cleared.

> 
> >   		return -EAGAIN;
> >   	}
> >   	return flags & BPF_F_NEIGH ?
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-05-05 20:01 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-05  9:48 [PATCH bpf] bpf: Scrub packet on bpf_redirect_peer Paul Chaignon
2025-05-05 10:01 ` Daniel Borkmann
2025-05-05 20:01   ` Paul Chaignon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.