[PATCH net] net: clear the dst when changing skb protocol

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] net: clear the dst when changing skb protocol
@ 2025-06-04 21:06 Jakub Kicinski
  2025-06-04 21:21 ` Maciej Żenczykowski
  2025-06-04 21:45 ` Daniel Borkmann
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-04 21:06 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	martin.lau, daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf, maze

A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
    ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
    ipv6_rcv (net/ipv6/ip6_input.c:306:8)
    process_backlog (net/core/dev.c:6186:4)
    napi_poll (net/core/dev.c:6906:9)
    net_rx_action (net/core/dev.c:7028:13)
    do_softirq (kernel/softirq.c:462:3)
    netif_rx (net/core/dev.c:5326:3)
    dev_loopback_xmit (net/core/dev.c:4015:2)
    ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
    NF_HOOK (./include/linux/netfilter.h:314:9)
    ip_mc_output (net/ipv4/ip_output.c:400:5)
    dst_output (./include/net/dst.h:459:9)
    ip_local_out (net/ipv4/ip_output.c:130:9)
    ip_send_skb (net/ipv4/ip_output.c:1496:8)
    udp_send_skb (net/ipv4/udp.c:1040:8)
    udp_sendmsg (net/ipv4/udp.c:1328:10)

The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.

Clear the dst in all BPF helpers which change the protcol.
Try to preserve metadata dsts, those won't hurt.

Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
I wonder if we should not skip ingress (tc_skip_classify?)
for looped back packets in the first place. But that doesn't
seem robust enough vs multiple redirections to solve the crash.

Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
work either, since BPF can change pkt_type arbitrarily.

CC: martin.lau@linux.dev
CC: daniel@iogearbox.net
CC: john.fastabend@gmail.com
CC: eddyz87@gmail.com
CC: sdf@fomichev.me
CC: haoluo@google.com
CC: willemb@google.com
CC: william.xuanziyang@huawei.com
CC: alan.maguire@oracle.com
CC: bpf@vger.kernel.org
CC: edumazet@google.com
CC: maze@google.com
---
 net/core/filter.c                      | 19 +++++++++++++------
 tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 6 deletions(-)
 create mode 100755 tools/testing/selftests/net/nat6to4.sh

diff --git a/net/core/filter.c b/net/core/filter.c
index 327ca73f9cd7..7a72f766aacf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
 	.arg1_type      = ARG_PTR_TO_CTX,
 };
 
+static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
+{
+	skb->protocol = htons(proto);
+	if (skb_valid_dst(skb))
+		skb_dst_drop(skb);
+}
+
 static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
 {
 	/* Caller already did skb_cow() with len as headroom,
@@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
 		}
 	}
 
-	skb->protocol = htons(ETH_P_IPV6);
+	bpf_skb_change_protocol(skb, ETH_P_IPV6);
 	skb_clear_hash(skb);
 
 	return 0;
@@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
 		}
 	}
 
-	skb->protocol = htons(ETH_P_IP);
+	bpf_skb_change_protocol(skb, ETH_P_IP);
 	skb_clear_hash(skb);
 
 	return 0;
@@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 		/* Match skb->protocol to new outer l3 protocol */
 		if (skb->protocol == htons(ETH_P_IP) &&
 		    flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
-			skb->protocol = htons(ETH_P_IPV6);
+			bpf_skb_change_protocol(skb, ETH_P_IPV6);
 		else if (skb->protocol == htons(ETH_P_IPV6) &&
 			 flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
-			skb->protocol = htons(ETH_P_IP);
+			bpf_skb_change_protocol(skb, ETH_P_IP);
 	}
 
 	if (skb_is_gso(skb)) {
@@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	/* Match skb->protocol to new outer l3 protocol */
 	if (skb->protocol == htons(ETH_P_IP) &&
 	    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
-		skb->protocol = htons(ETH_P_IPV6);
+		bpf_skb_change_protocol(skb, ETH_P_IPV6);
 	else if (skb->protocol == htons(ETH_P_IPV6) &&
 		 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
-		skb->protocol = htons(ETH_P_IP);
+		bpf_skb_change_protocol(skb, ETH_P_IP);
 
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh
new file mode 100755
index 000000000000..0ee859b622a4
--- /dev/null
+++ b/tools/testing/selftests/net/nat6to4.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+NS="ns-peer-$(mktemp -u XXXXXX)"
+
+ip netns add "${NS}"
+ip -netns "${NS}" link set lo up
+ip -netns "${NS}" route add default via 127.0.0.2 dev lo
+
+tc -n "${NS}" qdisc add dev lo ingress
+tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \
+   bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action
+
+ip netns exec "${NS}" \
+   bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1'
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-04 21:06 [PATCH net] net: clear the dst when changing skb protocol Jakub Kicinski
@ 2025-06-04 21:21 ` Maciej Żenczykowski
  2025-06-05 13:22   ` Jakub Kicinski
  2025-06-04 21:45 ` Daniel Borkmann
  1 sibling, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2025-06-04 21:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf

On Wed, Jun 4, 2025 at 11:06 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> A not-so-careful NAT46 BPF program can crash the kernel
> if it indiscriminately flips ingress packets from v4 to v6:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>     ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
>     ipv6_rcv (net/ipv6/ip6_input.c:306:8)
>     process_backlog (net/core/dev.c:6186:4)
>     napi_poll (net/core/dev.c:6906:9)
>     net_rx_action (net/core/dev.c:7028:13)
>     do_softirq (kernel/softirq.c:462:3)
>     netif_rx (net/core/dev.c:5326:3)
>     dev_loopback_xmit (net/core/dev.c:4015:2)
>     ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
>     NF_HOOK (./include/linux/netfilter.h:314:9)
>     ip_mc_output (net/ipv4/ip_output.c:400:5)
>     dst_output (./include/net/dst.h:459:9)
>     ip_local_out (net/ipv4/ip_output.c:130:9)
>     ip_send_skb (net/ipv4/ip_output.c:1496:8)
>     udp_send_skb (net/ipv4/udp.c:1040:8)
>     udp_sendmsg (net/ipv4/udp.c:1328:10)
>
> The output interface has a 4->6 program attached at ingress.
> We try to loop the multicast skb back to the sending socket.
> Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> and changes skb->protocol to v6. We enter ip6_rcv_core which
> tries to use skb_dst(). But the dst is still an IPv4 one left
> after IPv4 mcast output.
>
> Clear the dst in all BPF helpers which change the protcol.
> Try to preserve metadata dsts, those won't hurt.
>
> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Maciej Żenczykowski <maze@google.com>

> ---
> I wonder if we should not skip ingress (tc_skip_classify?)
> for looped back packets in the first place. But that doesn't
> seem robust enough vs multiple redirections to solve the crash.
>
> Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
> work either, since BPF can change pkt_type arbitrarily.
>
> CC: martin.lau@linux.dev
> CC: daniel@iogearbox.net
> CC: john.fastabend@gmail.com
> CC: eddyz87@gmail.com
> CC: sdf@fomichev.me
> CC: haoluo@google.com
> CC: willemb@google.com
> CC: william.xuanziyang@huawei.com
> CC: alan.maguire@oracle.com
> CC: bpf@vger.kernel.org
> CC: edumazet@google.com
> CC: maze@google.com
> ---
>  net/core/filter.c                      | 19 +++++++++++++------
>  tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++++
>  2 files changed, 28 insertions(+), 6 deletions(-)
>  create mode 100755 tools/testing/selftests/net/nat6to4.sh
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 327ca73f9cd7..7a72f766aacf 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
>         .arg1_type      = ARG_PTR_TO_CTX,
>  };
>
> +static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
> +{
> +       skb->protocol = htons(proto);
> +       if (skb_valid_dst(skb))
> +               skb_dst_drop(skb);
> +}
> +
>  static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
>  {
>         /* Caller already did skb_cow() with len as headroom,
> @@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
>                 }
>         }
>
> -       skb->protocol = htons(ETH_P_IPV6);
> +       bpf_skb_change_protocol(skb, ETH_P_IPV6);
>         skb_clear_hash(skb);
>
>         return 0;
> @@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
>                 }
>         }
>
> -       skb->protocol = htons(ETH_P_IP);
> +       bpf_skb_change_protocol(skb, ETH_P_IP);
>         skb_clear_hash(skb);
>
>         return 0;
> @@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>                 /* Match skb->protocol to new outer l3 protocol */
>                 if (skb->protocol == htons(ETH_P_IP) &&
>                     flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> -                       skb->protocol = htons(ETH_P_IPV6);
> +                       bpf_skb_change_protocol(skb, ETH_P_IPV6);
>                 else if (skb->protocol == htons(ETH_P_IPV6) &&
>                          flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
> -                       skb->protocol = htons(ETH_P_IP);
> +                       bpf_skb_change_protocol(skb, ETH_P_IP);

I wonder if this shouldn't drop dst even when doing ipv4->ipv4 or
ipv6->ipv6 -- it's encapping, presumably old dst is irrelevant...

>         }
>
>         if (skb_is_gso(skb)) {
> @@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>         /* Match skb->protocol to new outer l3 protocol */
>         if (skb->protocol == htons(ETH_P_IP) &&
>             flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> -               skb->protocol = htons(ETH_P_IPV6);
> +               bpf_skb_change_protocol(skb, ETH_P_IPV6);
>         else if (skb->protocol == htons(ETH_P_IPV6) &&
>                  flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> -               skb->protocol = htons(ETH_P_IP);
> +               bpf_skb_change_protocol(skb, ETH_P_IP);

ditto for decap
>
>         if (skb_is_gso(skb)) {
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
> diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh
> new file mode 100755
> index 000000000000..0ee859b622a4
> --- /dev/null
> +++ b/tools/testing/selftests/net/nat6to4.sh
> @@ -0,0 +1,15 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +NS="ns-peer-$(mktemp -u XXXXXX)"
> +
> +ip netns add "${NS}"
> +ip -netns "${NS}" link set lo up
> +ip -netns "${NS}" route add default via 127.0.0.2 dev lo
> +
> +tc -n "${NS}" qdisc add dev lo ingress
> +tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \
> +   bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action
> +
> +ip netns exec "${NS}" \
> +   bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1'
> --
> 2.49.0
>

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-04 21:06 [PATCH net] net: clear the dst when changing skb protocol Jakub Kicinski
  2025-06-04 21:21 ` Maciej Żenczykowski
@ 2025-06-04 21:45 ` Daniel Borkmann
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2025-06-04 21:45 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	john.fastabend, eddyz87, sdf, haoluo, willemb, william.xuanziyang,
	alan.maguire, bpf, maze

On 6/4/25 11:06 PM, Jakub Kicinski wrote:
> A not-so-careful NAT46 BPF program can crash the kernel
> if it indiscriminately flips ingress packets from v4 to v6:
> 
>    BUG: kernel NULL pointer dereference, address: 0000000000000000
>      ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
>      ipv6_rcv (net/ipv6/ip6_input.c:306:8)
>      process_backlog (net/core/dev.c:6186:4)
>      napi_poll (net/core/dev.c:6906:9)
>      net_rx_action (net/core/dev.c:7028:13)
>      do_softirq (kernel/softirq.c:462:3)
>      netif_rx (net/core/dev.c:5326:3)
>      dev_loopback_xmit (net/core/dev.c:4015:2)
>      ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
>      NF_HOOK (./include/linux/netfilter.h:314:9)
>      ip_mc_output (net/ipv4/ip_output.c:400:5)
>      dst_output (./include/net/dst.h:459:9)
>      ip_local_out (net/ipv4/ip_output.c:130:9)
>      ip_send_skb (net/ipv4/ip_output.c:1496:8)
>      udp_send_skb (net/ipv4/udp.c:1040:8)
>      udp_sendmsg (net/ipv4/udp.c:1328:10)
> 
> The output interface has a 4->6 program attached at ingress.
> We try to loop the multicast skb back to the sending socket.
> Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> and changes skb->protocol to v6. We enter ip6_rcv_core which
> tries to use skb_dst(). But the dst is still an IPv4 one left
> after IPv4 mcast output.
> 
> Clear the dst in all BPF helpers which change the protcol.

(nit: protcol)

> Try to preserve metadata dsts, those won't hurt.
> 
> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

lgtm!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-04 21:21 ` Maciej Żenczykowski
@ 2025-06-05 13:22   ` Jakub Kicinski
  2025-06-05 13:50     ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-05 13:22 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf

On Wed, 4 Jun 2025 23:21:02 +0200 Maciej Żenczykowski wrote:
> > @@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
> >                 /* Match skb->protocol to new outer l3 protocol */
> >                 if (skb->protocol == htons(ETH_P_IP) &&
> >                     flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> > -                       skb->protocol = htons(ETH_P_IPV6);
> > +                       bpf_skb_change_protocol(skb, ETH_P_IPV6);
> >                 else if (skb->protocol == htons(ETH_P_IPV6) &&
> >                          flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
> > -                       skb->protocol = htons(ETH_P_IP);
> > +                       bpf_skb_change_protocol(skb, ETH_P_IP);  
> 
> I wonder if this shouldn't drop dst even when doing ipv4->ipv4 or
> ipv6->ipv6 -- it's encapping, presumably old dst is irrelevant...

I keep going back and forth on this. You definitely have a point, 
but I feel like there are levels to how BPF prog can make the dst
irrelevant:
 - change proto
 - encap
 - adjust room but not set any encap flag
 - overwrite the addrs without calling any helpers
First case we have to cover for safety, last we can't possibly cover.
So the question is whether we should draw the line somewhere in
the middle, or leave this patch as is and if the actual use case arrives
- let BPF call skb_dst_drop() as a kfunc. Right now I'm leaning towards
the latter.

Does that make sense? Does anyone else have an opinion?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-05 13:22   ` Jakub Kicinski
@ 2025-06-05 13:50     ` Maciej Żenczykowski
  2025-06-05 14:01       ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2025-06-05 13:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf

On Thu, Jun 5, 2025 at 3:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 4 Jun 2025 23:21:02 +0200 Maciej Żenczykowski wrote:
> > > @@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
> > >                 /* Match skb->protocol to new outer l3 protocol */
> > >                 if (skb->protocol == htons(ETH_P_IP) &&
> > >                     flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
> > > -                       skb->protocol = htons(ETH_P_IPV6);
> > > +                       bpf_skb_change_protocol(skb, ETH_P_IPV6);
> > >                 else if (skb->protocol == htons(ETH_P_IPV6) &&
> > >                          flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
> > > -                       skb->protocol = htons(ETH_P_IP);
> > > +                       bpf_skb_change_protocol(skb, ETH_P_IP);
> >
> > I wonder if this shouldn't drop dst even when doing ipv4->ipv4 or
> > ipv6->ipv6 -- it's encapping, presumably old dst is irrelevant...
>
> I keep going back and forth on this. You definitely have a point,
> but I feel like there are levels to how BPF prog can make the dst
> irrelevant:
>  - change proto
>  - encap
>  - adjust room but not set any encap flag
>  - overwrite the addrs without calling any helpers
> First case we have to cover for safety, last we can't possibly cover.
> So the question is whether we should draw the line somewhere in
> the middle, or leave this patch as is and if the actual use case arrives
> - let BPF call skb_dst_drop() as a kfunc. Right now I'm leaning towards
> the latter.
>
> Does that make sense? Does anyone else have an opinion?

It does make a fair bit of sense.
Question: does calling it as a kfunc require kernel BTF?
Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
I know normal bpf helpers don't need that...
I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)
--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-05 13:50     ` Maciej Żenczykowski
@ 2025-06-05 14:01       ` Jakub Kicinski
  2025-06-06  0:09         ` Willem de Bruijn
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-05 14:01 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf

On Thu, 5 Jun 2025 15:50:31 +0200 Maciej Żenczykowski wrote:
> On Thu, Jun 5, 2025 at 3:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > I wonder if this shouldn't drop dst even when doing ipv4->ipv4 or
> > > ipv6->ipv6 -- it's encapping, presumably old dst is irrelevant...  
> >
> > I keep going back and forth on this. You definitely have a point,
> > but I feel like there are levels to how BPF prog can make the dst
> > irrelevant:
> >  - change proto
> >  - encap
> >  - adjust room but not set any encap flag
> >  - overwrite the addrs without calling any helpers
> > First case we have to cover for safety, last we can't possibly cover.
> > So the question is whether we should draw the line somewhere in
> > the middle, or leave this patch as is and if the actual use case arrives
> > - let BPF call skb_dst_drop() as a kfunc. Right now I'm leaning towards
> > the latter.
> >
> > Does that make sense? Does anyone else have an opinion?  
> 
> It does make a fair bit of sense.
> Question: does calling it as a kfunc require kernel BTF?
> Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
> I know normal bpf helpers don't need that...
> I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)

Not sure how BPF folks feel about that, but technically we could
also add a flag to bpf_skb_adjust_room() or bpf_skb_change_proto().

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-05 14:01       ` Jakub Kicinski
@ 2025-06-06  0:09         ` Willem de Bruijn
  2025-06-06  0:31           ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2025-06-06  0:09 UTC (permalink / raw)
  To: Jakub Kicinski, Maciej Żenczykowski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, martin.lau,
	daniel, john.fastabend, eddyz87, sdf, haoluo, willemb,
	william.xuanziyang, alan.maguire, bpf

Jakub Kicinski wrote:
> On Thu, 5 Jun 2025 15:50:31 +0200 Maciej Żenczykowski wrote:
> > On Thu, Jun 5, 2025 at 3:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > I wonder if this shouldn't drop dst even when doing ipv4->ipv4 or
> > > > ipv6->ipv6 -- it's encapping, presumably old dst is irrelevant...  
> > >
> > > I keep going back and forth on this. You definitely have a point,
> > > but I feel like there are levels to how BPF prog can make the dst
> > > irrelevant:
> > >  - change proto
> > >  - encap
> > >  - adjust room but not set any encap flag
> > >  - overwrite the addrs without calling any helpers
> > > First case we have to cover for safety, last we can't possibly cover.
> > > So the question is whether we should draw the line somewhere in
> > > the middle, or leave this patch as is and if the actual use case arrives
> > > - let BPF call skb_dst_drop() as a kfunc. Right now I'm leaning towards
> > > the latter.
> > >
> > > Does that make sense? Does anyone else have an opinion?  
> > 
> > It does make a fair bit of sense.
> > Question: does calling it as a kfunc require kernel BTF?
> > Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
> > I know normal bpf helpers don't need that...
> > I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)
> 
> Not sure how BPF folks feel about that, but technically we could
> also add a flag to bpf_skb_adjust_room() or bpf_skb_change_proto().

To invert the question: what is the value in keeping the dst?

The test refers to a nat6to4.bpf.o, but this is not included.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06  0:09         ` Willem de Bruijn
@ 2025-06-06  0:31           ` Jakub Kicinski
  2025-06-06  9:40             ` Maciej Żenczykowski
  2025-06-06 15:55             ` Willem de Bruijn
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-06  0:31 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Maciej Żenczykowski, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, martin.lau, daniel, john.fastabend, eddyz87,
	sdf, haoluo, willemb, william.xuanziyang, alan.maguire, bpf

On Thu, 05 Jun 2025 20:09:55 -0400 Willem de Bruijn wrote:
> > > It does make a fair bit of sense.
> > > Question: does calling it as a kfunc require kernel BTF?
> > > Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
> > > I know normal bpf helpers don't need that...
> > > I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)  
> > 
> > Not sure how BPF folks feel about that, but technically we could
> > also add a flag to bpf_skb_adjust_room() or bpf_skb_change_proto().  
> 
> To invert the question: what is the value in keeping the dst?

I guess simplicity defined as "how many English words are needed to
explain the semantics".

The semantics I have in mind would be - dst is dropped if (1) proto 
is changed (this patch), or (2) "CLEAR_DST" flag is explicitly set
(future extension).

If we drop on encap (which I supposed is the counter proposal)
we may end up with: dst is dropped if (1) proto is changed, 
(2) encap flags are set (1+2 = alternative patch), or (3) "CLEAR_DST"
flag is explicitly set (future extension). 

Don't think we can rule out the need for a CLEAR_DST flag as not all
re-routings are encaps.

But both you and Maciej consider dropping for all encaps more
intuitive, so I'll do that in v2 unless someone objects.

> The test refers to a nat6to4.bpf.o, but this is not included.

I reused an existing BPF prog, it does what we need -
it turns a v4 packet into a v6 one :)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06  0:31           ` Jakub Kicinski
@ 2025-06-06  9:40             ` Maciej Żenczykowski
  2025-06-06 17:11               ` Jakub Kicinski
  2025-06-06 15:55             ` Willem de Bruijn
  1 sibling, 1 reply; 13+ messages in thread
From: Maciej Żenczykowski @ 2025-06-06  9:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, martin.lau, daniel, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf

On Fri, Jun 6, 2025 at 2:31 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 05 Jun 2025 20:09:55 -0400 Willem de Bruijn wrote:
> > > > It does make a fair bit of sense.
> > > > Question: does calling it as a kfunc require kernel BTF?
> > > > Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
> > > > I know normal bpf helpers don't need that...
> > > > I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)
> > >
> > > Not sure how BPF folks feel about that, but technically we could
> > > also add a flag to bpf_skb_adjust_room() or bpf_skb_change_proto().
> >
> > To invert the question: what is the value in keeping the dst?
>
> I guess simplicity defined as "how many English words are needed to
> explain the semantics".
>
> The semantics I have in mind would be - dst is dropped if (1) proto
> is changed (this patch), or (2) "CLEAR_DST" flag is explicitly set
> (future extension).
>
> If we drop on encap (which I supposed is the counter proposal)
> we may end up with: dst is dropped if (1) proto is changed,
> (2) encap flags are set (1+2 = alternative patch), or (3) "CLEAR_DST"
> flag is explicitly set (future extension).
>
> Don't think we can rule out the need for a CLEAR_DST flag as not all
> re-routings are encaps.
>
> But both you and Maciej consider dropping for all encaps more
> intuitive, so I'll do that in v2 unless someone objects.
>
> > The test refers to a nat6to4.bpf.o, but this is not included.
>
> I reused an existing BPF prog, it does what we need -
> it turns a v4 packet into a v6 one :)

I haven't yet written code like this, so I'm not sure which specific
helpers would be used, but here's what I'm aware of:

nat46/nat64 translation - obviously the dst should be dropped, as it
cannot be correct
- note that the bpf skb change proto helper is not always enough,
as it always adjusts by 20, but for fragments you need to adjust by 8
more (ipv4 had frag info straight in the core ipv4 header, ipv6
requires an extra 8 byte extension header),
thus our latest Android clat/nat64/nat46 code sometimes calls:
  bpf_skb_adjust_room(skb, -(__s32)sizeof(struct frag_hdr),
BPF_ADJ_ROOM_NET, /*flags*/0))

(and should also do the reverse +8 in the other direction, but that's
not yet implemented)
Anyway in theory this bpf_skb_adjust_room(BPF_ADJ_ROOM_NET) likely
shouldn't clear dst.

ipv4-in-ipv6/ipv6-in-ipv4 decap/encap - drop, cannot be correct

ipv4-in-ipv4/ipv6-in-ipv6 decap/encap - the dst ip is almost always
different in inner vs outer packet, so again drop seems correct,
though yeah I guess it could be conditional

now, what other cases do we have where we need to insert stuff at the
beginning of the packet (I assume we're not talking about cases where
we change something at the end)

I can think of 2 more:

(a) forwarding to an interface with a different L2 header size

specifically I think in Android when we forward (tethering offload)
from rawip to ether we have to adjust the front of the packet to add
an ethernet header: bpf_skb_change_head(skb, sizeof(struct ethhdr),
/*flags*/ 0))
- in practice, for v4, we just did manual nat44 so the dst is garbage,
while for v6 we didn't so the dst would in theory still be valid (but
we immediately bpf_redirect(EGRESS) so dst is also likely useless)

I think we might unconditionally add the ethernet header when
forwarding from rawip to rawip as well, because I think the kernel
then just drops / ignores it when egressing (via bpf_redirect) through
a rawip interface.

Honestly this stuff is annoying as hell to deal with (unfortunately
true of bpf in general).

(b) adding/removing some sort of vlan tag / mpls tag / ipv4
options/ipv6 extension headers / tcp options or something - most of
these likely should keep the dst

I think this wouldn't use BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 & friends but
something else, and at least some of these should likely not
invalidate the dst.

I'm not clear on what all the 'proper' apis are for all these different cases.

(I thought I had a 3rd case, but I can't remember it any more...)

Hopefully this is helpful?

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06  0:31           ` Jakub Kicinski
  2025-06-06  9:40             ` Maciej Żenczykowski
@ 2025-06-06 15:55             ` Willem de Bruijn
  2025-06-06 17:09               ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Willem de Bruijn @ 2025-06-06 15:55 UTC (permalink / raw)
  To: Jakub Kicinski, Willem de Bruijn
  Cc: Maciej Żenczykowski, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, martin.lau, daniel, john.fastabend, eddyz87,
	sdf, haoluo, willemb, william.xuanziyang, alan.maguire, bpf

Jakub Kicinski wrote:
> On Thu, 05 Jun 2025 20:09:55 -0400 Willem de Bruijn wrote:
> > > > It does make a fair bit of sense.
> > > > Question: does calling it as a kfunc require kernel BTF?
> > > > Specifically some ram limited devices want to disable CONFIG_DEBUG_INFO_BTF...
> > > > I know normal bpf helpers don't need that...
> > > > I guess you could always convert ipv4 -> ipv6 -> ipv4 ;-)  
> > > 
> > > Not sure how BPF folks feel about that, but technically we could
> > > also add a flag to bpf_skb_adjust_room() or bpf_skb_change_proto().  
> > 
> > To invert the question: what is the value in keeping the dst?
> 
> I guess simplicity defined as "how many English words are needed to
> explain the semantics".
> 
> The semantics I have in mind would be - dst is dropped if (1) proto 
> is changed (this patch), or (2) "CLEAR_DST" flag is explicitly set
> (future extension).
> 
> If we drop on encap (which I supposed is the counter proposal)
> we may end up with: dst is dropped if (1) proto is changed, 
> (2) encap flags are set (1+2 = alternative patch), or (3) "CLEAR_DST"
> flag is explicitly set (future extension). 

I was just wondering what the effect is of dropping, and why we should
err on the side of minimizing that.

Leaving it to the authors of a BPF program to understand whether they
should call a kfunc seems dangerous and error prone.

I would drop on every program that calls bpf_skb_adjust_room() or
bpf_skb_change_proto(). This patch LGTM in other words. If anything,
I would drop more aggressively.
 
> Don't think we can rule out the need for a CLEAR_DST flag as not all
> re-routings are encaps.
> 
> But both you and Maciej consider dropping for all encaps more
> intuitive, so I'll do that in v2 unless someone objects.
> 
> > The test refers to a nat6to4.bpf.o, but this is not included.
> 
> I reused an existing BPF prog, it does what we need -
> it turns a v4 packet into a v6 one :)

Oops yes, thanks. I somehow missed that in my quick `find`.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06 15:55             ` Willem de Bruijn
@ 2025-06-06 17:09               ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-06 17:09 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Maciej Żenczykowski, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, martin.lau, daniel, john.fastabend, eddyz87,
	sdf, haoluo, willemb, william.xuanziyang, alan.maguire, bpf

On Fri, 06 Jun 2025 11:55:50 -0400 Willem de Bruijn wrote:
> I would drop on every program that calls bpf_skb_adjust_room() or
> bpf_skb_change_proto().

I think @maze covered it well, not all adjust_room calls matter 
to L3, they may be adjusting L2. I don't feel like blanket dst
discard is either always correct or necessarily sufficient 
(when rerouting the packet without adjusting length).
Let me respond to him with the draft patch..

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06  9:40             ` Maciej Żenczykowski
@ 2025-06-06 17:11               ` Jakub Kicinski
  2025-06-06 17:36                 ` Maciej Żenczykowski
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-06-06 17:11 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Willem de Bruijn, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, martin.lau, daniel, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf

On Fri, 6 Jun 2025 11:40:31 +0200 Maciej Żenczykowski wrote:
> Hopefully this is helpful?

So IIUC this is what we should do?
Cover the cases where we are 99% sure dropping dst is right without
being overeager ?

---->8-----

diff --git a/net/core/filter.c b/net/core/filter.c
index 327ca73f9cd7..d5917d6446f2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3401,18 +3401,24 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
 	 * care of stores.
 	 *
 	 * Currently, additional options and extension header space are
 	 * not supported, but flags register is reserved so we can adapt
 	 * that. For offloads, we mark packet as dodgy, so that headers
 	 * need to be verified first.
 	 */
 	ret = bpf_skb_proto_xlat(skb, proto);
+	if (ret)
+		return ret;
+
 	bpf_compute_data_pointers(skb);
-	return ret;
+	if (skb_valid_dst(skb))
+		skb_dst_drop(skb);
+
+	return 0;
 }
 
 static const struct bpf_func_proto bpf_skb_change_proto_proto = {
 	.func		= bpf_skb_change_proto,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
@@ -3549,16 +3555,19 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 
 		/* Match skb->protocol to new outer l3 protocol */
 		if (skb->protocol == htons(ETH_P_IP) &&
 		    flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
 			skb->protocol = htons(ETH_P_IPV6);
 		else if (skb->protocol == htons(ETH_P_IPV6) &&
 			 flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
 			skb->protocol = htons(ETH_P_IP);
+
+		if (skb_valid_dst(skb))
+			skb_dst_drop(skb);
 	}
 
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 
 		/* Header must be checked, and gso_segs recomputed. */
 		shinfo->gso_type |= gso_type;
 		shinfo->gso_segs = 0;
@@ -3576,16 +3585,17 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 	}
 
 	return 0;
 }
 
 static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 			      u64 flags)
 {
+	bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
 	int ret;
 
 	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
 			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
 			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
 		return -EINVAL;
 
 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3598,23 +3608,28 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	ret = skb_unclone(skb, GFP_ATOMIC);
 	if (unlikely(ret < 0))
 		return ret;
 
 	ret = bpf_skb_net_hdr_pop(skb, off, len_diff);
 	if (unlikely(ret < 0))
 		return ret;
 
-	/* Match skb->protocol to new outer l3 protocol */
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
-		skb->protocol = htons(ETH_P_IPV6);
-	else if (skb->protocol == htons(ETH_P_IPV6) &&
-		 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
-		skb->protocol = htons(ETH_P_IP);
+	if (decap) {
+		/* Match skb->protocol to new outer l3 protocol */
+		if (skb->protocol == htons(ETH_P_IP) &&
+		    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+			skb->protocol = htons(ETH_P_IPV6);
+		else if (skb->protocol == htons(ETH_P_IPV6) &&
+			 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
+			skb->protocol = htons(ETH_P_IP);
+
+		if (skb_valid_dst(skb))
+			skb_dst_drop(skb);
+	}
 
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 
 		/* Due to header shrink, MSS can be upgraded. */
 		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
 			skb_increase_gso_size(shinfo, len_diff);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] net: clear the dst when changing skb protocol
  2025-06-06 17:11               ` Jakub Kicinski
@ 2025-06-06 17:36                 ` Maciej Żenczykowski
  0 siblings, 0 replies; 13+ messages in thread
From: Maciej Żenczykowski @ 2025-06-06 17:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Willem de Bruijn, davem, netdev, edumazet, pabeni, andrew+netdev,
	horms, martin.lau, daniel, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf

On Fri, Jun 6, 2025 at 7:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 6 Jun 2025 11:40:31 +0200 Maciej Żenczykowski wrote:
> > Hopefully this is helpful?
>
> So IIUC this is what we should do?
> Cover the cases where we are 99% sure dropping dst is right without
> being overeager ?
>
> ---->8-----
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 327ca73f9cd7..d5917d6446f2 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3401,18 +3401,24 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
>          * care of stores.
>          *
>          * Currently, additional options and extension header space are
>          * not supported, but flags register is reserved so we can adapt
>          * that. For offloads, we mark packet as dodgy, so that headers
>          * need to be verified first.
>          */
>         ret = bpf_skb_proto_xlat(skb, proto);
> +       if (ret)
> +               return ret;
> +
>         bpf_compute_data_pointers(skb);
> -       return ret;
> +       if (skb_valid_dst(skb))
> +               skb_dst_drop(skb);
> +
> +       return 0;
>  }
>
>  static const struct bpf_func_proto bpf_skb_change_proto_proto = {
>         .func           = bpf_skb_change_proto,
>         .gpl_only       = false,
>         .ret_type       = RET_INTEGER,
>         .arg1_type      = ARG_PTR_TO_CTX,
>         .arg2_type      = ARG_ANYTHING,
> @@ -3549,16 +3555,19 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>
>                 /* Match skb->protocol to new outer l3 protocol */
>                 if (skb->protocol == htons(ETH_P_IP) &&
>                     flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
>                         skb->protocol = htons(ETH_P_IPV6);
>                 else if (skb->protocol == htons(ETH_P_IPV6) &&
>                          flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
>                         skb->protocol = htons(ETH_P_IP);
> +
> +               if (skb_valid_dst(skb))
> +                       skb_dst_drop(skb);
>         }
>
>         if (skb_is_gso(skb)) {
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
>
>                 /* Header must be checked, and gso_segs recomputed. */
>                 shinfo->gso_type |= gso_type;
>                 shinfo->gso_segs = 0;
> @@ -3576,16 +3585,17 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>         }
>
>         return 0;
>  }
>
>  static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>                               u64 flags)
>  {
> +       bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
>         int ret;
>
>         if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
>                                BPF_F_ADJ_ROOM_DECAP_L3_MASK |
>                                BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
>                 return -EINVAL;
>
>         if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
> @@ -3598,23 +3608,28 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>         ret = skb_unclone(skb, GFP_ATOMIC);
>         if (unlikely(ret < 0))
>                 return ret;
>
>         ret = bpf_skb_net_hdr_pop(skb, off, len_diff);
>         if (unlikely(ret < 0))
>                 return ret;
>
> -       /* Match skb->protocol to new outer l3 protocol */
> -       if (skb->protocol == htons(ETH_P_IP) &&
> -           flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> -               skb->protocol = htons(ETH_P_IPV6);
> -       else if (skb->protocol == htons(ETH_P_IPV6) &&
> -                flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> -               skb->protocol = htons(ETH_P_IP);
> +       if (decap) {
> +               /* Match skb->protocol to new outer l3 protocol */
> +               if (skb->protocol == htons(ETH_P_IP) &&
> +                   flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> +                       skb->protocol = htons(ETH_P_IPV6);
> +               else if (skb->protocol == htons(ETH_P_IPV6) &&
> +                        flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> +                       skb->protocol = htons(ETH_P_IP);
> +
> +               if (skb_valid_dst(skb))
> +                       skb_dst_drop(skb);
> +       }
>
>         if (skb_is_gso(skb)) {
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
>
>                 /* Due to header shrink, MSS can be upgraded. */
>                 if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
>                         skb_increase_gso_size(shinfo, len_diff);
>

This looks reasonable to me... not that I feel very qualified to make
that statement ;-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-06-06 17:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04 21:06 [PATCH net] net: clear the dst when changing skb protocol Jakub Kicinski
2025-06-04 21:21 ` Maciej Żenczykowski
2025-06-05 13:22   ` Jakub Kicinski
2025-06-05 13:50     ` Maciej Żenczykowski
2025-06-05 14:01       ` Jakub Kicinski
2025-06-06  0:09         ` Willem de Bruijn
2025-06-06  0:31           ` Jakub Kicinski
2025-06-06  9:40             ` Maciej Żenczykowski
2025-06-06 17:11               ` Jakub Kicinski
2025-06-06 17:36                 ` Maciej Żenczykowski
2025-06-06 15:55             ` Willem de Bruijn
2025-06-06 17:09               ` Jakub Kicinski
2025-06-04 21:45 ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).