[PATCH net v2] net: clear the dst when changing skb protocol

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net v2] net: clear the dst when changing skb protocol
@ 2025-06-07 20:47 Jakub Kicinski
  2025-06-07 21:33 ` Maciej Żenczykowski
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-06-07 20:47 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	Maciej Żenczykowski, Daniel Borkmann, martin.lau,
	john.fastabend, eddyz87, sdf, haoluo, willemb, william.xuanziyang,
	alan.maguire, bpf, shuah, linux-kselftest, yonghong.song

A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
    ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
    ipv6_rcv (net/ipv6/ip6_input.c:306:8)
    process_backlog (net/core/dev.c:6186:4)
    napi_poll (net/core/dev.c:6906:9)
    net_rx_action (net/core/dev.c:7028:13)
    do_softirq (kernel/softirq.c:462:3)
    netif_rx (net/core/dev.c:5326:3)
    dev_loopback_xmit (net/core/dev.c:4015:2)
    ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
    NF_HOOK (./include/linux/netfilter.h:314:9)
    ip_mc_output (net/ipv4/ip_output.c:400:5)
    dst_output (./include/net/dst.h:459:9)
    ip_local_out (net/ipv4/ip_output.c:130:9)
    ip_send_skb (net/ipv4/ip_output.c:1496:8)
    udp_send_skb (net/ipv4/udp.c:1040:8)
    udp_sendmsg (net/ipv4/udp.c:1328:10)

The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.

Clear the dst in all BPF helpers which change the protocol.
Also clear the dst if we did an encap or decap as those
will most likely make the dst stale.
Try to preserve metadata dsts, those may carry non-routing
metadata.

Reviewed-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - drop on encap/decap
 - fix typo (protcol)
 - add the test to the Makefile
v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org

I wonder if we should not skip ingress (tc_skip_classify?)
for looped back packets in the first place. But that doesn't
seem robust enough vs multiple redirections to solve the crash.

Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
work either, since BPF can change pkt_type arbitrarily.

CC: martin.lau@linux.dev
CC: daniel@iogearbox.net
CC: john.fastabend@gmail.com
CC: eddyz87@gmail.com
CC: sdf@fomichev.me
CC: haoluo@google.com
CC: willemb@google.com
CC: william.xuanziyang@huawei.com
CC: alan.maguire@oracle.com
CC: bpf@vger.kernel.org
CC: edumazet@google.com
CC: maze@google.com
CC: shuah@kernel.org
CC: linux-kselftest@vger.kernel.org
CC: yonghong.song@linux.dev
---
 tools/testing/selftests/net/Makefile   |  1 +
 net/core/filter.c                      | 31 +++++++++++++++++++-------
 tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++
 3 files changed, 39 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/nat6to4.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index ea84b88bcb30..ab996bd22a5f 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -27,6 +27,7 @@ TEST_PROGS += amt.sh
 TEST_PROGS += unicast_extensions.sh
 TEST_PROGS += udpgro_fwd.sh
 TEST_PROGS += udpgro_frglist.sh
+TEST_PROGS += nat6to4.sh
 TEST_PROGS += veth.sh
 TEST_PROGS += ioam6.sh
 TEST_PROGS += gro.sh
diff --git a/net/core/filter.c b/net/core/filter.c
index 327ca73f9cd7..d5917d6446f2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
 	 * need to be verified first.
 	 */
 	ret = bpf_skb_proto_xlat(skb, proto);
+	if (ret)
+		return ret;
+
 	bpf_compute_data_pointers(skb);
-	return ret;
+	if (skb_valid_dst(skb))
+		skb_dst_drop(skb);
+
+	return 0;
 }
 
 static const struct bpf_func_proto bpf_skb_change_proto_proto = {
@@ -3554,6 +3560,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 		else if (skb->protocol == htons(ETH_P_IPV6) &&
 			 flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
 			skb->protocol = htons(ETH_P_IP);
+
+		if (skb_valid_dst(skb))
+			skb_dst_drop(skb);
 	}
 
 	if (skb_is_gso(skb)) {
@@ -3581,6 +3590,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 			      u64 flags)
 {
+	bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
 	int ret;
 
 	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
@@ -3603,13 +3613,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	if (unlikely(ret < 0))
 		return ret;
 
-	/* Match skb->protocol to new outer l3 protocol */
-	if (skb->protocol == htons(ETH_P_IP) &&
-	    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
-		skb->protocol = htons(ETH_P_IPV6);
-	else if (skb->protocol == htons(ETH_P_IPV6) &&
-		 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
-		skb->protocol = htons(ETH_P_IP);
+	if (decap) {
+		/* Match skb->protocol to new outer l3 protocol */
+		if (skb->protocol == htons(ETH_P_IP) &&
+		    flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+			skb->protocol = htons(ETH_P_IPV6);
+		else if (skb->protocol == htons(ETH_P_IPV6) &&
+			 flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
+			skb->protocol = htons(ETH_P_IP);
+
+		if (skb_valid_dst(skb))
+			skb_dst_drop(skb);
+	}
 
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh
new file mode 100755
index 000000000000..0ee859b622a4
--- /dev/null
+++ b/tools/testing/selftests/net/nat6to4.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+NS="ns-peer-$(mktemp -u XXXXXX)"
+
+ip netns add "${NS}"
+ip -netns "${NS}" link set lo up
+ip -netns "${NS}" route add default via 127.0.0.2 dev lo
+
+tc -n "${NS}" qdisc add dev lo ingress
+tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \
+   bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action
+
+ip netns exec "${NS}" \
+   bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1'
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: clear the dst when changing skb protocol
  2025-06-07 20:47 [PATCH net v2] net: clear the dst when changing skb protocol Jakub Kicinski
@ 2025-06-07 21:33 ` Maciej Żenczykowski
  2025-06-09 13:50   ` Willem de Bruijn
  2025-06-09 21:59   ` Jakub Kicinski
  0 siblings, 2 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2025-06-07 21:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Daniel Borkmann, martin.lau, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf, shuah,
	linux-kselftest, yonghong.song

On Sat, Jun 7, 2025 at 10:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> A not-so-careful NAT46 BPF program can crash the kernel
> if it indiscriminately flips ingress packets from v4 to v6:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>     ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
>     ipv6_rcv (net/ipv6/ip6_input.c:306:8)
>     process_backlog (net/core/dev.c:6186:4)
>     napi_poll (net/core/dev.c:6906:9)
>     net_rx_action (net/core/dev.c:7028:13)
>     do_softirq (kernel/softirq.c:462:3)
>     netif_rx (net/core/dev.c:5326:3)
>     dev_loopback_xmit (net/core/dev.c:4015:2)
>     ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
>     NF_HOOK (./include/linux/netfilter.h:314:9)
>     ip_mc_output (net/ipv4/ip_output.c:400:5)
>     dst_output (./include/net/dst.h:459:9)
>     ip_local_out (net/ipv4/ip_output.c:130:9)
>     ip_send_skb (net/ipv4/ip_output.c:1496:8)
>     udp_send_skb (net/ipv4/udp.c:1040:8)
>     udp_sendmsg (net/ipv4/udp.c:1328:10)
>
> The output interface has a 4->6 program attached at ingress.
> We try to loop the multicast skb back to the sending socket.
> Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> and changes skb->protocol to v6. We enter ip6_rcv_core which
> tries to use skb_dst(). But the dst is still an IPv4 one left
> after IPv4 mcast output.
>
> Clear the dst in all BPF helpers which change the protocol.
> Also clear the dst if we did an encap or decap as those
> will most likely make the dst stale.
> Try to preserve metadata dsts, those may carry non-routing
> metadata.
>
> Reviewed-by: Maciej Żenczykowski <maze@google.com>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2:
>  - drop on encap/decap
>  - fix typo (protcol)
>  - add the test to the Makefile
> v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org
>
> I wonder if we should not skip ingress (tc_skip_classify?)
> for looped back packets in the first place. But that doesn't
> seem robust enough vs multiple redirections to solve the crash.
>
> Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
> work either, since BPF can change pkt_type arbitrarily.
>
> CC: martin.lau@linux.dev
> CC: daniel@iogearbox.net
> CC: john.fastabend@gmail.com
> CC: eddyz87@gmail.com
> CC: sdf@fomichev.me
> CC: haoluo@google.com
> CC: willemb@google.com
> CC: william.xuanziyang@huawei.com
> CC: alan.maguire@oracle.com
> CC: bpf@vger.kernel.org
> CC: edumazet@google.com
> CC: maze@google.com
> CC: shuah@kernel.org
> CC: linux-kselftest@vger.kernel.org
> CC: yonghong.song@linux.dev
> ---
>  tools/testing/selftests/net/Makefile   |  1 +
>  net/core/filter.c                      | 31 +++++++++++++++++++-------
>  tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++
>  3 files changed, 39 insertions(+), 8 deletions(-)
>  create mode 100755 tools/testing/selftests/net/nat6to4.sh
>
> diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> index ea84b88bcb30..ab996bd22a5f 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -27,6 +27,7 @@ TEST_PROGS += amt.sh
>  TEST_PROGS += unicast_extensions.sh
>  TEST_PROGS += udpgro_fwd.sh
>  TEST_PROGS += udpgro_frglist.sh
> +TEST_PROGS += nat6to4.sh
>  TEST_PROGS += veth.sh
>  TEST_PROGS += ioam6.sh
>  TEST_PROGS += gro.sh
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 327ca73f9cd7..d5917d6446f2 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
>          * need to be verified first.
>          */
>         ret = bpf_skb_proto_xlat(skb, proto);
> +       if (ret)
> +               return ret;
> +
>         bpf_compute_data_pointers(skb);
> -       return ret;
> +       if (skb_valid_dst(skb))
> +               skb_dst_drop(skb);
> +
> +       return 0;
>  }
>
>  static const struct bpf_func_proto bpf_skb_change_proto_proto = {
> @@ -3554,6 +3560,9 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>                 else if (skb->protocol == htons(ETH_P_IPV6) &&
>                          flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
>                         skb->protocol = htons(ETH_P_IP);
> +
> +               if (skb_valid_dst(skb))
> +                       skb_dst_drop(skb);
>         }
>
>         if (skb_is_gso(skb)) {
> @@ -3581,6 +3590,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>                               u64 flags)
>  {
> +       bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
>         int ret;
>
>         if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
> @@ -3603,13 +3613,18 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>         if (unlikely(ret < 0))
>                 return ret;
>
> -       /* Match skb->protocol to new outer l3 protocol */
> -       if (skb->protocol == htons(ETH_P_IP) &&
> -           flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> -               skb->protocol = htons(ETH_P_IPV6);
> -       else if (skb->protocol == htons(ETH_P_IPV6) &&
> -                flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> -               skb->protocol = htons(ETH_P_IP);
> +       if (decap) {
> +               /* Match skb->protocol to new outer l3 protocol */
> +               if (skb->protocol == htons(ETH_P_IP) &&
> +                   flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
> +                       skb->protocol = htons(ETH_P_IPV6);
> +               else if (skb->protocol == htons(ETH_P_IPV6) &&
> +                        flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
> +                       skb->protocol = htons(ETH_P_IP);
> +
> +               if (skb_valid_dst(skb))
> +                       skb_dst_drop(skb);
> +       }
>
>         if (skb_is_gso(skb)) {
>                 struct skb_shared_info *shinfo = skb_shinfo(skb);
> diff --git a/tools/testing/selftests/net/nat6to4.sh b/tools/testing/selftests/net/nat6to4.sh
> new file mode 100755
> index 000000000000..0ee859b622a4
> --- /dev/null
> +++ b/tools/testing/selftests/net/nat6to4.sh
> @@ -0,0 +1,15 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +NS="ns-peer-$(mktemp -u XXXXXX)"
> +
> +ip netns add "${NS}"
> +ip -netns "${NS}" link set lo up
> +ip -netns "${NS}" route add default via 127.0.0.2 dev lo
> +
> +tc -n "${NS}" qdisc add dev lo ingress
> +tc -n "${NS}" filter add dev lo ingress prio 4 protocol ip \
> +   bpf object-file nat6to4.bpf.o section schedcls/egress4/snat4 direct-action
> +
> +ip netns exec "${NS}" \
> +   bash -c 'echo 012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abc | socat - UDP4-DATAGRAM:224.1.0.1:6666,ip-multicast-loop=1'
> --
> 2.49.0

Submit away.

1 meta question: as this is a fix and will thus be backported into
5.4+ LTS, should this be split into two patches? Either making the
test a follow up, or even going with only the crash fix in patch 1 and
putting the 4-in-4 and 6-in-6 behavioural change in patch 2?  We'd end
up in the same state at tip of tree... but it would affect the LTS
backports.  Honestly I'm not even sure what's best.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: clear the dst when changing skb protocol
  2025-06-07 21:33 ` Maciej Żenczykowski
@ 2025-06-09 13:50   ` Willem de Bruijn
  2025-06-09 22:00     ` Jakub Kicinski
  2025-06-09 21:59   ` Jakub Kicinski
  1 sibling, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2025-06-09 13:50 UTC (permalink / raw)
  To: Maciej Żenczykowski, Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Daniel Borkmann, martin.lau, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf, shuah,
	linux-kselftest, yonghong.song

Maciej Żenczykowski wrote:
> On Sat, Jun 7, 2025 at 10:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > A not-so-careful NAT46 BPF program can crash the kernel
> > if it indiscriminately flips ingress packets from v4 to v6:
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> >     ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
> >     ipv6_rcv (net/ipv6/ip6_input.c:306:8)
> >     process_backlog (net/core/dev.c:6186:4)
> >     napi_poll (net/core/dev.c:6906:9)
> >     net_rx_action (net/core/dev.c:7028:13)
> >     do_softirq (kernel/softirq.c:462:3)
> >     netif_rx (net/core/dev.c:5326:3)
> >     dev_loopback_xmit (net/core/dev.c:4015:2)
> >     ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
> >     NF_HOOK (./include/linux/netfilter.h:314:9)
> >     ip_mc_output (net/ipv4/ip_output.c:400:5)
> >     dst_output (./include/net/dst.h:459:9)
> >     ip_local_out (net/ipv4/ip_output.c:130:9)
> >     ip_send_skb (net/ipv4/ip_output.c:1496:8)
> >     udp_send_skb (net/ipv4/udp.c:1040:8)
> >     udp_sendmsg (net/ipv4/udp.c:1328:10)
> >
> > The output interface has a 4->6 program attached at ingress.
> > We try to loop the multicast skb back to the sending socket.
> > Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> > and changes skb->protocol to v6. We enter ip6_rcv_core which
> > tries to use skb_dst(). But the dst is still an IPv4 one left
> > after IPv4 mcast output.
> >
> > Clear the dst in all BPF helpers which change the protocol.
> > Also clear the dst if we did an encap or decap as those
> > will most likely make the dst stale.
> > Try to preserve metadata dsts, those may carry non-routing
> > metadata.
> >
> > Reviewed-by: Maciej Żenczykowski <maze@google.com>
> > Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> > Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> > Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> > Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> > ---
> > v2:
> >  - drop on encap/decap
> >  - fix typo (protcol)
> >  - add the test to the Makefile
> > v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org
> >
> > I wonder if we should not skip ingress (tc_skip_classify?)
> > for looped back packets in the first place. But that doesn't
> > seem robust enough vs multiple redirections to solve the crash.
> >
> > Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
> > work either, since BPF can change pkt_type arbitrarily.
> >
> > CC: martin.lau@linux.dev
> > CC: daniel@iogearbox.net
> > CC: john.fastabend@gmail.com
> > CC: eddyz87@gmail.com
> > CC: sdf@fomichev.me
> > CC: haoluo@google.com
> > CC: willemb@google.com
> > CC: william.xuanziyang@huawei.com
> > CC: alan.maguire@oracle.com
> > CC: bpf@vger.kernel.org
> > CC: edumazet@google.com
> > CC: maze@google.com
> > CC: shuah@kernel.org
> > CC: linux-kselftest@vger.kernel.org
> > CC: yonghong.song@linux.dev
> > ---
> >  tools/testing/selftests/net/Makefile   |  1 +
> >  net/core/filter.c                      | 31 +++++++++++++++++++-------
> >  tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++
> >  3 files changed, 39 insertions(+), 8 deletions(-)
> >  create mode 100755 tools/testing/selftests/net/nat6to4.sh
> >
> > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > index ea84b88bcb30..ab996bd22a5f 100644
> > --- a/tools/testing/selftests/net/Makefile
> > +++ b/tools/testing/selftests/net/Makefile
> > @@ -27,6 +27,7 @@ TEST_PROGS += amt.sh
> >  TEST_PROGS += unicast_extensions.sh
> >  TEST_PROGS += udpgro_fwd.sh
> >  TEST_PROGS += udpgro_frglist.sh
> > +TEST_PROGS += nat6to4.sh
> >  TEST_PROGS += veth.sh
> >  TEST_PROGS += ioam6.sh
> >  TEST_PROGS += gro.sh
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 327ca73f9cd7..d5917d6446f2 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> >          * need to be verified first.
> >          */
> >         ret = bpf_skb_proto_xlat(skb, proto);
> > +       if (ret)
> > +               return ret;
> > +
> >         bpf_compute_data_pointers(skb);
> > -       return ret;

I wonder whether that unconditional call to bpf_compute_data_pointers
even if ret was there for a reason.

From reviewing the bpf_skb_proto_xlat error paths, it does seem safe
to remove it. The cases where an error may be returned after the skb
is modified only modify the skb in terms of headroom, not headlen.

> > +       if (skb_valid_dst(skb))
> > +               skb_dst_drop(skb);
> > +
> > +       return 0;
> >  }
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: clear the dst when changing skb protocol
  2025-06-07 21:33 ` Maciej Żenczykowski
  2025-06-09 13:50   ` Willem de Bruijn
@ 2025-06-09 21:59   ` Jakub Kicinski
  2025-06-09 22:02     ` Maciej Żenczykowski
  1 sibling, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-06-09 21:59 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Daniel Borkmann, martin.lau, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf, shuah,
	linux-kselftest, yonghong.song

On Sat, 7 Jun 2025 23:33:39 +0200 Maciej Żenczykowski wrote:
> 1 meta question: as this is a fix and will thus be backported into
> 5.4+ LTS, should this be split into two patches? Either making the
> test a follow up, or even going with only the crash fix in patch 1 and
> putting the 4-in-4 and 6-in-6 behavioural change in patch 2?  We'd end
> up in the same state at tip of tree... but it would affect the LTS
> backports.  Honestly I'm not even sure what's best.

:) Did we go from wondering if we can strip dst unconditionally to
wondering if stripping it on encap/decap may introduce regressions?

I suppose it may be useful to split, just to make it clear which
portion of the change is the crash fix and which one is just because
we think it's more consistent.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: clear the dst when changing skb protocol
  2025-06-09 13:50   ` Willem de Bruijn
@ 2025-06-09 22:00     ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-06-09 22:00 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Maciej Żenczykowski, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, Daniel Borkmann, martin.lau, john.fastabend,
	eddyz87, sdf, haoluo, willemb, william.xuanziyang, alan.maguire,
	bpf, shuah, linux-kselftest, yonghong.song

On Mon, 09 Jun 2025 09:50:30 -0400 Willem de Bruijn wrote:
> > > +       if (ret)
> > > +               return ret;
> > > +
> > >         bpf_compute_data_pointers(skb);
> > > -       return ret;  
> 
> I wonder whether that unconditional call to bpf_compute_data_pointers
> even if ret was there for a reason.
> 
> From reviewing the bpf_skb_proto_xlat error paths, it does seem safe
> to remove it. The cases where an error may be returned after the skb
> is modified only modify the skb in terms of headroom, not headlen.

I should have mentioned, I looked around and also concluded this
unconditional recompute was purely aesthetic.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: clear the dst when changing skb protocol
  2025-06-09 21:59   ` Jakub Kicinski
@ 2025-06-09 22:02     ` Maciej Żenczykowski
  0 siblings, 0 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2025-06-09 22:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	Daniel Borkmann, martin.lau, john.fastabend, eddyz87, sdf, haoluo,
	willemb, william.xuanziyang, alan.maguire, bpf, shuah,
	linux-kselftest, yonghong.song

On Mon, Jun 9, 2025 at 11:59 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 7 Jun 2025 23:33:39 +0200 Maciej Żenczykowski wrote:
> > 1 meta question: as this is a fix and will thus be backported into
> > 5.4+ LTS, should this be split into two patches? Either making the
> > test a follow up, or even going with only the crash fix in patch 1 and
> > putting the 4-in-4 and 6-in-6 behavioural change in patch 2?  We'd end
> > up in the same state at tip of tree... but it would affect the LTS
> > backports.  Honestly I'm not even sure what's best.
>
> :) Did we go from wondering if we can strip dst unconditionally to
> wondering if stripping it on encap/decap may introduce regressions?

Yeah, well I have utterly enough regression chasing in my day job.

Just spent two days chasing this fun one.

enum bpf_cmd {
        BPF_MAP_CREATE,
...
        BPF_PROG_DETACH,
        BPF_GET_COMM_HASH,    <--- added
        BPF_PROG_TEST_RUN,
...
        BPF_OBJ_GET_INFO_BY_FD,
};

> I suppose it may be useful to split, just to make it clear which
> portion of the change is the crash fix and which one is just because
> we think it's more consistent.

Your call.
> --
> pw-bot: cr

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-06-09 22:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-07 20:47 [PATCH net v2] net: clear the dst when changing skb protocol Jakub Kicinski
2025-06-07 21:33 ` Maciej Żenczykowski
2025-06-09 13:50   ` Willem de Bruijn
2025-06-09 22:00     ` Jakub Kicinski
2025-06-09 21:59   ` Jakub Kicinski
2025-06-09 22:02     ` Maciej Żenczykowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).