* [RFC PATCH net-next 1/2] udp: fix encapsulation packet resubmit in multicast deliver
2026-04-14 23:24 [RFC PATCH net-next 0/2] udp: fix FOU/GUE over multicast Anton Danilov
@ 2026-04-14 23:27 ` Anton Danilov
2026-04-14 23:28 ` [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test Anton Danilov
1 sibling, 0 replies; 5+ messages in thread
From: Anton Danilov @ 2026-04-14 23:27 UTC (permalink / raw)
To: netdev
Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
horms, shuah, linux-kselftest
When a UDP encapsulation socket (e.g., FOU) receives a multicast
packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
incorrectly call consume_skb() when udp_queue_rcv_skb() (or
udpv6_queue_rcv_skb() on IPv6) returns a positive value. A positive
return value indicates that the encap_rcv handler (e.g.,
fou_udp_recv()) has consumed the UDP header and wants the packet to be
resubmitted to the IP protocol handler for further processing (e.g.,
as a GRE packet).

The unicast path in udp_unicast_rcv_skb() handles this correctly by
returning -ret, which propagates up to ip_protocol_deliver_rcu() for
resubmission. The GSO path in udp_queue_rcv_skb() also handles this
correctly by calling ip_protocol_deliver_rcu() directly. The multicast
path, however, frees the packet via consume_skb() instead of
resubmitting it, so the inner packet is lost silently.

This bug affects any UDP encapsulation (FOU, GUE) combined with
multicast destination addresses. In practice, it causes roughly 50%
packet loss on FOU/GRETAP tunnels configured with multicast remote
addresses. The exact ratio depends on the early demux cache hit rate:
packets that hit early demux take the unicast path and are handled
correctly, while the rest go through the multicast path and are
dropped.

Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
return value is positive, matching the behavior of the GSO path.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
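Reviewer note, not part of the commit: the return-value convention
described in the commit message can be modelled in a few lines of
Python. This is only a toy sketch under stated assumptions, not kernel
code. The names queue_rcv, deliver_old, deliver_fixed and simulate are
hypothetical, a positive return stands for the inner IP protocol to
resubmit (47 for GRE is assumed here), and the early demux hit rate is
modelled as an independent per-packet probability.

```python
import random

IPPROTO_GRE = 47  # assumed inner protocol for the FOU port in this sketch


def queue_rcv(is_encap_socket):
    """Toy stand-in for udp_queue_rcv_skb(): 0 means the socket took the
    packet, a positive value asks the caller to resubmit it to that IP
    protocol handler."""
    return IPPROTO_GRE if is_encap_socket else 0


def deliver_old(ret, resubmitted):
    # Old multicast path: a positive return ends in consume_skb(),
    # so the inner packet never reaches its protocol handler.
    return


def deliver_fixed(ret, resubmitted):
    # Fixed multicast path: hand the packet to the handler named by ret.
    if ret > 0:
        resubmitted.append(ret)


def simulate(n_packets, early_demux_hit_rate, mcast_deliver, seed=0):
    """Count packets that reach the inner handler. Packets that hit
    early demux take the unicast path, which already resubmits."""
    rng = random.Random(seed)
    resubmitted = []
    for _ in range(n_packets):
        ret = queue_rcv(is_encap_socket=True)
        if rng.random() < early_demux_hit_rate:
            resubmitted.append(ret)  # unicast path propagates -ret upward
        else:
            mcast_deliver(ret, resubmitted)
    return len(resubmitted)
```

With early_demux_hit_rate=0.0, which corresponds to the selftest's
setup, the old path delivers nothing and the fixed path delivers every
packet. Intermediate hit rates give partial loss on the old path only.
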
net/ipv4/udp.c | 13 +++++++++----
net/ipv6/udp.c | 13 +++++++++----
2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e9e2ce9522ef..8c2d4367cba2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
struct udp_hslot *hslot;
struct sk_buff *nskb;
bool use_hash2;
+ int ret;
hash2_any = 0;
hash2 = 0;
@@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
__UDP_INC_STATS(net, UDP_MIB_INERRORS);
continue;
}
- if (udp_queue_rcv_skb(sk, nskb) > 0)
- consume_skb(nskb);
+ ret = udp_queue_rcv_skb(sk, nskb);
+ if (ret > 0)
+ ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+ ret);
}
/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
}
if (first) {
- if (udp_queue_rcv_skb(first, skb) > 0)
- consume_skb(skb);
+ ret = udp_queue_rcv_skb(first, skb);
+ if (ret > 0)
+ ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
+ ret);
} else {
kfree_skb(skb);
__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15e032194ecc..f74935d9f7d7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
struct udp_hslot *hslot;
struct sk_buff *nskb;
bool use_hash2;
+ int ret;
hash2_any = 0;
hash2 = 0;
@@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
continue;
}
- if (udpv6_queue_rcv_skb(sk, nskb) > 0)
- consume_skb(nskb);
+ ret = udpv6_queue_rcv_skb(sk, nskb);
+ if (ret > 0)
+ ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+ ret, true);
}
/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
}
if (first) {
- if (udpv6_queue_rcv_skb(first, skb) > 0)
- consume_skb(skb);
+ ret = udpv6_queue_rcv_skb(first, skb);
+ if (ret > 0)
+ ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
+ ret, true);
} else {
kfree_skb(skb);
__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
--
2.47.3
* [RFC PATCH net-next 2/2] selftests: net: add FOU multicast encapsulation resubmit test
2026-04-14 23:24 [RFC PATCH net-next 0/2] udp: fix FOU/GUE over multicast Anton Danilov
2026-04-14 23:27 ` [RFC PATCH net-next 1/2] udp: fix encapsulation packet resubmit in multicast deliver Anton Danilov
@ 2026-04-14 23:28 ` Anton Danilov
2026-04-15 10:25 ` Breno Leitao
1 sibling, 1 reply; 5+ messages in thread
From: Anton Danilov @ 2026-04-14 23:28 UTC (permalink / raw)
To: netdev
Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
horms, shuah, linux-kselftest
Add a selftest to verify that FOU-encapsulated packets addressed to a
multicast destination are correctly resubmitted to the inner protocol
handler (GRE) via the UDP multicast delivery path.

The test creates two network namespaces connected by a veth pair. The
receiver namespace has a FOU/GRETAP tunnel with a multicast remote
address (239.0.0.1). The sender crafts GRE-over-UDP packets and sends
them to the multicast address.

The early demux optimization (net.ipv4.ip_early_demux) is disabled on
the receiver to force packets through __udp4_lib_mcast_deliver(),
which is the code path that was previously broken.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
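Reviewer note, not part of the commit: the packet that the inline
Python in the test sends can also be written as a standalone sketch,
which makes the layout and the checksum step easier to check in
isolation. The helper names inet_checksum and build_payload are
hypothetical and do not appear in the patch. The byte layout is meant
to mirror the inline script (8-byte GRE header with key, 14-byte
Ethernet header, 20-byte IPv4 header, ICMP echo request).

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_payload() -> bytes:
    # GRE header: key-present flag (0x2000), proto 0x6558 (transparent
    # Ethernet bridging), followed by the 4-byte key
    gre = struct.pack("!HH", 0x2000, 0x6558) + socket.inet_aton("239.0.0.1")
    # Inner Ethernet header: broadcast dst, locally administered src, IPv4
    eth = b"\xff" * 6 + b"\x02\x00\x00\x00\x00\x01" + struct.pack("!H", 0x0800)
    # ICMP echo request: compute the checksum over a zero-checksum copy
    body = b"TESTFOU!" * 4
    icmp = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1) + body
    icmp = struct.pack("!BBHHH", 8, 0, inet_checksum(icmp), 0x1234, 1) + body
    # Inner IPv4 header (20 bytes, no options), protocol 1 = ICMP
    ip = struct.pack("!BBHHHBBH", 0x45, 0, 20 + len(icmp), 0x1234, 0, 64, 1, 0)
    ip += socket.inet_aton("192.168.99.1") + socket.inet_aton("192.168.99.2")
    ip = ip[:10] + struct.pack("!H", inet_checksum(ip)) + ip[12:]
    return gre + eth + ip + icmp
```

Running inet_checksum() over a header that already carries its
checksum yields 0, which is a quick way to sanity-check the crafted
IPv4 and ICMP headers before sending them to ('239.0.0.1', 4797).
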
tools/testing/selftests/net/Makefile | 1 +
.../testing/selftests/net/fou_mcast_encap.sh | 150 ++++++++++++++++++
2 files changed, 151 insertions(+)
create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584026..9b2a573e4af2 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -38,6 +38,7 @@ TEST_PROGS := \
fib_rule_tests.sh \
fib_tests.sh \
fin_ack_lat.sh \
+ fou_mcast_encap.sh \
fq_band_pktlimit.sh \
gre_gso.sh \
gre_ipv6_lladdr.sh \
diff --git a/tools/testing/selftests/net/fou_mcast_encap.sh b/tools/testing/selftests/net/fou_mcast_encap.sh
new file mode 100755
index 000000000000..d4737d674862
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.sh
@@ -0,0 +1,150 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that UDP encapsulation (FOU) correctly handles packet resubmit
+# when packets are delivered via the multicast UDP delivery path.
+#
+# When a FOU-encapsulated packet arrives with a multicast destination IP,
+# __udp4_lib_mcast_deliver() must resubmit it to the inner protocol
+# handler (e.g., GRE) rather than consuming it. This test verifies that
+# by creating a FOU/GRETAP tunnel with a multicast remote address, sending
+# encapsulated packets, and checking that they are correctly decapsulated.
+#
+# The early demux optimization can mask this issue by routing packets via
+# the unicast path (udp_unicast_rcv_skb), so we disable it to force
+# packets through __udp4_lib_mcast_deliver().
+
+source lib.sh
+
+NSENDER=""
+NRECV=""
+
+cleanup() {
+ cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup() {
+ setup_ns NSENDER NRECV
+
+ ip link add veth_s type veth peer name veth_r
+ ip link set veth_s netns "$NSENDER"
+ ip link set veth_r netns "$NRECV"
+
+ ip -n "$NSENDER" addr add 10.0.0.1/24 dev veth_s
+ ip -n "$NSENDER" link set veth_s up
+
+ ip -n "$NRECV" addr add 10.0.0.2/24 dev veth_r
+ ip -n "$NRECV" link set veth_r up
+
+ # Disable early demux to force multicast delivery path
+ ip netns exec "$NRECV" sysctl -wq net.ipv4.ip_early_demux=0
+
+ # Join multicast group on receiver
+ ip -n "$NRECV" addr add 239.0.0.1/32 dev veth_r autojoin
+
+ # Multicast routes
+ ip -n "$NRECV" route add 239.0.0.0/8 dev veth_r
+ ip -n "$NSENDER" route add 239.0.0.0/8 dev veth_s
+
+ # FOU listener
+ ip netns exec "$NRECV" ip fou add port 4797 ipproto 47
+
+ # GRETAP with multicast remote - this triggers __udp4_lib_mcast_deliver
+ ip -n "$NRECV" link add eoudp0 type gretap \
+ remote 239.0.0.1 local 10.0.0.2 \
+ encap fou encap-sport 4797 encap-dport 4797 \
+ key 239.0.0.1
+ ip -n "$NRECV" link set eoudp0 up
+ ip -n "$NRECV" addr add 192.168.99.2/24 dev eoudp0
+}
+
+send_fou_gre_packets() {
+ local count=$1
+
+ ip netns exec "$NSENDER" python3 -c "
+import socket, struct
+
+# GRE header: key flag set, proto=0x6558 (transparent ethernet bridging)
+gre_key = socket.inet_aton('239.0.0.1')
+gre_hdr = struct.pack('!HH', 0x2000, 0x6558) + gre_key
+
+# Inner Ethernet frame
+dst_mac = b'\xff\xff\xff\xff\xff\xff'
+src_mac = b'\x02\x00\x00\x00\x00\x01'
+eth_hdr = dst_mac + src_mac + struct.pack('!H', 0x0800)
+
+# Inner IP: 192.168.99.1 -> 192.168.99.2, ICMP echo
+inner_ip_src = socket.inet_aton('192.168.99.1')
+inner_ip_dst = socket.inet_aton('192.168.99.2')
+
+# ICMP echo request
+icmp_payload = b'TESTFOU!' * 4
+icmp_hdr = struct.pack('!BBHHH', 8, 0, 0, 0x1234, 1) + icmp_payload
+csum = 0
+for i in range(0, len(icmp_hdr), 2):
+    if i + 1 < len(icmp_hdr):
+        csum += (icmp_hdr[i] << 8) + icmp_hdr[i+1]
+    else:
+        csum += icmp_hdr[i] << 8
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+icmp_hdr = struct.pack('!BBHHH', 8, 0, csum, 0x1234, 1) + icmp_payload
+
+# Inner IP header
+ip_len = 20 + len(icmp_hdr)
+ip_hdr = struct.pack('!BBHHHBBH', 0x45, 0, ip_len, 0x1234, 0, 64, 1, 0)
+ip_hdr += inner_ip_src + inner_ip_dst
+csum = 0
+for i in range(0, 20, 2):
+    csum += (ip_hdr[i] << 8) + ip_hdr[i+1]
+while csum >> 16:
+    csum = (csum & 0xffff) + (csum >> 16)
+csum = ~csum & 0xffff
+ip_hdr = ip_hdr[:10] + struct.pack('!H', csum) + ip_hdr[12:]
+
+payload = gre_hdr + eth_hdr + ip_hdr + icmp_hdr
+
+sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+for _ in range($count):
+    sock.sendto(payload, ('239.0.0.1', 4797))
+sock.close()
+"
+}
+
+get_rx_packets() {
+ ip -n "$NRECV" -s link show eoudp0 | awk '/RX:/{getline; print $2}'
+}
+
+test_fou_mcast_encap() {
+ local count=100
+ local rx_before
+ local rx_after
+ local rx_delta
+
+ rx_before=$(get_rx_packets)
+ send_fou_gre_packets $count
+ sleep 1
+ rx_after=$(get_rx_packets)
+
+ rx_delta=$((rx_after - rx_before))
+
+ if [ "$rx_delta" -ge "$count" ]; then
+ echo "PASS: received $rx_delta/$count packets via multicast FOU/GRETAP"
+ return "$ksft_pass"
+ elif [ "$rx_delta" -gt 0 ]; then
+ echo "FAIL: only $rx_delta/$count packets received (partial delivery)"
+ return "$ksft_fail"
+ else
+ echo "FAIL: 0/$count packets received (multicast encap resubmit broken)"
+ return "$ksft_fail"
+ fi
+}
+
+echo "TEST: FOU/GRETAP multicast encapsulation resubmit"
+
+setup
+test_fou_mcast_encap
+exit $?
--
2.47.3