netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/7] Netfilter fixes for net
@ 2025-07-17  9:51 Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

Hi,

The following batch contains Netfilter fixes for net:

1) Three patches to enhance conntrack selftests for resize and clash
   resolution, from Florian Westphal.

2) Expand nft_concat_range.sh selftest to improve coverage from error
   path, from Florian Westphal.

3) Hide clash bit to userspace from netlink dumps until there is a
   good reason to expose, from Florian Westphal.

4) Revert notification for device registration/unregistration for
   nftables basechains and flowtables, we decided to go for a better
   way to handle this through the nfnetlink_hook infrastructure which
   will come via nf-next, patch from Phil Sutter.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-25-07-17

Thanks.

----------------------------------------------------------------

The following changes since commit 7727ec1523d7973defa1dff8f9c0aad288d04008:

  net: emaclite: Fix missing pointer increment in aligned_read() (2025-07-11 16:37:06 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-07-17

for you to fetch changes up to 2d72afb340657f03f7261e9243b44457a9228ac7:

  netfilter: nf_conntrack: fix crash due to removal of uninitialised entry (2025-07-17 11:23:33 +0200)

----------------------------------------------------------------
netfilter pull request 25-07-17

----------------------------------------------------------------
Florian Westphal (6):
      selftests: netfilter: conntrack_resize.sh: extend resize test
      selftests: netfilter: add conntrack clash resolution test case
      selftests: netfilter: conntrack_resize.sh: also use udpclash tool
      selftests: netfilter: nft_concat_range.sh: send packets to empty set
      netfilter: nf_tables: hide clash bit from userspace
      netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

Phil Sutter (1):
      Revert "netfilter: nf_tables: Add notifications for hook changes"

 include/net/netfilter/nf_conntrack.h               |  15 +-
 include/net/netfilter/nf_tables.h                  |   5 -
 include/uapi/linux/netfilter/nf_tables.h           |  10 --
 include/uapi/linux/netfilter/nfnetlink.h           |   2 -
 net/netfilter/nf_conntrack_core.c                  |  26 ++-
 net/netfilter/nf_tables_api.c                      |  59 -------
 net/netfilter/nf_tables_trace.c                    |   3 +
 net/netfilter/nfnetlink.c                          |   1 -
 net/netfilter/nft_chain_filter.c                   |   2 -
 tools/testing/selftests/net/netfilter/.gitignore   |   1 +
 tools/testing/selftests/net/netfilter/Makefile     |   3 +
 .../selftests/net/netfilter/conntrack_clash.sh     | 175 +++++++++++++++++++++
 .../selftests/net/netfilter/conntrack_resize.sh    |  97 +++++++++++-
 .../selftests/net/netfilter/nft_concat_range.sh    |   3 +
 tools/testing/selftests/net/netfilter/udpclash.c   | 158 +++++++++++++++++++
 15 files changed, 468 insertions(+), 92 deletions(-)
 create mode 100755 tools/testing/selftests/net/netfilter/conntrack_clash.sh
 create mode 100644 tools/testing/selftests/net/netfilter/udpclash.c

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17 13:00   ` patchwork-bot+netdevbpf
  2025-07-17  9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

Extend the resize test:
 - continuously dump table both via /proc and ctnetlink interfaces while
   table is resized in a loop.
 - if socat is available, send udp packets in additon to ping requests.
 - increase/decrease the icmp and udp timeouts while resizes are happening.
   This makes sure we also exercise the 'ct has expired' check that happens
   on conntrack lookup.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../net/netfilter/conntrack_resize.sh         | 80 +++++++++++++++++--
 1 file changed, 75 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/netfilter/conntrack_resize.sh b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
index 9e033e80219e..aa1ba07eaf50 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_resize.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
@@ -12,6 +12,9 @@ tmpfile=""
 tmpfile_proc=""
 tmpfile_uniq=""
 ret=0
+have_socat=0
+
+socat -h > /dev/null && have_socat=1
 
 insert_count=2000
 [ "$KSFT_MACHINE_SLOW" = "yes" ] && insert_count=400
@@ -123,7 +126,7 @@ ctflush() {
         done
 }
 
-ctflood()
+ct_pingflood()
 {
 	local ns="$1"
 	local duration="$2"
@@ -152,6 +155,28 @@ ctflood()
 	wait
 }
 
+ct_udpflood()
+{
+	local ns="$1"
+	local duration="$2"
+	local now=$(date +%s)
+	local end=$((now + duration))
+
+	[ $have_socat -ne "1" ] && return
+
+        while [ $now -lt $end ]; do
+ip netns exec "$ns" bash<<"EOF"
+	for i in $(seq 1 100);do
+		dport=$(((RANDOM%65536)+1))
+
+		echo bar | socat -u STDIN UDP:"127.0.0.1:$dport" &
+	done > /dev/null 2>&1
+	wait
+EOF
+		now=$(date +%s)
+	done
+}
+
 # dump to /dev/null.  We don't want dumps to cause infinite loops
 # or use-after-free even when conntrack table is altered while dumps
 # are in progress.
@@ -169,6 +194,48 @@ ct_nulldump()
 	wait
 }
 
+ct_nulldump_loop()
+{
+	local ns="$1"
+	local duration="$2"
+	local now=$(date +%s)
+	local end=$((now + duration))
+
+        while [ $now -lt $end ]; do
+		ct_nulldump "$ns"
+		sleep $((RANDOM%2))
+		now=$(date +%s)
+	done
+}
+
+change_timeouts()
+{
+	local ns="$1"
+	local r1=$((RANDOM%2))
+	local r2=$((RANDOM%2))
+
+	[ "$r1" -eq 1 ] && ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=$((RANDOM%5))
+	[ "$r2" -eq 1 ] && ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_udp_timeout=$((RANDOM%5))
+}
+
+ct_change_timeouts_loop()
+{
+	local ns="$1"
+	local duration="$2"
+	local now=$(date +%s)
+	local end=$((now + duration))
+
+        while [ $now -lt $end ]; do
+		change_timeouts "$ns"
+		sleep $((RANDOM%2))
+		now=$(date +%s)
+	done
+
+	# restore defaults
+	ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=30
+	ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_udp_timeout=30
+}
+
 check_taint()
 {
 	local tainted_then="$1"
@@ -198,10 +265,13 @@ insert_flood()
 
 	r=$((RANDOM%$insert_count))
 
-	ctflood "$n" "$timeout" "floodresize" &
+	ct_pingflood "$n" "$timeout" "floodresize" &
+	ct_udpflood "$n" "$timeout" &
+
 	insert_ctnetlink "$n" "$r" &
 	ctflush "$n" "$timeout" &
-	ct_nulldump "$n" &
+	ct_nulldump_loop "$n" "$timeout" &
+	ct_change_timeouts_loop "$n" "$timeout" &
 
 	wait
 }
@@ -306,7 +376,7 @@ test_dump_all()
 
 	ip netns exec "$nsclient1" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=3600
 
-	ctflood "$nsclient1" $timeout "dumpall" &
+	ct_pingflood "$nsclient1" $timeout "dumpall" &
 	insert_ctnetlink "$nsclient2" $insert_count
 
 	wait
@@ -368,7 +438,7 @@ test_conntrack_disable()
 	ct_flush_once "$nsclient1"
 	ct_flush_once "$nsclient2"
 
-	ctflood "$nsclient1" "$timeout" "conntrack disable"
+	ct_pingflood "$nsclient1" "$timeout" "conntrack disable"
 	ip netns exec "$nsclient2" ping -q -c 1 127.0.0.1 >/dev/null 2>&1
 
 	# Disabled, should not have picked up any connection.
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17 13:22   ` Jakub Kicinski
  2025-07-17  9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

Add a dedicated test to exercise conntrack clash resolution path.
Test program emits 128 identical udp packets in parallel, then reads
back replies from socat echo server.

Also check (via conntrack -S) that the clash path was hit at least once.
Due to the racy nature of the test its possible that despite the
threaded program all packets were processed in-order or on same cpu,
emit a SKIP warning in this case.

Two tests are added:
 - one to test the simpler, non-nat case
 - one to exercise clash resolution where packets
   might have different nat transformations attached to them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../selftests/net/netfilter/.gitignore        |   1 +
 .../testing/selftests/net/netfilter/Makefile  |   3 +
 .../net/netfilter/conntrack_clash.sh          | 175 ++++++++++++++++++
 .../selftests/net/netfilter/udpclash.c        | 158 ++++++++++++++++
 4 files changed, 337 insertions(+)
 create mode 100755 tools/testing/selftests/net/netfilter/conntrack_clash.sh
 create mode 100644 tools/testing/selftests/net/netfilter/udpclash.c

diff --git a/tools/testing/selftests/net/netfilter/.gitignore b/tools/testing/selftests/net/netfilter/.gitignore
index 64c4f8d9aa6c..5d2be9a00627 100644
--- a/tools/testing/selftests/net/netfilter/.gitignore
+++ b/tools/testing/selftests/net/netfilter/.gitignore
@@ -5,3 +5,4 @@ conntrack_dump_flush
 conntrack_reverse_clash
 sctp_collision
 nf_queue
+udpclash
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index e9b2f553588d..a98ed892f55f 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -15,6 +15,7 @@ TEST_PROGS += conntrack_tcp_unreplied.sh
 TEST_PROGS += conntrack_resize.sh
 TEST_PROGS += conntrack_sctp_collision.sh
 TEST_PROGS += conntrack_vrf.sh
+TEST_PROGS += conntrack_clash.sh
 TEST_PROGS += conntrack_reverse_clash.sh
 TEST_PROGS += ipvs.sh
 TEST_PROGS += nf_conntrack_packetdrill.sh
@@ -44,6 +45,7 @@ TEST_GEN_FILES += connect_close nf_queue
 TEST_GEN_FILES += conntrack_dump_flush
 TEST_GEN_FILES += conntrack_reverse_clash
 TEST_GEN_FILES += sctp_collision
+TEST_GEN_FILES += udpclash
 
 include ../../lib.mk
 
@@ -52,6 +54,7 @@ $(OUTPUT)/nf_queue: LDLIBS += $(MNL_LDLIBS)
 
 $(OUTPUT)/conntrack_dump_flush: CFLAGS += $(MNL_CFLAGS)
 $(OUTPUT)/conntrack_dump_flush: LDLIBS += $(MNL_LDLIBS)
+$(OUTPUT)/udpclash: LDLIBS += -lpthread
 
 TEST_FILES := lib.sh
 TEST_FILES += packetdrill
diff --git a/tools/testing/selftests/net/netfilter/conntrack_clash.sh b/tools/testing/selftests/net/netfilter/conntrack_clash.sh
new file mode 100755
index 000000000000..3712c1b9b38b
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/conntrack_clash.sh
@@ -0,0 +1,175 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source lib.sh
+
+clash_resolution_active=0
+dport=22111
+ret=0
+
+cleanup()
+{
+	# netns cleanup also zaps any remaining socat echo server.
+	cleanup_all_ns
+}
+
+checktool "nft --version" "run test without nft"
+checktool "conntrack --version" "run test without conntrack"
+checktool "socat -h" "run test without socat"
+
+trap cleanup EXIT
+
+setup_ns nsclient1 nsclient2 nsrouter
+
+ip netns exec "$nsrouter" nft -f -<<EOF
+table ip t {
+	chain lb {
+		meta l4proto udp dnat to numgen random mod 3 map { 0 : 10.0.2.1 . 9000, 1 : 10.0.2.1 . 9001, 2 : 10.0.2.1 . 9002 }
+	}
+
+	chain prerouting {
+		type nat hook prerouting priority dstnat
+
+		udp dport $dport counter jump lb
+	}
+
+	chain output {
+		type nat hook output priority dstnat
+
+		udp dport $dport counter jump lb
+	}
+}
+EOF
+
+load_simple_ruleset()
+{
+ip netns exec "$1" nft -f -<<EOF
+table ip t {
+	chain forward {
+		type filter hook forward priority 0
+
+		ct state new counter
+	}
+}
+EOF
+}
+
+spawn_servers()
+{
+	local ns="$1"
+	local ports="9000 9001 9002"
+
+	for port in $ports; do
+		ip netns exec "$ns" socat UDP-RECVFROM:$port,fork PIPE 2>/dev/null &
+	done
+
+	for port in $ports; do
+		wait_local_port_listen "$ns" $port udp
+	done
+}
+
+add_addr()
+{
+	local ns="$1"
+	local dev="$2"
+	local i="$3"
+	local j="$4"
+
+	ip -net "$ns" link set "$dev" up
+	ip -net "$ns" addr add "10.0.$i.$j/24" dev "$dev"
+}
+
+ping_test()
+{
+	local ns="$1"
+	local daddr="$2"
+
+	if ! ip netns exec "$ns" ping -q -c 1 $daddr > /dev/null;then
+		echo "FAIL: ping from $ns to $daddr"
+		exit 1
+	fi
+}
+
+run_one_clash_test()
+{
+	local ns="$1"
+	local daddr="$2"
+	local dport="$3"
+	local entries
+	local cre
+
+	if ! ip netns exec "$ns" ./udpclash $daddr $dport;then
+		echo "FAIL: did not receive expected number of replies for $daddr:$dport"
+		ret=1
+		return 1
+	fi
+
+	entries=$(conntrack -S | wc -l)
+	cre=$(conntrack -S | grep -v "clash_resolve=0" | wc -l)
+
+	if [ "$cre" -ne "$entries" ] ;then
+		clash_resolution_active=1
+		return 0
+	fi
+
+	# 1 cpu -> parallel insertion impossible
+	if [ "$entries" -eq 1 ]; then
+		return 0
+	fi
+
+	# not a failure: clash resolution logic did not trigger, but all replies
+	# were received.  With right timing, xmit completed sequentially and
+	# no parallel insertion occurs.
+	return $ksft_skip
+}
+
+run_clash_test()
+{
+	local ns="$1"
+	local daddr="$2"
+	local dport="$3"
+
+	for i in $(seq 1 10);do
+		run_one_clash_test "$ns" "$daddr" "$dport"
+		local rv=$?
+		if [ $rv -eq 0 ];then
+			echo "PASS: clash resolution test for $daddr:$dport on attempt $i"
+			return 0
+		elif [ $rv -eq 1 ];then
+			echo "FAIL: clash resolution test for $daddr:$dport on attempt $i"
+			return 1
+		fi
+	done
+}
+
+ip link add veth0 netns "$nsclient1" type veth peer name veth0 netns "$nsrouter"
+ip link add veth0 netns "$nsclient2" type veth peer name veth1 netns "$nsrouter"
+add_addr "$nsclient1" veth0 1 1
+add_addr "$nsclient2" veth0 2 1
+add_addr "$nsrouter" veth0 1 99
+add_addr "$nsrouter" veth1 2 99
+
+ip -net "$nsclient1" route add default via 10.0.1.99
+ip -net "$nsclient2" route add default via 10.0.2.99
+ip netns exec "$nsrouter" sysctl -q net.ipv4.ip_forward=1
+
+ping_test "$nsclient1" 10.0.1.99
+ping_test "$nsclient1" 10.0.2.1
+ping_test "$nsclient2" 10.0.1.1
+
+spawn_servers "$nsclient2"
+
+# exercise clash resolution with nat:
+# nsrouter is supposed to dnat to 10.0.2.1:900{0,1,2,3}.
+run_clash_test "$nsclient1" 10.0.1.99 "$dport"
+
+# exercise clash resolution without nat.
+load_simple_ruleset "$nsclient2"
+run_clash_test "$nsclient2" 127.0.0.1 9001
+
+if [ $clash_resolution_active -eq 0 ];then
+	[ "$ret" -eq 0 ] && ret=$ksft_skip
+	echo "SKIP: Clash resolution did not trigger"
+fi
+
+exit $ret
diff --git a/tools/testing/selftests/net/netfilter/udpclash.c b/tools/testing/selftests/net/netfilter/udpclash.c
new file mode 100644
index 000000000000..85c7b906ad08
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/udpclash.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Usage: ./udpclash <IP> <PORT>
+ *
+ * Emit THREAD_COUNT UDP packets sharing the same saddr:daddr pair.
+ *
+ * This mimics DNS resolver libraries that emit A and AAAA requests
+ * in parallel.
+ *
+ * This exercises conntrack clash resolution logic added and later
+ * refined in
+ *
+ *  71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
+ *  ed07d9a021df ("netfilter: nf_conntrack: resolve clash for matching conntracks")
+ *  6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries")
+ */
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <sys/socket.h>
+#include <pthread.h>
+
+#define THREAD_COUNT 128
+
+struct thread_args {
+	const struct sockaddr_in *si_remote;
+	int sockfd;
+};
+
+static int wait = 1;
+
+static void *thread_main(void *varg)
+{
+	const struct sockaddr_in *si_remote;
+	const struct thread_args *args = varg;
+	static const char msg[] = "foo";
+
+	si_remote = args->si_remote;
+
+	while (wait == 1)
+		;
+
+	if (sendto(args->sockfd, msg, strlen(msg), MSG_NOSIGNAL,
+		   (struct sockaddr *)si_remote, sizeof(*si_remote)) < 0)
+		exit(111);
+
+	return varg;
+}
+
+static int run_test(int fd, const struct sockaddr_in *si_remote)
+{
+	struct thread_args thread_args = {
+		.si_remote = si_remote,
+		.sockfd = fd,
+	};
+	pthread_t *tid = calloc(THREAD_COUNT, sizeof(pthread_t));
+	unsigned int repl_count = 0, timeout = 0;
+	int i;
+
+	if (!tid) {
+		perror("calloc");
+		return 1;
+	}
+
+	for (i = 0; i < THREAD_COUNT; i++) {
+		int err = pthread_create(&tid[i], NULL, &thread_main, &thread_args);
+
+		if (err != 0) {
+			perror("pthread_create");
+			exit(1);
+		}
+	}
+
+	wait = 0;
+
+	for (i = 0; i < THREAD_COUNT; i++)
+		pthread_join(tid[i], NULL);
+
+	while (repl_count < THREAD_COUNT) {
+		struct sockaddr_in si_repl;
+		socklen_t si_repl_len = sizeof(si_repl);
+		char repl[512];
+		ssize_t ret;
+
+		ret = recvfrom(fd, repl, sizeof(repl), MSG_NOSIGNAL,
+			       (struct sockaddr *) &si_repl, &si_repl_len);
+		if (ret < 0) {
+			if (timeout++ > 5000) {
+				fputs("timed out while waiting for reply from thread\n", stderr);
+				break;
+			}
+
+			/* give reply time to pass though the stack */
+			usleep(1000);
+			continue;
+		}
+
+		if (si_repl_len != sizeof(*si_remote)) {
+			fprintf(stderr, "warning: reply has unexpected repl_len %d vs %d\n",
+				(int)si_repl_len, (int)sizeof(si_repl));
+		} else if (si_remote->sin_addr.s_addr != si_repl.sin_addr.s_addr ||
+			si_remote->sin_port != si_repl.sin_port) {
+			char a[64], b[64];
+
+			inet_ntop(AF_INET, &si_remote->sin_addr, a, sizeof(a));
+			inet_ntop(AF_INET, &si_repl.sin_addr, b, sizeof(b));
+
+			fprintf(stderr, "reply from wrong source: want %s:%d got %s:%d\n",
+				a, ntohs(si_remote->sin_port), b, ntohs(si_repl.sin_port));
+		}
+
+		repl_count++;
+	}
+
+	printf("got %d of %d replies\n", repl_count, THREAD_COUNT);
+
+	free(tid);
+
+	return repl_count == THREAD_COUNT ? 0 : 1;
+}
+
+int main(int argc, char *argv[])
+{
+	struct sockaddr_in si_local = {
+		.sin_family = AF_INET,
+	};
+	struct sockaddr_in si_remote = {
+		.sin_family = AF_INET,
+	};
+	int fd, ret;
+
+	if (argc < 3) {
+		fputs("Usage: send_udp <daddr> <dport>\n", stderr);
+		return 1;
+	}
+
+	si_remote.sin_port = htons(atoi(argv[2]));
+	si_remote.sin_addr.s_addr = inet_addr(argv[1]);
+
+	fd = socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_UDP);
+	if (fd < 0) {
+		perror("socket");
+		return 1;
+	}
+
+	if (bind(fd, (struct sockaddr *)&si_local, sizeof(si_local)) < 0) {
+		perror("bind");
+		return 1;
+	}
+
+	ret = run_test(fd, &si_remote);
+
+	close(fd);
+
+	return ret;
+}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

Previous patch added a new clash resolution test case.
Also use this during conntrack resize stress test in addition
to icmp ping flood.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../selftests/net/netfilter/conntrack_resize.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/testing/selftests/net/netfilter/conntrack_resize.sh b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
index aa1ba07eaf50..788cd56ea4a0 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_resize.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
@@ -177,6 +177,22 @@ EOF
 	done
 }
 
+ct_udpclash()
+{
+	local ns="$1"
+	local duration="$2"
+	local now=$(date +%s)
+	local end=$((now + duration))
+
+	[ -x udpclash ] || return
+
+        while [ $now -lt $end ]; do
+		ip netns exec "$ns" ./udpclash 127.0.0.1 $((RANDOM%65536)) > /dev/null 2>&1
+
+		now=$(date +%s)
+	done
+}
+
 # dump to /dev/null.  We don't want dumps to cause infinite loops
 # or use-after-free even when conntrack table is altered while dumps
 # are in progress.
@@ -267,6 +283,7 @@ insert_flood()
 
 	ct_pingflood "$n" "$timeout" "floodresize" &
 	ct_udpflood "$n" "$timeout" &
+	ct_udpclash "$n" "$timeout" &
 
 	insert_ctnetlink "$n" "$r" &
 	ctflush "$n" "$timeout" &
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2025-07-17  9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

The selftest doesn't cover this error path:
 scratch = *raw_cpu_ptr(m->scratch);
 if (unlikely(!scratch)) { // here

cover this too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 tools/testing/selftests/net/netfilter/nft_concat_range.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
index cd12b8b5ac0e..20e76b395c85 100755
--- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
@@ -1311,6 +1311,9 @@ maybe_send_match() {
 # - remove some elements, check that packets don't match anymore
 test_correctness_main() {
 	range_size=1
+
+	send_nomatch $((end + 1)) $((end + 1 + src_delta)) || return 1
+
 	for i in $(seq "${start}" $((start + count))); do
 		local elem=""
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2025-07-17  9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso
  6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

Its a kernel implementation detail, at least at this time:

We can later decide to revert this patch if there is a compelling
reason, but then we should also remove the ifdef that prevents exposure
of ip_conntrack_status enum IPS_NAT_CLASH value in the uapi header.

Clash entries are not included in dumps (true for both old /proc
and ctnetlink) either.  So for now exclude the clash bit when dumping.

Fixes: 7e5c6aa67e6f ("netfilter: nf_tables: add packets conntrack state to debug trace info")
Link: https://lore.kernel.org/netfilter-devel/aGwf3dCggwBlRKKC@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_trace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
index ae3fe87195ab..a88abae5a9de 100644
--- a/net/netfilter/nf_tables_trace.c
+++ b/net/netfilter/nf_tables_trace.c
@@ -127,6 +127,9 @@ static int nf_trace_fill_ct_info(struct sk_buff *nlskb,
 		if (nla_put_be32(nlskb, NFTA_TRACE_CT_ID, (__force __be32)id))
 			return -1;
 
+		/* Kernel implementation detail, withhold this from userspace for now */
+		status &= ~IPS_NAT_CLASH;
+
 		if (status && nla_put_be32(nlskb, NFTA_TRACE_CT_STATUS, htonl(status)))
 			return -1;
 	}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes"
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2025-07-17  9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  2025-07-17  9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso
  6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Phil Sutter <phil@nwl.cc>

This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82.

Such notifications fit better into core or nfnetlink_hook code,
following the NFNL_MSG_HOOK_GET message format.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h        |  5 --
 include/uapi/linux/netfilter/nf_tables.h | 10 ----
 include/uapi/linux/netfilter/nfnetlink.h |  2 -
 net/netfilter/nf_tables_api.c            | 59 ------------------------
 net/netfilter/nfnetlink.c                |  1 -
 net/netfilter/nft_chain_filter.c         |  2 -
 6 files changed, 79 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e4d8e451e935..5e49619ae49c 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1142,11 +1142,6 @@ int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set);
 int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
 void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
 
-struct nft_hook;
-void nf_tables_chain_device_notify(const struct nft_chain *chain,
-				   const struct nft_hook *hook,
-				   const struct net_device *dev, int event);
-
 enum nft_chain_types {
 	NFT_CHAIN_T_DEFAULT = 0,
 	NFT_CHAIN_T_ROUTE,
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 518ba144544c..2beb30be2c5f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -142,8 +142,6 @@ enum nf_tables_msg_types {
 	NFT_MSG_DESTROYOBJ,
 	NFT_MSG_DESTROYFLOWTABLE,
 	NFT_MSG_GETSETELEM_RESET,
-	NFT_MSG_NEWDEV,
-	NFT_MSG_DELDEV,
 	NFT_MSG_MAX,
 };
 
@@ -1786,18 +1784,10 @@ enum nft_synproxy_attributes {
  * enum nft_device_attributes - nf_tables device netlink attributes
  *
  * @NFTA_DEVICE_NAME: name of this device (NLA_STRING)
- * @NFTA_DEVICE_TABLE: table containing the flowtable or chain hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_FLOWTABLE: flowtable hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_CHAIN: chain hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_SPEC: hook spec matching the device (NLA_STRING)
  */
 enum nft_devices_attributes {
 	NFTA_DEVICE_UNSPEC,
 	NFTA_DEVICE_NAME,
-	NFTA_DEVICE_TABLE,
-	NFTA_DEVICE_FLOWTABLE,
-	NFTA_DEVICE_CHAIN,
-	NFTA_DEVICE_SPEC,
 	__NFTA_DEVICE_MAX
 };
 #define NFTA_DEVICE_MAX		(__NFTA_DEVICE_MAX - 1)
diff --git a/include/uapi/linux/netfilter/nfnetlink.h b/include/uapi/linux/netfilter/nfnetlink.h
index 50d807af2649..6cd58cd2a6f0 100644
--- a/include/uapi/linux/netfilter/nfnetlink.h
+++ b/include/uapi/linux/netfilter/nfnetlink.h
@@ -25,8 +25,6 @@ enum nfnetlink_groups {
 #define NFNLGRP_ACCT_QUOTA		NFNLGRP_ACCT_QUOTA
 	NFNLGRP_NFTRACE,
 #define NFNLGRP_NFTRACE			NFNLGRP_NFTRACE
-	NFNLGRP_NFT_DEV,
-#define NFNLGRP_NFT_DEV			NFNLGRP_NFT_DEV
 	__NFNLGRP_MAX,
 };
 #define NFNLGRP_MAX	(__NFNLGRP_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 24c71ecb2179..a7240736f98e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -9686,64 +9686,6 @@ struct nf_hook_ops *nft_hook_find_ops_rcu(const struct nft_hook *hook,
 }
 EXPORT_SYMBOL_GPL(nft_hook_find_ops_rcu);
 
-static void
-nf_tables_device_notify(const struct nft_table *table, int attr,
-			const char *name, const struct nft_hook *hook,
-			const struct net_device *dev, int event)
-{
-	struct net *net = dev_net(dev);
-	struct nlmsghdr *nlh;
-	struct sk_buff *skb;
-	u16 flags = 0;
-
-	if (!nfnetlink_has_listeners(net, NFNLGRP_NFT_DEV))
-		return;
-
-	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
-	if (!skb)
-		goto err;
-
-	event = event == NETDEV_REGISTER ? NFT_MSG_NEWDEV : NFT_MSG_DELDEV;
-	event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
-	nlh = nfnl_msg_put(skb, 0, 0, event, flags, table->family,
-			   NFNETLINK_V0, nft_base_seq(net));
-	if (!nlh)
-		goto err;
-
-	if (nla_put_string(skb, NFTA_DEVICE_TABLE, table->name) ||
-	    nla_put_string(skb, attr, name) ||
-	    nla_put(skb, NFTA_DEVICE_SPEC, hook->ifnamelen, hook->ifname) ||
-	    nla_put_string(skb, NFTA_DEVICE_NAME, dev->name))
-		goto err;
-
-	nlmsg_end(skb, nlh);
-	nfnetlink_send(skb, net, 0, NFNLGRP_NFT_DEV,
-		       nlmsg_report(nlh), GFP_KERNEL);
-	return;
-err:
-	if (skb)
-		kfree_skb(skb);
-	nfnetlink_set_err(net, 0, NFNLGRP_NFT_DEV, -ENOBUFS);
-}
-
-void
-nf_tables_chain_device_notify(const struct nft_chain *chain,
-			      const struct nft_hook *hook,
-			      const struct net_device *dev, int event)
-{
-	nf_tables_device_notify(chain->table, NFTA_DEVICE_CHAIN,
-				chain->name, hook, dev, event);
-}
-
-static void
-nf_tables_flowtable_device_notify(const struct nft_flowtable *ft,
-				  const struct nft_hook *hook,
-				  const struct net_device *dev, int event)
-{
-	nf_tables_device_notify(ft->table, NFTA_DEVICE_FLOWTABLE,
-				ft->name, hook, dev, event);
-}
-
 static int nft_flowtable_event(unsigned long event, struct net_device *dev,
 			       struct nft_flowtable *flowtable, bool changename)
 {
@@ -9791,7 +9733,6 @@ static int nft_flowtable_event(unsigned long event, struct net_device *dev,
 			list_add_tail_rcu(&ops->list, &hook->ops_list);
 			break;
 		}
-		nf_tables_flowtable_device_notify(flowtable, hook, dev, event);
 		break;
 	}
 	return 0;
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index ac77fc21632d..e598a2a252b0 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -86,7 +86,6 @@ static const int nfnl_group2type[NFNLGRP_MAX+1] = {
 	[NFNLGRP_NFTABLES]		= NFNL_SUBSYS_NFTABLES,
 	[NFNLGRP_ACCT_QUOTA]		= NFNL_SUBSYS_ACCT,
 	[NFNLGRP_NFTRACE]		= NFNL_SUBSYS_NFTABLES,
-	[NFNLGRP_NFT_DEV]		= NFNL_SUBSYS_NFTABLES,
 };
 
 static struct nfnl_net *nfnl_pernet(struct net *net)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 846d48ba8965..b16185e9a6dd 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -363,8 +363,6 @@ static int nft_netdev_event(unsigned long event, struct net_device *dev,
 			list_add_tail_rcu(&ops->list, &hook->ops_list);
 			break;
 		}
-		nf_tables_chain_device_notify(&basechain->chain,
-					      hook, dev, event);
 		break;
 	}
 	return 0;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
  2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2025-07-17  9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
@ 2025-07-17  9:51 ` Pablo Neira Ayuso
  6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17  9:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms

From: Florian Westphal <fw@strlen.de>

A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like normal udp conntrack entry.  If we ignore
ct->status and pretend its 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct->timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit following race:

cpu x 			cpu y			cpu z
 found entry E		found entry E
 E is expired		<preemption>
 nf_ct_delete()
 return E to rcu slab
					init_conntrack
					E is re-inited,
					ct->status set to 0
					reply tuplehash hnnode.pprev
					stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.  cpu y was preempted before
checking for expiry and/or confirm bit.

					->refcnt set to 1
					E now owned by skb
					->timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

					nf_conntrack_confirm gets called
					sets: ct->status |= CONFIRMED
					This is wrong: E is not yet added
					to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
			<resumes>
			nf_ct_expired()
			 -> yes (ct->timeout is 30s)
			confirmed bit set.

cpu y will try to delete E from the hashtable:
			nf_ct_delete() -> set DYING bit
			__nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

			wait for spinlock held by z

					CONFIRMED is set but there is no
					guarantee ct will be added to hash:
					"chaintoolong" or "clash resolution"
					logic both skip the insert step.
					reply hnnode.pprev still stores the
					hash value.

					unlocks spinlock
					return NF_DROP
			<unblocks, then
			 crashes on hlist_nulls_del_rcu pprev>

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen
before hlist add resp. the timeout fixup, so switch to set_bit and
before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such event cannot be distinguished from above "E is the old incarnation"
case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct->timeout has been altered to reflect the absolute 'best before' date
instead of a relative time.  Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct->timeout is corrected to absolute value on other CPU
      and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without
commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list")
|= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru <rzvncj@gmail.com>
Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/
Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/
Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_conntrack.h | 15 +++++++++++++--
 net/netfilter/nf_conntrack_core.c    | 26 ++++++++++++++++++++------
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 3f02a45773e8..ca26274196b9 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -306,8 +306,19 @@ static inline bool nf_ct_is_expired(const struct nf_conn *ct)
 /* use after obtaining a reference count */
 static inline bool nf_ct_should_gc(const struct nf_conn *ct)
 {
-	return nf_ct_is_expired(ct) && nf_ct_is_confirmed(ct) &&
-	       !nf_ct_is_dying(ct);
+	if (!nf_ct_is_confirmed(ct))
+		return false;
+
+	/* load ct->timeout after is_confirmed() test.
+	 * Pairs with __nf_conntrack_confirm() which:
+	 * 1. Increases ct->timeout value
+	 * 2. Inserts ct into rcu hlist
+	 * 3. Sets the confirmed bit
+	 * 4. Unlocks the hlist lock
+	 */
+	smp_acquire__after_ctrl_dep();
+
+	return nf_ct_is_expired(ct) && !nf_ct_is_dying(ct);
 }
 
 #define	NF_CT_DAY	(86400 * HZ)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 201d3c4ec623..e51f0b441109 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1124,6 +1124,12 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
 
 	hlist_nulls_add_head_rcu(&loser_ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
 				 &nf_conntrack_hash[repl_idx]);
+	/* confirmed bit must be set after hlist add, not before:
+	 * loser_ct can still be visible to other cpu due to
+	 * SLAB_TYPESAFE_BY_RCU.
+	 */
+	smp_mb__before_atomic();
+	set_bit(IPS_CONFIRMED_BIT, &loser_ct->status);
 
 	NF_CT_STAT_INC(net, clash_resolve);
 	return NF_ACCEPT;
@@ -1260,8 +1266,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	 * user context, else we insert an already 'dead' hash, blocking
 	 * further use of that particular connection -JM.
 	 */
-	ct->status |= IPS_CONFIRMED;
-
 	if (unlikely(nf_ct_is_dying(ct))) {
 		NF_CT_STAT_INC(net, insert_failed);
 		goto dying;
@@ -1293,7 +1297,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		}
 	}
 
-	/* Timer relative to confirmation time, not original
+	/* Timeout is relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
 	   weird delay cases. */
 	ct->timeout += nfct_time_stamp;
@@ -1301,11 +1305,21 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	__nf_conntrack_insert_prepare(ct);
 
 	/* Since the lookup is lockless, hash insertion must be done after
-	 * starting the timer and setting the CONFIRMED bit. The RCU barriers
-	 * guarantee that no other CPU can find the conntrack before the above
-	 * stores are visible.
+	 * setting ct->timeout. The RCU barriers guarantee that no other CPU
+	 * can find the conntrack before the above stores are visible.
 	 */
 	__nf_conntrack_hash_insert(ct, hash, reply_hash);
+
+	/* IPS_CONFIRMED unset means 'ct not (yet) in hash', conntrack lookups
+	 * skip entries that lack this bit.  This happens when a CPU is looking
+	 * at a stale entry that is being recycled due to SLAB_TYPESAFE_BY_RCU
+	 * or when another CPU encounters this entry right after the insertion
+	 * but before the set-confirm-bit below.  This bit must not be set until
+	 * after __nf_conntrack_hash_insert().
+	 */
+	smp_mb__before_atomic();
+	set_bit(IPS_CONFIRMED_BIT, &ct->status);
+
 	nf_conntrack_double_unlock(hash, reply_hash);
 	local_bh_enable();
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
  2025-07-17  9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
@ 2025-07-17 13:00   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-17 13:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Thu, 17 Jul 2025 11:51:16 +0200 you wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> Extend the resize test:
>  - continuously dump table both via /proc and ctnetlink interfaces while
>    table is resized in a loop.
>  - if socat is available, send udp packets in additon to ping requests.
>  - increase/decrease the icmp and udp timeouts while resizes are happening.
>    This makes sure we also exercise the 'ct has expired' check that happens
>    on conntrack lookup.
> 
> [...]

Here is the summary with links:
  - [net,1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
    https://git.kernel.org/netdev/net/c/b08590559f4b
  - [net,2/7] selftests: netfilter: add conntrack clash resolution test case
    https://git.kernel.org/netdev/net/c/78a588363587
  - [net,3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool
    https://git.kernel.org/netdev/net/c/aa085ea1a68d
  - [net,4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set
    https://git.kernel.org/netdev/net/c/6dc2fae7f8a2
  - [net,5/7] netfilter: nf_tables: hide clash bit from userspace
    https://git.kernel.org/netdev/net/c/6ac86ac74e3a
  - [net,6/7] Revert "netfilter: nf_tables: Add notifications for hook changes"
    https://git.kernel.org/netdev/net/c/36a686c0784f
  - [net,7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
    https://git.kernel.org/netdev/net/c/2d72afb34065

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
  2025-07-17  9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
@ 2025-07-17 13:22   ` Jakub Kicinski
  2025-07-17 15:14     ` Florian Westphal
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-07-17 13:22 UTC (permalink / raw)
  To: fw
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, pabeni,
	edumazet, horms

On Thu, 17 Jul 2025 11:51:17 +0200 Pablo Neira Ayuso wrote:
> Add a dedicated test to exercise conntrack clash resolution path.
> Test program emits 128 identical udp packets in parallel, then reads
> back replies from socat echo server.
> 
> Also check (via conntrack -S) that the clash path was hit at least once.
> Due to the racy nature of the test its possible that despite the
> threaded program all packets were processed in-order or on same cpu,
> emit a SKIP warning in this case.
> 
> Two tests are added:
>  - one to test the simpler, non-nat case
>  - one to exercise clash resolution where packets
>    might have different nat transformations attached to them.

This appears to fail for us:

TAP version 13
1..1
# timeout set to 1800
# selftests: net/netfilter: conntrack_clash.sh
# got 128 of 128 replies
# timed out while waiting for reply from thread
# got 127 of 128 replies
# FAIL: did not receive expected number of replies for 10.0.1.99:22111
# FAIL: clash resolution test for 10.0.1.99:22111 on attempt 2
# got 128 of 128 replies
# timed out while waiting for reply from thread
# got 0 of 128 replies
# FAIL: did not receive expected number of replies for 127.0.0.1:9001
# FAIL: clash resolution test for 127.0.0.1:9001 on attempt 2
# SKIP: Clash resolution did not trigger
not ok 1 selftests: net/netfilter: conntrack_clash.sh # exit=1
make[1]: Leaving directory '/home/virtme/testing-15/tools/testing/selftests/net/netfilter'
make: Leaving directory '/home/virtme/testing-15/tools/testing/selftests'

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
  2025-07-17 13:22   ` Jakub Kicinski
@ 2025-07-17 15:14     ` Florian Westphal
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2025-07-17 15:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, pabeni,
	edumazet, horms

Jakub Kicinski <kuba@kernel.org> wrote:
> # timeout set to 1800
> # selftests: net/netfilter: conntrack_clash.sh
> # got 128 of 128 replies
> # timed out while waiting for reply from thread
> # got 127 of 128 replies
> # FAIL: did not receive expected number of replies for 10.0.1.99:22111
> # FAIL: clash resolution test for 10.0.1.99:22111 on attempt 2
> # got 128 of 128 replies
> # timed out while waiting for reply from thread
> # got 0 of 128 replies

No idea yet whats happening here, I get 100% success rate even
when running this thing 1000 times in a loop.

I sent a patch to no longer treat '127 of 128' et al
as a hard error, i.e. the test should then either SKIP or pass.

And I added a bit more information, maybe the next test run
will tell more.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-07-17 15:14 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-17  9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
2025-07-17  9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
2025-07-17 13:00   ` patchwork-bot+netdevbpf
2025-07-17  9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
2025-07-17 13:22   ` Jakub Kicinski
2025-07-17 15:14     ` Florian Westphal
2025-07-17  9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
2025-07-17  9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
2025-07-17  9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
2025-07-17  9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
2025-07-17  9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).