* [PATCH net 0/7] Netfilter fixes for net
@ 2025-07-17 9:51 Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
Hi,
The following batch contains Netfilter fixes for net:
1) Three patches to enhance conntrack selftests for resize and clash
resolution, from Florian Westphal.
2) Expand nft_concat_range.sh selftest to improve coverage from error
path, from Florian Westphal.
3) Hide clash bit to userspace from netlink dumps until there is a
good reason to expose, from Florian Westphal.
4) Revert notification for device registration/unregistration for
nftables basechains and flowtables, we decided to go for a better
way to handle this through the nfnetlink_hook infrastructure which
will come via nf-next, patch from Phil Sutter.
Please, pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-25-07-17
Thanks.
----------------------------------------------------------------
The following changes since commit 7727ec1523d7973defa1dff8f9c0aad288d04008:
net: emaclite: Fix missing pointer increment in aligned_read() (2025-07-11 16:37:06 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-07-17
for you to fetch changes up to 2d72afb340657f03f7261e9243b44457a9228ac7:
netfilter: nf_conntrack: fix crash due to removal of uninitialised entry (2025-07-17 11:23:33 +0200)
----------------------------------------------------------------
netfilter pull request 25-07-17
----------------------------------------------------------------
Florian Westphal (6):
selftests: netfilter: conntrack_resize.sh: extend resize test
selftests: netfilter: add conntrack clash resolution test case
selftests: netfilter: conntrack_resize.sh: also use udpclash tool
selftests: netfilter: nft_concat_range.sh: send packets to empty set
netfilter: nf_tables: hide clash bit from userspace
netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
Phil Sutter (1):
Revert "netfilter: nf_tables: Add notifications for hook changes"
include/net/netfilter/nf_conntrack.h | 15 +-
include/net/netfilter/nf_tables.h | 5 -
include/uapi/linux/netfilter/nf_tables.h | 10 --
include/uapi/linux/netfilter/nfnetlink.h | 2 -
net/netfilter/nf_conntrack_core.c | 26 ++-
net/netfilter/nf_tables_api.c | 59 -------
net/netfilter/nf_tables_trace.c | 3 +
net/netfilter/nfnetlink.c | 1 -
net/netfilter/nft_chain_filter.c | 2 -
tools/testing/selftests/net/netfilter/.gitignore | 1 +
tools/testing/selftests/net/netfilter/Makefile | 3 +
.../selftests/net/netfilter/conntrack_clash.sh | 175 +++++++++++++++++++++
.../selftests/net/netfilter/conntrack_resize.sh | 97 +++++++++++-
.../selftests/net/netfilter/nft_concat_range.sh | 3 +
tools/testing/selftests/net/netfilter/udpclash.c | 158 +++++++++++++++++++
15 files changed, 468 insertions(+), 92 deletions(-)
create mode 100755 tools/testing/selftests/net/netfilter/conntrack_clash.sh
create mode 100644 tools/testing/selftests/net/netfilter/udpclash.c
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 13:00 ` patchwork-bot+netdevbpf
2025-07-17 9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
` (5 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
Extend the resize test:
- continuously dump table both via /proc and ctnetlink interfaces while
table is resized in a loop.
- if socat is available, send udp packets in additon to ping requests.
- increase/decrease the icmp and udp timeouts while resizes are happening.
This makes sure we also exercise the 'ct has expired' check that happens
on conntrack lookup.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
.../net/netfilter/conntrack_resize.sh | 80 +++++++++++++++++--
1 file changed, 75 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/conntrack_resize.sh b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
index 9e033e80219e..aa1ba07eaf50 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_resize.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
@@ -12,6 +12,9 @@ tmpfile=""
tmpfile_proc=""
tmpfile_uniq=""
ret=0
+have_socat=0
+
+socat -h > /dev/null && have_socat=1
insert_count=2000
[ "$KSFT_MACHINE_SLOW" = "yes" ] && insert_count=400
@@ -123,7 +126,7 @@ ctflush() {
done
}
-ctflood()
+ct_pingflood()
{
local ns="$1"
local duration="$2"
@@ -152,6 +155,28 @@ ctflood()
wait
}
+ct_udpflood()
+{
+ local ns="$1"
+ local duration="$2"
+ local now=$(date +%s)
+ local end=$((now + duration))
+
+ [ $have_socat -ne "1" ] && return
+
+ while [ $now -lt $end ]; do
+ip netns exec "$ns" bash<<"EOF"
+ for i in $(seq 1 100);do
+ dport=$(((RANDOM%65536)+1))
+
+ echo bar | socat -u STDIN UDP:"127.0.0.1:$dport" &
+ done > /dev/null 2>&1
+ wait
+EOF
+ now=$(date +%s)
+ done
+}
+
# dump to /dev/null. We don't want dumps to cause infinite loops
# or use-after-free even when conntrack table is altered while dumps
# are in progress.
@@ -169,6 +194,48 @@ ct_nulldump()
wait
}
+ct_nulldump_loop()
+{
+ local ns="$1"
+ local duration="$2"
+ local now=$(date +%s)
+ local end=$((now + duration))
+
+ while [ $now -lt $end ]; do
+ ct_nulldump "$ns"
+ sleep $((RANDOM%2))
+ now=$(date +%s)
+ done
+}
+
+change_timeouts()
+{
+ local ns="$1"
+ local r1=$((RANDOM%2))
+ local r2=$((RANDOM%2))
+
+ [ "$r1" -eq 1 ] && ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=$((RANDOM%5))
+ [ "$r2" -eq 1 ] && ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_udp_timeout=$((RANDOM%5))
+}
+
+ct_change_timeouts_loop()
+{
+ local ns="$1"
+ local duration="$2"
+ local now=$(date +%s)
+ local end=$((now + duration))
+
+ while [ $now -lt $end ]; do
+ change_timeouts "$ns"
+ sleep $((RANDOM%2))
+ now=$(date +%s)
+ done
+
+ # restore defaults
+ ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=30
+ ip netns exec "$ns" sysctl -q net.netfilter.nf_conntrack_udp_timeout=30
+}
+
check_taint()
{
local tainted_then="$1"
@@ -198,10 +265,13 @@ insert_flood()
r=$((RANDOM%$insert_count))
- ctflood "$n" "$timeout" "floodresize" &
+ ct_pingflood "$n" "$timeout" "floodresize" &
+ ct_udpflood "$n" "$timeout" &
+
insert_ctnetlink "$n" "$r" &
ctflush "$n" "$timeout" &
- ct_nulldump "$n" &
+ ct_nulldump_loop "$n" "$timeout" &
+ ct_change_timeouts_loop "$n" "$timeout" &
wait
}
@@ -306,7 +376,7 @@ test_dump_all()
ip netns exec "$nsclient1" sysctl -q net.netfilter.nf_conntrack_icmp_timeout=3600
- ctflood "$nsclient1" $timeout "dumpall" &
+ ct_pingflood "$nsclient1" $timeout "dumpall" &
insert_ctnetlink "$nsclient2" $insert_count
wait
@@ -368,7 +438,7 @@ test_conntrack_disable()
ct_flush_once "$nsclient1"
ct_flush_once "$nsclient2"
- ctflood "$nsclient1" "$timeout" "conntrack disable"
+ ct_pingflood "$nsclient1" "$timeout" "conntrack disable"
ip netns exec "$nsclient2" ping -q -c 1 127.0.0.1 >/dev/null 2>&1
# Disabled, should not have picked up any connection.
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 13:22 ` Jakub Kicinski
2025-07-17 9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
Add a dedicated test to exercise conntrack clash resolution path.
Test program emits 128 identical udp packets in parallel, then reads
back replies from socat echo server.
Also check (via conntrack -S) that the clash path was hit at least once.
Due to the racy nature of the test its possible that despite the
threaded program all packets were processed in-order or on same cpu,
emit a SKIP warning in this case.
Two tests are added:
- one to test the simpler, non-nat case
- one to exercise clash resolution where packets
might have different nat transformations attached to them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
.../selftests/net/netfilter/.gitignore | 1 +
.../testing/selftests/net/netfilter/Makefile | 3 +
.../net/netfilter/conntrack_clash.sh | 175 ++++++++++++++++++
.../selftests/net/netfilter/udpclash.c | 158 ++++++++++++++++
4 files changed, 337 insertions(+)
create mode 100755 tools/testing/selftests/net/netfilter/conntrack_clash.sh
create mode 100644 tools/testing/selftests/net/netfilter/udpclash.c
diff --git a/tools/testing/selftests/net/netfilter/.gitignore b/tools/testing/selftests/net/netfilter/.gitignore
index 64c4f8d9aa6c..5d2be9a00627 100644
--- a/tools/testing/selftests/net/netfilter/.gitignore
+++ b/tools/testing/selftests/net/netfilter/.gitignore
@@ -5,3 +5,4 @@ conntrack_dump_flush
conntrack_reverse_clash
sctp_collision
nf_queue
+udpclash
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index e9b2f553588d..a98ed892f55f 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -15,6 +15,7 @@ TEST_PROGS += conntrack_tcp_unreplied.sh
TEST_PROGS += conntrack_resize.sh
TEST_PROGS += conntrack_sctp_collision.sh
TEST_PROGS += conntrack_vrf.sh
+TEST_PROGS += conntrack_clash.sh
TEST_PROGS += conntrack_reverse_clash.sh
TEST_PROGS += ipvs.sh
TEST_PROGS += nf_conntrack_packetdrill.sh
@@ -44,6 +45,7 @@ TEST_GEN_FILES += connect_close nf_queue
TEST_GEN_FILES += conntrack_dump_flush
TEST_GEN_FILES += conntrack_reverse_clash
TEST_GEN_FILES += sctp_collision
+TEST_GEN_FILES += udpclash
include ../../lib.mk
@@ -52,6 +54,7 @@ $(OUTPUT)/nf_queue: LDLIBS += $(MNL_LDLIBS)
$(OUTPUT)/conntrack_dump_flush: CFLAGS += $(MNL_CFLAGS)
$(OUTPUT)/conntrack_dump_flush: LDLIBS += $(MNL_LDLIBS)
+$(OUTPUT)/udpclash: LDLIBS += -lpthread
TEST_FILES := lib.sh
TEST_FILES += packetdrill
diff --git a/tools/testing/selftests/net/netfilter/conntrack_clash.sh b/tools/testing/selftests/net/netfilter/conntrack_clash.sh
new file mode 100755
index 000000000000..3712c1b9b38b
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/conntrack_clash.sh
@@ -0,0 +1,175 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source lib.sh
+
+clash_resolution_active=0
+dport=22111
+ret=0
+
+cleanup()
+{
+ # netns cleanup also zaps any remaining socat echo server.
+ cleanup_all_ns
+}
+
+checktool "nft --version" "run test without nft"
+checktool "conntrack --version" "run test without conntrack"
+checktool "socat -h" "run test without socat"
+
+trap cleanup EXIT
+
+setup_ns nsclient1 nsclient2 nsrouter
+
+ip netns exec "$nsrouter" nft -f -<<EOF
+table ip t {
+ chain lb {
+ meta l4proto udp dnat to numgen random mod 3 map { 0 : 10.0.2.1 . 9000, 1 : 10.0.2.1 . 9001, 2 : 10.0.2.1 . 9002 }
+ }
+
+ chain prerouting {
+ type nat hook prerouting priority dstnat
+
+ udp dport $dport counter jump lb
+ }
+
+ chain output {
+ type nat hook output priority dstnat
+
+ udp dport $dport counter jump lb
+ }
+}
+EOF
+
+load_simple_ruleset()
+{
+ip netns exec "$1" nft -f -<<EOF
+table ip t {
+ chain forward {
+ type filter hook forward priority 0
+
+ ct state new counter
+ }
+}
+EOF
+}
+
+spawn_servers()
+{
+ local ns="$1"
+ local ports="9000 9001 9002"
+
+ for port in $ports; do
+ ip netns exec "$ns" socat UDP-RECVFROM:$port,fork PIPE 2>/dev/null &
+ done
+
+ for port in $ports; do
+ wait_local_port_listen "$ns" $port udp
+ done
+}
+
+add_addr()
+{
+ local ns="$1"
+ local dev="$2"
+ local i="$3"
+ local j="$4"
+
+ ip -net "$ns" link set "$dev" up
+ ip -net "$ns" addr add "10.0.$i.$j/24" dev "$dev"
+}
+
+ping_test()
+{
+ local ns="$1"
+ local daddr="$2"
+
+ if ! ip netns exec "$ns" ping -q -c 1 $daddr > /dev/null;then
+ echo "FAIL: ping from $ns to $daddr"
+ exit 1
+ fi
+}
+
+run_one_clash_test()
+{
+ local ns="$1"
+ local daddr="$2"
+ local dport="$3"
+ local entries
+ local cre
+
+ if ! ip netns exec "$ns" ./udpclash $daddr $dport;then
+ echo "FAIL: did not receive expected number of replies for $daddr:$dport"
+ ret=1
+ return 1
+ fi
+
+ entries=$(conntrack -S | wc -l)
+ cre=$(conntrack -S | grep -v "clash_resolve=0" | wc -l)
+
+ if [ "$cre" -ne "$entries" ] ;then
+ clash_resolution_active=1
+ return 0
+ fi
+
+ # 1 cpu -> parallel insertion impossible
+ if [ "$entries" -eq 1 ]; then
+ return 0
+ fi
+
+ # not a failure: clash resolution logic did not trigger, but all replies
+ # were received. With right timing, xmit completed sequentially and
+ # no parallel insertion occurs.
+ return $ksft_skip
+}
+
+run_clash_test()
+{
+ local ns="$1"
+ local daddr="$2"
+ local dport="$3"
+
+ for i in $(seq 1 10);do
+ run_one_clash_test "$ns" "$daddr" "$dport"
+ local rv=$?
+ if [ $rv -eq 0 ];then
+ echo "PASS: clash resolution test for $daddr:$dport on attempt $i"
+ return 0
+ elif [ $rv -eq 1 ];then
+ echo "FAIL: clash resolution test for $daddr:$dport on attempt $i"
+ return 1
+ fi
+ done
+}
+
+ip link add veth0 netns "$nsclient1" type veth peer name veth0 netns "$nsrouter"
+ip link add veth0 netns "$nsclient2" type veth peer name veth1 netns "$nsrouter"
+add_addr "$nsclient1" veth0 1 1
+add_addr "$nsclient2" veth0 2 1
+add_addr "$nsrouter" veth0 1 99
+add_addr "$nsrouter" veth1 2 99
+
+ip -net "$nsclient1" route add default via 10.0.1.99
+ip -net "$nsclient2" route add default via 10.0.2.99
+ip netns exec "$nsrouter" sysctl -q net.ipv4.ip_forward=1
+
+ping_test "$nsclient1" 10.0.1.99
+ping_test "$nsclient1" 10.0.2.1
+ping_test "$nsclient2" 10.0.1.1
+
+spawn_servers "$nsclient2"
+
+# exercise clash resolution with nat:
+# nsrouter is supposed to dnat to 10.0.2.1:900{0,1,2,3}.
+run_clash_test "$nsclient1" 10.0.1.99 "$dport"
+
+# exercise clash resolution without nat.
+load_simple_ruleset "$nsclient2"
+run_clash_test "$nsclient2" 127.0.0.1 9001
+
+if [ $clash_resolution_active -eq 0 ];then
+ [ "$ret" -eq 0 ] && ret=$ksft_skip
+ echo "SKIP: Clash resolution did not trigger"
+fi
+
+exit $ret
diff --git a/tools/testing/selftests/net/netfilter/udpclash.c b/tools/testing/selftests/net/netfilter/udpclash.c
new file mode 100644
index 000000000000..85c7b906ad08
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/udpclash.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Usage: ./udpclash <IP> <PORT>
+ *
+ * Emit THREAD_COUNT UDP packets sharing the same saddr:daddr pair.
+ *
+ * This mimics DNS resolver libraries that emit A and AAAA requests
+ * in parallel.
+ *
+ * This exercises conntrack clash resolution logic added and later
+ * refined in
+ *
+ * 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race")
+ * ed07d9a021df ("netfilter: nf_conntrack: resolve clash for matching conntracks")
+ * 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries")
+ */
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <sys/socket.h>
+#include <pthread.h>
+
+#define THREAD_COUNT 128
+
+struct thread_args {
+ const struct sockaddr_in *si_remote;
+ int sockfd;
+};
+
+static int wait = 1;
+
+static void *thread_main(void *varg)
+{
+ const struct sockaddr_in *si_remote;
+ const struct thread_args *args = varg;
+ static const char msg[] = "foo";
+
+ si_remote = args->si_remote;
+
+ while (wait == 1)
+ ;
+
+ if (sendto(args->sockfd, msg, strlen(msg), MSG_NOSIGNAL,
+ (struct sockaddr *)si_remote, sizeof(*si_remote)) < 0)
+ exit(111);
+
+ return varg;
+}
+
+static int run_test(int fd, const struct sockaddr_in *si_remote)
+{
+ struct thread_args thread_args = {
+ .si_remote = si_remote,
+ .sockfd = fd,
+ };
+ pthread_t *tid = calloc(THREAD_COUNT, sizeof(pthread_t));
+ unsigned int repl_count = 0, timeout = 0;
+ int i;
+
+ if (!tid) {
+ perror("calloc");
+ return 1;
+ }
+
+ for (i = 0; i < THREAD_COUNT; i++) {
+ int err = pthread_create(&tid[i], NULL, &thread_main, &thread_args);
+
+ if (err != 0) {
+ perror("pthread_create");
+ exit(1);
+ }
+ }
+
+ wait = 0;
+
+ for (i = 0; i < THREAD_COUNT; i++)
+ pthread_join(tid[i], NULL);
+
+ while (repl_count < THREAD_COUNT) {
+ struct sockaddr_in si_repl;
+ socklen_t si_repl_len = sizeof(si_repl);
+ char repl[512];
+ ssize_t ret;
+
+ ret = recvfrom(fd, repl, sizeof(repl), MSG_NOSIGNAL,
+ (struct sockaddr *) &si_repl, &si_repl_len);
+ if (ret < 0) {
+ if (timeout++ > 5000) {
+ fputs("timed out while waiting for reply from thread\n", stderr);
+ break;
+ }
+
+ /* give reply time to pass though the stack */
+ usleep(1000);
+ continue;
+ }
+
+ if (si_repl_len != sizeof(*si_remote)) {
+ fprintf(stderr, "warning: reply has unexpected repl_len %d vs %d\n",
+ (int)si_repl_len, (int)sizeof(si_repl));
+ } else if (si_remote->sin_addr.s_addr != si_repl.sin_addr.s_addr ||
+ si_remote->sin_port != si_repl.sin_port) {
+ char a[64], b[64];
+
+ inet_ntop(AF_INET, &si_remote->sin_addr, a, sizeof(a));
+ inet_ntop(AF_INET, &si_repl.sin_addr, b, sizeof(b));
+
+ fprintf(stderr, "reply from wrong source: want %s:%d got %s:%d\n",
+ a, ntohs(si_remote->sin_port), b, ntohs(si_repl.sin_port));
+ }
+
+ repl_count++;
+ }
+
+ printf("got %d of %d replies\n", repl_count, THREAD_COUNT);
+
+ free(tid);
+
+ return repl_count == THREAD_COUNT ? 0 : 1;
+}
+
+int main(int argc, char *argv[])
+{
+ struct sockaddr_in si_local = {
+ .sin_family = AF_INET,
+ };
+ struct sockaddr_in si_remote = {
+ .sin_family = AF_INET,
+ };
+ int fd, ret;
+
+ if (argc < 3) {
+ fputs("Usage: send_udp <daddr> <dport>\n", stderr);
+ return 1;
+ }
+
+ si_remote.sin_port = htons(atoi(argv[2]));
+ si_remote.sin_addr.s_addr = inet_addr(argv[1]);
+
+ fd = socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_UDP);
+ if (fd < 0) {
+ perror("socket");
+ return 1;
+ }
+
+ if (bind(fd, (struct sockaddr *)&si_local, sizeof(si_local)) < 0) {
+ perror("bind");
+ return 1;
+ }
+
+ ret = run_test(fd, &si_remote);
+
+ close(fd);
+
+ return ret;
+}
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
Previous patch added a new clash resolution test case.
Also use this during conntrack resize stress test in addition
to icmp ping flood.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
.../selftests/net/netfilter/conntrack_resize.sh | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/tools/testing/selftests/net/netfilter/conntrack_resize.sh b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
index aa1ba07eaf50..788cd56ea4a0 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_resize.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_resize.sh
@@ -177,6 +177,22 @@ EOF
done
}
+ct_udpclash()
+{
+ local ns="$1"
+ local duration="$2"
+ local now=$(date +%s)
+ local end=$((now + duration))
+
+ [ -x udpclash ] || return
+
+ while [ $now -lt $end ]; do
+ ip netns exec "$ns" ./udpclash 127.0.0.1 $((RANDOM%65536)) > /dev/null 2>&1
+
+ now=$(date +%s)
+ done
+}
+
# dump to /dev/null. We don't want dumps to cause infinite loops
# or use-after-free even when conntrack table is altered while dumps
# are in progress.
@@ -267,6 +283,7 @@ insert_flood()
ct_pingflood "$n" "$timeout" "floodresize" &
ct_udpflood "$n" "$timeout" &
+ ct_udpclash "$n" "$timeout" &
insert_ctnetlink "$n" "$r" &
ctflush "$n" "$timeout" &
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
` (2 preceding siblings ...)
2025-07-17 9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
The selftest doesn't cover this error path:
scratch = *raw_cpu_ptr(m->scratch);
if (unlikely(!scratch)) { // here
cover this too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
tools/testing/selftests/net/netfilter/nft_concat_range.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
index cd12b8b5ac0e..20e76b395c85 100755
--- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
@@ -1311,6 +1311,9 @@ maybe_send_match() {
# - remove some elements, check that packets don't match anymore
test_correctness_main() {
range_size=1
+
+ send_nomatch $((end + 1)) $((end + 1 + src_delta)) || return 1
+
for i in $(seq "${start}" $((start + count))); do
local elem=""
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
` (3 preceding siblings ...)
2025-07-17 9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso
6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
Its a kernel implementation detail, at least at this time:
We can later decide to revert this patch if there is a compelling
reason, but then we should also remove the ifdef that prevents exposure
of ip_conntrack_status enum IPS_NAT_CLASH value in the uapi header.
Clash entries are not included in dumps (true for both old /proc
and ctnetlink) either. So for now exclude the clash bit when dumping.
Fixes: 7e5c6aa67e6f ("netfilter: nf_tables: add packets conntrack state to debug trace info")
Link: https://lore.kernel.org/netfilter-devel/aGwf3dCggwBlRKKC@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_tables_trace.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/netfilter/nf_tables_trace.c b/net/netfilter/nf_tables_trace.c
index ae3fe87195ab..a88abae5a9de 100644
--- a/net/netfilter/nf_tables_trace.c
+++ b/net/netfilter/nf_tables_trace.c
@@ -127,6 +127,9 @@ static int nf_trace_fill_ct_info(struct sk_buff *nlskb,
if (nla_put_be32(nlskb, NFTA_TRACE_CT_ID, (__force __be32)id))
return -1;
+ /* Kernel implementation detail, withhold this from userspace for now */
+ status &= ~IPS_NAT_CLASH;
+
if (status && nla_put_be32(nlskb, NFTA_TRACE_CT_STATUS, htonl(status)))
return -1;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes"
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
` (4 preceding siblings ...)
2025-07-17 9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso
6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Phil Sutter <phil@nwl.cc>
This reverts commit 465b9ee0ee7bc268d7f261356afd6c4262e48d82.
Such notifications fit better into core or nfnetlink_hook code,
following the NFNL_MSG_HOOK_GET message format.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_tables.h | 5 --
include/uapi/linux/netfilter/nf_tables.h | 10 ----
include/uapi/linux/netfilter/nfnetlink.h | 2 -
net/netfilter/nf_tables_api.c | 59 ------------------------
net/netfilter/nfnetlink.c | 1 -
net/netfilter/nft_chain_filter.c | 2 -
6 files changed, 79 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e4d8e451e935..5e49619ae49c 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1142,11 +1142,6 @@ int nft_set_catchall_validate(const struct nft_ctx *ctx, struct nft_set *set);
int nf_tables_bind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
void nf_tables_unbind_chain(const struct nft_ctx *ctx, struct nft_chain *chain);
-struct nft_hook;
-void nf_tables_chain_device_notify(const struct nft_chain *chain,
- const struct nft_hook *hook,
- const struct net_device *dev, int event);
-
enum nft_chain_types {
NFT_CHAIN_T_DEFAULT = 0,
NFT_CHAIN_T_ROUTE,
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 518ba144544c..2beb30be2c5f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -142,8 +142,6 @@ enum nf_tables_msg_types {
NFT_MSG_DESTROYOBJ,
NFT_MSG_DESTROYFLOWTABLE,
NFT_MSG_GETSETELEM_RESET,
- NFT_MSG_NEWDEV,
- NFT_MSG_DELDEV,
NFT_MSG_MAX,
};
@@ -1786,18 +1784,10 @@ enum nft_synproxy_attributes {
* enum nft_device_attributes - nf_tables device netlink attributes
*
* @NFTA_DEVICE_NAME: name of this device (NLA_STRING)
- * @NFTA_DEVICE_TABLE: table containing the flowtable or chain hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_FLOWTABLE: flowtable hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_CHAIN: chain hooking into the device (NLA_STRING)
- * @NFTA_DEVICE_SPEC: hook spec matching the device (NLA_STRING)
*/
enum nft_devices_attributes {
NFTA_DEVICE_UNSPEC,
NFTA_DEVICE_NAME,
- NFTA_DEVICE_TABLE,
- NFTA_DEVICE_FLOWTABLE,
- NFTA_DEVICE_CHAIN,
- NFTA_DEVICE_SPEC,
__NFTA_DEVICE_MAX
};
#define NFTA_DEVICE_MAX (__NFTA_DEVICE_MAX - 1)
diff --git a/include/uapi/linux/netfilter/nfnetlink.h b/include/uapi/linux/netfilter/nfnetlink.h
index 50d807af2649..6cd58cd2a6f0 100644
--- a/include/uapi/linux/netfilter/nfnetlink.h
+++ b/include/uapi/linux/netfilter/nfnetlink.h
@@ -25,8 +25,6 @@ enum nfnetlink_groups {
#define NFNLGRP_ACCT_QUOTA NFNLGRP_ACCT_QUOTA
NFNLGRP_NFTRACE,
#define NFNLGRP_NFTRACE NFNLGRP_NFTRACE
- NFNLGRP_NFT_DEV,
-#define NFNLGRP_NFT_DEV NFNLGRP_NFT_DEV
__NFNLGRP_MAX,
};
#define NFNLGRP_MAX (__NFNLGRP_MAX - 1)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 24c71ecb2179..a7240736f98e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -9686,64 +9686,6 @@ struct nf_hook_ops *nft_hook_find_ops_rcu(const struct nft_hook *hook,
}
EXPORT_SYMBOL_GPL(nft_hook_find_ops_rcu);
-static void
-nf_tables_device_notify(const struct nft_table *table, int attr,
- const char *name, const struct nft_hook *hook,
- const struct net_device *dev, int event)
-{
- struct net *net = dev_net(dev);
- struct nlmsghdr *nlh;
- struct sk_buff *skb;
- u16 flags = 0;
-
- if (!nfnetlink_has_listeners(net, NFNLGRP_NFT_DEV))
- return;
-
- skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
- if (!skb)
- goto err;
-
- event = event == NETDEV_REGISTER ? NFT_MSG_NEWDEV : NFT_MSG_DELDEV;
- event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
- nlh = nfnl_msg_put(skb, 0, 0, event, flags, table->family,
- NFNETLINK_V0, nft_base_seq(net));
- if (!nlh)
- goto err;
-
- if (nla_put_string(skb, NFTA_DEVICE_TABLE, table->name) ||
- nla_put_string(skb, attr, name) ||
- nla_put(skb, NFTA_DEVICE_SPEC, hook->ifnamelen, hook->ifname) ||
- nla_put_string(skb, NFTA_DEVICE_NAME, dev->name))
- goto err;
-
- nlmsg_end(skb, nlh);
- nfnetlink_send(skb, net, 0, NFNLGRP_NFT_DEV,
- nlmsg_report(nlh), GFP_KERNEL);
- return;
-err:
- if (skb)
- kfree_skb(skb);
- nfnetlink_set_err(net, 0, NFNLGRP_NFT_DEV, -ENOBUFS);
-}
-
-void
-nf_tables_chain_device_notify(const struct nft_chain *chain,
- const struct nft_hook *hook,
- const struct net_device *dev, int event)
-{
- nf_tables_device_notify(chain->table, NFTA_DEVICE_CHAIN,
- chain->name, hook, dev, event);
-}
-
-static void
-nf_tables_flowtable_device_notify(const struct nft_flowtable *ft,
- const struct nft_hook *hook,
- const struct net_device *dev, int event)
-{
- nf_tables_device_notify(ft->table, NFTA_DEVICE_FLOWTABLE,
- ft->name, hook, dev, event);
-}
-
static int nft_flowtable_event(unsigned long event, struct net_device *dev,
struct nft_flowtable *flowtable, bool changename)
{
@@ -9791,7 +9733,6 @@ static int nft_flowtable_event(unsigned long event, struct net_device *dev,
list_add_tail_rcu(&ops->list, &hook->ops_list);
break;
}
- nf_tables_flowtable_device_notify(flowtable, hook, dev, event);
break;
}
return 0;
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index ac77fc21632d..e598a2a252b0 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -86,7 +86,6 @@ static const int nfnl_group2type[NFNLGRP_MAX+1] = {
[NFNLGRP_NFTABLES] = NFNL_SUBSYS_NFTABLES,
[NFNLGRP_ACCT_QUOTA] = NFNL_SUBSYS_ACCT,
[NFNLGRP_NFTRACE] = NFNL_SUBSYS_NFTABLES,
- [NFNLGRP_NFT_DEV] = NFNL_SUBSYS_NFTABLES,
};
static struct nfnl_net *nfnl_pernet(struct net *net)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 846d48ba8965..b16185e9a6dd 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -363,8 +363,6 @@ static int nft_netdev_event(unsigned long event, struct net_device *dev,
list_add_tail_rcu(&ops->list, &hook->ops_list);
break;
}
- nf_tables_chain_device_notify(&basechain->chain,
- hook, dev, event);
break;
}
return 0;
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
` (5 preceding siblings ...)
2025-07-17 9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
@ 2025-07-17 9:51 ` Pablo Neira Ayuso
6 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2025-07-17 9:51 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, horms
From: Florian Westphal <fw@strlen.de>
A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
[exception RIP: __nf_ct_delete_from_lists+172]
[..]
#7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
#8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
#9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
[..]
The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:
ct hlist pointer is garbage; looks like the ct hash value
(hence crash).
ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
ct->timeout is 30000 (=30s), which is unexpected.
Everything else looks like normal udp conntrack entry. If we ignore
ct->status and pretend its 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
- ct hlist pointers are overloaded and store/cache the raw tuple hash
- ct->timeout matches the relative time expected for a new udp flow
rather than the absolute 'jiffies' value.
If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.
Theory is that we did hit following race:
cpu x cpu y cpu z
found entry E found entry E
E is expired <preemption>
nf_ct_delete()
return E to rcu slab
init_conntrack
E is re-inited,
ct->status set to 0
reply tuplehash hnnode.pprev
stores hash value.
cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z. cpu y was preempted before
checking for expiry and/or confirm bit.
->refcnt set to 1
E now owned by skb
->timeout set to 30000
If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.
nf_conntrack_confirm gets called
sets: ct->status |= CONFIRMED
This is wrong: E is not yet added
to hashtable.
cpu y resumes, it observes E as expired but CONFIRMED:
<resumes>
nf_ct_expired()
-> yes (ct->timeout is 30s)
confirmed bit set.
cpu y will try to delete E from the hashtable:
nf_ct_delete() -> set DYING bit
__nf_ct_delete_from_lists
Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:
wait for spinlock held by z
CONFIRMED is set but there is no
guarantee ct will be added to hash:
"chaintoolong" or "clash resolution"
logic both skip the insert step.
reply hnnode.pprev still stores the
hash value.
unlocks spinlock
return NF_DROP
<unblocks, then
crashes on hlist_nulls_del_rcu pprev>
In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.
Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.
To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.
Pablo points out that the confirm-bit-store could be reordered to happen
before hlist add resp. the timeout fixup, so switch to set_bit and
before_atomic memory barrier to prevent this.
It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:
Such event cannot be distinguished from above "E is the old incarnation"
case: the entry will be skipped.
Also change nf_ct_should_gc() to first check the confirmed bit.
The gc sequence is:
1. Check if entry has expired, if not skip to next entry
2. Obtain a reference to the expired entry.
3. Call nf_ct_should_gc() to double-check step 1.
nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct->timeout has been altered to reflect the absolute 'best before' date
instead of a relative time. Step 3 will therefore not remove the entry.
Without this change to nf_ct_should_gc() we could still get this sequence:
1. Check if entry has expired.
2. Obtain a reference.
3. Call nf_ct_should_gc() to double-check step 1:
4 - entry is still observed as expired
5 - meanwhile, ct->timeout is corrected to absolute value on other CPU
and confirm bit gets set
6 - confirm bit is seen
7 - valid entry is removed again
First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.
This change cannot be backported to releases before 5.19. Without
commit 8a75a2c17410 ("netfilter: conntrack: remove unconfirmed list")
|= IPS_CONFIRMED line cannot be moved without further changes.
Cc: Razvan Cojocaru <rzvncj@gmail.com>
Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/
Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/
Fixes: 1397af5bfd7d ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack.h | 15 +++++++++++++--
net/netfilter/nf_conntrack_core.c | 26 ++++++++++++++++++++------
2 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 3f02a45773e8..ca26274196b9 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -306,8 +306,19 @@ static inline bool nf_ct_is_expired(const struct nf_conn *ct)
/* use after obtaining a reference count */
static inline bool nf_ct_should_gc(const struct nf_conn *ct)
{
- return nf_ct_is_expired(ct) && nf_ct_is_confirmed(ct) &&
- !nf_ct_is_dying(ct);
+ if (!nf_ct_is_confirmed(ct))
+ return false;
+
+ /* load ct->timeout after is_confirmed() test.
+ * Pairs with __nf_conntrack_confirm() which:
+ * 1. Increases ct->timeout value
+ * 2. Inserts ct into rcu hlist
+ * 3. Sets the confirmed bit
+ * 4. Unlocks the hlist lock
+ */
+ smp_acquire__after_ctrl_dep();
+
+ return nf_ct_is_expired(ct) && !nf_ct_is_dying(ct);
}
#define NF_CT_DAY (86400 * HZ)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 201d3c4ec623..e51f0b441109 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1124,6 +1124,12 @@ static int nf_ct_resolve_clash_harder(struct sk_buff *skb, u32 repl_idx)
hlist_nulls_add_head_rcu(&loser_ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
&nf_conntrack_hash[repl_idx]);
+ /* confirmed bit must be set after hlist add, not before:
+ * loser_ct can still be visible to other cpu due to
+ * SLAB_TYPESAFE_BY_RCU.
+ */
+ smp_mb__before_atomic();
+ set_bit(IPS_CONFIRMED_BIT, &loser_ct->status);
NF_CT_STAT_INC(net, clash_resolve);
return NF_ACCEPT;
@@ -1260,8 +1266,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
* user context, else we insert an already 'dead' hash, blocking
* further use of that particular connection -JM.
*/
- ct->status |= IPS_CONFIRMED;
-
if (unlikely(nf_ct_is_dying(ct))) {
NF_CT_STAT_INC(net, insert_failed);
goto dying;
@@ -1293,7 +1297,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
}
}
- /* Timer relative to confirmation time, not original
+ /* Timeout is relative to confirmation time, not original
setting time, otherwise we'd get timer wrap in
weird delay cases. */
ct->timeout += nfct_time_stamp;
@@ -1301,11 +1305,21 @@ __nf_conntrack_confirm(struct sk_buff *skb)
__nf_conntrack_insert_prepare(ct);
/* Since the lookup is lockless, hash insertion must be done after
- * starting the timer and setting the CONFIRMED bit. The RCU barriers
- * guarantee that no other CPU can find the conntrack before the above
- * stores are visible.
+ * setting ct->timeout. The RCU barriers guarantee that no other CPU
+ * can find the conntrack before the above stores are visible.
*/
__nf_conntrack_hash_insert(ct, hash, reply_hash);
+
+ /* IPS_CONFIRMED unset means 'ct not (yet) in hash', conntrack lookups
+ * skip entries that lack this bit. This happens when a CPU is looking
+ * at a stale entry that is being recycled due to SLAB_TYPESAFE_BY_RCU
+ * or when another CPU encounters this entry right after the insertion
+ * but before the set-confirm-bit below. This bit must not be set until
+ * after __nf_conntrack_hash_insert().
+ */
+ smp_mb__before_atomic();
+ set_bit(IPS_CONFIRMED_BIT, &ct->status);
+
nf_conntrack_double_unlock(hash, reply_hash);
local_bh_enable();
--
2.39.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
2025-07-17 9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
@ 2025-07-17 13:00 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-17 13:00 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms
Hello:
This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:
On Thu, 17 Jul 2025 11:51:16 +0200 you wrote:
> From: Florian Westphal <fw@strlen.de>
>
> Extend the resize test:
> - continuously dump table both via /proc and ctnetlink interfaces while
> table is resized in a loop.
> - if socat is available, send udp packets in additon to ping requests.
> - increase/decrease the icmp and udp timeouts while resizes are happening.
> This makes sure we also exercise the 'ct has expired' check that happens
> on conntrack lookup.
>
> [...]
Here is the summary with links:
- [net,1/7] selftests: netfilter: conntrack_resize.sh: extend resize test
https://git.kernel.org/netdev/net/c/b08590559f4b
- [net,2/7] selftests: netfilter: add conntrack clash resolution test case
https://git.kernel.org/netdev/net/c/78a588363587
- [net,3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool
https://git.kernel.org/netdev/net/c/aa085ea1a68d
- [net,4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set
https://git.kernel.org/netdev/net/c/6dc2fae7f8a2
- [net,5/7] netfilter: nf_tables: hide clash bit from userspace
https://git.kernel.org/netdev/net/c/6ac86ac74e3a
- [net,6/7] Revert "netfilter: nf_tables: Add notifications for hook changes"
https://git.kernel.org/netdev/net/c/36a686c0784f
- [net,7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
https://git.kernel.org/netdev/net/c/2d72afb34065
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
2025-07-17 9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
@ 2025-07-17 13:22 ` Jakub Kicinski
2025-07-17 15:14 ` Florian Westphal
0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-07-17 13:22 UTC (permalink / raw)
To: fw
Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, pabeni,
edumazet, horms
On Thu, 17 Jul 2025 11:51:17 +0200 Pablo Neira Ayuso wrote:
> Add a dedicated test to exercise conntrack clash resolution path.
> Test program emits 128 identical udp packets in parallel, then reads
> back replies from socat echo server.
>
> Also check (via conntrack -S) that the clash path was hit at least once.
> Due to the racy nature of the test its possible that despite the
> threaded program all packets were processed in-order or on same cpu,
> emit a SKIP warning in this case.
>
> Two tests are added:
> - one to test the simpler, non-nat case
> - one to exercise clash resolution where packets
> might have different nat transformations attached to them.
This appears to fail for us:
TAP version 13
1..1
# timeout set to 1800
# selftests: net/netfilter: conntrack_clash.sh
# got 128 of 128 replies
# timed out while waiting for reply from thread
# got 127 of 128 replies
# FAIL: did not receive expected number of replies for 10.0.1.99:22111
# FAIL: clash resolution test for 10.0.1.99:22111 on attempt 2
# got 128 of 128 replies
# timed out while waiting for reply from thread
# got 0 of 128 replies
# FAIL: did not receive expected number of replies for 127.0.0.1:9001
# FAIL: clash resolution test for 127.0.0.1:9001 on attempt 2
# SKIP: Clash resolution did not trigger
not ok 1 selftests: net/netfilter: conntrack_clash.sh # exit=1
make[1]: Leaving directory '/home/virtme/testing-15/tools/testing/selftests/net/netfilter'
make: Leaving directory '/home/virtme/testing-15/tools/testing/selftests'
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case
2025-07-17 13:22 ` Jakub Kicinski
@ 2025-07-17 15:14 ` Florian Westphal
0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2025-07-17 15:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, pabeni,
edumazet, horms
Jakub Kicinski <kuba@kernel.org> wrote:
> # timeout set to 1800
> # selftests: net/netfilter: conntrack_clash.sh
> # got 128 of 128 replies
> # timed out while waiting for reply from thread
> # got 127 of 128 replies
> # FAIL: did not receive expected number of replies for 10.0.1.99:22111
> # FAIL: clash resolution test for 10.0.1.99:22111 on attempt 2
> # got 128 of 128 replies
> # timed out while waiting for reply from thread
> # got 0 of 128 replies
No idea yet whats happening here, I get 100% success rate even
when running this thing 1000 times in a loop.
I sent a patch to no longer treat '127 of 128' et al
as a hard error, i.e. the test should then either SKIP or pass.
And I added a bit more information, maybe the next test run
will tell more.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-07-17 15:14 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-17 9:51 [PATCH net 0/7] Netfilter fixes for net Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 1/7] selftests: netfilter: conntrack_resize.sh: extend resize test Pablo Neira Ayuso
2025-07-17 13:00 ` patchwork-bot+netdevbpf
2025-07-17 9:51 ` [PATCH net 2/7] selftests: netfilter: add conntrack clash resolution test case Pablo Neira Ayuso
2025-07-17 13:22 ` Jakub Kicinski
2025-07-17 15:14 ` Florian Westphal
2025-07-17 9:51 ` [PATCH net 3/7] selftests: netfilter: conntrack_resize.sh: also use udpclash tool Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 4/7] selftests: netfilter: nft_concat_range.sh: send packets to empty set Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 5/7] netfilter: nf_tables: hide clash bit from userspace Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 6/7] Revert "netfilter: nf_tables: Add notifications for hook changes" Pablo Neira Ayuso
2025-07-17 9:51 ` [PATCH net 7/7] netfilter: nf_conntrack: fix crash due to removal of uninitialised entry Pablo Neira Ayuso
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.