All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf 0/2] netfilter: disable support for queueing cloned conntrack entries
@ 2024-08-07 19:28 Florian Westphal
  2024-08-07 19:28 ` [PATCH nf 1/2] netfilter: nf_queue: drop packets with cloned unconfirmed conntracks Florian Westphal
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Florian Westphal @ 2024-08-07 19:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Conntrack assumes an unconfirmed entry (not yet committed to global hash
table) has a refcount of 1 and is not visible to other cores.

With multicast forwarding this assumption breaks down because such
skbs get cloned after being picked up, i.e.  ct->use refcount is > 1.

Likewise, bridge netfilter will clone broad/mutlicast frames and
all frames in case they need to be flood-forwarded during learning
phase.

For ip multicast forwarding or plain bridge flood-forward this will
"work" because packets don't leave softirq and are implicitly
serialized.

With nfqueue this no longer holds true, the packets get queued
and can be reinjected in arbitrary ways.

Disable this feature.

After this patch, nfqueue cannot queue packets except the last
multicast/broadcast packet.

Alternatives:
- queue, but zap skb->nf_conn .  Problem:
  On reinject, packet would match INVALID state.
- same, but make them untracked. Slightly better, but not
  by much.
- check if NAT was applied or not.
  If not, we could theoretically queue and then
  relookup the conntrack on reinject.

This would create a new entry in established, new or invalid
state (userspace can munge the packet).

ATM I would prefer to go with the minimal solution which is
to disable this feature.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH nf 1/2] netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
  2024-08-07 19:28 [PATCH nf 0/2] netfilter: disable support for queueing cloned conntrack entries Florian Westphal
@ 2024-08-07 19:28 ` Florian Westphal
  2024-08-07 19:28 ` [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination Florian Westphal
  2024-08-08 21:14 ` [PATCH nf v2 " Florian Westphal
  2 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2024-08-07 19:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Conntrack assumes an unconfirmed entry (not yet committed to global hash
table) has a refcount of 1 and is not visible to other cores.

With multicast forwarding this assumption breaks down because such
skbs get cloned after being picked up, i.e.  ct->use refcount is > 1.

Likewise, bridge netfilter will clone broad/mutlicast frames and
all frames in case they need to be flood-forwarded during learning
phase.

For ip multicast forwarding or plain bridge flood-forward this will
"work" because packets don't leave softirq and are implicitly
serialized.

With nfqueue this no longer holds true, the packets get queued
and can be reinjected in arbitrary ways.

Disable this feature, I see no other solution.

After this patch, nfqueue cannot queue packets except the last
multicast/broadcast packet.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter_hooks.c |  6 +++++-
 net/netfilter/nfnetlink_queue.c | 35 +++++++++++++++++++++++++++++++--
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 09f6a773a708..8f9c19d992ac 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -622,8 +622,12 @@ static unsigned int br_nf_local_in(void *priv,
 	if (likely(nf_ct_is_confirmed(ct)))
 		return NF_ACCEPT;
 
+	if (WARN_ON_ONCE(refcount_read(&nfct->use) != 1)) {
+		nf_reset_ct(skb);
+		return NF_ACCEPT;
+	}
+
 	WARN_ON_ONCE(skb_shared(skb));
-	WARN_ON_ONCE(refcount_read(&nfct->use) != 1);
 
 	/* We can't call nf_confirm here, it would create a dependency
 	 * on nf_conntrack module.
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 55e28e1da66e..e0716da256bf 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -820,10 +820,41 @@ static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	static const unsigned long flags = IPS_CONFIRMED | IPS_DYING;
-	const struct nf_conn *ct = (void *)skb_nfct(entry->skb);
+	struct nf_conn *ct = (void *)skb_nfct(entry->skb);
+	unsigned long status;
+	unsigned int use;
 
-	if (ct && ((ct->status & flags) == IPS_DYING))
+	if (!ct)
+		return false;
+
+	status = READ_ONCE(ct->status);
+	if ((status & flags) == IPS_DYING)
 		return true;
+
+	if (status & IPS_CONFIRMED)
+		return false;
+
+	/* in some cases skb_clone() can occur after initial conntrack
+	 * pickup, but conntrack assumes exclusive skb->_nfct ownership for
+	 * unconfirmed entries.
+	 *
+	 * This happens for br_netfilter and with ip multicast routing.
+	 * We can't be solved with serialization here because one clone could
+	 * have been queued for local delivery.
+	 */
+	use = refcount_read(&ct->ct_general.use);
+	if (likely(use == 1))
+		return false;
+
+	/* Can't decrement further? Exclusive ownership. */
+	if (!refcount_dec_not_one(&ct->ct_general.use))
+		return false;
+
+	skb_set_nfct(entry->skb, 0);
+	/* No nf_ct_put(): we already decremented .use and it cannot
+	 * drop down to 0.
+	 */
+	return true;
 #endif
 	return false;
 }
-- 
2.44.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination
  2024-08-07 19:28 [PATCH nf 0/2] netfilter: disable support for queueing cloned conntrack entries Florian Westphal
  2024-08-07 19:28 ` [PATCH nf 1/2] netfilter: nf_queue: drop packets with cloned unconfirmed conntracks Florian Westphal
@ 2024-08-07 19:28 ` Florian Westphal
  2024-08-07 19:51   ` Florian Westphal
  2024-08-08 21:14 ` [PATCH nf v2 " Florian Westphal
  2 siblings, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2024-08-07 19:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Trigger cloned skbs leaving softirq protection.
This triggers splat without the preceeding change
("netfilter: nf_queue: drop packets with cloned unconfirmed
 conntracks"):

WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..

because local delivery and forwarding will race for confirmation.

Based on a reproducer script from Yi Chen.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../testing/selftests/net/netfilter/Makefile  |  1 +
 .../net/netfilter/br_netfilter_queue.sh       | 71 +++++++++++++++++++
 tools/testing/selftests/net/netfilter/lib.sh  | 16 +++++
 .../selftests/net/netfilter/nft_queue.sh      | 16 -----
 4 files changed, 88 insertions(+), 16 deletions(-)
 create mode 100755 tools/testing/selftests/net/netfilter/br_netfilter_queue.sh

diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index 47945b2b3f92..d13fb5ea3e89 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -7,6 +7,7 @@ MNL_CFLAGS := $(shell $(HOSTPKG_CONFIG) --cflags libmnl 2>/dev/null)
 MNL_LDLIBS := $(shell $(HOSTPKG_CONFIG) --libs libmnl 2>/dev/null || echo -lmnl)
 
 TEST_PROGS := br_netfilter.sh bridge_brouter.sh
+TEST_PROGS += br_netfilter_queue.sh
 TEST_PROGS += conntrack_icmp_related.sh
 TEST_PROGS += conntrack_ipip_mtu.sh
 TEST_PROGS += conntrack_tcp_unreplied.sh
diff --git a/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh b/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh
new file mode 100755
index 000000000000..009ad754170a
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh
@@ -0,0 +1,71 @@
+#!/bin/bash
+
+source lib.sh
+
+checktool "nft --version" "run test without nft tool"
+
+cleanup() {
+	cleanup_all_ns
+}
+
+setup_ns c1 c2 c3 sender
+
+trap cleanup EXIT
+
+port_add() {
+	ns="$1"
+	dev="$2"
+	a="$3"
+
+	ip link add name "$dev" type veth peer name "$dev" netns "$ns"
+
+	ip -net "$ns" addr add 192.168.1."$a"/24 dev "$dev"
+	ip -net "$ns" link set "$dev" up
+
+	ip link set "$dev" master br0
+	ip link set "$dev" up
+}
+
+[ "${1}" != "run" ] && { unshare -n "${0}" run; exit $?; }
+
+ip link add br0 type bridge
+ip addr add 192.168.1.254/24 dev br0
+
+port_add "$c1" "c1" 1
+port_add "$c2" "c2" 2
+port_add "$c3" "c3" 3
+port_add "$sender" "sender" 253
+
+ip link set br0 up
+
+modprobe -q br_netfilter
+
+sysctl net.bridge.bridge-nf-call-iptables=1 || exit 1
+
+ip netns exec "$sender" ping -I sender -c1 192.168.1.1 || exit 1
+ip netns exec "$sender" ping -I sender -c1 192.168.1.2 || exit 2
+ip netns exec "$sender" ping -I sender -c1 192.168.1.3 || exit 3
+
+nft -f /dev/stdin <<EOF
+table ip filter {
+	chain forward {
+		type filter hook forward priority 0; policy accept;
+		ct state new counter
+		ip protocol icmp counter queue num 0 bypass
+	}
+}
+EOF
+./nf_queue -t 5 > /dev/null &
+
+while true; do conntrack -F > /dev/null 2> /dev/null; sleep 0.1 ; done &
+ip netns exec "$sender" ping -I sender -f -c 50 -b 192.168.1.255
+
+read t < /proc/sys/kernel/tainted
+if [ "$t" -eq 0 ];then
+	echo PASS: kernel not tainted
+else
+	echo ERROR: kernel is tainted
+	exit 1
+fi
+
+exit 0
diff --git a/tools/testing/selftests/net/netfilter/lib.sh b/tools/testing/selftests/net/netfilter/lib.sh
index bedd35298e15..1960e7b3c982 100644
--- a/tools/testing/selftests/net/netfilter/lib.sh
+++ b/tools/testing/selftests/net/netfilter/lib.sh
@@ -8,3 +8,19 @@ checktool (){
 		exit $ksft_skip
 	fi
 }
+
+nf_queue_wait()
+{
+	local procfile="/proc/self/net/netfilter/nfnetlink_queue"
+	local netns id
+
+	netns="$1"
+	id="$2"
+
+	# if this file doesn't exist, nfnetlink_module isn't loaded.
+	# rather than loading it ourselves, wait for kernel module autoload
+	# completion, nfnetlink should do so automatically because nf_queue
+	# helper program, spawned in the background, asked for this functionality.
+	test -f "$procfile" &&
+		ip netns exec "$netns" cat "$procfile" | grep -q "^ *$id "
+}
diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index c61d23a8c88d..edd0e9d6def4 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -190,22 +190,6 @@ EOF
         echo "PASS: $proto: statement with no listener results in packet drop"
 }
 
-nf_queue_wait()
-{
-	local procfile="/proc/self/net/netfilter/nfnetlink_queue"
-	local netns id
-
-	netns="$1"
-	id="$2"
-
-	# if this file doesn't exist, nfnetlink_module isn't loaded.
-	# rather than loading it ourselves, wait for kernel module autoload
-	# completion, nfnetlink should do so automatically because nf_queue
-	# helper program, spawned in the background, asked for this functionality.
-	test -f "$procfile" &&
-		ip netns exec "$netns" cat "$procfile" | grep -q "^ *$id "
-}
-
 test_queue()
 {
 	local expected="$1"
-- 
2.44.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination
  2024-08-07 19:28 ` [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination Florian Westphal
@ 2024-08-07 19:51   ` Florian Westphal
  0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2024-08-07 19:51 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

Florian Westphal <fw@strlen.de> wrote:
> Trigger cloned skbs leaving softirq protection.
> This triggers splat without the preceeding change
> ("netfilter: nf_queue: drop packets with cloned unconfirmed
>  conntracks"):
> 
> WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..
> 
> because local delivery and forwarding will race for confirmation.
> 
> Based on a reproducer script from Yi Chen.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  tools/testing/selftests/net/netfilter/lib.sh  | 16 +++++
>  .../selftests/net/netfilter/nft_queue.sh      | 16 -----

Those two files should be unchanged. I'll send a v2 of the
selftest script tomorrow.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH nf v2 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination
  2024-08-07 19:28 [PATCH nf 0/2] netfilter: disable support for queueing cloned conntrack entries Florian Westphal
  2024-08-07 19:28 ` [PATCH nf 1/2] netfilter: nf_queue: drop packets with cloned unconfirmed conntracks Florian Westphal
  2024-08-07 19:28 ` [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination Florian Westphal
@ 2024-08-08 21:14 ` Florian Westphal
  2 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2024-08-08 21:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Trigger cloned skbs leaving softirq protection.
This triggers splat without the preceeding change
("netfilter: nf_queue: drop packets with cloned unconfirmed
 conntracks"):

WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..

because local delivery and forwarding will race for confirmation.

Based on a reproducer script from Yi Chen.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: remove unwanted change to lib.sh and remove
 conntrack -F busyloop.

 .../testing/selftests/net/netfilter/Makefile  |  1 +
 .../net/netfilter/br_netfilter_queue.sh       | 78 +++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100755 tools/testing/selftests/net/netfilter/br_netfilter_queue.sh

diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index 47945b2b3f92..d13fb5ea3e89 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -7,6 +7,7 @@ MNL_CFLAGS := $(shell $(HOSTPKG_CONFIG) --cflags libmnl 2>/dev/null)
 MNL_LDLIBS := $(shell $(HOSTPKG_CONFIG) --libs libmnl 2>/dev/null || echo -lmnl)
 
 TEST_PROGS := br_netfilter.sh bridge_brouter.sh
+TEST_PROGS += br_netfilter_queue.sh
 TEST_PROGS += conntrack_icmp_related.sh
 TEST_PROGS += conntrack_ipip_mtu.sh
 TEST_PROGS += conntrack_tcp_unreplied.sh
diff --git a/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh b/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh
new file mode 100755
index 000000000000..6a764d70ab06
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/br_netfilter_queue.sh
@@ -0,0 +1,78 @@
+#!/bin/bash
+
+source lib.sh
+
+checktool "nft --version" "run test without nft tool"
+
+cleanup() {
+	cleanup_all_ns
+}
+
+setup_ns c1 c2 c3 sender
+
+trap cleanup EXIT
+
+nf_queue_wait()
+{
+	grep -q "^ *$1 " "/proc/self/net/netfilter/nfnetlink_queue"
+}
+
+port_add() {
+	ns="$1"
+	dev="$2"
+	a="$3"
+
+	ip link add name "$dev" type veth peer name "$dev" netns "$ns"
+
+	ip -net "$ns" addr add 192.168.1."$a"/24 dev "$dev"
+	ip -net "$ns" link set "$dev" up
+
+	ip link set "$dev" master br0
+	ip link set "$dev" up
+}
+
+[ "${1}" != "run" ] && { unshare -n "${0}" run; exit $?; }
+
+ip link add br0 type bridge
+ip addr add 192.168.1.254/24 dev br0
+
+port_add "$c1" "c1" 1
+port_add "$c2" "c2" 2
+port_add "$c3" "c3" 3
+port_add "$sender" "sender" 253
+
+ip link set br0 up
+
+modprobe -q br_netfilter
+
+sysctl net.bridge.bridge-nf-call-iptables=1 || exit 1
+
+ip netns exec "$sender" ping -I sender -c1 192.168.1.1 || exit 1
+ip netns exec "$sender" ping -I sender -c1 192.168.1.2 || exit 2
+ip netns exec "$sender" ping -I sender -c1 192.168.1.3 || exit 3
+
+nft -f /dev/stdin <<EOF
+table ip filter {
+	chain forward {
+		type filter hook forward priority 0; policy accept;
+		ct state new counter
+		ip protocol icmp counter queue num 0 bypass
+	}
+}
+EOF
+./nf_queue -t 5 > /dev/null &
+
+busywait 5000 nf_queue_wait
+
+for i in $(seq 1 5); do conntrack -F > /dev/null 2> /dev/null; sleep 0.1 ; done &
+ip netns exec "$sender" ping -I sender -f -c 50 -b 192.168.1.255
+
+read t < /proc/sys/kernel/tainted
+if [ "$t" -eq 0 ];then
+	echo PASS: kernel not tainted
+else
+	echo ERROR: kernel is tainted
+	exit 1
+fi
+
+exit 0
-- 
2.44.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-08-08 21:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-07 19:28 [PATCH nf 0/2] netfilter: disable support for queueing cloned conntrack entries Florian Westphal
2024-08-07 19:28 ` [PATCH nf 1/2] netfilter: nf_queue: drop packets with cloned unconfirmed conntracks Florian Westphal
2024-08-07 19:28 ` [PATCH nf 2/2] selftests: netfilter: add test for br_netfilter+conntrack+queue combination Florian Westphal
2024-08-07 19:51   ` Florian Westphal
2024-08-08 21:14 ` [PATCH nf v2 " Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.