* netfilter expected behavior for established connections
@ 2025-03-11 23:56 Antonio Ojea
  2025-03-12  0:30 ` imnozi
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Antonio Ojea @ 2025-03-11 23:56 UTC (permalink / raw)
  To: netfilter

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

Hi,

I'm puzzled trying to understand the following behavior, appreciate it
if you can help me to understand better how this works.

The setup is like this:  Client --- Router --- Server

- Router DNATs to a Virtual IP and Port of the Server.
- Client establishes a permanent connection to the Virtual IP.
- Router adds a REJECT rule in the FORWARD hook for the Server IP

I expect the REJECT to match the established connection, but the
client keeps reaching the Server using the existing connection.

The packets of the established connection do not show up on the traces
using nftrace.

Is it possible to "DROP/REJECT" the established connection ?

I've created a selftest to reproduce this behavior, please find it attached.

[-- Attachment #2: 0001-selftests-netfilter-conntrack-does-not-shadow-reject.patch --]
[-- Type: application/octet-stream, Size: 9094 bytes --]

From 8f60146397b277c43bf795651e4cd00469c0bbf3 Mon Sep 17 00:00:00 2001
From: Antonio Ojea <aojea@google.com>
Date: Tue, 11 Mar 2025 08:36:56 +0000
Subject: [PATCH] selftests: netfilter: conntrack does not shadow reject rules

Test netfilter behavior specific for established connections.

Signed-off-by: Antonio Ojea <aojea@google.com>
---
 .../testing/selftests/net/netfilter/Makefile  |   1 +
 .../nft_conntrack_reject_established.sh       | 251 ++++++++++++++++++
 2 files changed, 252 insertions(+)
 create mode 100755 tools/testing/selftests/net/netfilter/nft_conntrack_reject_established.sh

diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index ffe161fac8b5..c276b8ac2383 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -21,6 +21,7 @@ TEST_PROGS += nf_nat_edemux.sh
 TEST_PROGS += nft_audit.sh
 TEST_PROGS += nft_concat_range.sh
 TEST_PROGS += nft_conntrack_helper.sh
+TEST_PROGS += nft_conntrack_reject_established.sh
 TEST_PROGS += nft_fib.sh
 TEST_PROGS += nft_flowtable.sh
 TEST_PROGS += nft_meta.sh
diff --git a/tools/testing/selftests/net/netfilter/nft_conntrack_reject_established.sh b/tools/testing/selftests/net/netfilter/nft_conntrack_reject_established.sh
new file mode 100755
index 000000000000..9e2a2f24640e
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/nft_conntrack_reject_established.sh
@@ -0,0 +1,251 @@
+#!/bin/bash
+#
+# This tests conntrack on the following scenario:
+#
+#                         +------------+
+# +-------+               |  nsrouter  |                  +-------+
+# |ns1    |.99          .1|            |.1             .99|    ns2|
+# |   eth0|---------------|veth0  veth1|------------------|eth0   |
+# |       |  10.0.1.0/24  |            |   10.0.2.0/24    |       |
+# +-------+  dead:1::/64  |    veth2   |   dead:2::/64    +-------+
+#                         +------------+
+#
+# nsrouters implement loadbalancing using DNAT with a virtual IP
+# 10.0.4.10 - dead:4::a
+# shellcheck disable=SC2162,SC2317
+
+source lib.sh
+ret=0
+
+timeout=15
+
+cleanup()
+{
+	ip netns pids "$ns1" | xargs kill 2>/dev/null
+	ip netns pids "$ns2" | xargs kill 2>/dev/null
+	ip netns pids "$nsrouter" | xargs kill 2>/dev/null
+
+	cleanup_all_ns
+}
+
+checktool "nft --version" "test without nft tool"
+checktool "socat -h" "run test without socat"
+
+trap cleanup EXIT
+setup_ns ns1 ns2 nsrouter
+
+if ! ip link add veth0 netns "$nsrouter" type veth peer name eth0 netns "$ns1" > /dev/null 2>&1; then
+    echo "SKIP: No virtual ethernet pair device support in kernel"
+    exit $ksft_skip
+fi
+ip link add veth1 netns "$nsrouter" type veth peer name eth0 netns "$ns2"
+
+ip -net "$nsrouter" link set veth0 up
+ip -net "$nsrouter" addr add 10.0.1.1/24 dev veth0
+ip -net "$nsrouter" addr add dead:1::1/64 dev veth0 nodad
+
+ip -net "$nsrouter" link set veth1 up
+ip -net "$nsrouter" addr add 10.0.2.1/24 dev veth1
+ip -net "$nsrouter" addr add dead:2::1/64 dev veth1 nodad
+
+
+ip -net "$ns1" link set eth0 up
+ip -net "$ns2" link set eth0 up
+
+ip -net "$ns1" addr add 10.0.1.99/24 dev eth0
+ip -net "$ns1" addr add dead:1::99/64 dev eth0 nodad
+ip -net "$ns1" route add default via 10.0.1.1
+ip -net "$ns1" route add default via dead:1::1
+
+ip -net "$ns2" addr add 10.0.2.99/24 dev eth0
+ip -net "$ns2" addr add dead:2::99/64 dev eth0 nodad
+ip -net "$ns2" route add default via 10.0.2.1
+ip -net "$ns2" route add default via dead:2::1
+
+
+ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
+ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
+ip netns exec "$nsrouter" sysctl net.ipv4.conf.veth1.forwarding=1 > /dev/null
+
+test_ping() {
+  if ! ip netns exec "$ns1" ping -c 1 -q 10.0.2.99 > /dev/null; then
+	return 1
+  fi
+
+  if ! ip netns exec "$ns1" ping -c 1 -q dead:2::99 > /dev/null; then
+	return 2
+  fi
+
+  return 0
+}
+
+test_ping_router() {
+  if ! ip netns exec "$ns1" ping -c 1 -q 10.0.2.1 > /dev/null; then
+	return 3
+  fi
+
+  if ! ip netns exec "$ns1" ping -c 1 -q dead:2::1 > /dev/null; then
+	return 4
+  fi
+
+  return 0
+}
+
+
+listener_ready()
+{
+	local ns="$1"
+	local port="$2"
+	local proto="$3"
+	ss -N "$ns" -ln "$proto" -o "sport = :$port" | grep -q "$port"
+}
+
+test_conntrack_reject_established()
+{
+	local ip_proto="$1"
+	# derived variables
+	local testname="test_${ip_proto}_conntrack_reject_established"
+	local socat_ipproto
+	local vip
+	local ns2_ip
+	local ns2_ip_port
+
+	# socat 1.8.0 has a bug that requires to specify the IP family to bind (fixed in 1.8.0.1)
+	case $ip_proto in
+	"ip")
+		socat_ipproto="-4"
+		vip=10.0.4.10
+		ns2_ip=10.0.2.99
+		vip_ip_port="$vip:8080"
+		ns2_ip_port="$ns2_ip:8080"
+	;;
+	"ip6")
+		socat_ipproto="-6"
+		vip=dead:4::a
+		ns2_ip=dead:2::99
+		vip_ip_port="[$vip]:8080"
+		ns2_ip_port="[$ns2_ip]:8080"
+	;;
+	*)
+	echo "FAIL: unsupported protocol"
+	exit 255
+	;;
+	esac
+
+	ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet nat {
+	chain kube-proxy {
+		type nat hook prerouting priority 0; policy accept;
+		$ip_proto daddr $vip tcp dport 8080 dnat to $ns2_ip_port
+	}
+}
+EOF
+
+	# set up an echo server
+	timeout "$timeout" ip netns exec "$ns2" socat "$socat_ipproto" tcp-listen:8080,fork PIPE 2>/dev/null &
+	local server2_pid=$!
+
+	busywait "$BUSYWAIT_TIMEOUT" listener_ready "$ns2" 8080 "-t"
+
+	local result
+	# request from ns1 to ns2 (direct traffic)
+	result=$(echo PING | ip netns exec "$ns1" socat -t 2 -T 2 STDIO tcp:"$ns2_ip_port")
+	if [ "$result" == "PING" ] ;then
+		echo "PASS: $testname: ns1 got reply \"$result\" connecting to ns2"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to ns2, not \"PING\" as intended"
+		ret=1
+	fi
+
+	# set up a persistent connection through DNAT to ns2
+	rm -f pipe.test
+	timeout "$timeout" ip netns exec "$ns1" socat -v -d "$socat_ipproto" PIPE:pipe.test tcp:"$vip_ip_port" &
+	local client1_pid=$!
+	# create FD 3 for writing and reading to the pipe
+	exec 3<>pipe.test
+
+
+	# request from ns1 to vip (DNAT to ns2)
+	echo PING >&3 && read result <&3
+	if [ "$result" = "PING" ] ;then
+		echo "PASS: $testname: ns1 got reply \"$result\" connecting to vip using persistent connection"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip using persistent connection, not \"PING\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to vip
+	result=$(echo PING | ip netns exec "$ns1" socat -t 2 -T 2 STDIO tcp:"$vip_ip_port")
+	if [ "$result" == "PING" ] ;then
+		echo "PASS: $testname: ns1 got reply \"$result\" connecting to vip"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip, not \"PING\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to vip persistent connection (DNAT to ns2)
+	echo PING >&3 && read result <&3
+	if [ "$result" = "PING" ] ;then
+		echo "PASS: $testname: ns1 got reply \"$result\" connecting to vip using persistent connection"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip using persistent connection, not \"PING\" as intended"
+		ret=1
+	fi
+
+	# add a rule to filter traffic to ns2 ip and port (after DNAT)
+	ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+table inet filter {
+	chain kube-proxy {
+		type filter hook forward priority 0; policy accept;
+		$ip_proto daddr $ns2_ip tcp dport 8080 counter reject
+	}
+}
+EOF
+
+	# request from ns1 to ns2 (direct traffic)
+	result=$(echo PING | ip netns exec "$ns1" socat -t 2 -T 2 STDIO tcp:"$ns2_ip_port" 2>&1 >/dev/null)
+	if [[ "$result" == *"Connection refused"* ]] ;then
+		echo "PASS: $testname: ns1 got \"Connection refused\" connecting to vip (ns2)"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip, not \"Connection refused\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to vip (DNAT to ns2)
+	result=$(echo PING | ip netns exec "$ns1" socat -t 2 -T 2 STDIO tcp:"$vip_ip_port" 2>&1 >/dev/null)
+	if [[ "$result" == *"Connection refused"* ]] ;then
+		echo "PASS: $testname: ns1 connection to vip is closed (ns2)"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip, not \"Connection refused\" as intended"
+		ret=1
+	fi
+
+	# request from ns1 to vip (DNAT to ns2) on an existing connection
+	echo PING >&3 && read result <&3
+	if [[ -z "$result" ]] && ! kill -0 "$client1_pid" 2>/dev/null; then
+		echo "PASS: $testname: ns1 got no response and client is closed to vip (ns2)"
+	else
+		echo "ERROR: $testname: ns1 got reply \"$result\" connecting to vip, persistent connection is not closed as intended"
+		ret=1
+	fi
+
+	nft list counters 1>&2
+
+	kill $client1_pid 2>/dev/null
+	kill $server2_pid 2>/dev/null
+}
+
+
+if test_ping; then
+	# basic connectivity works before any ruleset is loaded
+	echo "PASS: ${ns1} can reach ${ns2}"
+else
+	echo "FAIL: ${ns1} cannot reach ${ns2}: $ret" 1>&2
+	exit $ret
+fi
+
+test_conntrack_reject_established "ip"
+test_conntrack_reject_established "ip6"
+
+exit $ret
-- 
2.49.0.rc0.332.g42c0ae87b1-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: netfilter expected behavior for established connections
  2025-03-11 23:56 netfilter expected behavior for established connections Antonio Ojea
@ 2025-03-12  0:30 ` imnozi
  2025-03-12  7:11 ` Florian Westphal
  2025-03-12 16:13 ` Florian Westphal
  2 siblings, 0 replies; 13+ messages in thread
From: imnozi @ 2025-03-12  0:30 UTC (permalink / raw)
  To: netfilter-devel

On Wed, 12 Mar 2025 00:56:48 +0100
Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:

> Hi,
> 
> I'm puzzled trying to understand the following behavior, appreciate it
> if you can help me to understand better how this works.
> 
> The setup is like this:  Client --- Router --- Server
> 
> - Router DNATs to a Virtual IP and Port of the Server.
> - Client establishes a permanent connection to the Virtual IP.
> - Router adds a REJECT rule in the FORWARD hook for the Server IP
> 
> I expect the REJECT to match the established connection, but the
> client keeps reaching the Server using the existing connection.
> 
> The packets of the established connection do not show up on the traces
> using nftrace.
> 
> Is it possible to "DROP/REJECT" the established connection ?

If I understand correctly, if you want to terminate a TCP conn with iptables, you can:

  iptables -N disconn
  iptables -A disconn -p tcp -m state --state ESTABLISHED \
      -j REJECT --reject-with tcp-reset
  iptables -A disconn -j REJECT --reject-with icmp-admin-prohibited

If your other rules determine that a conn should be shut down, they should jump to chain 'disconn', which will immediately reset the sender's end if it's a TCP conn and cause all other packets for that conn from that end to be rejected. Each end must send a TCP packet on that conn for it to be fully reset.

I've used this on my F/W for timed access. The 'instant' time moves into a prohibited span, all active connections for affected IPs are immediately shut down and blocked; not one more of their packets crosses the F/W. I also use it for blocklists.

I expect nftables has similar functionality.

Neal
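
For readers on nftables, the iptables chain above could be approximated roughly as follows. This is only a sketch under assumptions: the table name `filter_sketch` and the helper `disconn_ruleset` are made up for illustration, and the function merely prints the ruleset so it can be reviewed (for example with `nft -c -f`) before anything is loaded. `ct state established` plays the role of `-m state --state ESTABLISHED`.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print an nftables ruleset that
# mirrors the iptables 'disconn' chain quoted above.
disconn_ruleset() {
	cat <<'EOF'
table inet filter_sketch {
	chain disconn {
		# Reset TCP flows that conntrack already considers established.
		meta l4proto tcp ct state established reject with tcp reset
		# Everything else gets an ICMP/ICMPv6 "administratively prohibited" error.
		reject with icmpx type admin-prohibited
	}
}
EOF
}

# Print the ruleset; other chains would reach it with 'jump disconn'.
disconn_ruleset
```

Because `disconn` has no hook, it only sees packets that another chain in the same table sends to it with `jump disconn`.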


* Re: netfilter expected behavior for established connections
  2025-03-11 23:56 netfilter expected behavior for established connections Antonio Ojea
  2025-03-12  0:30 ` imnozi
@ 2025-03-12  7:11 ` Florian Westphal
  2025-03-12 10:55   ` Antonio Ojea
  2025-03-12 16:13 ` Florian Westphal
  2 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2025-03-12  7:11 UTC (permalink / raw)
  To: Antonio Ojea; +Cc: netfilter

Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> Hi,
> 
> I'm puzzled trying to understand the following behavior, appreciate it
> if you can help me to understand better how this works.
> 
> The setup is like this:  Client --- Router --- Server
> 
> - Router DNATs to a Virtual IP and Port of the Server.
> - Client establishes a permanent connection to the Virtual IP.
> - Router adds a REJECT rule in the FORWARD hook for the Server IP
> 
> I expect the REJECT to match the established connection, but the
> client keeps reaching the Server using the existing connection.
> 
> The packets of the established connection do not show up on the traces
> using nftrace.
> 
> Is it possible to "DROP/REJECT" the established connection ?
> 
> I've created a selftest to reproduce this behavior, please find it attached.

Unfortunately, this selftest passes for me.

PASS: ns1-apNbtu can reach ns2-VgBo5h
PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to ns2
PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> 2025/03/12 08:10:58.000388001  length=5 from=0 to=4
PING
< 2025/03/12 08:10:58.000388848  length=5 from=0 to=4
PING
PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip
PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
PASS: test_ip_conntrack_reject_established: ns1 got "Connection refused" connecting to vip (ns2)
PASS: test_ip_conntrack_reject_established: ns1 connection to vip is closed (ns2)
PASS: test_ip_conntrack_reject_established: ns1 got no response and client is closed to vip (ns2)
PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to ns2
PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> 2025/03/12 08:11:00.000519768  length=5 from=0 to=4
PING
< 2025/03/12 08:11:00.000520866  length=5 from=0 to=4
PING
PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip
PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
PASS: test_ip6_conntrack_reject_established: ns1 got "Connection refused" connecting to vip (ns2)
PASS: test_ip6_conntrack_reject_established: ns1 connection to vip is closed (ns2)
PASS: test_ip6_conntrack_reject_established: ns1 got no response and client is closed to vip (ns2)

Linux 6.13.5-200.fc41.x86_64
nftables v1.0.9 (Old Doc Yak #3)


* Re: netfilter expected behavior for established connections
  2025-03-12  7:11 ` Florian Westphal
@ 2025-03-12 10:55   ` Antonio Ojea
  2025-03-12 12:51     ` Florian Westphal
  0 siblings, 1 reply; 13+ messages in thread
From: Antonio Ojea @ 2025-03-12 10:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

On Wed, 12 Mar 2025 at 08:11, Florian Westphal <fw@strlen.de> wrote:
>
> Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > Hi,
> >
> > I'm puzzled trying to understand the following behavior, appreciate it
> > if you can help me to understand better how this works.
> >
> > The setup is like this:  Client --- Router --- Server
> >
> > - Router DNATs to a Virtual IP and Port of the Server.
> > - Client establishes a permanent connection to the Virtual IP.
> > - Router adds a REJECT rule in the FORWARD hook for the Server IP
> >
> > I expect the REJECT to match the established connection, but the
> > client keeps reaching the Server using the existing connection.
> >
> > The packets of the established connection do not show up on the traces
> > using nftrace.
> >
> > Is it possible to "DROP/REJECT" the established connection ?
> >
> > I've created a selftest to reproduce this behavior, please find it attached.
>
> Unfortunately, this selftest passes for me.
>
> PASS: ns1-apNbtu can reach ns2-VgBo5h
> PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to ns2
> PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> > 2025/03/12 08:10:58.000388001  length=5 from=0 to=4
> PING
> < 2025/03/12 08:10:58.000388848  length=5 from=0 to=4
> PING
> PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip
> PASS: test_ip_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> PASS: test_ip_conntrack_reject_established: ns1 got "Connection refused" connecting to vip (ns2)
> PASS: test_ip_conntrack_reject_established: ns1 connection to vip is closed (ns2)
> PASS: test_ip_conntrack_reject_established: ns1 got no response and client is closed to vip (ns2)
> PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to ns2
> PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> > 2025/03/12 08:11:00.000519768  length=5 from=0 to=4
> PING
> < 2025/03/12 08:11:00.000520866  length=5 from=0 to=4
> PING
> PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip
> PASS: test_ip6_conntrack_reject_established: ns1 got reply "PING" connecting to vip using persistent connection
> PASS: test_ip6_conntrack_reject_established: ns1 got "Connection refused" connecting to vip (ns2)
> PASS: test_ip6_conntrack_reject_established: ns1 connection to vip is closed (ns2)
> PASS: test_ip6_conntrack_reject_established: ns1 got no response and client is closed to vip (ns2)
>
> Linux 6.13.5-200.fc41.x86_64
> nftables v1.0.9 (Old Doc Yak #3)


I've tried to debug this further. I did:
1. Install a trace rule, but the subsequent packets on the established
connection do not show up in the trace

chain input {
	type filter hook prerouting priority -301; policy accept;
	ip protocol tcp meta nftrace set 1
}

2. run ./pwru --output-tuple --output-meta tcp port 8080, and this is
the only output I got when I send data over the established connection


0xffffa20289e39600 38 ~bin/socat1:3357 0 0 0 0x0800 0 10 10.0.4.10:8080->10.0.1.99:12345(tcp) __skb_clone
0xffffa20289e39600 38 ~bin/socat1:3357 0 0 0 0x0800 0 10 10.0.4.10:8080->10.0.1.99:12345(tcp) __copy_skb_header
0xffffa20289e39b80 38 ~bin/socat1:3357 0 0 0 0x0800 0 0 10.0.4.10:8080->10.0.1.99:12345(tcp) napi_consume_skb
0xffffa20289e39b80 38 ~bin/socat1:3357 0 0 0 0x0800 0 0 10.0.4.10:8080->10.0.1.99:12345(tcp) skb_release_head_state
0xffffa20289e39b80 38 ~bin/socat1:3357 0 0 0 0x0800 0 0 10.0.4.10:8080->10.0.1.99:12345(tcp) skb_release_data
0xffffa20289e39b80 38 ~bin/socat1:3357 0 0 0 0x0800 0 0 10.0.4.10:8080->10.0.1.99:12345(tcp) skb_free_head
0xffffa20289e39b80 38 ~bin/socat1:3357 0 0 0 0x0800 0 0 10.0.4.10:8080->10.0.1.99:12345(tcp) kfree_skbmem


Is there some kind of optimization that just directly copies the data
without going through netfilter hooks or am I doing something wrong?
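
As a side note on the tracing step, a rule that sets `meta nftrace set 1` only flags packets; the trace events themselves are read with `nft monitor trace`. The sketch below writes a standalone trace table to a temporary file and, when `nft` is available, only syntax-checks it with `nft -c`. The table and chain names (`trace_sketch`, `trace_pre`) and the helper name are assumptions for illustration, not something taken from the thread.

```shell
#!/bin/bash
# Hypothetical helper: print a small table that enables nftrace for all TCP
# packets very early in prerouting (priority -301, as in the message above).
trace_ruleset() {
	cat <<'EOF'
table inet trace_sketch {
	chain trace_pre {
		type filter hook prerouting priority -301; policy accept;
		meta l4proto tcp meta nftrace set 1
	}
}
EOF
}

rules=$(mktemp)
trap 'rm -f "$rules"' EXIT
trace_ruleset > "$rules"

# Check-only run: nothing is loaded into the kernel here.
if command -v nft >/dev/null 2>&1 && nft -c -f "$rules" 2>/dev/null; then
	echo "ruleset parses"
else
	echo "nft unavailable or check failed; ruleset left in $rules for inspection"
fi
```

To use it, one would load the file with `nft -f` inside the router namespace (for example via `ip netns exec`) and run `nft monitor trace` in a second terminal while sending data.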


* Re: netfilter expected behavior for established connections
  2025-03-12 10:55   ` Antonio Ojea
@ 2025-03-12 12:51     ` Florian Westphal
  2025-03-12 13:04       ` Antonio Ojea
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2025-03-12 12:51 UTC (permalink / raw)
  To: Antonio Ojea; +Cc: Florian Westphal, netfilter

Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> Is there some kind of optimization that just directly copies the data
> without going through netfilter hooks or am I doing something wrong?

Looks like whatever environment you are using has bpf progs in place
that change packet flow, or some other proprietary modules.


* Re: netfilter expected behavior for established connections
  2025-03-12 12:51     ` Florian Westphal
@ 2025-03-12 13:04       ` Antonio Ojea
  2025-03-12 14:17         ` Antonio Ojea
  0 siblings, 1 reply; 13+ messages in thread
From: Antonio Ojea @ 2025-03-12 13:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

On Wed, 12 Mar 2025 at 13:51, Florian Westphal <fw@strlen.de> wrote:
>
> Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > Is there some kind of optimization that just directly copies the data
> > without going through netfilter hooks or am I doing something wrong?
>
> Looks like whatever environment you are using has bpf progs in place
> that change packet flow, or some other proprietary modules.


hmm, I'm building a vanilla kernel and running it with virtme-ng (but
it is indeed a controlled host)

vng -v -r arch/x86/boot/bzImage --user=root

let me try to run in a different and more clean environment and report back


* Re: netfilter expected behavior for established connections
  2025-03-12 13:04       ` Antonio Ojea
@ 2025-03-12 14:17         ` Antonio Ojea
  2025-03-12 14:25           ` Florian Westphal
  0 siblings, 1 reply; 13+ messages in thread
From: Antonio Ojea @ 2025-03-12 14:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

On Wed, 12 Mar 2025 at 14:04, Antonio Ojea
<antonio.ojea.garcia@gmail.com> wrote:
>
> On Wed, 12 Mar 2025 at 13:51, Florian Westphal <fw@strlen.de> wrote:
> >
> > Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > > Is there some kind of optimization that just directly copies the data
> > > without going through netfilter hooks or am I doing something wrong?
> >
> > Looks like whatever environment you are using has bpf progs in place
> > that change packet flow, or some other proprietary modules.
>
>
> hmm, I'm building a vanilla kernel and running it with virtme-ng (but
> it is indeed a controlled host)
>
> vng -v -r arch/x86/boot/bzImage --user=root
>
> let me try to run in a different and more clean environment and report back

OK, this is working in a new Debian VM:
Linux instance-20250312-132718 6.1.0-31-cloud-amd64 #1 SMP
PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux

At least I know I should not use my first environment for these
things, sorry about that.

Florian, do you mind if I submit the selftest patch?
I really want to get confidence this behavior does not regress, since
we are probably building a feature based on it


* Re: netfilter expected behavior for established connections
  2025-03-12 14:17         ` Antonio Ojea
@ 2025-03-12 14:25           ` Florian Westphal
  0 siblings, 0 replies; 13+ messages in thread
From: Florian Westphal @ 2025-03-12 14:25 UTC (permalink / raw)
  To: Antonio Ojea; +Cc: Florian Westphal, netfilter

Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> Florian, do you mind if I submit the selftest patch?
> I really want to get confidence this behavior does not regress, since
> we are probably building a feature based on it

The more tests the better, go right ahead.

There is a stray "nft list counters" in there that you
could remove.


* Re: netfilter expected behavior for established connections
  2025-03-11 23:56 netfilter expected behavior for established connections Antonio Ojea
  2025-03-12  0:30 ` imnozi
  2025-03-12  7:11 ` Florian Westphal
@ 2025-03-12 16:13 ` Florian Westphal
  2025-03-12 18:02   ` Antonio Ojea
  2 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2025-03-12 16:13 UTC (permalink / raw)
  To: Antonio Ojea; +Cc: netfilter

Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> I'm puzzled trying to understand the following behavior, appreciate it
> if you can help me to understand better how this works.
> 
> The setup is like this:  Client --- Router --- Server
> 
> - Router DNATs to a Virtual IP and Port of the Server.
> - Client establishes a permanent connection to the Virtual IP.
> - Router adds a REJECT rule in the FORWARD hook for the Server IP
> 
> I expect the REJECT to match the established connection, but the
> client keeps reaching the Server using the existing connection.
> 
> The packets of the established connection do not show up on the traces
> using nftrace.
> 
> Is it possible to "DROP/REJECT" the established connection ?
> 
> I've created a selftest to reproduce this behavior, please find it attached.

Are you sure this script works as intended?

Doing:
socat tcp-listen:12345,fork PIPE &

socat PIPE:P tcp:127.0.0.1:12345 &

echo foo > P

... causes endless traffic, since the listener echoes
P back, that gets written to P, socat reads from it,
echoes foo to server, that sends to client, ...

Probably you need to use:

socat -u PIPE:P,rdonly ... ?

This config change is also needed:

--- a/tools/testing/selftests/net/netfilter/config
+++ b/tools/testing/selftests/net/netfilter/config
@@ -81,6 +81,7 @@ CONFIG_NFT_NUMGEN=m
 CONFIG_NFT_QUEUE=m
 CONFIG_NFT_QUOTA=m
 CONFIG_NFT_REDIR=m
+CONFIG_NFT_REJECT=m

since that's the kernel config template the netdev CI uses to
build the test kernel.

Another issue: cwd might be readonly, so creating pipe.test will fail.

I suggest to use
pipename=$(mktemp -u)

so the named fifo is created in /tmp which is writeable.
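
A minimal sketch of that suggestion, assuming the test creates the FIFO itself with `mkfifo` (the original script lets socat create it) and removes it on exit. Note that `mktemp -u` only generates a name and creates nothing, so the name is not reserved:

```shell
#!/bin/bash
# Generate a unique path under the temp directory (usually /tmp) without creating it.
pipename=$(mktemp -u)

# Create the named pipe explicitly and make sure it is removed on exit.
mkfifo "$pipename" || exit 1
trap 'rm -f "$pipename"' EXIT

if [ -p "$pipename" ]; then
	echo "fifo created at $pipename"
fi
```

The same `$pipename` would then replace the hard-coded `pipe.test` in both the socat invocation and the `exec 3<>` line.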


* Re: netfilter expected behavior for established connections
  2025-03-12 16:13 ` Florian Westphal
@ 2025-03-12 18:02   ` Antonio Ojea
  2025-03-12 18:20     ` Florian Westphal
  0 siblings, 1 reply; 13+ messages in thread
From: Antonio Ojea @ 2025-03-12 18:02 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

> Are you sure this script works as intended?
>
> Doing:
> socat tcp-listen:12345,fork PIPE &
>
> socat PIPE:P tcp:127.0.0.1:12345 &
>
> echo foo > P
>
> ... causes endless traffic, since the listener echoes
> P back, that gets written to P, socat reads from it,
> echoes foo to server, that sends to client, ...
>

heh, I got bitten by that too, that is why I have

echo PING >&3 && read line <&3

the moment that you read it from the fd, it does not come back to the
socket and you break the loop.
I like this approach because it guarantees the packet traverses the
network, but I think we can make it unidirectional and just dump
the other side in a
It is surprisingly more complex than I thought to create a persistent
connection that you can reuse in the test.
Let me try to find a simpler way


* Re: netfilter expected behavior for established connections
  2025-03-12 18:02   ` Antonio Ojea
@ 2025-03-12 18:20     ` Florian Westphal
  2025-03-12 18:29       ` Antonio Ojea
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Westphal @ 2025-03-12 18:20 UTC (permalink / raw)
  To: Antonio Ojea; +Cc: Florian Westphal, netfilter

Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> heh, I got bitten by that too, that is why I have
> 
> echo PING >&3 && read line <&3
> 
> the moment that you read it from the fd, it does not come back to the
> socket and you break the loop.

AFAIU the read could happen before socat managed to read from
the pipe (so nothing is sent over network).

Or there could have been several writes over the network,
not just 'exactly one write'.
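
The first case can be illustrated without socat or any network at all. In the sketch below (Linux FIFO semantics assumed; the variable names are illustrative), the shell opens a FIFO read/write on FD 3, writes to it and immediately reads its own data back, so a successful `read` by itself does not show that the data crossed the network:

```shell
#!/bin/bash
# Create a private FIFO for the demonstration.
fifo=$(mktemp -u)
mkfifo "$fifo" || exit 1
trap 'rm -f "$fifo"' EXIT

# Same pattern as the selftest: one FD opened for both reading and writing.
exec 3<>"$fifo"

echo PING >&3
# No socat and no peer are involved: the shell consumes its own write.
read -r -t 2 line <&3
echo "read back without any network peer: $line"

exec 3>&-
```

In the real test socat competes with the shell for the same bytes, which is why the outcome depends on scheduling.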


* Re: netfilter expected behavior for established connections
  2025-03-12 18:20     ` Florian Westphal
@ 2025-03-12 18:29       ` Antonio Ojea
  2025-03-13 23:23         ` Antonio Ojea
  0 siblings, 1 reply; 13+ messages in thread
From: Antonio Ojea @ 2025-03-12 18:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

On Wed, 12 Mar 2025 at 19:20, Florian Westphal <fw@strlen.de> wrote:
>
> Antonio Ojea <antonio.ojea.garcia@gmail.com> wrote:
> > heh, I got bitten by that too, that is why I have
> >
> > echo PING >&3 && read line <&3
> >
> > the moment that you read it from the fd, it does not come back to the
> > socket and you break the loop.
>
> AFAIU the read could happen before socat managed to read from
> the pipe (so nothing is sent over network).
>
> Or there could have been several writes over the network,
> not just 'exactly one write'.

I tcpdumped to verify the behavior, but let me work on it and use a
different approach that is not so racy and random, I just wanted to
validate my assumption on the expected behavior


* Re: netfilter expected behavior for established connections
  2025-03-12 18:29       ` Antonio Ojea
@ 2025-03-13 23:23         ` Antonio Ojea
  0 siblings, 0 replies; 13+ messages in thread
From: Antonio Ojea @ 2025-03-13 23:23 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter

> I tcpdumped to verify the behavior, but let me work on it and use a
> different approach that is not so racy and random, I just wanted to
> validate my assumption on the expected behavior

There were also other bugs in this patch so please disregard it.
I submitted a new one in
https://lore.kernel.org/netfilter-devel/20250313231341.3040002-1-aojea@google.com/T/#u

I also realized that I need to use "reject with tcp reset" to close
the established connection; rejecting with ICMP messages does not seem to
have any effect on established connections.
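
Applied to the filter table from the original selftest, that change might look like the sketch below. It is not taken from the follow-up patch; the helper name `reject_ruleset` is hypothetical, and the addresses are the ones used in the first version of the test. The function only prints the ruleset for each address family.

```shell
#!/bin/bash
# Hypothetical helper: print the selftest's filter table, but with an explicit
# TCP reset instead of the default ICMP-based reject.
reject_ruleset() {
	local ip_proto="$1"   # "ip" or "ip6", as in the selftest
	local ns2_ip="$2"     # server address after DNAT

	cat <<EOF
table inet filter {
	chain kube-proxy {
		type filter hook forward priority 0; policy accept;
		$ip_proto daddr $ns2_ip tcp dport 8080 counter reject with tcp reset
	}
}
EOF
}

reject_ruleset ip 10.0.2.99
reject_ruleset ip6 dead:2::99
```

In the test the output would be piped to `ip netns exec "$nsrouter" nft -f /dev/stdin`, as the original heredoc does.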

