* [PATCH net 0/4] netfilter: updates for net
@ 2025-10-08 12:59 Florian Westphal
2025-10-08 12:59 ` [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions Florian Westphal
From: Florian Westphal @ 2025-10-08 12:59 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
The following patchset contains Netfilter fixes for *net*:
1) Fix a crash (infinite call recursion) when the nftables synproxy
extension is used in an object map. When this feature was added in
v5.4, the required hook call validation was forgotten.
Fix from Fernando Fernandez Mancera.
2) The bridge function br_vlan_fill_forward_path_pvid() uses
rcu_dereference_protected() incorrectly: the caller only holds the
RCU read lock, not RTNL. Fix from Eric Woudstra.
The last two patches address flakes in two existing selftests.
Please pull these changes from:
The following changes since commit 2c95a756e0cfc19af6d0b32b0c6cf3bada334998:
net: pse-pd: tps23881: Fix current measurement scaling (2025-10-07 18:30:53 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-10-08
for you to fetch changes up to e84945bdc619ed4243ba4298dbb8ca2062026474:
selftests: netfilter: query conntrack state to check for port clash resolution (2025-10-08 13:17:31 +0200)
----------------------------------------------------------------
netfilter pull request nf-25-10-08
----------------------------------------------------------------
Eric Woudstra (1):
bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
Fernando Fernandez Mancera (1):
netfilter: nft_objref: validate objref and objrefmap expressions
Florian Westphal (2):
selftests: netfilter: nft_fib.sh: fix spurious test failures
selftests: netfilter: query conntrack state to check for port clash resolution
net/bridge/br_vlan.c | 2 +-
net/netfilter/nft_objref.c | 39 +++++++++++++++
.../selftests/net/netfilter/nf_nat_edemux.sh | 58 +++++++++++++++-------
tools/testing/selftests/net/netfilter/nft_fib.sh | 13 +++--
4 files changed, 89 insertions(+), 23 deletions(-)
* [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions
2025-10-08 12:59 [PATCH net 0/4] netfilter: updates for net Florian Westphal
@ 2025-10-08 12:59 ` Florian Westphal
2025-10-09 8:20 ` patchwork-bot+netdevbpf
2025-10-08 12:59 ` [PATCH net 2/4] bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu() Florian Westphal
From: Florian Westphal @ 2025-10-08 12:59 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Fernando Fernandez Mancera <fmancera@suse.de>
Referencing a synproxy stateful object from the OUTPUT hook causes a
kernel crash due to infinite recursive calls:
BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
[...]
Call Trace:
__find_rr_leaf+0x99/0x230
fib6_table_lookup+0x13b/0x2d0
ip6_pol_route+0xa4/0x400
fib6_rule_lookup+0x156/0x240
ip6_route_output_flags+0xc6/0x150
__nf_ip6_route+0x23/0x50
synproxy_send_tcp_ipv6+0x106/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
nft_synproxy_do_eval+0x263/0x310
nft_do_chain+0x5a8/0x5f0 [nf_tables]
nft_do_chain_inet+0x98/0x110
nf_hook_slow+0x43/0xc0
__ip6_local_out+0xf0/0x170
ip6_local_out+0x17/0x70
synproxy_send_tcp_ipv6+0x1a2/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
[...]
Implement validate functions for the objref and objrefmap expressions.
Currently, only the NFT_OBJECT_SYNPROXY object type requires validation.
This also handles a jump to a chain that uses a synproxy object from the
OUTPUT hook.
Now, when trying to reference a synproxy object in the OUTPUT hook, nft
produces the following error:
synproxy_crash.nft: Error: Could not process rule: Operation not supported
synproxy name mysynproxy
^^^^^^^^^^^^^^^^^^^^^^^^
Fixes: ee394f96ad75 ("netfilter: nft_synproxy: add synproxy stateful object support")
Reported-by: Georg Pfuetzenreuter <georg.pfuetzenreuter@suse.com>
Closes: https://bugzilla.suse.com/1250237
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_objref.c | 39 ++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index 8ee66a86c3bc..1a62e384766a 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -22,6 +22,35 @@ void nft_objref_eval(const struct nft_expr *expr,
obj->ops->eval(obj, regs, pkt);
}
+static int nft_objref_validate_obj_type(const struct nft_ctx *ctx, u32 type)
+{
+ unsigned int hooks;
+
+ switch (type) {
+ case NFT_OBJECT_SYNPROXY:
+ if (ctx->family != NFPROTO_IPV4 &&
+ ctx->family != NFPROTO_IPV6 &&
+ ctx->family != NFPROTO_INET)
+ return -EOPNOTSUPP;
+
+ hooks = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD);
+
+ return nft_chain_validate_hooks(ctx->chain, hooks);
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static int nft_objref_validate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr)
+{
+ struct nft_object *obj = nft_objref_priv(expr);
+
+ return nft_objref_validate_obj_type(ctx, obj->ops->type->type);
+}
+
static int nft_objref_init(const struct nft_ctx *ctx,
const struct nft_expr *expr,
const struct nlattr * const tb[])
@@ -93,6 +122,7 @@ static const struct nft_expr_ops nft_objref_ops = {
.activate = nft_objref_activate,
.deactivate = nft_objref_deactivate,
.dump = nft_objref_dump,
+ .validate = nft_objref_validate,
.reduce = NFT_REDUCE_READONLY,
};
@@ -197,6 +227,14 @@ static void nft_objref_map_destroy(const struct nft_ctx *ctx,
nf_tables_destroy_set(ctx, priv->set);
}
+static int nft_objref_map_validate(const struct nft_ctx *ctx,
+ const struct nft_expr *expr)
+{
+ const struct nft_objref_map *priv = nft_expr_priv(expr);
+
+ return nft_objref_validate_obj_type(ctx, priv->set->objtype);
+}
+
static const struct nft_expr_ops nft_objref_map_ops = {
.type = &nft_objref_type,
.size = NFT_EXPR_SIZE(sizeof(struct nft_objref_map)),
@@ -206,6 +244,7 @@ static const struct nft_expr_ops nft_objref_map_ops = {
.deactivate = nft_objref_map_deactivate,
.destroy = nft_objref_map_destroy,
.dump = nft_objref_map_dump,
+ .validate = nft_objref_map_validate,
.reduce = NFT_REDUCE_READONLY,
};
--
2.49.1
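A minimal userspace sketch of the hook-mask check performed by nft_objref_validate_obj_type() above. Only the hook numbering is meant to mirror the kernel's enum nf_inet_hooks; the family and object-type enums and chain_validate_hooks() are hypothetical stand-ins, and the sketch only models base chains (jumps into regular chains are not modeled).

```c
#include <assert.h>
#include <errno.h>

/* Hook numbers mirror enum nf_inet_hooks from the uapi headers. */
enum inet_hook {
	HOOK_PRE_ROUTING = 0,
	HOOK_LOCAL_IN = 1,
	HOOK_FORWARD = 2,
	HOOK_LOCAL_OUT = 3,
	HOOK_POST_ROUTING = 4,
};

/* Hypothetical stand-ins for NFPROTO_* and NFT_OBJECT_* values. */
enum family { FAM_IPV4, FAM_IPV6, FAM_INET, FAM_BRIDGE };
enum obj_type { OBJ_COUNTER, OBJ_SYNPROXY };

/* Simplified base-chain check: succeed only if the chain's hook bit
 * is set in the allowed mask. */
static int chain_validate_hooks(enum inet_hook chain_hook, unsigned int allowed)
{
	return (allowed & (1u << chain_hook)) ? 0 : -EOPNOTSUPP;
}

/* Same structure as nft_objref_validate_obj_type() in the patch. */
static int objref_validate_obj_type(enum family fam, enum inet_hook chain_hook,
				    enum obj_type type)
{
	unsigned int hooks;

	switch (type) {
	case OBJ_SYNPROXY:
		if (fam != FAM_IPV4 && fam != FAM_IPV6 && fam != FAM_INET)
			return -EOPNOTSUPP;
		/* synproxy sends its own replies through the local output
		 * path; allowing it in LOCAL_OUT would re-enter the hook
		 * and recurse, as in the trace above. */
		hooks = (1u << HOOK_LOCAL_IN) | (1u << HOOK_FORWARD);
		return chain_validate_hooks(chain_hook, hooks);
	default:
		/* other object types have no hook restriction */
		return 0;
	}
}
```

With this model, a synproxy reference from an inet LOCAL_OUT chain is rejected with -EOPNOTSUPP, which is what surfaces as "Operation not supported" in the nft error above.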
* Re: [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions
2025-10-08 12:59 ` [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions Florian Westphal
@ 2025-10-09 8:20 ` patchwork-bot+netdevbpf
From: patchwork-bot+netdevbpf @ 2025-10-09 8:20 UTC
To: Florian Westphal
Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
Hello:
This series was applied to netdev/net.git (main)
by Florian Westphal <fw@strlen.de>:
On Wed, 8 Oct 2025 14:59:39 +0200 you wrote:
> From: Fernando Fernandez Mancera <fmancera@suse.de>
>
> Referencing a synproxy stateful object from OUTPUT hook causes kernel
> crash due to infinite recursive calls:
>
> BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
> [...]
> Call Trace:
> __find_rr_leaf+0x99/0x230
> fib6_table_lookup+0x13b/0x2d0
> ip6_pol_route+0xa4/0x400
> fib6_rule_lookup+0x156/0x240
> ip6_route_output_flags+0xc6/0x150
> __nf_ip6_route+0x23/0x50
> synproxy_send_tcp_ipv6+0x106/0x200
> synproxy_send_client_synack_ipv6+0x1aa/0x1f0
> nft_synproxy_do_eval+0x263/0x310
> nft_do_chain+0x5a8/0x5f0 [nf_tables]
> nft_do_chain_inet+0x98/0x110
> nf_hook_slow+0x43/0xc0
> __ip6_local_out+0xf0/0x170
> ip6_local_out+0x17/0x70
> synproxy_send_tcp_ipv6+0x1a2/0x200
> synproxy_send_client_synack_ipv6+0x1aa/0x1f0
> [...]
>
> [...]
Here is the summary with links:
- [net,1/4] netfilter: nft_objref: validate objref and objrefmap expressions
https://git.kernel.org/netdev/net/c/f359b809d54c
- [net,2/4] bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
https://git.kernel.org/netdev/net/c/bbf0c98b3ad9
- [net,3/4] selftests: netfilter: nft_fib.sh: fix spurious test failures
https://git.kernel.org/netdev/net/c/a126ab6b26f1
- [net,4/4] selftests: netfilter: query conntrack state to check for port clash resolution
https://git.kernel.org/netdev/net/c/e84945bdc619
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* [PATCH net 2/4] bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu()
2025-10-08 12:59 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2025-10-08 12:59 ` [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions Florian Westphal
@ 2025-10-08 12:59 ` Florian Westphal
2025-10-08 12:59 ` [PATCH net 3/4] selftests: netfilter: nft_fib.sh: fix spurious test failures Florian Westphal
2025-10-08 12:59 ` [PATCH net 4/4] selftests: netfilter: query conntrack state to check for port clash resolution Florian Westphal
From: Florian Westphal @ 2025-10-08 12:59 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Eric Woudstra <ericwouds@gmail.com>
net/bridge/br_private.h:1627 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
7 locks held by socat/410:
#0: ffff88800d7a9c90 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_stream_connect+0x43/0xa0
#1: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x62/0x1830
[..]
#6: ffffffff9a779900 (rcu_read_lock){....}-{1:3}, at: nf_hook.constprop.0+0x8a/0x440
Call Trace:
lockdep_rcu_suspicious.cold+0x4f/0xb1
br_vlan_fill_forward_path_pvid+0x32c/0x410 [bridge]
br_fill_forward_path+0x7a/0x4d0 [bridge]
Use the correct helper; the non-_rcu variant requires the RTNL mutex.
Fixes: bcf2766b1377 ("net: bridge: resolve forwarding path for VLAN tag actions in bridge devices")
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/bridge/br_vlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index ae911220cb3c..ce72b837ff8e 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1457,7 +1457,7 @@ void br_vlan_fill_forward_path_pvid(struct net_bridge *br,
if (!br_opt_get(br, BROPT_VLAN_ENABLED))
return;
- vg = br_vlan_group(br);
+ vg = br_vlan_group_rcu(br);
if (idx >= 0 &&
ctx->vlan[idx].proto == br->vlan_proto) {
--
2.49.1
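To illustrate why lockdep complains, here is a toy model with hypothetical types in which each accessor checks the lock context it requires and counts a "suspicious usage" report otherwise, standing in for lockdep. It assumes that br_vlan_group() is built on rtnl_dereference() (rcu_dereference_protected() with an RTNL condition) and br_vlan_group_rcu() on rcu_dereference(); the code does no real RCU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock context; lockdep tracks the real one per task. */
struct lock_state {
	bool rtnl_held;
	bool rcu_read_held;
};

struct vlan_group {
	int pvid;
};

struct bridge {
	struct vlan_group *vlgrp;
};

/* Number of "suspicious RCU usage" reports raised by this toy. */
static int suspicious_warnings;

/* Models br_vlan_group(): update-side accessor, valid only under RTNL. */
static struct vlan_group *vlan_group(const struct bridge *br,
				     const struct lock_state *ls)
{
	if (!ls->rtnl_held)
		suspicious_warnings++;
	return br->vlgrp;
}

/* Models br_vlan_group_rcu(): reader-side accessor, valid inside an
 * RCU read-side critical section. */
static struct vlan_group *vlan_group_rcu(const struct bridge *br,
					 const struct lock_state *ls)
{
	if (!ls->rcu_read_held)
		suspicious_warnings++;
	return br->vlgrp;
}
```

In the forwarding-path context from the splat (RCU read lock held, RTNL not held), the non-_rcu accessor raises a report while the _rcu one does not, which is the one-line change in the diff.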
* [PATCH net 3/4] selftests: netfilter: nft_fib.sh: fix spurious test failures
2025-10-08 12:59 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2025-10-08 12:59 ` [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions Florian Westphal
2025-10-08 12:59 ` [PATCH net 2/4] bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu() Florian Westphal
@ 2025-10-08 12:59 ` Florian Westphal
2025-10-08 12:59 ` [PATCH net 4/4] selftests: netfilter: query conntrack state to check for port clash resolution Florian Westphal
From: Florian Westphal @ 2025-10-08 12:59 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Jakub reports spurious failures of the nft_fib.sh test.
This is caused by a subtle bug that was inherited when I moved the
faulty ping from one test case to another.
nft_fib.sh not only checks that the fib expression matched, it also
records the number of matches and then validates that we have the
expected count. When I did this, it was under the assumption that we
would have 0 to n matching packets. In the failing case, the entry has
n+1 matching packets.
This happens because the test_ping_unreachable helper uses
"ping -c 1 -w 1" instead of the intended "-W". -w alters the meaning of
-c (count): it is then treated as the number of wanted *replies* instead
of the number of packets to send.
So, in some cases, ping -c 1 -w 1 ends up sending two packets, which
then makes the test fail due to the higher-than-expected packet count.
Fix the actual bug (s/-w 1/-W 0.1/) and also change the error handling:
1. Show the number of expected packets in the error message.
2. Always try to delete the key from the set.
Otherwise, a later test that makes sure we don't have unexpected keys
in there will always fail as well.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/20250927090709.0b3cd783@kernel.org/
Fixes: 98287045c979 ("selftests: netfilter: move fib vrf test to nft_fib.sh")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
tools/testing/selftests/net/netfilter/nft_fib.sh | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/nft_fib.sh b/tools/testing/selftests/net/netfilter/nft_fib.sh
index 9929a9ffef65..04544905c216 100755
--- a/tools/testing/selftests/net/netfilter/nft_fib.sh
+++ b/tools/testing/selftests/net/netfilter/nft_fib.sh
@@ -256,12 +256,12 @@ test_ping_unreachable() {
local daddr4=$1
local daddr6=$2
- if ip netns exec "$ns1" ping -c 1 -w 1 -q "$daddr4" > /dev/null; then
+ if ip netns exec "$ns1" ping -c 1 -W 0.1 -q "$daddr4" > /dev/null; then
echo "FAIL: ${ns1} could reach $daddr4" 1>&2
return 1
fi
- if ip netns exec "$ns1" ping -c 1 -w 1 -q "$daddr6" > /dev/null; then
+ if ip netns exec "$ns1" ping -c 1 -W 0.1 -q "$daddr6" > /dev/null; then
echo "FAIL: ${ns1} could reach $daddr6" 1>&2
return 1
fi
@@ -437,14 +437,17 @@ check_type()
local addr="$3"
local type="$4"
local count="$5"
+ local lret=0
[ -z "$count" ] && count=1
if ! ip netns exec "$nsrouter" nft get element inet t "$setname" { "$iifname" . "$addr" . "$type" } |grep -q "counter packets $count";then
- echo "FAIL: did not find $iifname . $addr . $type in $setname"
+ echo "FAIL: did not find $iifname . $addr . $type in $setname with $count packets"
ip netns exec "$nsrouter" nft list set inet t "$setname"
ret=1
- return 1
+ # do not fail right away, delete entry if it exists so later test that
+ # checks for unwanted keys don't get confused by this *expected* key.
+ lret=1
fi
# delete the entry, this allows to check if anything unexpected appeared
@@ -456,7 +459,7 @@ check_type()
return 1
fi
- return 0
+ return $lret
}
check_local()
--
2.49.1
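The reworked check_type() follows a "record the failure, still clean up, return it at the end" pattern. Below is a toy C model of that control flow; counter_set, find_key(), delete_key() and check_no_unexpected_keys() are hypothetical stand-ins for the nft set and the nft get/delete element commands, not the selftest's code.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 8

/* Toy stand-in for the nft set: key -> packet counter. */
struct counter_set {
	const char *key[MAX_KEYS];
	int packets[MAX_KEYS];
	int n;
};

static int find_key(const struct counter_set *s, const char *key)
{
	for (int i = 0; i < s->n; i++)
		if (strcmp(s->key[i], key) == 0)
			return i;
	return -1;
}

/* Remove entry idx by moving the last entry into its slot. */
static void delete_key(struct counter_set *s, int idx)
{
	assert(idx >= 0 && idx < s->n);
	s->key[idx] = s->key[s->n - 1];
	s->packets[idx] = s->packets[s->n - 1];
	s->n--;
}

/* Like the patched check_type(): a count mismatch is remembered in
 * lret, but the expected key is still deleted before returning. */
static int check_type(struct counter_set *s, const char *key, int count)
{
	int lret = 0;
	int idx = find_key(s, key);

	if (idx < 0 || s->packets[idx] != count)
		lret = 1;	/* do not return yet */

	if (idx < 0)
		return 1;	/* deletion fails: key was never there */

	delete_key(s, idx);
	return lret;
}

/* Later check: once expected keys are gone, the set must be empty. */
static int check_no_unexpected_keys(const struct counter_set *s)
{
	return s->n == 0 ? 0 : 1;
}
```

With the old early return, the n+1 case would have left the key behind and also failed the later "no unexpected keys" check; here only the first check reports the failure.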
* [PATCH net 4/4] selftests: netfilter: query conntrack state to check for port clash resolution
2025-10-08 12:59 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2025-10-08 12:59 ` [PATCH net 3/4] selftests: netfilter: nft_fib.sh: fix spurious test failures Florian Westphal
@ 2025-10-08 12:59 ` Florian Westphal
From: Florian Westphal @ 2025-10-08 12:59 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Jakub reported this selftest flaking occasionally (it fails, but passes
on re-run) on debug kernels.
This is because the test checks the elapsed time to determine whether
both connections were established in parallel.
Rework this to no longer depend on timing.
Use the busywait helper to check that both sockets have moved to the
established state, and then query the conntrack engine for the two entries.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netfilter-devel/20250926163318.40d1a502@kernel.org/
Fixes: 117e149e26d1 ("selftests: netfilter: test nat source port clash resolution interaction with tcp early demux")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../selftests/net/netfilter/nf_nat_edemux.sh | 58 +++++++++++++------
1 file changed, 41 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/nf_nat_edemux.sh b/tools/testing/selftests/net/netfilter/nf_nat_edemux.sh
index 1014551dd769..6731fe1eaf2e 100755
--- a/tools/testing/selftests/net/netfilter/nf_nat_edemux.sh
+++ b/tools/testing/selftests/net/netfilter/nf_nat_edemux.sh
@@ -17,9 +17,31 @@ cleanup()
checktool "socat -h" "run test without socat"
checktool "iptables --version" "run test without iptables"
+checktool "conntrack --version" "run test without conntrack"
trap cleanup EXIT
+connect_done()
+{
+ local ns="$1"
+ local port="$2"
+
+ ip netns exec "$ns" ss -nt -o state established "dport = :$port" | grep -q "$port"
+}
+
+check_ctstate()
+{
+ local ns="$1"
+ local dp="$2"
+
+ if ! ip netns exec "$ns" conntrack --get -s 192.168.1.2 -d 192.168.1.1 -p tcp \
+ --sport 10000 --dport "$dp" --state ESTABLISHED > /dev/null 2>&1;then
+ echo "FAIL: Did not find expected state for dport $2"
+ ip netns exec "$ns" bash -c 'conntrack -L; conntrack -S; ss -nt'
+ ret=1
+ fi
+}
+
setup_ns ns1 ns2
# Connect the namespaces using a veth pair
@@ -44,15 +66,18 @@ socatpid=$!
ip netns exec "$ns2" sysctl -q net.ipv4.ip_local_port_range="10000 10000"
# add a virtual IP using DNAT
-ip netns exec "$ns2" iptables -t nat -A OUTPUT -d 10.96.0.1/32 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.1:5201
+ip netns exec "$ns2" iptables -t nat -A OUTPUT -d 10.96.0.1/32 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.1:5201 || exit 1
# ... and route it to the other namespace
ip netns exec "$ns2" ip route add 10.96.0.1 via 192.168.1.1
-# add a persistent connection from the other namespace
-ip netns exec "$ns2" socat -t 10 - TCP:192.168.1.1:5201 > /dev/null &
+# listener should be up by now, wait if it isn't yet.
+wait_local_port_listen "$ns1" 5201 tcp
-sleep 1
+# add a persistent connection from the other namespace
+sleep 10 | ip netns exec "$ns2" socat -t 10 - TCP:192.168.1.1:5201 > /dev/null &
+cpid0=$!
+busywait "$BUSYWAIT_TIMEOUT" connect_done "$ns2" "5201"
# ip daddr:dport will be rewritten to 192.168.1.1 5201
# NAT must reallocate source port 10000 because
@@ -71,26 +96,25 @@ fi
ip netns exec "$ns1" iptables -t nat -A PREROUTING -p tcp --dport 5202 -j REDIRECT --to-ports 5201
ip netns exec "$ns1" iptables -t nat -A PREROUTING -p tcp --dport 5203 -j REDIRECT --to-ports 5201
-sleep 5 | ip netns exec "$ns2" socat -t 5 -u STDIN TCP:192.168.1.1:5202,connect-timeout=5 >/dev/null &
+sleep 5 | ip netns exec "$ns2" socat -T 5 -u STDIN TCP:192.168.1.1:5202,connect-timeout=5 >/dev/null &
+cpid1=$!
-# if connect succeeds, client closes instantly due to EOF on stdin.
-# if connect hangs, it will time out after 5s.
-echo | ip netns exec "$ns2" socat -t 3 -u STDIN TCP:192.168.1.1:5203,connect-timeout=5 >/dev/null &
+sleep 5 | ip netns exec "$ns2" socat -T 5 -u STDIN TCP:192.168.1.1:5203,connect-timeout=5 >/dev/null &
cpid2=$!
-time_then=$(date +%s)
-wait $cpid2
-rv=$?
-time_now=$(date +%s)
+busywait "$BUSYWAIT_TIMEOUT" connect_done "$ns2" 5202
+busywait "$BUSYWAIT_TIMEOUT" connect_done "$ns2" 5203
-# Check how much time has elapsed, expectation is for
-# 'cpid2' to connect and then exit (and no connect delay).
-delta=$((time_now - time_then))
+check_ctstate "$ns1" 5202
+check_ctstate "$ns1" 5203
-if [ $delta -lt 2 ] && [ $rv -eq 0 ]; then
+kill $socatpid $cpid0 $cpid1 $cpid2
+socatpid=0
+
+if [ $ret -eq 0 ]; then
echo "PASS: could connect to service via redirected ports"
else
- echo "FAIL: socat cannot connect to service via redirect ($delta seconds elapsed, returned $rv)"
+ echo "FAIL: socat cannot connect to service via redirect"
ret=1
fi
--
2.49.1
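The core of this rework is replacing an elapsed-time check with polling until a condition holds. A minimal sketch of such a busywait loop in C, assuming a monotonic clock and a fixed poll interval; it is not the selftests' shell busywait helper, and countdown_done() is only a hypothetical stand-in for "socket reached ESTABLISHED".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

typedef bool (*predicate_fn)(void *arg);

static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll pred(arg) every poll_ms until it returns true or timeout_ms
 * has passed. Returns 0 on success, -1 on timeout. */
static int busywait(long timeout_ms, long poll_ms, predicate_fn pred, void *arg)
{
	struct timespec start;
	struct timespec pause = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (poll_ms % 1000) * 1000000L,
	};

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (pred(arg))
			return 0;
		if (elapsed_ms(&start) >= timeout_ms)
			return -1;
		nanosleep(&pause, NULL);
	}
}

/* Example predicate: true once *countdown has reached zero. */
static bool countdown_done(void *arg)
{
	int *countdown = arg;

	return (*countdown)-- <= 0;
}
```

Unlike a fixed "finished within 2 seconds" check, this succeeds as soon as the condition holds and only fails after a generous timeout, so slow debug kernels no longer cause false failures.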
* [PATCH net 0/4] netfilter: updates for net
@ 2026-03-04 17:29 Florian Westphal
2026-03-04 21:57 ` Pablo Neira Ayuso
2026-03-05 12:21 ` Florian Westphal
From: Florian Westphal @ 2026-03-04 17:29 UTC
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net*:
1) Fix a bug with vlan headers in the flowtable infrastructure.
Existing code uses skb_vlan_push() helper, but that helper
requires skb->data to point to the MAC header, which isn't the
case for flowtables. Switch to a new helper, modeled on the
existing PPPoE helper. From Eric Woudstra. This bug was added
in v6.19-rc1.
2) Inseo An reported a bug in the nf_tables set element handling:
when a set cannot accept more elements, we unlink and immediately free
an element that was already inserted into a public data structure,
without waiting for an RCU grace period. Fix this by doing the
increment earlier and by deferring the possible unlink-and-free to the
existing abort path, which performs the needed synchronize_rcu() before
freeing. From Pablo Neira Ayuso. This is an ancient bug, dating back to
kernel 4.10.
3) syzbot reported a WARN_ON() splat in nf_tables that occurs on memory
allocation failure. Fix this with a new iterator annotation:
the affected walker does not need to clone the data structure and
can just use the live version if no clone exists yet.
Also from Pablo. This bug has existed since 6.10.
4) A long-standing bug in the nft_pipapo data structure:
the garbage collection logic that removes expired elements is broken.
We must unlink elements from the data structure and can only hand the
freeing to call_rcu() after the clone/live pointers of the data
structure have been swapped. Otherwise, readers can observe the freed
element. Reported by Yiming Qian.
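Item 4 describes a two-phase GC: unlink expired elements from the private clone first, and reclaim them only after the clone has been published as the live copy. A single-threaded toy model with hypothetical types follows; the direct free() stands in for call_rcu(), which in the kernel additionally waits for a grace period so readers that already fetched a pointer stay safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ELEMS 16

struct elem {
	int key;
	bool expired;
};

struct elem_array {
	struct elem *e[MAX_ELEMS];
	int n;
};

struct toy_set {
	struct elem_array *live;	/* what packet-path readers see */
	struct elem_array *clone;	/* private copy the control plane edits */
	struct elem *gc_pending[MAX_ELEMS];
	int n_pending;
};

/* Phase 1: unlink expired elements from the clone only; no freeing. */
static void gc_unlink(struct toy_set *s)
{
	struct elem_array *c = s->clone;
	int kept = 0;

	for (int i = 0; i < c->n; i++) {
		if (c->e[i]->expired)
			s->gc_pending[s->n_pending++] = c->e[i];
		else
			c->e[kept++] = c->e[i];
	}
	c->n = kept;
}

/* Commit: publish the clone as live. The old live array is refreshed
 * and reused as the next clone (safe here: single-threaded toy). */
static void commit_swap(struct toy_set *s)
{
	struct elem_array *old = s->live;

	s->live = s->clone;
	*old = *s->live;
	s->clone = old;
}

/* Phase 2: only after the swap are unlinked elements reclaimed. */
static int gc_reclaim(struct toy_set *s)
{
	int freed = s->n_pending;

	for (int i = 0; i < s->n_pending; i++)
		free(s->gc_pending[i]);
	s->n_pending = 0;
	return freed;
}

static bool live_contains(const struct toy_set *s, const struct elem *x)
{
	for (int i = 0; i < s->live->n; i++)
		if (s->live->e[i] == x)
			return true;
	return false;
}
```

Between gc_unlink() and commit_swap(), the expired element is still reachable through the live copy, which is exactly why freeing it in the unlink phase lets readers observe a freed element.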
Please pull these changes from:
The following changes since commit fbdfa8da05b6ae44114fc4f9b3e83e1736fd411c:
selftests: tc-testing: fix list_categories() crash on list type (2026-03-04 05:42:57 +0000)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-03-04
for you to fetch changes up to 41c5c0124bd9528c32c9ebd5f8b8f8eb800e77c3:
netfilter: nft_set_pipapo: split gc into unlink and reclaim phase (2026-03-04 15:39:33 +0100)
----------------------------------------------------------------
netfilter pull request nf-26-03-04
----------------------------------------------------------------
Eric Woudstra (1):
netfilter: nf_flow_table_ip: Introduce nf_flow_vlan_push()
Florian Westphal (1):
netfilter: nft_set_pipapo: split gc into unlink and reclaim phase
Pablo Neira Ayuso (2):
netfilter: nf_tables: unconditionally bump set->nelems before insertion
netfilter: nf_tables: clone set on flush only
include/net/netfilter/nf_tables.h | 7 ++++
net/netfilter/nf_flow_table_ip.c | 25 ++++++++++++-
net/netfilter/nf_tables_api.c | 45 ++++++++++++----------
net/netfilter/nft_set_hash.c | 1 +
net/netfilter/nft_set_pipapo.c | 62 ++++++++++++++++++++++++++-----
net/netfilter/nft_set_pipapo.h | 2 +
net/netfilter/nft_set_rbtree.c | 8 ++--
7 files changed, 115 insertions(+), 35 deletions(-)
--
2.52.0
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
@ 2026-03-04 21:57 ` Pablo Neira Ayuso
2026-03-05 9:05 ` Florian Westphal
2026-03-05 12:21 ` Florian Westphal
From: Pablo Neira Ayuso @ 2026-03-04 21:57 UTC
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Hi Florian,
On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> Hi,
>
> The following patchset contains Netfilter fixes for *net*:
>
> 1) Fix a bug with vlan headers in the flowtable infrastructure.
> Existing code uses skb_vlan_push() helper, but that helper
> requires skb->data to point to the MAC header, which isn't the
> case for flowtables. Switch to a new helper, modeled on the
> existing PPPoE helper. From Eric Woudstra. This bug was added
> in v6.19-rc1.
In patch 1/4, why is this new function so different wrt. skb_vlan_push?
int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
{
	if (skb_vlan_tag_present(skb)) {
		int offset = skb->data - skb_mac_header(skb);
		int err;

		if (WARN_ONCE(offset,
			      "skb_vlan_push got skb with skb->data not at mac header (offset %d)\n",
			      offset)) {
			return -EINVAL;
		}

		err = __vlan_insert_tag(skb, skb->vlan_proto,
					skb_vlan_tag_get(skb));
		if (err)
			return err;

		skb->protocol = skb->vlan_proto;
		skb->network_header -= VLAN_HLEN;

		skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
	}
	__vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
In case there are two VLANs, the existing hwaccel tag gets pushed into
the VLAN header, and the outer VLAN becomes the one that is offloaded?
Is this reversed in this patch? The first VLAN tag is offloaded, then
the next one coming is pushed as a VLAN header?
Thanks.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 21:57 ` Pablo Neira Ayuso
@ 2026-03-05 9:05 ` Florian Westphal
2026-03-05 9:40 ` Pablo Neira Ayuso
From: Florian Westphal @ 2026-03-05 9:05 UTC
To: Pablo Neira Ayuso
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Florian,
>
> On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> > Hi,
> >
> > The following patchset contains Netfilter fixes for *net*:
> >
> > 1) Fix a bug with vlan headers in the flowtable infrastructure.
> > Existing code uses skb_vlan_push() helper, but that helper
> > requires skb->data to point to the MAC header, which isn't the
> > case for flowtables. Switch to a new helper, modeled on the
> > existing PPPoE helper. From Eric Woudstra. This bug was added
> > in v6.19-rc1.
>
> In patch 1/4, why is this new function so different wrt. skb_vlan_push?
>
I asked that to Eric when I reviewed this, and that was his reply:
--------------------------------------------------------------------
The code here for the inner header is an almost exact copy of
nf_flow_pppoe_push(), which was also implemented at the same time.
So handling pppoe and inner-vlan header is implemented in the same
manner, which keeps it simple and uniform. If one functions
(in)correctly, then so would the other.
I've been implementing handling the inner vlan header like this for a
half year now. My version of nf_flow_encap_push() was a bit different,
but after this patch it is quite similar.
--------------------------------------------------------------------
> skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> }
> __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
>
>
> In case there are two VLANs, the existing hwaccel tag gets pushed into
> the VLAN header, and the outer VLAN becomes the one that is offloaded?
>
> Is this reversed in this patch? The first VLAN tag is offloaded, then
> the next one coming is pushed as a VLAN header?
Yes, it looks broken. I wonder why we have no tests for this stuff.
First a vlan push function that cannot have worked, ever, now this
seemingly reversing-headers variant:
For PPPoE, it's pushing the PPPoE header into the packet, so we get
strict ordering: a later header coming in the stack gets placed on
top, before the older one.
Here, the first vlan push gets placed into the hw tag in the skb (which makes
sense, let HW take care of it).
But if a 2nd one comes along, then that gets placed in the packet
and the hwaccel tag remains?
What to do? Should we nuke vlan offload support from flowtable?
It appears to be an unused feature.
I have low confidence in this code.
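The ordering question discussed here can be modeled with a toy packet that has one hwaccel slot plus in-payload tags. Assumptions: a hwaccel tag ends up as the outermost tag on the wire, and each later push is meant to become the outer tag (the strict ordering described for PPPoE). Both push functions are hypothetical illustrations, not the flowtable code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TAGS 4

/* Toy packet: optional hwaccel tag (emitted outermost on transmit)
 * plus in-payload tags stored outermost first. */
struct toy_pkt {
	int has_hw;
	uint16_t hw_vid;
	uint16_t payload[MAX_TAGS];
	int n_payload;
};

static void payload_insert_outer(struct toy_pkt *p, uint16_t vid)
{
	assert(p->n_payload < MAX_TAGS);
	for (int i = p->n_payload; i > 0; i--)
		p->payload[i] = p->payload[i - 1];
	p->payload[0] = vid;
	p->n_payload++;
}

/* skb_vlan_push() style: an existing hwaccel tag is moved into the
 * payload first, so the newest tag always ends up outermost. */
static void push_like_skb_vlan_push(struct toy_pkt *p, uint16_t vid)
{
	if (p->has_hw)
		payload_insert_outer(p, p->hw_vid);
	p->has_hw = 1;
	p->hw_vid = vid;
}

/* The ordering questioned in the thread: the first tag stays in
 * hwaccel and later tags are inserted into the payload. */
static void push_first_stays_hw(struct toy_pkt *p, uint16_t vid)
{
	if (!p->has_hw) {
		p->has_hw = 1;
		p->hw_vid = vid;
		return;
	}
	payload_insert_outer(p, vid);
}

/* Flatten to on-wire order, outermost first; returns the tag count. */
static int wire_order(const struct toy_pkt *p, uint16_t *out)
{
	int n = 0;

	if (p->has_hw)
		out[n++] = p->hw_vid;
	for (int i = 0; i < p->n_payload; i++)
		out[n++] = p->payload[i];
	return n;
}
```

Pushing VID 10 and then VID 20 yields 20, 10 on the wire with the skb_vlan_push() style, but 10, 20 with the first-stays-in-hwaccel style, i.e. the reversed header order being asked about.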
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-05 9:05 ` Florian Westphal
@ 2026-03-05 9:40 ` Pablo Neira Ayuso
2026-03-05 12:20 ` Florian Westphal
From: Pablo Neira Ayuso @ 2026-03-05 9:40 UTC
To: Florian Westphal
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
On Thu, Mar 05, 2026 at 10:05:15AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Hi Florian,
> >
> > On Wed, Mar 04, 2026 at 06:29:36PM +0100, Florian Westphal wrote:
> > > Hi,
> > >
> > > The following patchset contains Netfilter fixes for *net*:
> > >
> > > 1) Fix a bug with vlan headers in the flowtable infrastructure.
> > > Existing code uses skb_vlan_push() helper, but that helper
> > > requires skb->data to point to the MAC header, which isn't the
> > > case for flowtables. Switch to a new helper, modeled on the
> > > existing PPPoE helper. From Eric Woudstra. This bug was added
> > > in v6.19-rc1.
> >
> > In patch 1/4, why is this new function so different wrt. skb_vlan_push?
> >
>
> I asked that to Eric when I reviewed this, and that was his reply:
> --------------------------------------------------------------------
> The code here for the inner header is an almost exact copy of
> nf_flow_pppoe_push(), which was also implemented at the same time.
> So handling pppoe and inner-vlan header is implemented in the same
> manner, which keeps it simple and uniform. If one functions
> (in)correctly, then so would the other.
>
> I've been implementing handling the inner vlan header like this for a
> half year now. My version of nf_flow_encap_push() was a bit different,
> but after this patch it is quite similar.
> --------------------------------------------------------------------
>
> > skb_postpush_rcsum(skb, skb->data + (2 * ETH_ALEN), VLAN_HLEN);
> > }
> > __vlan_hwaccel_put_tag(skb, vlan_proto, vlan_tci);
> >
> >
> > In case there are two VLANs, the existing hwaccel tag gets pushed into
> > the VLAN header, and the outer VLAN becomes the one that is offloaded?
> >
> > Is this reversed in this patch? The first VLAN tag is offloaded, then
> > the next one coming is pushed as a VLAN header?
>
> Yes, it looks broken. I wonder why we have no tests for this stuff.
> First a vlan push function that cannot have worked, ever, now this
> seemingly reversing-headers variant:
This used to work, I just accidentally broke it when using
skb_vlan_push() in net-next.
I will post a fix.
> For PPPoE, it's pushing the PPPoE header into the packet, so we get
> strict ordering: a later header coming in the stack gets placed on
> top, before the older one.
>
> Here, the first vlan push gets placed into the hw tag in the skb (which makes
> sense, let HW take care of it).
>
> But if a 2nd one comes along, then that gets placed in the packet
> and the hwaccel tag remains?
>
> What to do? Should we nuke vlan offload support from flowtable?
> It appears to be an unused feature.
>
> I have low confidence in this code.
Could you elaborate more precisely?
Thanks.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-05 9:40 ` Pablo Neira Ayuso
@ 2026-03-05 12:20 ` Florian Westphal
From: Florian Westphal @ 2026-03-05 12:20 UTC
To: Pablo Neira Ayuso
Cc: netdev, Paolo Abeni, David S. Miller, Eric Dumazet,
Jakub Kicinski, netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > Yes, it looks broken. I wonder why we have no tests for this stuff.
> > First a vlan push function that cannot have worked, ever, now this
> > seemingly reversing-headers variant:
>
> This used to work, I just accidentally broke it when using
> skb_vlan_push() in net-next.
>
> I will post a fix.
Ok, thanks.
> > For PPPoE, it's pushing the PPPoE header into the packet, so we get
> > strict ordering: a later header coming in the stack gets placed on
> > top, before the older one.
> >
> > Here, the first vlan push gets placed into the hw tag in the skb (which makes
> > sense, let HW take care of it).
> >
> > But if a 2nd one comes along, then that gets placed in the packet
> > and the hwaccel tag remains?
> >
> > What to do? Should we nuke vlan offload support from flowtable?
> > It appears to be an unused feature.
> >
> > I have low confidence in this code.
>
> Could you elaborate more precisely?
Add a bug in nf_queue -> kselftest will likely barf.
Add a bug in the nf_tables control plane -> nftables shell and/or
python tests will likely barf.
Add a bug in conntrack -> kselftest will likely barf.
Add a new bug in flowtable vlan -> nada.
I think we should refuse both new features and refactoring patches going
forward unless they come with an update to an existing kselftest, a
new kselftest, or a test in nftables.git.
* Re: [PATCH net 0/4] netfilter: updates for net
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2026-03-04 21:57 ` Pablo Neira Ayuso
@ 2026-03-05 12:21 ` Florian Westphal
1 sibling, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2026-03-05 12:21 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Florian Westphal <fw@strlen.de> wrote:
> 1) Fix a bug with vlan headers in the flowtable infrastructure.
> Existing code uses skb_vlan_push() helper, but that helper
> requires skb->data to point to the MAC header, which isn't the
> case for flowtables. Switch to a new helper, modeled on the
> existing PPPoE helper. From Eric Woudstra. This bug was added
> in v6.19-rc1.
Please toss this MR; I will create a new one in a few minutes,
axing this fix from the series.
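The approach described in the quoted fix (push the VLAN header relative to skb->data, which points at the network header in the flowtable path, rather than relying on skb->data pointing at the MAC header, in the style of the PPPoE push) can be sketched with a user-space buffer model. Everything here is hypothetical: struct buf_model, buf_push(), vlan_push_at_data() and the *_MODEL constants are illustrative names, not the helper from Eric Woudstra's patch. The model also assumes 0x8100 for every tag (no 802.1ad outer TPID) and that the Ethernet header, built later, takes its EtherType from the protocol field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP_MODEL    0x0800
#define ETH_P_8021Q_MODEL 0x8100
#define VLAN_HLEN_MODEL   4

/* Hypothetical stand-in for an skb: data points at the network header
 * with headroom in front of it; protocol is the EtherType that the
 * not-yet-built Ethernet header will carry. */
struct buf_model {
	unsigned char  storage[128];
	unsigned char *data;
	size_t         len;
	uint16_t       protocol;
};

static void put_be16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v >> 8);
	p[1] = (unsigned char)(v & 0xff);
}

/* Like skb_push(): grow the buffer at the front; NULL if no headroom. */
static unsigned char *buf_push(struct buf_model *b, size_t n)
{
	if ((size_t)(b->data - b->storage) < n)
		return NULL;
	b->data -= n;
	b->len += n;
	return b->data;
}

/* Push a 4-byte 802.1Q header (TCI + encapsulated EtherType) directly in
 * front of data, then advertise 802.1Q as the outer protocol. A tag
 * pushed later therefore sits in front of (outside) earlier ones. */
static int vlan_push_at_data(struct buf_model *b, uint16_t tci)
{
	unsigned char *hdr = buf_push(b, VLAN_HLEN_MODEL);

	if (!hdr)
		return -1;
	put_be16(hdr, tci);
	put_be16(hdr + 2, b->protocol);
	b->protocol = ETH_P_8021Q_MODEL;
	return 0;
}
```

Starting from an IPv4 payload and pushing TCI 10 and then TCI 20 gives the byte sequence 00 14 81 00 00 0a 08 00 in front of the network header, i.e. the same "later header on top" ordering as the PPPoE case.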
* [PATCH net 0/4] netfilter: updates for net
@ 2025-12-10 11:07 Florian Westphal
0 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2025-12-10 11:07 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net*:
1) Fix refcount leaks in nf_conncount, from Fernando Fernandez Mancera.
This addresses a recent regression that came in with the last -next
pull request.
2) Fix a null dereference in route error handling in IPVS, from Slavin
Liu. This is an ancient issue dating back to 5.1 days.
3) Always set ifindex in route tuple in the flowtable output path, from
Lorenzo Bianconi. This bug came in with the recent output path refactoring.
4) Prefer 'exit $ksft_xfail' over 'exit $ksft_skip' when we fail to
trigger the nat race condition needed to exercise the clash resolution
path in the selftest infra; $ksft_skip should be reserved for missing
tooling. From myself.
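Item 1 concerns references leaked on error paths. The general pattern can be sketched as below; this is a generic illustration, not the nf_conncount code, and struct obj_model, obj_get(), obj_put() and count_with_ref() are hypothetical names. Once a reference has been taken, every exit routes through one cleanup label, so an early error return cannot skip the put. For simplicity this sketch never hands the reference to the caller, so it is dropped on success too.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a conntrack entry. */
struct obj_model {
	int refcnt;
};

static void obj_get(struct obj_model *o)
{
	o->refcnt++;
}

static void obj_put(struct obj_model *o)
{
	assert(o->refcnt > 0);
	o->refcnt--;
}

/* Hypothetical operation: takes a reference, then may fail in a later
 * step. All exits after obj_get() go through out_put, so the reference
 * is always released. */
static int count_with_ref(struct obj_model *o, bool alloc_fails, bool limit_hit)
{
	int err = 0;

	obj_get(o);

	if (alloc_fails) {
		err = -ENOMEM;
		goto out_put;
	}
	if (limit_hit) {
		err = -EBUSY;
		goto out_put;
	}
	/* success: the object would be used here */
out_put:
	obj_put(o);
	return err;
}
```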
Please, pull these changes from:
The following changes since commit 6bcb7727d9e612011b70d64a34401688b986d6ab:
Merge branch 'inet-frags-flush-pending-skbs-in-fqdir_pre_exit' (2025-12-10 01:15:33 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-12-10
for you to fetch changes up to b8a81b0ce539e021ac72825238aea1eb657000f0:
selftests: netfilter: prefer xfail in case race wasn't triggered (2025-12-10 11:55:59 +0100)
----------------------------------------------------------------
netfilter pull request nf-25-12-10
----------------------------------------------------------------
Fernando Fernandez Mancera (1):
netfilter: nf_conncount: fix leaked ct in error paths
Florian Westphal (1):
selftests: netfilter: prefer xfail in case race wasn't triggered
Lorenzo Bianconi (1):
netfilter: always set route tuple out ifindex
Slavin Liu (1):
ipvs: fix ipv4 null-ptr-deref in route error path
net/netfilter/ipvs/ip_vs_xmit.c | 3 +++
net/netfilter/nf_conncount.c | 25 ++++++++++++----------
net/netfilter/nf_flow_table_path.c | 4 +++-
.../selftests/net/netfilter/conntrack_clash.sh | 9 ++++----
4 files changed, 24 insertions(+), 17 deletions(-)
* [PATCH net 0/4] netfilter: updates for net
@ 2023-10-18 12:55 Florian Westphal
0 siblings, 0 replies; 14+ messages in thread
From: Florian Westphal @ 2023-10-18 12:55 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel
Hello,
This series contains fixes for your *net* tree.
First patch, from Phil Sutter, reduces the number of audit notifications
emitted when userspace requests a reset of stateful objects.
This change also comes with a selftest update.
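The idea of the first patch (its shortlog reads "audit log object reset once per table") can be sketched as follows. This is a hypothetical model, not the nf_tables_api.c change: struct table_model, audit_log_reset(), reset_all() and the printed message format are invented, and a plain counter stands in for the audit subsystem. Instead of one record per reset object, at most one record per table is emitted, carrying the number of objects reset; skipping tables with nothing to reset is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table holding some number of stateful objects. */
struct table_model {
	const char   *name;
	unsigned int  nobjs;
};

static unsigned int audit_records; /* number of "audit" records emitted */

static void audit_log_reset(const char *table, unsigned int entries)
{
	audit_records++;
	printf("audit: table=%s op=reset entries=%u\n", table, entries);
}

/* Reset every object in every table, but emit at most one audit record
 * per table instead of one per object. */
static void reset_all(const struct table_model *tables, size_t ntables)
{
	for (size_t t = 0; t < ntables; t++) {
		unsigned int reset = 0;

		for (unsigned int i = 0; i < tables[t].nobjs; i++)
			reset++; /* the per-object reset would happen here */
		if (reset)
			audit_log_reset(tables[t].name, reset);
	}
}
```

With three tables holding 3, 0 and 2 objects, this emits two records instead of five.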
Second patch, also from Phil, moves the nftables audit selftest
to its own netns to avoid interference with the init netns.
Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree"
set backend: When set element X has expired, a request to delete element
X should fail (like with all other backends).
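The intended semantics of the third patch can be sketched with a linear scan standing in for the rbtree walk. All names here (struct elem_model, elem_expired(), set_deactivate()) are hypothetical, and the expiry rule (an element with a non-zero expiry is expired once now >= expires) is an assumption of the model. An expired element is treated as absent, so deleting it fails with -ENOENT, just as deleting a non-existent element would.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set element: key plus absolute expiry time (0 = never). */
struct elem_model {
	uint32_t key;
	uint64_t expires;
	bool     active;
};

static bool elem_expired(const struct elem_model *e, uint64_t now)
{
	return e->expires != 0 && now >= e->expires;
}

/* Deactivate (delete) the element matching key. An expired element is
 * handled as if it were absent, so the request fails with -ENOENT. */
static int set_deactivate(struct elem_model *elems, size_t n,
			  uint32_t key, uint64_t now)
{
	for (size_t i = 0; i < n; i++) {
		struct elem_model *e = &elems[i];

		if (!e->active || e->key != key)
			continue;
		if (elem_expired(e, now))
			return -ENOENT;
		e->active = false;
		return 0;
	}
	return -ENOENT;
}
```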
Finally, patch four, also from Pablo, reverts a recent attempt to speed
up abort of a large pending update with the "pipapo" set backend.
It could cause stray references to remain in the set, which then
results in a double-free.
The following changes since commit 2915240eddba96b37de4c7e9a3d0ac6f9548454b:
neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section (2023-10-18 11:16:43 +0100)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-10-18
for you to fetch changes up to f86fb94011aeb3b26337fc22204ca726aeb8bc24:
netfilter: nf_tables: revert do not remove elements if set backend implements .abort (2023-10-18 13:47:32 +0200)
----------------------------------------------------------------
netfilter pr 2023-18-10
----------------------------------------------------------------
Pablo Neira Ayuso (2):
netfilter: nft_set_rbtree: .deactivate fails if element has expired
netfilter: nf_tables: revert do not remove elements if set backend implements .abort
Phil Sutter (2):
netfilter: nf_tables: audit log object reset once per table
selftests: netfilter: Run nft_audit.sh in its own netns
net/netfilter/nf_tables_api.c | 55 ++++++++++++++------------
net/netfilter/nft_set_rbtree.c | 2 +
tools/testing/selftests/netfilter/nft_audit.sh | 52 ++++++++++++++++++++++++
3 files changed, 83 insertions(+), 26 deletions(-)
Thread overview: 14+ messages
2025-10-08 12:59 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2025-10-08 12:59 ` [PATCH net 1/4] netfilter: nft_objref: validate objref and objrefmap expressions Florian Westphal
2025-10-09 8:20 ` patchwork-bot+netdevbpf
2025-10-08 12:59 ` [PATCH net 2/4] bridge: br_vlan_fill_forward_path_pvid: use br_vlan_group_rcu() Florian Westphal
2025-10-08 12:59 ` [PATCH net 3/4] selftests: netfilter: nft_fib.sh: fix spurious test failures Florian Westphal
2025-10-08 12:59 ` [PATCH net 4/4] selftests: netfilter: query conntrack state to check for port clash resolution Florian Westphal
-- strict thread matches above, loose matches on Subject: below --
2026-03-04 17:29 [PATCH net 0/4] netfilter: updates for net Florian Westphal
2026-03-04 21:57 ` Pablo Neira Ayuso
2026-03-05 9:05 ` Florian Westphal
2026-03-05 9:40 ` Pablo Neira Ayuso
2026-03-05 12:20 ` Florian Westphal
2026-03-05 12:21 ` Florian Westphal
2025-12-10 11:07 Florian Westphal
2023-10-18 12:55 Florian Westphal