* [PATCH net 0/7] Netfilter updates for net
@ 2023-05-10 8:33 Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier Pablo Neira Ayuso
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
Hi,
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
Please, pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-05-10
Thanks.
----------------------------------------------------------------
The following changes since commit 582dbb2cc1a0a7427840f5b1e3c65608e511b061:
net: phy: bcm7xx: Correct read from expansion register (2023-05-09 20:25:52 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-05-10
for you to fetch changes up to 3acf8f6c14d0e42b889738d63b6d9cb63348fc94:
selftests: nft_flowtable.sh: check ingress/egress chain too (2023-05-10 09:31:07 +0200)
----------------------------------------------------------------
netfilter pull request 23-05-10
----------------------------------------------------------------
Boris Sukholitko (4):
selftests: nft_flowtable.sh: use /proc for pid checking
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: monitor result file sizes
Florian Westphal (3):
netfilter: nf_tables: always release netdev hooks from notifier
netfilter: conntrack: fix possible bug_on with enable_hooks=1
selftests: nft_flowtable.sh: check ingress/egress chain too
net/netfilter/core.c | 6 +-
net/netfilter/nf_conntrack_standalone.c | 3 +-
net/netfilter/nft_chain_filter.c | 9 +-
tools/testing/selftests/netfilter/nft_flowtable.sh | 145 ++++++++++++++++++++-
4 files changed, 151 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-11 2:20 ` patchwork-bot+netdevbpf
2023-05-10 8:33 ` [PATCH net 2/7] netfilter: conntrack: fix possible bug_on with enable_hooks=1 Pablo Neira Ayuso
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Florian Westphal <fw@strlen.de>
This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
The problem is that when a veth device is released, the veth release
callback will also queue the peer netns device for removal.
Its possible that the peer netns is also slated for removal. In this
case, the device memory is already released before the pre_exit hook of
the peer netns runs:
BUG: KASAN: slab-use-after-free in nf_hook_entry_head+0x1b8/0x1d0
Read of size 8 at addr ffff88812c0124f0 by task kworker/u8:1/45
Workqueue: netns cleanup_net
Call Trace:
nf_hook_entry_head+0x1b8/0x1d0
__nf_unregister_net_hook+0x76/0x510
nft_netdev_unregister_hooks+0xa0/0x220
__nft_release_hook+0x184/0x490
nf_tables_pre_exit_net+0x12f/0x1b0
..
Order is:
1. First netns is released, veth_dellink() queues peer netns device
for removal
2. peer netns is queued for removal
3. peer netns device is released, unreg event is triggered
4. unreg event is ignored because netns is going down
5. pre_exit hook calls nft_netdev_unregister_hooks but device memory
might be free'd already.
Fixes: 68a3765c659f ("netfilter: nf_tables: skip netdev events generated on netns removal")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_chain_filter.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index c3563f0be269..680fe557686e 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -344,6 +344,12 @@ static void nft_netdev_event(unsigned long event, struct net_device *dev,
return;
}
+ /* UNREGISTER events are also happening on netns exit.
+ *
+ * Although nf_tables core releases all tables/chains, only this event
+ * handler provides guarantee that hook->ops.dev is still accessible,
+ * so we cannot skip exiting net namespaces.
+ */
__nft_release_basechain(ctx);
}
@@ -362,9 +368,6 @@ static int nf_tables_netdev_event(struct notifier_block *this,
event != NETDEV_CHANGENAME)
return NOTIFY_DONE;
- if (!check_net(ctx.net))
- return NOTIFY_DONE;
-
nft_net = nft_pernet(ctx.net);
mutex_lock(&nft_net->commit_mutex);
list_for_each_entry(table, &nft_net->tables, list) {
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier
2023-05-10 8:33 ` [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier Pablo Neira Ayuso
@ 2023-05-11 2:20 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-05-11 2:20 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet
Hello:
This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:
On Wed, 10 May 2023 10:33:07 +0200 you wrote:
> From: Florian Westphal <fw@strlen.de>
>
> This reverts "netfilter: nf_tables: skip netdev events generated on netns removal".
>
> The problem is that when a veth device is released, the veth release
> callback will also queue the peer netns device for removal.
>
> [...]
Here is the summary with links:
- [net,1/7] netfilter: nf_tables: always release netdev hooks from notifier
https://git.kernel.org/netdev/net/c/dc1c9fd4a8bb
- [net,2/7] netfilter: conntrack: fix possible bug_on with enable_hooks=1
https://git.kernel.org/netdev/net/c/e72eeab542db
- [net,3/7] selftests: nft_flowtable.sh: use /proc for pid checking
https://git.kernel.org/netdev/net/c/0a11073e8e33
- [net,4/7] selftests: nft_flowtable.sh: no need for ps -x option
https://git.kernel.org/netdev/net/c/0749d670d758
- [net,5/7] selftests: nft_flowtable.sh: wait for specific nc pids
https://git.kernel.org/netdev/net/c/1114803c2da9
- [net,6/7] selftests: nft_flowtable.sh: monitor result file sizes
https://git.kernel.org/netdev/net/c/90ab51226d52
- [net,7/7] selftests: nft_flowtable.sh: check ingress/egress chain too
https://git.kernel.org/netdev/net/c/3acf8f6c14d0
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net 2/7] netfilter: conntrack: fix possible bug_on with enable_hooks=1
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 3/7] selftests: nft_flowtable.sh: use /proc for pid checking Pablo Neira Ayuso
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Florian Westphal <fw@strlen.de>
I received a bug report (no reproducer so far) where we trip over
712 rcu_read_lock();
713 ct_hook = rcu_dereference(nf_ct_hook);
714 BUG_ON(ct_hook == NULL); // here
In nf_conntrack_destroy().
First turn this BUG_ON into a WARN. I think it was triggered
via enable_hooks=1 flag.
When this flag is turned on, the conntrack hooks are registered
before nf_ct_hook pointer gets assigned.
This opens a short window where packets enter the conntrack machinery,
can have skb->_nfct set up and a subsequent kfree_skb might occur
before nf_ct_hook is set.
Call nf_conntrack_init_end() to set nf_ct_hook before we register the
pernet ops.
Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/core.c | 6 ++++--
net/netfilter/nf_conntrack_standalone.c | 3 ++-
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f0783e42108b..5f76ae86a656 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -711,9 +711,11 @@ void nf_conntrack_destroy(struct nf_conntrack *nfct)
rcu_read_lock();
ct_hook = rcu_dereference(nf_ct_hook);
- BUG_ON(ct_hook == NULL);
- ct_hook->destroy(nfct);
+ if (ct_hook)
+ ct_hook->destroy(nfct);
rcu_read_unlock();
+
+ WARN_ON(!ct_hook);
}
EXPORT_SYMBOL(nf_conntrack_destroy);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 57f6724c99a7..169e16fc2bce 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -1218,11 +1218,12 @@ static int __init nf_conntrack_standalone_init(void)
nf_conntrack_htable_size_user = nf_conntrack_htable_size;
#endif
+ nf_conntrack_init_end();
+
ret = register_pernet_subsys(&nf_conntrack_net_ops);
if (ret < 0)
goto out_pernet;
- nf_conntrack_init_end();
return 0;
out_pernet:
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net 3/7] selftests: nft_flowtable.sh: use /proc for pid checking
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 2/7] netfilter: conntrack: fix possible bug_on with enable_hooks=1 Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 4/7] selftests: nft_flowtable.sh: no need for ps -x option Pablo Neira Ayuso
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Boris Sukholitko <boris.sukholitko@broadcom.com>
Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
pid existence check.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
tools/testing/selftests/netfilter/nft_flowtable.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index 7060bae04ec8..4d8bc51b7a7b 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -288,11 +288,11 @@ test_tcp_forwarding_ip()
sleep 3
- if ps -p $lpid > /dev/null;then
+ if test -d /proc/"$lpid"/; then
kill $lpid
fi
- if ps -p $cpid > /dev/null;then
+ if test -d /proc/"$cpid"/; then
kill $cpid
fi
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net 4/7] selftests: nft_flowtable.sh: no need for ps -x option
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
` (2 preceding siblings ...)
2023-05-10 8:33 ` [PATCH net 3/7] selftests: nft_flowtable.sh: use /proc for pid checking Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 5/7] selftests: nft_flowtable.sh: wait for specific nc pids Pablo Neira Ayuso
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Boris Sukholitko <boris.sukholitko@broadcom.com>
Some ps commands (e.g. busybox derived) have no -x option. For the
purposes of hash calculation of the list of processes this option is
inessential.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
tools/testing/selftests/netfilter/nft_flowtable.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index 4d8bc51b7a7b..3cf20e9bd3a6 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -489,8 +489,8 @@ ip -net $nsr1 addr add 10.0.1.1/24 dev veth0
ip -net $nsr1 addr add dead:1::1/64 dev veth0
ip -net $nsr1 link set up dev veth0
-KEY_SHA="0x"$(ps -xaf | sha1sum | cut -d " " -f 1)
-KEY_AES="0x"$(ps -xaf | md5sum | cut -d " " -f 1)
+KEY_SHA="0x"$(ps -af | sha1sum | cut -d " " -f 1)
+KEY_AES="0x"$(ps -af | md5sum | cut -d " " -f 1)
SPI1=$RANDOM
SPI2=$RANDOM
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net 5/7] selftests: nft_flowtable.sh: wait for specific nc pids
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
` (3 preceding siblings ...)
2023-05-10 8:33 ` [PATCH net 4/7] selftests: nft_flowtable.sh: no need for ps -x option Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 6/7] selftests: nft_flowtable.sh: monitor result file sizes Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 7/7] selftests: nft_flowtable.sh: check ingress/egress chain too Pablo Neira Ayuso
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Boris Sukholitko <boris.sukholitko@broadcom.com>
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
tools/testing/selftests/netfilter/nft_flowtable.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index 3cf20e9bd3a6..92bc308bf168 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -296,7 +296,8 @@ test_tcp_forwarding_ip()
kill $cpid
fi
- wait
+ wait $lpid
+ wait $cpid
if ! check_transfer "$nsin" "$ns2out" "ns1 -> ns2"; then
lret=1
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net 6/7] selftests: nft_flowtable.sh: monitor result file sizes
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
` (4 preceding siblings ...)
2023-05-10 8:33 ` [PATCH net 5/7] selftests: nft_flowtable.sh: wait for specific nc pids Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 7/7] selftests: nft_flowtable.sh: check ingress/egress chain too Pablo Neira Ayuso
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Boris Sukholitko <boris.sukholitko@broadcom.com>
When running nft_flowtable.sh in VM on a busy server we've found that
the time of the netcat file transfers vary wildly.
Therefore replace hardcoded 3 second sleep with the loop checking for
a change in the file sizes. Once no change in detected we test the results.
Nice side effect is that we shave 1 second sleep in the fast case
(hard-coded 3 second sleep vs two 1 second sleeps).
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
tools/testing/selftests/netfilter/nft_flowtable.sh | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index 92bc308bf168..51f986f19fee 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -286,7 +286,15 @@ test_tcp_forwarding_ip()
ip netns exec $nsa nc -w 4 "$dstip" "$dstport" < "$nsin" > "$ns1out" &
cpid=$!
- sleep 3
+ sleep 1
+
+ prev="$(ls -l $ns1out $ns2out)"
+ sleep 1
+
+ while [[ "$prev" != "$(ls -l $ns1out $ns2out)" ]]; do
+ sleep 1;
+ prev="$(ls -l $ns1out $ns2out)"
+ done
if test -d /proc/"$lpid"/; then
kill $lpid
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH net 7/7] selftests: nft_flowtable.sh: check ingress/egress chain too
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
` (5 preceding siblings ...)
2023-05-10 8:33 ` [PATCH net 6/7] selftests: nft_flowtable.sh: monitor result file sizes Pablo Neira Ayuso
@ 2023-05-10 8:33 ` Pablo Neira Ayuso
6 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
From: Florian Westphal <fw@strlen.de>
Make sure flowtable interacts correctly with ingress and egress
chains, i.e. those get handled before and after flow table respectively.
Adds three more tests:
1. repeat flowtable test, but with 'ip dscp set cs3' done in
inet forward chain.
Expect that some packets have been mangled (before flowtable offload
became effective) while some pass without mangling (after offload
succeeds).
2. repeat flowtable test, but with 'ip dscp set cs3' done in
veth0:ingress.
Expect that all packets pass with cs3 dscp field.
3. same as 2, but use veth1:egress. Expect the same outcome.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
.../selftests/netfilter/nft_flowtable.sh | 124 ++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index 51f986f19fee..a32f490f7539 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -188,6 +188,26 @@ if [ $? -ne 0 ]; then
exit $ksft_skip
fi
+ip netns exec $ns2 nft -f - <<EOF
+table inet filter {
+ counter ip4dscp0 { }
+ counter ip4dscp3 { }
+
+ chain input {
+ type filter hook input priority 0; policy accept;
+ meta l4proto tcp goto {
+ ip dscp cs3 counter name ip4dscp3 accept
+ ip dscp 0 counter name ip4dscp0 accept
+ }
+ }
+}
+EOF
+
+if [ $? -ne 0 ]; then
+ echo "SKIP: Could not load nft ruleset"
+ exit $ksft_skip
+fi
+
# test basic connectivity
if ! ip netns exec $ns1 ping -c 1 -q 10.0.2.99 > /dev/null; then
echo "ERROR: $ns1 cannot reach ns2" 1>&2
@@ -255,6 +275,60 @@ check_counters()
fi
}
+check_dscp()
+{
+ local what=$1
+ local ok=1
+
+ local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp3 | grep packets)
+
+ local pc4=${counter%*bytes*}
+ local pc4=${pc4#*packets}
+
+ local counter=$(ip netns exec $ns2 nft reset counter inet filter ip4dscp0 | grep packets)
+ local pc4z=${counter%*bytes*}
+ local pc4z=${pc4z#*packets}
+
+ case "$what" in
+ "dscp_none")
+ if [ $pc4 -gt 0 ] || [ $pc4z -eq 0 ]; then
+ echo "FAIL: dscp counters do not match, expected dscp3 == 0, dscp0 > 0, but got $pc4,$pc4z" 1>&2
+ ret=1
+ ok=0
+ fi
+ ;;
+ "dscp_fwd")
+ if [ $pc4 -eq 0 ] || [ $pc4z -eq 0 ]; then
+ echo "FAIL: dscp counters do not match, expected dscp3 and dscp0 > 0 but got $pc4,$pc4z" 1>&2
+ ret=1
+ ok=0
+ fi
+ ;;
+ "dscp_ingress")
+ if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+ echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+ ret=1
+ ok=0
+ fi
+ ;;
+ "dscp_egress")
+ if [ $pc4 -eq 0 ] || [ $pc4z -gt 0 ]; then
+ echo "FAIL: dscp counters do not match, expected dscp3 > 0, dscp0 == 0 but got $pc4,$pc4z" 1>&2
+ ret=1
+ ok=0
+ fi
+ ;;
+ *)
+ echo "FAIL: Unknown DSCP check" 1>&2
+ ret=1
+ ok=0
+ esac
+
+ if [ $ok -eq 1 ] ;then
+ echo "PASS: $what: dscp packet counters match"
+ fi
+}
+
check_transfer()
{
in=$1
@@ -325,6 +399,51 @@ test_tcp_forwarding()
return $?
}
+test_tcp_forwarding_set_dscp()
+{
+ check_dscp "dscp_none"
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+ chain setdscp0 {
+ type filter hook ingress device "veth0" priority 0; policy accept
+ ip dscp set cs3
+ }
+}
+EOF
+if [ $? -eq 0 ]; then
+ test_tcp_forwarding_ip "$1" "$2" 10.0.2.99 12345
+ check_dscp "dscp_ingress"
+
+ ip netns exec $nsr1 nft delete table netdev dscpmangle
+else
+ echo "SKIP: Could not load netdev:ingress for veth0"
+fi
+
+ip netns exec $nsr1 nft -f - <<EOF
+table netdev dscpmangle {
+ chain setdscp0 {
+ type filter hook egress device "veth1" priority 0; policy accept
+ ip dscp set cs3
+ }
+}
+EOF
+if [ $? -eq 0 ]; then
+ test_tcp_forwarding_ip "$1" "$2" 10.0.2.99 12345
+ check_dscp "dscp_egress"
+
+ ip netns exec $nsr1 nft flush table netdev dscpmangle
+else
+ echo "SKIP: Could not load netdev:egress for veth1"
+fi
+
+ # partial. If flowtable really works, then both dscp-is-0 and dscp-is-cs3
+ # counters should have seen packets (before and after ft offload kicks in).
+ ip netns exec $nsr1 nft -a insert rule inet filter forward ip dscp set cs3
+ test_tcp_forwarding_ip "$1" "$2" 10.0.2.99 12345
+ check_dscp "dscp_fwd"
+}
+
test_tcp_forwarding_nat()
{
local lret
@@ -394,6 +513,11 @@ table ip nat {
}
EOF
+if ! test_tcp_forwarding_set_dscp $ns1 $ns2 0 ""; then
+ echo "FAIL: flow offload for ns1/ns2 with dscp update" 1>&2
+ exit 0
+fi
+
if ! test_tcp_forwarding_nat $ns1 $ns2 0 ""; then
echo "FAIL: flow offload for ns1/ns2 with NAT" 1>&2
ip netns exec $nsr1 nft list ruleset
--
2.30.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH net 0/7] netfilter updates for net
@ 2023-10-12 8:57 Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2023-10-12 8:57 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel
Hello,
The following contains patches for your *net* tree.
Patch 1, from Pablo Neira Ayuso, fixes a performance regression
(since 6.4) when a large pending set update has to be canceled towards
the end of the transaction.
Patch 2 from myself, silences an incorrect compiler warning reported
with a few (older) compiler toolchains.
Patch 3, from Kees Cook, adds __counted_by annotation to
nft_pipapo set backend type. I took this for net instead of -next
given infra is already in place and no actual code change is made.
Patch 4, from Pablo Neira Ayso, disables timeout resets on
stateful element reset. The rest should only affect internal object
state, e.g. reset a quota or counter, but not affect a pending timeout.
Patches 5 and 6 fix NULL dereferences in 'inner header' match,
control plane doesn't test for netlink attribute presence before
accessing them. Broken since feature was added in 6.2, fixes from
Xingyuan Mo.
Last patch, from myself, fixes a bogus rule match when skb has
a 0-length mac header, in this case we'd fetch data from network
header instead of canceling rule evaluation. This is a day 0 bug,
present since nftables was merged in 3.13.
The following changes since commit 50e492143374c17ad89c865a1a44837b3f5c8226:
octeontx2-pf: Fix page pool frag allocation warning (2023-10-12 09:48:51 +0200)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-10-12
for you to fetch changes up to d351c1ea2de3e36e608fc355d8ae7d0cc80e6cd6:
netfilter: nft_payload: fix wrong mac header matching (2023-10-12 10:28:45 +0200)
----------------------------------------------------------------
nf pull request 2023-10-12
----------------------------------------------------------------
Florian Westphal (2):
netfilter: nfnetlink_log: silence bogus compiler warning
netfilter: nft_payload: fix wrong mac header matching
Kees Cook (1):
netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by
Pablo Neira Ayuso (2):
netfilter: nf_tables: do not remove elements if set backend implements .abort
netfilter: nf_tables: do not refresh timeout when resetting element
Xingyuan Mo (2):
nf_tables: fix NULL pointer dereference in nft_inner_init()
nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()
net/netfilter/nf_tables_api.c | 25 ++++++++++---------------
net/netfilter/nfnetlink_log.c | 2 +-
net/netfilter/nft_inner.c | 1 +
net/netfilter/nft_payload.c | 2 +-
net/netfilter/nft_set_pipapo.h | 2 +-
5 files changed, 14 insertions(+), 18 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH net 0/7] netfilter: updates for net
@ 2025-09-10 19:03 Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net*:
WARNING: This results in a conflict on net -> net-next merge.
Merge resolution walkthrough is at the end of this cover letter, see
MERGE WALKTHROUGH.
Merge branch 'mptcp-misc-fixes-for-v6-17-rc6' (2025-09-09 18:39:55 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-09-10-v2
for you to fetch changes up to 37a9675e61a2a2a721a28043ffdf2c8ec81eba37:
MAINTAINERS: add Phil as netfilter reviewer (2025-09-10 20:32:46 +0200)
First patch adds a lockdep annotation for a false-positive splat.
Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS.
Rest of the patches resolve spurious false negative results during set
lookups while another CPU is processing a transaction.
This has been broken at least since v4.18 when an unconditional
synchronize_rcu call was removed from the commit phase of nf_tables.
Quoting from Stefan Hanreichs original report:
It seems like we've found an issue with atomicity when reloading
nftables rulesets. Sometimes there is a small window where rules
containing sets do not seem to apply to incoming traffic, due to the set
apparently being empty for a short amount of time when flushing / adding
elements.
Exanple ruleset:
table ip filter {
set match {
type ipv4_addr
flags interval
elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
}
chain pre {
type filter hook prerouting priority filter; policy accept;
ip saddr @match accept
counter comment "must never match"
}
}
Reproducer transaction:
while true:
nft -f -<<EOF
flush set ip filter match
create element ip filter match { \
0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
EOF
done
Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10.
Once in a while the counter will increment even though the
'ip saddr @match' rule should have accepted the packet.
See individual patches for details.
Thanks to Stefan Hanreich for an initial description and reproducer for
this bug and to Pablo Neira Ayuso for reviewing earlier iterations of
the patchset.
Florian Westphal (7):
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nf_tables: place base_seq in struct net
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: restart set lookup on base_seq change
MAINTAINERS: add Phil as netfilter reviewer
MAINTAINERS | 1 +
include/net/netfilter/nf_tables.h | 1 -
include/net/netfilter/nf_tables_core.h | 10 +---
include/net/netns/nftables.h | 1 +
net/netfilter/nf_tables_api.c | 66 +++++++++++++-------------
net/netfilter/nft_lookup.c | 46 ++++++++++++++++--
net/netfilter/nft_set_bitmap.c | 3 +-
net/netfilter/nft_set_pipapo.c | 20 +++++++-
net/netfilter/nft_set_pipapo_avx2.c | 4 +-
net/netfilter/nft_set_rbtree.c | 6 +--
10 files changed, 103 insertions(+), 55 deletions(-)
MERGE WALKTHROUGH:
When merging this to net-next, you should see following:
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo.c
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo_avx2.c
Instructions for net/netfilter/nft_set_pipapo.c:
@@@ -562,7 -539,7 +578,11 @@@ nft_pipapo_lookup(const struct net *net
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
++<<<<<<< HEAD
+ e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
++=======
+ e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
return e ? &e->ext : NULL;
}
Take the HEAD chunk, with 'genmask' replaced by NFT_GENMASK_ANY, i.e.:
e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
Instructions for net/netfilter/nft_set_pipapo_avx2.c:
++<<<<<<< HEAD
++=======
+ const struct nft_pipapo_match *m;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk, i.e. delete 'const struct nft_pipapo_match *m;':
In -next, this is passed as function argument.
++<<<<<<< HEAD
+ if (ret < 0) {
+ scratch->map_index = map_index;
+ kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return NULL;
++=======
+ if (ret < 0)
+ goto out;
+
+ if (last) {
+ const struct nft_set_ext *e = &f->mt[ret].e->ext;
+
+ if (unlikely(nft_set_elem_expired(e)))
+ goto next_match;
+
+ ext = e;
+ goto out;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk and discard the other; including if (last) { branch.
Then, in nft_pipapo_avx2_lookup(), make this change:
@@ -1274,9 +1273,8 @@
nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1292,9 +1290,9 @@
}
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
After this change, you are done.
The expected diff vs the net-next main branch in these two files is:
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -549,6 +549,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
*
* This function is called from the data path. It will search for
* an element matching the given key in the current active copy.
+ * Unlike other set types, this uses NFT_GENMASK_ANY instead of
+ * nft_genmask_cur().
[trimmed rest of comment]
*
* Return: ntables API extension pointer or NULL if no match.
*/
@@ -557,12 +574,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
- e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
+ e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL;
}
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1275,7 +1275,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1293,7 +1292,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
--
2.49.1
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH net 0/7] netfilter updates for net
@ 2026-04-08 16:35 Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi.
This pull requests contain netfilter fixes for the *net* tree.
I only included crash fixes, as we're closer to a release, rest will
be handled via -next.
1) Fix a NULL pointer dereference in ip_vs_add_service error path, from
Weiming Shi, bug added in 6.2 development cycle.
2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log
needs to init the trailing NLMSG_DONE terminator. From Xiang Mei.
3) xt_multiport match lacks range validation, bogus userspace request will
cause out-of-bounds read. From Ren Wei.
4) ip6t_eui64 match must reject packets with invalid mac header before
calling eth_hdr. Make existing check unconditional. From Zhengchuan
Liang.
5) nft_ct timeout policies are free'd via kfree() while they may still
be reachable by other cpus that process a conntrack object that
uses such a timeout policy. Existing reaping of entries is not
sufficient because it doesn't wait for a grace period. Use kfree_rcu().
From Tuan Do.
6/7) Make nfnetlink_queue hash table per queue. As-is we can hit a page
fault in case underlying page of removed element was free'd. Per-queue
hash prevents parallel lookups. This comes with a test case that
demonstrates the bug, from Fernando Fernandez Mancera.
Please, pull these changes from:
The following changes since commit f821664dde29302e8450aa0597bf1e4c7c5b0a22:
Merge branch 'seg6-fix-dst_cache-sharing-in-seg6-lwtunnel' (2026-04-07 20:21:00 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-04-08
for you to fetch changes up to dde1a6084c5ca9d143a562540d5453454d79ea15:
selftests: nft_queue.sh: add a parallel stress test (2026-04-08 13:34:51 +0200)
----------------------------------------------------------------
netfilter pull request nf-26-04-08
----------------------------------------------------------------
Fernando Fernandez Mancera (1):
selftests: nft_queue.sh: add a parallel stress test
Florian Westphal (1):
netfilter: nfnetlink_queue: make hash table per queue
Ren Wei (1):
netfilter: xt_multiport: validate range encoding in checkentry
Tuan Do (1):
netfilter: nft_ct: fix use-after-free in timeout object destroy
Weiming Shi (1):
ipvs: fix NULL deref in ip_vs_add_service error path
Xiang Mei (1):
netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
Zhengchuan Liang (1):
netfilter: ip6t_eui64: reject invalid MAC header for all packets
include/net/netfilter/nf_conntrack_timeout.h | 1 +
include/net/netfilter/nf_queue.h | 1 -
net/ipv6/netfilter/ip6t_eui64.c | 3 +-
net/netfilter/ipvs/ip_vs_ctl.c | 1 -
net/netfilter/nfnetlink_log.c | 8 +-
net/netfilter/nfnetlink_queue.c | 139 ++++++------------
net/netfilter/nft_ct.c | 2 +-
net/netfilter/xt_multiport.c | 34 ++++-
.../selftests/net/netfilter/nf_queue.c | 50 ++++++-
.../selftests/net/netfilter/nft_queue.sh | 83 +++++++++--
10 files changed, 201 insertions(+), 121 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-08 16:35 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-10 8:33 [PATCH net 0/7] Netfilter updates for net Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 1/7] netfilter: nf_tables: always release netdev hooks from notifier Pablo Neira Ayuso
2023-05-11 2:20 ` patchwork-bot+netdevbpf
2023-05-10 8:33 ` [PATCH net 2/7] netfilter: conntrack: fix possible bug_on with enable_hooks=1 Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 3/7] selftests: nft_flowtable.sh: use /proc for pid checking Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 4/7] selftests: nft_flowtable.sh: no need for ps -x option Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 5/7] selftests: nft_flowtable.sh: wait for specific nc pids Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 6/7] selftests: nft_flowtable.sh: monitor result file sizes Pablo Neira Ayuso
2023-05-10 8:33 ` [PATCH net 7/7] selftests: nft_flowtable.sh: check ingress/egress chain too Pablo Neira Ayuso
-- strict thread matches above, loose matches on Subject: below --
2023-10-12 8:57 [PATCH net 0/7] netfilter updates for net Florian Westphal
2025-09-10 19:03 [PATCH net 0/7] netfilter: " Florian Westphal
2026-04-08 16:35 [PATCH net 0/7] netfilter " Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox