* [PATCH net 0/7] Netfilter updates for net
@ 2023-05-10 8:33 Pablo Neira Ayuso
0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10 8:33 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet
Hi,
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
Please, pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-05-10
Thanks.
----------------------------------------------------------------
The following changes since commit 582dbb2cc1a0a7427840f5b1e3c65608e511b061:
net: phy: bcm7xx: Correct read from expansion register (2023-05-09 20:25:52 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-05-10
for you to fetch changes up to 3acf8f6c14d0e42b889738d63b6d9cb63348fc94:
selftests: nft_flowtable.sh: check ingress/egress chain too (2023-05-10 09:31:07 +0200)
----------------------------------------------------------------
netfilter pull request 23-05-10
----------------------------------------------------------------
Boris Sukholitko (4):
selftests: nft_flowtable.sh: use /proc for pid checking
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: monitor result file sizes
Florian Westphal (3):
netfilter: nf_tables: always release netdev hooks from notifier
netfilter: conntrack: fix possible bug_on with enable_hooks=1
selftests: nft_flowtable.sh: check ingress/egress chain too
net/netfilter/core.c | 6 +-
net/netfilter/nf_conntrack_standalone.c | 3 +-
net/netfilter/nft_chain_filter.c | 9 +-
tools/testing/selftests/netfilter/nft_flowtable.sh | 145 ++++++++++++++++++++-
4 files changed, 151 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net 0/7] netfilter updates for net
@ 2023-10-12 8:57 Florian Westphal
0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2023-10-12 8:57 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel
Hello,
The following contains patches for your *net* tree.
Patch 1, from Pablo Neira Ayuso, fixes a performance regression
(since 6.4) when a large pending set update has to be canceled towards
the end of the transaction.
Patch 2 from myself, silences an incorrect compiler warning reported
with a few (older) compiler toolchains.
Patch 3, from Kees Cook, adds __counted_by annotation to
nft_pipapo set backend type. I took this for net instead of -next
given infra is already in place and no actual code change is made.
Patch 4, from Pablo Neira Ayso, disables timeout resets on
stateful element reset. The rest should only affect internal object
state, e.g. reset a quota or counter, but not affect a pending timeout.
Patches 5 and 6 fix NULL dereferences in 'inner header' match,
control plane doesn't test for netlink attribute presence before
accessing them. Broken since feature was added in 6.2, fixes from
Xingyuan Mo.
Last patch, from myself, fixes a bogus rule match when skb has
a 0-length mac header, in this case we'd fetch data from network
header instead of canceling rule evaluation. This is a day 0 bug,
present since nftables was merged in 3.13.
The following changes since commit 50e492143374c17ad89c865a1a44837b3f5c8226:
octeontx2-pf: Fix page pool frag allocation warning (2023-10-12 09:48:51 +0200)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-10-12
for you to fetch changes up to d351c1ea2de3e36e608fc355d8ae7d0cc80e6cd6:
netfilter: nft_payload: fix wrong mac header matching (2023-10-12 10:28:45 +0200)
----------------------------------------------------------------
nf pull request 2023-10-12
----------------------------------------------------------------
Florian Westphal (2):
netfilter: nfnetlink_log: silence bogus compiler warning
netfilter: nft_payload: fix wrong mac header matching
Kees Cook (1):
netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by
Pablo Neira Ayuso (2):
netfilter: nf_tables: do not remove elements if set backend implements .abort
netfilter: nf_tables: do not refresh timeout when resetting element
Xingyuan Mo (2):
nf_tables: fix NULL pointer dereference in nft_inner_init()
nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()
net/netfilter/nf_tables_api.c | 25 ++++++++++---------------
net/netfilter/nfnetlink_log.c | 2 +-
net/netfilter/nft_inner.c | 1 +
net/netfilter/nft_payload.c | 2 +-
net/netfilter/nft_set_pipapo.h | 2 +-
5 files changed, 14 insertions(+), 18 deletions(-)
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net 0/7] netfilter: updates for net
@ 2025-09-10 19:03 Florian Westphal
0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net*:
WARNING: This results in a conflict on net -> net-next merge.
Merge resolution walkthrough is at the end of this cover letter, see
MERGE WALKTHROUGH.
Merge branch 'mptcp-misc-fixes-for-v6-17-rc6' (2025-09-09 18:39:55 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-09-10-v2
for you to fetch changes up to 37a9675e61a2a2a721a28043ffdf2c8ec81eba37:
MAINTAINERS: add Phil as netfilter reviewer (2025-09-10 20:32:46 +0200)
First patch adds a lockdep annotation for a false-positive splat.
Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS.
Rest of the patches resolve spurious false negative results during set
lookups while another CPU is processing a transaction.
This has been broken at least since v4.18 when an unconditional
synchronize_rcu call was removed from the commit phase of nf_tables.
Quoting from Stefan Hanreichs original report:
It seems like we've found an issue with atomicity when reloading
nftables rulesets. Sometimes there is a small window where rules
containing sets do not seem to apply to incoming traffic, due to the set
apparently being empty for a short amount of time when flushing / adding
elements.
Exanple ruleset:
table ip filter {
set match {
type ipv4_addr
flags interval
elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
}
chain pre {
type filter hook prerouting priority filter; policy accept;
ip saddr @match accept
counter comment "must never match"
}
}
Reproducer transaction:
while true:
nft -f -<<EOF
flush set ip filter match
create element ip filter match { \
0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
EOF
done
Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10.
Once in a while the counter will increment even though the
'ip saddr @match' rule should have accepted the packet.
See individual patches for details.
Thanks to Stefan Hanreich for an initial description and reproducer for
this bug and to Pablo Neira Ayuso for reviewing earlier iterations of
the patchset.
Florian Westphal (7):
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nf_tables: place base_seq in struct net
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: restart set lookup on base_seq change
MAINTAINERS: add Phil as netfilter reviewer
MAINTAINERS | 1 +
include/net/netfilter/nf_tables.h | 1 -
include/net/netfilter/nf_tables_core.h | 10 +---
include/net/netns/nftables.h | 1 +
net/netfilter/nf_tables_api.c | 66 +++++++++++++-------------
net/netfilter/nft_lookup.c | 46 ++++++++++++++++--
net/netfilter/nft_set_bitmap.c | 3 +-
net/netfilter/nft_set_pipapo.c | 20 +++++++-
net/netfilter/nft_set_pipapo_avx2.c | 4 +-
net/netfilter/nft_set_rbtree.c | 6 +--
10 files changed, 103 insertions(+), 55 deletions(-)
MERGE WALKTHROUGH:
When merging this to net-next, you should see following:
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo.c
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo_avx2.c
Instructions for net/netfilter/nft_set_pipapo.c:
@@@ -562,7 -539,7 +578,11 @@@ nft_pipapo_lookup(const struct net *net
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
++<<<<<<< HEAD
+ e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
++=======
+ e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
return e ? &e->ext : NULL;
}
Take the HEAD chunk, with 'genmask' replaced by NFT_GENMASK_ANY, i.e.:
e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
Instructions for net/netfilter/nft_set_pipapo_avx2.c:
++<<<<<<< HEAD
++=======
+ const struct nft_pipapo_match *m;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk, i.e. delete 'const struct nft_pipapo_match *m;':
In -next, this is passed as function argument.
++<<<<<<< HEAD
+ if (ret < 0) {
+ scratch->map_index = map_index;
+ kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return NULL;
++=======
+ if (ret < 0)
+ goto out;
+
+ if (last) {
+ const struct nft_set_ext *e = &f->mt[ret].e->ext;
+
+ if (unlikely(nft_set_elem_expired(e)))
+ goto next_match;
+
+ ext = e;
+ goto out;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk and discard the other; including if (last) { branch.
Then, in nft_pipapo_avx2_lookup(), make this change:
@@ -1274,9 +1273,8 @@
nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1292,9 +1290,9 @@
}
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
After this change, you are done.
The expected diff vs the net-next main branch in these two files is:
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -549,6 +549,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
*
* This function is called from the data path. It will search for
* an element matching the given key in the current active copy.
+ * Unlike other set types, this uses NFT_GENMASK_ANY instead of
+ * nft_genmask_cur().
[trimmed rest of comment]
*
* Return: ntables API extension pointer or NULL if no match.
*/
@@ -557,12 +574,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
- e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
+ e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL;
}
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1275,7 +1275,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1293,7 +1292,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
--
2.49.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net 0/7] netfilter updates for net
@ 2026-04-08 16:35 Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi.
This pull requests contain netfilter fixes for the *net* tree.
I only included crash fixes, as we're closer to a release, rest will
be handled via -next.
1) Fix a NULL pointer dereference in ip_vs_add_service error path, from
Weiming Shi, bug added in 6.2 development cycle.
2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log
needs to init the trailing NLMSG_DONE terminator. From Xiang Mei.
3) xt_multiport match lacks range validation, bogus userspace request will
cause out-of-bounds read. From Ren Wei.
4) ip6t_eui64 match must reject packets with invalid mac header before
calling eth_hdr. Make existing check unconditional. From Zhengchuan
Liang.
5) nft_ct timeout policies are free'd via kfree() while they may still
be reachable by other cpus that process a conntrack object that
uses such a timeout policy. Existing reaping of entries is not
sufficient because it doesn't wait for a grace period. Use kfree_rcu().
From Tuan Do.
6/7) Make nfnetlink_queue hash table per queue. As-is we can hit a page
fault in case underlying page of removed element was free'd. Per-queue
hash prevents parallel lookups. This comes with a test case that
demonstrates the bug, from Fernando Fernandez Mancera.
Please, pull these changes from:
The following changes since commit f821664dde29302e8450aa0597bf1e4c7c5b0a22:
Merge branch 'seg6-fix-dst_cache-sharing-in-seg6-lwtunnel' (2026-04-07 20:21:00 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-04-08
for you to fetch changes up to dde1a6084c5ca9d143a562540d5453454d79ea15:
selftests: nft_queue.sh: add a parallel stress test (2026-04-08 13:34:51 +0200)
----------------------------------------------------------------
netfilter pull request nf-26-04-08
----------------------------------------------------------------
Fernando Fernandez Mancera (1):
selftests: nft_queue.sh: add a parallel stress test
Florian Westphal (1):
netfilter: nfnetlink_queue: make hash table per queue
Ren Wei (1):
netfilter: xt_multiport: validate range encoding in checkentry
Tuan Do (1):
netfilter: nft_ct: fix use-after-free in timeout object destroy
Weiming Shi (1):
ipvs: fix NULL deref in ip_vs_add_service error path
Xiang Mei (1):
netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
Zhengchuan Liang (1):
netfilter: ip6t_eui64: reject invalid MAC header for all packets
include/net/netfilter/nf_conntrack_timeout.h | 1 +
include/net/netfilter/nf_queue.h | 1 -
net/ipv6/netfilter/ip6t_eui64.c | 3 +-
net/netfilter/ipvs/ip_vs_ctl.c | 1 -
net/netfilter/nfnetlink_log.c | 8 +-
net/netfilter/nfnetlink_queue.c | 139 ++++++------------
net/netfilter/nft_ct.c | 2 +-
net/netfilter/xt_multiport.c | 34 ++++-
.../selftests/net/netfilter/nf_queue.c | 50 ++++++-
.../selftests/net/netfilter/nft_queue.sh | 83 +++++++++--
10 files changed, 201 insertions(+), 121 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Weiming Shi <bestswngs@gmail.com>
When ip_vs_bind_scheduler() succeeds in ip_vs_add_service(), the local
variable sched is set to NULL. If ip_vs_start_estimator() subsequently
fails, the out_err cleanup calls ip_vs_unbind_scheduler(svc, sched)
with sched == NULL. ip_vs_unbind_scheduler() passes the cur_sched NULL
check (because svc->scheduler was set by the successful bind) but then
dereferences the NULL sched parameter at sched->done_service, causing a
kernel panic at offset 0x30 from NULL.
Oops: general protection fault, [..] [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
RIP: 0010:ip_vs_unbind_scheduler (net/netfilter/ipvs/ip_vs_sched.c:69)
Call Trace:
<TASK>
ip_vs_add_service.isra.0 (net/netfilter/ipvs/ip_vs_ctl.c:1500)
do_ip_vs_set_ctl (net/netfilter/ipvs/ip_vs_ctl.c:2809)
nf_setsockopt (net/netfilter/nf_sockopt.c:102)
[..]
Fix by simply not clearing the local sched variable after a successful
bind. ip_vs_unbind_scheduler() already detects whether a scheduler is
installed via svc->scheduler, and keeping sched non-NULL ensures the
error path passes the correct pointer to both ip_vs_unbind_scheduler()
and ip_vs_scheduler_put().
While the bug is older, the problem popups in more recent kernels (6.2),
when the new error path is taken after the ip_vs_start_estimator() call.
Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Acked-by: Simon Horman <horms@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/ipvs/ip_vs_ctl.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 35642de2a0fe..2aaf50f52c8e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1452,7 +1452,6 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
ret = ip_vs_bind_scheduler(svc, sched);
if (ret)
goto out_err;
- sched = NULL;
}
ret = ip_vs_start_estimator(ipvs, &svc->stats);
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Xiang Mei <xmei5@asu.edu>
When batching multiple NFLOG messages (inst->qlen > 1), __nfulnl_send()
appends an NLMSG_DONE terminator with sizeof(struct nfgenmsg) payload via
nlmsg_put(), but never initializes the nfgenmsg bytes. The nlmsg_put()
helper only zeroes alignment padding after the payload, not the payload
itself, so four bytes of stale kernel heap data are leaked to userspace
in the NLMSG_DONE message body.
Use nfnl_msg_put() to build the NLMSG_DONE terminator, which initializes
the nfgenmsg payload via nfnl_fill_hdr(), consistent with how
__build_packet_message() already constructs NFULNL_MSG_PACKET headers.
Fixes: 29c5d4afba51 ("[NETFILTER]: nfnetlink_log: fix sending of multipart messages")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nfnetlink_log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index f80978c06fa0..0db908518b2f 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -361,10 +361,10 @@ static void
__nfulnl_send(struct nfulnl_instance *inst)
{
if (inst->qlen > 1) {
- struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
- NLMSG_DONE,
- sizeof(struct nfgenmsg),
- 0);
+ struct nlmsghdr *nlh = nfnl_msg_put(inst->skb, 0, 0,
+ NLMSG_DONE, 0,
+ AF_UNSPEC, NFNETLINK_V0,
+ htons(inst->group_num));
if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
inst->skb->len, skb_tailroom(inst->skb))) {
kfree_skb(inst->skb);
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Ren Wei <n05ec@lzu.edu.cn>
ports_match_v1() treats any non-zero pflags entry as the start of a
port range and unconditionally consumes the next ports[] element as
the range end.
The checkentry path currently validates protocol, flags and count, but
it does not validate the range encoding itself. As a result, malformed
rules can mark the last slot as a range start or place two range starts
back to back, leaving ports_match_v1() to step past the last valid
ports[] element while interpreting the rule.
Reject malformed multiport v1 rules in checkentry by validating that
each range start has a following element and that the following element
is not itself marked as another range start.
Fixes: a89ecb6a2ef7 ("[NETFILTER]: x_tables: unify IPv4/IPv6 multiport match")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/xt_multiport.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index 44a00f5acde8..a1691ff405d3 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -105,6 +105,28 @@ multiport_mt(const struct sk_buff *skb, struct xt_action_param *par)
return ports_match_v1(multiinfo, ntohs(pptr[0]), ntohs(pptr[1]));
}
+static bool
+multiport_valid_ranges(const struct xt_multiport_v1 *multiinfo)
+{
+ unsigned int i;
+
+ for (i = 0; i < multiinfo->count; i++) {
+ if (!multiinfo->pflags[i])
+ continue;
+
+ if (++i >= multiinfo->count)
+ return false;
+
+ if (multiinfo->pflags[i])
+ return false;
+
+ if (multiinfo->ports[i - 1] > multiinfo->ports[i])
+ return false;
+ }
+
+ return true;
+}
+
static inline bool
check(u_int16_t proto,
u_int8_t ip_invflags,
@@ -127,8 +149,10 @@ static int multiport_mt_check(const struct xt_mtchk_param *par)
const struct ipt_ip *ip = par->entryinfo;
const struct xt_multiport_v1 *multiinfo = par->matchinfo;
- return check(ip->proto, ip->invflags, multiinfo->flags,
- multiinfo->count) ? 0 : -EINVAL;
+ if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+ return -EINVAL;
+
+ return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
}
static int multiport_mt6_check(const struct xt_mtchk_param *par)
@@ -136,8 +160,10 @@ static int multiport_mt6_check(const struct xt_mtchk_param *par)
const struct ip6t_ip6 *ip = par->entryinfo;
const struct xt_multiport_v1 *multiinfo = par->matchinfo;
- return check(ip->proto, ip->invflags, multiinfo->flags,
- multiinfo->count) ? 0 : -EINVAL;
+ if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+ return -EINVAL;
+
+ return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
}
static struct xt_match multiport_mt_reg[] __read_mostly = {
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (2 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Zhengchuan Liang <zcliangcn@gmail.com>
`eui64_mt6()` derives a modified EUI-64 from the Ethernet source address
and compares it with the low 64 bits of the IPv6 source address.
The existing guard only rejects an invalid MAC header when
`par->fragoff != 0`. For packets with `par->fragoff == 0`, `eui64_mt6()`
can still reach `eth_hdr(skb)` even when the MAC header is not valid.
Fix this by removing the `par->fragoff != 0` condition so that packets
with an invalid MAC header are rejected before accessing `eth_hdr(skb)`.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ren Wei <enjou1224z@gmail.com>
Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv6/netfilter/ip6t_eui64.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv6/netfilter/ip6t_eui64.c b/net/ipv6/netfilter/ip6t_eui64.c
index d704f7ed300c..da69a27e8332 100644
--- a/net/ipv6/netfilter/ip6t_eui64.c
+++ b/net/ipv6/netfilter/ip6t_eui64.c
@@ -22,8 +22,7 @@ eui64_mt6(const struct sk_buff *skb, struct xt_action_param *par)
unsigned char eui64[8];
if (!(skb_mac_header(skb) >= skb->head &&
- skb_mac_header(skb) + ETH_HLEN <= skb->data) &&
- par->fragoff != 0) {
+ skb_mac_header(skb) + ETH_HLEN <= skb->data)) {
par->hotdrop = true;
return false;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (3 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Tuan Do <tuan@calif.io>
nft_ct_timeout_obj_destroy() frees the timeout object with kfree()
immediately after nf_ct_untimeout(), without waiting for an RCU grace
period. Concurrent packet processing on other CPUs may still hold
RCU-protected references to the timeout object obtained via
rcu_dereference() in nf_ct_timeout_data().
Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer
freeing until after an RCU grace period, matching the approach already
used in nfnetlink_cttimeout.c.
KASAN report:
BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0
Read of size 4 at addr ffff8881035fe19c by task exploit/80
Call Trace:
nf_conntrack_tcp_packet+0x1381/0x29d0
nf_conntrack_in+0x612/0x8b0
nf_hook_slow+0x70/0x100
__ip_local_out+0x1b2/0x210
tcp_sendmsg_locked+0x722/0x1580
__sys_sendto+0x2d8/0x320
Allocated by task 75:
nft_ct_timeout_obj_init+0xf6/0x290
nft_obj_init+0x107/0x1b0
nf_tables_newobj+0x680/0x9c0
nfnetlink_rcv_batch+0xc29/0xe00
Freed by task 26:
nft_obj_destroy+0x3f/0xa0
nf_tables_trans_destroy_work+0x51c/0x5c0
process_one_work+0x2c4/0x5a0
Fixes: 7e0b2b57f01d ("netfilter: nft_ct: add ct timeout support")
Cc: stable@vger.kernel.org
Signed-off-by: Tuan Do <tuan@calif.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_conntrack_timeout.h | 1 +
net/netfilter/nft_ct.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/net/netfilter/nf_conntrack_timeout.h b/include/net/netfilter/nf_conntrack_timeout.h
index 9fdaba911de6..3a66d4abb6d6 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -14,6 +14,7 @@
struct nf_ct_timeout {
__u16 l3num;
const struct nf_conntrack_l4proto *l4proto;
+ struct rcu_head rcu;
char data[];
};
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 128ff8155b5d..04c74ccf9b84 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1020,7 +1020,7 @@ static void nft_ct_timeout_obj_destroy(const struct nft_ctx *ctx,
nf_queue_nf_hook_drop(ctx->net);
nf_ct_untimeout(ctx->net, timeout);
nf_ct_netns_put(ctx->net, ctx->family);
- kfree(priv->timeout);
+ kfree_rcu(priv->timeout, rcu);
}
static int nft_ct_timeout_obj_dump(struct sk_buff *skb,
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (4 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Sharing a global hash table among all queues is tempting, but
it can cause crash:
BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
nfnetlink_rcv_msg+0x46a/0x930
kmem_cache_alloc_node_noprof+0x11e/0x450
struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.
Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure.
Cc: Scott Mitchell <scott.k.mitch1@gmail.com>
Fixes: e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_queue.h | 1 -
net/netfilter/nfnetlink_queue.c | 139 +++++++++++--------------------
2 files changed, 49 insertions(+), 91 deletions(-)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 45eb26b2e95b..d17035d14d96 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -23,7 +23,6 @@ struct nf_queue_entry {
struct nf_hook_state state;
bool nf_ct_is_unconfirmed;
u16 size; /* sizeof(entry) + saved route keys */
- u16 queue_num;
/* extra space to store route keys */
};
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 47f7f62906e2..8e02f84784da 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -49,8 +49,8 @@
#endif
#define NFQNL_QMAX_DEFAULT 1024
-#define NFQNL_HASH_MIN 1024
-#define NFQNL_HASH_MAX 1048576
+#define NFQNL_HASH_MIN 8
+#define NFQNL_HASH_MAX 32768
/* We're using struct nlattr which has 16bit nla_len. Note that nla_len
* includes the header length. Thus, the maximum packet length that we
@@ -60,29 +60,10 @@
*/
#define NFQNL_MAX_COPY_RANGE (0xffff - NLA_HDRLEN)
-/* Composite key for packet lookup: (net, queue_num, packet_id) */
-struct nfqnl_packet_key {
- possible_net_t net;
- u32 packet_id;
- u16 queue_num;
-} __aligned(sizeof(u32)); /* jhash2 requires 32-bit alignment */
-
-/* Global rhashtable - one for entire system, all netns */
-static struct rhashtable nfqnl_packet_map __read_mostly;
-
-/* Helper to initialize composite key */
-static inline void nfqnl_init_key(struct nfqnl_packet_key *key,
- struct net *net, u32 packet_id, u16 queue_num)
-{
- memset(key, 0, sizeof(*key));
- write_pnet(&key->net, net);
- key->packet_id = packet_id;
- key->queue_num = queue_num;
-}
-
struct nfqnl_instance {
struct hlist_node hlist; /* global list of queues */
- struct rcu_head rcu;
+ struct rhashtable nfqnl_packet_map;
+ struct rcu_work rwork;
u32 peer_portid;
unsigned int queue_maxlen;
@@ -106,6 +87,7 @@ struct nfqnl_instance {
typedef int (*nfqnl_cmpfn)(struct nf_queue_entry *, unsigned long);
+static struct workqueue_struct *nfq_cleanup_wq __read_mostly;
static unsigned int nfnl_queue_net_id __read_mostly;
#define INSTANCE_BUCKETS 16
@@ -124,34 +106,10 @@ static inline u_int8_t instance_hashfn(u_int16_t queue_num)
return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
}
-/* Extract composite key from nf_queue_entry for hashing */
-static u32 nfqnl_packet_obj_hashfn(const void *data, u32 len, u32 seed)
-{
- const struct nf_queue_entry *entry = data;
- struct nfqnl_packet_key key;
-
- nfqnl_init_key(&key, entry->state.net, entry->id, entry->queue_num);
-
- return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), seed);
-}
-
-/* Compare stack-allocated key against entry */
-static int nfqnl_packet_obj_cmpfn(struct rhashtable_compare_arg *arg,
- const void *obj)
-{
- const struct nfqnl_packet_key *key = arg->key;
- const struct nf_queue_entry *entry = obj;
-
- return !net_eq(entry->state.net, read_pnet(&key->net)) ||
- entry->queue_num != key->queue_num ||
- entry->id != key->packet_id;
-}
-
static const struct rhashtable_params nfqnl_rhashtable_params = {
.head_offset = offsetof(struct nf_queue_entry, hash_node),
- .key_len = sizeof(struct nfqnl_packet_key),
- .obj_hashfn = nfqnl_packet_obj_hashfn,
- .obj_cmpfn = nfqnl_packet_obj_cmpfn,
+ .key_offset = offsetof(struct nf_queue_entry, id),
+ .key_len = sizeof(u32),
.automatic_shrinking = true,
.min_size = NFQNL_HASH_MIN,
.max_size = NFQNL_HASH_MAX,
@@ -190,6 +148,10 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
spin_lock_init(&inst->lock);
INIT_LIST_HEAD(&inst->queue_list);
+ err = rhashtable_init(&inst->nfqnl_packet_map, &nfqnl_rhashtable_params);
+ if (err < 0)
+ goto out_free;
+
spin_lock(&q->instances_lock);
if (instance_lookup(q, queue_num)) {
err = -EEXIST;
@@ -210,6 +172,8 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
out_unlock:
spin_unlock(&q->instances_lock);
+ rhashtable_destroy(&inst->nfqnl_packet_map);
+out_free:
kfree(inst);
return ERR_PTR(err);
}
@@ -217,15 +181,18 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
static void nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn,
unsigned long data);
-static void
-instance_destroy_rcu(struct rcu_head *head)
+static void instance_destroy_work(struct work_struct *work)
{
- struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
- rcu);
+ struct nfqnl_instance *inst;
+ inst = container_of(to_rcu_work(work), struct nfqnl_instance,
+ rwork);
rcu_read_lock();
nfqnl_flush(inst, NULL, 0);
rcu_read_unlock();
+
+ rhashtable_destroy(&inst->nfqnl_packet_map);
+
kfree(inst);
module_put(THIS_MODULE);
}
@@ -234,7 +201,9 @@ static void
__instance_destroy(struct nfqnl_instance *inst)
{
hlist_del_rcu(&inst->hlist);
- call_rcu(&inst->rcu, instance_destroy_rcu);
+
+ INIT_RCU_WORK(&inst->rwork, instance_destroy_work);
+ queue_rcu_work(nfq_cleanup_wq, &inst->rwork);
}
static void
@@ -250,9 +219,7 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
int err;
- entry->queue_num = queue->queue_num;
-
- err = rhashtable_insert_fast(&nfqnl_packet_map, &entry->hash_node,
+ err = rhashtable_insert_fast(&queue->nfqnl_packet_map, &entry->hash_node,
nfqnl_rhashtable_params);
if (unlikely(err))
return err;
@@ -266,23 +233,19 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
static void
__dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
- rhashtable_remove_fast(&nfqnl_packet_map, &entry->hash_node,
+ rhashtable_remove_fast(&queue->nfqnl_packet_map, &entry->hash_node,
nfqnl_rhashtable_params);
list_del(&entry->list);
queue->queue_total--;
}
static struct nf_queue_entry *
-find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id,
- struct net *net)
+find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
{
- struct nfqnl_packet_key key;
struct nf_queue_entry *entry;
- nfqnl_init_key(&key, net, id, queue->queue_num);
-
spin_lock_bh(&queue->lock);
- entry = rhashtable_lookup_fast(&nfqnl_packet_map, &key,
+ entry = rhashtable_lookup_fast(&queue->nfqnl_packet_map, &id,
nfqnl_rhashtable_params);
if (entry)
@@ -1531,7 +1494,7 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info,
verdict = ntohl(vhdr->verdict);
- entry = find_dequeue_entry(queue, ntohl(vhdr->id), info->net);
+ entry = find_dequeue_entry(queue, ntohl(vhdr->id));
if (entry == NULL)
return -ENOENT;
@@ -1880,40 +1843,38 @@ static int __init nfnetlink_queue_init(void)
{
int status;
- status = rhashtable_init(&nfqnl_packet_map, &nfqnl_rhashtable_params);
- if (status < 0)
- return status;
+ nfq_cleanup_wq = alloc_ordered_workqueue("nfq_workqueue", 0);
+ if (!nfq_cleanup_wq)
+ return -ENOMEM;
status = register_pernet_subsys(&nfnl_queue_net_ops);
- if (status < 0) {
- pr_err("failed to register pernet ops\n");
- goto cleanup_rhashtable;
- }
+ if (status < 0)
+ goto cleanup_pernet_subsys;
- netlink_register_notifier(&nfqnl_rtnl_notifier);
- status = nfnetlink_subsys_register(&nfqnl_subsys);
- if (status < 0) {
- pr_err("failed to create netlink socket\n");
- goto cleanup_netlink_notifier;
- }
+ status = netlink_register_notifier(&nfqnl_rtnl_notifier);
+ if (status < 0)
+ goto cleanup_rtnl_notifier;
status = register_netdevice_notifier(&nfqnl_dev_notifier);
- if (status < 0) {
- pr_err("failed to register netdevice notifier\n");
- goto cleanup_netlink_subsys;
- }
+ if (status < 0)
+ goto cleanup_dev_notifier;
+
+ status = nfnetlink_subsys_register(&nfqnl_subsys);
+ if (status < 0)
+ goto cleanup_nfqnl_subsys;
nf_register_queue_handler(&nfqh);
return status;
-cleanup_netlink_subsys:
- nfnetlink_subsys_unregister(&nfqnl_subsys);
-cleanup_netlink_notifier:
+cleanup_nfqnl_subsys:
+ unregister_netdevice_notifier(&nfqnl_dev_notifier);
+cleanup_dev_notifier:
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
+cleanup_rtnl_notifier:
unregister_pernet_subsys(&nfnl_queue_net_ops);
-cleanup_rhashtable:
- rhashtable_destroy(&nfqnl_packet_map);
+cleanup_pernet_subsys:
+ destroy_workqueue(nfq_cleanup_wq);
return status;
}
@@ -1924,9 +1885,7 @@ static void __exit nfnetlink_queue_fini(void)
nfnetlink_subsys_unregister(&nfqnl_subsys);
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
unregister_pernet_subsys(&nfnl_queue_net_ops);
-
- rhashtable_destroy(&nfqnl_packet_map);
-
+ destroy_workqueue(nfq_cleanup_wq);
rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (5 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Fernando Fernandez Mancera <fmancera@suse.de>
Introduce a new stress test to check for race conditions in the
nfnetlink_queue subsystem, where an entry is freed while another CPU is
concurrently walking the global rhashtable.
To trigger this, `nf_queue.c` is extended with two new flags:
* -O (out-of-order): Buffers packet IDs and flushes them in reverse.
* -b (bogus verdicts): Floods the kernel with non-existent packet IDs.
The bogus verdict loop forces the kernel's lookup function to perform
full rhashtable bucket traversals (-ENOENT). Combined with reverse-order
flushing and heavy parallel UDP/ping flooding across 8 queues, this puts
the nfnetlink_queue code under pressure.
Joint work with Florian Westphal.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../selftests/net/netfilter/nf_queue.c | 50 +++++++++--
.../selftests/net/netfilter/nft_queue.sh | 83 ++++++++++++++++---
2 files changed, 115 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/nf_queue.c b/tools/testing/selftests/net/netfilter/nf_queue.c
index 116c0ca0eabb..8bbec37f5356 100644
--- a/tools/testing/selftests/net/netfilter/nf_queue.c
+++ b/tools/testing/selftests/net/netfilter/nf_queue.c
@@ -19,6 +19,8 @@ struct options {
bool count_packets;
bool gso_enabled;
bool failopen;
+ bool out_of_order;
+ bool bogus_verdict;
int verbose;
unsigned int queue_num;
unsigned int timeout;
@@ -31,7 +33,7 @@ static struct options opts;
static void help(const char *p)
{
- printf("Usage: %s [-c|-v [-vv] ] [-o] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
+ printf("Usage: %s [-c|-v [-vv] ] [-o] [-O] [-b] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
}
static int parse_attr_cb(const struct nlattr *attr, void *data)
@@ -275,7 +277,9 @@ static int mainloop(void)
unsigned int buflen = 64 * 1024 + MNL_SOCKET_BUFFER_SIZE;
struct mnl_socket *nl;
struct nlmsghdr *nlh;
+ uint32_t ooo_ids[16];
unsigned int portid;
+ int ooo_count = 0;
char *buf;
int ret;
@@ -308,6 +312,9 @@ static int mainloop(void)
ret = mnl_cb_run(buf, ret, 0, portid, queue_cb, NULL);
if (ret < 0) {
+ /* bogus verdict mode will generate ENOENT error messages */
+ if (opts.bogus_verdict && errno == ENOENT)
+ continue;
perror("mnl_cb_run");
exit(EXIT_FAILURE);
}
@@ -316,10 +323,35 @@ static int mainloop(void)
if (opts.delay_ms)
sleep_ms(opts.delay_ms);
- nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
- if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
- perror("mnl_socket_sendto");
- exit(EXIT_FAILURE);
+ if (opts.bogus_verdict) {
+ for (int i = 0; i < 50; i++) {
+ nlh = nfq_build_verdict(buf, id + 0x7FFFFFFF + i,
+ opts.queue_num, opts.verdict);
+ mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+ }
+ }
+
+ if (opts.out_of_order) {
+ ooo_ids[ooo_count] = id;
+ if (ooo_count >= 15) {
+ for (ooo_count; ooo_count >= 0; ooo_count--) {
+ nlh = nfq_build_verdict(buf, ooo_ids[ooo_count],
+ opts.queue_num, opts.verdict);
+ if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+ perror("mnl_socket_sendto");
+ exit(EXIT_FAILURE);
+ }
+ }
+ ooo_count = 0;
+ } else {
+ ooo_count++;
+ }
+ } else {
+ nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
+ if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+ perror("mnl_socket_sendto");
+ exit(EXIT_FAILURE);
+ }
}
}
@@ -332,7 +364,7 @@ static void parse_opts(int argc, char **argv)
{
int c;
- while ((c = getopt(argc, argv, "chvot:q:Q:d:G")) != -1) {
+ while ((c = getopt(argc, argv, "chvoObt:q:Q:d:G")) != -1) {
switch (c) {
case 'c':
opts.count_packets = true;
@@ -375,6 +407,12 @@ static void parse_opts(int argc, char **argv)
case 'v':
opts.verbose++;
break;
+ case 'O':
+ opts.out_of_order = true;
+ break;
+ case 'b':
+ opts.bogus_verdict = true;
+ break;
}
}
diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index ea766bdc5d04..d80390848e85 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -11,6 +11,7 @@ ret=0
timeout=5
SCTP_TEST_TIMEOUT=60
+STRESS_TEST_TIMEOUT=30
cleanup()
{
@@ -719,6 +720,74 @@ EOF
fi
}
+check_tainted()
+{
+ local msg="$1"
+
+ if [ "$tainted_then" -ne 0 ];then
+ return
+ fi
+
+ read tainted_now < /proc/sys/kernel/tainted
+ if [ "$tainted_now" -eq 0 ];then
+ echo "PASS: $msg"
+ else
+ echo "TAINT: $msg"
+ dmesg
+ ret=1
+ fi
+}
+
+test_queue_stress()
+{
+ read tainted_then < /proc/sys/kernel/tainted
+ local i
+
+ ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet t {
+ chain forward {
+ type filter hook forward priority 0; policy accept;
+
+ queue flags bypass to numgen random mod 8
+ }
+}
+EOF
+ timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns2" \
+ socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+ timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns3" \
+ socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+ for i in $(seq 0 7); do
+ ip netns exec "$nsrouter" timeout "$STRESS_TEST_TIMEOUT" \
+ ./nf_queue -q $i -t 2 -O -b > /dev/null &
+ done
+
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f 10.0.2.99 > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f 10.0.3.99 > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f "dead:2::99" > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f "dead:3::99" > /dev/null 2>&1 &
+
+ busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2" 12345
+ busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns3" 12345
+
+ for i in $(seq 1 4);do
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ socat -u STDIN UDP-DATAGRAM:10.0.2.99:12345 < /dev/zero > /dev/null &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ socat -u STDIN UDP-DATAGRAM:10.0.3.99:12345 < /dev/zero > /dev/null &
+ done
+
+ wait
+
+ check_tainted "concurrent queueing"
+}
+
test_queue_removal()
{
read tainted_then < /proc/sys/kernel/tainted
@@ -742,18 +811,7 @@ EOF
ip netns exec "$ns1" nft flush ruleset
- if [ "$tainted_then" -ne 0 ];then
- return
- fi
-
- read tainted_now < /proc/sys/kernel/tainted
- if [ "$tainted_now" -eq 0 ];then
- echo "PASS: queue program exiting while packets queued"
- else
- echo "TAINT: queue program exiting while packets queued"
- dmesg
- ret=1
- fi
+ check_tainted "queue program exiting while packets queued"
}
ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
@@ -799,6 +857,7 @@ test_sctp_forward
test_sctp_output
test_udp_nat_race
test_udp_gro_ct
+test_queue_stress
# should be last, adds vrf device in ns1 and changes routes
test_icmp_vrf
--
2.52.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-04-08 16:35 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
-- strict thread matches above, loose matches on Subject: below --
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
2023-10-12 8:57 [PATCH net 0/7] netfilter " Florian Westphal
2023-05-10 8:33 [PATCH net 0/7] Netfilter " Pablo Neira Ayuso
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox