* [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Weiming Shi <bestswngs@gmail.com>
When ip_vs_bind_scheduler() succeeds in ip_vs_add_service(), the local
variable sched is set to NULL. If ip_vs_start_estimator() subsequently
fails, the out_err cleanup calls ip_vs_unbind_scheduler(svc, sched)
with sched == NULL. ip_vs_unbind_scheduler() passes the cur_sched NULL
check (because svc->scheduler was set by the successful bind) but then
dereferences the NULL sched parameter at sched->done_service, causing a
kernel panic at offset 0x30 from NULL.
Oops: general protection fault, [..] [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
RIP: 0010:ip_vs_unbind_scheduler (net/netfilter/ipvs/ip_vs_sched.c:69)
Call Trace:
<TASK>
ip_vs_add_service.isra.0 (net/netfilter/ipvs/ip_vs_ctl.c:1500)
do_ip_vs_set_ctl (net/netfilter/ipvs/ip_vs_ctl.c:2809)
nf_setsockopt (net/netfilter/nf_sockopt.c:102)
[..]
Fix by simply not clearing the local sched variable after a successful
bind. ip_vs_unbind_scheduler() already detects whether a scheduler is
installed via svc->scheduler, and keeping sched non-NULL ensures the
error path passes the correct pointer to both ip_vs_unbind_scheduler()
and ip_vs_scheduler_put().
While the bug is older, the problem popups in more recent kernels (6.2),
when the new error path is taken after the ip_vs_start_estimator() call.
Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Acked-by: Simon Horman <horms@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/ipvs/ip_vs_ctl.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 35642de2a0fe..2aaf50f52c8e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1452,7 +1452,6 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
ret = ip_vs_bind_scheduler(svc, sched);
if (ret)
goto out_err;
- sched = NULL;
}
ret = ip_vs_start_estimator(ipvs, &svc->stats);
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Xiang Mei <xmei5@asu.edu>
When batching multiple NFLOG messages (inst->qlen > 1), __nfulnl_send()
appends an NLMSG_DONE terminator with sizeof(struct nfgenmsg) payload via
nlmsg_put(), but never initializes the nfgenmsg bytes. The nlmsg_put()
helper only zeroes alignment padding after the payload, not the payload
itself, so four bytes of stale kernel heap data are leaked to userspace
in the NLMSG_DONE message body.
Use nfnl_msg_put() to build the NLMSG_DONE terminator, which initializes
the nfgenmsg payload via nfnl_fill_hdr(), consistent with how
__build_packet_message() already constructs NFULNL_MSG_PACKET headers.
Fixes: 29c5d4afba51 ("[NETFILTER]: nfnetlink_log: fix sending of multipart messages")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nfnetlink_log.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index f80978c06fa0..0db908518b2f 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -361,10 +361,10 @@ static void
__nfulnl_send(struct nfulnl_instance *inst)
{
if (inst->qlen > 1) {
- struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
- NLMSG_DONE,
- sizeof(struct nfgenmsg),
- 0);
+ struct nlmsghdr *nlh = nfnl_msg_put(inst->skb, 0, 0,
+ NLMSG_DONE, 0,
+ AF_UNSPEC, NFNETLINK_V0,
+ htons(inst->group_num));
if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
inst->skb->len, skb_tailroom(inst->skb))) {
kfree_skb(inst->skb);
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Ren Wei <n05ec@lzu.edu.cn>
ports_match_v1() treats any non-zero pflags entry as the start of a
port range and unconditionally consumes the next ports[] element as
the range end.
The checkentry path currently validates protocol, flags and count, but
it does not validate the range encoding itself. As a result, malformed
rules can mark the last slot as a range start or place two range starts
back to back, leaving ports_match_v1() to step past the last valid
ports[] element while interpreting the rule.
Reject malformed multiport v1 rules in checkentry by validating that
each range start has a following element and that the following element
is not itself marked as another range start.
Fixes: a89ecb6a2ef7 ("[NETFILTER]: x_tables: unify IPv4/IPv6 multiport match")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/xt_multiport.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index 44a00f5acde8..a1691ff405d3 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -105,6 +105,28 @@ multiport_mt(const struct sk_buff *skb, struct xt_action_param *par)
return ports_match_v1(multiinfo, ntohs(pptr[0]), ntohs(pptr[1]));
}
+static bool
+multiport_valid_ranges(const struct xt_multiport_v1 *multiinfo)
+{
+ unsigned int i;
+
+ for (i = 0; i < multiinfo->count; i++) {
+ if (!multiinfo->pflags[i])
+ continue;
+
+ if (++i >= multiinfo->count)
+ return false;
+
+ if (multiinfo->pflags[i])
+ return false;
+
+ if (multiinfo->ports[i - 1] > multiinfo->ports[i])
+ return false;
+ }
+
+ return true;
+}
+
static inline bool
check(u_int16_t proto,
u_int8_t ip_invflags,
@@ -127,8 +149,10 @@ static int multiport_mt_check(const struct xt_mtchk_param *par)
const struct ipt_ip *ip = par->entryinfo;
const struct xt_multiport_v1 *multiinfo = par->matchinfo;
- return check(ip->proto, ip->invflags, multiinfo->flags,
- multiinfo->count) ? 0 : -EINVAL;
+ if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+ return -EINVAL;
+
+ return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
}
static int multiport_mt6_check(const struct xt_mtchk_param *par)
@@ -136,8 +160,10 @@ static int multiport_mt6_check(const struct xt_mtchk_param *par)
const struct ip6t_ip6 *ip = par->entryinfo;
const struct xt_multiport_v1 *multiinfo = par->matchinfo;
- return check(ip->proto, ip->invflags, multiinfo->flags,
- multiinfo->count) ? 0 : -EINVAL;
+ if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+ return -EINVAL;
+
+ return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
}
static struct xt_match multiport_mt_reg[] __read_mostly = {
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (2 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Zhengchuan Liang <zcliangcn@gmail.com>
`eui64_mt6()` derives a modified EUI-64 from the Ethernet source address
and compares it with the low 64 bits of the IPv6 source address.
The existing guard only rejects an invalid MAC header when
`par->fragoff != 0`. For packets with `par->fragoff == 0`, `eui64_mt6()`
can still reach `eth_hdr(skb)` even when the MAC header is not valid.
Fix this by removing the `par->fragoff != 0` condition so that packets
with an invalid MAC header are rejected before accessing `eth_hdr(skb)`.
Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ren Wei <enjou1224z@gmail.com>
Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv6/netfilter/ip6t_eui64.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv6/netfilter/ip6t_eui64.c b/net/ipv6/netfilter/ip6t_eui64.c
index d704f7ed300c..da69a27e8332 100644
--- a/net/ipv6/netfilter/ip6t_eui64.c
+++ b/net/ipv6/netfilter/ip6t_eui64.c
@@ -22,8 +22,7 @@ eui64_mt6(const struct sk_buff *skb, struct xt_action_param *par)
unsigned char eui64[8];
if (!(skb_mac_header(skb) >= skb->head &&
- skb_mac_header(skb) + ETH_HLEN <= skb->data) &&
- par->fragoff != 0) {
+ skb_mac_header(skb) + ETH_HLEN <= skb->data)) {
par->hotdrop = true;
return false;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (3 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Tuan Do <tuan@calif.io>
nft_ct_timeout_obj_destroy() frees the timeout object with kfree()
immediately after nf_ct_untimeout(), without waiting for an RCU grace
period. Concurrent packet processing on other CPUs may still hold
RCU-protected references to the timeout object obtained via
rcu_dereference() in nf_ct_timeout_data().
Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer
freeing until after an RCU grace period, matching the approach already
used in nfnetlink_cttimeout.c.
KASAN report:
BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0
Read of size 4 at addr ffff8881035fe19c by task exploit/80
Call Trace:
nf_conntrack_tcp_packet+0x1381/0x29d0
nf_conntrack_in+0x612/0x8b0
nf_hook_slow+0x70/0x100
__ip_local_out+0x1b2/0x210
tcp_sendmsg_locked+0x722/0x1580
__sys_sendto+0x2d8/0x320
Allocated by task 75:
nft_ct_timeout_obj_init+0xf6/0x290
nft_obj_init+0x107/0x1b0
nf_tables_newobj+0x680/0x9c0
nfnetlink_rcv_batch+0xc29/0xe00
Freed by task 26:
nft_obj_destroy+0x3f/0xa0
nf_tables_trans_destroy_work+0x51c/0x5c0
process_one_work+0x2c4/0x5a0
Fixes: 7e0b2b57f01d ("netfilter: nft_ct: add ct timeout support")
Cc: stable@vger.kernel.org
Signed-off-by: Tuan Do <tuan@calif.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_conntrack_timeout.h | 1 +
net/netfilter/nft_ct.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/net/netfilter/nf_conntrack_timeout.h b/include/net/netfilter/nf_conntrack_timeout.h
index 9fdaba911de6..3a66d4abb6d6 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -14,6 +14,7 @@
struct nf_ct_timeout {
__u16 l3num;
const struct nf_conntrack_l4proto *l4proto;
+ struct rcu_head rcu;
char data[];
};
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 128ff8155b5d..04c74ccf9b84 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1020,7 +1020,7 @@ static void nft_ct_timeout_obj_destroy(const struct nft_ctx *ctx,
nf_queue_nf_hook_drop(ctx->net);
nf_ct_untimeout(ctx->net, timeout);
nf_ct_netns_put(ctx->net, ctx->family);
- kfree(priv->timeout);
+ kfree_rcu(priv->timeout, rcu);
}
static int nft_ct_timeout_obj_dump(struct sk_buff *skb,
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (4 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Sharing a global hash table among all queues is tempting, but
it can cause crash:
BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
nfnetlink_rcv_msg+0x46a/0x930
kmem_cache_alloc_node_noprof+0x11e/0x450
struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.
Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure.
Cc: Scott Mitchell <scott.k.mitch1@gmail.com>
Fixes: e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_queue.h | 1 -
net/netfilter/nfnetlink_queue.c | 139 +++++++++++--------------------
2 files changed, 49 insertions(+), 91 deletions(-)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 45eb26b2e95b..d17035d14d96 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -23,7 +23,6 @@ struct nf_queue_entry {
struct nf_hook_state state;
bool nf_ct_is_unconfirmed;
u16 size; /* sizeof(entry) + saved route keys */
- u16 queue_num;
/* extra space to store route keys */
};
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 47f7f62906e2..8e02f84784da 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -49,8 +49,8 @@
#endif
#define NFQNL_QMAX_DEFAULT 1024
-#define NFQNL_HASH_MIN 1024
-#define NFQNL_HASH_MAX 1048576
+#define NFQNL_HASH_MIN 8
+#define NFQNL_HASH_MAX 32768
/* We're using struct nlattr which has 16bit nla_len. Note that nla_len
* includes the header length. Thus, the maximum packet length that we
@@ -60,29 +60,10 @@
*/
#define NFQNL_MAX_COPY_RANGE (0xffff - NLA_HDRLEN)
-/* Composite key for packet lookup: (net, queue_num, packet_id) */
-struct nfqnl_packet_key {
- possible_net_t net;
- u32 packet_id;
- u16 queue_num;
-} __aligned(sizeof(u32)); /* jhash2 requires 32-bit alignment */
-
-/* Global rhashtable - one for entire system, all netns */
-static struct rhashtable nfqnl_packet_map __read_mostly;
-
-/* Helper to initialize composite key */
-static inline void nfqnl_init_key(struct nfqnl_packet_key *key,
- struct net *net, u32 packet_id, u16 queue_num)
-{
- memset(key, 0, sizeof(*key));
- write_pnet(&key->net, net);
- key->packet_id = packet_id;
- key->queue_num = queue_num;
-}
-
struct nfqnl_instance {
struct hlist_node hlist; /* global list of queues */
- struct rcu_head rcu;
+ struct rhashtable nfqnl_packet_map;
+ struct rcu_work rwork;
u32 peer_portid;
unsigned int queue_maxlen;
@@ -106,6 +87,7 @@ struct nfqnl_instance {
typedef int (*nfqnl_cmpfn)(struct nf_queue_entry *, unsigned long);
+static struct workqueue_struct *nfq_cleanup_wq __read_mostly;
static unsigned int nfnl_queue_net_id __read_mostly;
#define INSTANCE_BUCKETS 16
@@ -124,34 +106,10 @@ static inline u_int8_t instance_hashfn(u_int16_t queue_num)
return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
}
-/* Extract composite key from nf_queue_entry for hashing */
-static u32 nfqnl_packet_obj_hashfn(const void *data, u32 len, u32 seed)
-{
- const struct nf_queue_entry *entry = data;
- struct nfqnl_packet_key key;
-
- nfqnl_init_key(&key, entry->state.net, entry->id, entry->queue_num);
-
- return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), seed);
-}
-
-/* Compare stack-allocated key against entry */
-static int nfqnl_packet_obj_cmpfn(struct rhashtable_compare_arg *arg,
- const void *obj)
-{
- const struct nfqnl_packet_key *key = arg->key;
- const struct nf_queue_entry *entry = obj;
-
- return !net_eq(entry->state.net, read_pnet(&key->net)) ||
- entry->queue_num != key->queue_num ||
- entry->id != key->packet_id;
-}
-
static const struct rhashtable_params nfqnl_rhashtable_params = {
.head_offset = offsetof(struct nf_queue_entry, hash_node),
- .key_len = sizeof(struct nfqnl_packet_key),
- .obj_hashfn = nfqnl_packet_obj_hashfn,
- .obj_cmpfn = nfqnl_packet_obj_cmpfn,
+ .key_offset = offsetof(struct nf_queue_entry, id),
+ .key_len = sizeof(u32),
.automatic_shrinking = true,
.min_size = NFQNL_HASH_MIN,
.max_size = NFQNL_HASH_MAX,
@@ -190,6 +148,10 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
spin_lock_init(&inst->lock);
INIT_LIST_HEAD(&inst->queue_list);
+ err = rhashtable_init(&inst->nfqnl_packet_map, &nfqnl_rhashtable_params);
+ if (err < 0)
+ goto out_free;
+
spin_lock(&q->instances_lock);
if (instance_lookup(q, queue_num)) {
err = -EEXIST;
@@ -210,6 +172,8 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
out_unlock:
spin_unlock(&q->instances_lock);
+ rhashtable_destroy(&inst->nfqnl_packet_map);
+out_free:
kfree(inst);
return ERR_PTR(err);
}
@@ -217,15 +181,18 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
static void nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn,
unsigned long data);
-static void
-instance_destroy_rcu(struct rcu_head *head)
+static void instance_destroy_work(struct work_struct *work)
{
- struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
- rcu);
+ struct nfqnl_instance *inst;
+ inst = container_of(to_rcu_work(work), struct nfqnl_instance,
+ rwork);
rcu_read_lock();
nfqnl_flush(inst, NULL, 0);
rcu_read_unlock();
+
+ rhashtable_destroy(&inst->nfqnl_packet_map);
+
kfree(inst);
module_put(THIS_MODULE);
}
@@ -234,7 +201,9 @@ static void
__instance_destroy(struct nfqnl_instance *inst)
{
hlist_del_rcu(&inst->hlist);
- call_rcu(&inst->rcu, instance_destroy_rcu);
+
+ INIT_RCU_WORK(&inst->rwork, instance_destroy_work);
+ queue_rcu_work(nfq_cleanup_wq, &inst->rwork);
}
static void
@@ -250,9 +219,7 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
int err;
- entry->queue_num = queue->queue_num;
-
- err = rhashtable_insert_fast(&nfqnl_packet_map, &entry->hash_node,
+ err = rhashtable_insert_fast(&queue->nfqnl_packet_map, &entry->hash_node,
nfqnl_rhashtable_params);
if (unlikely(err))
return err;
@@ -266,23 +233,19 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
static void
__dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
{
- rhashtable_remove_fast(&nfqnl_packet_map, &entry->hash_node,
+ rhashtable_remove_fast(&queue->nfqnl_packet_map, &entry->hash_node,
nfqnl_rhashtable_params);
list_del(&entry->list);
queue->queue_total--;
}
static struct nf_queue_entry *
-find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id,
- struct net *net)
+find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
{
- struct nfqnl_packet_key key;
struct nf_queue_entry *entry;
- nfqnl_init_key(&key, net, id, queue->queue_num);
-
spin_lock_bh(&queue->lock);
- entry = rhashtable_lookup_fast(&nfqnl_packet_map, &key,
+ entry = rhashtable_lookup_fast(&queue->nfqnl_packet_map, &id,
nfqnl_rhashtable_params);
if (entry)
@@ -1531,7 +1494,7 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info,
verdict = ntohl(vhdr->verdict);
- entry = find_dequeue_entry(queue, ntohl(vhdr->id), info->net);
+ entry = find_dequeue_entry(queue, ntohl(vhdr->id));
if (entry == NULL)
return -ENOENT;
@@ -1880,40 +1843,38 @@ static int __init nfnetlink_queue_init(void)
{
int status;
- status = rhashtable_init(&nfqnl_packet_map, &nfqnl_rhashtable_params);
- if (status < 0)
- return status;
+ nfq_cleanup_wq = alloc_ordered_workqueue("nfq_workqueue", 0);
+ if (!nfq_cleanup_wq)
+ return -ENOMEM;
status = register_pernet_subsys(&nfnl_queue_net_ops);
- if (status < 0) {
- pr_err("failed to register pernet ops\n");
- goto cleanup_rhashtable;
- }
+ if (status < 0)
+ goto cleanup_pernet_subsys;
- netlink_register_notifier(&nfqnl_rtnl_notifier);
- status = nfnetlink_subsys_register(&nfqnl_subsys);
- if (status < 0) {
- pr_err("failed to create netlink socket\n");
- goto cleanup_netlink_notifier;
- }
+ status = netlink_register_notifier(&nfqnl_rtnl_notifier);
+ if (status < 0)
+ goto cleanup_rtnl_notifier;
status = register_netdevice_notifier(&nfqnl_dev_notifier);
- if (status < 0) {
- pr_err("failed to register netdevice notifier\n");
- goto cleanup_netlink_subsys;
- }
+ if (status < 0)
+ goto cleanup_dev_notifier;
+
+ status = nfnetlink_subsys_register(&nfqnl_subsys);
+ if (status < 0)
+ goto cleanup_nfqnl_subsys;
nf_register_queue_handler(&nfqh);
return status;
-cleanup_netlink_subsys:
- nfnetlink_subsys_unregister(&nfqnl_subsys);
-cleanup_netlink_notifier:
+cleanup_nfqnl_subsys:
+ unregister_netdevice_notifier(&nfqnl_dev_notifier);
+cleanup_dev_notifier:
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
+cleanup_rtnl_notifier:
unregister_pernet_subsys(&nfnl_queue_net_ops);
-cleanup_rhashtable:
- rhashtable_destroy(&nfqnl_packet_map);
+cleanup_pernet_subsys:
+ destroy_workqueue(nfq_cleanup_wq);
return status;
}
@@ -1924,9 +1885,7 @@ static void __exit nfnetlink_queue_fini(void)
nfnetlink_subsys_unregister(&nfqnl_subsys);
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
unregister_pernet_subsys(&nfnl_queue_net_ops);
-
- rhashtable_destroy(&nfqnl_packet_map);
-
+ destroy_workqueue(nfq_cleanup_wq);
rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread* [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
` (5 preceding siblings ...)
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
6 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
From: Fernando Fernandez Mancera <fmancera@suse.de>
Introduce a new stress test to check for race conditions in the
nfnetlink_queue subsystem, where an entry is freed while another CPU is
concurrently walking the global rhashtable.
To trigger this, `nf_queue.c` is extended with two new flags:
* -O (out-of-order): Buffers packet IDs and flushes them in reverse.
* -b (bogus verdicts): Floods the kernel with non-existent packet IDs.
The bogus verdict loop forces the kernel's lookup function to perform
full rhashtable bucket traversals (-ENOENT). Combined with reverse-order
flushing and heavy parallel UDP/ping flooding across 8 queues, this puts
the nfnetlink_queue code under pressure.
Joint work with Florian Westphal.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../selftests/net/netfilter/nf_queue.c | 50 +++++++++--
.../selftests/net/netfilter/nft_queue.sh | 83 ++++++++++++++++---
2 files changed, 115 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/netfilter/nf_queue.c b/tools/testing/selftests/net/netfilter/nf_queue.c
index 116c0ca0eabb..8bbec37f5356 100644
--- a/tools/testing/selftests/net/netfilter/nf_queue.c
+++ b/tools/testing/selftests/net/netfilter/nf_queue.c
@@ -19,6 +19,8 @@ struct options {
bool count_packets;
bool gso_enabled;
bool failopen;
+ bool out_of_order;
+ bool bogus_verdict;
int verbose;
unsigned int queue_num;
unsigned int timeout;
@@ -31,7 +33,7 @@ static struct options opts;
static void help(const char *p)
{
- printf("Usage: %s [-c|-v [-vv] ] [-o] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
+ printf("Usage: %s [-c|-v [-vv] ] [-o] [-O] [-b] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
}
static int parse_attr_cb(const struct nlattr *attr, void *data)
@@ -275,7 +277,9 @@ static int mainloop(void)
unsigned int buflen = 64 * 1024 + MNL_SOCKET_BUFFER_SIZE;
struct mnl_socket *nl;
struct nlmsghdr *nlh;
+ uint32_t ooo_ids[16];
unsigned int portid;
+ int ooo_count = 0;
char *buf;
int ret;
@@ -308,6 +312,9 @@ static int mainloop(void)
ret = mnl_cb_run(buf, ret, 0, portid, queue_cb, NULL);
if (ret < 0) {
+ /* bogus verdict mode will generate ENOENT error messages */
+ if (opts.bogus_verdict && errno == ENOENT)
+ continue;
perror("mnl_cb_run");
exit(EXIT_FAILURE);
}
@@ -316,10 +323,35 @@ static int mainloop(void)
if (opts.delay_ms)
sleep_ms(opts.delay_ms);
- nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
- if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
- perror("mnl_socket_sendto");
- exit(EXIT_FAILURE);
+ if (opts.bogus_verdict) {
+ for (int i = 0; i < 50; i++) {
+ nlh = nfq_build_verdict(buf, id + 0x7FFFFFFF + i,
+ opts.queue_num, opts.verdict);
+ mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+ }
+ }
+
+ if (opts.out_of_order) {
+ ooo_ids[ooo_count] = id;
+ if (ooo_count >= 15) {
+ for (ooo_count; ooo_count >= 0; ooo_count--) {
+ nlh = nfq_build_verdict(buf, ooo_ids[ooo_count],
+ opts.queue_num, opts.verdict);
+ if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+ perror("mnl_socket_sendto");
+ exit(EXIT_FAILURE);
+ }
+ }
+ ooo_count = 0;
+ } else {
+ ooo_count++;
+ }
+ } else {
+ nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
+ if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+ perror("mnl_socket_sendto");
+ exit(EXIT_FAILURE);
+ }
}
}
@@ -332,7 +364,7 @@ static void parse_opts(int argc, char **argv)
{
int c;
- while ((c = getopt(argc, argv, "chvot:q:Q:d:G")) != -1) {
+ while ((c = getopt(argc, argv, "chvoObt:q:Q:d:G")) != -1) {
switch (c) {
case 'c':
opts.count_packets = true;
@@ -375,6 +407,12 @@ static void parse_opts(int argc, char **argv)
case 'v':
opts.verbose++;
break;
+ case 'O':
+ opts.out_of_order = true;
+ break;
+ case 'b':
+ opts.bogus_verdict = true;
+ break;
}
}
diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index ea766bdc5d04..d80390848e85 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -11,6 +11,7 @@ ret=0
timeout=5
SCTP_TEST_TIMEOUT=60
+STRESS_TEST_TIMEOUT=30
cleanup()
{
@@ -719,6 +720,74 @@ EOF
fi
}
+check_tainted()
+{
+ local msg="$1"
+
+ if [ "$tainted_then" -ne 0 ];then
+ return
+ fi
+
+ read tainted_now < /proc/sys/kernel/tainted
+ if [ "$tainted_now" -eq 0 ];then
+ echo "PASS: $msg"
+ else
+ echo "TAINT: $msg"
+ dmesg
+ ret=1
+ fi
+}
+
+test_queue_stress()
+{
+ read tainted_then < /proc/sys/kernel/tainted
+ local i
+
+ ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet t {
+ chain forward {
+ type filter hook forward priority 0; policy accept;
+
+ queue flags bypass to numgen random mod 8
+ }
+}
+EOF
+ timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns2" \
+ socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+ timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns3" \
+ socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+ for i in $(seq 0 7); do
+ ip netns exec "$nsrouter" timeout "$STRESS_TEST_TIMEOUT" \
+ ./nf_queue -q $i -t 2 -O -b > /dev/null &
+ done
+
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f 10.0.2.99 > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f 10.0.3.99 > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f "dead:2::99" > /dev/null 2>&1 &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ ping -q -f "dead:3::99" > /dev/null 2>&1 &
+
+ busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2" 12345
+ busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns3" 12345
+
+ for i in $(seq 1 4);do
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ socat -u STDIN UDP-DATAGRAM:10.0.2.99:12345 < /dev/zero > /dev/null &
+ ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+ socat -u STDIN UDP-DATAGRAM:10.0.3.99:12345 < /dev/zero > /dev/null &
+ done
+
+ wait
+
+ check_tainted "concurrent queueing"
+}
+
test_queue_removal()
{
read tainted_then < /proc/sys/kernel/tainted
@@ -742,18 +811,7 @@ EOF
ip netns exec "$ns1" nft flush ruleset
- if [ "$tainted_then" -ne 0 ];then
- return
- fi
-
- read tainted_now < /proc/sys/kernel/tainted
- if [ "$tainted_now" -eq 0 ];then
- echo "PASS: queue program exiting while packets queued"
- else
- echo "TAINT: queue program exiting while packets queued"
- dmesg
- ret=1
- fi
+ check_tainted "queue program exiting while packets queued"
}
ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
@@ -799,6 +857,7 @@ test_sctp_forward
test_sctp_output
test_udp_nat_race
test_udp_gro_ct
+test_queue_stress
# should be last, adds vrf device in ns1 and changes routes
test_icmp_vrf
--
2.52.0
^ permalink raw reply related [flat|nested] 8+ messages in thread