public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/7] Netfilter updates for net
@ 2023-05-10  8:33 Pablo Neira Ayuso
  0 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2023-05-10  8:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

The following patchset contains Netfilter fixes for net:

1) Fix UAF when releasing netnamespace, from Florian Westphal.

2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
   from Florian Westphal.

3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.

4) Extend nft_flowtable.sh selftest to cover integration with
   ingress/egress hooks, from Florian Westphal.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-05-10

Thanks.

----------------------------------------------------------------

The following changes since commit 582dbb2cc1a0a7427840f5b1e3c65608e511b061:

  net: phy: bcm7xx: Correct read from expansion register (2023-05-09 20:25:52 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-05-10

for you to fetch changes up to 3acf8f6c14d0e42b889738d63b6d9cb63348fc94:

  selftests: nft_flowtable.sh: check ingress/egress chain too (2023-05-10 09:31:07 +0200)

----------------------------------------------------------------
netfilter pull request 23-05-10

----------------------------------------------------------------
Boris Sukholitko (4):
      selftests: nft_flowtable.sh: use /proc for pid checking
      selftests: nft_flowtable.sh: no need for ps -x option
      selftests: nft_flowtable.sh: wait for specific nc pids
      selftests: nft_flowtable.sh: monitor result file sizes

Florian Westphal (3):
      netfilter: nf_tables: always release netdev hooks from notifier
      netfilter: conntrack: fix possible bug_on with enable_hooks=1
      selftests: nft_flowtable.sh: check ingress/egress chain too

 net/netfilter/core.c                               |   6 +-
 net/netfilter/nf_conntrack_standalone.c            |   3 +-
 net/netfilter/nft_chain_filter.c                   |   9 +-
 tools/testing/selftests/netfilter/nft_flowtable.sh | 145 ++++++++++++++++++++-
 4 files changed, 151 insertions(+), 12 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net 0/7] netfilter updates for net
@ 2023-10-12  8:57 Florian Westphal
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2023-10-12  8:57 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel

Hello,

The following contains patches for your *net* tree.

Patch 1, from Pablo Neira Ayuso, fixes a performance regression
(since 6.4) when a large pending set update has to be canceled towards
the end of the transaction.

Patch 2 from myself, silences an incorrect compiler warning reported
with a few (older) compiler toolchains.

Patch 3, from Kees Cook, adds __counted_by annotation to
nft_pipapo set backend type.  I took this for net instead of -next
given infra is already in place and no actual code change is made.

Patch 4, from Pablo Neira Ayso, disables timeout resets on
stateful element reset.  The rest should only affect internal object
state, e.g. reset a quota or counter, but not affect a pending timeout.

Patches 5 and 6 fix NULL dereferences in 'inner header' match,
control plane doesn't test for netlink attribute presence before
accessing them. Broken since feature was added in 6.2, fixes from
Xingyuan Mo.

Last patch, from myself, fixes a bogus rule match when skb has
a 0-length mac header, in this case we'd fetch data from network
header instead of canceling rule evaluation.  This is a day 0 bug,
present since nftables was merged in 3.13.

The following changes since commit 50e492143374c17ad89c865a1a44837b3f5c8226:

  octeontx2-pf: Fix page pool frag allocation warning (2023-10-12 09:48:51 +0200)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-10-12

for you to fetch changes up to d351c1ea2de3e36e608fc355d8ae7d0cc80e6cd6:

  netfilter: nft_payload: fix wrong mac header matching (2023-10-12 10:28:45 +0200)

----------------------------------------------------------------
nf pull request 2023-10-12

----------------------------------------------------------------
Florian Westphal (2):
      netfilter: nfnetlink_log: silence bogus compiler warning
      netfilter: nft_payload: fix wrong mac header matching

Kees Cook (1):
      netfilter: nf_tables: Annotate struct nft_pipapo_match with __counted_by

Pablo Neira Ayuso (2):
      netfilter: nf_tables: do not remove elements if set backend implements .abort
      netfilter: nf_tables: do not refresh timeout when resetting element

Xingyuan Mo (2):
      nf_tables: fix NULL pointer dereference in nft_inner_init()
      nf_tables: fix NULL pointer dereference in nft_expr_inner_parse()

 net/netfilter/nf_tables_api.c  | 25 ++++++++++---------------
 net/netfilter/nfnetlink_log.c  |  2 +-
 net/netfilter/nft_inner.c      |  1 +
 net/netfilter/nft_payload.c    |  2 +-
 net/netfilter/nft_set_pipapo.h |  2 +-
 5 files changed, 14 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net 0/7] netfilter: updates for net
@ 2025-09-10 19:03 Florian Westphal
  0 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Hi,

The following patchset contains Netfilter fixes for *net*:

WARNING: This results in a conflict on net -> net-next merge.
Merge resolution walkthrough is at the end of this cover letter, see
MERGE WALKTHROUGH.

  Merge branch 'mptcp-misc-fixes-for-v6-17-rc6' (2025-09-09 18:39:55 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-09-10-v2

for you to fetch changes up to 37a9675e61a2a2a721a28043ffdf2c8ec81eba37:

  MAINTAINERS: add Phil as netfilter reviewer (2025-09-10 20:32:46 +0200)

First patch adds a lockdep annotation for a false-positive splat.
Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS.

Rest of the patches resolve spurious false negative results during set
lookups while another CPU is processing a transaction.

This has been broken at least since v4.18 when an unconditional
synchronize_rcu call was removed from the commit phase of nf_tables.

Quoting from Stefan Hanreichs original report:

 It seems like we've found an issue with atomicity when reloading
 nftables rulesets. Sometimes there is a small window where rules
 containing sets do not seem to apply to incoming traffic, due to the set
 apparently being empty for a short amount of time when flushing / adding
 elements.

Exanple ruleset:
table ip filter {
  set match {
    type ipv4_addr
    flags interval
    elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
  }

  chain pre {
    type filter hook prerouting priority filter; policy accept;
    ip saddr @match accept
    counter comment "must never match"
  }
}

Reproducer transaction:
while true:
nft -f -<<EOF
 flush set ip filter match
 create element ip filter match { \
    0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
EOF
done

Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10.
Once in a while the counter will increment even though the
'ip saddr @match' rule should have accepted the packet.

See individual patches for details.

Thanks to Stefan Hanreich for an initial description and reproducer for
this bug and to Pablo Neira Ayuso for reviewing earlier iterations of
the patchset.

Florian Westphal (7):
  netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
  netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
  netfilter: nft_set_rbtree: continue traversal if element is inactive
  netfilter: nf_tables: place base_seq in struct net
  netfilter: nf_tables: make nft_set_do_lookup available unconditionally
  netfilter: nf_tables: restart set lookup on base_seq change
  MAINTAINERS: add Phil as netfilter reviewer

 MAINTAINERS                            |  1 +
 include/net/netfilter/nf_tables.h      |  1 -
 include/net/netfilter/nf_tables_core.h | 10 +---
 include/net/netns/nftables.h           |  1 +
 net/netfilter/nf_tables_api.c          | 66 +++++++++++++-------------
 net/netfilter/nft_lookup.c             | 46 ++++++++++++++++--
 net/netfilter/nft_set_bitmap.c         |  3 +-
 net/netfilter/nft_set_pipapo.c         | 20 +++++++-
 net/netfilter/nft_set_pipapo_avx2.c    |  4 +-
 net/netfilter/nft_set_rbtree.c         |  6 +--
 10 files changed, 103 insertions(+), 55 deletions(-)

MERGE WALKTHROUGH:

When merging this to net-next, you should see following:

CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo.c
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo_avx2.c

Instructions for net/netfilter/nft_set_pipapo.c:

@@@ -562,7 -539,7 +578,11 @@@ nft_pipapo_lookup(const struct net *net
        const struct nft_pipapo_elem *e;
  
        m = rcu_dereference(priv->match);
++<<<<<<< HEAD
 +      e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
++=======
+       e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
  
        return e ? &e->ext : NULL;
  }

Take the HEAD chunk, with 'genmask' replaced by NFT_GENMASK_ANY, i.e.:

e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());

Instructions for net/netfilter/nft_set_pipapo_avx2.c:

++<<<<<<< HEAD
++=======
+       const struct nft_pipapo_match *m;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88

Take the HEAD chunk, i.e. delete 'const struct nft_pipapo_match *m;':
In -next, this is passed as function argument.

++<<<<<<< HEAD
 +              if (ret < 0) {
 +                      scratch->map_index = map_index;
 +                      kernel_fpu_end();
 +                      __local_unlock_nested_bh(&scratch->bh_lock);
 +                      return NULL;
++=======
+               if (ret < 0)
+                       goto out;
+ 
+               if (last) {
+                       const struct nft_set_ext *e = &f->mt[ret].e->ext;
+ 
+                       if (unlikely(nft_set_elem_expired(e)))
+                               goto next_match;
+ 
+                       ext = e;
+                       goto out;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88

Take the HEAD chunk and discard the other; including if (last) { branch.

Then, in nft_pipapo_avx2_lookup(), make this change:

@@ -1274,9 +1273,8 @@
 nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
                       const u32 *key)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
-       u8 genmask = nft_genmask_cur(net);
        const struct nft_pipapo_match *m;
        const u8 *rp = (const u8 *)key;
        const struct nft_pipapo_elem *e;

@@ -1292,9 +1290,9 @@
        }

        m = rcu_dereference(priv->match);

-       e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+       e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
        local_bh_enable();

        return e ? &e->ext : NULL;

After this change, you are done.
The expected diff vs the net-next main branch in these two files is:

--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -549,6 +549,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
  *
  * This function is called from the data path.  It will search for
  * an element matching the given key in the current active copy.
+ * Unlike other set types, this uses NFT_GENMASK_ANY instead of
+ * nft_genmask_cur().
[trimmed rest of comment]
  *
  * Return: ntables API extension pointer or NULL if no match.
  */
@@ -557,12 +574,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
                  const u32 *key)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
-       u8 genmask = nft_genmask_cur(net);
        const struct nft_pipapo_match *m;
        const struct nft_pipapo_elem *e;

        m = rcu_dereference(priv->match);
-       e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
+       e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());

        return e ? &e->ext : NULL;
 }

--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1275,7 +1275,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
                       const u32 *key)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
-       u8 genmask = nft_genmask_cur(net);
        const struct nft_pipapo_match *m;
        const u8 *rp = (const u8 *)key;
        const struct nft_pipapo_elem *e;
@@ -1293,7 +1292,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,

        m = rcu_dereference(priv->match);

-       e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+       e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
        local_bh_enable();

        return e ? &e->ext : NULL;
-- 
2.49.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net 0/7] netfilter updates for net
@ 2026-04-08 16:35 Florian Westphal
  2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Hi.

This pull requests contain netfilter fixes for the *net* tree.
I only included crash fixes, as we're closer to a release, rest will
be handled via -next.

1) Fix a NULL pointer dereference in ip_vs_add_service error path, from
   Weiming Shi, bug added in 6.2 development cycle.

2) Don't leak kernel data bytes from allocator to userspace: nfnetlink_log
   needs to init the trailing NLMSG_DONE terminator. From Xiang Mei.

3) xt_multiport match lacks range validation, bogus userspace request will
   cause out-of-bounds read. From Ren Wei.

4) ip6t_eui64 match must reject packets with invalid mac header before
   calling eth_hdr. Make existing check unconditional.  From Zhengchuan
   Liang.

5) nft_ct timeout policies are free'd via kfree() while they may still
   be reachable by other cpus that process a conntrack object that
   uses such a timeout policy.  Existing reaping of entries is not
   sufficient because it doesn't wait for a grace period.  Use kfree_rcu().
   From Tuan Do.

6/7) Make nfnetlink_queue hash table per queue.  As-is we can hit a page
   fault in case underlying page of removed element was free'd.  Per-queue
   hash prevents parallel lookups.  This comes with a test case that
   demonstrates the bug, from Fernando Fernandez Mancera.

Please, pull these changes from:
The following changes since commit f821664dde29302e8450aa0597bf1e4c7c5b0a22:

  Merge branch 'seg6-fix-dst_cache-sharing-in-seg6-lwtunnel' (2026-04-07 20:21:00 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-26-04-08

for you to fetch changes up to dde1a6084c5ca9d143a562540d5453454d79ea15:

  selftests: nft_queue.sh: add a parallel stress test (2026-04-08 13:34:51 +0200)

----------------------------------------------------------------
netfilter pull request nf-26-04-08

----------------------------------------------------------------

Fernando Fernandez Mancera (1):
  selftests: nft_queue.sh: add a parallel stress test

Florian Westphal (1):
  netfilter: nfnetlink_queue: make hash table per queue

Ren Wei (1):
  netfilter: xt_multiport: validate range encoding in checkentry

Tuan Do (1):
  netfilter: nft_ct: fix use-after-free in timeout object destroy

Weiming Shi (1):
  ipvs: fix NULL deref in ip_vs_add_service error path

Xiang Mei (1):
  netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator

Zhengchuan Liang (1):
  netfilter: ip6t_eui64: reject invalid MAC header for all packets

 include/net/netfilter/nf_conntrack_timeout.h  |   1 +
 include/net/netfilter/nf_queue.h              |   1 -
 net/ipv6/netfilter/ip6t_eui64.c               |   3 +-
 net/netfilter/ipvs/ip_vs_ctl.c                |   1 -
 net/netfilter/nfnetlink_log.c                 |   8 +-
 net/netfilter/nfnetlink_queue.c               | 139 ++++++------------
 net/netfilter/nft_ct.c                        |   2 +-
 net/netfilter/xt_multiport.c                  |  34 ++++-
 .../selftests/net/netfilter/nf_queue.c        |  50 ++++++-
 .../selftests/net/netfilter/nft_queue.sh      |  83 +++++++++--
 10 files changed, 201 insertions(+), 121 deletions(-)
-- 
2.52.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Weiming Shi <bestswngs@gmail.com>

When ip_vs_bind_scheduler() succeeds in ip_vs_add_service(), the local
variable sched is set to NULL.  If ip_vs_start_estimator() subsequently
fails, the out_err cleanup calls ip_vs_unbind_scheduler(svc, sched)
with sched == NULL.  ip_vs_unbind_scheduler() passes the cur_sched NULL
check (because svc->scheduler was set by the successful bind) but then
dereferences the NULL sched parameter at sched->done_service, causing a
kernel panic at offset 0x30 from NULL.

 Oops: general protection fault, [..] [#1] PREEMPT SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
 RIP: 0010:ip_vs_unbind_scheduler (net/netfilter/ipvs/ip_vs_sched.c:69)
 Call Trace:
  <TASK>
  ip_vs_add_service.isra.0 (net/netfilter/ipvs/ip_vs_ctl.c:1500)
  do_ip_vs_set_ctl (net/netfilter/ipvs/ip_vs_ctl.c:2809)
  nf_setsockopt (net/netfilter/nf_sockopt.c:102)
  [..]

Fix by simply not clearing the local sched variable after a successful
bind.  ip_vs_unbind_scheduler() already detects whether a scheduler is
installed via svc->scheduler, and keeping sched non-NULL ensures the
error path passes the correct pointer to both ip_vs_unbind_scheduler()
and ip_vs_scheduler_put().

While the bug is older, the problem popups in more recent kernels (6.2),
when the new error path is taken after the ip_vs_start_estimator() call.

Fixes: 705dd3444081 ("ipvs: use kthreads for stats estimation")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Acked-by: Simon Horman <horms@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 35642de2a0fe..2aaf50f52c8e 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1452,7 +1452,6 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 		ret = ip_vs_bind_scheduler(svc, sched);
 		if (ret)
 			goto out_err;
-		sched = NULL;
 	}
 
 	ret = ip_vs_start_estimator(ipvs, &svc->stats);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
  2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Xiang Mei <xmei5@asu.edu>

When batching multiple NFLOG messages (inst->qlen > 1), __nfulnl_send()
appends an NLMSG_DONE terminator with sizeof(struct nfgenmsg) payload via
nlmsg_put(), but never initializes the nfgenmsg bytes. The nlmsg_put()
helper only zeroes alignment padding after the payload, not the payload
itself, so four bytes of stale kernel heap data are leaked to userspace
in the NLMSG_DONE message body.

Use nfnl_msg_put() to build the NLMSG_DONE terminator, which initializes
the nfgenmsg payload via nfnl_fill_hdr(), consistent with how
__build_packet_message() already constructs NFULNL_MSG_PACKET headers.

Fixes: 29c5d4afba51 ("[NETFILTER]: nfnetlink_log: fix sending of multipart messages")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nfnetlink_log.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index f80978c06fa0..0db908518b2f 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -361,10 +361,10 @@ static void
 __nfulnl_send(struct nfulnl_instance *inst)
 {
 	if (inst->qlen > 1) {
-		struct nlmsghdr *nlh = nlmsg_put(inst->skb, 0, 0,
-						 NLMSG_DONE,
-						 sizeof(struct nfgenmsg),
-						 0);
+		struct nlmsghdr *nlh = nfnl_msg_put(inst->skb, 0, 0,
+						    NLMSG_DONE, 0,
+						    AF_UNSPEC, NFNETLINK_V0,
+						    htons(inst->group_num));
 		if (WARN_ONCE(!nlh, "bad nlskb size: %u, tailroom %d\n",
 			      inst->skb->len, skb_tailroom(inst->skb))) {
 			kfree_skb(inst->skb);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
  2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
  2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Ren Wei <n05ec@lzu.edu.cn>

ports_match_v1() treats any non-zero pflags entry as the start of a
port range and unconditionally consumes the next ports[] element as
the range end.

The checkentry path currently validates protocol, flags and count, but
it does not validate the range encoding itself. As a result, malformed
rules can mark the last slot as a range start or place two range starts
back to back, leaving ports_match_v1() to step past the last valid
ports[] element while interpreting the rule.

Reject malformed multiport v1 rules in checkentry by validating that
each range start has a following element and that the following element
is not itself marked as another range start.

Fixes: a89ecb6a2ef7 ("[NETFILTER]: x_tables: unify IPv4/IPv6 multiport match")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Yuhang Zheng <z1652074432@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_multiport.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/xt_multiport.c b/net/netfilter/xt_multiport.c
index 44a00f5acde8..a1691ff405d3 100644
--- a/net/netfilter/xt_multiport.c
+++ b/net/netfilter/xt_multiport.c
@@ -105,6 +105,28 @@ multiport_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	return ports_match_v1(multiinfo, ntohs(pptr[0]), ntohs(pptr[1]));
 }
 
+static bool
+multiport_valid_ranges(const struct xt_multiport_v1 *multiinfo)
+{
+	unsigned int i;
+
+	for (i = 0; i < multiinfo->count; i++) {
+		if (!multiinfo->pflags[i])
+			continue;
+
+		if (++i >= multiinfo->count)
+			return false;
+
+		if (multiinfo->pflags[i])
+			return false;
+
+		if (multiinfo->ports[i - 1] > multiinfo->ports[i])
+			return false;
+	}
+
+	return true;
+}
+
 static inline bool
 check(u_int16_t proto,
       u_int8_t ip_invflags,
@@ -127,8 +149,10 @@ static int multiport_mt_check(const struct xt_mtchk_param *par)
 	const struct ipt_ip *ip = par->entryinfo;
 	const struct xt_multiport_v1 *multiinfo = par->matchinfo;
 
-	return check(ip->proto, ip->invflags, multiinfo->flags,
-		     multiinfo->count) ? 0 : -EINVAL;
+	if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+		return -EINVAL;
+
+	return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
 }
 
 static int multiport_mt6_check(const struct xt_mtchk_param *par)
@@ -136,8 +160,10 @@ static int multiport_mt6_check(const struct xt_mtchk_param *par)
 	const struct ip6t_ip6 *ip = par->entryinfo;
 	const struct xt_multiport_v1 *multiinfo = par->matchinfo;
 
-	return check(ip->proto, ip->invflags, multiinfo->flags,
-		     multiinfo->count) ? 0 : -EINVAL;
+	if (!check(ip->proto, ip->invflags, multiinfo->flags, multiinfo->count))
+		return -EINVAL;
+
+	return multiport_valid_ranges(multiinfo) ? 0 : -EINVAL;
 }
 
 static struct xt_match multiport_mt_reg[] __read_mostly = {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
                   ` (2 preceding siblings ...)
  2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Zhengchuan Liang <zcliangcn@gmail.com>

`eui64_mt6()` derives a modified EUI-64 from the Ethernet source address
and compares it with the low 64 bits of the IPv6 source address.

The existing guard only rejects an invalid MAC header when
`par->fragoff != 0`. For packets with `par->fragoff == 0`, `eui64_mt6()`
can still reach `eth_hdr(skb)` even when the MAC header is not valid.

Fix this by removing the `par->fragoff != 0` condition so that packets
with an invalid MAC header are rejected before accessing `eth_hdr(skb)`.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Co-developed-by: Yuan Tan <yuantan098@gmail.com>
Signed-off-by: Yuan Tan <yuantan098@gmail.com>
Suggested-by: Xin Liu <bird@lzu.edu.cn>
Tested-by: Ren Wei <enjou1224z@gmail.com>
Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com>
Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv6/netfilter/ip6t_eui64.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/ip6t_eui64.c b/net/ipv6/netfilter/ip6t_eui64.c
index d704f7ed300c..da69a27e8332 100644
--- a/net/ipv6/netfilter/ip6t_eui64.c
+++ b/net/ipv6/netfilter/ip6t_eui64.c
@@ -22,8 +22,7 @@ eui64_mt6(const struct sk_buff *skb, struct xt_action_param *par)
 	unsigned char eui64[8];
 
 	if (!(skb_mac_header(skb) >= skb->head &&
-	      skb_mac_header(skb) + ETH_HLEN <= skb->data) &&
-	    par->fragoff != 0) {
+	      skb_mac_header(skb) + ETH_HLEN <= skb->data)) {
 		par->hotdrop = true;
 		return false;
 	}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
                   ` (3 preceding siblings ...)
  2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
  2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Tuan Do <tuan@calif.io>

nft_ct_timeout_obj_destroy() frees the timeout object with kfree()
immediately after nf_ct_untimeout(), without waiting for an RCU grace
period. Concurrent packet processing on other CPUs may still hold
RCU-protected references to the timeout object obtained via
rcu_dereference() in nf_ct_timeout_data().

Add an rcu_head to struct nf_ct_timeout and use kfree_rcu() to defer
freeing until after an RCU grace period, matching the approach already
used in nfnetlink_cttimeout.c.

KASAN report:
 BUG: KASAN: slab-use-after-free in nf_conntrack_tcp_packet+0x1381/0x29d0
 Read of size 4 at addr ffff8881035fe19c by task exploit/80

 Call Trace:
  nf_conntrack_tcp_packet+0x1381/0x29d0
  nf_conntrack_in+0x612/0x8b0
  nf_hook_slow+0x70/0x100
  __ip_local_out+0x1b2/0x210
  tcp_sendmsg_locked+0x722/0x1580
  __sys_sendto+0x2d8/0x320

 Allocated by task 75:
  nft_ct_timeout_obj_init+0xf6/0x290
  nft_obj_init+0x107/0x1b0
  nf_tables_newobj+0x680/0x9c0
  nfnetlink_rcv_batch+0xc29/0xe00

 Freed by task 26:
  nft_obj_destroy+0x3f/0xa0
  nf_tables_trans_destroy_work+0x51c/0x5c0
  process_one_work+0x2c4/0x5a0

Fixes: 7e0b2b57f01d ("netfilter: nft_ct: add ct timeout support")
Cc: stable@vger.kernel.org
Signed-off-by: Tuan Do <tuan@calif.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_timeout.h | 1 +
 net/netfilter/nft_ct.c                       | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_conntrack_timeout.h b/include/net/netfilter/nf_conntrack_timeout.h
index 9fdaba911de6..3a66d4abb6d6 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -14,6 +14,7 @@
 struct nf_ct_timeout {
 	__u16			l3num;
 	const struct nf_conntrack_l4proto *l4proto;
+	struct rcu_head		rcu;
 	char			data[];
 };
 
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 128ff8155b5d..04c74ccf9b84 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1020,7 +1020,7 @@ static void nft_ct_timeout_obj_destroy(const struct nft_ctx *ctx,
 	nf_queue_nf_hook_drop(ctx->net);
 	nf_ct_untimeout(ctx->net, timeout);
 	nf_ct_netns_put(ctx->net, ctx->family);
-	kfree(priv->timeout);
+	kfree_rcu(priv->timeout, rcu);
 }
 
 static int nft_ct_timeout_obj_dump(struct sk_buff *skb,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
                   ` (4 preceding siblings ...)
  2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

Sharing a global hash table among all queues is tempting, but
it can cause crash:

BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
 nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
 nfnetlink_rcv_msg+0x46a/0x930
 kmem_cache_alloc_node_noprof+0x11e/0x450

struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.

Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure.

Cc: Scott Mitchell <scott.k.mitch1@gmail.com>
Fixes: e19079adcd26 ("netfilter: nfnetlink_queue: optimize verdict lookup with hash table")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_queue.h |   1 -
 net/netfilter/nfnetlink_queue.c  | 139 +++++++++++--------------------
 2 files changed, 49 insertions(+), 91 deletions(-)

diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 45eb26b2e95b..d17035d14d96 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -23,7 +23,6 @@ struct nf_queue_entry {
 	struct nf_hook_state	state;
 	bool			nf_ct_is_unconfirmed;
 	u16			size; /* sizeof(entry) + saved route keys */
-	u16			queue_num;
 
 	/* extra space to store route keys */
 };
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 47f7f62906e2..8e02f84784da 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -49,8 +49,8 @@
 #endif
 
 #define NFQNL_QMAX_DEFAULT 1024
-#define NFQNL_HASH_MIN     1024
-#define NFQNL_HASH_MAX     1048576
+#define NFQNL_HASH_MIN     8
+#define NFQNL_HASH_MAX     32768
 
 /* We're using struct nlattr which has 16bit nla_len. Note that nla_len
  * includes the header length. Thus, the maximum packet length that we
@@ -60,29 +60,10 @@
  */
 #define NFQNL_MAX_COPY_RANGE (0xffff - NLA_HDRLEN)
 
-/* Composite key for packet lookup: (net, queue_num, packet_id) */
-struct nfqnl_packet_key {
-	possible_net_t net;
-	u32 packet_id;
-	u16 queue_num;
-} __aligned(sizeof(u32));  /* jhash2 requires 32-bit alignment */
-
-/* Global rhashtable - one for entire system, all netns */
-static struct rhashtable nfqnl_packet_map __read_mostly;
-
-/* Helper to initialize composite key */
-static inline void nfqnl_init_key(struct nfqnl_packet_key *key,
-				  struct net *net, u32 packet_id, u16 queue_num)
-{
-	memset(key, 0, sizeof(*key));
-	write_pnet(&key->net, net);
-	key->packet_id = packet_id;
-	key->queue_num = queue_num;
-}
-
 struct nfqnl_instance {
 	struct hlist_node hlist;		/* global list of queues */
-	struct rcu_head rcu;
+	struct rhashtable nfqnl_packet_map;
+	struct rcu_work	rwork;
 
 	u32 peer_portid;
 	unsigned int queue_maxlen;
@@ -106,6 +87,7 @@ struct nfqnl_instance {
 
 typedef int (*nfqnl_cmpfn)(struct nf_queue_entry *, unsigned long);
 
+static struct workqueue_struct *nfq_cleanup_wq __read_mostly;
 static unsigned int nfnl_queue_net_id __read_mostly;
 
 #define INSTANCE_BUCKETS	16
@@ -124,34 +106,10 @@ static inline u_int8_t instance_hashfn(u_int16_t queue_num)
 	return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS;
 }
 
-/* Extract composite key from nf_queue_entry for hashing */
-static u32 nfqnl_packet_obj_hashfn(const void *data, u32 len, u32 seed)
-{
-	const struct nf_queue_entry *entry = data;
-	struct nfqnl_packet_key key;
-
-	nfqnl_init_key(&key, entry->state.net, entry->id, entry->queue_num);
-
-	return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), seed);
-}
-
-/* Compare stack-allocated key against entry */
-static int nfqnl_packet_obj_cmpfn(struct rhashtable_compare_arg *arg,
-				  const void *obj)
-{
-	const struct nfqnl_packet_key *key = arg->key;
-	const struct nf_queue_entry *entry = obj;
-
-	return !net_eq(entry->state.net, read_pnet(&key->net)) ||
-	       entry->queue_num != key->queue_num ||
-	       entry->id != key->packet_id;
-}
-
 static const struct rhashtable_params nfqnl_rhashtable_params = {
 	.head_offset = offsetof(struct nf_queue_entry, hash_node),
-	.key_len = sizeof(struct nfqnl_packet_key),
-	.obj_hashfn = nfqnl_packet_obj_hashfn,
-	.obj_cmpfn = nfqnl_packet_obj_cmpfn,
+	.key_offset = offsetof(struct nf_queue_entry, id),
+	.key_len = sizeof(u32),
 	.automatic_shrinking = true,
 	.min_size = NFQNL_HASH_MIN,
 	.max_size = NFQNL_HASH_MAX,
@@ -190,6 +148,10 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 	spin_lock_init(&inst->lock);
 	INIT_LIST_HEAD(&inst->queue_list);
 
+	err = rhashtable_init(&inst->nfqnl_packet_map, &nfqnl_rhashtable_params);
+	if (err < 0)
+		goto out_free;
+
 	spin_lock(&q->instances_lock);
 	if (instance_lookup(q, queue_num)) {
 		err = -EEXIST;
@@ -210,6 +172,8 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 
 out_unlock:
 	spin_unlock(&q->instances_lock);
+	rhashtable_destroy(&inst->nfqnl_packet_map);
+out_free:
 	kfree(inst);
 	return ERR_PTR(err);
 }
@@ -217,15 +181,18 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, u32 portid)
 static void nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn,
 			unsigned long data);
 
-static void
-instance_destroy_rcu(struct rcu_head *head)
+static void instance_destroy_work(struct work_struct *work)
 {
-	struct nfqnl_instance *inst = container_of(head, struct nfqnl_instance,
-						   rcu);
+	struct nfqnl_instance *inst;
 
+	inst = container_of(to_rcu_work(work), struct nfqnl_instance,
+			    rwork);
 	rcu_read_lock();
 	nfqnl_flush(inst, NULL, 0);
 	rcu_read_unlock();
+
+	rhashtable_destroy(&inst->nfqnl_packet_map);
+
 	kfree(inst);
 	module_put(THIS_MODULE);
 }
@@ -234,7 +201,9 @@ static void
 __instance_destroy(struct nfqnl_instance *inst)
 {
 	hlist_del_rcu(&inst->hlist);
-	call_rcu(&inst->rcu, instance_destroy_rcu);
+
+	INIT_RCU_WORK(&inst->rwork, instance_destroy_work);
+	queue_rcu_work(nfq_cleanup_wq, &inst->rwork);
 }
 
 static void
@@ -250,9 +219,7 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
 	int err;
 
-	entry->queue_num = queue->queue_num;
-
-	err = rhashtable_insert_fast(&nfqnl_packet_map, &entry->hash_node,
+	err = rhashtable_insert_fast(&queue->nfqnl_packet_map, &entry->hash_node,
 				     nfqnl_rhashtable_params);
 	if (unlikely(err))
 		return err;
@@ -266,23 +233,19 @@ __enqueue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 static void
 __dequeue_entry(struct nfqnl_instance *queue, struct nf_queue_entry *entry)
 {
-	rhashtable_remove_fast(&nfqnl_packet_map, &entry->hash_node,
+	rhashtable_remove_fast(&queue->nfqnl_packet_map, &entry->hash_node,
 			       nfqnl_rhashtable_params);
 	list_del(&entry->list);
 	queue->queue_total--;
 }
 
 static struct nf_queue_entry *
-find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id,
-		   struct net *net)
+find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
 {
-	struct nfqnl_packet_key key;
 	struct nf_queue_entry *entry;
 
-	nfqnl_init_key(&key, net, id, queue->queue_num);
-
 	spin_lock_bh(&queue->lock);
-	entry = rhashtable_lookup_fast(&nfqnl_packet_map, &key,
+	entry = rhashtable_lookup_fast(&queue->nfqnl_packet_map, &id,
 				       nfqnl_rhashtable_params);
 
 	if (entry)
@@ -1531,7 +1494,7 @@ static int nfqnl_recv_verdict(struct sk_buff *skb, const struct nfnl_info *info,
 
 	verdict = ntohl(vhdr->verdict);
 
-	entry = find_dequeue_entry(queue, ntohl(vhdr->id), info->net);
+	entry = find_dequeue_entry(queue, ntohl(vhdr->id));
 	if (entry == NULL)
 		return -ENOENT;
 
@@ -1880,40 +1843,38 @@ static int __init nfnetlink_queue_init(void)
 {
 	int status;
 
-	status = rhashtable_init(&nfqnl_packet_map, &nfqnl_rhashtable_params);
-	if (status < 0)
-		return status;
+	nfq_cleanup_wq = alloc_ordered_workqueue("nfq_workqueue", 0);
+	if (!nfq_cleanup_wq)
+		return -ENOMEM;
 
 	status = register_pernet_subsys(&nfnl_queue_net_ops);
-	if (status < 0) {
-		pr_err("failed to register pernet ops\n");
-		goto cleanup_rhashtable;
-	}
+	if (status < 0)
+		goto cleanup_pernet_subsys;
 
-	netlink_register_notifier(&nfqnl_rtnl_notifier);
-	status = nfnetlink_subsys_register(&nfqnl_subsys);
-	if (status < 0) {
-		pr_err("failed to create netlink socket\n");
-		goto cleanup_netlink_notifier;
-	}
+	status = netlink_register_notifier(&nfqnl_rtnl_notifier);
+	if (status < 0)
+	       goto cleanup_rtnl_notifier;
 
 	status = register_netdevice_notifier(&nfqnl_dev_notifier);
-	if (status < 0) {
-		pr_err("failed to register netdevice notifier\n");
-		goto cleanup_netlink_subsys;
-	}
+	if (status < 0)
+		goto cleanup_dev_notifier;
+
+	status = nfnetlink_subsys_register(&nfqnl_subsys);
+	if (status < 0)
+		goto cleanup_nfqnl_subsys;
 
 	nf_register_queue_handler(&nfqh);
 
 	return status;
 
-cleanup_netlink_subsys:
-	nfnetlink_subsys_unregister(&nfqnl_subsys);
-cleanup_netlink_notifier:
+cleanup_nfqnl_subsys:
+	unregister_netdevice_notifier(&nfqnl_dev_notifier);
+cleanup_dev_notifier:
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
+cleanup_rtnl_notifier:
 	unregister_pernet_subsys(&nfnl_queue_net_ops);
-cleanup_rhashtable:
-	rhashtable_destroy(&nfqnl_packet_map);
+cleanup_pernet_subsys:
+	destroy_workqueue(nfq_cleanup_wq);
 	return status;
 }
 
@@ -1924,9 +1885,7 @@ static void __exit nfnetlink_queue_fini(void)
 	nfnetlink_subsys_unregister(&nfqnl_subsys);
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);
 	unregister_pernet_subsys(&nfnl_queue_net_ops);
-
-	rhashtable_destroy(&nfqnl_packet_map);
-
+	destroy_workqueue(nfq_cleanup_wq);
 	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test
  2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
                   ` (5 preceding siblings ...)
  2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
@ 2026-04-08 16:35 ` Florian Westphal
  6 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2026-04-08 16:35 UTC (permalink / raw)
  To: netdev
  Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netfilter-devel, pablo

From: Fernando Fernandez Mancera <fmancera@suse.de>

Introduce a new stress test to check for race conditions in the
nfnetlink_queue subsystem, where an entry is freed while another CPU is
concurrently walking the global rhashtable.

To trigger this, `nf_queue.c` is extended with two new flags:
  * -O (out-of-order): Buffers packet IDs and flushes them in reverse.
  * -b (bogus verdicts): Floods the kernel with non-existent packet IDs.

The bogus verdict loop forces the kernel's lookup function to perform
full rhashtable bucket traversals (-ENOENT). Combined with reverse-order
flushing and heavy parallel UDP/ping flooding across 8 queues, this puts
the nfnetlink_queue code under pressure.

Joint work with Florian Westphal.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../selftests/net/netfilter/nf_queue.c        | 50 +++++++++--
 .../selftests/net/netfilter/nft_queue.sh      | 83 ++++++++++++++++---
 2 files changed, 115 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/net/netfilter/nf_queue.c b/tools/testing/selftests/net/netfilter/nf_queue.c
index 116c0ca0eabb..8bbec37f5356 100644
--- a/tools/testing/selftests/net/netfilter/nf_queue.c
+++ b/tools/testing/selftests/net/netfilter/nf_queue.c
@@ -19,6 +19,8 @@ struct options {
 	bool count_packets;
 	bool gso_enabled;
 	bool failopen;
+	bool out_of_order;
+	bool bogus_verdict;
 	int verbose;
 	unsigned int queue_num;
 	unsigned int timeout;
@@ -31,7 +33,7 @@ static struct options opts;
 
 static void help(const char *p)
 {
-	printf("Usage: %s [-c|-v [-vv] ] [-o] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
+	printf("Usage: %s [-c|-v [-vv] ] [-o] [-O] [-b] [-t timeout] [-q queue_num] [-Qdst_queue ] [ -d ms_delay ] [-G]\n", p);
 }
 
 static int parse_attr_cb(const struct nlattr *attr, void *data)
@@ -275,7 +277,9 @@ static int mainloop(void)
 	unsigned int buflen = 64 * 1024 + MNL_SOCKET_BUFFER_SIZE;
 	struct mnl_socket *nl;
 	struct nlmsghdr *nlh;
+	uint32_t ooo_ids[16];
 	unsigned int portid;
+	int ooo_count = 0;
 	char *buf;
 	int ret;
 
@@ -308,6 +312,9 @@ static int mainloop(void)
 
 		ret = mnl_cb_run(buf, ret, 0, portid, queue_cb, NULL);
 		if (ret < 0) {
+			/* bogus verdict mode will generate ENOENT error messages */
+			if (opts.bogus_verdict && errno == ENOENT)
+				continue;
 			perror("mnl_cb_run");
 			exit(EXIT_FAILURE);
 		}
@@ -316,10 +323,35 @@ static int mainloop(void)
 		if (opts.delay_ms)
 			sleep_ms(opts.delay_ms);
 
-		nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
-		if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
-			perror("mnl_socket_sendto");
-			exit(EXIT_FAILURE);
+		if (opts.bogus_verdict) {
+			for (int i = 0; i < 50; i++) {
+				nlh = nfq_build_verdict(buf, id + 0x7FFFFFFF + i,
+							opts.queue_num, opts.verdict);
+				mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
+			}
+		}
+
+		if (opts.out_of_order) {
+			ooo_ids[ooo_count] = id;
+			if (ooo_count >= 15) {
+				for (ooo_count; ooo_count >= 0; ooo_count--) {
+					nlh = nfq_build_verdict(buf, ooo_ids[ooo_count],
+								opts.queue_num, opts.verdict);
+					if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+						perror("mnl_socket_sendto");
+						exit(EXIT_FAILURE);
+					}
+				}
+				ooo_count = 0;
+			} else {
+				ooo_count++;
+			}
+		} else {
+			nlh = nfq_build_verdict(buf, id, opts.queue_num, opts.verdict);
+			if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
+				perror("mnl_socket_sendto");
+				exit(EXIT_FAILURE);
+			}
 		}
 	}
 
@@ -332,7 +364,7 @@ static void parse_opts(int argc, char **argv)
 {
 	int c;
 
-	while ((c = getopt(argc, argv, "chvot:q:Q:d:G")) != -1) {
+	while ((c = getopt(argc, argv, "chvoObt:q:Q:d:G")) != -1) {
 		switch (c) {
 		case 'c':
 			opts.count_packets = true;
@@ -375,6 +407,12 @@ static void parse_opts(int argc, char **argv)
 		case 'v':
 			opts.verbose++;
 			break;
+		case 'O':
+			opts.out_of_order = true;
+			break;
+		case 'b':
+			opts.bogus_verdict = true;
+			break;
 		}
 	}
 
diff --git a/tools/testing/selftests/net/netfilter/nft_queue.sh b/tools/testing/selftests/net/netfilter/nft_queue.sh
index ea766bdc5d04..d80390848e85 100755
--- a/tools/testing/selftests/net/netfilter/nft_queue.sh
+++ b/tools/testing/selftests/net/netfilter/nft_queue.sh
@@ -11,6 +11,7 @@ ret=0
 timeout=5
 
 SCTP_TEST_TIMEOUT=60
+STRESS_TEST_TIMEOUT=30
 
 cleanup()
 {
@@ -719,6 +720,74 @@ EOF
 	fi
 }
 
+check_tainted()
+{
+	local msg="$1"
+
+	if [ "$tainted_then" -ne 0 ];then
+		return
+	fi
+
+	read tainted_now < /proc/sys/kernel/tainted
+	if [ "$tainted_now" -eq 0 ];then
+		echo "PASS: $msg"
+	else
+		echo "TAINT: $msg"
+		dmesg
+		ret=1
+	fi
+}
+
+test_queue_stress()
+{
+	read tainted_then < /proc/sys/kernel/tainted
+	local i
+
+        ip netns exec "$nsrouter" nft -f /dev/stdin <<EOF
+flush ruleset
+table inet t {
+	chain forward {
+		type filter hook forward priority 0; policy accept;
+
+		queue flags bypass to numgen random mod 8
+	}
+}
+EOF
+	timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns2" \
+		socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+	timeout "$STRESS_TEST_TIMEOUT" ip netns exec "$ns3" \
+		socat -u UDP-LISTEN:12345,fork,pf=ipv4 STDOUT > /dev/null &
+
+	for i in $(seq 0 7); do
+		ip netns exec "$nsrouter" timeout "$STRESS_TEST_TIMEOUT" \
+			./nf_queue -q $i -t 2 -O -b > /dev/null &
+	done
+
+	ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+		ping -q -f 10.0.2.99 > /dev/null 2>&1 &
+	ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+		ping -q -f 10.0.3.99 > /dev/null 2>&1 &
+	ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+		ping -q -f "dead:2::99" > /dev/null 2>&1 &
+	ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+		ping -q -f "dead:3::99" > /dev/null 2>&1 &
+
+	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns2" 12345
+	busywait "$BUSYWAIT_TIMEOUT" udp_listener_ready "$ns3" 12345
+
+	for i in $(seq 1 4);do
+		ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+			socat -u STDIN UDP-DATAGRAM:10.0.2.99:12345 < /dev/zero > /dev/null &
+		ip netns exec "$ns1" timeout "$STRESS_TEST_TIMEOUT" \
+			socat -u STDIN UDP-DATAGRAM:10.0.3.99:12345 < /dev/zero > /dev/null &
+	done
+
+	wait
+
+	check_tainted "concurrent queueing"
+}
+
 test_queue_removal()
 {
 	read tainted_then < /proc/sys/kernel/tainted
@@ -742,18 +811,7 @@ EOF
 
 	ip netns exec "$ns1" nft flush ruleset
 
-	if [ "$tainted_then" -ne 0 ];then
-		return
-	fi
-
-	read tainted_now < /proc/sys/kernel/tainted
-	if [ "$tainted_now" -eq 0 ];then
-		echo "PASS: queue program exiting while packets queued"
-	else
-		echo "TAINT: queue program exiting while packets queued"
-		dmesg
-		ret=1
-	fi
+	check_tainted "queue program exiting while packets queued"
 }
 
 ip netns exec "$nsrouter" sysctl net.ipv6.conf.all.forwarding=1 > /dev/null
@@ -799,6 +857,7 @@ test_sctp_forward
 test_sctp_output
 test_udp_nat_race
 test_udp_gro_ct
+test_queue_stress
 
 # should be last, adds vrf device in ns1 and changes routes
 test_icmp_vrf
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-04-08 16:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-08 16:35 [PATCH net 0/7] netfilter updates for net Florian Westphal
2026-04-08 16:35 ` [PATCH net 1/7] ipvs: fix NULL deref in ip_vs_add_service error path Florian Westphal
2026-04-08 16:35 ` [PATCH net 2/7] netfilter: nfnetlink_log: initialize nfgenmsg in NLMSG_DONE terminator Florian Westphal
2026-04-08 16:35 ` [PATCH net 3/7] netfilter: xt_multiport: validate range encoding in checkentry Florian Westphal
2026-04-08 16:35 ` [PATCH net 4/7] netfilter: ip6t_eui64: reject invalid MAC header for all packets Florian Westphal
2026-04-08 16:35 ` [PATCH net 5/7] netfilter: nft_ct: fix use-after-free in timeout object destroy Florian Westphal
2026-04-08 16:35 ` [PATCH net 6/7] netfilter: nfnetlink_queue: make hash table per queue Florian Westphal
2026-04-08 16:35 ` [PATCH net 7/7] selftests: nft_queue.sh: add a parallel stress test Florian Westphal
  -- strict thread matches above, loose matches on Subject: below --
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
2023-10-12  8:57 [PATCH net 0/7] netfilter " Florian Westphal
2023-05-10  8:33 [PATCH net 0/7] Netfilter " Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox