* [PATCH net 0/7] netfilter: updates for net
@ 2025-09-10 19:03 Florian Westphal
2025-09-10 19:03 ` [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Florian Westphal
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Hi,
The following patchset contains Netfilter fixes for *net*:
WARNING: This results in a conflict on net -> net-next merge.
Merge resolution walkthrough is at the end of this cover letter, see
MERGE WALKTHROUGH.
Merge branch 'mptcp-misc-fixes-for-v6-17-rc6' (2025-09-09 18:39:55 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-25-09-10-v2
for you to fetch changes up to 37a9675e61a2a2a721a28043ffdf2c8ec81eba37:
MAINTAINERS: add Phil as netfilter reviewer (2025-09-10 20:32:46 +0200)
First patch adds a lockdep annotation for a false-positive splat.
Last patch adds formal reviewer tag for Phil Sutter to MAINTAINERS.
Rest of the patches resolve spurious false negative results during set
lookups while another CPU is processing a transaction.
This has been broken at least since v4.18 when an unconditional
synchronize_rcu call was removed from the commit phase of nf_tables.
Quoting from Stefan Hanreichs original report:
It seems like we've found an issue with atomicity when reloading
nftables rulesets. Sometimes there is a small window where rules
containing sets do not seem to apply to incoming traffic, due to the set
apparently being empty for a short amount of time when flushing / adding
elements.
Exanple ruleset:
table ip filter {
set match {
type ipv4_addr
flags interval
elements = { 0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
}
chain pre {
type filter hook prerouting priority filter; policy accept;
ip saddr @match accept
counter comment "must never match"
}
}
Reproducer transaction:
while true:
nft -f -<<EOF
flush set ip filter match
create element ip filter match { \
0.0.0.0-192.168.2.19, 192.168.2.21-255.255.255.255 }
EOF
done
Then create traffic. to/from e.g. 192.168.2.1 to 192.168.3.10.
Once in a while the counter will increment even though the
'ip saddr @match' rule should have accepted the packet.
See individual patches for details.
Thanks to Stefan Hanreich for an initial description and reproducer for
this bug and to Pablo Neira Ayuso for reviewing earlier iterations of
the patchset.
Florian Westphal (7):
netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
netfilter: nft_set_rbtree: continue traversal if element is inactive
netfilter: nf_tables: place base_seq in struct net
netfilter: nf_tables: make nft_set_do_lookup available unconditionally
netfilter: nf_tables: restart set lookup on base_seq change
MAINTAINERS: add Phil as netfilter reviewer
MAINTAINERS | 1 +
include/net/netfilter/nf_tables.h | 1 -
include/net/netfilter/nf_tables_core.h | 10 +---
include/net/netns/nftables.h | 1 +
net/netfilter/nf_tables_api.c | 66 +++++++++++++-------------
net/netfilter/nft_lookup.c | 46 ++++++++++++++++--
net/netfilter/nft_set_bitmap.c | 3 +-
net/netfilter/nft_set_pipapo.c | 20 +++++++-
net/netfilter/nft_set_pipapo_avx2.c | 4 +-
net/netfilter/nft_set_rbtree.c | 6 +--
10 files changed, 103 insertions(+), 55 deletions(-)
MERGE WALKTHROUGH:
When merging this to net-next, you should see following:
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo.c
CONFLICT (content): Merge conflict in net/netfilter/nft_set_pipapo_avx2.c
Instructions for net/netfilter/nft_set_pipapo.c:
@@@ -562,7 -539,7 +578,11 @@@ nft_pipapo_lookup(const struct net *net
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
++<<<<<<< HEAD
+ e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
++=======
+ e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
return e ? &e->ext : NULL;
}
Take the HEAD chunk, with 'genmask' replaced by NFT_GENMASK_ANY, i.e.:
e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
Instructions for net/netfilter/nft_set_pipapo_avx2.c:
++<<<<<<< HEAD
++=======
+ const struct nft_pipapo_match *m;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk, i.e. delete 'const struct nft_pipapo_match *m;':
In -next, this is passed as function argument.
++<<<<<<< HEAD
+ if (ret < 0) {
+ scratch->map_index = map_index;
+ kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return NULL;
++=======
+ if (ret < 0)
+ goto out;
+
+ if (last) {
+ const struct nft_set_ext *e = &f->mt[ret].e->ext;
+
+ if (unlikely(nft_set_elem_expired(e)))
+ goto next_match;
+
+ ext = e;
+ goto out;
++>>>>>>> 352fd037254683c940630a6c5c8aa8c8ca38ae88
Take the HEAD chunk and discard the other; including if (last) { branch.
Then, in nft_pipapo_avx2_lookup(), make this change:
@@ -1274,9 +1273,8 @@
nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1292,9 +1290,9 @@
}
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
After this change, you are done.
The expected diff vs the net-next main branch in these two files is:
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -549,6 +549,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
*
* This function is called from the data path. It will search for
* an element matching the given key in the current active copy.
+ * Unlike other set types, this uses NFT_GENMASK_ANY instead of
+ * nft_genmask_cur().
[trimmed rest of comment]
*
* Return: ntables API extension pointer or NULL if no match.
*/
@@ -557,12 +574,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
- e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
+ e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL;
}
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1275,7 +1275,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const u8 *rp = (const u8 *)key;
const struct nft_pipapo_elem *e;
@@ -1293,7 +1292,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
m = rcu_dereference(priv->match);
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
+ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
return e ? &e->ext : NULL;
--
2.49.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-11 2:40 ` patchwork-bot+netdevbpf
2025-09-10 19:03 ` [PATCH net 2/7] netfilter: nft_set_pipapo: don't check genbit from packetpath lookups Florian Westphal
` (6 subsequent siblings)
7 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Running new 'set_flush_add_atomic_bitmap' test case for nftables.git
with CONFIG_PROVE_RCU_LIST=y yields:
net/netfilter/nft_set_bitmap.c:231 RCU-list traversed in non-reader section!!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by nft/4008:
#0: ffff888147f79cd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x2f/0xd0
lockdep_rcu_suspicious+0x116/0x160
nft_bitmap_walk+0x22d/0x240
nf_tables_delsetelem+0x1010/0x1a00
..
This is a false positive, the list cannot be altered while the
transaction mutex is held, so pass the relevant argument to the iterator.
Fixes tag intentionally wrong; no point in picking this up if earlier
false-positive-fixups were not applied.
Fixes: 28b7a6b84c0a ("netfilter: nf_tables: avoid false-positive lockdep splats in set walker")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_bitmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nft_set_bitmap.c b/net/netfilter/nft_set_bitmap.c
index c24c922f895d..8d3f040a904a 100644
--- a/net/netfilter/nft_set_bitmap.c
+++ b/net/netfilter/nft_set_bitmap.c
@@ -226,7 +226,8 @@ static void nft_bitmap_walk(const struct nft_ctx *ctx,
const struct nft_bitmap *priv = nft_set_priv(set);
struct nft_bitmap_elem *be;
- list_for_each_entry_rcu(be, &priv->list, head) {
+ list_for_each_entry_rcu(be, &priv->list, head,
+ lockdep_is_held(&nft_pernet(ctx->net)->commit_mutex)) {
if (iter->count < iter->skip)
goto cont;
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 2/7] netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
2025-09-10 19:03 ` [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-10 19:03 ` [PATCH net 3/7] netfilter: nft_set_rbtree: continue traversal if element is inactive Florian Westphal
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
The pipapo set type is special in that it has two copies of its
datastructure: one live copy containing only valid elements and one
on-demand clone used during transaction where adds/deletes happen.
This clone is not visible to the datapath.
This is unlike all other set types in nftables, those all link new
elements into their live hlist/tree.
For those sets, the lookup functions must skip the new elements while the
transaction is ongoing to ensure consistency.
As the clone is shallow, removal does have an effect on the packet path:
once the transaction enters the commit phase the 'gencursor' bit that
determines which elements are active and which elements should be ignored
(because they are no longer valid) is flipped.
This causes the datapath lookup to ignore these elements if they are found
during lookup.
This opens up a small race window where pipapo has an inconsistent view of
the dataset from when the transaction-cpu flipped the genbit until the
transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
removes elements from the clone so
they won't be found anymore
B) lookup in datastructure
can't see new elements yet,
but old elements are ignored
-> Only matches elements that
were not changed in the
transaction
II) calls nft_pipapo_commit(), clone
and live pointers are swapped.
C New nft_lookup happening now
will find matching elements.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed.
At the same time, r1-r3 is not visible yet, because it can only be found
in the clone.
The situation persists for all lookups until after cpu0 hits II).
The fix is easy: Don't check the genbit from pipapo lookup functions.
This is possible because unlike the other set types, the new elements are
not reachable from the live copy of the dataset.
The clone/live pointer swap is enough to avoid matching on old elements
while at the same time all new elements are exposed in one go.
After this change, step B above returns a match in r1-r2.
This is fine: r1-r2 only becomes truly invalid the moment they get freed.
This happens after a synchronize_rcu() call and rcu read lock is held
via netfilter hook traversal (nf_hook_slow()).
Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_pipapo.c | 20 ++++++++++++++++++--
net/netfilter/nft_set_pipapo_avx2.c | 4 +---
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 9a10251228fd..793790d79d13 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -510,6 +510,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
*
* This function is called from the data path. It will search for
* an element matching the given key in the current active copy.
+ * Unlike other set types, this uses NFT_GENMASK_ANY instead of
+ * nft_genmask_cur().
+ *
+ * This is because new (future) elements are not reachable from
+ * priv->match, they get added to priv->clone instead.
+ * When the commit phase flips the generation bitmask, the
+ * 'now old' entries are skipped but without the 'now current'
+ * elements becoming visible. Using nft_genmask_cur() thus creates
+ * inconsistent state: matching old entries get skipped but thew
+ * newly matching entries are unreachable.
+ *
+ * GENMASK will still find the 'now old' entries which ensures consistent
+ * priv->match view.
+ *
+ * nft_pipapo_commit swaps ->clone and ->match shortly after the
+ * genbit flip. As ->clone doesn't contain the old entries in the first
+ * place, lookup will only find the now-current ones.
*
* Return: ntables API extension pointer or NULL if no match.
*/
@@ -518,12 +535,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
- e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64());
+ e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL;
}
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 2f090e253caf..c0884fa68c79 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1152,7 +1152,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
struct nft_pipapo *priv = nft_set_priv(set);
const struct nft_set_ext *ext = NULL;
struct nft_pipapo_scratch *scratch;
- u8 genmask = nft_genmask_cur(net);
const struct nft_pipapo_match *m;
const struct nft_pipapo_field *f;
const u8 *rp = (const u8 *)key;
@@ -1248,8 +1247,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
if (last) {
const struct nft_set_ext *e = &f->mt[ret].e->ext;
- if (unlikely(nft_set_elem_expired(e) ||
- !nft_set_elem_active(e, genmask)))
+ if (unlikely(nft_set_elem_expired(e)))
goto next_match;
ext = e;
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 3/7] netfilter: nft_set_rbtree: continue traversal if element is inactive
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
2025-09-10 19:03 ` [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Florian Westphal
2025-09-10 19:03 ` [PATCH net 2/7] netfilter: nft_set_pipapo: don't check genbit from packetpath lookups Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-10 19:03 ` [PATCH net 4/7] netfilter: nf_tables: place base_seq in struct net Florian Westphal
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
When the rbtree lookup function finds a match in the rbtree, it sets the
range start interval to a potentially inactive element.
Then, after tree lookup, if the matching element is inactive, it returns
NULL and suppresses a matching result.
This is wrong and leads to false negative matches when a transaction has
already entered the commit phase.
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
I) increments the genbit
A) observes new genbit
B) finds matching range
C) returns no match: found
range invalid in new generation
II) removes old elements from the tree
C New nft_lookup happening now
will find matching element,
because it is no longer
obscured by old, inactive one.
Consider a packet matching range r1-r2:
cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3
P is contained in both ranges. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed. It does NOT test for further matches.
The situation persists for all lookups until after cpu0 hits II) after
which r1-r3 range start node is tested for the first time.
Move the "interval start is valid" check ahead so that tree traversal
continues if the starting interval is not valid in this generation.
Thanks to Stefan Hanreich for providing an initial reproducer for this
bug.
Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
Fixes: c1eda3c6394f ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_rbtree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 938a257c069e..b1f04168ec93 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -77,7 +77,9 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
nft_rbtree_interval_end(rbe) &&
nft_rbtree_interval_start(interval))
continue;
- interval = rbe;
+ if (nft_set_elem_active(&rbe->ext, genmask) &&
+ !nft_rbtree_elem_expired(rbe))
+ interval = rbe;
} else if (d > 0)
parent = rcu_dereference_raw(parent->rb_right);
else {
@@ -102,8 +104,6 @@ __nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
}
if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
- nft_set_elem_active(&interval->ext, genmask) &&
- !nft_rbtree_elem_expired(interval) &&
nft_rbtree_interval_start(interval))
return &interval->ext;
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 4/7] netfilter: nf_tables: place base_seq in struct net
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
` (2 preceding siblings ...)
2025-09-10 19:03 ` [PATCH net 3/7] netfilter: nft_set_rbtree: continue traversal if element is inactive Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-10 19:03 ` [PATCH net 5/7] netfilter: nf_tables: make nft_set_do_lookup available unconditionally Florian Westphal
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
This will soon be read from packet path around same time as the gencursor.
Both gencursor and base_seq get incremented almost at the same time, so
it makes sense to place them in the same structure.
This doesn't increase struct net size on 64bit due to padding.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables.h | 1 -
include/net/netns/nftables.h | 1 +
net/netfilter/nf_tables_api.c | 65 ++++++++++++++++---------------
3 files changed, 34 insertions(+), 33 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 891e43a01bdc..3faa80f5d801 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1912,7 +1912,6 @@ struct nftables_pernet {
struct mutex commit_mutex;
u64 table_handle;
u64 tstamp;
- unsigned int base_seq;
unsigned int gc_seq;
u8 validate_state;
struct work_struct destroy_work;
diff --git a/include/net/netns/nftables.h b/include/net/netns/nftables.h
index cc8060c017d5..99dd166c5d07 100644
--- a/include/net/netns/nftables.h
+++ b/include/net/netns/nftables.h
@@ -3,6 +3,7 @@
#define _NETNS_NFTABLES_H_
struct netns_nftables {
+ unsigned int base_seq;
u8 gencursor;
};
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c1082de09656..9518b50695ba 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1131,11 +1131,14 @@ nf_tables_chain_type_lookup(struct net *net, const struct nlattr *nla,
return ERR_PTR(-ENOENT);
}
-static __be16 nft_base_seq(const struct net *net)
+static unsigned int nft_base_seq(const struct net *net)
{
- struct nftables_pernet *nft_net = nft_pernet(net);
+ return READ_ONCE(net->nft.base_seq);
+}
- return htons(nft_net->base_seq & 0xffff);
+static __be16 nft_base_seq_be16(const struct net *net)
+{
+ return htons(nft_base_seq(net) & 0xffff);
}
static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = {
@@ -1155,7 +1158,7 @@ static int nf_tables_fill_table_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq,
nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event),
- flags, family, NFNETLINK_V0, nft_base_seq(net));
+ flags, family, NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -1248,7 +1251,7 @@ static int nf_tables_dump_tables(struct sk_buff *skb,
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (family != NFPROTO_UNSPEC && family != table->family)
@@ -2030,7 +2033,7 @@ static int nf_tables_fill_chain_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq,
nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event),
- flags, family, NFNETLINK_V0, nft_base_seq(net));
+ flags, family, NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -2133,7 +2136,7 @@ static int nf_tables_dump_chains(struct sk_buff *skb,
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (family != NFPROTO_UNSPEC && family != table->family)
@@ -3671,7 +3674,7 @@ static int nf_tables_fill_rule_info(struct sk_buff *skb, struct net *net,
u16 type = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
nlh = nfnl_msg_put(skb, portid, seq, type, flags, family, NFNETLINK_V0,
- nft_base_seq(net));
+ nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -3839,7 +3842,7 @@ static int nf_tables_dump_rules(struct sk_buff *skb,
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (family != NFPROTO_UNSPEC && family != table->family)
@@ -4050,7 +4053,7 @@ static int nf_tables_getrule_reset(struct sk_buff *skb,
buf = kasprintf(GFP_ATOMIC, "%.*s:%u",
nla_len(nla[NFTA_RULE_TABLE]),
(char *)nla_data(nla[NFTA_RULE_TABLE]),
- nft_net->base_seq);
+ nft_base_seq(net));
audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1,
AUDIT_NFT_OP_RULE_RESET, GFP_ATOMIC);
kfree(buf);
@@ -4887,7 +4890,7 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx,
nlh = nfnl_msg_put(skb, portid, seq,
nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event),
flags, ctx->family, NFNETLINK_V0,
- nft_base_seq(ctx->net));
+ nft_base_seq_be16(ctx->net));
if (!nlh)
goto nla_put_failure;
@@ -5032,7 +5035,7 @@ static int nf_tables_dump_sets(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (ctx->family != NFPROTO_UNSPEC &&
@@ -6209,7 +6212,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (dump_ctx->ctx.family != NFPROTO_UNSPEC &&
@@ -6238,7 +6241,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
seq = cb->nlh->nlmsg_seq;
nlh = nfnl_msg_put(skb, portid, seq, event, NLM_F_MULTI,
- table->family, NFNETLINK_V0, nft_base_seq(net));
+ table->family, NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -6331,7 +6334,7 @@ static int nf_tables_fill_setelem_info(struct sk_buff *skb,
event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event);
nlh = nfnl_msg_put(skb, portid, seq, event, flags, ctx->family,
- NFNETLINK_V0, nft_base_seq(ctx->net));
+ NFNETLINK_V0, nft_base_seq_be16(ctx->net));
if (!nlh)
goto nla_put_failure;
@@ -6630,7 +6633,7 @@ static int nf_tables_getsetelem_reset(struct sk_buff *skb,
}
nelems++;
}
- audit_log_nft_set_reset(dump_ctx.ctx.table, nft_net->base_seq, nelems);
+ audit_log_nft_set_reset(dump_ctx.ctx.table, nft_base_seq(info->net), nelems);
out_unlock:
rcu_read_unlock();
@@ -8381,7 +8384,7 @@ static int nf_tables_fill_obj_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq,
nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event),
- flags, family, NFNETLINK_V0, nft_base_seq(net));
+ flags, family, NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -8446,7 +8449,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (family != NFPROTO_UNSPEC && family != table->family)
@@ -8480,7 +8483,7 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
idx++;
}
if (ctx->reset && entries)
- audit_log_obj_reset(table, nft_net->base_seq, entries);
+ audit_log_obj_reset(table, nft_base_seq(net), entries);
if (rc < 0)
break;
}
@@ -8649,7 +8652,7 @@ static int nf_tables_getobj_reset(struct sk_buff *skb,
buf = kasprintf(GFP_ATOMIC, "%.*s:%u",
nla_len(nla[NFTA_OBJ_TABLE]),
(char *)nla_data(nla[NFTA_OBJ_TABLE]),
- nft_net->base_seq);
+ nft_base_seq(net));
audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1,
AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC);
kfree(buf);
@@ -8754,9 +8757,8 @@ void nft_obj_notify(struct net *net, const struct nft_table *table,
struct nft_object *obj, u32 portid, u32 seq, int event,
u16 flags, int family, int report, gfp_t gfp)
{
- struct nftables_pernet *nft_net = nft_pernet(net);
char *buf = kasprintf(gfp, "%s:%u",
- table->name, nft_net->base_seq);
+ table->name, nft_base_seq(net));
audit_log_nfcfg(buf,
family,
@@ -9442,7 +9444,7 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net,
nlh = nfnl_msg_put(skb, portid, seq,
nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event),
- flags, family, NFNETLINK_V0, nft_base_seq(net));
+ flags, family, NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
@@ -9511,7 +9513,7 @@ static int nf_tables_dump_flowtable(struct sk_buff *skb,
rcu_read_lock();
nft_net = nft_pernet(net);
- cb->seq = READ_ONCE(nft_net->base_seq);
+ cb->seq = nft_base_seq(net);
list_for_each_entry_rcu(table, &nft_net->tables, list) {
if (family != NFPROTO_UNSPEC && family != table->family)
@@ -9696,17 +9698,16 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable)
static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net,
u32 portid, u32 seq)
{
- struct nftables_pernet *nft_net = nft_pernet(net);
struct nlmsghdr *nlh;
char buf[TASK_COMM_LEN];
int event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWGEN);
nlh = nfnl_msg_put(skb, portid, seq, event, 0, AF_UNSPEC,
- NFNETLINK_V0, nft_base_seq(net));
+ NFNETLINK_V0, nft_base_seq_be16(net));
if (!nlh)
goto nla_put_failure;
- if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_net->base_seq)) ||
+ if (nla_put_be32(skb, NFTA_GEN_ID, htonl(nft_base_seq(net))) ||
nla_put_be32(skb, NFTA_GEN_PROC_PID, htonl(task_pid_nr(current))) ||
nla_put_string(skb, NFTA_GEN_PROC_NAME, get_task_comm(buf, current)))
goto nla_put_failure;
@@ -10968,11 +10969,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
* Bump generation counter, invalidate any dump in progress.
* Cannot fail after this point.
*/
- base_seq = READ_ONCE(nft_net->base_seq);
+ base_seq = nft_base_seq(net);
while (++base_seq == 0)
;
- WRITE_ONCE(nft_net->base_seq, base_seq);
+ WRITE_ONCE(net->nft.base_seq, base_seq);
gc_seq = nft_gc_seq_begin(nft_net);
@@ -11181,7 +11182,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
nft_commit_notify(net, NETLINK_CB(skb).portid);
nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
- nf_tables_commit_audit_log(&adl, nft_net->base_seq);
+ nf_tables_commit_audit_log(&adl, nft_base_seq(net));
nft_gc_seq_end(nft_net, gc_seq);
nft_net->validate_state = NFT_VALIDATE_SKIP;
@@ -11506,7 +11507,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid)
mutex_lock(&nft_net->commit_mutex);
nft_net->tstamp = get_jiffies_64();
- genid_ok = genid == 0 || nft_net->base_seq == genid;
+ genid_ok = genid == 0 || nft_base_seq(net) == genid;
if (!genid_ok)
mutex_unlock(&nft_net->commit_mutex);
@@ -12143,7 +12144,7 @@ static int __net_init nf_tables_init_net(struct net *net)
INIT_LIST_HEAD(&nft_net->module_list);
INIT_LIST_HEAD(&nft_net->notify_list);
mutex_init(&nft_net->commit_mutex);
- nft_net->base_seq = 1;
+ net->nft.base_seq = 1;
nft_net->gc_seq = 0;
nft_net->validate_state = NFT_VALIDATE_SKIP;
INIT_WORK(&nft_net->destroy_work, nf_tables_trans_destroy_work);
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 5/7] netfilter: nf_tables: make nft_set_do_lookup available unconditionally
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
` (3 preceding siblings ...)
2025-09-10 19:03 ` [PATCH net 4/7] netfilter: nf_tables: place base_seq in struct net Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-10 19:03 ` [PATCH net 6/7] netfilter: nf_tables: restart set lookup on base_seq change Florian Westphal
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
This function was added for retpoline mitigation and is replaced by a
static inline helper if mitigations are not enabled.
Enable this helper function unconditionally so next patch can add a lookup
restart mechanism to fix possible false negatives while transactions are
in progress.
Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would
then need the same copypaste loop.
This patch is separate to ease review of the actual bug fix.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/net/netfilter/nf_tables_core.h | 10 ++--------
net/netfilter/nft_lookup.c | 17 ++++++++++++-----
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 6c2f483d9828..656e784714f3 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -109,17 +109,11 @@ nft_hash_lookup_fast(const struct net *net, const struct nft_set *set,
const struct nft_set_ext *
nft_hash_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
+#endif
+
const struct nft_set_ext *
nft_set_do_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
-#else
-static inline const struct nft_set_ext *
-nft_set_do_lookup(const struct net *net, const struct nft_set *set,
- const u32 *key)
-{
- return set->ops->lookup(net, set, key);
-}
-#endif
/* called from nft_pipapo_avx2.c */
const struct nft_set_ext *
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 40c602ffbcba..2c6909bf1b40 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -24,11 +24,11 @@ struct nft_lookup {
struct nft_set_binding binding;
};
-#ifdef CONFIG_MITIGATION_RETPOLINE
-const struct nft_set_ext *
-nft_set_do_lookup(const struct net *net, const struct nft_set *set,
- const u32 *key)
+static const struct nft_set_ext *
+__nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+ const u32 *key)
{
+#ifdef CONFIG_MITIGATION_RETPOLINE
if (set->ops == &nft_set_hash_fast_type.ops)
return nft_hash_lookup_fast(net, set, key);
if (set->ops == &nft_set_hash_type.ops)
@@ -51,10 +51,17 @@ nft_set_do_lookup(const struct net *net, const struct nft_set *set,
return nft_rbtree_lookup(net, set, key);
WARN_ON_ONCE(1);
+#endif
return set->ops->lookup(net, set, key);
}
+
+const struct nft_set_ext *
+nft_set_do_lookup(const struct net *net, const struct nft_set *set,
+ const u32 *key)
+{
+ return __nft_set_do_lookup(net, set, key);
+}
EXPORT_SYMBOL_GPL(nft_set_do_lookup);
-#endif
void nft_lookup_eval(const struct nft_expr *expr,
struct nft_regs *regs,
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 6/7] netfilter: nf_tables: restart set lookup on base_seq change
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
` (4 preceding siblings ...)
2025-09-10 19:03 ` [PATCH net 5/7] netfilter: nf_tables: make nft_set_do_lookup available unconditionally Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-10 19:03 ` [PATCH net 7/7] MAINTAINERS: add Phil as netfilter reviewer Florian Westphal
2025-09-11 7:16 ` [PATCH net 0/7] netfilter: updates for net: manual merge Matthieu Baerts
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
The hash, hash_fast, rhash and bitwise sets may indicate no result even
though a matching element exists during a short time window while other
cpu is finalizing the transaction.
This happens when the hash lookup/bitwise lookup function has picked up
the old genbit, right before it was toggled by nf_tables_commit(), but
then the same cpu managed to unlink the matching old element from the
hash table:
cpu0 cpu1
has added new elements to clone
has marked elements as being
inactive in new generation
perform lookup in the set
enters commit phase:
A) observes old genbit
increments base_seq
I) increments the genbit
II) removes old element from the set
B) finds matching element
C) returns no match: found
element is not valid in old
generation
Next lookup observes new genbit and
finds matching e2.
Consider a packet matching element e1, e2.
cpu0 processes following transaction:
1. remove e1
2. adds e2, which has same key as e1.
P matches both e1 and e2. Therefore, cpu1 should always find a match
for P. Due to above race, this is not the case:
cpu1 observed the old genbit. e2 will not be considered once it is found.
The element e1 is not found anymore if cpu0 managed to unlink it from the
hlist before cpu1 found it during list traversal.
The situation only occurs for a brief time period, lookups happening
after I) observe new genbit and return e2.
This problem exists in all set types except nft_set_pipapo, so fix it once
in nft_lookup rather than each set ops individually.
Sample the base sequence counter, which gets incremented right before the
genbit is changed.
Then, if no match is found, retry the lookup if the base sequence was
altered in between.
If the base sequence hasn't changed:
- No update took place: no-match result is expected.
This is the common case. or:
- nf_tables_commit() hasn't progressed to genbit update yet.
Old elements were still visible and nomatch result is expected, or:
- nf_tables_commit updated the genbit:
We picked up the new base_seq, so the lookup function also picked
up the new genbit, no-match result is expected.
If the old genbit was observed, then nft_lookup also picked up the old
base_seq: nft_lookup_should_retry() returns true and relookup is performed
in the new generation.
This problem was added when the unconditional synchronize_rcu() call
that followed the current/next generation bit toggle was removed.
Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
patchset, for suggesting re-use of existing base_seq and placement of
the restart loop in nft_set_do_lookup().
Fixes: 0cbc06b3faba ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_tables_api.c | 3 ++-
net/netfilter/nft_lookup.c | 31 ++++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9518b50695ba..c3c73411c40c 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -10973,7 +10973,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
while (++base_seq == 0)
;
- WRITE_ONCE(net->nft.base_seq, base_seq);
+ /* pairs with smp_load_acquire in nft_lookup_eval */
+ smp_store_release(&net->nft.base_seq, base_seq);
gc_seq = nft_gc_seq_begin(nft_net);
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 2c6909bf1b40..58c5b14889c4 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -55,11 +55,40 @@ __nft_set_do_lookup(const struct net *net, const struct nft_set *set,
return set->ops->lookup(net, set, key);
}
+static unsigned int nft_base_seq(const struct net *net)
+{
+ /* pairs with smp_store_release() in nf_tables_commit() */
+ return smp_load_acquire(&net->nft.base_seq);
+}
+
+static bool nft_lookup_should_retry(const struct net *net, unsigned int seq)
+{
+ return unlikely(seq != nft_base_seq(net));
+}
+
const struct nft_set_ext *
nft_set_do_lookup(const struct net *net, const struct nft_set *set,
const u32 *key)
{
- return __nft_set_do_lookup(net, set, key);
+ const struct nft_set_ext *ext;
+ unsigned int base_seq;
+
+ do {
+ base_seq = nft_base_seq(net);
+
+ ext = __nft_set_do_lookup(net, set, key);
+ if (ext)
+ break;
+ /* No match? There is a small chance that lookup was
+ * performed in the old generation, but nf_tables_commit()
+ * already unlinked a (matching) element.
+ *
+ * We need to repeat the lookup to make sure that we didn't
+ * miss a matching element in the new generation.
+ */
+ } while (nft_lookup_should_retry(net, base_seq));
+
+ return ext;
}
EXPORT_SYMBOL_GPL(nft_set_do_lookup);
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net 7/7] MAINTAINERS: add Phil as netfilter reviewer
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
` (5 preceding siblings ...)
2025-09-10 19:03 ` [PATCH net 6/7] netfilter: nf_tables: restart set lookup on base_seq change Florian Westphal
@ 2025-09-10 19:03 ` Florian Westphal
2025-09-11 7:16 ` [PATCH net 0/7] netfilter: updates for net: manual merge Matthieu Baerts
7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2025-09-10 19:03 UTC (permalink / raw)
To: netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo
Phil has contributed to netfilter with features, fixes and patch reviews
for a long time. Make this more formal and add Reviewer tag.
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 2df02e4374ed..ba11421c33e5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17480,6 +17480,7 @@ NETFILTER
M: Pablo Neira Ayuso <pablo@netfilter.org>
M: Jozsef Kadlecsik <kadlec@netfilter.org>
M: Florian Westphal <fw@strlen.de>
+R: Phil Sutter <phil@nwl.cc>
L: netfilter-devel@vger.kernel.org
L: coreteam@netfilter.org
S: Maintained
--
2.49.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
2025-09-10 19:03 ` [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Florian Westphal
@ 2025-09-11 2:40 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-11 2:40 UTC (permalink / raw)
To: Florian Westphal
Cc: netdev, pabeni, davem, edumazet, kuba, netfilter-devel, pablo
Hello:
This series was applied to netdev/net.git (main)
by Florian Westphal <fw@strlen.de>:
On Wed, 10 Sep 2025 21:03:02 +0200 you wrote:
> Running new 'set_flush_add_atomic_bitmap' test case for nftables.git
> with CONFIG_PROVE_RCU_LIST=y yields:
>
> net/netfilter/nft_set_bitmap.c:231 RCU-list traversed in non-reader section!!
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by nft/4008:
> #0: ffff888147f79cd8 (&nft_net->commit_mutex){+.+.}-{4:4}, at: nf_tables_valid_genid+0x2f/0xd0
>
> [...]
Here is the summary with links:
- [net,1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation
https://git.kernel.org/netdev/net/c/5e13f2c491a4
- [net,2/7] netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
https://git.kernel.org/netdev/net/c/c4eaca2e1052
- [net,3/7] netfilter: nft_set_rbtree: continue traversal if element is inactive
https://git.kernel.org/netdev/net/c/a60f7bf4a152
- [net,4/7] netfilter: nf_tables: place base_seq in struct net
https://git.kernel.org/netdev/net/c/64102d9bbc3d
- [net,5/7] netfilter: nf_tables: make nft_set_do_lookup available unconditionally
https://git.kernel.org/netdev/net/c/11fe5a82e53a
- [net,6/7] netfilter: nf_tables: restart set lookup on base_seq change
https://git.kernel.org/netdev/net/c/b2f742c846ca
- [net,7/7] MAINTAINERS: add Phil as netfilter reviewer
https://git.kernel.org/netdev/net/c/37a9675e61a2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net 0/7] netfilter: updates for net: manual merge
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
` (6 preceding siblings ...)
2025-09-10 19:03 ` [PATCH net 7/7] MAINTAINERS: add Phil as netfilter reviewer Florian Westphal
@ 2025-09-11 7:16 ` Matthieu Baerts
7 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2025-09-11 7:16 UTC (permalink / raw)
To: Florian Westphal, netdev
Cc: Paolo Abeni, David S. Miller, Eric Dumazet, Jakub Kicinski,
netfilter-devel, pablo, Stephen Rothwell
[-- Attachment #1: Type: text/plain, Size: 603 bytes --]
Hi Florian,
On 10/09/2025 21:03, Florian Westphal wrote:
> Hi,
>
> The following patchset contains Netfilter fixes for *net*:
>
> WARNING: This results in a conflict on net -> net-next merge.
> Merge resolution walkthrough is at the end of this cover letter, see
> MERGE WALKTHROUGH.
Thank you for these instructions, that was very clear!
Just in case other people need that, attached is the corresponding 3-way
patch, and the rr-cache for this conflict is available there:
https://github.com/multipath-tcp/mptcp-upstream-rr-cache/commit/580515b
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
[-- Attachment #2: 4cab275179a48c3ded528b85c4daed7808a6f04c.patch --]
[-- Type: text/x-patch, Size: 3167 bytes --]
diff --cc net/netfilter/nft_set_pipapo.c
index 4b64c3bd8e70,793790d79d13..a7b8fa8cab7c
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@@ -562,7 -539,7 +578,7 @@@ nft_pipapo_lookup(const struct net *net
const struct nft_pipapo_elem *e;
m = rcu_dereference(priv->match);
- e = pipapo_get_slow(m, (const u8 *)key, genmask, get_jiffies_64());
- e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
++ e = pipapo_get_slow(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
return e ? &e->ext : NULL;
}
diff --cc net/netfilter/nft_set_pipapo_avx2.c
index 7559306d0aed,c0884fa68c79..27dab3667548
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@@ -1226,75 -1241,28 +1226,74 @@@ next_match
#undef NFT_SET_PIPAPO_AVX2_LOOKUP
- if (ret < 0)
- goto out;
-
- if (last) {
- const struct nft_set_ext *e = &f->mt[ret].e->ext;
-
- if (unlikely(nft_set_elem_expired(e)))
- goto next_match;
-
- ext = e;
- goto out;
+ if (ret < 0) {
+ scratch->map_index = map_index;
+ kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return NULL;
}
+ if (last) {
+ struct nft_pipapo_elem *e;
+
+ e = f->mt[ret].e;
+ if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
+ !nft_set_elem_active(&e->ext, genmask)))
+ goto next_match;
+
+ scratch->map_index = map_index;
+ kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return e;
+ }
+
+ map_index = !map_index;
swap(res, fill);
- rp += NFT_PIPAPO_GROUPS_PADDED_SIZE(f);
+ data += NFT_PIPAPO_GROUPS_PADDED_SIZE(f);
}
-out:
- if (i % 2)
- scratch->map_index = !map_index;
kernel_fpu_end();
+ __local_unlock_nested_bh(&scratch->bh_lock);
+ return NULL;
+}
+
+/**
+ * nft_pipapo_avx2_lookup() - Dataplane frontend for AVX2 implementation
+ * @net: Network namespace
+ * @set: nftables API set representation
+ * @key: nftables API element representation containing key data
+ *
+ * This function is called from the data path. It will search for
+ * an element matching the given key in the current active copy using
+ * the AVX2 routines if the FPU is usable or fall back to the generic
+ * implementation of the algorithm otherwise.
+ *
+ * Return: nftables API extension pointer or NULL if no match.
+ */
+const struct nft_set_ext *
+nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
+ const u32 *key)
+{
+ struct nft_pipapo *priv = nft_set_priv(set);
- u8 genmask = nft_genmask_cur(net);
+ const struct nft_pipapo_match *m;
+ const u8 *rp = (const u8 *)key;
+ const struct nft_pipapo_elem *e;
+
+ local_bh_disable();
+
+ if (unlikely(!irq_fpu_usable())) {
+ const struct nft_set_ext *ext;
+
+ ext = nft_pipapo_lookup(net, set, key);
+
+ local_bh_enable();
+ return ext;
+ }
+
+ m = rcu_dereference(priv->match);
+
- e = pipapo_get_avx2(m, rp, genmask, get_jiffies_64());
++ e = pipapo_get_avx2(m, rp, NFT_GENMASK_ANY, get_jiffies_64());
local_bh_enable();
- return ext;
+ return e ? &e->ext : NULL;
}
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-09-11 7:16 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-10 19:03 [PATCH net 0/7] netfilter: updates for net Florian Westphal
2025-09-10 19:03 ` [PATCH net 1/7] netfilter: nft_set_bitmap: fix lockdep splat due to missing annotation Florian Westphal
2025-09-11 2:40 ` patchwork-bot+netdevbpf
2025-09-10 19:03 ` [PATCH net 2/7] netfilter: nft_set_pipapo: don't check genbit from packetpath lookups Florian Westphal
2025-09-10 19:03 ` [PATCH net 3/7] netfilter: nft_set_rbtree: continue traversal if element is inactive Florian Westphal
2025-09-10 19:03 ` [PATCH net 4/7] netfilter: nf_tables: place base_seq in struct net Florian Westphal
2025-09-10 19:03 ` [PATCH net 5/7] netfilter: nf_tables: make nft_set_do_lookup available unconditionally Florian Westphal
2025-09-10 19:03 ` [PATCH net 6/7] netfilter: nf_tables: restart set lookup on base_seq change Florian Westphal
2025-09-10 19:03 ` [PATCH net 7/7] MAINTAINERS: add Phil as netfilter reviewer Florian Westphal
2025-09-11 7:16 ` [PATCH net 0/7] netfilter: updates for net: manual merge Matthieu Baerts
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).