* [PATCH nf 0/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry
@ 2026-03-18 13:24 Florian Westphal
2026-03-18 13:24 ` [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Florian Westphal
2026-03-18 13:24 ` [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug Florian Westphal
0 siblings, 2 replies; 5+ messages in thread
From: Florian Westphal @ 2026-03-18 13:24 UTC (permalink / raw)
To: netfilter-devel; +Cc: sbrivio, Florian Westphal
While adding more comprehensive tests for set transactions to
nftables I found nft cannot restore a valid set via:
(echo flush set t s; cat foo) | nft -f -
... because the avx2 functions can return a non-matching entry iff the entry
that it found in first round was expired.
Patch 1 fixes this bug and patch 2 add a test that triggers the problem.
- C implementation doesn't have this problem
- forcing 'slow' mode in avx2 by axing the actual avx2 routines
also 'fixes' this issue
- No noticeable performance differences with this patch.
- Also have an alternative fix that calls pipapo_refill OR
nft_pipapo_avx2_refill, but that diff is significantly larger,
so I picked the one that is smaller.
Florian Westphal (2):
netfilter: nft_set_pipapo_avx2: don't return non-matching entry on
expiry
selftests: netfilter: nft_concat_range.sh: add check for flush+reload
bug
net/netfilter/nft_set_pipapo_avx2.c | 20 +++---
.../net/netfilter/nft_concat_range.sh | 68 ++++++++++++++++++-
2 files changed, 77 insertions(+), 11 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
2026-03-18 13:24 [PATCH nf 0/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry Florian Westphal
@ 2026-03-18 13:24 ` Florian Westphal
2026-03-21 14:25 ` Stefano Brivio
2026-03-18 13:24 ` [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug Florian Westphal
1 sibling, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-03-18 13:24 UTC (permalink / raw)
To: netfilter-devel; +Cc: sbrivio, Florian Westphal
New test case fails unexpectedly when avx2 matching functions are used.
The test first loads a ranomly generated pipapo set
with 'ipv4 . port' key, i.e. nft -f foo.
This works. Then, it reloads the set after a flush:
(echo flush set t s; cat foo) | nft -f -
This is expected to work, because its the same set after all and it was
already loaded succesfully once.
But with avx2, this fails: nft reports a clashing element.
The reported clash is of following form:
We successfully re-inserted
a . b
c . d
Then we try to insert a . d
avx2 finds the already existing a . d, which (due to 'flush set') is marked
as invalid in the new generation. It skips the element and moves to next.
Due to incorrect masking, the skip-step finds the next matching
element *only considering the first field*,
i.e. we return the already reinserted "a . b", even though the
last field is different and the entry should not have been matched.
No such error is reported for the generic c implementation (no avx2) or when
the last field has to use the 'nft_pipapo_avx2_lookup_slow' fallback.
Bisection points to
7711f4bb4b36 ("netfilter: nft_set_pipapo: fix range overlap detection")
but that fix merely uncovers this bug.
Before this commit, the wrong element is returned, but erronously
reported as a full, identical duplicate.
The root-cause is too early return in the avx2 match functions.
When we process the last field, we should continue to process data
until the entire input size has been consumed to make sure no stale
bits remain in the map.
An alternative fix is to change the avx2 lookup functions to also
return the last 'i_ul' (map store location) and then replace:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
goto next_match;
}
With:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
if (slow)
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
else
ret = nft_pipapo_avx2_refill(i_ul, &res[i_ul], fill, f->mt, last);
fill, f->mt, last);
goto next_match;
... so that irrelvant map parts aren't considered. However, the diffstat
is significantly larger than this one.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_pipapo_avx2.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 7ff90325c97f..6395982e4d95 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -242,7 +242,7 @@ static int nft_pipapo_avx2_lookup_4b_2(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -319,7 +319,7 @@ static int nft_pipapo_avx2_lookup_4b_4(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -414,7 +414,7 @@ static int nft_pipapo_avx2_lookup_4b_8(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -505,7 +505,7 @@ static int nft_pipapo_avx2_lookup_4b_12(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -641,7 +641,7 @@ static int nft_pipapo_avx2_lookup_4b_32(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -699,7 +699,7 @@ static int nft_pipapo_avx2_lookup_8b_1(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -764,7 +764,7 @@ static int nft_pipapo_avx2_lookup_8b_2(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -839,7 +839,7 @@ static int nft_pipapo_avx2_lookup_8b_4(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -925,7 +925,7 @@ static int nft_pipapo_avx2_lookup_8b_6(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
@@ -1019,7 +1019,7 @@ static int nft_pipapo_avx2_lookup_8b_16(unsigned long *map, unsigned long *fill,
b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
if (last)
- return b;
+ ret = b;
if (unlikely(ret == -1))
ret = b / XSAVE_YMM_SIZE;
--
2.52.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
2026-03-18 13:24 [PATCH nf 0/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry Florian Westphal
2026-03-18 13:24 ` [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Florian Westphal
@ 2026-03-18 13:24 ` Florian Westphal
2026-03-21 14:25 ` Stefano Brivio
1 sibling, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2026-03-18 13:24 UTC (permalink / raw)
To: netfilter-devel; +Cc: sbrivio, Florian Westphal
This test will fail without
the preceeding commit ("netfilter: nft_set_pipapo_avx2: fix match retart if found element is expired"):
reject overlapping range on add 0s [ OK ]
reload with flush /dev/stdin:59:32-52: Error: Could not process rule: File exists
add element inet filter test { 10.0.0.29 . 10.0.2.29 }
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../net/netfilter/nft_concat_range.sh | 68 ++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
index 394166f224a4..c1ee0c2da583 100755
--- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
@@ -29,7 +29,7 @@ TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto
net6_port_net6_port net_port_mac_proto_net"
# Reported bugs, also described by TYPE_ variables below
-BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch doublecreate insert_overlap"
+BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch doublecreate insert_overlap load_flush_load4 load_flush_load8"
# List of possible paths to pktgen script from kernel tree for performance tests
PKTGEN_SCRIPT_PATHS="
@@ -432,6 +432,30 @@ race_repeat 0
perf_duration 0
"
+TYPE_load_flush_load4="
+display reload with flush, 4bit groups
+type_spec ipv4_addr . ipv4_addr
+chain_spec ip saddr . ip daddr
+dst addr4
+proto icmp
+
+race_repeat 0
+
+perf_duration 0
+"
+
+TYPE_load_flush_load8="
+display reload with flush, 8bit groups
+type_spec ipv4_addr . ipv4_addr
+chain_spec ip saddr . ip daddr
+dst addr4
+proto icmp
+
+race_repeat 0
+
+perf_duration 0
+"
+
# Set template for all tests, types and rules are filled in depending on test
set_template='
flush ruleset
@@ -1997,6 +2021,48 @@ test_bug_insert_overlap()
return 0
}
+test_bug_load_flush_load4()
+{
+ local i
+
+ setup veth send_"${proto}" set || return ${ksft_skip}
+
+ for i in $(seq 0 255); do
+ local j
+
+ for j in $(seq 0 20); do
+ echo "add element inet filter test { 10.$j.0.$i . 10.$j.1.$i }"
+ echo "add element inet filter test { 10.$j.0.$i . 10.$j.2.$i }"
+ done
+ done > "$tmp"
+
+ nft -f "$tmp" || return 1
+
+ ( echo "flush set inet filter test";cat "$tmp") | nft -f -
+ [ $? -eq 0 ] || return 1
+
+ return 0
+}
+
+test_bug_load_flush_load8()
+{
+ local i
+
+ setup veth send_"${proto}" set || return ${ksft_skip}
+
+ for i in $(seq 1 100); do
+ echo "add element inet filter test { 10.0.0.$i . 10.0.1.$i }"
+ echo "add element inet filter test { 10.0.0.$i . 10.0.2.$i }"
+ done > "$tmp"
+
+ nft -f "$tmp" || return 1
+
+ ( echo "flush set inet filter test";cat "$tmp") | nft -f -
+ [ $? -eq 0 ] || return 1
+
+ return 0
+}
+
test_reported_issues() {
eval test_bug_"${subtest}"
}
--
2.52.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
2026-03-18 13:24 ` [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Florian Westphal
@ 2026-03-21 14:25 ` Stefano Brivio
0 siblings, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2026-03-21 14:25 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Wed, 18 Mar 2026 14:24:13 +0100
Florian Westphal <fw@strlen.de> wrote:
> New test case fails unexpectedly when avx2 matching functions are used.
>
> The test first loads a ranomly generated pipapo set
> with 'ipv4 . port' key, i.e. nft -f foo.
>
> This works. Then, it reloads the set after a flush:
> (echo flush set t s; cat foo) | nft -f -
>
> This is expected to work, because its the same set after all and it was
> already loaded succesfully once.
>
> But with avx2, this fails: nft reports a clashing element.
>
> The reported clash is of following form:
>
> We successfully re-inserted
> a . b
> c . d
>
> Then we try to insert a . d
>
> avx2 finds the already existing a . d, which (due to 'flush set') is marked
> as invalid in the new generation. It skips the element and moves to next.
>
> Due to incorrect masking, the skip-step finds the next matching
> element *only considering the first field*,
>
> i.e. we return the already reinserted "a . b", even though the
> last field is different and the entry should not have been matched.
>
> No such error is reported for the generic c implementation (no avx2) or when
> the last field has to use the 'nft_pipapo_avx2_lookup_slow' fallback.
>
> Bisection points to
> 7711f4bb4b36 ("netfilter: nft_set_pipapo: fix range overlap detection")
> but that fix merely uncovers this bug.
>
> Before this commit, the wrong element is returned, but erronously
> reported as a full, identical duplicate.
>
> The root-cause is too early return in the avx2 match functions.
> When we process the last field, we should continue to process data
> until the entire input size has been consumed to make sure no stale
> bits remain in the map.
Oops, thanks for fixing this. It must have been a lot of "fun" to debug
it.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
An explanation below.
> An alternative fix is to change the avx2 lookup functions to also
> return the last 'i_ul' (map store location) and then replace:
>
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> !nft_set_elem_active(&e->ext, genmask))) {
> ret = pipapo_refill(res, f->bsize, f->rules,
> fill, f->mt, last);
> goto next_match;
> }
>
> With:
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> !nft_set_elem_active(&e->ext, genmask))) {
> if (slow)
> ret = pipapo_refill(res, f->bsize, f->rules,
> fill, f->mt, last);
> else
> ret = nft_pipapo_avx2_refill(i_ul, &res[i_ul], fill, f->mt, last);
> fill, f->mt, last);
> goto next_match;
By the way, there are some mixed spaces and tabs and one duplicate
line in these snippets, which make them a bit hard to understand, they
should be:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
goto next_match;
}
and:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
if (slow) {
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
} else {
ret = nft_pipapo_avx2_refill(i_ul, &res[i_ul],
fill, f->mt, last);
}
goto next_match;
}
> ... so that irrelvant map parts aren't considered. However, the diffstat
> is significantly larger than this one.
Right, yes, I don't think it's worth it. The extra branch might
actually make things slower.
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/netfilter/nft_set_pipapo_avx2.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
> index 7ff90325c97f..6395982e4d95 100644
> --- a/net/netfilter/nft_set_pipapo_avx2.c
> +++ b/net/netfilter/nft_set_pipapo_avx2.c
> @@ -242,7 +242,7 @@ static int nft_pipapo_avx2_lookup_4b_2(unsigned long *map, unsigned long *fill,
>
> b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
> if (last)
> - return b;
This came from this kind of optimisation:
https://pipapo.lameexcu.se/pipapo/tree/avx2.h?id=a724e8dbd67ce3d9bf5a24bd836dea4ad3a5516f#n59
if (!_mm256_testz_si256(r4, r4)) {
if (last) {
_mm256_store_si256((__m256i *)(map + i), r4);
return i;
}
ret = 0;
}
which, however, is only applied if the register with the current
matching results is all zeroes, *and* I think it's incorrect anyway as
it would only work for a 32-byte sized bucket (single iteration).
I think it's a left-over from the previous stage of implementation
where I only had 32-byte sized buckets for simplicity.
Now, we could probably reintroduce this kind of implementation on the
lines of what you suggested but a long branch like that doesn't look
promising in terms of clock cycles.
--
Stefano
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug
2026-03-18 13:24 ` [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug Florian Westphal
@ 2026-03-21 14:25 ` Stefano Brivio
0 siblings, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2026-03-21 14:25 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Wed, 18 Mar 2026 14:24:14 +0100
Florian Westphal <fw@strlen.de> wrote:
> This test will fail without
> the preceeding commit ("netfilter: nft_set_pipapo_avx2: fix match retart if found element is expired"):
>
> reject overlapping range on add 0s [ OK ]
> reload with flush /dev/stdin:59:32-52: Error: Could not process rule: File exists
> add element inet filter test { 10.0.0.29 . 10.0.2.29 }
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-21 14:25 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-18 13:24 [PATCH nf 0/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry Florian Westphal
2026-03-18 13:24 ` [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Florian Westphal
2026-03-21 14:25 ` Stefano Brivio
2026-03-18 13:24 ` [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug Florian Westphal
2026-03-21 14:25 ` Stefano Brivio
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox