From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: <netfilter-devel@vger.kernel.org>
Subject: Re: [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry
Date: Sat, 21 Mar 2026 15:25:07 +0100 (CET) [thread overview]
Message-ID: <20260321152506.037f68c0@elisabeth> (raw)
In-Reply-To: <20260318132417.31661-2-fw@strlen.de>
On Wed, 18 Mar 2026 14:24:13 +0100
Florian Westphal <fw@strlen.de> wrote:
> New test case fails unexpectedly when avx2 matching functions are used.
>
> The test first loads a ranomly generated pipapo set
> with 'ipv4 . port' key, i.e. nft -f foo.
>
> This works. Then, it reloads the set after a flush:
> (echo flush set t s; cat foo) | nft -f -
>
> This is expected to work, because its the same set after all and it was
> already loaded succesfully once.
>
> But with avx2, this fails: nft reports a clashing element.
>
> The reported clash is of following form:
>
> We successfully re-inserted
> a . b
> c . d
>
> Then we try to insert a . d
>
> avx2 finds the already existing a . d, which (due to 'flush set') is marked
> as invalid in the new generation. It skips the element and moves to next.
>
> Due to incorrect masking, the skip-step finds the next matching
> element *only considering the first field*,
>
> i.e. we return the already reinserted "a . b", even though the
> last field is different and the entry should not have been matched.
>
> No such error is reported for the generic c implementation (no avx2) or when
> the last field has to use the 'nft_pipapo_avx2_lookup_slow' fallback.
>
> Bisection points to
> 7711f4bb4b36 ("netfilter: nft_set_pipapo: fix range overlap detection")
> but that fix merely uncovers this bug.
>
> Before this commit, the wrong element is returned, but erronously
> reported as a full, identical duplicate.
>
> The root-cause is too early return in the avx2 match functions.
> When we process the last field, we should continue to process data
> until the entire input size has been consumed to make sure no stale
> bits remain in the map.
Oops, thanks for fixing this. It must have been a lot of "fun" to debug
it.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
An explanation below.
> An alternative fix is to change the avx2 lookup functions to also
> return the last 'i_ul' (map store location) and then replace:
>
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> !nft_set_elem_active(&e->ext, genmask))) {
> ret = pipapo_refill(res, f->bsize, f->rules,
> fill, f->mt, last);
> goto next_match;
> }
>
> With:
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> !nft_set_elem_active(&e->ext, genmask))) {
> if (slow)
> ret = pipapo_refill(res, f->bsize, f->rules,
> fill, f->mt, last);
> else
> ret = nft_pipapo_avx2_refill(i_ul, &res[i_ul], fill, f->mt, last);
> fill, f->mt, last);
> goto next_match;
By the way, there are some mixed spaces and tabs and one duplicate
line in these snippets, which make them a bit hard to understand, they
should be:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
goto next_match;
}
and:
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
!nft_set_elem_active(&e->ext, genmask))) {
if (slow) {
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
} else {
ret = nft_pipapo_avx2_refill(i_ul, &res[i_ul],
fill, f->mt, last);
}
goto next_match;
}
> ... so that irrelvant map parts aren't considered. However, the diffstat
> is significantly larger than this one.
Right, yes, I don't think it's worth it. The extra branch might
actually make things slower.
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/netfilter/nft_set_pipapo_avx2.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
> index 7ff90325c97f..6395982e4d95 100644
> --- a/net/netfilter/nft_set_pipapo_avx2.c
> +++ b/net/netfilter/nft_set_pipapo_avx2.c
> @@ -242,7 +242,7 @@ static int nft_pipapo_avx2_lookup_4b_2(unsigned long *map, unsigned long *fill,
>
> b = nft_pipapo_avx2_refill(i_ul, &map[i_ul], fill, f->mt, last);
> if (last)
> - return b;
This came from this kind of optimisation:
https://pipapo.lameexcu.se/pipapo/tree/avx2.h?id=a724e8dbd67ce3d9bf5a24bd836dea4ad3a5516f#n59
if (!_mm256_testz_si256(r4, r4)) {
if (last) {
_mm256_store_si256((__m256i *)(map + i), r4);
return i;
}
ret = 0;
}
which, however, is only applied if the register with the current
matching results is all zeroes, *and* I think it's incorrect anyway as
it would only work for a 32-byte sized bucket (single iteration).
I think it's a left-over from the previous stage of implementation
where I only had 32-byte sized buckets for simplicity.
Now, we could probably reintroduce this kind of implementation on the
lines of what you suggested but a long branch like that doesn't look
promising in terms of clock cycles.
--
Stefano
next prev parent reply other threads:[~2026-03-21 14:25 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-18 13:24 [PATCH nf 0/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry Florian Westphal
2026-03-18 13:24 ` [PATCH nf 1/2] netfilter: nft_set_pipapo_avx2: don't return non-matching entry on expiry Florian Westphal
2026-03-21 14:25 ` Stefano Brivio [this message]
2026-03-18 13:24 ` [PATCH nf 2/2] selftests: netfilter: nft_concat_range.sh: add check for flush+reload bug Florian Westphal
2026-03-21 14:25 ` Stefano Brivio
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260321152506.037f68c0@elisabeth \
--to=sbrivio@redhat.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox