From: Stefano Brivio <sbrivio@redhat.com>
To: Florian Westphal <fw@strlen.de>
Cc: <netfilter-devel@vger.kernel.org>
Subject: Re: [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries
Date: Sat, 13 Sep 2025 02:13:26 +0200 [thread overview]
Message-ID: <20250913021326.52fc3ca6@elisabeth> (raw)
In-Reply-To: <20250912132004.7925-1-fw@strlen.de>
On Fri, 12 Sep 2025 15:19:59 +0200
Florian Westphal <fw@strlen.de> wrote:
> KASAN reports following splat:
> BUG: KASAN: slab-out-of-bounds in pipapo_get_avx2+0x941/0x25d0
> Read of size 1 at addr ffff88814c561be0 by task nft/3944
>
> CPU: 7 UID: 0 PID: 3944 Comm: nft Not tainted 6.17.0-rc4+ #637 PREEMPT(full)
> Call Trace:
> pipapo_get_avx2+0x941/0x25d0
> ? __local_bh_enable_ip+0x116/0x1a0
> ? pipapo_get_avx2+0xee/0x25d0
> ? nft_pipapo_insert+0x22b/0x11b0
> nft_pipapo_insert+0x440/0x11b0
> nf_tables_newsetelem+0x220a/0x3a00
> ..
>
> This bisects down to
> 84c1da7b38d9 ("netfilter: nft_set_pipapo: use AVX2 algorithm for insertions too"),
> however, it merely uncovers this bug.
>
> When we find a match but that match has expired or timed out, the AVX2
> implementation restarts the full match loop.
>
> At that point, data (key element or start of register space with the key)
> has already been incremented to point to the last key field:
> out-of-bounds access occurs.
Oops.
By the way, you're referring to 'rp' here, and nothing else, right?
Could you mention that explicitly if that's the case? I have to say
that "data (...) has already been incremented" took me a while to
understand.
> The restart logic in AVX2 is different compared to the plain C
> implementation, but both should follow the same logic.
That's because I wanted to avoid calling pipapo_refill() from the AVX2
lookup path, as it was significantly slower than re-doing the full
lookup, at least for "net, port" sets which I assumed would be the most
common.
On the other hand, it should always be a corner case (right?), so I
guess simplicity / consistency should prevail.
> The C implementation just calls pipapo_refill() again to check the next
> entry. Do the same in the AVX2 implementation.
An alternative would be to reset rp = key, but maybe what you're doing
is actually saner, see above.
> Note that with this change, due to implementation differences of
> pipapo_refill vs. nft_pipapo_avx2_refill, the refill call will return
> the same element again, then, on the next call it will move to the next
> entry as expected. This is because avx2_refill doesn't clear the bitmap
> in the 'last' conditional. This is harmless.
>
> A selftest test case comes in a followup patch.
>
> Sent as RFC tag because it needs to be revamped after net -> net-next
> merge, there are conflicting changes in these two trees at the moment.
>
> Another alternative is to retarget this patch to nf, but given its
> a day-0 bug that only got exposed due to the use of AVX2 in insertion
> path added recently I think -next is fine.
>
> Cc: Stefano Brivio <sbrivio@redhat.com>
> Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/netfilter/nft_set_pipapo_avx2.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
> index 7559306d0aed..d97b67a4de16 100644
> --- a/net/netfilter/nft_set_pipapo_avx2.c
> +++ b/net/netfilter/nft_set_pipapo_avx2.c
> @@ -1179,7 +1179,6 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> nft_pipapo_avx2_prepare();
>
> -next_match:
> nft_pipapo_for_each_field(f, i, m) {
> bool last = i == m->field_count - 1, first = !i;
> int ret = 0;
> @@ -1226,6 +1225,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> #undef NFT_SET_PIPAPO_AVX2_LOOKUP
>
> +next_match:
> if (ret < 0) {
> scratch->map_index = map_index;
> kernel_fpu_end();
> @@ -1238,8 +1238,10 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> e = f->mt[ret].e;
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> - !nft_set_elem_active(&e->ext, genmask)))
> - goto next_match;
> + !nft_set_elem_active(&e->ext, genmask))) {
> + ret = pipapo_refill(res, f->bsize, f->rules, fill, f->mt, last);
It could be wrapped like this:
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
> + goto next_match:
> + }
>
> scratch->map_index = map_index;
> kernel_fpu_end();
In any case:
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
next prev parent reply other threads:[~2025-09-13 0:13 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-12 13:19 [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Florian Westphal
2025-09-12 13:20 ` [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug Florian Westphal
2025-09-13 0:13 ` Stefano Brivio
2025-09-13 0:13 ` Stefano Brivio [this message]
2025-09-13 20:17 ` [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Florian Westphal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250913021326.52fc3ca6@elisabeth \
--to=sbrivio@redhat.com \
--cc=fw@strlen.de \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).