* [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries
@ 2025-09-12 13:19 Florian Westphal
2025-09-12 13:20 ` [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug Florian Westphal
2025-09-13 0:13 ` [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Stefano Brivio
0 siblings, 2 replies; 5+ messages in thread
From: Florian Westphal @ 2025-09-12 13:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal, Stefano Brivio
KASAN reports following splat:
BUG: KASAN: slab-out-of-bounds in pipapo_get_avx2+0x941/0x25d0
Read of size 1 at addr ffff88814c561be0 by task nft/3944
CPU: 7 UID: 0 PID: 3944 Comm: nft Not tainted 6.17.0-rc4+ #637 PREEMPT(full)
Call Trace:
pipapo_get_avx2+0x941/0x25d0
? __local_bh_enable_ip+0x116/0x1a0
? pipapo_get_avx2+0xee/0x25d0
? nft_pipapo_insert+0x22b/0x11b0
nft_pipapo_insert+0x440/0x11b0
nf_tables_newsetelem+0x220a/0x3a00
..
This bisects down to
84c1da7b38d9 ("netfilter: nft_set_pipapo: use AVX2 algorithm for insertions too"),
however, it merely uncovers this bug.
When we find a match but that match has expired or timed out, the AVX2
implementation restarts the full match loop.
At that point, data (key element or start of register space with the key)
has already been incremented to point to the last key field:
out-of-bounds access occurs.
The restart logic in AVX2 is different compared to the plain C
implementation, but both should follow the same logic.
The C implementation just calls pipapo_refill() again to check the next
entry. Do the same in the AVX2 implementation.
Note that with this change, due to implementation differences of
pipapo_refill vs. nft_pipapo_avx2_refill, the refill call will return
the same element again, then, on the next call it will move to the next
entry as expected. This is because avx2_refill doesn't clear the bitmap
in the 'last' conditional. This is harmless.
A selftest test case comes in a followup patch.
Sent as RFC tag because it needs to be revamped after net -> net-next
merge, there are conflicting changes in these two trees at the moment.
Another alternative is to retarget this patch to nf, but given its
a day-0 bug that only got exposed due to the use of AVX2 in insertion
path added recently I think -next is fine.
Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_set_pipapo_avx2.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 7559306d0aed..d97b67a4de16 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1179,7 +1179,6 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
nft_pipapo_avx2_prepare();
-next_match:
nft_pipapo_for_each_field(f, i, m) {
bool last = i == m->field_count - 1, first = !i;
int ret = 0;
@@ -1226,6 +1225,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
#undef NFT_SET_PIPAPO_AVX2_LOOKUP
+next_match:
if (ret < 0) {
scratch->map_index = map_index;
kernel_fpu_end();
@@ -1238,8 +1238,10 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
e = f->mt[ret].e;
if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
- !nft_set_elem_active(&e->ext, genmask)))
- goto next_match;
+ !nft_set_elem_active(&e->ext, genmask))) {
+ ret = pipapo_refill(res, f->bsize, f->rules, fill, f->mt, last);
+ goto next_match:
+ }
scratch->map_index = map_index;
kernel_fpu_end();
--
2.49.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug
2025-09-12 13:19 [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Florian Westphal
@ 2025-09-12 13:20 ` Florian Westphal
2025-09-13 0:13 ` Stefano Brivio
2025-09-13 0:13 ` [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Stefano Brivio
1 sibling, 1 reply; 5+ messages in thread
From: Florian Westphal @ 2025-09-12 13:20 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal, Stefano Brivio
Add a test case for bug resolved with:
'netfilter: nft_set_pipapo_avx2: fix skip of expired entries'.
It passes on nf.git (it uses the generic/C version for insertion
duplicate check) but fails on unpatched nf-next if AVX2 is supported:
cannot create same element twice 0s [FAIL]
Could create element twice in same transaction
table inet filter { # handle 8
[..]
elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0,
1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0 }
Cc: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
.../net/netfilter/nft_concat_range.sh | 38 ++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
index 20e76b395c85..4d4d5004684c 100755
--- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
+++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
@@ -29,7 +29,7 @@ TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto
net6_port_net6_port net_port_mac_proto_net"
# Reported bugs, also described by TYPE_ variables below
-BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch"
+BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch doublecreate"
# List of possible paths to pktgen script from kernel tree for performance tests
PKTGEN_SCRIPT_PATHS="
@@ -408,6 +408,18 @@ perf_duration 0
"
+TYPE_doublecreate="
+display cannot create same element twice
+type_spec ipv4_addr . ipv4_addr
+chain_spec ip saddr . ip daddr
+dst addr4
+proto icmp
+
+race_repeat 0
+
+perf_duration 0
+"
+
# Set template for all tests, types and rules are filled in depending on test
set_template='
flush ruleset
@@ -1900,6 +1912,30 @@ test_bug_avx2_mismatch()
fi
}
+test_bug_doublecreate()
+{
+ local elements="1.2.3.4 . 1.2.4.1, 1.2.4.1 . 1.2.3.4"
+ local ret=1
+
+ setup veth send_"${proto}" set || return ${ksft_skip}
+
+ nft add element inet filter test "{ $elements }"
+nft -f - <<EOF 2>/dev/null
+flush set inet filter test
+create element inet filter test { $elements }
+create element inet filter test { $elements }
+EOF
+ ret=$?
+
+ if [ $ret -eq 0 ];then
+ err "Could create element twice in same transaction"
+ err "$(nft -a list ruleset)"
+ return 1
+ fi
+
+ return 0
+}
+
test_reported_issues() {
eval test_bug_"${subtest}"
}
--
2.49.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries
2025-09-12 13:19 [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Florian Westphal
2025-09-12 13:20 ` [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug Florian Westphal
@ 2025-09-13 0:13 ` Stefano Brivio
2025-09-13 20:17 ` Florian Westphal
1 sibling, 1 reply; 5+ messages in thread
From: Stefano Brivio @ 2025-09-13 0:13 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, 12 Sep 2025 15:19:59 +0200
Florian Westphal <fw@strlen.de> wrote:
> KASAN reports following splat:
> BUG: KASAN: slab-out-of-bounds in pipapo_get_avx2+0x941/0x25d0
> Read of size 1 at addr ffff88814c561be0 by task nft/3944
>
> CPU: 7 UID: 0 PID: 3944 Comm: nft Not tainted 6.17.0-rc4+ #637 PREEMPT(full)
> Call Trace:
> pipapo_get_avx2+0x941/0x25d0
> ? __local_bh_enable_ip+0x116/0x1a0
> ? pipapo_get_avx2+0xee/0x25d0
> ? nft_pipapo_insert+0x22b/0x11b0
> nft_pipapo_insert+0x440/0x11b0
> nf_tables_newsetelem+0x220a/0x3a00
> ..
>
> This bisects down to
> 84c1da7b38d9 ("netfilter: nft_set_pipapo: use AVX2 algorithm for insertions too"),
> however, it merely uncovers this bug.
>
> When we find a match but that match has expired or timed out, the AVX2
> implementation restarts the full match loop.
>
> At that point, data (key element or start of register space with the key)
> has already been incremented to point to the last key field:
> out-of-bounds access occurs.
Oops.
By the way, you're referring to 'rp' here, and nothing else, right?
Could you mention that explicitly if that's the case? I have to say
that "data (...) has already been incremented" took me a while to
understand.
> The restart logic in AVX2 is different compared to the plain C
> implementation, but both should follow the same logic.
That's because I wanted to avoid calling pipapo_refill() from the AVX2
lookup path, as it was significantly slower than re-doing the full
lookup, at least for "net, port" sets which I assumed would be the most
common.
On the other hand, it should always be a corner case (right?), so I
guess simplicity / consistency should prevail.
> The C implementation just calls pipapo_refill() again to check the next
> entry. Do the same in the AVX2 implementation.
An alternative would be to reset rp = key, but maybe what you're doing
is actually saner, see above.
> Note that with this change, due to implementation differences of
> pipapo_refill vs. nft_pipapo_avx2_refill, the refill call will return
> the same element again, then, on the next call it will move to the next
> entry as expected. This is because avx2_refill doesn't clear the bitmap
> in the 'last' conditional. This is harmless.
>
> A selftest test case comes in a followup patch.
>
> Sent as RFC tag because it needs to be revamped after net -> net-next
> merge, there are conflicting changes in these two trees at the moment.
>
> Another alternative is to retarget this patch to nf, but given its
> a day-0 bug that only got exposed due to the use of AVX2 in insertion
> path added recently I think -next is fine.
>
> Cc: Stefano Brivio <sbrivio@redhat.com>
> Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/netfilter/nft_set_pipapo_avx2.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
> index 7559306d0aed..d97b67a4de16 100644
> --- a/net/netfilter/nft_set_pipapo_avx2.c
> +++ b/net/netfilter/nft_set_pipapo_avx2.c
> @@ -1179,7 +1179,6 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> nft_pipapo_avx2_prepare();
>
> -next_match:
> nft_pipapo_for_each_field(f, i, m) {
> bool last = i == m->field_count - 1, first = !i;
> int ret = 0;
> @@ -1226,6 +1225,7 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> #undef NFT_SET_PIPAPO_AVX2_LOOKUP
>
> +next_match:
> if (ret < 0) {
> scratch->map_index = map_index;
> kernel_fpu_end();
> @@ -1238,8 +1238,10 @@ struct nft_pipapo_elem *pipapo_get_avx2(const struct nft_pipapo_match *m,
>
> e = f->mt[ret].e;
> if (unlikely(__nft_set_elem_expired(&e->ext, tstamp) ||
> - !nft_set_elem_active(&e->ext, genmask)))
> - goto next_match;
> + !nft_set_elem_active(&e->ext, genmask))) {
> + ret = pipapo_refill(res, f->bsize, f->rules, fill, f->mt, last);
It could be wrapped like this:
ret = pipapo_refill(res, f->bsize, f->rules,
fill, f->mt, last);
> + goto next_match:
> + }
>
> scratch->map_index = map_index;
> kernel_fpu_end();
In any case:
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug
2025-09-12 13:20 ` [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug Florian Westphal
@ 2025-09-13 0:13 ` Stefano Brivio
0 siblings, 0 replies; 5+ messages in thread
From: Stefano Brivio @ 2025-09-13 0:13 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, 12 Sep 2025 15:20:00 +0200
Florian Westphal <fw@strlen.de> wrote:
> Add a test case for bug resolved with:
> 'netfilter: nft_set_pipapo_avx2: fix skip of expired entries'.
>
> It passes on nf.git (it uses the generic/C version for insertion
> duplicate check) but fails on unpatched nf-next if AVX2 is supported:
>
> cannot create same element twice 0s [FAIL]
> Could create element twice in same transaction
> table inet filter { # handle 8
> [..]
> elements = { 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
> 1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0,
> 1.2.3.4 . 1.2.4.1 counter packets 0 bytes 0,
> 1.2.4.1 . 1.2.3.4 counter packets 0 bytes 0 }
>
> Cc: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> .../net/netfilter/nft_concat_range.sh | 38 ++++++++++++++++++-
> 1 file changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/netfilter/nft_concat_range.sh b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
> index 20e76b395c85..4d4d5004684c 100755
> --- a/tools/testing/selftests/net/netfilter/nft_concat_range.sh
> +++ b/tools/testing/selftests/net/netfilter/nft_concat_range.sh
> @@ -29,7 +29,7 @@ TYPES="net_port port_net net6_port port_proto net6_port_mac net6_port_mac_proto
> net6_port_net6_port net_port_mac_proto_net"
>
> # Reported bugs, also described by TYPE_ variables below
> -BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch"
> +BUGS="flush_remove_add reload net_port_proto_match avx2_mismatch doublecreate"
>
> # List of possible paths to pktgen script from kernel tree for performance tests
> PKTGEN_SCRIPT_PATHS="
> @@ -408,6 +408,18 @@ perf_duration 0
> "
>
>
> +TYPE_doublecreate="
> +display cannot create same element twice
> +type_spec ipv4_addr . ipv4_addr
> +chain_spec ip saddr . ip daddr
> +dst addr4
> +proto icmp
> +
> +race_repeat 0
> +
> +perf_duration 0
> +"
> +
> # Set template for all tests, types and rules are filled in depending on test
> set_template='
> flush ruleset
> @@ -1900,6 +1912,30 @@ test_bug_avx2_mismatch()
> fi
> }
>
> +test_bug_doublecreate()
> +{
> + local elements="1.2.3.4 . 1.2.4.1, 1.2.4.1 . 1.2.3.4"
> + local ret=1
> +
> + setup veth send_"${proto}" set || return ${ksft_skip}
> +
> + nft add element inet filter test "{ $elements }"
> +nft -f - <<EOF 2>/dev/null
> +flush set inet filter test
> +create element inet filter test { $elements }
> +create element inet filter test { $elements }
> +EOF
> + ret=$?
> +
> + if [ $ret -eq 0 ];then
Nit: everywhere else in this file it's 'if [ ... ]; then', with a
whitespace before 'then' (same as all the examples from POSIX without a
newline). Either way,
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> + err "Could create element twice in same transaction"
> + err "$(nft -a list ruleset)"
> + return 1
> + fi
> +
> + return 0
> +}
> +
> test_reported_issues() {
> eval test_bug_"${subtest}"
> }
--
Stefano
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries
2025-09-13 0:13 ` [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Stefano Brivio
@ 2025-09-13 20:17 ` Florian Westphal
0 siblings, 0 replies; 5+ messages in thread
From: Florian Westphal @ 2025-09-13 20:17 UTC (permalink / raw)
To: Stefano Brivio; +Cc: netfilter-devel
Stefano Brivio <sbrivio@redhat.com> wrote:
> On Fri, 12 Sep 2025 15:19:59 +0200
> Florian Westphal <fw@strlen.de> wrote:
> By the way, you're referring to 'rp' here, and nothing else, right?
Yes, sorry.
> Could you mention that explicitly if that's the case? I have to say
> that "data (...) has already been incremented" took me a while to
> understand.
Apologies, I will rephrase it.
> > The restart logic in AVX2 is different compared to the plain C
> > implementation, but both should follow the same logic.
>
> That's because I wanted to avoid calling pipapo_refill() from the AVX2
> lookup path, as it was significantly slower than re-doing the full
> lookup, at least for "net, port" sets which I assumed would be the most
> common.
>
> On the other hand, it should always be a corner case (right?), so I
> guess simplicity / consistency should prevail.
Yes, I don't think we ever have a duplicate to begin with, insert path
returns -EEXIST or ignores the insertion request.
They exist in the clone/copy for the case when one transaction asked for
remoal of "e" (and marks it as invalid in new generation) followed by
a re-insert of "e".
In that case, The "e" re-insert should find and skip the old result
and continue to search for a matching result.
Thats also what the test case does:
insert e
<end of transaction>
flush set x <mark e invalid>
insert e <find e-old, continue search, find no result>, insert ok
insert e <find e-old, continue search, find e-new, fail transaction
I think I'll extend the test case to also check that "flush set x;
insert e" works on a set that already contained e before.
> > The C implementation just calls pipapo_refill() again to check the next
> > entry. Do the same in the AVX2 implementation.
>
> An alternative would be to reset rp = key, but maybe what you're doing
> is actually saner, see above.
Nope, that doesn't work. If you reset rp = key, it fixes the splat
but results in infinite loop.
> > Note that with this change, due to implementation differences of
> > pipapo_refill vs. nft_pipapo_avx2_refill, the refill call will return
> > the same element again,
> > then, on the next call it will move to the next
> > entry as expected. This is because avx2_refill doesn't clear the bitmap
> > in the 'last' conditional.
Note last sentence, its important the bitmap is toggled to off in the
"last" case so the next attempt won't result in rediscovery of that
element.
After this patch, the sequence is: (in case you have key matching both e1
and e2):
1. find e1, test, expired -> call pipapo_refill
2. find e1 again, expired -> call pipapo_refill
3. find e2, test, valid -> return e2 as matching entry
without this patch, its:
find e1, test, expired -> goto next (restart loop)
find no matching entry (rp not reset so its searching for
something else)
-> duplicate insertion passes as e2 not be found
with full restart but fixed rp assignment:
1) find e1, test, expired -> reset key
2) goto 1
... so i opted for the first solution.
For the packet path, the "pipapo_refill" call will happen when
the element is expired (timed out).
I don't see many users mixing pipapo entries with timeouts.
> > - !nft_set_elem_active(&e->ext, genmask)))
> > - goto next_match;
> > + !nft_set_elem_active(&e->ext, genmask))) {
> > + ret = pipapo_refill(res, f->bsize, f->rules, fill, f->mt, last);
>
> It could be wrapped like this:
>
> ret = pipapo_refill(res, f->bsize, f->rules,
> fill, f->mt, last);
I have to resend anyway so I will fix this up, thanks.
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Thanks for reviewing!
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-13 20:17 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 13:19 [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Florian Westphal
2025-09-12 13:20 ` [PATCH RFC nf-next 2/2] selftests: netfilter: nft_concat_range.sh: add check for double-create bug Florian Westphal
2025-09-13 0:13 ` Stefano Brivio
2025-09-13 0:13 ` [PATCH RFC nf-next 1/2] netfilter: nft_set_pipapo_avx2: fix skip of expired entries Stefano Brivio
2025-09-13 20:17 ` Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).