* "nft reset counters" bug on 32-bit systems
@ 2025-09-10 16:48 Andreas Fried
2025-09-10 18:08 ` Florian Westphal
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Fried @ 2025-09-10 16:48 UTC (permalink / raw)
To: netfilter
Hello,
on 32-bit systems, resetting netfilter counters sets them to 2**32
instead if their count was not 0:
# nft list counters
table ip foo {
counter test1 {
packets 4 bytes 240
}
counter test2 {
packets 0 bytes 0
}
}
# nft reset counters
table ip foo {
counter test1 {
packets 4 bytes 240
}
counter test2 {
packets 0 bytes 0
}
}
# nft list counters
table ip foo {
counter test1 {
packets 4294967296 bytes 4294967296
}
counter test2 {
packets 0 bytes 0
}
}
This was tested on an arm32 system running kernel 6.1.134 and nftables
1.0.9, but as far as I can see, the code is the same on current master.
Looking at nft_counter_reset(), this is a problem for systems where long
is 32 bits wide.
nft_counter_reset() wants to subtract the current total from the
counter, so it calls u64_stats_add() with a negative "val" argument
(e.g. -total->packets, in our case -4). But that argument is an unsigned
long (u32) being added to the u64 counter. That means that it actually
adds 0xfffffffc to the counter, giving 0x100000000, a.k.a. 2**32.
Seeing that u64_stats are used all over the place, any change might
break all sorts of other things. So I'm hesitating to suggest a patch,
just some observations:
The first solution would be a separate u64_stats_sub() function.
However, that still limits the functions to non-negative arguments, and
negative arguments still do unexpected things to the u64 counter.
It also seems to be OK to change u64_stats_add() to take s64 instead of
unsigned long. On 32-bit systems, it only makes the variable wider, so
there's no problem.
For 64-bit systems, this makes the variable one bit narrower because
it's now signed. This might be a problem, but there is some signedness
confusion going on anyway: The counter in u64_stats_t is a local64_t,
which turns out to be a wrapped s64, rather than the u64 used in 32-bit
u64_stats_t. The "val" parameter of u64_stats_add is unsigned long (i.e.
u64), but that boils down to atomic_long_add, which takes a (signed)
long (i.e. s64) parameter. So it seems like making u64_stats_add()
taking an s64 in the first place won't break.
Thanks for your help,
Andreas
--
Andreas Fried
emlix GmbH
Headquarters: Berliner Str. 12, 37073 Goettingen, Germany
Phone +49 (0)551 30664-0, e-mail info@emlix.com
District Court of Goettingen, Registry Number HR B 3160
Managing Directors: Heike Jordan, Dr. Uwe Kracke
VAT ID No. DE 205 198 055
Office Berlin: Panoramastr. 1, 10178 Berlin, Germany
Office Bonn: Bachstr. 6, 53115 Bonn, Germany
http://www.emlix.com
emlix - your embedded Linux partner
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: "nft reset counters" bug on 32-bit systems
2025-09-10 16:48 "nft reset counters" bug on 32-bit systems Andreas Fried
@ 2025-09-10 18:08 ` Florian Westphal
2025-09-23 16:21 ` Andreas Fried
0 siblings, 1 reply; 4+ messages in thread
From: Florian Westphal @ 2025-09-10 18:08 UTC (permalink / raw)
To: Andreas Fried; +Cc: netfilter
Andreas Fried <afried@emlix.com> wrote:
> This was tested on an arm32 system running kernel 6.1.134 and nftables
> 1.0.9, but as far as I can see, the code is the same on current master.
> Looking at nft_counter_reset(), this is a problem for systems where long
> is 32 bits wide.
Good catch.
> nft_counter_reset() wants to subtract the current total from the
> counter, so it calls u64_stats_add() with a negative "val" argument
> (e.g. -total->packets, in our case -4). But that argument is an unsigned
> long (u32) being added to the u64 counter. That means that it actually
> adds 0xfffffffc to the counter, giving 0x100000000, a.k.a. 2**32.
>
> Seeing that u64_stats are used all over the place, any change might
> break all sorts of other things. So I'm hesitating to suggest a patch,
I'd suggest to turn nft_counter_reset() into a variant of nft_counter_fetch()
that uses local_xchg() instead of reads.
Expensive but reset requests should be rare and its much clearer as to
what is happening vs. this
fetch-and-then-add-negative-total-to-one-pcpu-counter.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: "nft reset counters" bug on 32-bit systems
2025-09-10 18:08 ` Florian Westphal
@ 2025-09-23 16:21 ` Andreas Fried
2025-09-23 16:44 ` Florian Westphal
0 siblings, 1 reply; 4+ messages in thread
From: Andreas Fried @ 2025-09-23 16:21 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter
On 9/10/25 20:08, Florian Westphal wrote:
> I'd suggest to turn nft_counter_reset() into a variant of nft_counter_fetch()
> that uses local_xchg() instead of reads.
>
> Expensive but reset requests should be rare and its much clearer as to
> what is happening vs. this
> fetch-and-then-add-negative-total-to-one-pcpu-counter.
I'm afraid I'm missing something here. Wouldn't this be too expensive?
nft_counter_fetch() can read the stats from other CPUs without issues,
but that won't work for writing, i.e. local_xchg(), right? local64_t is
only atomic with respect to one CPU, so would we need to schedule work
on each CPU to read and reset the counters?
In d84701ecbcd6ad63faa7a9c18ad670d1c4d561c0, Pablo Neira points out that
cmpxchg will not work unless all other functions also use it, and that's
too slow.
What am I missing?
--
Andreas Fried
emlix GmbH
Headquarters: Berliner Str. 12, 37073 Goettingen, Germany
Phone +49 (0)551 30664-0, e-mail info@emlix.com
District Court of Goettingen, Registry Number HR B 3160
Managing Directors: Heike Jordan, Dr. Uwe Kracke
VAT ID No. DE 205 198 055
Office Berlin: Panoramastr. 1, 10178 Berlin, Germany
Office Bonn: Bachstr. 6, 53115 Bonn, Germany
http://www.emlix.com
emlix - your embedded Linux partner
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: "nft reset counters" bug on 32-bit systems
2025-09-23 16:21 ` Andreas Fried
@ 2025-09-23 16:44 ` Florian Westphal
0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2025-09-23 16:44 UTC (permalink / raw)
To: Andreas Fried; +Cc: netfilter
Andreas Fried <afried@emlix.com> wrote:
> On 9/10/25 20:08, Florian Westphal wrote:
> > I'd suggest to turn nft_counter_reset() into a variant of nft_counter_fetch()
> > that uses local_xchg() instead of reads.
> >
> > Expensive but reset requests should be rare and its much clearer as to
> > what is happening vs. this
> > fetch-and-then-add-negative-total-to-one-pcpu-counter.
>
> I'm afraid I'm missing something here. Wouldn't this be too expensive?
No, but doesn't work due to not being atomic wrt. userspace reading
stats in parallel.
> nft_counter_fetch() can read the stats from other CPUs without issues,
> but that won't work for writing, i.e. local_xchg(), right? local64_t is
> only atomic with respect to one CPU, so would we need to schedule work
> on each CPU to read and reset the counters?
No, it simply won't work. We need to continue to use the existing
scheme to make sure userspace observers either the counter before reset,
0, or a new updated value that happened right after the reset.
With my proposal, userspace can observe incorrect value when one cpu
is doing the reset, as for each xchg it can observe 'shrinking'
counters.
> In d84701ecbcd6ad63faa7a9c18ad670d1c4d561c0, Pablo Neira points out that
> cmpxchg will not work unless all other functions also use it, and that's
> too slow.
A-ha. Yes, its not atomic wrt. read path.
Someone needs to add u64_stats_add64() so this can be fixed
for 32bit platforms in a followup patch.
local_add() already does the right thing for us, so the stats api is the
only missing piece of the puzzle.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-09-23 16:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-10 16:48 "nft reset counters" bug on 32-bit systems Andreas Fried
2025-09-10 18:08 ` Florian Westphal
2025-09-23 16:21 ` Andreas Fried
2025-09-23 16:44 ` Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).