From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Joe Damato <jdamato@fastly.com>,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Eric Dumazet <edumazet@google.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Leon Romanovsky <leon@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Simon Horman <horms@kernel.org>, Tariq Toukan <tariqt@nvidia.com>,
Thomas Gleixner <tglx@linutronix.de>,
Yunsheng Lin <linyunsheng@huawei.com>
Subject: Re: [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t.
Date: Wed, 26 Feb 2025 11:27:03 +0100 [thread overview]
Message-ID: <20250226102703.3F7Aa2oK@linutronix.de> (raw)
In-Reply-To: <Z7izmyDRvTmKpN4-@LQ3V64L9R2>
On 2025-02-21 12:10:51 [-0500], Joe Damato wrote:
> On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a follow-up on
> > https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
> >
> > to convert the page_pool statistics to u64_stats_t to avoid u64 related
> > problems on 32bit architectures.
> > While looking over it, the comment for recycle_stat_inc() says that it
> > is safe to use in preemptible context.
>
> I wrote that comment because it's an increment of a per-cpu counter.
>
> The documentation in Documentation/core-api/this_cpu_ops.rst
> explains in more depth, but this_cpu_inc is safe to use without
> worrying about pre-emption and interrupts.
I don't argue that it is not safe to use in preemptible context. I am
just saying that it is not safe after the rework. If it is really used
like that, then it is no longer safe to do so (after the rework).
> > The 32bit update is split into two 32bit writes and if we get
> > preempted in the middle and another one makes an update then the
> > value gets inconsistent and the previous update can overwrite the
> > following. (Rare but still).
>
> Have you seen this? Can you show the generated assembly which
> suggests that this occurs? It would be helpful if you could show the
> before and after 32-bit assembly code.
…
So there are two things:
First we have alloc_stat_inc(), which does
#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
so on x86 32bit this turns into
| addl $1, 120(%ebx) #, pool_15(D)->alloc_stats.fast
| adcl $0, 124(%ebx) #, pool_15(D)->alloc_stats.fast
So the lower 4 byte are incremented before the higher 4 byte are.
recycle_stat_inc() is using this_cpu_inc() and performs a similar
update. On x86-32 it turns into
| movl 836(%ebx), %eax # pool_15(D)->recycle_stats, s
| pushf ; pop %edx # flags
| cli
| movl %fs:this_cpu_off, %ecx # *_20,
| addl %ecx, %eax #, _42
| addl $1, (%eax) #, *_42
| adcl $0, 4(%eax) #, *_42
| testb $2, %dh #, flags
| je .L508 #,
| sti
|.L508:
so the update can be performed safely in preemptible context as the CPU
is determined within the IRQ-off section and so is the increment itself
performed. It updates always the local value belonging to the CPU.
Reading the values locally (on the CPU that is doing the update) is okay
but reading the value from a remote CPU while an update might be done
can result in reading the lower 4 bytes before the upper 4 bytes are
visible.
This can lead to an inconsistent value on 32bit which is likely
"corrected" on the next read.
Thus my initial question: Do we care? If so, I suggest to use u64_stats
like most of the networking stack. However if we do so, the update must
be performed with disabled-BH as I assume the updates are done in
softirq context. It must be avoided that one update preempts another.
Sebastian
prev parent reply other threads:[~2025-02-26 10:27 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-21 11:52 [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
2025-02-21 11:52 ` [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats " Sebastian Andrzej Siewior
2025-02-21 17:21 ` Joe Damato
2025-02-26 12:06 ` Sebastian Andrzej Siewior
2025-02-22 8:13 ` Yunsheng Lin
2025-02-25 11:27 ` Paolo Abeni
2025-02-26 9:28 ` Sebastian Andrzej Siewior
2025-02-21 11:52 ` [PATCH net-next 2/2] page_pool: Convert page_pool_alloc_stats " Sebastian Andrzej Siewior
2025-02-21 17:30 ` Joe Damato
2025-02-22 8:13 ` Yunsheng Lin
2025-02-21 17:10 ` [PATCH net-next 0/2] page_pool: Convert stats " Joe Damato
2025-02-26 10:27 ` Sebastian Andrzej Siewior [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250226102703.3F7Aa2oK@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jdamato@fastly.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=tariqt@nvidia.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).