From: Johannes Weiner <hannes@cmpxchg.org>
To: Breno Leitao <leitao@debian.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org, kas@kernel.org,
shakeel.butt@linux.dev, usama.arif@linux.dev,
kernel-team@meta.com
Subject: Re: [PATCH] mm/vmstat: spread vmstat_update requeue across the stat interval
Date: Wed, 1 Apr 2026 10:25:35 -0400
Message-ID: <ac0q37PYB-2DZhLk@cmpxchg.org>
In-Reply-To: <20260401-vmstat-v1-1-b68ce4a35055@debian.org>

On Wed, Apr 01, 2026 at 06:57:50AM -0700, Breno Leitao wrote:
> vmstat_update uses round_jiffies_relative() when re-queuing itself,
> which aligns all CPUs' timers to the same second boundary. When many
> CPUs have pending PCP pages to drain, they all call decay_pcp_high() ->
> free_pcppages_bulk() simultaneously, serializing on zone->lock and
> hitting contention.
>
> Introduce vmstat_spread_delay() which distributes each CPU's
> vmstat_update evenly across the stat interval instead of aligning them.
>
> This does not increase the number of timer interrupts; each CPU still
> fires once per interval. The timers are simply staggered rather than
> aligned. Additionally, vmstat_work is deferrable work, so it does not
> wake idle CPUs regardless of scheduling; the spread only affects CPUs
> that are already active.
>
> `perf lock contention` shows a 7.5x reduction in zone->lock contention
> (872 -> 117 contentions, 199ms -> 81ms total wait) on a 72-CPU aarch64
> system under memory pressure.
>
> Tested on a 72-CPU aarch64 system using stress-ng --vm to generate
> memory allocation bursts. Lock contention was measured with:
>
> perf lock contention -a -b -S free_pcppages_bulk
>
> Results with KASAN enabled:
>
> free_pcppages_bulk contention (KASAN):
> +--------------+----------+----------+
> | Metric | No fix | With fix |
> +--------------+----------+----------+
> | Contentions | 872 | 117 |
> | Total wait | 199.43ms | 80.76ms |
> | Max wait | 4.19ms | 35.76ms |
> +--------------+----------+----------+
>
> Results without KASAN:
>
> free_pcppages_bulk contention (no KASAN):
> +--------------+----------+----------+
> | Metric | No fix | With fix |
> +--------------+----------+----------+
> | Contentions | 240 | 133 |
> | Total wait | 34.01ms | 24.61ms |
> | Max wait | 965us | 1.35ms |
> +--------------+----------+----------+
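(For anyone trying to reproduce this: the changelog only spells out the
perf invocation. My guess at the full recipe, untested, would be along
the lines of

	stress-ng --vm $(nproc) --vm-bytes 80% --timeout 60s &
	perf lock contention -a -b -S free_pcppages_bulk -- sleep 30

where the stress-ng flags are an assumption on my part; the commit
message only says "stress-ng --vm".)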
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
Nice!
> ---
> mm/vmstat.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 2370c6fb1fcd..2e94bd765606 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2032,6 +2032,29 @@ static int vmstat_refresh(const struct ctl_table *table, int write,
> }
> #endif /* CONFIG_PROC_FS */
>
> +/*
> + * Return a per-cpu delay that spreads vmstat_update work across the stat
> + * interval. Without this, round_jiffies_relative() aligns every CPU's
> + * timer to the same second boundary, causing a thundering-herd on
> + * zone->lock when multiple CPUs drain PCP pages simultaneously via
> + * decay_pcp_high() -> free_pcppages_bulk().
> + */
> +static unsigned long vmstat_spread_delay(void)
> +{
> +	unsigned long interval = sysctl_stat_interval;
> +	unsigned int nr_cpus = num_online_cpus();
> +
> +	if (nr_cpus <= 1)
> +		return round_jiffies_relative(interval);
> +
> +	/*
> +	 * Spread per-cpu vmstat work evenly across the interval. Don't
> +	 * use round_jiffies_relative() here -- it would snap every CPU
> +	 * back to the same second boundary, defeating the spread.
> +	 */
> +	return interval + (interval * (smp_processor_id() % nr_cpus)) / nr_cpus;
smp_processor_id() < nr_cpus, so

	return interval + interval * cpu / nr_cpus;

should be equivalent, no?
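I.e., a completely untested sketch of what I mean, reusing your locals:

	static unsigned long vmstat_spread_delay(void)
	{
		unsigned long interval = sysctl_stat_interval;
		unsigned int nr_cpus = num_online_cpus();
		unsigned int cpu = smp_processor_id();

		if (nr_cpus <= 1)
			return round_jiffies_relative(interval);

		/*
		 * Stagger the CPUs evenly across one interval. No
		 * round_jiffies_relative() here, since it would snap
		 * every CPU back to the same second boundary.
		 */
		return interval + interval * cpu / nr_cpus;
	}

This assumes online CPU ids actually stay below num_online_cpus(),
which is what the modulo was covering for. With the default
sysctl_stat_interval of HZ (say HZ=1000) and 4 online CPUs, this yields
delays of 1000, 1250, 1500 and 1750 jiffies for CPUs 0-3.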
Other than that,
Acked-by: Johannes Weiner <hannes@cmpxchg.org>