From: Dmitry Ilvokhin <d@ilvokhin.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Breno Leitao <leitao@debian.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, kas@kernel.org,
	shakeel.butt@linux.dev, usama.arif@linux.dev,
	kernel-team@meta.com
Subject: Re: [PATCH] mm/vmstat: spread vmstat_update requeue across the stat interval
Date: Thu, 2 Apr 2026 12:43:27 +0000
Message-ID: <ac5kb4ZHWAouBFEK@shell.ilvokhin.com>
In-Reply-To: <fa089716-1bed-478b-96e3-a2ef5465b52f@kernel.org>

On Wed, Apr 01, 2026 at 07:46:35PM +0200, Vlastimil Babka (SUSE) wrote:

[...]

> > +/*
> > + * Return a per-cpu delay that spreads vmstat_update work across the stat
> > + * interval.  Without this, round_jiffies_relative() aligns every CPU's
> > + * timer to the same second boundary, causing a thundering-herd on
> > + * zone->lock when multiple CPUs drain PCP pages simultaneously via
> > + * decay_pcp_high() -> free_pcppages_bulk().
> > + */
> > +static unsigned long vmstat_spread_delay(void)
> > +{
> > +	unsigned long interval = sysctl_stat_interval;
> > +	unsigned int nr_cpus = num_online_cpus();
> > +
> > +	if (nr_cpus <= 1)
> > +		return round_jiffies_relative(interval);
> > +
> > +	/*
> > +	 * Spread per-cpu vmstat work evenly across the interval.  Don't
> > +	 * use round_jiffies_relative() here -- it would snap every CPU
> > +	 * back to the same second boundary, defeating the spread.
> > +	 */
> > +	return interval + (interval * (smp_processor_id() % nr_cpus)) / nr_cpus;
> 
> Hm doesn't this mean that lower id cpus will consistently fire in shorter
> intervals and higher id in longer intervals? What we want is same interval
> but differently offset, no?

Yes, I think that's a valid concern: this effectively skews the
interval rather than just introducing a phase offset.

I initially thought this might explain the increase in max wait, but it
turns out the columns were just swapped.

Spreading the initial scheduling and then requeueing with a constant
interval sounds like a reasonable alternative, e.g. below.

From 56ed7e17b32f0a7ce433caed87650b0de8246c4e Mon Sep 17 00:00:00 2001
From: Dmitry Ilvokhin <d@ilvokhin.com>
Date: Thu, 2 Apr 2026 04:49:06 -0700
Subject: [PATCH] mm/vmstat: stagger per-cpu vmstat updates to avoid zone->lock
 contention

Fix by spreading the shepherd's initial wakeup across the stat interval
and requeueing with plain sysctl_stat_interval to preserve the stagger.
Every CPU still fires once per interval: same frequency, different phase.

Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 mm/vmstat.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2370c6fb1fcd..aee99786718a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2042,7 +2042,7 @@ static void vmstat_update(struct work_struct *w)
 		 */
 		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
 				this_cpu_ptr(&vmstat_work),
-				round_jiffies_relative(sysctl_stat_interval));
+				sysctl_stat_interval);
 	}
 }
 
@@ -2140,7 +2140,8 @@ static void vmstat_shepherd(struct work_struct *w)
 				continue;
 
 			if (!delayed_work_pending(dw) && need_update(cpu))
-				queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
+				queue_delayed_work_on(cpu, mm_percpu_wq, dw,
+					(sysctl_stat_interval * cpu) / nr_cpu_ids);
 		}
 
 		cond_resched();
-- 
2.52.0

