Date: Wed, 1 Apr 2026 10:25:35 -0400
From: Johannes Weiner
To: Breno Leitao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, kas@kernel.org,
	shakeel.butt@linux.dev, usama.arif@linux.dev, kernel-team@meta.com
Subject: Re: [PATCH] mm/vmstat: spread vmstat_update requeue across the stat interval
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260401-vmstat-v1-1-b68ce4a35055@debian.org>
In-Reply-To: <20260401-vmstat-v1-1-b68ce4a35055@debian.org>

On Wed, Apr 01, 2026 at 06:57:50AM -0700, Breno Leitao wrote:
> vmstat_update uses round_jiffies_relative() when re-queuing itself,
> which aligns all CPUs' timers to the same second boundary. When many
> CPUs have pending PCP pages to drain, they all call decay_pcp_high() ->
> free_pcppages_bulk() simultaneously, serializing on zone->lock and
> hitting contention.
>
> Introduce vmstat_spread_delay(), which distributes each CPU's
> vmstat_update evenly across the stat interval instead of aligning them.
>
> This does not increase the number of timer interrupts -- each CPU still
> fires once per interval. The timers are simply staggered rather than
> aligned.
> Additionally, vmstat_work is deferrable work, so it does not wake idle
> CPUs regardless of scheduling; the spread only affects CPUs that are
> already active.
>
> `perf lock contention` shows a 7.5x reduction in zone->lock contention
> (872 -> 117 contentions, 199ms -> 81ms total wait) on a 72-CPU aarch64
> system under memory pressure.
>
> Tested on a 72-CPU aarch64 system using stress-ng --vm to generate
> memory allocation bursts. Lock contention was measured with:
>
>   perf lock contention -a -b -S free_pcppages_bulk
>
> Results with KASAN enabled:
>
>   free_pcppages_bulk contention (KASAN):
>   +--------------+----------+----------+
>   | Metric       | No fix   | With fix |
>   +--------------+----------+----------+
>   | Contentions  | 872      | 117      |
>   | Total wait   | 199.43ms | 80.76ms  |
>   | Max wait     | 4.19ms   | 35.76ms  |
>   +--------------+----------+----------+
>
> Results without KASAN:
>
>   free_pcppages_bulk contention (no KASAN):
>   +--------------+----------+----------+
>   | Metric       | No fix   | With fix |
>   +--------------+----------+----------+
>   | Contentions  | 240      | 133      |
>   | Total wait   | 34.01ms  | 24.61ms  |
>   | Max wait     | 965us    | 1.35ms   |
>   +--------------+----------+----------+
>
> Signed-off-by: Breno Leitao

Nice!

> ---
>  mm/vmstat.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 2370c6fb1fcd..2e94bd765606 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2032,6 +2032,29 @@ static int vmstat_refresh(const struct ctl_table *table, int write,
>  }
>  #endif /* CONFIG_PROC_FS */
>
> +/*
> + * Return a per-cpu delay that spreads vmstat_update work across the stat
> + * interval. Without this, round_jiffies_relative() aligns every CPU's
> + * timer to the same second boundary, causing a thundering herd on
> + * zone->lock when multiple CPUs drain PCP pages simultaneously via
> + * decay_pcp_high() -> free_pcppages_bulk().
> + */
> +static unsigned long vmstat_spread_delay(void)
> +{
> +	unsigned long interval = sysctl_stat_interval;
> +	unsigned int nr_cpus = num_online_cpus();
> +
> +	if (nr_cpus <= 1)
> +		return round_jiffies_relative(interval);
> +
> +	/*
> +	 * Spread per-cpu vmstat work evenly across the interval. Don't
> +	 * use round_jiffies_relative() here -- it would snap every CPU
> +	 * back to the same second boundary, defeating the spread.
> +	 */
> +	return interval + (interval * (smp_processor_id() % nr_cpus)) / nr_cpus;

With contiguous CPU ids, smp_processor_id() < nr_cpus, so

	return interval + interval * cpu / nr_cpus;

should be equivalent, no?

Other than that,

Acked-by: Johannes Weiner