Date: Thu, 2 Apr 2026 09:18:40 +0200
From: Michal Hocko
To: Breno Leitao
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
    Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kas@kernel.org, shakeel.butt@linux.dev,
    usama.arif@linux.dev, kernel-team@meta.com
Subject: Re: [PATCH] mm/vmstat: spread vmstat_update requeue across the stat interval
In-Reply-To: <20260401-vmstat-v1-1-b68ce4a35055@debian.org>

On Wed 01-04-26 06:57:50, Breno Leitao wrote:
> vmstat_update uses round_jiffies_relative() when re-queuing itself,
> which aligns all CPUs' timers to the same second boundary. When many
> CPUs have pending PCP pages to drain, they all call decay_pcp_high() ->
> free_pcppages_bulk() simultaneously, serializing on zone->lock and
> hitting contention.
>
> Introduce vmstat_spread_delay() which distributes each CPU's
> vmstat_update evenly across the stat interval instead of aligning them.
>
> This does not increase the number of timer interrupts: each CPU still
> fires once per interval. The timers are simply staggered rather than
> aligned.
> Additionally, vmstat_work is DEFERRABLE_WORK, so it does not
> wake idle CPUs regardless of scheduling; the spread only affects CPUs
> that are already active.
>
> `perf lock contention` shows a 7.5x reduction in zone->lock contention
> (872 -> 117 contentions, 199ms -> 81ms total wait) on a 72-CPU aarch64
> system under memory pressure.
>
> Tested on a 72-CPU aarch64 system using stress-ng --vm to generate
> memory allocation bursts. Lock contention was measured with:
>
>   perf lock contention -a -b -S free_pcppages_bulk
>
> Results with KASAN enabled:
>
>   free_pcppages_bulk contention (KASAN):
>   +--------------+----------+----------+
>   | Metric       | No fix   | With fix |
>   +--------------+----------+----------+
>   | Contentions  |      872 |      117 |
>   | Total wait   | 199.43ms |  80.76ms |
>   | Max wait     |   4.19ms |  35.76ms |
>   +--------------+----------+----------+
>
> Results without KASAN:
>
>   free_pcppages_bulk contention (no KASAN):
>   +--------------+----------+----------+
>   | Metric       | No fix   | With fix |
>   +--------------+----------+----------+
>   | Contentions  |      240 |      133 |
>   | Total wait   |  34.01ms |  24.61ms |
>   | Max wait     |    965us |   1.35ms |
>   +--------------+----------+----------+
>
> Signed-off-by: Breno Leitao

Makes sense
Acked-by: Michal Hocko

Thanks!

> ---
>  mm/vmstat.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 2370c6fb1fcd..2e94bd765606 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2032,6 +2032,29 @@ static int vmstat_refresh(const struct ctl_table *table, int write,
>  }
>  #endif /* CONFIG_PROC_FS */
>
> +/*
> + * Return a per-cpu delay that spreads vmstat_update work across the stat
> + * interval. Without this, round_jiffies_relative() aligns every CPU's
> + * timer to the same second boundary, causing a thundering-herd on
> + * zone->lock when multiple CPUs drain PCP pages simultaneously via
> + * decay_pcp_high() -> free_pcppages_bulk().
> + */
> +static unsigned long vmstat_spread_delay(void)
> +{
> +	unsigned long interval = sysctl_stat_interval;
> +	unsigned int nr_cpus = num_online_cpus();
> +
> +	if (nr_cpus <= 1)
> +		return round_jiffies_relative(interval);
> +
> +	/*
> +	 * Spread per-cpu vmstat work evenly across the interval. Don't
> +	 * use round_jiffies_relative() here -- it would snap every CPU
> +	 * back to the same second boundary, defeating the spread.
> +	 */
> +	return interval + (interval * (smp_processor_id() % nr_cpus)) / nr_cpus;
> +}
> +
>  static void vmstat_update(struct work_struct *w)
>  {
>  	if (refresh_cpu_vm_stats(true)) {
> @@ -2042,7 +2065,7 @@ static void vmstat_update(struct work_struct *w)
>  		 */
>  		queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
>  				this_cpu_ptr(&vmstat_work),
> -				round_jiffies_relative(sysctl_stat_interval));
> +				vmstat_spread_delay());
>  	}
>  }
>
>
> ---
> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
> change-id: 20260401-vmstat-048e0feaf344
>
> Best regards,
> --
> Breno Leitao
>

-- 
Michal Hocko
SUSE Labs