From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 20 Mar 2026 09:19:01 -0400
Subject: Re: [PATCH 1/7] memcg: Scale up vmstats flush threshold with log2(nums_possible_cpus)
From: Waiman Long <longman@redhat.com>
To: Li Wang
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Tejun Heo, Michal Koutný, Shuah Khan, Mike Rapoport, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Sean Christopherson, James Houghton, Sebastian Chlad, Guopeng Zhang, Li Wang
References: <20260319173752.1472864-1-longman@redhat.com> <20260319173752.1472864-2-longman@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
On 3/20/26 6:40 AM, Li Wang wrote:
> On Thu, Mar 19, 2026 at 01:37:46PM -0400, Waiman Long wrote:
>> The vmstats flush threshold currently increases linearly with the
>> number of online CPUs. As the number of CPUs increases over time, it
>> will become increasingly difficult to meet the threshold and update the
>> vmstats data in a timely manner. These days, systems with hundreds of
>> CPUs or even thousands of them are becoming more common.
>>
>> For example, the test_memcg_sock test of test_memcontrol always fails
>> when running on an arm64 system with 128 CPUs. This is because the
>> threshold is now 64*128 = 8192. With a 4k page size, that corresponds
>> to changes in 32 MB of memory. It will be even worse with a larger
>> page size like 64k.
>>
>> To make the output of memory.stat more accurate, it is better to
>> scale up the threshold logarithmically instead of linearly with the
>> number of CPUs. With the log2 scale, we can use the possibly larger
>> num_possible_cpus() instead of num_online_cpus(), which may change at
>> run time.
>>
>> Although there is supposed to be a periodic, asynchronous flush of
>> vmstats every 2 seconds, the actual time lag between successive runs
>> can vary quite a bit. In fact, I have seen time lags of up to tens of
>> seconds in some cases. So we cannot rely too much on the assumption
>> that there will be an asynchronous vmstats flush every 2 seconds. This
>> may be something we need to look into.
>>
>> Signed-off-by: Waiman Long
>> ---
>>  mm/memcontrol.c | 17 ++++++++++++-----
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 772bac21d155..8d4ede72f05c 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -548,20 +548,20 @@ struct memcg_vmstats {
>>   * rstat update tree grow unbounded.
>>   *
>>   * 2) Flush the stats synchronously on reader side only when there are more than
>> - *    (MEMCG_CHARGE_BATCH * nr_cpus) update events. Though this optimization
>> - *    will let stats be out of sync by atmost (MEMCG_CHARGE_BATCH * nr_cpus) but
>> - *    only for 2 seconds due to (1).
>> + *    (MEMCG_CHARGE_BATCH * (ilog2(nr_cpus) + 1)) update events. Though this
>> + *    optimization will let stats be out of sync by up to that amount but only
>> + *    for 2 seconds due to (1).
>>   */
>>  static void flush_memcg_stats_dwork(struct work_struct *w);
>>  static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
>>  static u64 flush_last_time;
>> +static int vmstats_flush_threshold __ro_after_init;
>>
>>  #define FLUSH_TIME (2UL*HZ)
>>
>>  static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
>>  {
>> -	return atomic_read(&vmstats->stats_updates) >
>> -		MEMCG_CHARGE_BATCH * num_online_cpus();
>> +	return atomic_read(&vmstats->stats_updates) > vmstats_flush_threshold;
>>  }
>>
>>  static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val,
>> @@ -5191,6 +5191,13 @@ int __init mem_cgroup_init(void)
>>
>>  	memcg_pn_cachep = KMEM_CACHE(mem_cgroup_per_node,
>>  				     SLAB_PANIC | SLAB_HWCACHE_ALIGN);
>> +	/*
>> +	 * Logarithmically scale up vmstats flush threshold with the number
>> +	 * of CPUs.
>> +	 * N.B. ilog2(1) = 0.
>> +	 */
>> +	vmstats_flush_threshold = MEMCG_CHARGE_BATCH *
>> +				  (ilog2(num_possible_cpus()) + 1);
> Changing the threshold from linear to logarithmic scaling looks smarter,
> but my concern is that, on large systems (hundreds or thousands of CPUs),
> the threshold drops dramatically.
>
> For example, with 1024 CPUs it goes from 65536 (256MB) to only 704 (2.7MB),
> almost 100x. Could this potentially raise a performance issue when
> 'memory.stat' is read frequently on a heavily loaded system?
>
> Maybe go with MEMCG_CHARGE_BATCH * int_sqrt(num_possible_cpus()),
> which sits between linear and log2?

I have also been thinking about scaling faster than log2 but still below
linear. I believe int_sqrt() is a good suggestion, and I will adopt it in
the next version.

Thanks,
Longman