From: Ritesh Harjani (IBM)
To: Baolin Wang, Michal Hocko
Cc: akpm@linux-foundation.org, david@redhat.com, shakeel.butt@linux.dev,
  lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
  rppt@kernel.org, surenb@google.com, donettom@linux.ibm.com,
  aboorvad@linux.ibm.com, sj@kernel.org, linux-mm@kvack.org,
  linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm: fix the inaccurate memory statistics issue for users
In-Reply-To: <890b825e-b3b1-4d32-83ec-662495e35023@linux.alibaba.com>
Date: Mon, 09 Jun 2025 14:01:59 +0530
Message-ID: <87a56h48ow.fsf@gmail.com>
References: <87bjqx4h82.fsf@gmail.com>
  <890b825e-b3b1-4d32-83ec-662495e35023@linux.alibaba.com>

Baolin Wang writes:

> On 2025/6/9 15:35, Michal Hocko wrote:
>> On Mon 09-06-25 10:57:41, Ritesh Harjani wrote:
>>> Baolin Wang writes:
>>>
>>>> On some large machines with a high number of CPUs running a 64K pagesize
>>>> kernel, we found that the 'RES' field displayed by the top command is
>>>> always 0 for some processes, which causes a lot of confusion for users.
>>>>
>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>  875525 root      20   0   12480      0      0 R   0.3   0.0   0:00.08 top
>>>>       1 root      20   0  172800      0      0 S   0.0   0.0   0:04.52 systemd
>>>>
>>>> The main reason is that, since commit f1a7941243c1 ("mm: convert mm's rss
>>>> stats into percpu_counter") converted mm's rss stats into percpu counters,
>>>> the batch size of the percpu counter is quite large on these machines, so
>>>> a significant part of the count stays cached in the percpu values.
>>>> Intuitively, the batch number should be optimized, but on some paths
>>>> performance may take precedence over statistical accuracy. Therefore,
>>>> introduce a new interface that adds up the percpu counts and displays
>>>> the accurate value to users, which removes the confusion. In addition,
>>>> this change is not expected to be on a performance-critical path, so the
>>>> modification should be acceptable.
>>>>
>>>> In addition, 'mm->rss_stat' is updated using add_mm_counter() and
>>>> dec/inc_mm_counter(), which are all wrappers around
>>>> percpu_counter_add_batch(). In percpu_counter_add_batch(), there is
>>>> percpu batch caching to avoid 'fbc->lock' contention. This patch changes
>>>> task_mem() and task_statm() to get the accurate mm counters under the
>>>> 'fbc->lock', but this should not exacerbate kernel 'mm->rss_stat' lock
>>>> contention due to the percpu batch caching of the mm counters. The
>>>> following test also confirms the theoretical analysis.
>>>>
>>>> I ran stress-ng, stressing anonymous page faults with 32 threads, on my
>>>> 32-core machine, while simultaneously running a script that starts 32
>>>> threads to busy-loop pread() on each stress-ng thread's /proc/pid/status
>>>> interface. From the following data, I did not observe any obvious impact
>>>> of this patch on the stress-ng tests.
>>>>
>>>> w/o patch:
>>>> stress-ng: info: [6848] 4,399,219,085,152 CPU Cycles         67.327 B/sec
>>>> stress-ng: info: [6848] 1,616,524,844,832 Instructions       24.740 B/sec (0.367 instr. per cycle)
>>>> stress-ng: info: [6848]        39,529,792 Page Faults Total   0.605 M/sec
>>>> stress-ng: info: [6848]        39,529,792 Page Faults Minor   0.605 M/sec
>>>>
>>>> w/patch:
>>>> stress-ng: info: [2485] 4,462,440,381,856 CPU Cycles         68.382 B/sec
>>>> stress-ng: info: [2485] 1,615,101,503,296 Instructions       24.750 B/sec (0.362 instr. per cycle)
>>>> stress-ng: info: [2485]        39,439,232 Page Faults Total   0.604 M/sec
>>>> stress-ng: info: [2485]        39,439,232 Page Faults Minor   0.604 M/sec
>>>>
>>>> Tested-by: Donet Tom
>>>> Reviewed-by: Aboorva Devarajan
>>>> Tested-by: Aboorva Devarajan
>>>> Acked-by: Shakeel Butt
>>>> Acked-by: SeongJae Park
>>>> Acked-by: Michal Hocko
>>>> Signed-off-by: Baolin Wang
>>>> ---
>>>> Changes from v1:
>>>>  - Update the commit message to add some measurements.
>>>>  - Add acked tag from Michal. Thanks.
>>>>  - Drop the Fixes tag.
>>>
>>> Any reason why we dropped the Fixes tag? I see there was a series of
>>> discussions on v1 and it was concluded that the fix was correct, so why
>>> drop the Fixes tag?
>>
>> This seems more like an improvement than a bug fix.
>
> Yes. I don't have a strong opinion on this, but we (Alibaba) will
> backport it manually, because some user-space monitoring tools depend
> on these statistics.

That sounds like a regression then, doesn't it?

-ritesh