From: SeongJae Park
To: Yuanchu Xie
Cc: SeongJae Park, David Hildenbrand, "Aneesh Kumar K.V", Khalid Aziz, Henry Huang, Yu Zhao, Dan Williams, Gregory Price, Huang Ying, Lance Yang, Randy Dunlap, Muhammad Usama Anjum, Tejun Heo, Michal Koutný, Jonathan Corbet, Greg Kroah-Hartman, "Rafael J. Wysocki", "Michael S.
Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Mike Rapoport, Shuah Khan, Christian Brauner, Daniel Watson, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, virtualization@lists.linux.dev, linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 0/9] mm: workingset reporting
Date: Wed, 11 Dec 2024 11:53:29 -0800
Message-Id: <20241211195329.60224-1-sj@kernel.org>

On Fri, 6 Dec 2024 11:57:55 -0800 Yuanchu Xie wrote:

> Thanks for the response Johannes. Some replies inline.
>
> On Tue, Nov 26, 2024 at 11:26 PM Johannes Weiner wrote:
> >
> > On Tue, Nov 26, 2024 at 06:57:19PM -0800, Yuanchu Xie wrote:
> > > This patch series provides workingset reporting of user pages in
> > > lruvecs, of which coldness can be tracked by accessed bits and fd
> > > references. However, the concept of workingset applies generically to
> > > all types of memory, which could be kernel slab caches, discardable
> > > userspace caches (databases), or CXL.mem. Therefore, data sources might
> > > come from slab shrinkers, device drivers, or the userspace.
> > > Another interesting idea might be hugepage workingset, so that we can
> > > measure the proportion of hugepages backing cold memory. However, with
> > > architectures like arm, there may be too many hugepage sizes leading to
> > > a combinatorial explosion when exporting stats to the userspace.
> > > Nonetheless, the kernel should provide a set of workingset interfaces
> > > that is generic enough to accommodate the various use cases, and extensible
> > > to potential future use cases.
> >
> > Doesn't DAMON already provide this information?
> >
> > CCing SJ.
> Thanks for the CC. DAMON was really good at visualizing the memory
> access frequencies last time I tried it out!

Thank you for this kind acknowledgement, Yuanchu!

> For server use cases,
> DAMON would benefit from integrations with cgroups. The key then would be a
> standard interface for exporting a cgroup's working set to the user.

I see two ways to make DAMON support cgroups for now.  The first way is making
another DAMON operations set implementation for cgroups.  I shared a rough idea
for this before, probably at the kernel summit.  But I haven't had a chance to
prioritize this so far.  Please let me know if you need more details.

The second way is extending the DAMOS filter to provide more detailed
statistics per DAMON-region, and adding another DAMOS action that does nothing
but account the detailed statistics.  Using the new DAMOS action, users will be
able to know how much of specific DAMON-found regions is filtered out by the
given filter.  Because we have a DAMOS filter type for cgroups, we can know how
much of the workingset (or, warm memory) belongs to specific cgroups.  This can
be applied not only to cgroups, but to any existing DAMOS filter type (e.g.,
anonymous page, young page).

I believe the second way is simpler to implement while providing information
that is sufficient for most possible use cases.  I was planning to do this
anyway.

> It would be good to have something that will work for different
> backing implementations, DAMON, MGLRU, or active/inactive LRU.

I think we can do this using the filter statistics, with new filter types.  For
example, we can add a new DAMOS filter that filters pages by whether the page's
MGLRU generation falls in a specific range, or by whether the page belongs to
the active or inactive LRU list.

> > >
> > > Use cases
> > > ==========
[...]
> > Access frequency is only half the picture. Whether you need to keep
> > memory with a given frequency resident depends on the speed of the
> > backing device.
[...]
> > > Benchmarks
> > > ==========
> > > Ghait Ouled Amar Ben Cheikh has implemented a simple policy and ran Linux
> > > compile and redis benchmarks from openbenchmarking.org.
> > > The policy and runner is referred to as WMO (Workload Memory Optimization).
> > > The results were based on v3 of the series, but v4 doesn't change the core
> > > of the working set reporting and just adds the ballooning counterpart.
> > >
> > > The timed Linux kernel compilation benchmark shows improvements in peak
> > > memory usage with a policy of "swap out all bytes colder than 10 seconds
> > > every 40 seconds". A swapfile is configured on SSD.
[...]
> > You can do this with a recent (>2018) upstream kernel and ~100 lines
> > of python [1]. It also works on both LRU implementations.
> >
> > [1] https://github.com/facebookincubator/senpai
> >
> > We use this approach in virtually the entire Meta fleet, to offload
> > unneeded memory, estimate available capacity for job scheduling, plan
> > future capacity needs, and provide accurate memory usage feedback to
> > application developers.
> >
> > It works over a wide variety of CPU and storage configurations with no
> > specific tuning.
> >
> > The paper I referenced above provides a detailed breakdown of how it
> > all works together.
> >
> > I would be curious to see a more in-depth comparison to the prior art
> > in this space. At first glance, your proposal seems more complex and
> > less robust/versatile, at least for offloading and capacity gauging.
> We have implemented TMO PSI-based proactive reclaim and compared it to
> a kstaled-based reclaimer (reclaiming based on 2 minute working set
> and refaults). The PSI-based reclaimer was able to save more memory,
> but it also caused spikes of refaults and a lot higher
> decompressions/second. Overall the test workloads had better
> performance with the kstaled-based reclaimer. The conclusion was that
> it was a trade-off.

I agree it is only half of the picture, and there could be a tradeoff.
Motivated by those previous works, DAMOS provides PSI-based aggressiveness
auto-tuning to use both approaches.

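To illustrate the general idea of pressure-driven aggressiveness tuning, here
is a minimal Python sketch.  It is only an illustration under stated
assumptions, not the DAMOS implementation: the function names
(parse_psi_some_avg10, next_quota), the proportional update rule, and the
min/max bounds are all hypothetical.  The only real interface it relies on is
the text format of the PSI files (e.g., /proc/pressure/memory).

```python
def parse_psi_some_avg10(text):
    """Return the 'some avg10' value from PSI file contents.

    PSI files such as /proc/pressure/memory contain lines like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "some":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'some avg10' field found")


def next_quota(last_quota, pressure, target, min_quota=1, max_quota=1 << 30):
    """Hypothetical proportional feedback for a reclaim quota (in bytes).

    Below the target pressure the quota grows, above it the quota shrinks,
    both in proportion to the distance from the target.  At twice the
    target or more, fall back to the minimum quota.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if pressure >= 2 * target:
        return min_quota
    adjustment = last_quota * (target - pressure) / target
    return int(min(max(last_quota + adjustment, min_quota), max_quota))


if __name__ == "__main__":
    # Sample PSI contents; on a real system this would be read from
    # /proc/pressure/memory or a cgroup's memory.pressure file.
    sample = (
        "some avg10=0.25 avg60=0.10 avg300=0.05 total=12345\n"
        "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
    )
    pressure = parse_psi_some_avg10(sample)
    print(next_quota(1000, pressure, target=1.0))
```

With this rule, a reclaimer that is causing little pressure is allowed to work
harder in the next interval, and one that overshoots the target backs off,
which is the kind of trade-off balancing discussed above.
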
> I do agree there's not a good in-depth comparison
> with prior art though.

I would be more than happy to help with the comparison work against DAMON, both
its current implementation and future plans, and with any possible
collaborations.


Thanks,
SJ