From: SeongJae Park
To: Hyeongtak Ji
Cc: sj@kernel.org, akpm@linux-foundation.org, apopple@nvidia.com,
	baolin.wang@linux.alibaba.com, damon@lists.linux.dev,
	dave.jiang@intel.com, honggyu.kim@sk.com, kernel_team@skhynix.com,
	linmiaohe@huawei.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, lizhijian@cn.fujitsu.com,
	mathieu.desnoyers@efficios.com, mhiramat@kernel.org, rakie.kim@sk.com,
	rostedt@goodmis.org, surenb@google.com, yangx.jy@fujitsu.com,
	ying.huang@intel.com, ziy@nvidia.com
Subject: Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Date: Thu, 18 Jan 2024 09:17:56 -0800
Message-Id: <20240118171756.80356-1-sj@kernel.org>
In-Reply-To: <20240118104017.2098-1-hyeongtak.ji@sk.com>

On Thu, 18 Jan 2024 19:40:16 +0900 Hyeongtak Ji wrote:

> Hi SeongJae,
>
> On Wed, 17 Jan 2024 SeongJae Park wrote:
>
> [...]
> >> Let's say there are 3 nodes in the system and the first node0 and node1
> >> are the first tier, and node2 is the second tier.
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> >> 0-1
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> >> 2
> >>
> >> Here is the result of partitioning hot/cold memory and I put execution
> >> command at the right side of numastat result.  I initially ran each
> >> hot_cold program with preferred setting so that they initially allocate
> >> memory on one of node0 or node2, but they gradually migrated based on
> >> their access frequencies.
> >>
> >> $ numastat -c -p hot_cold
> >> Per-node process memory usage (in MBs)
> >> PID              Node 0 Node 1 Node 2 Total
> >> ---------------  ------ ------ ------ -----
> >> 754 (hot_cold)     1800      0   2000  3800  <- hot_cold 1800 2000
> >> 1184 (hot_cold)     300      0    500   800  <- hot_cold 300 500
> >> 1818 (hot_cold)     801      0   3199  4000  <- hot_cold 800 3200
> >> 30289 (hot_cold)      4      0      5    10  <- hot_cold 3 5
> >> 30325 (hot_cold)     31      0     51    81  <- hot_cold 30 50
> >> ---------------  ------ ------ ------ -----
> >> Total              2938      0   5756  8695
> >>
> >> The final node placement result shows that DAMON accurately migrated
> >> pages by their hotness for multiple processes.
> >
> > What was the result when the corner cases handling logics were not applied?
>
> This is the result of the same test that Honggyu did, but with an insufficient
> corner cases handling logics.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID            Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 862 (hot_cold)   2256      0   1545  3801  <- hot_cold 1800 2000
> 863 (hot_cold)    403      0    398   801  <- hot_cold 300 500
> 864 (hot_cold)   1520      0   2482  4001  <- hot_cold 800 3200
> 865 (hot_cold)      6      0      3     9  <- hot_cold 3 5
> 866 (hot_cold)     29      0     52    81  <- hot_cold 30 50
> -------------- ------ ------ ------ -----
> Total            4215      0   4480  8695
>
> As time goes by, DAMON keeps trying to split the hot/cold region, but it does
> not seem to be enough.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID            Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 862 (hot_cold)   2022      0   1780  3801  <- hot_cold 1800 2000
> 863 (hot_cold)    351      0    450   801  <- hot_cold 300 500
> 864 (hot_cold)   1134      0   2868  4001  <- hot_cold 800 3200
> 865 (hot_cold)      7      0      2     9  <- hot_cold 3 5
> 866 (hot_cold)     43      0     39    81  <- hot_cold 30 50
> -------------- ------ ------ ------ -----
> Total            3557      0   5138  8695
>
> >
> > And, what are the corner cases handling logic that seemed essential?  I show
> > the page granularity active/reference check could indeed provide many
> > improvements, but that's only my humble assumption.
>
> Yes, the page granularity active/reference check is essential.  To make the
> above "insufficient" result, the only thing I did was to promote
> inactive/not_referenced pages.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f03be320f9ad..c2aefb883c54 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1127,9 +1127,7 @@ static unsigned int __promote_folio_list(struct list_head *folio_list,
>  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
>  		references = folio_check_references(folio, sc);
> -		if (references == FOLIOREF_KEEP ||
> -		    references == FOLIOREF_RECLAIM ||
> -		    references == FOLIOREF_RECLAIM_CLEAN)
> +		if (references == FOLIOREF_KEEP )
>  			goto keep_locked;
>
>  		/* Relocate its contents to another node. */

Thank you for sharing the details :)  I think a DAMOS filters based approach
could be worth trying, then.

>
> >
> > If the corner cases are indeed better to be applied in page granularity, I
> > agree we need some more efforts since DAMON monitoring results are not page
> > granularity aware by the design.  Users could increase min_nr_regions to make
> > it more accurate, and we have plan to support page granularity monitoring,
> > though.  But maybe the overhead could be unacceptable.
> >
> > Ideal solution would be making DAMON more accurate while keeping current level
> > of overhead.  We indeed have TODO items for DAMON accuracy improvement, but
> > this may take some time that might unacceptable for your case.
> >
> > If that's the case, I think the additional corner handling (or, page gran
> > additional access check) could be made as DAMOS filters[1], since DAMOS filters
> > can be applied in page granularity, and designed for this kind of handling of
> > information that DAMON monitoring results cannot provide.  More specifically,
> > we could have filters for promotion-qualifying pages and demotion-qualifying
> > pages.  In this way, I think we can keep the action more flexible while the
> > filters can be applied in creative ways.
>
> Making corner handling as a new DAMOS filters is a good idea.  I'm just a bit
> concerned if adding new filters might cause users to care more.

I prefer keeping the DAMON API and sysfs interface flexible and easy to extend,
even if that increases the number of parameters, while providing simplified
high-level interfaces for end users who want to use DAMON for specific use
cases, as DAMON_RECLAIM, DAMON_LRU_SORT, and damo do.  Hence I'm not very
concerned.


Thanks,
SJ

>
> Kind regards,
> Hyeongtak
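
As a rough way to quantify the three numastat outputs quoted above, the sketch
below sums, over the five hot_cold processes, how far the Node 0 usage is from
the hot size given on each command line.  It assumes that the first hot_cold
argument is the hot size in MB and that hot memory is expected to end up on
Node 0; that is my reading of the annotations, not something the thread states
outright.  The array and function names are made up for illustration, and
numastat rounds to whole MB, so a difference of a few MB is noise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of hot_cold processes in each quoted numastat table. */
#define NR_PROCS 5

/* Assumed targets: first hot_cold argument (hot size, MB), expected on Node 0. */
const long hot_target_mb[NR_PROCS] = { 1800, 300, 800, 3, 30 };

/* Node 0 columns (MB), copied from the three quoted numastat outputs. */
const long node0_full_handling[NR_PROCS] = { 1800, 300, 801, 4, 31 };
const long node0_relaxed_early[NR_PROCS] = { 2256, 403, 1520, 6, 29 };
const long node0_relaxed_later[NR_PROCS] = { 2022, 351, 1134, 7, 43 };

/* Sum of |observed - target| over all processes, in MB. */
long total_abs_deviation(const long *observed, const long *target, size_t n)
{
	long sum = 0;
	size_t i;

	assert(observed && target);
	for (i = 0; i < n; i++)
		sum += labs(observed[i] - target[i]);
	return sum;
}
```

Calling total_abs_deviation() with hot_target_mb and each of the three Node 0
arrays gives 3 MB for the run with the full checks, 1283 MB for the first
output with the relaxed check, and 624 MB for the later one.  Under the stated
assumptions, the relaxed run improves over time but stays far from the
placement reached with the page granularity checks.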