Message-ID: <98651df913231327d5167116b65de26c7d390a2c.camel@intel.com>
Subject: Re: [PATCH -V3 0/3] memory tiering: hot page selection
From: Ying Huang <ying.huang@intel.com>
To: Johannes Weiner
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Michal Hocko, Rik van Riel, Mel Gorman, Peter Zijlstra, Dave Hansen,
    Yang Shi, Zi Yan, Wei Xu, osalvador, Shakeel Butt, Zhong Jiang
Date: Wed, 15 Jun 2022 11:47:23 +0800
References: <20220614081635.194014-1-ying.huang@intel.com>
On Tue, 2022-06-14 at 11:30 -0400, Johannes Weiner wrote:
> Hi Huang,

Hi, Johannes,

> Have you had a chance to look at our hot page detection patch that
> Hasan has sent out some time ago? [1]

Yes, I have seen that patch before.

> It hooks into page reclaim to determine what is and isn't hot. Reclaim
> is an existing, well-tested mechanism to do just that. It's just 13
> lines of code: set active bit on the first hint fault; promote on the
> second one if the active bit is still set. This promotes only pages
> hot enough that they can compete with toptier access frequencies.

In general, I think that patch is good, and it can work together with the
hot page selection patchset (this series). That is: if !PageActive(),
activate the page; otherwise, promote the page if the hint page fault
latency is short too.

In a system with a swap device configured and continuous memory pressure
on all memory types (including PMEM), the NUMA balancing hint page faults
can help page reclaiming, because page accesses are detected much earlier.
And page reclaiming can help page promotion in turn, by keeping
recently-not-accessed pages on the inactive list and recently-accessed
pages on the active list.

In a system without a swap device configured and without continuous memory
pressure on slow-tier memory (e.g., PMEM), page reclaiming doesn't help
much, because the active/inactive lists aren't scanned regularly. This is
true for some users, and the method in this series still helps there.

> It's not just convenient, it's also essential to link tier promotion
> rate to page aging. Tiered NUMA balancing is about establishing a
> global LRU order across two (or more) nodes. LRU promotions *within* a
> node require multiple LRU cycles with references.

IMHO, the LRU algorithm is good for page reclaiming, but it isn't
sufficient for page promotion by itself. It can identify cold pages well,
but its accuracy in identifying hot pages isn't enough; that is, it's hard
to distinguish warm pages from hot pages with LRU/MRU alone. The hint page
fault latency introduced in this series is intended to help with that.

> LRU promotions
> *between* nodes must follow the same rules, and be subject to the same
> aging pressure, or you can get much colder pages promoted into a very
> hot workingset and wreak havoc.
>
> We've hammered this patch quite extensively with several Meta
> production workloads and it's been working reliably at keeping
> reasonable promotion rates.

Sounds good. Do you have some data to share?
> @@ -4202,6 +4202,19 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>
>   last_cpupid = page_cpupid_last(page);
>   page_nid = page_to_nid(page);
> +
> + /* Only migrate pages that are active on non-toptier node */
> + if (numa_promotion_tiered_enabled &&
> +     !node_is_toptier(page_nid) &&
> +     !PageActive(page)) {
> + count_vm_numa_event(NUMA_HINT_FAULTS);
> + if (page_nid == numa_node_id())
> + count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
> + mark_page_accessed(page);
> + pte_unmap_unlock(vmf->pte, vmf->ptl);
> + goto out;
> + }
> +
>   target_nid = numa_migrate_prep(page, vma, vmf->address, page_nid,
>   &flags);
>   pte_unmap_unlock(vmf->pte, vmf->ptl);
>
> [1] https://lore.kernel.org/all/20211130003634.35468-1-hasanalmaruf@fb.com/t/#m85b95624622f175ca17a00cc8cc0fc9cc4eeb6d2

Best Regards,
Huang, Ying