Subject: Re: [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes
From: Ying Huang
To: Johannes Weiner, linux-mm@kvack.org
Cc: Hao Wang, Abhishek Dhanotia, Dave Hansen, Yang Shi, Tim Chen,
    Davidlohr Bueso, Adam Manzanares, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, Hasan Al Maruf, Wei Xu, "Aneesh Kumar K.V", Yang Shi
Date: Wed, 08 Jun 2022 12:19:52 +0800
In-Reply-To: <20220607171949.85796-1-hannes@cmpxchg.org>
References: <20220607171949.85796-1-hannes@cmpxchg.org>

On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> From: Hasan Al Maruf
>
> Existing interleave policy spreads out pages evenly across a set of
> specified nodes, i.e. 1:1 interleave.
> Upcoming tiered memory systems
> have CPU-less memory nodes with different peak bandwidth and
> latency-bandwidth characteristics. In such systems, we will want to
> use the additional bandwidth provided by lowtier memory for
> bandwidth-intensive applications. However, the default 1:1 interleave
> can lead to suboptimal bandwidth distribution.
>
> Introduce an N:M interleave policy, where N pages allocated to the
> top-tier nodes are followed by M pages allocated to lowtier nodes.
> This provides the capability to steer the fraction of memory traffic
> that goes to toptier vs. lowtier nodes. For example, 4:1 interleave
> leads to an 80%/20% traffic breakdown between toptier and lowtier.
>
> The ratios are configured through a new sysctl:
>
>     vm.numa_tier_interleave = toptier lowtier
>
> We have run experiments on bandwidth-intensive production services on
> CXL-based tiered memory systems, where lowtier CXL memory has, when
> compared to the toptier memory directly connected to the CPU:
>
>   - ~half of the peak bandwidth
>   - ~80ns higher idle latency
>   - steeper latency vs. bandwidth curve
>
> Results show that regular interleaving leads to a ~40% performance
> regression over baseline; 5:1 interleaving shows an ~8% improvement
> over baseline. We have found the optimal distribution changes based on
> hardware characteristics: slower CXL memory will shift the optimal
> breakdown from 5:1 to (e.g.) 8:1.
>
> The sysctl only applies to processes and vmas with an "interleave"
> policy and has no bearing on contexts using prefer or bind policies.
>
> It defaults to a setting of "1 1", which represents even interleaving,
> and so is backward compatible with existing setups.
>
> Signed-off-by: Hasan Al Maruf
> Signed-off-by: Hao Wang
> Signed-off-by: Johannes Weiner

In general, I think the use case is valid.
But we are changing memory tiering now, including:

- make memory tiering explicit
- support more than 2 tiers
- expose memory tiering via sysfs

Details can be found in the following threads:

https://lore.kernel.org/lkml/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@mail.gmail.com/
https://lore.kernel.org/lkml/20220603134237.131362-1-aneesh.kumar@linux.ibm.com/

With these changes, we may need to revise your implementation. For
example, put the interleave knobs in the memory tier sysfs interface,
support more than 2 tiers, etc.

Best Regards,
Huang, Ying

[snip]
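
For illustration only, the ratio-based interleaving discussed above can be
modeled in userspace as below. This is a rough sketch, not code from the
patch or from the memory tiering threads: the names interleave_tiers and
tier_shares and the per-tier weight list are hypothetical. It only models
which tier each successive page goes to (node selection within a tier is
left out) and takes one weight per tier, so it covers the quoted 4:1 case
as well as more than 2 tiers.

```python
from itertools import islice


def interleave_tiers(weights):
    """Yield tier indices forever: weights[i] pages from tier i, then the next tier.

    `weights` is a hypothetical per-tier ratio, e.g. [4, 1] for the quoted
    4:1 toptier:lowtier example. It is not an interface from the patch.
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need non-negative weights with at least one positive")
    while True:
        for tier, weight in enumerate(weights):
            for _ in range(weight):
                yield tier


def tier_shares(weights, pages):
    """Return the fraction of the first `pages` allocations that land on each tier."""
    counts = [0] * len(weights)
    for tier in islice(interleave_tiers(weights), pages):
        counts[tier] += 1
    return [count / pages for count in counts]


# 4:1 interleave over 100 pages: 80% toptier, 20% lowtier.
print(tier_shares([4, 1], 100))

# Hypothetical three-tier ratio 4:2:1 over 70 pages.
print(tier_shares([4, 2, 1], 70))
```

A weight list of [1, 1] reproduces the existing even interleave, which
matches the "1 1" default described in the patch.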