Subject: Re: [RFC PATCH v4 5/7] mm/demotion: Add support to associate rank with memory tier
From: Ying Huang
To: "Aneesh Kumar K.V", linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin,
 Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
 Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
 Jagdish Gediya, Baolin Wang, David Rientjes
Date: Thu, 02 Jun 2022 14:41:39 +0800
In-Reply-To: <20220527122528.129445-6-aneesh.kumar@linux.ibm.com>
References: <20220527122528.129445-1-aneesh.kumar@linux.ibm.com>
 <20220527122528.129445-6-aneesh.kumar@linux.ibm.com>

On Fri, 2022-05-27 at 17:55 +0530, Aneesh Kumar K.V wrote:
> The rank approach allows us to keep memory tier device IDs stable
> even if there is a need to change the tier ordering among different memory
> tiers. E.g., DRAM nodes with CPUs will always be on memtier1, no matter how
> many tiers are higher or lower than these nodes. A new memory tier can be
> inserted into the tier hierarchy for a new set of nodes without affecting
> the node assignment of any existing memtier, provided that there is enough
> of a gap in the rank values for the new memtier.
>
> The absolute value of the "rank" of a memtier doesn't necessarily carry any
> meaning. Its value relative to other memtiers decides the level of this
> memtier in the tier hierarchy.
>
> For now, this patch supports hardcoded rank values of 100, 200 and 300 for
> memory tiers 0, 1 and 2, respectively.
>
> Below is the sysfs interface to read the rank value of a memory tier:
> /sys/devices/system/memtier/memtierN/rank
>
> This interface is read-only for now. Write support can be added when there
> is a need for more memory tiers (> 3) with flexible ordering among them.
> Rank can be used there, because rank, not the memory tier device ID, now
> decides the memory tier ordering.
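
To make the rank idea above concrete, here is a minimal userspace sketch,
not kernel code. The struct and function names (memtier_sketch,
memtier_cmp_rank, memtier_sketch_add) are hypothetical and do not appear in
the patch. Only the rank values 100, 200 and 300 for memtier0..2 come from
the commit message. The excerpt does not say whether a larger rank means a
higher or a lower tier, so the sketch assumes nothing about that and simply
sorts by ascending rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of a memory tier; not the kernel's struct. */
struct memtier_sketch {
	int id;		/* stable device ID, as in memtierN */
	int rank;	/* only its value relative to other tiers matters */
};

/* qsort() comparator: order tiers by ascending rank. */
static int memtier_cmp_rank(const void *a, const void *b)
{
	const struct memtier_sketch *x = a;
	const struct memtier_sketch *y = b;

	return (x->rank > y->rank) - (x->rank < y->rank);
}

/*
 * Append a tier with the given ID and rank, then re-sort by rank.
 * Existing entries keep their IDs; only their position in the
 * ordering can change. Returns the new number of tiers.
 */
static size_t memtier_sketch_add(struct memtier_sketch *tiers, size_t n,
				 size_t cap, int id, int rank)
{
	assert(n < cap);	/* caller must provide room for one more */
	tiers[n].id = id;
	tiers[n].rank = rank;
	n++;
	qsort(tiers, n, sizeof(*tiers), memtier_cmp_rank);
	return n;
}
```

Starting from { {0, 100}, {1, 200}, {2, 300} } and adding ID 3 with rank 150
gives the rank order memtier0, memtier3, memtier1, memtier2. The new tier
slots into the gap between ranks 100 and 200, and no existing tier is
renumbered.
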
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  drivers/base/node.c     |   5 +-
>  drivers/dax/kmem.c      |   2 +-
>  include/linux/migrate.h |  17 ++--
>  mm/migrate.c            | 218 ++++++++++++++++++++++++----------------
>  4 files changed, 144 insertions(+), 98 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index cf4a58446d8c..892f7c23c94e 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -567,8 +567,11 @@ static ssize_t memtier_show(struct device *dev,
>  			    char *buf)
>  {
>  	int node = dev->id;
> +	int tier_index = node_get_memory_tier_id(node);
>  
> -	return sysfs_emit(buf, "%d\n", node_get_memory_tier(node));
> +	if (tier_index != -1)
> +		return sysfs_emit(buf, "%d\n", tier_index);
> +	return 0;
>  }
>  
>  static ssize_t memtier_store(struct device *dev,
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 991782aa2448..79953426ddaf 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -149,7 +149,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  	dev_set_drvdata(dev, data);
>  
>  #ifdef CONFIG_TIERED_MEMORY
> -	node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
> +	node_set_memory_tier_rank(numa_node, MEMORY_RANK_PMEM);

I think that we can work with the memory tier ID inside the kernel?

Best Regards,
Huang, Ying

[snip]