linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu <weixugc@google.com>, Huang Ying <ying.huang@intel.com>,
	Yang Shi <shy828301@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim C Chen <tim.c.chen@intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Hesham Almatary <hesham.almatary@huawei.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Jagdish Gediya <jvgediya@linux.ibm.com>
Subject: [PATCH v7 07/12] mm/demotion: Add per node memory tier attribute to sysfs
Date: Wed, 22 Jun 2022 13:55:08 +0530	[thread overview]
Message-ID: <20220622082513.467538-8-aneesh.kumar@linux.ibm.com> (raw)
In-Reply-To: <20220622082513.467538-1-aneesh.kumar@linux.ibm.com>

Add support to modify the memory tier for a NUMA node.

/sys/devices/system/node/nodeN/memtier

where N = node id

When read, It list the memory tier that the node belongs to.

When written, the kernel moves the node into the specified
memory tier, the tier assignment of all other nodes are not
affected.

If the memory tier does not exist, it is created.

Suggested-by: Wei Xu <weixugc@google.com>
Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/base/node.c          | 42 ++++++++++++++++++++++++++++++++++++
 include/linux/memory-tiers.h |  2 ++
 mm/memory-tiers.c            | 42 ++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ac6376ef7a1..667f37eecf3a 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
+#include <linux/memory-tiers.h>
 
 static struct bus_type node_subsys = {
 	.name = "node",
@@ -560,11 +561,52 @@ static ssize_t node_read_distance(struct device *dev,
 }
 static DEVICE_ATTR(distance, 0444, node_read_distance, NULL);
 
+#ifdef CONFIG_NUMA
+static ssize_t memtier_show(struct device *dev,
+			    struct device_attribute *attr,
+			    char *buf)
+{
+	int node = dev->id;
+	int tier_index = node_get_memory_tier_id(node);
+
+	/*
+	 * CPU only NUMA node is not part of memory tiers.
+	 */
+	if (tier_index != -1)
+		return sysfs_emit(buf, "%d\n", tier_index);
+	return 0;
+}
+
+static ssize_t memtier_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	unsigned long tier;
+	int node = dev->id;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &tier);
+	if (ret)
+		return ret;
+
+	ret = node_update_memory_tier(node, tier);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(memtier);
+#endif
+
 static struct attribute *node_dev_attrs[] = {
 	&dev_attr_meminfo.attr,
 	&dev_attr_numastat.attr,
 	&dev_attr_distance.attr,
 	&dev_attr_vmstat.attr,
+#ifdef CONFIG_NUMA
+	&dev_attr_memtier.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 3234301c2537..453f6e5d357c 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -23,6 +23,8 @@ static inline int next_demotion_node(int node)
 	return NUMA_NO_NODE;
 }
 #endif
+int node_get_memory_tier_id(int node);
+int node_update_memory_tier(int node, int tier);
 
 #else
 
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 4acf7570ae1b..b7cb368cb9c0 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -288,6 +288,48 @@ static int node_set_memory_tier(int node, int tier)
 	return ret;
 }
 
+int node_get_memory_tier_id(int node)
+{
+	int tier = -1;
+	struct memory_tier *memtier;
+	/*
+	 * Make sure memory tier is not unregistered
+	 * while it is being read.
+	 */
+	mutex_lock(&memory_tier_lock);
+	memtier = __node_get_memory_tier(node);
+	if (memtier)
+		tier = memtier->dev.id;
+	mutex_unlock(&memory_tier_lock);
+
+	return tier;
+}
+
+int node_update_memory_tier(int node, int tier)
+{
+	struct memory_tier *current_tier;
+	int ret = 0;
+
+	mutex_lock(&memory_tier_lock);
+
+	current_tier = __node_get_memory_tier(node);
+	if (!current_tier || current_tier->dev.id == tier)
+		goto out;
+
+	node_clear(node, current_tier->nodelist);
+
+	ret = __node_create_and_set_memory_tier(node, tier);
+
+	if (nodes_empty(current_tier->nodelist))
+		unregister_memory_tier(current_tier);
+
+	establish_migration_targets();
+out:
+	mutex_unlock(&memory_tier_lock);
+
+	return ret;
+}
+
 #ifdef CONFIG_MIGRATION
 /**
  * next_demotion_node() - Get the next node in the demotion path
-- 
2.36.1



  parent reply	other threads:[~2022-06-22  8:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22  8:25 [PATCH v7 00/12] mm/demotion: Memory tiers and demotion Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 01/12] mm/demotion: Add support for explicit memory tiers Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 02/12] mm/demotion: Move memory demotion related code Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 03/12] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 04/12] mm/demotion: Add hotplug callbacks to handle new numa node onlined Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 05/12] mm/demotion: Build demotion targets based on explicit memory tiers Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 06/12] mm/demotion: Expose memory tier details via sysfs Aneesh Kumar K.V
2022-06-22  8:25 ` Aneesh Kumar K.V [this message]
2022-06-22  8:25 ` [PATCH v7 08/12] mm/demotion: Add pg_data_t member to track node memory tier details Aneesh Kumar K.V
2022-06-22 22:52   ` kernel test robot
2022-06-22  8:25 ` [PATCH v7 09/12] mm/demotion: Demote pages according to allocation fallback order Aneesh Kumar K.V
2022-06-23  2:53   ` Alistair Popple
2022-06-27  3:54     ` Aneesh Kumar K.V
2022-06-27  4:18       ` Alistair Popple
2022-06-22  8:25 ` [PATCH v7 10/12] mm/demotion: Update node_is_toptier to work with memory tiers Aneesh Kumar K.V
2022-06-22  8:25 ` [PATCH v7 11/12] mm/demotion: Add documentation for memory tiering Aneesh Kumar K.V
2022-06-22 21:21   ` kernel test robot
2022-06-25  2:56     ` Bagas Sanjaya
2022-06-25  4:13   ` Bagas Sanjaya
2022-06-27  4:40     ` Aneesh Kumar K.V
2022-06-30  0:57   ` Souptick Joarder
2022-06-22  8:25 ` [PATCH v7 12/12] mm/demotion: Add sysfs ABI documentation Aneesh Kumar K.V
2022-06-22 11:06 ` [PATCH v7 00/12] mm/demotion: Memory tiers and demotion Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220622082513.467538-8-aneesh.kumar@linux.ibm.com \
    --to=aneesh.kumar@linux.ibm.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave@stgolabs.net \
    --cc=hesham.almatary@huawei.com \
    --cc=jvgediya@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=tim.c.chen@intel.com \
    --cc=weixugc@google.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).