From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM
Date: Fri, 29 Jul 2022 12:49:34 +0530
Message-ID: <87bkt8s7w9.fsf@linux.ibm.com>
In-Reply-To: <875yjgmocg.fsf@yhuang6-desk2.ccr.corp.intel.com>
"Huang, Ying" <ying.huang@intel.com> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>
>> By default, all nodes are assigned to the default memory tier, which
>> is the memory tier designated for nodes with DRAM.
>>
>> Set the dax kmem device node's tier to a slower memory tier by
>> assigning it the abstract distance MEMTIER_ADISTANCE_PMEM. The PMEM
>> tier appears below the default memory tier in demotion order.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>> drivers/dax/kmem.c | 9 +++++++++
>> include/linux/memory-tiers.h | 19 ++++++++++++++++++-
>> mm/memory-tiers.c | 28 ++++++++++++++++------------
>> 3 files changed, 43 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>> index a37622060fff..6b0d5de9a3e9 100644
>> --- a/drivers/dax/kmem.c
>> +++ b/drivers/dax/kmem.c
>> @@ -11,6 +11,7 @@
>> #include <linux/fs.h>
>> #include <linux/mm.h>
>> #include <linux/mman.h>
>> +#include <linux/memory-tiers.h>
>> #include "dax-private.h"
>> #include "bus.h"
>>
>> @@ -41,6 +42,12 @@ struct dax_kmem_data {
>> struct resource *res[];
>> };
>>
>> +static struct memory_dev_type default_pmem_type = {
>
> Why is this named default_pmem_type? We usually will not change the
> memory type of a node.
>
Any other suggestions? Perhaps pmem_dev_type?
>> + .adistance = MEMTIER_ADISTANCE_PMEM,
>> + .tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling),
>> + .nodes = NODE_MASK_NONE,
>> +};
>> +
>> static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>> {
>> struct device *dev = &dev_dax->dev;
>> @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>> return -EINVAL;
>> }
>>
>> + init_node_memory_type(numa_node, &default_pmem_type);
>> +
>
> The memory hot-add below may fail. So the error handling needs to be
> added.
>
> And it appears that the memory type and memory tier of a node may be
> fully initialized here before NUMA hot-add has started. So I suggest
> setting only node_memory_types[] here, and setting
> memory_dev_type->nodes in the node hot-add callback. I think that is
> the proper place to complete the initialization.
>
> And, in theory dax/kmem.c can be unloaded. So we need to clear
> node_memory_types[] for nodes somewhere.
>
I guess that by module exit we can be sure that all the memory managed
by dax/kmem has been hot-removed. How about something like the below?
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 6b0d5de9a3e9..eb4e158012a9 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -248,6 +248,7 @@ static void __exit dax_kmem_exit(void)
dax_driver_unregister(&device_dax_kmem_driver);
if (!any_hotremove_failed)
kfree_const(kmem_name);
+ unregister_memory_type(&default_pmem_type);
}
MODULE_AUTHOR("Intel Corporation");
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index fc6b7a14da51..8355baf5b8b4 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -31,6 +31,7 @@ struct memory_dev_type {
#ifdef CONFIG_NUMA
extern bool numa_demotion_enabled;
void init_node_memory_type(int node, struct memory_dev_type *default_type);
+void unregister_memory_type(struct memory_dev_type *memtype);
#ifdef CONFIG_MIGRATION
int next_demotion_node(int node);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -57,6 +58,10 @@ static inline bool node_is_toptier(int node)
#define numa_demotion_enabled false
static inline void init_node_memory_type(int node, struct memory_dev_type *default_type)
{
+}
+
+static inline void unregister_memory_type(struct memory_dev_type *memtype)
+{
}
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 064e0f932795..4d29ebd4c4f3 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -500,6 +500,28 @@ void init_node_memory_type(int node, struct memory_dev_type *default_type)
mutex_unlock(&memory_tier_lock);
}
+void unregister_memory_type(struct memory_dev_type *memtype)
+{
+	int node;
+	struct memory_tier *memtier = memtype->memtier;
+
+	mutex_lock(&memory_tier_lock);
+	for (node = 0; node < MAX_NUMNODES; node++) {
+		if (node_memory_types[node] == memtype) {
+			if (!nodes_empty(memtype->nodes))
+				WARN_ON(1);
+			node_memory_types[node] = NULL;
+		}
+	}
+
+	list_del(&memtype->tier_sibiling);
+	memtype->memtier = NULL;
+	if (list_empty(&memtier->memory_types))
+		destroy_memory_tier(memtier);
+
+	mutex_unlock(&memory_tier_lock);
+}
+
+
void update_node_adistance(int node, struct memory_dev_type *memtype)
{
pg_data_t *pgdat;
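
To make the two review points concrete (roll back the node's type entry if hot-add fails, and clear node_memory_types[] when the type goes away), here is a small userspace model. It is only a sketch under stated assumptions: the struct is a simplified stand-in for the kernel's, memory_tier_lock and the tier list handling are left out, and clear_node_memory_type() and model_probe() are hypothetical names used for illustration, not functions from this series.

```c
/*
 * Userspace model only. Names mirror the patch, but the types are
 * simplified stand-ins and memory_tier_lock is omitted for brevity.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8	/* stand-in for MAX_NUMNODES */

struct memory_dev_type {
	int adistance;
	unsigned int nodes;	/* bitmask stand-in for nodemask_t */
};

struct memory_dev_type *node_memory_types[MAX_NODES];

/* Record the type for a node; models the role of init_node_memory_type(). */
void init_node_memory_type(int node, struct memory_dev_type *type)
{
	assert(node >= 0 && node < MAX_NODES);
	if (!node_memory_types[node])
		node_memory_types[node] = type;
}

/* Hypothetical helper: undo init_node_memory_type() for one node. */
void clear_node_memory_type(int node, struct memory_dev_type *type)
{
	assert(node >= 0 && node < MAX_NODES);
	if (node_memory_types[node] == type)
		node_memory_types[node] = NULL;
}

/*
 * Hypothetical probe model. hotadd_ok stands in for the result of the
 * memory hot-add; on failure the node's type entry is rolled back.
 */
int model_probe(int node, struct memory_dev_type *type, bool hotadd_ok)
{
	init_node_memory_type(node, type);
	if (!hotadd_ok) {
		clear_node_memory_type(node, type);
		return -1;
	}
	/* Under the review suggestion this would happen in the hot-add callback. */
	type->nodes |= 1u << node;
	return 0;
}

/*
 * Clear every node entry that points at the type. Returns false if the
 * type still had nodes attached, which is where the patch uses WARN_ON().
 */
bool unregister_memory_type(struct memory_dev_type *type)
{
	bool clean = (type->nodes == 0);
	int node;

	for (node = 0; node < MAX_NODES; node++)
		clear_node_memory_type(node, type);
	type->nodes = 0;
	return clean;
}
```

In this model a failed probe leaves node_memory_types[] untouched for that node, and unregistering a type that still has nodes attached is reported to the caller rather than silently accepted. How the real code should react in that case is a question for the review, not something this sketch settles.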