From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>,
Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
jvgediya.oss@gmail.com
Subject: Re: [PATCH v10 3/8] mm/demotion: Add hotplug callbacks to handle new numa node onlined
Date: Wed, 27 Jul 2022 10:08:16 +0530 [thread overview]
Message-ID: <87h733rwzr.fsf@linux.ibm.com> (raw)
In-Reply-To: <87czdruxs0.fsf@yhuang6-desk2.ccr.corp.intel.com>
"Huang, Ying" <ying.huang@intel.com> writes:
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>
>> On 7/26/22 9:33 AM, Huang, Ying wrote:
>>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>
>>>> If the new NUMA node onlined doesn't have a performance level assigned,
>>>> the kernel adds the NUMA node to default memory tier.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>> include/linux/memory-tiers.h | 1 +
>>>> mm/memory-tiers.c | 75 ++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 76 insertions(+)
>>>>
>>>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>>>> index ef380a39db3a..3d5f14d57ae6 100644
>>>> --- a/include/linux/memory-tiers.h
>>>> +++ b/include/linux/memory-tiers.h
>>>> @@ -14,6 +14,7 @@
>>>> #define MEMTIER_PERF_LEVEL_DRAM (1 << (MEMTIER_CHUNK_BITS + 2))
>>>> /* leave one tier below this slow pmem */
>>>> #define MEMTIER_PERF_LEVEL_PMEM (1 << MEMTIER_CHUNK_BITS)
>>>> +#define MEMTIER_HOTPLUG_PRIO 100
>>>>
>>>> extern bool numa_demotion_enabled;
>>>>
>>>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>>>> index 41a21cc5ae55..cc3a47ec18e4 100644
>>>> --- a/mm/memory-tiers.c
>>>> +++ b/mm/memory-tiers.c
>>>> @@ -5,6 +5,7 @@
>>>> #include <linux/lockdep.h>
>>>> #include <linux/moduleparam.h>
>>>> #include <linux/node.h>
>>>> +#include <linux/memory.h>
>>>> #include <linux/memory-tiers.h>
>>>>
>>>> struct memory_tier {
>>>> @@ -64,6 +65,78 @@ static struct memory_tier *find_create_memory_tier(unsigned int perf_level)
>>>> return new_memtier;
>>>> }
>>>>
>>>> +static struct memory_tier *__node_get_memory_tier(int node)
>>>> +{
>>>> + struct memory_tier *memtier;
>>>> +
>>>> + list_for_each_entry(memtier, &memory_tiers, list) {
>>>> + if (node_isset(node, memtier->nodelist))
>>>> + return memtier;
>>>> + }
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static void init_node_memory_tier(int node)
>>>
>>> set_node_memory_tier()?
>>
>> That was done based on feedback from Alistair
>>
>> https://lore.kernel.org/linux-mm/87h73iapg1.fsf@nvdebian.thelocal
>>
>>>
>>>> +{
>>>> + int perf_level;
>>>> + struct memory_tier *memtier;
>>>> +
>>>> + mutex_lock(&memory_tier_lock);
>>>> +
>>>> + memtier = __node_get_memory_tier(node);
>>>> + if (!memtier) {
>>>> + perf_level = node_devices[node]->perf_level;
>>>> + memtier = find_create_memory_tier(perf_level);
>>>> + node_set(node, memtier->nodelist);
>>>> + }
>
> It's related to Alistair's comments too. When will memtier != NULL
> here? We may need just VM_WARN_ON() here?
When the platform driver sets memory tier directly. With the old code
it can happen when dax/kmem register a node to a memory tier. With
memory_type proposal this can happen if the node is part of memory
type that is already added to a memory tier.
>
>>>> + mutex_unlock(&memory_tier_lock);
>>>> +}
>>>> +
>>>> +static void clear_node_memory_tier(int node)
>>>> +{
>>>> + struct memory_tier *memtier;
>>>> +
>>>> + mutex_lock(&memory_tier_lock);
>>>> + memtier = __node_get_memory_tier(node);
>>>> + if (memtier)
>>>> + node_clear(node, memtier->nodelist);
>>>
>>> When memtier->nodelist becomes empty, we need to free memtier?
>>>
>>>> + mutex_unlock(&memory_tier_lock);
>>>> +}
>>>> +
>>>> +/*
>>>> + * This runs whether reclaim-based migration is enabled or not,
>>>> + * which ensures that the user can turn reclaim-based migration
>>>> + * at any time without needing to recalculate migration targets.
>>>> + */
>>>
>>> The comments doesn't apply here.
>>>
>>>> +static int __meminit migrate_on_reclaim_callback(struct notifier_block *self,
>>>> + unsigned long action, void *_arg)
>>>
>>> Now we are building memory tiers instead of working on demotion. So I
>>> think we should rename the function to memtier_hotplug_callback().
>>>
>>>> +{
>>>> + struct memory_notify *arg = _arg;
>>>> +
>>>> + /*
>>>> + * Only update the node migration order when a node is
>>>> + * changing status, like online->offline.
>>>> + */
>>>> + if (arg->status_change_nid < 0)
>>>> + return notifier_from_errno(0);
>>>> +
>>>> + switch (action) {
>>>> + case MEM_OFFLINE:
>>>> + clear_node_memory_tier(arg->status_change_nid);
>>>> + break;
>>>> + case MEM_ONLINE:
>>>> + init_node_memory_tier(arg->status_change_nid);
>>>> + break;
>>>> + }
>>>> +
>>>> + return notifier_from_errno(0);
>>>> +}
>>>> +
>>>> +static void __init migrate_on_reclaim_init(void)
>>>> +{
>>>> + hotplug_memory_notifier(migrate_on_reclaim_callback, MEMTIER_HOTPLUG_PRIO);
>>>> +}
>>>
>>> I suggest to call hotplug_memory_notifier() in memory_tier_init()
>>> directly. We are not working on demotion here.
>>>
>>>> +
>>>> static int __init memory_tier_init(void)
>>>> {
>>>> int node;
>>>> @@ -96,6 +169,8 @@ static int __init memory_tier_init(void)
>>>> node_property->perf_level = default_memtier_perf_level;
>>>> }
>>>> mutex_unlock(&memory_tier_lock);
>>>> +
>>>> + migrate_on_reclaim_init();
>>>> return 0;
>>>> }
>>>> subsys_initcall(memory_tier_init);
>>>
>>> Best Regards,
>>> Huang, Ying
>>
>>
>> Will update the patch in next iteration to take care of other feedback.
>
> Thanks!
>
> Best Regards,
> Huang, Ying
next prev parent reply other threads:[~2022-07-27 4:38 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-20 2:59 [PATCH v10 0/8] mm/demotion: Memory tiers and demotion Aneesh Kumar K.V
2022-07-20 2:59 ` [PATCH v10 1/8] mm/demotion: Add support for explicit memory tiers Aneesh Kumar K.V
2022-07-26 3:53 ` Huang, Ying
2022-07-26 11:59 ` Aneesh Kumar K V
2022-07-27 1:16 ` Huang, Ying
2022-07-28 17:23 ` Johannes Weiner
2022-07-20 2:59 ` [PATCH v10 2/8] mm/demotion: Move memory demotion related code Aneesh Kumar K.V
2022-07-20 2:59 ` [PATCH v10 3/8] mm/demotion: Add hotplug callbacks to handle new numa node onlined Aneesh Kumar K.V
2022-07-26 4:03 ` Huang, Ying
2022-07-26 12:03 ` Aneesh Kumar K V
2022-07-27 1:53 ` Huang, Ying
2022-07-27 4:38 ` Aneesh Kumar K.V [this message]
2022-07-28 6:42 ` Huang, Ying
2022-07-20 2:59 ` [PATCH v10 4/8] mm/demotion/dax/kmem: Set node's performance level to MEMTIER_PERF_LEVEL_PMEM Aneesh Kumar K.V
2022-07-21 6:07 ` kernel test robot
2022-07-25 6:37 ` Huang, Ying
2022-07-25 6:48 ` Aneesh Kumar K V
2022-07-25 8:35 ` Huang, Ying
2022-07-25 8:42 ` Aneesh Kumar K V
2022-07-26 2:13 ` Huang, Ying
2022-07-27 4:31 ` Aneesh Kumar K.V
2022-07-28 6:39 ` Huang, Ying
2022-07-20 2:59 ` [PATCH v10 5/8] mm/demotion: Build demotion targets based on explicit memory tiers Aneesh Kumar K.V
2022-07-20 3:38 ` Aneesh Kumar K.V
2022-07-21 0:02 ` kernel test robot
2022-07-26 7:44 ` Huang, Ying
2022-07-26 12:30 ` Aneesh Kumar K V
2022-07-27 1:40 ` Huang, Ying
2022-07-27 4:35 ` Aneesh Kumar K.V
2022-07-28 6:51 ` Huang, Ying
2022-08-03 3:18 ` Aneesh Kumar K.V
2022-08-04 4:19 ` Huang, Ying
2022-07-20 2:59 ` [PATCH v10 6/8] mm/demotion: Add pg_data_t member to track node memory tier details Aneesh Kumar K.V
2022-07-26 8:02 ` Huang, Ying
2022-07-20 2:59 ` [PATCH v10 7/8] mm/demotion: Demote pages according to allocation fallback order Aneesh Kumar K.V
2022-07-26 8:24 ` Huang, Ying
2022-07-20 2:59 ` [PATCH v10 8/8] mm/demotion: Update node_is_toptier to work with memory tiers Aneesh Kumar K.V
2022-07-25 8:54 ` Huang, Ying
2022-07-25 8:56 ` Aneesh Kumar K V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h733rwzr.fsf@linux.ibm.com \
--to=aneesh.kumar@linux.ibm.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@intel.com \
--cc=dave@stgolabs.net \
--cc=hannes@cmpxchg.org \
--cc=hesham.almatary@huawei.com \
--cc=jvgediya.oss@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=shy828301@gmail.com \
--cc=tim.c.chen@intel.com \
--cc=weixugc@google.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.