From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Bharata B Rao <bharata@amd.com>,
linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Huang Ying <ying.huang@intel.com>,
Greg Thelen <gthelen@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Brice Goglin <brice.goglin@gmail.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Feng Tang <feng.tang@intel.com>,
Jagdish Gediya <jvgediya@linux.ibm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [RFC PATCH v4 4/7] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM
Date: Mon, 06 Jun 2022 17:24:22 +0530
Message-ID: <87fski80sx.fsf@linux.ibm.com>
In-Reply-To: <a844c8c9-e1e1-2ccb-d58c-a5a608afabc0@linux.ibm.com>
Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
> On 6/6/22 3:41 PM, Bharata B Rao wrote:
>> On 6/3/2022 2:34 PM, Aneesh Kumar K V wrote:
>>> On 6/2/22 12:06 PM, Bharata B Rao wrote:
>>>> On 6/1/2022 7:19 PM, Aneesh Kumar K V wrote:
>>>>> On 6/1/22 11:59 AM, Bharata B Rao wrote:
>>>>>> I was experimenting with this patchset and found this behaviour.
>>>>>> Here's what I did:
>>>>>>
>>>>>> Boot a KVM guest with vNVDIMM device which ends up with device_dax
>>>>>> driver by default.
>>>>>>
>>>>>> Use it as RAM by binding it to dax kmem driver. It now appears as
>>>>>> RAM with a new NUMA node that is put to memtier1 (the existing tier
>>>>>> where DRAM already exists)
>>>>>>
>>>>>
>>>>> That should have placed it in memtier2.
>>>>>
>>>>>> I can move it to memtier2 (MEMORY_RANK_PMEM) manually, but isn't
>>>>>> that expected to happen automatically when a node with dax kmem
>>>>>> device comes up?
>>>>>>
>>>>>
>>>>> This can happen if we have added the same NUMA node to memtier1
>>>>> before the dax kmem driver initialized the pmem memory. Can you
>>>>> check, before the above node_set_memory_tier_rank() call, whether
>>>>> the specific NUMA node is already part of any memory tier?
>>>>
>>>> When we reach node_set_memory_tier_rank(), node1 (that has the pmem device)
>>>> is already part of memtier1 whose nodelist shows 0-1.
>>>>
>>>
>>> Can you find out which code path added node1 to memtier1?
>>
>> node_set_memory_tier_rank+0x63/0x80
>> migrate_on_reclaim_callback+0x40/0x4d
>> blocking_notifier_call_chain+0x68/0x90
>> memory_notify+0x1b/0x20
>> online_pages+0x257/0x2f0
>> memory_subsys_online+0x99/0x150
>> device_online+0x65/0x90
>> online_memory_block+0x1b/0x20
>> walk_memory_blocks+0x85/0xc0
>> ? generic_online_page+0x40/0x40
>> add_memory_resource+0x1fa/0x2d0
>> add_memory_driver_managed+0x80/0xc0
>> dev_dax_kmem_probe+0x1af/0x250
>> dax_bus_probe+0x6e/0xa0
>>
>> After this the explicit call to node_set_memory_tier_rank(numa_node, MEMORY_RANK_PMEM)
>> from dev_dax_kmem_probe() finds that the memtier is already set.
>>
>>> Do you have regular memory also appearing on node1?
>>
>> No, regular memory is on Node0.
>>
>
> Thanks for the stack trace. I was setting up KVM on my laptop to
> test this. We should call node_set_memory_tier() earlier. You had
> automatic onlining enabled for memory hotplug:
>
> 	/* online pages if requested */
> 	if (mhp_default_online_type != MMOP_OFFLINE)
> 		walk_memory_blocks(start, size, NULL, online_memory_block);
>
> That caused the memory to be onlined before we could call
> node_set_memory_tier(). That is a bug on my side. I will send you a
> change after testing.
>
Can you try this change?
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 7a11c387fbbc..905609260dda 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -94,6 +94,17 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
+ /*
+ * This gets called before the node is brought online. That
+ * is because, depending on the value of mhp_default_online_type,
+ * the kernel will online the memory as part of the hotplug
+ * operation. Set the node's memory tier before we try to bring
+ * memory blocks online. Otherwise the new node will get added to
+ * the default memory tier via hotplug callbacks.
+ */
+#ifdef CONFIG_TIERED_MEMORY
+ node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
+#endif
for (i = 0; i < dev_dax->nr_range; i++) {
struct resource *res;
struct range range;
@@ -148,9 +159,6 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
dev_set_drvdata(dev, data);
-#ifdef CONFIG_TIERED_MEMORY
- node_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
-#endif
return 0;
err_request_mem: