* [PATCH v6 0/3] Enhance sysfs handling for memory hotplug in weighted interleave
@ 2025-04-04 7:46 Rakie Kim
2025-04-04 7:46 ` [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Rakie Kim @ 2025-04-04 7:46 UTC (permalink / raw)
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory leaks,
and introducing dynamic sysfs updates for memory hotplug support.
### Background
The weighted interleave policy distributes memory allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values are configured through sysfs.
Previously, sysfs entries for weighted interleave were managed statically
at initialization. This led to several issues:
- Memory Leaks: Improper `kobject` teardown caused memory leaks
when initialization failed or when nodes were removed.
- Lack of Dynamic Updates: Sysfs attributes were created only during
initialization, preventing nodes added at runtime from being recognized.
- Handling of Unusable Nodes: Sysfs entries were generated for all
possible nodes (`N_POSSIBLE`), including memoryless or unavailable nodes,
leading to sysfs entries for unusable nodes and potential
misconfigurations.
### Patch Overview
1. [PATCH 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
- Ensures proper cleanup of `kobject` allocations.
- Adds `kobject_del()` before `kobject_put()` to clean up sysfs state correctly.
- Prevents memory/resource leaks and improves teardown behavior.
2. [PATCH 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
- Refactors static sysfs layout into a new `sysfs_wi_group` structure.
- Makes per-node sysfs attributes accessible to external modules.
- Lays groundwork for future hotplug support by enabling runtime modification.
3. [PATCH 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
- Dynamically adds/removes sysfs entries when nodes are online/offline.
- Limits sysfs creation to nodes with memory, avoiding unusable node entries.
- Hooks into memory hotplug notifier for runtime updates.
These patches have been tested under CXL-based memory configurations,
including hotplug scenarios, to ensure proper behavior and stability.
mm/mempolicy.c | 194 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 126 insertions(+), 68 deletions(-)
base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
--
2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-04 7:46 [PATCH v6 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
@ 2025-04-04 7:46 ` Rakie Kim
2025-04-04 12:59 ` Jonathan Cameron
2025-04-04 7:46 ` [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
2025-04-04 7:46 ` [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-04-04 7:46 UTC (permalink / raw)
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Memory leaks occurred when removing sysfs attributes for weighted
interleave. Improper kobject deallocation led to unreleased memory
when initialization failed or when nodes were removed.
This patch resolves the issue by replacing unnecessary `kfree()`
calls with proper `kobject_del()` and `kobject_put()` sequences,
ensuring correct teardown and preventing memory leaks.
By explicitly calling `kobject_del()` before `kobject_put()`,
the release function is now invoked safely, and internal sysfs
state is correctly cleaned up. This guarantees that the memory
associated with the kobject is fully released and avoids
resource leaks, thereby improving system stability.
Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
mm/mempolicy.c | 64 +++++++++++++++++++++++++++-----------------------
1 file changed, 34 insertions(+), 30 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bbaadbeeb291..af3753925573 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
for (i = 0; i < nr_node_ids; i++)
sysfs_wi_node_release(node_attrs[i], wi_kobj);
- kobject_put(wi_kobj);
+
+ kfree(node_attrs);
+ kfree(wi_kobj);
}
static const struct kobj_type wi_ktype = {
@@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
struct kobject *wi_kobj;
int nid, err;
- wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj)
+ node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!node_attrs)
return -ENOMEM;
+ wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!wi_kobj) {
+ err = -ENOMEM;
+ goto node_out;
+ }
+
err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
if (err) {
- kfree(wi_kobj);
- return err;
+ kobject_put(wi_kobj);
+ goto err_out;
}
for_each_node_state(nid, N_POSSIBLE) {
@@ -3512,9 +3521,18 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
break;
}
}
- if (err)
+ if (err) {
+ kobject_del(wi_kobj);
kobject_put(wi_kobj);
+ goto err_out;
+ }
+
return 0;
+
+node_out:
+ kfree(node_attrs);
+err_out:
+ return err;
}
static void mempolicy_kobj_release(struct kobject *kobj)
@@ -3528,7 +3546,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
mutex_unlock(&iw_table_lock);
synchronize_rcu();
kfree(old);
- kfree(node_attrs);
kfree(kobj);
}
@@ -3542,37 +3559,24 @@ static int __init mempolicy_sysfs_init(void)
static struct kobject *mempolicy_kobj;
mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
- if (!mempolicy_kobj) {
- err = -ENOMEM;
- goto err_out;
- }
-
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
- err = -ENOMEM;
- goto mempol_out;
- }
+ if (!mempolicy_kobj)
+ return -ENOMEM;
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
"mempolicy");
if (err)
- goto node_out;
+ goto err_out;
err = add_weighted_interleave_group(mempolicy_kobj);
- if (err) {
- pr_err("mempolicy sysfs structure failed to initialize\n");
- kobject_put(mempolicy_kobj);
- return err;
- }
+ if (err)
+ goto err_del;
- return err;
-node_out:
- kfree(node_attrs);
-mempol_out:
- kfree(mempolicy_kobj);
+ return 0;
+
+err_del:
+ kobject_del(mempolicy_kobj);
err_out:
- pr_err("failed to add mempolicy kobject to the system\n");
+ kobject_put(mempolicy_kobj);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-04 7:46 [PATCH v6 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-04 7:46 ` [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-04 7:46 ` Rakie Kim
2025-04-04 13:05 ` Jonathan Cameron
2025-04-04 7:46 ` [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-04-04 7:46 UTC (permalink / raw)
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Previously, the weighted interleave sysfs structure was statically
managed during initialization. This prevented new nodes from being
recognized when memory hotplug events occurred, limiting the ability
to update or extend sysfs entries dynamically at runtime.
To address this, this patch refactors the sysfs infrastructure and
encapsulates it within a new structure, `sysfs_wi_group`, which holds
both the kobject and an array of node attribute pointers.
By allocating this group structure globally, the per-node sysfs
attributes can be managed beyond initialization time, enabling
external modules to insert or remove node entries in response to
events such as memory hotplug or node online/offline transitions.
Instead of allocating all per-node sysfs attributes at once, the
initialization path now uses the existing sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers. This refactoring makes it possible
to modularly manage per-node sysfs entries and ensures the
infrastructure is ready for runtime extension.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
mm/mempolicy.c | 73 ++++++++++++++++++++++----------------------------
1 file changed, 32 insertions(+), 41 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index af3753925573..73a9405ff352 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3388,6 +3388,13 @@ struct iw_node_attr {
int nid;
};
+struct sysfs_wi_group {
+ struct kobject wi_kobj;
+ struct iw_node_attr *nattrs[];
+};
+
+static struct sysfs_wi_group *wi_group;
+
static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -3430,27 +3437,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
return count;
}
-static struct iw_node_attr **node_attrs;
-
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
- struct kobject *parent)
+static void sysfs_wi_node_delete(int nid)
{
- if (!node_attr)
+ if (!wi_group->nattrs[nid])
return;
- sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
+
+ sysfs_remove_file(&wi_group->wi_kobj,
+ &wi_group->nattrs[nid]->kobj_attr.attr);
+ kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
+ kfree(wi_group->nattrs[nid]);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
{
- int i;
-
- for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
+ int nid;
- kfree(node_attrs);
- kfree(wi_kobj);
+ for (nid = 0; nid < nr_node_ids; nid++)
+ sysfs_wi_node_delete(nid);
+ kfree(wi_group);
}
static const struct kobj_type wi_ktype = {
@@ -3458,7 +3462,7 @@ static const struct kobj_type wi_ktype = {
.release = sysfs_wi_release,
};
-static int add_weight_node(int nid, struct kobject *wi_kobj)
+static int sysfs_wi_node_add(int nid)
{
struct iw_node_attr *node_attr;
char *name;
@@ -3480,58 +3484,45 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
node_attr->kobj_attr.store = node_store;
node_attr->nid = nid;
- if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+ if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
kfree(node_attr->kobj_attr.attr.name);
kfree(node_attr);
pr_err("failed to add attribute to weighted_interleave\n");
return -ENOMEM;
}
- node_attrs[nid] = node_attr;
+ wi_group->nattrs[nid] = node_attr;
return 0;
}
-static int add_weighted_interleave_group(struct kobject *root_kobj)
+static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
{
- struct kobject *wi_kobj;
int nid, err;
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs)
+ wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
+ GFP_KERNEL);
+ if (!wi_group)
return -ENOMEM;
- wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj) {
- err = -ENOMEM;
- goto node_out;
- }
-
- err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+ err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
"weighted_interleave");
- if (err) {
- kobject_put(wi_kobj);
+ if (err)
goto err_out;
- }
for_each_node_state(nid, N_POSSIBLE) {
- err = add_weight_node(nid, wi_kobj);
+ err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- break;
+ goto err_del;
}
}
- if (err) {
- kobject_del(wi_kobj);
- kobject_put(wi_kobj);
- goto err_out;
- }
return 0;
-node_out:
- kfree(node_attrs);
+err_del:
+ kobject_del(&wi_group->wi_kobj);
err_out:
+ kobject_put(&wi_group->wi_kobj);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-04 7:46 [PATCH v6 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-04 7:46 ` [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-04-04 7:46 ` [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-04 7:46 ` Rakie Kim
2025-04-04 8:43 ` Oscar Salvador
2025-04-04 20:45 ` David Hildenbrand
2 siblings, 2 replies; 15+ messages in thread
From: Rakie Kim @ 2025-04-04 7:46 UTC (permalink / raw)
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby improving memory
bandwidth utilization. The weight values for each node are configured
through sysfs.
Previously, sysfs entries for configuring weighted interleave were created
for all possible nodes (N_POSSIBLE) at initialization, including nodes that
might not have memory. However, not all nodes in N_POSSIBLE are usable at
runtime, as some may remain memoryless or offline.
This led to sysfs entries being created for unusable nodes, causing
potential misconfiguration issues.
To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory, avoiding
the creation of sysfs entries for nodes that cannot be used.
2) Support memory hotplug by dynamically adding and removing sysfs entries
based on whether a node transitions into or out of the N_MEMORY state.
Additionally, the patch ensures that sysfs attributes are properly managed
when nodes go offline, preventing stale or redundant entries from persisting
in the system.
By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
---
mm/mempolicy.c | 109 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 86 insertions(+), 23 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 73a9405ff352..f25c2c7f8fcf 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <linux/uaccess.h>
+#include <linux/memory.h>
#include "internal.h"
@@ -3390,6 +3391,7 @@ struct iw_node_attr {
struct sysfs_wi_group {
struct kobject wi_kobj;
+ struct mutex kobj_lock;
struct iw_node_attr *nattrs[];
};
@@ -3439,13 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
static void sysfs_wi_node_delete(int nid)
{
- if (!wi_group->nattrs[nid])
+ struct iw_node_attr *attr;
+
+ if (nid < 0 || nid >= nr_node_ids)
+ return;
+
+ mutex_lock(&wi_group->kobj_lock);
+ attr = wi_group->nattrs[nid];
+ if (!attr) {
+ mutex_unlock(&wi_group->kobj_lock);
return;
+ }
+
+ wi_group->nattrs[nid] = NULL;
+ mutex_unlock(&wi_group->kobj_lock);
- sysfs_remove_file(&wi_group->wi_kobj,
- &wi_group->nattrs[nid]->kobj_attr.attr);
- kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
- kfree(wi_group->nattrs[nid]);
+ sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
+ kfree(attr->kobj_attr.attr.name);
+ kfree(attr);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
@@ -3464,35 +3477,80 @@ static const struct kobj_type wi_ktype = {
static int sysfs_wi_node_add(int nid)
{
- struct iw_node_attr *node_attr;
+ int ret = 0;
char *name;
+ struct iw_node_attr *new_attr = NULL;
- node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
- if (!node_attr)
+ if (nid < 0 || nid >= nr_node_ids) {
+ pr_err("Invalid node id: %d\n", nid);
+ return -EINVAL;
+ }
+
+ new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+ if (!new_attr)
return -ENOMEM;
name = kasprintf(GFP_KERNEL, "node%d", nid);
if (!name) {
- kfree(node_attr);
+ kfree(new_attr);
return -ENOMEM;
}
- sysfs_attr_init(&node_attr->kobj_attr.attr);
- node_attr->kobj_attr.attr.name = name;
- node_attr->kobj_attr.attr.mode = 0644;
- node_attr->kobj_attr.show = node_show;
- node_attr->kobj_attr.store = node_store;
- node_attr->nid = nid;
+ mutex_lock(&wi_group->kobj_lock);
+ if (wi_group->nattrs[nid]) {
+ mutex_unlock(&wi_group->kobj_lock);
+ pr_info("Node [%d] already exists\n", nid);
+ kfree(new_attr);
+ kfree(name);
+ return 0;
+ }
+ wi_group->nattrs[nid] = new_attr;
- if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
- pr_err("failed to add attribute to weighted_interleave\n");
- return -ENOMEM;
+ sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
+ wi_group->nattrs[nid]->kobj_attr.attr.name = name;
+ wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
+ wi_group->nattrs[nid]->kobj_attr.show = node_show;
+ wi_group->nattrs[nid]->kobj_attr.store = node_store;
+ wi_group->nattrs[nid]->nid = nid;
+
+ ret = sysfs_create_file(&wi_group->wi_kobj,
+ &wi_group->nattrs[nid]->kobj_attr.attr);
+ if (ret) {
+ kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
+ kfree(wi_group->nattrs[nid]);
+ wi_group->nattrs[nid] = NULL;
+ pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
}
+ mutex_unlock(&wi_group->kobj_lock);
- wi_group->nattrs[nid] = node_attr;
- return 0;
+ return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ int err;
+ struct memory_notify *arg = data;
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ goto notifier_end;
+
+ switch(action) {
+ case MEM_ONLINE:
+ err = sysfs_wi_node_add(nid);
+ if (err) {
+ pr_err("failed to add sysfs [node%d]\n", nid);
+ return NOTIFY_BAD;
+ }
+ break;
+ case MEM_OFFLINE:
+ sysfs_wi_node_delete(nid);
+ break;
+ }
+
+notifier_end:
+ return NOTIFY_OK;
}
static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3503,13 +3561,17 @@ static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
GFP_KERNEL);
if (!wi_group)
return -ENOMEM;
+ mutex_init(&wi_group->kobj_lock);
err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
"weighted_interleave");
if (err)
goto err_out;
- for_each_node_state(nid, N_POSSIBLE) {
+ for_each_online_node(nid) {
+ if (!node_state(nid, N_MEMORY))
+ continue;
+
err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
@@ -3517,6 +3579,7 @@ static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
}
}
+ hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
return 0;
err_del:
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-04 7:46 ` [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
@ 2025-04-04 8:43 ` Oscar Salvador
2025-04-07 9:37 ` Rakie Kim
2025-04-04 20:45 ` David Hildenbrand
1 sibling, 1 reply; 15+ messages in thread
From: Oscar Salvador @ 2025-04-04 8:43 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
honggyu.kim, yunjeong.mun
On Fri, Apr 04, 2025 at 04:46:21PM +0900, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
For the memory-hotplug bits: Reviewed-by: Oscar Salvador
<osalvador@suse.de<
Just one thing that caught my eye:
Cannot add_weighted_interleave_group be __init? AFAICS, it only gets
called at boot time?
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-04 7:46 ` [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-04 12:59 ` Jonathan Cameron
2025-04-07 9:37 ` Rakie Kim
0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2025-04-04 12:59 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
On Fri, 4 Apr 2025 16:46:19 +0900
Rakie Kim <rakie.kim@sk.com> wrote:
> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
>
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with proper `kobject_del()` and `kobject_put()` sequences,
> ensuring correct teardown and preventing memory leaks.
>
> By explicitly calling `kobject_del()` before `kobject_put()`,
> the release function is now invoked safely, and internal sysfs
> state is correctly cleaned up. This guarantees that the memory
> associated with the kobject is fully released and avoids
> resource leaks, thereby improving system stability.
>
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
I've pretty much forgotten earlier discussions so apologies if I revisit
old ground!
The fix is fine I think. But the resulting code structure
could be improved, without (I think) complicating what is here by much.
> ---
> mm/mempolicy.c | 64 +++++++++++++++++++++++++++-----------------------
> 1 file changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index bbaadbeeb291..af3753925573 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
>
> for (i = 0; i < nr_node_ids; i++)
> sysfs_wi_node_release(node_attrs[i], wi_kobj);
> - kobject_put(wi_kobj);
> +
> + kfree(node_attrs);
> + kfree(wi_kobj);
> }
>
> static const struct kobj_type wi_ktype = {
> @@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> struct kobject *wi_kobj;
> int nid, err;
>
> - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> - if (!wi_kobj)
> + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!node_attrs)
> return -ENOMEM;
>
> + wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> + if (!wi_kobj) {
> + err = -ENOMEM;
> + goto node_out;
As this is only place where we do kfree(node_attrs)
why not just do that here and return directly.
> + }
> +
> err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> "weighted_interleave");
> if (err) {
> - kfree(wi_kobj);
> - return err;
> + kobject_put(wi_kobj);
> + goto err_out;
> }
>
> for_each_node_state(nid, N_POSSIBLE) {
> @@ -3512,9 +3521,18 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> break;
> }
> }
> - if (err)
> + if (err) {
> + kobject_del(wi_kobj);
> kobject_put(wi_kobj);
For this and the one above, a unified exit kind of makes sense as
can do two labels and have the put only once.
If not, why not move this up into the loop and return directly?
If you move to an error handling block
err_del_obj:
kobject_del(wi_kobj);
err_put_obj:
kobject_put(wi_kobj);
then you could also do the goto from within that loop.
> + goto err_out;
> + }
> +
> return 0;
> +
> +node_out:
> + kfree(node_attrs);
> +err_out:
> + return err;
> }
>
> static void mempolicy_kobj_release(struct kobject *kobj)
> @@ -3528,7 +3546,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> mutex_unlock(&iw_table_lock);
> synchronize_rcu();
> kfree(old);
> - kfree(node_attrs);
> kfree(kobj);
> }
>
> @@ -3542,37 +3559,24 @@ static int __init mempolicy_sysfs_init(void)
> static struct kobject *mempolicy_kobj;
>
> mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> - if (!mempolicy_kobj) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> -
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> - err = -ENOMEM;
> - goto mempol_out;
> - }
> + if (!mempolicy_kobj)
> + return -ENOMEM;
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> "mempolicy");
> if (err)
> - goto node_out;
> + goto err_out;
goto err_put_kobj;
or something like that.
>
> err = add_weighted_interleave_group(mempolicy_kobj);
> - if (err) {
> - pr_err("mempolicy sysfs structure failed to initialize\n");
> - kobject_put(mempolicy_kobj);
> - return err;
> - }
> + if (err)
> + goto err_del;
>
> - return err;
> -node_out:
> - kfree(node_attrs);
> -mempol_out:
> - kfree(mempolicy_kobj);
> + return 0;
> +
> +err_del:
> + kobject_del(mempolicy_kobj);
> err_out:
> - pr_err("failed to add mempolicy kobject to the system\n");
> + kobject_put(mempolicy_kobj);
If we keep an err_out, usual expectation is all it does is return
+ maybe print a message. We'd not expect a put.
> return err;
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-04 7:46 ` [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-04 13:05 ` Jonathan Cameron
2025-04-04 17:23 ` Dan Williams
0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Cameron @ 2025-04-04 13:05 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
On Fri, 4 Apr 2025 16:46:20 +0900
Rakie Kim <rakie.kim@sk.com> wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed during initialization. This prevented new nodes from being
> recognized when memory hotplug events occurred, limiting the ability
> to update or extend sysfs entries dynamically at runtime.
>
> To address this, this patch refactors the sysfs infrastructure and
> encapsulates it within a new structure, `sysfs_wi_group`, which holds
> both the kobject and an array of node attribute pointers.
>
> By allocating this group structure globally, the per-node sysfs
> attributes can be managed beyond initialization time, enabling
> external modules to insert or remove node entries in response to
> events such as memory hotplug or node online/offline transitions.
>
> Instead of allocating all per-node sysfs attributes at once, the
> initialization path now uses the existing sysfs_wi_node_add() and
> sysfs_wi_node_delete() helpers. This refactoring makes it possible
> to modularly manage per-node sysfs entries and ensures the
> infrastructure is ready for runtime extension.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
Hi Rakie,
Some things I was requesting in patch 1 are done here.
Mostly I think what is wanted is moving some of that
refactoring back to that patch rather than here.
Some of the label and function naming needs another look.
Jonathan
> ---
> mm/mempolicy.c | 73 ++++++++++++++++++++++----------------------------
> 1 file changed, 32 insertions(+), 41 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index af3753925573..73a9405ff352 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3388,6 +3388,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct sysfs_wi_group {
> + struct kobject wi_kobj;
> + struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *wi_group;
> +
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3430,27 +3437,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_delete(int nid)
Maybe stick to release naming to match the sysfs_wi_release()
below? I don't really care about this.
> {
> - if (!node_attr)
> + if (!wi_group->nattrs[nid])
> return;
> - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> +
> + sysfs_remove_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> + kfree(wi_group->nattrs[nid]);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> - int i;
> -
> - for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + int nid;
>
> - kfree(node_attrs);
> - kfree(wi_kobj);
> + for (nid = 0; nid < nr_node_ids; nid++)
> + sysfs_wi_node_delete(nid);
> + kfree(wi_group);
> }
> -static int add_weighted_interleave_group(struct kobject *root_kobj)
> +static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> {
> - struct kobject *wi_kobj;
> int nid, err;
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs)
> + wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
> + GFP_KERNEL);
Align to after the (
I think it's a couple of spaces short of that.
> + if (!wi_group)
> return -ENOMEM;
>
> - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> - if (!wi_kobj) {
> - err = -ENOMEM;
> - goto node_out;
> - }
> -
> - err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> + err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
> "weighted_interleave");
> - if (err) {
> - kobject_put(wi_kobj);
> + if (err)
> goto err_out;
> - }
>
> for_each_node_state(nid, N_POSSIBLE) {
> - err = add_weight_node(nid, wi_kobj);
> + err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> - break;
> + goto err_del;
Ah! This is what I was looking for in patch 1, but it's down here.
Move it back to there.
> }
> }
> - if (err) {
> - kobject_del(wi_kobj);
> - kobject_put(wi_kobj);
> - goto err_out;
> - }
>
> return 0;
>
> -node_out:
> - kfree(node_attrs);
> +err_del:
> + kobject_del(&wi_group->wi_kobj);
> err_out:
> + kobject_put(&wi_group->wi_kobj);
Same issue as previous patch on naming of the label.
Moving to this single error block is fine but belongs in patch 1.
> return err;
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-04 13:05 ` Jonathan Cameron
@ 2025-04-04 17:23 ` Dan Williams
2025-04-07 9:38 ` Rakie Kim
2025-04-07 9:49 ` Jonathan Cameron
0 siblings, 2 replies; 15+ messages in thread
From: Dan Williams @ 2025-04-04 17:23 UTC (permalink / raw)
To: Jonathan Cameron, Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
Jonathan Cameron wrote:
> On Fri, 4 Apr 2025 16:46:20 +0900
> Rakie Kim <rakie.kim@sk.com> wrote:
>
> > Previously, the weighted interleave sysfs structure was statically
> > managed during initialization. This prevented new nodes from being
> > recognized when memory hotplug events occurred, limiting the ability
> > to update or extend sysfs entries dynamically at runtime.
> >
> > To address this, this patch refactors the sysfs infrastructure and
> > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > both the kobject and an array of node attribute pointers.
> >
> > By allocating this group structure globally, the per-node sysfs
> > attributes can be managed beyond initialization time, enabling
> > external modules to insert or remove node entries in response to
> > events such as memory hotplug or node online/offline transitions.
> >
> > Instead of allocating all per-node sysfs attributes at once, the
> > initialization path now uses the existing sysfs_wi_node_add() and
> > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > to modularly manage per-node sysfs entries and ensures the
> > infrastructure is ready for runtime extension.
> >
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > Reviewed-by: Gregory Price <gourry@gourry.net>
> Hi Rakie,
>
> Some things I was requesting in patch 1 are done here.
> Mostly I think what is wanted is moving some of that
> refactoring back to that patch rather than here.
>
> Some of the label and function naming needs another look.
>
> Jonathan
[..]
> > @@ -3430,27 +3437,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > return count;
> > }
> >
> > -static struct iw_node_attr **node_attrs;
> > -
> > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > - struct kobject *parent)
> > +static void sysfs_wi_node_delete(int nid)
>
> Maybe stick to release naming to match the sysfs_wi_release()
> below? I don't really care about this.
I had asked for "delete" to pair with "add" and to not get confused with
a final kobject_put() callback.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-04 7:46 ` [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-04-04 8:43 ` Oscar Salvador
@ 2025-04-04 20:45 ` David Hildenbrand
2025-04-07 9:39 ` Rakie Kim
1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2025-04-04 20:45 UTC (permalink / raw)
To: Rakie Kim, akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On 04.04.25 09:46, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> ---
> mm/mempolicy.c | 109 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 86 insertions(+), 23 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 73a9405ff352..f25c2c7f8fcf 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -113,6 +113,7 @@
> #include <asm/tlbflush.h>
> #include <asm/tlb.h>
> #include <linux/uaccess.h>
> +#include <linux/memory.h>
>
> #include "internal.h"
>
> @@ -3390,6 +3391,7 @@ struct iw_node_attr {
>
> struct sysfs_wi_group {
> struct kobject wi_kobj;
> + struct mutex kobj_lock;
> struct iw_node_attr *nattrs[];
> };
>
> @@ -3439,13 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>
> static void sysfs_wi_node_delete(int nid)
> {
> - if (!wi_group->nattrs[nid])
> + struct iw_node_attr *attr;
> +
> + if (nid < 0 || nid >= nr_node_ids)
> + return;
> +
> + mutex_lock(&wi_group->kobj_lock);
> + attr = wi_group->nattrs[nid];
> + if (!attr) {
> + mutex_unlock(&wi_group->kobj_lock);
> return;
> + }
> +
> + wi_group->nattrs[nid] = NULL;
> + mutex_unlock(&wi_group->kobj_lock);
>
> - sysfs_remove_file(&wi_group->wi_kobj,
> - &wi_group->nattrs[nid]->kobj_attr.attr);
> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> - kfree(wi_group->nattrs[nid]);
> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> + kfree(attr->kobj_attr.attr.name);
> + kfree(attr);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> @@ -3464,35 +3477,80 @@ static const struct kobj_type wi_ktype = {
>
> static int sysfs_wi_node_add(int nid)
> {
> - struct iw_node_attr *node_attr;
> + int ret = 0;
> char *name;
> + struct iw_node_attr *new_attr = NULL;
>
> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> - if (!node_attr)
> + if (nid < 0 || nid >= nr_node_ids) {
> + pr_err("Invalid node id: %d\n", nid);
> + return -EINVAL;
> + }
> +
> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
> + if (!new_attr)
> return -ENOMEM;
>
> name = kasprintf(GFP_KERNEL, "node%d", nid);
> if (!name) {
> - kfree(node_attr);
> + kfree(new_attr);
> return -ENOMEM;
> }
>
> - sysfs_attr_init(&node_attr->kobj_attr.attr);
> - node_attr->kobj_attr.attr.name = name;
> - node_attr->kobj_attr.attr.mode = 0644;
> - node_attr->kobj_attr.show = node_show;
> - node_attr->kobj_attr.store = node_store;
> - node_attr->nid = nid;
> + mutex_lock(&wi_group->kobj_lock);
> + if (wi_group->nattrs[nid]) {
> + mutex_unlock(&wi_group->kobj_lock);
> + pr_info("Node [%d] already exists\n", nid);
> + kfree(new_attr);
> + kfree(name);
> + return 0;
> + }
> + wi_group->nattrs[nid] = new_attr;
>
> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> - pr_err("failed to add attribute to weighted_interleave\n");
> - return -ENOMEM;
> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
> + wi_group->nattrs[nid]->nid = nid;
> +
> + ret = sysfs_create_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
> + if (ret) {
> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> + kfree(wi_group->nattrs[nid]);
> + wi_group->nattrs[nid] = NULL;
> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
> }
> + mutex_unlock(&wi_group->kobj_lock);
>
> - wi_group->nattrs[nid] = node_attr;
> - return 0;
> + return ret;
> +}
> +
> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + goto notifier_end;
> +
> + switch(action) {
> + case MEM_ONLINE:
MEM_ONLINE is too late, we cannot fail hotplug at that point.
Would MEM_GOING_ONLINE / MEM_CANCEL_ONLINE be better?
> + err = sysfs_wi_node_add(nid);
> + if (err) {
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + return NOTIFY_BAD;
Note that NOTIFY_BAD includes NOTIFY_STOP_MASK. So you wouldn't call
other notifiers, but the overall memory onlining would succeed, which is
bad.
If we don't care about the error (not prevent hotplug) we could only
pr_warn() and continue. Maybe this (unlikely) case is not a good reason
to stop memory from getting onlined. OTOH, it will barely ever trigger
in practice ...
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-04 8:43 ` Oscar Salvador
@ 2025-04-07 9:37 ` Rakie Kim
0 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-04-07 9:37 UTC (permalink / raw)
To: Oscar Salvador
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, kernel_team,
honggyu.kim, yunjeong.mun, Rakie Kim
On Fri, 4 Apr 2025 10:43:18 +0200 Oscar Salvador <osalvador@suse.de> wrote:
> On Fri, Apr 04, 2025 at 04:46:21PM +0900, Rakie Kim wrote:
> > The weighted interleave policy distributes page allocations across multiple
> > NUMA nodes based on their performance weight, thereby improving memory
> > bandwidth utilization. The weight values for each node are configured
> > through sysfs.
> >
> > Previously, sysfs entries for configuring weighted interleave were created
> > for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> > might not have memory. However, not all nodes in N_POSSIBLE are usable at
> > runtime, as some may remain memoryless or offline.
> > This led to sysfs entries being created for unusable nodes, causing
> > potential misconfiguration issues.
> >
> > To address this issue, this patch modifies the sysfs creation logic to:
> > 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> > the creation of sysfs entries for nodes that cannot be used.
> > 2) Support memory hotplug by dynamically adding and removing sysfs entries
> > based on whether a node transitions into or out of the N_MEMORY state.
> >
> > Additionally, the patch ensures that sysfs attributes are properly managed
> > when nodes go offline, preventing stale or redundant entries from persisting
> > in the system.
> >
> > By making these changes, the weighted interleave policy now manages its
> > sysfs entries more efficiently, ensuring that only relevant nodes are
> > considered for interleaving, and dynamically adapting to memory hotplug
> > events.
> >
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>
> For the memory-hotplug bits: Reviewed-by: Oscar Salvador
> <osalvador@suse.de<
>
> Just one thing that caught my eye:
>
> Cannot add_weighted_interleave_group be __init? AFAICS, it only gets
> called at boot time?
>
>
> --
> Oscar Salvador
> SUSE Labs
Thank you for your response regarding this patch.
I agree that `add_weighted_interleave_group` can be marked with __init,
as it is only called during system boot.
I will make this change in the next revision.
Rakie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-04 12:59 ` Jonathan Cameron
@ 2025-04-07 9:37 ` Rakie Kim
0 siblings, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-04-07 9:37 UTC (permalink / raw)
To: Jonathan Cameron
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, Rakie Kim
On Fri, 4 Apr 2025 13:59:06 +0100 Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> On Fri, 4 Apr 2025 16:46:19 +0900
> Rakie Kim <rakie.kim@sk.com> wrote:
>
> > Memory leaks occurred when removing sysfs attributes for weighted
> > interleave. Improper kobject deallocation led to unreleased memory
> > when initialization failed or when nodes were removed.
> >
> > This patch resolves the issue by replacing unnecessary `kfree()`
> > calls with proper `kobject_del()` and `kobject_put()` sequences,
> > ensuring correct teardown and preventing memory leaks.
> >
> > By explicitly calling `kobject_del()` before `kobject_put()`,
> > the release function is now invoked safely, and internal sysfs
> > state is correctly cleaned up. This guarantees that the memory
> > associated with the kobject is fully released and avoids
> > resource leaks, thereby improving system stability.
> >
> > Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > Reviewed-by: Gregory Price <gourry@gourry.net>
> I've pretty much forgotten earlier discussions so apologies if I revisit
> old ground!
>
> The fix is fine I think. But the resulting code structure
> could be improved, without (I think) complicating what is here by much.
>
Thank you for your response regarding this patch.
I appreciate your willingness to revisit this code and share your
thoughts. Please feel free to provide suggestions anytime.
> > ---
> > mm/mempolicy.c | 64 +++++++++++++++++++++++++++-----------------------
> > 1 file changed, 34 insertions(+), 30 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index bbaadbeeb291..af3753925573 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3448,7 +3448,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
> >
> > for (i = 0; i < nr_node_ids; i++)
> > sysfs_wi_node_release(node_attrs[i], wi_kobj);
> > - kobject_put(wi_kobj);
> > +
> > + kfree(node_attrs);
> > + kfree(wi_kobj);
> > }
> >
> > static const struct kobj_type wi_ktype = {
> > @@ -3494,15 +3496,22 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > struct kobject *wi_kobj;
> > int nid, err;
> >
> > - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > - if (!wi_kobj)
> > + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > + GFP_KERNEL);
> > + if (!node_attrs)
> > return -ENOMEM;
> >
> > + wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > + if (!wi_kobj) {
> > + err = -ENOMEM;
> > + goto node_out;
> As this is only place where we do kfree(node_attrs)
> why not just do that here and return directly.
Regarding your suggestion:
Is the following change what you had in mind?
wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
if (!wi_kobj) {
kfree(node_attrs);
return -ENOMEM;
}
I will apply this change accordingly.
>
>
> > + }
> > +
> > err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> > "weighted_interleave");
> > if (err) {
> > - kfree(wi_kobj);
> > - return err;
> > + kobject_put(wi_kobj);
> > + goto err_out;
> > }
> >
> > for_each_node_state(nid, N_POSSIBLE) {
> > @@ -3512,9 +3521,18 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > break;
> > }
> > }
> > - if (err)
> > + if (err) {
> > + kobject_del(wi_kobj);
> > kobject_put(wi_kobj);
>
> For this and the one above, a unified exit kind of makes sense as
> can do two labels and have the put only once.
>
> If not, why not move this up into the loop and return directly?
> If you move to an error handling block
>
> err_del_obj:
> kobject_del(wi_kobj);
> err_put_obj:
> kobject_put(wi_kobj);
>
> then you could also do the goto from within that loop.
>
As for your second suggestion about restructuring the error handling,
you are right that having unified labels helps make the flow cleaner.
I will update the code to use:
err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
if (err)
goto err_put_kobj;
for_each_node_state(nid, N_POSSIBLE) {
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
goto err_del_kobj;
}
}
err_del_kobj:
kobject_del(wi_kobj);
err_put_kobj:
kobject_put(wi_kobj);
return err;
This structure keeps error handling more consistent and avoids redundancy.
>
> > + goto err_out;
> > + }
> > +
> > return 0;
> > +
> > +node_out:
> > + kfree(node_attrs);
> > +err_out:
> > + return err;
> > }
> >
> > static void mempolicy_kobj_release(struct kobject *kobj)
> > @@ -3528,7 +3546,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> > mutex_unlock(&iw_table_lock);
> > synchronize_rcu();
> > kfree(old);
> > - kfree(node_attrs);
> > kfree(kobj);
> > }
> >
> > @@ -3542,37 +3559,24 @@ static int __init mempolicy_sysfs_init(void)
> > static struct kobject *mempolicy_kobj;
> >
> > mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> > - if (!mempolicy_kobj) {
> > - err = -ENOMEM;
> > - goto err_out;
> > - }
> > -
> > - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> > - GFP_KERNEL);
> > - if (!node_attrs) {
> > - err = -ENOMEM;
> > - goto mempol_out;
> > - }
> > + if (!mempolicy_kobj)
> > + return -ENOMEM;
> >
> > err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> > "mempolicy");
> > if (err)
> > - goto node_out;
> > + goto err_out;
> goto err_put_kobj;
> or something like that.
>
> >
> > err = add_weighted_interleave_group(mempolicy_kobj);
> > - if (err) {
> > - pr_err("mempolicy sysfs structure failed to initialize\n");
> > - kobject_put(mempolicy_kobj);
> > - return err;
> > - }
> > + if (err)
> > + goto err_del;
> >
> > - return err;
> > -node_out:
> > - kfree(node_attrs);
> > -mempol_out:
> > - kfree(mempolicy_kobj);
> > + return 0;
> > +
> > +err_del:
> > + kobject_del(mempolicy_kobj);
> > err_out:
> > - pr_err("failed to add mempolicy kobject to the system\n");
> > + kobject_put(mempolicy_kobj);
> If we keep an err_out, usual expectation is all it does is return
> + maybe print a message. We'd not expect a put.
Lastly, I agree with your point about `err_out`.
I will revise it to use `err_del_kobj` and `err_put_kobj` as needed.
Thanks again for the detailed review.
Rakie
>
> > return err;
> > }
> >
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-04 17:23 ` Dan Williams
@ 2025-04-07 9:38 ` Rakie Kim
2025-04-07 9:49 ` Jonathan Cameron
1 sibling, 0 replies; 15+ messages in thread
From: Rakie Kim @ 2025-04-07 9:38 UTC (permalink / raw)
To: Dan Williams
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, osalvador, kernel_team, honggyu.kim,
yunjeong.mun, Jonathan Cameron, Rakie Kim
On Fri, 4 Apr 2025 10:23:10 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > On Fri, 4 Apr 2025 16:46:20 +0900
> > Rakie Kim <rakie.kim@sk.com> wrote:
> >
> > > Previously, the weighted interleave sysfs structure was statically
> > > managed during initialization. This prevented new nodes from being
> > > recognized when memory hotplug events occurred, limiting the ability
> > > to update or extend sysfs entries dynamically at runtime.
> > >
> > > To address this, this patch refactors the sysfs infrastructure and
> > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > both the kobject and an array of node attribute pointers.
> > >
> > > By allocating this group structure globally, the per-node sysfs
> > > attributes can be managed beyond initialization time, enabling
> > > external modules to insert or remove node entries in response to
> > > events such as memory hotplug or node online/offline transitions.
> > >
> > > Instead of allocating all per-node sysfs attributes at once, the
> > > initialization path now uses the existing sysfs_wi_node_add() and
> > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > to modularly manage per-node sysfs entries and ensures the
> > > infrastructure is ready for runtime extension.
> > >
> > > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > > Reviewed-by: Gregory Price <gourry@gourry.net>
> > Hi Rakie,
> >
> > Some things I was requesting in patch 1 are done here.
> > Mostly I think what is wanted is moving some of that
> > refactoring back to that patch rather than here.
> >
> > Some of the label and function naming needs another look.
> >
> > Jonathan
> [..]
> > > @@ -3430,27 +3437,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > return count;
> > > }
> > >
> > > -static struct iw_node_attr **node_attrs;
> > > -
> > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > - struct kobject *parent)
> > > +static void sysfs_wi_node_delete(int nid)
> >
> > Maybe stick to release naming to match the sysfs_wi_release()
> > below? I don't really care about this.
>
> I had asked for "delete" to pair with "add" and to not get confused with
> a final kobject_put() callback.
As Dan mentioned earlier, I also believe that "sysfs_wi_node_delete"
is a more appropriate name than "sysfs_wi_node_release" because it
pairs naturally with "sysfs_wi_node_add" and avoids confusion with
a final kobject_put() callback.
Rakie
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-04 20:45 ` David Hildenbrand
@ 2025-04-07 9:39 ` Rakie Kim
2025-04-07 10:47 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Rakie Kim @ 2025-04-07 9:39 UTC (permalink / raw)
To: David Hildenbrand
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, Rakie Kim, akpm
On Fri, 4 Apr 2025 22:45:00 +0200 David Hildenbrand <david@redhat.com> wrote:
> On 04.04.25 09:46, Rakie Kim wrote:
> > The weighted interleave policy distributes page allocations across multiple
> > NUMA nodes based on their performance weight, thereby improving memory
> > bandwidth utilization. The weight values for each node are configured
> > through sysfs.
> >
> > Previously, sysfs entries for configuring weighted interleave were created
> > for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> > might not have memory. However, not all nodes in N_POSSIBLE are usable at
> > runtime, as some may remain memoryless or offline.
> > This led to sysfs entries being created for unusable nodes, causing
> > potential misconfiguration issues.
> >
> > To address this issue, this patch modifies the sysfs creation logic to:
> > 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> > the creation of sysfs entries for nodes that cannot be used.
> > 2) Support memory hotplug by dynamically adding and removing sysfs entries
> > based on whether a node transitions into or out of the N_MEMORY state.
> >
> > Additionally, the patch ensures that sysfs attributes are properly managed
> > when nodes go offline, preventing stale or redundant entries from persisting
> > in the system.
> >
> > By making these changes, the weighted interleave policy now manages its
> > sysfs entries more efficiently, ensuring that only relevant nodes are
> > considered for interleaving, and dynamically adapting to memory hotplug
> > events.
> >
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > ---
> > mm/mempolicy.c | 109 ++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 86 insertions(+), 23 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 73a9405ff352..f25c2c7f8fcf 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -113,6 +113,7 @@
> > #include <asm/tlbflush.h>
> > #include <asm/tlb.h>
> > #include <linux/uaccess.h>
> > +#include <linux/memory.h>
> >
> > #include "internal.h"
> >
> > @@ -3390,6 +3391,7 @@ struct iw_node_attr {
> >
> > struct sysfs_wi_group {
> > struct kobject wi_kobj;
> > + struct mutex kobj_lock;
> > struct iw_node_attr *nattrs[];
> > };
> >
> > @@ -3439,13 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> >
> > static void sysfs_wi_node_delete(int nid)
> > {
> > - if (!wi_group->nattrs[nid])
> > + struct iw_node_attr *attr;
> > +
> > + if (nid < 0 || nid >= nr_node_ids)
> > + return;
> > +
> > + mutex_lock(&wi_group->kobj_lock);
> > + attr = wi_group->nattrs[nid];
> > + if (!attr) {
> > + mutex_unlock(&wi_group->kobj_lock);
> > return;
> > + }
> > +
> > + wi_group->nattrs[nid] = NULL;
> > + mutex_unlock(&wi_group->kobj_lock);
> >
> > - sysfs_remove_file(&wi_group->wi_kobj,
> > - &wi_group->nattrs[nid]->kobj_attr.attr);
> > - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> > - kfree(wi_group->nattrs[nid]);
> > + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> > + kfree(attr->kobj_attr.attr.name);
> > + kfree(attr);
> > }
> >
> > static void sysfs_wi_release(struct kobject *wi_kobj)
> > @@ -3464,35 +3477,80 @@ static const struct kobj_type wi_ktype = {
> >
> > static int sysfs_wi_node_add(int nid)
> > {
> > - struct iw_node_attr *node_attr;
> > + int ret = 0;
> > char *name;
> > + struct iw_node_attr *new_attr = NULL;
> >
> > - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> > - if (!node_attr)
> > + if (nid < 0 || nid >= nr_node_ids) {
> > + pr_err("Invalid node id: %d\n", nid);
> > + return -EINVAL;
> > + }
> > +
> > + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
> > + if (!new_attr)
> > return -ENOMEM;
> >
> > name = kasprintf(GFP_KERNEL, "node%d", nid);
> > if (!name) {
> > - kfree(node_attr);
> > + kfree(new_attr);
> > return -ENOMEM;
> > }
> >
> > - sysfs_attr_init(&node_attr->kobj_attr.attr);
> > - node_attr->kobj_attr.attr.name = name;
> > - node_attr->kobj_attr.attr.mode = 0644;
> > - node_attr->kobj_attr.show = node_show;
> > - node_attr->kobj_attr.store = node_store;
> > - node_attr->nid = nid;
> > + mutex_lock(&wi_group->kobj_lock);
> > + if (wi_group->nattrs[nid]) {
> > + mutex_unlock(&wi_group->kobj_lock);
> > + pr_info("Node [%d] already exists\n", nid);
> > + kfree(new_attr);
> > + kfree(name);
> > + return 0;
> > + }
> > + wi_group->nattrs[nid] = new_attr;
> >
> > - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> > - kfree(node_attr->kobj_attr.attr.name);
> > - kfree(node_attr);
> > - pr_err("failed to add attribute to weighted_interleave\n");
> > - return -ENOMEM;
> > + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
> > + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> > + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > + wi_group->nattrs[nid]->kobj_attr.show = node_show;
> > + wi_group->nattrs[nid]->kobj_attr.store = node_store;
> > + wi_group->nattrs[nid]->nid = nid;
> > +
> > + ret = sysfs_create_file(&wi_group->wi_kobj,
> > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > + if (ret) {
> > + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> > + kfree(wi_group->nattrs[nid]);
> > + wi_group->nattrs[nid] = NULL;
> > + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
> > }
> > + mutex_unlock(&wi_group->kobj_lock);
> >
> > - wi_group->nattrs[nid] = node_attr;
> > - return 0;
> > + return ret;
> > +}
> > +
> > +static int wi_node_notifier(struct notifier_block *nb,
> > + unsigned long action, void *data)
> > +{
> > + int err;
> > + struct memory_notify *arg = data;
> > + int nid = arg->status_change_nid;
> > +
> > + if (nid < 0)
> > + goto notifier_end;
> > +
> > + switch(action) {
> > + case MEM_ONLINE:
>
> MEM_ONLINE is too late, we cannot fail hotplug at that point.
>
> Would MEM_GOING_ONLINE / MEM_CANCEL_ONLINE be better?
Hi David,
Thank you for raising these points. I would appreciate your clarification
on the following:
Issue1: I want to invoke sysfs_wi_node_add() after a node with memory
has been fully transitioned to the online state. Does replacing
MEM_ONLINE with MEM_GOING_ONLINE or MEM_CANCEL_ONLINE still ensure
that the node is considered online and usable by that point?
>
>
> > + err = sysfs_wi_node_add(nid);
> > + if (err) {
> > + pr_err("failed to add sysfs [node%d]\n", nid);
> > + return NOTIFY_BAD;
>
> Note that NOTIFY_BAD includes NOTIFY_STOP_MASK. So you wouldn't call
> other notifiers, but the overall memory onlining would succeed, which is
> bad.
>
> If we don't care about the error (not prevent hotplug) we could only
> pr_warn() and continue. Maybe this (unlikely) case is not a good reason
> to stop memory from getting onlined. OTOH, it will barely ever trigger
> in practice ...
>
Issue2: Regarding your note about NOTIFY_BAD ? are you saying that
if sysfs_wi_node_add() returns NOTIFY_BAD, it will trigger
NOTIFY_STOP_MASK, preventing other notifiers from running, while
still allowing the memory hotplug operation to complete?
If so, then I'm thinking of resolving both issues as follows:
- For Issue1: I keep using MEM_ONLINE, assuming it is safe and
sufficient to ensure the node is fully online.
- For Issue2: I avoid returning NOTIFY_BAD from the notifier.
Instead, I log the error using pr_err() and continue the operation.
This would result in the following code:
if (nid < 0)
return NOTIFY_OK;
switch (action) {
case MEM_ONLINE: // Issue1: keeping this unchanged
err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
// Issue2: Do not return NOTIFY_BAD
}
break;
case MEM_OFFLINE:
sysfs_wi_node_delete(nid);
break;
}
// Always return NOTIFY_OK
return NOTIFY_OK;
Please let me know if this approach is acceptable.
Rakie
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-04 17:23 ` Dan Williams
2025-04-07 9:38 ` Rakie Kim
@ 2025-04-07 9:49 ` Jonathan Cameron
1 sibling, 0 replies; 15+ messages in thread
From: Jonathan Cameron @ 2025-04-07 9:49 UTC (permalink / raw)
To: Dan Williams
Cc: Rakie Kim, akpm, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
On Fri, 4 Apr 2025 10:23:10 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > On Fri, 4 Apr 2025 16:46:20 +0900
> > Rakie Kim <rakie.kim@sk.com> wrote:
> >
> > > Previously, the weighted interleave sysfs structure was statically
> > > managed during initialization. This prevented new nodes from being
> > > recognized when memory hotplug events occurred, limiting the ability
> > > to update or extend sysfs entries dynamically at runtime.
> > >
> > > To address this, this patch refactors the sysfs infrastructure and
> > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > both the kobject and an array of node attribute pointers.
> > >
> > > By allocating this group structure globally, the per-node sysfs
> > > attributes can be managed beyond initialization time, enabling
> > > external modules to insert or remove node entries in response to
> > > events such as memory hotplug or node online/offline transitions.
> > >
> > > Instead of allocating all per-node sysfs attributes at once, the
> > > initialization path now uses the existing sysfs_wi_node_add() and
> > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > to modularly manage per-node sysfs entries and ensures the
> > > infrastructure is ready for runtime extension.
> > >
> > > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > > Reviewed-by: Gregory Price <gourry@gourry.net>
> > Hi Rakie,
> >
> > Some things I was requesting in patch 1 are done here.
> > Mostly I think what is wanted is moving some of that
> > refactoring back to that patch rather than here.
> >
> > Some of the label and function naming needs another look.
> >
> > Jonathan
> [..]
> > > @@ -3430,27 +3437,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > return count;
> > > }
> > >
> > > -static struct iw_node_attr **node_attrs;
> > > -
> > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > - struct kobject *parent)
> > > +static void sysfs_wi_node_delete(int nid)
> >
> > Maybe stick to release naming to match the sysfs_wi_release()
> > below? I don't really care about this.
>
> I had asked for "delete" to pair with "add" and to not get confused with
> a final kobject_put() callback.
>
Fair enough.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-07 9:39 ` Rakie Kim
@ 2025-04-07 10:47 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2025-04-07 10:47 UTC (permalink / raw)
To: Rakie Kim
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, akpm
On 07.04.25 11:39, Rakie Kim wrote:
> On Fri, 4 Apr 2025 22:45:00 +0200 David Hildenbrand <david@redhat.com> wrote:
>> On 04.04.25 09:46, Rakie Kim wrote:
>>> The weighted interleave policy distributes page allocations across multiple
>>> NUMA nodes based on their performance weight, thereby improving memory
>>> bandwidth utilization. The weight values for each node are configured
>>> through sysfs.
>>>
>>> Previously, sysfs entries for configuring weighted interleave were created
>>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
>>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
>>> runtime, as some may remain memoryless or offline.
>>> This led to sysfs entries being created for unusable nodes, causing
>>> potential misconfiguration issues.
>>>
>>> To address this issue, this patch modifies the sysfs creation logic to:
>>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>>> the creation of sysfs entries for nodes that cannot be used.
>>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>>> based on whether a node transitions into or out of the N_MEMORY state.
>>>
>>> Additionally, the patch ensures that sysfs attributes are properly managed
>>> when nodes go offline, preventing stale or redundant entries from persisting
>>> in the system.
>>>
>>> By making these changes, the weighted interleave policy now manages its
>>> sysfs entries more efficiently, ensuring that only relevant nodes are
>>> considered for interleaving, and dynamically adapting to memory hotplug
>>> events.
>>>
>>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>>> ---
>>> mm/mempolicy.c | 109 ++++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 86 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>>> index 73a9405ff352..f25c2c7f8fcf 100644
>>> --- a/mm/mempolicy.c
>>> +++ b/mm/mempolicy.c
>>> @@ -113,6 +113,7 @@
>>> #include <asm/tlbflush.h>
>>> #include <asm/tlb.h>
>>> #include <linux/uaccess.h>
>>> +#include <linux/memory.h>
>>>
>>> #include "internal.h"
>>>
>>> @@ -3390,6 +3391,7 @@ struct iw_node_attr {
>>>
>>> struct sysfs_wi_group {
>>> struct kobject wi_kobj;
>>> + struct mutex kobj_lock;
>>> struct iw_node_attr *nattrs[];
>>> };
>>>
>>> @@ -3439,13 +3441,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>>>
>>> static void sysfs_wi_node_delete(int nid)
>>> {
>>> - if (!wi_group->nattrs[nid])
>>> + struct iw_node_attr *attr;
>>> +
>>> + if (nid < 0 || nid >= nr_node_ids)
>>> + return;
>>> +
>>> + mutex_lock(&wi_group->kobj_lock);
>>> + attr = wi_group->nattrs[nid];
>>> + if (!attr) {
>>> + mutex_unlock(&wi_group->kobj_lock);
>>> return;
>>> + }
>>> +
>>> + wi_group->nattrs[nid] = NULL;
>>> + mutex_unlock(&wi_group->kobj_lock);
>>>
>>> - sysfs_remove_file(&wi_group->wi_kobj,
>>> - &wi_group->nattrs[nid]->kobj_attr.attr);
>>> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>>> - kfree(wi_group->nattrs[nid]);
>>> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
>>> + kfree(attr->kobj_attr.attr.name);
>>> + kfree(attr);
>>> }
>>>
>>> static void sysfs_wi_release(struct kobject *wi_kobj)
>>> @@ -3464,35 +3477,80 @@ static const struct kobj_type wi_ktype = {
>>>
>>> static int sysfs_wi_node_add(int nid)
>>> {
>>> - struct iw_node_attr *node_attr;
>>> + int ret = 0;
>>> char *name;
>>> + struct iw_node_attr *new_attr = NULL;
>>>
>>> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
>>> - if (!node_attr)
>>> + if (nid < 0 || nid >= nr_node_ids) {
>>> + pr_err("Invalid node id: %d\n", nid);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
>>> + if (!new_attr)
>>> return -ENOMEM;
>>>
>>> name = kasprintf(GFP_KERNEL, "node%d", nid);
>>> if (!name) {
>>> - kfree(node_attr);
>>> + kfree(new_attr);
>>> return -ENOMEM;
>>> }
>>>
>>> - sysfs_attr_init(&node_attr->kobj_attr.attr);
>>> - node_attr->kobj_attr.attr.name = name;
>>> - node_attr->kobj_attr.attr.mode = 0644;
>>> - node_attr->kobj_attr.show = node_show;
>>> - node_attr->kobj_attr.store = node_store;
>>> - node_attr->nid = nid;
>>> + mutex_lock(&wi_group->kobj_lock);
>>> + if (wi_group->nattrs[nid]) {
>>> + mutex_unlock(&wi_group->kobj_lock);
>>> + pr_info("Node [%d] already exists\n", nid);
>>> + kfree(new_attr);
>>> + kfree(name);
>>> + return 0;
>>> + }
>>> + wi_group->nattrs[nid] = new_attr;
>>>
>>> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
>>> - kfree(node_attr->kobj_attr.attr.name);
>>> - kfree(node_attr);
>>> - pr_err("failed to add attribute to weighted_interleave\n");
>>> - return -ENOMEM;
>>> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
>>> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
>>> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
>>> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
>>> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
>>> + wi_group->nattrs[nid]->nid = nid;
>>> +
>>> + ret = sysfs_create_file(&wi_group->wi_kobj,
>>> + &wi_group->nattrs[nid]->kobj_attr.attr);
>>> + if (ret) {
>>> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>>> + kfree(wi_group->nattrs[nid]);
>>> + wi_group->nattrs[nid] = NULL;
>>> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
>>> }
>>> + mutex_unlock(&wi_group->kobj_lock);
>>>
>>> - wi_group->nattrs[nid] = node_attr;
>>> - return 0;
>>> + return ret;
>>> +}
>>> +
>>> +static int wi_node_notifier(struct notifier_block *nb,
>>> + unsigned long action, void *data)
>>> +{
>>> + int err;
>>> + struct memory_notify *arg = data;
>>> + int nid = arg->status_change_nid;
>>> +
>>> + if (nid < 0)
>>> + goto notifier_end;
>>> +
>>> + switch(action) {
>>> + case MEM_ONLINE:
>>
>> MEM_ONLINE is too late, we cannot fail hotplug at that point.
>>
>> Would MEM_GOING_ONLINE / MEM_CANCEL_ONLINE be better?
>
> Hi David,
Hi,
>
> Thank you for raising these points. I would appreciate your clarification
> on the following:
>
> Issue1: I want to invoke sysfs_wi_node_add() after a node with memory
> has been fully transitioned to the online state. Does replacing
> MEM_ONLINE with MEM_GOING_ONLINE or MEM_CANCEL_ONLINE still ensure
> that the node is considered online and usable by that point?
After MEM_GOING_ONLINE nothing can go wrong except that some other
notifier fails MEM_GOING_ONLINE.
That happens rarely -- only if some other allocation, like for kasan, fails.
In MEM_CANCEL_ONLINE you have to undo what you did (remove the node again).
>
>>
>>
>>> + err = sysfs_wi_node_add(nid);
>>> + if (err) {
>>> + pr_err("failed to add sysfs [node%d]\n", nid);
>>> + return NOTIFY_BAD;
>>
>> Note that NOTIFY_BAD includes NOTIFY_STOP_MASK. So you wouldn't call
>> other notifiers, but the overall memory onlining would succeed, which is
>> bad.
>>
>> If we don't care about the error (not prevent hotplug) we could only
>> pr_warn() and continue. Maybe this (unlikely) case is not a good reason
>> to stop memory from getting onlined. OTOH, it will barely ever trigger
>> in practice ...
>>
>
> Issue2: Regarding your note about NOTIFY_BAD ? are you saying that
> if sysfs_wi_node_add() returns NOTIFY_BAD, it will trigger
> NOTIFY_STOP_MASK, preventing other notifiers from running, while
> still allowing the memory hotplug operation to complete?
Yes.
>
> If so, then I'm thinking of resolving both issues as follows:
> - For Issue1: I keep using MEM_ONLINE, assuming it is safe and
> sufficient to ensure the node is fully online.
> - For Issue2: I avoid returning NOTIFY_BAD from the notifier.
> Instead, I log the error using pr_err() and continue the operation.
>
> This would result in the following code:
>
> if (nid < 0)
> return NOTIFY_OK;
>
> switch (action) {
> case MEM_ONLINE: // Issue1: keeping this unchanged
> err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> // Issue2: Do not return NOTIFY_BAD
> }
> break;
> case MEM_OFFLINE:
> sysfs_wi_node_delete(nid);
> break;
> }
>
> // Always return NOTIFY_OK
> return NOTIFY_OK;
>
> Please let me know if this approach is acceptable.
That would work. The alternative is failing during MEM_GOING_ONLINE if
the allocation failed, and deleting the node during MEM_CANCEL_ONLINE.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-04-07 10:47 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-04 7:46 [PATCH v6 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-04 7:46 ` [PATCH v6 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-04-04 12:59 ` Jonathan Cameron
2025-04-07 9:37 ` Rakie Kim
2025-04-04 7:46 ` [PATCH v6 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
2025-04-04 13:05 ` Jonathan Cameron
2025-04-04 17:23 ` Dan Williams
2025-04-07 9:38 ` Rakie Kim
2025-04-07 9:49 ` Jonathan Cameron
2025-04-04 7:46 ` [PATCH v6 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-04-04 8:43 ` Oscar Salvador
2025-04-07 9:37 ` Rakie Kim
2025-04-04 20:45 ` David Hildenbrand
2025-04-07 9:39 ` Rakie Kim
2025-04-07 10:47 ` David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).