* [PATCH v7 0/3] Enhance sysfs handling for memory hotplug in weighted interleave
@ 2025-04-08 7:32 Rakie Kim
2025-04-08 7:32 ` [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-08 7:32 UTC
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory leaks,
and introducing dynamic sysfs updates for memory hotplug support.
### Background
The weighted interleave policy distributes memory allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values are configured through sysfs.
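For readers new to the policy, the distribution described above can be sketched in userspace C. This is only an illustration of weight-proportional round-robin selection, not the code in mm/mempolicy.c; `struct wi_state`, `wi_init()` and `wi_next_node()` are hypothetical names, and every weight is assumed to be at least 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the mm/mempolicy.c implementation. */
struct wi_state {
	const unsigned char *weights;	/* per-node weight, each assumed >= 1 */
	size_t nr_nodes;
	size_t cur_node;		/* node currently being allocated from */
	unsigned int cur_left;		/* allocations left on cur_node */
};

static void wi_init(struct wi_state *s, const unsigned char *weights,
		    size_t nr_nodes)
{
	size_t i;

	assert(nr_nodes > 0);
	for (i = 0; i < nr_nodes; i++)
		assert(weights[i] > 0);	/* assumption of this sketch */

	s->weights = weights;
	s->nr_nodes = nr_nodes;
	s->cur_node = 0;
	s->cur_left = weights[0];
}

/* Return the node that should serve the next allocation. */
static size_t wi_next_node(struct wi_state *s)
{
	size_t nid;

	/* Current node has used up its share: move to the next one. */
	if (s->cur_left == 0) {
		s->cur_node = (s->cur_node + 1) % s->nr_nodes;
		s->cur_left = s->weights[s->cur_node];
	}

	nid = s->cur_node;
	s->cur_left--;
	return nid;
}
```

With weights {3, 1}, eight consecutive calls to `wi_next_node()` place six allocations on node 0 and two on node 1, so a node with a higher weight receives a proportionally larger share.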
Previously, sysfs entries for weighted interleave were managed statically
at initialization. This led to several issues:
- Memory Leaks: Improper `kobject` teardown caused memory leaks
when initialization failed or when nodes were removed.
- Lack of Dynamic Updates: Sysfs attributes were created only during
initialization, preventing nodes added at runtime from being recognized.
- Handling of Unusable Nodes: Sysfs entries were generated for all
possible nodes (`N_POSSIBLE`), including memoryless or unavailable nodes,
which exposed weights for nodes that cannot be used and invited
misconfiguration.
### Patch Overview
1. [PATCH 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
- Ensures proper cleanup of `kobject` allocations.
- Adds `kobject_del()` before `kobject_put()` to clean up sysfs state correctly.
- Prevents memory/resource leaks and improves teardown behavior.
2. [PATCH 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
- Refactors static sysfs layout into a new `sysfs_wi_group` structure.
- Keeps per-node sysfs attributes in a file-scope `wi_group`, so they can be added or removed after initialization.
- Lays groundwork for future hotplug support by enabling runtime modification.
3. [PATCH 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
- Dynamically adds/removes sysfs entries when nodes are online/offline.
- Limits sysfs creation to nodes with memory, avoiding unusable node entries.
- Hooks into memory hotplug notifier for runtime updates.
These patches have been tested under CXL-based memory configurations,
including hotplug scenarios, to ensure proper behavior and stability.
mm/mempolicy.c | 193 +++++++++++++++++++++++++++++++------------------
1 file changed, 124 insertions(+), 69 deletions(-)
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.34.1
* [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-08 7:32 [PATCH v7 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
@ 2025-04-08 7:32 ` Rakie Kim
2025-04-08 13:45 ` Joshua Hahn
2025-04-15 15:41 ` Jonathan Cameron
2025-04-08 7:32 ` [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2 siblings, 2 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-08 7:32 UTC
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Memory leaks occurred when removing sysfs attributes for weighted
interleave. Improper kobject deallocation led to unreleased memory
when initialization failed or when nodes were removed.
This patch resolves the issue by replacing unnecessary `kfree()`
calls with proper `kobject_del()` and `kobject_put()` sequences,
ensuring correct teardown and preventing memory leaks.
By explicitly calling `kobject_del()` before `kobject_put()`,
the release function is now invoked safely, and internal sysfs
state is correctly cleaned up. This guarantees that the memory
associated with the kobject is fully released and avoids
resource leaks, thereby improving system stability.
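The teardown ordering described above can be sketched in userspace C. This is a simplified model, not the kobject API: `struct obj`, `obj_del()`, `obj_put()`, `setup()` and the `fail_at` parameter are hypothetical stand-ins. It only shows why, once an object is reference counted, error paths drop the reference and let the release callback free the memory instead of calling free() directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted kernel object; not struct kobject. */
struct obj {
	int refcount;
	int registered;			/* models "visible in sysfs" */
	void (*release)(struct obj *o);
};

static int release_calls;		/* counts release invocations, for illustration */

static void obj_release(struct obj *o)
{
	release_calls++;
	free(o);
}

static void obj_init(struct obj *o)
{
	o->refcount = 1;
	o->registered = 0;
	o->release = obj_release;
}

static void obj_del(struct obj *o)	/* plays the role of kobject_del() */
{
	o->registered = 0;
}

static void obj_put(struct obj *o)	/* plays the role of kobject_put() */
{
	if (--o->refcount == 0)
		o->release(o);
}

/* fail_at: 0 = succeed, 1 = fail at registration, 2 = fail adding children */
static int setup(struct obj **out, int fail_at)
{
	struct obj *o = malloc(sizeof(*o));
	int err;

	if (!o)
		return -1;
	obj_init(o);

	if (fail_at == 1) {
		err = -2;
		goto err_put;		/* never registered: only drop the reference */
	}
	o->registered = 1;

	if (fail_at == 2) {
		err = -3;
		goto err_del;		/* registered: unregister first, then drop */
	}

	*out = o;
	return 0;

err_del:
	obj_del(o);
err_put:
	obj_put(o);			/* last reference: release() frees the object */
	return err;
}
```

In both failure cases the release callback runs exactly once and nothing is freed by hand, which mirrors the `err_del_kobj` / `err_put_kobj` labels in the diff below.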
Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
mm/mempolicy.c | 66 ++++++++++++++++++++++++--------------------------
1 file changed, 32 insertions(+), 34 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..0da102aa1cfc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3479,7 +3479,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
for (i = 0; i < nr_node_ids; i++)
sysfs_wi_node_release(node_attrs[i], wi_kobj);
- kobject_put(wi_kobj);
+
+ kfree(node_attrs);
+ kfree(wi_kobj);
}
static const struct kobj_type wi_ktype = {
@@ -3525,27 +3527,37 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
struct kobject *wi_kobj;
int nid, err;
+ node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!node_attrs)
+ return -ENOMEM;
+
wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj)
+ if (!wi_kobj) {
+ kfree(node_attrs);
return -ENOMEM;
+ }
err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
- if (err) {
- kfree(wi_kobj);
- return err;
- }
+ if (err)
+ goto err_put_kobj;
for_each_node_state(nid, N_POSSIBLE) {
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- break;
+ goto err_del_kobj;
}
}
- if (err)
- kobject_put(wi_kobj);
+
return 0;
+
+err_del_kobj:
+ kobject_del(wi_kobj);
+err_put_kobj:
+ kobject_put(wi_kobj);
+ return err;
}
static void mempolicy_kobj_release(struct kobject *kobj)
@@ -3559,7 +3571,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
mutex_unlock(&iw_table_lock);
synchronize_rcu();
kfree(old);
- kfree(node_attrs);
kfree(kobj);
}
@@ -3573,37 +3584,24 @@ static int __init mempolicy_sysfs_init(void)
static struct kobject *mempolicy_kobj;
mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
- if (!mempolicy_kobj) {
- err = -ENOMEM;
- goto err_out;
- }
-
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
- err = -ENOMEM;
- goto mempol_out;
- }
+ if (!mempolicy_kobj)
+ return -ENOMEM;
err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
"mempolicy");
if (err)
- goto node_out;
+ goto err_put_kobj;
err = add_weighted_interleave_group(mempolicy_kobj);
- if (err) {
- pr_err("mempolicy sysfs structure failed to initialize\n");
- kobject_put(mempolicy_kobj);
- return err;
- }
+ if (err)
+ goto err_del_kobj;
- return err;
-node_out:
- kfree(node_attrs);
-mempol_out:
- kfree(mempolicy_kobj);
-err_out:
- pr_err("failed to add mempolicy kobject to the system\n");
+ return 0;
+
+err_del_kobj:
+ kobject_del(mempolicy_kobj);
+err_put_kobj:
+ kobject_put(mempolicy_kobj);
return err;
}
--
2.34.1
* [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-08 7:32 [PATCH v7 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-08 7:32 ` [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-08 7:32 ` Rakie Kim
2025-04-08 13:49 ` Joshua Hahn
2025-04-09 3:43 ` Dan Williams
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2 siblings, 2 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-08 7:32 UTC
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Previously, the weighted interleave sysfs structure was statically
managed during initialization. This prevented new nodes from being
recognized when memory hotplug events occurred, limiting the ability
to update or extend sysfs entries dynamically at runtime.
To address this, this patch refactors the sysfs infrastructure and
encapsulates it within a new structure, `sysfs_wi_group`, which holds
both the kobject and an array of node attribute pointers.
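As a rough userspace illustration of this layout, the sketch below allocates a header and a per-node pointer array in one block using a flexible array member. `struct group`, `struct node_attr`, `group_alloc()` and `group_free()` are hypothetical names, and the explicit overflow check is only similar in spirit to what struct_size() provides in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct node_attr {			/* hypothetical stand-in for struct iw_node_attr */
	int nid;
};

struct group {				/* hypothetical stand-in for struct sysfs_wi_group */
	size_t nr_slots;
	struct node_attr *slots[];	/* flexible array member: one pointer per node id */
};

static struct group *group_alloc(size_t nr_slots)
{
	struct group *g;

	/* Refuse sizes whose byte count would overflow size_t. */
	if (nr_slots > (SIZE_MAX - sizeof(*g)) / sizeof(g->slots[0]))
		return NULL;

	/* Zeroed memory: every slot starts out empty (NULL on mainstream ABIs). */
	g = calloc(1, sizeof(*g) + nr_slots * sizeof(g->slots[0]));
	if (g)
		g->nr_slots = nr_slots;
	return g;
}

static void group_free(struct group *g)
{
	size_t i;

	if (!g)
		return;
	for (i = 0; i < g->nr_slots; i++)
		free(g->slots[i]);	/* free(NULL) is a no-op for empty slots */
	free(g);
}
```

Because the header and the array share one allocation, a single free of the group releases both, which is what lets the release callback in this patch end with one kfree(wi_group).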
By allocating this group structure globally, the per-node sysfs
attributes can be managed beyond initialization time, enabling
external modules to insert or remove node entries in response to
events such as memory hotplug or node online/offline transitions.
Instead of allocating all per-node sysfs attributes at once, the
initialization path now uses the sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers, which this patch renames from
add_weight_node() and sysfs_wi_node_release(). This refactoring makes
it possible to manage per-node sysfs entries individually and ensures
the infrastructure is ready for runtime extension.
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
---
mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
1 file changed, 29 insertions(+), 32 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0da102aa1cfc..988575f29c53 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3419,6 +3419,13 @@ struct iw_node_attr {
int nid;
};
+struct sysfs_wi_group {
+ struct kobject wi_kobj;
+ struct iw_node_attr *nattrs[];
+};
+
+static struct sysfs_wi_group *wi_group;
+
static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
@@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
return count;
}
-static struct iw_node_attr **node_attrs;
-
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
- struct kobject *parent)
+static void sysfs_wi_node_delete(int nid)
{
- if (!node_attr)
+ if (!wi_group->nattrs[nid])
return;
- sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
+
+ sysfs_remove_file(&wi_group->wi_kobj,
+ &wi_group->nattrs[nid]->kobj_attr.attr);
+ kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
+ kfree(wi_group->nattrs[nid]);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
{
- int i;
-
- for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
+ int nid;
- kfree(node_attrs);
- kfree(wi_kobj);
+ for (nid = 0; nid < nr_node_ids; nid++)
+ sysfs_wi_node_delete(nid);
+ kfree(wi_group);
}
static const struct kobj_type wi_ktype = {
@@ -3489,7 +3493,7 @@ static const struct kobj_type wi_ktype = {
.release = sysfs_wi_release,
};
-static int add_weight_node(int nid, struct kobject *wi_kobj)
+static int sysfs_wi_node_add(int nid)
{
struct iw_node_attr *node_attr;
char *name;
@@ -3511,40 +3515,33 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
node_attr->kobj_attr.store = node_store;
node_attr->nid = nid;
- if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
+ if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
kfree(node_attr->kobj_attr.attr.name);
kfree(node_attr);
pr_err("failed to add attribute to weighted_interleave\n");
return -ENOMEM;
}
- node_attrs[nid] = node_attr;
+ wi_group->nattrs[nid] = node_attr;
return 0;
}
-static int add_weighted_interleave_group(struct kobject *root_kobj)
+static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
{
- struct kobject *wi_kobj;
int nid, err;
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs)
+ wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
+ GFP_KERNEL);
+ if (!wi_group)
return -ENOMEM;
- wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj) {
- kfree(node_attrs);
- return -ENOMEM;
- }
-
- err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
+ err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
"weighted_interleave");
if (err)
goto err_put_kobj;
for_each_node_state(nid, N_POSSIBLE) {
- err = add_weight_node(nid, wi_kobj);
+ err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
goto err_del_kobj;
@@ -3554,9 +3551,9 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
return 0;
err_del_kobj:
- kobject_del(wi_kobj);
+ kobject_del(&wi_group->wi_kobj);
err_put_kobj:
- kobject_put(wi_kobj);
+ kobject_put(&wi_group->wi_kobj);
return err;
}
--
2.34.1
* [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-08 7:32 [PATCH v7 0/3] Enhance sysfs handling for memory hotplug in weighted interleave Rakie Kim
2025-04-08 7:32 ` [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-04-08 7:32 ` [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-08 7:32 ` Rakie Kim
2025-04-08 13:52 ` Joshua Hahn
` (3 more replies)
2 siblings, 4 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-08 7:32 UTC
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
The weighted interleave policy distributes page allocations across multiple
NUMA nodes based on their performance weight, thereby improving memory
bandwidth utilization. The weight values for each node are configured
through sysfs.
Previously, sysfs entries for configuring weighted interleave were created
for all possible nodes (N_POSSIBLE) at initialization, including nodes that
might not have memory. However, not all nodes in N_POSSIBLE are usable at
runtime, as some may remain memoryless or offline.
This led to sysfs entries being created for unusable nodes, causing
potential misconfiguration issues.
To address this issue, this patch modifies the sysfs creation logic to:
1) Limit sysfs entries to nodes that are online and have memory, avoiding
the creation of sysfs entries for nodes that cannot be used.
2) Support memory hotplug by dynamically adding and removing sysfs entries
based on whether a node transitions into or out of the N_MEMORY state.
Additionally, the patch ensures that sysfs attributes are properly managed
when nodes go offline, preventing stale or redundant entries from persisting
in the system.
By making these changes, the weighted interleave policy now manages its
sysfs entries more efficiently, ensuring that only relevant nodes are
considered for interleaving, and dynamically adapting to memory hotplug
events.
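The add/remove behaviour described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the notifier in the diff below: `enum mem_event`, `entry_present[]`, `entry_add()`, `entry_delete()`, `on_mem_event()` and the `MAX_NODES` bound are all hypothetical, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8			/* hypothetical fixed bound for the sketch */

/* Stand-ins for the MEM_ONLINE / MEM_OFFLINE actions; not the kernel values. */
enum mem_event { EV_ONLINE, EV_OFFLINE, EV_OTHER };

static bool entry_present[MAX_NODES];	/* models "the nodeN file exists" */

static int entry_add(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return -1;
	entry_present[nid] = true;	/* adding an existing entry is a no-op */
	return 0;
}

static void entry_delete(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return;
	entry_present[nid] = false;
}

/*
 * A negative node id is ignored, mirroring the `nid < 0` check in the
 * patch: such events do not change which nodes have memory.
 */
static void on_mem_event(enum mem_event ev, int status_change_nid)
{
	if (status_change_nid < 0)
		return;

	switch (ev) {
	case EV_ONLINE:
		entry_add(status_change_nid);
		break;
	case EV_OFFLINE:
		entry_delete(status_change_nid);
		break;
	default:
		break;			/* other events leave the entries alone */
	}
}
```

The point of the sketch is that the set of visible entries is driven by events that carry a valid node id, rather than being fixed once at initialization.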
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
---
mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 83 insertions(+), 23 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 988575f29c53..9aa884107f4c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -113,6 +113,7 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
#include <linux/uaccess.h>
+#include <linux/memory.h>
#include "internal.h"
@@ -3421,6 +3422,7 @@ struct iw_node_attr {
struct sysfs_wi_group {
struct kobject wi_kobj;
+ struct mutex kobj_lock;
struct iw_node_attr *nattrs[];
};
@@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
static void sysfs_wi_node_delete(int nid)
{
- if (!wi_group->nattrs[nid])
+ struct iw_node_attr *attr;
+
+ if (nid < 0 || nid >= nr_node_ids)
+ return;
+
+ mutex_lock(&wi_group->kobj_lock);
+ attr = wi_group->nattrs[nid];
+ if (!attr) {
+ mutex_unlock(&wi_group->kobj_lock);
return;
+ }
+
+ wi_group->nattrs[nid] = NULL;
+ mutex_unlock(&wi_group->kobj_lock);
- sysfs_remove_file(&wi_group->wi_kobj,
- &wi_group->nattrs[nid]->kobj_attr.attr);
- kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
- kfree(wi_group->nattrs[nid]);
+ sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
+ kfree(attr->kobj_attr.attr.name);
+ kfree(attr);
}
static void sysfs_wi_release(struct kobject *wi_kobj)
@@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
static int sysfs_wi_node_add(int nid)
{
- struct iw_node_attr *node_attr;
+ int ret = 0;
char *name;
+ struct iw_node_attr *new_attr = NULL;
- node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
- if (!node_attr)
+ if (nid < 0 || nid >= nr_node_ids) {
+ pr_err("Invalid node id: %d\n", nid);
+ return -EINVAL;
+ }
+
+ new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+ if (!new_attr)
return -ENOMEM;
name = kasprintf(GFP_KERNEL, "node%d", nid);
if (!name) {
- kfree(node_attr);
+ kfree(new_attr);
return -ENOMEM;
}
- sysfs_attr_init(&node_attr->kobj_attr.attr);
- node_attr->kobj_attr.attr.name = name;
- node_attr->kobj_attr.attr.mode = 0644;
- node_attr->kobj_attr.show = node_show;
- node_attr->kobj_attr.store = node_store;
- node_attr->nid = nid;
+ mutex_lock(&wi_group->kobj_lock);
+ if (wi_group->nattrs[nid]) {
+ mutex_unlock(&wi_group->kobj_lock);
+ pr_info("Node [%d] already exists\n", nid);
+ kfree(new_attr);
+ kfree(name);
+ return 0;
+ }
+ wi_group->nattrs[nid] = new_attr;
- if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
- kfree(node_attr->kobj_attr.attr.name);
- kfree(node_attr);
- pr_err("failed to add attribute to weighted_interleave\n");
- return -ENOMEM;
+ sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
+ wi_group->nattrs[nid]->kobj_attr.attr.name = name;
+ wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
+ wi_group->nattrs[nid]->kobj_attr.show = node_show;
+ wi_group->nattrs[nid]->kobj_attr.store = node_store;
+ wi_group->nattrs[nid]->nid = nid;
+
+ ret = sysfs_create_file(&wi_group->wi_kobj,
+ &wi_group->nattrs[nid]->kobj_attr.attr);
+ if (ret) {
+ kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
+ kfree(wi_group->nattrs[nid]);
+ wi_group->nattrs[nid] = NULL;
+ pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
}
+ mutex_unlock(&wi_group->kobj_lock);
- wi_group->nattrs[nid] = node_attr;
- return 0;
+ return ret;
+}
+
+static int wi_node_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ int err;
+ struct memory_notify *arg = data;
+ int nid = arg->status_change_nid;
+
+ if (nid < 0)
+ return NOTIFY_OK;
+
+ switch(action) {
+ case MEM_ONLINE:
+ err = sysfs_wi_node_add(nid);
+ if (err)
+ pr_err("failed to add sysfs [node%d]\n", nid);
+ break;
+ case MEM_OFFLINE:
+ sysfs_wi_node_delete(nid);
+ break;
+ }
+
+ return NOTIFY_OK;
}
static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
@@ -3534,13 +3589,17 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
GFP_KERNEL);
if (!wi_group)
return -ENOMEM;
+ mutex_init(&wi_group->kobj_lock);
err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
"weighted_interleave");
if (err)
goto err_put_kobj;
- for_each_node_state(nid, N_POSSIBLE) {
+ for_each_online_node(nid) {
+ if (!node_state(nid, N_MEMORY))
+ continue;
+
err = sysfs_wi_node_add(nid);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
@@ -3548,6 +3607,7 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
}
}
+ hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
return 0;
err_del_kobj:
--
2.34.1
* Re: [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-08 7:32 ` [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
@ 2025-04-08 13:45 ` Joshua Hahn
2025-04-15 15:41 ` Jonathan Cameron
1 sibling, 0 replies; 28+ messages in thread
From: Joshua Hahn @ 2025-04-08 13:45 UTC
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On Tue, 8 Apr 2025 16:32:40 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
Hi Rakie,
Thank you for your work on this fix! Everything looks good to me :-)
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
>
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with proper `kobject_del()` and `kobject_put()` sequences,
> ensuring correct teardown and preventing memory leaks.
>
> By explicitly calling `kobject_del()` before `kobject_put()`,
> the release function is now invoked safely, and internal sysfs
> state is correctly cleaned up. This guarantees that the memory
> associated with the kobject is fully released and avoids
> resource leaks, thereby improving system stability.
>
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
> mm/mempolicy.c | 66 ++++++++++++++++++++++++--------------------------
> 1 file changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index b28a1e6ae096..0da102aa1cfc 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3479,7 +3479,9 @@ static void sysfs_wi_release(struct kobject *wi_kobj)
>
> for (i = 0; i < nr_node_ids; i++)
> sysfs_wi_node_release(node_attrs[i], wi_kobj);
> - kobject_put(wi_kobj);
> +
> + kfree(node_attrs);
> + kfree(wi_kobj);
> }
>
> static const struct kobj_type wi_ktype = {
> @@ -3525,27 +3527,37 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> struct kobject *wi_kobj;
> int nid, err;
>
> + node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> + GFP_KERNEL);
> + if (!node_attrs)
> + return -ENOMEM;
> +
> wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> - if (!wi_kobj)
> + if (!wi_kobj) {
> + kfree(node_attrs);
> return -ENOMEM;
> + }
>
> err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> "weighted_interleave");
> - if (err) {
> - kfree(wi_kobj);
> - return err;
> - }
> + if (err)
> + goto err_put_kobj;
>
> for_each_node_state(nid, N_POSSIBLE) {
> err = add_weight_node(nid, wi_kobj);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> - break;
> + goto err_del_kobj;
> }
> }
> - if (err)
> - kobject_put(wi_kobj);
> +
> return 0;
> +
> +err_del_kobj:
> + kobject_del(wi_kobj);
> +err_put_kobj:
> + kobject_put(wi_kobj);
> + return err;
> }
>
> static void mempolicy_kobj_release(struct kobject *kobj)
> @@ -3559,7 +3571,6 @@ static void mempolicy_kobj_release(struct kobject *kobj)
> mutex_unlock(&iw_table_lock);
> synchronize_rcu();
> kfree(old);
> - kfree(node_attrs);
> kfree(kobj);
> }
>
> @@ -3573,37 +3584,24 @@ static int __init mempolicy_sysfs_init(void)
> static struct kobject *mempolicy_kobj;
>
> mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
> - if (!mempolicy_kobj) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> -
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs) {
> - err = -ENOMEM;
> - goto mempol_out;
> - }
> + if (!mempolicy_kobj)
> + return -ENOMEM;
>
> err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
> "mempolicy");
> if (err)
> - goto node_out;
> + goto err_put_kobj;
>
> err = add_weighted_interleave_group(mempolicy_kobj);
> - if (err) {
> - pr_err("mempolicy sysfs structure failed to initialize\n");
> - kobject_put(mempolicy_kobj);
> - return err;
> - }
> + if (err)
> + goto err_del_kobj;
>
> - return err;
> -node_out:
> - kfree(node_attrs);
> -mempol_out:
> - kfree(mempolicy_kobj);
> -err_out:
> - pr_err("failed to add mempolicy kobject to the system\n");
> + return 0;
> +
> +err_del_kobj:
> + kobject_del(mempolicy_kobj);
> +err_put_kobj:
> + kobject_put(mempolicy_kobj);
> return err;
> }
>
> --
> 2.34.1
Sent using hkml (https://github.com/sjp38/hackermail)
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-08 7:32 ` [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
@ 2025-04-08 13:49 ` Joshua Hahn
2025-04-09 3:43 ` Dan Williams
1 sibling, 0 replies; 28+ messages in thread
From: Joshua Hahn @ 2025-04-08 13:49 UTC
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On Tue, 8 Apr 2025 16:32:41 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
Hi Rakie,
This also looks good to me!
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Previously, the weighted interleave sysfs structure was statically
> managed during initialization. This prevented new nodes from being
> recognized when memory hotplug events occurred, limiting the ability
> to update or extend sysfs entries dynamically at runtime.
>
> To address this, this patch refactors the sysfs infrastructure and
> encapsulates it within a new structure, `sysfs_wi_group`, which holds
> both the kobject and an array of node attribute pointers.
>
> By allocating this group structure globally, the per-node sysfs
> attributes can be managed beyond initialization time, enabling
> external modules to insert or remove node entries in response to
> events such as memory hotplug or node online/offline transitions.
>
> Instead of allocating all per-node sysfs attributes at once, the
> initialization path now uses the existing sysfs_wi_node_add() and
> sysfs_wi_node_delete() helpers. This refactoring makes it possible
> to modularly manage per-node sysfs entries and ensures the
> infrastructure is ready for runtime extension.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
> mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> 1 file changed, 29 insertions(+), 32 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0da102aa1cfc..988575f29c53 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct sysfs_wi_group {
> + struct kobject wi_kobj;
> + struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *wi_group;
> +
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_delete(int nid)
> {
> - if (!node_attr)
> + if (!wi_group->nattrs[nid])
> return;
> - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> +
> + sysfs_remove_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> + kfree(wi_group->nattrs[nid]);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> - int i;
> -
> - for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + int nid;
>
> - kfree(node_attrs);
> - kfree(wi_kobj);
> + for (nid = 0; nid < nr_node_ids; nid++)
> + sysfs_wi_node_delete(nid);
> + kfree(wi_group);
> }
>
> static const struct kobj_type wi_ktype = {
> @@ -3489,7 +3493,7 @@ static const struct kobj_type wi_ktype = {
> .release = sysfs_wi_release,
> };
>
> -static int add_weight_node(int nid, struct kobject *wi_kobj)
> +static int sysfs_wi_node_add(int nid)
> {
> struct iw_node_attr *node_attr;
> char *name;
> @@ -3511,40 +3515,33 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> node_attr->kobj_attr.store = node_store;
> node_attr->nid = nid;
>
> - if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> + if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> kfree(node_attr->kobj_attr.attr.name);
> kfree(node_attr);
> pr_err("failed to add attribute to weighted_interleave\n");
> return -ENOMEM;
> }
>
> - node_attrs[nid] = node_attr;
> + wi_group->nattrs[nid] = node_attr;
> return 0;
> }
>
> -static int add_weighted_interleave_group(struct kobject *root_kobj)
> +static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> {
> - struct kobject *wi_kobj;
> int nid, err;
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs)
> + wi_group = kzalloc(struct_size(wi_group, nattrs, nr_node_ids),
> + GFP_KERNEL);
> + if (!wi_group)
> return -ENOMEM;
>
> - wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> - if (!wi_kobj) {
> - kfree(node_attrs);
> - return -ENOMEM;
> - }
> -
> - err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
> + err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
> "weighted_interleave");
> if (err)
> goto err_put_kobj;
>
> for_each_node_state(nid, N_POSSIBLE) {
> - err = add_weight_node(nid, wi_kobj);
> + err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> goto err_del_kobj;
> @@ -3554,9 +3551,9 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> return 0;
>
> err_del_kobj:
> - kobject_del(wi_kobj);
> + kobject_del(&wi_group->wi_kobj);
> err_put_kobj:
> - kobject_put(wi_kobj);
> + kobject_put(&wi_group->wi_kobj);
> return err;
> }
>
> --
> 2.34.1
Sent using hkml (https://github.com/sjp38/hackermail)
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
@ 2025-04-08 13:52 ` Joshua Hahn
2025-04-08 14:45 ` Gregory Price
` (2 subsequent siblings)
3 siblings, 0 replies; 28+ messages in thread
From: Joshua Hahn @ 2025-04-08 13:52 UTC
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On Tue, 8 Apr 2025 16:32:42 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
Hi Rakie,
Looks good to me as well :-) Thank you for working on this!
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> ---
> mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 83 insertions(+), 23 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 988575f29c53..9aa884107f4c 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -113,6 +113,7 @@
> #include <asm/tlbflush.h>
> #include <asm/tlb.h>
> #include <linux/uaccess.h>
> +#include <linux/memory.h>
>
> #include "internal.h"
>
> @@ -3421,6 +3422,7 @@ struct iw_node_attr {
>
> struct sysfs_wi_group {
> struct kobject wi_kobj;
> + struct mutex kobj_lock;
> struct iw_node_attr *nattrs[];
> };
>
> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>
> static void sysfs_wi_node_delete(int nid)
> {
> - if (!wi_group->nattrs[nid])
> + struct iw_node_attr *attr;
> +
> + if (nid < 0 || nid >= nr_node_ids)
> + return;
> +
> + mutex_lock(&wi_group->kobj_lock);
> + attr = wi_group->nattrs[nid];
> + if (!attr) {
> + mutex_unlock(&wi_group->kobj_lock);
> return;
> + }
> +
> + wi_group->nattrs[nid] = NULL;
> + mutex_unlock(&wi_group->kobj_lock);
>
> - sysfs_remove_file(&wi_group->wi_kobj,
> - &wi_group->nattrs[nid]->kobj_attr.attr);
> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> - kfree(wi_group->nattrs[nid]);
> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> + kfree(attr->kobj_attr.attr.name);
> + kfree(attr);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>
> static int sysfs_wi_node_add(int nid)
> {
> - struct iw_node_attr *node_attr;
> + int ret = 0;
> char *name;
> + struct iw_node_attr *new_attr = NULL;
>
> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> - if (!node_attr)
> + if (nid < 0 || nid >= nr_node_ids) {
> + pr_err("Invalid node id: %d\n", nid);
> + return -EINVAL;
> + }
> +
> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
> + if (!new_attr)
> return -ENOMEM;
>
> name = kasprintf(GFP_KERNEL, "node%d", nid);
> if (!name) {
> - kfree(node_attr);
> + kfree(new_attr);
> return -ENOMEM;
> }
>
> - sysfs_attr_init(&node_attr->kobj_attr.attr);
> - node_attr->kobj_attr.attr.name = name;
> - node_attr->kobj_attr.attr.mode = 0644;
> - node_attr->kobj_attr.show = node_show;
> - node_attr->kobj_attr.store = node_store;
> - node_attr->nid = nid;
> + mutex_lock(&wi_group->kobj_lock);
> + if (wi_group->nattrs[nid]) {
> + mutex_unlock(&wi_group->kobj_lock);
> + pr_info("Node [%d] already exists\n", nid);
> + kfree(new_attr);
> + kfree(name);
> + return 0;
> + }
> + wi_group->nattrs[nid] = new_attr;
>
> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> - pr_err("failed to add attribute to weighted_interleave\n");
> - return -ENOMEM;
> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
> + wi_group->nattrs[nid]->nid = nid;
> +
> + ret = sysfs_create_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
> + if (ret) {
> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> + kfree(wi_group->nattrs[nid]);
> + wi_group->nattrs[nid] = NULL;
> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
> }
> + mutex_unlock(&wi_group->kobj_lock);
>
> - wi_group->nattrs[nid] = node_attr;
> - return 0;
> + return ret;
> +}
> +
> +static int wi_node_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + int err;
> + struct memory_notify *arg = data;
> + int nid = arg->status_change_nid;
> +
> + if (nid < 0)
> + return NOTIFY_OK;
> +
> + switch(action) {
> + case MEM_ONLINE:
> + err = sysfs_wi_node_add(nid);
> + if (err)
> + pr_err("failed to add sysfs [node%d]\n", nid);
> + break;
> + case MEM_OFFLINE:
> + sysfs_wi_node_delete(nid);
> + break;
> + }
> +
> + return NOTIFY_OK;
> }
>
> static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> @@ -3534,13 +3589,17 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> GFP_KERNEL);
> if (!wi_group)
> return -ENOMEM;
> + mutex_init(&wi_group->kobj_lock);
>
> err = kobject_init_and_add(&wi_group->wi_kobj, &wi_ktype, mempolicy_kobj,
> "weighted_interleave");
> if (err)
> goto err_put_kobj;
>
> - for_each_node_state(nid, N_POSSIBLE) {
> + for_each_online_node(nid) {
> + if (!node_state(nid, N_MEMORY))
> + continue;
> +
> err = sysfs_wi_node_add(nid);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> @@ -3548,6 +3607,7 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> }
> }
>
> + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
> return 0;
>
> err_del_kobj:
> --
> 2.34.1
Sent using hkml (https://github.com/sjp38/hackermail)
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-04-08 13:52 ` Joshua Hahn
@ 2025-04-08 14:45 ` Gregory Price
2025-04-09 9:05 ` David Hildenbrand
2025-04-15 16:00 ` Jonathan Cameron
3 siblings, 0 replies; 28+ messages in thread
From: Gregory Price @ 2025-04-08 14:45 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On Tue, Apr 08, 2025 at 04:32:42PM +0900, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Gregory Price <gourry@gourry.net>
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-08 7:32 ` [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug Rakie Kim
2025-04-08 13:49 ` Joshua Hahn
@ 2025-04-09 3:43 ` Dan Williams
2025-04-09 3:54 ` Dan Williams
1 sibling, 1 reply; 28+ messages in thread
From: Dan Williams @ 2025-04-09 3:43 UTC (permalink / raw)
To: Rakie Kim, akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed during initialization. This prevented new nodes from being
> recognized when memory hotplug events occurred, limiting the ability
> to update or extend sysfs entries dynamically at runtime.
>
> To address this, this patch refactors the sysfs infrastructure and
> encapsulates it within a new structure, `sysfs_wi_group`, which holds
> both the kobject and an array of node attribute pointers.
>
> By allocating this group structure globally, the per-node sysfs
> attributes can be managed beyond initialization time, enabling
> external modules to insert or remove node entries in response to
> events such as memory hotplug or node online/offline transitions.
>
> Instead of allocating all per-node sysfs attributes at once, the
> initialization path now uses the existing sysfs_wi_node_add() and
> sysfs_wi_node_delete() helpers. This refactoring makes it possible
> to modularly manage per-node sysfs entries and ensures the
> infrastructure is ready for runtime extension.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
> ---
> mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> 1 file changed, 29 insertions(+), 32 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 0da102aa1cfc..988575f29c53 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct sysfs_wi_group {
> + struct kobject wi_kobj;
> + struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *wi_group;
> +
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_delete(int nid)
> {
> - if (!node_attr)
> + if (!wi_group->nattrs[nid])
> return;
> - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> +
> + sysfs_remove_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
This still looks broken to me, but I think this is more a problem that
was present in the original code.
At this point @wi_group's reference count is zero because
sysfs_wi_release() has been called. However, it can only be zero if it has
properly transitioned through kobject_del() and final kobject_put(). It
follows that kobject_del() arranges for kobj->sd to be NULL. That means
that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
for the !parent case.
So either you are not triggering that path or not testing it; either way,
sysfs_remove_file() of the child attributes should be happening *before*
sysfs_wi_release().
Did I miss something?
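To make the ordering concern above concrete, here is a toy userspace model
(not kernel code, and not part of the patch). The toy_* names are made up for
illustration; the only behaviour borrowed from the kernel is what is stated
above: kobject_del() leaves kobj->sd NULL, and removing a file from a kobject
that no longer has a directory trips a warning. Under those assumptions, the
sketch counts how many removals would warn for each teardown order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject: only what the argument needs. */
struct toy_kobj {
	void *sd;	/* models kobj->sd; NULL once the directory is gone */
	int nfiles;	/* attribute files currently in the directory */
};

static int toy_dir;	/* its address stands for a live sysfs directory */

static void toy_kobj_add(struct toy_kobj *k)
{
	k->sd = &toy_dir;
	k->nfiles = 0;
}

static void toy_create_file(struct toy_kobj *k)
{
	assert(k->sd);	/* files can only be created in a live directory */
	k->nfiles++;
}

/* Returns 0 on success, -1 where the real code would WARN (no directory). */
static int toy_remove_file(struct toy_kobj *k)
{
	if (!k->sd)
		return -1;
	k->nfiles--;
	return 0;
}

/* Models the one effect of kobject_del() that matters here. */
static void toy_kobj_del(struct toy_kobj *k)
{
	k->sd = NULL;
}

/* Attribute removal from release(), i.e. after del: counts would-be WARNs. */
static int teardown_files_after_del(int nfiles)
{
	struct toy_kobj k;
	int i, warns = 0;

	toy_kobj_add(&k);
	for (i = 0; i < nfiles; i++)
		toy_create_file(&k);
	toy_kobj_del(&k);
	for (i = 0; i < nfiles; i++)
		if (toy_remove_file(&k))
			warns++;
	return warns;
}

/* Attribute removal before del: the order being asked for. */
static int teardown_files_before_del(int nfiles)
{
	struct toy_kobj k;
	int i, warns = 0;

	toy_kobj_add(&k);
	for (i = 0; i < nfiles; i++)
		toy_create_file(&k);
	for (i = 0; i < nfiles; i++)
		if (toy_remove_file(&k))
			warns++;
	toy_kobj_del(&k);
	return warns;
}
```

With three attributes, teardown_files_after_del(3) reports three would-be
warnings and teardown_files_before_del(3) reports none.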
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-09 3:43 ` Dan Williams
@ 2025-04-09 3:54 ` Dan Williams
2025-04-09 5:56 ` Rakie Kim
2025-04-11 7:21 ` Rakie Kim
0 siblings, 2 replies; 28+ messages in thread
From: Dan Williams @ 2025-04-09 3:54 UTC (permalink / raw)
To: Dan Williams, Rakie Kim, akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun, rakie.kim
Dan Williams wrote:
> Rakie Kim wrote:
> > Previously, the weighted interleave sysfs structure was statically
> > managed during initialization. This prevented new nodes from being
> > recognized when memory hotplug events occurred, limiting the ability
> > to update or extend sysfs entries dynamically at runtime.
> >
> > To address this, this patch refactors the sysfs infrastructure and
> > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > both the kobject and an array of node attribute pointers.
> >
> > By allocating this group structure globally, the per-node sysfs
> > attributes can be managed beyond initialization time, enabling
> > external modules to insert or remove node entries in response to
> > events such as memory hotplug or node online/offline transitions.
> >
> > Instead of allocating all per-node sysfs attributes at once, the
> > initialization path now uses the existing sysfs_wi_node_add() and
> > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > to modularly manage per-node sysfs entries and ensures the
> > infrastructure is ready for runtime extension.
> >
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > Reviewed-by: Gregory Price <gourry@gourry.net>
> > ---
> > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > 1 file changed, 29 insertions(+), 32 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 0da102aa1cfc..988575f29c53 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > int nid;
> > };
> >
> > +struct sysfs_wi_group {
> > + struct kobject wi_kobj;
> > + struct iw_node_attr *nattrs[];
> > +};
> > +
> > +static struct sysfs_wi_group *wi_group;
> > +
> > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > char *buf)
> > {
> > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > return count;
> > }
> >
> > -static struct iw_node_attr **node_attrs;
> > -
> > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > - struct kobject *parent)
> > +static void sysfs_wi_node_delete(int nid)
> > {
> > - if (!node_attr)
> > + if (!wi_group->nattrs[nid])
> > return;
> > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > - kfree(node_attr->kobj_attr.attr.name);
> > - kfree(node_attr);
> > +
> > + sysfs_remove_file(&wi_group->wi_kobj,
> > + &wi_group->nattrs[nid]->kobj_attr.attr);
>
> This still looks broken to me, but I think this is more a problem that
> was present in the original code.
>
> At this point @wi_group's reference count is zero because
> sysfs_wi_release() has been called. However, it can only be zero if it has
> properly transitioned through kobject_del() and final kobject_put(). It
> follows that kobject_del() arranges for kobj->sd to be NULL. That means
> that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> for the !parent case.
>
> So either you are not triggering that path or not testing it; either way,
> sysfs_remove_file() of the child attributes should be happening *before*
> sysfs_wi_release().
>
> Did I miss something?
I think the missing change is that sysfs_wi_node_add() failures need to
be unwound with a sysfs_wi_node_delete() of the already-added attrs
*before* the kobject_del() of @wi_group.
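A rough userspace sketch of that unwind order follows. It is a model under
stated assumptions, not the actual fix: struct toy_group, toy_node_add(),
toy_node_delete(), and the fail_at parameter are all hypothetical stand-ins,
and "dir_live = 0" merely marks the point where kobject_del() would run.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for the sysfs group and its per-node attributes. */
struct toy_group {
	int dir_live;				/* 1 while the directory exists */
	int attr_present[TOY_MAX_NODES];	/* 1 if node<nid> file exists */
	int warns;				/* removals attempted too late */
};

/* Models sysfs_wi_node_add(); fails artificially when nid == fail_at. */
static int toy_node_add(struct toy_group *g, int nid, int fail_at)
{
	if (nid == fail_at)
		return -1;
	g->attr_present[nid] = 1;
	return 0;
}

/* Models sysfs_wi_node_delete(); counts a warning if the dir is gone. */
static void toy_node_delete(struct toy_group *g, int nid)
{
	if (!g->attr_present[nid])
		return;
	if (!g->dir_live)
		g->warns++;
	g->attr_present[nid] = 0;
}

/* Pass fail_at = -1 to simulate an init with no failure. */
static int toy_group_init(struct toy_group *g, int nr_nodes, int fail_at)
{
	int nid, err = 0;

	memset(g, 0, sizeof(*g));
	if (nr_nodes > TOY_MAX_NODES)
		return -1;
	g->dir_live = 1;	/* models kobject_init_and_add() succeeding */

	for (nid = 0; nid < nr_nodes; nid++) {
		err = toy_node_add(g, nid, fail_at);
		if (err)
			goto err_cleanup;
	}
	return 0;

err_cleanup:
	/* Step 1: drop the attrs added so far, while the directory exists. */
	while (--nid >= 0)
		toy_node_delete(g, nid);
	/* Step 2: only now remove the directory (where kobject_del() goes). */
	g->dir_live = 0;
	/* Step 3: the final kobject_put() would then just free memory. */
	return err;
}
```

For example, toy_group_init(&g, 4, 2) fails at node 2, removes nodes 1 and 0
while the directory is still live, and leaves g.warns at 0.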
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-09 3:54 ` Dan Williams
@ 2025-04-09 5:56 ` Rakie Kim
2025-04-09 18:51 ` Dan Williams
2025-04-11 7:21 ` Rakie Kim
1 sibling, 1 reply; 28+ messages in thread
From: Rakie Kim @ 2025-04-09 5:56 UTC (permalink / raw)
To: Dan Williams
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, akpm
On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Dan Williams wrote:
> > Rakie Kim wrote:
> > > Previously, the weighted interleave sysfs structure was statically
> > > managed during initialization. This prevented new nodes from being
> > > recognized when memory hotplug events occurred, limiting the ability
> > > to update or extend sysfs entries dynamically at runtime.
> > >
> > > To address this, this patch refactors the sysfs infrastructure and
> > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > both the kobject and an array of node attribute pointers.
> > >
> > > By allocating this group structure globally, the per-node sysfs
> > > attributes can be managed beyond initialization time, enabling
> > > external modules to insert or remove node entries in response to
> > > events such as memory hotplug or node online/offline transitions.
> > >
> > > Instead of allocating all per-node sysfs attributes at once, the
> > > initialization path now uses the existing sysfs_wi_node_add() and
> > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > to modularly manage per-node sysfs entries and ensures the
> > > infrastructure is ready for runtime extension.
> > >
> > > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > > Reviewed-by: Gregory Price <gourry@gourry.net>
> > > ---
> > > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > > 1 file changed, 29 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > index 0da102aa1cfc..988575f29c53 100644
> > > --- a/mm/mempolicy.c
> > > +++ b/mm/mempolicy.c
> > > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > > int nid;
> > > };
> > >
> > > +struct sysfs_wi_group {
> > > + struct kobject wi_kobj;
> > > + struct iw_node_attr *nattrs[];
> > > +};
> > > +
> > > +static struct sysfs_wi_group *wi_group;
> > > +
> > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > char *buf)
> > > {
> > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > return count;
> > > }
> > >
> > > -static struct iw_node_attr **node_attrs;
> > > -
> > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > - struct kobject *parent)
> > > +static void sysfs_wi_node_delete(int nid)
> > > {
> > > - if (!node_attr)
> > > + if (!wi_group->nattrs[nid])
> > > return;
> > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > - kfree(node_attr->kobj_attr.attr.name);
> > > - kfree(node_attr);
> > > +
> > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> >
> > This still looks broken to me, but I think this is more a problem that
> > was present in the original code.
> >
> > At this point @wi_group's reference count is zero because
> > sysfs_wi_release() has been called. However, it can only be zero if it has
> > properly transitioned through kobject_del() and final kobject_put(). It
> > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > for the !parent case.
> >
> > So either you are not triggering that path or not testing it; either way,
> > sysfs_remove_file() of the child attributes should be happening *before*
> > sysfs_wi_release().
> >
> > Did I miss something?
>
> I think the missing change is that sysfs_wi_node_add() failures need to
> be unwound with a sysfs_wi_node_delete() of the already-added attrs
> *before* the kobject_del() of @wi_group.
Hi Dan Williams
Thank you very much for identifying this potential issue in the code.
As you pointed out, this seems to be a problem that was already present in
the original implementation, and I agree that it needs to be addressed.
However, since this issue existed prior to the changes in this patch
series, I believe it would be more appropriate to fix it in a separate
follow-up patch rather than include it here.
I will start preparing a new patch to address this problem, and I would
greatly appreciate it if you could review it once it's ready.
Rakie
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-04-08 13:52 ` Joshua Hahn
2025-04-08 14:45 ` Gregory Price
@ 2025-04-09 9:05 ` David Hildenbrand
2025-04-09 11:39 ` Honggyu Kim
2025-04-15 16:00 ` Jonathan Cameron
3 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-04-09 9:05 UTC (permalink / raw)
To: Rakie Kim, akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, Jonathan.Cameron, osalvador,
kernel_team, honggyu.kim, yunjeong.mun
On 08.04.25 09:32, Rakie Kim wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
Why are the other SOBs in there? Are Co-developed-by tags missing?
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-09 9:05 ` David Hildenbrand
@ 2025-04-09 11:39 ` Honggyu Kim
2025-04-09 11:52 ` David Hildenbrand
0 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2025-04-09 11:39 UTC (permalink / raw)
To: David Hildenbrand, Rakie Kim, akpm
Cc: kernel_team, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, Jonathan.Cameron,
osalvador, yunjeong.mun
Hi David,
On 4/9/2025 6:05 PM, David Hildenbrand wrote:
> On 08.04.25 09:32, Rakie Kim wrote:
>> The weighted interleave policy distributes page allocations across multiple
>> NUMA nodes based on their performance weight, thereby improving memory
>> bandwidth utilization. The weight values for each node are configured
>> through sysfs.
>>
>> Previously, sysfs entries for configuring weighted interleave were created
>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
>> runtime, as some may remain memoryless or offline.
>> This led to sysfs entries being created for unusable nodes, causing
>> potential misconfiguration issues.
>>
>> To address this issue, this patch modifies the sysfs creation logic to:
>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>> the creation of sysfs entries for nodes that cannot be used.
>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>> based on whether a node transitions into or out of the N_MEMORY state.
>>
>> Additionally, the patch ensures that sysfs attributes are properly managed
>> when nodes go offline, preventing stale or redundant entries from persisting
>> in the system.
>>
>> By making these changes, the weighted interleave policy now manages its
>> sysfs entries more efficiently, ensuring that only relevant nodes are
>> considered for interleaving, and dynamically adapting to memory hotplug
>> events.
>>
>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>
>
> Why are the other SOBs in there? Are Co-developed-by tags missing?
I initially found the problem and fixed it with my internal implementation, but
Rakie also had his own idea, so he started working on it. His initial
implementation was almost the same as mine.
I thought Signed-off-by was a way to express that the patch series contains our
contributions, but if you think it's unusual, then I can add this.
Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
For Yunjeong, the following can be added.
Tested-by: Yunjeong Mun <yunjeong.mun@sk.com>
However, this patch series is already in Andrew's mm-new, so I don't want to
bother him again unless we need to update these patches for other reasons.
Is this okay?
Thanks,
Honggyu
>
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-09 11:39 ` Honggyu Kim
@ 2025-04-09 11:52 ` David Hildenbrand
2025-04-10 7:53 ` Rakie Kim
2025-04-10 13:25 ` Honggyu Kim
0 siblings, 2 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-04-09 11:52 UTC (permalink / raw)
To: Honggyu Kim, Rakie Kim, akpm
Cc: kernel_team, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, Jonathan.Cameron,
osalvador, yunjeong.mun
On 09.04.25 13:39, Honggyu Kim wrote:
> Hi David,
>
> On 4/9/2025 6:05 PM, David Hildenbrand wrote:
>> On 08.04.25 09:32, Rakie Kim wrote:
>>> The weighted interleave policy distributes page allocations across multiple
>>> NUMA nodes based on their performance weight, thereby improving memory
>>> bandwidth utilization. The weight values for each node are configured
>>> through sysfs.
>>>
>>> Previously, sysfs entries for configuring weighted interleave were created
>>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
>>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
>>> runtime, as some may remain memoryless or offline.
>>> This led to sysfs entries being created for unusable nodes, causing
>>> potential misconfiguration issues.
>>>
>>> To address this issue, this patch modifies the sysfs creation logic to:
>>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>>> the creation of sysfs entries for nodes that cannot be used.
>>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>>> based on whether a node transitions into or out of the N_MEMORY state.
>>>
>>> Additionally, the patch ensures that sysfs attributes are properly managed
>>> when nodes go offline, preventing stale or redundant entries from persisting
>>> in the system.
>>>
>>> By making these changes, the weighted interleave policy now manages its
>>> sysfs entries more efficiently, ensuring that only relevant nodes are
>>> considered for interleaving, and dynamically adapting to memory hotplug
>>> events.
>>>
>>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>>
>>
>> Why are the other SOBs in there? Are Co-developed-by tags missing?
>
> I initially found the problem and fixed it with my internal implementation, but
> Rakie also had his own idea, so he started working on it. His initial
> implementation was almost the same as mine.
>
> I thought Signed-off-by was a way to express that the patch series contains our
> contributions, but if you think it's unusual, then I can add this.
Please see Documentation/process/submitting-patches.rst, and note that these
are not "patch delivery" SOB.
"
The Signed-off-by: tag indicates that the signer was involved in the
development of the patch, or that he/she was in the patch's delivery path.
"
and
"
Co-developed-by: states that the patch was co-created by multiple developers;
it is used to give attribution to co-authors (in addition to the author
attributed by the From: tag) when several people work on a single patch. Since
Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
followed by a Signed-off-by: of the associated co-author. Standard sign-off
procedure applies, i.e. the ordering of Signed-off-by: tags should reflect the
chronological history of the patch insofar as possible, regardless of whether
the author is attributed via From: or Co-developed-by:. Notably, the last
Signed-off-by: must always be that of the developer submitting the patch.
"
The SOB order here is also not correct.
>
> Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>
> For Yunjeong, the following can be added.
>
> Tested-by: Yunjeong Mun <yunjeong.mun@sk.com>
That is probably the right thing to do if the contribution was focused on testing.
>
> However, this patch series is already in Andrew's mm-new, so I don't want to
> bother him again unless we need to update these patches for other reasons.
mm-new is exactly for this kind of thing. We can ask Andrew to fix it up.
--
Cheers,
David / dhildenb
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-09 5:56 ` Rakie Kim
@ 2025-04-09 18:51 ` Dan Williams
2025-04-10 7:53 ` Rakie Kim
0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2025-04-09 18:51 UTC (permalink / raw)
To: Rakie Kim, Dan Williams
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, akpm
Rakie Kim wrote:
> On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > Dan Williams wrote:
> > > Rakie Kim wrote:
> > > > Previously, the weighted interleave sysfs structure was statically
> > > > managed during initialization. This prevented new nodes from being
> > > > recognized when memory hotplug events occurred, limiting the ability
> > > > to update or extend sysfs entries dynamically at runtime.
> > > >
> > > > To address this, this patch refactors the sysfs infrastructure and
> > > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > > both the kobject and an array of node attribute pointers.
> > > >
> > > > By allocating this group structure globally, the per-node sysfs
> > > > attributes can be managed beyond initialization time, enabling
> > > > external modules to insert or remove node entries in response to
> > > > events such as memory hotplug or node online/offline transitions.
> > > >
> > > > Instead of allocating all per-node sysfs attributes at once, the
> > > > initialization path now uses the existing sysfs_wi_node_add() and
> > > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > > to modularly manage per-node sysfs entries and ensures the
> > > > infrastructure is ready for runtime extension.
> > > >
> > > > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > > > Reviewed-by: Gregory Price <gourry@gourry.net>
> > > > ---
> > > > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > > > 1 file changed, 29 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > index 0da102aa1cfc..988575f29c53 100644
> > > > --- a/mm/mempolicy.c
> > > > +++ b/mm/mempolicy.c
> > > > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > > > int nid;
> > > > };
> > > >
> > > > +struct sysfs_wi_group {
> > > > + struct kobject wi_kobj;
> > > > + struct iw_node_attr *nattrs[];
> > > > +};
> > > > +
> > > > +static struct sysfs_wi_group *wi_group;
> > > > +
> > > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > > char *buf)
> > > > {
> > > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > > return count;
> > > > }
> > > >
> > > > -static struct iw_node_attr **node_attrs;
> > > > -
> > > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > > - struct kobject *parent)
> > > > +static void sysfs_wi_node_delete(int nid)
> > > > {
> > > > - if (!node_attr)
> > > > + if (!wi_group->nattrs[nid])
> > > > return;
> > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > - kfree(node_attr);
> > > > +
> > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > >
> > > This still looks broken to me, but I think this is more a problem that
> > > was present in the original code.
> > >
> > > At this point @wi_group's reference count is zero because
> > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > properly transitioned through kobject_del() and final kobject_put(). It
> > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > for the !parent case.
> > >
> > > So, either you are not triggering that path, or testing that path, but
> > > sysfs_remove_file() of the child attributes should be happening *before*
> > > sysfs_wi_release().
> > >
> > > Did I miss something?
> >
> > I think the missing change is that sysfs_wi_node_add() failures need to
> > be done with a sysfs_wi_node_delete() of the added attrs *before* the
> > kobject_del() of @wi_group.
>
> Hi Dan Williams
>
> Thank you very much for identifying this potential issue in the code.
>
> As you pointed out, this seems to be a problem that was already present in
> the original implementation, and I agree that it needs to be addressed.
>
> However, since this issue existed prior to the changes in this patch
> series, I believe it would be more appropriate to fix it in a separate
> follow-up patch rather than include it here.

I tend to disagree. The whole motivation of this series is to get the
kobject lifetime handling correct in order to add the new dynamic
capability. The claimed correctness fixups are incomplete. There is time
to respin this (we are only at -rc1) and get it right before landing the
new dynamic capability.

One of the outcomes of the "MM Process" topic at LSF/MM was that Andrew
wanted more feedback on when patches are not quite ready for prime-time
and I think this is an example of a patch set that deserves another spin
to meet the stated goals.

> I will start preparing a new patch to address this problem, and I would
> greatly appreciate it if you could review it once it's ready.

Will definitely review it. I will leave it to Andrew whether he wants an
incremental fixup on top of this series, or a rebase on top of a fully
fixed baseline. My preference is to finish fixing all the old kobject
issues and then rebase the new dynamic work on top. Either way, do not
be afraid to ask Andrew to replace a series in -mm; that's a sign of the
process working as expected.
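
The ordering constraint discussed in this message can be sketched with a small userspace mock. This is not kernel code: `mock_kobj`, `mock_remove_file()`, `teardown()` and the other `mock_*` names are hypothetical stand-ins. The mock assumes only what the review states, namely that once kobject_del() has run the sysfs directory is gone, so removing an attribute file afterwards lands on the warning path.

```c
/* Userspace mock, not kernel code. Every name here is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>

struct mock_kobj {
	int refcount;	/* stands in for the kobject reference count */
	bool in_sysfs;	/* stands in for "kobj->sd is not NULL" */
	int nr_files;	/* attribute files created under this object */
	bool released;	/* true once the final put has run the release hook */
};

void mock_kobj_add(struct mock_kobj *k)
{
	k->refcount = 1;
	k->in_sysfs = true;
	k->nr_files = 0;
	k->released = false;
}

void mock_create_file(struct mock_kobj *k)
{
	assert(k->in_sysfs);
	k->nr_files++;
}

/* Returns 0 on success, 1 where the real code would be expected to WARN. */
int mock_remove_file(struct mock_kobj *k)
{
	if (!k->in_sysfs)
		return 1;	/* parent directory already deleted */
	k->nr_files--;
	return 0;
}

void mock_kobj_del(struct mock_kobj *k)
{
	k->in_sysfs = false;
}

void mock_kobj_put(struct mock_kobj *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0)
		k->released = true;
}

/*
 * Tear down an object holding nfiles attribute files and count warnings.
 * files_first == true:  remove files, then del, then final put.
 * files_first == false: del and final put first, then remove files,
 *                       which models removal from inside the release hook.
 */
int teardown(int nfiles, bool files_first)
{
	struct mock_kobj k;
	int warnings = 0;

	mock_kobj_add(&k);
	for (int i = 0; i < nfiles; i++)
		mock_create_file(&k);

	if (files_first)
		for (int i = 0; i < nfiles; i++)
			warnings += mock_remove_file(&k);

	mock_kobj_del(&k);
	mock_kobj_put(&k);
	assert(k.released);

	if (!files_first)
		for (int i = 0; i < nfiles; i++)
			warnings += mock_remove_file(&k);

	return warnings;
}
```

In this mock, `teardown(3, true)` reports no warnings, while `teardown(3, false)` reports one warning per file, which is the situation the review expects when file removal is left to the release callback.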
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-09 11:52 ` David Hildenbrand
@ 2025-04-10 7:53 ` Rakie Kim
2025-04-10 13:25 ` Honggyu Kim
1 sibling, 0 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-10 7:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: kernel_team, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, Jonathan.Cameron,
osalvador, yunjeong.mun, Honggyu Kim, Rakie Kim, akpm
On Wed, 9 Apr 2025 13:52:28 +0200 David Hildenbrand <david@redhat.com> wrote:
> On 09.04.25 13:39, Honggyu Kim wrote:
> > Hi David,
> >
> > On 4/9/2025 6:05 PM, David Hildenbrand wrote:
> >> On 08.04.25 09:32, Rakie Kim wrote:
> >>> The weighted interleave policy distributes page allocations across multiple
> >>> NUMA nodes based on their performance weight, thereby improving memory
> >>> bandwidth utilization. The weight values for each node are configured
> >>> through sysfs.
> >>>
> >>> Previously, sysfs entries for configuring weighted interleave were created
> >>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> >>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> >>> runtime, as some may remain memoryless or offline.
> >>> This led to sysfs entries being created for unusable nodes, causing
> >>> potential misconfiguration issues.
> >>>
> >>> To address this issue, this patch modifies the sysfs creation logic to:
> >>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> >>> the creation of sysfs entries for nodes that cannot be used.
> >>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> >>> based on whether a node transitions into or out of the N_MEMORY state.
> >>>
> >>> Additionally, the patch ensures that sysfs attributes are properly managed
> >>> when nodes go offline, preventing stale or redundant entries from persisting
> >>> in the system.
> >>>
> >>> By making these changes, the weighted interleave policy now manages its
> >>> sysfs entries more efficiently, ensuring that only relevant nodes are
> >>> considered for interleaving, and dynamically adapting to memory hotplug
> >>> events.
> >>>
> >>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> >>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> >>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> >>
> >>
> >> Why are the other SOF in there? Are there Co-developed-by missing?
> >
> > I initially found the problem and fixed it with my internal implementation but
> > Rakie also had his idea so he started working on it. His initial implementation
> > has almost been similar to mine.
> >
> > I thought Signed-off-by is a way to express the patch series contains our
> > contribution, but if you think it's unusual, then I can add this.
>
> Please see Documentation/process/submitting-patches.rst, and note that these
> are not "patch delivery" SOB.
>
> "
> The Signed-off-by: tag indicates that the signer was involved in the
> development of the patch, or that he/she was in the patch's delivery path.
> "
>
> and
>
> "
> Co-developed-by: states that the patch was co-created by multiple developers;
> it is used to give attribution to co-authors (in addition to the author
> attributed by the From: tag) when several people work on a single patch. Since
> Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
> followed by a Signed-off-by: of the associated co-author. Standard sign-off
> procedure applies, i.e. the ordering of Signed-off-by: tags should reflect the
> chronological history of the patch insofar as possible, regardless of whether
> the author is attributed via From: or Co-developed-by:. Notably, the last
> Signed-off-by: must always be that of the developer submitting the patch.
> "
>
> The SOB order here is also not correct.
>
> >
> > Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> >
> > For Yunjeong, the following can be added.
> >
> > Tested-by: Yunjeong Mun <yunjeong.mun@sk.com>
>
> That is probably the right thing to do if contribution was focused on testing.
>
> >
> > However, this patch series is already in Andrew's mm-new so I don't want to
> > bother him again unless we need to update this patches for other reasons.
>
> mm-new is exactly for these kind of things. We can ask Andrew to fix it up.
>
> --
> Cheers,
>
> David / dhildenb
>

Hi David,

Thank you for reviewing this patch series and providing your Acked-by tag.

As you pointed out, I agree that the Signed-off-by tags in this patch
series are not clearly aligned with the actual contributions.

Coincidentally, Dan Williams has requested an additional fix for Patch 1
in this series. Therefore, I am planning to prepare a new version, v8.
In that version, I will reorganize the Signed-off-by tags as you suggested
to accurately reflect the authorship and contributions.

Thank you again for your guidance.

Rakie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-09 18:51 ` Dan Williams
@ 2025-04-10 7:53 ` Rakie Kim
2025-04-10 8:06 ` Rakie Kim
0 siblings, 1 reply; 28+ messages in thread
From: Rakie Kim @ 2025-04-10 7:53 UTC (permalink / raw)
To: Dan Williams
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, akpm
On Wed, 9 Apr 2025 11:51:36 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Rakie Kim wrote:
> > On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > > Dan Williams wrote:
> > > > Rakie Kim wrote:
> > > > > Previously, the weighted interleave sysfs structure was statically
> > > > > managed during initialization. This prevented new nodes from being
> > > > > recognized when memory hotplug events occurred, limiting the ability
> > > > > to update or extend sysfs entries dynamically at runtime.
> > > > >
> > > > > To address this, this patch refactors the sysfs infrastructure and
> > > > > encapsulates it within a new structure, `sysfs_wi_group`, which holds
> > > > > both the kobject and an array of node attribute pointers.
> > > > >
> > > > > By allocating this group structure globally, the per-node sysfs
> > > > > attributes can be managed beyond initialization time, enabling
> > > > > external modules to insert or remove node entries in response to
> > > > > events such as memory hotplug or node online/offline transitions.
> > > > >
> > > > > Instead of allocating all per-node sysfs attributes at once, the
> > > > > initialization path now uses the existing sysfs_wi_node_add() and
> > > > > sysfs_wi_node_delete() helpers. This refactoring makes it possible
> > > > > to modularly manage per-node sysfs entries and ensures the
> > > > > infrastructure is ready for runtime extension.
> > > > >
> > > > > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > > > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > > > Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> > > > > Reviewed-by: Gregory Price <gourry@gourry.net>
> > > > > ---
> > > > > mm/mempolicy.c | 61 ++++++++++++++++++++++++--------------------------
> > > > > 1 file changed, 29 insertions(+), 32 deletions(-)
> > > > >
> > > > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > > > > index 0da102aa1cfc..988575f29c53 100644
> > > > > --- a/mm/mempolicy.c
> > > > > +++ b/mm/mempolicy.c
> > > > > @@ -3419,6 +3419,13 @@ struct iw_node_attr {
> > > > > int nid;
> > > > > };
> > > > >
> > > > > +struct sysfs_wi_group {
> > > > > + struct kobject wi_kobj;
> > > > > + struct iw_node_attr *nattrs[];
> > > > > +};
> > > > > +
> > > > > +static struct sysfs_wi_group *wi_group;
> > > > > +
> > > > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > > > char *buf)
> > > > > {
> > > > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > > > return count;
> > > > > }
> > > > >
> > > > > -static struct iw_node_attr **node_attrs;
> > > > > -
> > > > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > > > - struct kobject *parent)
> > > > > +static void sysfs_wi_node_delete(int nid)
> > > > > {
> > > > > - if (!node_attr)
> > > > > + if (!wi_group->nattrs[nid])
> > > > > return;
> > > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > > - kfree(node_attr);
> > > > > +
> > > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > > >
> > > > This still looks broken to me, but I think this is more a problem that
> > > > was present in the original code.
> > > >
> > > > At this point @wi_group's reference count is zero because
> > > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > > properly transitioned through kobject_del() and final kobject_put(). It
> > > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > > for the !parent case.
> > > >
> > > > So, either you are not triggering that path, or testing that path, but
> > > > sysfs_remove_file() of the child attributes should be happening *before*
> > > > sysfs_wi_release().
> > > >
> > > > Did I miss something?
> > >
> > > I think the missing change is that sysfs_wi_node_add() failures need to
> > > be done with a sysfs_wi_node_delete() of the added attrs *before* the
> > > kobject_del() of @wi_group.
> >
> > Hi Dan Williams
> >
> > Thank you very much for identifying this potential issue in the code.
> >
> > As you pointed out, this seems to be a problem that was already present in
> > the original implementation, and I agree that it needs to be addressed.
> >
> > However, since this issue existed prior to the changes in this patch
> > series, I believe it would be more appropriate to fix it in a separate
> > follow-up patch rather than include it here.
>
> I tend to disagree. The whole motivation of this series is to get the
> kobject lifetime handling correct in order to add the new dynamic
> capability. The claimed correctness fixups are incomplete. There is time
> to respin this (we are only at -rc1) and get it right before landing the
> new dynamic capability.
>
> One of the outcomes of the "MM Process" topic at LSF/MM was that Andrew
> wanted more feedback on when patches are not quite ready for prime-time
> and I think this is an example of a patch set that deserves another spin
> to meet the stated goals.
>
> > I will start preparing a new patch to address this problem, and I would
> > greatly appreciate it if you could review it once it's ready.
>
> Will definitely review it. I will leave to Andrew if he wants an
> incremental fixup on top of this series, or rebase on top of a fully
> fixed baseline. My preference is finish fixing all the old kobject()
> issues and then rebase the new dynamic work on top. Either way, do not
> be afraid to ask Andrew to replace a series in -mm, that's a sign of the
> process working as expected.

Thank you very much for your advice, and I completely agree with your
recommendation. I will immediately ask Andrew to remove this patch series
from -mm. Then, I will prepare a new version, v8, which properly addresses
the kobject-related issues you pointed out.

Once again, I sincerely appreciate your thoughtful and detailed feedback.

Rakie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-10 7:53 ` Rakie Kim
@ 2025-04-10 8:06 ` Rakie Kim
2025-04-11 3:11 ` Andrew Morton
0 siblings, 1 reply; 28+ messages in thread
From: Rakie Kim @ 2025-04-10 8:06 UTC (permalink / raw)
To: akpm
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, Dan Williams
On Thu, 10 Apr 2025 16:53:33 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> On Wed, 9 Apr 2025 11:51:36 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > Rakie Kim wrote:
> > > > > > +static void sysfs_wi_node_delete(int nid)
> > > > > > {
> > > > > > - if (!node_attr)
> > > > > > + if (!wi_group->nattrs[nid])
> > > > > > return;
> > > > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > > > - kfree(node_attr);
> > > > > > +
> > > > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > > > >
> > > > > This still looks broken to me, but I think this is more a problem that
> > > > > was present in the original code.
> > > > >
> > > > > At this point @wi_group's reference count is zero because
> > > > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > > > properly transitioned through kobject_del() and final kobject_put(). It
> > > > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > > > for the !parent case.
> > > > >
> > > > > So, either you are not triggering that path, or testing that path, but
> > > > > > sysfs_remove_file() of the child attributes should be happening *before*
> > > > > sysfs_wi_release().
> > > > >
> > > > > Did I miss something?
> > > >
> > > > I think the missing change is that sysfs_wi_node_add() failures need to
> > > > be done with a sysfs_wi_node_delete() of the added attrs *before* the
> > > > kobject_del() of @wi_group.
> > >
> > > Hi Dan Williams
> > >
> > > Thank you very much for identifying this potential issue in the code.
> > >
> > > As you pointed out, this seems to be a problem that was already present in
> > > the original implementation, and I agree that it needs to be addressed.
> > >
> > > However, since this issue existed prior to the changes in this patch
> > > series, I believe it would be more appropriate to fix it in a separate
> > > follow-up patch rather than include it here.
> >
> > I tend to disagree. The whole motivation of this series is to get the
> > kobject lifetime handling correct in order to add the new dynamic
> > capability. The claimed correctness fixups are incomplete. There is time
> > to respin this (we are only at -rc1) and get it right before landing the
> > new dynamic capability.
> >
> > One of the outcomes of the "MM Process" topic at LSF/MM was that Andrew
> > wanted more feedback on when patches are not quite ready for prime-time
> > and I think this is an example of a patch set that deserves another spin
> > to meet the stated goals.
> >
> > > I will start preparing a new patch to address this problem, and I would
> > > greatly appreciate it if you could review it once it's ready.
> >
> > Will definitely review it. I will leave to Andrew if he wants an
> > incremental fixup on top of this series, or rebase on top of a fully
> > fixed baseline. My preference is finish fixing all the old kobject()
> > issues and then rebase the new dynamic work on top. Either way, do not
> > be afraid to ask Andrew to replace a series in -mm, that's a sign of the
> > process working as expected.
>
> Thank you very much for your advice, and I completely agree with your
> recommendation. I will immediately ask Andrew to remove this patch series
> from -mm. Then, I will prepare a new version, v8, which properly addresses
> the kobject-related issues you pointed out.
>
> Once again, I sincerely appreciate your thoughtful and detailed feedback.
>
> Rakie
>

To Andrew,

I sincerely apologize for the inconvenience. It appears that this commit still
requires additional corrections. I would appreciate it if you could drop the
changes you merged into -mm, mm-new branch yesterday.

<1>
The patch titled
     Subject: mm/mempolicy: fix memory leaks in weighted interleave sysfs
has been added to the -mm mm-new branch. Its filename is
     mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs.patch

<2>
The patch titled
     Subject: mm/mempolicy: prepare weighted interleave sysfs for memory hotplug
has been added to the -mm mm-new branch. Its filename is
     mm-mempolicy-prepare-weighted-interleave-sysfs-for-memory-hotplug.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mempolicy-prepare-weighted-interleave-sysfs-for-memory-hotplug.patch

<3>
The patch titled
     Subject: mm/mempolicy: support memory hotplug in weighted interleave
has been added to the -mm mm-new branch. Its filename is
     mm-mempolicy-support-memory-hotplug-in-weighted-interleave.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-mempolicy-support-memory-hotplug-in-weighted-interleave.patch

Rakie
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-09 11:52 ` David Hildenbrand
2025-04-10 7:53 ` Rakie Kim
@ 2025-04-10 13:25 ` Honggyu Kim
2025-04-10 13:41 ` David Hildenbrand
1 sibling, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2025-04-10 13:25 UTC (permalink / raw)
To: David Hildenbrand, Rakie Kim, akpm
Cc: kernel_team, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, Jonathan.Cameron,
osalvador, yunjeong.mun
Hi David,
On 4/9/2025 8:52 PM, David Hildenbrand wrote:
> On 09.04.25 13:39, Honggyu Kim wrote:
>> Hi David,
>>
>> On 4/9/2025 6:05 PM, David Hildenbrand wrote:
>>> On 08.04.25 09:32, Rakie Kim wrote:
[...snip...]
>>>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>>>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>>>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>>>
>>> Why are the other SOF in there? Are there Co-developed-by missing?
>>
>> I initially found the problem and fixed it with my internal implementation but
>> Rakie also had his idea so he started working on it. His initial implementation
>> has almost been similar to mine.
>>
>> I thought Signed-off-by is a way to express the patch series contains our
>> contribution, but if you think it's unusual, then I can add this.
>
> Please see Documentation/process/submitting-patches.rst,
Thanks for the info.
> and note that these are not "patch delivery" SOB.
>
> "
> The Signed-off-by: tag indicates that the signer was involved in the
> development of the patch, or that he/she was in the patch's delivery path.
Yunjeong and I have been involved in finding the problem and also concluded this
issue is related to hotplug together with our initial implementations before
this patch. So I guess it is the former case.
> "
>
> and
>
> "
> Co-developed-by: states that the patch was co-created by multiple developers;
> it is used to give attribution to co-authors (in addition to the author
> attributed by the From: tag) when several people work on a single patch. Since
> Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
> followed by a Signed-off-by: of the associated co-author. Standard sign-off
So the Co-developed-by comes before Signed-off-by.
> procedure applies, i.e. the ordering of Signed-off-by: tags should reflect the
> chronological history of the patch insofar as possible, regardless of whether
> the author is attributed via From: or Co-developed-by:. Notably, the last
> Signed-off-by: must always be that of the developer submitting the patch.
> "
>
> The SOB order here is also not correct.

It looks like the order below is correct. I saw the same order in the example
in the official document as well.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.15-rc1#n516

>> Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>>
>> For Yunjeong, the following can be added.
>>
>> Tested-by: Yunjeong Mun <yunjeong.mun@sk.com>
>
> That is probably the right thing to do if contribution was focused on testing.
>
>>
>> However, this patch series is already in Andrew's mm-new so I don't want to
>> bother him again unless we need to update this patches for other reasons.
>
> mm-new is exactly for these kind of things. We can ask Andrew to fix it up.

Rakie has already asked him, and will update the sign-off tags in the next spin.

Thanks very much for your help!

Thanks,
Honggyu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-10 13:25 ` Honggyu Kim
@ 2025-04-10 13:41 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-04-10 13:41 UTC (permalink / raw)
To: Honggyu Kim, Rakie Kim, akpm
Cc: kernel_team, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, Jonathan.Cameron,
osalvador, yunjeong.mun
On 10.04.25 15:25, Honggyu Kim wrote:
> Hi David,
>
> On 4/9/2025 8:52 PM, David Hildenbrand wrote:
>> On 09.04.25 13:39, Honggyu Kim wrote:
>>> Hi David,
>>>
>>> On 4/9/2025 6:05 PM, David Hildenbrand wrote:
>>>> On 08.04.25 09:32, Rakie Kim wrote:
> [...snip...]
>>>>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>>>>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>>>>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>>>>
>>>> Why are the other SOF in there? Are there Co-developed-by missing?
>>>
>>> I initially found the problem and fixed it with my internal implementation but
>>> Rakie also had his idea so he started working on it. His initial implementation
>>> has almost been similar to mine.
>>>
>>> I thought Signed-off-by is a way to express the patch series contains our
>>> contribution, but if you think it's unusual, then I can add this.
>>
>> Please see Documentation/process/submitting-patches.rst,
>
> Thanks for the info.
>
>> and note that these are not "patch delivery" SOB.
>>
>> "
>> The Signed-off-by: tag indicates that the signer was involved in the
>> development of the patch, or that he/she was in the patch's delivery path.
>
> Yunjeong and I have been involved in finding the problem and also concluded this
> issue is related to hotplug together with our initial implementations before
> this patch. So I guess it is the former case.

IIRC, usually we use Co-developed-by + SOB only if there are actual code
contributions: when you would consider someone a "co-author".

"Co-developed-by: denotes authorship"

For suggestions we use Suggested-by, and for things that popped up
during a review, it's usually a good idea that reviewers supply a
Reviewed-by at the end.

So I guess Co-developed-by + SOB is appropriate if people consider
themselves co-authors, in addition to the main author.

>
>> "
>>
>> and
>>
>> "
>> Co-developed-by: states that the patch was co-created by multiple developers;
>> it is used to give attribution to co-authors (in addition to the author
>> attributed by the From: tag) when several people work on a single patch. Since
>> Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
>> followed by a Signed-off-by: of the associated co-author. Standard sign-off
>
> So the Co-developed-by comes before Signed-off-by.
Yes.
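
The two tag rules quoted above (each Co-developed-by: is immediately followed by a Signed-off-by: of the same person, and the last Signed-off-by: belongs to the submitter) can be expressed as a small check. This is only an illustrative userspace sketch: `trailers_ok()` and `tag_value()` are hypothetical helpers, not part of any kernel tooling, and the check covers just these two rules.

```c
/* Illustrative sketch only; trailers_ok() and tag_value() are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CDB "Co-developed-by: "
#define SOB "Signed-off-by: "

/* Return the identity following 'tag', or NULL if 'line' lacks that tag. */
const char *tag_value(const char *line, const char *tag)
{
	size_t n = strlen(tag);

	assert(line && tag);
	return strncmp(line, tag, n) == 0 ? line + n : NULL;
}

/*
 * Check two rules over an ordered list of trailer lines:
 *  1. every Co-developed-by: is immediately followed by a
 *     Signed-off-by: carrying the same identity;
 *  2. the last Signed-off-by: carries the submitter's identity.
 */
bool trailers_ok(const char *const *lines, size_t n, const char *submitter)
{
	const char *last_sob = NULL;

	for (size_t i = 0; i < n; i++) {
		const char *who = tag_value(lines[i], CDB);

		if (who) {
			const char *next = NULL;

			if (i + 1 < n)
				next = tag_value(lines[i + 1], SOB);
			if (!next || strcmp(who, next) != 0)
				return false;
		}

		who = tag_value(lines[i], SOB);
		if (who)
			last_sob = who;
	}

	return last_sob && strcmp(last_sob, submitter) == 0;
}
```

Under this check, the three plain Signed-off-by: lines from v7 fail rule 2 when Rakie is the submitter, because the last sign-off is not his. A block with a Co-developed-by/Signed-off-by pair for a co-author followed by the submitter's Signed-off-by: passes. That arrangement is shown only as one that satisfies the two rules, not as the tag block the next version will carry.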
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-10 8:06 ` Rakie Kim
@ 2025-04-11 3:11 ` Andrew Morton
0 siblings, 0 replies; 28+ messages in thread
From: Andrew Morton @ 2025-04-11 3:11 UTC (permalink / raw)
To: Rakie Kim
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, Dan Williams
On Thu, 10 Apr 2025 17:06:19 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> I sincerely apologize for the inconvenience. It appears that this commit still
> requires additional corrections. I would appreciate it if you could drop the
> changes you merged into -mm, mm-new branch yesterday.
No problems, it happens, glad to be of service. Dropped.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-09 3:54 ` Dan Williams
2025-04-09 5:56 ` Rakie Kim
@ 2025-04-11 7:21 ` Rakie Kim
2025-04-11 22:24 ` Dan Williams
1 sibling, 1 reply; 28+ messages in thread
From: Rakie Kim @ 2025-04-11 7:21 UTC (permalink / raw)
To: Dan Williams
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, akpm
On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> Dan Williams wrote:
> > >
> > > +struct sysfs_wi_group {
> > > + struct kobject wi_kobj;
> > > + struct iw_node_attr *nattrs[];
> > > +};
> > > +
> > > +static struct sysfs_wi_group *wi_group;
> > > +
> > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > char *buf)
> > > {
> > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > return count;
> > > }
> > >
> > > -static struct iw_node_attr **node_attrs;
> > > -
> > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > - struct kobject *parent)
> > > +static void sysfs_wi_node_delete(int nid)
> > > {
> > > - if (!node_attr)
> > > + if (!wi_group->nattrs[nid])
> > > return;
> > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > - kfree(node_attr->kobj_attr.attr.name);
> > > - kfree(node_attr);
> > > +
> > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> >
> > This still looks broken to me, but I think this is more a problem that
> > was present in the original code.
> >
> > At this point @wi_group's reference count is zero because
> > sysfs_wi_release() has been called. However, it can only be zero if it has
> > properly transitioned through kobject_del() and final kobject_put(). It
> > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > for the !parent case.
> >
> > So, either you are not triggering that path, or testing that path, but
> > sysfs_remove_file() of the child attributes should be happening *before*
> > sysfs_wi_release().
> >
> > Did I miss something?
>
> I think the missing change is that sysfs_wi_node_add() failures need to
> be done with a sysfs_wi_node_delete() of the added attrs *before* the
> kobject_del() of @wi_group.

Hi Dan,

Thank you for pointing out this issue.

As you suggested, I believe the most appropriate way to handle this is
to incorporate your feedback into Patch 1
(mm/mempolicy: Fix memory leaks in weighted interleave sysfs).

To ensure that sysfs_remove_file() is called before kobject_del(), I
have restructured the code as follows:

<Previously>
static void sysfs_wi_release(struct kobject *wi_kobj)
{
	int nid;

	for (nid = 0; nid < nr_node_ids; nid++)
		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
		-> ERROR: sysfs_remove_file called here
	kfree(node_attrs);
	kfree(wi_kobj);
}

<Now>
static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
{
	int nid;

	for (nid = 0; nid < nr_node_ids; nid++)
		sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
		-> sysfs_remove_file called here
}

static void sysfs_wi_release(struct kobject *wi_kobj)
{
	kfree(node_attrs);
	kfree(wi_kobj);
}

In addition, I call sysfs_wi_node_delete_all() before kobject_del()
during error handling:

+err_cleanup_kobj:
+	sysfs_wi_node_delete_all(wi_kobj);
 	kobject_del(wi_kobj);

I believe this resolves the issue you raised.

That said, I have a follow-up question. With this structure, when the
system is shutting down, sysfs_remove_file() will not be called. Based
on my review of other kernel subsystems, it seems that sysfs_remove_file()
is only called during module_exit() in driver code, and not in other
built-in subsystems.

Is this an acceptable practice? If you happen to know the expected
behavior in such cases, I would appreciate your insights.

Below is the full content of the updated Patch 1.

@@ -3463,8 +3463,8 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
static struct iw_node_attr **node_attrs;
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
- struct kobject *parent)
+static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
+ struct kobject *parent)
{
if (!node_attr)
return;
@@ -3473,13 +3473,16 @@ static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
kfree(node_attr);
}
-static void sysfs_wi_release(struct kobject *wi_kobj)
+static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
{
- int i;
+ int nid;
- for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
+ for (nid = 0; nid < nr_node_ids; nid++)
+ sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
+}
+static void sysfs_wi_release(struct kobject *wi_kobj)
+{
kfree(node_attrs);
kfree(wi_kobj);
}
@@ -3547,13 +3550,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- goto err_del_kobj;
+ goto err_cleanup_kobj;
}
}
return 0;
-err_del_kobj:
+err_cleanup_kobj:
+ sysfs_wi_node_delete_all(wi_kobj);
kobject_del(wi_kobj);
err_put_kobj:
kobject_put(wi_kobj);
Thank you again for your helpful feedback.
Rakie
* Re: [PATCH v7 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
2025-04-11 7:21 ` Rakie Kim
@ 2025-04-11 22:24 ` Dan Williams
0 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2025-04-11 22:24 UTC (permalink / raw)
To: Rakie Kim, Dan Williams
Cc: gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
ying.huang, david, Jonathan.Cameron, osalvador, kernel_team,
honggyu.kim, yunjeong.mun, rakie.kim, akpm
Rakie Kim wrote:
> On Tue, 8 Apr 2025 20:54:48 -0700 Dan Williams <dan.j.williams@intel.com> wrote:
> > Dan Williams wrote:
> > > >
> > > > +struct sysfs_wi_group {
> > > > + struct kobject wi_kobj;
> > > > + struct iw_node_attr *nattrs[];
> > > > +};
> > > > +
> > > > +static struct sysfs_wi_group *wi_group;
> > > > +
> > > > static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> > > > char *buf)
> > > > {
> > > > @@ -3461,27 +3468,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> > > > return count;
> > > > }
> > > >
> > > > -static struct iw_node_attr **node_attrs;
> > > > -
> > > > -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> > > > - struct kobject *parent)
> > > > +static void sysfs_wi_node_delete(int nid)
> > > > {
> > > > - if (!node_attr)
> > > > + if (!wi_group->nattrs[nid])
> > > > return;
> > > > - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> > > > - kfree(node_attr->kobj_attr.attr.name);
> > > > - kfree(node_attr);
> > > > +
> > > > + sysfs_remove_file(&wi_group->wi_kobj,
> > > > + &wi_group->nattrs[nid]->kobj_attr.attr);
> > >
> > > This still looks broken to me, but I think this is more a problem that
> > > was present in the original code.
> > >
> > > At this point @wi_group's reference count is zero because
> > > sysfs_wi_release() has been called. However, it can only be zero if it has
> > > properly transitioned through kobject_del() and final kobject_put(). It
> > > follows that kobject_del() arranges for kobj->sd to be NULL. That means
> > > that this *should* be hitting the WARN() in kernfs_remove_by_name_ns()
> > > for the !parent case.
> > >
> > > So, either you are not triggering that path, or testing that path, but
> > > sysfs_remove_file() of the child attributes should be happening *before*
> > > sysfs_wi_release().
> > >
> > > Did I miss something?
> >
> > I think the missing change is that sysfs_wi_node_add() failures need to
> > be unwound with a sysfs_wi_node_delete() of the added attrs *before* the
> > kobject_del() of @wi_group.
>
> Hi Dan,
>
> Thank you for pointing out this issue.
>
> As you suggested, I believe the most appropriate way to handle this is
> to incorporate your feedback into Patch 1
> (mm/mempolicy: Fix memory leaks in weighted interleave sysfs).
>
> To ensure that sysfs_remove_file() is called before kobject_del(), I
> have restructured the code as follows:
>
> <Previously>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> int nid;
>
> for (nid = 0; nid < nr_node_ids; nid++)
> sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
> -> ERROR: sysfs_remove_file called here
> kfree(node_attrs);
> kfree(wi_kobj);
> }
>
> <Now>
> static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
> {
> int nid;
>
> for (nid = 0; nid < nr_node_ids; nid++)
> sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
At this point the nodes were live, which means userspace could have
triggered an iw_table update. So I would expect this function to free
the iw_table once all node files have been deleted.
> -> sysfs_remove_file called here
Call iw_table_free() after the loop, where that is something like below
(untested!):
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..88538f23c7d4 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3430,6 +3430,28 @@ static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
return sysfs_emit(buf, "%d\n", weight);
}
+static void iw_table_install(u8 *new, struct iw_node_attr *node_attr, u8 weight)
+{
+ u8 *old;
+
+ mutex_lock(&iw_table_lock);
+ old = rcu_dereference_protected(iw_table,
+ lockdep_is_held(&iw_table_lock));
+ if (old && new)
+ memcpy(new, old, nr_node_ids);
+ if (new)
+ new[node_attr->nid] = weight;
+ rcu_assign_pointer(iw_table, new);
+ mutex_unlock(&iw_table_lock);
+ synchronize_rcu();
+ kfree(old);
+}
+
+static void iw_table_free(void)
+{
+ iw_table_install(NULL, NULL, 0);
+}
+
static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
const char *buf, size_t count)
{
@@ -3447,17 +3469,8 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
new = kzalloc(nr_node_ids, GFP_KERNEL);
if (!new)
return -ENOMEM;
+ iw_table_install(new, node_attr, weight);
- mutex_lock(&iw_table_lock);
- old = rcu_dereference_protected(iw_table,
- lockdep_is_held(&iw_table_lock));
- if (old)
- memcpy(new, old, nr_node_ids);
- new[node_attr->nid] = weight;
- rcu_assign_pointer(iw_table, new);
- mutex_unlock(&iw_table_lock);
- synchronize_rcu();
- kfree(old);
return count;
}
@@ -3550,15 +3563,6 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
static void mempolicy_kobj_release(struct kobject *kobj)
{
- u8 *old;
-
- mutex_lock(&iw_table_lock);
- old = rcu_dereference_protected(iw_table,
- lockdep_is_held(&iw_table_lock));
- rcu_assign_pointer(iw_table, NULL);
- mutex_unlock(&iw_table_lock);
- synchronize_rcu();
- kfree(old);
kfree(node_attrs);
kfree(kobj);
}
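For readers who want to poke at the copy/update/publish sequence outside
the kernel, here is a rough userspace model of what the refactoring above
does. It is only a sketch: table_install(), table_set(), table_get(),
table_free(), NR_NODES and wait_for_readers() are hypothetical names, and
wait_for_readers() is an empty placeholder for synchronize_rcu(), so the
model is only safe single-threaded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_NODES 4		/* assumed stand-in for nr_node_ids */

static uint8_t *table;		/* stand-in for iw_table */

/* Placeholder for synchronize_rcu(); a real grace period is NOT modelled. */
static void wait_for_readers(void)
{
}

/*
 * Publish @new in place of the current table.  A NULL @new tears the
 * table down, mirroring how iw_table_free() reuses the install helper.
 */
void table_install(uint8_t *new, int nid, uint8_t weight)
{
	uint8_t *old = table;

	if (old && new)
		memcpy(new, old, NR_NODES);	/* carry existing weights over */
	if (new)
		new[nid] = weight;		/* apply the single update */
	table = new;				/* publish */
	wait_for_readers();
	free(old);				/* old copy is now unreachable */
}

/* Allocate a fresh copy and install it; returns 0 or -1 on allocation failure. */
int table_set(int nid, uint8_t weight)
{
	uint8_t *new = calloc(NR_NODES, 1);

	if (!new)
		return -1;
	table_install(new, nid, weight);
	return 0;
}

void table_free(void)
{
	table_install(NULL, 0, 0);
}

/* Returns the stored weight, or -1 when no table is installed. */
int table_get(int nid)
{
	return table ? table[nid] : -1;
}
```

Calling table_set() twice for different nodes keeps the first weight,
because each update starts from a copy of the previous table, and
table_free() leaves no table behind.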
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> kfree(node_attrs);
> kfree(wi_kobj);
> }
>
> In addition, I call sysfs_wi_node_delete_all() before kobject_del()
> during error handling:
>
> +err_cleanup_kobj:
> + sysfs_wi_node_delete_all(wi_kobj);
> kobject_del(wi_kobj);
>
> I believe this resolves the issue you raised.
Yes, along with the iw_table_free() change. While it is not a leak, it
is awkward that mempolicy_kobj_release() keeps iw_table allocated long
after the node attributes have been deleted and shut down in sysfs.
> That said, I have a follow-up question. With this structure, when the
> system is shutting down, sysfs_remove_file() will not be called. Based
> on my review of other kernel subsystems, it seems that sysfs_remove_file()
> is only called during module_exit() in driver code, and not in other
> built-in subsystems.
Correct.
> Is this an acceptable practice? If you happen to know the expected
> behavior in such cases, I would appreciate your insights.
Yes, there are plenty of examples of sysfs infrastructure that gets set
up, but never torn down for the life of the kernel. The goal here is to
make the error unwind path correct and make the code clean for potentially
deleting mempolicy_kobj infrastructure in the future, but it is
otherwise ok if the only path that calls kobject_del() for an object is
the error unwind path.
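To make the ordering concrete, here is a small userspace model (not
kernel code) of the unwind sequence discussed in this thread. All names
are hypothetical stand-ins: model_kobj for a kobject, model_remove_file()
for sysfs_remove_file(), model_del() for kobject_del() and model_put()
for kobject_put(). The only property it models is that removing a file
after the directory is gone is an error, which it counts as a warning.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_FILES 2

/* Hypothetical stand-in for a kobject that owns attribute files. */
struct model_kobj {
	bool in_sysfs;			/* models kobj->sd being non-NULL */
	bool file[MODEL_NR_FILES];	/* attribute files still present */
	int refcount;
	int warnings;			/* removals attempted too late */
};

static void model_init(struct model_kobj *k)
{
	int i;

	k->in_sysfs = true;
	k->refcount = 1;
	k->warnings = 0;
	for (i = 0; i < MODEL_NR_FILES; i++)
		k->file[i] = true;
}

/* Only valid while the directory still exists. */
static void model_remove_file(struct model_kobj *k, int idx)
{
	if (!k->in_sysfs) {
		k->warnings++;
		return;
	}
	k->file[idx] = false;
}

static void model_remove_all(struct model_kobj *k)
{
	int i;

	for (i = 0; i < MODEL_NR_FILES; i++)
		model_remove_file(k, i);
}

/* Drops the directory. */
static void model_del(struct model_kobj *k)
{
	k->in_sysfs = false;
}

/* Returns true when the last reference is gone and release would run. */
static bool model_put(struct model_kobj *k)
{
	return --k->refcount == 0;
}

/* Order agreed above: remove files, then del, then put. */
int unwind_files_first(void)
{
	struct model_kobj k;

	model_init(&k);
	model_remove_all(&k);
	model_del(&k);
	model_put(&k);
	return k.warnings;
}

/* Old order: files are removed from the release path, after del. */
int unwind_files_in_release(void)
{
	struct model_kobj k;

	model_init(&k);
	model_del(&k);
	if (model_put(&k))
		model_remove_all(&k);
	return k.warnings;
}
```

In this model unwind_files_first() returns 0 warnings, while
unwind_files_in_release() returns one warning per file.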
>
> Below is the full content of the updated Patch 1.
> @@ -3463,8 +3463,8 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>
> static struct iw_node_attr **node_attrs;
>
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
> + struct kobject *parent)
> {
> if (!node_attr)
> return;
> @@ -3473,13 +3473,16 @@ static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> kfree(node_attr);
> }
>
> -static void sysfs_wi_release(struct kobject *wi_kobj)
> +static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
> {
> - int i;
> + int nid;
>
> - for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + for (nid = 0; nid < nr_node_ids; nid++)
> + sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
> +}
>
> +static void sysfs_wi_release(struct kobject *wi_kobj)
> +{
> kfree(node_attrs);
> kfree(wi_kobj);
> }
> @@ -3547,13 +3550,14 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> err = add_weight_node(nid, wi_kobj);
> if (err) {
> pr_err("failed to add sysfs [node%d]\n", nid);
> - goto err_del_kobj;
> + goto err_cleanup_kobj;
> }
> }
>
> return 0;
>
> -err_del_kobj:
> +err_cleanup_kobj:
> + sysfs_wi_node_delete_all(wi_kobj);
> kobject_del(wi_kobj);
> err_put_kobj:
> kobject_put(wi_kobj);
>
> Thank you again for your helpful feedback.
Hey, thanks for the patience to get this all fixed up properly.
* Re: [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
2025-04-08 7:32 ` [PATCH v7 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-04-08 13:45 ` Joshua Hahn
@ 2025-04-15 15:41 ` Jonathan Cameron
1 sibling, 0 replies; 28+ messages in thread
From: Jonathan Cameron @ 2025-04-15 15:41 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
On Tue, 8 Apr 2025 16:32:40 +0900
Rakie Kim <rakie.kim@sk.com> wrote:
> Memory leaks occurred when removing sysfs attributes for weighted
> interleave. Improper kobject deallocation led to unreleased memory
> when initialization failed or when nodes were removed.
>
> This patch resolves the issue by replacing unnecessary `kfree()`
> calls with proper `kobject_del()` and `kobject_put()` sequences,
> ensuring correct teardown and preventing memory leaks.
>
> By explicitly calling `kobject_del()` before `kobject_put()`,
> the release function is now invoked safely, and internal sysfs
> state is correctly cleaned up. This guarantees that the memory
> associated with the kobject is fully released and avoids
> resource leaks, thereby improving system stability.
>
> Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>
LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-08 7:32 ` [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
` (2 preceding siblings ...)
2025-04-09 9:05 ` David Hildenbrand
@ 2025-04-15 16:00 ` Jonathan Cameron
2025-04-16 4:04 ` Honggyu Kim
3 siblings, 1 reply; 28+ messages in thread
From: Jonathan Cameron @ 2025-04-15 16:00 UTC (permalink / raw)
To: Rakie Kim
Cc: akpm, gourry, linux-mm, linux-kernel, linux-cxl, joshua.hahnjy,
dan.j.williams, ying.huang, david, osalvador, kernel_team,
honggyu.kim, yunjeong.mun
On Tue, 8 Apr 2025 16:32:42 +0900
Rakie Kim <rakie.kim@sk.com> wrote:
> The weighted interleave policy distributes page allocations across multiple
> NUMA nodes based on their performance weight, thereby improving memory
> bandwidth utilization. The weight values for each node are configured
> through sysfs.
>
> Previously, sysfs entries for configuring weighted interleave were created
> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
> might not have memory. However, not all nodes in N_POSSIBLE are usable at
> runtime, as some may remain memoryless or offline.
> This led to sysfs entries being created for unusable nodes, causing
> potential misconfiguration issues.
>
> To address this issue, this patch modifies the sysfs creation logic to:
> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
> the creation of sysfs entries for nodes that cannot be used.
> 2) Support memory hotplug by dynamically adding and removing sysfs entries
> based on whether a node transitions into or out of the N_MEMORY state.
>
> Additionally, the patch ensures that sysfs attributes are properly managed
> when nodes go offline, preventing stale or redundant entries from persisting
> in the system.
>
> By making these changes, the weighted interleave policy now manages its
> sysfs entries more efficiently, ensuring that only relevant nodes are
> considered for interleaving, and dynamically adapting to memory hotplug
> events.
>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> ---
> mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 83 insertions(+), 23 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 988575f29c53..9aa884107f4c 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -113,6 +113,7 @@
> #include <asm/tlbflush.h>
> #include <asm/tlb.h>
> #include <linux/uaccess.h>
> +#include <linux/memory.h>
>
> #include "internal.h"
>
> @@ -3421,6 +3422,7 @@ struct iw_node_attr {
>
> struct sysfs_wi_group {
> struct kobject wi_kobj;
> + struct mutex kobj_lock;
> struct iw_node_attr *nattrs[];
> };
>
> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>
> static void sysfs_wi_node_delete(int nid)
> {
> - if (!wi_group->nattrs[nid])
> + struct iw_node_attr *attr;
> +
> + if (nid < 0 || nid >= nr_node_ids)
> + return;
> +
> + mutex_lock(&wi_group->kobj_lock);
> + attr = wi_group->nattrs[nid];
> + if (!attr) {
> + mutex_unlock(&wi_group->kobj_lock);
> return;
> + }
> +
> + wi_group->nattrs[nid] = NULL;
> + mutex_unlock(&wi_group->kobj_lock);
>
> - sysfs_remove_file(&wi_group->wi_kobj,
> - &wi_group->nattrs[nid]->kobj_attr.attr);
> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> - kfree(wi_group->nattrs[nid]);
> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> + kfree(attr->kobj_attr.attr.name);
> + kfree(attr);
Here you go through a careful dance to not touch wi_group->nattrs[nid]
except under the lock, but later you are happy to do so in the
error handling paths. Maybe better to do similar to here and
set it to NULL under the lock but do the freeing on a copy taken
under that lock.
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>
> static int sysfs_wi_node_add(int nid)
> {
> - struct iw_node_attr *node_attr;
> + int ret = 0;
Trivial but isn't ret always set when it is used? So no need to initialize
here.
> char *name;
> + struct iw_node_attr *new_attr = NULL;
This is also always set before use so I'm not seeing a
reason to initialize it to NULL.
>
> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> - if (!node_attr)
> + if (nid < 0 || nid >= nr_node_ids) {
> + pr_err("Invalid node id: %d\n", nid);
> + return -EINVAL;
> + }
> +
> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
I'd prefer sizeof(*new_attr) because I'm lazy and don't like checking
types for allocation sizes :) Local style seems to be a bit
of a mix though.
> + if (!new_attr)
> return -ENOMEM;
>
> name = kasprintf(GFP_KERNEL, "node%d", nid);
> if (!name) {
> - kfree(node_attr);
> + kfree(new_attr);
> return -ENOMEM;
> }
>
> - sysfs_attr_init(&node_attr->kobj_attr.attr);
> - node_attr->kobj_attr.attr.name = name;
> - node_attr->kobj_attr.attr.mode = 0644;
> - node_attr->kobj_attr.show = node_show;
> - node_attr->kobj_attr.store = node_store;
> - node_attr->nid = nid;
> + mutex_lock(&wi_group->kobj_lock);
> + if (wi_group->nattrs[nid]) {
> + mutex_unlock(&wi_group->kobj_lock);
> + pr_info("Node [%d] already exists\n", nid);
> + kfree(new_attr);
> + kfree(name);
> + return 0;
> + }
> + wi_group->nattrs[nid] = new_attr;
>
> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> - pr_err("failed to add attribute to weighted_interleave\n");
> - return -ENOMEM;
> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
I'd have been tempted to use the new_attr pointer but perhaps
this brings some documentation like advantages.
> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
> + wi_group->nattrs[nid]->nid = nid;
> +
> + ret = sysfs_create_file(&wi_group->wi_kobj,
> + &wi_group->nattrs[nid]->kobj_attr.attr);
> + if (ret) {
> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
See the comment above on how differently this is handled from
sysfs_wi_node_delete(), where you set it to NULL first, release the lock
and then tidy up. new_attr and name are still set, so you could even
combine the handling with the if (wi_group->nattrs[nid]) case above via
appropriate gotos.
> + kfree(wi_group->nattrs[nid]);
> + wi_group->nattrs[nid] = NULL;
> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
> }
> + mutex_unlock(&wi_group->kobj_lock);
>
> - wi_group->nattrs[nid] = node_attr;
> - return 0;
> + return ret;
> +}
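As a rough illustration of the two suggestions above (touch the shared
slot only under the lock, and funnel the "already exists" and "create
failed" cases through one goto-based cleanup), here is a userspace sketch.
It is not the v8 code: slot_attr, slots[], slot_add(), slot_delete(),
create_file() and fail_create are hypothetical stand-ins, with a pthread
mutex in place of kobj_lock and create_file() in place of
sysfs_create_file().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_SLOTS 4		/* assumed stand-in for nr_node_ids */

struct slot_attr {		/* simplified stand-in for iw_node_attr */
	char *name;
	int nid;
};

static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;
static struct slot_attr *slots[NR_SLOTS];
int fail_create;		/* test knob: force create_file() to fail */

/* Hypothetical stand-in for sysfs_create_file(). */
static int create_file(const struct slot_attr *attr)
{
	(void)attr;
	return fail_create ? -ENOMEM : 0;
}

int slot_add(int nid)
{
	struct slot_attr *new_attr;
	int ret = 0;		/* stays 0 on the "already exists" path */

	if (nid < 0 || nid >= NR_SLOTS)
		return -EINVAL;

	new_attr = calloc(1, sizeof(*new_attr));
	if (!new_attr)
		return -ENOMEM;

	new_attr->name = malloc(16);
	if (!new_attr->name) {
		free(new_attr);
		return -ENOMEM;
	}
	snprintf(new_attr->name, 16, "node%d", nid);
	new_attr->nid = nid;

	pthread_mutex_lock(&slots_lock);
	if (slots[nid])
		goto out_unlock;	/* duplicate: drop our private copy */

	ret = create_file(new_attr);
	if (ret)
		goto out_unlock;

	slots[nid] = new_attr;		/* publish only when fully set up */
	new_attr = NULL;		/* ownership moved to the table */

out_unlock:
	pthread_mutex_unlock(&slots_lock);
	if (new_attr) {			/* single cleanup for both early exits */
		free(new_attr->name);
		free(new_attr);
	}
	return ret;
}

/* Returns 1 if an entry was removed, 0 if the slot was empty or invalid. */
int slot_delete(int nid)
{
	struct slot_attr *attr;

	if (nid < 0 || nid >= NR_SLOTS)
		return 0;

	pthread_mutex_lock(&slots_lock);
	attr = slots[nid];		/* take a private copy under the lock */
	slots[nid] = NULL;
	pthread_mutex_unlock(&slots_lock);

	if (!attr)
		return 0;
	free(attr->name);		/* tear down outside the lock */
	free(attr);
	return 1;
}
```

Note that in this shape the shared slot is never written until the entry
is complete, so the error path has nothing to reset.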
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-15 16:00 ` Jonathan Cameron
@ 2025-04-16 4:04 ` Honggyu Kim
2025-04-16 7:37 ` Honggyu Kim
2025-04-16 7:49 ` Rakie Kim
0 siblings, 2 replies; 28+ messages in thread
From: Honggyu Kim @ 2025-04-16 4:04 UTC (permalink / raw)
To: Jonathan Cameron, Rakie Kim
Cc: kernel_team, akpm, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, david, osalvador,
yunjeong.mun
Hi Jonathan,
Thanks for reviewing our patches.
I have a few comments and the rest will be addressed by Rakie.
On 4/16/2025 1:00 AM, Jonathan Cameron wrote:
> On Tue, 8 Apr 2025 16:32:42 +0900
> Rakie Kim <rakie.kim@sk.com> wrote:
>
>> The weighted interleave policy distributes page allocations across multiple
>> NUMA nodes based on their performance weight, thereby improving memory
>> bandwidth utilization. The weight values for each node are configured
>> through sysfs.
>>
>> Previously, sysfs entries for configuring weighted interleave were created
>> for all possible nodes (N_POSSIBLE) at initialization, including nodes that
>> might not have memory. However, not all nodes in N_POSSIBLE are usable at
>> runtime, as some may remain memoryless or offline.
>> This led to sysfs entries being created for unusable nodes, causing
>> potential misconfiguration issues.
>>
>> To address this issue, this patch modifies the sysfs creation logic to:
>> 1) Limit sysfs entries to nodes that are online and have memory, avoiding
>> the creation of sysfs entries for nodes that cannot be used.
>> 2) Support memory hotplug by dynamically adding and removing sysfs entries
>> based on whether a node transitions into or out of the N_MEMORY state.
>>
>> Additionally, the patch ensures that sysfs attributes are properly managed
>> when nodes go offline, preventing stale or redundant entries from persisting
>> in the system.
>>
>> By making these changes, the weighted interleave policy now manages its
>> sysfs entries more efficiently, ensuring that only relevant nodes are
>> considered for interleaving, and dynamically adapting to memory hotplug
>> events.
>>
>> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
>> Signed-off-by: Yunjeong Mun <yunjeong.mun@sk.com>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>> ---
>> mm/mempolicy.c | 106 ++++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 83 insertions(+), 23 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 988575f29c53..9aa884107f4c 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -113,6 +113,7 @@
>> #include <asm/tlbflush.h>
>> #include <asm/tlb.h>
>> #include <linux/uaccess.h>
>> +#include <linux/memory.h>
>>
>> #include "internal.h"
>>
>> @@ -3421,6 +3422,7 @@ struct iw_node_attr {
>>
>> struct sysfs_wi_group {
>> struct kobject wi_kobj;
>> + struct mutex kobj_lock;
>> struct iw_node_attr *nattrs[];
>> };
>>
>> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>>
>> static void sysfs_wi_node_delete(int nid)
>> {
>> - if (!wi_group->nattrs[nid])
>> + struct iw_node_attr *attr;
>> +
>> + if (nid < 0 || nid >= nr_node_ids)
>> + return;
>> +
>> + mutex_lock(&wi_group->kobj_lock);
>> + attr = wi_group->nattrs[nid];
>> + if (!attr) {
>> + mutex_unlock(&wi_group->kobj_lock);
>> return;
>> + }
>> +
>> + wi_group->nattrs[nid] = NULL;
>> + mutex_unlock(&wi_group->kobj_lock);
>>
>> - sysfs_remove_file(&wi_group->wi_kobj,
>> - &wi_group->nattrs[nid]->kobj_attr.attr);
>> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>> - kfree(wi_group->nattrs[nid]);
>> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
>> + kfree(attr->kobj_attr.attr.name);
>> + kfree(attr);
> Here you go through a careful dance to not touch wi_group->nattrs[nid]
> except under the lock, but later you are happy to do so in the
> error handling paths. Maybe better to do similar to here and
> set it to NULL under the lock but do the freeing on a copy taken
> under that lock.
> .
>> }
>>
>> static void sysfs_wi_release(struct kobject *wi_kobj)
>> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>>
>> static int sysfs_wi_node_add(int nid)
>> {
>> - struct iw_node_attr *node_attr;
>> + int ret = 0;
>
> Trivial but isn't ret always set when it is used? So no need to initialize
> here.
If we don't initialize it, then this kind of trivial fixup might be needed later
so I think there is no reason not to initialize it.
https://lore.kernel.org/mm-commits/20240705010631.46743C4AF07@smtp.kernel.org
>
>> char *name;
>> + struct iw_node_attr *new_attr = NULL;
>
> This is also always set before use so I'm not seeing a
> reason to initialize it to NULL.
Ditto.
>
>
>>
>> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
>> - if (!node_attr)
>> + if (nid < 0 || nid >= nr_node_ids) {
>> + pr_err("Invalid node id: %d\n", nid);
>> + return -EINVAL;
>> + }
>> +
>> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
>
> I'd prefer sizeof(*new_attr) because I'm lazy and don't like checking
> types for allocation sizes :) Local style seems to be a bit
> of a mix though.
Agreed.
>
>> + if (!new_attr)
>> return -ENOMEM;
>>
>> name = kasprintf(GFP_KERNEL, "node%d", nid);
>> if (!name) {
>> - kfree(node_attr);
>> + kfree(new_attr);
>> return -ENOMEM;
>> }
>>
>> - sysfs_attr_init(&node_attr->kobj_attr.attr);
>> - node_attr->kobj_attr.attr.name = name;
>> - node_attr->kobj_attr.attr.mode = 0644;
>> - node_attr->kobj_attr.show = node_show;
>> - node_attr->kobj_attr.store = node_store;
>> - node_attr->nid = nid;
>> + mutex_lock(&wi_group->kobj_lock);
>> + if (wi_group->nattrs[nid]) {
>> + mutex_unlock(&wi_group->kobj_lock);
>> + pr_info("Node [%d] already exists\n", nid);
>> + kfree(new_attr);
>> + kfree(name);
>> + return 0;
>> + }
>> + wi_group->nattrs[nid] = new_attr;
This assignment can be done after all the "wi_group->nattrs[nid]" related setup is done.
>>
>> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
>> - kfree(node_attr->kobj_attr.attr.name);
>> - kfree(node_attr);
>> - pr_err("failed to add attribute to weighted_interleave\n");
>> - return -ENOMEM;
>> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
>
> I'd have been tempted to use the new_attr pointer but perhaps
> this brings some documentation like advantages.
+1
>
>> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
>> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
>> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
>> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
>> + wi_group->nattrs[nid]->nid = nid;
As Jonathan mentioned, all the "wi_group->nattrs[nid]" uses here would be
better written as "new_attr" for simplicity.
Thanks,
Honggyu
>> +
>> + ret = sysfs_create_file(&wi_group->wi_kobj,
>> + &wi_group->nattrs[nid]->kobj_attr.attr);
>> + if (ret) {
>> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
>
> See the comment above on how differently this is handled from
> sysfs_wi_node_delete(), where you set it to NULL first, release the lock
> and then tidy up. new_attr and name are still set, so you could even
> combine the handling with the if (wi_group->nattrs[nid]) case above via
> appropriate gotos.
>
>> + kfree(wi_group->nattrs[nid]);
>> + wi_group->nattrs[nid] = NULL;
>> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
>> }
>> + mutex_unlock(&wi_group->kobj_lock);
>>
>> - wi_group->nattrs[nid] = node_attr;
>> - return 0;
>> + return ret;
>> +}
>
>
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-16 4:04 ` Honggyu Kim
@ 2025-04-16 7:37 ` Honggyu Kim
2025-04-16 7:49 ` Rakie Kim
1 sibling, 0 replies; 28+ messages in thread
From: Honggyu Kim @ 2025-04-16 7:37 UTC (permalink / raw)
To: Jonathan Cameron, Rakie Kim
Cc: kernel_team, akpm, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, david, osalvador,
yunjeong.mun
On 4/16/2025 1:04 PM, Honggyu Kim wrote:
> Hi Jonathan,
>
> Thanks for reviewing our patches.
>
> I have a few comments and the rest will be addressed by Rakie.
>
> On 4/16/2025 1:00 AM, Jonathan Cameron wrote:
>> On Tue, 8 Apr 2025 16:32:42 +0900
>> Rakie Kim <rakie.kim@sk.com> wrote:
[...snip...]
>>> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
>>> static int sysfs_wi_node_add(int nid)
>>> {
>>> - struct iw_node_attr *node_attr;
>>> + int ret = 0;
>>
>> Trivial but isn't ret always set when it is used? So no need to initialize
>> here.
>
> If we don't initialize it, then this kind of trivial fixup might be needed later
> so I think there is no reason not to initialize it.
> https://lore.kernel.org/mm-commits/20240705010631.46743C4AF07@smtp.kernel.org
Ah. This is a different case. Please ignore this.
>
>>
>>> char *name;
>>> + struct iw_node_attr *new_attr = NULL;
>>
>> This is also always set before use so I'm not seeing a
>> reason to initialize it to NULL.
>
> Ditto.
Please ignore this too.
Thanks,
Honggyu
* Re: [PATCH v7 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
2025-04-16 4:04 ` Honggyu Kim
2025-04-16 7:37 ` Honggyu Kim
@ 2025-04-16 7:49 ` Rakie Kim
1 sibling, 0 replies; 28+ messages in thread
From: Rakie Kim @ 2025-04-16 7:49 UTC (permalink / raw)
To: Honggyu Kim
Cc: kernel_team, akpm, gourry, linux-mm, linux-kernel, linux-cxl,
joshua.hahnjy, dan.j.williams, ying.huang, david, osalvador,
yunjeong.mun, Jonathan Cameron, Rakie Kim
On Wed, 16 Apr 2025 13:04:32 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
Hi Jonathan and Honggyu,
Thank you for reviewing this patch and for offering valuable ideas to
address the issues. I have accepted all of your suggestions and am
currently preparing a new patch series, version v8.
> Hi Jonathan,
>
> Thanks for reviewing our patches.
>
> I have a few comments and the rest will be addressed by Rakie.
>
> On 4/16/2025 1:00 AM, Jonathan Cameron wrote:
> > On Tue, 8 Apr 2025 16:32:42 +0900
> > Rakie Kim <rakie.kim@sk.com> wrote:
> >
> >> @@ -3470,13 +3472,24 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> >>
> >> static void sysfs_wi_node_delete(int nid)
> >> {
> >> - if (!wi_group->nattrs[nid])
> >> + struct iw_node_attr *attr;
> >> +
> >> + if (nid < 0 || nid >= nr_node_ids)
> >> + return;
> >> +
> >> + mutex_lock(&wi_group->kobj_lock);
> >> + attr = wi_group->nattrs[nid];
> >> + if (!attr) {
> >> + mutex_unlock(&wi_group->kobj_lock);
> >> return;
> >> + }
> >> +
> >> + wi_group->nattrs[nid] = NULL;
> >> + mutex_unlock(&wi_group->kobj_lock);
> >>
> >> - sysfs_remove_file(&wi_group->wi_kobj,
> >> - &wi_group->nattrs[nid]->kobj_attr.attr);
> >> - kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> >> - kfree(wi_group->nattrs[nid]);
> >> + sysfs_remove_file(&wi_group->wi_kobj, &attr->kobj_attr.attr);
> >> + kfree(attr->kobj_attr.attr.name);
> >> + kfree(attr);
> > Here you go through a careful dance to not touch wi_group->nattrs[nid]
> > except under the lock, but later you are happy to do so in the
> > error handling paths. Maybe better to do similar to here and
> > set it to NULL under the lock but do the freeing on a copy taken
> > under that lock.
I have updated the error handling path in sysfs_wi_node_add() as you
suggested.
> > .
> >> }
> >>
> >> static void sysfs_wi_release(struct kobject *wi_kobj)
> >> @@ -3495,35 +3508,77 @@ static const struct kobj_type wi_ktype = {
> >>
> >> static int sysfs_wi_node_add(int nid)
> >> {
> >> - struct iw_node_attr *node_attr;
> >> + int ret = 0;
> >
> > Trivial but isn't ret always set when it is used? So no need to initialize
> > here.
In the updated code for v8, I retained the initialization of `ret = 0`
because it is required for proper cleanup handling in the current
version.
>
> If we don't initialize it, then this kind of trivial fixup might be needed later
> so I think there is no reason not to initialize it.
> https://lore.kernel.org/mm-commits/20240705010631.46743C4AF07@smtp.kernel.org
>
> >
> >> char *name;
> >> + struct iw_node_attr *new_attr = NULL;
> >
> > This is also always set before use so I'm not seeing a
> > reason to initialize it to NULL.
>
> Ditto.
I also removed the unnecessary `= NULL` initializer for `new_attr`,
as it is always assigned before use.
>
> >
> >
> >>
> >> - node_attr = kzalloc(sizeof(*node_attr), GFP_KERNEL);
> >> - if (!node_attr)
> >> + if (nid < 0 || nid >= nr_node_ids) {
> >> + pr_err("Invalid node id: %d\n", nid);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
> >
> > I'd prefer sizeof(*new_attr) because I'm lazy and don't like checking
> > types for allocation sizes :) Local style seems to be a bit
> > of a mix though.
>
> Agreed.
As you recommended, I changed the allocation from
`sizeof(struct iw_node_attr)` to `sizeof(*new_attr)` for better
readability and consistency.
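As a small illustration of why `sizeof(*ptr)` is convenient, here is a userspace sketch. It assumes `calloc()` in place of `kzalloc()`, and `struct node_attr` and `node_attr_alloc()` are hypothetical names used only for this example.

```c
#include <assert.h>	/* only needed by callers that check results */
#include <stdlib.h>

struct node_attr {	/* hypothetical stand-in for struct iw_node_attr */
	char *name;
	int nid;
};

/* Allocate one zeroed attribute; the size follows the pointer's type. */
static struct node_attr *node_attr_alloc(int nid)
{
	struct node_attr *new_attr;

	/*
	 * sizeof(*new_attr) stays correct if the declaration above ever
	 * changes type, so nobody has to cross-check the struct name.
	 */
	new_attr = calloc(1, sizeof(*new_attr));
	if (!new_attr)
		return NULL;

	new_attr->nid = nid;
	return new_attr;
}
```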
>
> >
> >> + if (!new_attr)
> >> return -ENOMEM;
> >>
> >> name = kasprintf(GFP_KERNEL, "node%d", nid);
> >> if (!name) {
> >> - kfree(node_attr);
> >> + kfree(new_attr);
> >> return -ENOMEM;
> >> }
> >>
> >> - sysfs_attr_init(&node_attr->kobj_attr.attr);
> >> - node_attr->kobj_attr.attr.name = name;
> >> - node_attr->kobj_attr.attr.mode = 0644;
> >> - node_attr->kobj_attr.show = node_show;
> >> - node_attr->kobj_attr.store = node_store;
> >> - node_attr->nid = nid;
> >> + mutex_lock(&wi_group->kobj_lock);
> >> + if (wi_group->nattrs[nid]) {
> >> + mutex_unlock(&wi_group->kobj_lock);
> >> + pr_info("Node [%d] already exists\n", nid);
> >> + kfree(new_attr);
> >> + kfree(name);
> >> + return 0;
> >> + }
> >> + wi_group->nattrs[nid] = new_attr;
>
> This assignment can be done after all the "wi_group->nattrs[nid]" related setup is done.
>
> >>
> >> - if (sysfs_create_file(&wi_group->wi_kobj, &node_attr->kobj_attr.attr)) {
> >> - kfree(node_attr->kobj_attr.attr.name);
> >> - kfree(node_attr);
> >> - pr_err("failed to add attribute to weighted_interleave\n");
> >> - return -ENOMEM;
> >> + sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
> >
> > I'd have been tempted to use the new_attr pointer but perhaps
> > this brings some documentation like advantages.
>
> +1
Additionally, I replaced every use of `wi_group->nattrs[nid]` in
sysfs_wi_node_add() with the `new_attr` pointer to simplify the logic
and improve clarity. `wi_group->nattrs[nid]` is now assigned only once,
after sysfs_create_file() succeeds. This also aligns with your
suggestion to treat `new_attr` consistently throughout the function.
>
> >
> >> + wi_group->nattrs[nid]->kobj_attr.attr.name = name;
> >> + wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
> >> + wi_group->nattrs[nid]->kobj_attr.show = node_show;
> >> + wi_group->nattrs[nid]->kobj_attr.store = node_store;
> >> + wi_group->nattrs[nid]->nid = nid;
>
> As Jonathan mentioned, all the "wi_group->nattrs[nid]" here is better to be
> "new_attr" for simplicity.
>
> Thanks,
> Honggyu
>
> >> +
> >> + ret = sysfs_create_file(&wi_group->wi_kobj,
> >> + &wi_group->nattrs[nid]->kobj_attr.attr);
> >> + if (ret) {
> >> + kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
> >
> > See comment above on the rather different handling here to in
> > sysfs_wi_node_delete() where you set it to NULL first, release the lock and tidy up.
> > > new_attr and name are still set so you could even combine the handling with the
> > if (wi_group->nattrs[nid]) above via appropriate gotos.
I agree with your observation regarding the difference in error
handling between sysfs_wi_node_add() and sysfs_wi_node_delete(), so I
refactored sysfs_wi_node_add() to follow the same structure.
I will apply all of these updates in the new v8 series. Thank you
again for your thoughtful and detailed feedback.
Below is the revised code after incorporating your feedback.
Rakie
@@ -3532,14 +3532,14 @@ static int sysfs_wi_node_add(int nid)
{
int ret = 0;
char *name;
- struct iw_node_attr *new_attr = NULL;
+ struct iw_node_attr *new_attr;
if (nid < 0 || nid >= nr_node_ids) {
- pr_err("Invalid node id: %d\n", nid);
+ pr_err("invalid node id: %d\n", nid);
return -EINVAL;
}
- new_attr = kzalloc(sizeof(struct iw_node_attr), GFP_KERNEL);
+ new_attr = kzalloc(sizeof(*new_attr), GFP_KERNEL);
if (!new_attr)
return -ENOMEM;
@@ -3549,33 +3549,32 @@ static int sysfs_wi_node_add(int nid)
return -ENOMEM;
}
+ sysfs_attr_init(&new_attr->kobj_attr.attr);
+ new_attr->kobj_attr.attr.name = name;
+ new_attr->kobj_attr.attr.mode = 0644;
+ new_attr->kobj_attr.show = node_show;
+ new_attr->kobj_attr.store = node_store;
+ new_attr->nid = nid;
+
mutex_lock(&wi_group->kobj_lock);
if (wi_group->nattrs[nid]) {
mutex_unlock(&wi_group->kobj_lock);
- pr_info("Node [%d] already exists\n", nid);
- kfree(new_attr);
- kfree(name);
- return 0;
+ pr_info("node%d already exists\n", nid);
+ goto out;
}
- wi_group->nattrs[nid] = new_attr;
-
- sysfs_attr_init(&wi_group->nattrs[nid]->kobj_attr.attr);
- wi_group->nattrs[nid]->kobj_attr.attr.name = name;
- wi_group->nattrs[nid]->kobj_attr.attr.mode = 0644;
- wi_group->nattrs[nid]->kobj_attr.show = node_show;
- wi_group->nattrs[nid]->kobj_attr.store = node_store;
- wi_group->nattrs[nid]->nid = nid;
- ret = sysfs_create_file(&wi_group->wi_kobj,
- &wi_group->nattrs[nid]->kobj_attr.attr);
+ ret = sysfs_create_file(&wi_group->wi_kobj, &new_attr->kobj_attr.attr);
if (ret) {
- kfree(wi_group->nattrs[nid]->kobj_attr.attr.name);
- kfree(wi_group->nattrs[nid]);
- wi_group->nattrs[nid] = NULL;
- pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
+ mutex_unlock(&wi_group->kobj_lock);
+ goto out;
}
+ wi_group->nattrs[nid] = new_attr;
mutex_unlock(&wi_group->kobj_lock);
+ return 0;
+out:
+ kfree(new_attr->kobj_attr.attr.name);
+ kfree(new_attr);
return ret;
}
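The control flow of the revised function can also be tried out in userspace. The sketch below is not the kernel code: it assumes a pthread mutex in place of `kobj_lock` and `calloc()`/`snprintf()` in place of `kzalloc()`/`kasprintf()`. `register_attr()` is a hypothetical stand-in for sysfs_create_file(), and `fail_register` is a hypothetical hook for forcing the failure path.

```c
#include <assert.h>	/* only needed by callers that check results */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 4	/* hypothetical table size for this sketch */
#define NAME_LEN 32	/* large enough for "node" plus any int */

struct node_attr {	/* hypothetical stand-in for struct iw_node_attr */
	char *name;
	int nid;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node_attr *table[MAX_NODES];
static int fail_register;	/* hypothetical hook: non-zero forces failure */

/* Hypothetical stand-in for sysfs_create_file(). */
static int register_attr(struct node_attr *attr)
{
	(void)attr;
	return fail_register ? -ENOMEM : 0;
}

static int node_add(int nid)
{
	int ret = 0;	/* the "already exists" path reaches out: with 0 */
	char *name;
	struct node_attr *new_attr;

	if (nid < 0 || nid >= MAX_NODES)
		return -EINVAL;

	new_attr = calloc(1, sizeof(*new_attr));
	if (!new_attr)
		return -ENOMEM;

	name = malloc(NAME_LEN);
	if (!name) {
		free(new_attr);
		return -ENOMEM;
	}
	snprintf(name, NAME_LEN, "node%d", nid);

	/* Fully initialize the private object before taking the lock. */
	new_attr->name = name;
	new_attr->nid = nid;

	pthread_mutex_lock(&table_lock);
	if (table[nid]) {
		pthread_mutex_unlock(&table_lock);
		goto out;	/* not an error: ret is still 0 */
	}

	ret = register_attr(new_attr);
	if (ret) {
		pthread_mutex_unlock(&table_lock);
		goto out;
	}

	table[nid] = new_attr;	/* publish only after registration worked */
	pthread_mutex_unlock(&table_lock);
	return 0;

out:
	/* Single cleanup path for both "already exists" and failure. */
	free(new_attr->name);
	free(new_attr);
	return ret;
}
```

Because the shared slot is written only on success, the error path never has to undo a published pointer, which is what lets both early exits share the `out:` label.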
> >
> >> + kfree(wi_group->nattrs[nid]);
> >> + wi_group->nattrs[nid] = NULL;
> >> + pr_err("Failed to add attribute to weighted_interleave: %d\n", ret);
> >> }
> >> + mutex_unlock(&wi_group->kobj_lock);
> >>
> >> - wi_group->nattrs[nid] = node_attr;
> >> - return 0;
> >> + return ret;
> >> +}
> >
> >