* [PATCH v6 00/10] Implement numa node notifier
@ 2025-06-09 9:21 Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 01/10] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
` (9 more replies)
0 siblings, 10 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
v5 -> v6:
- Remove redundant checks (per David)
- Fix build failure
- Drop 'nid' parameter from memory notify (Per David)
- Add RB/ACKs-by
v4 -> v5:
- Split out conversion for different consumers (per David)
- Renamed node-notifier actions (per David)
- Added new Documentation for new node-notifier and updated
the memory-notifier one to reflect the changes
- Make sure we do not trigger anything when !CONFIG_NUMA (per David)
v3 -> v4:
- Fix typos pointed out by Alok Tiwari
- Further cleanups suggested by Vlastimil
- Add RBs-by from Vlastimil
v2 -> v3:
- Add Suggested-by (David)
- Replace last N_NORMAL_MEMORY mention in slub (David)
- Replace the notifier for autoweight-mempolicy
- Fix build on !CONFIG_MEMORY_HOTPLUG
v1 -> v2:
- Remove status_change_nid_normal and the code that
deals with it (David & Vlastimil)
- Remove slab_mem_offline_callback (David & Vlastimil)
- Change the order of canceling the notifiers
in {online,offline}_pages (Vlastimil)
- Fix up a couple of whitespaces (Jonathan Cameron)
- Add RBs-by
Memory notifier is a tool that allows consumers to get notified whenever
memory gets onlined or offlined in the system.
Currently, there are 10 consumers of that, but 6 out of those 10 consumers
are only interested in getting notifications when a numa node changes its
memory state.
That means going from memoryless to memory-aware or vice versa.
As a result, for every {online,offline}_pages operation they get
notified even though the numa node might not have changed its state.
This is suboptimal, and we want to decouple numa node state changes from
memory state changes.
While we are doing this, remove status_change_nid_normal, as the only
current user (slub) does not really need it.
This allows us to further simplify and clean up the code.
The first patch gets rid of status_change_nid_normal in slub.
The second patch implements a numa node notifier that does just that, and has
those consumers register in there, so they get notified only when they are
interested.
The third patch replaces 'status_change_nid{_normal}' fields within
memory_notify with a 'nid', as that is all we need for the memory
notifier, and updates the only user of it (page_ext).
Consumers that are only interested in numa node state changes are:
- memory-tier
- slub
- cpuset
- hmat
- cxl
- autoweight-mempolicy
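To make the difference concrete, here is a minimal user-space sketch (not
kernel code) of the two notification schemes. struct model_node,
model_online() and model_offline() are hypothetical names made up for
illustration; the last-memory test only mirrors the
'nr_pages >= present_pages' check used later in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one NUMA node; not kernel code. */
struct model_node {
	unsigned long present_pages;	/* pages currently online on the node */
	unsigned int mem_events;	/* what a memory-notifier consumer sees */
	unsigned int node_events;	/* what a node-notifier consumer sees */
};

/* Online nr_pages; returns true if the node just got its first memory. */
static bool model_online(struct model_node *n, unsigned long nr_pages)
{
	bool first = (n->present_pages == 0);

	assert(nr_pages > 0);
	n->present_pages += nr_pages;
	n->mem_events++;		/* fired for every online operation */
	if (first)
		n->node_events++;	/* fired only on memoryless -> has memory */
	return first;
}

/* Offline nr_pages; returns true if the node just lost its last memory. */
static bool model_offline(struct model_node *n, unsigned long nr_pages)
{
	bool last = (nr_pages >= n->present_pages);

	assert(nr_pages > 0);
	n->present_pages = last ? 0 : n->present_pages - nr_pages;
	n->mem_events++;		/* fired for every offline operation */
	if (last)
		n->node_events++;	/* fired only on has memory -> memoryless */
	return last;
}
```

In this model, onlining two blocks on an empty node and then offlining both
produces four memory events but only two node events, which is the gap the
node notifier is meant to close.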
Oscar Salvador (10):
mm,slub: Do not special case N_NORMAL nodes for slab_nodes
mm,memory_hotplug: Remove status_change_nid_normal and update
documentation
mm,memory_hotplug: Implement numa node notifier
mm,slub: Use node-notifier instead of memory-notifier
mm,memory-tiers: Use node-notifier instead of memory-notifier
drivers,cxl: Use node-notifier instead of memory-notifier
drivers,hmat: Use node-notifier instead of memory-notifier
kernel,cpuset: Use node-notifier instead of memory-notifier
mm,mempolicy: Use node-notifier instead of memory-notifier
mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
Documentation/core-api/memory-hotplug.rst | 76 ++++++--
.../zh_CN/core-api/memory-hotplug.rst | 3 -
drivers/acpi/numa/hmat.c | 8 +-
drivers/base/node.c | 21 +++
drivers/cxl/core/region.c | 16 +-
drivers/cxl/cxl.h | 4 +-
include/linux/memory.h | 2 -
include/linux/node.h | 40 +++++
kernel/cgroup/cpuset.c | 2 +-
mm/memory-tiers.c | 19 +-
mm/memory_hotplug.c | 165 ++++++++----------
mm/mempolicy.c | 13 +-
mm/page_ext.c | 16 +-
mm/slub.c | 60 ++-----
14 files changed, 238 insertions(+), 207 deletions(-)
--
2.49.0
* [PATCH v6 01/10] mm,slub: Do not special case N_NORMAL nodes for slab_nodes
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 02/10] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
` (8 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
Currently, slab_mem_going_online_callback() checks whether the node has
N_NORMAL memory in order to be set in slab_nodes.
While it is true that dropping that check would mean ending up with
movable nodes in slab_nodes, the memory waste that comes with that is
negligible.
So stop checking for status_change_nid_normal and just use status_change_nid
instead which works for both types of memory.
Also, once we allocate the kmem_cache_node cache for the node in
slab_mem_going_online_callback(), we never deallocate it in
slab_mem_offline_callback() when the node goes memoryless, so we can just
get rid of the latter.
The side effects are that we will stop clearing the node from slab_nodes,
and also that kmem caches created after a node hotremove will now allocate
their kmem_cache_node for the node(s) that were hotremoved, but these
should be negligible.
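The side effect can be pictured with a small user-space sketch (not kernel
code). model_slab_nodes stands in for slab_nodes, and all model_* names and
MODEL_MAX_NODES are hypothetical, chosen only for illustration.

```c
#include <assert.h>

#define MODEL_MAX_NODES 8

/* Bit n set: caches keep a per-node structure for node n (models slab_nodes). */
static unsigned int model_slab_nodes;

/* A node got its first memory: start tracking it. */
static void model_node_first_memory(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	model_slab_nodes |= 1u << nid;
}

/* A node went memoryless: after this patch nothing clears its bit. */
static void model_node_last_memory(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
}

/* Creating a cache allocates one structure per tracked node; returns the count. */
static int model_create_cache(void)
{
	int nid, allocated = 0;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (model_slab_nodes & (1u << nid))
			allocated++;
	return allocated;
}
```

A cache created after node 2 has gone memoryless still counts node 2 in this
model, i.e. one extra per-node structure per cache.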
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
mm/slub.c | 34 +++-------------------------------
1 file changed, 3 insertions(+), 31 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index be8b09e09d30..f92b43d36adc 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -447,7 +447,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
/*
* Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
- * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
+ * Corresponds to node_state[N_MEMORY], but can temporarily
* differ during memory hotplug/hotremove operations.
* Protected by slab_mutex.
*/
@@ -6160,36 +6160,12 @@ static int slab_mem_going_offline_callback(void *arg)
return 0;
}
-static void slab_mem_offline_callback(void *arg)
-{
- struct memory_notify *marg = arg;
- int offline_node;
-
- offline_node = marg->status_change_nid_normal;
-
- /*
- * If the node still has available memory. we need kmem_cache_node
- * for it yet.
- */
- if (offline_node < 0)
- return;
-
- mutex_lock(&slab_mutex);
- node_clear(offline_node, slab_nodes);
- /*
- * We no longer free kmem_cache_node structures here, as it would be
- * racy with all get_node() users, and infeasible to protect them with
- * slab_mutex.
- */
- mutex_unlock(&slab_mutex);
-}
-
static int slab_mem_going_online_callback(void *arg)
{
struct kmem_cache_node *n;
struct kmem_cache *s;
struct memory_notify *marg = arg;
- int nid = marg->status_change_nid_normal;
+ int nid = marg->status_change_nid;
int ret = 0;
/*
@@ -6247,10 +6223,6 @@ static int slab_memory_callback(struct notifier_block *self,
case MEM_GOING_OFFLINE:
ret = slab_mem_going_offline_callback(arg);
break;
- case MEM_OFFLINE:
- case MEM_CANCEL_ONLINE:
- slab_mem_offline_callback(arg);
- break;
case MEM_ONLINE:
case MEM_CANCEL_OFFLINE:
break;
@@ -6321,7 +6293,7 @@ void __init kmem_cache_init(void)
* Initialize the nodemask for which we will allocate per node
* structures. Here we don't need taking slab_mutex yet.
*/
- for_each_node_state(node, N_NORMAL_MEMORY)
+ for_each_node_state(node, N_MEMORY)
node_set(node, slab_nodes);
create_boot_cache(kmem_cache_node, "kmem_cache_node",
--
2.49.0
* [PATCH v6 02/10] mm,memory_hotplug: Remove status_change_nid_normal and update documentation
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 01/10] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
` (7 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
Now that the last user of status_change_nid_normal is gone, we can remove it.
Update documentation accordingly.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
Documentation/core-api/memory-hotplug.rst | 3 ---
.../translations/zh_CN/core-api/memory-hotplug.rst | 3 ---
include/linux/memory.h | 1 -
mm/memory_hotplug.c | 12 ------------
4 files changed, 19 deletions(-)
diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index 682259ee633a..d1b8eb9add8a 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -56,14 +56,11 @@ The third argument (arg) passes a pointer of struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid_normal;
int status_change_nid;
}
- start_pfn is start_pfn of online/offline memory.
- nr_pages is # of pages of online/offline memory.
-- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
- is (will be) set/clear, if this is -1, then nodemask status is not changed.
- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
diff --git a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
index 9b2841fb9a5f..c2a4122ae221 100644
--- a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
+++ b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
@@ -62,7 +62,6 @@ memory_notify结构体的指针::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid_normal;
int status_change_nid;
}
@@ -70,8 +69,6 @@ memory_notify结构体的指针::
- nr_pages是在线/离线内存的页数。
-- status_change_nid_normal是当nodemask的N_NORMAL_MEMORY被设置/清除时设置节
- 点id,如果是-1,则nodemask状态不改变。
- status_change_nid是当nodemask的N_MEMORY被(将)设置/清除时设置的节点id。这
意味着一个新的(没上线的)节点通过联机获得新的内存,而一个节点失去了所有的内
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 5ec4e6d209b9..a9ccd6579422 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -109,7 +109,6 @@ struct memory_notify {
unsigned long altmap_nr_pages;
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid_normal;
int status_change_nid;
};
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b1caedbade5b..94ae0ca37021 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -706,19 +706,13 @@ static void node_states_check_changes_online(unsigned long nr_pages,
int nid = zone_to_nid(zone);
arg->status_change_nid = NUMA_NO_NODE;
- arg->status_change_nid_normal = NUMA_NO_NODE;
if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
- if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY))
- arg->status_change_nid_normal = nid;
}
static void node_states_set_node(int node, struct memory_notify *arg)
{
- if (arg->status_change_nid_normal >= 0)
- node_set_state(node, N_NORMAL_MEMORY);
-
if (arg->status_change_nid >= 0)
node_set_state(node, N_MEMORY);
}
@@ -1895,7 +1889,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
enum zone_type zt;
arg->status_change_nid = NUMA_NO_NODE;
- arg->status_change_nid_normal = NUMA_NO_NODE;
/*
* Check whether node_states[N_NORMAL_MEMORY] will be changed.
@@ -1907,8 +1900,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
*/
for (zt = 0; zt <= ZONE_NORMAL; zt++)
present_pages += pgdat->node_zones[zt].present_pages;
- if (zone_idx(zone) <= ZONE_NORMAL && nr_pages >= present_pages)
- arg->status_change_nid_normal = zone_to_nid(zone);
/*
* We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
@@ -1927,9 +1918,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
static void node_states_clear_node(int node, struct memory_notify *arg)
{
- if (arg->status_change_nid_normal >= 0)
- node_clear_state(node, N_NORMAL_MEMORY);
-
if (arg->status_change_nid >= 0)
node_clear_state(node, N_MEMORY);
}
--
2.49.0
* [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 01/10] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 02/10] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 8:10 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
` (6 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
There are at least six consumers of hotplug_memory_notifier that are only
really interested in whether a numa node changed its state, e.g. going
from having memory to not having memory and vice versa.
Implement a specific notifier for numa nodes when their state gets changed,
which will later be used by those consumers that are only interested
in numa node state changes.
Add documentation as well.
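As a rough illustration of the announce/confirm/cancel flow, here is a
user-space sketch (not kernel code). The enum values, struct chain,
chain_call() and model_online_first_memory() are simplified, hypothetical
stand-ins; only the order of events follows this patch. The -EPERM result
is what the model reports when a consumer vetoes the operation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel names; values differ. */
enum { NOTIFY_OK = 1, NOTIFY_BAD = 2 };
enum { ADDING_FIRST_MEMORY, ADDED_FIRST_MEMORY, CANCEL_ADDING_FIRST_MEMORY };

struct node_notify { int nid; };

typedef int (*node_cb)(unsigned long action, struct node_notify *arg);

struct chain {
	node_cb cbs[4];		/* assumed to be already sorted by priority */
	size_t nr;
};

/* Call every callback in order; stop at the first one returning NOTIFY_BAD. */
static int chain_call(const struct chain *c, unsigned long action,
		      struct node_notify *arg)
{
	for (size_t i = 0; i < c->nr; i++)
		if (c->cbs[i](action, arg) == NOTIFY_BAD)
			return NOTIFY_BAD;
	return NOTIFY_OK;
}

/* Announce, do the work, then confirm; cancel if a consumer objected. */
static int model_online_first_memory(const struct chain *c, int nid,
				     int *node_has_memory)
{
	struct node_notify arg = { .nid = nid };

	assert(!*node_has_memory);
	if (chain_call(c, ADDING_FIRST_MEMORY, &arg) == NOTIFY_BAD) {
		chain_call(c, CANCEL_ADDING_FIRST_MEMORY, &arg);
		return -EPERM;
	}
	*node_has_memory = 1;
	chain_call(c, ADDED_FIRST_MEMORY, &arg);
	return 0;
}

/* Two example consumers: one records what it saw, one vetoes node 1. */
static unsigned int seen_mask;

static int recording_cb(unsigned long action, struct node_notify *arg)
{
	(void)arg;
	seen_mask |= 1u << action;
	return NOTIFY_OK;
}

static int veto_node1_cb(unsigned long action, struct node_notify *arg)
{
	if (action == ADDING_FIRST_MEMORY && arg->nid == 1)
		return NOTIFY_BAD;
	return NOTIFY_OK;
}
```

With both consumers on the chain, onlining node 0 delivers the ADDING and
ADDED events, while onlining node 1 delivers ADDING followed by the CANCEL
event and leaves the node memoryless.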
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
Documentation/core-api/memory-hotplug.rst | 66 +++++++++
drivers/base/node.c | 21 +++
include/linux/node.h | 40 ++++++
mm/memory_hotplug.c | 155 ++++++++++------------
4 files changed, 200 insertions(+), 82 deletions(-)
diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index d1b8eb9add8a..b19c3be7437d 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -9,6 +9,9 @@ Memory hotplug event notifier
Hotplugging events are sent to a notification queue.
+Memory notifier
+----------------
+
There are six types of notification defined in ``include/linux/memory.h``:
MEM_GOING_ONLINE
@@ -80,6 +83,69 @@ further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
+Numa node notifier
+------------------
+
+There are six types of notification defined in ``include/linux/node.h``:
+
+NODE_ADDING_FIRST_MEMORY
+ Generated before memory becomes available to this node for the first time.
+
+NODE_CANCEL_ADDING_FIRST_MEMORY
+ Generated if NODE_ADDING_FIRST_MEMORY fails.
+
+NODE_ADDED_FIRST_MEMORY
+ Generated when memory has become available to this node for the first time.
+
+NODE_REMOVING_LAST_MEMORY
+ Generated when the last memory available to this node is about to be offlined.
+
+NODE_CANCEL_REMOVING_LAST_MEMORY
+ Generated when NODE_REMOVING_LAST_MEMORY fails.
+
+NODE_REMOVED_LAST_MEMORY
+ Generated when the last memory available to this node has been offlined.
+
+A callback routine can be registered by calling::
+
+ hotplug_node_notifier(callback_func, priority)
+
+Callback functions with higher values of priority are called before callback
+functions with lower values.
+
+A callback function must have the following prototype::
+
+ int callback_func(
+
+ struct notifier_block *self, unsigned long action, void *arg);
+
+The first argument of the callback function (self) is a pointer to the block
+of the notifier chain that points to the callback function itself.
+The second argument (action) is one of the event types described above.
+The third argument (arg) passes a pointer of struct node_notify::
+
+ struct node_notify {
+ int nid;
+ }
+
+- nid is the node we are adding or removing memory to.
+
+ If nid >= 0, callback should create/discard structures for the
+ node if necessary.
+
+The callback routine shall return one of the values
+NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
+defined in ``include/linux/notifier.h``
+
+NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
+
+NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
+NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
+NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
+It stops further processing of the notification queue.
+
+NOTIFY_STOP stops further processing of the notification queue.
+
Locking Internals
=================
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 25ab9ec14eb8..c5b0859d846d 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -111,6 +111,27 @@ static const struct attribute_group *node_access_node_groups[] = {
NULL,
};
+#ifdef CONFIG_MEMORY_HOTPLUG
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+ blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+ return blocking_notifier_call_chain(&node_chain, val, v);
+}
+#endif
+
static void node_remove_accesses(struct node *node)
{
struct node_access_nodes *c, *cnext;
diff --git a/include/linux/node.h b/include/linux/node.h
index 2b7517892230..d7aa2636d948 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -123,6 +123,46 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
#endif
extern void unregister_node(struct node *node);
+
+struct node_notify {
+ int nid;
+};
+
+#define NODE_ADDING_FIRST_MEMORY (1<<0)
+#define NODE_ADDED_FIRST_MEMORY (1<<1)
+#define NODE_CANCEL_ADDING_FIRST_MEMORY (1<<2)
+#define NODE_REMOVING_LAST_MEMORY (1<<3)
+#define NODE_REMOVED_LAST_MEMORY (1<<4)
+#define NODE_CANCEL_REMOVING_LAST_MEMORY (1<<5)
+
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
+extern int register_node_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
+extern int node_notify(unsigned long val, void *v);
+
+#define hotplug_node_notifier(fn, pri) ({ \
+ static __meminitdata struct notifier_block fn##_node_nb =\
+ { .notifier_call = fn, .priority = pri };\
+ register_node_notifier(&fn##_node_nb); \
+})
+#else
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+ return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_NUMA
extern void node_dev_init(void);
/* Core of the node registration - only memory hotplug should use this */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 94ae0ca37021..0550f3061fc4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -35,6 +35,7 @@
#include <linux/compaction.h>
#include <linux/rmap.h>
#include <linux/module.h>
+#include <linux/node.h>
#include <asm/tlbflush.h>
@@ -699,24 +700,6 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
online_mem_sections(start_pfn, end_pfn);
}
-/* check which state of node_states will be changed when online memory */
-static void node_states_check_changes_online(unsigned long nr_pages,
- struct zone *zone, struct memory_notify *arg)
-{
- int nid = zone_to_nid(zone);
-
- arg->status_change_nid = NUMA_NO_NODE;
-
- if (!node_state(nid, N_MEMORY))
- arg->status_change_nid = nid;
-}
-
-static void node_states_set_node(int node, struct memory_notify *arg)
-{
- if (arg->status_change_nid >= 0)
- node_set_state(node, N_MEMORY);
-}
-
static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages)
{
@@ -1171,7 +1154,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
int need_zonelists_rebuild = 0;
const int nid = zone_to_nid(zone);
int ret;
- struct memory_notify arg;
+ struct memory_notify mem_arg;
+ struct node_notify node_arg;
+ bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
/*
* {on,off}lining is constrained to full memory sections (or more
@@ -1188,11 +1173,22 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
/* associate pfn range with the zone */
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
- arg.start_pfn = pfn;
- arg.nr_pages = nr_pages;
- node_states_check_changes_online(nr_pages, zone, &arg);
+ node_arg.nid = NUMA_NO_NODE;
+ if (!node_state(nid, N_MEMORY)) {
+ /* Adding memory to the node for the first time */
+ cancel_node_notifier_on_err = true;
+ node_arg.nid = nid;
+ ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
+ ret = notifier_to_errno(ret);
+ if (ret)
+ goto failed_addition;
+ }
- ret = memory_notify(MEM_GOING_ONLINE, &arg);
+ mem_arg.start_pfn = pfn;
+ mem_arg.nr_pages = nr_pages;
+ mem_arg.status_change_nid = node_arg.nid;
+ cancel_mem_notifier_on_err = true;
+ ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
ret = notifier_to_errno(ret);
if (ret)
goto failed_addition;
@@ -1218,7 +1214,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
online_pages_range(pfn, nr_pages);
adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
- node_states_set_node(nid, &arg);
+ if (node_arg.nid >= 0)
+ node_set_state(nid, N_MEMORY);
if (need_zonelists_rebuild)
build_all_zonelists(NULL);
@@ -1239,16 +1236,23 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
kswapd_run(nid);
kcompactd_run(nid);
+ if (node_arg.nid >= 0)
+ /* First memory added successfully. Notify consumers. */
+ node_notify(NODE_ADDED_FIRST_MEMORY, &node_arg);
+
writeback_set_ratelimit();
- memory_notify(MEM_ONLINE, &arg);
+ memory_notify(MEM_ONLINE, &mem_arg);
return 0;
failed_addition:
pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
- memory_notify(MEM_CANCEL_ONLINE, &arg);
+ if (cancel_mem_notifier_on_err)
+ memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
+ if (cancel_node_notifier_on_err)
+ node_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
remove_pfn_range_from_zone(zone, pfn, nr_pages);
return ret;
}
@@ -1880,48 +1884,6 @@ static int __init cmdline_parse_movable_node(char *p)
}
early_param("movable_node", cmdline_parse_movable_node);
-/* check which state of node_states will be changed when offline memory */
-static void node_states_check_changes_offline(unsigned long nr_pages,
- struct zone *zone, struct memory_notify *arg)
-{
- struct pglist_data *pgdat = zone->zone_pgdat;
- unsigned long present_pages = 0;
- enum zone_type zt;
-
- arg->status_change_nid = NUMA_NO_NODE;
-
- /*
- * Check whether node_states[N_NORMAL_MEMORY] will be changed.
- * If the memory to be offline is within the range
- * [0..ZONE_NORMAL], and it is the last present memory there,
- * the zones in that range will become empty after the offlining,
- * thus we can determine that we need to clear the node from
- * node_states[N_NORMAL_MEMORY].
- */
- for (zt = 0; zt <= ZONE_NORMAL; zt++)
- present_pages += pgdat->node_zones[zt].present_pages;
-
- /*
- * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
- * does not apply as we don't support 32bit.
- * Here we count the possible pages from ZONE_MOVABLE.
- * If after having accounted all the pages, we see that the nr_pages
- * to be offlined is over or equal to the accounted pages,
- * we know that the node will become empty, and so, we can clear
- * it for N_MEMORY as well.
- */
- present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages;
-
- if (nr_pages >= present_pages)
- arg->status_change_nid = zone_to_nid(zone);
-}
-
-static void node_states_clear_node(int node, struct memory_notify *arg)
-{
- if (arg->status_change_nid >= 0)
- node_clear_state(node, N_MEMORY);
-}
-
static int count_system_ram_pages_cb(unsigned long start_pfn,
unsigned long nr_pages, void *data)
{
@@ -1937,13 +1899,17 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
- const unsigned long end_pfn = start_pfn + nr_pages;
- unsigned long pfn, managed_pages, system_ram_pages = 0;
- const int node = zone_to_nid(zone);
- unsigned long flags;
- struct memory_notify arg;
- char *reason;
int ret;
+ char *reason;
+ enum zone_type zt;
+ unsigned long flags;
+ struct memory_notify mem_arg;
+ struct node_notify node_arg;
+ const int node = zone_to_nid(zone);
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ const unsigned long end_pfn = start_pfn + nr_pages;
+ unsigned long pfn, managed_pages, system_ram_pages = 0, present_pages = 0;
+ bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
/*
* {on,off}lining is constrained to full memory sections (or more
@@ -2000,11 +1966,30 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
goto failed_removal_pcplists_disabled;
}
- arg.start_pfn = start_pfn;
- arg.nr_pages = nr_pages;
- node_states_check_changes_offline(nr_pages, zone, &arg);
+ /*
+ * Here we count the possible pages within the range [0..ZONE_MOVABLE].
+ * If after having accounted all the pages, we see that the nr_pages to
+ * be offlined is greater or equal to the accounted pages, we know that the
+ * node will become empty, and so, we will clear N_MEMORY for it.
+ */
+ node_arg.nid = NUMA_NO_NODE;
+ for (zt = 0; zt <= ZONE_MOVABLE; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+
+ if (nr_pages >= present_pages) {
+ node_arg.nid = node;
+ cancel_node_notifier_on_err = true;
+ ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
+ ret = notifier_to_errno(ret);
+ if (ret)
+ goto failed_removal_isolated;
+ }
- ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+ mem_arg.start_pfn = start_pfn;
+ mem_arg.nr_pages = nr_pages;
+ mem_arg.status_change_nid = node_arg.nid;
+ cancel_mem_notifier_on_err = true;
+ ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
ret = notifier_to_errno(ret);
if (ret) {
reason = "notifier failure";
@@ -2084,27 +2069,33 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
* Make sure to mark the node as memory-less before rebuilding the zone
* list. Otherwise this node would still appear in the fallback lists.
*/
- node_states_clear_node(node, &arg);
+ if (node_arg.nid >= 0)
+ node_clear_state(node, N_MEMORY);
if (!populated_zone(zone)) {
zone_pcp_reset(zone);
build_all_zonelists(NULL);
}
- if (arg.status_change_nid >= 0) {
+ if (node_arg.nid >= 0) {
kcompactd_stop(node);
kswapd_stop(node);
+ /* Node went memoryless. Notify consumers */
+ node_notify(NODE_REMOVED_LAST_MEMORY, &node_arg);
}
writeback_set_ratelimit();
- memory_notify(MEM_OFFLINE, &arg);
+ memory_notify(MEM_OFFLINE, &mem_arg);
remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
return 0;
failed_removal_isolated:
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
- memory_notify(MEM_CANCEL_OFFLINE, &arg);
+ if (cancel_mem_notifier_on_err)
+ memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
+ if (cancel_node_notifier_on_err)
+ node_notify(NODE_CANCEL_REMOVING_LAST_MEMORY, &node_arg);
failed_removal_pcplists_disabled:
lru_cache_enable();
zone_pcp_enable(zone);
--
2.49.0
* [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (2 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:50 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 05/10] mm,memory-tiers: " Oscar Salvador
` (5 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
slub is only concerned when a numa node changes its memory state,
so stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slub.c | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index f92b43d36adc..3ff0b94f3eeb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6146,7 +6146,7 @@ int __kmem_cache_shrink(struct kmem_cache *s)
return __kmem_cache_do_shrink(s);
}
-static int slab_mem_going_offline_callback(void *arg)
+static int slab_mem_going_offline_callback(void)
{
struct kmem_cache *s;
@@ -6160,21 +6160,12 @@ static int slab_mem_going_offline_callback(void *arg)
return 0;
}
-static int slab_mem_going_online_callback(void *arg)
+static int slab_mem_going_online_callback(int nid)
{
struct kmem_cache_node *n;
struct kmem_cache *s;
- struct memory_notify *marg = arg;
- int nid = marg->status_change_nid;
int ret = 0;
- /*
- * If the node's memory is already available, then kmem_cache_node is
- * already created. Nothing to do.
- */
- if (nid < 0)
- return 0;
-
/*
* We are bringing a node online. No memory is available yet. We must
* allocate a kmem_cache_node structure in order to bring the node
@@ -6214,17 +6205,16 @@ static int slab_mem_going_online_callback(void *arg)
static int slab_memory_callback(struct notifier_block *self,
unsigned long action, void *arg)
{
+ struct node_notify *nn = arg;
+ int nid = nn->nid;
int ret = 0;
switch (action) {
- case MEM_GOING_ONLINE:
- ret = slab_mem_going_online_callback(arg);
- break;
- case MEM_GOING_OFFLINE:
- ret = slab_mem_going_offline_callback(arg);
+ case NODE_ADDING_FIRST_MEMORY:
+ ret = slab_mem_going_online_callback(nid);
break;
- case MEM_ONLINE:
- case MEM_CANCEL_OFFLINE:
+ case NODE_REMOVING_LAST_MEMORY:
+ ret = slab_mem_going_offline_callback();
break;
}
if (ret)
@@ -6300,7 +6290,7 @@ void __init kmem_cache_init(void)
sizeof(struct kmem_cache_node),
SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
- hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+ hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
/* Able to allocate the per node structures */
slab_state = PARTIAL;
--
2.49.0
* [PATCH v6 05/10] mm,memory-tiers: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (3 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:51 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 06/10] drivers,cxl: " Oscar Salvador
` (4 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
memory-tier is only concerned when a numa node changes its memory state,
because it then needs to re-create the demotion list.
So stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/memory-tiers.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..0382b6942b8b 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,25 +872,18 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
unsigned long action, void *_arg)
{
struct memory_tier *memtier;
- struct memory_notify *arg = _arg;
-
- /*
- * Only update the node migration order when a node is
- * changing status, like online->offline.
- */
- if (arg->status_change_nid < 0)
- return notifier_from_errno(0);
+ struct node_notify *nn = _arg;
switch (action) {
- case MEM_OFFLINE:
+ case NODE_REMOVED_LAST_MEMORY:
mutex_lock(&memory_tier_lock);
- if (clear_node_memory_tier(arg->status_change_nid))
+ if (clear_node_memory_tier(nn->nid))
establish_demotion_targets();
mutex_unlock(&memory_tier_lock);
break;
- case MEM_ONLINE:
+ case NODE_ADDED_FIRST_MEMORY:
mutex_lock(&memory_tier_lock);
- memtier = set_node_memory_tier(arg->status_change_nid);
+ memtier = set_node_memory_tier(nn->nid);
if (!IS_ERR(memtier))
establish_demotion_targets();
mutex_unlock(&memory_tier_lock);
@@ -929,7 +922,7 @@ static int __init memory_tier_init(void)
nodes_and(default_dram_nodes, node_states[N_MEMORY],
node_states[N_CPU]);
- hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+ hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
return 0;
}
subsys_initcall(memory_tier_init);
--
2.49.0
* [PATCH v6 06/10] drivers,cxl: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (4 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 05/10] mm,memory-tiers: " Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:51 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 07/10] drivers,hmat: " Oscar Salvador
` (3 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
memory-tier is only concerned when a numa node changes its memory state,
specifically when a numa node with memory comes into play for the first
time, because it needs to get its performance attributes to build a proper
demotion chain.
So stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
drivers/cxl/core/region.c | 16 ++++++++--------
drivers/cxl/cxl.h | 4 ++--
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..261e07302ca4 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2432,12 +2432,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
unsigned long action, void *arg)
{
struct cxl_region *cxlr = container_of(nb, struct cxl_region,
- memory_notifier);
- struct memory_notify *mnb = arg;
- int nid = mnb->status_change_nid;
+ node_notifier);
+ struct node_notify *nn = arg;
+ int nid = nn->nid;
int region_nid;
- if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+ if (action != NODE_ADDED_FIRST_MEMORY)
return NOTIFY_DONE;
/*
@@ -3484,7 +3484,7 @@ static void shutdown_notifiers(void *_cxlr)
{
struct cxl_region *cxlr = _cxlr;
- unregister_memory_notifier(&cxlr->memory_notifier);
+ unregister_node_notifier(&cxlr->node_notifier);
unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
}
@@ -3523,9 +3523,9 @@ static int cxl_region_probe(struct device *dev)
if (rc)
return rc;
- cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
- cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
- register_memory_notifier(&cxlr->memory_notifier);
+ cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
+ cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
+ register_node_notifier(&cxlr->node_notifier);
cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
cxlr->adist_notifier.priority = 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a9ab46eb0610..48ac02dee881 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -513,7 +513,7 @@ enum cxl_partition_mode {
* @flags: Region state flags
* @params: active + config params for the region
* @coord: QoS access coordinates for the region
- * @memory_notifier: notifier for setting the access coordinates to node
+ * @node_notifier: notifier for setting the access coordinates to node
* @adist_notifier: notifier for calculating the abstract distance of node
*/
struct cxl_region {
@@ -526,7 +526,7 @@ struct cxl_region {
unsigned long flags;
struct cxl_region_params params;
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
- struct notifier_block memory_notifier;
+ struct notifier_block node_notifier;
struct notifier_block adist_notifier;
};
--
2.49.0
* [PATCH v6 07/10] drivers,hmat: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (5 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 06/10] drivers,cxl: " Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:52 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 08/10] kernel,cpuset: " Oscar Salvador
` (2 subsequent siblings)
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
hmat driver is only concerned when a numa node changes its memory state,
specifically when a numa node with memory comes into play for the first
time, because it will register the memory_targets belonging to that numa
node.
So stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
drivers/acpi/numa/hmat.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 9d9052258e92..4958301f5417 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -962,10 +962,10 @@ static int hmat_callback(struct notifier_block *self,
unsigned long action, void *arg)
{
struct memory_target *target;
- struct memory_notify *mnb = arg;
- int pxm, nid = mnb->status_change_nid;
+ struct node_notify *nn = arg;
+ int pxm, nid = nn->nid;
- if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+ if (action != NODE_ADDED_FIRST_MEMORY)
return NOTIFY_OK;
pxm = node_to_pxm(nid);
@@ -1118,7 +1118,7 @@ static __init int hmat_init(void)
hmat_register_targets();
/* Keep the table and structures if the notifier may use them */
- if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
+ if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
goto out_put;
if (!hmat_set_default_dram_perf())
--
2.49.0
* [PATCH v6 08/10] kernel,cpuset: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (6 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 07/10] drivers,hmat: " Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 09/10] mm,mempolicy: " Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
9 siblings, 0 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
cpuset is only concerned when a numa node changes its memory state,
as it needs to know the current numa nodes with memory to keep
an updated mems_allowed mask.
So stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
kernel/cgroup/cpuset.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83639a12883d..66c84024f217 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4013,7 +4013,7 @@ void __init cpuset_init_smp(void)
cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
top_cpuset.effective_mems = node_states[N_MEMORY];
- hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
+ hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
cpuset_migrate_mm_wq = alloc_ordered_workqueue("cpuset_migrate_mm", 0);
BUG_ON(!cpuset_migrate_mm_wq);
--
2.49.0
* [PATCH v6 09/10] mm,mempolicy: Use node-notifier instead of memory-notifier
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (7 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 08/10] kernel,cpuset: " Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:52 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
mempolicy is only concerned when a numa node changes its memory state,
because it needs to take this node into account for the auto-weighted
memory policy system.
So stop using the memory notifier and use the new numa node notifier
instead.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
---
mm/mempolicy.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 72fd72e156b1..693319c2652d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3793,20 +3793,17 @@ static int wi_node_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
int err;
- struct memory_notify *arg = data;
- int nid = arg->status_change_nid;
-
- if (nid < 0)
- return NOTIFY_OK;
+ struct node_notify *nn = data;
+ int nid = nn->nid;
switch (action) {
- case MEM_ONLINE:
+ case NODE_ADDED_FIRST_MEMORY:
err = sysfs_wi_node_add(nid);
if (err)
pr_err("failed to add sysfs for node%d during hotplug: %d\n",
nid, err);
break;
- case MEM_OFFLINE:
+ case NODE_REMOVED_LAST_MEMORY:
sysfs_wi_node_delete(nid);
break;
}
@@ -3845,7 +3842,7 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
}
}
- hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
+ hotplug_node_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
return 0;
err_cleanup_kobj:
--
2.49.0
* [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
` (8 preceding siblings ...)
2025-06-09 9:21 ` [PATCH v6 09/10] mm,mempolicy: " Oscar Salvador
@ 2025-06-09 9:21 ` Oscar Salvador
2025-06-10 7:55 ` David Hildenbrand
9 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-09 9:21 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel,
Oscar Salvador
The 'status_change_nid' field was used to track changes in the memory
state of a numa node, but that functionality has been decoupled from
memory_notify and moved to node_notify.
Current consumers of memory_notify are only interested in which node the
memory we are adding belongs to, but we can derive the nid from the pfn
because we call move_pfn_range_to_zone(), which sets the node in the page
via __init_single_page(), before calling any memory notifier.
Drop the 'status_change_nid' parameter from the 'memory_notify' struct and
update the documentation accordingly.
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
Documentation/core-api/memory-hotplug.rst | 7 -------
include/linux/memory.h | 1 -
mm/memory_hotplug.c | 2 --
mm/page_ext.c | 16 +++-------------
4 files changed, 3 insertions(+), 23 deletions(-)
diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index b19c3be7437d..74897713c4f8 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -59,17 +59,10 @@ The third argument (arg) passes a pointer of struct memory_notify::
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid;
}
- start_pfn is start_pfn of online/offline memory.
- nr_pages is # of pages of online/offline memory.
-- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
- set/clear. It means a new(memoryless) node gets new memory by online and a
- node loses all memory. If this is -1, then nodemask status is not changed.
-
- If status_changed_nid* >= 0, callback should create/discard structures for the
- node if necessary.
The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a9ccd6579422..de8b898ada3f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -109,7 +109,6 @@ struct memory_notify {
unsigned long altmap_nr_pages;
unsigned long start_pfn;
unsigned long nr_pages;
- int status_change_nid;
};
struct notifier_block;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0550f3061fc4..07d7bdb65761 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1186,7 +1186,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
mem_arg.start_pfn = pfn;
mem_arg.nr_pages = nr_pages;
- mem_arg.status_change_nid = node_arg.nid;
cancel_mem_notifier_on_err = true;
ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
ret = notifier_to_errno(ret);
@@ -1987,7 +1986,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
mem_arg.start_pfn = start_pfn;
mem_arg.nr_pages = nr_pages;
- mem_arg.status_change_nid = node_arg.nid;
cancel_mem_notifier_on_err = true;
ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
ret = notifier_to_errno(ret);
diff --git a/mm/page_ext.c b/mm/page_ext.c
index c351fdfe9e9a..f08353802fa6 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -369,25 +369,15 @@ static void __invalidate_page_ext(unsigned long pfn)
}
static int __meminit online_page_ext(unsigned long start_pfn,
- unsigned long nr_pages,
- int nid)
+ unsigned long nr_pages)
{
unsigned long start, end, pfn;
int fail = 0;
+ int nid = pfn_to_nid(start_pfn);
start = SECTION_ALIGN_DOWN(start_pfn);
end = SECTION_ALIGN_UP(start_pfn + nr_pages);
- if (nid == NUMA_NO_NODE) {
- /*
- * In this case, "nid" already exists and contains valid memory.
- * "start_pfn" passed to us is a pfn which is an arg for
- * online__pages(), and start_pfn should exist.
- */
- nid = pfn_to_nid(start_pfn);
- VM_BUG_ON(!node_online(nid));
- }
-
for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION)
fail = init_section_page_ext(pfn, nid);
if (!fail)
@@ -436,7 +426,7 @@ static int __meminit page_ext_callback(struct notifier_block *self,
switch (action) {
case MEM_GOING_ONLINE:
ret = online_page_ext(mn->start_pfn,
- mn->nr_pages, mn->status_change_nid);
+ mn->nr_pages);
break;
case MEM_OFFLINE:
offline_page_ext(mn->start_pfn,
--
2.49.0
* Re: [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier
2025-06-09 9:21 ` [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
@ 2025-06-10 7:50 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:50 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> slub is only concerned when a numa node changes its memory state,
> so stop using the memory notifier and use the new numa node notifier
> instead.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 05/10] mm,memory-tiers: Use node-notifier instead of memory-notifier
2025-06-09 9:21 ` [PATCH v6 05/10] mm,memory-tiers: " Oscar Salvador
@ 2025-06-10 7:51 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:51 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> memory-tier is only concerned when a numa node changes its memory state,
> because it then needs to re-create the demotion list.
> So stop using the memory notifier and use the new numa node notifier
> instead.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 06/10] drivers,cxl: Use node-notifier instead of memory-notifier
2025-06-09 9:21 ` [PATCH v6 06/10] drivers,cxl: " Oscar Salvador
@ 2025-06-10 7:51 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:51 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> memory-tier is only concerned when a numa node changes its memory state,
> specifically when a numa node with memory comes into play for the first
> time, because it needs to get its performance attributes to build a proper
> demotion chain.
> So stop using the memory notifier and use the new numa node notifier
> instead.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 07/10] drivers,hmat: Use node-notifier instead of memory-notifier
2025-06-09 9:21 ` [PATCH v6 07/10] drivers,hmat: " Oscar Salvador
@ 2025-06-10 7:52 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:52 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> hmat driver is only concerned when a numa node changes its memory state,
> specifically when a numa node with memory comes into play for the first
> time, because it will register the memory_targets belonging to that numa
> node.
> So stop using the memory notifier and use the new numa node notifier
> instead.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 09/10] mm,mempolicy: Use node-notifier instead of memory-notifier
2025-06-09 9:21 ` [PATCH v6 09/10] mm,mempolicy: " Oscar Salvador
@ 2025-06-10 7:52 ` David Hildenbrand
0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:52 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> mempolicy is only concerned when a numa node changes its memory state,
> because it needs to take this node into account for the auto-weighted
> memory policy system.
> So stop using the memory notifier and use the new numa node notifier
> instead.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Rakie Kim <rakie.kim@sk.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
2025-06-09 9:21 ` [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
@ 2025-06-10 7:55 ` David Hildenbrand
2025-06-10 8:02 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 7:55 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> The 'status_change_nid' field was used to track changes in the memory
> state of a numa node, but that functionality has been decoupled from
> memory_notify and moved to node_notify.
> Current consumers of memory_notify are only interested in which node the
> memory we are adding belongs to, but we can derive the nid from the pfn
> because we call move_pfn_range_to_zone(), which sets the node in the page
> via __init_single_page(), before calling any memory notifier.
>
> Drop the 'status_change_nid' parameter from the 'memory_notify' struct and
> update the documentation accordingly.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
> Documentation/core-api/memory-hotplug.rst | 7 -------
> include/linux/memory.h | 1 -
> mm/memory_hotplug.c | 2 --
> mm/page_ext.c | 16 +++-------------
> 4 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
> index b19c3be7437d..74897713c4f8 100644
> --- a/Documentation/core-api/memory-hotplug.rst
> +++ b/Documentation/core-api/memory-hotplug.rst
> @@ -59,17 +59,10 @@ The third argument (arg) passes a pointer of struct memory_notify::
> struct memory_notify {
> unsigned long start_pfn;
> unsigned long nr_pages;
> - int status_change_nid;
> }
>
> - start_pfn is start_pfn of online/offline memory.
> - nr_pages is # of pages of online/offline memory.
> -- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
> - set/clear. It means a new(memoryless) node gets new memory by online and a
> - node loses all memory. If this is -1, then nodemask status is not changed.
> -
> - If status_changed_nid* >= 0, callback should create/discard structures for the
> - node if necessary.
>
> The callback routine shall return one of the values
> NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index a9ccd6579422..de8b898ada3f 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -109,7 +109,6 @@ struct memory_notify {
> unsigned long altmap_nr_pages;
> unsigned long start_pfn;
> unsigned long nr_pages;
> - int status_change_nid;
> };
>
> struct notifier_block;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0550f3061fc4..07d7bdb65761 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1186,7 +1186,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>
> mem_arg.start_pfn = pfn;
> mem_arg.nr_pages = nr_pages;
> - mem_arg.status_change_nid = node_arg.nid;
> cancel_mem_notifier_on_err = true;
> ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> @@ -1987,7 +1986,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>
> mem_arg.start_pfn = start_pfn;
> mem_arg.nr_pages = nr_pages;
> - mem_arg.status_change_nid = node_arg.nid;
> cancel_mem_notifier_on_err = true;
> ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index c351fdfe9e9a..f08353802fa6 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -369,25 +369,15 @@ static void __invalidate_page_ext(unsigned long pfn)
> }
>
> static int __meminit online_page_ext(unsigned long start_pfn,
> - unsigned long nr_pages,
> - int nid)
> + unsigned long nr_pages)
> {
> unsigned long start, end, pfn;
> int fail = 0;
> + int nid = pfn_to_nid(start_pfn);
Nit: Keep reverse xmas tree :)
>
> start = SECTION_ALIGN_DOWN(start_pfn);
> end = SECTION_ALIGN_UP(start_pfn + nr_pages);
>
> - if (nid == NUMA_NO_NODE) {
> - /*
> - * In this case, "nid" already exists and contains valid memory.
> - * "start_pfn" passed to us is a pfn which is an arg for
> - * online__pages(), and start_pfn should exist.
> - */
> - nid = pfn_to_nid(start_pfn);
> - VM_BUG_ON(!node_online(nid));
> - }
> -
> for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION)
> fail = init_section_page_ext(pfn, nid);
> if (!fail)
> @@ -436,7 +426,7 @@ static int __meminit page_ext_callback(struct notifier_block *self,
> switch (action) {
> case MEM_GOING_ONLINE:
> ret = online_page_ext(mn->start_pfn,
> - mn->nr_pages, mn->status_change_nid);
> + mn->nr_pages);
> break;
> case MEM_OFFLINE:
> offline_page_ext(mn->start_pfn,
I would have moved the page_ext stuff into a separate patch, including
documenting why that is fine (e.g., memmap initialized before
GOING_ONLINE call).
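To illustrate the ordering argument, here is a minimal userspace sketch, not
kernel code. All model_* names are hypothetical stand-ins: model_init_range()
plays the role of the memmap initialization and model_going_online_cb() the
role of a MEM_GOING_ONLINE callback that is handed only the pfn range. It
assumes, as the commit message states, that the node id is stored per page
before any notifier runs.

```c
#include <assert.h>

#define MODEL_NR_PFNS 16
#define MODEL_NO_NODE (-1)

/* Hypothetical stand-in for the node id stored in each page of the memmap. */
static int model_page_nid[MODEL_NR_PFNS];

/* Mark every pfn as not yet initialized. */
static void model_reset(void)
{
	for (unsigned long pfn = 0; pfn < MODEL_NR_PFNS; pfn++)
		model_page_nid[pfn] = MODEL_NO_NODE;
}

/* Stand-in for the step that initializes the memmap of a pfn range. */
static void model_init_range(unsigned long start_pfn, unsigned long nr_pages,
			     int nid)
{
	unsigned long end = start_pfn + nr_pages;

	if (end > MODEL_NR_PFNS)
		end = MODEL_NR_PFNS;	/* clamp to the model's tiny memmap */
	for (unsigned long pfn = start_pfn; pfn < end; pfn++)
		model_page_nid[pfn] = nid;
}

/* Stand-in for pfn_to_nid(): read the node id back from the memmap. */
static int model_pfn_to_nid(unsigned long pfn)
{
	return pfn < MODEL_NR_PFNS ? model_page_nid[pfn] : MODEL_NO_NODE;
}

/* Stand-in for a GOING_ONLINE callback: it receives no nid argument. */
static int model_going_online_cb(unsigned long start_pfn,
				 unsigned long nr_pages)
{
	(void)nr_pages;
	return model_pfn_to_nid(start_pfn);
}

/* Memmap first, notifier second: the callback can derive the nid. */
static int model_online_pages(unsigned long start_pfn, unsigned long nr_pages,
			      int nid)
{
	model_init_range(start_pfn, nr_pages, nid);
	return model_going_online_cb(start_pfn, nr_pages);
}
```

If the callback ran before the range was initialized, it would only see
MODEL_NO_NODE, which is why the ordering is worth documenting.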
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
2025-06-10 7:55 ` David Hildenbrand
@ 2025-06-10 8:02 ` Oscar Salvador
0 siblings, 0 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-10 8:02 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Tue, Jun 10, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
> > --- a/mm/page_ext.c
> > +++ b/mm/page_ext.c
> > @@ -369,25 +369,15 @@ static void __invalidate_page_ext(unsigned long pfn)
> > }
> > static int __meminit online_page_ext(unsigned long start_pfn,
> > - unsigned long nr_pages,
> > - int nid)
> > + unsigned long nr_pages)
> > {
> > unsigned long start, end, pfn;
> > int fail = 0;
> > + int nid = pfn_to_nid(start_pfn);
>
> Nit: Keep reverse xmas tree :)
Boh, I tend to, I guess this one slipped through the cracks :-(
> > @@ -436,7 +426,7 @@ static int __meminit page_ext_callback(struct notifier_block *self,
> > switch (action) {
> > case MEM_GOING_ONLINE:
> > ret = online_page_ext(mn->start_pfn,
> > - mn->nr_pages, mn->status_change_nid);
> > + mn->nr_pages);
> > break;
> > case MEM_OFFLINE:
> > offline_page_ext(mn->start_pfn,
>
> I would have moved the page_ext stuff into a separate patch, including
> documenting why that is fine (e.g., memmap initialized before GOING_ONLINE
> call).
I guess I misunderstood you during the previous feedback.
I mean, there is no hurry, so I can do another respin and split the page_ext
stuff from this one if you think that is worth it and will add clarity.
> Acked-by: David Hildenbrand <david@redhat.com>
thanks man!
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-09 9:21 ` [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
@ 2025-06-10 8:10 ` David Hildenbrand
2025-06-16 8:30 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-10 8:10 UTC (permalink / raw)
To: Oscar Salvador, Andrew Morton
Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 09.06.25 11:21, Oscar Salvador wrote:
> There are at least six consumers of hotplug_memory_notifier that are really
> only interested in whether a numa node changed its state, e.g. going
> from having memory to not having memory and vice versa.
>
> Implement a specific notifier for numa nodes when their state gets changed,
> which will later be used by those consumers that are only interested
> in numa node state changes.
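As a rough illustration of the distinction drawn above, the following
userspace sketch models when a node state change would be reported. It is not
the kernel implementation: struct model_node, the EV_* values and both helpers
are hypothetical, and the model tracks only a per-node page count. The point is
that most online/offline operations change nothing at node level, so a
node-only consumer has nothing to do for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event values for this model only. */
enum model_event {
	EV_NONE,			/* node state unchanged */
	EV_ADDED_FIRST_MEMORY,		/* memoryless -> has memory */
	EV_REMOVED_LAST_MEMORY,		/* has memory -> memoryless */
};

struct model_node {
	unsigned long present_pages;	/* pages currently online on the node */
};

/* Online nr_pages on the node and report whether its state changed. */
static enum model_event model_online(struct model_node *n,
				     unsigned long nr_pages)
{
	bool was_memoryless = (n->present_pages == 0);

	n->present_pages += nr_pages;
	return (was_memoryless && nr_pages) ? EV_ADDED_FIRST_MEMORY : EV_NONE;
}

/* Offline nr_pages from the node and report whether its state changed. */
static enum model_event model_offline(struct model_node *n,
				      unsigned long nr_pages)
{
	if (nr_pages == 0 || nr_pages > n->present_pages)
		return EV_NONE;	/* nothing to offline, or invalid request */

	n->present_pages -= nr_pages;
	return n->present_pages == 0 ? EV_REMOVED_LAST_MEMORY : EV_NONE;
}
```

Onlining two blocks and then offlining both yields exactly one "added first
memory" and one "removed last memory" event in this model, while a memory
notifier consumer would have been called for all four operations.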
>
> Add documentation as well.
>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Documentation/core-api/memory-hotplug.rst | 66 +++++++++
> drivers/base/node.c | 21 +++
> include/linux/node.h | 40 ++++++
> mm/memory_hotplug.c | 155 ++++++++++------------
> 4 files changed, 200 insertions(+), 82 deletions(-)
>
> diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
> index d1b8eb9add8a..b19c3be7437d 100644
> --- a/Documentation/core-api/memory-hotplug.rst
> +++ b/Documentation/core-api/memory-hotplug.rst
> @@ -9,6 +9,9 @@ Memory hotplug event notifier
>
> Hotplugging events are sent to a notification queue.
>
> +Memory notifier
> +----------------
> +
> There are six types of notification defined in ``include/linux/memory.h``:
>
> MEM_GOING_ONLINE
> @@ -80,6 +83,69 @@ further processing of the notification queue.
>
> NOTIFY_STOP stops further processing of the notification queue.
>
> +Numa node notifier
> +------------------
> +
> +There are six types of notification defined in ``include/linux/node.h``:
> +
> +NODE_ADDING_FIRST_MEMORY
> + Generated before memory becomes available to this node for the first time.
> +
> +NODE_CANCEL_ADDING_FIRST_MEMORY
> + Generated if NODE_ADDING_FIRST_MEMORY fails.
> +
> +NODE_ADDED_FIRST_MEMORY
> + Generated when memory has become available for this node for the first time.
> +
> +NODE_REMOVING_LAST_MEMORY
> + Generated when the last memory available to this node is about to be offlined.
> +
> +NODE_CANCEL_REMOVING_LAST_MEMORY
> + Generated when NODE_REMOVING_LAST_MEMORY fails.
> +
> +NODE_REMOVED_LAST_MEMORY
> + Generated when the last memory available to this node has been offlined.
> +
> +A callback routine can be registered by calling::
> +
> + hotplug_node_notifier(callback_func, priority)
> +
> +Callback functions with higher values of priority are called before callback
> +functions with lower values.
> +
> +A callback function must have the following prototype::
> +
> + int callback_func(
> +
> + struct notifier_block *self, unsigned long action, void *arg);
> +
> +The first argument of the callback function (self) is a pointer to the block
> +of the notifier chain that points to the callback function itself.
> +The second argument (action) is one of the event types described above.
> +The third argument (arg) passes a pointer of struct node_notify::
> +
> + struct node_notify {
> + int nid;
> + }
> +
> +- nid is the node we are adding or removing memory to.
> +
> + If nid >= 0, callback should create/discard structures for the
> + node if necessary.
Likely that should be removed?
It's probably worth mentioning that one might get notified about
NODE_CANCEL_ADDING_FIRST_MEMORY even though never notified for
NODE_ADDING_FIRST_MEMORY. (same for removing)
I recall this can happen if one of the NODE_ADDING_FIRST_MEMORY
notifiers fails.
(same applies to MEM_CANCEL_*)
Consequently, we might simplify the cancel_mem_notifier_on_err etc
stuff, simply unconditionally calling the cancel counterparts.
> +
> +The callback routine shall return one of the values
> +NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
> +defined in ``include/linux/notifier.h``
> +
> +NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
> +
> +NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
> +NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
> +NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
> +It stops further processing of the notification queue.
> +
> +NOTIFY_STOP stops further processing of the notification queue.
Should we document that failing NODE_ADDED_FIRST_MEMORY /
NODE_REMOVED_LAST_MEMORY is very bad?
> +
> Locking Internals
> =================
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 25ab9ec14eb8..c5b0859d846d 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -111,6 +111,27 @@ static const struct attribute_group *node_access_node_groups[] = {
> NULL,
> };
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static BLOCKING_NOTIFIER_HEAD(node_chain);
> +
> +int register_node_notifier(struct notifier_block *nb)
> +{
> + return blocking_notifier_chain_register(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(register_node_notifier);
> +
> +void unregister_node_notifier(struct notifier_block *nb)
> +{
> + blocking_notifier_chain_unregister(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_node_notifier);
> +
> +int node_notify(unsigned long val, void *v)
> +{
> + return blocking_notifier_call_chain(&node_chain, val, v);
> +}
> +#endif
> +
> static void node_remove_accesses(struct node *node)
> {
> struct node_access_nodes *c, *cnext;
> diff --git a/include/linux/node.h b/include/linux/node.h
> index 2b7517892230..d7aa2636d948 100644
> --- a/include/linux/node.h
> +++ b/include/linux/node.h
> @@ -123,6 +123,46 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
> #endif
>
> extern void unregister_node(struct node *node);
> +
> +struct node_notify {
> + int nid;
> +};
> +
> +#define NODE_ADDING_FIRST_MEMORY (1<<0)
> +#define NODE_ADDED_FIRST_MEMORY (1<<1)
> +#define NODE_CANCEL_ADDING_FIRST_MEMORY (1<<2)
> +#define NODE_REMOVING_LAST_MEMORY (1<<3)
> +#define NODE_REMOVED_LAST_MEMORY (1<<4)
> +#define NODE_CANCEL_REMOVING_LAST_MEMORY (1<<5)
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
> +extern int register_node_notifier(struct notifier_block *nb);
> +extern void unregister_node_notifier(struct notifier_block *nb);
> +extern int node_notify(unsigned long val, void *v);
> +
> +#define hotplug_node_notifier(fn, pri) ({ \
> + static __meminitdata struct notifier_block fn##_node_nb =\
> + { .notifier_call = fn, .priority = pri };\
> + register_node_notifier(&fn##_node_nb); \
> +})
> +#else
> +static inline int register_node_notifier(struct notifier_block *nb)
> +{
> + return 0;
> +}
> +static inline void unregister_node_notifier(struct notifier_block *nb)
> +{
> +}
> +static inline int node_notify(unsigned long val, void *v)
> +{
> + return 0;
> +}
> +static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_NUMA
> extern void node_dev_init(void);
> /* Core of the node registration - only memory hotplug should use this */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 94ae0ca37021..0550f3061fc4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -35,6 +35,7 @@
> #include <linux/compaction.h>
> #include <linux/rmap.h>
> #include <linux/module.h>
> +#include <linux/node.h>
>
> #include <asm/tlbflush.h>
>
> @@ -699,24 +700,6 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
> online_mem_sections(start_pfn, end_pfn);
> }
>
> -/* check which state of node_states will be changed when online memory */
> -static void node_states_check_changes_online(unsigned long nr_pages,
> - struct zone *zone, struct memory_notify *arg)
> -{
> - int nid = zone_to_nid(zone);
> -
> - arg->status_change_nid = NUMA_NO_NODE;
> -
> - if (!node_state(nid, N_MEMORY))
> - arg->status_change_nid = nid;
> -}
> -
> -static void node_states_set_node(int node, struct memory_notify *arg)
> -{
> - if (arg->status_change_nid >= 0)
> - node_set_state(node, N_MEMORY);
> -}
> -
> static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
> unsigned long nr_pages)
> {
> @@ -1171,7 +1154,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
> int need_zonelists_rebuild = 0;
> const int nid = zone_to_nid(zone);
> int ret;
> - struct memory_notify arg;
> + struct memory_notify mem_arg;
> + struct node_notify node_arg;
> + bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
Prefer reverse xmas tree ... :)
[...]
> -
> static int count_system_ram_pages_cb(unsigned long start_pfn,
> unsigned long nr_pages, void *data)
> {
> @@ -1937,13 +1899,17 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
> int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> struct zone *zone, struct memory_group *group)
> {
> - const unsigned long end_pfn = start_pfn + nr_pages;
> - unsigned long pfn, managed_pages, system_ram_pages = 0;
> - const int node = zone_to_nid(zone);
> - unsigned long flags;
> - struct memory_notify arg;
> - char *reason;
> int ret;
> + char *reason;
> + enum zone_type zt;
> + unsigned long flags;
> + struct memory_notify mem_arg;
> + struct node_notify node_arg;
> + const int node = zone_to_nid(zone);
> + struct pglist_data *pgdat = zone->zone_pgdat;
> + const unsigned long end_pfn = start_pfn + nr_pages;
> + unsigned long pfn, managed_pages, system_ram_pages = 0, present_pages = 0;
> + bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
You're now reversing the reversed christmas tree :)
>
> /*
> * {on,off}lining is constrained to full memory sections (or more
> @@ -2000,11 +1966,30 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> goto failed_removal_pcplists_disabled;
> }
>
> - arg.start_pfn = start_pfn;
> - arg.nr_pages = nr_pages;
> - node_states_check_changes_offline(nr_pages, zone, &arg);
> + /*
> + * Here we count the possible pages within the range [0..ZONE_MOVABLE].
> + * If after having accounted all the pages, we see that the nr_pages to
> + * be offlined is greater or equal to the accounted pages, we know that the
> + * node will become empty, and so, we will clear N_MEMORY for it.
> + */
> + node_arg.nid = NUMA_NO_NODE;
> + for (zt = 0; zt <= ZONE_MOVABLE; zt++)
> + present_pages += pgdat->node_zones[zt].present_pages;
Why can't we look at node_present_pages?
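To make the question concrete, here is a small user-space model (not kernel code) of the two ways of deciding whether an offline operation empties a node. All `model_*` names are hypothetical, and the sketch assumes the node-level counter is updated together with the per-zone counters, which is the premise behind the question. The later respin in this thread uses the `nr_pages >= pgdat->node_present_pages` form.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_ZONES 4	/* hypothetical number of zones per node */

struct model_zone {
	unsigned long present_pages;
};

struct model_node {
	struct model_zone zones[MODEL_NR_ZONES];
	unsigned long node_present_pages;	/* assumed to track the zone sum */
};

/* Adjust a zone counter and the node counter together (the stated assumption). */
static void model_adjust_present(struct model_node *node, int zt, long delta)
{
	/* A negative delta wraps modulo 2^N, so the unsigned addition subtracts. */
	node->zones[zt].present_pages += (unsigned long)delta;
	node->node_present_pages += (unsigned long)delta;
}

/* The loop from the patch: sum present pages over all zones of the node. */
static unsigned long model_sum_zones(const struct model_node *node)
{
	unsigned long sum = 0;

	for (int zt = 0; zt < MODEL_NR_ZONES; zt++)
		sum += node->zones[zt].present_pages;
	return sum;
}

/* The simpler check: compare against the node-level counter directly. */
static bool model_offline_empties_node(const struct model_node *node,
				       unsigned long nr_pages)
{
	/* Under the assumption above, both views always agree. */
	assert(model_sum_zones(node) == node->node_present_pages);
	return nr_pages >= node->node_present_pages;
}
```

If the two counters are always updated together, the loop adds nothing over reading the node-level field.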
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-10 8:10 ` David Hildenbrand
@ 2025-06-16 8:30 ` Oscar Salvador
2025-06-16 8:39 ` David Hildenbrand
0 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-16 8:30 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Tue, Jun 10, 2025 at 10:10:21AM +0200, David Hildenbrand wrote:
> On 09.06.25 11:21, Oscar Salvador wrote:
> > +The first argument of the callback function (self) is a pointer to the block
> > +of the notifier chain that points to the callback function itself.
> > +The second argument (action) is one of the event types described above.
> > +The third argument (arg) passes a pointer of struct node_notify::
> > +
> > + struct node_notify {
> > + int nid;
> > + }
> > +
> > +- nid is the node we are adding or removing memory to.
> > +
> > + If nid >= 0, callback should create/discard structures for the
> > + node if necessary.
>
> Likely that should be removed?
Yes, indeed.
>
> It's probably worth mentioning that one might get notified about
> NODE_CANCEL_ADDING_FIRST_MEMORY even though never notified for
> NODE_ADDING_FIRST_MEMORY. (same for removing)
>
> I recall this can happen if one of the NODE_ADDING_FIRST_MEMORY notifiers
> fails.
>
> (same applies to MEM_CANCEL_*)
>
> Consequently, we might simplify the cancel_mem_notifier_on_err etc stuff,
> simply unconditionally calling the cancel counterparts.
So, I managed to do another respin with all feedback included, but I
left this one for the end, and here I am.
It's true, currently users can get notified about e.g. MEM_CANCEL_ONLINE without
going through MEM_GOING_ONLINE if another user fails for the latter, but I'm
trying to work out why that's not a problem.
Because assume you have a user of MEM_CANCEL_ONLINE, who thinks it got called
for MEM_GOING_ONLINE, while in fact it didn't because some other user failed on
it, and it tries to free some memory it thinks it initialized during MEM_GOING_ONLINE.
Isn't this a bit shaky? I mean, yes, I guess we can put the burden on the users of
the notifiers to not assume anything, but then yes, I think we should document this
as it can lead to wrong assumptions.
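The scenario can be reproduced with a tiny user-space model of a notifier chain. This is only a sketch: the `model_*` names are hypothetical, and the only behaviour taken from the discussion is that a failing callback stops the walk and that the cancel event is then sent to the whole chain.

```c
#include <assert.h>
#include <stddef.h>

enum model_action { MODEL_ADDING, MODEL_CANCEL_ADDING };

#define MODEL_OK	0
#define MODEL_BAD	1	/* stands in for NOTIFY_BAD */

struct model_listener {
	int (*cb)(struct model_listener *self, enum model_action action);
	int fail_on_adding;	/* simulate a listener that vetoes the operation */
	int saw_adding;		/* number of ADDING events delivered */
	int saw_cancel;		/* number of CANCEL events delivered */
};

static int model_cb(struct model_listener *self, enum model_action action)
{
	if (action == MODEL_ADDING) {
		self->saw_adding++;
		return self->fail_on_adding ? MODEL_BAD : MODEL_OK;
	}
	self->saw_cancel++;
	return MODEL_OK;
}

/* Walk the chain in order and stop at the first listener returning MODEL_BAD. */
static int model_call_chain(struct model_listener *chain, size_t n,
			    enum model_action action)
{
	for (size_t i = 0; i < n; i++) {
		if (chain[i].cb(&chain[i], action) == MODEL_BAD)
			return MODEL_BAD;
	}
	return MODEL_OK;
}

/* Returns 0 on success, -1 if a listener vetoed and the cancel was broadcast. */
static int model_online(struct model_listener *chain, size_t n)
{
	if (model_call_chain(chain, n, MODEL_ADDING) == MODEL_BAD) {
		/* The cancel goes to everybody, including listeners never reached. */
		model_call_chain(chain, n, MODEL_CANCEL_ADDING);
		return -1;
	}
	return 0;
}
```

With three listeners where the second one vetoes, the third sees the cancel event without ever having seen the adding event, which is exactly the case a callback must be prepared for.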
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 8:30 ` Oscar Salvador
@ 2025-06-16 8:39 ` David Hildenbrand
2025-06-16 8:50 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-16 8:39 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 16.06.25 10:30, Oscar Salvador wrote:
> On Tue, Jun 10, 2025 at 10:10:21AM +0200, David Hildenbrand wrote:
>> On 09.06.25 11:21, Oscar Salvador wrote:
>>> +The first argument of the callback function (self) is a pointer to the block
>>> +of the notifier chain that points to the callback function itself.
>>> +The second argument (action) is one of the event types described above.
>>> +The third argument (arg) passes a pointer of struct node_notify::
>>> +
>>> + struct node_notify {
>>> + int nid;
>>> + }
>>> +
>>> +- nid is the node we are adding or removing memory to.
>>> +
>>> + If nid >= 0, callback should create/discard structures for the
>>> + node if necessary.
>>
>> Likely that should be removed?
>
> Yes, indeed.
>
>>
>> It's probably worth mentioning that one might get notified about
>> NODE_CANCEL_ADDING_FIRST_MEMORY even though never notified for
>> NODE_ADDING_FIRST_MEMORY. (same for removing)
>>
>> I recall this can happen if one of the NODE_ADDING_FIRST_MEMORY notifiers
>> fails.
>>
>> (same applies to MEM_CANCEL_*)
>>
>> Consequently, we might simplify the cancel_mem_notifier_on_err etc stuff,
>> simply unconditionally calling the cancel counterparts.
>
> So, I managed to do another respin with all feedback included, but I
> left this one for the end, and here I'm.
>
> It's true, currently users can get notified about e.g. MEM_CANCEL_ONLINE without
> going through MEM_GOING_ONLINE if another user fails for the latter, but I'm
> trying to work out why that's not a problem.
>
> Because assume you have a user of MEM_CANCEL_ONLINE, who thinks it got called
> for MEM_GOING_ONLINE, while in fact it didn't because some other user failed on
> it, and it tries to free some memory it thinks it initialized during MEM_GOING_ONLINE.
>
> Isn't this a bit shaky?
It's suboptimal, yes. But to get it right, you'd have to remember for
exactly which notifiers you performed the calls ...
> I mean, yes, I guess we can put the burden on the users of
> the notifiers to not assume anything, but then yes, I think we should document this
> as it can lead to potential misbeliefs.
The burden is already on the users I think.
E.g., virtio-mem maintains a "hotplug_active" variable, to detect whether
MEM_ONLINE was actually called.
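As a rough illustration of that kind of guard, here is a user-space sketch of a consumer that tracks whether its own preparation step ran before undoing anything. It is not the virtio-mem code: the `model_*` names, the `prepare_active` flag and the scratch buffer are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum model_mem_action {
	MODEL_GOING_ONLINE,
	MODEL_CANCEL_ONLINE,
	MODEL_ONLINE,
};

struct model_consumer {
	bool prepare_active;	/* true only if GOING_ONLINE reached this consumer */
	void *scratch;		/* resource set up while preparing */
	int frees;		/* how often the undo path actually ran */
};

static int model_consumer_cb(struct model_consumer *c,
			     enum model_mem_action action)
{
	switch (action) {
	case MODEL_GOING_ONLINE:
		c->scratch = malloc(64);
		if (!c->scratch)
			return -1;	/* veto the operation */
		c->prepare_active = true;
		break;
	case MODEL_CANCEL_ONLINE:
		/* We may get the cancel without ever having prepared. */
		if (!c->prepare_active)
			break;
		free(c->scratch);
		c->scratch = NULL;
		c->frees++;
		c->prepare_active = false;
		break;
	case MODEL_ONLINE:
		/* Operation completed; the resource stays in use. */
		c->prepare_active = false;
		break;
	}
	return 0;
}
```

The flag costs one field per consumer and turns a stray cancel into a no-op, which matches the idea that the burden stays with the notifier users.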
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 8:39 ` David Hildenbrand
@ 2025-06-16 8:50 ` Oscar Salvador
2025-06-16 8:52 ` David Hildenbrand
0 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-16 8:50 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Mon, Jun 16, 2025 at 10:39:24AM +0200, David Hildenbrand wrote:
> It's suboptimal, yes. But to get it right, you'd have to remember for exactly
> which notifiers you performed the calls ...
Yeah, definitely not straightforward.
> > I mean, yes, I guess we can put the burden on the users of
> > the notifiers to not assume anything, but then yes, I think we should document this
> > as it can lead to potential misbeliefs.
>
> The burden is already on the users I think.
Yes, it's already on them, although I'm not sure all of them are aware
of it.
But anyway, let's just document it to have the ruleset clear.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 8:50 ` Oscar Salvador
@ 2025-06-16 8:52 ` David Hildenbrand
2025-06-16 11:45 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-16 8:52 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 16.06.25 10:50, Oscar Salvador wrote:
> On Mon, Jun 16, 2025 at 10:39:24AM +0200, David Hildenbrand wrote:
>> It's suboptimal, yes. But to get it right, you'd have to remember for exactly
>> which notifiers you performed the calls ...
>
> Yeah, definitely not straightforward.
>
>>> I mean, yes, I guess we can put the burden on the users of
>>> the notifiers to not assume anything, but then yes, I think we should document this
>>> as it can lead to potential misbeliefs.
>>
>> The burden is already on the users I think.
>
> Yes, it's already on them, although I'm not sure all of them are aware
> of it.
Probably worth checking, to make sure we don't have accidental bugs in
there ...
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 8:52 ` David Hildenbrand
@ 2025-06-16 11:45 ` Oscar Salvador
2025-06-16 12:21 ` David Hildenbrand
0 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-16 11:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Mon, Jun 16, 2025 at 10:52:31AM +0200, David Hildenbrand wrote:
> Probably worth checking, to make sure we don't have accidental bugs in there
> ...
I did a quick sweep, and we should be cool since users of the node notifier
don't really use the *_CANCEL* actions. Only ADDED/REMOVED.
Now, users of the memory notifier are a different story.
E.g: page_ext will call offline_page_ext to mark the section->page_ext invalid.
online_page_ext does:
base = alloc_page_ext(table_size, nid);
section->page_ext = (void *)base - page_ext_size * pfn;
This is fine, I think, offline_page_ext will not mark it as INVALID because
section->page_ext is NULL, so we just skip it.
This is just one example. I checked some others like kasan and hyperv and they
seem fine.
And anyway, we could already hit this situation with MEM_* notifiers, so
nothing new.
I'll just make sure to document it so new users take this into account.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 11:45 ` Oscar Salvador
@ 2025-06-16 12:21 ` David Hildenbrand
2025-06-16 12:32 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-16 12:21 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 16.06.25 13:45, Oscar Salvador wrote:
> On Mon, Jun 16, 2025 at 10:52:31AM +0200, David Hildenbrand wrote:
>> Probably worth checking, to make sure we don't have accidental bugs in there
>> ...
>
> I did a quick sweep, and we should be cool since users of the node notifier
> don't really use the *_CANCEL* actions. Only ADDED/REMOVED.
>
> Now, users of the memory notifier are a different story.
> E.g: page_ext will call offline_page_ext to mark the section->page_ext invalid.
>
> online_page_ext does:
>
> base = alloc_page_ext(table_size, nid);
> section->page_ext = (void *)base - page_ext_size * pfn;
>
> This is fine, I think, offline_page_ext will not mark it as INVALID because
> section->page_ext is NULL, so we just skip it.
>
> This is just one example. I checked some others like kasan and hyperv and they
> seem fine.
> And anyway, we could already hit this situation with MEM_* notifiers, so
> nothing new.
Exactly. I recall I checked some of them in the past as well, when I
stumbled over this behavior.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 12:21 ` David Hildenbrand
@ 2025-06-16 12:32 ` Oscar Salvador
2025-06-16 12:35 ` David Hildenbrand
0 siblings, 1 reply; 28+ messages in thread
From: Oscar Salvador @ 2025-06-16 12:32 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Mon, Jun 16, 2025 at 02:21:02PM +0200, David Hildenbrand wrote:
> Exactly. I recall I checked some of them in the past as well, when I
> stumbled over this behavior.
Now, about simplifying the cancel_{mem,node}_notifier_on_err.
It would look like this:
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d6df85452c72..ff887f10b114 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1150,11 +1150,16 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
int online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
- bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
const int nid = zone_to_nid(zone);
int need_zonelists_rebuild = 0;
- struct memory_notify mem_arg;
- struct node_notify node_arg;
+ struct memory_notify mem_arg = {
+ .start_pfn = pfn,
+ .nr_pages = nr_pages,
+ .status_change_nid = NUMA_NO_NODE,
+ };
+ struct node_notify node_arg = {
+ .nid = NUMA_NO_NODE,
+ };
unsigned long flags;
int ret;
@@ -1173,21 +1178,16 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
/* associate pfn range with the zone */
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
- node_arg.nid = NUMA_NO_NODE;
if (!node_state(nid, N_MEMORY)) {
/* Adding memory to the node for the first time */
- cancel_node_notifier_on_err = true;
node_arg.nid = nid;
+ mem_arg.status_change_nid = nid;
ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
ret = notifier_to_errno(ret);
if (ret)
goto failed_addition;
}
- mem_arg.start_pfn = pfn;
- mem_arg.nr_pages = nr_pages;
- mem_arg.status_change_nid = node_arg.nid;
- cancel_mem_notifier_on_err = true;
ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
ret = notifier_to_errno(ret);
if (ret)
@@ -1249,9 +1249,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
(unsigned long long) pfn << PAGE_SHIFT,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
- if (cancel_mem_notifier_on_err)
- memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
- if (cancel_node_notifier_on_err)
+ memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
+ if (node_arg.nid != NUMA_NO_NODE)
node_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
remove_pfn_range_from_zone(zone, pfn, nr_pages);
return ret;
@@ -1899,13 +1898,18 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
- bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
unsigned long pfn, managed_pages, system_ram_pages = 0;
const unsigned long end_pfn = start_pfn + nr_pages;
struct pglist_data *pgdat = zone->zone_pgdat;
const int node = zone_to_nid(zone);
- struct memory_notify mem_arg;
- struct node_notify node_arg;
+ struct memory_notify mem_arg = {
+ .start_pfn = start_pfn,
+ .nr_pages = nr_pages,
+ .status_change_nid = NUMA_NO_NODE,
+ };
+ struct node_notify node_arg = {
+ .nid = NUMA_NO_NODE,
+ };
unsigned long flags;
char *reason;
int ret;
@@ -1970,20 +1974,15 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
* 'nr_pages' more. If so, we know that the node will become empty, and
* so we will clear N_MEMORY for it.
*/
- node_arg.nid = NUMA_NO_NODE;
if (nr_pages >= pgdat->node_present_pages) {
node_arg.nid = node;
- cancel_node_notifier_on_err = true;
+ mem_arg.status_change_nid = node;
ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
ret = notifier_to_errno(ret);
if (ret)
goto failed_removal_isolated;
}
- mem_arg.start_pfn = start_pfn;
- mem_arg.nr_pages = nr_pages;
- mem_arg.status_change_nid = node_arg.nid;
- cancel_mem_notifier_on_err = true;
ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
ret = notifier_to_errno(ret);
if (ret) {
@@ -2087,9 +2086,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
failed_removal_isolated:
/* pushback to free area */
undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
- if (cancel_mem_notifier_on_err)
- memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
- if (cancel_node_notifier_on_err)
+ memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
+ if (node_arg.nid != NUMA_NO_NODE)
node_notify(NODE_CANCEL_REMOVING_LAST_MEMORY, &node_arg);
failed_removal_pcplists_disabled:
lru_cache_enable();
Not sure if I like keeping the cancel_* stuff.
Strong opinion here? Feelings? :-)
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 12:32 ` Oscar Salvador
@ 2025-06-16 12:35 ` David Hildenbrand
2025-06-16 12:55 ` Oscar Salvador
0 siblings, 1 reply; 28+ messages in thread
From: David Hildenbrand @ 2025-06-16 12:35 UTC (permalink / raw)
To: Oscar Salvador
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On 16.06.25 14:32, Oscar Salvador wrote:
> On Mon, Jun 16, 2025 at 02:21:02PM +0200, David Hildenbrand wrote:
>> Exactly. I recall I checked some of them in the past as well, when I
>> stumbled over this behavior.
>
> Now, about simplifying the cancel_{mem,node}_notifier_on_err.
> It would look like this:
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index d6df85452c72..ff887f10b114 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1150,11 +1150,16 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
> int online_pages(unsigned long pfn, unsigned long nr_pages,
> struct zone *zone, struct memory_group *group)
> {
> - bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
> const int nid = zone_to_nid(zone);
> int need_zonelists_rebuild = 0;
> - struct memory_notify mem_arg;
> - struct node_notify node_arg;
> + struct memory_notify mem_arg = {
> + .start_pfn = pfn,
> + .nr_pages = nr_pages,
> + .status_change_nid = NUMA_NO_NODE,
> + };
> + struct node_notify node_arg = {
> + .nid = NUMA_NO_NODE,
> + };
> unsigned long flags;
> int ret;
>
> @@ -1173,21 +1178,16 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
> /* associate pfn range with the zone */
> move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
>
> - node_arg.nid = NUMA_NO_NODE;
> if (!node_state(nid, N_MEMORY)) {
> /* Adding memory to the node for the first time */
> - cancel_node_notifier_on_err = true;
> node_arg.nid = nid;
> + mem_arg.status_change_nid = nid;
> ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
> ret = notifier_to_errno(ret);
> if (ret)
> goto failed_addition;
> }
>
> - mem_arg.start_pfn = pfn;
> - mem_arg.nr_pages = nr_pages;
> - mem_arg.status_change_nid = node_arg.nid;
> - cancel_mem_notifier_on_err = true;
> ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> if (ret)
> @@ -1249,9 +1249,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
> pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
> (unsigned long long) pfn << PAGE_SHIFT,
> (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
> - if (cancel_mem_notifier_on_err)
> - memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
> - if (cancel_node_notifier_on_err)
> + memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
> + if (node_arg.nid != NUMA_NO_NODE)
> node_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
> remove_pfn_range_from_zone(zone, pfn, nr_pages);
> return ret;
> @@ -1899,13 +1898,18 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
> int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> struct zone *zone, struct memory_group *group)
> {
> - bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
> unsigned long pfn, managed_pages, system_ram_pages = 0;
> const unsigned long end_pfn = start_pfn + nr_pages;
> struct pglist_data *pgdat = zone->zone_pgdat;
> const int node = zone_to_nid(zone);
> - struct memory_notify mem_arg;
> - struct node_notify node_arg;
> + struct memory_notify mem_arg = {
> + .start_pfn = start_pfn,
> + .nr_pages = nr_pages,
> + .status_change_nid = NUMA_NO_NODE,
> + };
> + struct node_notify node_arg = {
> + .nid = NUMA_NO_NODE,
> + };
> unsigned long flags;
> char *reason;
> int ret;
> @@ -1970,20 +1974,15 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> * 'nr_pages' more. If so, we know that the node will become empty, and
> * so we will clear N_MEMORY for it.
> */
> - node_arg.nid = NUMA_NO_NODE;
> if (nr_pages >= pgdat->node_present_pages) {
> node_arg.nid = node;
> - cancel_node_notifier_on_err = true;
> + mem_arg.status_change_nid = node;
> ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
> ret = notifier_to_errno(ret);
> if (ret)
> goto failed_removal_isolated;
> }
>
> - mem_arg.start_pfn = start_pfn;
> - mem_arg.nr_pages = nr_pages;
> - mem_arg.status_change_nid = node_arg.nid;
> - cancel_mem_notifier_on_err = true;
> ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
> ret = notifier_to_errno(ret);
> if (ret) {
> @@ -2087,9 +2086,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> failed_removal_isolated:
> /* pushback to free area */
> undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> - if (cancel_mem_notifier_on_err)
> - memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
> - if (cancel_node_notifier_on_err)
> + memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
> + if (node_arg.nid != NUMA_NO_NODE)
> node_notify(NODE_CANCEL_REMOVING_LAST_MEMORY, &node_arg);
> failed_removal_pcplists_disabled:
> lru_cache_enable();
>
>
> Not sure if I like keeping the cancel_* stuff.
> Strong opinion here? Feelings? :-)
Looks cleaner to me at least :)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier
2025-06-16 12:35 ` David Hildenbrand
@ 2025-06-16 12:55 ` Oscar Salvador
0 siblings, 0 replies; 28+ messages in thread
From: Oscar Salvador @ 2025-06-16 12:55 UTC (permalink / raw)
To: David Hildenbrand
Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
Rakie Kim, Hyeonggon Yoo, Joshua Hahn, linux-mm, linux-kernel
On Mon, Jun 16, 2025 at 02:35:54PM +0200, David Hildenbrand wrote:
> Looks cleaner to me at least :)
Alright then, let's go that route :-)
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2025-06-16 12:55 UTC | newest]
Thread overview: 28+ messages
2025-06-09 9:21 [PATCH v6 00/10] Implement numa node notifier Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 01/10] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 02/10] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 03/10] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
2025-06-10 8:10 ` David Hildenbrand
2025-06-16 8:30 ` Oscar Salvador
2025-06-16 8:39 ` David Hildenbrand
2025-06-16 8:50 ` Oscar Salvador
2025-06-16 8:52 ` David Hildenbrand
2025-06-16 11:45 ` Oscar Salvador
2025-06-16 12:21 ` David Hildenbrand
2025-06-16 12:32 ` Oscar Salvador
2025-06-16 12:35 ` David Hildenbrand
2025-06-16 12:55 ` Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 04/10] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
2025-06-10 7:50 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 05/10] mm,memory-tiers: " Oscar Salvador
2025-06-10 7:51 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 06/10] drivers,cxl: " Oscar Salvador
2025-06-10 7:51 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 07/10] drivers,hmat: " Oscar Salvador
2025-06-10 7:52 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 08/10] kernel,cpuset: " Oscar Salvador
2025-06-09 9:21 ` [PATCH v6 09/10] mm,mempolicy: " Oscar Salvador
2025-06-10 7:52 ` David Hildenbrand
2025-06-09 9:21 ` [PATCH v6 10/10] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
2025-06-10 7:55 ` David Hildenbrand
2025-06-10 8:02 ` Oscar Salvador