* [PATCH v7 00/11] Implement numa node notifier
@ 2025-06-16 13:51 Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 01/11] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
                   ` (11 more replies)
  0 siblings, 12 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

 v6 -> v7:
   - Split previous patch#10 in two, one for page_ext
     and the other to drop status_change_nid (per David)
   - Implement feedback on simplifying previous cancel_on_*
     notifiers and better document the fact that consumers
     can get called on _CANCEL_* actions before having been called
     for previous actions (per David)
   - Add Acks-by

 v5 -> v6:
   - Remove redundant checks (per David)
   - Fix build failure
   - Drop 'nid' parameter from memory notify (Per David)
   - Add RB/ACKs-by

 v4 -> v5:
   - Split out conversion for different consumers (per David)
   - Renamed node-notifier actions (per David)
   - Added new Documentation for new node-notifier and updated
     the memory-notifier one to reflect the changes
   - Make sure we do not trigger anything when !CONFIG_NUMA (per David)

 v3 -> v4:
   - Fix typos pointed out by Alok Tiwari
   - Further cleanups suggested by Vlastimil
   - Add RBs-by from Vlastimil

 v2 -> v3:
   - Add Suggested-by (David)
   - Replace last N_NORMAL_MEMORY mention in slub (David)
   - Replace the notifier for autoweight-mempolicy
   - Fix build on !CONFIG_MEMORY_HOTPLUG
 
 v1 -> v2:
   - Remove status_change_nid_normal and the code that
     deals with it (David & Vlastimil)
   - Remove slab_mem_offline_callback (David & Vlastimil)
   - Change the order of canceling the notifiers
     in {online,offline}_pages (Vlastimil)
   - Fix up a couple of whitespaces (Jonathan Cameron)
   - Add RBs-by

Memory notifier is a tool that allows consumers to get notified whenever
memory gets onlined or offlined in the system.
Currently, there are 10 consumers of that, but 6 out of those 10 consumers
are only interested in getting notifications when a numa node changes its
memory state.
That means going from memoryless to having memory or vice versa.

As a result, for every {online,offline}_pages operation they get
notified even though the numa node might not have changed its state.
This is suboptimal, and we want to decouple numa node state changes from
memory state changes.
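
To make the difference concrete, here is a small userspace model (not
kernel code). All names in it (model_node, model_online, model_offline)
are hypothetical and exist only for illustration, and it simplifies by
counting a single notification per operation instead of the separate
GOING/CANCEL/done actions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_node {
	unsigned long present_pages;	/* pages currently online on the node */
	unsigned int mem_events;	/* what a memory-notifier consumer sees */
	unsigned int node_events;	/* what a node-notifier consumer sees */
};

/* Online nr_pages on the node and account for the notifications sent. */
static void model_online(struct model_node *n, unsigned long nr_pages)
{
	bool was_memoryless = (n->present_pages == 0);

	assert(nr_pages > 0);
	n->present_pages += nr_pages;
	n->mem_events++;		/* fires on every online operation */
	if (was_memoryless)
		n->node_events++;	/* fires only on memoryless -> has memory */
}

/* Offline nr_pages from the node and account for the notifications sent. */
static void model_offline(struct model_node *n, unsigned long nr_pages)
{
	assert(nr_pages > 0 && nr_pages <= n->present_pages);
	n->present_pages -= nr_pages;
	n->mem_events++;		/* fires on every offline operation */
	if (n->present_pages == 0)
		n->node_events++;	/* fires only on has memory -> memoryless */
}
```

In this model, onlining three blocks on an empty node and then offlining
all of it in two steps gives the memory-notifier consumer five callbacks,
while the node-notifier consumer is called only for the two state changes.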

While we are doing this, remove status_change_nid_normal, as the only
current user (slub) does not really need it.
This allows us to further simplify and clean up the code.

The first two patches get rid of status_change_nid_normal, first in slub and
then in memory_notify itself.
The third patch implements a numa node notifier, and patches 4 to 9 have
those consumers register with it, so they get notified only about the events
they are interested in.

The last two patches make page_ext, the only remaining user, derive the node
from the pfn, and then drop 'status_change_nid' from memory_notify, as the
memory notifier no longer needs it.

Consumers that are only interested in numa node state changes are:

 - memory-tier
 - slub
 - cpuset
 - hmat
 - cxl
 - autoweight-mempolicy

Oscar Salvador (11):
  mm,slub: Do not special case N_NORMAL nodes for slab_nodes
  mm,memory_hotplug: Remove status_change_nid_normal and update
    documentation
  mm,memory_hotplug: Implement numa node notifier
  mm,slub: Use node-notifier instead of memory-notifier
  mm,memory-tiers: Use node-notifier instead of memory-notifier
  drivers,cxl: Use node-notifier instead of memory-notifier
  drivers,hmat: Use node-notifier instead of memory-notifier
  kernel,cpuset: Use node-notifier instead of memory-notifier
  mm,mempolicy: Use node-notifier instead of memory-notifier
  mm,page_ext: Derive the node from the pfn
  mm,memory_hotplug: Drop status_change_nid parameter from memory_notify

 Documentation/core-api/memory-hotplug.rst     |  91 +++++++++--
 .../zh_CN/core-api/memory-hotplug.rst         |   3 -
 drivers/acpi/numa/hmat.c                      |   8 +-
 drivers/base/node.c                           |  21 +++
 drivers/cxl/core/region.c                     |  16 +-
 drivers/cxl/cxl.h                             |   4 +-
 include/linux/memory.h                        |   2 -
 include/linux/node.h                          |  40 +++++
 kernel/cgroup/cpuset.c                        |   2 +-
 mm/memory-tiers.c                             |  19 +--
 mm/memory_hotplug.c                           | 152 +++++++-----------
 mm/mempolicy.c                                |  13 +-
 mm/page_ext.c                                 |  17 +-
 mm/slub.c                                     |  60 ++-----
 14 files changed, 243 insertions(+), 205 deletions(-)

-- 
2.49.0



* [PATCH v7 01/11] mm,slub: Do not special case N_NORMAL nodes for slab_nodes
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 02/11] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

Currently, slab_mem_going_online_callback() checks whether the node has
N_NORMAL memory in order to be set in slab_nodes.
While it is true that dropping that check would mean
ending up with movable nodes in slab_nodes, the memory waste that comes
with that is negligible.

So stop checking for status_change_nid_normal and just use status_change_nid
instead, which works for both types of memory.

Also, once we allocate the kmem_cache_node cache for the node in
slab_mem_going_online_callback(), we never deallocate it in
slab_mem_offline_callback() when the node goes memoryless, so we can just
get rid of slab_mem_offline_callback().

The side effects are that we will stop clearing the node from slab_nodes,
and also that newly created kmem caches after node hotremove will now allocate
their kmem_cache_node for the node(s) that were hotremoved, but these
should be negligible.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/slub.c | 34 +++-------------------------------
 1 file changed, 3 insertions(+), 31 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index be8b09e09d30..f92b43d36adc 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -447,7 +447,7 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
 
 /*
  * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
- * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
+ * Corresponds to node_state[N_MEMORY], but can temporarily
  * differ during memory hotplug/hotremove operations.
  * Protected by slab_mutex.
  */
@@ -6160,36 +6160,12 @@ static int slab_mem_going_offline_callback(void *arg)
 	return 0;
 }
 
-static void slab_mem_offline_callback(void *arg)
-{
-	struct memory_notify *marg = arg;
-	int offline_node;
-
-	offline_node = marg->status_change_nid_normal;
-
-	/*
-	 * If the node still has available memory. we need kmem_cache_node
-	 * for it yet.
-	 */
-	if (offline_node < 0)
-		return;
-
-	mutex_lock(&slab_mutex);
-	node_clear(offline_node, slab_nodes);
-	/*
-	 * We no longer free kmem_cache_node structures here, as it would be
-	 * racy with all get_node() users, and infeasible to protect them with
-	 * slab_mutex.
-	 */
-	mutex_unlock(&slab_mutex);
-}
-
 static int slab_mem_going_online_callback(void *arg)
 {
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
 	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid_normal;
+	int nid = marg->status_change_nid;
 	int ret = 0;
 
 	/*
@@ -6247,10 +6223,6 @@ static int slab_memory_callback(struct notifier_block *self,
 	case MEM_GOING_OFFLINE:
 		ret = slab_mem_going_offline_callback(arg);
 		break;
-	case MEM_OFFLINE:
-	case MEM_CANCEL_ONLINE:
-		slab_mem_offline_callback(arg);
-		break;
 	case MEM_ONLINE:
 	case MEM_CANCEL_OFFLINE:
 		break;
@@ -6321,7 +6293,7 @@ void __init kmem_cache_init(void)
 	 * Initialize the nodemask for which we will allocate per node
 	 * structures. Here we don't need taking slab_mutex yet.
 	 */
-	for_each_node_state(node, N_NORMAL_MEMORY)
+	for_each_node_state(node, N_MEMORY)
 		node_set(node, slab_nodes);
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-- 
2.49.0



* [PATCH v7 02/11] mm,memory_hotplug: Remove status_change_nid_normal and update documentation
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 01/11] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

Now that the last user of status_change_nid_normal is gone, we can remove it.
Update documentation accordingly.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 Documentation/core-api/memory-hotplug.rst            |  3 ---
 .../translations/zh_CN/core-api/memory-hotplug.rst   |  3 ---
 include/linux/memory.h                               |  1 -
 mm/memory_hotplug.c                                  | 12 ------------
 4 files changed, 19 deletions(-)

diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index 682259ee633a..d1b8eb9add8a 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -56,14 +56,11 @@ The third argument (arg) passes a pointer of struct memory_notify::
 	struct memory_notify {
 		unsigned long start_pfn;
 		unsigned long nr_pages;
-		int status_change_nid_normal;
 		int status_change_nid;
 	}
 
 - start_pfn is start_pfn of online/offline memory.
 - nr_pages is # of pages of online/offline memory.
-- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
-  is (will be) set/clear, if this is -1, then nodemask status is not changed.
 - status_change_nid is set node id when N_MEMORY of nodemask is (will be)
   set/clear. It means a new(memoryless) node gets new memory by online and a
   node loses all memory. If this is -1, then nodemask status is not changed.
diff --git a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
index 9b2841fb9a5f..c2a4122ae221 100644
--- a/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
+++ b/Documentation/translations/zh_CN/core-api/memory-hotplug.rst
@@ -62,7 +62,6 @@ memory_notify结构体的指针::
 	struct memory_notify {
 		unsigned long start_pfn;
 		unsigned long nr_pages;
-		int status_change_nid_normal;
 		int status_change_nid;
 	}
 
@@ -70,8 +69,6 @@ memory_notify结构体的指针::
 
 - nr_pages是在线/离线内存的页数。
 
-- status_change_nid_normal是当nodemask的N_NORMAL_MEMORY被设置/清除时设置节
-  点id,如果是-1,则nodemask状态不改变。
 
 - status_change_nid是当nodemask的N_MEMORY被(将)设置/清除时设置的节点id。这
   意味着一个新的(没上线的)节点通过联机获得新的内存,而一个节点失去了所有的内
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 5ec4e6d209b9..a9ccd6579422 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -109,7 +109,6 @@ struct memory_notify {
 	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
-	int status_change_nid_normal;
 	int status_change_nid;
 };
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b1caedbade5b..94ae0ca37021 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -706,19 +706,13 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 	int nid = zone_to_nid(zone);
 
 	arg->status_change_nid = NUMA_NO_NODE;
-	arg->status_change_nid_normal = NUMA_NO_NODE;
 
 	if (!node_state(nid, N_MEMORY))
 		arg->status_change_nid = nid;
-	if (zone_idx(zone) <= ZONE_NORMAL && !node_state(nid, N_NORMAL_MEMORY))
-		arg->status_change_nid_normal = nid;
 }
 
 static void node_states_set_node(int node, struct memory_notify *arg)
 {
-	if (arg->status_change_nid_normal >= 0)
-		node_set_state(node, N_NORMAL_MEMORY);
-
 	if (arg->status_change_nid >= 0)
 		node_set_state(node, N_MEMORY);
 }
@@ -1895,7 +1889,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 	enum zone_type zt;
 
 	arg->status_change_nid = NUMA_NO_NODE;
-	arg->status_change_nid_normal = NUMA_NO_NODE;
 
 	/*
 	 * Check whether node_states[N_NORMAL_MEMORY] will be changed.
@@ -1907,8 +1900,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 	 */
 	for (zt = 0; zt <= ZONE_NORMAL; zt++)
 		present_pages += pgdat->node_zones[zt].present_pages;
-	if (zone_idx(zone) <= ZONE_NORMAL && nr_pages >= present_pages)
-		arg->status_change_nid_normal = zone_to_nid(zone);
 
 	/*
 	 * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
@@ -1927,9 +1918,6 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 
 static void node_states_clear_node(int node, struct memory_notify *arg)
 {
-	if (arg->status_change_nid_normal >= 0)
-		node_clear_state(node, N_NORMAL_MEMORY);
-
 	if (arg->status_change_nid >= 0)
 		node_clear_state(node, N_MEMORY);
 }
-- 
2.49.0



* [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 01/11] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 02/11] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 14:31   ` David Hildenbrand
  2025-06-16 13:51 ` [PATCH v7 04/11] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

There are at least six consumers of hotplug_memory_notifier that are really
only interested in whether a numa node changed its state, e.g. going
from having memory to not having memory and vice versa.

Implement a specific notifier for numa nodes when their state gets changed,
which will later be used by those consumers that are only interested
in numa node state changes.

Add documentation as well.
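
As an illustration of the _CANCEL_ semantics documented here, the
following userspace model (not kernel code) sketches a chain in which one
consumer fails. The two action values are copied from this patch;
everything prefixed with model_ is hypothetical, and a plain non-zero
return stands in for NOTIFY_BAD plus notifier_to_errno():

```c
#include <assert.h>
#include <stddef.h>

/* Action values mirror the ones added to include/linux/node.h by this patch. */
#define NODE_ADDING_FIRST_MEMORY		(1 << 0)
#define NODE_CANCEL_ADDING_FIRST_MEMORY		(1 << 2)

/* Hypothetical userspace stand-in for a notifier chain entry. */
struct model_consumer {
	int (*fn)(struct model_consumer *self, unsigned long action, int nid);
	int prepared;	/* set once NODE_ADDING_FIRST_MEMORY was handled */
	int undone;	/* counts how many times we actually rolled back */
};

/* Call consumers in order; stop at the first one returning non-zero. */
static int model_call_chain(struct model_consumer **chain, size_t n,
			    unsigned long action, int nid)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		assert(chain[i] && chain[i]->fn);
		ret = chain[i]->fn(chain[i], action, nid);
		if (ret)
			return ret;
	}
	return 0;
}

/* A consumer that tolerates a CANCEL it never saw the ADDING for. */
static int model_good_cb(struct model_consumer *self, unsigned long action,
			 int nid)
{
	(void)nid;
	switch (action) {
	case NODE_ADDING_FIRST_MEMORY:
		self->prepared = 1;
		break;
	case NODE_CANCEL_ADDING_FIRST_MEMORY:
		if (self->prepared) {	/* only undo what was actually set up */
			self->prepared = 0;
			self->undone++;
		}
		break;
	}
	return 0;
}

/* A consumer that refuses the ADDING action, breaking the call chain. */
static int model_failing_cb(struct model_consumer *self, unsigned long action,
			    int nid)
{
	(void)self;
	(void)nid;
	return action == NODE_ADDING_FIRST_MEMORY ? -1 : 0;
}
```

With a chain of { good, failing, good }, the last consumer is never called
for NODE_ADDING_FIRST_MEMORY but still receives the cancel action, which is
why the callback above checks its own state before undoing anything.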

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/core-api/memory-hotplug.rst |  83 +++++++++++++
 drivers/base/node.c                       |  21 ++++
 include/linux/node.h                      |  40 ++++++
 mm/memory_hotplug.c                       | 144 ++++++++++------------
 4 files changed, 208 insertions(+), 80 deletions(-)

diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index d1b8eb9add8a..fb84e78968b2 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -9,6 +9,9 @@ Memory hotplug event notifier
 
 Hotplugging events are sent to a notification queue.
 
+Memory notifier
+----------------
+
 There are six types of notification defined in ``include/linux/memory.h``:
 
 MEM_GOING_ONLINE
@@ -68,6 +71,14 @@ The third argument (arg) passes a pointer of struct memory_notify::
   If status_changed_nid* >= 0, callback should create/discard structures for the
   node if necessary.
 
+It is possible to get notified for MEM_CANCEL_ONLINE without having been notified
+for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
+MEM_GOING_OFFLINE.
+This can happen when a consumer fails, meaning we break the callchain and we
+stop calling the remaining consumers of the notifier.
+It is then important that users of memory_notify make no assumptions and get
+prepared to handle such cases.
+
 The callback routine shall return one of the values
 NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
 defined in ``include/linux/notifier.h``
@@ -80,6 +91,78 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+Numa node notifier
+------------------
+
+There are six types of notification defined in ``include/linux/node.h``:
+
+NODE_ADDING_FIRST_MEMORY
+ Generated before memory becomes available to this node for the first time.
+
+NODE_CANCEL_ADDING_FIRST_MEMORY
+ Generated if NODE_ADDING_FIRST_MEMORY fails.
+
+NODE_ADDED_FIRST_MEMORY
+ Generated when memory has become available for this node for the first time.
+
+NODE_REMOVING_LAST_MEMORY
+ Generated when the last memory available to this node is about to be offlined.
+
+NODE_CANCEL_REMOVING_LAST_MEMORY
+ Generated when NODE_REMOVING_LAST_MEMORY fails.
+
+NODE_REMOVED_LAST_MEMORY
+ Generated when the last memory available to this node has been offlined.
+
+A callback routine can be registered by calling::
+
+  hotplug_node_notifier(callback_func, priority)
+
+Callback functions with higher values of priority are called before callback
+functions with lower values.
+
+A callback function must have the following prototype::
+
+  int callback_func(
+
+    struct notifier_block *self, unsigned long action, void *arg);
+
+The first argument of the callback function (self) is a pointer to the block
+of the notifier chain that points to the callback function itself.
+The second argument (action) is one of the event types described above.
+The third argument (arg) passes a pointer of struct node_notify::
+
+        struct node_notify {
+                int nid;
+        }
+
+- nid is the node we are adding or removing memory to.
+
+It is possible to get notified for NODE_CANCEL_ADDING_FIRST_MEMORY without
+having been notified for NODE_ADDING_FIRST_MEMORY, and the same applies to
+NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
+This can happen when a consumer fails, meaning we break the callchain and we
+stop calling the remaining consumers of the notifier.
+It is then important that users of node_notify make no assumptions and get
+prepared to handle such cases.
+
+The callback routine shall return one of the values
+NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
+defined in ``include/linux/notifier.h``
+
+NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
+
+NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY,
+NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
+NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
+It stops further processing of the notification queue.
+
+NOTIFY_STOP stops further processing of the notification queue.
+
+Please note that we should not fail for NODE_ADDED_FIRST_MEMORY /
+NODE_REMOVED_LAST_MEMORY, as memory_hotplug code cannot roll back at that
+point anymore.
+
 Locking Internals
 =================
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 25ab9ec14eb8..c5b0859d846d 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -111,6 +111,27 @@ static const struct attribute_group *node_access_node_groups[] = {
 	NULL,
 };
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&node_chain, val, v);
+}
+#endif
+
 static void node_remove_accesses(struct node *node)
 {
 	struct node_access_nodes *c, *cnext;
diff --git a/include/linux/node.h b/include/linux/node.h
index 2b7517892230..d7aa2636d948 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -123,6 +123,46 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
 #endif
 
 extern void unregister_node(struct node *node);
+
+struct node_notify {
+	int nid;
+};
+
+#define NODE_ADDING_FIRST_MEMORY                (1<<0)
+#define NODE_ADDED_FIRST_MEMORY                 (1<<1)
+#define NODE_CANCEL_ADDING_FIRST_MEMORY         (1<<2)
+#define NODE_REMOVING_LAST_MEMORY               (1<<3)
+#define NODE_REMOVED_LAST_MEMORY                (1<<4)
+#define NODE_CANCEL_REMOVING_LAST_MEMORY        (1<<5)
+
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
+extern int register_node_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
+extern int node_notify(unsigned long val, void *v);
+
+#define hotplug_node_notifier(fn, pri) ({		\
+	static __meminitdata struct notifier_block fn##_node_nb =\
+		{ .notifier_call = fn, .priority = pri };\
+	register_node_notifier(&fn##_node_nb);			\
+})
+#else
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+	return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_NUMA
 extern void node_dev_init(void);
 /* Core of the node registration - only memory hotplug should use this */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 94ae0ca37021..e8ccfe4cada2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -35,6 +35,7 @@
 #include <linux/compaction.h>
 #include <linux/rmap.h>
 #include <linux/module.h>
+#include <linux/node.h>
 
 #include <asm/tlbflush.h>
 
@@ -699,24 +700,6 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	online_mem_sections(start_pfn, end_pfn);
 }
 
-/* check which state of node_states will be changed when online memory */
-static void node_states_check_changes_online(unsigned long nr_pages,
-	struct zone *zone, struct memory_notify *arg)
-{
-	int nid = zone_to_nid(zone);
-
-	arg->status_change_nid = NUMA_NO_NODE;
-
-	if (!node_state(nid, N_MEMORY))
-		arg->status_change_nid = nid;
-}
-
-static void node_states_set_node(int node, struct memory_notify *arg)
-{
-	if (arg->status_change_nid >= 0)
-		node_set_state(node, N_MEMORY);
-}
-
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
@@ -1167,11 +1150,18 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
 int online_pages(unsigned long pfn, unsigned long nr_pages,
 		       struct zone *zone, struct memory_group *group)
 {
-	unsigned long flags;
-	int need_zonelists_rebuild = 0;
+	struct memory_notify mem_arg = {
+		.start_pfn = pfn,
+		.nr_pages = nr_pages,
+		.status_change_nid = NUMA_NO_NODE,
+	};
+	struct node_notify node_arg = {
+		.nid = NUMA_NO_NODE,
+	};
 	const int nid = zone_to_nid(zone);
+	int need_zonelists_rebuild = 0;
+	unsigned long flags;
 	int ret;
-	struct memory_notify arg;
 
 	/*
 	 * {on,off}lining is constrained to full memory sections (or more
@@ -1188,11 +1178,17 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
-	arg.start_pfn = pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_online(nr_pages, zone, &arg);
+	if (!node_state(nid, N_MEMORY)) {
+		/* Adding memory to the node for the first time */
+		node_arg.nid = nid;
+		mem_arg.status_change_nid = nid;
+		ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
+	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
 		goto failed_addition;
@@ -1218,7 +1214,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	online_pages_range(pfn, nr_pages);
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
-	node_states_set_node(nid, &arg);
+	if (node_arg.nid >= 0)
+		node_set_state(nid, N_MEMORY);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
 
@@ -1239,16 +1236,22 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	kswapd_run(nid);
 	kcompactd_run(nid);
 
+	if (node_arg.nid >= 0)
+		/* First memory added successfully. Notify consumers. */
+		node_notify(NODE_ADDED_FIRST_MEMORY, &node_arg);
+
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	memory_notify(MEM_ONLINE, &mem_arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
+	if (node_arg.nid != NUMA_NO_NODE)
+		node_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
@@ -1880,48 +1883,6 @@ static int __init cmdline_parse_movable_node(char *p)
 }
 early_param("movable_node", cmdline_parse_movable_node);
 
-/* check which state of node_states will be changed when offline memory */
-static void node_states_check_changes_offline(unsigned long nr_pages,
-		struct zone *zone, struct memory_notify *arg)
-{
-	struct pglist_data *pgdat = zone->zone_pgdat;
-	unsigned long present_pages = 0;
-	enum zone_type zt;
-
-	arg->status_change_nid = NUMA_NO_NODE;
-
-	/*
-	 * Check whether node_states[N_NORMAL_MEMORY] will be changed.
-	 * If the memory to be offline is within the range
-	 * [0..ZONE_NORMAL], and it is the last present memory there,
-	 * the zones in that range will become empty after the offlining,
-	 * thus we can determine that we need to clear the node from
-	 * node_states[N_NORMAL_MEMORY].
-	 */
-	for (zt = 0; zt <= ZONE_NORMAL; zt++)
-		present_pages += pgdat->node_zones[zt].present_pages;
-
-	/*
-	 * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
-	 * does not apply as we don't support 32bit.
-	 * Here we count the possible pages from ZONE_MOVABLE.
-	 * If after having accounted all the pages, we see that the nr_pages
-	 * to be offlined is over or equal to the accounted pages,
-	 * we know that the node will become empty, and so, we can clear
-	 * it for N_MEMORY as well.
-	 */
-	present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages;
-
-	if (nr_pages >= present_pages)
-		arg->status_change_nid = zone_to_nid(zone);
-}
-
-static void node_states_clear_node(int node, struct memory_notify *arg)
-{
-	if (arg->status_change_nid >= 0)
-		node_clear_state(node, N_MEMORY);
-}
-
 static int count_system_ram_pages_cb(unsigned long start_pfn,
 				     unsigned long nr_pages, void *data)
 {
@@ -1937,11 +1898,19 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			struct zone *zone, struct memory_group *group)
 {
-	const unsigned long end_pfn = start_pfn + nr_pages;
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
+	const unsigned long end_pfn = start_pfn + nr_pages;
+	struct pglist_data *pgdat = zone->zone_pgdat;
 	const int node = zone_to_nid(zone);
+	struct memory_notify mem_arg = {
+		.start_pfn = start_pfn,
+		.nr_pages = nr_pages,
+		.status_change_nid = NUMA_NO_NODE,
+	};
+	struct node_notify node_arg = {
+		.nid = NUMA_NO_NODE,
+	};
 	unsigned long flags;
-	struct memory_notify arg;
 	char *reason;
 	int ret;
 
@@ -2000,11 +1969,21 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		goto failed_removal_pcplists_disabled;
 	}
 
-	arg.start_pfn = start_pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_offline(nr_pages, zone, &arg);
+	/*
+	 * Check whether the node will have no present pages after we offline
+	 * 'nr_pages' more. If so, we know that the node will become empty, and
+	 * so we will clear N_MEMORY for it.
+	 */
+	if (nr_pages >= pgdat->node_present_pages) {
+		node_arg.nid = node;
+		mem_arg.status_change_nid = node;
+		ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_removal_isolated;
+	}
 
-	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		reason = "notifier failure";
@@ -2084,27 +2063,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * Make sure to mark the node as memory-less before rebuilding the zone
 	 * list. Otherwise this node would still appear in the fallback lists.
 	 */
-	node_states_clear_node(node, &arg);
+	if (node_arg.nid >= 0)
+		node_clear_state(node, N_MEMORY);
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	}
 
-	if (arg.status_change_nid >= 0) {
+	if (node_arg.nid >= 0) {
 		kcompactd_stop(node);
 		kswapd_stop(node);
+		/* Node went memoryless. Notify consumers */
+		node_notify(NODE_REMOVED_LAST_MEMORY, &node_arg);
 	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_OFFLINE, &arg);
+	memory_notify(MEM_OFFLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	return 0;
 
 failed_removal_isolated:
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
-	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+	memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
+	if (node_arg.nid != NUMA_NO_NODE)
+		node_notify(NODE_CANCEL_REMOVING_LAST_MEMORY, &node_arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
 	zone_pcp_enable(zone);
-- 
2.49.0



* [PATCH v7 04/11] mm,slub: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (2 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 05/11] mm,memory-tiers: " Oscar Salvador
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

slub only cares about a numa node changing its memory state,
so stop using the memory notifier and use the new numa node notifier
instead.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/slub.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f92b43d36adc..3ff0b94f3eeb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6146,7 +6146,7 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 	return __kmem_cache_do_shrink(s);
 }
 
-static int slab_mem_going_offline_callback(void *arg)
+static int slab_mem_going_offline_callback(void)
 {
 	struct kmem_cache *s;
 
@@ -6160,21 +6160,12 @@ static int slab_mem_going_offline_callback(void *arg)
 	return 0;
 }
 
-static int slab_mem_going_online_callback(void *arg)
+static int slab_mem_going_online_callback(int nid)
 {
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
-	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid;
 	int ret = 0;
 
-	/*
-	 * If the node's memory is already available, then kmem_cache_node is
-	 * already created. Nothing to do.
-	 */
-	if (nid < 0)
-		return 0;
-
 	/*
 	 * We are bringing a node online. No memory is available yet. We must
 	 * allocate a kmem_cache_node structure in order to bring the node
@@ -6214,17 +6205,16 @@ static int slab_mem_going_online_callback(void *arg)
 static int slab_memory_callback(struct notifier_block *self,
 				unsigned long action, void *arg)
 {
+	struct node_notify *nn = arg;
+	int nid = nn->nid;
 	int ret = 0;
 
 	switch (action) {
-	case MEM_GOING_ONLINE:
-		ret = slab_mem_going_online_callback(arg);
-		break;
-	case MEM_GOING_OFFLINE:
-		ret = slab_mem_going_offline_callback(arg);
+	case NODE_ADDING_FIRST_MEMORY:
+		ret = slab_mem_going_online_callback(nid);
 		break;
-	case MEM_ONLINE:
-	case MEM_CANCEL_OFFLINE:
+	case NODE_REMOVING_LAST_MEMORY:
+		ret = slab_mem_going_offline_callback();
 		break;
 	}
 	if (ret)
@@ -6300,7 +6290,7 @@ void __init kmem_cache_init(void)
 			sizeof(struct kmem_cache_node),
 			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
-	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
 	/* Able to allocate the per node structures */
 	slab_state = PARTIAL;
-- 
2.49.0



* [PATCH v7 05/11] mm,memory-tiers: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (3 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 04/11] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 06/11] drivers,cxl: " Oscar Salvador
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

memory-tier is only concerned when a numa node changes its memory state,
because it then needs to re-create the demotion list.
So stop using the memory notifier and use the new numa node notifier
instead.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/memory-tiers.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..0382b6942b8b 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,25 +872,18 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 					      unsigned long action, void *_arg)
 {
 	struct memory_tier *memtier;
-	struct memory_notify *arg = _arg;
-
-	/*
-	 * Only update the node migration order when a node is
-	 * changing status, like online->offline.
-	 */
-	if (arg->status_change_nid < 0)
-		return notifier_from_errno(0);
+	struct node_notify *nn = _arg;
 
 	switch (action) {
-	case MEM_OFFLINE:
+	case NODE_REMOVED_LAST_MEMORY:
 		mutex_lock(&memory_tier_lock);
-		if (clear_node_memory_tier(arg->status_change_nid))
+		if (clear_node_memory_tier(nn->nid))
 			establish_demotion_targets();
 		mutex_unlock(&memory_tier_lock);
 		break;
-	case MEM_ONLINE:
+	case NODE_ADDED_FIRST_MEMORY:
 		mutex_lock(&memory_tier_lock);
-		memtier = set_node_memory_tier(arg->status_change_nid);
+		memtier = set_node_memory_tier(nn->nid);
 		if (!IS_ERR(memtier))
 			establish_demotion_targets();
 		mutex_unlock(&memory_tier_lock);
@@ -929,7 +922,7 @@ static int __init memory_tier_init(void)
 	nodes_and(default_dram_nodes, node_states[N_MEMORY],
 		  node_states[N_CPU]);
 
-	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
 	return 0;
 }
 subsys_initcall(memory_tier_init);
-- 
2.49.0



* [PATCH v7 06/11] drivers,cxl: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (4 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 05/11] mm,memory-tiers: " Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 07/11] drivers,hmat: " Oscar Salvador
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

memory-tier is only concerned when a numa node changes its memory state,
specifically when a numa node with memory comes into play for the first
time, because it needs to get its performance attributes to build a proper
demotion chain.
So stop using the memory notifier and use the new numa node notifier
instead.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 drivers/cxl/core/region.c | 16 ++++++++--------
 drivers/cxl/cxl.h         |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..261e07302ca4 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2432,12 +2432,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 					  unsigned long action, void *arg)
 {
 	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
-					       memory_notifier);
-	struct memory_notify *mnb = arg;
-	int nid = mnb->status_change_nid;
+					       node_notifier);
+	struct node_notify *nn = arg;
+	int nid = nn->nid;
 	int region_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (action != NODE_ADDED_FIRST_MEMORY)
 		return NOTIFY_DONE;
 
 	/*
@@ -3484,7 +3484,7 @@ static void shutdown_notifiers(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 
-	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_node_notifier(&cxlr->node_notifier);
 	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 }
 
@@ -3523,9 +3523,9 @@ static int cxl_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
+	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
+	register_node_notifier(&cxlr->node_notifier);
 
 	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
 	cxlr->adist_notifier.priority = 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a9ab46eb0610..48ac02dee881 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -513,7 +513,7 @@ enum cxl_partition_mode {
  * @flags: Region state flags
  * @params: active + config params for the region
  * @coord: QoS access coordinates for the region
- * @memory_notifier: notifier for setting the access coordinates to node
+ * @node_notifier: notifier for setting the access coordinates to node
  * @adist_notifier: notifier for calculating the abstract distance of node
  */
 struct cxl_region {
@@ -526,7 +526,7 @@ struct cxl_region {
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
-	struct notifier_block memory_notifier;
+	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
 };
 
-- 
2.49.0



* [PATCH v7 07/11] drivers,hmat: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (5 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 06/11] drivers,cxl: " Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 08/11] kernel,cpuset: " Oscar Salvador
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

The hmat driver is only concerned when a numa node changes its memory state,
specifically when a numa node with memory comes into play for the first
time, because it will register the memory_targets belonging to that numa
node.
So stop using the memory notifier and use the new numa node notifier
instead.
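
To make the simplified filter easier to follow, here is a small
userspace sketch comparing the old and the new check. The struct
memory_notify_old type, the enum values and both helper functions are
hypothetical stand-ins for illustration only. It relies on the
assumption, taken from the {online,offline}_pages changes in this
series, that node notifications are only sent with a valid nid.

```c
#define NUMA_NO_NODE (-1)

/* Illustrative values; the real ones are defined in the kernel headers. */
enum { MEM_ONLINE = 1 };
enum { NODE_ADDED_FIRST_MEMORY = 16 };

/* Stand-in for the old memory_notify field that this series removes. */
struct memory_notify_old {
	int status_change_nid;
};

struct node_notify {
	int nid;
};

/* Old filter: every online event arrives, so the nid must be checked. */
static int old_filter_accepts(unsigned long action,
			      const struct memory_notify_old *mn)
{
	if (mn->status_change_nid == NUMA_NO_NODE || action != MEM_ONLINE)
		return 0;
	return 1;
}

/* New filter: the action alone already implies a valid nid. */
static int new_filter_accepts(unsigned long action,
			      const struct node_notify *nn)
{
	(void)nn;
	return action == NODE_ADDED_FIRST_MEMORY;
}
```

The NUMA_NO_NODE test disappears because a callback registered on the
node notifier is not invoked at all for plain memory online/offline
events that leave the node state unchanged.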

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 drivers/acpi/numa/hmat.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 9d9052258e92..4958301f5417 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -962,10 +962,10 @@ static int hmat_callback(struct notifier_block *self,
 			 unsigned long action, void *arg)
 {
 	struct memory_target *target;
-	struct memory_notify *mnb = arg;
-	int pxm, nid = mnb->status_change_nid;
+	struct node_notify *nn = arg;
+	int pxm, nid = nn->nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (action != NODE_ADDED_FIRST_MEMORY)
 		return NOTIFY_OK;
 
 	pxm = node_to_pxm(nid);
@@ -1118,7 +1118,7 @@ static __init int hmat_init(void)
 	hmat_register_targets();
 
 	/* Keep the table and structures if the notifier may use them */
-	if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
+	if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
 		goto out_put;
 
 	if (!hmat_set_default_dram_perf())
-- 
2.49.0



* [PATCH v7 08/11] kernel,cpuset: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (6 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 07/11] drivers,hmat: " Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 09/11] mm,mempolicy: " Oscar Salvador
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

cpuset is only concerned when a numa node changes its memory state,
as it needs to know the current numa nodes with memory to keep
an updated mems_allowed mask.
So stop using the memory notifier and use the new numa node notifier
instead.
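
For illustration only, this is how a consumer could keep a "nodes with
memory" mask in sync from node notifications. It is a userspace sketch
and not what cpuset_track_online_nodes() does internally:
demo_mems_mask, demo_track_nodes() and the enum values are
hypothetical; only the action names and the 'nid' field come from this
series.

```c
/* Illustrative values; the real ones are defined in the kernel headers. */
enum {
	NODE_ADDED_FIRST_MEMORY  = 1,
	NODE_REMOVED_LAST_MEMORY = 2,
};

struct node_notify {
	int nid;
};

/* Hypothetical mask with one bit per node that currently has memory. */
static unsigned long demo_mems_mask;

static void demo_track_nodes(unsigned long action,
			     const struct node_notify *nn)
{
	/* Ignore node ids that do not fit in the demo mask. */
	if (nn->nid < 0 || nn->nid >= (int)(8 * sizeof(demo_mems_mask)))
		return;

	switch (action) {
	case NODE_ADDED_FIRST_MEMORY:
		demo_mems_mask |= 1UL << nn->nid;
		break;
	case NODE_REMOVED_LAST_MEMORY:
		demo_mems_mask &= ~(1UL << nn->nid);
		break;
	}
}
```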

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 kernel/cgroup/cpuset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83639a12883d..66c84024f217 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -4013,7 +4013,7 @@ void __init cpuset_init_smp(void)
 	cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
 	top_cpuset.effective_mems = node_states[N_MEMORY];
 
-	hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
+	hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
 
 	cpuset_migrate_mm_wq = alloc_ordered_workqueue("cpuset_migrate_mm", 0);
 	BUG_ON(!cpuset_migrate_mm_wq);
-- 
2.49.0



* [PATCH v7 09/11] mm,mempolicy: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (7 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 08/11] kernel,cpuset: " Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-17  0:14   ` Gregory Price
  2025-06-16 13:51 ` [PATCH v7 10/11] mm,page_ext: Derive the node from the pfn Oscar Salvador
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

mempolicy is only concerned when a numa node changes its memory state,
because it needs to take this node into account for the auto-weighted
memory policy system.
So stop using the memory notifier and use the new numa node notifier
instead.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Rakie Kim <rakie.kim@sk.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/mempolicy.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 72fd72e156b1..693319c2652d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3793,20 +3793,17 @@ static int wi_node_notifier(struct notifier_block *nb,
 			       unsigned long action, void *data)
 {
 	int err;
-	struct memory_notify *arg = data;
-	int nid = arg->status_change_nid;
-
-	if (nid < 0)
-		return NOTIFY_OK;
+	struct node_notify *nn = data;
+	int nid = nn->nid;
 
 	switch (action) {
-	case MEM_ONLINE:
+	case NODE_ADDED_FIRST_MEMORY:
 		err = sysfs_wi_node_add(nid);
 		if (err)
 			pr_err("failed to add sysfs for node%d during hotplug: %d\n",
 			       nid, err);
 		break;
-	case MEM_OFFLINE:
+	case NODE_REMOVED_LAST_MEMORY:
 		sysfs_wi_node_delete(nid);
 		break;
 	}
@@ -3845,7 +3842,7 @@ static int __init add_weighted_interleave_group(struct kobject *mempolicy_kobj)
 		}
 	}
 
-	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
+	hotplug_node_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
 	return 0;
 
 err_cleanup_kobj:
-- 
2.49.0



* [PATCH v7 10/11] mm,page_ext: Derive the node from the pfn
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (8 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 09/11] mm,mempolicy: " Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 13:51 ` [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
  2025-06-16 14:32 ` [PATCH v7 00/11] Implement numa node notifier David Hildenbrand
  11 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

page_ext is the only user of 'status_change_nid', which is set in
online/offline operations, to know to which node we are
adding/removing memory.

Before any notifier is called, the memmap is initialized, which, among
other things, sets the owning node on all corresponding pages.
This means that there is no need to keep using 'status_change_nid' since
we can derive the node from the pfn.
This will allow us to finally drop 'status_change_nid' from the memory_notify
struct.
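
The idea can be sketched in userspace as follows. Everything prefixed
with demo_/DEMO_ is hypothetical: the range table stands in for the
initialized memmap, and the section size of 32768 pages is an
assumption made for the example, not a value taken from this patch.

```c
/* Assumed section size in pages; the real value is arch dependent. */
#define DEMO_PAGES_PER_SECTION	32768UL
#define DEMO_SECTION_MASK	(~(DEMO_PAGES_PER_SECTION - 1))

static unsigned long demo_section_align_down(unsigned long pfn)
{
	return pfn & DEMO_SECTION_MASK;
}

static unsigned long demo_section_align_up(unsigned long pfn)
{
	return (pfn + DEMO_PAGES_PER_SECTION - 1) & DEMO_SECTION_MASK;
}

/* Stand-in for the memmap: each pfn range already knows its node. */
struct demo_range {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive */
	int nid;
};

static const struct demo_range demo_ranges[] = {
	{ 0,                           4 * DEMO_PAGES_PER_SECTION, 0 },
	{ 4 * DEMO_PAGES_PER_SECTION,  8 * DEMO_PAGES_PER_SECTION, 1 },
};

/* Look the node up from the pfn; returns -1 for an unknown pfn. */
static int demo_pfn_to_nid(unsigned long pfn)
{
	unsigned long i;

	for (i = 0; i < sizeof(demo_ranges) / sizeof(demo_ranges[0]); i++)
		if (pfn >= demo_ranges[i].start_pfn &&
		    pfn < demo_ranges[i].end_pfn)
			return demo_ranges[i].nid;
	return -1;
}

/* Derive the node from start_pfn and count the sections to set up. */
static unsigned long demo_online_sections(unsigned long start_pfn,
					  unsigned long nr_pages,
					  int *nid_out)
{
	unsigned long start = demo_section_align_down(start_pfn);
	unsigned long end = demo_section_align_up(start_pfn + nr_pages);

	*nid_out = demo_pfn_to_nid(start_pfn);
	return (end - start) / DEMO_PAGES_PER_SECTION;
}
```

Since the node is looked up from the pfn itself, the caller no longer
needs to pass a nid alongside start_pfn and nr_pages.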

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
---
 mm/page_ext.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index c351fdfe9e9a..d7396a8970e5 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -369,25 +369,15 @@ static void __invalidate_page_ext(unsigned long pfn)
 }
 
 static int __meminit online_page_ext(unsigned long start_pfn,
-				unsigned long nr_pages,
-				int nid)
+				unsigned long nr_pages)
 {
+	int nid = pfn_to_nid(start_pfn);
 	unsigned long start, end, pfn;
 	int fail = 0;
 
 	start = SECTION_ALIGN_DOWN(start_pfn);
 	end = SECTION_ALIGN_UP(start_pfn + nr_pages);
 
-	if (nid == NUMA_NO_NODE) {
-		/*
-		 * In this case, "nid" already exists and contains valid memory.
-		 * "start_pfn" passed to us is a pfn which is an arg for
-		 * online__pages(), and start_pfn should exist.
-		 */
-		nid = pfn_to_nid(start_pfn);
-		VM_BUG_ON(!node_online(nid));
-	}
-
 	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION)
 		fail = init_section_page_ext(pfn, nid);
 	if (!fail)
@@ -435,8 +425,7 @@ static int __meminit page_ext_callback(struct notifier_block *self,
 
 	switch (action) {
 	case MEM_GOING_ONLINE:
-		ret = online_page_ext(mn->start_pfn,
-				   mn->nr_pages, mn->status_change_nid);
+		ret = online_page_ext(mn->start_pfn, mn->nr_pages);
 		break;
 	case MEM_OFFLINE:
 		offline_page_ext(mn->start_pfn,
-- 
2.49.0



* [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (9 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 10/11] mm,page_ext: Derive the node from the pfn Oscar Salvador
@ 2025-06-16 13:51 ` Oscar Salvador
  2025-06-16 14:11   ` Vlastimil Babka
  2025-06-16 14:32 ` [PATCH v7 00/11] Implement numa node notifier David Hildenbrand
  11 siblings, 1 reply; 17+ messages in thread
From: Oscar Salvador @ 2025-06-16 13:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel, Oscar Salvador

There are no users left of status_change_nid, so drop it from memory_notify
struct.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
---
 Documentation/core-api/memory-hotplug.rst | 7 -------
 include/linux/memory.h                    | 1 -
 mm/memory_hotplug.c                       | 4 ----
 3 files changed, 12 deletions(-)

diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index fb84e78968b2..8fc97c2379de 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -59,17 +59,10 @@ The third argument (arg) passes a pointer of struct memory_notify::
 	struct memory_notify {
 		unsigned long start_pfn;
 		unsigned long nr_pages;
-		int status_change_nid;
 	}
 
 - start_pfn is start_pfn of online/offline memory.
 - nr_pages is # of pages of online/offline memory.
-- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
-  set/clear. It means a new(memoryless) node gets new memory by online and a
-  node loses all memory. If this is -1, then nodemask status is not changed.
-
-  If status_changed_nid* >= 0, callback should create/discard structures for the
-  node if necessary.
 
 It is possible to get notified for MEM_CANCEL_ONLINE without having been notified
 for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a9ccd6579422..de8b898ada3f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -109,7 +109,6 @@ struct memory_notify {
 	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
-	int status_change_nid;
 };
 
 struct notifier_block;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e8ccfe4cada2..bfaa570c0685 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1153,7 +1153,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	struct memory_notify mem_arg = {
 		.start_pfn = pfn,
 		.nr_pages = nr_pages,
-		.status_change_nid = NUMA_NO_NODE,
 	};
 	struct node_notify node_arg = {
 		.nid = NUMA_NO_NODE,
@@ -1181,7 +1180,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	if (!node_state(nid, N_MEMORY)) {
 		/* Adding memory to the node for the first time */
 		node_arg.nid = nid;
-		mem_arg.status_change_nid = nid;
 		ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
 		ret = notifier_to_errno(ret);
 		if (ret)
@@ -1905,7 +1903,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	struct memory_notify mem_arg = {
 		.start_pfn = start_pfn,
 		.nr_pages = nr_pages,
-		.status_change_nid = NUMA_NO_NODE,
 	};
 	struct node_notify node_arg = {
 		.nid = NUMA_NO_NODE,
@@ -1976,7 +1973,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 */
 	if (nr_pages >= pgdat->node_present_pages) {
 		node_arg.nid = node;
-		mem_arg.status_change_nid = node;
 		ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
 		ret = notifier_to_errno(ret);
 		if (ret)
-- 
2.49.0



* Re: [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify
  2025-06-16 13:51 ` [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
@ 2025-06-16 14:11   ` Vlastimil Babka
  0 siblings, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2025-06-16 14:11 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, Jonathan Cameron, Harry Yoo, Rakie Kim,
	Hyeonggon Yoo, linux-mm, linux-kernel

On 6/16/25 3:51 PM, Oscar Salvador wrote:
> There are no users left of status_change_nid, so drop it from memory_notify
> struct.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  Documentation/core-api/memory-hotplug.rst | 7 -------
>  include/linux/memory.h                    | 1 -
>  mm/memory_hotplug.c                       | 4 ----
>  3 files changed, 12 deletions(-)
> 
> diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
> index fb84e78968b2..8fc97c2379de 100644
> --- a/Documentation/core-api/memory-hotplug.rst
> +++ b/Documentation/core-api/memory-hotplug.rst
> @@ -59,17 +59,10 @@ The third argument (arg) passes a pointer of struct memory_notify::
>  	struct memory_notify {
>  		unsigned long start_pfn;
>  		unsigned long nr_pages;
> -		int status_change_nid;
>  	}
>  
>  - start_pfn is start_pfn of online/offline memory.
>  - nr_pages is # of pages of online/offline memory.
> -- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
> -  set/clear. It means a new(memoryless) node gets new memory by online and a
> -  node loses all memory. If this is -1, then nodemask status is not changed.
> -
> -  If status_changed_nid* >= 0, callback should create/discard structures for the
> -  node if necessary.
>  
>  It is possible to get notified for MEM_CANCEL_ONLINE without having been notified
>  for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index a9ccd6579422..de8b898ada3f 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -109,7 +109,6 @@ struct memory_notify {
>  	unsigned long altmap_nr_pages;
>  	unsigned long start_pfn;
>  	unsigned long nr_pages;
> -	int status_change_nid;
>  };
>  
>  struct notifier_block;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e8ccfe4cada2..bfaa570c0685 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1153,7 +1153,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	struct memory_notify mem_arg = {
>  		.start_pfn = pfn,
>  		.nr_pages = nr_pages,
> -		.status_change_nid = NUMA_NO_NODE,
>  	};
>  	struct node_notify node_arg = {
>  		.nid = NUMA_NO_NODE,
> @@ -1181,7 +1180,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	if (!node_state(nid, N_MEMORY)) {
>  		/* Adding memory to the node for the first time */
>  		node_arg.nid = nid;
> -		mem_arg.status_change_nid = nid;
>  		ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
>  		ret = notifier_to_errno(ret);
>  		if (ret)
> @@ -1905,7 +1903,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	struct memory_notify mem_arg = {
>  		.start_pfn = start_pfn,
>  		.nr_pages = nr_pages,
> -		.status_change_nid = NUMA_NO_NODE,
>  	};
>  	struct node_notify node_arg = {
>  		.nid = NUMA_NO_NODE,
> @@ -1976,7 +1973,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	 */
>  	if (nr_pages >= pgdat->node_present_pages) {
>  		node_arg.nid = node;
> -		mem_arg.status_change_nid = node;
>  		ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
>  		ret = notifier_to_errno(ret);
>  		if (ret)



* Re: [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier
  2025-06-16 13:51 ` [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
@ 2025-06-16 14:31   ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:31 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
	Hyeonggon Yoo, linux-mm, linux-kernel

On 16.06.25 15:51, Oscar Salvador wrote:
> There are at least six consumers of hotplug_memory_notifier that are
> really only interested in whether any numa node changed its state, e.g. going
> from having memory to not having memory and vice versa.
> 
> Implement a specific notifier for numa nodes when their state gets changed,
> which will later be used by those consumers that are only interested
> in numa node state changes.
> 
> Add documentation as well.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---

Thanks Oscar!

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v7 00/11] Implement numa node notifier
  2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
                   ` (10 preceding siblings ...)
  2025-06-16 13:51 ` [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
@ 2025-06-16 14:32 ` David Hildenbrand
  2025-06-17  9:26   ` Oscar Salvador
  11 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2025-06-16 14:32 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim,
	Hyeonggon Yoo, linux-mm, linux-kernel

On 16.06.25 15:51, Oscar Salvador wrote:
>   v6 -> v7:
>     - Split previous patch#10 in two, one for page_ext
>       and the other to drop status_change_nid (per David)
>     - Implement feedback on simplifying previous cancel_on_*
>       notifiers and better document the fact that we consumers
>       can get called on _CANCEL_* actions before having been called
>       for previous actions (per David)
>     - Add Acks-by
> 
>   v5 -> v6:
>     - Remove redundant checks (per David)
>     - Fix build failure
>     - Drop 'nid' parameter from memory notify (Per David)
>     - Add RB/ACKs-by
> 
>   v4 -> v5:
>     - Split out conversion for different consumers (per David)
>     - Renamed node-notifier actions (per David)
>     - Added new Documentation for new node-notifier and updated
>       the memory-notifier one to reflect the changes
>     - Make sure we do not trigger anything when !CONFIG_NUMA (per David)
> 
>   v3 -> v4:
>     - Fix typos pointed out by Alok Tiwari
>     - Further cleanups suggested by Vlastimil
>     - Add RBs-by from Vlastimil
> 
>   v2 -> v3:
>     - Add Suggested-by (David)
>     - Replace last N_NORMAL_MEMORY mention in slub (David)
>     - Replace the notifier for autoweitght-mempolicy
>     - Fix build on !CONFIG_MEMORY_HOTPLUG
>   
>   v1 -> v2:
>     - Remove status_change_nid_normal and the code that
>       deals with it (David & Vlastimil)
>     - Remove slab_mem_offline_callback (David & Vlastimil)
>     - Change the order of canceling the notifiers
>       in {online,offline}_pages (Vlastimil)
>     - Fix up a couple of whitespaces (Jonathan Cameron)
>     - Add RBs-by
> 
> Memory notifier is a tool that allows consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node changes its
> memory state.
> That means going from memoryless to memory-aware or vice versa.
> 
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.
> This is suboptimal, and we want to decouple numa node state changes from
> memory state changes.
> 
> While we are doing this, remove status_change_nid_normal, as the only
> current user (slub) does not really need it.
> This allows us to further simplify and clean up the code.
> 
> The first patch gets rid of status_change_nid_normal in slub.
> The second patch implements a numa node notifier that does just that, and have
> those consumers register in there, so they get notified only when they are
> interested.
> 
> The third patch replaces 'status_change_nid{_normal}' fields within
> memory_notify with a 'nid', as that is all we need for the memory
> notifier, and updates the only user of it (page_ext).
> 
> Consumers that are only interested in numa node states change are:
> 
>   - memory-tier
>   - slub
>   - cpuset
>   - hmat
>   - cxl
>   - autoweight-mempolicy
> 


All looking good to me (and much easier to digest now that it's properly 
split up into patches!) :)

-- 
Cheers,

David / dhildenb



* Re: [PATCH v7 09/11] mm,mempolicy: Use node-notifier instead of memory-notifier
  2025-06-16 13:51 ` [PATCH v7 09/11] mm,mempolicy: " Oscar Salvador
@ 2025-06-17  0:14   ` Gregory Price
  0 siblings, 0 replies; 17+ messages in thread
From: Gregory Price @ 2025-06-17  0:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, Vlastimil Babka,
	Jonathan Cameron, Harry Yoo, Rakie Kim, Hyeonggon Yoo, linux-mm,
	linux-kernel

On Mon, Jun 16, 2025 at 03:51:52PM +0200, Oscar Salvador wrote:
> mempolicy is only concerned when a numa node changes its memory state,
> because it needs to take this node into account for the auto-weighted
> memory policy system.
> So stop using the memory notifier and use the new numa node notifier
> instead.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> Reviewed-by: Rakie Kim <rakie.kim@sk.com>
> Acked-by: David Hildenbrand <david@redhat.com>

Sorry for the late chime-in, thank you for doing this clean up, this all
looks awesome.

Reviewed-by: Gregory Price <gourry@gourry.net>



* Re: [PATCH v7 00/11] Implement numa node notifier
  2025-06-16 14:32 ` [PATCH v7 00/11] Implement numa node notifier David Hildenbrand
@ 2025-06-17  9:26   ` Oscar Salvador
  0 siblings, 0 replies; 17+ messages in thread
From: Oscar Salvador @ 2025-06-17  9:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Vlastimil Babka, Jonathan Cameron, Harry Yoo,
	Rakie Kim, Hyeonggon Yoo, linux-mm, linux-kernel

On Mon, Jun 16, 2025 at 04:32:32PM +0200, David Hildenbrand wrote:
> All looking good to me (and much easier to digest now that it's properly
> split up into patches!) :)

Great! Thank you all for all the feedback ;-P!

 

-- 
Oscar Salvador
SUSE Labs


end of thread, other threads:[~2025-06-17  9:26 UTC | newest]

Thread overview: 17+ messages
2025-06-16 13:51 [PATCH v7 00/11] Implement numa node notifier Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 01/11] mm,slub: Do not special case N_NORMAL nodes for slab_nodes Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 02/11] mm,memory_hotplug: Remove status_change_nid_normal and update documentation Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier Oscar Salvador
2025-06-16 14:31   ` David Hildenbrand
2025-06-16 13:51 ` [PATCH v7 04/11] mm,slub: Use node-notifier instead of memory-notifier Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 05/11] mm,memory-tiers: " Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 06/11] drivers,cxl: " Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 07/11] drivers,hmat: " Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 08/11] kernel,cpuset: " Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 09/11] mm,mempolicy: " Oscar Salvador
2025-06-17  0:14   ` Gregory Price
2025-06-16 13:51 ` [PATCH v7 10/11] mm,page_ext: Derive the node from the pfn Oscar Salvador
2025-06-16 13:51 ` [PATCH v7 11/11] mm,memory_hotplug: Drop status_change_nid parameter from memory_notify Oscar Salvador
2025-06-16 14:11   ` Vlastimil Babka
2025-06-16 14:32 ` [PATCH v7 00/11] Implement numa node notifier David Hildenbrand
2025-06-17  9:26   ` Oscar Salvador
