From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, Vlastimil Babka, Jonathan Cameron, Harry Yoo, Rakie Kim, Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Oscar Salvador
Subject: [PATCH v7 03/11] mm,memory_hotplug: Implement numa node notifier
Date: Mon, 16 Jun 2025 15:51:46 +0200
Message-ID: <20250616135158.450136-4-osalvador@suse.de>
In-Reply-To: <20250616135158.450136-1-osalvador@suse.de>
References: <20250616135158.450136-1-osalvador@suse.de>

At least six consumers of hotplug_memory_notifier are really only
interested in whether a numa node changed its state, e.g. going from
having memory to not having memory and vice versa.

Implement a specific notifier for numa node state changes, which will
later be used by those consumers that are only interested in such
changes.

Add documentation as well.

Signed-off-by: Oscar Salvador
Reviewed-by: Jonathan Cameron
Reviewed-by: Harry Yoo
Reviewed-by: Vlastimil Babka
---
 Documentation/core-api/memory-hotplug.rst |  83 +++++++++++++
 drivers/base/node.c                       |  21 ++++
 include/linux/node.h                      |  40 ++++++
 mm/memory_hotplug.c                       | 144 ++++++++++------------
 4 files changed, 208 insertions(+), 80 deletions(-)

diff --git a/Documentation/core-api/memory-hotplug.rst b/Documentation/core-api/memory-hotplug.rst
index d1b8eb9add8a..fb84e78968b2 100644
--- a/Documentation/core-api/memory-hotplug.rst
+++ b/Documentation/core-api/memory-hotplug.rst
@@ -9,6 +9,9 @@ Memory hotplug event notifier
 
 Hotplugging events are sent to a notification queue.
 
+Memory notifier
+----------------
+
 There are six types of notification defined in ``include/linux/memory.h``:
 
 MEM_GOING_ONLINE
@@ -68,6 +71,14 @@ The third argument (arg) passes a pointer of struct memory_notify::
   If status_changed_nid* >= 0, callback should create/discard structures for the
   node if necessary.
 
+It is possible to get notified for MEM_CANCEL_ONLINE without having been notified
+for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
+MEM_GOING_OFFLINE.
+This can happen when a consumer fails, meaning we break the callchain and we
+stop calling the remaining consumers of the notifier.
+It is therefore important that users of memory_notify make no assumptions and
+are prepared to handle such cases.
+
 The callback routine shall return one of the values
 NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
 defined in ``include/linux/notifier.h``
@@ -80,6 +91,78 @@ further processing of the notification queue.
 
 NOTIFY_STOP stops further processing of the notification queue.
 
+Numa node notifier
+------------------
+
+There are six types of notification defined in ``include/linux/node.h``:
+
+NODE_ADDING_FIRST_MEMORY
+  Generated before memory becomes available to this node for the first time.
+
+NODE_CANCEL_ADDING_FIRST_MEMORY
+  Generated if NODE_ADDING_FIRST_MEMORY fails.
+
+NODE_ADDED_FIRST_MEMORY
+  Generated when memory has become available for this node for the first time.
+
+NODE_REMOVING_LAST_MEMORY
+  Generated when the last memory available to this node is about to be offlined.
+
+NODE_CANCEL_REMOVING_LAST_MEMORY
+  Generated if NODE_REMOVING_LAST_MEMORY fails.
+
+NODE_REMOVED_LAST_MEMORY
+  Generated when the last memory available to this node has been offlined.
+
+A callback routine can be registered by calling::
+
+	hotplug_node_notifier(callback_func, priority)
+
+Callback functions with higher values of priority are called before callback
+functions with lower values.
+
+A callback function must have the following prototype::
+
+	int callback_func(
+
+		struct notifier_block *self, unsigned long action, void *arg);
+
+The first argument of the callback function (self) is a pointer to the block
+of the notifier chain that points to the callback function itself.
+The second argument (action) is one of the event types described above.
+The third argument (arg) passes a pointer of struct node_notify::
+
+	struct node_notify {
+		int nid;
+	};
+
+- nid is the node we are adding memory to or removing memory from.
+
+It is possible to get notified for NODE_CANCEL_ADDING_FIRST_MEMORY without
+having been notified for NODE_ADDING_FIRST_MEMORY, and the same applies to
+NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
+This can happen when a consumer fails, meaning we break the callchain and we
+stop calling the remaining consumers of the notifier.
+It is therefore important that users of node_notify make no assumptions and
+are prepared to handle such cases.
+
+The callback routine shall return one of the values
+NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
+defined in ``include/linux/notifier.h``
+
+NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
+
+NOTIFY_BAD is used as response to the NODE_ADDING_FIRST_MEMORY or
+NODE_REMOVING_LAST_MEMORY action to cancel hotplugging, as only those two
+actions can still be rolled back.
+It stops further processing of the notification queue.
+
+NOTIFY_STOP stops further processing of the notification queue.
+
+Please note that we should not fail for NODE_ADDED_FIRST_MEMORY /
+NODE_REMOVED_LAST_MEMORY, as memory_hotplug code cannot roll back at that
+point anymore.
+
 Locking Internals
 =================
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 25ab9ec14eb8..c5b0859d846d 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -111,6 +111,27 @@ static const struct attribute_group *node_access_node_groups[] = {
 	NULL,
 };
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&node_chain, val, v);
+}
+#endif
+
 static void node_remove_accesses(struct node *node)
 {
 	struct node_access_nodes *c, *cnext;
diff --git a/include/linux/node.h b/include/linux/node.h
index 2b7517892230..d7aa2636d948 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -123,6 +123,46 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
 #endif
 
 extern void unregister_node(struct node *node);
+
+struct node_notify {
+	int nid;
+};
+
+#define NODE_ADDING_FIRST_MEMORY                (1<<0)
+#define NODE_ADDED_FIRST_MEMORY                 (1<<1)
+#define NODE_CANCEL_ADDING_FIRST_MEMORY         (1<<2)
+#define NODE_REMOVING_LAST_MEMORY               (1<<3)
+#define NODE_REMOVED_LAST_MEMORY                (1<<4)
+#define NODE_CANCEL_REMOVING_LAST_MEMORY        (1<<5)
+
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
+extern int register_node_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
+extern int node_notify(unsigned long val, void *v);
+
+#define hotplug_node_notifier(fn, pri) ({		\
+	static __meminitdata struct notifier_block fn##_node_nb =\
+		{ .notifier_call = fn, .priority = pri };\
+	register_node_notifier(&fn##_node_nb);			\
+})
+#else
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+	return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_NUMA
 extern void node_dev_init(void);
 /* Core of the node registration - only memory hotplug should use this */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 94ae0ca37021..e8ccfe4cada2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -699,24 +700,6 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 	online_mem_sections(start_pfn, end_pfn);
 }
 
-/* check which state of node_states will be changed when online memory */
-static void node_states_check_changes_online(unsigned long nr_pages,
-	struct zone *zone, struct memory_notify *arg)
-{
-	int nid = zone_to_nid(zone);
-
-	arg->status_change_nid = NUMA_NO_NODE;
-
-	if (!node_state(nid, N_MEMORY))
-		arg->status_change_nid = nid;
-}
-
-static void node_states_set_node(int node, struct memory_notify *arg)
-{
-	if (arg->status_change_nid >= 0)
-		node_set_state(node, N_MEMORY);
-}
-
 static void __meminit resize_zone_range(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages)
 {
@@ -1167,11 +1150,18 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
 int online_pages(unsigned long pfn, unsigned long nr_pages,
 		       struct zone *zone, struct memory_group *group)
 {
-	unsigned long flags;
-	int need_zonelists_rebuild = 0;
+	struct memory_notify mem_arg = {
+		.start_pfn = pfn,
+		.nr_pages = nr_pages,
+		.status_change_nid = NUMA_NO_NODE,
+	};
+	struct node_notify node_arg = {
+		.nid = NUMA_NO_NODE,
+	};
 	const int nid = zone_to_nid(zone);
+	int need_zonelists_rebuild = 0;
+	unsigned long flags;
 	int ret;
-	struct memory_notify arg;
 
 	/*
 	 * {on,off}lining is constrained to full memory sections (or more
@@ -1188,11 +1178,17 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
-	arg.start_pfn = pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_online(nr_pages, zone, &arg);
+	if (!node_state(nid, N_MEMORY)) {
+		/* Adding memory to the node for the first time */
+		node_arg.nid = nid;
+		mem_arg.status_change_nid = nid;
+		ret = node_notify(NODE_ADDING_FIRST_MEMORY, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
+	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
 		goto failed_addition;
@@ -1218,7 +1214,8 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	online_pages_range(pfn, nr_pages);
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
-	node_states_set_node(nid, &arg);
+	if (node_arg.nid >= 0)
+		node_set_state(nid, N_MEMORY);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
 
@@ -1239,16 +1236,22 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	kswapd_run(nid);
 	kcompactd_run(nid);
 
+	if (node_arg.nid >= 0)
+		/* First memory added successfully. Notify consumers. */
+		node_notify(NODE_ADDED_FIRST_MEMORY, &node_arg);
+
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	memory_notify(MEM_ONLINE, &mem_arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
+	if (node_arg.nid != NUMA_NO_NODE)
+		node_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
@@ -1880,48 +1883,6 @@ static int __init cmdline_parse_movable_node(char *p)
 }
 early_param("movable_node", cmdline_parse_movable_node);
 
-/* check which state of node_states will be changed when offline memory */
-static void node_states_check_changes_offline(unsigned long nr_pages,
-		struct zone *zone, struct memory_notify *arg)
-{
-	struct pglist_data *pgdat = zone->zone_pgdat;
-	unsigned long present_pages = 0;
-	enum zone_type zt;
-
-	arg->status_change_nid = NUMA_NO_NODE;
-
-	/*
-	 * Check whether node_states[N_NORMAL_MEMORY] will be changed.
-	 * If the memory to be offline is within the range
-	 * [0..ZONE_NORMAL], and it is the last present memory there,
-	 * the zones in that range will become empty after the offlining,
-	 * thus we can determine that we need to clear the node from
-	 * node_states[N_NORMAL_MEMORY].
-	 */
-	for (zt = 0; zt <= ZONE_NORMAL; zt++)
-		present_pages += pgdat->node_zones[zt].present_pages;
-
-	/*
-	 * We have accounted the pages from [0..ZONE_NORMAL); ZONE_HIGHMEM
-	 * does not apply as we don't support 32bit.
-	 * Here we count the possible pages from ZONE_MOVABLE.
-	 * If after having accounted all the pages, we see that the nr_pages
-	 * to be offlined is over or equal to the accounted pages,
-	 * we know that the node will become empty, and so, we can clear
-	 * it for N_MEMORY as well.
-	 */
-	present_pages += pgdat->node_zones[ZONE_MOVABLE].present_pages;
-
-	if (nr_pages >= present_pages)
-		arg->status_change_nid = zone_to_nid(zone);
-}
-
-static void node_states_clear_node(int node, struct memory_notify *arg)
-{
-	if (arg->status_change_nid >= 0)
-		node_clear_state(node, N_MEMORY);
-}
-
 static int count_system_ram_pages_cb(unsigned long start_pfn,
 				     unsigned long nr_pages, void *data)
 {
@@ -1937,11 +1898,19 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 			struct zone *zone, struct memory_group *group)
 {
-	const unsigned long end_pfn = start_pfn + nr_pages;
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
+	const unsigned long end_pfn = start_pfn + nr_pages;
+	struct pglist_data *pgdat = zone->zone_pgdat;
 	const int node = zone_to_nid(zone);
+	struct memory_notify mem_arg = {
+		.start_pfn = start_pfn,
+		.nr_pages = nr_pages,
+		.status_change_nid = NUMA_NO_NODE,
+	};
+	struct node_notify node_arg = {
+		.nid = NUMA_NO_NODE,
+	};
 	unsigned long flags;
-	struct memory_notify arg;
 	char *reason;
 	int ret;
 
@@ -2000,11 +1969,21 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		goto failed_removal_pcplists_disabled;
 	}
 
-	arg.start_pfn = start_pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_offline(nr_pages, zone, &arg);
+	/*
+	 * Check whether the node will have no present pages after we offline
+	 * 'nr_pages' more. If so, we know that the node will become empty, and
+	 * so we will clear N_MEMORY for it.
+	 */
+	if (nr_pages >= pgdat->node_present_pages) {
+		node_arg.nid = node;
+		mem_arg.status_change_nid = node;
+		ret = node_notify(NODE_REMOVING_LAST_MEMORY, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_removal_isolated;
+	}
 
-	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		reason = "notifier failure";
@@ -2084,27 +2063,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * Make sure to mark the node as memory-less before rebuilding the zone
 	 * list. Otherwise this node would still appear in the fallback lists.
 	 */
-	node_states_clear_node(node, &arg);
+	if (node_arg.nid >= 0)
+		node_clear_state(node, N_MEMORY);
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	}
 
-	if (arg.status_change_nid >= 0) {
+	if (node_arg.nid >= 0) {
 		kcompactd_stop(node);
 		kswapd_stop(node);
+		/* Node went memoryless. Notify consumers */
+		node_notify(NODE_REMOVED_LAST_MEMORY, &node_arg);
 	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_OFFLINE, &arg);
+	memory_notify(MEM_OFFLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	return 0;
 
 failed_removal_isolated:
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
-	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+	memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
+	if (node_arg.nid != NUMA_NO_NODE)
+		node_notify(NODE_CANCEL_REMOVING_LAST_MEMORY, &node_arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
 	zone_pcp_enable(zone);
-- 
2.49.0
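
The notifier semantics documented in the patch above (higher priority called
first, a failing consumer breaking the call chain, and the CANCEL event
reaching consumers that never saw the matching ADDING event) can be
illustrated with a small userspace sketch in C. This is not kernel code and
not part of the patch: struct mock_nb, mock_register(), mock_notify(),
mock_online_first_memory() and the MOCK_NOTIFY_* values are hypothetical
stand-ins for struct notifier_block, register_node_notifier(), node_notify(),
the start of online_pages() and the NOTIFY_* codes. Only the NODE_* action
values are copied from the include/linux/node.h hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Action values copied from the include/linux/node.h hunk of the patch. */
#define NODE_ADDING_FIRST_MEMORY	(1 << 0)
#define NODE_ADDED_FIRST_MEMORY		(1 << 1)
#define NODE_CANCEL_ADDING_FIRST_MEMORY	(1 << 2)

/* Hypothetical stand-ins for the NOTIFY_* return codes. */
enum { MOCK_NOTIFY_OK, MOCK_NOTIFY_BAD };

struct node_notify { int nid; };

/* Hypothetical, simplified stand-in for struct notifier_block. */
struct mock_nb {
	int (*call)(struct mock_nb *self, unsigned long action, void *arg);
	int priority;
	struct mock_nb *next;
	int prepared;			/* set on ADDING, cleared on CANCEL */
	int cancels_without_adding;	/* CANCEL seen with nothing to undo */
};

static struct mock_nb *chain;

/* Keep the chain sorted by descending priority. */
static void mock_register(struct mock_nb *nb)
{
	struct mock_nb **p = &chain;

	assert(nb->call);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Walk the chain; stop at the first consumer returning MOCK_NOTIFY_BAD. */
static int mock_notify(unsigned long action, void *arg)
{
	for (struct mock_nb *nb = chain; nb; nb = nb->next)
		if (nb->call(nb, action, arg) == MOCK_NOTIFY_BAD)
			return MOCK_NOTIFY_BAD;
	return MOCK_NOTIFY_OK;
}

/* A consumer that tolerates CANCEL without a preceding ADDING. */
static int tolerant_cb(struct mock_nb *self, unsigned long action, void *arg)
{
	(void)arg;
	switch (action) {
	case NODE_ADDING_FIRST_MEMORY:
		self->prepared = 1;
		break;
	case NODE_CANCEL_ADDING_FIRST_MEMORY:
		if (!self->prepared)
			self->cancels_without_adding++;
		self->prepared = 0;
		break;
	}
	return MOCK_NOTIFY_OK;
}

/* A consumer that vetoes the ADDING event. */
static int failing_cb(struct mock_nb *self, unsigned long action, void *arg)
{
	(void)self;
	(void)arg;
	return action == NODE_ADDING_FIRST_MEMORY ? MOCK_NOTIFY_BAD
						  : MOCK_NOTIFY_OK;
}

static struct mock_nb high = { .call = failing_cb, .priority = 10 };
static struct mock_nb low = { .call = tolerant_cb, .priority = 0 };

/* Imitates the notification order for a node gaining its first memory. */
static int mock_online_first_memory(int nid)
{
	struct node_notify node_arg = { .nid = nid };

	if (mock_notify(NODE_ADDING_FIRST_MEMORY, &node_arg) == MOCK_NOTIFY_BAD) {
		/* Failure path: the CANCEL event is sent down the chain. */
		mock_notify(NODE_CANCEL_ADDING_FIRST_MEMORY, &node_arg);
		return -1;
	}
	mock_notify(NODE_ADDED_FIRST_MEMORY, &node_arg);
	return 0;
}
```

To try it, register both blocks with mock_register(&low) and
mock_register(&high), then call mock_online_first_memory() from a main()
function. Because "high" runs first and vetoes the ADDING event, "low" only
ever sees the CANCEL event, which is the case the documentation asks
consumers to be prepared for.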