* [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
@ 2025-04-09 5:27 Donet Tom
2025-04-09 5:27 ` [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set Donet Tom
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Donet Tom @ 2025-04-09 5:27 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Mike Rapoport
Cc: Ritesh Harjani, rafael, Danilo Krummrich, Donet Tom
A new function, curr_node_memblock_intersect_memory_block, has been
added to check if the current node's NID intersects with a memory block.
This function takes the start and end PFN of a memory block, along with
the node ID being registered. It then finds the memblock region of the
current node and check if the passed memory block intersects with it. If
there is an intersection, the function returns true; otherwise, itreturns
false.
There are two scenarios to consider during the search:
1. The memory block size is greater than the memblock region size.
This means that multiple memblocks can be present within a single
memory block. If the start or end of the memblock is within the
start and end of the memory block, it indicates that the memblock
is part of that memory block. Therefore, the memory block can be
added to the node where the memblock resides.
2. The memory block size is less than or equal to the memblock size
This means that multiple memory blocks can be part of a single memblock
region. If the start or end of the memory block is within the start and
end of the memblock, it indicates that the memory block is part of the
memblock. Therefore, the memory block can be added to the node where
the memblock resides.
In the current implementation, during node device initialization, to
find the memory block NID, it iterates over each PFN of the memory
block until it finds a match. On large systems, this can take a
long time.
With this function, the boot time is reduced.
Boot time without this function - 32TB RAM
==========================================
Startup finished in 1min 12.413s (kernel)
Boot time with this function - 32TB RAM
========================================
Startup finished in 18.031s (kernel)
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
include/linux/memblock.h | 2 ++
mm/memblock.c | 67 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 68 insertions(+), 1 deletion(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index ef5a1ecc6e59..db87f7daa46c 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -277,6 +277,8 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
unsigned long *end_pfn);
+bool curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
+ unsigned long end_pfn, int curr_nid);
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
unsigned long *out_end_pfn, int *out_nid);
diff --git a/mm/memblock.c b/mm/memblock.c
index 0a53db4d9f7b..570ab7ac4dce 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -6,6 +6,8 @@
* Copyright (C) 2001 Peter Bergner.
*/
+#include "linux/stddef.h"
+#include "linux/types.h"
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/init.h>
@@ -17,7 +19,7 @@
#include <linux/seq_file.h>
#include <linux/memblock.h>
#include <linux/mutex.h>
-
+#include <linux/minmax.h>
#include <asm/sections.h>
#include <linux/io.h>
@@ -1909,6 +1911,69 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
return !memblock_is_nomap(&memblock.memory.regions[i]);
}
+/**
+ * curr_node_memblock_intersect_memory_block: checks if the current node's memblock
+ * region intersects with the memory block.
+ * @start_pfn: memory block start pfn
+ * @end_pfn: memory block end_pfn
+ * @curr_nid: Current node
+ *
+ * This function takes the start and end PFN of a memory block, as well as the node ID
+ * that is being registered. It then finds the memblock region of the current node and
+ * checks if the passed memory block intersects with the memblock. If there is an
+ * intersection, the function returns true; otherwise, it returns false.
+ *
+ * Return:
+ * If the current node's memblock region intersects with the memory block, it returns
+ * true; otherwise, it returns false.
+ */
+bool __init_memblock curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
+ unsigned long end_pfn, int curr_nid)
+{
+ struct memblock_region *r;
+ unsigned long r_start, r_end;
+ unsigned long size = end_pfn - start_pfn;
+ unsigned long r_size = 0;
+
+ for_each_mem_region(r) {
+ r_start = PFN_DOWN(r->base);
+ r_end = PFN_DOWN(r->base + r->size);
+ r_size = r_end - r_start;
+
+ if (r->nid == curr_nid) {
+ if (size > r_size) {
+ /*
+ * The memory block size is greater than the memblock
+ * region size, meaning multiple memblocks can be present
+ * within a single memory block. If the memblock's start
+ * or end is within the memory block's start and end, It
+ * indicates that the memblock is part of this memory block.
+ * Therefore, the memory block can be added to the node
+ * where the memblock resides.
+ */
+ if (in_range(r_start, start_pfn, size) ||
+ in_range(r_end, start_pfn, size))
+ return true;
+ } else {
+ /*
+ * The memory block size is less than or equal to the
+ * memblock size, meaning multiple memory blocks can
+ * be part of a single memblock region. If the memory
+ * block's start or end is within the memblock's start
+ * and end, it indicates that the memory block is part of
+ * the memblock. Therefore, the memory block can be added
+ * to the node where the memblock resides.
+ */
+ if (in_range(start_pfn, r_start, r_size) ||
+ in_range(end_pfn, r_start, r_size))
+ return true;
+ }
+ }
+ }
+ return false;
+}
+
+
int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
unsigned long *start_pfn, unsigned long *end_pfn)
{
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
@ 2025-04-09 5:27 ` Donet Tom
2025-04-10 8:07 ` Mike Rapoport
2025-04-10 2:20 ` [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Andrew Morton
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Donet Tom @ 2025-04-09 5:27 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Mike Rapoport
Cc: Ritesh Harjani, rafael, Danilo Krummrich, Donet Tom
In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
set, we iterate over all PFNs in the memory block and use
early_pfn_to_nid to find the NID until a match is found.
This patch we are using curr_node_memblock_intersect_memory_block() to
check if the current node's memblock intersects with the memory block
passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
is found, the memory block is added to the current node.
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
for finding the NID will continue to be used.
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index cd13ef287011..5c5dd02b8bdd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,8 @@
#include <linux/pm_runtime.h>
#include <linux/swap.h>
#include <linux/slab.h>
+#include <linux/memblock.h>
+
static const struct bus_type node_subsys = {
.name = "node",
@@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
ret);
}
-/* register memory section under specified node if it spans that node */
-static int register_mem_block_under_node_early(struct memory_block *mem_blk,
- void *arg)
+static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
+ unsigned long start_pfn, unsigned long end_pfn, int nid)
{
- unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
- unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
- unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
- int nid = *(int *)arg;
- unsigned long pfn;
+ if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
+ do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
+ return 0;
+}
+
+static int register_mem_block_early__normal(struct memory_block *mem_blk,
+ unsigned long start_pfn, unsigned long end_pfn, int nid)
+{
+ unsigned long pfn;
for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
int page_nid;
@@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
/* mem section does not span the specified node */
return 0;
}
+/* register memory section under specified node if it spans that node */
+static int register_mem_block_under_node_early(struct memory_block *mem_blk,
+ void *arg)
+{
+ unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
+ unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
+ unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
+ int nid = *(int *)arg;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+ if (system_state < SYSTEM_RUNNING)
+ return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
+#endif
+ return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
+
+}
/*
* During hotplug we know that all pages in the memory block belong to the same
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
2025-04-09 5:27 ` [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set Donet Tom
@ 2025-04-10 2:20 ` Andrew Morton
2025-04-10 4:35 ` Donet Tom
2025-04-10 7:03 ` kernel test robot
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2025-04-10 2:20 UTC (permalink / raw)
To: Donet Tom
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand, linux-mm,
Mike Rapoport, Ritesh Harjani, rafael, Danilo Krummrich
On Wed, 9 Apr 2025 10:57:56 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
> A new function, curr_node_memblock_intersect_memory_block, has been
"intersects".
Because the name is too short ;)
> With this function, the boot time is reduced.
>
> Boot time without this function - 32TB RAM
> ==========================================
> Startup finished in 1min 12.413s (kernel)
>
> Boot time with this function - 32TB RAM
> ========================================
> Startup finished in 18.031s (kernel)
Impressive. I'll assume this is rppt material.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-10 2:20 ` [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Andrew Morton
@ 2025-04-10 4:35 ` Donet Tom
0 siblings, 0 replies; 14+ messages in thread
From: Donet Tom @ 2025-04-10 4:35 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand, linux-mm,
Mike Rapoport, Ritesh Harjani, rafael, Danilo Krummrich
On 4/10/25 7:50 AM, Andrew Morton wrote:
> On Wed, 9 Apr 2025 10:57:56 +0530 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> A new function, curr_node_memblock_intersect_memory_block, has been
> "intersects".
>
> Because the name is too short ;)
Thanks Andrew,
I will change the name.
>
>> With this function, the boot time is reduced.
>>
>> Boot time without this function - 32TB RAM
>> ==========================================
>> Startup finished in 1min 12.413s (kernel)
>>
>> Boot time with this function - 32TB RAM
>> ========================================
>> Startup finished in 18.031s (kernel)
> Impressive. I'll assume this is rppt material.
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
2025-04-09 5:27 ` [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set Donet Tom
2025-04-10 2:20 ` [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Andrew Morton
@ 2025-04-10 7:03 ` kernel test robot
2025-04-10 7:25 ` kernel test robot
2025-04-10 7:49 ` Mike Rapoport
4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-04-10 7:03 UTC (permalink / raw)
To: Donet Tom, Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, Mike Rapoport
Cc: oe-kbuild-all, Linux Memory Management List, Ritesh Harjani,
rafael, Danilo Krummrich, Donet Tom
Hi Donet,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Donet-Tom/base-node-Use-curr_node_memblock_intersect_memory_block-to-Get-Memory-Block-NID-if-CONFIG_DEFERRED_STRUCT_PAGE_INIT-is-S/20250409-132924
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom%40linux.ibm.com
patch subject: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
config: csky-randconfig-001-20250410 (https://download.01.org/0day-ci/archive/20250410/202504101459.iFHlVGBA-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250410/202504101459.iFHlVGBA-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504101459.iFHlVGBA-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/memblock.c: In function 'curr_node_memblock_intersect_memory_block':
>> mm/memblock.c:1943:22: error: 'struct memblock_region' has no member named 'nid'
1943 | if (r->nid == curr_nid) {
| ^~
vim +1943 mm/memblock.c
1913
1914 /**
1915 * curr_node_memblock_intersect_memory_block: checks if the current node's memblock
1916 * region intersects with the memory block.
1917 * @start_pfn: memory block start pfn
1918 * @end_pfn: memory block end_pfn
1919 * @curr_nid: Current node
1920 *
1921 * This function takes the start and end PFN of a memory block, as well as the node ID
1922 * that is being registered. It then finds the memblock region of the current node and
1923 * checks if the passed memory block intersects with the memblock. If there is an
1924 * intersection, the function returns true; otherwise, it returns false.
1925 *
1926 * Return:
1927 * If the current node's memblock region intersects with the memory block, it returns
1928 * true; otherwise, it returns false.
1929 */
1930 bool __init_memblock curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
1931 unsigned long end_pfn, int curr_nid)
1932 {
1933 struct memblock_region *r;
1934 unsigned long r_start, r_end;
1935 unsigned long size = end_pfn - start_pfn;
1936 unsigned long r_size = 0;
1937
1938 for_each_mem_region(r) {
1939 r_start = PFN_DOWN(r->base);
1940 r_end = PFN_DOWN(r->base + r->size);
1941 r_size = r_end - r_start;
1942
> 1943 if (r->nid == curr_nid) {
1944 if (size > r_size) {
1945 /*
1946 * The memory block size is greater than the memblock
1947 * region size, meaning multiple memblocks can be present
1948 * within a single memory block. If the memblock's start
1949 * or end is within the memory block's start and end, It
1950 * indicates that the memblock is part of this memory block.
1951 * Therefore, the memory block can be added to the node
1952 * where the memblock resides.
1953 */
1954 if (in_range(r_start, start_pfn, size) ||
1955 in_range(r_end, start_pfn, size))
1956 return true;
1957 } else {
1958 /*
1959 * The memory block size is less than or equal to the
1960 * memblock size, meaning multiple memory blocks can
1961 * be part of a single memblock region. If the memory
1962 * block's start or end is within the memblock's start
1963 * and end, it indicates that the memory block is part of
1964 * the memblock. Therefore, the memory block can be added
1965 * to the node where the memblock resides.
1966 */
1967 if (in_range(start_pfn, r_start, r_size) ||
1968 in_range(end_pfn, r_start, r_size))
1969 return true;
1970 }
1971 }
1972 }
1973 return false;
1974 }
1975
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
` (2 preceding siblings ...)
2025-04-10 7:03 ` kernel test robot
@ 2025-04-10 7:25 ` kernel test robot
2025-04-10 7:49 ` Mike Rapoport
4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-04-10 7:25 UTC (permalink / raw)
To: Donet Tom, Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, Mike Rapoport
Cc: llvm, oe-kbuild-all, Linux Memory Management List, Ritesh Harjani,
rafael, Danilo Krummrich, Donet Tom
Hi Donet,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Donet-Tom/base-node-Use-curr_node_memblock_intersect_memory_block-to-Get-Memory-Block-NID-if-CONFIG_DEFERRED_STRUCT_PAGE_INIT-is-S/20250409-132924
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom%40linux.ibm.com
patch subject: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
config: hexagon-randconfig-002-20250410 (https://download.01.org/0day-ci/archive/20250410/202504101431.ovUiydUd-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 92c93f5286b9ff33f27ff694d2dc33da1c07afdd)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250410/202504101431.ovUiydUd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504101431.ovUiydUd-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/memblock.c:1943:10: error: no member named 'nid' in 'struct memblock_region'
1943 | if (r->nid == curr_nid) {
| ~ ^
1 error generated.
vim +1943 mm/memblock.c
1913
1914 /**
1915 * curr_node_memblock_intersect_memory_block: checks if the current node's memblock
1916 * region intersects with the memory block.
1917 * @start_pfn: memory block start pfn
1918 * @end_pfn: memory block end_pfn
1919 * @curr_nid: Current node
1920 *
1921 * This function takes the start and end PFN of a memory block, as well as the node ID
1922 * that is being registered. It then finds the memblock region of the current node and
1923 * checks if the passed memory block intersects with the memblock. If there is an
1924 * intersection, the function returns true; otherwise, it returns false.
1925 *
1926 * Return:
1927 * If the current node's memblock region intersects with the memory block, it returns
1928 * true; otherwise, it returns false.
1929 */
1930 bool __init_memblock curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
1931 unsigned long end_pfn, int curr_nid)
1932 {
1933 struct memblock_region *r;
1934 unsigned long r_start, r_end;
1935 unsigned long size = end_pfn - start_pfn;
1936 unsigned long r_size = 0;
1937
1938 for_each_mem_region(r) {
1939 r_start = PFN_DOWN(r->base);
1940 r_end = PFN_DOWN(r->base + r->size);
1941 r_size = r_end - r_start;
1942
> 1943 if (r->nid == curr_nid) {
1944 if (size > r_size) {
1945 /*
1946 * The memory block size is greater than the memblock
1947 * region size, meaning multiple memblocks can be present
1948 * within a single memory block. If the memblock's start
1949 * or end is within the memory block's start and end, It
1950 * indicates that the memblock is part of this memory block.
1951 * Therefore, the memory block can be added to the node
1952 * where the memblock resides.
1953 */
1954 if (in_range(r_start, start_pfn, size) ||
1955 in_range(r_end, start_pfn, size))
1956 return true;
1957 } else {
1958 /*
1959 * The memory block size is less than or equal to the
1960 * memblock size, meaning multiple memory blocks can
1961 * be part of a single memblock region. If the memory
1962 * block's start or end is within the memblock's start
1963 * and end, it indicates that the memory block is part of
1964 * the memblock. Therefore, the memory block can be added
1965 * to the node where the memblock resides.
1966 */
1967 if (in_range(start_pfn, r_start, r_size) ||
1968 in_range(end_pfn, r_start, r_size))
1969 return true;
1970 }
1971 }
1972 }
1973 return false;
1974 }
1975
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
` (3 preceding siblings ...)
2025-04-10 7:25 ` kernel test robot
@ 2025-04-10 7:49 ` Mike Rapoport
2025-04-10 19:06 ` Donet Tom
4 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-04-10 7:49 UTC (permalink / raw)
To: Donet Tom
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
Hi,
On Wed, Apr 09, 2025 at 10:57:56AM +0530, Donet Tom wrote:
> A new function, curr_node_memblock_intersect_memory_block, has been
> added to check if the current node's NID intersects with a memory block.
As Andrew mentioned, the name is too long :)
Maybe memblock_range_intersects_node(), but I think intersection is not the
right thing to check (see below).
Also, memblock does not care about sysfs representation of memory blocks,
please use "range" rather than "memory block" in changelog and comments.
> This function takes the start and end PFN of a memory block, along with
> the node ID being registered. It then finds the memblock region of the
> current node and check if the passed memory block intersects with it. If
> there is an intersection, the function returns true; otherwise, itreturns
> false.
Please describe here what problem you are solving and why you need this
functionality rather than what the new function does.
> There are two scenarios to consider during the search:
>
> 1. The memory block size is greater than the memblock region size.
>
> This means that multiple memblocks can be present within a single
> memory block. If the start or end of the memblock is within the
> start and end of the memory block, it indicates that the memblock
> is part of that memory block. Therefore, the memory block can be
> added to the node where the memblock resides.
If a range crosses several memblocks, it's possible that they belong to a
different nodes.
> 2. The memory block size is less than or equal to the memblock size
>
> This means that multiple memory blocks can be part of a single memblock
> region. If the start or end of the memory block is within the start and
> end of the memblock, it indicates that the memory block is part of the
> memblock. Therefore, the memory block can be added to the node where
> the memblock resides.
>
> In the current implementation, during node device initialization, to
> find the memory block NID, it iterates over each PFN of the memory
> block until it finds a match. On large systems, this can take a
> long time.
Why won't you replace the loop over each PFN with a loop over memblock
memory regions in the node device initialization?
> With this function, the boot time is reduced.
>
> Boot time without this function - 32TB RAM
> ==========================================
> Startup finished in 1min 12.413s (kernel)
>
> Boot time with this function - 32TB RAM
> ========================================
> Startup finished in 18.031s (kernel)
>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
> include/linux/memblock.h | 2 ++
> mm/memblock.c | 67 +++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index ef5a1ecc6e59..db87f7daa46c 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -277,6 +277,8 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
>
> int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
> unsigned long *end_pfn);
> +bool curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
> + unsigned long end_pfn, int curr_nid);
> void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> unsigned long *out_end_pfn, int *out_nid);
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 0a53db4d9f7b..570ab7ac4dce 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -6,6 +6,8 @@
> * Copyright (C) 2001 Peter Bergner.
> */
>
> +#include "linux/stddef.h"
> +#include "linux/types.h"
> #include <linux/kernel.h>
> #include <linux/slab.h>
> #include <linux/init.h>
> @@ -17,7 +19,7 @@
> #include <linux/seq_file.h>
> #include <linux/memblock.h>
> #include <linux/mutex.h>
> -
> +#include <linux/minmax.h>
> #include <asm/sections.h>
> #include <linux/io.h>
>
> @@ -1909,6 +1911,69 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
> return !memblock_is_nomap(&memblock.memory.regions[i]);
> }
>
> +/**
> + * curr_node_memblock_intersect_memory_block: checks if the current node's memblock
> + * region intersects with the memory block.
> + * @start_pfn: memory block start pfn
> + * @end_pfn: memory block end_pfn
> + * @curr_nid: Current node
> + *
> + * This function takes the start and end PFN of a memory block, as well as the node ID
> + * that is being registered. It then finds the memblock region of the current node and
> + * checks if the passed memory block intersects with the memblock. If there is an
> + * intersection, the function returns true; otherwise, it returns false.
> + *
> + * Return:
> + * If the current node's memblock region intersects with the memory block, it returns
> + * true; otherwise, it returns false.
> + */
> +bool __init_memblock curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
> + unsigned long end_pfn, int curr_nid)
> +{
> + struct memblock_region *r;
> + unsigned long r_start, r_end;
> + unsigned long size = end_pfn - start_pfn;
> + unsigned long r_size = 0;
> +
> + for_each_mem_region(r) {
> + r_start = PFN_DOWN(r->base);
> + r_end = PFN_DOWN(r->base + r->size);
> + r_size = r_end - r_start;
> +
> + if (r->nid == curr_nid) {
r->nid is not defined for !NUMA configurations, please use
memblock_get_region_node()
> + if (size > r_size) {
> + /*
> + * The memory block size is greater than the memblock
> + * region size, meaning multiple memblocks can be present
> + * within a single memory block. If the memblock's start
> + * or end is within the memory block's start and end, It
> + * indicates that the memblock is part of this memory block.
> + * Therefore, the memory block can be added to the node
> + * where the memblock resides.
> + */
> + if (in_range(r_start, start_pfn, size) ||
> + in_range(r_end, start_pfn, size))
> + return true;
> + } else {
> + /*
> + * The memory block size is less than or equal to the
> + * memblock size, meaning multiple memory blocks can
> + * be part of a single memblock region. If the memory
> + * block's start or end is within the memblock's start
> + * and end, it indicates that the memory block is part of
> + * the memblock. Therefore, the memory block can be added
> + * to the node where the memblock resides.
> + */
> + if (in_range(start_pfn, r_start, r_size) ||
> + in_range(end_pfn, r_start, r_size))
> + return true;
> + }
> + }
> + }
> + return false;
> +}
> +
> +
> int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> unsigned long *start_pfn, unsigned long *end_pfn)
> {
> --
> 2.48.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-09 5:27 ` [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set Donet Tom
@ 2025-04-10 8:07 ` Mike Rapoport
2025-04-10 18:57 ` Donet Tom
0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-04-10 8:07 UTC (permalink / raw)
To: Donet Tom
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> set, we iterate over all PFNs in the memory block and use
> early_pfn_to_nid to find the NID until a match is found.
>
> This patch we are using curr_node_memblock_intersect_memory_block() to
> check if the current node's memblock intersects with the memory block
> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
> is found, the memory block is added to the current node.
>
> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
> for finding the NID will continue to be used.
I don't think we really need different mechanisms for different settings of
CONFIG_DEFERRED_STRUCT_PAGE_INIT.
node_dev_init() runs after all struct pages are already initialized and can
always use pfn_to_nid().
kernel_init_freeable() ->
page_alloc_init_late(); /* completes initialization of deferred pages */
...
do_basic_setup() ->
driver_init() ->
node_dev_init();
The next step could be refactoring register_mem_block_under_node_early() to
loop over memblock regions rather than over pfns.
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
> drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index cd13ef287011..5c5dd02b8bdd 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -20,6 +20,8 @@
> #include <linux/pm_runtime.h>
> #include <linux/swap.h>
> #include <linux/slab.h>
> +#include <linux/memblock.h>
> +
>
> static const struct bus_type node_subsys = {
> .name = "node",
> @@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
> ret);
> }
>
> -/* register memory section under specified node if it spans that node */
> -static int register_mem_block_under_node_early(struct memory_block *mem_blk,
> - void *arg)
> +static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
> + unsigned long start_pfn, unsigned long end_pfn, int nid)
> {
> - unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
> - unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> - unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
> - int nid = *(int *)arg;
> - unsigned long pfn;
>
> + if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
> + do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
> + return 0;
> +}
> +
> +static int register_mem_block_early__normal(struct memory_block *mem_blk,
> + unsigned long start_pfn, unsigned long end_pfn, int nid)
> +{
> + unsigned long pfn;
> for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
> int page_nid;
>
> @@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
> /* mem section does not span the specified node */
> return 0;
> }
> +/* register memory section under specified node if it spans that node */
> +static int register_mem_block_under_node_early(struct memory_block *mem_blk,
> + void *arg)
> +{
> + unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
> + unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
> + unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
> + int nid = *(int *)arg;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> + if (system_state < SYSTEM_RUNNING)
> + return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
> +#endif
> + return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
> +
> +}
>
> /*
> * During hotplug we know that all pages in the memory block belong to the same
> --
> 2.48.1
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-10 8:07 ` Mike Rapoport
@ 2025-04-10 18:57 ` Donet Tom
2025-04-11 10:59 ` Mike Rapoport
0 siblings, 1 reply; 14+ messages in thread
From: Donet Tom @ 2025-04-10 18:57 UTC (permalink / raw)
To: Mike Rapoport
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On 4/10/25 1:37 PM, Mike Rapoport wrote:
> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> set, we iterate over all PFNs in the memory block and use
>> early_pfn_to_nid to find the NID until a match is found.
>>
>> This patch we are using curr_node_memblock_intersect_memory_block() to
>> check if the current node's memblock intersects with the memory block
>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>> is found, the memory block is added to the current node.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>> for finding the NID will continue to be used.
> I don't think we really need different mechanisms for different settings of
> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>
> node_dev_init() runs after all struct pages are already initialized and can
> always use pfn_to_nid().
In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, we perform a binary search in the memblock region to
determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
the pfn's nid.
Your point is that we could unify this logic and always use
pfn_to_nid() to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that
correct?
>
> kernel_init_freeable() ->
> page_alloc_init_late(); /* completes initialization of deferred pages */
> ...
> do_basic_setup() ->
> driver_init() ->
> node_dev_init();
>
> The next step could be refactoring register_mem_block_under_node_early() to
> loop over memblock regions rather than over pfns.
So it the current implementation
node_dev_init()
register_one_node
register_memory_blocks_under_node
walk_memory_blocks()
register_mem_block_under_node_early
get_nid_for_pfn
We get each node's start and end PFN from the pg_data. Using these
values, we determine the memory block's start and end within the
current node. To identify the node to which these memory block
belongs,we iterate over each PFN in the range.
The problem I am facing is,
In my system node4 has a memory block ranging from memory30351
to memory38524, and memory128433. The memory blocks between
memory38524 and memory128433 do not belong to this node.
In walk_memory_blocks() we iterate over all memory blocks starting
from memory38524 to memory128433.
In register_mem_block_under_node_early(), up to memory38524, the
first pfn correctly returns the corresponding nid and the function
returns from there. But after memory38524 and until memory128433,
the loop iterates through each pfn and checks the nid. Since the nid
does not match the required nid, the loop continues. This causes
the soft lockups.
This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, as a binary search is used to determine the PFN's nid. When
this configuration is disabled, pfn_to_nid is faster, and the issue does
not seen.( Faster because nid is getting from page)
To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
is enabled, I added this function that iterates over all memblock regions
for each memory block to determine its nid.
"Loop over memblock regions instead of iterating over PFNs" -
My question is - in register_one_node, do you mean that we should iterate
over all memblock regions, identify the regions belonging to the current
node, and then retrieve the corresponding memory blocks to register them
under that node?
Thanks
Donet
>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> ---
>> drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
>> 1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index cd13ef287011..5c5dd02b8bdd 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -20,6 +20,8 @@
>> #include <linux/pm_runtime.h>
>> #include <linux/swap.h>
>> #include <linux/slab.h>
>> +#include <linux/memblock.h>
>> +
>>
>> static const struct bus_type node_subsys = {
>> .name = "node",
>> @@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
>> ret);
>> }
>>
>> -/* register memory section under specified node if it spans that node */
>> -static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> - void *arg)
>> +static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
>> + unsigned long start_pfn, unsigned long end_pfn, int nid)
>> {
>> - unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> - unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> - unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> - int nid = *(int *)arg;
>> - unsigned long pfn;
>>
>> + if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
>> + do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
>> + return 0;
>> +}
>> +
>> +static int register_mem_block_early__normal(struct memory_block *mem_blk,
>> + unsigned long start_pfn, unsigned long end_pfn, int nid)
>> +{
>> + unsigned long pfn;
>> for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
>> int page_nid;
>>
>> @@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> /* mem section does not span the specified node */
>> return 0;
>> }
>> +/* register memory section under specified node if it spans that node */
>> +static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> + void *arg)
>> +{
>> + unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> + unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> + unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> + int nid = *(int *)arg;
>> +
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> + if (system_state < SYSTEM_RUNNING)
>> + return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
>> +#endif
>> + return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
>> +
>> +}
>>
>> /*
>> * During hotplug we know that all pages in the memory block belong to the same
>> --
>> 2.48.1
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block
2025-04-10 7:49 ` Mike Rapoport
@ 2025-04-10 19:06 ` Donet Tom
0 siblings, 0 replies; 14+ messages in thread
From: Donet Tom @ 2025-04-10 19:06 UTC (permalink / raw)
To: Mike Rapoport
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On 4/10/25 1:19 PM, Mike Rapoport wrote:
> Hi,
>
> On Wed, Apr 09, 2025 at 10:57:56AM +0530, Donet Tom wrote:
>> A new function, curr_node_memblock_intersect_memory_block, has been
>> added to check if the current node's NID intersects with a memory block.
> As Andrew mentioned, the name is too long :)
Sorry, I will reduce the name length from next time.
> Maybe memblock_range_intersects_node(), but I think intersection is not the
> right thing to check (see below).
>
> Also, memblock does not care about sysfs representation of memory blocks,
> please use "range" rather than "memory block" in changelog and comments.
sure.
>
>> This function takes the start and end PFN of a memory block, along with
>> the node ID being registered. It then finds the memblock region of the
>> current node and check if the passed memory block intersects with it. If
>> there is an intersection, the function returns true; otherwise, itreturns
>> false.
>
> Please describe here what problem you are solving and why you need this
> functionality rather than what the new function does.
>
>> There are two scenarios to consider during the search:
>>
>> 1. The memory block size is greater than the memblock region size.
>>
>> This means that multiple memblocks can be present within a single
>> memory block. If the start or end of the memblock is within the
>> start and end of the memory block, it indicates that the memblock
>> is part of that memory block. Therefore, the memory block can be
>> added to the node where the memblock resides.
> If a range crosses several memblocks, it's possible that they belong to a
> different nodes.
>
>> 2. The memory block size is less than or equal to the memblock size
>>
>> This means that multiple memory blocks can be part of a single memblock
>> region. If the start or end of the memory block is within the start and
>> end of the memblock, it indicates that the memory block is part of the
>> memblock. Therefore, the memory block can be added to the node where
>> the memblock resides.
>>
>> In the current implementation, during node device initialization, to
>> find the memory block NID, it iterates over each PFN of the memory
>> block until it finds a match. On large systems, this can take a
>> long time.
> Why won't you replace the loop over each PFN with a loop over memblock
> memory regions in the node device initialization?
>
The same question I have ,do you mean that we should iterate
over all memblock regions, identify the regions belonging to the current
node, and then retrieve the corresponding memory blocks to register them
under that node?
Thanks
Donet
>
>> With this function, the boot time is reduced.
>>
>> Boot time without this function - 32TB RAM
>> ==========================================
>> Startup finished in 1min 12.413s (kernel)
>>
>> Boot time with this function - 32TB RAM
>> ========================================
>> Startup finished in 18.031s (kernel)
>>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> ---
>> include/linux/memblock.h | 2 ++
>> mm/memblock.c | 67 +++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 68 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index ef5a1ecc6e59..db87f7daa46c 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -277,6 +277,8 @@ static inline bool memblock_is_driver_managed(struct memblock_region *m)
>>
>> int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
>> unsigned long *end_pfn);
>> +bool curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
>> + unsigned long end_pfn, int curr_nid);
>> void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>> unsigned long *out_end_pfn, int *out_nid);
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 0a53db4d9f7b..570ab7ac4dce 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -6,6 +6,8 @@
>> * Copyright (C) 2001 Peter Bergner.
>> */
>>
>> +#include "linux/stddef.h"
>> +#include "linux/types.h"
>> #include <linux/kernel.h>
>> #include <linux/slab.h>
>> #include <linux/init.h>
>> @@ -17,7 +19,7 @@
>> #include <linux/seq_file.h>
>> #include <linux/memblock.h>
>> #include <linux/mutex.h>
>> -
>> +#include <linux/minmax.h>
>> #include <asm/sections.h>
>> #include <linux/io.h>
>>
>> @@ -1909,6 +1911,69 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>> return !memblock_is_nomap(&memblock.memory.regions[i]);
>> }
>>
>> +/**
>> + * curr_node_memblock_intersect_memory_block: checks if the current node's memblock
>> + * region intersects with the memory block.
>> + * @start_pfn: memory block start pfn
>> + * @end_pfn: memory block end_pfn
>> + * @curr_nid: Current node
>> + *
>> + * This function takes the start and end PFN of a memory block, as well as the node ID
>> + * that is being registered. It then finds the memblock region of the current node and
>> + * checks if the passed memory block intersects with the memblock. If there is an
>> + * intersection, the function returns true; otherwise, it returns false.
>> + *
>> + * Return:
>> + * If the current node's memblock region intersects with the memory block, it returns
>> + * true; otherwise, it returns false.
>> + */
>> +bool __init_memblock curr_node_memblock_intersect_memory_block(unsigned long start_pfn,
>> + unsigned long end_pfn, int curr_nid)
>> +{
>> + struct memblock_region *r;
>> + unsigned long r_start, r_end;
>> + unsigned long size = end_pfn - start_pfn;
>> + unsigned long r_size = 0;
>> +
>> + for_each_mem_region(r) {
>> + r_start = PFN_DOWN(r->base);
>> + r_end = PFN_DOWN(r->base + r->size);
>> + r_size = r_end - r_start;
>> +
>> + if (r->nid == curr_nid) {
> r->nid is not defined for !NUMA configurations, please use
> memblock_get_region_node()
>
>> + if (size > r_size) {
>> + /*
>> + * The memory block size is greater than the memblock
>> + * region size, meaning multiple memblocks can be present
>> + * within a single memory block. If the memblock's start
>> + * or end is within the memory block's start and end, It
>> + * indicates that the memblock is part of this memory block.
>> + * Therefore, the memory block can be added to the node
>> + * where the memblock resides.
>> + */
>> + if (in_range(r_start, start_pfn, size) ||
>> + in_range(r_end, start_pfn, size))
>> + return true;
>> + } else {
>> + /*
>> + * The memory block size is less than or equal to the
>> + * memblock size, meaning multiple memory blocks can
>> + * be part of a single memblock region. If the memory
>> + * block's start or end is within the memblock's start
>> + * and end, it indicates that the memory block is part of
>> + * the memblock. Therefore, the memory block can be added
>> + * to the node where the memblock resides.
>> + */
>> + if (in_range(start_pfn, r_start, r_size) ||
>> + in_range(end_pfn, r_start, r_size))
>> + return true;
>> + }
>> + }
>> + }
>> + return false;
>> +}
>> +
>> +
>> int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>> unsigned long *start_pfn, unsigned long *end_pfn)
>> {
>> --
>> 2.48.1
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-10 18:57 ` Donet Tom
@ 2025-04-11 10:59 ` Mike Rapoport
2025-04-11 11:36 ` Donet Tom
0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-04-11 10:59 UTC (permalink / raw)
To: Donet Tom
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
>
> On 4/10/25 1:37 PM, Mike Rapoport wrote:
> > On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
> > > In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> > > set, we iterate over all PFNs in the memory block and use
> > > early_pfn_to_nid to find the NID until a match is found.
> > >
> > > This patch we are using curr_node_memblock_intersect_memory_block() to
> > > check if the current node's memblock intersects with the memory block
> > > passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
> > > is found, the memory block is added to the current node.
> > >
> > > If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
> > > for finding the NID will continue to be used.
> > I don't think we really need different mechanisms for different settings of
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT.
> >
> > node_dev_init() runs after all struct pages are already initialized and can
> > always use pfn_to_nid().
>
>
> In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
> is enabled, we perform a binary search in the memblock region to
> determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
> the pfn's nid.
>
> Your point is that we could unify this logic and always use
> pfn_to_nid() to determine the pfn's nid, regardless of whether
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that
> correct?
Yes, struct pages should be ready by the time node_dev_init() is called
even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.
> >
> > kernel_init_freeable() ->
> > page_alloc_init_late(); /* completes initialization of deferred pages */
> > ...
> > do_basic_setup() ->
> > driver_init() ->
> > node_dev_init();
> >
> > The next step could be refactoring register_mem_block_under_node_early() to
> > loop over memblock regions rather than over pfns.
> So it the current implementation
>
> node_dev_init()
> register_one_node
> register_memory_blocks_under_node
> walk_memory_blocks()
> register_mem_block_under_node_early
> get_nid_for_pfn
>
> We get each node's start and end PFN from the pg_data. Using these
> values, we determine the memory block's start and end within the
> current node. To identify the node to which these memory block
> belongs,we iterate over each PFN in the range.
>
> The problem I am facing is,
>
> In my system node4 has a memory block ranging from memory30351
> to memory38524, and memory128433. The memory blocks between
> memory38524 and memory128433 do not belong to this node.
>
> In walk_memory_blocks() we iterate over all memory blocks starting
> from memory38524 to memory128433.
> In register_mem_block_under_node_early(), up to memory38524, the
> first pfn correctly returns the corresponding nid and the function
> returns from there. But after memory38524 and until memory128433,
> the loop iterates through each pfn and checks the nid. Since the nid
> does not match the required nid, the loop continues. This causes
> the soft lockups.
>
> This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
> is enabled, as a binary search is used to determine the PFN's nid. When
> this configuration is disabled, pfn_to_nid is faster, and the issue does
> not seen.( Faster because nid is getting from page)
>
> To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
> is enabled, I added this function that iterates over all memblock regions
> for each memory block to determine its nid.
>
> "Loop over memblock regions instead of iterating over PFNs" -
> My question is - in register_one_node, do you mean that we should iterate
> over all memblock regions, identify the regions belonging to the current
> node, and then retrieve the corresponding memory blocks to register them
> under that node?
I looked more closely at register_mem_block_under_node_early() and
iteration over memblock regions won't make sense there.
It might make sense to use for_each_mem_range() as top level loop in
node_dev_init(), but that's a separate topic.
> Thanks
> Donet
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-11 10:59 ` Mike Rapoport
@ 2025-04-11 11:36 ` Donet Tom
2025-04-15 9:46 ` Mike Rapoport
0 siblings, 1 reply; 14+ messages in thread
From: Donet Tom @ 2025-04-11 11:36 UTC (permalink / raw)
To: Mike Rapoport
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On 4/11/25 4:29 PM, Mike Rapoport wrote:
> On Fri, Apr 11, 2025 at 12:27:28AM +0530, Donet Tom wrote:
>> On 4/10/25 1:37 PM, Mike Rapoport wrote:
>>> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>>>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>>>> set, we iterate over all PFNs in the memory block and use
>>>> early_pfn_to_nid to find the NID until a match is found.
>>>>
>>>> This patch we are using curr_node_memblock_intersect_memory_block() to
>>>> check if the current node's memblock intersects with the memory block
>>>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>>>> is found, the memory block is added to the current node.
>>>>
>>>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>>>> for finding the NID will continue to be used.
>>> I don't think we really need different mechanisms for different settings of
>>> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>>>
>>> node_dev_init() runs after all struct pages are already initialized and can
>>> always use pfn_to_nid().
>>
>> In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> is enabled, we perform a binary search in the memblock region to
>> determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain
>> the pfn's nid.
>>
>> Your point is that we could unify this logic and always use
>> pfn_to_nid() to determine the pfn's nid, regardless of whether
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that
>> correct?
> Yes, struct pages should be ready by the time node_dev_init() is called
> even when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.
ok.
Thanks Mike.
>
>>> kernel_init_freeable() ->
>>> page_alloc_init_late(); /* completes initialization of deferred pages */
>>> ...
>>> do_basic_setup() ->
>>> driver_init() ->
>>> node_dev_init();
>>>
>>> The next step could be refactoring register_mem_block_under_node_early() to
>>> loop over memblock regions rather than over pfns.
>> So it the current implementation
>>
>> node_dev_init()
>> register_one_node
>> register_memory_blocks_under_node
>> walk_memory_blocks()
>> register_mem_block_under_node_early
>> get_nid_for_pfn
>>
>> We get each node's start and end PFN from the pg_data. Using these
>> values, we determine the memory block's start and end within the
>> current node. To identify the node to which these memory block
>> belongs,we iterate over each PFN in the range.
>>
>> The problem I am facing is,
>>
>> In my system node4 has a memory block ranging from memory30351
>> to memory38524, and memory128433. The memory blocks between
>> memory38524 and memory128433 do not belong to this node.
>>
>> In walk_memory_blocks() we iterate over all memory blocks starting
>> from memory38524 to memory128433.
>> In register_mem_block_under_node_early(), up to memory38524, the
>> first pfn correctly returns the corresponding nid and the function
>> returns from there. But after memory38524 and until memory128433,
>> the loop iterates through each pfn and checks the nid. Since the nid
>> does not match the required nid, the loop continues. This causes
>> the soft lockups.
>>
>> This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> is enabled, as a binary search is used to determine the PFN's nid. When
>> this configuration is disabled, pfn_to_nid is faster, and the issue does
>> not seen.( Faster because nid is getting from page)
>>
>> To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> is enabled, I added this function that iterates over all memblock regions
>> for each memory block to determine its nid.
>>
>> "Loop over memblock regions instead of iterating over PFNs" -
>> My question is - in register_one_node, do you mean that we should iterate
>> over all memblock regions, identify the regions belonging to the current
>> node, and then retrieve the corresponding memory blocks to register them
>> under that node?
> I looked more closely at register_mem_block_under_node_early() and
> iteration over memblock regions won't make sense there.
>
> It might make sense to use for_each_mem_range() as top level loop in
> node_dev_init(), but that's a separate topic.
Yes, this makes sense to me as well. So in your opinion, instead of adding
a new memblock search function like I added , it's better to use
|for_each_mem_range()| in|node_dev_init()|, which would work for all
cases—regardless of whether|CONFIG_DEFERRED_STRUCT_PAGE_INIT| is set or
not. Right?
>
>> Thanks
>> Donet
>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-11 11:36 ` Donet Tom
@ 2025-04-15 9:46 ` Mike Rapoport
2025-04-15 10:08 ` Donet Tom
0 siblings, 1 reply; 14+ messages in thread
From: Mike Rapoport @ 2025-04-15 9:46 UTC (permalink / raw)
To: Donet Tom
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On Fri, Apr 11, 2025 at 05:06:55PM +0530, Donet Tom wrote:
> On 4/11/25 4:29 PM, Mike Rapoport wrote:
> >
> > It might make sense to use for_each_mem_range() as top level loop in
> > node_dev_init(), but that's a separate topic.
>
> Yes, this makes sense to me as well. So in your opinion, instead of adding
> a new memblock search function like I added , it's better to use
> |for_each_mem_range()| in|node_dev_init()|, which would work for all
> cases—regardless of whether|CONFIG_DEFERRED_STRUCT_PAGE_INIT| is set or
> not. Right?
Yes
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
2025-04-15 9:46 ` Mike Rapoport
@ 2025-04-15 10:08 ` Donet Tom
0 siblings, 0 replies; 14+ messages in thread
From: Donet Tom @ 2025-04-15 10:08 UTC (permalink / raw)
To: Mike Rapoport
Cc: Greg Kroah-Hartman, linux-kernel, David Hildenbrand,
Andrew Morton, linux-mm, Ritesh Harjani, rafael, Danilo Krummrich
On 4/15/25 3:16 PM, Mike Rapoport wrote:
> On Fri, Apr 11, 2025 at 05:06:55PM +0530, Donet Tom wrote:
>> On 4/11/25 4:29 PM, Mike Rapoport wrote:
>>> It might make sense to use for_each_mem_range() as top level loop in
>>> node_dev_init(), but that's a separate topic.
>> Yes, this makes sense to me as well. So in your opinion, instead of adding
>> a new memblock search function like I added , it's better to use
>> |for_each_mem_range()| in|node_dev_init()|, which would work for all
>> cases—regardless of whether|CONFIG_DEFERRED_STRUCT_PAGE_INIT| is set or
>> not. Right?
>
> Yes
Thank you so much. I will implement it.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-04-15 10:08 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-09 5:27 [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Donet Tom
2025-04-09 5:27 ` [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set Donet Tom
2025-04-10 8:07 ` Mike Rapoport
2025-04-10 18:57 ` Donet Tom
2025-04-11 10:59 ` Mike Rapoport
2025-04-11 11:36 ` Donet Tom
2025-04-15 9:46 ` Mike Rapoport
2025-04-15 10:08 ` Donet Tom
2025-04-10 2:20 ` [PATCH 1/2] mm/memblock: Added a New Memblock Function to Check if the Current Node's Memblock Region Intersects with a Memory Block Andrew Morton
2025-04-10 4:35 ` Donet Tom
2025-04-10 7:03 ` kernel test robot
2025-04-10 7:25 ` kernel test robot
2025-04-10 7:49 ` Mike Rapoport
2025-04-10 19:06 ` Donet Tom
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).