[PATCH] mm/sparse: Optimize section number calculations using bit shifts

The Linux Kernel Mailing List

* [PATCH] mm/sparse: Optimize section number calculations using bit shifts
@ 2026-06-16  2:59 Zhen Ni
  2026-06-16  6:32 ` Mike Rapoport
  2026-06-16  8:06 ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 4+ messages in thread
From: Zhen Ni @ 2026-06-16  2:59 UTC (permalink / raw)
  To: Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel, Zhen Ni

Add SECTIONS_PER_ROOT_SHIFT = ilog2(SECTIONS_PER_ROOT) with correctness
guaranteed by BUILD_BUG_ON in sparse_init(). Convert SECTION_NR_TO_ROOT
to use right shift instead of division for better performance. Add
SECTION_NR_IN_ROOT() macro to improve code readability.
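
A minimal user-space sketch of the arithmetic this change relies on. For an unsigned section number and a power-of-two SECTIONS_PER_ROOT, dividing by SECTIONS_PER_ROOT equals shifting right by its log2, and the remainder equals masking with SECTIONS_PER_ROOT - 1. MODEL_SECTIONS_PER_ROOT (assumed to be 256, i.e. a 4096-byte page over a 16-byte struct), model_ilog2() and the four helpers are hypothetical names for illustration. They are not the kernel's ilog2() or the mmzone.h macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value: 4096-byte page / 16-byte struct mem_section. */
#define MODEL_SECTIONS_PER_ROOT	256UL

/* Hypothetical user-space stand-in for the kernel's ilog2(). */
static unsigned int model_ilog2(unsigned long n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

static bool model_is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Root index, division form (what SECTION_NR_TO_ROOT() used before). */
static unsigned long root_by_div(unsigned long nr)
{
	return nr / MODEL_SECTIONS_PER_ROOT;
}

/* Root index, shift form (what the patch switches to). */
static unsigned long root_by_shift(unsigned long nr)
{
	/* The shift form is only valid for a power-of-two divisor. */
	assert(model_is_power_of_2(MODEL_SECTIONS_PER_ROOT));
	return nr >> model_ilog2(MODEL_SECTIONS_PER_ROOT);
}

/* Slot inside the root, remainder form. */
static unsigned long idx_by_mod(unsigned long nr)
{
	return nr % MODEL_SECTIONS_PER_ROOT;
}

/* Slot inside the root, mask form (SECTION_ROOT_MASK style). */
static unsigned long idx_by_mask(unsigned long nr)
{
	return nr & (MODEL_SECTIONS_PER_ROOT - 1);
}
```

For example, section 1000 maps to root 3 and slot 232 with either pair of helpers.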

This improves code efficiency in hot paths where __nr_to_section() is
frequently called, such as sparse_init() and memory section management
operations.

Performance verification in sparse_init() on ARM (8GB RAM, 4 NUMA nodes):

    sparse_init()
    |
    +----> memblocks_present()
    |
    +----> section initialization (sparse_init_nid loop)

Time measurement points:

    [T1] sparse_init start
         |
         v
    [T2] memblocks_present() complete
         |
         v
    [T3] sparse_init_nid() loop complete / sparse_init end

Measurement values:
    memblocks_present_cycles = T2 - T1
    section_initialization_cycles = T3 - T2
    total_cycles = T3 - T1
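
The three measurement values can be written out as below. The struct and function names are hypothetical, and the changelog does not say which counter was read for T1, T2 and T3; the sketch only encodes the stated definitions. Under those definitions total_cycles is exactly memblocks_present_cycles plus section_initialization_cycles whenever all three come from the same set of reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three raw counter reads. */
struct sparse_init_timestamps {
	uint64_t t1;	/* [T1] sparse_init start */
	uint64_t t2;	/* [T2] memblocks_present() complete */
	uint64_t t3;	/* [T3] sparse_init_nid() loop complete / sparse_init end */
};

/* Reads must be monotonic, otherwise the unsigned subtraction wraps. */
static void check_order(const struct sparse_init_timestamps *ts)
{
	assert(ts->t1 <= ts->t2 && ts->t2 <= ts->t3);
}

static uint64_t memblocks_present_cycles(const struct sparse_init_timestamps *ts)
{
	check_order(ts);
	return ts->t2 - ts->t1;
}

static uint64_t section_initialization_cycles(const struct sparse_init_timestamps *ts)
{
	check_order(ts);
	return ts->t3 - ts->t2;
}

static uint64_t total_cycles(const struct sparse_init_timestamps *ts)
{
	check_order(ts);
	return ts->t3 - ts->t1;
}
```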

Before (division):
[    0.000000] sparse_init: total 7538 cycles
[    0.000000]   memblocks_present: 4232 cycles
[    0.000000]   section initialization: 3261 cycles

After (bit shift):
[    0.000000] sparse_init: total 5641 cycles
[    0.000000]   memblocks_present: 3562 cycles
[    0.000000]   section initialization: 2057 cycles

Performance improvement:
  Total: (7538-5641)/7538 = 25.2% faster
  memblocks_present: (4232-3562)/4232 = 15.8% faster
  section initialization: (3261-2057)/3261 = 36.9% faster
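
The percentages above are relative reductions in measured cycles, (before - after) / before. A small helper to recompute them, with hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction in cycles, in percent: (before - after) / before * 100. */
static double cycle_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/* Prints one line in the style of the summary above. */
static void print_reduction(const char *label, double before, double after)
{
	printf("%s: %.1f%% fewer cycles\n", label,
	       cycle_reduction_pct(before, after));
}
```

For instance, print_reduction("Total", 7538, 5641) prints "Total: 25.2% fewer cycles".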

Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
---
 include/linux/mmzone.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9adb2ad21da5..5daf471f6823 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2035,11 +2035,14 @@ struct mem_section {
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
 #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
+#define SECTIONS_PER_ROOT_SHIFT ilog2(SECTIONS_PER_ROOT)
 #else
 #define SECTIONS_PER_ROOT	1
+#define SECTIONS_PER_ROOT_SHIFT 0
 #endif
 
-#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
+#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTIONS_PER_ROOT_SHIFT)
+#define SECTION_NR_IN_ROOT(sec)	((sec) & SECTION_ROOT_MASK)
 #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
 #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
 
@@ -2065,7 +2068,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
 	if (!mem_section || !mem_section[root])
 		return NULL;
 #endif
-	return &mem_section[root][nr & SECTION_ROOT_MASK];
+	return &mem_section[root][SECTION_NR_IN_ROOT(nr)];
 }
 extern size_t mem_section_usage_size(void);
 
-- 
2.20.1



* Re: [PATCH] mm/sparse: Optimize section number calculations using bit shifts
  2026-06-16  2:59 [PATCH] mm/sparse: Optimize section number calculations using bit shifts Zhen Ni
@ 2026-06-16  6:32 ` Mike Rapoport
       [not found]   ` <764b8fef-4e77-4daf-b2ba-45745061ade9@easystack.cn>
  2026-06-16  8:06 ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2026-06-16  6:32 UTC (permalink / raw)
  To: Zhen Ni
  Cc: Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

Hi,

On Tue, Jun 16, 2026 at 10:59:42AM +0800, Zhen Ni wrote:
> [...]
> Performance improvement:
>   Total: (7538-5641)/7538 = 25.2% faster
>   memblocks_present: (4232-3562)/4232 = 15.8% faster
>   section initialization: (3261-2057)/3261 = 36.9% faster

This is a nice improvement, but it's not the hot path. I believe you can
derive improvement to __nr_to_section() from these measurements.
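
One way to follow this suggestion, as a sketch: divide the cycles saved in a phase by the number of __nr_to_section() calls made in that phase. The changelog does not report that call count, so lookups below is a hypothetical input (for instance from a temporary debug counter), and the estimate assumes the two runs differ only in this change.

```c
#include <assert.h>

/*
 * Hypothetical estimate of the per-call saving in __nr_to_section():
 * cycles saved in one phase divided by the lookups made in that phase.
 * Assumes the two runs differ only in the division-to-shift change.
 */
static double saved_cycles_per_lookup(unsigned long before_cycles,
				      unsigned long after_cycles,
				      unsigned long lookups)
{
	assert(lookups > 0);
	assert(before_cycles >= after_cycles);
	return (double)(before_cycles - after_cycles) / (double)lookups;
}
```

For the section initialization phase, 3261 - 2057 = 1204 cycles were saved, so with N lookups the estimate would be 1204 / N cycles per call.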
 
> Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
> ---
>  include/linux/mmzone.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9adb2ad21da5..5daf471f6823 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2035,11 +2035,14 @@ struct mem_section {
>  
>  #ifdef CONFIG_SPARSEMEM_EXTREME
>  #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
> +#define SECTIONS_PER_ROOT_SHIFT ilog2(SECTIONS_PER_ROOT)
>  #else
>  #define SECTIONS_PER_ROOT	1
> +#define SECTIONS_PER_ROOT_SHIFT 0
>  #endif
>  
> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTIONS_PER_ROOT_SHIFT)
> +#define SECTION_NR_IN_ROOT(sec)	((sec) & SECTION_ROOT_MASK)
>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>  
> @@ -2065,7 +2068,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>  	if (!mem_section || !mem_section[root])
>  		return NULL;
>  #endif
> -	return &mem_section[root][nr & SECTION_ROOT_MASK];
> +	return &mem_section[root][SECTION_NR_IN_ROOT(nr)];

The explicit masking is clearer IMO.
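
For readers following the lookup under discussion, here is a hypothetical user-space model of the two-level SPARSEMEM_EXTREME table walk in __nr_to_section(), written with the explicit mask. The model_* names, the sizes and the static root array are assumptions for illustration; in the kernel, the root array and the roots themselves are allocated at runtime rather than being static.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes for the model. */
#define MODEL_SECTIONS_PER_ROOT	256UL
#define MODEL_SECTION_ROOT_MASK	(MODEL_SECTIONS_PER_ROOT - 1)
#define MODEL_NR_SECTION_ROOTS	4UL

/* Stand-in for struct mem_section, reduced to one field. */
struct model_mem_section {
	unsigned long section_mem_map;
};

/* Root pointer array; an entry stays NULL until its root is allocated. */
static struct model_mem_section *model_mem_section[MODEL_NR_SECTION_ROOTS];

static struct model_mem_section *model_nr_to_section(unsigned long nr)
{
	unsigned long root = nr / MODEL_SECTIONS_PER_ROOT;

	if (root >= MODEL_NR_SECTION_ROOTS || !model_mem_section[root])
		return NULL;
	/* The explicit mask picks the slot inside the root. */
	return &model_mem_section[root][nr & MODEL_SECTION_ROOT_MASK];
}

/* Allocate one zeroed root; returns 0 on success, -1 on failure. */
static int model_alloc_root(unsigned long root)
{
	if (root >= MODEL_NR_SECTION_ROOTS)
		return -1;
	assert(!model_mem_section[root]);	/* each root is allocated once */
	model_mem_section[root] = calloc(MODEL_SECTIONS_PER_ROOT,
					 sizeof(struct model_mem_section));
	return model_mem_section[root] ? 0 : -1;
}
```

After model_alloc_root(1), section 300 resolves to slot 44 of root 1, while sections in unallocated or out-of-range roots return NULL.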

>  }
>  extern size_t mem_section_usage_size(void);

Hmm, I don't see BUILD_BUG_ON() you mention in the changelog.
 
> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH] mm/sparse: Optimize section number calculations using bit shifts
       [not found]   ` <764b8fef-4e77-4daf-b2ba-45745061ade9@easystack.cn>
@ 2026-06-16  7:56     ` Mike Rapoport
  0 siblings, 0 replies; 4+ messages in thread
From: Mike Rapoport @ 2026-06-16  7:56 UTC (permalink / raw)
  To: zhen.ni
  Cc: Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel

On Tue, Jun 16, 2026 at 03:29:38PM +0800, zhen.ni wrote:
> On 2026/6/16 14:32, Mike Rapoport wrote:
> > On Tue, Jun 16, 2026 at 10:59:42AM +0800, Zhen Ni wrote:
> > > 
> > > Performance improvement:
> > >    Total: (7538-5641)/7538 = 25.2% faster
> > >    memblocks_present: (4232-3562)/4232 = 15.8% faster
> > >    section initialization: (3261-2057)/3261 = 36.9% faster
> > 
> > This is a nice improvement, but it's not the hot path. I believe you can
> > derive improvement to __nr_to_section() from these measurements.
> 
> sparse_init() is not a hot path, but it invokes __nr_to_section() in a
> tight loop, making it a good measurement point to demonstrate the
> performance improvement.
 
Right, and an explanation along these lines should be in the changelog.

> > [...]
> >
> > Hmm, I don't see BUILD_BUG_ON() you mention in the changelog.
> > > -- 
> > > 2.20.1
> > > 
> > 
> 
> Regarding the BUILD_BUG_ON, it is in sparse_init() at line 419:
> 
> void __init sparse_init(void)
> {
>     ...
>     /* see include/linux/mmzone.h 'struct mem_section' definition */
>     BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
>     ...
> }
> 
> This guarantees that sizeof(struct mem_section) is a power of 2, and since
> SECTIONS_PER_ROOT = PAGE_SIZE / sizeof(struct mem_section) and PAGE_SIZE is
> always a power of 2, SECTIONS_PER_ROOT is guaranteed to be a power of 2 as
> well, validating the use of bit shifts.

This was not clear from reading the changelog. 
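
The argument quoted above can be written as compile-time checks. If sizeof(struct mem_section) is 2^s and PAGE_SIZE is 2^p with s <= p, then SECTIONS_PER_ROOT = 2^p / 2^s = 2^(p-s), which is what makes both the shift and the mask valid. The sketch uses C11 static_assert in place of the kernel's BUILD_BUG_ON(); the 4096-byte page and the two-field stand-in struct are assumptions, not the real struct mem_section layout.

```c
#include <assert.h>

/* Assumed 4 KiB page; the kernel's PAGE_SIZE is per-architecture. */
#define MODEL_PAGE_SIZE	4096UL

/*
 * Hypothetical stand-in for struct mem_section: two word-sized fields,
 * 8 or 16 bytes on common ABIs. Not the real layout.
 */
struct model_mem_section {
	unsigned long section_mem_map;
	void *usage;
};

#define MODEL_IS_POW2(n)	((n) != 0 && ((n) & ((n) - 1)) == 0)
#define MODEL_SECTIONS_PER_ROOT	(MODEL_PAGE_SIZE / sizeof(struct model_mem_section))

/* Counterpart of the quoted BUILD_BUG_ON(): the struct size is 2^s. */
static_assert(MODEL_IS_POW2(sizeof(struct model_mem_section)),
	      "struct size must be a power of two");
/* The page size is 2^p, and 2^p / 2^s = 2^(p - s). */
static_assert(MODEL_IS_POW2(MODEL_PAGE_SIZE),
	      "page size must be a power of two");
static_assert(MODEL_IS_POW2(MODEL_SECTIONS_PER_ROOT),
	      "so SECTIONS_PER_ROOT is a power of two as well");
```

If the stand-in struct grew to a non-power-of-two size, the first static_assert would stop the build, much as the kernel's BUILD_BUG_ON() does for the real struct.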
 
> Thanks,
> Zhen

-- 
Sincerely yours,
Mike.


* Re: [PATCH] mm/sparse: Optimize section number calculations using bit shifts
  2026-06-16  2:59 [PATCH] mm/sparse: Optimize section number calculations using bit shifts Zhen Ni
  2026-06-16  6:32 ` Mike Rapoport
@ 2026-06-16  8:06 ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-16  8:06 UTC (permalink / raw)
  To: Zhen Ni, Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt,
	Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, Lorenzo Stoakes,
	Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko
  Cc: linux-mm, linux-kernel

On 6/16/26 04:59, Zhen Ni wrote:
> [...]
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9adb2ad21da5..5daf471f6823 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2035,11 +2035,14 @@ struct mem_section {
>  
>  #ifdef CONFIG_SPARSEMEM_EXTREME
>  #define SECTIONS_PER_ROOT       (PAGE_SIZE / sizeof (struct mem_section))
> +#define SECTIONS_PER_ROOT_SHIFT ilog2(SECTIONS_PER_ROOT)
>  #else
>  #define SECTIONS_PER_ROOT	1
> +#define SECTIONS_PER_ROOT_SHIFT 0
>  #endif
>  
> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTIONS_PER_ROOT_SHIFT)
> +#define SECTION_NR_IN_ROOT(sec)	((sec) & SECTION_ROOT_MASK)
>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>  

From a compiler POV, "/ SECTIONS_PER_ROOT" is exactly the same as >>
ilog2(SECTIONS_PER_ROOT) *as long as* the variable we are processing is
"unsigned long".

The compiler should be smart enough to figure that out with
SECTIONS_PER_ROOT being known at compile time.

Can you compare the generated code for __nr_to_section() to see if and why the
compiler fails to optimize that?
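
A small user-space sketch of the unsigned caveat. With an unsigned operand and a power-of-two divisor, the division and the shift agree for every input, so a compiler is free to emit a shift. With a signed operand, C division truncates toward zero (-1 / 256 == 0), whereas an arithmetic shift rounds toward negative infinity, so the compiler must add a correction for negative values. The constants and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical power-of-two divisor and its log2. */
#define MODEL_SECTIONS_PER_ROOT		256UL
#define MODEL_SECTIONS_PER_ROOT_SHIFT	8

static_assert((1UL << MODEL_SECTIONS_PER_ROOT_SHIFT) == MODEL_SECTIONS_PER_ROOT,
	      "shift must match the divisor");

/* Unsigned operand: the two forms agree for every input. */
static unsigned long root_div_unsigned(unsigned long nr)
{
	return nr / MODEL_SECTIONS_PER_ROOT;
}

static unsigned long root_shift_unsigned(unsigned long nr)
{
	return nr >> MODEL_SECTIONS_PER_ROOT_SHIFT;
}

/*
 * Signed operand: C division truncates toward zero, so -1 / 256 == 0 and
 * -257 / 256 == -1, while rounding toward negative infinity (what an
 * arithmetic shift does) would give -1 and -2. The divisor is written as
 * 256L: dividing a long by 256UL would first convert it to unsigned long.
 * Right-shifting a negative value is implementation-defined in C, so only
 * the division is shown here.
 */
static long root_div_signed(long nr)
{
	return nr / 256L;
}
```

To check the question on real code, one could disassemble the kernel objects that inline __nr_to_section() (for example with objdump -d) and look for a divide instruction versus a shift.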

-- 
Cheers,

David



Thread overview: 4+ messages
2026-06-16  2:59 [PATCH] mm/sparse: Optimize section number calculations using bit shifts Zhen Ni
2026-06-16  6:32 ` Mike Rapoport
     [not found]   ` <764b8fef-4e77-4daf-b2ba-45745061ade9@easystack.cn>
2026-06-16  7:56     ` Mike Rapoport
2026-06-16  8:06 ` David Hildenbrand (Arm)
