From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jun 2026 09:32:03 +0300
From: Mike Rapoport
To: Zhen Ni
Cc: Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand,
	Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/sparse: Optimize section number calculations using bit shifts
References: <20260616025942.3572473-1-zhen.ni@easystack.cn>
In-Reply-To: <20260616025942.3572473-1-zhen.ni@easystack.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Tue, Jun 16, 2026 at 10:59:42AM +0800, Zhen Ni wrote:
> Add SECTIONS_PER_ROOT_SHIFT = ilog2(SECTIONS_PER_ROOT) with correctness
> guaranteed by BUILD_BUG_ON in sparse_init(). Convert SECTION_NR_TO_ROOT
> to use right shift instead of division for better performance. Add
> SECTION_NR_IN_ROOT() macro to improve code readability.
>
> This improves code efficiency in hot paths where __nr_to_section() is
> frequently called, such as sparse_init() and memory section management
> operations.
>
> Performance verification in sparse_init() on ARM (8GB RAM, 4 NUMA nodes):
>
>   sparse_init()
>     |
>     +----> memblocks_present()
>     |
>     +----> section initialization (sparse_init_nid loop)
>
> Time measurement points:
>
>   [T1] sparse_init start
>     |
>     v
>   [T2] memblocks_present() complete
>     |
>     v
>   [T3] sparse_init_nid() loop complete / sparse_init end
>
> Measurement values:
>   memblocks_present_cycles = T2 - T1
>   section_initialization_cycles = T3 - T2
>   total_cycles = T3 - T1
>
> Before (division):
>   [    0.000000] sparse_init: total 7538 cycles
>   [    0.000000]   memblocks_present: 4232 cycles
>   [    0.000000]   section initialization: 3261 cycles
>
> After (bit shift):
>   [    0.000000] sparse_init: total 5641 cycles
>   [    0.000000]   memblocks_present: 3562 cycles
>   [    0.000000]   section initialization: 2057 cycles
>
> Performance improvement:
>   Total: (7538-5641)/7538 = 25.2% faster
>   memblocks_present: (4232-3562)/4232 = 15.8% faster
>   section initialization: (3261-2057)/3261 = 36.9% faster

This is a nice improvement, but it's not the hot path. I believe you can
derive improvement to __nr_to_section() from these measurements.

> Signed-off-by: Zhen Ni
> ---
>  include/linux/mmzone.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9adb2ad21da5..5daf471f6823 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2035,11 +2035,14 @@ struct mem_section {
>  
>  #ifdef CONFIG_SPARSEMEM_EXTREME
>  #define SECTIONS_PER_ROOT	(PAGE_SIZE / sizeof (struct mem_section))
> +#define SECTIONS_PER_ROOT_SHIFT	ilog2(SECTIONS_PER_ROOT)
>  #else
>  #define SECTIONS_PER_ROOT	1
> +#define SECTIONS_PER_ROOT_SHIFT	0
>  #endif
>  
> -#define SECTION_NR_TO_ROOT(sec)	((sec) / SECTIONS_PER_ROOT)
> +#define SECTION_NR_TO_ROOT(sec)	((sec) >> SECTIONS_PER_ROOT_SHIFT)
> +#define SECTION_NR_IN_ROOT(sec)	((sec) & SECTION_ROOT_MASK)
>  #define NR_SECTION_ROOTS	DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>  #define SECTION_ROOT_MASK	(SECTIONS_PER_ROOT - 1)
>  
> @@ -2065,7 +2068,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>  	if (!mem_section || !mem_section[root])
>  		return NULL;
>  #endif
> -	return &mem_section[root][nr & SECTION_ROOT_MASK];
> +	return &mem_section[root][SECTION_NR_IN_ROOT(nr)];

The explicit masking is clearer IMO.

>  }
>  extern size_t mem_section_usage_size(void);

Hmm, I don't see BUILD_BUG_ON() you mention in the changelog.

> -- 
> 2.20.1
> 

-- 
Sincerely yours,
Mike.
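
For readers following the thread, the shift/mask conversion under review and the compile-time check the changelog mentions can be sketched in userspace C. This is only an illustration, not the kernel code: the DEMO_* constants and demo_* helpers are hypothetical names with made-up values (the kernel derives SECTIONS_PER_ROOT from PAGE_SIZE and sizeof(struct mem_section)), and C11 static_assert stands in for BUILD_BUG_ON().

```c
#include <assert.h>	/* static_assert (C11) */

/* Hypothetical values for illustration only, not taken from the patch. */
#define DEMO_SECTIONS_PER_ROOT		256UL
#define DEMO_SECTIONS_PER_ROOT_SHIFT	8
#define DEMO_SECTION_ROOT_MASK		(DEMO_SECTIONS_PER_ROOT - 1)

/*
 * Userspace stand-in for the BUILD_BUG_ON() the changelog refers to:
 * the shift and mask forms are only valid when the divisor is a power
 * of two and the shift is exactly its base-2 logarithm.
 */
static_assert((DEMO_SECTIONS_PER_ROOT & (DEMO_SECTIONS_PER_ROOT - 1)) == 0,
	      "sections per root must be a power of two");
static_assert((1UL << DEMO_SECTIONS_PER_ROOT_SHIFT) == DEMO_SECTIONS_PER_ROOT,
	      "shift must equal log2(sections per root)");

/* Root index computed with division, as before the patch. */
static inline unsigned long demo_nr_to_root_div(unsigned long nr)
{
	return nr / DEMO_SECTIONS_PER_ROOT;
}

/* Root index computed with a right shift, as after the patch. */
static inline unsigned long demo_nr_to_root_shift(unsigned long nr)
{
	return nr >> DEMO_SECTIONS_PER_ROOT_SHIFT;
}

/* Index within the root, written as an explicit mask. */
static inline unsigned long demo_nr_in_root(unsigned long nr)
{
	return nr & DEMO_SECTION_ROOT_MASK;
}

/* Returns 1 if shift/mask agree with div/mod for every nr below limit. */
static inline int demo_shift_matches_division(unsigned long limit)
{
	unsigned long nr;

	for (nr = 0; nr < limit; nr++) {
		if (demo_nr_to_root_shift(nr) != demo_nr_to_root_div(nr))
			return 0;
		if (demo_nr_in_root(nr) != nr % DEMO_SECTIONS_PER_ROOT)
			return 0;
	}
	return 1;
}
```

With these illustrative values, section number 1000 maps to root 3 and index 232 within that root under both forms. If DEMO_SECTIONS_PER_ROOT were changed to a value that is not a power of two, the first static_assert would reject the build, which is the kind of guarantee the changelog attributes to BUILD_BUG_ON().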