From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <764b8fef-4e77-4daf-b2ba-45745061ade9@easystack.cn>
Date: Tue, 16 Jun 2026 15:29:38 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/sparse: Optimize section number calculations using bit shifts
To:
Mike Rapoport
Cc: Andrew Morton, Kairui Song, Qi Zheng, Shakeel Butt, Barry Song, Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20260616025942.3572473-1-zhen.ni@easystack.cn>
From: "zhen.ni"

On 2026/6/16 14:32, Mike Rapoport wrote:
> Hi,
>
> On Tue, Jun 16, 2026 at 10:59:42AM +0800, Zhen Ni wrote:
>> Add SECTIONS_PER_ROOT_SHIFT = ilog2(SECTIONS_PER_ROOT) with correctness
>> guaranteed by BUILD_BUG_ON in sparse_init(). Convert SECTION_NR_TO_ROOT
>> to use right shift instead of division for better performance. Add
>> SECTION_NR_IN_ROOT() macro to improve code readability.
>>
>> This improves code efficiency in hot paths where __nr_to_section() is
>> frequently called, such as sparse_init() and memory section management
>> operations.
>>
>> Performance verification in sparse_init() on ARM (8GB RAM, 4 NUMA nodes):
>>
>> sparse_init()
>>     |
>>     +----> memblocks_present()
>>     |
>>     +----> section initialization (sparse_init_nid loop)
>>
>> Time measurement points:
>>
>> [T1] sparse_init start
>>   |
>>   v
>> [T2] memblocks_present() complete
>>   |
>>   v
>> [T3] sparse_init_nid() loop complete / sparse_init end
>>
>> Measurement values:
>>   memblocks_present_cycles = T2 - T1
>>   section_initialization_cycles = T3 - T2
>>   total_cycles = T3 - T1
>>
>> Before (division):
>> [ 0.000000] sparse_init: total 7538 cycles
>> [ 0.000000] memblocks_present: 4232 cycles
>> [ 0.000000] section initialization: 3261 cycles
>>
>> After (bit shift):
>> [ 0.000000] sparse_init: total 5641 cycles
>> [ 0.000000] memblocks_present: 3562 cycles
>> [ 0.000000] section initialization: 2057 cycles
>>
>> Performance improvement:
>>   Total: (7538-5641)/7538 = 25.2% faster
>>   memblocks_present: (4232-3562)/4232 = 15.8% faster
>>   section initialization: (3261-2057)/3261 = 36.9% faster
>
> This is a nice improvement, but it's not the hot path. I believe you can
> derive improvement to __nr_to_section() from these measurements.

sparse_init() is not a hot path, but it invokes __nr_to_section() in a
tight loop, making it a good measurement point to demonstrate the
performance improvement.

>
>> Signed-off-by: Zhen Ni
>> ---
>>  include/linux/mmzone.h | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 9adb2ad21da5..5daf471f6823 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -2035,11 +2035,14 @@ struct mem_section {
>>
>>  #ifdef CONFIG_SPARSEMEM_EXTREME
>>  #define SECTIONS_PER_ROOT (PAGE_SIZE / sizeof (struct mem_section))
>> +#define SECTIONS_PER_ROOT_SHIFT ilog2(SECTIONS_PER_ROOT)
>>  #else
>>  #define SECTIONS_PER_ROOT 1
>> +#define SECTIONS_PER_ROOT_SHIFT 0
>>  #endif
>>
>> -#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
>> +#define SECTION_NR_TO_ROOT(sec) ((sec) >> SECTIONS_PER_ROOT_SHIFT)
>> +#define SECTION_NR_IN_ROOT(sec) ((sec) & SECTION_ROOT_MASK)
>>  #define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
>>  #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
>>
>> @@ -2065,7 +2068,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>>      if (!mem_section || !mem_section[root])
>>          return NULL;
>>  #endif
>> -    return &mem_section[root][nr & SECTION_ROOT_MASK];
>> +    return &mem_section[root][SECTION_NR_IN_ROOT(nr)];
>
> The explicit masking is clearer IMO.
>
>>  }
>>  extern size_t mem_section_usage_size(void);
>
> Hmm, I don't see BUILD_BUG_ON() you mention in the changelog.
>
>> --
>> 2.20.1
>>
>

Regarding the BUILD_BUG_ON, it is in sparse_init() at line 419:

void __init sparse_init(void)
{
    ...
    /* see include/linux/mmzone.h 'struct mem_section' definition */
    BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
    ...
}

This guarantees that sizeof(struct mem_section) is a power of 2. Since
SECTIONS_PER_ROOT = PAGE_SIZE / sizeof(struct mem_section) and PAGE_SIZE
is always a power of 2, SECTIONS_PER_ROOT is guaranteed to be a power of
2 as well, which validates the use of bit shifts.

Thanks,
Zhen
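
For reference, the power-of-two argument can be checked with a small userspace sketch. It is not kernel code: DEMO_PAGE_SIZE, struct demo_mem_section and the demo_* helpers are hypothetical stand-ins chosen for illustration, and the 4096-byte page size is an assumption. It only shows that, once the per-root count is a power of 2, shift and mask give the same results as division and modulo.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the demo; the kernel's PAGE_SIZE is arch-dependent. */
#define DEMO_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct mem_section, sized to a power of 2. */
struct demo_mem_section {
    unsigned long section_mem_map;
    void *usage;
};

#define DEMO_SECTIONS_PER_ROOT (DEMO_PAGE_SIZE / sizeof(struct demo_mem_section))
#define DEMO_SECTION_ROOT_MASK (DEMO_SECTIONS_PER_ROOT - 1)

/* Compile-time check playing the role of BUILD_BUG_ON() in this sketch. */
_Static_assert((sizeof(struct demo_mem_section) &
                (sizeof(struct demo_mem_section) - 1)) == 0,
               "demo_mem_section size must be a power of 2");

/* Returns nonzero when n is a power of 2. */
static int demo_is_power_of_2(unsigned long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Floor of log2(n) for n > 0; a simple loop instead of the kernel's ilog2(). */
static unsigned int demo_ilog2(unsigned long n)
{
    unsigned int shift = 0;

    while (n >>= 1)
        shift++;
    return shift;
}

/*
 * Counts section numbers in [0, limit) where shift/mask disagree with
 * division/modulo. The count is zero when the divisor is a power of 2.
 */
static unsigned long demo_count_mismatches(unsigned long limit)
{
    unsigned int shift = demo_ilog2(DEMO_SECTIONS_PER_ROOT);
    unsigned long nr, mismatches = 0;

    for (nr = 0; nr < limit; nr++) {
        if ((nr / DEMO_SECTIONS_PER_ROOT) != (nr >> shift))
            mismatches++;
        if ((nr % DEMO_SECTIONS_PER_ROOT) != (nr & DEMO_SECTION_ROOT_MASK))
            mismatches++;
    }
    return mismatches;
}
```

Calling demo_count_mismatches() over any range of section numbers should return 0 here. If the struct size were not a power of 2, the _Static_assert would fail the build first, which is the same role BUILD_BUG_ON() plays in sparse_init().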