From: Anshuman Khandual
Date: Tue, 6 May 2025 12:23:07 +0530
Subject: Re: [PATCH v3] mm: Add CONFIG_PAGE_BLOCK_ORDER to select page block order
To: Juan Yescas, Andrew Morton, Zi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Cc: tjmercier@google.com, isaacmanjarres@google.com, surenb@google.com,
 kaleshsingh@google.com, Vlastimil Babka, "Liam R. Howlett",
 Lorenzo Stoakes, David Hildenbrand, Mike Rapoport, Minchan Kim
Message-ID: <64a0c678-ead9-4620-b69b-e631d6e540f9@arm.com>
In-Reply-To: <20250506002319.513795-1-jyescas@google.com>
References: <20250506002319.513795-1-jyescas@google.com>

On 5/6/25 05:52, Juan Yescas wrote:
> Problem: On large page size configurations (16KiB, 64KiB), the CMA
> alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> and this causes the CMA reservations to be larger than necessary.
> This means that system will have less available MIGRATE_UNMOVABLE and
> MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
>
> The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
>
> For example, in ARM, the CMA alignment requirement when:
>
> - CONFIG_ARCH_FORCE_MAX_ORDER default value is used
> - CONFIG_TRANSPARENT_HUGEPAGE is set:
>
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> -----------------------------------------------------------------------
>    4KiB   |       10       |       10        |  4KiB * (2 ^ 10) =   4MiB
>   16KiB   |       11       |       11        | 16KiB * (2 ^ 11) =  32MiB
>   64KiB   |       13       |       13        | 64KiB * (2 ^ 13) = 512MiB
>
> There are some extreme cases for the CMA alignment requirement when:
>
> - CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
> - CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
> - CONFIG_HUGETLB_PAGE is NOT set
>
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> ------------------------------------------------------------------------
>    4KiB   |       15       |       15        |  4KiB * (2 ^ 15) = 128MiB
>   16KiB   |       13       |       13        | 16KiB * (2 ^ 13) = 128MiB
>   64KiB   |       13       |       13        | 64KiB * (2 ^ 13) = 512MiB
>
> This affects the CMA reservations for the drivers. If a driver in a
> 4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
> reservation has to be 32MiB due to the alignment requirements:
>
> reserved-memory {
>     ...
>     cma_test_reserve: cma_test_reserve {
>         compatible = "shared-dma-pool";
>         size = <0x0 0x400000>; /* 4 MiB */
>         ...
>     };
> };
>
> reserved-memory {
>     ...
>     cma_test_reserve: cma_test_reserve {
>         compatible = "shared-dma-pool";
>         size = <0x0 0x2000000>; /* 32 MiB */
>         ...
>     };
> };

This is indeed a valid problem: it reduces the memory available to the
non-CMA page blocks that the system needs for general memory usage.

>
> Solution: Add a new config CONFIG_PAGE_BLOCK_ORDER that
> allows to set the page block order in all the architectures.
> The maximum page block order will be given by
> ARCH_FORCE_MAX_ORDER.
>
> By default, CONFIG_PAGE_BLOCK_ORDER will have the same
> value that ARCH_FORCE_MAX_ORDER. This will make sure that
> current kernel configurations won't be affected by this
> change. It is a opt-in change.

Right.

>
> This patch will allow to have the same CMA alignment
> requirements for large page sizes (16KiB, 64KiB) as that
> in 4kb kernels by setting a lower pageblock_order.
>
> Tests:
>
> - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
>   on 4k and 16k kernels.
>
> - Verified that Transparent Huge Pages work when pageblock_order
>   is 1, 7, 10 on 4k and 16k kernels.
>
> - Verified that dma-buf heaps allocations work when pageblock_order
>   is 1, 7, 10 on 4k and 16k kernels.

The pageblock_order values 1, 7 and 10 are chosen to cover the entire
possible range for ARCH_FORCE_MAX_ORDER. However, kernel CI should test
all values in the range, because this change opens up different ranges
on different platforms that were never tested before.

>
> Benchmarks:
>
> The benchmarks compare 16kb kernels with pageblock_order 10 and 7. The
> reason for the pageblock_order 7 is because this value makes the min
> CMA alignment requirement the same as that in 4kb kernels (2MB).
>
> - Perform 100K dma-buf heaps (/dev/dma_heap/system) allocations of
>   SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
>   (https://developer.android.com/ndk/guides/simpleperf) to measure
>   the # of instructions and page-faults on 16k kernels.
>   The benchmark was executed 10 times.
>   The averages are below:
>
>           # instructions           |    #page-faults
>     order 10     |    order 7      | order 10 | order 7
> --------------------------------------------------------
>  13,891,765,770  | 11,425,777,314  |   220    |   217
>  14,456,293,487  | 12,660,819,302  |   224    |   219
>  13,924,261,018  | 13,243,970,736  |   217    |   221
>  13,910,886,504  | 13,845,519,630  |   217    |   221
>  14,388,071,190  | 13,498,583,098  |   223    |   224
>  13,656,442,167  | 12,915,831,681  |   216    |   218
>  13,300,268,343  | 12,930,484,776  |   222    |   218
>  13,625,470,223  | 14,234,092,777  |   219    |   218
>  13,508,964,965  | 13,432,689,094  |   225    |   219
>  13,368,950,667  | 13,683,587,37   |   219    |   225
> -------------------------------------------------------------------
>  13,803,137,433  | 13,131,974,268  |   220    |   220    Averages
>
> There were 4.85% #instructions when order was 7, in comparison
> with order 10.
>
>     13,803,137,433 - 13,131,974,268 = -671,163,166 (-4.86%)
>
> The number of page faults in order 7 and 10 were the same.
>
> These results didn't show any significant regression when the
> pageblock_order is set to 7 on 16kb kernels.
>
> - Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
>   on the 16k kernels with pageblock_order 7 and 10.
>
> order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
> -------------------------------------------------------------------
>   15.8   |  16.4   |        0.6         |         3.80%
>   16.4   |  16.2   |       -0.2         |        -1.22%
>   16.6   |  16.3   |       -0.3         |        -1.81%
>   16.8   |  16.3   |       -0.5         |        -2.98%
>   16.6   |  16.8   |        0.2         |         1.20%
> -------------------------------------------------------------------
>   16.44     16.4           -0.04                 -0.24%   Averages
>
> The results didn't show any significant regression when the
> pageblock_order is set to 7 on 16kb kernels.
>
> Cc: Andrew Morton
> Cc: Vlastimil Babka
> Cc: Liam R. Howlett
> Cc: Lorenzo Stoakes
> Cc: David Hildenbrand
> CC: Mike Rapoport
> Cc: Zi Yan
> Cc: Suren Baghdasaryan
> Cc: Minchan Kim
> Signed-off-by: Juan Yescas
> Acked-by: Zi Yan
> ---
> Changes in v3:
>   - Rename ARCH_FORCE_PAGE_BLOCK_ORDER to PAGE_BLOCK_ORDER
>     as per Matthew's suggestion.
>   - Update comments in pageblock-flags.h for pageblock_order
>     value when THP or HugeTLB are not used.
>
> Changes in v2:
>   - Add Zi's Acked-by tag.
>   - Move ARCH_FORCE_PAGE_BLOCK_ORDER config to mm/Kconfig as
>     per Zi and Matthew suggestion so it is available to
>     all the architectures.
>   - Set ARCH_FORCE_PAGE_BLOCK_ORDER to 10 by default when
>     ARCH_FORCE_MAX_ORDER is not available.
>
>  include/linux/pageblock-flags.h | 14 ++++++++++----
>  mm/Kconfig                      | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> index fc6b9c87cb0a..0c4963339f0b 100644
> --- a/include/linux/pageblock-flags.h
> +++ b/include/linux/pageblock-flags.h
> @@ -28,6 +28,12 @@ enum pageblock_bits {
>      NR_PAGEBLOCK_BITS
>  };
>
> +#if defined(CONFIG_PAGE_BLOCK_ORDER)
> +#define PAGE_BLOCK_ORDER CONFIG_PAGE_BLOCK_ORDER
> +#else
> +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> +#endif /* CONFIG_PAGE_BLOCK_ORDER */
> +
>  #if defined(CONFIG_HUGETLB_PAGE)
>
>  #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -41,18 +47,18 @@ extern unsigned int pageblock_order;
>   * Huge pages are a constant size, but don't exceed the maximum allocation
>   * granularity.
>   */
> -#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
>
>  #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
>  #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
>
> -#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
>
>  #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
> -#define pageblock_order MAX_PAGE_ORDER
> +/* If huge pages are not used, group by PAGE_BLOCK_ORDER */
> +#define pageblock_order PAGE_BLOCK_ORDER
>
>  #endif /* CONFIG_HUGETLB_PAGE */
>

These all look good.

> diff --git a/mm/Kconfig b/mm/Kconfig
> index e113f713b493..c52be3489aa3 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -989,6 +989,37 @@ config CMA_AREAS
>
>        If unsure, leave the default value "8" in UMA and "20" in NUMA.
>
> +#
> +# Select this config option from the architecture Kconfig, if available, to set
> +# the max page order for physically contiguous allocations.
> +#
> +config ARCH_FORCE_MAX_ORDER
> +    int

ARCH_FORCE_MAX_ORDER needs to be defined here first, before
PAGE_BLOCK_ORDER can use it. But ARCH_FORCE_MAX_ORDER is already defined
by various architectures, either as a bare 'int' or as an 'int' with a
prompt string ('int ""'). Could it be a problem for this config to be
defined in both generic and platform code? ARCH_FORCE_MAX_ORDER clearly
still remains an arch-specific config.
#git grep "config ARCH_FORCE_MAX_ORDER"
arch/arc/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/arm/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/arm64/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/loongarch/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/m68k/Kconfig.cpu:config ARCH_FORCE_MAX_ORDER
arch/mips/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/nios2/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/powerpc/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/sh/mm/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/sparc/Kconfig:config ARCH_FORCE_MAX_ORDER
arch/xtensa/Kconfig:config ARCH_FORCE_MAX_ORDER
mm/Kconfig:config ARCH_FORCE_MAX_ORDER

arch/arc/

config ARCH_FORCE_MAX_ORDER
        int "Maximum zone order"

arch/arm/

config ARCH_FORCE_MAX_ORDER
        int "Order of maximal physically contiguous allocations"

arch/arm64/

config ARCH_FORCE_MAX_ORDER
        int
...........

arch/sparc/

config ARCH_FORCE_MAX_ORDER
        int "Order of maximal physically contiguous allocations"

> +
> +# When ARCH_FORCE_MAX_ORDER is not defined, the default page block order is 10,

Just wondering, why is the default 10?

> +# as per include/linux/mmzone.h.
> +config PAGE_BLOCK_ORDER
> +    int "Page Block Order"
> +    range 1 10 if !ARCH_FORCE_MAX_ORDER

Also, why is the range restricted to 10?

> +    default 10 if !ARCH_FORCE_MAX_ORDER
> +    range 1 ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER
> +    default ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER

We still have MAX_PAGE_ORDER, which maps to ARCH_FORCE_MAX_ORDER when
that is available and otherwise falls back to 10.

/* Free memory management - zoned buddy allocator. */
#ifndef CONFIG_ARCH_FORCE_MAX_ORDER
#define MAX_PAGE_ORDER 10
#else
#define MAX_PAGE_ORDER CONFIG_ARCH_FORCE_MAX_ORDER
#endif

Hence, could the PAGE_BLOCK_ORDER config description block be simplified
as follows?

config PAGE_BLOCK_ORDER
        int "Page Block Order"
        range 1 MAX_PAGE_ORDER
        default MAX_PAGE_ORDER

MAX_PAGE_ORDER would then switch between ARCH_FORCE_MAX_ORDER and 10 as
and when required.
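
To make the intended semantics concrete, here is a minimal user-space
sketch of the fallback and range check described above. It is only an
illustration, not kernel code: the function names are hypothetical, and a
negative argument is this sketch's own way of modelling
"CONFIG_ARCH_FORCE_MAX_ORDER is not defined".

```c
#include <assert.h>
#include <stdbool.h>

/* Fallback used when ARCH_FORCE_MAX_ORDER is absent (10, as in mmzone.h). */
#define DEFAULT_MAX_PAGE_ORDER 10

/*
 * Hypothetical helper: model MAX_PAGE_ORDER. A negative value stands for
 * "CONFIG_ARCH_FORCE_MAX_ORDER is not defined" (assumption of this sketch).
 */
static int resolve_max_page_order(int arch_force_max_order)
{
    if (arch_force_max_order < 0)
        return DEFAULT_MAX_PAGE_ORDER;
    return arch_force_max_order;
}

/* Model of "range 1 MAX_PAGE_ORDER" for a requested page block order. */
static bool page_block_order_in_range(int order, int arch_force_max_order)
{
    return order >= 1 &&
           order <= resolve_max_page_order(arch_force_max_order);
}

/* Model of "default MAX_PAGE_ORDER". */
static int default_page_block_order(int arch_force_max_order)
{
    return resolve_max_page_order(arch_force_max_order);
}

/* Usage example: the two cases discussed in the review. */
static void check_examples(void)
{
    /* No arch override: range is 1..10 and the default is 10. */
    assert(default_page_block_order(-1) == 10);
    assert(page_block_order_in_range(7, -1));
    assert(!page_block_order_in_range(11, -1));

    /* Arch forces 13: range is 1..13 and the default is 13. */
    assert(default_page_block_order(13) == 13);
    assert(page_block_order_in_range(11, 13));
}
```

With a single resolved maximum, the two conditional range/default pairs
collapse into one pair, which is the simplification being asked about.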
> +
> +    help
> +      The page block order refers to the power of two number of pages that
> +      are physically contiguous and can have a migrate type associated to
> +      them. The maximum size of the page block order is limited by
> +      ARCH_FORCE_MAX_ORDER.

s/ARCH_FORCE_MAX_ORDER/ARCH_FORCE_MAX_ORDER when available on the platform/ ?

Also mention the maximum of the range when ARCH_FORCE_MAX_ORDER is not
available.

> +
> +      This option allows overriding the default setting when the page
> +      block order requires to be smaller than ARCH_FORCE_MAX_ORDER.
> +
> +      Reducing pageblock order can negatively impact THP generation
> +      successful rate. If your workloads uses THP heavily, please use this
> +      option with caution.

Just wondering, could there be any other side effects besides THP? Would
it be better to depend on CONFIG_EXPERT when selecting anything other
than the default option (i.e. ARCH_FORCE_MAX_ORDER or 10) from the value
range?

> +
> +      Don't change if unsure.
> +
>  config MEM_SOFT_DIRTY
>      bool "Track memory changes"
>      depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
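
For reference, the alignment arithmetic from the problem statement and
the interaction with the huge page order can be sketched in plain C. This
is a user-space illustration under stated assumptions, not kernel code:
the helper names and the SZ_* constants are defined locally for the
sketch, the alignment formula is the one used in the quoted tables
(PAGE_SIZE * 2^pageblock_order), and the value 11 for the huge page order
on a 16KiB kernel is taken as an example that matches the quoted table.

```c
#include <assert.h>
#include <stdint.h>

/* Local size constants for the sketch (not pulled from kernel headers). */
#define SZ_4K   (UINT64_C(4) << 10)
#define SZ_16K  (UINT64_C(16) << 10)
#define SZ_64K  (UINT64_C(64) << 10)
#define SZ_1M   (UINT64_C(1) << 20)

/*
 * Minimum CMA alignment in bytes, following the formula in the quoted
 * tables: PAGE_SIZE * 2^pageblock_order.
 */
static uint64_t cma_min_alignment_bytes(uint64_t page_size,
                                        unsigned int pageblock_order)
{
    assert(pageblock_order < 32); /* keep the shift well within 64 bits */
    return page_size << pageblock_order;
}

/*
 * Model of MIN_T(unsigned int, <huge page order>, PAGE_BLOCK_ORDER) from
 * the patch: the effective pageblock_order is the smaller of the two.
 */
static unsigned int effective_pageblock_order(unsigned int huge_order,
                                              unsigned int page_block_order)
{
    return huge_order < page_block_order ? huge_order : page_block_order;
}

/* Usage example reproducing the default-configuration table rows. */
static void check_examples(void)
{
    assert(cma_min_alignment_bytes(SZ_4K, 10) == 4 * SZ_1M);
    assert(cma_min_alignment_bytes(SZ_16K, 11) == 32 * SZ_1M);
    assert(cma_min_alignment_bytes(SZ_64K, 13) == 512 * SZ_1M);

    /* 16KiB kernel with PAGE_BLOCK_ORDER lowered to 7: 2MiB alignment. */
    unsigned int order = effective_pageblock_order(11, 7);
    assert(order == 7);
    assert(cma_min_alignment_bytes(SZ_16K, order) == 2 * SZ_1M);
}
```

In the last case the pageblock (order 7) is smaller than the example huge
page order (11), so one huge page would span 16 pageblocks. That is the
situation the THP caution in the help text refers to.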