* [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
@ 2025-05-01 5:25 Juan Yescas
2025-05-01 14:24 ` Zi Yan
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Juan Yescas @ 2025-05-01 5:25 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, Andrew Morton, Juan Yescas,
linux-arm-kernel, linux-kernel, linux-mm
Cc: tjmercier, isaacmanjarres, surenb, kaleshsingh, Vlastimil Babka,
Liam R. Howlett, Lorenzo Stoakes, David Hildenbrand,
Mike Rapoport, Zi Yan, Minchan Kim
Problem: On large page size configurations (16KiB, 64KiB), the CMA
alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
and this causes the CMA reservations to be larger than necessary.
This means the system will have fewer available MIGRATE_UNMOVABLE and
MIGRATE_RECLAIMABLE page blocks, since MIGRATE_CMA can't fall back to them.
The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
MAX_PAGE_ORDER, which depends on ARCH_FORCE_MAX_ORDER. The value of
ARCH_FORCE_MAX_ORDER increases on 16KiB and 64KiB kernels.
For example, the CMA alignment requirement when:
- CONFIG_ARCH_FORCE_MAX_ORDER default value is used
- CONFIG_TRANSPARENT_HUGEPAGE is set:
PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
-----------------------------------------------------------------------
4KiB | 10 | 10 | 4KiB * (2 ^ 10) = 4MiB
16KiB | 11 | 11 | 16KiB * (2 ^ 11) = 32MiB
64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
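The relationship in the table can be sketched as follows (a simplified
model, not kernel code: CMA_MIN_ALIGNMENT_BYTES = PAGE_SIZE *
pageblock_nr_pages, where pageblock_nr_pages = 1 << pageblock_order):

```python
# Simplified sketch of the CMA alignment math (not kernel code):
# CMA_MIN_ALIGNMENT_BYTES = PAGE_SIZE * pageblock_nr_pages,
# where pageblock_nr_pages = 1 << pageblock_order.
def cma_min_alignment_bytes(page_size: int, pageblock_order: int) -> int:
    return page_size << pageblock_order

KiB = 1024
MiB = 1024 * KiB

# Values from the table above:
assert cma_min_alignment_bytes(4 * KiB, 10) == 4 * MiB
assert cma_min_alignment_bytes(16 * KiB, 11) == 32 * MiB
assert cma_min_alignment_bytes(64 * KiB, 13) == 512 * MiB
```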
There are some extreme cases for the CMA alignment requirement when:
- CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
- CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
- CONFIG_HUGETLB_PAGE is NOT set
PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
------------------------------------------------------------------------
4KiB | 15 | 15 | 4KiB * (2 ^ 15) = 128MiB
16KiB | 13 | 13 | 16KiB * (2 ^ 13) = 128MiB
64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
This affects the CMA reservations for the drivers. If a driver in a
4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
reservation has to be 32MiB due to the alignment requirements:
reserved-memory {
...
cma_test_reserve: cma_test_reserve {
compatible = "shared-dma-pool";
size = <0x0 0x400000>; /* 4 MiB */
...
};
};
reserved-memory {
...
cma_test_reserve: cma_test_reserve {
compatible = "shared-dma-pool";
size = <0x0 0x2000000>; /* 32 MiB */
...
};
};
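The 4 MiB to 32 MiB blow-up above is just rounding the requested size up
to the CMA minimum alignment; a minimal sketch (hypothetical helper, not
a kernel API):

```python
def min_cma_reservation(requested_bytes: int, cma_min_alignment: int) -> int:
    """Round a requested CMA size up to a multiple of the minimum
    alignment (ceiling division)."""
    blocks = -(-requested_bytes // cma_min_alignment)
    return blocks * cma_min_alignment

MiB = 1 << 20
# A 4 MiB request fits exactly on a 4KiB kernel (4 MiB alignment) ...
assert min_cma_reservation(4 * MiB, 4 * MiB) == 4 * MiB
# ... but must grow to 32 MiB on a 16KiB kernel (32 MiB alignment).
assert min_cma_reservation(4 * MiB, 32 * MiB) == 32 * MiB
```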
Solution: Add a new config option, ARCH_FORCE_PAGE_BLOCK_ORDER, that
allows setting the page block order. The maximum page block
order is given by ARCH_FORCE_MAX_ORDER.
By default, ARCH_FORCE_PAGE_BLOCK_ORDER has the same value
as ARCH_FORCE_MAX_ORDER. This makes sure that current kernel
configurations won't be affected by this change. It is an
opt-in change.
This patch allows large page size kernels (16KiB, 64KiB) to have
the same CMA alignment requirements as 4KiB kernels by setting
a lower pageblock_order.
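The resulting pageblock_order selection can be sketched like this
(mirroring the preprocessor logic in the diff below; the example values
are assumptions for a 16KiB arm64 kernel with THP, where HPAGE_PMD_ORDER
is 11):

```python
def effective_pageblock_order(page_block_order, hpage_pmd_order=None):
    """Sketch of the patch's selection: with THP enabled, pageblock_order
    is min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER); otherwise it is
    PAGE_BLOCK_ORDER (which defaults to MAX_PAGE_ORDER)."""
    if hpage_pmd_order is not None:  # CONFIG_TRANSPARENT_HUGEPAGE
        return min(hpage_pmd_order, page_block_order)
    return page_block_order

# 16KiB kernel with THP: forcing PAGE_BLOCK_ORDER=7 lowers
# pageblock_order from 11 to 7.
assert effective_pageblock_order(7, hpage_pmd_order=11) == 7
assert effective_pageblock_order(11, hpage_pmd_order=11) == 11
```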
Tests:
- Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
on 4k and 16k kernels.
- Verified that Transparent Huge Pages work when pageblock_order
is 1, 7, 10 on 4k and 16k kernels.
- Verified that dma-buf heap allocations work when pageblock_order
is 1, 7, 10 on 4k and 16k kernels.
Benchmarks:
The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
pageblock_order 7 was chosen because it makes the minimum CMA
alignment requirement the same as in 4KiB kernels (2MiB).
- Perform 100K dma-buf heap (/dev/dma_heap/system) allocations of
SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
(https://developer.android.com/ndk/guides/simpleperf) to measure
the # of instructions and page-faults on 16k kernels.
The benchmark was executed 10 times. The averages are below:
# instructions | #page-faults
order 10 | order 7 | order 10 | order 7
--------------------------------------------------------
13,891,765,770 | 11,425,777,314 | 220 | 217
14,456,293,487 | 12,660,819,302 | 224 | 219
13,924,261,018 | 13,243,970,736 | 217 | 221
13,910,886,504 | 13,845,519,630 | 217 | 221
14,388,071,190 | 13,498,583,098 | 223 | 224
13,656,442,167 | 12,915,831,681 | 216 | 218
13,300,268,343 | 12,930,484,776 | 222 | 218
13,625,470,223 | 14,234,092,777 | 219 | 218
13,508,964,965 | 13,432,689,094 | 225 | 219
13,368,950,667 | 13,683,587,37 | 219 | 225
-------------------------------------------------------------------
13,803,137,433 | 13,131,974,268 | 220 | 220 Averages
There were 4.86% fewer instructions executed when the order was 7,
in comparison with order 10:
13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
The average number of page faults with order 7 and order 10 was the same.
These results didn't show any significant regression when
pageblock_order is set to 7 on 16KiB kernels.
- Run Speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
on 16KiB kernels with pageblock_order 7 and 10.
order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
-------------------------------------------------------------------
15.8 | 16.4 | 0.6 | 3.80%
16.4 | 16.2 | -0.2 | -1.22%
16.6 | 16.3 | -0.3 | -1.81%
16.8 | 16.3 | -0.5 | -2.98%
16.6 | 16.8 | 0.2 | 1.20%
-------------------------------------------------------------------
16.44 | 16.4 | -0.04 | -0.24% Averages
The results didn't show any significant regression when
pageblock_order is set to 7 on 16KiB kernels.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
CC: Mike Rapoport <rppt@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Juan Yescas <jyescas@google.com>
---
arch/arm64/Kconfig | 14 ++++++++++++++
include/linux/pageblock-flags.h | 12 +++++++++---
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a182295e6f08..d784049e1e01 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1658,6 +1658,20 @@ config ARCH_FORCE_MAX_ORDER
Don't change if unsure.
+config ARCH_FORCE_PAGE_BLOCK_ORDER
+ int "Page Block Order"
+ range 1 ARCH_FORCE_MAX_ORDER
+ default ARCH_FORCE_MAX_ORDER
+ help
+ The page block order refers to the power of two number of pages that
+ are physically contiguous and can have a migrate type associated with them.
+ The maximum size of the page block order is limited by ARCH_FORCE_MAX_ORDER.
+
+ This option allows overriding the default setting when the page
+ block order needs to be smaller than ARCH_FORCE_MAX_ORDER.
+
+ Don't change if unsure.
+
config UNMAP_KERNEL_AT_EL0
bool "Unmap kernel when running in userspace (KPTI)" if EXPERT
default y
diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
index fc6b9c87cb0a..ab3de96bb50c 100644
--- a/include/linux/pageblock-flags.h
+++ b/include/linux/pageblock-flags.h
@@ -28,6 +28,12 @@ enum pageblock_bits {
NR_PAGEBLOCK_BITS
};
+#if defined(CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER)
+#define PAGE_BLOCK_ORDER CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER
+#else
+#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
+#endif /* CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER */
+
#if defined(CONFIG_HUGETLB_PAGE)
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
@@ -41,18 +47,18 @@ extern unsigned int pageblock_order;
* Huge pages are a constant size, but don't exceed the maximum allocation
* granularity.
*/
-#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
+#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
#elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
-#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
+#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
#else /* CONFIG_TRANSPARENT_HUGEPAGE */
/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
-#define pageblock_order MAX_PAGE_ORDER
+#define pageblock_order PAGE_BLOCK_ORDER
#endif /* CONFIG_HUGETLB_PAGE */
--
2.49.0.906.g1f30a19c02-goog
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 5:25 [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order Juan Yescas
@ 2025-05-01 14:24 ` Zi Yan
2025-05-01 17:11 ` Juan Yescas
2025-05-01 18:38 ` Matthew Wilcox
2025-05-01 18:49 ` Zi Yan
2 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2025-05-01 14:24 UTC (permalink / raw)
To: Juan Yescas
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Minchan Kim
On 1 May 2025, at 1:25, Juan Yescas wrote:
> Problem: On large page size configurations (16KiB, 64KiB), the CMA
> alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> and this causes the CMA reservations to be larger than necessary.
> This means that system will have less available MIGRATE_UNMOVABLE and
> MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
>
> [...]
> Solution: Add a new config ARCH_FORCE_PAGE_BLOCK_ORDER that
> allows to set the page block order. The maximum page block
> order will be given by ARCH_FORCE_MAX_ORDER.
Why not use a boot time parameter to change page block order?
Otherwise, you will need to maintain an additional kernel
binary for your use case.
--
Best Regards,
Yan, Zi
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 14:24 ` Zi Yan
@ 2025-05-01 17:11 ` Juan Yescas
2025-05-01 18:21 ` Kalesh Singh
0 siblings, 1 reply; 12+ messages in thread
From: Juan Yescas @ 2025-05-01 17:11 UTC (permalink / raw)
To: Zi Yan
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Minchan Kim
On Thu, May 1, 2025 at 7:24 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 1 May 2025, at 1:25, Juan Yescas wrote:
>
> > [...]
>
> Why not use a boot time parameter to change page block order?
That is a good option. The main tradeoff is:
- The bootloader would have to be updated on the devices to pass the right
pageblock_order value depending on the kernel page size. Currently,
we can boot 4k/16k kernels without any change in the bootloader.
> Otherwise, you will need to maintain an additional kernel
> binary for your use case.
>
Unfortunately, we still need 2 kernel binaries, one for 4k and another for 16k.
There are several data structures that are aligned at compile time based on
PAGE_SIZE (__aligned(PAGE_SIZE)), which makes it difficult to have only one
binary.
For example:
static u8 idmap_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE)
__ro_after_init,
kpti_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init;
https://elixir.bootlin.com/linux/v6.14.4/source/arch/arm64/mm/mmu.c#L780
Thanks
Juan
> --
> Best Regards,
> Yan, Zi
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 17:11 ` Juan Yescas
@ 2025-05-01 18:21 ` Kalesh Singh
2025-05-01 18:40 ` Zi Yan
0 siblings, 1 reply; 12+ messages in thread
From: Kalesh Singh @ 2025-05-01 18:21 UTC (permalink / raw)
To: Juan Yescas
Cc: Zi Yan, Catalin Marinas, Will Deacon, Andrew Morton,
linux-arm-kernel, linux-kernel, linux-mm, tjmercier,
isaacmanjarres, surenb, Vlastimil Babka, Liam R. Howlett,
Lorenzo Stoakes, David Hildenbrand, Mike Rapoport, Minchan Kim
On Thu, May 1, 2025 at 10:11 AM Juan Yescas <jyescas@google.com> wrote:
>
> On Thu, May 1, 2025 at 7:24 AM Zi Yan <ziy@nvidia.com> wrote:
> >
> > On 1 May 2025, at 1:25, Juan Yescas wrote:
> >
> > > [...]
> >
> > Why not use a boot time parameter to change page block order?
>
> That is a good option. The main tradeoff is:
>
> - The bootloader would have to be updated on the devices to pass the right
> pageblock_order value depending on the kernel page size. Currently,
> We can boot 4k/16k kernels without any change in the bootloader.
Once we change the page block order, we likely need to update the CMA
reservations in the device tree to match the new minimum alignment, and
the device tree needs to be recompiled and flashed to the device. So
making the page block order a boot parameter likely doesn't save a
significant amount of work.
-- Kalesh
>
> > Otherwise, you will need to maintain an additional kernel
> > binary for your use case.
> >
>
> Unfortunately, we still need 2 kernel binaries, one for 4k and another for 16k.
> There are several data structures that are aligned at compile time based on the
> PAGE_SIZE (__aligned(PAGE_SIZE)) that makes it difficult to have only one
> binary.
>
> For example:
>
> static u8 idmap_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE)
> __ro_after_init,
> kpti_ptes[IDMAP_LEVELS - 1][PAGE_SIZE] __aligned(PAGE_SIZE) __ro_after_init;
>
> https://elixir.bootlin.com/linux/v6.14.4/source/arch/arm64/mm/mmu.c#L780
>
> Thanks
> Juan
>
> > --
> > Best Regards,
> > Yan, Zi
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 5:25 [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order Juan Yescas
2025-05-01 14:24 ` Zi Yan
@ 2025-05-01 18:38 ` Matthew Wilcox
2025-05-01 19:27 ` Juan Yescas
` (2 more replies)
2025-05-01 18:49 ` Zi Yan
2 siblings, 3 replies; 12+ messages in thread
From: Matthew Wilcox @ 2025-05-01 18:38 UTC (permalink / raw)
To: Juan Yescas
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Zi Yan, Minchan Kim
On Wed, Apr 30, 2025 at 10:25:11PM -0700, Juan Yescas wrote:
> Problem: On large page size configurations (16KiB, 64KiB), the CMA
> alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> and this causes the CMA reservations to be larger than necessary.
> This means that system will have less available MIGRATE_UNMOVABLE and
> MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
>
> The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
Sure, but why would any architecture *NOT* want to set this?
This seems like you're making each architecture bump into the problem
by itself, when the real problem is that the CMA people never thought
about this and should have come up with better defaults.
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 18:21 ` Kalesh Singh
@ 2025-05-01 18:40 ` Zi Yan
0 siblings, 0 replies; 12+ messages in thread
From: Zi Yan @ 2025-05-01 18:40 UTC (permalink / raw)
To: Kalesh Singh
Cc: Juan Yescas, Catalin Marinas, Will Deacon, Andrew Morton,
linux-arm-kernel, linux-kernel, linux-mm, tjmercier,
isaacmanjarres, surenb, Vlastimil Babka, Liam R. Howlett,
Lorenzo Stoakes, David Hildenbrand, Mike Rapoport, Minchan Kim
On 1 May 2025, at 14:21, Kalesh Singh wrote:
> On Thu, May 1, 2025 at 10:11 AM Juan Yescas <jyescas@google.com> wrote:
>>
>> On Thu, May 1, 2025 at 7:24 AM Zi Yan <ziy@nvidia.com> wrote:
>>>
>>> On 1 May 2025, at 1:25, Juan Yescas wrote:
>>>
>>>> [...]
>>>
>>> Why not use a boot time parameter to change page block order?
>>
>> That is a good option. The main tradeoff is:
>>
>> - The bootloader would have to be updated on the devices to pass the right
>> pageblock_order value depending on the kernel page size. Currently,
>> We can boot 4k/16k kernels without any change in the bootloader.
>
> Once we change the page block order we likely need to update the CMA
> reservations in the device tree to match the new min alignment, which
> needs to be recompiled and flashed to the device. So there is likely
> not a significant process saving by making the page block order a boot
> parameter.
Got it. Thank you for the explanation.
--
Best Regards,
Yan, Zi
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 5:25 [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order Juan Yescas
2025-05-01 14:24 ` Zi Yan
2025-05-01 18:38 ` Matthew Wilcox
@ 2025-05-01 18:49 ` Zi Yan
2025-05-01 21:17 ` Juan Yescas
2 siblings, 1 reply; 12+ messages in thread
From: Zi Yan @ 2025-05-01 18:49 UTC (permalink / raw)
To: Juan Yescas
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Minchan Kim
On 1 May 2025, at 1:25, Juan Yescas wrote:
> [...]
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a182295e6f08..d784049e1e01 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1658,6 +1658,20 @@ config ARCH_FORCE_MAX_ORDER
>
> Don't change if unsure.
>
> +config ARCH_FORCE_PAGE_BLOCK_ORDER
> + int "Page Block Order"
> + range 1 ARCH_FORCE_MAX_ORDER
> + default ARCH_FORCE_MAX_ORDER
> + help
> + The page block order refers to the power of two number of pages that
> + are physically contiguous and can have a migrate type associated to them.
> + The maximum size of the page block order is limited by ARCH_FORCE_MAX_ORDER.
Since memory compaction operates at pageblock granularity and the pageblock
size usually matches the THP size, a smaller pageblock size significantly
degrades the kernel's anti-fragmentation mechanism for THP. Can you add
something like the text below to the help section?
"Reducing the pageblock order can negatively impact the THP generation success
rate. If your workloads use THP heavily, please use this option with caution."
Otherwise, Acked-by: Zi Yan <ziy@nvidia.com>
I am also OK if you move this to mm/Kconfig.
> +
> + This option allows overriding the default setting when the page
> + block order needs to be smaller than ARCH_FORCE_MAX_ORDER.
> +
> + Don't change if unsure.
> +
> config UNMAP_KERNEL_AT_EL0
> bool "Unmap kernel when running in userspace (KPTI)" if EXPERT
> default y
> diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> index fc6b9c87cb0a..ab3de96bb50c 100644
> --- a/include/linux/pageblock-flags.h
> +++ b/include/linux/pageblock-flags.h
> @@ -28,6 +28,12 @@ enum pageblock_bits {
> NR_PAGEBLOCK_BITS
> };
>
> +#if defined(CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER)
> +#define PAGE_BLOCK_ORDER CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER
> +#else
> +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> +#endif /* CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER */
> +
> #if defined(CONFIG_HUGETLB_PAGE)
>
> #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> @@ -41,18 +47,18 @@ extern unsigned int pageblock_order;
> * Huge pages are a constant size, but don't exceed the maximum allocation
> * granularity.
> */
> -#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
>
> #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>
> #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
>
> -#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
>
> #else /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
> -#define pageblock_order MAX_PAGE_ORDER
> +#define pageblock_order PAGE_BLOCK_ORDER
>
> #endif /* CONFIG_HUGETLB_PAGE */
>
> --
> 2.49.0.906.g1f30a19c02-goog
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 18:38 ` Matthew Wilcox
@ 2025-05-01 19:27 ` Juan Yescas
2025-05-01 21:07 ` Juan Yescas
2025-05-02 11:37 ` Will Deacon
2 siblings, 0 replies; 12+ messages in thread
From: Juan Yescas @ 2025-05-01 19:27 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Zi Yan, Minchan Kim
On Thu, May 1, 2025 at 11:38 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 30, 2025 at 10:25:11PM -0700, Juan Yescas wrote:
> > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > and this causes the CMA reservations to be larger than necessary.
> > This means that system will have less available MIGRATE_UNMOVABLE and
> > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
> >
> > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
>
> Sure, but why would any architecture *NOT* want to set this?
You are right. Not all architectures support large page sizes, so they don't
have the CMA alignment issues that arm64 has with 16KiB and 64KiB page sizes.
This change only affects arm64.
> This seems like you're making each architecture bump into the problem
> by itself, when the real problem is that the CMA people never thought
> about this and should have come up with better defaults.
This change will only affect arm64. By default,
ARCH_FORCE_PAGE_BLOCK_ORDER will have the same value as
ARCH_FORCE_MAX_ORDER (this is the current behaviour). If an arm64
kernel is configured with large page sizes, the user can decide to
change the pageblock_order or leave the default. It is an opt-in
feature for arm64.
Thanks
Juan
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 18:38 ` Matthew Wilcox
2025-05-01 19:27 ` Juan Yescas
@ 2025-05-01 21:07 ` Juan Yescas
2025-05-02 11:37 ` Will Deacon
2 siblings, 0 replies; 12+ messages in thread
From: Juan Yescas @ 2025-05-01 21:07 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Zi Yan, Minchan Kim
On Thu, May 1, 2025 at 11:38 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Apr 30, 2025 at 10:25:11PM -0700, Juan Yescas wrote:
> > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > and this causes the CMA reservations to be larger than necessary.
> > This means that system will have less available MIGRATE_UNMOVABLE and
> > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
> >
> > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
>
> Sure, but why would any architecture *NOT* want to set this?
> This seems like you're making each architecture bump into the problem
> by itself, when the real problem is that the CMA people never thought
> about this and should have come up with better defaults.
Sorry Matthew, about my previous reply. I think I misunderstood you.
Do you mean that we should move the configuration to mm/Kconfig, per Zi's
suggestion, so that other architectures can use it if needed, and have
something like this in mm/Kconfig:
config ARCH_FORCE_PAGE_BLOCK_ORDER
int "Page Block Order"
range 1 ARCH_FORCE_MAX_ORDER
default ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER
default 10 if !ARCH_FORCE_MAX_ORDER
Or should we add ARCH_FORCE_PAGE_BLOCK_ORDER only to the
architectures that configure ARCH_FORCE_MAX_ORDER? In that case,
the affected files would be:
arch/arc/Kconfig
arch/arm/Kconfig
arch/arm64/Kconfig
arch/loongarch/Kconfig
arch/m68k/Kconfig.cpu
arch/mips/Kconfig
arch/nios2/Kconfig
arch/powerpc/Kconfig
arch/sh/mm/Kconfig
arch/sparc/Kconfig
arch/xtensa/Kconfig
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 18:49 ` Zi Yan
@ 2025-05-01 21:17 ` Juan Yescas
0 siblings, 0 replies; 12+ messages in thread
From: Juan Yescas @ 2025-05-01 21:17 UTC (permalink / raw)
To: Zi Yan
Cc: Catalin Marinas, Will Deacon, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Minchan Kim
On Thu, May 1, 2025 at 11:49 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 1 May 2025, at 1:25, Juan Yescas wrote:
>
> > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > and this causes the CMA reservations to be larger than necessary.
> > This means that system will have less available MIGRATE_UNMOVABLE and
> > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
> >
> > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
> >
> > For example, the CMA alignment requirement when:
> >
> > - CONFIG_ARCH_FORCE_MAX_ORDER default value is used
> > - CONFIG_TRANSPARENT_HUGEPAGE is set:
> >
> > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > -----------------------------------------------------------------------
> > 4KiB | 10 | 10 | 4KiB * (2 ^ 10) = 4MiB
> > 16KiB | 11 | 11 | 16KiB * (2 ^ 11) = 32MiB
> > 64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
> >
> > There are some extreme cases for the CMA alignment requirement when:
> >
> > - CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
> > - CONFIG_TRANSPARENT_HUGEPAGE is NOT set:
> > - CONFIG_HUGETLB_PAGE is NOT set
> >
> > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > ------------------------------------------------------------------------
> > 4KiB | 15 | 15 | 4KiB * (2 ^ 15) = 128MiB
> > 16KiB | 13 | 13 | 16KiB * (2 ^ 13) = 128MiB
> > 64KiB | 13 | 13 | 64KiB * (2 ^ 13) = 512MiB
> >
> > This affects the CMA reservations for the drivers. If a driver in a
> > 4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
> > reservation has to be 32MiB due to the alignment requirements:
> >
> > reserved-memory {
> > ...
> > cma_test_reserve: cma_test_reserve {
> > compatible = "shared-dma-pool";
> > size = <0x0 0x400000>; /* 4 MiB */
> > ...
> > };
> > };
> >
> > reserved-memory {
> > ...
> > cma_test_reserve: cma_test_reserve {
> > compatible = "shared-dma-pool";
> > size = <0x0 0x2000000>; /* 32 MiB */
> > ...
> > };
> > };
> >
> > Solution: Add a new config, ARCH_FORCE_PAGE_BLOCK_ORDER, that
> > allows setting the page block order. The maximum page block
> > order is given by ARCH_FORCE_MAX_ORDER.
> >
> > By default, ARCH_FORCE_PAGE_BLOCK_ORDER will have the same
> > value as ARCH_FORCE_MAX_ORDER. This will make sure that
> > current kernel configurations won't be affected by this
> > change. It is an opt-in change.
> >
> > This patch allows large page size kernels (16KiB, 64KiB) to have
> > the same CMA alignment requirements as 4KiB kernels by setting
> > a lower pageblock_order.
> >
> > Tests:
> >
> > - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
> > on 4k and 16k kernels.
> >
> > - Verified that Transparent Huge Pages work when pageblock_order
> > is 1, 7, 10 on 4k and 16k kernels.
> >
> > - Verified that dma-buf heaps allocations work when pageblock_order
> > is 1, 7, 10 on 4k and 16k kernels.
> >
> > Benchmarks:
> >
> > The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
> > pageblock_order 7 was chosen because it makes the minimum CMA
> > alignment requirement the same as in 4KiB kernels (2MiB).
> >
> > - Perform 100K dma-buf heaps (/dev/dma_heap/system) allocations of
> > SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
> > (https://developer.android.com/ndk/guides/simpleperf) to measure
> > the # of instructions and page-faults on 16k kernels.
> > The benchmark was executed 10 times. The averages are below:
> >
> > # instructions | #page-faults
> > order 10 | order 7 | order 10 | order 7
> > --------------------------------------------------------
> > 13,891,765,770 | 11,425,777,314 | 220 | 217
> > 14,456,293,487 | 12,660,819,302 | 224 | 219
> > 13,924,261,018 | 13,243,970,736 | 217 | 221
> > 13,910,886,504 | 13,845,519,630 | 217 | 221
> > 14,388,071,190 | 13,498,583,098 | 223 | 224
> > 13,656,442,167 | 12,915,831,681 | 216 | 218
> > 13,300,268,343 | 12,930,484,776 | 222 | 218
> > 13,625,470,223 | 14,234,092,777 | 219 | 218
> > 13,508,964,965 | 13,432,689,094 | 225 | 219
> > 13,368,950,667 | 13,683,587,37 | 219 | 225
> > -------------------------------------------------------------------
> > 13,803,137,433 | 13,131,974,268 | 220 | 220 Averages
> >
> > There were 4.86% fewer instructions when the order was 7, in
> > comparison with order 10:
> >
> > 13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
> >
> > The number of page faults in order 7 and 10 was the same.
> >
> > These results didn't show any significant regression when the
> > pageblock_order is set to 7 on 16KiB kernels.
> >
> > - Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
> > on the 16k kernels with pageblock_order 7 and 10.
> >
> > order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
> > -------------------------------------------------------------------
> > 15.8 | 16.4 | 0.6 | 3.80%
> > 16.4 | 16.2 | -0.2 | -1.22%
> > 16.6 | 16.3 | -0.3 | -1.81%
> > 16.8 | 16.3 | -0.5 | -2.98%
> > 16.6 | 16.8 | 0.2 | 1.20%
> > -------------------------------------------------------------------
> > 16.44 | 16.4 | -0.04 | -0.24% (Averages)
> >
> > The results didn't show any significant regression when the
> > pageblock_order is set to 7 on 16KiB kernels.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > CC: Mike Rapoport <rppt@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Signed-off-by: Juan Yescas <jyescas@google.com>
> > ---
> > arch/arm64/Kconfig | 14 ++++++++++++++
> > include/linux/pageblock-flags.h | 12 +++++++++---
> > 2 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index a182295e6f08..d784049e1e01 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1658,6 +1658,20 @@ config ARCH_FORCE_MAX_ORDER
> >
> > Don't change if unsure.
> >
> > +config ARCH_FORCE_PAGE_BLOCK_ORDER
> > + int "Page Block Order"
> > + range 1 ARCH_FORCE_MAX_ORDER
> > + default ARCH_FORCE_MAX_ORDER
> > + help
> > + The page block order refers to the power of two number of pages that
> > + are physically contiguous and can have a migrate type associated with them.
> > + The maximum size of the page block order is limited by ARCH_FORCE_MAX_ORDER.
>
> Since memory compaction operates at pageblock granularity and the pageblock
> size usually matches the THP size, a smaller pageblock size significantly
> degrades the kernel's anti-fragmentation mechanism for THP. Can you add
> something like the text below to the help section?
>
> "Reducing the pageblock order can negatively impact the THP generation success
> rate. If your workloads use THP heavily, please use this option with caution."
>
Thanks Zi for pointing this out. I will add the comment to the help section.
> Otherwise, Acked-by: Zi Yan <ziy@nvidia.com>
>
> I am also OK if you move this to mm/Kconfig.
>
This seems reasonable to me.
> > +
> > + This option allows overriding the default setting when the page
> > + block order needs to be smaller than ARCH_FORCE_MAX_ORDER.
> > +
> > + Don't change if unsure.
> > +
> > config UNMAP_KERNEL_AT_EL0
> > bool "Unmap kernel when running in userspace (KPTI)" if EXPERT
> > default y
> > diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> > index fc6b9c87cb0a..ab3de96bb50c 100644
> > --- a/include/linux/pageblock-flags.h
> > +++ b/include/linux/pageblock-flags.h
> > @@ -28,6 +28,12 @@ enum pageblock_bits {
> > NR_PAGEBLOCK_BITS
> > };
> >
> > +#if defined(CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER)
> > +#define PAGE_BLOCK_ORDER CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER
> > +#else
> > +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> > +#endif /* CONFIG_ARCH_FORCE_PAGE_BLOCK_ORDER */
> > +
> > #if defined(CONFIG_HUGETLB_PAGE)
> >
> > #ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
> > @@ -41,18 +47,18 @@ extern unsigned int pageblock_order;
> > * Huge pages are a constant size, but don't exceed the maximum allocation
> > * granularity.
> > */
> > -#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> > +#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
> >
> > #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
> >
> > #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >
> > -#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> > +#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
> >
> > #else /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
> > -#define pageblock_order MAX_PAGE_ORDER
> > +#define pageblock_order PAGE_BLOCK_ORDER
> >
> > #endif /* CONFIG_HUGETLB_PAGE */
> >
> > --
> > 2.49.0.906.g1f30a19c02-goog
>
>
> --
> Best Regards,
> Yan, Zi
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-01 18:38 ` Matthew Wilcox
2025-05-01 19:27 ` Juan Yescas
2025-05-01 21:07 ` Juan Yescas
@ 2025-05-02 11:37 ` Will Deacon
2025-05-05 18:58 ` Juan Yescas
2 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2025-05-02 11:37 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Juan Yescas, Catalin Marinas, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Zi Yan, Minchan Kim
On Thu, May 01, 2025 at 07:38:13PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 30, 2025 at 10:25:11PM -0700, Juan Yescas wrote:
> > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > and this causes the CMA reservations to be larger than necessary.
> > This means that system will have less available MIGRATE_UNMOVABLE and
> > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
> >
> > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
>
> Sure, but why would any architecture *NOT* want to set this?
> This seems like you're making each architecture bump into the problem
> by itself, when the real problem is that the CMA people never thought
> about this and should have come up with better defaults.
Yes, I agree. It would be nice if arm64 wasn't the odd duck here. You'd
think Power and RISC-V would benefit from similar treatment, if nothing
else.
Will
* Re: [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
2025-05-02 11:37 ` Will Deacon
@ 2025-05-05 18:58 ` Juan Yescas
0 siblings, 0 replies; 12+ messages in thread
From: Juan Yescas @ 2025-05-05 18:58 UTC (permalink / raw)
To: Will Deacon
Cc: Matthew Wilcox, Catalin Marinas, Andrew Morton, linux-arm-kernel,
linux-kernel, linux-mm, tjmercier, isaacmanjarres, surenb,
kaleshsingh, Vlastimil Babka, Liam R. Howlett, Lorenzo Stoakes,
David Hildenbrand, Mike Rapoport, Zi Yan, Minchan Kim
On Fri, May 2, 2025 at 4:37 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, May 01, 2025 at 07:38:13PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 30, 2025 at 10:25:11PM -0700, Juan Yescas wrote:
> > > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > > and this causes the CMA reservations to be larger than necessary.
> > > This means that system will have less available MIGRATE_UNMOVABLE and
> > > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to them.
> > >
> > > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> > > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of
> > > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
> >
> > Sure, but why would any architecture *NOT* want to set this?
> > This seems like you're making each architecture bump into the problem
> > by itself, when the real problem is that the CMA people never thought
> > about this and should have come up with better defaults.
>
> Yes, I agree. It would be nice if arm64 wasn't the odd duck here. You'd
> think Power and Risc-V would benefit from similar treatement, if nothing
> else.
>
> Will
Thanks for the comments. I sent the v2 version with the changes:
[PATCH v2] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order
Thanks
Juan
end of thread, other threads: [~2025-05-05 18:58 UTC | newest]
Thread overview: 12+ messages
2025-05-01 5:25 [PATCH] mm: Add ARCH_FORCE_PAGE_BLOCK_ORDER to select page block order Juan Yescas
2025-05-01 14:24 ` Zi Yan
2025-05-01 17:11 ` Juan Yescas
2025-05-01 18:21 ` Kalesh Singh
2025-05-01 18:40 ` Zi Yan
2025-05-01 18:38 ` Matthew Wilcox
2025-05-01 19:27 ` Juan Yescas
2025-05-01 21:07 ` Juan Yescas
2025-05-02 11:37 ` Will Deacon
2025-05-05 18:58 ` Juan Yescas
2025-05-01 18:49 ` Zi Yan
2025-05-01 21:17 ` Juan Yescas