From: David Hildenbrand <david@redhat.com>
To: Juan Yescas <jyescas@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: tjmercier@google.com, isaacmanjarres@google.com,
	kaleshsingh@google.com, masahiroy@kernel.org,
	Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH v6] mm: Add CONFIG_PAGE_BLOCK_ORDER to select page block order
Date: Wed, 21 May 2025 08:47:15 +0200
Message-ID: <28a2881d-fd33-44d9-a212-adeb8600e15b@redhat.com>
In-Reply-To: <20250520225945.991229-1-jyescas@google.com>

On 21.05.25 00:59, Juan Yescas wrote:
> Problem: On large page size configurations (16KiB, 64KiB), the CMA
> alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> and this causes the CMA reservations to be larger than necessary.
> This means that the system will have fewer available MIGRATE_UNMOVABLE and
> MIGRATE_RECLAIMABLE page blocks, since MIGRATE_CMA can't fall back to them.
> 
> The CMA_MIN_ALIGNMENT_BYTES increases because it depends on
> MAX_PAGE_ORDER, which in turn depends on ARCH_FORCE_MAX_ORDER. The
> value of ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels.
> 
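
(For context, IIRC the chain is roughly the following -- quoting from memory,
so please double-check against the tree:

	/* include/linux/cma.h */
	#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
	#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)

	/* include/linux/pageblock-flags.h */
	#define pageblock_nr_pages (1UL << pageblock_order)

so the CMA alignment ends up being PAGE_SIZE << pageblock_order, and
pageblock_order is what gets capped by MAX_PAGE_ORDER.)
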
> For example, in ARM, the CMA alignment requirement when:
> 
> - CONFIG_ARCH_FORCE_MAX_ORDER default value is used
> - CONFIG_TRANSPARENT_HUGEPAGE is set:
> 
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> -----------------------------------------------------------------------
>     4KiB   |      10        |      10         |  4KiB * (2 ^ 10)  =  4MiB

Why is pageblock_order 10 in that case?

	#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)

So it should be 2 MiB (order-9)?
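
(With 4 KiB pages, HPAGE_PMD_ORDER = PMD_SHIFT - PAGE_SHIFT = 21 - 12 = 9, so
pageblock_order = MIN_T(unsigned int, 9, 10) = 9 and the CMA alignment would
be 4 KiB << 9 = 2 MiB -- assuming the usual arm64/x86 values, so correct me if
your config differs.)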

>    16KiB   |      11        |      11         | 16KiB * (2 ^ 11) =  32MiB
>    64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
> 
> There are some extreme cases for the CMA alignment requirement when:
> 
> - CONFIG_ARCH_FORCE_MAX_ORDER is set to its maximum value
> - CONFIG_TRANSPARENT_HUGEPAGE is NOT set
> - CONFIG_HUGETLB_PAGE is NOT set

I think we should just always group at HPAGE_PMD_ORDER in this case as well. But that's
a different thing to sort out :)

> 
> PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order |  CMA_MIN_ALIGNMENT_BYTES
> ------------------------------------------------------------------------
>     4KiB   |      15        |      15         |  4KiB * (2 ^ 15) = 128MiB
>    16KiB   |      13        |      13         | 16KiB * (2 ^ 13) = 128MiB
>    64KiB   |      13        |      13         | 64KiB * (2 ^ 13) = 512MiB
> 
> This affects the CMA reservations for the drivers. If a driver on a
> 4KiB kernel needs 4MiB of CMA memory, on a 16KiB kernel the minimum
> reservation has to be 32MiB due to the alignment requirement:
> 
> reserved-memory {
>      ...
>      cma_test_reserve: cma_test_reserve {
>          compatible = "shared-dma-pool";
>          size = <0x0 0x400000>; /* 4 MiB */
>          ...
>      };
> };
> 
> reserved-memory {
>      ...
>      cma_test_reserve: cma_test_reserve {
>          compatible = "shared-dma-pool";
>          size = <0x0 0x2000000>; /* 32 MiB */
>          ...
>      };
> };
> 
> Solution: Add a new config option, CONFIG_PAGE_BLOCK_ORDER, that
> allows setting the page block order on all architectures.
> The maximum page block order is given by
> ARCH_FORCE_MAX_ORDER.
> 
> By default, CONFIG_PAGE_BLOCK_ORDER has the same value
> as ARCH_FORCE_MAX_ORDER. This makes sure that current
> kernel configurations won't be affected by this change;
> it is an opt-in change.
> 
> This patch allows large page size kernels (16KiB, 64KiB) to
> have the same CMA alignment requirements as 4KiB kernels
> by setting a lower pageblock_order.
> 
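
IIUC, with CONFIG_PAGE_BLOCK_ORDER=7 on a 16 KiB kernel this would give

	CMA_MIN_ALIGNMENT_BYTES = 16 KiB * 2^7 = 2 MiB (instead of 32 MiB),

i.e. the 2 MiB you mention in the benchmark section for 4 KiB kernels --
assuming I'm reading the intent correctly.
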
> Tests:
> 
> - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
> on 4k and 16k kernels.
> 
> - Verified that Transparent Huge Pages work when pageblock_order
> is 1, 7, 10 on 4k and 16k kernels.
> 
> - Verified that dma-buf heaps allocations work when pageblock_order
> is 1, 7, 10 on 4k and 16k kernels.
> 
> Benchmarks:
> 
> The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
> pageblock_order 7 was chosen because it makes the minimum CMA alignment
> requirement (16KiB * 2^7 = 2MiB) the same as on 4KiB kernels.
> 
> - Perform 100K dma-buf heap (/dev/dma_heap/system) allocations of
> SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
> (https://developer.android.com/ndk/guides/simpleperf) to measure
> the # of instructions and page-faults on 16k kernels.
> The benchmark was executed 10 times. The averages are below:
> 
>              # instructions        |     # page-faults
>      order 10    |     order 7     | order 10 | order 7
> --------------------------------------------------------
>   13,891,765,770 |  11,425,777,314 |    220   |   217
>   14,456,293,487 |  12,660,819,302 |    224   |   219
>   13,924,261,018 |  13,243,970,736 |    217   |   221
>   13,910,886,504 |  13,845,519,630 |    217   |   221
>   14,388,071,190 |  13,498,583,098 |    223   |   224
>   13,656,442,167 |  12,915,831,681 |    216   |   218
>   13,300,268,343 |  12,930,484,776 |    222   |   218
>   13,625,470,223 |  14,234,092,777 |    219   |   218
>   13,508,964,965 |  13,432,689,094 |    225   |   219
>   13,368,950,667 |  13,683,587,37  |    219   |   225
> --------------------------------------------------------
>   13,803,137,433 |  13,131,974,268 |    220   |   220    Averages
> 
> There were 4.86% fewer instructions when the order was 7, in comparison
> with order 10:
> 
>       13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
> 
> The number of page faults in order 7 and 10 were the same.
> 
> These results didn't show any significant regression when pageblock_order
> is set to 7 on 16KiB kernels.
> 
> - Run speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
>   on the 16k kernels with pageblock_order 7 and 10.
> 
> order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
> -------------------------------------------------------------------
>   15.8   |  16.4   |         0.6        |     3.80%
>   16.4   |  16.2   |        -0.2        |    -1.22%
>   16.6   |  16.3   |        -0.3        |    -1.81%
>   16.8   |  16.3   |        -0.5        |    -2.98%
>   16.6   |  16.8   |         0.2        |     1.20%
> -------------------------------------------------------------------
>   16.44  |  16.4   |        -0.04       |    -0.24%     Averages
> 
> The results didn't show any significant regression when pageblock_order is
> set to 7 on 16KiB kernels.
> 

Sorry for the late reply. I think using a boot-time option might have saved us
some of the headache. :)

[...]

> +/* Defines the order for the number of pages that have a migrate type. */
> +#ifndef CONFIG_PAGE_BLOCK_ORDER
> +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> +#else
> +#define PAGE_BLOCK_ORDER CONFIG_PAGE_BLOCK_ORDER
> +#endif /* CONFIG_PAGE_BLOCK_ORDER */
> +
> +/*
> + * The MAX_PAGE_ORDER, which defines the max order of pages to be allocated
> + * by the buddy allocator, has to be larger or equal to the PAGE_BLOCK_ORDER,
> + * which defines the order for the number of pages that can have a migrate type
> + */
> +#if (PAGE_BLOCK_ORDER > MAX_PAGE_ORDER)
> +#error MAX_PAGE_ORDER must be >= PAGE_BLOCK_ORDER
> +#endif
> +
>   /*
>    * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
>    * costly to service.  That is between allocation orders which should
> diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> index fc6b9c87cb0a..e73a4292ef02 100644
> --- a/include/linux/pageblock-flags.h
> +++ b/include/linux/pageblock-flags.h
> @@ -41,18 +41,18 @@ extern unsigned int pageblock_order;
>    * Huge pages are a constant size, but don't exceed the maximum allocation
>    * granularity.
>    */

How is CONFIG_HUGETLB_PAGE_SIZE_VARIABLE handled?

> -#define pageblock_order		MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order		MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
>   
>   #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
>   
>   #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
>   
> -#define pageblock_order		MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> +#define pageblock_order		MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)

Wait, why are we using the MIN_T in that case? If someone requests 4 MiB, why would we reduce
it to 2 MiB even though MAX_PAGE_ORDER allows for it?
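
For example (assuming the usual 4 KiB defines, so just a sketch): HPAGE_PMD_ORDER
is 9, so even with PAGE_BLOCK_ORDER == MAX_PAGE_ORDER == 10 the MIN_T caps
pageblock_order at 9 -- 2 MiB pageblocks where 4 MiB would be possible.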


Maybe we really have to clean all that up first :/

-- 
Cheers,

David / dhildenb


