From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>,
linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au,
Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linux-mm@kvack.org
Subject: Re: [PATCH v2] powerpc/mm: Update default hugetlb size early
Date: Fri, 11 Feb 2022 17:53:50 +0530
Message-ID: <87tud5a8x5.fsf@linux.ibm.com>
In-Reply-To: <b77816ef-80fd-40b7-cf6e-6de2a3125eb1@redhat.com>

David Hildenbrand <david@redhat.com> writes:

> On 11.02.22 10:16, Aneesh Kumar K V wrote:
>> On 2/11/22 14:00, David Hildenbrand wrote:
>>> On 11.02.22 07:52, Aneesh Kumar K.V wrote:
>>>> commit: d9c234005227 ("Do not depend on MAX_ORDER when grouping pages by mobility")
>>>> introduced pageblock_order which will be used to group pages better.
>>>> The kernel now groups pages based on the value of HPAGE_SHIFT. Hence HPAGE_SHIFT
>>>> should be set before we call set_pageblock_order.
>>>>
>>>> set_pageblock_order happens early in the boot and default hugetlb page size
>>>> should be initialized before that to compute the right pageblock_order value.
>>>>
>>>> Currently, default hugetlb page size is set via arch_initcalls which happens
>>>> late in the boot as shown via the below callstack:
>>>>
>>>> [c000000007383b10] [c000000001289328] hugetlbpage_init+0x2b8/0x2f8
>>>> [c000000007383bc0] [c0000000012749e4] do_one_initcall+0x14c/0x320
>>>> [c000000007383c90] [c00000000127505c] kernel_init_freeable+0x410/0x4e8
>>>> [c000000007383da0] [c000000000012664] kernel_init+0x30/0x15c
>>>> [c000000007383e10] [c00000000000cf14] ret_from_kernel_thread+0x5c/0x64
>>>>
>>>> and the pageblock_order initialization is done early during the boot.
>>>>
>>>> [c0000000018bfc80] [c0000000012ae120] set_pageblock_order+0x50/0x64
>>>> [c0000000018bfca0] [c0000000012b3d94] sparse_init+0x188/0x268
>>>> [c0000000018bfd60] [c000000001288bfc] initmem_init+0x28c/0x328
>>>> [c0000000018bfe50] [c00000000127b370] setup_arch+0x410/0x480
>>>> [c0000000018bfed0] [c00000000127401c] start_kernel+0xb8/0x934
>>>> [c0000000018bff90] [c00000000000d984] start_here_common+0x1c/0x98
>>>>
>>>> delaying default hugetlb page size initialization implies the kernel will
>>>> initialize pageblock_order to (MAX_ORDER - 1) which is not an optimal
>>>> value for mobility grouping. IIUC we always had this issue. But it was not
>>>> a problem for hash translation mode because (MAX_ORDER - 1) is the same as
>>>> HUGETLB_PAGE_ORDER (8) in the case of hash (16MB). With radix,
>>>> HUGETLB_PAGE_ORDER will be 5 (2M size) and hence pageblock_order should be
>>>> 5 instead of 8.
>>>
>>>
>>> A related question: Can we on ppc still have pageblock_order > MAX_ORDER
>>> - 1? We have some code for that and I am not so sure if we really need that.
>>>
>>
>> I also have been wondering about the same. On book3s64 I don't think we
>> need that support for both 64K and 4K page size because with hash
>> hugetlb size is MAX_ORDER -1. (16MB hugepage size)
>>
>> I am not sure about the 256K page support. Christophe may be able to
>> answer that.
>>
>> For the gigantic hugepage support we depend on cma based allocation or
>> firmware reservation. So I am not sure why we ever considered pageblock
>> > MAX_ORDER -1 scenario. If you have pointers w.r.t why that was ever
>> needed, I could double-check whether ppc64 is still dependent on that.
>
> commit dc78327c0ea7da5186d8cbc1647bd6088c5c9fa5
> Author: Michal Nazarewicz <mina86@mina86.com>
> Date: Wed Jul 2 15:22:35 2014 -0700
>
> mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER
>
> indicates that at least arm64 used to have cases for that as well.
>
> However, nowadays with ARM64_64K_PAGES we have FORCE_MAX_ZONEORDER=14 as
> default, corresponding to 512MiB.
>
> So I'm not sure if this is something worth supporting. If you want
> somewhat reliable gigantic pages, use CMA or preallocate them during boot.
>
> --
> Thanks,
>
> David / dhildenb

I could build a kernel with FORCE_MAX_ZONEORDER=8 and pageblock_order =
8. We need to disable THP for such a kernel to boot, because THP
checks that PMD_ORDER < MAX_ORDER. I was able to boot that kernel on a
virtualized platform, but then runtime allocation of gigantic pages is
not supported (see gigantic_page_runtime_supported) on such a config
with hash translation.
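
As a quick sanity check of the numbers in this thread, here is a small
Python model of the order arithmetic. It is only a sketch, not kernel
code: the function names are made up for illustration, 64K base pages
(PAGE_SHIFT = 16) are assumed, and an HPAGE_SHIFT of 0 is assumed to
stand for "default hugetlb size not initialized yet".

```python
# Simplified model of the order arithmetic discussed in this thread.
# Not kernel code: all names here are illustrative only.

PAGE_SHIFT_64K = 16  # assumption: 64K base pages


def order(size_shift: int, page_shift: int) -> int:
    """Buddy order of a 2**size_shift byte block built from 2**page_shift byte pages."""
    return size_shift - page_shift


def model_pageblock_order(hpage_shift: int, page_shift: int, max_order: int) -> int:
    """Use the hugetlb page order when HPAGE_SHIFT is known, else fall back to MAX_ORDER - 1."""
    if hpage_shift > page_shift:
        return order(hpage_shift, page_shift)
    return max_order - 1


def thp_order_ok(pmd_order: int, max_order: int) -> bool:
    """Model of the THP requirement PMD_ORDER < MAX_ORDER."""
    return pmd_order < max_order


if __name__ == "__main__":
    # Hash, 16MB huge page (2**24 bytes) on 64K pages.
    print("hash 16MB order:", order(24, PAGE_SHIFT_64K))
    # Radix, 2MB huge page (2**21 bytes) on 64K pages.
    print("radix 2MB order:", order(21, PAGE_SHIFT_64K))
    # HPAGE_SHIFT not yet set (modelled as 0), MAX_ORDER = 9.
    print("early pageblock_order:", model_pageblock_order(0, PAGE_SHIFT_64K, 9))
    # FORCE_MAX_ZONEORDER=8 with a 16MB PMD: is the THP requirement met?
    print("THP ok with MAX_ORDER=8:", thp_order_ok(order(24, PAGE_SHIFT_64K), 8))
```

With these assumptions the model gives order 8 for hash and 5 for
radix. It falls back to MAX_ORDER - 1 when the hugetlb size is not yet
known, and it shows why THP has to be disabled once MAX_ORDER is
lowered to 8.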

On a non-virtualized platform I am hitting crashes like the one below
during boot.

[ 47.637865][ C42] =============================================================================
[ 47.637907][ C42] BUG pgtable-2^11 (Not tainted): Object already free
[ 47.637925][ C42] -----------------------------------------------------------------------------
[ 47.637925][ C42]
[ 47.637945][ C42] Allocated in __pud_alloc+0x84/0x2a0 age=278 cpu=40 pid=1409
[ 47.637974][ C42] __slab_alloc.isra.0+0x40/0x60
[ 47.637995][ C42] kmem_cache_alloc+0x1a8/0x510
[ 47.638010][ C42] __pud_alloc+0x84/0x2a0
[ 47.638024][ C42] copy_page_range+0x38c/0x1b90
[ 47.638040][ C42] dup_mm+0x548/0x880
[ 47.638058][ C42] copy_process+0xdc0/0x1e90
[ 47.638076][ C42] kernel_clone+0xd4/0x9d0
[ 47.638094][ C42] __do_sys_clone+0x88/0xe0
[ 47.638112][ C42] system_call_exception+0x368/0x3a0
[ 47.638128][ C42] system_call_common+0xec/0x250
[ 47.638147][ C42] Freed in __tlb_remove_table+0x1d4/0x200 age=263 cpu=57 pid=326
[ 47.638172][ C42] kmem_cache_free+0x44c/0x680
[ 47.638187][ C42] __tlb_remove_table+0x1d4/0x200
[ 47.638204][ C42] tlb_remove_table_rcu+0x54/0xa0
[ 47.638222][ C42] rcu_core+0xdd4/0x15d0
[ 47.638239][ C42] __do_softirq+0x360/0x69c
[ 47.638257][ C42] run_ksoftirqd+0x54/0xc0
[ 47.638273][ C42] smpboot_thread_fn+0x28c/0x2f0
[ 47.638290][ C42] kthread+0x1a4/0x1b0
[ 47.638305][ C42] ret_from_kernel_thread+0x5c/0x64
[ 47.638320][ C42] Slab 0xc00c00000000d600 objects=10 used=9 fp=0xc0000000035a8000 flags=0x7ffff000010201(locked|slab|head|node=0|zone=0|lastcpupid=0x7ffff)
[ 47.638352][ C42] Object 0xc0000000035a8000 @offset=163840 fp=0x0000000000000000
[ 47.638352][ C42]
[ 47.638373][ C42] Redzone c0000000035a4000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638394][ C42] Redzone c0000000035a4010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638414][ C42] Redzone c0000000035a4020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638435][ C42] Redzone c0000000035a4030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638455][ C42] Redzone c0000000035a4040: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638474][ C42] Redzone c0000000035a4050: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638494][ C42] Redzone c0000000035a4060: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638514][ C42] Redzone c0000000035a4070: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[ 47.638534][ C42] Redzone c0000000035a4080: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................