linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Sourabh Jain <sourabhjain@linux.ibm.com>, linuxppc-dev@lists.ozlabs.org
Cc: Hari Bathini <hbathini@linux.ibm.com>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Mahesh Salgaonkar <mahesh@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Christophe Leroy <christophe.leroy@csgroup.eu>
Subject: Re: [PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active
Date: Thu, 06 Mar 2025 00:47:51 +0530	[thread overview]
Message-ID: <877c5370dc.fsf@gmail.com> (raw)
In-Reply-To: <0a6a29f2-e7a9-4c51-8ab4-3e2c33843d1c@linux.ibm.com>

Sourabh Jain <sourabhjain@linux.ibm.com> writes:

> Hello Ritesh,
>
>
> On 04/03/25 10:27, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>
>>> Hello Ritesh,
>>>
>>> Thanks for the review.
>>>
>>> On 02/03/25 12:05, Ritesh Harjani (IBM) wrote:
>>>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>>>
>>>>> The fadump kernel boots with limited memory solely to collect the kernel
>>>>> core dump. Having gigantic hugepages in the fadump kernel is of no use.
>>>> Sure got it.
>>>>
>>>>> Many times, the fadump kernel encounters OOM (Out of Memory) issues if
>>>>> gigantic hugepages are allocated.
>>>>>
>>>>> To address this, disable gigantic hugepages if fadump is active by
>>>>> returning early from arch_hugetlb_valid_size() using
>>>>> hugepages_supported(). When fadump is active, the global variable
>>>>> hugetlb_disabled is set to true, which is later used by the
>>>>> PowerPC-specific hugepages_supported() function to determine hugepage
>>>>> support.
>>>>>
>>>>> Returning early from arch_hugetlb_vali_size() not only disables
>>>>> gigantic hugepages but also avoids unnecessary hstate initialization for
>>>>> every hugepage size supported by the platform.
>>>>>
>>>>> kernel logs related to hugepages with this patch included:
>>>>> kernel argument passed: hugepagesz=1G hugepages=1
>>>>>
>>>>> First kernel: gigantic hugepage got allocated
>>>>> ==============================================
>>>>>
>>>>> dmesg | grep -i "hugetlb"
>>>>> -------------------------
>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>> HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
>>>>>
>>>>> $ cat /proc/meminfo | grep -i "hugetlb"
>>>>> -------------------------------------
>>>>> Hugetlb:         1048576 kB
>>>> Was this tested with patch [1] in your local tree?
>>>>
>>>> [1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d629d7a8efc33
>>>>
>>>> IIUC, this patch [1] disables the boot time allocation of hugepages.
>>>> Isn't it also disabling the boot time allocation for gigantic huge pages
>>>> passed by the cmdline params like hugepagesz=1G and hugepages=2 ?
>>> Yes, I had the patch [1] in my tree.
>>>
>>> My understanding is that gigantic pages are allocated before normal huge
>>> pages.
>>>
>>> In hugepages_setup() in hugetlb.c, we have:
>>>
>>>       if (hugetlb_max_hstate && hstate_is_gigantic(parsed_hstate))
>>>           hugetlb_hstate_alloc_pages(parsed_hstate);
>>>
>>> I believe the above code allocates memory for gigantic pages, and
>>> hugetlb_init() is
>>> called later because it is a subsys_initcall.
>>>
>>> So, by the time the kernel reaches hugetlb_init(), the gigantic pages
>>> are already
>>> allocated. Isn't that right?
>>>
>>> Please let me know your opinion.
>> Yes, you are right. We are allocating hugepages from memblock, however
>> this isn't getting advertized anywhere. i.e. there is no way one can
>> know from any user interface on whether hugepages were allocated or not.
>> i.e. for fadump kernel when hugepagesz= and hugepages= params are
>> passed, though it will allocate gigantic pages, it won't advertize this
>> in meminfo or anywhere else. This was adding the confusion when I tested
>> this (which wasn't clear from the commit msg either).
>>
>> And I guess this is happening during fadump kernel because of our patch
>> [1], which added a check to see whether hugetlb_disabled is true in
>> hugepages_supported(). Due to this hugetlb_init() is now not doing the
>> rest of the initialization for those gigantic pages which were allocated
>> due to cmdline options from hugepages_setup().
>>
>> [1]: https://lore.kernel.org/linuxppc-dev/20241202054310.928610-1-sourabhjain@linux.ibm.com/
>>
>> Now as we know from below that fadump can set hugetlb_disabled call in early_setup().
>> i.e. fadump can mark hugetlb_disabled to true in
>> early_setup() -> early_init_devtree() -> fadump_reserve_mem()
>>
>> And hugepages_setup() and hugepagesz_setup() gets called late in
>> start_kernel() -> parse_args()
>>
>>
>> And we already check for hugepages_supported() in all necessary calls in
>> mm/hugetlb.c. So IMO, this check should go in mm/hugetlb.c in
>> hugepagesz_setup() and hugepages_setup(). Because otherwise every arch
>> implementation will end up duplicating this by adding
>> hugepages_supported() check in their arch implementation of
>> arch_hugetlb_valid_size().
>>
>> e.g. references of hugepages_supported() checks in mm/hugetlb.c
>>
>> mm/hugetlb.c hugetlb_show_meminfo_node 4959 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_report_node_meminfo 4943 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_report_meminfo 4914 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_overcommit_handler 4848 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_sysctl_handler_common 4809 if (!hugepages_supported())
>> mm/hugetlb.c hugetlb_init 4461 if (!hugepages_supported()) {
>> mm/hugetlb.c dissolve_free_hugetlb_folios 2211 if (!hugepages_supported())
>> fs/hugetlbfs/inode.c init_hugetlbfs_fs 1604 if (!hugepages_supported()) {
>>
>>
>> Let me also see the history on why this wasn't done earlier though...
>>
>> ... Oh actually there is more history to this. See [2]. We already had
>> hugepages_supported() check in hugepages_setup() and other places
>> earlier which was removed to fix some other problem in ppc ;)
>>
>> [2]: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2833a5bf75b3657c4dd20b3709c8c702754cb1f
>>
>>
>> Hence I believe this needs a wider cleanup than just fixing it for our
>> arch. I see there is a patch series already fixing these code paths,
>> which is also cleaning up the path of gigantic hugepage allocation in
>> hugepages_setup(). I think it is in mm-unstable branch too. Can we
>> please review & test that to make sure that the fadump usecase of
>> disabling hugepages & gigantic are getting covered properly?
>>
>> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-fvdl@google.com/
>
> I evaluated the patch series [3] for the fadump issue, and here are my 
> observations:
>
> Currently, the patch series [3] does not fix the issue I am trying to 
> address with this patch.
>
> With patch series [3] applied, I see the following logs when booting the 
> fadump kernel with
> hugepagesz=1G hugepages=1
> |
> |
> With just Patch series [3]:
> ------------------------------------
>
> kdump:/# dmesg | grep -i HugeTLB
> [    0.000000] HugeTLB: allocating 10 of page size 1.00 GiB failed.  
> Only allocated 9 hugepages.
> [    0.405964] HugeTLB support is disabled!
> [    0.409162] HugeTLB: huge pages not supported, ignoring associated 
> command-line parameters
> [    0.437740] hugetlbfs: disabling because there are no supported 
> hugepage sizes
>
> One good thing is that the kernel now at least reports the gigantic 
> pages allocated, which was
> not the case before. I think patch series [3] introduced that improvement.
>
> Now, on top of patch series [3], I applied this fix, and the kernel 
> prints the following logs:
>
> Patch series [3] + this fix:
> ------------------------------------
>
> [    0.000000] HugeTLB: unsupported hugepagesz=1G
> [    0.000000] HugeTLB: hugepages=10 does not follow a valid hugepagesz, 
> ignoring
> [    0.366158] HugeTLB support is disabled!
> [    0.398004] hugetlbfs: disabling because there are no supported 
> hugepage sizes
>
> With these logs, one can clearly identify what is happening.
>
> What are your thoughts on this fix now?
>
> Do you still think handling this in generic code is better?

I believe so yes (unless we have a valid reason for not doing that).
hugepages_supported() is an arch specific call. If you see the prints
above what we are essentially saying is that this is not a valid
hugepagesz. But that's not the case really right. What it should just
reflect is that the hugepages support is disabled. 

That being said, I will have to go and look into that series to suggest,
where in that path it should use hugepages_supported() arch call to see
if the hugepages are supported or not before initializing. And hopefully
as you suggested since our cmdline parsing problem was already solved,
it should not be a problem in using hugepages_supported() during cmdline
parsing phase.
But let me check that series and get back.


> Given that I was already advised to handle things in arch
> code. [4]
>
> Or should we keep it this way?
> One advantage handling things in arch_hugetlb_valid_size() is that it helps
> avoid populating hstates since it is not required anyway. I am not claiming
> that it is not possible in generic code.

IMO, even adding hugepages_supported() check at the right place should avoid
populating hstates too. But let's confirm that.

-ritesh

>
> Thoughts?
>
> Thanks,
> Sourabh Jain
>
>
> [3]: https://lore.kernel.org/all/20250228182928.2645936-1-fvdl@google.com/
> [4]: https://lore.kernel.org/all/20250122150613.28a92438@thinkpad-T15/


  reply	other threads:[~2025-03-05 19:26 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-28  4:33 [PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active Sourabh Jain
2025-02-24  4:04 ` Sourabh Jain
2025-03-02  6:35 ` Ritesh Harjani
2025-03-03  6:22   ` Sourabh Jain
2025-03-04  4:57     ` Ritesh Harjani
2025-03-05  5:48       ` Sourabh Jain
2025-03-05  9:38       ` Sourabh Jain
2025-03-05 19:17         ` Ritesh Harjani [this message]
2025-03-06  8:30           ` Sourabh Jain
2025-03-03  6:34   ` Sourabh Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877c5370dc.fsf@gmail.com \
    --to=ritesh.list@gmail.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=hbathini@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=sourabhjain@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).