From: Sourabh Jain <sourabhjain@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org
Subject: Re: powerpc: Set crashkernel offset to mid of RMA region
Date: Fri, 4 Feb 2022 14:44:12 +0530 [thread overview]
Message-ID: <a4352457-7ffd-0fb9-7a86-92c9b50c957b@linux.ibm.com> (raw)
In-Reply-To: <87czk4mco0.fsf@mpe.ellerman.id.au>
On 03/02/22 16:37, Michael Ellerman wrote:
> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>> On 01/02/22 17:14, Michael Ellerman wrote:
>>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>>> On large config LPARs (having 192 and more cores), Linux fails to boot
>>>> due to insufficient memory in the first memblock. It is due to the
>>>> memory reservation for the crash kernel which starts at 128MB offset of
>>>> the first memblock. This memory reservation for the crash kernel doesn't
>>>> leave enough space in the first memblock to accommodate other essential
>>>> system resources.
>>>>
>>>> The crash kernel start address was set to 128MB offset by default to
>>>> ensure that the crash kernel get some memory below the RMA region which
>>>> is used to be of size 256MB. But given that the RMA region size can be
>>>> 512MB or more, setting the crash kernel offset to mid of RMA size will
>>>> leave enough space for kernel to allocate memory for other system
>>>> resources.
>>>>
>>>> Since the above crash kernel offset change is only applicable to the LPAR
>>>> platform, the LPAR feature detection is pushed before the crash kernel
>>>> reservation. The rest of LPAR specific initialization will still
>>>> be done during pseries_probe_fw_features as usual.
>>>>
>>>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>>>> Reported-and-tested-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
>>>>
>>>> ---
>>>> arch/powerpc/kernel/rtas.c | 4 ++++
>>>> arch/powerpc/kexec/core.c | 15 +++++++++++----
>>>> 2 files changed, 15 insertions(+), 4 deletions(-)
>>>>
>>>> ---
>>>> Change in v3:
>>>> Dropped 1st and 2nd patch from v2. 1st and 2nd patch from v2 patch
>>>> series [1] try to discover 1T segment MMU feature support
>>>> BEFORE boot CPU paca allocation ([1] describes why it is needed).
>>>> MPE has posted a patch [2] that archives a similar objective by moving
>>>> boot CPU paca allocation after mmu_early_init_devtree().
>>>>
>>>> NOTE: This patch is dependent on the patch [2].
>>>>
>>>> [1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20211018084434.217772-3-sourabhjain@linux.ibm.com/
>>>> [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
>>>> ---
>>>>
>>>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>>>> index 733e6ef36758..06df7464fb57 100644
>>>> --- a/arch/powerpc/kernel/rtas.c
>>>> +++ b/arch/powerpc/kernel/rtas.c
>>>> @@ -1313,6 +1313,10 @@ int __init early_init_dt_scan_rtas(unsigned long node,
>>>> entryp = of_get_flat_dt_prop(node, "linux,rtas-entry", NULL);
>>>> sizep = of_get_flat_dt_prop(node, "rtas-size", NULL);
>>>>
>>>> + /* need this feature to decide the crashkernel offset */
>>>> + if (of_get_flat_dt_prop(node, "ibm,hypertas-functions", NULL))
>>>> + powerpc_firmware_features |= FW_FEATURE_LPAR;
>>>> +
>>> As you'd have seen this breaks the 32-bit build. It will need an #ifdef
>>> CONFIG_PPC64 around it.
>>>
>>>> if (basep && entryp && sizep) {
>>>> rtas.base = *basep;
>>>> rtas.entry = *entryp;
>>>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>>> index 8b68d9f91a03..abf5897ae88c 100644
>>>> --- a/arch/powerpc/kexec/core.c
>>>> +++ b/arch/powerpc/kexec/core.c
>>>> @@ -134,11 +134,18 @@ void __init reserve_crashkernel(void)
>>>> if (!crashk_res.start) {
>>>> #ifdef CONFIG_PPC64
>>>> /*
>>>> - * On 64bit we split the RMO in half but cap it at half of
>>>> - * a small SLB (128MB) since the crash kernel needs to place
>>>> - * itself and some stacks to be in the first segment.
>>>> + * On the LPAR platform place the crash kernel to mid of
>>>> + * RMA size (512MB or more) to ensure the crash kernel
>>>> + * gets enough space to place itself and some stack to be
>>>> + * in the first segment. At the same time normal kernel
>>>> + * also get enough space to allocate memory for essential
>>>> + * system resource in the first segment. Keep the crash
>>>> + * kernel starts at 128MB offset on other platforms.
>>>> */
>>>> - crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>>>> + if (firmware_has_feature(FW_FEATURE_LPAR))
>>>> + crashk_res.start = ppc64_rma_size / 2;
>>>> + else
>>>> + crashk_res.start = min(0x8000000ULL, (ppc64_rma_size / 2));
>>> I think this will break on machines using Radix won't it? At this point
>>> in boot ppc64_rma_size will be == 0. Because we won't call into
>>> hash__setup_initial_memory_limit().
>>>
>>> That's not changed by your patch, but seems like this code needs to be
>>> more careful/clever.
>> Interesting, but in my testing, I found that ppc64_rma_size
>> did get initialized before reserve_crashkernel() using radix on LPAR.
>>
>> I am not sure why but hash__setup_initial_memory_limit() function is
>> gets called
>> regardless of radix or hash. Not sure whether it is by design but here
>> is the flow:
> It sort of is by design. See:
>
> 103a8542cb35 ("powerpc/book3s64/radix: Fix boot failure with large amount of guest memory")
>
> Basically the hash restrictions are more strict, so we apply them until
> we know we will use radix.
>
> But ...
>
>> setup_initial_memory_limit()
>>
>> static inline void setup_initial_memory_limit()
>> (arch/powerpc/include/asm/book3s/64/mmu.h)
>>
>> if (!early_radix_enabled()) // FALSE regardless of radix is enabled or not
> You mean early_radix_enabled() is False regardless. But that's not true
> in all cases.
>
> We can now build the kernel without hash MMU support at all, see:
>
> 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU")
>
> In which case early_radix_enabled() will be true here, because it's hard
> coded to be true at build time.
Sorry, my bad I was not aware of that. But when both hash and radix
are enabled the early_radix_enabled return FALSE regardless it is disabled
using kernel command line or not is confusing to me. Maybe it is too early
in the boot sequence...
>
>> hash__setup_initial_memory_limit() // initialize ppc64_rma_size
>>
>> reserve_crashkernel() // initialize crashkernel offset to mid of RMA size.
>>
>>
>> For the sack of understanding even if we restrict crashkernel offset
>> setting to mid RMA (i.e. ppc64_rma_size/2) for
>> only hash it may not save radix because even today we are assigning
>> crashkernel offset using
>> ppc64_rma_size variable.
> Yes. There's already a bug there, your patch doesn't make it better or worse.
>
>> Is the current flow of initializing ppc64_rma_size variable before
>> reserve_crashkernel() for radix expected?
>>
>> Please provide your input.
> I wonder if we're better off moving the crash kernel reservation later,
> once we've discovered what MMU we're using.
>
> I can't immediately see why that would be a problem, as long as we do
> the reservation before we do any (many?) allocations. I'll have to think
> about it a bit more though, these boot ordering things are always
> subtle.
Agree we have space to improve this piece of code. Let me know
if I can help you to make this better.
> For now I think this patch is OK if you send a v2 to fix the compile
> error
I sent the next version in the mailing list. Thanks for the support.
- Sourabh Jain
next prev parent reply other threads:[~2022-02-04 9:15 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-28 10:04 powerpc: Set crashkernel offset to mid of RMA region Sourabh Jain
2022-02-01 6:50 ` kernel test robot
2022-02-01 11:10 ` Hari Bathini
2022-02-04 9:00 ` Sourabh Jain
2022-02-01 11:44 ` Michael Ellerman
2022-02-02 15:08 ` Sourabh Jain
2022-02-03 11:07 ` Michael Ellerman
2022-02-04 9:14 ` Sourabh Jain [this message]
-- strict thread matches above, loose matches on Subject: below --
2022-02-04 8:56 Sourabh Jain
2022-02-16 12:25 ` Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a4352457-7ffd-0fb9-7a86-92c9b50c957b@linux.ibm.com \
--to=sourabhjain@linux.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).