From: Sourabh Jain <sourabhjain@linux.ibm.com>
To: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>,
linuxppc-dev@lists.ozlabs.org
Cc: Baoquan he <bhe@redhat.com>, Jiri Bohac <jbohac@suse.cz>,
Hari Bathini <hbathini@linux.ibm.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Mahesh Salgaonkar <mahesh@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Shivang Upadhyay <shivangu@linux.ibm.com>,
kexec@lists.infradead.org
Subject: Re: [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation
Date: Tue, 4 Nov 2025 15:04:37 +0530 [thread overview]
Message-ID: <ea44849c-9d0f-43cc-9476-42bc619728f6@linux.ibm.com> (raw)
In-Reply-To: <7957bd55-5bda-406f-aab3-64e0620bd452@linux.ibm.com>
On 04/11/25 10:48, Sourabh Jain wrote:
>
>
> On 03/11/25 15:40, Ritesh Harjani (IBM) wrote:
>> Sourabh Jain <sourabhjain@linux.ibm.com> writes:
>>
>>> Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
>>> crashkernel= command line option") and commit ab475510e042 ("kdump:
>>> implement reserve_crashkernel_cma") added CMA support for kdump
>>> crashkernel reservation.
>>>
>>> Extend crashkernel CMA reservation support to powerpc.
>>>
>>> The following changes are made to enable CMA reservation on powerpc:
>>>
>>> - Parse and obtain the CMA reservation size along with other
>>> crashkernel
>>> parameters
>>> - Call reserve_crashkernel_cma() to allocate the CMA region for kdump
>>> - Include the CMA-reserved ranges in the usable memory ranges for the
>>> kdump kernel to use.
>>> - Exclude the CMA-reserved ranges from the crash kernel memory to
>>> prevent them from being exported through /proc/vmcore.
>>>
>>> With the introduction of the CMA crashkernel regions,
>>> crash_exclude_mem_range() needs to be called multiple times to exclude
>>> both crashk_res and crashk_cma_ranges from the crash memory ranges. To
>>> avoid repetitive logic for validating mem_ranges size and handling
>>> reallocation when required, this functionality is moved to a new
>>> wrapper
>>> function crash_exclude_mem_range_guarded().
>>>
>>> To ensure proper CMA reservation, reserve_crashkernel_cma() is called
>>> after pageblock_order is initialized.
>>>
>>> Update kernel-parameters.txt to document CMA support for crashkernel on
>>> powerpc architecture.
>>>
>>> Cc: Baoquan he <bhe@redhat.com>
>>> Cc: Jiri Bohac <jbohac@suse.cz>
>>> Cc: Hari Bathini <hbathini@linux.ibm.com>
>>> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>>> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>>> Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
>>> Cc: kexec@lists.infradead.org
>>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>>> ---
>>> Changlog:
>>>
>>> v3 -> v4
>>> - Removed repeated initialization to tmem in
>>> crash_exclude_mem_range_guarded()
>>> - Call crash_exclude_mem_range() with right crashk ranges
>>>
>>> v4 -> v5:
>>> - Document CMA-based crashkernel support for ppc64 in
>>> kernel-parameters.txt
>>> ---
>>> .../admin-guide/kernel-parameters.txt | 2 +-
>>> arch/powerpc/include/asm/kexec.h | 2 +
>>> arch/powerpc/kernel/setup-common.c | 4 +-
>>> arch/powerpc/kexec/core.c | 10 ++++-
>>> arch/powerpc/kexec/ranges.c | 43
>>> ++++++++++++++-----
>>> 5 files changed, 47 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt
>>> b/Documentation/admin-guide/kernel-parameters.txt
>>> index 6c42061ca20e..0f386b546cec 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -1013,7 +1013,7 @@
>>> It will be ignored when crashkernel=X,high is not used
>>> or memory reserved is below 4G.
>>> crashkernel=size[KMG],cma
>>> - [KNL, X86] Reserve additional crash kernel memory from
>>> + [KNL, X86, ppc64] Reserve additional crash kernel
>>> memory from
>> Shouldn't this be PPC and not ppc64?
>>
>> If I see the crash_dump support...
>>
>> config ARCH_SUPPORTS_CRASH_DUMP
>> def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
>>
>> The changes below aren't specific to ppc64 correct?
>
> The thing is this feature is only supported with KEXEC_FILE and which
> only supported on PPC64:
>
> config ARCH_SUPPORTS_KEXEC_FILE
> def_bool PPC64
>
> Hence I kept it as ppc64.
>
> I think I should update that in the commit message.
>
> Also do you think is it good to restrict this feature to KEXEC_FILE?
Putting this under KEXEC_FILE may not help much because KEXEC_FILE is
enabled
by default in most configurations. Once it is enabled, the CMA
reservation will
happen regardless of which system call is used to load the kdump kernel
(kexec_load or kexec_file_load).
However, not restricting this feature to KEXEC_FILE will allow the kexec
tool to
independently add support for this feature in the future for the kexec_load
system call.
With that logic, I think if we do not restrict this feature to
KEXEC_FILE, the support
will be available for ppc and not limited to ppc64.
>
>>
>>> CMA. This reservation is usable by the first system's
>>> userspace memory and kernel movable allocations (memory
>>> balloon, zswap). Pages allocated from this memory range
>>> diff --git a/arch/powerpc/include/asm/kexec.h
>>> b/arch/powerpc/include/asm/kexec.h
>>> index 4bbf9f699aaa..bd4a6c42a5f3 100644
>>> --- a/arch/powerpc/include/asm/kexec.h
>>> +++ b/arch/powerpc/include/asm/kexec.h
>>> @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage
>>> *image, void *fdt, struct crash_mem
>>> #ifdef CONFIG_CRASH_RESERVE
>>> int __init overlaps_crashkernel(unsigned long start, unsigned long
>>> size);
>>> extern void arch_reserve_crashkernel(void);
>>> +extern void kdump_cma_reserve(void);
>>> #else
>>> static inline void arch_reserve_crashkernel(void) {}
>>> static inline int overlaps_crashkernel(unsigned long start,
>>> unsigned long size) { return 0; }
>>> +static inline void kdump_cma_reserve(void) { }
>>> #endif
>>> #if defined(CONFIG_CRASH_DUMP)
>>> diff --git a/arch/powerpc/kernel/setup-common.c
>>> b/arch/powerpc/kernel/setup-common.c
>>> index 68d47c53876c..c8c42b419742 100644
>>> --- a/arch/powerpc/kernel/setup-common.c
>>> +++ b/arch/powerpc/kernel/setup-common.c
>>> @@ -35,6 +35,7 @@
>>> #include <linux/of_irq.h>
>>> #include <linux/hugetlb.h>
>>> #include <linux/pgtable.h>
>>> +#include <asm/kexec.h>
>>> #include <asm/io.h>
>>> #include <asm/paca.h>
>>> #include <asm/processor.h>
>>> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p)
>>> initmem_init();
>>> /*
>>> - * Reserve large chunks of memory for use by CMA for fadump,
>>> KVM and
>>> + * Reserve large chunks of memory for use by CMA for kdump,
>>> fadump, KVM and
>>> * hugetlb. These must be called after initmem_init(), so that
>>> * pageblock_order is initialised.
>>> */
>>> fadump_cma_init();
>>> + kdump_cma_reserve();
>>> kvm_cma_reserve();
>>> gigantic_hugetlb_cma_reserve();
>>> diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
>>> index d1a2d755381c..25744737eff5 100644
>>> --- a/arch/powerpc/kexec/core.c
>>> +++ b/arch/powerpc/kexec/core.c
>>> @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image)
>>> {
>>> }
>>> +unsigned long long cma_size;
>>> +
>> nit:
>> Since this is a gloabal powerpc variable you are defining, then can we
>> keep it's name to crashk_cma_size?
>
> Yeah make sense. I will update the variable name.
>
>
>>
>>> /*
>>> * Do not allocate memory (or fail in any way) in machine_kexec().
>>> * We are past the point of no return, committed to rebooting now.
>>> @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void)
>>> /* use common parsing */
>>> ret = parse_crashkernel(boot_command_line, total_mem_sz,
>>> &crash_size,
>>> - &crash_base, NULL, NULL, NULL);
>>> + &crash_base, NULL, &cma_size, NULL);
>>> if (ret)
>>> return;
>>> @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void)
>>> reserve_crashkernel_generic(crash_size, crash_base, 0, false);
>>> }
>>> +void __init kdump_cma_reserve(void)
>>> +{
>>> + if (cma_size)
>>> + reserve_crashkernel_cma(cma_size);
>>> +}
>>> +
>> nit:
>> cma_size is already checked for null within reserve_crashkernel_cma(),
>> so we don't really need kdump_cma_reserve() function call as such.
>>
>> Also kdump_cma_reserve() only make sense with #ifdef CRASHKERNEL_CMA..
>> so instead do you think we can directly call
>> reserve_crashkernel_cma(cma_size)?
>
> I think the above kdump_cma_reserve() definition should come under
> CONFIG_CRASH_RESERVE
> because the way it is declared in arch/powerpc/include/asm/kexec.h.
>
> I would like to keep kdump_cma_reserve() as is it because of two reasons:
>
> - It keeps setup_arch() free from kdump #ifdefs
> - In case if we want to add some condition on this reservation it
> would straight forward.
>
> So lets keep kdump_cma_reserve as is, unless you have strong opinion
> on not to.
>
>>> int __init overlaps_crashkernel(unsigned long start, unsigned long
>>> size)
>>> {
>>> return (start + size) > crashk_res.start && start <=
>>> crashk_res.end;
>>> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
>>> index 3702b0bdab14..3bd27c38726b 100644
>>> --- a/arch/powerpc/kexec/ranges.c
>>> +++ b/arch/powerpc/kexec/ranges.c
>>> @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem
>>> **mem_ranges)
>>> */
>>> int get_usable_memory_ranges(struct crash_mem **mem_ranges)
>>> {
>>> - int ret;
>>> + int ret, i;
>>> /*
>>> * Early boot failure observed on guests when low memory
>>> (first memory
>>> @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem
>>> **mem_ranges)
>>> if (ret)
>>> goto out;
>>> + for (i = 0; i < crashk_cma_cnt; i++) {
>>> + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start,
>>> + crashk_cma_ranges[i].end -
>>> crashk_cma_ranges[i].start + 1);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> ret = add_rtas_mem_range(mem_ranges);
>>> if (ret)
>>> goto out;
>>> @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem
>>> **mem_ranges)
>>> #endif /* CONFIG_KEXEC_FILE */
>>> #ifdef CONFIG_CRASH_DUMP
>>> +static int crash_exclude_mem_range_guarded(struct crash_mem
>>> **mem_ranges,
>>> + unsigned long long mstart,
>>> + unsigned long long mend)
>>> +{
>>> + struct crash_mem *tmem = *mem_ranges;
>>> +
>>> + /* Reallocate memory ranges if there is no space to split
>>> ranges */
>>> + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> + tmem = realloc_mem_ranges(mem_ranges);
>>> + if (!tmem)
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + return crash_exclude_mem_range(tmem, mstart, mend);
>>> +}
>>> +
>>> /**
>>> * get_crash_memory_ranges - Get crash memory ranges. This list
>>> includes
>>> * first/crashing kernel's memory
>>> regions that
>>> @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem
>>> **mem_ranges)
>>> int get_crash_memory_ranges(struct crash_mem **mem_ranges)
>>> {
>>> phys_addr_t base, end;
>>> - struct crash_mem *tmem;
>>> u64 i;
>>> int ret;
>>> @@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem
>>> **mem_ranges)
>>> sort_memory_ranges(*mem_ranges, true);
>>> }
>>> - /* Reallocate memory ranges if there is no space to split
>>> ranges */
>>> - tmem = *mem_ranges;
>>> - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
>>> - tmem = realloc_mem_ranges(mem_ranges);
>>> - if (!tmem)
>>> - goto out;
>>> - }
>>> -
>>> /* Exclude crashkernel region */
>>> - ret = crash_exclude_mem_range(tmem, crashk_res.start,
>>> crashk_res.end);
>>> + ret = crash_exclude_mem_range_guarded(mem_ranges,
>>> crashk_res.start, crashk_res.end);
>>> if (ret)
>>> goto out;
>>> + for (i = 0; i < crashk_cma_cnt; ++i) {
>>> + ret = crash_exclude_mem_range_guarded(mem_ranges,
>>> crashk_cma_ranges[i].start,
>>> + crashk_cma_ranges[i].end);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> /*
>>> * FIXME: For now, stay in parity with kexec-tools but if
>>> RTAS/OPAL
>>> * regions are exported to save their context at the
>>> time of
>>> --
>>> 2.51.0
>
next prev parent reply other threads:[~2025-11-04 9:35 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-03 4:37 [PATCH v5] powerpc/kdump: Add support for crashkernel CMA reservation Sourabh Jain
2025-11-03 10:10 ` Ritesh Harjani
2025-11-04 5:18 ` Sourabh Jain
2025-11-04 9:34 ` Sourabh Jain [this message]
2025-11-04 10:24 ` Ritesh Harjani
2025-11-04 12:38 ` Sourabh Jain
2025-11-04 10:18 ` Ritesh Harjani
2025-11-04 10:35 ` Sourabh Jain
2025-11-04 10:51 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ea44849c-9d0f-43cc-9476-42bc619728f6@linux.ibm.com \
--to=sourabhjain@linux.ibm.com \
--cc=bhe@redhat.com \
--cc=hbathini@linux.ibm.com \
--cc=jbohac@suse.cz \
--cc=kexec@lists.infradead.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=maddy@linux.ibm.com \
--cc=mahesh@linux.ibm.com \
--cc=mpe@ellerman.id.au \
--cc=ritesh.list@gmail.com \
--cc=shivangu@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).