Re: [PATCH v9 0/5] arm64/riscv: Add support for crashkernel CMA reservation

From: Sourabh Jain <sourabhjain@linux.ibm.com>
To: Jinjie Ruan <ruanjinjie@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: corbet@lwn.net, skhan@linuxfoundation.org,
	catalin.marinas@arm.com, will@kernel.org, chenhuacai@kernel.org,
	kernel@xen0n.name, maddy@linux.ibm.com, mpe@ellerman.id.au,
	npiggin@gmail.com, chleroy@kernel.org, pjw@kernel.org,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr,
	tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, robh@kernel.org,
	saravanak@kernel.org, bhe@redhat.com, vgoyal@redhat.com,
	dyoung@redhat.com, rdunlap@infradead.org, peterz@infradead.org,
	feng.tang@linux.alibaba.com, pawan.kumar.gupta@linux.intel.com,
	dapeng1.mi@linux.intel.com, kees@kernel.org, elver@google.com,
	paulmck@kernel.org, lirongqing@baidu.com, safinaskar@gmail.com,
	rppt@kernel.org, ardb@kernel.org, leitao@debian.org,
	jbohac@suse.cz, cfsworks@gmail.com, osandov@fb.com,
	tangyouling@kylinos.cn, ritesh.list@gmail.com,
	eajames@linux.ibm.com, songshuaishuai@tinylab.org,
	kevin.brodsky@arm.com, samuel.holland@sifive.com,
	vishal.moola@gmail.com, junhui.liu@pigmoral.tech,
	coxu@redhat.com, liaoyuanhong@vivo.com,
	fuqiang.wang@easystack.cn, x86@kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	devicetree@vger.kernel.org, kexec@lists.infradead.org
Subject: Re: [PATCH v9 0/5] arm64/riscv: Add support for crashkernel CMA reservation
Date: Tue, 24 Mar 2026 09:59:46 +0530
Message-ID: <595e793d-adc3-4acb-af18-f0a3cf2d5e73@linux.ibm.com>
In-Reply-To: <4cfde40c-673a-12b0-dfc5-703d582d6ea9@huawei.com>



On 24/03/26 09:32, Jinjie Ruan wrote:
>
> On 2026/3/24 0:55, Andrew Morton wrote:
>> On Mon, 23 Mar 2026 15:27:40 +0800 Jinjie Ruan <ruanjinjie@huawei.com> wrote:
>>
>>> The crash memory allocation and the exclusion of crashk_res, crashk_low_res
>>> and crashk_cma memory are almost identical across architectures.
>>> This patch set handles them in the crash core in a general way, which
>>> eliminates a lot of duplicated code.
>>>
>>> It also adds support for crashkernel CMA reservation for arm64 and riscv.
>> Thanks.  AI review has completed and it asks questions:
>> 	https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie@huawei.com
> I believe it identified 4 valid issues:
>
> - The already known bug in the existing RISC-V code, where crashk_low_res
> is not excluded.
>
> - A memory leak in the existing PowerPC code.

Yes, and the suggested approach to fix the issue looks good.
It basically replaces the return with goto out:

diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index 898742a5205c..1426d2099bad 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -440,7 +440,7 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
         ret = get_crash_memory_ranges(&cmem);
         if (ret) {
                 pr_err("Failed to get crash mem range\n");
-               return;
+               goto out;
         }

         /*
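
To spell out why the goto matters: the helper may have allocated the
buffer before it fails, so an early return skips the cleanup at the end of
the function. Below is a minimal userspace sketch of that pattern, not the
powerpc code itself. All names (ranges_buf, get_ranges, update_headers,
live_buffers) are hypothetical and only model the shape of the fix.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct crash_mem; not the kernel definition. */
struct ranges_buf {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
};

/* Counts outstanding allocations so a leak is observable in the demo. */
static int live_buffers;

static struct ranges_buf *buf_alloc(unsigned int max)
{
	struct ranges_buf *buf = calloc(1, sizeof(*buf));

	if (buf) {
		buf->max_nr_ranges = max;
		live_buffers++;
	}
	return buf;
}

static void buf_free(struct ranges_buf *buf)
{
	if (buf) {
		free(buf);
		live_buffers--;
	}
}

/*
 * Allocates *bufp and then optionally reports failure. On failure the
 * buffer is still allocated, so the caller has to free it.
 */
static int get_ranges(struct ranges_buf **bufp, int simulate_failure)
{
	*bufp = buf_alloc(8);
	if (!*bufp)
		return -1;
	return simulate_failure ? -1 : 0;
}

static int update_headers(int simulate_failure)
{
	struct ranges_buf *buf = NULL;
	int ret;

	ret = get_ranges(&buf, simulate_failure);
	if (ret)
		goto out;	/* a bare "return" here would leak buf */

	/* ... use buf to rebuild the headers ... */
out:
	buf_free(buf);		/* single exit point, safe when buf is NULL */
	return ret;
}
```

With the goto, live_buffers is back to zero after both the success and
the failure path.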

Are you planning to handle this in this patch series? Or do you want me 
to send a separate fix patch?


>
> - The ordering issue of adding CMA ranges to "linux,usable-memory-range".
>
> - An existing concurrency issue. A concurrent memory hotplug may occur
> between reading memblock and attempting to fill cmem during kexec_load()
> on almost all existing architectures. I'm not sure whether this is a
> practical issue in reality.
>
>   Race Condition Scenario
>
>    Timeline:
>    ---------------------------------------------------------------------
>    T1: kexec_load() syscall starts
>    T2: kexec_trylock() acquires kexec_lock
>    T3: crash_prepare_headers() is called
>    T4: arch_get_system_nr_ranges() queries memblock → finds 100 memory ranges
>    T5: cmem = alloc_cmem(100) allocates buffer for 100 ranges
>    T6: [RACE WINDOW] Another process triggers memory hotplug
>    T7: add_memory() → lock_device_hotplug() → memblock_add_node()
>    T8: New memory region added to memblock
>    T9: arch_crash_populate_cmem() iterates: now finds 102 ranges
>    T10: cmem->ranges[100] → OUT OF BOUNDS WRITE!
>    T11: cmem->ranges[101] → OUT OF BOUNDS WRITE!
>    T12: Kernel crash or memory corruption
>
>    Why This Happens
>
>    1. Different locks used:
>      - kexec_load() uses kexec_trylock (atomic_t)
>      - Memory hotplug uses device_hotplug_lock (mutex)
>    2. No synchronization between these two operations
>    3. Time-of-check to time-of-use (TOCTOU) issue:
>      - Step T4-T5: We query the number of ranges and allocate buffer
>      - Step T6-T9: Memory hotplug adds new ranges between query and
> population
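
The timeline above can be replayed deterministically in userspace. This is
only a sketch under stated assumptions: sys_nr_ranges stands in for
memblock, cmem_sketch is a simplified crash_mem, and every function name
here is hypothetical. The capacity check in the populate step is my
addition so the demo fails cleanly instead of writing out of bounds; it
is not taken from the patch set.

```c
#include <stdlib.h>

/* Hypothetical stand-in for memblock: just the number of memory ranges. */
static unsigned int sys_nr_ranges = 100;

struct range_sketch {
	unsigned long long start, end;
};

/* Hypothetical, simplified crash_mem: capacity plus entries filled. */
struct cmem_sketch {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range_sketch ranges[];
};

static struct cmem_sketch *alloc_cmem_sketch(unsigned int max)
{
	struct cmem_sketch *c;

	c = calloc(1, sizeof(*c) + max * sizeof(c->ranges[0]));
	if (c)
		c->max_nr_ranges = max;
	return c;
}

/* Walks the current ranges and refuses to write past the capacity. */
static int populate_cmem_sketch(struct cmem_sketch *c)
{
	for (unsigned int i = 0; i < sys_nr_ranges; i++) {
		if (c->nr_ranges >= c->max_nr_ranges)
			return -1;	/* without this: out-of-bounds write */
		c->ranges[c->nr_ranges].start = (unsigned long long)i << 20;
		c->ranges[c->nr_ranges].end =
			(((unsigned long long)i + 1) << 20) - 1;
		c->nr_ranges++;
	}
	return 0;
}

/* Replays T4..T10; a nonzero hotplug_between simulates T6..T8. */
static int replay_timeline(int hotplug_between)
{
	unsigned int nr = sys_nr_ranges;		/* T4: query count */
	struct cmem_sketch *c = alloc_cmem_sketch(nr);	/* T5: allocate */
	int ret;

	if (!c)
		return -1;
	if (hotplug_between)
		sys_nr_ranges += 2;			/* T6..T8: hotplug */
	ret = populate_cmem_sketch(c);			/* T9: populate */
	free(c);
	return ret;
}
```

Without the simulated hotplug the populate step succeeds; with it, the
count has grown past the allocated capacity and the step reports an error
where the unchecked version would have overrun the buffer.
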
>
>
>
> Any comments or suggestions on the following approach?
>
>
> int crash_prepare_headers(...)
>    {
>        unsigned int max_nr_ranges;
>        struct crash_mem *cmem;
>        int ret;
>
>        lock_device_hotplug();
>
>        max_nr_ranges = arch_get_system_nr_ranges();
>        // ...
>        ret = arch_crash_populate_cmem(cmem);
>        // ...
>
>        unlock_device_hotplug();
>        return ret;
>    }
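
For comparison, here is a userspace sketch of the idea in the proposal:
the count and the populate step run in one critical section, so a writer
taking the same lock cannot change the count in between. A pthread mutex
stands in for device_hotplug_lock, and all names ending in _sketch are
hypothetical. It says nothing about whether taking the hotplug lock at
this point is acceptable in the kernel.

```c
#include <pthread.h>
#include <stdlib.h>

/* Userspace stand-in for device_hotplug_lock. */
static pthread_mutex_t hotplug_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the number of memblock ranges. */
static unsigned int nr_ranges_sketch = 100;

/* Hotplug path: changes the range count only while holding the lock. */
static void hotplug_add_sketch(unsigned int n)
{
	pthread_mutex_lock(&hotplug_lock_sketch);
	nr_ranges_sketch += n;
	pthread_mutex_unlock(&hotplug_lock_sketch);
}

/*
 * Counts, allocates and populates inside one critical section.
 * Returns the number of entries filled, or -1 on allocation failure.
 */
static int prepare_headers_sketch(void)
{
	unsigned int max, filled = 0;
	unsigned int *buf;
	int ret = -1;

	pthread_mutex_lock(&hotplug_lock_sketch);

	max = nr_ranges_sketch;			/* query the count */
	buf = calloc(max, sizeof(*buf));	/* allocate for that count */
	if (!buf)
		goto unlock;

	/* The count cannot change here: writers need the same lock. */
	for (unsigned int i = 0; i < nr_ranges_sketch && filled < max; i++)
		buf[filled++] = i;

	ret = (int)filled;
	free(buf);
unlock:
	pthread_mutex_unlock(&hotplug_lock_sketch);
	return ret;
}
```

In the real code, lock ordering against the existing kexec lock, and
against any caller that already holds the hotplug lock, would still need
to be checked.
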
>
>