Re: [PATCH v14 17/17] arm64/crash: Add crash hotplug support

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Jinjie Ruan" <ruanjinjie@huawei.com>
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org
Subject: Re: [PATCH v14 17/17] arm64/crash: Add crash hotplug support
Date: Mon, 25 May 2026 10:17:45 +0000	[thread overview]
Message-ID: <20260525101746.9959D1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260525084932.934910-18-ruanjinjie@huawei.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:

New issues:
- [High] Incorrect `kbuf.memsz` calculation in `load_other_segments` when `CONFIG_CRASH_HOTPLUG` is enabled leads to memory truncation or kernel hang.
- [High] Erroneous fall-through for CPU hotplug events contradicts documentation and leads to data races / use-after-free.
- [High] Using `memcpy_flushcache` on ARM64 without guaranteeing a PoC flush compromises the kdump kernel integrity.
- [Medium] Missing `elfcorehdr_index` initialization allows hijacking of the crash environment via a crafted initrd.
- [Low] The variable `pnum` is unused when `CONFIG_CRASH_HOTPLUG` is disabled, causing build failures with `-Werror`.

Pre-existing issues:
- [High] Freeing `headers` on `kexec_add_buffer` failure leads to a use-after-free on retry and a double-free during cleanup.
--

commit 35229a9002089e25646d549d3be6f48c32587974
Author: Jinjie Ruan <ruanjinjie@huawei.com>

arm64/crash: Add crash hotplug support

This commit adds crash hotplug support for ARM64 by implementing the
architecture-specific functions arch_crash_hotplug_support(),
arch_crash_get_elfcorehdr_size(), and arch_crash_handle_hotplug_event(). It
primarily handles memory add/remove events to update the elfcorehdr,
reflecting the current memory layout without needing to reload the entire
kdump image.

Link: https://lore.kernel.org/all/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
Link: https://lore.kernel.org/all/20260328074013.3589544-1-ruanjinjie@huawei.com/

> diff --git a/arch/arm64/kernel/crash.c b/arch/arm64/kernel/crash.c
> new file mode 100644
> index 0000000000000..2114375820da9
> --- /dev/null
> +++ b/arch/arm64/kernel/crash.c

[ ... ]

> +static void update_crash_elfcorehdr(struct kimage *image)
> +{
> +	void *elfbuf = NULL, *old_elfcorehdr;
> +	unsigned long mem, memsz;
> +	unsigned long elfsz = 0;

[ ... ]

> +	/*
> +	 * Temporarily invalidate the crash image while the
> +	 * elfcorehdr is updated.
> +	 */
> +	xchg(&kexec_crash_image, NULL);
> +	memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);

[Severity: High]
Is it guaranteed that these caches are flushed to the Point of Coherency? On
ARM64, if CONFIG_ARM64_PMEM is disabled, CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
is also disabled, causing memcpy_flushcache() to silently fall back to a
standard memcpy() without flushing caches.

Since the kdump kernel boots initially with the MMU and data caches
disabled, could the kdump kernel read stale or garbage data directly from
physical RAM instead of the newly updated elfcorehdr?

> +	xchg(&kexec_crash_image, image);
> +	pr_debug("updated elfcorehdr\n");
> +
> +out:
> +	vfree(elfbuf);
> +}
> +
> +/**
> + * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
> + * @image: a pointer to kexec_crash_image
> + * @arg: struct memory_notify handler for memory hotplug case and
> + *       NULL for CPU hotplug case.
> + *
> + * Update the kdump image based on the type of hotplug event:
> + * - CPU add and remove: No action is needed.
> + * - Memory add/remove: Update the elfcorehdr to reflect the current memory layout.
> + *
> + * Prepare the new elfcorehdr and replace the existing elfcorehdr.
> + */
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
> +{
> +	switch (image->hp_action) {
> +	case KEXEC_CRASH_HP_ADD_CPU:
> +		fallthrough;
> +	case KEXEC_CRASH_HP_REMOVE_CPU:
> +		if (image->file_mode || image->elfcorehdr_updated)
> +			return;
> +		fallthrough;

[Severity: High]
Can this fall-through to the memory hotplug handler cause a data race? The
commit message states that CPU hotplug events require no action on ARM64.

If the CPU cases fall through to update_crash_elfcorehdr(), it will iterate
over memblock.memory.regions. Since CPU hotplug notifiers might not hold the
required locks to protect against concurrent memory hotplug operations,
could this iteration run concurrently with physical memory operations and
lead to a use-after-free?

> +	case KEXEC_CRASH_HP_ADD_MEMORY:
> +	case KEXEC_CRASH_HP_REMOVE_MEMORY:
> +		update_crash_elfcorehdr(image);
> +		return;
> +	default:
> +		pr_warn_once("Unknown hotplug action\n");
> +	}
> +}
> +#endif /* CONFIG_CRASH_HOTPLUG */

> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index d2985ce623062..781febd0f6db1 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -82,10 +82,11 @@ int load_other_segments(struct kimage *image,
>  			char *cmdline, void *headers,
>  			unsigned long headers_sz)
>  {
> -	struct kexec_buf kbuf = {};
> -	void *dtb = NULL;
>  	unsigned long initrd_load_addr = 0, dtb_len,
>  		      orig_segments = image->nr_segments;
> +	struct kexec_buf kbuf = {};
> +	unsigned long pnum = 0;

[Severity: Low]
Does this cause a compiler warning when CONFIG_CRASH_HOTPLUG is disabled?
The pnum variable is declared unconditionally here but only used inside an
ifdef CONFIG_CRASH_HOTPLUG block, which might trigger a -Wunused-variable
warning.

> +	void *dtb = NULL;
>  	int ret = 0;
>  
>  	kbuf.image = image;

[ ... ]

> @@ -98,6 +99,23 @@ int load_other_segments(struct kimage *image,
>  		kbuf.bufsz = headers_sz;
>  		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>  		kbuf.memsz = headers_sz;
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> +		/*
> +		 * The elfcorehdr segment size accounts for VMCOREINFO, kernel_map
> +		 * maximum CPUs and maximum memory ranges.
> +		 */
> +		if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +			pnum = 2 + num_possible_cpus() + CONFIG_CRASH_MAX_MEMORY_RANGES;
> +		else
> +			pnum += 2 + num_possible_cpus();
> +
> +		if (pnum < (unsigned long)PN_XNUM)
> +			kbuf.memsz = pnum * sizeof(Elf64_Phdr) + sizeof(Elf64_Ehdr);

[Severity: High]
Could this calculation cause an integer underflow later? If
CONFIG_MEMORY_HOTPLUG is disabled, pnum only accounts for CPU headers and
VMCOREINFO. This calculation ignores the existing size of the core headers
(headers_sz) which contains the boot memory ranges.

If headers_sz exceeds kbuf.memsz even after alignment,
kexec_calculate_store_digests() might underflow when calculating the null
padding size:

    nullsz = ksegment->memsz - ksegment->bufsz;

Would it be safer to ensure kbuf.memsz is at least as large as the original
headers_sz?

> +		else
> +			pr_err("number of Phdrs %lu exceeds max\n", pnum);
> +#endif
> +
>  		kbuf.buf_align = SZ_64K; /* largest supported page size */
>  		kbuf.buf_max = ULONG_MAX;
>  		kbuf.top_down = true;
> @@ -108,6 +126,7 @@ int load_other_segments(struct kimage *image,
>  			goto out_err;
>  		}

[Severity: Medium]
Is the elfcorehdr_index missing from this segment initialization? Without
explicitly setting image->elfcorehdr_index here, generic hotplug code in
crash_handle_hotplug_event() falls back to scanning all segments for the
ELFMAG signature and uses the index of the last matching segment.

Because ARM64 loads the initrd after the elfcorehdr, could a user supply a
crafted initrd starting with the ELF magic bytes to hijack the
elfcorehdr_index? Upon a hotplug event, the new elfcorehdr might overwrite
the initrd's physical memory instead.

[Severity: High]
This isn't a bug introduced by this patch, but does freeing headers in the
goto out_err path lead to a use-after-free?

fs/btrfs/send.c:load_other_segments() {
    ...
    ret = kexec_add_buffer(&kbuf);
    if (ret) {
        vfree(headers);
        goto out_err;
    }
    ...
}

The caller image_load() handles failure by adjusting boundaries and retrying
load_other_segments() in a while loop, passing the freed headers pointer
again.

If it succeeds on retry, the kexec segment would store a freed pointer.
Also, since image->elf_headers is assigned to headers prior to this
function, could this cause a double-free on ultimate failure when
arch_kimage_file_post_load_cleanup() executes?

>  		image->elf_load_addr = kbuf.mem;
> +		image->elf_headers_sz = kbuf.memsz;
>  
>  		kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
>  			      image->elf_load_addr, kbuf.bufsz, kbuf.memsz);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260525084932.934910-1-ruanjinjie@huawei.com?part=17

next prev parent reply	other threads:[~2026-05-25 10:17 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25  8:49 [PATCH v14 00/17] arm64/riscv: Add support for crashkernel CMA reservation Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 01/17] riscv: kexec_file: Fix crashk_low_res not exclude bug Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 02/17] powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr() Jinjie Ruan
2026-05-25  9:22   ` sashiko-bot
2026-05-26  3:14     ` Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 03/17] powerpc/crash: sort crash memory ranges before preparing elfcorehdr Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 04/17] arm64: kexec: Fix image->elf_headers memory leak during retry loop Jinjie Ruan
2026-05-25  9:11   ` sashiko-bot
2026-05-25  8:49 ` [PATCH v14 05/17] x86/kexec: Fix potential buffer overflow in prepare_elf_headers() Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 06/17] arm64: kexec_file: " Jinjie Ruan
2026-05-25 10:52   ` sashiko-bot
2026-05-25  8:49 ` [PATCH v14 07/17] riscv: " Jinjie Ruan
2026-05-25  9:54   ` sashiko-bot
2026-05-25  8:49 ` [PATCH v14 08/17] LoongArch: kexec: " Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 09/17] crash: Add crash_prepare_headers() to exclude crash kernel memory Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 10/17] arm64: kexec_file: Use crash_prepare_headers() helper to simplify code Jinjie Ruan
2026-05-25  9:33   ` sashiko-bot
2026-05-25  8:49 ` [PATCH v14 11/17] x86/kexec: " Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 12/17] riscv: kexec_file: " Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 13/17] LoongArch: kexec: " Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 14/17] crash: Use crash_exclude_core_ranges() on powerpc Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 15/17] arm64: kexec: Add support for crashkernel CMA reservation Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 16/17] riscv: " Jinjie Ruan
2026-05-25  8:49 ` [PATCH v14 17/17] arm64/crash: Add crash hotplug support Jinjie Ruan
2026-05-25 10:17   ` sashiko-bot [this message]
2026-05-25 10:14 ` [PATCH v14 00/17] arm64/riscv: Add support for crashkernel CMA reservation Huacai Chen
2026-05-25 11:36   ` Mike Rapoport
2026-05-25 13:17     ` Huacai Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260525101746.9959D1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=robh@kernel.org \
    --cc=ruanjinjie@huawei.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox