LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH 14/16] powercap: intel_rapl: Use sysfs_emit() for cpumask show
From: Rafael J. Wysocki @ 2026-06-01 17:57 UTC
  To: Yury Norov
  Cc: Andrew Morton, Rasmus Villemoes, Russell King, Frank Li,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Chanwoo Choi, MyungJoo Ham, Kyungmin Park,
	Heiko Stuebner, Lorenzo Pieralisi, Xu Yilun, Tom Rix,
	Moritz Fischer, Yicong Yang, Jonathan Cameron, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Dan Williams, Vishal Verma,
	Dave Jiang, Ira Weiny, Bjorn Helgaas, Shuai Xue, Will Deacon,
	Jiucheng Xu, Neil Armstrong, Kevin Hilman, Jerome Brunet,
	Martin Blumenstingl, Robin Murphy, Jing Zhang, Xu Yang,
	Linu Cherian, Gowthami Thiagarajan, Ji Sheng Teoh, Khuong Dinh,
	Daniel Lezcano, Zhang Rui, Lukasz Luba, Yury Norov, Kees Cook,
	Thomas Weißschuh, Aboorva Devarajan, Ritesh Harjani (IBM),
	Ilkka Koskinen, Besar Wicaksono, Ma Ke, Chengwen Feng,
	linux-arm-kernel, imx, linux-kernel, linuxppc-dev,
	linux-perf-users, linux-acpi, driver-core, linux-pm,
	linux-rockchip, linux-fpga, linux-rdma, nvdimm, linux-pci,
	linux-amlogic, linux-cxl, linux-arm-msm
In-Reply-To: <20260528183625.870813-15-ynorov@nvidia.com>

On Thu, May 28, 2026 at 8:37 PM Yury Norov <ynorov@nvidia.com> wrote:
>
> cpumask_show() is a sysfs show callback. Use sysfs_emit() and
> cpumask_pr_args() to emit the mask.
>
> This prepares for removing cpumap_print_to_pagebuf().
>
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---
>  drivers/powercap/intel_rapl_common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> index a8dd02dff0a0..b38d4a7799a8 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1441,7 +1441,7 @@ static ssize_t cpumask_show(struct device *dev,
>         }
>         cpus_read_unlock();
>
> -       ret = cpumap_print_to_pagebuf(true, buf, cpu_mask);
> +       ret = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_mask));
>
>         free_cpumask_var(cpu_mask);
>
> --

Applied (with adjusted subject and changelog) as 7.2 material, thanks!
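
For readers unfamiliar with the format specifier in the hunk above: "%*pbl"
prints a bitmap as a comma-separated list of bit ranges, and
cpumask_pr_args() supplies the width and bitmap pointer arguments for it.
Below is a minimal userspace sketch of that list format, not kernel code.
format_cpulist() and its 64-bit mask parameter are names assumed here
purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model of the list format: write the set bits of mask (only the
 * low nbits are considered, nbits <= 64) as comma-separated ranges followed
 * by a newline. Returns the number of characters written, or -1 if the
 * output does not fit in buf.
 */
static int format_cpulist(char *buf, size_t len, uint64_t mask,
			  unsigned int nbits)
{
	const char *sep = "";
	unsigned int cpu = 0;
	size_t pos = 0;
	int n;

	if (len == 0 || nbits > 64)
		return -1;

	while (cpu < nbits) {
		unsigned int start, end;

		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Extend the range while the next bit is also set. */
		start = cpu;
		while (cpu + 1 < nbits && (mask & (UINT64_C(1) << (cpu + 1))))
			cpu++;
		end = cpu++;

		if (start == end)
			n = snprintf(buf + pos, len - pos, "%s%u", sep, start);
		else
			n = snprintf(buf + pos, len - pos, "%s%u-%u", sep,
				     start, end);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
		sep = ",";
	}

	n = snprintf(buf + pos, len - pos, "\n");
	if (n < 0 || (size_t)n >= len - pos)
		return -1;
	pos += (size_t)n;

	return (int)pos;
}
```

With this model, a mask of 0xD0F over 16 bits (bits 0-3, 8, 10 and 11 set)
is rendered as "0-3,8,10-11" plus a trailing newline.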



* Re: [PATCH 05/16] ACPI: pad: Use sysfs_emit() for idlecpus show
From: Rafael J. Wysocki @ 2026-06-01 17:45 UTC
  To: Yury Norov
  Cc: Andrew Morton, Rasmus Villemoes, Russell King, Frank Li,
	Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	Danilo Krummrich, Chanwoo Choi, MyungJoo Ham, Kyungmin Park,
	Heiko Stuebner, Lorenzo Pieralisi, Xu Yilun, Tom Rix,
	Moritz Fischer, Yicong Yang, Jonathan Cameron, Dennis Dalessandro,
	Jason Gunthorpe, Leon Romanovsky, Dan Williams, Vishal Verma,
	Dave Jiang, Ira Weiny, Bjorn Helgaas, Shuai Xue, Will Deacon,
	Jiucheng Xu, Neil Armstrong, Kevin Hilman, Jerome Brunet,
	Martin Blumenstingl, Robin Murphy, Jing Zhang, Xu Yang,
	Linu Cherian, Gowthami Thiagarajan, Ji Sheng Teoh, Khuong Dinh,
	Daniel Lezcano, Zhang Rui, Lukasz Luba, Yury Norov, Kees Cook,
	Thomas Weißschuh, Aboorva Devarajan, Ritesh Harjani (IBM),
	Ilkka Koskinen, Besar Wicaksono, Ma Ke, Chengwen Feng,
	linux-arm-kernel, imx, linux-kernel, linuxppc-dev,
	linux-perf-users, linux-acpi, driver-core, linux-pm,
	linux-rockchip, linux-fpga, linux-rdma, nvdimm, linux-pci,
	linux-amlogic, linux-cxl, linux-arm-msm
In-Reply-To: <20260528183625.870813-6-ynorov@nvidia.com>

On Thu, May 28, 2026 at 8:36 PM Yury Norov <ynorov@nvidia.com> wrote:
>
> idlecpus_show() is a sysfs show callback. Use sysfs_emit() and
> cpumask_pr_args() to emit the mask.
>
> This prepares for removing cpumap_print_to_pagebuf().
>
> Signed-off-by: Yury Norov <ynorov@nvidia.com>
> ---
>  drivers/acpi/acpi_pad.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index ec94b09bb747..04d61a6cc95f 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -334,8 +334,8 @@ static ssize_t idlecpus_store(struct device *dev,
>  static ssize_t idlecpus_show(struct device *dev,
>         struct device_attribute *attr, char *buf)
>  {
> -       return cpumap_print_to_pagebuf(false, buf,
> -                                      to_cpumask(pad_busy_cpus_bits));
> +       return sysfs_emit(buf, "%*pb\n",
> +                         cpumask_pr_args(to_cpumask(pad_busy_cpus_bits)));
>  }
>
>  static DEVICE_ATTR_RW(idlecpus);
> --

Applied (with a tweaked subject) as 7.2 material, thanks!
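
As a side note on the hunk above: as far as I understand the kernel's bitmap
formatting, "%*pb" prints the mask in hex as comma-separated 32-bit chunks,
most significant chunk first, with the top chunk only as wide as the
leftover bits require. Below is a minimal userspace sketch of that layout,
not kernel code. format_cpumask_hex() and its 64-bit mask parameter are
names assumed here purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace model of the hex bitmap format: write the low nbits of mask
 * (nbits <= 64) as zero-padded hex chunks of up to 32 bits each, separated
 * by commas and followed by a newline. Returns the number of characters
 * written, or -1 if the output does not fit in buf.
 */
static int format_cpumask_hex(char *buf, size_t len, uint64_t mask,
			      unsigned int nbits)
{
	/* The most significant chunk only covers the leftover bits. */
	unsigned int chunksz = (nbits % 32) ? (nbits % 32) : 32;
	const char *sep = "";
	size_t pos = 0;
	int i, n;

	if (len == 0 || nbits > 64)
		return -1;

	/* Start at the bit offset of the top chunk and walk downwards. */
	for (i = (int)((nbits + 31) / 32 * 32) - 32; i >= 0; i -= 32) {
		uint64_t chunkmask = (UINT64_C(1) << chunksz) - 1;
		unsigned int val = (unsigned int)((mask >> i) & chunkmask);
		int width = (int)((chunksz + 3) / 4);	/* hex digits */

		n = snprintf(buf + pos, len - pos, "%s%0*x", sep, width, val);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;
		pos += (size_t)n;
		sep = ",";
		chunksz = 32;	/* every lower chunk is a full 32 bits */
	}

	n = snprintf(buf + pos, len - pos, "\n");
	if (n < 0 || (size_t)n >= len - pos)
		return -1;
	pos += (size_t)n;

	return (int)pos;
}
```

With this model, a 40-bit mask with bits 0, 1 and 32 set is rendered as
"01,00000003" plus a trailing newline.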



* Re: [axboe:for-7.0/block 19/138] block/blk-crypto-fallback.c:154:17: sparse: sparse: cast from restricted blk_status_t
From: Christoph Hellwig @ 2026-06-01 15:29 UTC
  To: kernel test robot
  Cc: Christoph Hellwig, oe-kbuild-all, Jens Axboe, Eric Biggers,
	Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy (CS GROUP), linuxppc-dev
In-Reply-To: <202606012339.rZAl6Xq8-lkp@intel.com>

I think powerpc really needs to pick up the tricks from other architectures
to make cmpxchg work on __bitwise fields.

On Mon, Jun 01, 2026 at 11:26:16PM +0800, kernel test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git for-7.0/block
> head:   72f4d6fca699a1e35b39c5e5dacac2926d254135
> commit: b37fbce460ad60b0c4449c1c7566cf24f3016713 [19/138] blk-crypto: optimize bio splitting in blk_crypto_fallback_encrypt_bio
> config: powerpc64-randconfig-r111-20260601 (https://download.01.org/0day-ci/archive/20260601/202606012339.rZAl6Xq8-lkp@intel.com/config)
> compiler: powerpc64-linux-gcc (GCC) 8.5.0
> sparse: v0.6.5-rc1
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260601/202606012339.rZAl6Xq8-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202606012339.rZAl6Xq8-lkp@intel.com/
> 
> sparse warnings: (new ones prefixed by >>)
> >> block/blk-crypto-fallback.c:154:17: sparse: sparse: cast from restricted blk_status_t
> >> block/blk-crypto-fallback.c:154:17: sparse: sparse: cast from restricted blk_status_t
>    block/blk-crypto-fallback.c:154:17: sparse: sparse: cast to restricted blk_status_t
> 
> vim +154 block/blk-crypto-fallback.c
> 
>    143	
>    144	static void blk_crypto_fallback_encrypt_endio(struct bio *enc_bio)
>    145	{
>    146		struct bio *src_bio = enc_bio->bi_private;
>    147		int i;
>    148	
>    149		for (i = 0; i < enc_bio->bi_vcnt; i++)
>    150			mempool_free(enc_bio->bi_io_vec[i].bv_page,
>    151				     blk_crypto_bounce_page_pool);
>    152	
>    153		if (enc_bio->bi_status)
>  > 154			cmpxchg(&src_bio->bi_status, 0, enc_bio->bi_status);
>    155	
>    156		bio_put(enc_bio);
>    157		bio_endio(src_bio);
>    158	}
>    159	
> 
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
---end quoted text---
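
For context, the flagged line 154 uses cmpxchg() so that only the first
non-zero status from an encrypted bio is stored in the source bio. Below is
a minimal userspace sketch of that "record the first error" pattern using
C11 atomics. It is not the kernel implementation, and status_t,
struct parent_io and record_first_error() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for blk_status_t; 0 means success. */
typedef unsigned char status_t;

/* Hypothetical stand-in for the source bio that collects the result. */
struct parent_io {
	_Atomic status_t status;
};

/*
 * Store err in parent->status only if no earlier error was recorded.
 * The compare-and-exchange succeeds only while the stored value is still
 * 0, so a later error never overwrites the first one.
 */
static void record_first_error(struct parent_io *parent, status_t err)
{
	status_t expected = 0;

	if (err)
		atomic_compare_exchange_strong(&parent->status, &expected,
					       err);
}
```

If two completions report errors 10 and then 5, the parent keeps 10.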



* Re: [PATCH v15 00/23] arm64/riscv: Add support for crashkernel CMA reservation
From: Baoquan He @ 2026-06-01 13:40 UTC
  To: Jinjie Ruan
  Cc: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Hi Jinjie,

On 06/01/26 at 05:47pm, Jinjie Ruan wrote:
...snip... 
> Changes in v15:
> - Unify the subject prefix formats as Huacai suggested.
> - Fix powerpc pre-existing NULL pointer dereference [Sashiko [1]]
> - Fix powerpc pre-existing __merge_memory_ranges() memory range
>   truncation [Sashiko [1]].
> - Fix pre-existing arm64 CMA page leaks [Sashiko[2]].
> - Fix pre-existing crash_load_dm_crypt_keys() Use-After-Free and
>   Double Free issue [Sashiko[3]].
> - Fix vfree(headers) and uninitialized variables issue
>   and simplify the fix [Sashiko[2]].
> - As walk_system_ram_res() and for_each_mem_range() use different
>   lock, unify and simplify the fix of TOCTOU buffer overflow via memory
>   region padding [Sashiko[4]].
> - Fix the arm64 crash dump issues in Sashiko[5].
> - Link to v14: https://lore.kernel.org/all/20260525084932.934910-1-ruanjinjie@huawei.com/

Do these fixes have anything to do with the main target of this patch
series as stated in the cover letter, "arm64/riscv: Add support for
crashkernel CMA"? The number of patches keeps growing with each new
version, so I am wondering whether adding crashkernel CMA support on
arm64/riscv actually relies on these fix patches.

If it does not rely on them, could you split them out into separate
patchsets, one per purpose?

Thanks
Baoquan

> 
> [1]: https://lore.kernel.org/all/20260525092207.96B9D1F000E9@smtp.kernel.org/
> [2]: https://lore.kernel.org/all/20260525091149.1A1E01F00A3D@smtp.kernel.org/
> [3]: https://lore.kernel.org/all/20260525105227.3C2421F000E9@smtp.kernel.org/
> [4]: https://lore.kernel.org/all/20260525095447.944E11F000E9@smtp.kernel.org/
> [5]: https://lore.kernel.org/all/20260525101746.9959D1F000E9@smtp.kernel.org/
> 
> Changes in v14:
> - Fix image->elf_headers memory leak during retry loop for arm64 as Sashiko
>   AI code review pointed out.
> - Solve the hotplug notifier arch_crash_handle_hotplug_event() AA
>   self-deadlock problem as Sashiko AI code review pointed out.
> - Fix the TOCTOU issue in prepare_elf_headers() by get_online_mems().
> - -ENOMEM -> -EAGAIN as Breno suggested.
> - Add support for arm64 crash hotplug.
> - Link to v13: https://lore.kernel.org/all/20260511030454.1730881-1-ruanjinjie@huawei.com/
> 
> Changes in v13:
> - Rebased on v7.1-rc1.
> - Update the commit message.
> - Add Reviewed-by.
> - Link to v12: https://lore.kernel.org/all/20260402072701.628293-1-ruanjinjie@huawei.com/
> 
> Changes in v12:
> - Remove the unused "nr_mem_ranges" for x86.
> - Add "Fix crashk_low_res not exclude bug" test log.
> - Provide a separate patch for each architecture for using
>   crash_prepare_headers(), which will make the review more convenient.
> - Add Reviewed-by and Tested-by.
> - Link to v11: https://lore.kernel.org/all/20260328074013.3589544-1-ruanjinjie@huawei.com/
> 
> Changes in v11:
> - Avoid silently drop crash memory if the crash kernel is built without
>   CONFIG_CMA.
> - Remove unnecessary "cmem->nr_ranges = 0" for arch_crash_populate_cmem()
>   as we use kvzalloc().
> - Provide a separate patch for each architecture to fix the existing
>   buffer overflow issue.
> - Add Acked-bys for arm64.
> 
> Changes in v10:
> - Fix crashk_low_res not excluded bug in the existing
>   RISC-V code.
> - Fix an existing memory leak issue in the existing PowerPC code.
> - Fix the ordering issue of adding CMA ranges to
>   "linux,usable-memory-range".
> - Fix an existing concurrency issue. A Concurrent memory hotplug may occur
>   between reading memblock and attempting to fill cmem during kexec_load()
>   for almost all existing architectures.
> - Link to v9: https://lore.kernel.org/all/20260323072745.2481719-1-ruanjinjie@huawei.com/
> 
> Changes in v9:
> - Collect Reviewed-by and Acked-by, and prepare for Sashiko AI review.
> - Link to v8: https://lore.kernel.org/all/20260302035315.3892241-1-ruanjinjie@huawei.com/
> 
> Changes in v8:
> - Fix the build issues reported by kernel test robot and Sourabh.
> - Link to v7: https://lore.kernel.org/all/20260226130437.1867658-1-ruanjinjie@huawei.com/
> 
> Changes in v7:
> - Correct the inclusion of CMA-reserved ranges for kdump kernel in of/kexec
>   for arm64 and riscv.
> - Add Acked-by.
> - Link to v6: https://lore.kernel.org/all/20260224085342.387996-1-ruanjinjie@huawei.com/
> 
> Changes in v6:
> - Update the crash core exclude code as Mike suggested.
> - Rebased on v7.0-rc1.
> - Add acked-by.
> - Link to v5: https://lore.kernel.org/all/20260212101001.343158-1-ruanjinjie@huawei.com/
> 
> Jinjie Ruan (22):
>   riscv: kexec_file: Fix crashk_low_res not exclude bug
>   powerpc/crash: Fix possible memory leak in update_crash_elfcorehdr()
>   powerpc/kexec_file: Fix NULL pointer dereference in
>     kexec_extra_fdt_size_ppc64()
>   powerpc/kexec_file: Fix memory range truncation in
>     __merge_memory_ranges()
>   kexec: Extract kexec_free_segment_cma() from kimage_free_cma()
>   arm64: kexec_file: Fix CMA page leaks during segment placement retry
>     loops
>   arm64: kexec_file: Fix image->elf_headers memory leak during retry
>     loop
>   kexec: Fix UAF and Double Free in crash_load_dm_crypt_keys()
>   crash_core: Introduce CRASH_HOTPLUG_SAFETY_PADDING for memory hotplug
>     safety
>   x86: kexec_file: Fix TOCTOU buffer overflow via memory region padding
>   arm64: kexec_file: Fix TOCTOU buffer overflow via memory region
>     padding
>   riscv: kexec_file: Fix TOCTOU buffer overflow via memory region
>     padding
>   LoongArch: kexec_file: Fix TOCTOU buffer overflow via memory region
>     padding
>   crash: Add crash_prepare_headers() to exclude crash kernel memory
>   arm64: kexec_file: Use crash_prepare_headers() helper to simplify code
>   x86: kexec_file: Use crash_prepare_headers() helper to simplify code
>   riscv: kexec_file: Use crash_prepare_headers() helper to simplify code
>   LoongArch: kexec_file: Use crash_prepare_headers() helper to simplify
>     code
>   powerpc/kexec_file: Use crash_exclude_core_ranges() helper
>   arm64: kexec_file: Add support for crashkernel CMA reservation
>   riscv: kexec_file: Add support for crashkernel CMA reservation
>   arm64: crash: Add crash hotplug support
> 
> Sourabh Jain (1):
>   powerpc/crash: sort crash memory ranges before preparing elfcorehdr
> 
>  .../admin-guide/kernel-parameters.txt         |  16 +-
>  arch/arm64/Kconfig                            |   3 +
>  arch/arm64/include/asm/kexec.h                |  13 ++
>  arch/arm64/kernel/Makefile                    |   2 +-
>  arch/arm64/kernel/crash.c                     | 152 ++++++++++++++++++
>  arch/arm64/kernel/kexec_image.c               |  34 ++++
>  arch/arm64/kernel/machine_kexec_file.c        |  78 ++-------
>  arch/arm64/mm/init.c                          |   5 +-
>  arch/loongarch/kernel/machine_kexec_file.c    |  44 ++---
>  arch/powerpc/include/asm/kexec_ranges.h       |   1 -
>  arch/powerpc/kexec/crash.c                    |   7 +-
>  arch/powerpc/kexec/file_load_64.c             |   3 +
>  arch/powerpc/kexec/ranges.c                   | 113 ++-----------
>  arch/riscv/kernel/machine_kexec_file.c        |  43 ++---
>  arch/riscv/mm/init.c                          |   5 +-
>  arch/x86/kernel/crash.c                       |  92 ++---------
>  drivers/of/fdt.c                              |   9 +-
>  drivers/of/kexec.c                            |   9 ++
>  include/linux/crash_core.h                    |  15 ++
>  include/linux/crash_reserve.h                 |   4 +-
>  include/linux/kexec.h                         |   2 +
>  kernel/crash_core.c                           |  89 +++++++++-
>  kernel/crash_dump_dm_crypt.c                  |   4 +-
>  kernel/kexec_core.c                           |  25 +--
>  24 files changed, 430 insertions(+), 338 deletions(-)
>  create mode 100644 arch/arm64/kernel/crash.c
> 
> -- 
> 2.34.1
> 



* Re: [PATCH] powerpc/prom: Remove redundant early_init_dt_scan_root() call
From: Shivang Upadhyay @ 2026-06-01 13:40 UTC
  To: Sourabh Jain
  Cc: linuxppc-dev, Aditya Gupta, Christophe Leroy (CS GROUP),
	Hari Bathini, Madhavan Srinivasan, Mahesh Salgaonkar,
	Michael Ellerman, Nicholas Piggin, Ritesh Harjani (IBM),
	Venkat Rao Bagalkote, linux-kernel
In-Reply-To: <20260418091250.134111-1-sourabhjain@linux.ibm.com>

On Sat, Apr 18, 2026 at 02:42:50PM +0530, Sourabh Jain wrote:
> Commit 554b66233623 ("of/fdt: Scan the root node properties earlier")
> moved the invocation of early_init_dt_scan_root() into
> early_init_dt_verify().
> 
> early_init_devtree() already calls early_init_dt_verify(), so the root
> node properties are parsed before reaching the explicit call in this
> function.
> 
> Keeping the call here results in scanning the root node twice. Remove
> the redundant call and rely on the invocation from
> early_init_dt_verify().
> 
> This change keeps the behavior the same and removes an unnecessary
> duplicate call.
> 
> Cc: Aditya Gupta <adityag@linux.ibm.com>
> Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
> Cc: Hari Bathini <hbathini@linux.ibm.com>
> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
> Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> ---
>  arch/powerpc/kernel/prom.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 9ed9dde7d231..d218c8cc1f73 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -824,7 +824,6 @@ void __init early_init_devtree(void *params)
>  	fadump_append_bootargs();
>  
>  	/* Scan memory nodes and rebuild MEMBLOCKs */
> -	early_init_dt_scan_root();
>  	early_init_dt_scan_memory_ppc();
>  
>  	/*
> -- 
> 2.52.0
> 

Hi Sourabh,

The patch looks good to me. I also ran a quick boot test.
Feel free to add:

Tested-by: Shivang Upadhyay <shivangu@linux.ibm.com>

~Shivang.



* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
From: Venkat @ 2026-06-01 13:33 UTC
  To: Shrikanth Hegde
  Cc: Madhavan Srinivasan, Mukesh Kumar Chaurasiya, Ritesh Harjani,
	linuxppc-dev, LKML, Srikar Dronamraju, Peter Zijlstra
In-Reply-To: <2f8c3d75-de2c-48bf-bd05-46b816d55c69@linux.ibm.com>



> On 1 Jun 2026, at 2:46 PM, Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
> 
> Hi Venkat. Thanks for the report.
> 
> + mukesh, ritesh
> 
> On 6/1/26 12:11 PM, Venkat Rao Bagalkote wrote:
>> Greetings!!!
>> I hit a kernel BUG on a linux-next kernel running on ppc64le (Power11 LPAR). The issue was observed once in CI (Avocado tests) and I haven’t been able to reproduce it reliably yet.
> 
> Can you run with lockdep and see if you can hit it?

I did run with lockdep and didn’t hit this issue. I will try a few more times, and if I hit it, I will report back here.

Meanwhile, I hit another boot warning with lockdep enabled, which is reported here [1].

[1]: https://lore.kernel.org/all/b44aefc5-e066-478b-8d34-50d2d0deab6b@linux.ibm.com/

Regards,
Venkat.
> 
>> Architecture: ppc64le (Power11, pSeries)
>> Kernel: 7.1.0-rc5-next-20260529
>> Config: PREEMPT(lazy)
>> CPUs: large system (NR_CPUS=8192)
> 
> This is with GENERIC_ENTRY.
> 
>> So far, I have not reproduced the crash, but I am trying to stress similar conditions using:
>> parallel read workloads (fio / dd)
>> memory pressure
>> Traces:
>>  (5/8) /home/upstreamci/avocado-fvt-wrapper/tests/avocado-misc-tests/cpu/ppc64_cpu_test.py:PPC64Test.test_smt_loop;run-run_type-upstream-9cfe: STARTED
>> [ 1885.176400] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.296164] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.386120] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1885.556134] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.576119] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1886.806060] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.026051] crash hp: kexec_trylock() failed, kdump image may be inaccurate
>> [ 1887.456075] ------------[ cut here ]------------
>> [ 1887.456101] kernel BUG at kernel/sched/core.c:7512!
>> [ 1887.456107] Oops: Exception in kernel mode, sig: 5 [#1]
>> [ 1887.456111] LE PAGE_SIZE=4K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
>> [ 1887.456116] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding tls ip_set rfkill nf_tables fsdev_dax kmem device_dax pseries_rng vmx_crypto dax_pmem fuse ext4 crc16 mbcache jbd2 sd_mod nd_pmem papr_scm sg libnvdimm ibmvscsi ibmveth scsi_transport_srp pseries_wdt
>> [ 1887.456173] CPU: 28 UID: 0 PID: 85305 Comm: kexec Not tainted 7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
>> [ 1887.456180] Hardware name: IBM,9080-HEX Power11 (architected) 0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
>> [ 1887.456185] NIP:  c0000000013a8e8c LR: c0000000003483bc CTR: 0000000000000000
>> [ 1887.456190] REGS: c000000069f03070 TRAP: 0700   Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456195] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24428222  XER: 0000005a
>> [ 1887.456208] CFAR: c0000000003483b8 IRQMASK: 0
>> [ 1887.456208] GPR00: c0000000003483bc c000000069f03330 c000000001a82100 c000000069f033e0
>> [ 1887.456208] GPR04: 0000000000000000 0000000000000001 0000000000000001 c000000006dd3b00
>> [ 1887.456208] GPR08: ffffffffffffff00 0000000000000001 0000000000000000 0000000024428220
>> [ 1887.456208] GPR12: 0000000000000300 c000000effdbef00 0000000000000000 0000000000000000
>> [ 1887.456208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456208] GPR28: 0000000000000000 0000000000000000 0000000000000000 c000000069f033e0
>> [ 1887.456265] NIP [c0000000013a8e8c] preempt_schedule_irq+0x44/0x118
>> [ 1887.456274] LR [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456282] Call Trace:
>> [ 1887.456284] [c000000069f03360] [c0000000003483bc] dynamic_irqentry_exit_cond_resched+0x40/0x1a4
>> [ 1887.456291] [c000000069f03380] [c00000000014f3bc] do_page_fault+0xc0/0x104
>> [ 1887.456298] [c000000069f033b0] [c000000000008be0] data_access_common_virt+0x210/0x220
>> [ 1887.456306] ---- interrupt: 300 at __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456313] NIP:  c00000000017fc38 LR: c000000000aaa684 CTR: 0000000000000000
>> [ 1887.456317] REGS: c000000069f033e0 TRAP: 0300   Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456322] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24428220  XER: 2004005a
>> [ 1887.456334] CFAR: c00000000017fc34 DAR: 00003fff879a8000 DSISR: 42000000 IRQMASK: 0
>> [ 1887.456334] GPR00: 0000000000000000 c000000069f036a0 c000000001a82100 00003fff879a8000
>> [ 1887.456334] GPR04: c0000000bb314ff0 0000000000001000 69f0000606480600 0200c4080368f028
>> [ 1887.456334] GPR08: 09036af00005d9c4 0600000200e80803 0000000000000000 0000000000000030
>> [ 1887.456334] GPR12: 0000000000000040 c000000effdbef00 0000000000000000 000000000000000e
>> [ 1887.456334] GPR16: 0000000004a00000 000000000000001f c000000069f038a0 c00000006e73e500
>> [ 1887.456334] GPR20: c00000006f0ff6a8 0000000000000000 c00000006f0ff540 0000000000000001
>> [ 1887.456334] GPR24: 000000001816ce60 c0000000bb314000 c000000002e48730 c000000069f03a30
>> [ 1887.456334] GPR28: c0000000bb314000 00003fff879a7010 0000000000000010 0000000000001000
>> [ 1887.456393] NIP [c00000000017fc38] __copy_tofrom_user_base+0xac/0x5a4
>> [ 1887.456399] LR [c000000000aaa684] raw_copy_to_user+0x12c/0x314
>> [ 1887.456405] ---- interrupt: 300
>> [ 1887.456408] [c000000069f036a0] [c000000000aaa5f4] raw_copy_to_user+0x9c/0x314 (unreliable)
>> [ 1887.456416] [c000000069f036e0] [c000000000aacd08] _copy_to_iter+0xe4/0x79c
>> [ 1887.456423] [c000000069f037a0] [c000000000ab01ec] copy_page_to_iter+0xd4/0x1a4
>> [ 1887.456429] [c000000069f037f0] [c0000000005ddc34] filemap_read+0x420/0x4f0
>> [ 1887.456436] [c000000069f039c0] [c0080000043443e0] ext4_file_read_iter+0x78/0x31c [ext4]
>> [ 1887.456517] [c000000069f03a10] [c000000000796498] vfs_read+0x2a8/0x3c8
>> [ 1887.456524] [c000000069f03ac0] [c00000000079726c] ksys_read+0x88/0x140
>> [ 1887.456530] [c000000069f03b10] [c000000000032f98] system_call_exception+0x198/0x4e0
>> [ 1887.456537] [c000000069f03e30] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>> [ 1887.456544] ---- interrupt: 3000 at 0x3fff9b133cf4
>> [ 1887.456549] NIP:  00003fff9b133cf4 LR: 00003fff9b133cf4 CTR: 0000000000000000
>> [ 1887.456554] REGS: c000000069f03e60 TRAP: 3000   Not tainted (7.1.0-rc5-next-20260529)
>> [ 1887.456558] MSR:  800000000000f033 <SF,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44424402  XER: 00000000
>> [ 1887.456572] IRQMASK: 0
>> [ 1887.456572] GPR00: 0000000000000003 00003fffe5fb4190 0000000105087f00 0000000000000003
>> [ 1887.456572] GPR04: 00003fff82e93010 000000001816ce60 0000000000000022 0000000000000000
>> [ 1887.456572] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 1887.456572] GPR12: 0000000000000000 00003fff9b4cd860 000000010507f588 0000000000000000
>> [ 1887.456572] GPR16: ffffffffffffffff 0000000000000000 0000000000000006 0000000000000000
>> [ 1887.456572] GPR20: 0000000000000001 00003fff9b23039c 00003fff9b2303a0 00003fffe5fb5ee7
>> [ 1887.456572] GPR24: 0000000000000000 0000000000000000 00003fffe5fb5ee7 00003fffe5fb42d0
>> [ 1887.456572] GPR28: 0000000000000003 00003fff82e93010 000000001816ce60 0000000000000000
>> [ 1887.456626] NIP [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456630] LR [00003fff9b133cf4] 0x3fff9b133cf4
>> [ 1887.456634] ---- interrupt: 3000
>> [ 1887.456637] Code: fbe1fff8 e92d0128 f8010010 f821ffd1 81490000 39200001 2c0a0000 40820014 892d0152 552907fe 7d290034 5529d97e <0b090000> 60000000 3bc00000 ebed0128
>> [ 1887.456657] ---[ end trace 0000000000000000 ]---
>> If you happen to fix this, please add below tag.
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> 
> Ritesh, Mukesh, is the scenario below possible?
> 
> do_page_fault() seems to enable IRQs in the interrupt handler.
> Is that expected? If so, one might see:
> 
> -- do_page_fault (enter kernel mode)
>   -- enables interrupts
>   -- gets an interrupt, which sets need_resched
>      -- irqentry_exit sees it is kernel mode, just checks the preempt
>         count, and calls preempt_schedule_irq, which checks both
>         preempt_count and !irqs_disabled. Hence the panic?
> 
> Should do_page_fault do preempt_disable when it enables the interrupts?
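
To make the quoted scenario concrete: the check described there trips when
preempt_schedule_irq is entered with a non-zero preempt count or with
interrupts enabled. Below is a small userspace model of that condition
only. It does not claim to show which of the two states actually triggered
this BUG, and struct cpu_state and entry_check_would_bug() are hypothetical
names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the CPU state that the check looks at. */
struct cpu_state {
	int preempt_count;	/* non-zero: preemption is disabled */
	bool irqs_disabled;	/* true: interrupts are masked */
};

/*
 * Returns true when the entry condition described above would trip:
 * either the preempt count is non-zero or interrupts are not disabled.
 */
static bool entry_check_would_bug(const struct cpu_state *s)
{
	return s->preempt_count != 0 || !s->irqs_disabled;
}
```

In this model, the only state that passes is a preempt count of zero with
interrupts disabled.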




* [next20260529]powerpc/pseries: multiple WARNs: RCU not watching for tracepoint and lockdep_hardirq_context() during boot and cpuidle (Power11)
From: Venkat Rao Bagalkote @ 2026-06-01 13:27 UTC
  To: Peter Zijlstra, Shrikanth Hegde, Srikar Dronamraju,
	Mukesh Kumar Chaurasiya (IBM), Madhavan Srinivasan
  Cc: LKML, linuxppc-dev, Paul E. McKenney, Ingo Molnar

Greetings!!!


I am observing multiple reproducible WARN_ONs related to RCU and lockdep
IRQ state tracking on a Power11 pSeries system running the latest
linux-next kernel.


Environment:
   Architecture: ppc64le (Power11, pSeries LPAR)
   Kernel: 7.1.0-rc5-next-20260529
   Config: PREEMPT(lazy)
           CONFIG_LOCKDEP=y
           CONFIG_PROVE_LOCKING=y


Warning 1:

[    0.008277] ------------[ cut here ]------------
[    0.008285] RCU not watching for tracepoint
[    0.008294] WARNING: ./include/trace/events/preemptirq.h:36 at 
trace_hardirqs_off+0x16c/0x1a0, CPU#1: swapper/1/0
[    0.008306] Modules linked in:
[    0.008316] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 
7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
[    0.008322] Hardware name: IBM,9080-HEX Power11 (architected) 
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[    0.008327] NIP:  c0000000004bb2a8 LR: c0000000004bb2a4 CTR: 
0000000000000000
[    0.008331] REGS: c0000000049cb690 TRAP: 0700   Not tainted 
(7.1.0-rc5-next-20260529)
[    0.008336] MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 44000208  
XER: 00000005
[    0.008348] CFAR: c00000000022e9d4 IRQMASK: 3
[    0.008348] GPR00: c0000000004bb2a4 c0000000049cb950 c000000001ccf100 
000000000000001f
[    0.008348] GPR04: 3fffffffffff7fff c0000000049cb740 c0000000049cb738 
0000000000000000
[    0.008348] GPR08: c0000000029d1230 0000000000000001 c0000000049e8000 
0000000000000003
[    0.008348] GPR12: c000000002d514e0 c000000effffeb00 0000000000000000 
0000000000000000
[    0.008348] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.008348] GPR20: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.008348] GPR24: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.008348] GPR28: 0000000000000000 0000000000000001 c000000002414988 
c00000000005be40
[    0.008403] NIP [c0000000004bb2a8] trace_hardirqs_off+0x16c/0x1a0
[    0.008408] LR [c0000000004bb2a4] trace_hardirqs_off+0x168/0x1a0
[    0.008413] Call Trace:
[    0.008416] [c0000000049cb950] [c0000000004bb2a4] 
trace_hardirqs_off+0x168/0x1a0 (unreliable)
[    0.008423] [c0000000049cb9d0] [c00000000005be40] 
arch_interrupt_enter_prepare+0xa0/0x19c
[    0.008430] [c0000000049cba00] [c00000000005bf78] 
doorbell_exception+0x3c/0x4c4
[    0.008436] [c0000000049cbaa0] [c00000000000a2fc] 
doorbell_super_common_virt+0x28c/0x290
[    0.008443] ---- interrupt: a00 at plpar_hcall_norets_notrace+0x18/0x2c
[    0.008449] NIP:  c0000000001b4fc8 LR: c0000000001bcea0 CTR: 
0000000000000000
[    0.008453] REGS: c0000000049cbad0 TRAP: 0a00   Not tainted 
(7.1.0-rc5-next-20260529)
[    0.008457] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 
24000008  XER: 00000000
[    0.008469] CFAR: 0000000000000000 IRQMASK: 0
[    0.008469] GPR00: 0000000000000000 c0000000049cbd90 c000000001ccf100 
0000000000000000
[    0.008469] GPR04: 0000000000000000 8004000038407c10 0000000000000000 
0000000000000003
[    0.008469] GPR08: 0000000000000001 0000000000000000 0000000000000090 
0000000000000001
[    0.008469] GPR12: 8004000038407c00 c000000effffeb00 0000000000000000 
000000002ef01820
[    0.008469] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.008469] GPR20: 0000000000000000 0000000000000000 0000000000000000 
0000000000000001
[    0.008469] GPR24: 0000000000000001 000000000000dedc c000000003086150 
0000000000000001
[    0.008469] GPR28: c0000000049e8000 c000000002241548 c000000002241550 
c000000002241548
[    0.008523] NIP [c0000000001b4fc8] plpar_hcall_norets_notrace+0x18/0x2c
[    0.008528] LR [c0000000001bcea0] pseries_lpar_idle.part.0+0x74/0x160
[    0.008533] ---- interrupt: a00
[    0.008536] [c0000000049cbd90] [c0000000049cbe30] 0xc0000000049cbe30 
(unreliable)
[    0.008544] [c0000000049cbe10] [c000000000022c5c] 
arch_cpu_idle+0x4c/0x120
[    0.008551] [c0000000049cbe30] [c0000000015afe70] 
default_idle_call+0x154/0x454
[    0.008558] [c0000000049cbec0] [c0000000002d3dfc] 
cpuidle_idle_call+0x2dc/0x2e0
[    0.008565] [c0000000049cbf10] [c0000000002d3f48] do_idle+0x148/0x1f0
[    0.008571] [c0000000049cbf60] [c0000000002d43c8] 
cpu_startup_entry+0x4c/0x50
[    0.008578] [c0000000049cbf90] [c00000000006371c] 
start_secondary+0x27c/0x28c
[    0.008585] [c0000000049cbfe0] [c00000000000e258] 
start_secondary_prolog+0x10/0x14
[    0.008590] Code: 4bfffcc4 60000000 3d220132 8929db46 2c090000 
4082ff94 3c62ffd6 3d220132 3863d398 9ba9db46 4bd73655 60000000 
<0fe00000> 60000000 4bffff74 60000000
[    0.008611] irq event stamp: 20
[    0.008614] hardirqs last  enabled at (19): [<c0000000002d3dfc>] 
cpuidle_idle_call+0x2dc/0x2e0
[    0.008620] hardirqs last disabled at (20): [<c00000000005be40>] 
arch_interrupt_enter_prepare+0xa0/0x19c
[    0.008625] softirqs last  enabled at (0): [<c00000000022b6ac>] 
copy_process+0xb24/0x1dec
[    0.008632] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.008637] ---[ end trace 0000000000000000 ]---


Warning2:

[    0.010098] ------------[ cut here ]------------
[    0.010103] DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context())
[    0.010107] WARNING: kernel/locking/lockdep.c:4406 at 
lockdep_hardirqs_on_prepare+0x22c/0x2d4, CPU#0: swapper/0/1
[    0.010116] Modules linked in:
[    0.010120] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G     W    
        7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
[    0.010125] Tainted: [W]=WARN
[    0.010127] Hardware name: IBM,9080-HEX Power11 (architected) 
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[    0.010131] NIP:  c00000000031549c LR: c000000000315498 CTR: 
0000000000000000
[    0.010135] REGS: c0000000045bf100 TRAP: 0700   Tainted: G   W        
     (7.1.0-rc5-next-20260529)
[    0.010139] MSR:  8000000002021033 <SF,VEC,ME,IR,DR,RI,LE>  CR: 
44044228  XER: 00000005
[    0.010147] CFAR: c00000000022e9d4 IRQMASK: 3
[    0.010147] GPR00: c000000000315498 c0000000045bf3c0 c000000001ccf100 
000000000000002e
[    0.010147] GPR04: 3fffffffffff7fff c0000000045bf1b0 c0000000045bf1a8 
0000000000000000
[    0.010147] GPR08: c0000000029d1230 0000000000010002 c0000000048b2b00 
0000000000000003
[    0.010147] GPR12: c000000002d514e0 c000000003ea1000 c000000000011ae4 
0000000000000000
[    0.010147] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    0.010147] GPR20: 0000000000000000 0000000000000004 c000000000272684 
c0000000029bb0c0
[    0.010147] GPR24: 0000000ebc171000 c000000ebeb63850 c000000003084d00 
c00000000308b2d0
[    0.010147] GPR28: c00000000002a488 0000000000000001 0000000000000000 
c000000002246e08
[    0.010188] NIP [c00000000031549c] 
lockdep_hardirqs_on_prepare+0x22c/0x2d4
[    0.010192] LR [c000000000315498] lockdep_hardirqs_on_prepare+0x228/0x2d4
[    0.010196] Call Trace:
[    0.010198] [c0000000045bf3c0] [c000000000315498] 
lockdep_hardirqs_on_prepare+0x228/0x2d4 (unreliable)
[    0.010204] [c0000000045bf430] [c0000000004bb778] 
trace_hardirqs_on+0xec/0x1b0
[    0.010209] [c0000000045bf4b0] [c0000000015ad574] 
irqentry_exit+0x58c/0xe1c
[    0.010213] [c0000000045bf540] [c00000000002a488] 
timer_interrupt+0x210/0x564
[    0.010219] [c0000000045bf5f0] [c00000000003b960] 
__replay_soft_interrupts+0x14c/0x374
[    0.010224] [c0000000045bf7d0] [c00000000003bd74] 
arch_local_irq_restore.part.0+0x1ec/0x224
[    0.010230] [c0000000045bf810] [c0000000015c17d4] 
_raw_spin_unlock_irqrestore+0x54/0xac
[    0.010235] [c0000000045bf840] [c0000000002cdd54] 
set_user_nice+0x110/0x220
[    0.010240] [c0000000045bf8e0] [c000000000266a94] 
create_worker+0x13c/0x310
[    0.010245] [c0000000045bf9a0] [c0000000002726f4] 
workqueue_prepare_cpu+0x70/0xe4
[    0.010251] [c0000000045bf9e0] [c000000000232604] 
cpuhp_invoke_callback+0x1e8/0x3c0
[    0.010256] [c0000000045bfa50] [c000000000232924] 
__cpuhp_invoke_callback_range+0x148/0x230
[    0.010261] [c0000000045bfaf0] [c000000000234f68] _cpu_up+0x19c/0x3cc
[    0.010265] [c0000000045bfbb0] [c00000000023533c] cpu_up+0x1a4/0x1f4
[    0.010269] [c0000000045bfc40] [c00000000203d1f4] 
bringup_nonboot_cpus+0xbc/0x128
[    0.010275] [c0000000045bfca0] [c00000000204b98c] smp_init+0x44/0xd0
[    0.010279] [c0000000045bfd00] [c000000002006d4c] 
kernel_init_freeable+0x23c/0x3b0
[    0.010284] [c0000000045bfdc0] [c000000000011b0c] kernel_init+0x30/0x274
[    0.010288] [c0000000045bfe30] [c00000000000debc] 
ret_from_kernel_user_thread+0x14/0x1c
[    0.010292] ---- interrupt: 0 at 0x0
[    0.010296] Code: 4182ff74 3d22013c 3929c1d4 81290000 2c090000 
4082ff60 3c82ffda 3c62ffd9 3884f998 38634590 4bf19461 60000000 
<0fe00000> 4bffff40 60000000 60000000
[    0.010310] irq event stamp: 7440
[    0.010312] hardirqs last  enabled at (7439): [<c0000000015c1824>] 
_raw_spin_unlock_irqrestore+0xa4/0xac
[    0.010317] hardirqs last disabled at (7440): [<c00000000003bc30>] 
arch_local_irq_restore.part.0+0xa8/0x224
[    0.010323] softirqs last  enabled at (0): [<c00000000022b6ac>] 
copy_process+0xb24/0x1dec
[    0.010328] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.010331] ---[ end trace 0000000000000000 ]---



Warning3:

[    1.718239] ------------[ cut here ]------------
[    1.718247] RCU not watching for tracepoint
[    1.718255] WARNING: ./include/trace/events/preemptirq.h:40 at 
trace_hardirqs_on+0x180/0x1b0, CPU#19: swapper/19/0
[    1.718266] Modules linked in: ibmvscsi ibmveth scsi_transport_srp 
pseries_wdt
[    1.718275] CPU: 19 UID: 0 PID: 0 Comm: swapper/19 Tainted: G       
W           7.1.0-rc5-next-20260529 #1 PREEMPT(lazy)
[    1.718280] Tainted: [W]=WARN
[    1.718283] Hardware name: IBM,9080-HEX Power11 (architected) 
0x820200 0xf000007 of:IBM,FW1110.01 (NH1110_069) hv:phyp pSeries
[    1.718287] NIP:  c0000000004bb80c LR: c0000000004bb808 CTR: 
0000000000000000
[    1.718290] REGS: c000000004a4b9e0 TRAP: 0700   Tainted: G   W        
     (7.1.0-rc5-next-20260529)
[    1.718294] MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  
CR: 44000208  XER: 00000005
[    1.718305] CFAR: c00000000022e9d4 IRQMASK: 3
[    1.718305] GPR00: c0000000004bb808 c000000004a4bca0 c000000001ccf100 
000000000000001f
[    1.718305] GPR04: 3fffffffffff7fff c000000004a4ba90 c000000004a4ba88 
0000000ebe5e2000
[    1.718305] GPR08: 0000000000000027 0000000000000002 c000000004a62b00 
0000000000000003
[    1.718305] GPR12: c000000002d514e0 c000000effff1300 0000000000000000 
000000002ef01a60
[    1.718305] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[    1.718305] GPR20: 0000000000000000 0000000000000000 0000000000000000 
00000000666a3c88
[    1.718305] GPR24: c00000000105088c 000000000000dedc c000000003084d00 
0000000000000000
[    1.718305] GPR28: c000000ec09fe440 0000000000000001 c000000002414988 
c00000000003bca8
[    1.718347] NIP [c0000000004bb80c] trace_hardirqs_on+0x180/0x1b0
[    1.718351] LR [c0000000004bb808] trace_hardirqs_on+0x17c/0x1b0
[    1.718355] Call Trace:
[    1.718357] [c000000004a4bca0] [c0000000004bb808] 
trace_hardirqs_on+0x17c/0x1b0 (unreliable)
[    1.718362] [c000000004a4bd20] [c00000000003bca8] 
arch_local_irq_restore.part.0+0x120/0x224
[    1.718369] [c000000004a4bd60] [c0000000015b065c] snooze_loop+0xa0/0x270
[    1.718374] [c000000004a4bda0] [c0000000015af06c] 
cpuidle_enter_state+0x110/0x8fc
[    1.718379] [c000000004a4be60] [c00000000105088c] cpuidle_enter+0x50/0x74
[    1.718384] [c000000004a4bea0] [c0000000002ca85c] call_cpuidle+0x48/0xa0
[    1.718389] [c000000004a4bec0] [c0000000002d3c80] 
cpuidle_idle_call+0x160/0x2e0
[    1.718395] [c000000004a4bf10] [c0000000002d3f48] do_idle+0x148/0x1f0
[    1.718400] [c000000004a4bf60] [c0000000002d43c8] 
cpu_startup_entry+0x4c/0x50
[    1.718405] [c000000004a4bf90] [c00000000006371c] 
start_secondary+0x27c/0x28c
[    1.718411] [c000000004a4bfe0] [c00000000000e258] 
start_secondary_prolog+0x10/0x14
[    1.718415] Code: 60000000 3d220132 8929db48 2c090000 4082ff64 
3c62ffd6 39200001 3d420132 3863d398 992adb48 4bd730f1 60000000 
<0fe00000> 60000000 4bffff40 60000000
[    1.718430] irq event stamp: 0
[    1.718432] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    1.718436] hardirqs last disabled at (0): [<c00000000022b6ac>] 
copy_process+0xb24/0x1dec
[    1.718442] softirqs last  enabled at (0): [<c00000000022b6ac>] 
copy_process+0xb24/0x1dec
[    1.718447] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    1.718450] ---[ end trace 0000000000000000 ]---



I am reporting all three warnings in one report, as they all flag
inconsistencies around IRQ enable/disable transitions.


If you happen to fix this, please add the tag below.


Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>



Regards,

Venkat.




^ permalink raw reply

* [PATCH v3] ASoC: fsl_sai: Fix 32 slots TDM broken by integer shift UB in xMR write
From: chancel.liu @ 2026-06-01  8:33 UTC (permalink / raw)
  To: shengjiu.wang, Xiubo.Lee, festevam, nicoleotsuka, lgirdwood,
	broonie, perex, tiwai
  Cc: linux-kernel, linuxppc-dev, linux-sound, stable

From: Chancel Liu <chancel.liu@nxp.com>

When configuring 32 slots TDM (channels == slots == 32), the xMR
(Mask Register) write used:
~0UL - ((1 << min(channels, slots)) - 1)

The literal "1" is a signed 32-bit int. Shifting it by 32 positions is
undefined behaviour which may set this register to 0xFFFFFFFF, masking
all 32 slots.

Use GENMASK_U32() macro instead. For 32 slots this produces a zero mask:
~GENMASK_U32(31, 0) = ~0xFFFFFFFF = 0x00000000
Behaviour for fewer than 32 slots is unchanged.

Fixes: 770f58d7d2c5 ("ASoC: fsl_sai: Support multiple data channel enable bits")
Cc: stable@vger.kernel.org
Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
---
Changes in v3
- Fix the patch so that it applies cleanly

Changes in v2
- Use GENMASK_U32() macro instead to make it clearer and safer

 sound/soc/fsl/fsl_sai.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index d6dd95680892..9661602b53c5 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -797,7 +797,7 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
 				   FSL_SAI_CR4_FSD_MSTR, FSL_SAI_CR4_FSD_MSTR);

 	regmap_write(sai->regmap, FSL_SAI_xMR(tx),
-		     ~0UL - ((1 << min(channels, slots)) - 1));
+		     ~GENMASK_U32(min(channels, slots) - 1, 0));

 	return 0;
 }
--
2.50.1
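
The mask arithmetic described in the commit message can be checked in
userspace. The sketch below is only an illustration: genmask_u32() is a
hypothetical stand-in for the kernel's GENMASK_U32() macro, and
xmr_mask() and xmr_mask_old() are made-up names for the new and the old
expression. The old expression is only evaluated for small slot counts,
where the signed shift is well defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's GENMASK_U32(h, l): bits l..h set,
 * with 0 <= l <= h <= 31. Hypothetical helper, not the kernel macro.
 */
static uint32_t genmask_u32(unsigned int h, unsigned int l)
{
	assert(l <= h && h <= 31);
	return (UINT32_MAX >> (31 - h)) & (UINT32_MAX << l);
}

/* xMR value that leaves the lowest n slots unmasked, 1 <= n <= 32. */
static uint32_t xmr_mask(unsigned int n)
{
	assert(n >= 1 && n <= 32);
	return ~genmask_u32(n - 1, 0);
}

/*
 * The pre-patch expression. Only evaluated here for n <= 30, where
 * shifting a signed int is well defined in ISO C.
 */
static uint32_t xmr_mask_old(unsigned int n)
{
	assert(n <= 30);
	return (uint32_t)(~0UL - ((1 << n) - 1));
}
```

With these helpers, xmr_mask(32) evaluates to 0 (no slot masked), and
xmr_mask(n) agrees with xmr_mask_old(n) for the small n where the old
expression is defined.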



^ permalink raw reply related

* RE: [PATCH v2] ASoC: fsl_sai: Fix 32 slots TDM broken by integer shift UB in xMR write
From: Chancel Liu (OSS) @ 2026-06-01  8:28 UTC (permalink / raw)
  To: Chancel Liu (OSS), shengjiu.wang@gmail.com, Xiubo.Lee@gmail.com,
	festevam@gmail.com, nicoleotsuka@gmail.com, lgirdwood@gmail.com,
	broonie@kernel.org, perex@perex.cz, tiwai@suse.com
  Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-sound@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260601070543.1351629-1-chancel.liu@oss.nxp.com>

Sorry, please ignore v2, it can't be applied successfully.
Will send v3 shortly.

Regards, 
Chancel Liu

> When configuring 32 slots TDM (channels == slots == 32), the xMR (Mask
> Register) write used:
> ~0UL - ((1 << min(channels, slots)) - 1)
> 
> The literal "1" is a signed 32-bit int. Shifting it by 32 positions is
> undefined behaviour which may set this register to 0xFFFFFFFF, masking all
> 32 slots.
> 
> Use GENMASK_U32() macro instead. For 32 slots this produces a zero mask:
> ~GENMASK_U32(31, 0) = ~0xFFFFFFFF = 0x00000000
> Behaviour for fewer than 32 slots is unchanged.
> 
> Fixes: 770f58d7d2c5 ("ASoC: fsl_sai: Support multiple data channel enable bits")
> Cc: stable@vger.kernel.org
> Signed-off-by: Chancel Liu <chancel.liu@nxp.com>
> ---
> Changes in v2
> - Use GENMASK_U32() macro instead to make it clearer and safer
> 
>  sound/soc/fsl/fsl_sai.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> index 821e3bd51b6e..9661602b53c5 100644
> --- a/sound/soc/fsl/fsl_sai.c
> +++ b/sound/soc/fsl/fsl_sai.c
> @@ -797,7 +797,7 @@ static int fsl_sai_hw_params(struct snd_pcm_substream *substream,
>  				   FSL_SAI_CR4_FSD_MSTR, FSL_SAI_CR4_FSD_MSTR);
> 
>  	regmap_write(sai->regmap, FSL_SAI_xMR(tx),
> -		     ~0ULL - ((1ULL << min(channels, slots)) - 1));
> +		     ~GENMASK_U32(min(channels, slots) - 1, 0));
> 
>  	return 0;
>  }
> --
> 2.50.1



^ permalink raw reply

* [PATCH] powerpc/powernv: Cache OPAL check_token() results
From: Shivang Upadhyay @ 2026-06-01 11:25 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: maddy, mpe, npiggin, chleroy, thuth, Shivang Upadhyay,
	Aditya Gupta, Sourabh Jain, Mahesh J Salgaonkar

Add a caching layer for the opal_check_token() OPAL call to avoid
repeated firmware calls for token availability checks.

The opal_check_token() function is used to determine if a specific
OPAL firmware call is supported on the current platform. This check
is performed frequently during boot and runtime, resulting in
unnecessary firmware calls for the same token values.

This reduces firmware call overhead during boot and runtime token
checks while maintaining compatibility with existing code.

Testing with buildroot images shows OPAL calls reduced from
35578 to 28983, before console bring-up.

Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <chleroy@kernel.org>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Shivang Upadhyay <shivangu@linux.ibm.com>
---
 arch/powerpc/include/asm/opal.h            |  1 +
 arch/powerpc/platforms/powernv/opal-call.c |  2 +-
 arch/powerpc/platforms/powernv/opal.c      | 55 ++++++++++++++++++++++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 0a398265ba04..e7e11479122b 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -156,6 +156,7 @@ int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
 int64_t opal_pci_poll(uint64_t id);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
+int64_t opal_check_token_call(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
 
 int64_t opal_xscom_read(uint32_t gcid, uint64_t pcb_addr, __be64 *val);
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 021b0ec29e24..00325c189e69 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -207,7 +207,7 @@ OPAL_CALL(opal_validate_flash,			OPAL_FLASH_VALIDATE);
 OPAL_CALL(opal_manage_flash,			OPAL_FLASH_MANAGE);
 OPAL_CALL(opal_update_flash,			OPAL_FLASH_UPDATE);
 OPAL_CALL(opal_resync_timebase,			OPAL_RESYNC_TIMEBASE);
-OPAL_CALL(opal_check_token,			OPAL_CHECK_TOKEN);
+OPAL_CALL(opal_check_token_call,		OPAL_CHECK_TOKEN);
 OPAL_CALL(opal_dump_init,			OPAL_DUMP_INIT);
 OPAL_CALL(opal_dump_info,			OPAL_DUMP_INFO);
 OPAL_CALL(opal_dump_info2,			OPAL_DUMP_INFO2);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 1946dbdc9fa1..c32035136efa 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -73,6 +73,12 @@ static struct task_struct *kopald_tsk;
 static struct opal_msg *opal_msg;
 static u32 opal_msg_size __ro_after_init;
 
+/* Token cache for opal_check_token() */
+#define OPAL_TOKEN_CACHE_SIZE 256  /* Covers tokens 0-255, including OPAL_LAST (178) */
+static unsigned long opal_token_cache[BITS_TO_LONGS(OPAL_TOKEN_CACHE_SIZE)] __ro_after_init;
+static bool opal_token_cache_initialized __ro_after_init;
+static void opal_token_cache_init(void);
+
 void __init opal_configure_cores(void)
 {
 	u64 reinit_flags = 0;
@@ -1125,8 +1131,57 @@ EXPORT_SYMBOL_GPL(opal_flash_read);
 EXPORT_SYMBOL_GPL(opal_flash_write);
 EXPORT_SYMBOL_GPL(opal_flash_erase);
 EXPORT_SYMBOL_GPL(opal_prd_msg);
+
+/**
+ * opal_check_token - Check if an OPAL call token is supported
+ * @token: OPAL token number to check
+ *
+ * Returns 1 if supported, 0 if not.
+ * Uses a cached bitmap for fast lookups after initialization.
+ */
+int64_t opal_check_token(uint64_t token)
+{
+	/* Initialize if not done before */
+	if (!opal_token_cache_initialized) {
+		opal_token_cache_init();
+	}
+
+	/* Use cached result */
+	if (token < OPAL_TOKEN_CACHE_SIZE) {
+		return test_bit(token, opal_token_cache);
+	}
+
+	/* Fall back to direct OPAL call for out-of-range tokens */
+	return opal_check_token_call(token);
+}
 EXPORT_SYMBOL_GPL(opal_check_token);
 
+/**
+ * opal_token_cache_init - Initialize the OPAL token cache
+ *
+ * Called during opal_init() to populate the token cache by querying
+ * OPAL firmware for all tokens in the supported range.
+ */
+static void opal_token_cache_init(void)
+{
+	uint64_t token;
+	int64_t result;
+
+	pr_debug("Initializing OPAL token cache\n");
+
+	/* Query OPAL for each token and cache the result */
+	for (token = 0; token < OPAL_TOKEN_CACHE_SIZE; token++) {
+		result = opal_check_token_call(token);
+		if (result == 1)
+			set_bit(token, opal_token_cache);
+	}
+
+	/* Mark cache as initialized - enables fast path */
+	opal_token_cache_initialized = true;
+
+	pr_info("OPAL token cache initialized\n");
+}
+
 /* Convert a region of vmalloc memory to an opal sg list */
 struct opal_sg_list *opal_vmalloc_to_sg_list(void *vmalloc_addr,
 					     unsigned long vmalloc_size)
-- 
2.53.0
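
The caching scheme in this patch can be modelled in userspace to see how
the firmware call count behaves. The sketch below is only an
illustration under stated assumptions: firmware_check_token() is a
hypothetical stand-in for the real OPAL call (it pretends that even
tokens are supported), and all other names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOKEN_CACHE_SIZE 256
#define BITS_PER_WORD    (8 * sizeof(unsigned long))
#define CACHE_WORDS      ((TOKEN_CACHE_SIZE + BITS_PER_WORD - 1) / BITS_PER_WORD)

static unsigned long token_cache[CACHE_WORDS];
static bool token_cache_ready;
static unsigned long firmware_calls;	/* counts simulated firmware calls */

/* Hypothetical firmware stand-in: pretend that even tokens are supported. */
static int64_t firmware_check_token(uint64_t token)
{
	firmware_calls++;
	return (token % 2 == 0) ? 1 : 0;
}

/* Query every token in the cached range once and record the result. */
static void token_cache_init(void)
{
	assert(!token_cache_ready);	/* the cache is populated only once */

	for (uint64_t t = 0; t < TOKEN_CACHE_SIZE; t++)
		if (firmware_check_token(t) == 1)
			token_cache[t / BITS_PER_WORD] |= 1UL << (t % BITS_PER_WORD);

	token_cache_ready = true;
}

/* Cached lookup; out-of-range tokens fall back to the firmware call. */
static int64_t check_token(uint64_t token)
{
	if (!token_cache_ready)
		token_cache_init();

	if (token < TOKEN_CACHE_SIZE)
		return (token_cache[token / BITS_PER_WORD] >>
			(token % BITS_PER_WORD)) & 1;

	return firmware_check_token(token);
}
```

In this model the first lookup costs 256 simulated firmware calls, every
later in-range lookup costs none, and each out-of-range lookup still
costs one call.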



^ permalink raw reply related

* Re: [PATCH v7 15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map
From: Kevin Brodsky @ 2026-06-01 10:43 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
	Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
	Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
	Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
In-Reply-To: <20260529150150.1670604-32-ardb+git@google.com>

On 29/05/2026 17:02, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The linear aliases of the kernel text and rodata are also mapped
> read-only in the linear map. Given that the contents of these regions
> are mostly identical to the version in the loadable image, mapping them
> read-only and leaving their contents visible is a reasonable hardening
> measure.
>
> Data and bss, however, are now also mapped read-only but the contents of
> these regions are more likely to contain data that we'd rather not leak.
> So let's unmap these entirely in the linear map when the kernel is
> running normally.
>
> When going into hibernation or waking up from it, these regions need to
> be mapped, so map the region initially, and toggle the valid bit so
> map/unmap the region as needed.

s/so map/to map/? Also not sure what "initially" is referring to here.

Otherwise:

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>

I don't know much about hibernation though, so it would be good for
someone knowledgeable to have a look.

- Kevin

> Doing so is required because pages covering the kernel image are marked
> as PageReserved, and therefore disregarded for snapshotting by the
> hibernate logic unless they are mapped.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/mm/mmu.c | 45 ++++++++++++++++++--
>  1 file changed, 41 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7b18dc2f1721..07a6fa210171 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -24,6 +24,7 @@
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
>  #include <linux/set_memory.h>
> +#include <linux/suspend.h>
>  #include <linux/kfence.h>
>  #include <linux/pkeys.h>
>  #include <linux/mm_inline.h>
> @@ -1056,6 +1057,29 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end,
>  				 end - start, prot, early_pgtable_alloc, flags);
>  }
>  
> +static void mark_linear_data_alias_valid(bool valid)
> +{
> +	set_memory_valid((unsigned long)lm_alias(__init_end),
> +			 (unsigned long)(__bss_stop - __init_end) / PAGE_SIZE,
> +			 valid);
> +}
> +
> +static int arm64_hibernate_pm_notify(struct notifier_block *nb,
> +				     unsigned long mode, void *unused)
> +{
> +	switch (mode) {
> +	default:
> +		break;
> +	case PM_POST_HIBERNATION:
> +		mark_linear_data_alias_valid(false);
> +		break;
> +	case PM_HIBERNATION_PREPARE:
> +		mark_linear_data_alias_valid(true);
> +		break;
> +	}
> +	return 0;
> +}
> +
>  void __init mark_linear_text_alias_ro(void)
>  {
>  	/*
> @@ -1064,6 +1088,21 @@ void __init mark_linear_text_alias_ro(void)
>  	update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
>  			    (unsigned long)__init_begin - (unsigned long)_text,
>  			    PAGE_KERNEL_RO);
> +
> +	/*
> +	 * Register a PM notifier to remap the linear alias of data/bss as
> +	 * valid read-only before hibernation. This is needed because the
> +	 * snapshot logic disregards PageReserved pages (such as the ones
> +	 * covering the kernel image) unless they are mapped in the linear
> +	 * map.
> +	 */
> +	if (IS_ENABLED(CONFIG_HIBERNATION)) {
> +		static struct notifier_block nb = {
> +			.notifier_call = arm64_hibernate_pm_notify
> +		};
> +
> +		register_pm_notifier(&nb);
> +	}
>  }
>  
>  #ifdef CONFIG_KFENCE
> @@ -1193,10 +1232,8 @@ static void __init map_mem(void)
>  			       flags);
>  	}
>  
> -	/* Map the kernel data/bss read-only in the linear map */
> -	__map_memblock(init_end, kernel_end, PAGE_KERNEL_RO, flags);
> -	flush_tlb_kernel_range((unsigned long)lm_alias(__init_end),
> -			       (unsigned long)lm_alias(__bss_stop));
> +	/* Map the kernel data/bss as invalid in the linear map */
> +	mark_linear_data_alias_valid(false);
>  }
>  
>  void mark_rodata_ro(void)


^ permalink raw reply

* Re: [PATCH v7 10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps
From: Kevin Brodsky @ 2026-06-01 10:43 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
	Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
	Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
	Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
In-Reply-To: <20260529150150.1670604-27-ardb+git@google.com>

On 29/05/2026 17:02, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Now that the linear region mapping routines respect existing table
> mappings and contiguous block and page mappings, it is no longer needed
> to fiddle with the memblock tables to set and clear the NOMAP attribute
> in order to omit text and rodata when creating the linear map.
>
> Instead, map the kernel text and rodata alias first with the desired
> initial attributes and granularity, so that the loop iterating over the
> memblocks will not remap it in a manner that prevents it from being
> remapped with updated attributes later.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>

> ---
>  arch/arm64/mm/mmu.c | 26 ++++++++------------
>  1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 971996e46fd1..dcfca5667e5c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1164,12 +1164,17 @@ static void __init map_mem(void)
>  		flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>  
>  	/*
> -	 * Take care not to create a writable alias for the
> -	 * read-only text and rodata sections of the kernel image.
> -	 * So temporarily mark them as NOMAP to skip mappings in
> -	 * the following for-loop
> +	 * Map the linear alias of the [_text, __init_begin) interval first
> +	 * so that its write permissions can be removed later without the need
> +	 * to split any block mappings created by the loop below.
> +	 *
> +	 * Write permissions are needed for alternatives patching, and will be
> +	 * removed later by mark_linear_text_alias_ro() above. This makes the
> +	 * contents of the region accessible to subsystems such as hibernate,
> +	 * but protects it from inadvertent modification or execution.
>  	 */
> -	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> +	__map_memblock(kernel_start, kernel_end, pgprot_tagged(PAGE_KERNEL),
> +		       flags);
>  
>  	/* map all the memory banks */
>  	for_each_mem_range(i, &start, &end) {
> @@ -1181,17 +1186,6 @@ static void __init map_mem(void)
>  		__map_memblock(start, end, pgprot_tagged(PAGE_KERNEL),
>  			       flags);
>  	}
> -
> -	/*
> -	 * Map the linear alias of the [_text, __init_begin) interval
> -	 * as non-executable now, and remove the write permission in
> -	 * mark_linear_text_alias_ro() below (which will be called after
> -	 * alternative patching has completed). This makes the contents
> -	 * of the region accessible to subsystems such as hibernate,
> -	 * but protects it from inadvertent modification or execution.
> -	 */
> -	__map_memblock(kernel_start, kernel_end, PAGE_KERNEL, 0);
> -	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
>  }
>  
>  void mark_rodata_ro(void)


^ permalink raw reply

* Re: [PATCH v7 07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool
From: Kevin Brodsky @ 2026-06-01 10:42 UTC (permalink / raw)
  To: Ard Biesheuvel, linux-arm-kernel
  Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
	Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
	Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
	Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
In-Reply-To: <20260529150150.1670604-24-ardb+git@google.com>

On 29/05/2026 17:01, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Now that the map_mem() routines respect existing page mappings and
> contiguous granule sized blocks with the contiguous bit cleared, there
> is no longer a reason to play tricks with the memblock NOMAP attribute.
>
> Instead, the kfence pool can be allocated and mapped with page
> granularity first, and this granularity will be respected when the rest
> of DRAM is mapped later, even if block and contiguous mappings are
> allowed for the remainder of those mappings.
>
> Add the NO_EXEC_MAPPINGS flag to ensure that hierarchical XN attributes
> are set on the intermediate page tables that are allocated when mapping
> the pool.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>

> ---
>  arch/arm64/mm/mmu.c | 27 +++++---------------
>  1 file changed, 6 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index d7a6991e1844..cdf8b3510229 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1083,36 +1083,24 @@ static int __init parse_kfence_early_init(char *arg)
>  }
>  early_param("kfence.sample_interval", parse_kfence_early_init);
>  
> -static phys_addr_t __init arm64_kfence_alloc_pool(void)
> +static void __init arm64_kfence_map_pool(void)
>  {
>  	phys_addr_t kfence_pool;
>  
>  	if (!kfence_early_init)
> -		return 0;
> +		return;
>  
>  	kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
>  	if (!kfence_pool) {
>  		pr_err("failed to allocate kfence pool\n");
>  		kfence_early_init = false;
> -		return 0;
> -	}
> -
> -	/* Temporarily mark as NOMAP. */
> -	memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
> -
> -	return kfence_pool;
> -}
> -
> -static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool)
> -{
> -	if (!kfence_pool)
>  		return;
> +	}
>  
>  	/* KFENCE pool needs page-level mapping. */
>  	__map_memblock(kfence_pool, kfence_pool + KFENCE_POOL_SIZE,
>  			pgprot_tagged(PAGE_KERNEL),
> -			NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> -	memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
> +			NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS | NO_EXEC_MAPPINGS);
>  	__kfence_pool = phys_to_virt(kfence_pool);
>  }
>  
> @@ -1144,8 +1132,7 @@ bool arch_kfence_init_pool(void)
>  }
>  #else /* CONFIG_KFENCE */
>  
> -static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
> -static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool) { }
> +static inline void arm64_kfence_map_pool(void) { }
>  
>  #endif /* CONFIG_KFENCE */
>  
> @@ -1155,7 +1142,6 @@ static void __init map_mem(void)
>  	phys_addr_t kernel_start = __pa_symbol(_text);
>  	phys_addr_t kernel_end = __pa_symbol(__init_begin);
>  	phys_addr_t start, end;
> -	phys_addr_t early_kfence_pool;
>  	int flags = NO_EXEC_MAPPINGS;
>  	u64 i;
>  
> @@ -1172,7 +1158,7 @@ static void __init map_mem(void)
>  	BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end) &&
>  		     pgd_index(_PAGE_OFFSET(VA_BITS_MIN)) != PTRS_PER_PGD - 1);
>  
> -	early_kfence_pool = arm64_kfence_alloc_pool();
> +	arm64_kfence_map_pool();
>  
>  	linear_map_requires_bbml2 = !force_pte_mapping() && can_set_direct_map();
>  
> @@ -1210,7 +1196,6 @@ static void __init map_mem(void)
>  	 */
>  	__map_memblock(kernel_start, kernel_end, PAGE_KERNEL, NO_CONT_MAPPINGS);
>  	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> -	arm64_kfence_map_pool(early_kfence_pool);
>  }
>  
>  void mark_rodata_ro(void)


^ permalink raw reply

* Re: [linux-next20260529] kernel BUG at kernel/sched/core.c:7512!
From: Peter Zijlstra @ 2026-06-01  9:56 UTC (permalink / raw)
  To: Shrikanth Hegde
  Cc: Venkat Rao Bagalkote, Madhavan Srinivasan,
	Mukesh Kumar Chaurasiya, Ritesh Harjani, linuxppc-dev, LKML,
	Srikar Dronamraju
In-Reply-To: <2f8c3d75-de2c-48bf-bd05-46b816d55c69@linux.ibm.com>

On Mon, Jun 01, 2026 at 02:46:24PM +0530, Shrikanth Hegde wrote:

> Ritesh, Mukesh, Is below possible scenario?
> 
> do_page_fault seems to enable irq's in the interrupt handler?
> is that expected? if so, one might see
> 
> -- do_page_fault (enter kernel mode)
>    -- enables interrupts
>    -- gets interrupt - Sets need_resched.
>       -- irqentry_exit - Sees it is kernel mode. Just checks preempt count
> 			 and calls preempt_schedule_irq, which catches both
> 			 preempt_count and !irqs_disabled. Hence the panic?
> 
> Should do_page_fault do preempt_disable when it enables the interrupts?

No, it is expected for page-fault to be able to schedule. Specifically,
it must be able to sleep to support loading pages from disk.

Please check the value of preempt_count() (does it perchance have
HARDIRQ_OFFSET?). Also, if the fault handler does enable IRQs, it must
also disable them again once done.

Notably, I see ___do_page_fault() do interrupt_cond_local_irq_enable(),
but I'm not seeing a local_irq_disable() to match!
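
For anyone decoding a dumped preempt_count() value, the check asked for
above can be sketched as follows. This is only an illustration: the bit
layout is assumed to match include/linux/preempt.h in mainline (8
preempt bits, 8 softirq bits, 4 hardirq bits), and hardirq_depth() and
has_hardirq_offset() are made-up helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout, mirroring include/linux/preempt.h in mainline. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)

#define HARDIRQ_OFFSET	(1U << HARDIRQ_SHIFT)
#define HARDIRQ_MASK	(((1U << HARDIRQ_BITS) - 1) << HARDIRQ_SHIFT)

static_assert(HARDIRQ_OFFSET == 0x10000u, "hardirq count starts at bit 16");

/* Number of nested hard interrupt levels recorded in a count value. */
static unsigned int hardirq_depth(unsigned int count)
{
	return (count & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
}

/* True if the count value carries at least one HARDIRQ_OFFSET. */
static bool has_hardirq_offset(unsigned int count)
{
	return hardirq_depth(count) != 0;
}
```

Under that assumed layout, a count of 0x00010000 reports a hardirq depth
of 1, while 0x00000101 (one softirq level, one preempt-disable level)
reports a depth of 0.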

^ permalink raw reply

* [PATCH v15 23/23] arm64: crash: Add crash hotplug support
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Due to CPU/Memory hotplug or online/offline events, the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) of kdump
image becomes outdated. Consequently, attempting dump collection with
an outdated elfcorehdr can lead to inaccurate dump collection.

The current solution to address the above issue involves monitoring
the CPU/Memory add/remove events in userspace using udev rules and
whenever there are changes in CPU and memory resources, the entire
kdump image is loaded again. The kdump image includes kernel, initrd,
elfcorehdr, FDT, purgatory. Given that only elfcorehdr gets outdated
due to CPU/Memory add/remove events, reloading the entire kdump image
is inefficient. More importantly, kdump remains inactive for a
substantial amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the
kdump image component during CPU or memory add/remove events within
the kernel itself.

On a CPU or memory add/remove event, the generic crash
hotplug event handler, crash_handle_hotplug_event(), is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
arch_crash_handle_hotplug_event(), to update the required kdump image
components.

[1] has supported virtual CPU hotplug in virtual machines for ARM64,
allowing vCPUs to be added or removed at runtime to meet Kubernetes
demands.

On ARM64, only memory add/remove events are handled. Here's why:

1. Physical CPU hotplug: Not supported on ARM64 hardware.

2. ACPI vCPU hotplug (KVM virtual machine):
   - vCPU hotplug is implemented as a static firmware policy where all
     possible vCPUs are pre-described in the MADT table at boot.
   - The vCPU status will be automatically updated after vCPU hotplug.
   - No FDT or elfcorehdr update needed.

3. Device tree booted Virtual Machine vCPU hotplug:
  - The elfcorehdr is built using for_each_possible_cpu(), so it
    already includes all possible CPUs and doesn't need updates.

For memory add/remove events, the elfcorehdr is updated to reflect
the current memory layout.

This patch adds the ARCH_SUPPORTS_CRASH_HOTPLUG config option and
implements:
- arch_crash_hotplug_support(): Check if hotplug update is supported
- arch_crash_get_elfcorehdr_size(): Return elfcorehdr buffer size
- arch_crash_handle_hotplug_event(): Handle memory hotplug events

This follows the same approach as x86 commit
ea53ad9cf73b ("x86/crash: add x86 crash hotplug support") and powerpc
commit b741092d5976 ("powerpc/crash: add crash CPU hotplug support")
and commit 849599b702ef ("powerpc/crash: add crash memory hotplug
support").

The test is based on the following QEMU version:
	https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2

Replace your '-smp' argument with something like:
 | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1

Then feed the following to the QEMU monitor to hotplug a vCPU:
 | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
 | (qemu) device_del cpu1

Feed the following to the QEMU monitor to hotplug memory:
 | (qemu) object_add memory-backend-ram,id=mem1,size=256M
 | (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
 | (qemu) device_del dimm1

The qemu startup configuration is as follows:
qemu-system-aarch64 \
		-M virt,gic-version=3,acpi=on,highmem=on \
		-enable-kvm \
		-cpu host \
		-kernel Image \
		-smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1 \
		-bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
		-m 2G,slots=64,maxmem=16G \
		-nographic \
		-no-reboot \
		-device virtio-rng-pci \
		-append "root=/dev/vda rw console=ttyAMA0 kgdboc=ttyAMA0,115200 \
			earlycon acpi=on crashkernel=512M" \
		-drive if=none,file=images/rootfs.ext4,format=raw,id=hd0 \
		-device virtio-blk-device,drive=hd0

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. Only the kexec_file_load path has been tested so far.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Breno Leitao <leitao@debian.org>
Cc: Kees Cook <kees@kernel.org>
[1]: https://lore.kernel.org/all/20240529133446.28446-1-Jonathan.Cameron@huawei.com/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/Kconfig                     |   3 +
 arch/arm64/include/asm/kexec.h         |  13 +++
 arch/arm64/kernel/Makefile             |   2 +-
 arch/arm64/kernel/crash.c              | 152 +++++++++++++++++++++++++
 arch/arm64/kernel/kexec_image.c        |  21 +++-
 arch/arm64/kernel/machine_kexec_file.c |  40 ++-----
 6 files changed, 195 insertions(+), 36 deletions(-)
 create mode 100644 arch/arm64/kernel/crash.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fe60738e5943..9091c67e1cc2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1609,6 +1609,9 @@ config ARCH_DEFAULT_CRASH_DUMP
 config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
 	def_bool CRASH_RESERVE
 
+config ARCH_SUPPORTS_CRASH_HOTPLUG
+	def_bool y
+
 config TRANS_TABLE
 	def_bool y
 	depends on HIBERNATION || KEXEC_CORE
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 892e5bebda95..4f3d4fc2807e 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -130,6 +130,19 @@ extern int load_other_segments(struct kimage *image,
 		char *cmdline);
 #endif
 
+#ifdef CONFIG_CRASH_HOTPLUG
+#define pnum_hdr_sz(pnum) ((pnum) * sizeof(Elf64_Phdr) + sizeof(Elf64_Ehdr))
+
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags);
+#define arch_crash_hotplug_support arch_crash_hotplug_support
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
+#endif
+
 #endif /* __ASSEMBLER__ */
 
 #endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 74b76bb70452..0625422fc528 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -64,7 +64,7 @@ obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
 obj-$(CONFIG_KEXEC_FILE)		+= machine_kexec_file.o kexec_image.o
 obj-$(CONFIG_ARM64_RELOC_TEST)		+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
+obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o crash.o
 obj-$(CONFIG_VMCORE_INFO)		+= vmcore_info.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
diff --git a/arch/arm64/kernel/crash.c b/arch/arm64/kernel/crash.c
new file mode 100644
index 000000000000..5882b9b5a90e
--- /dev/null
+++ b/arch/arm64/kernel/crash.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Architecture specific functions for kexec based crash dumps.
+ */
+
+#define pr_fmt(fmt)	"crash hp: " fmt
+
+#include <linux/kexec.h>
+#include <linux/elf.h>
+#include <linux/memblock.h>
+#include <linux/vmalloc.h>
+#include <linux/cacheflush.h>
+#include <linux/crash_core.h>
+
+#include <asm/kexec.h>
+
+#if defined(CONFIG_KEXEC_FILE) || defined(CONFIG_CRASH_HOTPLUG)
+unsigned int arch_get_system_nr_ranges(void)
+{
+	/* for exclusion of crashkernel region */
+	unsigned int nr_ranges = 2 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
+	phys_addr_t start, end;
+	u64 i;
+
+	for_each_mem_range(i, &start, &end)
+		nr_ranges++;
+
+	return nr_ranges;
+}
+
+int arch_crash_populate_cmem(struct crash_mem *cmem)
+{
+	phys_addr_t start, end;
+	u64 i;
+
+	for_each_mem_range(i, &start, &end) {
+		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+			return -EAGAIN;
+
+		cmem->ranges[cmem->nr_ranges].start = start;
+		cmem->ranges[cmem->nr_ranges].end = end - 1;
+		cmem->nr_ranges++;
+	}
+
+	return 0;
+}
+#endif
+
+#ifdef CONFIG_CRASH_HOTPLUG
+int arch_crash_hotplug_support(struct kimage *image, unsigned long kexec_flags)
+{
+#ifdef CONFIG_KEXEC_FILE
+	if (image->file_mode)
+		return 1;
+#endif
+	/*
+	 * For kexec_load syscall, crash hotplug support requires
+	 * KEXEC_CRASH_HOTPLUG_SUPPORT flag to be passed by userspace.
+	 */
+	return kexec_flags & KEXEC_CRASH_HOTPLUG_SUPPORT;
+}
+
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+	unsigned int phdr_cnt;
+
+	/* A program header for possible CPUs, vmcoreinfo and kernel_map */
+	phdr_cnt = 2 + num_possible_cpus();
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		phdr_cnt += CONFIG_CRASH_MAX_MEMORY_RANGES;
+
+	return pnum_hdr_sz(phdr_cnt);
+}
+
+/**
+ * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
+ *			       elfcorehdr in the kexec segment array.
+ * @image: the active struct kimage
+ */
+static void update_crash_elfcorehdr(struct kimage *image)
+{
+	void *elfbuf = NULL, *old_elfcorehdr;
+	unsigned long mem, memsz;
+	unsigned long elfsz = 0;
+
+	/*
+	 * Create the new elfcorehdr reflecting the changes to CPU and/or
+	 * memory resources.
+	 */
+	if (crash_prepare_headers(true, &elfbuf, &elfsz, NULL)) {
+		pr_err("unable to create new elfcorehdr");
+		goto out;
+	}
+
+	/*
+	 * Obtain address and size of the elfcorehdr segment, and
+	 * check it against the new elfcorehdr buffer.
+	 */
+	mem = image->segment[image->elfcorehdr_index].mem;
+	memsz = image->segment[image->elfcorehdr_index].memsz;
+	if (elfsz > memsz) {
+		pr_err("update elfcorehdr elfsz %lu > memsz %lu",
+			elfsz, memsz);
+		goto out;
+	}
+
+	/*
+	 * Copy new elfcorehdr over the old elfcorehdr at destination.
+	 */
+	old_elfcorehdr = (void *)__va(mem);
+	if (!old_elfcorehdr) {
+		pr_err("mapping elfcorehdr segment failed\n");
+		goto out;
+	}
+
+	/*
+	 * Temporarily invalidate the crash image while the
+	 * elfcorehdr is updated.
+	 */
+	xchg(&kexec_crash_image, NULL);
+	memcpy((void *)old_elfcorehdr, elfbuf, elfsz);
+	dcache_clean_inval_poc((unsigned long)old_elfcorehdr,
+			       (unsigned long)old_elfcorehdr + elfsz);
+	xchg(&kexec_crash_image, image);
+	pr_debug("updated elfcorehdr\n");
+
+out:
+	vfree(elfbuf);
+}
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: a pointer to kexec_crash_image
+ * @arg: struct memory_notify handler for memory hotplug case and
+ *       NULL for CPU hotplug case.
+ *
+ * Update the kdump image based on the type of hotplug event:
+ * - CPU add and remove: No action is needed.
+ * - Memory add/remove: Update the elfcorehdr to reflect the current memory layout.
+ *
+ * Prepare the new elfcorehdr and replace the existing elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
+{
+	if ((image->file_mode || image->elfcorehdr_updated) &&
+		((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+		(image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+		return;
+
+	update_crash_elfcorehdr(image);
+}
+#endif /* CONFIG_CRASH_HOTPLUG */
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 93c36a3aa618..21f38de7a8b6 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -8,6 +8,7 @@
 
 #define pr_fmt(fmt)	"kexec_file(Image): " fmt
 
+#include <linux/elf.h>
 #include <linux/err.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -92,16 +93,32 @@ static void *image_load(struct kimage *image,
 #ifdef CONFIG_CRASH_DUMP
 	if (image->type == KEXEC_TYPE_CRASH) {
 		/* load elf core header */
-		unsigned long headers_sz;
+		unsigned long headers_sz, pnum = 0;
 		void *headers;
 
-		ret = crash_prepare_headers(true, &headers, &headers_sz, NULL);
+		ret = crash_prepare_headers(true, &headers, &headers_sz, &pnum);
 		if (ret) {
 			pr_err("Preparing elf core header failed\n");
 			return ERR_PTR(ret);
 		}
 		image->elf_headers = headers;
 		image->elf_headers_sz = headers_sz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+		/*
+		 * The elfcorehdr segment size accounts for VMCOREINFO, kernel_map
+		 * maximum CPUs and maximum memory ranges.
+		 */
+		if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+			pnum = 2 + num_possible_cpus() + CONFIG_CRASH_MAX_MEMORY_RANGES;
+		else
+			pnum += 2 + num_possible_cpus();
+
+		if (pnum < (unsigned long)PN_XNUM)
+			image->elf_headers_sz = max(pnum_hdr_sz(pnum), headers_sz);
+		else
+			pr_err("number of Phdrs %lu exceeds max\n", pnum);
+#endif
 	}
 #endif
 
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index d0f73eb3f856..0016001f4d00 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -10,11 +10,11 @@
 
 #define pr_fmt(fmt) "kexec_file: " fmt
 
+#include <linux/elf.h>
 #include <linux/ioport.h>
 #include <linux/kernel.h>
 #include <linux/kexec.h>
 #include <linux/libfdt.h>
-#include <linux/memblock.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/slab.h>
@@ -39,38 +39,6 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 	return kexec_image_post_load_cleanup_default(image);
 }
 
-#ifdef CONFIG_CRASH_DUMP
-unsigned int arch_get_system_nr_ranges(void)
-{
-	/* for exclusion of crashkernel region */
-	unsigned int nr_ranges = 2 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
-	phys_addr_t start, end;
-	u64 i;
-
-	for_each_mem_range(i, &start, &end)
-		nr_ranges++;
-
-	return nr_ranges;
-}
-
-int arch_crash_populate_cmem(struct crash_mem *cmem)
-{
-	phys_addr_t start, end;
-	u64 i;
-
-	for_each_mem_range(i, &start, &end) {
-		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
-			return -EAGAIN;
-
-		cmem->ranges[cmem->nr_ranges].start = start;
-		cmem->ranges[cmem->nr_ranges].end = end - 1;
-		cmem->nr_ranges++;
-	}
-
-	return 0;
-}
-#endif
-
 /*
  * Tries to add the initrd and DTB to the image. If it is not possible to find
  * valid locations, this function will undo changes to the image and return non
@@ -98,6 +66,12 @@ int load_other_segments(struct kimage *image,
 		kbuf.bufsz = image->elf_headers_sz;
 		kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 		kbuf.memsz = image->elf_headers_sz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+		if (image->elf_headers_sz < pnum_hdr_sz(PN_XNUM))
+			image->elfcorehdr_index = image->nr_segments;
+#endif
+
 		kbuf.buf_align = SZ_64K; /* largest supported page size */
 		kbuf.buf_max = ULONG_MAX;
 		kbuf.top_down = true;
-- 
2.34.1




* [PATCH v15 22/23] riscv: kexec_file: Add support for crashkernel CMA reservation
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
crashkernel= command line option") and commit ab475510e042 ("kdump:
implement reserve_crashkernel_cma") added CMA support for kdump
crashkernel reservation. This allows the kernel to dynamically allocate
contiguous memory for crash dumping when needed, rather than permanently
reserving a fixed region at boot time.

So extend crashkernel CMA reservation support to riscv. The following
changes are made to enable CMA reservation:

- Parse and obtain the CMA reservation size along with other crashkernel
  parameters.
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump.
- Include the CMA-reserved ranges for kdump kernel to use, which was
  already done in of_kexec_alloc_and_setup_fdt().
- Exclude the CMA-reserved ranges from the crash kernel memory to
  prevent them from being exported through /proc/vmcore, which was
  already done in the crash core.

Update kernel-parameters.txt to document CMA support for crashkernel on
riscv architecture.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Paul Walmsley <pjw@kernel.org> # arch/riscv
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++--------
 arch/riscv/kernel/machine_kexec_file.c          |  2 +-
 arch/riscv/mm/init.c                            |  5 +++--
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 52742fab49a9..3ff3ddd516cf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1119,14 +1119,14 @@ Kernel parameters
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
 	crashkernel=size[KMG],cma
-			[KNL, X86, ARM64, PPC] Reserve additional crash kernel memory from
-			CMA. This reservation is usable by the first system's
-			userspace memory and kernel movable allocations (memory
-			balloon, zswap). Pages allocated from this memory range
-			will not be included in the vmcore so this should not
-			be used if dumping of userspace memory is intended and
-			it has to be expected that some movable kernel pages
-			may be missing from the dump.
+			[KNL, X86, ARM64, RISCV, PPC] Reserve additional crash
+			kernel memory from CMA. This reservation is usable by
+			the first system's userspace memory and kernel movable
+			allocations (memory balloon, zswap). Pages allocated
+			from this memory range will not be included in the vmcore
+			so this should not be used if dumping of userspace memory
+			is intended and it has to be expected that some movable
+			kernel pages may be missing from the dump.
 
 			A standard crashkernel reservation, as described above,
 			is still needed to hold the crash kernel and initrd.
diff --git a/arch/riscv/kernel/machine_kexec_file.c b/arch/riscv/kernel/machine_kexec_file.c
index 6e2a6747d187..42d847154e19 100644
--- a/arch/riscv/kernel/machine_kexec_file.c
+++ b/arch/riscv/kernel/machine_kexec_file.c
@@ -47,7 +47,7 @@ static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 unsigned int arch_get_system_nr_ranges(void)
 {
 	/* For exclusion of crashkernel region */
-	unsigned int nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
+	unsigned int nr_ranges = 2 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
 
 	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
 
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index decd7df40fa4..c848454b8349 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1295,7 +1295,7 @@ static inline void setup_vm_final(void)
  */
 static void __init arch_reserve_crashkernel(void)
 {
-	unsigned long long low_size = 0;
+	unsigned long long low_size = 0, cma_size = 0;
 	unsigned long long crash_base, crash_size;
 	bool high = false;
 	int ret;
@@ -1305,11 +1305,12 @@ static void __init arch_reserve_crashkernel(void)
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base,
-				&low_size, NULL, &high);
+				&low_size, &cma_size, &high);
 	if (ret)
 		return;
 
 	reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
+	reserve_crashkernel_cma(cma_size);
 }
 
 void __init paging_init(void)
-- 
2.34.1




* [PATCH v15 21/23] arm64: kexec_file: Add support for crashkernel CMA reservation
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the
crashkernel= command line option") and commit ab475510e042 ("kdump:
implement reserve_crashkernel_cma") added CMA support for kdump
crashkernel reservation.

Crash kernel memory reservation wastes production resources if too
large, risks kdump failure if too small, and faces allocation difficulties
on fragmented systems due to contiguous block constraints. The new
CMA-based crashkernel reservation scheme splits the "large fixed
reservation" into a "small fixed region + large CMA dynamic region": the
CMA memory is available to userspace during normal operation to avoid
waste, and is reclaimed for kdump upon crash—saving memory while
improving reliability.

So extend crashkernel CMA reservation support to arm64. The following
changes are made to enable CMA reservation:

- Parse and obtain the CMA reservation size along with other crashkernel
  parameters.
- Call reserve_crashkernel_cma() to allocate the CMA region for kdump.
- Include the CMA-reserved ranges for kdump kernel to use.
- Exclude the CMA-reserved ranges from the crash kernel memory to
  prevent them from being exported through /proc/vmcore, which is already
  done in the crash core.

Update kernel-parameters.txt to document CMA support for crashkernel on
arm64 architecture.

Tested-by: Breno Leitao <leitao@debian.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
v7:
- Correct the inclusion of CMA-reserved ranges for kdump
  kernel in of/kexec.
v3:
- Add Acked-by.
v2:
- Free cmem in prepare_elf_headers()
- Add the motivation.
---
 Documentation/admin-guide/kernel-parameters.txt | 2 +-
 arch/arm64/kernel/machine_kexec_file.c          | 2 +-
 arch/arm64/mm/init.c                            | 5 +++--
 drivers/of/fdt.c                                | 9 +++++----
 drivers/of/kexec.c                              | 9 +++++++++
 include/linux/crash_reserve.h                   | 4 +++-
 6 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 4d0f545fb3ec..52742fab49a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1119,7 +1119,7 @@ Kernel parameters
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
 	crashkernel=size[KMG],cma
-			[KNL, X86, ppc] Reserve additional crash kernel memory from
+			[KNL, X86, ARM64, PPC] Reserve additional crash kernel memory from
 			CMA. This reservation is usable by the first system's
 			userspace memory and kernel movable allocations (memory
 			balloon, zswap). Pages allocated from this memory range
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 14e65351133e..d0f73eb3f856 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -43,7 +43,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 unsigned int arch_get_system_nr_ranges(void)
 {
 	/* for exclusion of crashkernel region */
-	unsigned int nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
+	unsigned int nr_ranges = 2 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
 	phys_addr_t start, end;
 	u64 i;
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 97987f850a33..227f58522dad 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -96,8 +96,8 @@ phys_addr_t __ro_after_init arm64_dma_phys_limit;
 
 static void __init arch_reserve_crashkernel(void)
 {
+	unsigned long long crash_base, crash_size, cma_size = 0;
 	unsigned long long low_size = 0;
-	unsigned long long crash_base, crash_size;
 	bool high = false;
 	int ret;
 
@@ -106,11 +106,12 @@ static void __init arch_reserve_crashkernel(void)
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base,
-				&low_size, NULL, &high);
+				&low_size, &cma_size, &high);
 	if (ret)
 		return;
 
 	reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
+	reserve_crashkernel_cma(cma_size);
 }
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 82f7327c59ea..0470acbd1fcf 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -880,11 +880,12 @@ static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND;
 /*
  * The main usage of linux,usable-memory-range is for crash dump kernel.
  * Originally, the number of usable-memory regions is one. Now there may
- * be two regions, low region and high region.
- * To make compatibility with existing user-space and older kdump, the low
- * region is always the last range of linux,usable-memory-range if exist.
+ * be 2 + CRASHK_CMA_RANGES_MAX regions, low region, high region and cma
+ * regions. To make compatibility with existing user-space and older kdump,
+ * the high and low region are always the first two ranges of
+ * linux,usable-memory-range if exist.
  */
-#define MAX_USABLE_RANGES		2
+#define MAX_USABLE_RANGES		(2 + CRASHK_CMA_RANGES_MAX)
 
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index b6837e299e7f..029903b986cb 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -458,6 +458,15 @@ void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
 			if (ret)
 				goto out;
 		}
+
+		for (int i = 0; i < crashk_cma_cnt; i++) {
+			ret = fdt_appendprop_addrrange(fdt, 0, chosen_node,
+					"linux,usable-memory-range",
+					crashk_cma_ranges[i].start,
+					crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1);
+			if (ret)
+				goto out;
+		}
 #endif
 	}
 
diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h
index f0dc03d94ca2..30864d90d7f5 100644
--- a/include/linux/crash_reserve.h
+++ b/include/linux/crash_reserve.h
@@ -14,9 +14,11 @@
 extern struct resource crashk_res;
 extern struct resource crashk_low_res;
 extern struct range crashk_cma_ranges[];
+
+#define CRASHK_CMA_RANGES_MAX 4
 #if defined(CONFIG_CMA) && defined(CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION)
 #define CRASHKERNEL_CMA
-#define CRASHKERNEL_CMA_RANGES_MAX 4
+#define CRASHKERNEL_CMA_RANGES_MAX (CRASHK_CMA_RANGES_MAX)
 extern int crashk_cma_cnt;
 #else
 #define crashk_cma_cnt 0
-- 
2.34.1




* [PATCH v15 20/23] powerpc/kexec_file: Use crash_exclude_core_ranges() helper
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

The exclusion of the crashk_res and crashk_cma ranges from crash memory on
powerpc is almost identical to the generic crash_exclude_core_ranges().

By introducing the architecture-specific arch_crash_exclude_mem_range()
function with a default implementation of crash_exclude_mem_range(),
and using crash_exclude_mem_range_guarded as powerpc's separate
implementation, the generic crash_exclude_core_ranges() helper function
can be reused.
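
For readers unfamiliar with the operation being shared here, the following is
a minimal user-space sketch of excluding one inclusive range from an array of
inclusive ranges. It is not the kernel implementation: struct range,
exclude_range() and the fixed-capacity array are assumptions made for
illustration, and the sketch returns an error when a split would overflow the
array rather than growing it.

```c
#include <assert.h>

/* Inclusive [start, end] range, as used for crash memory ranges. */
struct range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Hypothetical helper: remove [mstart, mend] from r[0..nr-1].
 * Returns the new number of ranges, or -1 if a split needs more
 * than 'max' slots.
 */
int exclude_range(struct range *r, int nr, int max,
		  unsigned long long mstart, unsigned long long mend)
{
	assert(mstart <= mend);

	for (int i = 0; i < nr; i++) {
		unsigned long long start = r[i].start, end = r[i].end;

		if (mend < start || mstart > end)
			continue;		/* no overlap */

		if (mstart <= start && mend >= end) {
			/* Fully covered: drop the range. */
			for (int j = i; j < nr - 1; j++)
				r[j] = r[j + 1];
			nr--;
			i--;
		} else if (mstart > start && mend < end) {
			/* Hole in the middle: split into two ranges. */
			if (nr >= max)
				return -1;
			for (int j = nr; j > i + 1; j--)
				r[j] = r[j - 1];
			r[i].end = mstart - 1;
			r[i + 1].start = mend + 1;
			r[i + 1].end = end;
			nr++;
			i++;
		} else if (mstart <= start) {
			r[i].start = mend + 1;	/* trim the front */
		} else {
			r[i].end = mstart - 1;	/* trim the tail */
		}
	}

	return nr;
}
```

The split case is the one that can need an extra slot, which is why an
architecture may want its own variant of the exclusion helper.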

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shivang Upadhyay <shivangu@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/powerpc/include/asm/kexec_ranges.h |  3 ---
 arch/powerpc/kexec/crash.c              |  2 +-
 arch/powerpc/kexec/ranges.c             | 16 ++++------------
 include/linux/crash_core.h              |  4 ++++
 kernel/crash_core.c                     | 19 +++++++++++++------
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index ad95e3792d10..8489e844b447 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,9 +7,6 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
-int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
-				    unsigned long long mstart,
-				    unsigned long long mend);
 int get_exclude_memory_ranges(struct crash_mem **mem_ranges);
 int get_reserved_memory_ranges(struct crash_mem **mem_ranges);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
index d634db67becc..775895f31037 100644
--- a/arch/powerpc/kexec/crash.c
+++ b/arch/powerpc/kexec/crash.c
@@ -513,7 +513,7 @@ static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *
 		base_addr = PFN_PHYS(mn->start_pfn);
 		size = mn->nr_pages * PAGE_SIZE;
 		end = base_addr + size - 1;
-		ret = crash_exclude_mem_range_guarded(&cmem, base_addr, end);
+		ret = arch_crash_exclude_mem_range(&cmem, base_addr, end);
 		if (ret) {
 			pr_err("Failed to remove hot-unplugged memory from crash memory ranges\n");
 			goto out;
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index b2fb78562cdc..539061d14a77 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -551,9 +551,9 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)
 #endif /* CONFIG_KEXEC_FILE */
 
 #ifdef CONFIG_CRASH_DUMP
-int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,
-					   unsigned long long mstart,
-					   unsigned long long mend)
+int arch_crash_exclude_mem_range(struct crash_mem **mem_ranges,
+				 unsigned long long mstart,
+				 unsigned long long mend)
 {
 	struct crash_mem *tmem = *mem_ranges;
 
@@ -602,18 +602,10 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges)
 			sort_memory_ranges(*mem_ranges, true);
 	}
 
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end);
+	ret = crash_exclude_core_ranges(mem_ranges);
 	if (ret)
 		goto out;
 
-	for (i = 0; i < crashk_cma_cnt; ++i) {
-		ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start,
-					      crashk_cma_ranges[i].end);
-		if (ret)
-			goto out;
-	}
-
 	/*
 	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
 	 *        regions are exported to save their context at the time of
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 43baf9c87355..1ae2c0eb2eb3 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -67,6 +67,7 @@ extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_ma
 				       void **addr, unsigned long *sz);
 extern int crash_prepare_headers(int need_kernel_map, void **addr,
 				 unsigned long *sz, unsigned long *nr_mem_ranges);
+extern int crash_exclude_core_ranges(struct crash_mem **cmem);
 
 struct kimage;
 struct kexec_segment;
@@ -87,6 +88,9 @@ extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 extern unsigned int arch_get_system_nr_ranges(void);
 extern int arch_crash_populate_cmem(struct crash_mem *cmem);
 extern int arch_crash_exclude_ranges(struct crash_mem *cmem);
+extern int arch_crash_exclude_mem_range(struct crash_mem **mem,
+					unsigned long long mstart,
+					unsigned long long mend);
 
 #else /* !CONFIG_CRASH_DUMP*/
 struct pt_regs;
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 481babc29131..2b36aa9fade0 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -285,24 +285,31 @@ unsigned int __weak arch_get_system_nr_ranges(void) { return 0; }
 int __weak arch_crash_populate_cmem(struct crash_mem *cmem) { return -1; }
 int __weak arch_crash_exclude_ranges(struct crash_mem *cmem) { return 0; }
 
-static int crash_exclude_core_ranges(struct crash_mem *cmem)
+int __weak arch_crash_exclude_mem_range(struct crash_mem **mem,
+					unsigned long long mstart,
+					unsigned long long mend)
+{
+	return crash_exclude_mem_range(*mem, mstart, mend);
+}
+
+int crash_exclude_core_ranges(struct crash_mem **cmem)
 {
 	int ret, i;
 
 	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	ret = arch_crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
 	if (ret)
 		return ret;
 
 	if (crashk_low_res.end) {
-		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		ret = arch_crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
 		if (ret)
 			return ret;
 	}
 
 	for (i = 0; i < crashk_cma_cnt; ++i) {
-		ret = crash_exclude_mem_range(cmem, crashk_cma_ranges[i].start,
-					      crashk_cma_ranges[i].end);
+		ret = arch_crash_exclude_mem_range(cmem, crashk_cma_ranges[i].start,
+						   crashk_cma_ranges[i].end);
 		if (ret)
 			return ret;
 	}
@@ -329,7 +336,7 @@ int crash_prepare_headers(int need_kernel_map, void **addr, unsigned long *sz,
 	if (ret)
 		goto out;
 
-	ret = crash_exclude_core_ranges(cmem);
+	ret = crash_exclude_core_ranges(&cmem);
 	if (ret)
 		goto out;
 
-- 
2.34.1




* [PATCH v15 19/23] LoongArch: kexec_file: Use crash_prepare_headers() helper to simplify code
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Replace the existing prepare_elf_headers() with the newly introduced
crash_prepare_headers() helper, so that cmem allocation and crash kernel
memory exclusion are handled in the crash core. This reduces code
duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Use for_each_mem_range() to traverse
  the memory ranges and pre-count their maximum number.

- arch_crash_populate_cmem(). Use for_each_mem_range() to traverse
  the memory ranges and fill them into cmem.

Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Baoquan He <bhe@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/loongarch/kernel/machine_kexec_file.c | 48 +++++++---------------
 1 file changed, 15 insertions(+), 33 deletions(-)

diff --git a/arch/loongarch/kernel/machine_kexec_file.c b/arch/loongarch/kernel/machine_kexec_file.c
index 3c369124586e..f3101bea9e45 100644
--- a/arch/loongarch/kernel/machine_kexec_file.c
+++ b/arch/loongarch/kernel/machine_kexec_file.c
@@ -56,52 +56,34 @@ static void cmdline_add_initrd(struct kimage *image, unsigned long *cmdline_tmpl
 }
 
 #ifdef CONFIG_CRASH_DUMP
-
-static int prepare_elf_headers(void **addr, unsigned long *sz)
+unsigned int arch_get_system_nr_ranges(void)
 {
-	int ret, nr_ranges;
-	uint64_t i;
+	/* for exclusion of crashkernel region */
+	int nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	phys_addr_t start, end;
-	struct crash_mem *cmem;
+	uint64_t i;
 
-	/* for exclusion of crashkernel region */
-	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	for_each_mem_range(i, &start, &end)
 		nr_ranges++;
 
-	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
-	if (!cmem)
-		return -ENOMEM;
+	return nr_ranges;
+}
+
+int arch_crash_populate_cmem(struct crash_mem *cmem)
+{
+	phys_addr_t start, end;
+	uint64_t i;
 
-	cmem->max_nr_ranges = nr_ranges;
-	cmem->nr_ranges = 0;
 	for_each_mem_range(i, &start, &end) {
-		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges)) {
-			ret = -EAGAIN;
-			goto out;
-		}
+		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+			return -EAGAIN;
 
 		cmem->ranges[cmem->nr_ranges].start = start;
 		cmem->ranges[cmem->nr_ranges].end = end - 1;
 		cmem->nr_ranges++;
 	}
 
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
-	if (ret < 0)
-		goto out;
-
-	if (crashk_low_res.end) {
-		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
-		if (ret < 0)
-			goto out;
-	}
-
-	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
-
-out:
-	kfree(cmem);
-	return ret;
+	return 0;
 }
 
 /*
@@ -169,7 +151,7 @@ int load_other_segments(struct kimage *image,
 		void *headers;
 		unsigned long headers_sz;
 
-		ret = prepare_elf_headers(&headers, &headers_sz);
+		ret = crash_prepare_headers(true, &headers, &headers_sz, NULL);
 		if (ret < 0) {
 			pr_err("Preparing elf core header failed\n");
 			goto out_err;
-- 
2.34.1




* [PATCH v15 18/23] riscv: kexec_file: Use crash_prepare_headers() helper to simplify code
From: Jinjie Ruan @ 2026-06-01  9:48 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Replace the existing prepare_elf_headers() with the newly introduced
crash_prepare_headers() helper, so that cmem allocation and crash kernel
memory exclusion are handled in the crash core. This reduces code
duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Walk System RAM with
  get_nr_ram_ranges_callback() to pre-count the max number of memory
  ranges.

- arch_crash_populate_cmem(). Walk System RAM with
  prepare_elf64_ram_headers_callback() to collect the memory ranges and
  fill them into cmem.

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Guo Ren <guoren@kernel.org>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/riscv/kernel/machine_kexec_file.c | 49 +++++++-------------------
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/arch/riscv/kernel/machine_kexec_file.c b/arch/riscv/kernel/machine_kexec_file.c
index f3576dc0513f..6e2a6747d187 100644
--- a/arch/riscv/kernel/machine_kexec_file.c
+++ b/arch/riscv/kernel/machine_kexec_file.c
@@ -44,6 +44,16 @@ static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 	return 0;
 }
 
+unsigned int arch_get_system_nr_ranges(void)
+{
+	/* For exclusion of crashkernel region */
+	unsigned int nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
+
+	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
+
+	return nr_ranges;
+}
+
 static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
 	struct crash_mem *cmem = arg;
@@ -58,42 +68,9 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 	return 0;
 }
 
-static int prepare_elf_headers(void **addr, unsigned long *sz)
+int arch_crash_populate_cmem(struct crash_mem *cmem)
 {
-	struct crash_mem *cmem;
-	unsigned int nr_ranges;
-	int ret;
-
-	/* For exclusion of crashkernel region */
-	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
-	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
-
-	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
-	if (!cmem)
-		return -ENOMEM;
-
-	cmem->max_nr_ranges = nr_ranges;
-	cmem->nr_ranges = 0;
-	ret = walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
-	if (ret)
-		goto out;
-
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
-	if (ret)
-		goto out;
-
-	if (crashk_low_res.end) {
-		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
-		if (ret)
-			goto out;
-	}
-
-	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
-
-out:
-	kfree(cmem);
-	return ret;
+	return walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
 }
 
 static char *setup_kdump_cmdline(struct kimage *image, char *cmdline,
@@ -285,7 +262,7 @@ int load_extra_segments(struct kimage *image, unsigned long kernel_start,
 	if (image->type == KEXEC_TYPE_CRASH) {
 		void *headers;
 		unsigned long headers_sz;
-		ret = prepare_elf_headers(&headers, &headers_sz);
+		ret = crash_prepare_headers(true, &headers, &headers_sz, NULL);
 		if (ret) {
 			pr_err("Preparing elf core header failed\n");
 			goto out;
-- 
2.34.1




* [PATCH v15 17/23] x86: kexec_file: Use crash_prepare_headers() helper to simplify code
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Replace the existing prepare_elf_headers() with the newly introduced
crash_prepare_headers() helper, so that cmem allocation and crash kernel
memory exclusion are handled in the crash core. This reduces code
duplication.

Only the following three architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Walk System RAM with
  get_nr_ram_ranges_callback() to pre-count the max number of memory
  ranges.

- arch_crash_populate_cmem(). Walk System RAM with
  prepare_elf64_ram_headers_callback() to collect the memory ranges and
  fill them into cmem.

- arch_crash_exclude_ranges(). Exclude the low 1M for x86.
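
As an illustration of why excluding a region can need extra slots, here is
a minimal userspace sketch of range exclusion over a sorted array. It is
not the kernel's crash_exclude_mem_range(); the helper name
exclude_range(), the simplified struct range, and the -ENOMEM return on a
full array are assumptions made for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct range { unsigned long long start, end; };	/* end is inclusive */

/*
 * Hypothetical model: remove [mstart, mend] from a sorted, non-overlapping
 * array of ranges. Returns 0 on success, or -ENOMEM when a split needs a
 * slot that the array does not have.
 */
static int exclude_range(struct range *r, unsigned int *nr, unsigned int max,
			 unsigned long long mstart, unsigned long long mend)
{
	unsigned int i = 0;

	assert(mstart <= mend);

	while (i < *nr) {
		unsigned long long s = r[i].start, e = r[i].end;

		if (mend < s || mstart > e) {		/* no overlap */
			i++;
		} else if (mstart <= s && mend >= e) {	/* fully covered: drop */
			memmove(&r[i], &r[i + 1], (*nr - i - 1) * sizeof(*r));
			(*nr)--;
		} else if (mstart <= s) {		/* head is trimmed */
			r[i].start = mend + 1;
			i++;
		} else if (mend >= e) {			/* tail is trimmed */
			r[i].end = mstart - 1;
			i++;
		} else {				/* hole in the middle: split */
			if (*nr >= max)
				return -ENOMEM;
			memmove(&r[i + 2], &r[i + 1], (*nr - i - 1) * sizeof(*r));
			r[i].end = mstart - 1;
			r[i + 1].start = mend + 1;
			r[i + 1].end = e;
			(*nr)++;
			i += 2;
		}
	}
	return 0;
}
```

In this model, excluding the low 1M from a range that starts at 0 only
trims the head and needs no new slot, while excluding a region from the
middle of a range consumes one additional slot. That is the reason the
pre-count reserves slots beyond the number of System RAM ranges.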

While at it, remove the now-unused "nr_mem_ranges" variable in
arch_crash_handle_hotplug_event().

Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/x86/kernel/crash.c | 89 +++++------------------------------------
 1 file changed, 11 insertions(+), 78 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index a1089907728d..7145b00da4ee 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -153,16 +153,8 @@ static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 	return 0;
 }
 
-/* Gather all the required information to prepare elf headers for ram regions */
-static struct crash_mem *fill_up_crash_elf_data(void)
+unsigned int arch_get_system_nr_ranges(void)
 {
-	unsigned int nr_ranges = 0;
-	struct crash_mem *cmem;
-
-	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
-	if (!nr_ranges)
-		return NULL;
-
 	/*
 	 * Exclusion of crash region, crashk_low_res and/or crashk_cma_ranges
 	 * may cause range splits. So add extra slots here.
@@ -177,49 +169,16 @@ static struct crash_mem *fill_up_crash_elf_data(void)
 	 * But in order to lest the low 1M could be changed in the future,
 	 * (e.g. [start, 1M]), add a extra slot.
 	 */
-	nr_ranges += 3 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
-	cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
-	if (!cmem)
-		return NULL;
-
-	cmem->max_nr_ranges = nr_ranges;
+	unsigned int nr_ranges = 3 + crashk_cma_cnt + CRASH_HOTPLUG_SAFETY_PADDING;
 
-	return cmem;
+	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
+	return nr_ranges;
 }
 
-/*
- * Look for any unwanted ranges between mstart, mend and remove them. This
- * might lead to split and split ranges are put in cmem->ranges[] array
- */
-static int elf_header_exclude_ranges(struct crash_mem *cmem)
+int arch_crash_exclude_ranges(struct crash_mem *cmem)
 {
-	int ret = 0;
-	int i;
-
 	/* Exclude the low 1M because it is always reserved */
-	ret = crash_exclude_mem_range(cmem, 0, SZ_1M - 1);
-	if (ret)
-		return ret;
-
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
-	if (ret)
-		return ret;
-
-	if (crashk_low_res.end)
-		ret = crash_exclude_mem_range(cmem, crashk_low_res.start,
-					      crashk_low_res.end);
-	if (ret)
-		return ret;
-
-	for (i = 0; i < crashk_cma_cnt; ++i) {
-		ret = crash_exclude_mem_range(cmem, crashk_cma_ranges[i].start,
-					      crashk_cma_ranges[i].end);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
+	return crash_exclude_mem_range(cmem, 0, SZ_1M - 1);
 }
 
 static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
@@ -236,35 +195,9 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 	return 0;
 }
 
-/* Prepare elf headers. Return addr and size */
-static int prepare_elf_headers(void **addr, unsigned long *sz,
-			       unsigned long *nr_mem_ranges)
+int arch_crash_populate_cmem(struct crash_mem *cmem)
 {
-	struct crash_mem *cmem;
-	int ret;
-
-	cmem = fill_up_crash_elf_data();
-	if (!cmem)
-		return -ENOMEM;
-
-	ret = walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
-	if (ret)
-		goto out;
-
-	/* Exclude unwanted mem ranges */
-	ret = elf_header_exclude_ranges(cmem);
-	if (ret)
-		goto out;
-
-	/* Return the computed number of memory ranges, for hotplug usage */
-	*nr_mem_ranges = cmem->nr_ranges;
-
-	/* By default prepare 64bit headers */
-	ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
-
-out:
-	vfree(cmem);
-	return ret;
+	return walk_system_ram_res(0, -1, cmem, prepare_elf64_ram_headers_callback);
 }
 #endif
 
@@ -422,7 +355,8 @@ int crash_load_segments(struct kimage *image)
 				  .buf_max = ULONG_MAX, .top_down = false };
 
 	/* Prepare elf headers and add a segment */
-	ret = prepare_elf_headers(&kbuf.buffer, &kbuf.bufsz, &pnum);
+	ret = crash_prepare_headers(IS_ENABLED(CONFIG_X86_64), &kbuf.buffer,
+				    &kbuf.bufsz, &pnum);
 	if (ret)
 		return ret;
 
@@ -515,7 +449,6 @@ unsigned int arch_crash_get_elfcorehdr_size(void)
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *elfbuf = NULL, *old_elfcorehdr;
-	unsigned long nr_mem_ranges;
 	unsigned long mem, memsz;
 	unsigned long elfsz = 0;
 
@@ -533,7 +466,7 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 	 * Create the new elfcorehdr reflecting the changes to CPU and/or
 	 * memory resources.
 	 */
-	if (prepare_elf_headers(&elfbuf, &elfsz, &nr_mem_ranges)) {
+	if (crash_prepare_headers(IS_ENABLED(CONFIG_X86_64), &elfbuf, &elfsz, NULL)) {
 		pr_err("unable to create new elfcorehdr");
 		goto out;
 	}
-- 
2.34.1




* [PATCH v15 16/23] arm64: kexec_file: Use crash_prepare_headers() helper to simplify code
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

Replace the existing prepare_elf_headers() with the newly introduced
crash_prepare_headers() helper, so that cmem allocation and crash kernel
memory exclusion are handled in the crash core. This reduces code
duplication.

Only the following two architecture functions need to be implemented:
- arch_get_system_nr_ranges(). Use for_each_mem_range() to traverse
  the memory ranges and pre-count their maximum number.

- arch_crash_populate_cmem(). Use for_each_mem_range() to traverse
  the memory ranges and fill them into cmem.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/include/asm/kexec.h         |  1 -
 arch/arm64/kernel/kexec_image.c        |  2 +-
 arch/arm64/kernel/machine_kexec_file.c | 46 ++++++++------------------
 3 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 7ffa2ff5fcfd..892e5bebda95 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -128,7 +128,6 @@ extern int load_other_segments(struct kimage *image,
 		unsigned long kernel_load_addr, unsigned long kernel_size,
 		char *initrd, unsigned long initrd_len,
 		char *cmdline);
-extern int prepare_elf_headers(void **addr, unsigned long *sz);
 #endif
 
 #endif /* __ASSEMBLER__ */
diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 424b9527db09..93c36a3aa618 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -95,7 +95,7 @@ static void *image_load(struct kimage *image,
 		unsigned long headers_sz;
 		void *headers;
 
-		ret = prepare_elf_headers(&headers, &headers_sz);
+		ret = crash_prepare_headers(true, &headers, &headers_sz, NULL);
 		if (ret) {
 			pr_err("Preparing elf core header failed\n");
 			return ERR_PTR(ret);
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 8a96fb68b88d..14e65351133e 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -40,52 +40,34 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 }
 
 #ifdef CONFIG_CRASH_DUMP
-int prepare_elf_headers(void **addr, unsigned long *sz)
+unsigned int arch_get_system_nr_ranges(void)
 {
-	struct crash_mem *cmem;
-	unsigned int nr_ranges;
-	int ret;
-	u64 i;
+	/* for exclusion of crashkernel region */
+	unsigned int nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	phys_addr_t start, end;
+	u64 i;
 
-	/* for exclusion of crashkernel region */
-	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	for_each_mem_range(i, &start, &end)
 		nr_ranges++;
 
-	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
-	if (!cmem)
-		return -ENOMEM;
+	return nr_ranges;
+}
+
+int arch_crash_populate_cmem(struct crash_mem *cmem)
+{
+	phys_addr_t start, end;
+	u64 i;
 
-	cmem->max_nr_ranges = nr_ranges;
-	cmem->nr_ranges = 0;
 	for_each_mem_range(i, &start, &end) {
-		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges)) {
-			ret = -EAGAIN;
-			goto out;
-		}
+		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+			return -EAGAIN;
 
 		cmem->ranges[cmem->nr_ranges].start = start;
 		cmem->ranges[cmem->nr_ranges].end = end - 1;
 		cmem->nr_ranges++;
 	}
 
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
-	if (ret)
-		goto out;
-
-	if (crashk_low_res.end) {
-		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
-		if (ret)
-			goto out;
-	}
-
-	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
-
-out:
-	kfree(cmem);
-	return ret;
+	return 0;
 }
 #endif
 
-- 
2.34.1




* [PATCH v15 15/23] crash: Add crash_prepare_headers() to exclude crash kernel memory
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

The crash memory allocation and the exclusion of the crashk_res,
crashk_low_res and crashk_cma memory are almost identical across
architectures. Handling them in the crash core eliminates a lot of
duplication, so add a crash_prepare_headers() helper that does this in
common code.

To achieve the above goal, three architecture-specific functions are
introduced:

- arch_get_system_nr_ranges(). Pre-counts the max number of memory ranges.

- arch_crash_populate_cmem(). Collects the memory ranges and fills them
  into cmem.

- arch_crash_exclude_ranges(). Excludes additional architecture-specific
  memory ranges. The default implementation does nothing.
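
The division of labour between the common code and the three hooks can be
sketched in userspace C as below. This is only a simplified model under
stated assumptions: the struct definitions are stand-ins for the kernel
types, the two-bank memory map and prepare_ranges() are hypothetical, the
hooks are plain static functions rather than __weak symbols, and the core
crashkernel exclusion and ELF header generation are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel types, not the real ones. */
struct range { unsigned long long start, end; };	/* end is inclusive */
struct crash_mem {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical memory map of a toy architecture with two RAM banks. */
static const struct range toy_banks[] = {
	{ 0x00000000ULL, 0x3fffffffULL },
	{ 0x80000000ULL, 0xbfffffffULL },
};
#define TOY_NR_BANKS ((unsigned int)(sizeof(toy_banks) / sizeof(toy_banks[0])))

/* Hook 1: upper bound on the number of ranges, with slack for splits. */
static unsigned int arch_get_system_nr_ranges(void)
{
	return 2 + TOY_NR_BANKS;
}

/* Hook 2: copy the memory map into cmem, never past max_nr_ranges. */
static int arch_crash_populate_cmem(struct crash_mem *cmem)
{
	assert(cmem != NULL);
	for (unsigned int i = 0; i < TOY_NR_BANKS; i++) {
		if (cmem->nr_ranges >= cmem->max_nr_ranges)
			return -EAGAIN;
		cmem->ranges[cmem->nr_ranges++] = toy_banks[i];
	}
	return 0;
}

/* Hook 3: extra architecture-specific exclusions; nothing to do here. */
static int arch_crash_exclude_ranges(struct crash_mem *cmem)
{
	(void)cmem;
	return 0;
}

/*
 * Model of the common flow: count, allocate, populate, exclude.
 * Unlike the real helper, the populated cmem is handed back to the
 * caller, who must free() it.
 */
static int prepare_ranges(struct crash_mem **out)
{
	unsigned int max = arch_get_system_nr_ranges();
	struct crash_mem *cmem;
	int ret;

	if (!max)
		return -ENOMEM;

	cmem = calloc(1, sizeof(*cmem) + max * sizeof(cmem->ranges[0]));
	if (!cmem)
		return -ENOMEM;
	cmem->max_nr_ranges = max;

	ret = arch_crash_populate_cmem(cmem);
	if (!ret)
		ret = arch_crash_exclude_ranges(cmem);
	if (ret) {
		free(cmem);
		return ret;
	}

	*out = cmem;
	return 0;
}
```

With this split, an architecture only describes its memory map and any
special exclusions, while sizing, allocation and cleanup live in one
place.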

Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 include/linux/crash_core.h |  5 +++
 kernel/crash_core.c        | 82 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index d4762e000098..43baf9c87355 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -65,6 +65,8 @@ extern int crash_exclude_mem_range(struct crash_mem *mem,
 				   unsigned long long mend);
 extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 				       void **addr, unsigned long *sz);
+extern int crash_prepare_headers(int need_kernel_map, void **addr,
+				 unsigned long *sz, unsigned long *nr_mem_ranges);
 
 struct kimage;
 struct kexec_segment;
@@ -82,6 +84,9 @@ int kexec_should_crash(struct task_struct *p);
 int kexec_crash_loaded(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
+extern unsigned int arch_get_system_nr_ranges(void);
+extern int arch_crash_populate_cmem(struct crash_mem *cmem);
+extern int arch_crash_exclude_ranges(struct crash_mem *cmem);
 
 #else /* !CONFIG_CRASH_DUMP*/
 struct pt_regs;
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 4f21fc3b108b..481babc29131 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -168,9 +168,6 @@ static inline resource_size_t crash_resource_size(const struct resource *res)
 	return !res->end ? 0 : resource_size(res);
 }
 
-
-
-
 int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 			  void **addr, unsigned long *sz)
 {
@@ -272,6 +269,85 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
 	return 0;
 }
 
+static struct crash_mem *alloc_cmem(unsigned int nr_ranges)
+{
+	struct crash_mem *cmem;
+
+	cmem = kvzalloc_flex(*cmem, ranges, nr_ranges);
+	if (!cmem)
+		return NULL;
+
+	cmem->max_nr_ranges = nr_ranges;
+	return cmem;
+}
+
+unsigned int __weak arch_get_system_nr_ranges(void) { return 0; }
+int __weak arch_crash_populate_cmem(struct crash_mem *cmem) { return -1; }
+int __weak arch_crash_exclude_ranges(struct crash_mem *cmem) { return 0; }
+
+static int crash_exclude_core_ranges(struct crash_mem *cmem)
+{
+	int ret, i;
+
+	/* Exclude crashkernel region */
+	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		return ret;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			return ret;
+	}
+
+	for (i = 0; i < crashk_cma_cnt; ++i) {
+		ret = crash_exclude_mem_range(cmem, crashk_cma_ranges[i].start,
+					      crashk_cma_ranges[i].end);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int crash_prepare_headers(int need_kernel_map, void **addr, unsigned long *sz,
+			  unsigned long *nr_mem_ranges)
+{
+	unsigned int max_nr_ranges;
+	struct crash_mem *cmem;
+	int ret;
+
+	max_nr_ranges = arch_get_system_nr_ranges();
+	if (!max_nr_ranges)
+		return -ENOMEM;
+
+	cmem = alloc_cmem(max_nr_ranges);
+	if (!cmem)
+		return -ENOMEM;
+
+	ret = arch_crash_populate_cmem(cmem);
+	if (ret)
+		goto out;
+
+	ret = crash_exclude_core_ranges(cmem);
+	if (ret)
+		goto out;
+
+	ret = arch_crash_exclude_ranges(cmem);
+	if (ret)
+		goto out;
+
+	/* Return the computed number of memory ranges, for hotplug usage */
+	if (nr_mem_ranges)
+		*nr_mem_ranges = cmem->nr_ranges;
+
+	ret = crash_prepare_elf64_headers(cmem, need_kernel_map, addr, sz);
+
+out:
+	kvfree(cmem);
+	return ret;
+}
+
 /**
  * crash_exclude_mem_range - exclude a mem range for existing ranges
  * @mem: mem->range contains an array of ranges sorted in ascending order
-- 
2.34.1




* [PATCH v15 14/23] LoongArch: kexec_file: Fix TOCTOU buffer overflow via memory region padding
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

The Sashiko AI code review pointed out a TOCTOU (Time-of-Check to
Time-of-Use) race in prepare_elf_headers() between the first pass, which
counts the System RAM ranges, and the second pass, which populates them.
If a memory hotplug event occurs between the two passes, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.

Fix this by adding `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to the
flexible array allocation upfront, which absorbs memory regions added
concurrently. Also add a defensive bounds check that returns -EAGAIN if
the array would still overrun, which closes the overflow window.
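
The race and the two-part fix can be modelled in userspace as follows.
This is a sketch under stated assumptions, not the kernel code: the global
sim_nr_regions is a hypothetical stand-in for the memblock list,
SAFETY_PADDING mirrors the 128 slots mentioned above, and the helper names
alloc_for_current_map() and populate() are invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SAFETY_PADDING 128	/* stands in for CRASH_HOTPLUG_SAFETY_PADDING */

struct range { unsigned long long start, end; };	/* end is inclusive */
struct crash_mem {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical memory map size; "hotplug" may grow it between the passes. */
static unsigned int sim_nr_regions = 4;

/* Pass 1: size the array from the current region count plus padding. */
static struct crash_mem *alloc_for_current_map(unsigned int padding)
{
	unsigned int n = 2 + padding + sim_nr_regions;
	struct crash_mem *cmem;

	cmem = calloc(1, sizeof(*cmem) + n * sizeof(cmem->ranges[0]));
	if (cmem)
		cmem->max_nr_ranges = n;
	return cmem;
}

/* Pass 2: fill the array, refusing to write past the allocation. */
static int populate(struct crash_mem *cmem)
{
	assert(cmem != NULL);
	for (unsigned int i = 0; i < sim_nr_regions; i++) {
		if (cmem->nr_ranges >= cmem->max_nr_ranges)
			return -EAGAIN;	/* map grew beyond the padding */

		/* Each simulated region covers 1 GiB. */
		cmem->ranges[cmem->nr_ranges].start = (unsigned long long)i << 30;
		cmem->ranges[cmem->nr_ranges].end =
			(((unsigned long long)i + 1) << 30) - 1;
		cmem->nr_ranges++;
	}
	return 0;
}
```

If sim_nr_regions grows between the two calls, the padded allocation still
has room for the new regions. With a padding of 0 the same growth makes
populate() stop with -EAGAIN instead of writing out of bounds.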

Cc: Youling Tang <tangyouling@kylinos.cn>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: stable@vger.kernel.org
Fixes: 1bcca8620a91 ("LoongArch: Add crash dump support for kexec_file")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/loongarch/kernel/machine_kexec_file.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/kernel/machine_kexec_file.c b/arch/loongarch/kernel/machine_kexec_file.c
index 5584b798ba46..3c369124586e 100644
--- a/arch/loongarch/kernel/machine_kexec_file.c
+++ b/arch/loongarch/kernel/machine_kexec_file.c
@@ -64,7 +64,8 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 	phys_addr_t start, end;
 	struct crash_mem *cmem;
 
-	nr_ranges = 2; /* for exclusion of crashkernel region */
+	/* for exclusion of crashkernel region */
+	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	for_each_mem_range(i, &start, &end)
 		nr_ranges++;
 
@@ -75,6 +76,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 	cmem->max_nr_ranges = nr_ranges;
 	cmem->nr_ranges = 0;
 	for_each_mem_range(i, &start, &end) {
+		if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges)) {
+			ret = -EAGAIN;
+			goto out;
+		}
+
 		cmem->ranges[cmem->nr_ranges].start = start;
 		cmem->ranges[cmem->nr_ranges].end = end - 1;
 		cmem->nr_ranges++;
-- 
2.34.1




* [PATCH v15 13/23] riscv: kexec_file: Fix TOCTOU buffer overflow via memory region padding
From: Jinjie Ruan @ 2026-06-01  9:47 UTC (permalink / raw)
  To: corbet, skhan, catalin.marinas, will, chenhuacai, kernel, maddy,
	mpe, npiggin, chleroy, pjw, palmer, aou, alex, tglx, mingo, bp,
	dave.hansen, hpa, robh, saravanak, akpm, bhe, rppt,
	pasha.tatashin, pratyush, ruirui.yang, rdunlap, feng.tang,
	dapeng1.mi, kees, elver, kuba, lirongqing, ebiggers, paulmck,
	sourabhjain, thuth, ardb, masahiroy, gshan, james.morse, maz,
	leitao, yeoreum.yun, coxu, suzuki.poulose, cfsworks, osandov,
	jbohac, ryan.roberts, tangyouling, ritesh.list, adityag, hbathini,
	bjorn, songshuaishuai, vishal.moola, junhui.liu,
	djordje.todorovic, austin.kim, namcao, djbw, chao.gao, seanjc,
	fuqiang.wang, liaoyuanhong, makb, graf, piliu, rafael.j.wysocki,
	mario.limonciello, jbouron, chenjiahao16, guoren, bauerman, bgwin,
	takahiro.akashi, x86, linux-doc, linux-kernel, linux-arm-kernel,
	loongarch, linuxppc-dev, linux-riscv, devicetree, kexec
  Cc: ruanjinjie
In-Reply-To: <20260601094805.2928614-1-ruanjinjie@huawei.com>

A Sashiko AI code review pointed out a TOCTOU (Time-of-Check to
Time-of-Use) race in prepare_elf_headers() between the first pass, which
counts the System RAM ranges, and the second pass, which populates them.
If a memory hotplug event occurs between the two passes, the number of
memory regions may increase, and the second pass then writes past the end
of the cmem->ranges[] array.

Fix this by adding `CRASH_HOTPLUG_SAFETY_PADDING` (128 slots) to the
flexible array allocation upfront, which absorbs up to 128 regions added
between the two passes. In addition, add a defensive bounds check inside
the callback that returns -EAGAIN if the array would still overflow, so
the out-of-bounds write can no longer happen.
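
The callback variant of the check can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the kernel
code: `walk_regions()`, `count_cb()`, `fill_cb()` and `struct mem_table`
are hypothetical names, and the sketch assumes the walker stops at the
first non-zero callback return and passes that value back to its caller.

```c
#include <errno.h>
#include <stddef.h>

struct res {
	unsigned long long start, end;
};

/* Hypothetical stand-in for struct crash_mem with caller-owned storage. */
struct mem_table {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct res *ranges;
};

typedef int (*walk_fn)(const struct res *res, void *arg);

/*
 * Hypothetical walker: calls fn for each region and stops at the first
 * non-zero return value, which it propagates to the caller.
 */
int walk_regions(const struct res *regions, size_t n, void *arg, walk_fn fn)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&regions[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* First pass: only count the regions. */
int count_cb(const struct res *res, void *arg)
{
	unsigned int *nr = arg;

	(void)res;
	(*nr)++;
	return 0;
}

/* Second pass: store a region, but never beyond the allocated slots. */
int fill_cb(const struct res *res, void *arg)
{
	struct mem_table *t = arg;

	if (t->nr_ranges >= t->max_nr_ranges)
		return -EAGAIN;	/* aborts the walk in this sketch */

	t->ranges[t->nr_ranges++] = *res;
	return 0;
}
```

Because the check lives in the callback, the caller only has to test the
walker's return value to learn that the table was too small.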

Cc: Paul Walmsley <pjw@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: songshuaishuai@tinylab.org
Cc: bjorn@rivosinc.com
Cc: leitao@debian.org
Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/riscv/kernel/machine_kexec_file.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/machine_kexec_file.c b/arch/riscv/kernel/machine_kexec_file.c
index 3f7766057cac..f3576dc0513f 100644
--- a/arch/riscv/kernel/machine_kexec_file.c
+++ b/arch/riscv/kernel/machine_kexec_file.c
@@ -48,6 +48,9 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
 	struct crash_mem *cmem = arg;
 
+	if (unlikely(cmem->nr_ranges >= cmem->max_nr_ranges))
+		return -EAGAIN;
+
 	cmem->ranges[cmem->nr_ranges].start = res->start;
 	cmem->ranges[cmem->nr_ranges].end = res->end;
 	cmem->nr_ranges++;
@@ -61,7 +64,8 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 	unsigned int nr_ranges;
 	int ret;
 
-	nr_ranges = 2; /* For exclusion of crashkernel region */
+	/* For exclusion of crashkernel region */
+	nr_ranges = 2 + CRASH_HOTPLUG_SAFETY_PADDING;
 	walk_system_ram_res(0, -1, &nr_ranges, get_nr_ram_ranges_callback);
 
 	cmem = kmalloc_flex(*cmem, ranges, nr_ranges);
-- 
2.34.1



