* [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump
@ 2021-10-20  2:03 Zhen Lei
  2021-10-20  2:03 ` [PATCH v15 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN Zhen Lei
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which
will fail when there is not enough low memory.
2. If the crashkernel is reserved above 4G, the crash dump
kernel will fail to boot because there is no low memory available
for allocation.
To solve these issues, change the behavior of crashkernel=X.
crashkernel=X tries low allocation in DMA zone and falls back to high
allocation if it fails.
We can also use "crashkernel=X,high" to select a high region above
DMA zone, which also tries to allocate at least 256M of low memory in
DMA zone automatically. "crashkernel=Y,low" can be used to allocate
a low memory region of the specified size.
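The policy described above can be sketched in user-space C. This is only
an illustration under stated assumptions, not the kernel implementation:
the toy_* names, the 2M alignment, the 4G low limit and the first-fit
allocator over a small list of free spans are all hypothetical stand-ins
for memblock and the architecture macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ALIGN       (2ULL << 20)    /* assumed alignment of a reservation */
#define TOY_LOW_MAX     (4ULL << 30)    /* assumed upper bound of low memory */
#define TOY_HIGH_MAX    (64ULL << 40)   /* assumed upper bound for high regions */
#define TOY_DEFAULT_LOW (256ULL << 20)  /* default low size named in the text */

/* Hypothetical free span [start, end), standing in for memblock. */
struct toy_span { uint64_t start, end; };

/* Result of a reservation; low_size == 0 means no separate low region. */
struct toy_plan { uint64_t base, size, low_base, low_size; };

/* First fit of `size` bytes inside [lo, hi); returns 0 on failure. */
uint64_t toy_alloc_range(const struct toy_span *spans, int n,
                         uint64_t size, uint64_t lo, uint64_t hi)
{
    for (int i = 0; i < n; i++) {
        uint64_t base = spans[i].start > lo ? spans[i].start : lo;
        uint64_t end = spans[i].end < hi ? spans[i].end : hi;

        base = (base + TOY_ALIGN - 1) & ~(TOY_ALIGN - 1);
        if (base < end && end - base >= size)
            return base;
    }
    return 0;
}

/*
 * crashkernel=X      -> high_only = false: try low, fall back to high.
 * crashkernel=X,high -> high_only = true:  go straight to high.
 * crashkernel=Y,low  -> low_size = Y; 0 selects the 256M default.
 */
bool toy_reserve(const struct toy_span *spans, int n, uint64_t size,
                 bool high_only, uint64_t low_size, struct toy_plan *out)
{
    uint64_t base = 0;

    if (!high_only)
        base = toy_alloc_range(spans, n, size, TOY_ALIGN, TOY_LOW_MAX);
    if (!base)
        base = toy_alloc_range(spans, n, size, TOY_LOW_MAX, TOY_HIGH_MAX);
    if (!base)
        return false;

    out->base = base;
    out->size = size;
    out->low_base = 0;
    out->low_size = 0;

    /* A high region needs extra low memory for devices in the dump kernel. */
    if (base >= TOY_LOW_MAX) {
        uint64_t want = low_size ? low_size : TOY_DEFAULT_LOW;
        uint64_t low = toy_alloc_range(spans, n, want, TOY_ALIGN, TOY_LOW_MAX);

        if (!low)
            return false;   /* give up: the dump kernel could not boot */
        out->low_base = low;
        out->low_size = want;
    }
    return true;
}
```

With free spans [1G, 2G) and [4G, 8G), a 512M request lands in the low
span with no second region, while a 2G request falls back to 4G and
additionally picks up a 256M low region.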
When reserving crashkernel in high memory, some low memory is reserved
for crash dump kernel devices. So there may be two regions reserved for
crash dump kernel.
To distinguish the low region from the high region without affecting
existing kexec-tools, rename the low region as "Crash kernel (low)",
and pass the low region by reusing DT property
"linux,usable-memory-range". We make the low memory region the last
range of "linux,usable-memory-range" to keep compatibility with existing
user-space and older kdump kernels.
Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel regions(see [1])
Another update is the documentation of DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema(see [2])
This patchset contains the following 10 patches:
0001-0004 are some x86 cleanups which prepare for making
functions reserve_crashkernel[_low]() generic.
0005 makes functions reserve_crashkernel[_low]() generic.
0006-0008 reimplement arm64 crashkernel=X.
0009 adds memory for devices by DT property linux,usable-memory-range.
0010 updates the doc.
Changes since [v14]
- Restore the requirement that the crash kernel memory regions on x86
  only require 1 MiB alignment.
- Combine patches 5 and 6 in v14 into one. The compilation warning fixed
  by patch 6 was introduced by patch 5 in v14.
- As with crashk_res, crashk_low_res is also processed by
  crash_exclude_mem_range() in patch 7.
- Because commit b261dba2fdb2 ("arm64: kdump: Remove custom linux,usable-memory-range handling")
  removed the architecture-specific code, extend the property "linux,usable-memory-range"
  in the platform-agnostic FDT core code. See patch 9.
- Discard the x86 description update in the document, because the description
  has been updated by commit b1f4c363666c ("Documentation: kdump: update kdump guide").
- Change "arm64" to "ARM64" in Doc.
Changes since [v13]
- Rebased on top of 5.11-rc5.
- Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
Since reserve_crashkernel[_low]() implementations are quite similar on
other architectures, add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to
arch/Kconfig and select it for X86 and ARM64.
- Some minor cleanup.
Changes since [v12]
- Rebased on top of 5.10-rc1.
- Keep CRASH_ALIGN as 16M, as suggested by Dave.
- Drop patch "kdump: add threshold for the required memory".
- Add Tested-by from John.
Changes since [v11]
- Rebased on top of 5.9-rc4.
- Make the function reserve_crashkernel() of x86 generic.
As suggested by Catalin, make the function reserve_crashkernel() of x86 generic
and have arm64 use the generic version to reimplement crashkernel=X.
Changes since [v10]
- Reimplement crashkernel=X as suggested by Catalin. Many thanks to Catalin.
Changes since [v9]
- Add Acked-by from Dave to patch 1.
- Update patch 5 according to Dave's comments.
- Update chosen schema.
Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
As suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.
Changes since [v7]
- Move x86 CRASH_ALIGN to 2M.
As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to Documentation/devicetree/bindings/chosen.txt,
as suggested by Arnd.
- Add Tested-by from John and pk.
Changes since [v6]
- Fix build errors reported by kbuild test robot.
Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, first reserve the specified
size of low memory for crash dump kernel devices and then reserve memory above 4G.
In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
pass it to the crash dump kernel by DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.
Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.
Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" into
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.
Changes since [v1]
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove memblock_cap_memory_ranges() I added in v1 and implement that
in fdt_enforce_memory_region().
There are at most two crash kernel regions. In the case of two crash
kernel regions, we cap the memory range to
[min(regs[*].start), max(regs[*].end)] and then remove the memory range
in the middle.
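The cap-then-remove step described for v2 can be sketched as follows.
This is a user-space illustration only, assuming half-open [start, end)
ranges; toy_range, toy_emit() and toy_cap_memory() are hypothetical names
and do not correspond to fdt_enforce_memory_region() or any memblock API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, end). */
struct toy_range { uint64_t start, end; };

/* Append [s, e) to out if it is non-empty and there is room left. */
size_t toy_emit(struct toy_range *out, size_t n, size_t cap,
                uint64_t s, uint64_t e)
{
    if (s < e && n < cap) {
        out[n].start = s;
        out[n].end = e;
        n++;
    }
    return n;
}

/*
 * Restrict the memory ranges in mem[] to two crash kernel regions a and b:
 * cap everything to [min start, max end), then drop the gap in the middle.
 * Returns the number of ranges written to out[].
 */
size_t toy_cap_memory(const struct toy_range *mem, size_t n_mem,
                      struct toy_range a, struct toy_range b,
                      struct toy_range *out, size_t out_cap)
{
    if (b.start < a.start) {            /* make a the lower region */
        struct toy_range tmp = a;
        a = b;
        b = tmp;
    }

    uint64_t cap_start = a.start;
    uint64_t cap_end = a.end > b.end ? a.end : b.end;
    /* The gap is empty when the two regions touch or overlap. */
    uint64_t gap_start = a.end, gap_end = b.start;
    size_t n = 0;

    for (size_t i = 0; i < n_mem; i++) {
        uint64_t s = mem[i].start > cap_start ? mem[i].start : cap_start;
        uint64_t e = mem[i].end < cap_end ? mem[i].end : cap_end;

        if (s >= e)
            continue;                   /* entirely outside the cap */
        if (gap_start < gap_end && s < gap_end && e > gap_start) {
            /* Keep only the pieces on either side of the gap. */
            n = toy_emit(out, n, out_cap, s, gap_start);
            n = toy_emit(out, n, out_cap, gap_end, e);
        } else {
            n = toy_emit(out, n, out_cap, s, e);
        }
    }
    return n;
}
```

For a single memory range [0, 8G) with regions [1G, 1G+256M) and
[4G, 5G), the result is exactly those two regions.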
[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73
[v10]: https://lkml.org/lkml/2020/7/2/1443
[v11]: https://lkml.org/lkml/2020/8/1/150
[v12]: https://lkml.org/lkml/2020/9/7/1037
[v13]: https://lkml.org/lkml/2020/10/31/34
[v14]: https://lkml.org/lkml/2021/1/30/53
Chen Zhou (10):
  x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
  x86: kdump: make the lower bound of crash kernel reservation
    consistent
  x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
    reserve_crashkernel()
  x86: kdump: move xen_pv_domain() check and insert_resource() to
    setup_arch()
  x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
  arm64: kdump: introduce some macros for crash kernel reservation
  arm64: kdump: reimplement crashkernel=X
  x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
  of: fdt: Add memory for devices by DT property
    "linux,usable-memory-range"
  kdump: update Documentation about crashkernel
 Documentation/admin-guide/kdump/kdump.rst     |  11 +-
 .../admin-guide/kernel-parameters.txt         |  11 +-
 arch/Kconfig                                  |   3 +
 arch/arm64/Kconfig                            |   1 +
 arch/arm64/include/asm/kexec.h                |  10 ++
 arch/arm64/kernel/machine_kexec_file.c        |  12 +-
 arch/arm64/kernel/setup.c                     |  13 +-
 arch/arm64/mm/init.c                          |  59 ++-----
 arch/x86/Kconfig                              |   2 +
 arch/x86/include/asm/elf.h                    |   3 +
 arch/x86/include/asm/kexec.h                  |  31 +++-
 arch/x86/kernel/setup.c                       | 163 ++----------------
 drivers/of/fdt.c                              |  47 +++--
 include/linux/crash_core.h                    |   3 +
 include/linux/kexec.h                         |   2 -
 kernel/crash_core.c                           | 156 +++++++++++++++++
 kernel/kexec_core.c                           |  17 --
 17 files changed, 306 insertions(+), 238 deletions(-)
-- 
2.25.1
* [PATCH v15 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
From: Chen Zhou <chenzhou10@huawei.com>
Move CRASH_ALIGN to header asm/kexec.h for later use.
Suggested-by: Dave Young <dyoung@redhat.com>
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
---
 arch/x86/include/asm/kexec.h | 3 +++
 arch/x86/kernel/setup.c      | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 0a6e34b07017586..5f63ad6b6e74b15 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -18,6 +18,9 @@
 
 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
 
+/* 16M alignment for crash kernel regions */
+#define CRASH_ALIGN		SZ_16M
+
 #ifndef __ASSEMBLY__
 
 #include <linux/string.h>
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 40ed44ead063128..3d394127dc03d20 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -392,9 +392,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN		SZ_16M
-
 /*
  * Keep the crash kernel below this limit.
  *
-- 
2.25.1
* [PATCH v15 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
From: Chen Zhou <chenzhou10@huawei.com>
The lower bounds of crash kernel reservation and crash kernel low
reservation are different. Use the consistent value CRASH_ALIGN for both.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
---
 arch/x86/kernel/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d394127dc03d20..5bebd46c7ce81f5 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -441,7 +441,8 @@ static int __init reserve_crashkernel_low(void)
 			return 0;
 	}
 
-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, CRASH_ADDR_LOW_MAX);
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+			CRASH_ADDR_LOW_MAX);
 	if (!low_base) {
 		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
 		       (unsigned long)(low_size >> 20));
-- 
2.25.1
* [PATCH v15 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel()
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
From: Chen Zhou <chenzhou10@huawei.com>
To make the function reserve_crashkernel() generic,
replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/kernel/setup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 5bebd46c7ce81f5..1b2c9f5c71a870e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -489,8 +489,9 @@ static void __init reserve_crashkernel(void)
 	if (!crash_base) {
 		/*
 		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over 4G, also allocates
-		 * 256M extra low memory for DMA buffers and swiotlb.
+		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+		 * also allocates 256M extra low memory for DMA buffers
+		 * and swiotlb.
 		 * But the extra memory is not required for all machines.
 		 * So try low memory first and fall back to high memory
 		 * unless "crashkernel=size[KMG],high" is specified.
@@ -518,7 +519,7 @@ static void __init reserve_crashkernel(void)
 		}
 	}
 
-	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
 		memblock_free(crash_base, crash_size);
 		return;
 	}
-- 
2.25.1
* [PATCH v15 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch()
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
From: Chen Zhou <chenzhou10@huawei.com>
We will make the function reserve_crashkernel() generic. The
xen_pv_domain() check in reserve_crashkernel() is relevant only to
x86, as is insert_resource() in reserve_crashkernel[_low]().
So move the xen_pv_domain() check and insert_resource() to setup_arch()
to keep them in x86.
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/kernel/setup.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1b2c9f5c71a870e..657ec7fb62da37c 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -456,7 +456,6 @@ static int __init reserve_crashkernel_low(void)
 
 	crashk_low_res.start = low_base;
 	crashk_low_res.end   = low_base + low_size - 1;
-	insert_resource(&iomem_resource, &crashk_low_res);
 #endif
 	return 0;
 }
@@ -480,11 +479,6 @@ static void __init reserve_crashkernel(void)
 		high = true;
 	}
 
-	if (xen_pv_domain()) {
-		pr_info("Ignoring crashkernel for a Xen PV domain\n");
-		return;
-	}
-
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
@@ -531,7 +525,6 @@ static void __init reserve_crashkernel(void)
 
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
-	insert_resource(&iomem_resource, &crashk_res);
 }
 #else
 static void __init reserve_crashkernel(void)
@@ -1131,7 +1124,17 @@ void __init setup_arch(char **cmdline_p)
 	 * Reserve memory for crash kernel after SRAT is parsed so that it
 	 * won't consume hotpluggable memory.
 	 */
-	reserve_crashkernel();
+	if (xen_pv_domain())
+		pr_info("Ignoring crashkernel for a Xen PV domain\n");
+	else {
+		reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+		if (crashk_res.end > crashk_res.start)
+			insert_resource(&iomem_resource, &crashk_res);
+		if (crashk_low_res.end > crashk_low_res.start)
+			insert_resource(&iomem_resource, &crashk_low_res);
+#endif
+	}
 
 	memblock_find_dma_reserve();
 
-- 
2.25.1
* [PATCH v15 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
From: Chen Zhou <chenzhou10@huawei.com>
Make the functions reserve_crashkernel[_low]() generic.
Arm64 will use these to reimplement crashkernel=X.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
---
 arch/x86/include/asm/elf.h   |   3 +
 arch/x86/include/asm/kexec.h |  28 +++++-
 arch/x86/kernel/setup.c      | 143 +------------------------------
 include/linux/crash_core.h   |   3 +
 include/linux/kexec.h        |   2 -
 kernel/crash_core.c          | 159 +++++++++++++++++++++++++++++++++++
 kernel/kexec_core.c          |  17 ----
 7 files changed, 192 insertions(+), 163 deletions(-)
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 29fea180a6658e8..7a6c36cff8331f5 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
 
 #define elf_check_arch(x)	elf_check_arch_ia32(x)
 
+/* We can also handle crash dumps from 64 bit kernel. */
+# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
+
 /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
    contains a pointer to a function which might be registered using `atexit'.
    This provides a mean for the dynamic linker to call DT_FINI functions for
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5f63ad6b6e74b15..3533ede83b42158 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,27 @@
 /* 16M alignment for crash kernel regions */
 #define CRASH_ALIGN		SZ_16M
 
+/*
+ * Keep the crash kernel below this limit.
+ *
+ * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
+ * due to mapping restrictions.
+ *
+ * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
+ * the upper limit of system RAM in 4-level paging mode. Since the kdump
+ * jump could be from 5-level paging to 4-level paging, the jump will fail if
+ * the kernel is put above 64 TB, and during the 1st kernel bootup there's
+ * no good way to detect the paging mode of the target kernel which will be
+ * loaded for dumping.
+ */
+#ifdef CONFIG_X86_32
+# define CRASH_ADDR_LOW_MAX	SZ_512M
+# define CRASH_ADDR_HIGH_MAX	SZ_512M
+#else
+# define CRASH_ADDR_LOW_MAX	SZ_4G
+# define CRASH_ADDR_HIGH_MAX	SZ_64T
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include <linux/string.h>
@@ -51,9 +72,6 @@ struct kimage;
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
-
-/* We can also handle crash dumps from 64 bit kernel. */
-# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
 # define KEXEC_SOURCE_MEMORY_LIMIT      (MAXMEM-1)
@@ -195,6 +213,10 @@ typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 657ec7fb62da37c..bef6340e0e32441 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -39,6 +39,7 @@
 #include <asm/io_apic.h>
 #include <asm/kasan.h>
 #include <asm/kaslr.h>
+#include <asm/kexec.h>
 #include <asm/mce.h>
 #include <asm/mtrr.h>
 #include <asm/realmode.h>
@@ -386,147 +387,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 	}
 }
 
-/*
- * --------- Crashkernel reservation ------------------------------
- */
-
-#ifdef CONFIG_KEXEC_CORE
-
-/*
- * Keep the crash kernel below this limit.
- *
- * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
- * due to mapping restrictions.
- *
- * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
- * the upper limit of system RAM in 4-level paging mode. Since the kdump
- * jump could be from 5-level paging to 4-level paging, the jump will fail if
- * the kernel is put above 64 TB, and during the 1st kernel bootup there's
- * no good way to detect the paging mode of the target kernel which will be
- * loaded for dumping.
- */
-#ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX	SZ_512M
-# define CRASH_ADDR_HIGH_MAX	SZ_512M
-#else
-# define CRASH_ADDR_LOW_MAX	SZ_4G
-# define CRASH_ADDR_HIGH_MAX	SZ_64T
-#endif
-
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-	unsigned long long base, low_base = 0, low_size = 0;
-	unsigned long low_mem_limit;
-	int ret;
-
-	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
-
-	/* crashkernel=Y,low */
-	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
-	if (ret) {
-		/*
-		 * two parts from kernel/dma/swiotlb.c:
-		 * -swiotlb size: user-specified with swiotlb= or default.
-		 *
-		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
-		 * to 8M for other buffers that may need to stay low too. Also
-		 * make sure we allocate enough extra low memory so that we
-		 * don't run out of DMA buffers for 32-bit devices.
-		 */
-		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-	} else {
-		/* passed with crashkernel=0,low ? */
-		if (!low_size)
-			return 0;
-	}
-
-	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
-			CRASH_ADDR_LOW_MAX);
-	if (!low_base) {
-		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-		       (unsigned long)(low_size >> 20));
-		return -ENOMEM;
-	}
-
-	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
-		(unsigned long)(low_size >> 20),
-		(unsigned long)(low_base >> 20),
-		(unsigned long)(low_mem_limit >> 20));
-
-	crashk_low_res.start = low_base;
-	crashk_low_res.end   = low_base + low_size - 1;
-#endif
-	return 0;
-}
-
-static void __init reserve_crashkernel(void)
-{
-	unsigned long long crash_size, crash_base, total_mem;
-	bool high = false;
-	int ret;
-
-	total_mem = memblock_phys_mem_size();
-
-	/* crashkernel=XM */
-	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
-	if (ret != 0 || crash_size <= 0) {
-		/* crashkernel=X,high */
-		ret = parse_crashkernel_high(boot_command_line, total_mem,
-					     &crash_size, &crash_base);
-		if (ret != 0 || crash_size <= 0)
-			return;
-		high = true;
-	}
-
-	/* 0 means: find the address automatically */
-	if (!crash_base) {
-		/*
-		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
-		 * also allocates 256M extra low memory for DMA buffers
-		 * and swiotlb.
-		 * But the extra memory is not required for all machines.
-		 * So try low memory first and fall back to high memory
-		 * unless "crashkernel=size[KMG],high" is specified.
-		 */
-		if (!high)
-			crash_base = memblock_phys_alloc_range(crash_size,
-						CRASH_ALIGN, CRASH_ALIGN,
-						CRASH_ADDR_LOW_MAX);
-		if (!crash_base)
-			crash_base = memblock_phys_alloc_range(crash_size,
-						CRASH_ALIGN, CRASH_ALIGN,
-						CRASH_ADDR_HIGH_MAX);
-		if (!crash_base) {
-			pr_info("crashkernel reservation failed - No suitable area found.\n");
-			return;
-		}
-	} else {
-		unsigned long long start;
-
-		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
-						  crash_base + crash_size);
-		if (start != crash_base) {
-			pr_info("crashkernel reservation failed - memory is in use.\n");
-			return;
-		}
-	}
-
-	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
-		memblock_free(crash_base, crash_size);
-		return;
-	}
-
-	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
-		(unsigned long)(crash_size >> 20),
-		(unsigned long)(crash_base >> 20),
-		(unsigned long)(total_mem >> 20));
-
-	crashk_res.start = crash_base;
-	crashk_res.end   = crash_base + crash_size - 1;
-}
-#else
+#ifndef CONFIG_KEXEC_CORE
 static void __init reserve_crashkernel(void)
 {
 }
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e7db..f6b99da4ed08ecf 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -73,6 +73,9 @@ extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
 
+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0c994ae37729e1e..cd744d962f6f417 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -352,8 +352,6 @@ extern int kexec_load_disabled;
 
 /* Location of a reserved region to hold the crash kernel.
  */
-extern struct resource crashk_res;
-extern struct resource crashk_low_res;
 extern note_buf_t __percpu *crash_notes;
 
 /* flag to track if kexec reboot is in progress */
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c900f..21105942df72897 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -8,6 +8,12 @@
 #include <linux/crash_core.h>
 #include <linux/utsname.h>
 #include <linux/vmalloc.h>
+#include <linux/memblock.h>
+#include <linux/swiotlb.h>
+
+#ifdef CONFIG_KEXEC_CORE
+#include <asm/kexec.h>
+#endif
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -22,6 +28,22 @@ u32 *vmcoreinfo_note;
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
 static unsigned char *vmcoreinfo_data_safecopy;
 
+/* Location of the reserved area for the crash kernel */
+struct resource crashk_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+struct resource crashk_low_res = {
+	.name  = "Crash kernel",
+	.start = 0,
+	.end   = 0,
+	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
+	.desc  = IORES_DESC_CRASH_KERNEL
+};
+
 /*
  * parsing the "crashkernel" commandline
  *
@@ -295,6 +317,143 @@ int __init parse_crashkernel_low(char *cmdline,
 				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
 }
 
+/*
+ * --------- Crashkernel reservation ------------------------------
+ */
+
+#ifdef CONFIG_KEXEC_CORE
+
+#ifdef CONFIG_X86
+static int __init reserve_crashkernel_low(void)
+{
+#ifdef CONFIG_X86_64
+	unsigned long long base, low_base = 0, low_size = 0;
+	unsigned long low_mem_limit;
+	int ret;
+
+	low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
+
+	/* crashkernel=Y,low */
+	ret = parse_crashkernel_low(boot_command_line, low_mem_limit, &low_size, &base);
+	if (ret) {
+		/*
+		 * two parts from kernel/dma/swiotlb.c:
+		 * -swiotlb size: user-specified with swiotlb= or default.
+		 *
+		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
+		 * to 8M for other buffers that may need to stay low too. Also
+		 * make sure we allocate enough extra low memory so that we
+		 * don't run out of DMA buffers for 32-bit devices.
+		 */
+		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
+	} else {
+		/* passed with crashkernel=0,low ? */
+		if (!low_size)
+			return 0;
+	}
+
+	low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
+			CRASH_ADDR_LOW_MAX);
+	if (!low_base) {
+		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
+		       (unsigned long)(low_size >> 20));
+		return -ENOMEM;
+	}
+
+	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (low RAM limit: %ldMB)\n",
+		(unsigned long)(low_size >> 20),
+		(unsigned long)(low_base >> 20),
+		(unsigned long)(low_mem_limit >> 20));
+
+	crashk_low_res.start = low_base;
+	crashk_low_res.end   = low_base + low_size - 1;
+#endif
+	return 0;
+}
+
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+void __init reserve_crashkernel(void)
+{
+	unsigned long long crash_size, crash_base, total_mem;
+	bool high = false;
+	int ret;
+
+	total_mem = memblock_phys_mem_size();
+
+	/* crashkernel=XM */
+	ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
+	if (ret != 0 || crash_size <= 0) {
+		/* crashkernel=X,high */
+		ret = parse_crashkernel_high(boot_command_line, total_mem,
+					     &crash_size, &crash_base);
+		if (ret != 0 || crash_size <= 0)
+			return;
+		high = true;
+	}
+
+	/* 0 means: find the address automatically */
+	if (!crash_base) {
+		/*
+		 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
+		 * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
+		 * also allocates 256M extra low memory for DMA buffers
+		 * and swiotlb.
+		 * But the extra memory is not required for all machines.
+		 * So try low memory first and fall back to high memory
+		 * unless "crashkernel=size[KMG],high" is specified.
+		 */
+		if (!high)
+			crash_base = memblock_phys_alloc_range(crash_size,
+						CRASH_ALIGN, CRASH_ALIGN,
+						CRASH_ADDR_LOW_MAX);
+		if (!crash_base)
+			crash_base = memblock_phys_alloc_range(crash_size,
+						CRASH_ALIGN, CRASH_ALIGN,
+						CRASH_ADDR_HIGH_MAX);
+		if (!crash_base) {
+			pr_info("crashkernel reservation failed - No suitable area found.\n");
+			return;
+		}
+	} else {
+		/* User specifies base address explicitly. */
+		unsigned long long start;
+
+		if (!IS_ALIGNED(crash_base, CRASH_ALIGN)) {
+			pr_warn("cannot reserve crashkernel: base address is not %ldMB aligned\n",
+				(unsigned long)CRASH_ALIGN >> 20);
+			return;
+		}
+
+		start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
+						  crash_base + crash_size);
+		if (start != crash_base) {
+			pr_info("crashkernel reservation failed - memory is in use.\n");
+			return;
+		}
+	}
+
+	if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
+		memblock_free(crash_base, crash_size);
+		return;
+	}
+
+	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
+		(unsigned long)(crash_size >> 20),
+		(unsigned long)(crash_base >> 20),
+		(unsigned long)(total_mem >> 20));
+
+	crashk_res.start = crash_base;
+	crashk_res.end   = crash_base + crash_size - 1;
+}
+#endif /* CONFIG_X86 */
+#endif /* CONFIG_KEXEC_CORE */
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5a5d192a89ac307..1e0d4909bbb6b77 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -54,23 +54,6 @@ note_buf_t __percpu *crash_notes;
 /* Flag to indicate we are going to kexec a new kernel */
 bool kexec_in_progress = false;
 
-
-/* Location of the reserved area for the crash kernel */
-struct resource crashk_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-struct resource crashk_low_res = {
-	.name  = "Crash kernel",
-	.start = 0,
-	.end   = 0,
-	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
-	.desc  = IORES_DESC_CRASH_KERNEL
-};
-
 int kexec_should_crash(struct task_struct *p)
 {
 	/*
-- 
2.25.1
* [PATCH v15 06/10] arm64: kdump: introduce some macros for crash kernel reservation
  2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
                   ` (4 preceding siblings ...)
  2021-10-20  2:03 ` [PATCH v15 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c Zhen Lei
@ 2021-10-20  2:03 ` Zhen Lei
  2021-10-20  2:03 ` [PATCH v15 07/10] arm64: kdump: reimplement crashkernel=X Zhen Lei
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
From: Chen Zhou <chenzhou10@huawei.com>
Introduce the macros CRASH_ALIGN for the alignment, CRASH_ADDR_LOW_MAX
for the upper bound of low crash memory, and CRASH_ADDR_HIGH_MAX for the
upper bound of high crash memory, and use them instead of the hard-coded
values.
To stay consistent with x86, also use CRASH_ALIGN as the lower bound
of the crash kernel reservation.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
---
 arch/arm64/include/asm/kexec.h | 6 ++++++
 arch/arm64/mm/init.c           | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 00dbcc71aeb2918..b51ceb143cbbdb0 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -25,6 +25,12 @@
 
 #define KEXEC_ARCH KEXEC_ARCH_AARCH64
 
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN	SZ_2M
+
+#define CRASH_ADDR_LOW_MAX	arm64_dma_phys_limit
+#define CRASH_ADDR_HIGH_MAX	MEMBLOCK_ALLOC_ACCESSIBLE
+
 #ifndef __ASSEMBLY__
 
 /**
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 37a81754d9b61f7..2c94ae13b160834 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -75,7 +75,7 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static void __init reserve_crashkernel(void)
 {
 	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = arm64_dma_phys_limit;
+	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
@@ -91,7 +91,7 @@ static void __init reserve_crashkernel(void)
 		crash_max = crash_base + crash_size;
 
 	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, SZ_2M,
+	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
 					       crash_base, crash_max);
 	if (!crash_base) {
 		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-- 
2.25.1
* [PATCH v15 07/10] arm64: kdump: reimplement crashkernel=X
  2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
                   ` (5 preceding siblings ...)
  2021-10-20  2:03 ` [PATCH v15 06/10] arm64: kdump: introduce some macros for crash kernel reservation Zhen Lei
@ 2021-10-20  2:03 ` Zhen Lei
  2021-10-20  2:03 ` [PATCH v15 08/10] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config Zhen Lei
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
From: Chen Zhou <chenzhou10@huawei.com>
arm64 kdump currently has the following issues:
1. crashkernel=X reserves the crashkernel region below 4G, which
fails when there is not enough low memory.
2. If the crashkernel region is reserved above 4G, the crash dump
kernel fails to boot because no low memory is available for
allocation.
To solve these issues, change the behavior of crashkernel=X and
introduce crashkernel=X,[high,low]. crashkernel=X tries a low allocation
in the DMA zone and falls back to a high allocation if that fails.
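For illustration only, the reservation policy this patch enables on arm64
(low first, then high, plus an extra low region when the main region lands
high) can be sketched in user-space C. This is a toy model, not the kernel
code: struct toy_mem, toy_take() and toy_reserve() are hypothetical names,
the two byte counters stand in for memblock, and the default low size is
simplified to a flat 256M (the real code also accounts for the swiotlb size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MB          (1ULL << 20)
#define TOY_DEFAULT_LOW (256 * TOY_MB)  /* simplified default low size */

/* Hypothetical stand-in for memblock: free bytes below/above the DMA limit. */
struct toy_mem {
	uint64_t low_free;
	uint64_t high_free;
};

struct toy_result {
	bool ok;            /* reservation succeeded */
	bool main_is_high;  /* main region landed above the DMA limit */
	uint64_t low_size;  /* size of the extra low region, 0 if none */
};

/* Take 'size' bytes from a pool; fails on zero size or too little space. */
static bool toy_take(uint64_t *pool, uint64_t size)
{
	if (size == 0 || size > *pool)
		return false;
	*pool -= size;
	return true;
}

/*
 * high:      "crashkernel=X,high" was given instead of "crashkernel=X"
 * low_given: "crashkernel=Y,low" was given, with Y in low_size
 */
struct toy_result toy_reserve(struct toy_mem *m, uint64_t size, bool high,
			      bool low_given, uint64_t low_size)
{
	struct toy_result r = { false, false, 0 };

	assert(m != NULL);

	/* Plain crashkernel=X: try below the DMA limit first. */
	if (!high && toy_take(&m->low_free, size)) {
		r.ok = true;
		return r;
	}

	/* Otherwise search the whole range, preferring high memory. */
	if (toy_take(&m->high_free, size)) {
		r.main_is_high = true;
	} else if (high && toy_take(&m->low_free, size)) {
		r.ok = true;	/* ",high" on a machine without high memory */
		return r;
	} else {
		return r;	/* no suitable area found */
	}

	/* The main region is high: devices still need some low memory. */
	if (!low_given)
		low_size = TOY_DEFAULT_LOW;
	if (low_size == 0) {	/* crashkernel=0,low disables the low region */
		r.ok = true;
		return r;
	}
	if (!toy_take(&m->low_free, low_size)) {
		m->high_free += size;	/* undo the main reservation */
		r.main_is_high = false;
		return r;
	}

	r.ok = true;
	r.low_size = low_size;
	return r;
}
```

With 300M free below the limit and plenty above, a 512M request in this
model lands high and takes an extra 256M low region; with only 100M free
below the limit the same request fails and the high reservation is undone.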
We can also use "crashkernel=X,high" to select a region above the DMA
zone, which also tries to allocate at least 256M in the DMA zone
automatically. "crashkernel=Y,low" can be used to allocate low memory of
a specified size.
As another minor change, two regions may now be reserved for the crash
dump kernel. To distinguish the low region from the high region without
affecting existing kexec-tools, rename the low region to
"Crash kernel (low)".
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Tested-by: John Donnelly <John.p.donnelly@oracle.com>
---
 arch/arm64/include/asm/kexec.h         |  4 ++
 arch/arm64/kernel/machine_kexec_file.c | 12 +++++-
 arch/arm64/kernel/setup.c              | 13 +++++-
 arch/arm64/mm/init.c                   | 59 +++++---------------------
 kernel/crash_core.c                    |  6 +--
 5 files changed, 40 insertions(+), 54 deletions(-)
diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index b51ceb143cbbdb0..fa17fc8a5a2701b 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
 static inline void crash_post_resume(void) {}
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
+extern void __init reserve_crashkernel(void);
+#endif
+
 #define ARCH_HAS_KIMAGE_ARCH
 
 struct kimage_arch {
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 63634b4d72c158f..6f3fa059ca4e816 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -65,10 +65,18 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
 
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	if (crashk_low_res.end) {
+		ret = crash_exclude_mem_range(cmem, crashk_low_res.start, crashk_low_res.end);
+		if (ret)
+			goto out;
+	}
 
-	if (!ret)
-		ret =  crash_prepare_elf64_headers(cmem, true, addr, sz);
+	ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
 
+out:
 	kfree(cmem);
 	return ret;
 }
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24de69..4bb2e55366be64d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -248,7 +248,18 @@ static void __init request_standard_resources(void)
 		    kernel_data.end <= res->end)
 			request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-		/* Userspace will find "Crash kernel" region in /proc/iomem. */
+		/*
+		 * Userspace will find "Crash kernel" or "Crash kernel (low)"
+		 * region in /proc/iomem.
+		 * To distinguish it from the high region without affecting
+		 * existing kexec-tools, rename the low region to
+		 * "Crash kernel (low)".
+		 */
+		if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+				crashk_low_res.end <= res->end) {
+			crashk_low_res.name = "Crash kernel (low)";
+			request_resource(res, &crashk_low_res);
+		}
 		if (crashk_res.end && crashk_res.start >= res->start &&
 		    crashk_res.end <= res->end)
 			request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 2c94ae13b160834..cde26d49f76cfa0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -36,6 +36,7 @@
 #include <asm/fixmap.h>
 #include <asm/kasan.h>
 #include <asm/kernel-pgtable.h>
+#include <asm/kexec.h>
 #include <asm/kvm_host.h>
 #include <asm/memory.h>
 #include <asm/numa.h>
@@ -64,57 +65,11 @@ EXPORT_SYMBOL(memstart_addr);
  */
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 
-#ifdef CONFIG_KEXEC_CORE
-/*
- * reserve_crashkernel() - reserves memory for crash kernel
- *
- * This function reserves memory area given in "crashkernel=" kernel command
- * line parameter. The memory reserved is used by dump capture kernel when
- * primary kernel is crashing.
- */
+#ifndef CONFIG_KEXEC_CORE
 static void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size;
-	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
-	int ret;
-
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
-				&crash_size, &crash_base);
-	/* no crashkernel= or invalid value specified */
-	if (ret || !crash_size)
-		return;
-
-	crash_size = PAGE_ALIGN(crash_size);
-
-	/* User specifies base address explicitly. */
-	if (crash_base)
-		crash_max = crash_base + crash_size;
-
-	/* Current arm64 boot protocol requires 2MB alignment */
-	crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
-					       crash_base, crash_max);
-	if (!crash_base) {
-		pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
-			crash_size);
-		return;
-	}
-
-	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
-		crash_base, crash_base + crash_size, crash_size >> 20);
-
-	/*
-	 * The crashkernel memory will be removed from the kernel linear
-	 * map. Inform kmemleak so that it won't try to access it.
-	 */
-	kmemleak_ignore_phys(crash_base);
-	crashk_res.start = crash_base;
-	crashk_res.end = crash_base + crash_size - 1;
 }
-#else
-static void __init reserve_crashkernel(void)
-{
-}
-#endif /* CONFIG_KEXEC_CORE */
+#endif
 
 /*
  * Return the maximum physical address for a zone accessible by the given bits
@@ -399,6 +354,14 @@ void __init bootmem_init(void)
 	 * reserved, so do it here.
 	 */
 	reserve_crashkernel();
+#ifdef CONFIG_KEXEC_CORE
+	/*
+	 * The low region is intended for crash dump kernel devices, so
+	 * simply mark it as "nomap".
+	 */
+	if (crashk_low_res.end)
+		memblock_mark_nomap(crashk_low_res.start, resource_size(&crashk_low_res));
+#endif
 
 	memblock_dump_all();
 }
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 21105942df72897..4d81b9ff42db88b 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -323,10 +323,10 @@ int __init parse_crashkernel_low(char *cmdline,
 
 #ifdef CONFIG_KEXEC_CORE
 
-#ifdef CONFIG_X86
+#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
 static int __init reserve_crashkernel_low(void)
 {
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_64BIT
 	unsigned long long base, low_base = 0, low_size = 0;
 	unsigned long low_mem_limit;
 	int ret;
@@ -451,7 +451,7 @@ void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
 }
-#endif /* CONFIG_X86 */
+#endif
 #endif /* CONFIG_KEXEC_CORE */
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
-- 
2.25.1
* [PATCH v15 08/10] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config
  2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
                   ` (6 preceding siblings ...)
  2021-10-20  2:03 ` [PATCH v15 07/10] arm64: kdump: reimplement crashkernel=X Zhen Lei
@ 2021-10-20  2:03 ` Zhen Lei
  2021-10-20  2:03 ` [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei
  2021-10-20  2:03 ` [PATCH v15 10/10] kdump: update Documentation about crashkernel Zhen Lei
  9 siblings, 0 replies; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
From: Chen Zhou <chenzhou10@huawei.com>
Make the functions reserve_crashkernel[_low]() generic for x86 and
arm64. Since the reserve_crashkernel[_low]() implementations are quite
similar on other architectures as well, more users can adopt them
later.
So add CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL to arch/Kconfig and
select it from X86 and ARM64.
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
 arch/Kconfig        | 3 +++
 arch/arm64/Kconfig  | 1 +
 arch/x86/Kconfig    | 2 ++
 kernel/crash_core.c | 7 ++-----
 4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c71026435df..d0585ce1b81b9cb 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -24,6 +24,9 @@ config KEXEC_ELF
 config HAVE_IMA_KEXEC
 	bool
 
+config ARCH_WANT_RESERVE_CRASH_KERNEL
+	bool
+
 config SET_FS
 	bool
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fee914c716aa262..0ddf06afe625584 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -94,6 +94,7 @@ config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9830e7e1060f7c..66eb5d088695c77 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -12,6 +12,7 @@ config X86_32
 	depends on !64BIT
 	# Options that are inherently 32-bit kernel only:
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select CLKSRC_I8253
 	select CLONE_BACKWARDS
 	select GENERIC_VDSO_32
@@ -28,6 +29,7 @@ config X86_64
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
 	select HAVE_ARCH_SOFT_DIRTY
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 4d81b9ff42db88b..4d5bf55ed71c253 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -321,9 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
  * --------- Crashkernel reservation ------------------------------
  */
 
-#ifdef CONFIG_KEXEC_CORE
-
-#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
+#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
 static int __init reserve_crashkernel_low(void)
 {
 #ifdef CONFIG_64BIT
@@ -451,8 +449,7 @@ void __init reserve_crashkernel(void)
 	crashk_res.start = crash_base;
 	crashk_res.end   = crash_base + crash_size - 1;
 }
-#endif
-#endif /* CONFIG_KEXEC_CORE */
+#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
-- 
2.25.1
* [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
                   ` (7 preceding siblings ...)
  2021-10-20  2:03 ` [PATCH v15 08/10] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config Zhen Lei
@ 2021-10-20  2:03 ` Zhen Lei
  2021-10-20 14:19   ` Rob Herring
  2021-10-20  2:03 ` [PATCH v15 10/10] kdump: update Documentation about crashkernel Zhen Lei
  9 siblings, 1 reply; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
From: Chen Zhou <chenzhou10@huawei.com>
When the crashkernel is reserved in high memory, some low memory is also
reserved for crash dump kernel devices and is never mapped by the first
kernel. This memory range is advertised to the crash dump kernel via a DT
property under /chosen:
        linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
We reuse the DT property linux,usable-memory-range and pass the low
memory region as the second range "BASE2 SIZE2", which keeps compatibility
with existing user space and older kdump kernels.
The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.
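As a rough illustration of the property layout above, the following
user-space C sketch decodes up to two <BASE SIZE> pairs from a raw
big-endian cell array. It is not the kernel implementation: struct
toy_range, toy_next_cell() and toy_parse_usable_ranges() are hypothetical
names, and the cell counts that the kernel reads from the root node's
#address-cells/#size-cells are passed in as plain arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_USABLE_RANGES 2

struct toy_range {
	uint64_t base;
	uint64_t size;
};

/* Read 'cells' big-endian 32-bit cells as one number and advance *p. */
static uint64_t toy_next_cell(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	while (cells-- > 0) {
		const uint8_t *b = *p;
		uint32_t cell = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
				((uint32_t)b[2] << 8) | (uint32_t)b[3];

		v = (v << 32) | cell;
		*p += 4;
	}
	return v;
}

/*
 * Decode at most TOY_MAX_USABLE_RANGES complete <BASE SIZE> entries from
 * 'len' bytes of property data. Returns the number of entries stored.
 * A trailing partial entry is ignored.
 */
int toy_parse_usable_ranges(const uint8_t *prop, size_t len,
			    int addr_cells, int size_cells,
			    struct toy_range out[TOY_MAX_USABLE_RANGES])
{
	const uint8_t *end = prop + len;
	size_t entry = (size_t)(addr_cells + size_cells) * 4;
	int nr = 0;

	assert(addr_cells >= 1 && addr_cells <= 2);
	assert(size_cells >= 1 && size_cells <= 2);

	while (nr < TOY_MAX_USABLE_RANGES && (size_t)(end - prop) >= entry) {
		out[nr].base = toy_next_cell(addr_cells, &prop);
		out[nr].size = toy_next_cell(size_cells, &prop);
		nr++;
	}
	return nr;
}
```

In this sketch the first decoded entry corresponds to the range that
caps memory, and the second entry, when present, is the low region to be
added back afterwards.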
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/of/fdt.c | 47 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 11 deletions(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4546572af24bbf1..cf59c847b2c28a5 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -969,8 +969,16 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
 		 elfcorehdr_addr, elfcorehdr_size);
 }
 
-static phys_addr_t cap_mem_addr;
-static phys_addr_t cap_mem_size;
+/*
+ * linux,usable-memory-range is mainly used by the crash dump kernel.
+ * Originally there was only one usable-memory region. Now there may
+ * be two: a low region and a high region.
+ * To stay compatible with existing user space and older kdump kernels,
+ * the low region, if present, is always the last range of the property.
+ */
+#define MAX_USABLE_RANGES		2
+
+static struct memblock_region cap_mem_regions[MAX_USABLE_RANGES];
 
 /**
  * early_init_dt_check_for_usable_mem_range - Decode usable memory range
@@ -979,20 +987,30 @@ static phys_addr_t cap_mem_size;
  */
 static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
 {
-	const __be32 *prop;
-	int len;
+	const __be32 *prop, *endp;
+	int len, nr = 0;
+	struct memblock_region *rgn = &cap_mem_regions[0];
 
 	pr_debug("Looking for usable-memory-range property... ");
 
 	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
-	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
+	if (!prop)
 		return;
 
-	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
-	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
+	endp = prop + (len / sizeof(__be32));
+	while ((endp - prop) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		rgn->base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+		rgn->size = dt_mem_next_cell(dt_root_size_cells, &prop);
+
+		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
+			 nr, &rgn->base, &rgn->size);
+
+		if (++nr >= MAX_USABLE_RANGES)
+			break;
+
+		rgn++;
+	}
 
-	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
-		 &cap_mem_size);
 }
 
 #ifdef CONFIG_SERIAL_EARLYCON
@@ -1265,7 +1283,8 @@ bool __init early_init_dt_verify(void *params)
 
 void __init early_init_dt_scan_nodes(void)
 {
-	int rc = 0;
+	int i, rc = 0;
+	struct memblock_region *rgn = &cap_mem_regions[0];
 
 	/* Initialize {size,address}-cells info */
 	of_scan_flat_dt(early_init_dt_scan_root, NULL);
@@ -1279,7 +1298,13 @@ void __init early_init_dt_scan_nodes(void)
 	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 
 	/* Handle linux,usable-memory-range property */
-	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
+	memblock_cap_memory_range(rgn->base, rgn->size);
+	for (i = 1; i < MAX_USABLE_RANGES; i++) {
+		rgn++;
+
+		if (rgn->size)
+			memblock_add(rgn->base, rgn->size);
+	}
 }
 
 bool __init early_init_dt_scan(void *params)
-- 
2.25.1
* [PATCH v15 10/10] kdump: update Documentation about crashkernel
  2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
                   ` (8 preceding siblings ...)
  2021-10-20  2:03 ` [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei
@ 2021-10-20  2:03 ` Zhen Lei
  9 siblings, 0 replies; 13+ messages in thread
From: Zhen Lei @ 2021-10-20  2:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Rob Herring, Frank Rowand, devicetree,
	Jonathan Corbet, linux-doc
  Cc: Zhen Lei, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
From: Chen Zhou <chenzhou10@huawei.com>
For arm64, the behavior of crashkernel=X has changed: it now tries a
low allocation in the DMA zone and falls back to a high allocation if
that fails.
We can also use "crashkernel=X,high" to select a high region above the
DMA zone, which also tries to allocate at least 256M of low memory in
the DMA zone automatically, and "crashkernel=Y,low" can be used to
allocate low memory of a specified size.
Update the documentation accordingly.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
---
 Documentation/admin-guide/kdump/kdump.rst       | 11 +++++++++--
 Documentation/admin-guide/kernel-parameters.txt | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index cb30ca3df27c9b2..d4c287044be0c70 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -361,8 +361,15 @@ Boot into System Kernel
    kernel will automatically locate the crash kernel image within the
    first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
-   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   On arm64, use "crashkernel=X" to try a low allocation in the DMA zone
+   and fall back to a high allocation if that fails.
+   Use "crashkernel=X,high" to select a high region above the DMA zone;
+   this also tries to allocate at least 256M of low memory in the DMA
+   zone automatically.
+   Use "crashkernel=Y,low" to allocate low memory of the specified size.
+   Use "crashkernel=Y@X" if you really have to reserve memory from a
+   specified start address X. Note that the start address of the kernel,
+   X if explicitly specified, must be aligned to 2MiB (0x200000).
 
 Load the Dump-capture Kernel
 ============================
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 43dc35fe5bc038e..98b87e82321413b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -783,6 +783,9 @@
 			[KNL, X86-64] Select a region under 4G first, and
 			fall back to reserve region above 4G when '@offset'
 			hasn't been specified.
+			[KNL, ARM64] Try a low allocation in the DMA zone and,
+			when '@offset' hasn't been specified, fall back to a
+			high allocation if the low allocation fails.
 			See Documentation/admin-guide/kdump/kdump.rst for further details.
 
 	crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -799,6 +802,8 @@
 			Otherwise memory region will be allocated below 4G, if
 			available.
 			It will be ignored if crashkernel=X is specified.
+			[KNL, ARM64] range in high memory.
+			Allow kernel to allocate physical memory region from top.
 	crashkernel=size[KMG],low
 			[KNL, X86-64] range under 4G. When crashkernel=X,high
 			is passed, kernel could allocate physical memory region
@@ -807,13 +812,15 @@
 			requires at least 64M+32K low memory, also enough extra
 			low memory is needed to make sure DMA buffers for 32-bit
 			devices won't run out. Kernel would try to allocate at
-			at least 256M below 4G automatically.
+			least 256M below 4G automatically.
 			This one let user to specify own low range under 4G
 			for second kernel instead.
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
-
+			[KNL, ARM64] range in low memory.
+			This lets the user specify a low range in the DMA zone
+			for the crash dump kernel.
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
 
-- 
2.25.1
* Re: [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  2021-10-20  2:03 ` [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei
@ 2021-10-20 14:19   ` Rob Herring
  2021-10-21  1:14     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 13+ messages in thread
From: Rob Herring @ 2021-10-20 14:19 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Frank Rowand, devicetree, Jonathan Corbet,
	linux-doc, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
On Wed, Oct 20, 2021 at 10:03:16AM +0800, Zhen Lei wrote:
> From: Chen Zhou <chenzhou10@huawei.com>
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices and never mapped by the first kernel.
> This memory range is advertised to crash dump kernel via DT property
> under /chosen,
>         linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
> 
> We reused the DT property linux,usable-memory-range and made the low
> memory region as the second range "BASE2 SIZE2", which keeps compatibility
> with existing user-space and older kdump kernels.
> 
> Crash dump kernel reads this property at boot time and call memblock_add()
> to add the low memory region after memblock_cap_memory_range() has been
> called.
> 
> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  drivers/of/fdt.c | 47 ++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4546572af24bbf1..cf59c847b2c28a5 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -969,8 +969,16 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
>  		 elfcorehdr_addr, elfcorehdr_size);
>  }
>  
> -static phys_addr_t cap_mem_addr;
> -static phys_addr_t cap_mem_size;
> +/*
> + * The main usage of linux,usable-memory-range is for crash dump kernel.
> + * Originally, the number of usable-memory regions is one. Now there may
> + * be two regions, low region and high region.
> + * To make compatibility with existing user-space and older kdump, the low
> + * region is always the last range of linux,usable-memory-range if exist.
> + */
> +#define MAX_USABLE_RANGES		2
> +
> +static struct memblock_region cap_mem_regions[MAX_USABLE_RANGES];
>  
>  /**
>   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
> @@ -979,20 +987,30 @@ static phys_addr_t cap_mem_size;
>   */
>  static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>  {
> -	const __be32 *prop;
> -	int len;
> +	const __be32 *prop, *endp;
> +	int len, nr = 0;
> +	struct memblock_region *rgn = &cap_mem_regions[0];
>  
>  	pr_debug("Looking for usable-memory-range property... ");
>  
>  	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
> -	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
> +	if (!prop)
>  		return;
>  
> -	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
> -	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
> +	endp = prop + (len / sizeof(__be32));
> +	while ((endp - prop) >= (dt_root_addr_cells + dt_root_size_cells)) {
> +		rgn->base = dt_mem_next_cell(dt_root_addr_cells, &prop);
> +		rgn->size = dt_mem_next_cell(dt_root_size_cells, &prop);
> +
> +		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
> +			 nr, &rgn->base, &rgn->size);
> +
> +		if (++nr >= MAX_USABLE_RANGES)
> +			break;
> +
> +		rgn++;
> +	}
>  
> -	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
> -		 &cap_mem_size);
>  }
>  
>  #ifdef CONFIG_SERIAL_EARLYCON
> @@ -1265,7 +1283,8 @@ bool __init early_init_dt_verify(void *params)
>  
>  void __init early_init_dt_scan_nodes(void)
>  {
> -	int rc = 0;
> +	int i, rc = 0;
> +	struct memblock_region *rgn = &cap_mem_regions[0];
>  
>  	/* Initialize {size,address}-cells info */
>  	of_scan_flat_dt(early_init_dt_scan_root, NULL);
> @@ -1279,7 +1298,13 @@ void __init early_init_dt_scan_nodes(void)
>  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>  
>  	/* Handle linux,usable-memory-range property */
> -	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
> +	memblock_cap_memory_range(rgn->base, rgn->size);
> +	for (i = 1; i < MAX_USABLE_RANGES; i++) {
> +		rgn++;
Just use rgn[i].
> +
> +		if (rgn->size)
This check can be in the 'for' conditions check.
> +			memblock_add(rgn->base, rgn->size);
> +	}
There's not really any point in doing all this in 2 steps. I'm 
assuming this needs to be handled after scanning the memory nodes, so 
can you refactor this moving early_init_dt_check_for_usable_mem_range 
out of early_init_dt_scan_chosen() and call it here. You'll have to get 
the offset for /chosen twice or save the offset.
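
For illustration only, the single-step handling could look roughly like
the userspace sketch below. It is not kernel code: struct range,
struct mem_model, next_cell() and apply_usable_ranges() are hypothetical
names, with next_cell() standing in for dt_mem_next_cell() and the model
struct recording what would be passed to memblock_cap_memory_range() and
memblock_add(). It assumes the property is a raw array of big-endian
32-bit cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's memblock API. */
#define MAX_USABLE_RANGES 2

struct range { uint64_t base, size; };

/* Records what would be handed to the "cap" and "add" steps. */
struct mem_model {
	struct range cap;                       /* models memblock_cap_memory_range() */
	struct range added[MAX_USABLE_RANGES];  /* models memblock_add() calls */
	int nr_added;
};

/* Read 'cells' big-endian 32-bit cells and advance *p. */
static uint64_t next_cell(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	assert(cells >= 1 && cells <= 2);
	for (int i = 0; i < cells * 4; i++)
		v = (v << 8) | *(*p)++;
	return v;
}

/*
 * Decode the property and apply it in one pass: the first range caps
 * memory, each later non-empty range is added back. Returns the number
 * of ranges consumed.
 */
static int apply_usable_ranges(const uint8_t *prop, size_t len,
			       int addr_cells, int size_cells,
			       struct mem_model *m)
{
	const uint8_t *end = prop + len;
	size_t entry = (size_t)(addr_cells + size_cells) * 4;
	int nr = 0;

	while (nr < MAX_USABLE_RANGES && (size_t)(end - prop) >= entry) {
		struct range r;

		r.base = next_cell(addr_cells, &prop);
		r.size = next_cell(size_cells, &prop);

		/* An empty trailing range ends the walk. */
		if (nr > 0 && !r.size)
			break;

		if (nr == 0)
			m->cap = r;
		else
			m->added[m->nr_added++] = r;
		nr++;
	}
	return nr;
}
```

With this shape there is no static array carried between two functions;
the decode and the memblock calls happen in the same place, after the
memory nodes have been scanned.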
Rob
* Re: [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range"
  2021-10-20 14:19   ` Rob Herring
@ 2021-10-21  1:14     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 13+ messages in thread
From: Leizhen (ThunderTown) @ 2021-10-21  1:14 UTC (permalink / raw)
  To: Rob Herring
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H . Peter Anvin, linux-kernel, Dave Young, Baoquan He,
	Vivek Goyal, Eric Biederman, kexec, Catalin Marinas, Will Deacon,
	linux-arm-kernel, Frank Rowand, devicetree, Jonathan Corbet,
	linux-doc, Randy Dunlap, Nicolas Saenz Julienne, Feng Zhou,
	Kefeng Wang
On 2021/10/20 22:19, Rob Herring wrote:
> On Wed, Oct 20, 2021 at 10:03:16AM +0800, Zhen Lei wrote:
>> From: Chen Zhou <chenzhou10@huawei.com>
>>
>> When reserving crashkernel in high memory, some low memory is reserved
>> for crash dump kernel devices and never mapped by the first kernel.
>> This memory range is advertised to crash dump kernel via DT property
>> under /chosen,
>>         linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
>>
>> We reused the DT property linux,usable-memory-range and made the low
>> memory region as the second range "BASE2 SIZE2", which keeps compatibility
>> with existing user-space and older kdump kernels.
>>
>> Crash dump kernel reads this property at boot time and call memblock_add()
>> to add the low memory region after memblock_cap_memory_range() has been
>> called.
>>
>> Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  drivers/of/fdt.c | 47 ++++++++++++++++++++++++++++++++++++-----------
>>  1 file changed, 36 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>> index 4546572af24bbf1..cf59c847b2c28a5 100644
>> --- a/drivers/of/fdt.c
>> +++ b/drivers/of/fdt.c
>> @@ -969,8 +969,16 @@ static void __init early_init_dt_check_for_elfcorehdr(unsigned long node)
>>  		 elfcorehdr_addr, elfcorehdr_size);
>>  }
>>  
>> -static phys_addr_t cap_mem_addr;
>> -static phys_addr_t cap_mem_size;
>> +/*
>> + * The main usage of linux,usable-memory-range is for crash dump kernel.
>> + * Originally, the number of usable-memory regions is one. Now there may
>> + * be two regions, low region and high region.
>> + * To make compatibility with existing user-space and older kdump, the low
>> + * region is always the last range of linux,usable-memory-range if exist.
>> + */
>> +#define MAX_USABLE_RANGES		2
>> +
>> +static struct memblock_region cap_mem_regions[MAX_USABLE_RANGES];
>>  
>>  /**
>>   * early_init_dt_check_for_usable_mem_range - Decode usable memory range
>> @@ -979,20 +987,30 @@ static phys_addr_t cap_mem_size;
>>   */
>>  static void __init early_init_dt_check_for_usable_mem_range(unsigned long node)
>>  {
>> -	const __be32 *prop;
>> -	int len;
>> +	const __be32 *prop, *endp;
>> +	int len, nr = 0;
>> +	struct memblock_region *rgn = &cap_mem_regions[0];
>>  
>>  	pr_debug("Looking for usable-memory-range property... ");
>>  
>>  	prop = of_get_flat_dt_prop(node, "linux,usable-memory-range", &len);
>> -	if (!prop || (len < (dt_root_addr_cells + dt_root_size_cells)))
>> +	if (!prop)
>>  		return;
>>  
>> -	cap_mem_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
>> -	cap_mem_size = dt_mem_next_cell(dt_root_size_cells, &prop);
>> +	endp = prop + (len / sizeof(__be32));
>> +	while ((endp - prop) >= (dt_root_addr_cells + dt_root_size_cells)) {
>> +		rgn->base = dt_mem_next_cell(dt_root_addr_cells, &prop);
>> +		rgn->size = dt_mem_next_cell(dt_root_size_cells, &prop);
>> +
>> +		pr_debug("cap_mem_regions[%d]: base=%pa, size=%pa\n",
>> +			 nr, &rgn->base, &rgn->size);
>> +
>> +		if (++nr >= MAX_USABLE_RANGES)
>> +			break;
>> +
>> +		rgn++;
>> +	}
>>  
>> -	pr_debug("cap_mem_start=%pa cap_mem_size=%pa\n", &cap_mem_addr,
>> -		 &cap_mem_size);
>>  }
>>  
>>  #ifdef CONFIG_SERIAL_EARLYCON
>> @@ -1265,7 +1283,8 @@ bool __init early_init_dt_verify(void *params)
>>  
>>  void __init early_init_dt_scan_nodes(void)
>>  {
>> -	int rc = 0;
>> +	int i, rc = 0;
>> +	struct memblock_region *rgn = &cap_mem_regions[0];
>>  
>>  	/* Initialize {size,address}-cells info */
>>  	of_scan_flat_dt(early_init_dt_scan_root, NULL);
>> @@ -1279,7 +1298,13 @@ void __init early_init_dt_scan_nodes(void)
>>  	of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>>  
>>  	/* Handle linux,usable-memory-range property */
>> -	memblock_cap_memory_range(cap_mem_addr, cap_mem_size);
>> +	memblock_cap_memory_range(rgn->base, rgn->size);
>> +	for (i = 1; i < MAX_USABLE_RANGES; i++) {
>> +		rgn++;
> 
> Just use rgn[i].
OK.
> 
>> +
>> +		if (rgn->size)
> 
> This check can go in the 'for' loop condition.
Yes. This property is added by the kexec tool, so it is impossible for
the first range to be zero while the second range is non-zero.
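
A small userspace sketch of the loop shape being discussed, under the
assumption above that ranges are always filled contiguously. struct region
and count_extra_ranges() are hypothetical stand-ins, and the slot count is
a parameter here only so the stop-at-first-empty behaviour is visible with
more than two slots.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct memblock_region. */
struct region { uint64_t base, size; };

/*
 * Count how many ranges after index 0 would be passed to memblock_add().
 * The size test sits in the loop condition, so the walk stops at the
 * first empty slot instead of skipping over it.
 */
static int count_extra_ranges(const struct region *rgn, int max)
{
	int i, added = 0;

	for (i = 1; i < max && rgn[i].size; i++)
		added++;	/* kernel: memblock_add(rgn[i].base, rgn[i].size) */

	return added;
}
```

If a hole could appear between valid ranges, this form would drop
everything after the hole, whereas a per-entry 'if' would not. That
difference only matters if the contiguity assumption does not hold.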
> 
>> +			memblock_add(rgn->base, rgn->size);
>> +	}
> 
> 
> There's not really any point in doing all this in 2 steps. I'm 
> assuming this needs to be handled after scanning the memory nodes, so 
> can you refactor this moving early_init_dt_check_for_usable_mem_range 
> out of early_init_dt_scan_chosen() and call it here. You'll have to get 
> the offset for /chosen twice or save the offset.
That's a good suggestion. I'll refactor it. Thanks.
Zhen Lei
> 
> Rob
> .
> 
Thread overview: 13+ messages
2021-10-20  2:03 [PATCH v15 00/10] support reserving crashkernel above 4G on arm64 kdump Zhen Lei
2021-10-20  2:03 ` [PATCH v15 01/10] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN Zhen Lei
2021-10-20  2:03 ` [PATCH v15 02/10] x86: kdump: make the lower bound of crash kernel reservation consistent Zhen Lei
2021-10-20  2:03 ` [PATCH v15 03/10] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel() Zhen Lei
2021-10-20  2:03 ` [PATCH v15 04/10] x86: kdump: move xen_pv_domain() check and insert_resource() to setup_arch() Zhen Lei
2021-10-20  2:03 ` [PATCH v15 05/10] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c Zhen Lei
2021-10-20  2:03 ` [PATCH v15 06/10] arm64: kdump: introduce some macros for crash kernel reservation Zhen Lei
2021-10-20  2:03 ` [PATCH v15 07/10] arm64: kdump: reimplement crashkernel=X Zhen Lei
2021-10-20  2:03 ` [PATCH v15 08/10] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config Zhen Lei
2021-10-20  2:03 ` [PATCH v15 09/10] of: fdt: Add memory for devices by DT property "linux,usable-memory-range" Zhen Lei
2021-10-20 14:19   ` Rob Herring
2021-10-21  1:14     ` Leizhen (ThunderTown)
2021-10-20  2:03 ` [PATCH v15 10/10] kdump: update Documentation about crashkernel Zhen Lei