linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel
@ 2023-04-23 10:52 Sourabh Jain
  2023-04-23 10:52 ` [PATCH v10 1/5] powerpc/kexec: turn some static helper functions public Sourabh Jain
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: eric.devolder, bhe, mahesh, kexec, ldufour, hbathini

The Problem:
============
Post CPU/Memory hot plug/unplug and online/offline events the  kernel
holds stale information about the system. Dump collection with stale
kdump kernel might end up in dump capture failure or an inaccurate dump
collection.

Existing solution:
==================
The existing solution to keep the kdump kernel up-to-date by monitoring
CPU/Memory hotplug/online/offline events via udev rule and trigger a full
kdump kernel reload for every hotplug event.

Shortcomings:
------------------------------------------------
- Leaves a window where kernel crash might not lead to a successful dump
  collection.
- Reloading all kexec components for each hotplug is inefficient.
- udev rules are prone to races if hotplug events are frequent.

More about issues with an existing solution is posted here:
 - https://lkml.org/lkml/2020/12/14/532
 - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==================
Instead of reloading all kexec segments on CPU/Memory hotplug/online/offline
event, this patch series focuses on updating only the relevant kexec segment.
Once the kexec segments are loaded in the kernel reserved area then an
arch-specific hotplug handler will update the relevant kexec segment based on
hotplug event type.

Series Dependencies
====================
This patch series implements the crash hotplug handler on PowerPC. The generic
crash hotplug handler is introduced by https://lkml.org/lkml/2023/4/4/1136 patch
series.

Git tree for testing:
=====================
The below git tree has this patch series applied on top of dependent patch
series.
https://github.com/sourabhjains/linux/tree/e21-s10

To realise the feature the kdump udev rule must updated to avoid
reloading of kdump reload on CPU/Memory hotplug/online/offline events.

  RHEL: /usr/lib/udev/rules.d/98-kexec.rules

	-SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
	-SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
	-SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
	+SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
	+SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

Note: only kexec_file_load syscall will work. For kexec_load minor changes are
required in kexec tool.

---
Changelog:

v10:
  - Drop the patch that adds fdt_index attribute to struct kimage_arch
    Find the fdt segment index when needed.
  - Added more details into commits messages.
  - Rebased onto 6.3.0-rc5

v9:
  - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
    The patch is moved to generic patch series that introduces generic
    infrastructure for in kernel crash update.
  - Removed patch to pass the hotplug action type to the arch crash
    hotplug handler function. The generic patch series has introduced
    the hotplug action type in kimage struct.
  - Add detail commit message for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load
    it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour

  - Updated the logic to find the number of offline core. [6/8]

  - Changed the logic to find the elfcore program header to accommodate
    future memory ranges due memory hotplug events. [8/8]

v7
  - added a new config to configure this feature
  - pass hotplug action type to arch specific handler

v6
  - Added crash memory hotplug support

v5:
  - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
    instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
    under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs post
    kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug
    support for kexec_file_load" patch to know more about the change.
  - Fix a couple of typo.
  - Replace pr_err to pr_info_once to warn user about memory hotplug
    support.
  - In crash hotplug handle exit the for loop if FDT segment is found.

v3
  - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
  - Rebase patche on top of https://lkml.org/lkml/2022/3/3/674 [v5]
  - Fixed warning reported by checpatch script

v2:
  - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a
    significant change from v1.

Sourabh Jain (5):
  powerpc/kexec: turn some static helper functions public
  powerpc/crash: introduce a new config option CRASH_HOTPLUG
  powerpc/crash: add crash CPU hotplug support
  crash: forward memory_notify args to arch crash hotplug handler
  powerpc/kexec: add crash memory hotplug support

 arch/powerpc/Kconfig                    |  12 +
 arch/powerpc/include/asm/kexec.h        |  10 +
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 301 ++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c             |  12 +-
 arch/powerpc/kexec/file_load_64.c       | 212 ++++-------------
 arch/powerpc/kexec/ranges.c             |  85 +++++++
 arch/x86/include/asm/kexec.h            |   2 +-
 arch/x86/kernel/crash.c                 |   3 +-
 include/linux/kexec.h                   |   2 +-
 kernel/crash_core.c                     |  14 +-
 11 files changed, 479 insertions(+), 175 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v10 1/5] powerpc/kexec: turn some static helper functions public
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
@ 2023-04-23 10:52 ` Sourabh Jain
  2023-04-23 10:52 ` [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: eric.devolder, bhe, mahesh, kexec, Laurent Dufour, ldufour,
	hbathini

Move update_cpus_node and get_crash_memory_ranges functions from
kexec/file_load_64.c to kexec/core_64.c to make these functions usable
by other kexec components.

Later in the series, these functions are utilized to do in-kernel update
to kexec segments on CPU/Memory hot plug/unplug or online/offline events
for both kexec_load and kexec_file_load syscalls.

No functional change intended.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |   6 ++
 arch/powerpc/kexec/core_64.c      | 166 ++++++++++++++++++++++++++++++
 arch/powerpc/kexec/file_load_64.c | 162 -----------------------------
 3 files changed, 172 insertions(+), 162 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d13..8090ad7d97d9d 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -99,6 +99,12 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
+#ifdef CONFIG_PPC64
+struct crash_mem;
+int update_cpus_node(void *fdt);
+int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+#endif
+
 #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index a79e28c91e2be..0b292f93a74cc 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,8 @@
 #include <linux/cpu.h>
 #include <linux/hardirq.h>
 #include <linux/of.h>
+#include <linux/libfdt.h>
+#include <linux/memblock.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -30,6 +32,8 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/svm.h>
 #include <asm/ultravisor.h>
+#include <asm/kexec_ranges.h>
+#include <asm/crashdump-ppc64.h>
 
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image)
 	/* NOTREACHED */
 }
 
+/**
+ * get_crash_memory_ranges - Get crash memory ranges. This list includes
+ *                           first/crashing kernel's memory regions that
+ *                           would be exported via an elfcore.
+ * @mem_ranges:              Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int get_crash_memory_ranges(struct crash_mem **mem_ranges)
+{
+	phys_addr_t base, end;
+	struct crash_mem *tmem;
+	u64 i;
+	int ret;
+
+	for_each_mem_range(i, &base, &end) {
+		u64 size = end - base;
+
+		/* Skip backup memory region, which needs a separate entry */
+		if (base == BACKUP_SRC_START) {
+			if (size > BACKUP_SRC_SIZE) {
+				base = BACKUP_SRC_END + 1;
+				size -= BACKUP_SRC_SIZE;
+			} else
+				continue;
+		}
+
+		ret = add_mem_range(mem_ranges, base, size);
+		if (ret)
+			goto out;
+
+		/* Try merging adjacent ranges before reallocation attempt */
+		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
+			sort_memory_ranges(*mem_ranges, true);
+	}
+
+	/* Reallocate memory ranges if there is no space to split ranges */
+	tmem = *mem_ranges;
+	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
+		tmem = realloc_mem_ranges(mem_ranges);
+		if (!tmem)
+			goto out;
+	}
+
+	/* Exclude crashkernel region */
+	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
+	if (ret)
+		goto out;
+
+	/*
+	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
+	 *        regions are exported to save their context at the time of
+	 *        crash, they should actually be backed up just like the
+	 *        first 64K bytes of memory.
+	 */
+	ret = add_rtas_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	ret = add_opal_mem_range(mem_ranges);
+	if (ret)
+		goto out;
+
+	/* create a separate program header for the backup region */
+	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
+	if (ret)
+		goto out;
+
+	sort_memory_ranges(*mem_ranges, false);
+out:
+	if (ret)
+		pr_err("Failed to setup crash memory ranges\n");
+	return ret;
+}
+
+/**
+ * add_node_props - Reads node properties from device node structure and add
+ *                  them to fdt.
+ * @fdt:            Flattened device tree of the kernel
+ * @node_offset:    offset of the node to add a property at
+ * @dn:             device node pointer
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
+{
+	int ret = 0;
+	struct property *pp;
+
+	if (!dn)
+		return -EINVAL;
+
+	for_each_property_of_node(dn, pp) {
+		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
+		if (ret < 0) {
+			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
+			return ret;
+		}
+	}
+	return ret;
+}
+
+/**
+ * update_cpus_node - Update cpus node of flattened device tree using of_root
+ *                    device node.
+ * @fdt:              Flattened device tree of the kernel.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int update_cpus_node(void *fdt)
+{
+	struct device_node *cpus_node, *dn;
+	int cpus_offset, cpus_subnode_offset, ret = 0;
+
+	cpus_offset = fdt_path_offset(fdt, "/cpus");
+	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
+		pr_err("Malformed device tree: error reading /cpus node: %s\n",
+		       fdt_strerror(cpus_offset));
+		return cpus_offset;
+	}
+
+	if (cpus_offset > 0) {
+		ret = fdt_del_node(fdt, cpus_offset);
+		if (ret < 0) {
+			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
+			return -EINVAL;
+		}
+	}
+
+	/* Add cpus node to fdt */
+	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
+	if (cpus_offset < 0) {
+		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
+		return -EINVAL;
+	}
+
+	/* Add cpus node properties */
+	cpus_node = of_find_node_by_path("/cpus");
+	ret = add_node_props(fdt, cpus_offset, cpus_node);
+	of_node_put(cpus_node);
+	if (ret < 0)
+		return ret;
+
+	/* Loop through all subnodes of cpus and add them to fdt */
+	for_each_node_by_type(dn, "cpu") {
+		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
+		if (cpus_subnode_offset < 0) {
+			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
+			       fdt_strerror(cpus_subnode_offset));
+			ret = cpus_subnode_offset;
+			goto out;
+		}
+
+		ret = add_node_props(fdt, cpus_subnode_offset, dn);
+		if (ret < 0)
+			goto out;
+	}
+out:
+	of_node_put(dn);
+	return ret;
+}
+
 #ifdef CONFIG_PPC_64S_HASH_MMU
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 110d28bede2a7..5b0b3f61e0e72 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -133,81 +133,6 @@ static int get_usable_memory_ranges(struct crash_mem **mem_ranges)
 	return ret;
 }
 
-/**
- * get_crash_memory_ranges - Get crash memory ranges. This list includes
- *                           first/crashing kernel's memory regions that
- *                           would be exported via an elfcore.
- * @mem_ranges:              Range list to add the memory ranges to.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int get_crash_memory_ranges(struct crash_mem **mem_ranges)
-{
-	phys_addr_t base, end;
-	struct crash_mem *tmem;
-	u64 i;
-	int ret;
-
-	for_each_mem_range(i, &base, &end) {
-		u64 size = end - base;
-
-		/* Skip backup memory region, which needs a separate entry */
-		if (base == BACKUP_SRC_START) {
-			if (size > BACKUP_SRC_SIZE) {
-				base = BACKUP_SRC_END + 1;
-				size -= BACKUP_SRC_SIZE;
-			} else
-				continue;
-		}
-
-		ret = add_mem_range(mem_ranges, base, size);
-		if (ret)
-			goto out;
-
-		/* Try merging adjacent ranges before reallocation attempt */
-		if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
-			sort_memory_ranges(*mem_ranges, true);
-	}
-
-	/* Reallocate memory ranges if there is no space to split ranges */
-	tmem = *mem_ranges;
-	if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
-		tmem = realloc_mem_ranges(mem_ranges);
-		if (!tmem)
-			goto out;
-	}
-
-	/* Exclude crashkernel region */
-	ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
-	if (ret)
-		goto out;
-
-	/*
-	 * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL
-	 *        regions are exported to save their context at the time of
-	 *        crash, they should actually be backed up just like the
-	 *        first 64K bytes of memory.
-	 */
-	ret = add_rtas_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	ret = add_opal_mem_range(mem_ranges);
-	if (ret)
-		goto out;
-
-	/* create a separate program header for the backup region */
-	ret = add_mem_range(mem_ranges, BACKUP_SRC_START, BACKUP_SRC_SIZE);
-	if (ret)
-		goto out;
-
-	sort_memory_ranges(*mem_ranges, false);
-out:
-	if (ret)
-		pr_err("Failed to setup crash memory ranges\n");
-	return ret;
-}
-
 /**
  * get_reserved_memory_ranges - Get reserve memory ranges. This list includes
  *                              memory regions that should be added to the
@@ -1018,93 +943,6 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	return extra_size;
 }
 
-/**
- * add_node_props - Reads node properties from device node structure and add
- *                  them to fdt.
- * @fdt:            Flattened device tree of the kernel
- * @node_offset:    offset of the node to add a property at
- * @dn:             device node pointer
- *
- * Returns 0 on success, negative errno on error.
- */
-static int add_node_props(void *fdt, int node_offset, const struct device_node *dn)
-{
-	int ret = 0;
-	struct property *pp;
-
-	if (!dn)
-		return -EINVAL;
-
-	for_each_property_of_node(dn, pp) {
-		ret = fdt_setprop(fdt, node_offset, pp->name, pp->value, pp->length);
-		if (ret < 0) {
-			pr_err("Unable to add %s property: %s\n", pp->name, fdt_strerror(ret));
-			return ret;
-		}
-	}
-	return ret;
-}
-
-/**
- * update_cpus_node - Update cpus node of flattened device tree using of_root
- *                    device node.
- * @fdt:              Flattened device tree of the kernel.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int update_cpus_node(void *fdt)
-{
-	struct device_node *cpus_node, *dn;
-	int cpus_offset, cpus_subnode_offset, ret = 0;
-
-	cpus_offset = fdt_path_offset(fdt, "/cpus");
-	if (cpus_offset < 0 && cpus_offset != -FDT_ERR_NOTFOUND) {
-		pr_err("Malformed device tree: error reading /cpus node: %s\n",
-		       fdt_strerror(cpus_offset));
-		return cpus_offset;
-	}
-
-	if (cpus_offset > 0) {
-		ret = fdt_del_node(fdt, cpus_offset);
-		if (ret < 0) {
-			pr_err("Error deleting /cpus node: %s\n", fdt_strerror(ret));
-			return -EINVAL;
-		}
-	}
-
-	/* Add cpus node to fdt */
-	cpus_offset = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"), "cpus");
-	if (cpus_offset < 0) {
-		pr_err("Error creating /cpus node: %s\n", fdt_strerror(cpus_offset));
-		return -EINVAL;
-	}
-
-	/* Add cpus node properties */
-	cpus_node = of_find_node_by_path("/cpus");
-	ret = add_node_props(fdt, cpus_offset, cpus_node);
-	of_node_put(cpus_node);
-	if (ret < 0)
-		return ret;
-
-	/* Loop through all subnodes of cpus and add them to fdt */
-	for_each_node_by_type(dn, "cpu") {
-		cpus_subnode_offset = fdt_add_subnode(fdt, cpus_offset, dn->full_name);
-		if (cpus_subnode_offset < 0) {
-			pr_err("Unable to add %s subnode: %s\n", dn->full_name,
-			       fdt_strerror(cpus_subnode_offset));
-			ret = cpus_subnode_offset;
-			goto out;
-		}
-
-		ret = add_node_props(fdt, cpus_subnode_offset, dn);
-		if (ret < 0)
-			goto out;
-	}
-out:
-	of_node_put(dn);
-	return ret;
-}
-
 static int copy_property(void *fdt, int node_offset, const struct device_node *dn,
 			 const char *propname)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
  2023-04-23 10:52 ` [PATCH v10 1/5] powerpc/kexec: turn some static helper functions public Sourabh Jain
@ 2023-04-23 10:52 ` Sourabh Jain
  2023-04-24  9:57   ` Laurent Dufour
  2023-04-23 10:52 ` [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support Sourabh Jain
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: eric.devolder, bhe, mahesh, kexec, Laurent Dufour, ldufour,
	hbathini

Due to CPU/Memory hot plug/unplug or online/offline events the system
resources changes. A similar change should reflect in the loaded kdump
kernel kexec segments that describes the state of the CPU and memory of
the running kernel.

If the kdump kernel kexec segments are not updated after the CPU/Memory
hot plug/unplug or online/offline events and kdump kernel tries to
collect the dump with the stale system resource data then this might
lead to dump collection failure or an inaccurate dump collection.

The current method to keep the kdump kernel kexec segments up to date is
by reloading the complete kdump kernel whenever a CPU/Memory hot
plug/unplug or online/offline event is observed in userspace. Reloading
the kdump kernel for every CPU/Memory hot plug/unplug or online/offline
event is inefficient and creates a large window where the kdump service
is not available. It can be improved by doing in-kernel updates to only
necessary kdump kernel kexec segments which describe CPU and Memory
resources of the running kernel to the kdump kernel.

The kernel changes related to in-kernel updates to the kdump kernel
kexec segments are kept under the CRASH_HOTPLUG config option.

Later in the series, a powerpc crash hotplug handler is introduced to
update the kdump kernel kexec segments on CPU/Memory hotplug events.
This arch-specific handler is triggered from a generic crash handler
that registers with the CPU/Memory add/remove notifiers.

The CRASH_HOTPLUG config option is enabled by default.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
---
 arch/powerpc/Kconfig | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a6c4407d3ec83..ac0dc0ffe89b4 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -681,6 +681,18 @@ config CRASH_DUMP
 	  The same kernel binary can be used as production kernel and dump
 	  capture kernel.
 
+config CRASH_HOTPLUG
+	bool "In-kernel update to kdump kernel on system configuration changes"
+	default y
+	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+	help
+	  Quick and efficient mechanism to update the kdump kernel in the
+	  event of CPU/Memory hot plug/unplug or online/offline events. This
+	  approach does the in-kernel update to only necessary kexec segment
+	  instead of unload-reload entire kdump kernel from userspace.
+
+	  If unsure, say Y.
+
 config FA_DUMP
 	bool "Firmware-assisted dump"
 	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
  2023-04-23 10:52 ` [PATCH v10 1/5] powerpc/kexec: turn some static helper functions public Sourabh Jain
  2023-04-23 10:52 ` [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
@ 2023-04-23 10:52 ` Sourabh Jain
  2023-04-24  9:59   ` Laurent Dufour
  2023-04-23 10:52 ` [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: eric.devolder, bhe, mahesh, kexec, Laurent Dufour, ldufour,
	hbathini

Introduce powerpc crash hotplug handler to update the necessary kexec
segments in the kernel on CPU/Memory hotplug events. Currently, these
updates are done by monitoring CPU/Memory hotplug events in userspace.

A common crash hotplug handler is triggered from generic infrastructure
for both CPU/Memory hotplug events. But in this patch, crash updates are
handled only for CPU hotplug events. Support for the crash update on
memory hotplug events is added in upcoming patches.

The elfcorehdr segment is used to exchange the CPU and other
dump-related information between the kernels. Ideally, the elfcorehdr
segment needs to be recreated on CPU hotplug events to reflect the
changes. But on powerpc, the elfcorehdr is built with possible CPUs
hence there is no need to update/recreate the elfcorehdr on CPU hotplug
events.

In addition to elfcorehdr, there is another kexec segment that holds CPU
data on powerpc is FDT (Flattened Device Tree). During the kdump kernel
boot, it is expected that the crashing CPU must be present in FDT, else
kdump kernel boot fails.

Now the only action needed on powerpc to handle the crash CPU hotplug
event is to add hot added CPUs in the kdump FDT segment to avoid kdump
kernel boot failure. So for the CPU hot add event, the FDT segment is
updated with hot added CPU and Since there is no need to remove the hot
unplugged CPUs from the FDT segment hence no action was taken for CPU
hot remove event.

To accommodate a growing number of CPUs, FDT is built with additional
buffer space to ensure that it can hold possible CPU nodes.

The changes done here will also work for the kexec_load system call
given that the kexec tool builds the FDT segment with additional space
to accommodate possible CPU nodes.

Since memory crash hotplug support is not there yet the crash hotplug
the handler simply warns the user and returns.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
---
 arch/powerpc/include/asm/kexec.h  |  4 ++
 arch/powerpc/kexec/core_64.c      | 61 +++++++++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c       | 12 +++++-
 arch/powerpc/kexec/file_load_64.c | 14 +++++++
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 8090ad7d97d9d..f01ba767af56e 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -103,6 +103,10 @@ void kexec_copy_flush(struct kimage *image);
 struct crash_mem;
 int update_cpus_node(void *fdt);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+#if defined(CONFIG_CRASH_HOTPLUG)
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+#endif
 #endif
 
 #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 0b292f93a74cc..611b89bcea2be 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -543,6 +543,67 @@ int update_cpus_node(void *fdt)
 	return ret;
 }
 
+#if defined(CONFIG_CRASH_HOTPLUG)
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
+ *                                necessary kexec segments based on the hotplug event.
+ * @image: the active struct kimage
+ *
+ * Update FDT segment to include newly added CPU. No action for CPU remove case.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+	void *fdt, *ptr;
+	unsigned long mem;
+	int i, fdt_index = -1;
+	unsigned int hp_action = image->hp_action;
+
+	/*
+	 * Since the hot-unplugged CPU is already part of crash FDT,
+	 * no action is needed for CPU remove case.
+	 */
+	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+		return;
+
+	/* crash update on memory hotplug events is not supported yet */
+	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
+		pr_info_once("Crash update is not supported for memory hotplug\n");
+		return;
+	}
+
+	/* Find the FDT segment index in kexec segment array. */
+	for (i = 0; i < image->nr_segments; i++) {
+		mem = image->segment[i].mem;
+		ptr = __va(mem);
+
+		if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
+			fdt_index = i;
+			break;
+		}
+	}
+
+	if (fdt_index < 0) {
+		pr_err("Unable to locate FDT segment.\n");
+		return;
+	}
+
+	fdt = __va((void *)image->segment[fdt_index].mem);
+
+	/* Temporarily invalidate the crash image while it is replaced */
+	xchg(&kexec_crash_image, NULL);
+
+	/* update FDT to refelect changes in CPU resrouces */
+	if (update_cpus_node(fdt))
+		pr_err("Failed to update crash FDT");
+
+	/* The crash image is now valid once again */
+	xchg(&kexec_crash_image, image);
+}
+#endif
+
 #ifdef CONFIG_PPC_64S_HASH_MMU
 /* Values we need to export to the second kernel via the device tree. */
 static unsigned long htab_base;
diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
index eeb258002d1e0..d8f6d967231e8 100644
--- a/arch/powerpc/kexec/elf_64.c
+++ b/arch/powerpc/kexec/elf_64.c
@@ -30,6 +30,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 			unsigned long cmdline_len)
 {
 	int ret;
+	bool do_pack_fdt = true;
 	unsigned long kernel_load_addr;
 	unsigned long initrd_load_addr = 0, fdt_load_addr;
 	void *fdt;
@@ -116,7 +117,16 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
 	if (ret)
 		goto out_free_fdt;
 
-	fdt_pack(fdt);
+#if defined(CONFIG_CRASH_HOTPLUG)
+	/*
+	 * Do not pack FDT, additional space is reserved to accommodate
+	 * possible CPU nodes which are not yet present in the system.
+	 */
+	if (image->type == KEXEC_TYPE_CRASH)
+		do_pack_fdt = false;
+#endif
+	if (do_pack_fdt)
+		fdt_pack(fdt);
 
 	kbuf.buffer = fdt;
 	kbuf.bufsz = kbuf.memsz = fdt_totalsize(fdt);
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 5b0b3f61e0e72..3554168687869 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -908,6 +908,9 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	unsigned int cpu_nodes, extra_size = 0;
 	struct device_node *dn;
 	u64 usm_entries;
+#if defined(CONFIG_CRASH_HOTPLUG)
+	unsigned int possible_cpu_nodes;
+#endif
 
 	// Budget some space for the password blob. There's already extra space
 	// for the key name
@@ -940,6 +943,17 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
 	if (cpu_nodes > boot_cpu_node_count)
 		extra_size += (cpu_nodes - boot_cpu_node_count) * cpu_node_size();
 
+#if defined(CONFIG_CRASH_HOTPLUG)
+	/*
+	 * Make sure enough space is reserved to accommodate possible CPU nodes
+	 * in the crash FDT. This allows packing possible CPU nodes which are
+	 * not yet present in the system without regenerating the entire FDT.
+	 */
+	possible_cpu_nodes = num_possible_cpus() / threads_per_core;
+	if (image->type == KEXEC_TYPE_CRASH && possible_cpu_nodes > cpu_nodes)
+		extra_size += (possible_cpu_nodes - cpu_nodes) * cpu_node_size();
+#endif
+
 	return extra_size;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
                   ` (2 preceding siblings ...)
  2023-04-23 10:52 ` [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support Sourabh Jain
@ 2023-04-23 10:52 ` Sourabh Jain
  2023-04-24  9:59   ` Laurent Dufour
  2023-04-24 14:02   ` Eric DeVolder
  2023-04-23 10:52 ` [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support Sourabh Jain
  2023-04-24 14:05 ` [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Eric DeVolder
  5 siblings, 2 replies; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: eric.devolder, bhe, mahesh, kexec, Laurent Dufour, ldufour,
	hbathini

On PowePC memblock regions are used to prepare elfcorehdr which
describes the memory regions of the running kernel to the kdump kernel.
Since the notifier used for the memory hotplug crash handler gets
initiated before the update of the memblock region happens (as depicted
below) the newly prepared elfcorehdr still holds the old memory regions.
If the elfcorehdr is prepared with stale memblock regions then the newly
prepared elfcorehdr will still be holding stale memory regions. And dump
collection with stale elfcorehdr will lead to dump collection failure or
incomplete dump collection.

The sequence of actions done on PowerPC when an LMB memory hot removed:

 Initiate memory hot remove
          |
          v
 offline pages
          |
          v
 initiate memory notify call
 chain for MEM_OFFLINE event  <---> Prepare new elfcorehdr and replace
 				    it with old one
          |
          v
 update memblock regions

Such challenges only exist for memory remove case. For the memory add
case the memory regions are updated first and then memory notify calls
the arch crash hotplug handler to update the elfcorehdr.

This patch passes additional information about the hot removed LMB to
the arch crash hotplug handler in the form of memory_notify object.

How passing memory_notify to arch crash hotplug handler will help?

memory_notify holds the start PFN and page count of the hot removed
memory. With that base address and the size of the hot removed memory
can be calculated and same can be used to avoid adding hot removed
memory region to get added in the elfcorehdr.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
---
 arch/powerpc/include/asm/kexec.h |  2 +-
 arch/powerpc/kexec/core_64.c     |  3 ++-
 arch/x86/include/asm/kexec.h     |  2 +-
 arch/x86/kernel/crash.c          |  3 ++-
 include/linux/kexec.h            |  2 +-
 kernel/crash_core.c              | 14 +++++++-------
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index f01ba767af56e..7e811bad5ec92 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -104,7 +104,7 @@ struct crash_mem;
 int update_cpus_node(void *fdt);
 int get_crash_memory_ranges(struct crash_mem **mem_ranges);
 #if defined(CONFIG_CRASH_HOTPLUG)
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 #endif
 #endif
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 611b89bcea2be..147ea6288a526 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -551,10 +551,11 @@ int update_cpus_node(void *fdt)
  * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
  *                                necessary kexec segments based on the hotplug event.
  * @image: the active struct kimage
+ * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
  *
  * Update FDT segment to include newly added CPU. No action for CPU remove case.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *fdt, *ptr;
 	unsigned long mem;
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 1bc852ce347d4..70c3b23b468b6 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -213,7 +213,7 @@ extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
 #ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index ead602636f3e0..b45d13193b579 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -445,11 +445,12 @@ int crash_load_segments(struct kimage *image)
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: the active struct kimage
+ * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
  *
  * The new elfcorehdr is prepared in a kernel buffer, and then it is
  * written on top of the existing/old elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *elfbuf = NULL, *old_elfcorehdr;
 	unsigned long nr_mem_ranges;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0ac41f48de0b1..69765e6a92d0d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -506,7 +506,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
 #endif
 
 #ifndef arch_crash_handle_hotplug_event
-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
 #endif
 
 #ifndef crash_hotplug_cpu_support
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 4aa3c7a6b390f..7afe5f60d2bfe 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -718,7 +718,7 @@ subsys_initcall(crash_save_vmcoreinfo_init);
  * list of segments it checks (since the elfcorehdr changes and thus
  * would require an update to purgatory itself to update the digest).
  */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
 {
 	struct kimage *image;
 
@@ -775,7 +775,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	image->hp_action = hp_action;
 
 	/* Now invoke arch-specific update handler */
-	arch_crash_handle_hotplug_event(image);
+	arch_crash_handle_hotplug_event(image, arg);
 
 	/* No longer handling a hotplug event */
 	image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -789,17 +789,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
 	kexec_unlock();
 }
 
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
 {
 	switch (val) {
 	case MEM_ONLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
 		break;
 
 	case MEM_OFFLINE:
 		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
-			KEXEC_CRASH_HP_INVALID_CPU);
+			KEXEC_CRASH_HP_INVALID_CPU, arg);
 		break;
 	}
 	return NOTIFY_OK;
@@ -812,13 +812,13 @@ static struct notifier_block crash_memhp_nb = {
 
 static int crash_cpuhp_online(unsigned int cpu)
 {
-	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu, NULL);
 	return 0;
 }
 
 static int crash_cpuhp_offline(unsigned int cpu)
 {
-	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu, NULL);
 	return 0;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
                   ` (3 preceding siblings ...)
  2023-04-23 10:52 ` [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
@ 2023-04-23 10:52 ` Sourabh Jain
  2023-04-24 10:00   ` Laurent Dufour
  2023-04-24 14:05 ` [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Eric DeVolder
  5 siblings, 1 reply; 14+ messages in thread
From: Sourabh Jain @ 2023-04-23 10:52 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: eric.devolder, bhe, mahesh, kexec, Laurent Dufour, ldufour,
	hbathini

Extend PowerPC arch crash hotplug handler to support memory hotplug
events. Since elfcorehdr is used to exchange the memory info between the
kernels hence it needs to be recreated to reflect the changes due to
memory hotplug events.

The way memory hotplug events are handled on PowerPC and the notifier
call chain used in generic code to trigger the arch crash handler, the
process to recreate the elfcorehdr is different for memory add and
remove case.

For memory remove case the memory change notifier call chain is
triggered first and then memblock regions is updated. Whereas for the
memory hot add case, memblock regions are updated before invoking the
memory change notifier call chain.

On PowerPC, memblock regions list is used to prepare the elfcorehdr. In
case of memory hot remove the memblock regions are updated after the
arch crash hotplug handler is triggered, hence an additional step is
taken to ensure that memory ranges used to prepare elfcorehdr do not
include hot removed memory.

When memory is hot removed it possible that memory regions count may
increase. So to accommodate a growing number of memory regions, the
elfcorehdr kexec segment is built with additional buffer space.

The changes done here will also work for the kexec_load system call given
that the kexec tool builds the elfcoredhr with additional space to
accommodate future memory regions as it is done for kexec_file_load
system call in the kernel.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
---
 arch/powerpc/include/asm/kexec_ranges.h |  1 +
 arch/powerpc/kexec/core_64.c            | 77 +++++++++++++++++++++-
 arch/powerpc/kexec/file_load_64.c       | 36 ++++++++++-
 arch/powerpc/kexec/ranges.c             | 85 +++++++++++++++++++++++++
 4 files changed, 195 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e870..802abf580cf0f 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int add_tce_mem_ranges(struct crash_mem **mem_ranges);
 int add_initrd_mem_range(struct crash_mem **mem_ranges);
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 147ea6288a526..01a764b1c9b07 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -19,6 +19,7 @@
 #include <linux/of.h>
 #include <linux/libfdt.h>
 #include <linux/memblock.h>
+#include <linux/memory.h>
 
 #include <asm/page.h>
 #include <asm/current.h>
@@ -547,6 +548,76 @@ int update_cpus_node(void *fdt)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
+/**
+ * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
+ *			       elfcorehdr in the kexec segment array.
+ * @image: the active struct kimage
+ * @arg: struct memory_notify data handler
+ */
+static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
+{
+	int ret;
+	struct crash_mem *cmem = NULL;
+	struct kexec_segment *ksegment;
+	void *ptr, *mem, *elfbuf = NULL;
+	unsigned long elfsz, memsz, base_addr, size;
+
+	ksegment = &image->segment[image->elfcorehdr_index];
+	mem = (void *) ksegment->mem;
+	memsz = ksegment->memsz;
+
+	ret = get_crash_memory_ranges(&cmem);
+	if (ret) {
+		pr_err("Failed to get crash mem range\n");
+		return;
+	}
+
+	/*
+	 * The hot unplugged memory is not yet removed from crash memory
+	 * ranges, remove it here.
+	 */
+	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
+		base_addr = PFN_PHYS(mn->start_pfn);
+		size = mn->nr_pages * PAGE_SIZE;
+		ret = remove_mem_range(&cmem, base_addr, size);
+		if (ret) {
+			pr_err("Failed to remove hot-unplugged from crash memory ranges.\n");
+			return;
+		}
+	}
+
+	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
+	if (ret) {
+		pr_err("Failed to prepare elf header\n");
+		return;
+	}
+
+	/*
+	 * It is unlikely that kernel hit this because elfcorehdr kexec
+	 * segment (memsz) is built with addition space to accommodate growing
+	 * number of crash memory ranges while loading the kdump kernel. It is
+	 * Just to avoid any unforeseen case.
+	 */
+	if (elfsz > memsz) {
+		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
+		goto out;
+	}
+
+	ptr = __va(mem);
+	if (ptr) {
+		/* Temporarily invalidate the crash image while it is replaced */
+		xchg(&kexec_crash_image, NULL);
+
+		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
+		memcpy((void *)ptr, elfbuf, elfsz);
+
+		/* The crash image is now valid once again */
+		xchg(&kexec_crash_image, image);
+	}
+out:
+	vfree(elfbuf);
+}
+
 /**
  * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
  *                                necessary kexec segments based on the hotplug event.
@@ -554,12 +625,14 @@ int update_cpus_node(void *fdt)
  * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
  *
  * Update FDT segment to include newly added CPU. No action for CPU remove case.
+ * Recreate the elfcorehdr for Memory add/remove case and replace it with old one.
  */
 void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
 	void *fdt, *ptr;
 	unsigned long mem;
 	int i, fdt_index = -1;
+	struct memory_notify *mn;
 	unsigned int hp_action = image->hp_action;
 
 	/*
@@ -569,9 +642,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
 		return;
 
-	/* crash update on memory hotplug events is not supported yet */
 	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
-		pr_info_once("Crash update is not supported for memory hotplug\n");
+		mn = (struct memory_notify *) arg;
+		update_crash_elfcorehdr(image, mn);
 		return;
 	}
 
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 3554168687869..db7ba1c8c5e0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -21,6 +21,8 @@
 #include <linux/memblock.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/elf.h>
+
 #include <asm/setup.h>
 #include <asm/drmem.h>
 #include <asm/firmware.h>
@@ -707,6 +709,30 @@ static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr)
 	}
 }
 
+/* get_max_phdr - Find the total number of Phdr needed to represent the
+ *		  max memory in the kdump elfcorehdr.
+ *
+ * @cmem: crash memory ranges in the system.
+ */
+static int get_max_phdr(struct crash_mem *cmem)
+{
+	int max_lmb;
+
+	/* In the worst case, a Phdr is needed for every other LMB to be represented
+	 * as an individual crash range.
+	 */
+	max_lmb = memory_hotplug_max() / 2 * drmem_lmb_size();
+
+	/* Do not cross the Phdr max limit of the elf header.
+	 * Avoid counting Phdr for crash ranges (cmem->nr_ranges) which
+	 * are already part of elfcorehdr.
+	 */
+	if (max_lmb > PN_XNUM)
+		return PN_XNUM - cmem->nr_ranges;
+
+	return max_lmb - cmem->nr_ranges;
+}
+
 /**
  * load_elfcorehdr_segment - Setup crash memory ranges and initialize elfcorehdr
  *                           segment needed to load kdump kernel.
@@ -738,7 +764,13 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 
 	kbuf->buffer = headers;
 	kbuf->mem = KEXEC_BUF_MEM_UNKNOWN;
-	kbuf->bufsz = kbuf->memsz = headers_sz;
+	kbuf->bufsz = headers_sz;
+/* Additional buffer space to accommodate future memory ranges */
+#if defined(CONFIG_MEMORY_HOTPLUG)
+	kbuf->memsz = headers_sz + get_max_phdr(cmem) * sizeof(Elf64_Phdr);
+#else
+	kbuf->memsz = headers_sz;
+#endif
 	kbuf->top_down = false;
 
 	ret = kexec_add_buffer(kbuf);
@@ -748,7 +780,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
 	}
 
 	image->elf_load_addr = kbuf->mem;
-	image->elf_headers_sz = headers_sz;
+	image->elf_headers_sz = kbuf->memsz;
 	image->elf_headers = headers;
 out:
 	kfree(cmem);
diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
index 5fc53a5fcfdf6..d8007363cdc11 100644
--- a/arch/powerpc/kexec/ranges.c
+++ b/arch/powerpc/kexec/ranges.c
@@ -234,6 +234,91 @@ int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
 	return __add_mem_range(mem_ranges, base, size);
 }
 
+/**
+ * remove_mem_range - Removes the given memory range from the range list.
+ * @mem_ranges:    Range list to remove the memory range to.
+ * @base:          Base address of the range to remove.
+ * @size:          Size of the memory range to remove.
+ *
+ * (Re)allocates memory, if needed.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
+{
+	u64 end;
+	int ret = 0;
+	unsigned int i;
+	u64 mstart, mend;
+	struct crash_mem *mem_rngs = *mem_ranges;
+
+	if (!size)
+		return 0;
+
+	/*
+	 * Memory range are stored as start and end address, use
+	 * the same format to do remove operation.
+	 */
+	end = base + size - 1;
+
+	for (i = 0; i < mem_rngs->nr_ranges; i++) {
+		mstart = mem_rngs->ranges[i].start;
+		mend = mem_rngs->ranges[i].end;
+
+		/*
+		 * Memory range to remove is not part of this range entry
+		 * in the memory range list
+		 */
+		if (!(base >= mstart && end <= mend))
+			continue;
+
+		/*
+		 * Memory range to remove is equivalent to this entry in the
+		 * memory range list. Remove the range entry from the list.
+		 */
+		if (base == mstart && end == mend) {
+			for (; i < mem_rngs->nr_ranges - 1; i++) {
+				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
+				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
+			}
+			mem_rngs->nr_ranges--;
+			goto out;
+		}
+		/*
+		 * Start address of the memory range to remove and the
+		 * current memory range entry in the list is same. Just
+		 * move the start address of the current memory range
+		 * entry in the list to end + 1.
+		 */
+		else if (base == mstart) {
+			mem_rngs->ranges[i].start = end + 1;
+			goto out;
+		}
+		/*
+		 * End address of the memory range to remove and the
+		 * current memory range entry in the list is same.
+		 * Just move the end address of the current memory
+		 * range entry in the list to base - 1.
+		 */
+		else if (end == mend)  {
+			mem_rngs->ranges[i].end = base - 1;
+			goto out;
+		}
+		/*
+		 * Memory range to remove is not at the edge of current
+		 * memory range entry. Split the current memory entry into
+		 * two half.
+		 */
+		else {
+			mem_rngs->ranges[i].end = base - 1;
+			size = mem_rngs->ranges[i].end - end;
+			ret = add_mem_range(mem_ranges, end + 1, size);
+		}
+	}
+out:
+	return ret;
+}
+
 /**
  * add_tce_mem_ranges - Adds tce-table range to the given memory ranges list.
  * @mem_ranges:         Range list to add the memory range(s) to.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-04-23 10:52 ` [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
@ 2023-04-24  9:57   ` Laurent Dufour
  2023-04-24 15:00     ` Sourabh Jain
  0 siblings, 1 reply; 14+ messages in thread
From: Laurent Dufour @ 2023-04-24  9:57 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, eric.devolder, hbathini

On 23/04/2023 12:52:10, Sourabh Jain wrote:
> Due to CPU/Memory hot plug/unplug or online/offline events the system
> resources changes. A similar change should reflect in the loaded kdump
> kernel kexec segments that describes the state of the CPU and memory of
> the running kernel.
> 
> If the kdump kernel kexec segments are not updated after the CPU/Memory
> hot plug/unplug or online/offline events and kdump kernel tries to
> collect the dump with the stale system resource data then this might
> lead to dump collection failure or an inaccurate dump collection.
> 
> The current method to keep the kdump kernel kexec segments up to date is
> by reloading the complete kdump kernel whenever a CPU/Memory hot
> plug/unplug or online/offline event is observed in userspace. Reloading
> the kdump kernel for every CPU/Memory hot plug/unplug or online/offline
> event is inefficient and creates a large window where the kdump service
> is not available. It can be improved by doing in-kernel updates to only
> necessary kdump kernel kexec segments which describe CPU and Memory
> resources of the running kernel to the kdump kernel.
> 
> The kernel changes related to in-kernel updates to the kdump kernel
> kexec segments are kept under the CRASH_HOTPLUG config option.
> 
> Later in the series, a powerpc crash hotplug handler is introduced to
> update the kdump kernel kexec segments on CPU/Memory hotplug events.
> This arch-specific handler is triggered from a generic crash handler
> that registers with the CPU/Memory add/remove notifiers.
> 
> The CRASH_HOTPLUG config option is enabled by default.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>

I can't remember having sent a review-by on that patch earlier.

Anyway, I can't find any issue with that one, so replace with:
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>

> ---
>  arch/powerpc/Kconfig | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index a6c4407d3ec83..ac0dc0ffe89b4 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -681,6 +681,18 @@ config CRASH_DUMP
>  	  The same kernel binary can be used as production kernel and dump
>  	  capture kernel.
>  
> +config CRASH_HOTPLUG
> +	bool "In-kernel update to kdump kernel on system configuration changes"
> +	default y
> +	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
> +	help
> +	  Quick and efficient mechanism to update the kdump kernel in the
> +	  event of CPU/Memory hot plug/unplug or online/offline events. This
> +	  approach does the in-kernel update to only necessary kexec segment
> +	  instead of unload-reload entire kdump kernel from userspace.
> +
> +	  If unsure, say Y.
> +
>  config FA_DUMP
>  	bool "Firmware-assisted dump"
>  	depends on PPC64 && (PPC_RTAS || PPC_POWERNV)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support
  2023-04-23 10:52 ` [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support Sourabh Jain
@ 2023-04-24  9:59   ` Laurent Dufour
  0 siblings, 0 replies; 14+ messages in thread
From: Laurent Dufour @ 2023-04-24  9:59 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, eric.devolder, hbathini

On 23/04/2023 12:52:11, Sourabh Jain wrote:
> Introduce powerpc crash hotplug handler to update the necessary kexec
> segments in the kernel on CPU/Memory hotplug events. Currently, these
> updates are done by monitoring CPU/Memory hotplug events in userspace.
> 
> A common crash hotplug handler is triggered from generic infrastructure
> for both CPU/Memory hotplug events. But in this patch, crash updates are
> handled only for CPU hotplug events. Support for the crash update on
> memory hotplug events is added in upcoming patches.
> 
> The elfcorehdr segment is used to exchange the CPU and other
> dump-related information between the kernels. Ideally, the elfcorehdr
> segment needs to be recreated on CPU hotplug events to reflect the
> changes. But on powerpc, the elfcorehdr is built with possible CPUs
> hence there is no need to update/recreate the elfcorehdr on CPU hotplug
> events.
> 
> In addition to elfcorehdr, there is another kexec segment that holds CPU
> data on powerpc is FDT (Flattened Device Tree). During the kdump kernel
> boot, it is expected that the crashing CPU must be present in FDT, else
> kdump kernel boot fails.
> 
> Now the only action needed on powerpc to handle the crash CPU hotplug
> event is to add hot added CPUs in the kdump FDT segment to avoid kdump
> kernel boot failure. So for the CPU hot add event, the FDT segment is
> updated with hot added CPU and Since there is no need to remove the hot
> unplugged CPUs from the FDT segment hence no action was taken for CPU
> hot remove event.
> 
> To accommodate a growing number of CPUs, FDT is built with additional
> buffer space to ensure that it can hold possible CPU nodes.
> 
> The changes done here will also work for the kexec_load system call
> given that the kexec tool builds the FDT segment with additional space
> to accommodate possible CPU nodes.
> 
> Since memory crash hotplug support is not there yet the crash hotplug
> the handler simply warns the user and returns.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com
I don't remember sending a review-by on this patch earlier, do you?

> ---
>  arch/powerpc/include/asm/kexec.h  |  4 ++
>  arch/powerpc/kexec/core_64.c      | 61 +++++++++++++++++++++++++++++++
>  arch/powerpc/kexec/elf_64.c       | 12 +++++-
>  arch/powerpc/kexec/file_load_64.c | 14 +++++++
>  4 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index 8090ad7d97d9d..f01ba767af56e 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -103,6 +103,10 @@ void kexec_copy_flush(struct kimage *image);
>  struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +void arch_crash_handle_hotplug_event(struct kimage *image);
> +#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
> +#endif
>  #endif
>  
>  #if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 0b292f93a74cc..611b89bcea2be 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -543,6 +543,67 @@ int update_cpus_node(void *fdt)
>  	return ret;
>  }
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +#undef pr_fmt
> +#define pr_fmt(fmt) "crash hp: " fmt
> +
> +/**
> + * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
> + *                                necessary kexec segments based on the hotplug event.
> + * @image: the active struct kimage
> + *
> + * Update FDT segment to include newly added CPU. No action for CPU remove case.
> + */
> +void arch_crash_handle_hotplug_event(struct kimage *image)
> +{
> +	void *fdt, *ptr;
> +	unsigned long mem;
> +	int i, fdt_index = -1;
> +	unsigned int hp_action = image->hp_action;
> +
> +	/*
> +	 * Since the hot-unplugged CPU is already part of crash FDT,
> +	 * no action is needed for CPU remove case.
> +	 */
> +	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
> +		return;
> +
> +	/* crash update on memory hotplug events is not supported yet */
> +	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
> +		pr_info_once("Crash update is not supported for memory hotplug\n");
> +		return;
> +	}
> +
> +	/* Find the FDT segment index in kexec segment array. */
> +	for (i = 0; i < image->nr_segments; i++) {
> +		mem = image->segment[i].mem;
> +		ptr = __va(mem);
> +
> +		if (ptr && fdt_magic(ptr) == FDT_MAGIC) {
> +			fdt_index = i;
> +			break;
> +		}
> +	}
> +
> +	if (fdt_index < 0) {
> +		pr_err("Unable to locate FDT segment.\n");
> +		return;
> +	}
> +
> +	fdt = __va((void *)image->segment[fdt_index].mem);
> +
> +	/* Temporarily invalidate the crash image while it is replaced */
> +	xchg(&kexec_crash_image, NULL);
> +
> +	/* update FDT to refelect changes in CPU resrouces */
> +	if (update_cpus_node(fdt))
> +		pr_err("Failed to update crash FDT");
> +
> +	/* The crash image is now valid once again */
> +	xchg(&kexec_crash_image, image);
> +}
> +#endif
> +
>  #ifdef CONFIG_PPC_64S_HASH_MMU
>  /* Values we need to export to the second kernel via the device tree. */
>  static unsigned long htab_base;
> diff --git a/arch/powerpc/kexec/elf_64.c b/arch/powerpc/kexec/elf_64.c
> index eeb258002d1e0..d8f6d967231e8 100644
> --- a/arch/powerpc/kexec/elf_64.c
> +++ b/arch/powerpc/kexec/elf_64.c
> @@ -30,6 +30,7 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
>  			unsigned long cmdline_len)
>  {
>  	int ret;
> +	bool do_pack_fdt = true;
>  	unsigned long kernel_load_addr;
>  	unsigned long initrd_load_addr = 0, fdt_load_addr;
>  	void *fdt;
> @@ -116,7 +117,16 @@ static void *elf64_load(struct kimage *image, char *kernel_buf,
>  	if (ret)
>  		goto out_free_fdt;
>  
> -	fdt_pack(fdt);
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +	/*
> +	 * Do not pack FDT, additional space is reserved to accommodate
> +	 * possible CPU nodes which are not yet present in the system.
> +	 */
> +	if (image->type == KEXEC_TYPE_CRASH)
> +		do_pack_fdt = false;
> +#endif
> +	if (do_pack_fdt)
> +		fdt_pack(fdt);
>  
>  	kbuf.buffer = fdt;
>  	kbuf.bufsz = kbuf.memsz = fdt_totalsize(fdt);
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index 5b0b3f61e0e72..3554168687869 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -908,6 +908,9 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
>  	unsigned int cpu_nodes, extra_size = 0;
>  	struct device_node *dn;
>  	u64 usm_entries;
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +	unsigned int possible_cpu_nodes;
> +#endif
>  
>  	// Budget some space for the password blob. There's already extra space
>  	// for the key name
> @@ -940,6 +943,17 @@ unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
>  	if (cpu_nodes > boot_cpu_node_count)
>  		extra_size += (cpu_nodes - boot_cpu_node_count) * cpu_node_size();
>  
> +#if defined(CONFIG_CRASH_HOTPLUG)
> +	/*
> +	 * Make sure enough space is reserved to accommodate possible CPU nodes
> +	 * in the crash FDT. This allows packing possible CPU nodes which are
> +	 * not yet present in the system without regenerating the entire FDT.
> +	 */
> +	possible_cpu_nodes = num_possible_cpus() / threads_per_core;
> +	if (image->type == KEXEC_TYPE_CRASH && possible_cpu_nodes > cpu_nodes)
> +		extra_size += (possible_cpu_nodes - cpu_nodes) * cpu_node_size();
> +#endif
> +
>  	return extra_size;
>  }
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler
  2023-04-23 10:52 ` [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
@ 2023-04-24  9:59   ` Laurent Dufour
  2023-04-24 14:02   ` Eric DeVolder
  1 sibling, 0 replies; 14+ messages in thread
From: Laurent Dufour @ 2023-04-24  9:59 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, eric.devolder, hbathini

On 23/04/2023 12:52:12, Sourabh Jain wrote:
> On PowePC memblock regions are used to prepare elfcorehdr which
> describes the memory regions of the running kernel to the kdump kernel.
> Since the notifier used for the memory hotplug crash handler gets
> initiated before the update of the memblock region happens (as depicted
> below) the newly prepared elfcorehdr still holds the old memory regions.
> If the elfcorehdr is prepared with stale memblock regions then the newly
> prepared elfcorehdr will still be holding stale memory regions. And dump
> collection with stale elfcorehdr will lead to dump collection failure or
> incomplete dump collection.
> 
> The sequence of actions done on PowerPC when an LMB memory hot removed:
> 
>  Initiate memory hot remove
>           |
>           v
>  offline pages
>           |
>           v
>  initiate memory notify call
>  chain for MEM_OFFLINE event  <---> Prepare new elfcorehdr and replace
>  				    it with old one
>           |
>           v
>  update memblock regions
> 
> Such challenges only exist for memory remove case. For the memory add
> case the memory regions are updated first and then memory notify calls
> the arch crash hotplug handler to update the elfcorehdr.
> 
> This patch passes additional information about the hot removed LMB to
> the arch crash hotplug handler in the form of memory_notify object.
> 
> How passing memory_notify to arch crash hotplug handler will help?
> 
> memory_notify holds the start PFN and page count of the hot removed
> memory. With that base address and the size of the hot removed memory
> can be calculated and same can be used to avoid adding hot removed
> memory region to get added in the elfcorehdr.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>

I don't remember sending a review-by on this patch earlier, do you?

> ---
>  arch/powerpc/include/asm/kexec.h |  2 +-
>  arch/powerpc/kexec/core_64.c     |  3 ++-
>  arch/x86/include/asm/kexec.h     |  2 +-
>  arch/x86/kernel/crash.c          |  3 ++-
>  include/linux/kexec.h            |  2 +-
>  kernel/crash_core.c              | 14 +++++++-------
>  6 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index f01ba767af56e..7e811bad5ec92 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -104,7 +104,7 @@ struct crash_mem;
>  int update_cpus_node(void *fdt);
>  int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>  #if defined(CONFIG_CRASH_HOTPLUG)
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>  #endif
>  #endif
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 611b89bcea2be..147ea6288a526 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -551,10 +551,11 @@ int update_cpus_node(void *fdt)
>   * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
>   *                                necessary kexec segments based on the hotplug event.
>   * @image: the active struct kimage
> + * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
>   *
>   * Update FDT segment to include newly added CPU. No action for CPU remove case.
>   */
> -void arch_crash_handle_hotplug_event(struct kimage *image)
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>  {
>  	void *fdt, *ptr;
>  	unsigned long mem;
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 1bc852ce347d4..70c3b23b468b6 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -213,7 +213,7 @@ extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>  extern void kdump_nmi_shootdown_cpus(void);
>  
>  #ifdef CONFIG_CRASH_HOTPLUG
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>  #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index ead602636f3e0..b45d13193b579 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -445,11 +445,12 @@ int crash_load_segments(struct kimage *image)
>  /**
>   * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>   * @image: the active struct kimage
> + * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
>   *
>   * The new elfcorehdr is prepared in a kernel buffer, and then it is
>   * written on top of the existing/old elfcorehdr.
>   */
> -void arch_crash_handle_hotplug_event(struct kimage *image)
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>  {
>  	void *elfbuf = NULL, *old_elfcorehdr;
>  	unsigned long nr_mem_ranges;
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0ac41f48de0b1..69765e6a92d0d 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -506,7 +506,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>  #endif
>  
>  #ifndef arch_crash_handle_hotplug_event
> -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
> +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
>  #endif
>  
>  #ifndef crash_hotplug_cpu_support
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 4aa3c7a6b390f..7afe5f60d2bfe 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -718,7 +718,7 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>   * list of segments it checks (since the elfcorehdr changes and thus
>   * would require an update to purgatory itself to update the digest).
>   */
> -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
>  {
>  	struct kimage *image;
>  
> @@ -775,7 +775,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>  	image->hp_action = hp_action;
>  
>  	/* Now invoke arch-specific update handler */
> -	arch_crash_handle_hotplug_event(image);
> +	arch_crash_handle_hotplug_event(image, arg);
>  
>  	/* No longer handling a hotplug event */
>  	image->hp_action = KEXEC_CRASH_HP_NONE;
> @@ -789,17 +789,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>  	kexec_unlock();
>  }
>  
> -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
>  {
>  	switch (val) {
>  	case MEM_ONLINE:
>  		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
> -			KEXEC_CRASH_HP_INVALID_CPU);
> +			KEXEC_CRASH_HP_INVALID_CPU, arg);
>  		break;
>  
>  	case MEM_OFFLINE:
>  		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
> -			KEXEC_CRASH_HP_INVALID_CPU);
> +			KEXEC_CRASH_HP_INVALID_CPU, arg);
>  		break;
>  	}
>  	return NOTIFY_OK;
> @@ -812,13 +812,13 @@ static struct notifier_block crash_memhp_nb = {
>  
>  static int crash_cpuhp_online(unsigned int cpu)
>  {
> -	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
> +	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu, NULL);
>  	return 0;
>  }
>  
>  static int crash_cpuhp_offline(unsigned int cpu)
>  {
> -	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> +	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu, NULL);
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support
  2023-04-23 10:52 ` [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support Sourabh Jain
@ 2023-04-24 10:00   ` Laurent Dufour
  0 siblings, 0 replies; 14+ messages in thread
From: Laurent Dufour @ 2023-04-24 10:00 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, eric.devolder, hbathini

On 23/04/2023 12:52:13, Sourabh Jain wrote:
> Extend PowerPC arch crash hotplug handler to support memory hotplug
> events. Since elfcorehdr is used to exchange the memory info between the
> kernels hence it needs to be recreated to reflect the changes due to
> memory hotplug events.
> 
> The way memory hotplug events are handled on PowerPC and the notifier
> call chain used in generic code to trigger the arch crash handler, the
> process to recreate the elfcorehdr is different for memory add and
> remove case.
> 
> For memory remove case the memory change notifier call chain is
> triggered first and then memblock regions is updated. Whereas for the
> memory hot add case, memblock regions are updated before invoking the
> memory change notifier call chain.
> 
> On PowerPC, memblock regions list is used to prepare the elfcorehdr. In
> case of memory hot remove the memblock regions are updated after the
> arch crash hotplug handler is triggered, hence an additional step is
> taken to ensure that memory ranges used to prepare elfcorehdr do not
> include hot removed memory.
> 
> When memory is hot removed it possible that memory regions count may
> increase. So to accommodate a growing number of memory regions, the
> elfcorehdr kexec segment is built with additional buffer space.
> 
> The changes done here will also work for the kexec_load system call given
> that the kexec tool builds the elfcoredhr with additional space to
> accommodate future memory regions as it is done for kexec_file_load
> system call in the kernel.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>

I don't remember sending a review-by on this patch earlier, do you?

> ---
>  arch/powerpc/include/asm/kexec_ranges.h |  1 +
>  arch/powerpc/kexec/core_64.c            | 77 +++++++++++++++++++++-
>  arch/powerpc/kexec/file_load_64.c       | 36 ++++++++++-
>  arch/powerpc/kexec/ranges.c             | 85 +++++++++++++++++++++++++
>  4 files changed, 195 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec_ranges.h b/arch/powerpc/include/asm/kexec_ranges.h
> index f83866a19e870..802abf580cf0f 100644
> --- a/arch/powerpc/include/asm/kexec_ranges.h
> +++ b/arch/powerpc/include/asm/kexec_ranges.h
> @@ -7,6 +7,7 @@
>  void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
>  struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
>  int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
> +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
>  int add_tce_mem_ranges(struct crash_mem **mem_ranges);
>  int add_initrd_mem_range(struct crash_mem **mem_ranges);
>  #ifdef CONFIG_PPC_64S_HASH_MMU
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 147ea6288a526..01a764b1c9b07 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -19,6 +19,7 @@
>  #include <linux/of.h>
>  #include <linux/libfdt.h>
>  #include <linux/memblock.h>
> +#include <linux/memory.h>
>  
>  #include <asm/page.h>
>  #include <asm/current.h>
> @@ -547,6 +548,76 @@ int update_cpus_node(void *fdt)
>  #undef pr_fmt
>  #define pr_fmt(fmt) "crash hp: " fmt
>  
> +/**
> + * update_crash_elfcorehdr() - Recreate the elfcorehdr and replace it with old
> + *			       elfcorehdr in the kexec segment array.
> + * @image: the active struct kimage
> + * @arg: struct memory_notify data handler
> + */
> +static void update_crash_elfcorehdr(struct kimage *image, struct memory_notify *mn)
> +{
> +	int ret;
> +	struct crash_mem *cmem = NULL;
> +	struct kexec_segment *ksegment;
> +	void *ptr, *mem, *elfbuf = NULL;
> +	unsigned long elfsz, memsz, base_addr, size;
> +
> +	ksegment = &image->segment[image->elfcorehdr_index];
> +	mem = (void *) ksegment->mem;
> +	memsz = ksegment->memsz;
> +
> +	ret = get_crash_memory_ranges(&cmem);
> +	if (ret) {
> +		pr_err("Failed to get crash mem range\n");
> +		return;
> +	}
> +
> +	/*
> +	 * The hot unplugged memory is not yet removed from crash memory
> +	 * ranges, remove it here.
> +	 */
> +	if (image->hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY) {
> +		base_addr = PFN_PHYS(mn->start_pfn);
> +		size = mn->nr_pages * PAGE_SIZE;
> +		ret = remove_mem_range(&cmem, base_addr, size);
> +		if (ret) {
> +			pr_err("Failed to remove hot-unplugged from crash memory ranges.\n");
> +			return;
> +		}
> +	}
> +
> +	ret = crash_prepare_elf64_headers(cmem, false, &elfbuf, &elfsz);
> +	if (ret) {
> +		pr_err("Failed to prepare elf header\n");
> +		return;
> +	}
> +
> +	/*
> +	 * It is unlikely that kernel hit this because elfcorehdr kexec
> +	 * segment (memsz) is built with addition space to accommodate growing
> +	 * number of crash memory ranges while loading the kdump kernel. It is
> +	 * Just to avoid any unforeseen case.
> +	 */
> +	if (elfsz > memsz) {
> +		pr_err("Updated crash elfcorehdr elfsz %lu > memsz %lu", elfsz, memsz);
> +		goto out;
> +	}
> +
> +	ptr = __va(mem);
> +	if (ptr) {
> +		/* Temporarily invalidate the crash image while it is replaced */
> +		xchg(&kexec_crash_image, NULL);
> +
> +		/* Replace the old elfcorehdr with newly prepared elfcorehdr */
> +		memcpy((void *)ptr, elfbuf, elfsz);
> +
> +		/* The crash image is now valid once again */
> +		xchg(&kexec_crash_image, image);
> +	}
> +out:
> +	vfree(elfbuf);
> +}
> +
>  /**
>   * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
>   *                                necessary kexec segments based on the hotplug event.
> @@ -554,12 +625,14 @@ int update_cpus_node(void *fdt)
>   * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
>   *
>   * Update FDT segment to include newly added CPU. No action for CPU remove case.
> + * Recreate the elfcorehdr for Memory add/remove case and replace it with old one.
>   */
>  void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>  {
>  	void *fdt, *ptr;
>  	unsigned long mem;
>  	int i, fdt_index = -1;
> +	struct memory_notify *mn;
>  	unsigned int hp_action = image->hp_action;
>  
>  	/*
> @@ -569,9 +642,9 @@ void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>  	if (hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>  		return;
>  
> -	/* crash update on memory hotplug events is not supported yet */
>  	if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
> -		pr_info_once("Crash update is not supported for memory hotplug\n");
> +		mn = (struct memory_notify *) arg;
> +		update_crash_elfcorehdr(image, mn);
>  		return;
>  	}
>  
> diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
> index 3554168687869..db7ba1c8c5e0b 100644
> --- a/arch/powerpc/kexec/file_load_64.c
> +++ b/arch/powerpc/kexec/file_load_64.c
> @@ -21,6 +21,8 @@
>  #include <linux/memblock.h>
>  #include <linux/slab.h>
>  #include <linux/vmalloc.h>
> +#include <linux/elf.h>
> +
>  #include <asm/setup.h>
>  #include <asm/drmem.h>
>  #include <asm/firmware.h>
> @@ -707,6 +709,30 @@ static void update_backup_region_phdr(struct kimage *image, Elf64_Ehdr *ehdr)
>  	}
>  }
>  
> +/* get_max_phdr - Find the total number of Phdr needed to represent the
> + *		  max memory in the kdump elfcorehdr.
> + *
> + * @cmem: crash memory ranges in the system.
> + */
> +static int get_max_phdr(struct crash_mem *cmem)
> +{
> +	int max_lmb;
> +
> +	/* In the worst case, a Phdr is needed for every other LMB to be represented
> +	 * as an individual crash range.
> +	 */
> +	max_lmb = memory_hotplug_max() / 2 * drmem_lmb_size();
> +
> +	/* Do not cross the Phdr max limit of the elf header.
> +	 * Avoid counting Phdr for crash ranges (cmem->nr_ranges) which
> +	 * are already part of elfcorehdr.
> +	 */
> +	if (max_lmb > PN_XNUM)
> +		return PN_XNUM - cmem->nr_ranges;
> +
> +	return max_lmb - cmem->nr_ranges;
> +}
> +
>  /**
>   * load_elfcorehdr_segment - Setup crash memory ranges and initialize elfcorehdr
>   *                           segment needed to load kdump kernel.
> @@ -738,7 +764,13 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
>  
>  	kbuf->buffer = headers;
>  	kbuf->mem = KEXEC_BUF_MEM_UNKNOWN;
> -	kbuf->bufsz = kbuf->memsz = headers_sz;
> +	kbuf->bufsz = headers_sz;
> +/* Additional buffer space to accommodate future memory ranges */
> +#if defined(CONFIG_MEMORY_HOTPLUG)
> +	kbuf->memsz = headers_sz + get_max_phdr(cmem) * sizeof(Elf64_Phdr);
> +#else
> +	kbuf->memsz = headers_sz;
> +#endif
>  	kbuf->top_down = false;
>  
>  	ret = kexec_add_buffer(kbuf);
> @@ -748,7 +780,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
>  	}
>  
>  	image->elf_load_addr = kbuf->mem;
> -	image->elf_headers_sz = headers_sz;
> +	image->elf_headers_sz = kbuf->memsz;
>  	image->elf_headers = headers;
>  out:
>  	kfree(cmem);
> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
> index 5fc53a5fcfdf6..d8007363cdc11 100644
> --- a/arch/powerpc/kexec/ranges.c
> +++ b/arch/powerpc/kexec/ranges.c
> @@ -234,6 +234,91 @@ int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
>  	return __add_mem_range(mem_ranges, base, size);
>  }
>  
> +/**
> + * remove_mem_range - Removes the given memory range from the range list.
> + * @mem_ranges:    Range list to remove the memory range to.
> + * @base:          Base address of the range to remove.
> + * @size:          Size of the memory range to remove.
> + *
> + * (Re)allocates memory, if needed.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size)
> +{
> +	u64 end;
> +	int ret = 0;
> +	unsigned int i;
> +	u64 mstart, mend;
> +	struct crash_mem *mem_rngs = *mem_ranges;
> +
> +	if (!size)
> +		return 0;
> +
> +	/*
> +	 * Memory range are stored as start and end address, use
> +	 * the same format to do remove operation.
> +	 */
> +	end = base + size - 1;
> +
> +	for (i = 0; i < mem_rngs->nr_ranges; i++) {
> +		mstart = mem_rngs->ranges[i].start;
> +		mend = mem_rngs->ranges[i].end;
> +
> +		/*
> +		 * Memory range to remove is not part of this range entry
> +		 * in the memory range list
> +		 */
> +		if (!(base >= mstart && end <= mend))
> +			continue;
> +
> +		/*
> +		 * Memory range to remove is equivalent to this entry in the
> +		 * memory range list. Remove the range entry from the list.
> +		 */
> +		if (base == mstart && end == mend) {
> +			for (; i < mem_rngs->nr_ranges - 1; i++) {
> +				mem_rngs->ranges[i].start = mem_rngs->ranges[i+1].start;
> +				mem_rngs->ranges[i].end = mem_rngs->ranges[i+1].end;
> +			}
> +			mem_rngs->nr_ranges--;
> +			goto out;
> +		}
> +		/*
> +		 * Start address of the memory range to remove and the
> +		 * current memory range entry in the list is same. Just
> +		 * move the start address of the current memory range
> +		 * entry in the list to end + 1.
> +		 */
> +		else if (base == mstart) {
> +			mem_rngs->ranges[i].start = end + 1;
> +			goto out;
> +		}
> +		/*
> +		 * End address of the memory range to remove and the
> +		 * current memory range entry in the list is same.
> +		 * Just move the end address of the current memory
> +		 * range entry in the list to base - 1.
> +		 */
> +		else if (end == mend)  {
> +			mem_rngs->ranges[i].end = base - 1;
> +			goto out;
> +		}
> +		/*
> +		 * Memory range to remove is not at the edge of current
> +		 * memory range entry. Split the current memory entry into
> +		 * two half.
> +		 */
> +		else {
> +			mem_rngs->ranges[i].end = base - 1;
> +			size = mem_rngs->ranges[i].end - end;
> +			ret = add_mem_range(mem_ranges, end + 1, size);
> +		}
> +	}
> +out:
> +	return ret;
> +}
> +
>  /**
>   * add_tce_mem_ranges - Adds tce-table range to the given memory ranges list.
>   * @mem_ranges:         Range list to add the memory range(s) to.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler
  2023-04-23 10:52 ` [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
  2023-04-24  9:59   ` Laurent Dufour
@ 2023-04-24 14:02   ` Eric DeVolder
  1 sibling, 0 replies; 14+ messages in thread
From: Eric DeVolder @ 2023-04-24 14:02 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, ldufour, hbathini



On 4/23/23 05:52, Sourabh Jain wrote:
> On PowePC memblock regions are used to prepare elfcorehdr which
s/PowePC/PowerPC/

> describes the memory regions of the running kernel to the kdump kernel.
> Since the notifier used for the memory hotplug crash handler gets
> initiated before the update of the memblock region happens (as depicted
> below) the newly prepared elfcorehdr still holds the old memory regions.
> If the elfcorehdr is prepared with stale memblock regions then the newly
> prepared elfcorehdr will still be holding stale memory regions. And dump
> collection with stale elfcorehdr will lead to dump collection failure or
> incomplete dump collection.
> 
> The sequence of actions done on PowerPC when an LMB memory hot removed:
> 
>   Initiate memory hot remove
>            |
>            v
>   offline pages
>            |
>            v
>   initiate memory notify call
>   chain for MEM_OFFLINE event  <---> Prepare new elfcorehdr and replace
>   				    it with old one
>            |
>            v
>   update memblock regions
> 
> Such challenges only exist for memory remove case. For the memory add
> case the memory regions are updated first and then memory notify calls
> the arch crash hotplug handler to update the elfcorehdr.
> 
> This patch passes additional information about the hot removed LMB to
> the arch crash hotplug handler in the form of memory_notify object.
> 
> How passing memory_notify to arch crash hotplug handler will help?
> 
> memory_notify holds the start PFN and page count of the hot removed
> memory. With that base address and the size of the hot removed memory
> can be calculated and same can be used to avoid adding hot removed
> memory region to get added in the elfcorehdr.
> 
> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
> ---
>   arch/powerpc/include/asm/kexec.h |  2 +-
>   arch/powerpc/kexec/core_64.c     |  3 ++-
>   arch/x86/include/asm/kexec.h     |  2 +-
>   arch/x86/kernel/crash.c          |  3 ++-
>   include/linux/kexec.h            |  2 +-
>   kernel/crash_core.c              | 14 +++++++-------
>   6 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
> index f01ba767af56e..7e811bad5ec92 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -104,7 +104,7 @@ struct crash_mem;
>   int update_cpus_node(void *fdt);
>   int get_crash_memory_ranges(struct crash_mem **mem_ranges);
>   #if defined(CONFIG_CRASH_HOTPLUG)
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   #endif
>   #endif
> diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
> index 611b89bcea2be..147ea6288a526 100644
> --- a/arch/powerpc/kexec/core_64.c
> +++ b/arch/powerpc/kexec/core_64.c
> @@ -551,10 +551,11 @@ int update_cpus_node(void *fdt)
>    * arch_crash_hotplug_handler() - Handle crash CPU/Memory hotplug events to update the
>    *                                necessary kexec segments based on the hotplug event.
s/arch/crash_hotplug_handler/arch_crash_handle_hotplug_event/

>    * @image: the active struct kimage
> + * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
>    *
>    * Update FDT segment to include newly added CPU. No action for CPU remove case.
>    */
> -void arch_crash_handle_hotplug_event(struct kimage *image)
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>   {
>   	void *fdt, *ptr;
>   	unsigned long mem;
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 1bc852ce347d4..70c3b23b468b6 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -213,7 +213,7 @@ extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>   extern void kdump_nmi_shootdown_cpus(void);
>   
>   #ifdef CONFIG_CRASH_HOTPLUG
> -void arch_crash_handle_hotplug_event(struct kimage *image);
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>   
>   #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index ead602636f3e0..b45d13193b579 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -445,11 +445,12 @@ int crash_load_segments(struct kimage *image)
>   /**
>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>    * @image: the active struct kimage
> + * @arg: struct memory_notify handler for memory add/remove case and NULL for CPU case.
>    *
>    * The new elfcorehdr is prepared in a kernel buffer, and then it is
>    * written on top of the existing/old elfcorehdr.
>    */
> -void arch_crash_handle_hotplug_event(struct kimage *image)
> +void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
>   {
>   	void *elfbuf = NULL, *old_elfcorehdr;
>   	unsigned long nr_mem_ranges;
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0ac41f48de0b1..69765e6a92d0d 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -506,7 +506,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>   #endif
>   
>   #ifndef arch_crash_handle_hotplug_event
> -static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
> +static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
>   #endif
>   
>   #ifndef crash_hotplug_cpu_support
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 4aa3c7a6b390f..7afe5f60d2bfe 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -718,7 +718,7 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>    * list of segments it checks (since the elfcorehdr changes and thus
>    * would require an update to purgatory itself to update the digest).
>    */
> -static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> +static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu, void *arg)
>   {
>   	struct kimage *image;
>   
> @@ -775,7 +775,7 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   	image->hp_action = hp_action;
>   
>   	/* Now invoke arch-specific update handler */
> -	arch_crash_handle_hotplug_event(image);
> +	arch_crash_handle_hotplug_event(image, arg);
>   
>   	/* No longer handling a hotplug event */
>   	image->hp_action = KEXEC_CRASH_HP_NONE;
> @@ -789,17 +789,17 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>   	kexec_unlock();
>   }
>   
> -static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *arg)
>   {
>   	switch (val) {
>   	case MEM_ONLINE:
>   		crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
> -			KEXEC_CRASH_HP_INVALID_CPU);
> +			KEXEC_CRASH_HP_INVALID_CPU, arg);
>   		break;
>   
>   	case MEM_OFFLINE:
>   		crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
> -			KEXEC_CRASH_HP_INVALID_CPU);
> +			KEXEC_CRASH_HP_INVALID_CPU, arg);
>   		break;
>   	}
>   	return NOTIFY_OK;
> @@ -812,13 +812,13 @@ static struct notifier_block crash_memhp_nb = {
>   
>   static int crash_cpuhp_online(unsigned int cpu)
>   {
> -	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
> +	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu, NULL);
>   	return 0;
>   }
>   
>   static int crash_cpuhp_offline(unsigned int cpu)
>   {
> -	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> +	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu, NULL);
>   	return 0;
>   }
>   

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel
  2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
                   ` (4 preceding siblings ...)
  2023-04-23 10:52 ` [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support Sourabh Jain
@ 2023-04-24 14:05 ` Eric DeVolder
  2023-04-24 15:13   ` Sourabh Jain
  5 siblings, 1 reply; 14+ messages in thread
From: Eric DeVolder @ 2023-04-24 14:05 UTC (permalink / raw)
  To: Sourabh Jain, linuxppc-dev, mpe; +Cc: mahesh, ldufour, kexec, bhe, hbathini



On 4/23/23 05:52, Sourabh Jain wrote:
> The Problem:
> ============
> Post CPU/Memory hot plug/unplug and online/offline events the  kernel
> holds stale information about the system. Dump collection with stale
> kdump kernel might end up in dump capture failure or an inaccurate dump
> collection.
> 
> Existing solution:
> ==================
> The existing solution to keep the kdump kernel up-to-date by monitoring
> CPU/Memory hotplug/online/offline events via udev rule and trigger a full
> kdump kernel reload for every hotplug event.
> 
> Shortcomings:
> ------------------------------------------------
> - Leaves a window where kernel crash might not lead to a successful dump
>    collection.
> - Reloading all kexec components for each hotplug is inefficient.
> - udev rules are prone to races if hotplug events are frequent.
> 
> More about issues with an existing solution is posted here:
>   - https://lkml.org/lkml/2020/12/14/532
>   - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
> 
> Proposed Solution:
> ==================
> Instead of reloading all kexec segments on CPU/Memory hotplug/online/offline
> event, this patch series focuses on updating only the relevant kexec segment.
> Once the kexec segments are loaded in the kernel reserved area then an
> arch-specific hotplug handler will update the relevant kexec segment based on
> hotplug event type.
> 
> Series Dependencies
> ====================
> This patch series implements the crash hotplug handler on PowerPC. The generic
> crash hotplug handler is introduced by https://lkml.org/lkml/2023/4/4/1136 patch
> series.
> 
> Git tree for testing:
> =====================
> The below git tree has this patch series applied on top of dependent patch
> series.
> https://github.com/sourabhjains/linux/tree/e21-s10
> 
> To realise the feature the kdump udev rule must updated to avoid
> reloading of kdump reload on CPU/Memory hotplug/online/offline events.
> 
>    RHEL: /usr/lib/udev/rules.d/98-kexec.rules
> 
> 	-SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
> 	-SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
> 	-SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
> 	+SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> 	+SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> 

I didn't see in the patch series where you would have the equivalent to the following (needed for 
the sysfs crash_hotplug entries):

#ifdef CONFIG_HOTPLUG_CPU
static inline int crash_hotplug_cpu_support(void) { return 1; }
#define crash_hotplug_cpu_support crash_hotplug_cpu_support
#endif

#ifdef CONFIG_MEMORY_HOTPLUG
static inline int crash_hotplug_memory_support(void) { return 1; }
#define crash_hotplug_memory_support crash_hotplug_memory_support
#endif

> Note: only kexec_file_load syscall will work. For kexec_load minor changes are
> required in kexec tool.
> 
> ---
> Changelog:
> 
> v10:
>    - Drop the patch that adds fdt_index attribute to struct kimage_arch
>      Find the fdt segment index when needed.
>    - Added more details into commits messages.
>    - Rebased onto 6.3.0-rc5
> 
> v9:
>    - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
>      The patch is moved to generic patch series that introduces generic
>      infrastructure for in kernel crash update.
>    - Removed patch to pass the hotplug action type to the arch crash
>      hotplug handler function. The generic patch series has introduced
>      the hotplug action type in kimage struct.
>    - Add detail commit message for better understanding.
> 
> v8:
>    - Restrict fdt_index initialization to machine_kexec_post_load
>      it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour
> 
>    - Updated the logic to find the number of offline core. [6/8]
> 
>    - Changed the logic to find the elfcore program header to accommodate
>      future memory ranges due memory hotplug events. [8/8]
> 
> v7
>    - added a new config to configure this feature
>    - pass hotplug action type to arch specific handler
> 
> v6
>    - Added crash memory hotplug support
> 
> v5:
>    - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
>    - Move fdt segment identification for kexec_load case to load path
>      instead of crash hotplug handler
>    - Keep new attribute defined under kimage_arch to track FDT segment
>      under CONFIG_HOTPLUG_CPU config.
> 
> v4:
>    - Update the logic to find the additional space needed for hotadd CPUs post
>      kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash hotplug
>      support for kexec_file_load" patch to know more about the change.
>    - Fix a couple of typo.
>    - Replace pr_err to pr_info_once to warn user about memory hotplug
>      support.
>    - In crash hotplug handle exit the for loop if FDT segment is found.
> 
> v3
>    - Move fdt_index and fdt_index_vaild variables to kimage_arch struct.
>    - Rebase patche on top of https://lkml.org/lkml/2022/3/3/674 [v5]
>    - Fixed warning reported by checpatch script
> 
> v2:
>    - Use generic hotplug handler introduced by https://lkml.org/lkml/2022/2/9/1406, a
>      significant change from v1.
> 
> Sourabh Jain (5):
>    powerpc/kexec: turn some static helper functions public
>    powerpc/crash: introduce a new config option CRASH_HOTPLUG
>    powerpc/crash: add crash CPU hotplug support
>    crash: forward memory_notify args to arch crash hotplug handler
>    powerpc/kexec: add crash memory hotplug support
> 
>   arch/powerpc/Kconfig                    |  12 +
>   arch/powerpc/include/asm/kexec.h        |  10 +
>   arch/powerpc/include/asm/kexec_ranges.h |   1 +
>   arch/powerpc/kexec/core_64.c            | 301 ++++++++++++++++++++++++
>   arch/powerpc/kexec/elf_64.c             |  12 +-
>   arch/powerpc/kexec/file_load_64.c       | 212 ++++-------------
>   arch/powerpc/kexec/ranges.c             |  85 +++++++
>   arch/x86/include/asm/kexec.h            |   2 +-
>   arch/x86/kernel/crash.c                 |   3 +-
>   include/linux/kexec.h                   |   2 +-
>   kernel/crash_core.c                     |  14 +-
>   11 files changed, 479 insertions(+), 175 deletions(-)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG
  2023-04-24  9:57   ` Laurent Dufour
@ 2023-04-24 15:00     ` Sourabh Jain
  0 siblings, 0 replies; 14+ messages in thread
From: Sourabh Jain @ 2023-04-24 15:00 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, mpe
  Cc: bhe, mahesh, kexec, Laurent Dufour, eric.devolder, hbathini


On 24/04/23 15:27, Laurent Dufour wrote:
> On 23/04/2023 12:52:10, Sourabh Jain wrote:
>> Due to CPU/Memory hot plug/unplug or online/offline events the system
>> resources changes. A similar change should reflect in the loaded kdump
>> kernel kexec segments that describes the state of the CPU and memory of
>> the running kernel.
>>
>> If the kdump kernel kexec segments are not updated after the CPU/Memory
>> hot plug/unplug or online/offline events and kdump kernel tries to
>> collect the dump with the stale system resource data then this might
>> lead to dump collection failure or an inaccurate dump collection.
>>
>> The current method to keep the kdump kernel kexec segments up to date is
>> by reloading the complete kdump kernel whenever a CPU/Memory hot
>> plug/unplug or online/offline event is observed in userspace. Reloading
>> the kdump kernel for every CPU/Memory hot plug/unplug or online/offline
>> event is inefficient and creates a large window where the kdump service
>> is not available. It can be improved by doing in-kernel updates to only
>> necessary kdump kernel kexec segments which describe CPU and Memory
>> resources of the running kernel to the kdump kernel.
>>
>> The kernel changes related to in-kernel updates to the kdump kernel
>> kexec segments are kept under the CRASH_HOTPLUG config option.
>>
>> Later in the series, a powerpc crash hotplug handler is introduced to
>> update the kdump kernel kexec segments on CPU/Memory hotplug events.
>> This arch-specific handler is triggered from a generic crash handler
>> that registers with the CPU/Memory add/remove notifiers.
>>
>> The CRASH_HOTPLUG config option is enabled by default.
>>
>> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
>> Reviewed-by: Laurent Dufour <laurent.dufour@fr.ibm.com>
> I can't remember having sent a review-by on that patch earlier.
>
> Anyway, I can't find any issue with that one, so replace with:
> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>

I apologies for mistakenly applying the "Reviewed-by" tag to the entire
patch series, when it was intended for only one patch. I will remove the
tag in next version.

- Sourabh Jain


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel
  2023-04-24 14:05 ` [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Eric DeVolder
@ 2023-04-24 15:13   ` Sourabh Jain
  0 siblings, 0 replies; 14+ messages in thread
From: Sourabh Jain @ 2023-04-24 15:13 UTC (permalink / raw)
  To: Eric DeVolder, linuxppc-dev, mpe; +Cc: mahesh, ldufour, kexec, hbathini, bhe


On 24/04/23 19:35, Eric DeVolder wrote:
>
>
> On 4/23/23 05:52, Sourabh Jain wrote:
>> The Problem:
>> ============
>> Post CPU/Memory hot plug/unplug and online/offline events the kernel
>> holds stale information about the system. Dump collection with stale
>> kdump kernel might end up in dump capture failure or an inaccurate dump
>> collection.
>>
>> Existing solution:
>> ==================
>> The existing solution to keep the kdump kernel up-to-date by monitoring
>> CPU/Memory hotplug/online/offline events via udev rule and trigger a 
>> full
>> kdump kernel reload for every hotplug event.
>>
>> Shortcomings:
>> ------------------------------------------------
>> - Leaves a window where kernel crash might not lead to a successful dump
>>    collection.
>> - Reloading all kexec components for each hotplug is inefficient.
>> - udev rules are prone to races if hotplug events are frequent.
>>
>> More about issues with an existing solution is posted here:
>>   - https://lkml.org/lkml/2020/12/14/532
>>   - 
>> https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html
>>
>> Proposed Solution:
>> ==================
>> Instead of reloading all kexec segments on CPU/Memory 
>> hotplug/online/offline
>> event, this patch series focuses on updating only the relevant kexec 
>> segment.
>> Once the kexec segments are loaded in the kernel reserved area then an
>> arch-specific hotplug handler will update the relevant kexec segment 
>> based on
>> hotplug event type.
>>
>> Series Dependencies
>> ====================
>> This patch series implements the crash hotplug handler on PowerPC. 
>> The generic
>> crash hotplug handler is introduced by 
>> https://lkml.org/lkml/2023/4/4/1136 patch
>> series.
>>
>> Git tree for testing:
>> =====================
>> The below git tree has this patch series applied on top of dependent 
>> patch
>> series.
>> https://github.com/sourabhjains/linux/tree/e21-s10
>>
>> To realise the feature the kdump udev rule must updated to avoid
>> reloading of kdump reload on CPU/Memory hotplug/online/offline events.
>>
>>    RHEL: /usr/lib/udev/rules.d/98-kexec.rules
>>
>>     -SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
>>     -SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
>>     -SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"
>>     +SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", 
>> GOTO="kdump_reload_end"
>>     +SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", 
>> GOTO="kdump_reload_end"
>>
>
> I didn't see in the patch series where you would have the equivalent 
> to the following (needed for the sysfs crash_hotplug entries):
>
> #ifdef CONFIG_HOTPLUG_CPU
> static inline int crash_hotplug_cpu_support(void) { return 1; }
> #define crash_hotplug_cpu_support crash_hotplug_cpu_support
> #endif
>
> #ifdef CONFIG_MEMORY_HOTPLUG
> static inline int crash_hotplug_memory_support(void) { return 1; }
> #define crash_hotplug_memory_support crash_hotplug_memory_support
> #endif

I missed the above diff in my testing environment. Thanks you for 
bringing it
to my attention. I will fix this next version.

- Sourabh Jain

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-04-24 15:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-23 10:52 [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Sourabh Jain
2023-04-23 10:52 ` [PATCH v10 1/5] powerpc/kexec: turn some static helper functions public Sourabh Jain
2023-04-23 10:52 ` [PATCH v10 2/5] powerpc/crash: introduce a new config option CRASH_HOTPLUG Sourabh Jain
2023-04-24  9:57   ` Laurent Dufour
2023-04-24 15:00     ` Sourabh Jain
2023-04-23 10:52 ` [PATCH v10 3/5] powerpc/crash: add crash CPU hotplug support Sourabh Jain
2023-04-24  9:59   ` Laurent Dufour
2023-04-23 10:52 ` [PATCH v10 4/5] crash: forward memory_notify args to arch crash hotplug handler Sourabh Jain
2023-04-24  9:59   ` Laurent Dufour
2023-04-24 14:02   ` Eric DeVolder
2023-04-23 10:52 ` [PATCH v10 5/5] powerpc/kexec: add crash memory hotplug support Sourabh Jain
2023-04-24 10:00   ` Laurent Dufour
2023-04-24 14:05 ` [PATCH v10 0/5] PowerPC: In-kernel handling of CPU/Memory hotplug/online/offline events for kdump kernel Eric DeVolder
2023-04-24 15:13   ` Sourabh Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).