* [PATCH 1/8] pseries: phyp dump: Documentation
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
@ 2008-03-21 23:33 ` Manish Ahuja
2008-03-21 23:37 ` [PATCH 2/8] pseries: phyp dump: reserve-release Manish Ahuja
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:33 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Basic documentation for hypervisor-assisted dump.
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
---
Documentation/powerpc/phyp-assisted-dump.txt | 127 +++++++++++++++++++++++++++
1 file changed, 127 insertions(+)
Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-18 03:22:33.000000000 -0600
@@ -0,0 +1,127 @@
+
+ Hypervisor-Assisted Dump
+ ------------------------
+ November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+ immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+ required; the system will be fully usable, and running
+ in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+ the low 256MB of RAM to a previously registered
+ save region. It will also save system state, system
+ registers, and hardware PTEs.
+
+-- After the low 256MB area has been saved, the
+ hypervisor will reset PCI and other hardware state.
+ It will *not* clear RAM. It will then launch the
+ bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+ is a new node (ibm,dump-kernel) in the device tree,
+ indicating that there is crash data available from
+ a previous boot. It will boot into only 256MB of RAM,
+ reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+ and read /proc/vmcore to obtain the contents of memory,
+ which holds the previously crashed kernel. The userspace
+ tools may copy this info to disk, or network, NAS, SAN,
+ iSCSI, etc. as desired.
+
+ For example: the values in /sys/kernel/release_region
+ would look something like this (address-range pairs).
+ CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+ DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+ dump, they echo an offset and size to
+ /sys/kernel/release_region to release the reserved
+ memory back to general use.
+
+ An example of this is:
+ "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once userland scripts have
+collected the dump. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of RAM is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to
+a new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+------------
+ o The various code paths that tell the hypervisor that a crash
+ occurred, vs. it simply being a normal reboot, should be
+ reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+ instead? There is a dump_subsys being created by the s390 code,
+ perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+ reserving a 256MB scratch area is to make sure that no
+ important crash data is clobbered when the hypervisor
+ saves low mem to the scratch area. But, if one could assure
+ that nothing important is located in some 256MB area, then
+ it would not need to be reserved. Something that can be
+ improved in subsequent versions.
+
+ o Still working with the kdump team to integrate this with kdump,
+ some work remains but this would not affect the current
+ patches.
+
+ o Still need to write a shell script, to copy the dump away.
+ Currently I am parsing it manually.
^ permalink raw reply [flat|nested] 13+ messages in thread
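The release step described in the documentation above can be sketched from the userspace side. This is only an illustration, not the copy script the open-issues list says is still to be written: the helper names `format_release_request` and `release_region` are hypothetical, and the only things taken from the patch are the sysfs path and the "<start addr> <length>" hex format.

```c
#include <assert.h>
#include <stdio.h>

/* Path taken from the documentation in this patch. */
#define RELEASE_REGION_PATH "/sys/kernel/release_region"

/*
 * Hypothetical helper: format a release request in the
 * "<start addr> <length>" hex form shown in the documentation.
 * Returns the number of characters written, or -1 if the
 * buffer is too small.
 */
static int format_release_request(char *buf, size_t len,
				  unsigned long long start,
				  unsigned long long length)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "0x%llx 0x%llx\n", start, length);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Hypothetical helper: write one request to the given file.
 * Returns 0 on success, -1 on any error.
 */
static int release_region(const char *path,
			  unsigned long long start,
			  unsigned long long length)
{
	char req[64];
	FILE *f;
	int rc = 0;

	if (format_release_request(req, sizeof(req), start, length) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(req, f) == EOF)
		rc = -1;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

Calling `release_region(RELEASE_REGION_PATH, 0x40000000ULL, 0x10000000ULL)` would correspond to the echo example in the text, releasing 256MB at the 1GB boundary once that portion of the dump has been saved.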
* [PATCH 2/8] pseries: phyp dump: reserve-release
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
2008-03-21 23:33 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
@ 2008-03-21 23:37 ` Manish Ahuja
2008-03-21 23:39 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:37 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
---
arch/powerpc/kernel/prom.c | 52 ++++++++++++++
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/phyp_dump.c | 103 +++++++++++++++++++++++++++++
include/asm-powerpc/phyp_dump.h | 41 +++++++++++
4 files changed, 197 insertions(+)
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h 2008-03-21 23:37:11.000000000 -0500
@@ -0,0 +1,41 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END (1UL<<28)
+
+struct phyp_dump {
+ /* Memory that is reserved during very early boot. */
+ unsigned long init_reserve_start;
+ unsigned long init_reserve_size;
+ /* Check status during boot if dump supported, active & present*/
+ unsigned long phyp_dump_configured;
+ unsigned long phyp_dump_is_active;
+ /* store cpu & hpte size */
+ unsigned long cpu_state_size;
+ unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+ const char *uname, int depth, void *data);
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 23:37:12.000000000 -0500
@@ -0,0 +1,103 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Variables, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_vars;
+struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+ struct page *rpage;
+ unsigned long end_pfn;
+ long i;
+
+ end_pfn = start_pfn + nr_pages;
+
+ for (i = start_pfn; i < end_pfn; i++) {
+ rpage = pfn_to_page(i);
+ if (PageReserved(rpage)) {
+ ClearPageReserved(rpage);
+ init_page_count(rpage);
+ __free_page(rpage);
+ totalram_pages++;
+ }
+ }
+}
+
+static int __init phyp_dump_setup(void)
+{
+ unsigned long start_pfn, nr_pages;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Release memory that was reserved in early boot */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+ release_memory_range(start_pfn, nr_pages);
+
+ return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+ const char *uname, int depth, void *data)
+{
+ const unsigned int *sizes;
+
+ phyp_dump_info->phyp_dump_configured = 0;
+ phyp_dump_info->phyp_dump_is_active = 0;
+
+ if (depth != 1 || strcmp(uname, "rtas") != 0)
+ return 0;
+
+ if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL))
+ phyp_dump_info->phyp_dump_configured++;
+
+ if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+ phyp_dump_info->phyp_dump_is_active++;
+
+ sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+ NULL);
+ if (!sizes)
+ return 0;
+
+ if (sizes[0] == 1)
+ phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]);
+
+ if (sizes[3] == 2)
+ phyp_dump_info->hpte_region_size =
+ *((unsigned long *)&sizes[4]);
+ return 1;
+}
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/Makefile 2008-03-21 00:01:26.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile 2008-03-21 00:02:15.000000000 -0500
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu
obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c 2008-03-21 00:01:26.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c 2008-03-21 22:59:37.000000000 -0500
@@ -51,6 +51,7 @@
#include <asm/machdep.h>
#include <asm/pSeries_reconfig.h>
#include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
#include <asm/kexec.h>
#ifdef DEBUG
@@ -1039,6 +1040,51 @@ static void __init early_reserve_mem(voi
#endif
}
+#ifdef CONFIG_PHYP_DUMP
+/**
+ * phyp_dump_reserve_mem() - reserve all not-yet-dumped memory
+ *
+ * This routine may reserve memory regions in the kernel only
+ * if the system is supported and a dump was taken in last
+ * boot instance or if the hardware is supported and the
+ * scratch area needs to be setup. In other instances it returns
+ * without reserving anything. The memory in case of dump being
+ * active is freed when the dump is collected (by userland tools).
+ */
+static void __init phyp_dump_reserve_mem(void)
+{
+ unsigned long base, size;
+ if (!phyp_dump_info->phyp_dump_configured) {
+ printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
+ return;
+ }
+
+ if (phyp_dump_info->phyp_dump_is_active) {
+ /* Reserve *everything* above RMR. Area freed by userland tools */
+ base = PHYP_DUMP_RMR_END;
+ size = lmb_end_of_DRAM() - base;
+
+ /* XXX crashed_ram_end is wrong, since it may be beyond
+ * the memory_limit, it will need to be adjusted. */
+ lmb_reserve(base, size);
+
+ phyp_dump_info->init_reserve_start = base;
+ phyp_dump_info->init_reserve_size = size;
+ } else {
+ size = phyp_dump_info->cpu_state_size +
+ phyp_dump_info->hpte_region_size +
+ PHYP_DUMP_RMR_END;
+ base = lmb_end_of_DRAM() - size;
+ lmb_reserve(base, size);
+ phyp_dump_info->init_reserve_start = base;
+ phyp_dump_info->init_reserve_size = size;
+ }
+}
+#else
+static inline void __init phyp_dump_reserve_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
void __init early_init_devtree(void *params)
{
DBG(" -> early_init_devtree(%p)\n", params);
@@ -1051,6 +1097,11 @@ void __init early_init_devtree(void *par
of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
#endif
+#ifdef CONFIG_PHYP_DUMP
+ /* scan tree to see if dump occurred during last boot */
+ of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
+#endif
+
/* Retrieve various informations from the /chosen node of the
* device-tree, including the platform type, initrd location and
* size, TCE reserve, and more ...
@@ -1071,6 +1122,7 @@ void __init early_init_devtree(void *par
reserve_kdump_trampoline();
reserve_crashkernel();
early_reserve_mem();
+ phyp_dump_reserve_mem();
lmb_enforce_memory_limit(memory_limit);
lmb_analyze();
^ permalink raw reply [flat|nested] 13+ messages in thread
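The two branches of phyp_dump_reserve_mem() in the patch above can be modelled as plain arithmetic. The following is a sketch for illustration only: `pick_reserve_range`, `struct reserve_range` and `RMR_END` are hypothetical names, and `dram_end` stands in for the value the kernel obtains from lmb_end_of_DRAM().

```c
#include <assert.h>

/* 256MB, mirroring PHYP_DUMP_RMR_END in the patch. */
#define RMR_END (1ULL << 28)

struct reserve_range {
	unsigned long long base;
	unsigned long long size;
};

/*
 * Model of the reservation choice:
 *  - dump active: hold everything above the low 256MB
 *  - no dump:     keep a scratch area at the top of RAM, large
 *                 enough for CPU state, HPTE data and a copy of
 *                 the low 256MB
 */
static struct reserve_range
pick_reserve_range(int dump_is_active, unsigned long long dram_end,
		   unsigned long long cpu_state_size,
		   unsigned long long hpte_region_size)
{
	struct reserve_range r;

	if (dump_is_active) {
		assert(dram_end >= RMR_END);
		r.base = RMR_END;
		r.size = dram_end - r.base;
	} else {
		r.size = cpu_state_size + hpte_region_size + RMR_END;
		assert(dram_end >= r.size);
		r.base = dram_end - r.size;
	}
	return r;
}
```

With 4GB of RAM, for example, the active case reserves 0x10000000 up to the end of memory, while the idle case reserves only the scratch area just below the 4GB mark.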
* [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
2008-03-21 23:33 ` [PATCH 1/8] pseries: phyp dump: Documentation Manish Ahuja
2008-03-21 23:37 ` [PATCH 2/8] pseries: phyp dump: reserve-release Manish Ahuja
@ 2008-03-21 23:39 ` Manish Ahuja
2008-03-21 23:43 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
` (4 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:39 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,
echo "0x40000000 0x10000000" > /sys/kernel/release_region
will release 256MB starting at 1GB. The released memory
becomes free for general use.
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
---
arch/powerpc/platforms/pseries/phyp_dump.c | 81 +++++++++++++++++++++++++++--
1 file changed, 76 insertions(+), 5 deletions(-)
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 00:10:15.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:39:21.000000000 -0500
@@ -12,19 +12,24 @@
*/
#include <linux/init.h>
+#include <linux/kobject.h>
#include <linux/mm.h>
+#include <linux/of.h>
#include <linux/pfn.h>
#include <linux/swap.h>
+#include <linux/sysfs.h>
#include <asm/page.h>
#include <asm/phyp_dump.h>
#include <asm/machdep.h>
#include <asm/prom.h>
+#include <asm/rtas.h>
/* Variables, used to communicate data between early boot and late boot */
static struct phyp_dump phyp_dump_vars;
struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -54,18 +59,84 @@ release_memory_range(unsigned long start
}
}
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ * "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ * "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
{
+ unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+ ssize_t ret;
+
+ ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+ if (ret != 2)
+ return -EINVAL;
+
+ /* Range-check - don't free any reserved memory that
+ * wasn't reserved for phyp-dump */
+ if (start_addr < phyp_dump_info->init_reserve_start)
+ start_addr = phyp_dump_info->init_reserve_start;
+
+ end_addr = phyp_dump_info->init_reserve_start +
+ phyp_dump_info->init_reserve_size;
+ if (start_addr+length > end_addr)
+ length = end_addr - start_addr;
+
+ /* Release the region of memory passed in by user */
+ start_pfn = PFN_DOWN(start_addr);
+ nr_pages = PFN_DOWN(length);
+ release_memory_range(start_pfn, nr_pages);
+
+ return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+ NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+ struct device_node *rtas;
+ const int *dump_header = NULL;
+ int header_len = 0;
+ int rc;
/* If no memory was reserved in early boot, there is nothing to do */
if (phyp_dump_info->init_reserve_size == 0)
return 0;
- /* Release memory that was reserved in early boot */
- start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
- nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
- release_memory_range(start_pfn, nr_pages);
+ /* Return if phyp dump not supported */
+ if (!phyp_dump_info->phyp_dump_configured)
+ return -ENOSYS;
+
+ /* Is there dump data waiting for us? */
+ rtas = of_find_node_by_path("/rtas");
+ if (rtas) {
+ dump_header = of_get_property(rtas, "ibm,kernel-dump",
+ &header_len);
+ of_node_put(rtas);
+ }
+
+ if (dump_header == NULL)
+ return 0;
+
+ /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+ rc = sysfs_create_file(kernel_kobj, &rr.attr);
+ if (rc) {
+ printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+ rc);
+ return 0;
+ }
return 0;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
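The range check in store_release_region() above can be tried out in userspace as a pure function. This is a sketch under stated assumptions, not the patch's code: `clamp_release_request` and `struct pfn_range` are hypothetical names, the page shift is passed in as a parameter rather than taken from the kernel, and the early return for a start address beyond the reservation is an extra guard that the patch does not have.

```c
#include <assert.h>

struct pfn_range {
	unsigned long long start_pfn;
	unsigned long long nr_pages;
};

/*
 * Clamp a user request [start, start + length) to the reserved
 * area [res_start, res_start + res_size) and convert it to page
 * frame numbers by rounding down, as PFN_DOWN() does.
 */
static struct pfn_range
clamp_release_request(unsigned long long start, unsigned long long length,
		      unsigned long long res_start,
		      unsigned long long res_size,
		      unsigned int page_shift)
{
	unsigned long long res_end = res_start + res_size;
	struct pfn_range r = { 0, 0 };

	assert(page_shift < 64);

	/* As in the patch: never start below the reservation. */
	if (start < res_start)
		start = res_start;

	/* Extra guard, not in the patch: nothing to free past the end. */
	if (start >= res_end)
		return r;

	/* As in the patch, written so the comparison cannot overflow. */
	if (length > res_end - start)
		length = res_end - start;

	r.start_pfn = start >> page_shift;
	r.nr_pages = length >> page_shift;
	return r;
}
```

A result of `nr_pages` pages covers PFNs `start_pfn` through `start_pfn + nr_pages - 1`, which is the range a caller would then hand to a routine like release_memory_range().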
* [PATCH 4/8] pseries: phyp dump: register dump area.
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
` (2 preceding siblings ...)
2008-03-21 23:39 ` [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem Manish Ahuja
@ 2008-03-21 23:43 ` Manish Ahuja
2008-03-21 23:44 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
` (3 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:43 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Set up the actual dump header, register it with the hypervisor.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
---
arch/powerpc/platforms/pseries/phyp_dump.c | 137 +++++++++++++++++++++++++++--
1 file changed, 131 insertions(+), 6 deletions(-)
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:39:21.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:52:53.000000000 -0500
@@ -29,6 +29,117 @@
static struct phyp_dump phyp_dump_vars;
struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+ u32 dump_flags;
+ u16 source_type;
+ u16 error_flags;
+ u64 source_address;
+ u64 source_length;
+ u64 length_copied;
+ u64 destination_address;
+};
+
+struct phyp_dump_header {
+ u32 version;
+ u16 num_of_sections;
+ u16 status;
+
+ u32 first_offset_section;
+ u32 dump_disk_section;
+ u64 block_num_dd;
+ u64 num_of_blocks_dd;
+ u32 offset_dd;
+ u32 maxtime_to_auto;
+ /* No dump disk path string used */
+
+ struct dump_section cpu_data;
+ struct dump_section hpte_data;
+ struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+ unsigned long addr_offset = 0;
+
+ /* Set up the dump header */
+ ph->version = DUMP_HEADER_VERSION;
+ ph->num_of_sections = NUM_DUMP_SECTIONS;
+ ph->status = 0;
+
+ ph->first_offset_section =
+ (u32)offsetof(struct phyp_dump_header, cpu_data);
+ ph->dump_disk_section = 0;
+ ph->block_num_dd = 0;
+ ph->num_of_blocks_dd = 0;
+ ph->offset_dd = 0;
+
+ ph->maxtime_to_auto = 0; /* disabled */
+
+ /* The first two sections are mandatory */
+ ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+ ph->cpu_data.source_address = 0;
+ ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+ ph->cpu_data.destination_address = addr_offset;
+ addr_offset += phyp_dump_info->cpu_state_size;
+
+ ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+ ph->hpte_data.source_address = 0;
+ ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+ ph->hpte_data.destination_address = addr_offset;
+ addr_offset += phyp_dump_info->hpte_region_size;
+
+ /* This section describes the low kernel region */
+ ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+ ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+ ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+ ph->kernel_data.destination_address = addr_offset;
+ addr_offset += ph->kernel_data.source_length;
+
+ return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+ int rc;
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+
+ do {
+ rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+ 1, ph, sizeof(struct phyp_dump_header));
+ } while (rtas_busy_delay(rc));
+
+ if (rc)
+ printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+ "register\n", rc);
+}
+
/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
@@ -107,7 +218,9 @@ static struct kobj_attribute rr = __ATTR
static int __init phyp_dump_setup(void)
{
struct device_node *rtas;
- const int *dump_header = NULL;
+ const struct phyp_dump_header *dump_header = NULL;
+ unsigned long dump_area_start;
+ unsigned long dump_area_length;
int header_len = 0;
int rc;
@@ -119,7 +232,13 @@ static int __init phyp_dump_setup(void)
if (!phyp_dump_info->phyp_dump_configured)
return -ENOSYS;
- /* Is there dump data waiting for us? */
+ /* Is there dump data waiting for us? If there isn't,
+ * then register a new dump area, and release all of
+ * the rest of the reserved ram.
+ *
+ * The /rtas/ibm,kernel-dump rtas node is present only
+ * if there is dump data waiting for us.
+ */
rtas = of_find_node_by_path("/rtas");
if (rtas) {
dump_header = of_get_property(rtas, "ibm,kernel-dump",
@@ -127,17 +246,23 @@ static int __init phyp_dump_setup(void)
of_node_put(rtas);
}
- if (dump_header == NULL)
+ dump_area_length = init_dump_header(&phdr);
+
+ /* align down */
+ dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
+
+ if (dump_header == NULL) {
+ register_dump_area(&phdr, dump_area_start);
return 0;
+ }
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
rc = sysfs_create_file(kernel_kobj, &rr.attr);
- if (rc) {
+ if (rc)
printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
rc);
- return 0;
- }
+ /* ToDo: re-register the dump area, for next time. */
return 0;
}
machine_subsys_initcall(pseries, phyp_dump_setup);
^ permalink raw reply [flat|nested] 13+ messages in thread
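The way init_dump_header() above lays out the save area can be checked with a userspace mirror of the two structures. This is a sketch for illustration: the kernel types are replaced with their <stdint.h> equivalents, `layout_save_area` and `RMR_SIZE` are hypothetical names, and the sizes it reports are properties of this mirror on common ABIs, not a statement about what the firmware interface requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the structures in the patch. */
struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

struct phyp_dump_header {
	uint32_t version;
	uint16_t num_of_sections;
	uint16_t status;
	uint32_t first_offset_section;
	uint32_t dump_disk_section;
	uint64_t block_num_dd;
	uint64_t num_of_blocks_dd;
	uint32_t offset_dd;
	uint32_t maxtime_to_auto;
	struct dump_section cpu_data;
	struct dump_section hpte_data;
	struct dump_section kernel_data;
};

/* 256MB, the length of the low-memory section in the patch. */
#define RMR_SIZE (1ULL << 28)

/*
 * Assign section-relative destination offsets back to back
 * (CPU state, then HPTE data, then low memory) and return the
 * total length of the save area.
 */
static uint64_t layout_save_area(struct phyp_dump_header *ph,
				 uint64_t cpu_size, uint64_t hpte_size)
{
	uint64_t off = 0;

	assert(ph != NULL);

	ph->cpu_data.source_length = cpu_size;
	ph->cpu_data.destination_address = off;
	off += cpu_size;

	ph->hpte_data.source_length = hpte_size;
	ph->hpte_data.destination_address = off;
	off += hpte_size;

	ph->kernel_data.source_address = 0;
	ph->kernel_data.source_length = RMR_SIZE;
	ph->kernel_data.destination_address = off;
	off += RMR_SIZE;

	return off;
}
```

The returned length matches the size of the scratch area that the early-boot code reserves when no dump is waiting: CPU state size plus HPTE region size plus 256MB.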
* [PATCH 5/8] pseries: phyp dump: debugging print routines.
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
` (3 preceding siblings ...)
2008-03-21 23:43 ` [PATCH 4/8] pseries: phyp dump: register dump area Manish Ahuja
@ 2008-03-21 23:44 ` Manish Ahuja
2008-03-21 23:45 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:44 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Provide some basic debugging support.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
---
arch/powerpc/platforms/pseries/phyp_dump.c | 61 ++++++++++++++++++++++++++++-
1 file changed, 59 insertions(+), 2 deletions(-)
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:52:53.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:54:44.000000000 -0500
@@ -123,6 +123,61 @@ static unsigned long init_dump_header(st
return addr_offset;
}
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+ printk(KERN_INFO "dump header:\n");
+ /* setup some ph->sections required */
+ printk(KERN_INFO "version = %d\n", ph->version);
+ printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+ printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+ /* No ph->disk, so all should be set to 0 */
+ printk(KERN_INFO "Offset to first section 0x%x\n",
+ ph->first_offset_section);
+ printk(KERN_INFO "dump disk sections should be zero\n");
+ printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+ printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+ printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+ printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+ printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+ /* set cpu state and hpte states as well as scratch pad area */
+ printk(KERN_INFO " CPU AREA \n");
+ printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+ printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+ printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+ printk(KERN_INFO "cpu source_address =%lx\n",
+ ph->cpu_data.source_address);
+ printk(KERN_INFO "cpu source_length =%lx\n",
+ ph->cpu_data.source_length);
+ printk(KERN_INFO "cpu length_copied =%lx\n",
+ ph->cpu_data.length_copied);
+
+ printk(KERN_INFO " HPTE AREA \n");
+ printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+ printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+ printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+ printk(KERN_INFO "HPTE source_address =%lx\n",
+ ph->hpte_data.source_address);
+ printk(KERN_INFO "HPTE source_length =%lx\n",
+ ph->hpte_data.source_length);
+ printk(KERN_INFO "HPTE length_copied =%lx\n",
+ ph->hpte_data.length_copied);
+
+ printk(KERN_INFO " SRSD AREA \n");
+ printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+ printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+ printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+ printk(KERN_INFO "SRSD source_address =%lx\n",
+ ph->kernel_data.source_address);
+ printk(KERN_INFO "SRSD source_length =%lx\n",
+ ph->kernel_data.source_length);
+ printk(KERN_INFO "SRSD length_copied =%lx\n",
+ ph->kernel_data.length_copied);
+#endif
+}
+
static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
{
int rc;
@@ -135,9 +190,11 @@ static void register_dump_area(struct ph
1, ph, sizeof(struct phyp_dump_header));
} while (rtas_busy_delay(rc));
- if (rc)
+ if (rc) {
printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
"register\n", rc);
+ print_dump_header(ph);
+ }
}
/* ------------------------------------------------- */
@@ -246,8 +303,8 @@ static int __init phyp_dump_setup(void)
of_node_put(rtas);
}
+ print_dump_header(dump_header);
dump_area_length = init_dump_header(&phdr);
-
/* align down */
dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
^ permalink raw reply [flat|nested] 13+ messages in thread
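One detail in the patch above may be worth a second look: phyp_dump_setup() passes dump_header to print_dump_header(), and dump_header stays NULL when no dump is waiting, so a DEBUG build would dereference it. A NULL-tolerant formatter could look like the sketch below. It is an illustration only: `struct section_info` and `format_section` are hypothetical names, and the output format is made up for the example rather than copied from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Subset of the per-section fields printed by the patch. */
struct section_info {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
};

/*
 * Format one section into buf. Returns the snprintf() result,
 * or 0 (with an empty string in buf) when there is no section
 * to print.
 */
static int format_section(char *buf, size_t len, const char *name,
			  const struct section_info *sec)
{
	assert(buf != NULL && name != NULL);

	if (sec == NULL) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}

	return snprintf(buf, len,
			"%s flags=0x%x type=0x%x err=0x%x "
			"src=0x%llx len=0x%llx copied=0x%llx",
			name,
			(unsigned int)sec->dump_flags,
			(unsigned int)sec->source_type,
			(unsigned int)sec->error_flags,
			(unsigned long long)sec->source_address,
			(unsigned long long)sec->source_length,
			(unsigned long long)sec->length_copied);
}
```

The same early return on a NULL pointer, placed at the top of a print routine, keeps a debug build from faulting on the path where no dump header exists.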
* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
` (4 preceding siblings ...)
2008-03-21 23:44 ` [PATCH 5/8] pseries: phyp dump: debugging print routines Manish Ahuja
@ 2008-03-21 23:45 ` Manish Ahuja
2008-03-21 23:47 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
2008-03-21 23:50 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:45 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Routines to
a. invalidate dump
b. Calculate region that is reserved and needs to be freed. This is
exported through sysfs interface.
Unregister has been removed for now as it wasn't being used.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
---
arch/powerpc/platforms/pseries/phyp_dump.c | 83 ++++++++++++++++++++++++++---
include/asm-powerpc/phyp_dump.h | 3 +
2 files changed, 80 insertions(+), 6 deletions(-)
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-20 21:52:59.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-20 21:55:52.000000000 -0500
@@ -70,6 +70,10 @@ static struct phyp_dump_header phdr;
#define DUMP_SOURCE_CPU 0x0001
#define DUMP_SOURCE_HPTE 0x0002
#define DUMP_SOURCE_RMO 0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
/**
* init_dump_header() - initialize the header declaring a dump
@@ -181,9 +185,15 @@ static void print_dump_header(const stru
static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
{
int rc;
- ph->cpu_data.destination_address += addr;
- ph->hpte_data.destination_address += addr;
- ph->kernel_data.destination_address += addr;
+
+ /* Add addr value if not initialized before */
+ if (ph->cpu_data.destination_address == 0) {
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+ }
+
+ /* ToDo Invalidate kdump and free memory range. */
do {
rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -197,6 +207,30 @@ static void register_dump_area(struct ph
}
}
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+ int rc;
+
+ /* Add addr value if not initialized before */
+ if (ph->cpu_data.destination_address == 0) {
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+ }
+
+ do {
+ rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+ 2, ph, sizeof(struct phyp_dump_header));
+ } while (rtas_busy_delay(rc));
+
+ if (rc) {
+ printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+ "on invalidate\n", rc);
+ print_dump_header(ph);
+ }
+}
+
/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
@@ -207,8 +241,8 @@ static void register_dump_area(struct ph
* lmb_reserved in early boot. The released memory becomes
* available for genreal use.
*/
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static void release_memory_range(unsigned long start_pfn,
+ unsigned long nr_pages)
{
struct page *rpage;
unsigned long end_pfn;
@@ -269,8 +303,29 @@ static ssize_t store_release_region(stru
return count;
}
+static ssize_t show_release_region(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ u64 second_addr_range;
+
+ /* total reserved size minus scratch area size */
+ second_addr_range = phyp_dump_info->init_reserve_size -
+ phyp_dump_info->reserved_scratch_size;
+ return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+ " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+ phdr.cpu_data.destination_address,
+ phdr.cpu_data.length_copied,
+ phdr.hpte_data.destination_address,
+ phdr.hpte_data.length_copied,
+ phdr.kernel_data.destination_address,
+ phdr.kernel_data.length_copied,
+ phyp_dump_info->init_reserve_start,
+ second_addr_range);
+}
+
static struct kobj_attribute rr = __ATTR(release_region, 0600,
- NULL, store_release_region);
+ show_release_region,
+ store_release_region);
static int __init phyp_dump_setup(void)
{
@@ -313,6 +368,22 @@ static int __init phyp_dump_setup(void)
return 0;
}
+ /* re-register the dump area, if old dump was invalid */
+ if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+ invalidate_last_dump(&phdr, dump_area_start);
+ register_dump_area(&phdr, dump_area_start);
+ return 0;
+ }
+
+ if (dump_header) {
+ phyp_dump_info->reserved_scratch_addr =
+ dump_header->cpu_data.destination_address;
+ phyp_dump_info->reserved_scratch_size =
+ dump_header->cpu_data.source_length +
+ dump_header->hpte_data.source_length +
+ dump_header->kernel_data.source_length;
+ }
+
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
rc = sysfs_create_file(kernel_kobj, &rr.attr);
if (rc)
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h 2008-03-20 21:23:45.000000000 -0500
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h 2008-03-20 21:53:38.000000000 -0500
@@ -30,6 +30,9 @@ struct phyp_dump {
/* store cpu & hpte size */
unsigned long cpu_state_size;
unsigned long hpte_region_size;
+ /* previous scratch area values */
+ unsigned long reserved_scratch_addr;
+ unsigned long reserved_scratch_size;
};
extern struct phyp_dump *phyp_dump_info;
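For readers of the release_region output, here is a hedged userspace sketch that reproduces the line built by show_release_region(). The format string is copied from the patch; the struct and function names are hypothetical stand-ins, not kernel definitions. Note that in each CPU, HPTE and DUMP pair the second number is the length copied rather than an end address, and that the last value is the reserve size minus the scratch size.

```c
#include <stdio.h>

/* Hypothetical stand-in for the fields the patch reads; not a kernel struct. */
struct region_report {
	unsigned long cpu_dest, cpu_copied;
	unsigned long hpte_dest, hpte_copied;
	unsigned long kernel_dest, kernel_copied;
	unsigned long init_reserve_start, init_reserve_size;
	unsigned long reserved_scratch_size;
};

/* Builds the same text line as the patch, using snprintf for bounds safety. */
static int format_release_region(char *buf, size_t len,
				 const struct region_report *r)
{
	/* Same arithmetic as second_addr_range in the patch. */
	unsigned long second = r->init_reserve_size - r->reserved_scratch_size;

	return snprintf(buf, len, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
			" DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
			r->cpu_dest, r->cpu_copied,
			r->hpte_dest, r->hpte_copied,
			r->kernel_dest, r->kernel_copied,
			r->init_reserve_start, second);
}
```

A script consuming this file would therefore treat the first three pairs as start and length when deciding which ranges to write back to release_region.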
* [PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
` (5 preceding siblings ...)
2008-03-21 23:45 ` [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas Manish Ahuja
@ 2008-03-21 23:47 ` Manish Ahuja
2008-03-21 23:50 ` [PATCH 8/8] pseries: phyp dump: config file Manish Ahuja
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:47 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Track the amount of memory freed. For now this is a simple,
rudimentary tally of the ranges freed. The idea is to keep the
external shell script simple and have it release memory in
large chunks for now.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
---
arch/powerpc/platforms/pseries/phyp_dump.c | 35 +++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
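The tally described above can be modelled in userspace as follows. This is a sketch under stated assumptions, not the kernel function: the struct names are hypothetical, the totals are passed in explicitly instead of living in static variables, and the function returns true at the point where the patch would invalidate the last dump and re-register. The inclusive upper bound is kept as in the patch.

```c
#include <stdbool.h>

/* Hypothetical model; field names mirror those used from struct phyp_dump. */
struct dump_layout {
	unsigned long init_reserve_start, init_reserve_size;
	unsigned long reserved_scratch_addr, reserved_scratch_size;
};

/* Running totals; the patch keeps these as function-local statics. */
struct freed_totals {
	unsigned long reserved_area_size;
	unsigned long scratch_area_size;
};

/* Same inclusive bounds test as the patch: start <= addr <= start + size. */
static bool in_range(unsigned long addr, unsigned long start,
		     unsigned long size)
{
	return addr >= start && addr <= start + size;
}

/* Returns true once both totals equal the sizes recorded at boot. */
static bool track_freed(struct freed_totals *t, const struct dump_layout *d,
			unsigned long addr, unsigned long length)
{
	if (addr < d->init_reserve_start)
		return false;

	if (in_range(addr, d->init_reserve_start, d->init_reserve_size))
		t->reserved_area_size += length;

	if (in_range(addr, d->reserved_scratch_addr, d->reserved_scratch_size))
		t->scratch_area_size += length;

	return t->reserved_area_size == d->init_reserve_size &&
	       t->scratch_area_size == d->reserved_scratch_size;
}
```

Because only the start address is range-checked and lengths are summed, the totals match exactly only if the script releases each area without overlap or overshoot, which is consistent with the stated intent of sending a few large chunks.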
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:14:00.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-03-21 22:14:05.000000000 -0500
@@ -261,6 +261,39 @@ static void release_memory_range(unsigne
}
}
+/**
+ * track_freed_range -- Tallies the ranges being freed.
+ * Once the whole reserved and scratch areas have been freed,
+ * it invalidates the last dump and re-registers it for future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+ static unsigned long scratch_area_size, reserved_area_size;
+
+ if (addr < phyp_dump_info->init_reserve_start)
+ return;
+
+ if ((addr >= phyp_dump_info->init_reserve_start) &&
+ (addr <= phyp_dump_info->init_reserve_start +
+ phyp_dump_info->init_reserve_size))
+ reserved_area_size += length;
+
+ if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+ (addr <= phyp_dump_info->reserved_scratch_addr +
+ phyp_dump_info->reserved_scratch_size))
+ scratch_area_size += length;
+
+ if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+ (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+ invalidate_last_dump(&phdr,
+ phyp_dump_info->reserved_scratch_addr);
+ register_dump_area(&phdr,
+ phyp_dump_info->reserved_scratch_addr);
+ }
+}
+
/* ------------------------------------------------- */
/**
* sysfs_release_region -- sysfs interface to release memory range.
@@ -285,6 +318,8 @@ static ssize_t store_release_region(stru
if (ret != 2)
return -EINVAL;
+ track_freed_range(start_addr, length);
+
/* Range-check - don't free any reserved memory that
* wasn't reserved for phyp-dump */
if (start_addr < phyp_dump_info->init_reserve_start)
* [PATCH 8/8] pseries: phyp dump: config file
2008-03-21 22:42 [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump Manish Ahuja
` (6 preceding siblings ...)
2008-03-21 23:47 ` [PATCH 7/8] pseries: phyp dump: Tracking memory range freed Manish Ahuja
@ 2008-03-21 23:50 ` Manish Ahuja
7 siblings, 0 replies; 13+ messages in thread
From: Manish Ahuja @ 2008-03-21 23:50 UTC (permalink / raw)
To: linuxppc-dev, paulus, michael; +Cc: mahuja, linasvepstas
Add hypervisor-assisted dump to kernel config
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
---
arch/powerpc/Kconfig | 10 ++++++++++
1 file changed, 10 insertions(+)
Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig 2008-03-20 20:53:33.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/Kconfig 2008-03-20 21:06:29.000000000 -0500
@@ -306,6 +306,16 @@ config CRASH_DUMP
Don't change this unless you know what you are doing.
+config PHYP_DUMP
+ bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+ depends on PPC_PSERIES && EXPERIMENTAL
+ help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say "N"
+
config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP