* [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump
@ 2007-11-21 22:36 Linas Vepstas
2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, its supposed
to be an improvement over kdump.
The patches mostly sort-of work; a list of open issues
is inculded in the documentation. It also appears that
the not-yet-released firmware versions this was tested
on are still, ahem, incomplete; this work is also pending.
-- Linas & Manish
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 1/6]: phyp dump: Documentation
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
@ 2007-11-21 22:37 ` Linas Vepstas
2007-11-21 22:39 ` [PATCH/RFC 2/6]: phyp dump: config file Linas Vepstas
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
Basic documentation for hypervisor-assisted dump.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Documentation/powerpc/phyp-assisted-dump.txt | 126 +++++++++++++++++++++++++++
1 file changed, 126 insertions(+)
Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt 2007-11-21 16:26:44.000000000 -0600
@@ -0,0 +1,126 @@
+
+ Hypervisor-Assisted Dump
+ ------------------------
+ November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+ immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+ required; the system will be fully usable, and running
+ in it's normal, production mode on it normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+ the low 256MB of RAM to a previously registered
+ save region. It will also save system state, system
+ registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+ hypervisor will reset PCI and other hardware state.
+ It will *not* clear RAM. It will then launch the
+ bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+ is a new node (ibm,dump-kernel) in the device tree,
+ indicating that there is crash data available from
+ a previous boot. It will boot into only 256MB of RAM,
+ reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+ contents of memory, which holds the previous crashed
+ kernel. The userspace tools may copy this info to
+ disk, or network, nas, san, iscsi, etc. as desired.
+
+-- As the userspace tools complete saving a portion of
+ dump, they echo an offset and size to
+ /sys/kernel/release_region to release the reserved
+ memory back to general use.
+
+ An example of this is:
+ "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is a crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+------------
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+ occurred, vs. it simply being a normal reboot, should be
+ reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+ is currently no clear way of matching up the contents of
+ /proc/kcore to the values that need to be sent to
+ /sys/kernel/release_region
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+ instead? There is a dump_subsys being created by the s390 code,
+ perhaps the pseries code should use a similar layout as well.
+
+ o Saved system registers and HPTE tables will be located in high
+ memory. There is currently no way of telling user-space where
+ these are located.
+
+ o The post-dump procedures are incomplete. In particular,
+ after a dump as been taken, the system should re-register
+ with the hypervisor, so that a subsequent crash can be handled.
+
+ o The hypervisor may have an error preserving the dump data.
+ The current code does not check for this error, and does
+ not handle it.
+
+ o Is reserving a 256MB region really required? The goal of
+ reserving a 256MB scratch area is to make sure that no
+ important crash data is clobbered when the hypervisor
+ save low mem to the scratch area. But, if one could assure
+ that nothing important is located in some 256MB area, then
+ it would not need to be reserved.
+
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 2/6]: phyp dump: config file
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
@ 2007-11-21 22:39 ` Linas Vepstas
2007-11-21 22:40 ` [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept Linas Vepstas
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
Add hypervisor-assisted dump to kernel config
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
arch/powerpc/Kconfig | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig 2007-11-15 14:27:33.000000000 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
Don't change this unless you know what you are doing.
+config PHYP_DUMP
+ bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+ depends on PPC_PSERIES && EXPERIMENTAL
+ default y
+ help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistence.
+
+ If unsure, say "Y"
+
config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
2007-11-21 22:39 ` [PATCH/RFC 2/6]: phyp dump: config file Linas Vepstas
@ 2007-11-21 22:40 ` Linas Vepstas
2007-11-21 22:41 ` [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem Linas Vepstas
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:40 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
Initial rough-in/proof of concept of reserving memory in
early boot, and freeing it later. If the previous boot
had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
arch/powerpc/kernel/prom.c | 33 +++++++++++++
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++
include/asm-powerpc/phyp_dump.h | 32 +++++++++++++
4 files changed, 137 insertions(+)
Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END (1UL<<28)
+
+struct phyp_dump {
+ /* Memory that is reserved during very early boot. */
+ unsigned long init_reserve_start;
+ unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyrhgit (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for genreal use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+ struct page *rpage;
+ unsigned long end_pfn;
+ long i;
+
+ end_pfn = start_pfn + nr_pages;
+
+ for (i=start_pfn; i <= end_pfn; i++) {
+ rpage = pfn_to_page(i);
+ if (PageReserved(rpage)) {
+ ClearPageReserved(rpage);
+ init_page_count(rpage);
+ __free_page(rpage);
+ totalram_pages++;
+ }
+ }
+}
+
+static int __init phyp_dump_setup(void)
+{
+ unsigned long start_pfn, nr_pages;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Release memory that was reserved in early boot */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+ release_memory_range(start_pfn, nr_pages);
+
+ return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:44:21.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu
obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c 2007-11-19 17:44:21.000000000 -0600
@@ -51,6 +51,7 @@
#include <asm/machdep.h>
#include <asm/pSeries_reconfig.h>
#include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
#include <asm/kexec.h>
#ifdef DEBUG
@@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi
#endif
}
+#ifdef CONFIG_PHYP_DUMP
+
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped mmemory
+ *
+ * This routine will reserve almost all of the memory in the
+ * system, except for a few hundred megabytes used to boot the
+ * new kernel. As the reserved memory is dumped to the dump
+ * device (by userland tools), it will be freed and made available.
+ */
+static void __init reserve_crashed_mem(void)
+{
+ unsigned long crashed_base, crashed_size;
+
+ /* Reserve *everything* above the RMR. We'll free this real soon. */
+ crashed_base = PHYP_DUMP_RMR_END;
+ crashed_size = lmb_end_of_DRAM() - crashed_base;
+
+ /* XXX crashed_ram_end is wrong, since it may be beyond
+ * the memory_limit, it will need to be adjusted. */
+ lmb_reserve(crashed_base, crashed_size);
+
+ phyp_dump_info->init_reserve_start = crashed_base;
+ phyp_dump_info->init_reserve_size = crashed_size;
+}
+
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
void __init early_init_devtree(void *params)
{
DBG(" -> early_init_devtree(%p)\n", params);
@@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par
reserve_kdump_trampoline();
reserve_crashkernel();
early_reserve_mem();
+ reserve_crashed_mem();
lmb_enforce_memory_limit(memory_limit);
lmb_analyze();
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
` (2 preceding siblings ...)
2007-11-21 22:40 ` [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept Linas Vepstas
@ 2007-11-21 22:41 ` Linas Vepstas
2007-11-21 22:43 ` [PATCH/RFC 5/6]: phyp dump: register the dump area Linas Vepstas
2007-11-21 22:45 ` [PATCH/RFC 6/6]: phyp dump: debugging print routines Linas Vepstas
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:41 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
Check to see if there actually is data from a previously
crashed kernel waiting. If so, Allow user-sapce tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,
echo "0x40000000 0x10000000" > /sys/kernel/release_region
will release 256MB starting at the 1GB. The released memory
becomes free for general use.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
------
arch/powerpc/platforms/pseries/phyp_dump.c | 101 +++++++++++++++++++++++++++--
1 file changed, 96 insertions(+), 5 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:15:05.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:24:30.000000000 -0600
@@ -12,17 +12,24 @@
*/
#include <linux/init.h>
+#include <linux/kobject.h>
#include <linux/mm.h>
+#include <linux/of.h>
#include <linux/pfn.h>
#include <linux/swap.h>
+#include <linux/sysfs.h>
#include <asm/page.h>
#include <asm/phyp_dump.h>
+#include <asm/rtas.h>
/* Global, used to communicate data between early boot and late boot */
static struct phyp_dump phyp_dump_global;
struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
}
}
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ * "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ * "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
{
+ unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+ ssize_t ret;
- /* If no memory was reserved in early boot, there is nothing to do */
- if (phyp_dump_info->init_reserve_size == 0)
- return 0;
+ ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+ if (ret != 2)
+ return -EINVAL;
+
+ /* Range-check - don't free any reserved memory that
+ * wasn't reserved for phyp-dump */
+ if (start_addr < phyp_dump_info->init_reserve_start)
+ start_addr = phyp_dump_info->init_reserve_start;
+
+ end_addr = phyp_dump_info->init_reserve_start +
+ phyp_dump_info->init_reserve_size;
+ if (start_addr+length > end_addr)
+ length = end_addr - start_addr;
+
+ /* Release the region of memory assed in by user */
+ start_pfn = PFN_DOWN(start_addr);
+ nr_pages = PFN_DOWN(length);
+ release_memory_range (start_pfn, nr_pages);
+
+ return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+ return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+ show_release_region,
+ store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+ unsigned long start_pfn, nr_pages;
- /* Release memory that was reserved in early boot */
+ /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+ struct device_node *rtas;
+ const int *dump_header;
+ int header_len = 0;
+ int rc;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Return if phyp dump not supported */
+ ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+ if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+ release_all();
+ return -ENOSYS;
+ }
+
+ /* Is there dump data waiting for us? */
+ rtas = of_find_node_by_path("/rtas");
+ dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+ if (dump_header == NULL) {
+ release_all();
+ return 0;
+ }
+
+ /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+ rc = subsys_create_file(&kernel_subsys, &rr);
+ if (rc) {
+ printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+ release_all();
+ return 0;
+ }
return 0;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 5/6]: phyp dump: register the dump area
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
` (3 preceding siblings ...)
2007-11-21 22:41 ` [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem Linas Vepstas
@ 2007-11-21 22:43 ` Linas Vepstas
2007-11-21 22:45 ` [PATCH/RFC 6/6]: phyp dump: debugging print routines Linas Vepstas
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:43 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
Set up the actual dump header, register it with the hypervisor.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
------
arch/powerpc/platforms/pseries/phyp_dump.c | 169 +++++++++++++++++++++++++++--
1 file changed, 163 insertions(+), 6 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 15:55:37.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.000000000 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
static int ibm_configure_kernel_dump;
/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+ u32 dump_flags;
+ u16 source_type;
+ u16 error_flags;
+ u64 source_address;
+ u64 source_length;
+ u64 length_copied;
+ u64 destination_address;
+};
+
+struct phyp_dump_header {
+ u32 version;
+ u16 num_of_sections;
+ u16 status;
+
+ u32 first_offset_section;
+ u32 dump_disk_section;
+ u64 block_num_dd;
+ u64 num_of_blocks_dd;
+ u32 offset_dd;
+ u32 maxtime_to_auto;
+ /* No dump disk path string used */
+
+ struct dump_section cpu_data;
+ struct dump_section hpte_data;
+ struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+ struct device_node *rtas;
+ const unsigned int *sizes;
+ int len;
+ unsigned long cpu_state_size = 0;
+ unsigned long hpte_region_size = 0;
+ unsigned long addr_offset = 0;
+
+ /* Get the required dump region sizes */
+ rtas = of_find_node_by_path("/rtas");
+ sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+ if (!sizes || len < 20)
+ return 0;
+
+ if (sizes[0] == 1)
+ cpu_state_size = *((unsigned long *) &sizes[1]);
+
+ if (sizes[3] == 2)
+ hpte_region_size = *((unsigned long *) &sizes[4]);
+
+ /* Set up the dump header */
+ ph->version = DUMP_HEADER_VERSION;
+ ph->num_of_sections = NUM_DUMP_SECTIONS;
+ ph->status = 0;
+
+ ph->first_offset_section =
+ (u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+ ph->dump_disk_section = 0;
+ ph->block_num_dd = 0;
+ ph->num_of_blocks_dd = 0;
+ ph->offset_dd = 0;
+
+ ph->maxtime_to_auto = 0; /* disabled */
+
+ /* The first two sections are mandatory */
+ ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+ ph->cpu_data.source_address = 0;
+ ph->cpu_data.source_length = cpu_state_size;
+ ph->cpu_data.destination_address = addr_offset;
+ addr_offset += cpu_state_size;
+
+ ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+ ph->hpte_data.source_address = 0;
+ ph->hpte_data.source_length = hpte_region_size;
+ ph->hpte_data.destination_address = addr_offset;
+ addr_offset += hpte_region_size;
+
+ /* This section describes the low kernel region */
+ ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+ ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+ ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+ ph->kernel_data.destination_address = addr_offset;
+ addr_offset += ph->kernel_data.source_length;
+
+ return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+ int rc;
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+
+ do {
+ rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+ 1, ph, sizeof(struct phyp_dump_header));
+ } while (rtas_busy_delay(rc));
+
+ if (rc)
+ {
+ printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+ }
+}
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -125,7 +253,11 @@ static void release_all (void)
static int __init phyp_dump_setup(void)
{
struct device_node *rtas;
- const int *dump_header;
+ const struct phyp_dump_header *dump_header;
+ unsigned long dump_area_start;
+ unsigned long dump_area_length;
+ unsigned long free_area_length;
+ unsigned long start_pfn, nr_pages;
int header_len = 0;
int rc;
@@ -140,22 +272,47 @@ static int __init phyp_dump_setup(void)
return -ENOSYS;
}
- /* Is there dump data waiting for us? */
+ /* Is there dump data waiting for us? If there isn't,
+ * then register a new dump area, and release all of
+ * the rest of the reserved ram.
+ *
+ * The /rtas/ibm,kernel-dump rtas node is present only
+ * if there is dump data waiting for us.
+ */
rtas = of_find_node_by_path("/rtas");
dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+
+ dump_area_length = init_dump_header (&phdr);
+ free_area_length = phyp_dump_info->init_reserve_size - dump_area_length;
+ dump_area_start = phyp_dump_info->init_reserve_start + free_area_length;
+ dump_area_start = dump_area_start & PAGE_MASK; /* align down */
+ free_area_length = dump_area_start - phyp_dump_info->init_reserve_start;
+
if (dump_header == NULL) {
- release_all();
- return 0;
+ register_dump_area (&phdr, dump_area_start);
+ goto release_mem;
}
+ /* Don't allow user to release the 256MB scratch area */
+ phyp_dump_info->init_reserve_size = free_area_length;
+
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
rc = subsys_create_file(&kernel_subsys, &rr);
if (rc) {
printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
- release_all();
- return 0;
+ goto release_mem;
}
+ /* ToDo: re-register the dump area, for next time. */
+
+ return 0;
+
+release_mem:
+ /* release everything except the top 256 MB scratch area */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(free_area_length);
+ release_memory_range(start_pfn, nr_pages);
+
return 0;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH/RFC 6/6]: phyp dump: debugging print routines.
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
` (4 preceding siblings ...)
2007-11-21 22:43 ` [PATCH/RFC 5/6]: phyp dump: register the dump area Linas Vepstas
@ 2007-11-21 22:45 ` Linas Vepstas
5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:45 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake
Provide some basic debugging support.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepsts <linas@austin.ibm.com>
-----
arch/powerpc/platforms/pseries/phyp_dump.c | 51 +++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:21.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:46.000000000 -0600
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
return addr_offset;
}
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+ printk(KERN_INFO "dump header:\n");
+ /* setup some ph->sections required */
+ printk(KERN_INFO "version = %d\n", ph->version);
+ printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+ printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+ /* No ph->disk, so all should be set to 0 */
+ printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+ printk(KERN_INFO "dump disk sections should be zero\n");
+ printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+ printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+ printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+ printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+ printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+ /*set cpu state and hpte states as well scratch pad area */
+ printk(KERN_INFO " CPU AREA \n");
+ printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+ printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+ printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+ printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+ printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+ printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+ printk(KERN_INFO " HPTE AREA \n");
+ printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+ printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+ printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+ printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+ printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+ printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+ printk(KERN_INFO " SRSD AREA \n");
+ printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+ printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+ printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+ printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+ printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+ printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
{
int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
if (rc)
{
printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+ print_dump_header (ph);
+#endif
}
}
@@ -292,6 +340,9 @@ static int __init phyp_dump_setup(void)
register_dump_area (&phdr, dump_area_start);
goto release_mem;
}
+#ifdef DEBUG
+ print_dump_header (dump_header);
+#endif
/* Don't allow user to release the 256MB scratch area */
phyp_dump_info->init_reserve_size = free_area_length;
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-11-21 22:45 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
2007-11-21 22:39 ` [PATCH/RFC 2/6]: phyp dump: config file Linas Vepstas
2007-11-21 22:40 ` [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept Linas Vepstas
2007-11-21 22:41 ` [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem Linas Vepstas
2007-11-21 22:43 ` [PATCH/RFC 5/6]: phyp dump: register the dump area Linas Vepstas
2007-11-21 22:45 ` [PATCH/RFC 6/6]: phyp dump: debugging print routines Linas Vepstas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).