linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump
@ 2007-11-21 22:36 Linas Vepstas
  2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:36 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides 
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

The patches mostly sort-of work; a list of open issues
is included in the documentation.  It also appears that 
the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

-- Linas & Manish


* [PATCH/RFC 1/6]: phyp dump: Documentation
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
@ 2007-11-21 22:37 ` Linas Vepstas
  2007-11-21 22:39 ` [PATCH/RFC 2/6]: phyp dump: config file Linas Vepstas
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:37 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  126 +++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt	2007-11-21 16:26:44.000000000 -0600
@@ -0,0 +1,126 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTEs.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+   contents of memory, which holds the previously crashed
+   kernel. The userspace tools may copy this data to
+   disk, or to network storage (NAS, SAN, iSCSI, etc.), as desired.
+
+-- As the userspace tools complete saving a portion of
+   the dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved RAM will be released for general kernel use. The
+highest 256MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+------------
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+   is currently no clear way of matching up the contents of
+   /proc/kcore to the values that need to be sent to
+   /sys/kernel/release_region.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump?
+   The s390 code creates a dump_subsys; perhaps the pseries
+   code should use a similar layout as well.
+
+ o Saved system registers and HPTE tables will be located in high
+   memory. There is currently no way of telling user-space where
+   these are located.
+
+ o The post-dump procedures are incomplete. In particular,
+   after a dump has been taken, the system should re-register
+   with the hypervisor, so that a subsequent crash can be handled.
+
+ o The hypervisor may encounter an error while preserving the
+   dump data. The current code does not check for this error,
+   and does not handle it.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   saves low memory to the scratch area. But if one could ensure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved.
+

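To illustrate the last step of the procedure in the documentation above, here is a minimal userspace sketch of how a dump tool might hand a saved chunk back to the kernel. Only the sysfs path and the "<start> <length>" hex format come from this series (store_release_region() in patch 4/6 parses it with sscanf("%lx %lx")); the helper names are hypothetical, and the interface itself is still listed as an open issue.

```c
#include <stdio.h>
#include <string.h>

/* sysfs file created by this series; the name may change (see open issues). */
#define RELEASE_REGION_PATH "/sys/kernel/release_region"

/* Hypothetical helper: format "<start> <length>" in hex, matching
 * "echo 0x40000000 0x10000000 > /sys/kernel/release_region". */
int format_release_cmd(char *buf, size_t n, unsigned long start,
		       unsigned long len)
{
	return snprintf(buf, n, "0x%lx 0x%lx", start, len);
}

/* Hypothetical helper: release one already-saved chunk.
 * Returns 0 on success, -1 on any error. */
int release_region(const char *path, unsigned long start, unsigned long len)
{
	char cmd[64];
	size_t n;
	FILE *f;
	int ok;

	format_release_cmd(cmd, sizeof(cmd), start, len);
	n = strlen(cmd);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = fwrite(cmd, 1, n, f) == n;
	/* stdio buffers the write, so a rejected sysfs store
	 * (e.g. -EINVAL) is reported when fclose() flushes. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For the example in the documentation, a tool would call release_region(RELEASE_REGION_PATH, 0x40000000, 0x10000000) once it has finished copying the 256MB starting at 1GB.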

* [PATCH/RFC 2/6]: phyp dump: config file
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
  2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
@ 2007-11-21 22:39 ` Linas Vepstas
  2007-11-21 22:40 ` [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept Linas Vepstas
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:39 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

-----
 arch/powerpc/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig	2007-11-14 16:39:20.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig	2007-11-15 14:27:33.000000000 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say "Y"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP


* [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
  2007-11-21 22:37 ` [PATCH/RFC 1/6]: phyp dump: Documentation Linas Vepstas
  2007-11-21 22:39 ` [PATCH/RFC 2/6]: phyp dump: config file Linas Vepstas
@ 2007-11-21 22:40 ` Linas Vepstas
  2007-11-21 22:41 ` [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem Linas Vepstas
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:40 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


Initial rough-in/proof of concept of reserving memory in
early boot, and freeing it later. If the previous boot
had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
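
For reference, the arithmetic behind this reserve/release cycle can be modelled in plain C. The sketch below assumes 4K pages (pSeries kernels may be built with 64K pages) and ignores memory_limit, which the XXX comment in prom.c leaves unhandled; the helper names are hypothetical, and the guard for machines with 256MB of RAM or less is an addition of this sketch. It walks the half-open frame range [start_pfn, start_pfn + nr_pages), so exactly nr_pages frames are visited.

```c
#define PAGE_SHIFT 12UL			/* assumption: 4K pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)	/* same meaning as <linux/pfn.h> */
#define PHYP_DUMP_RMR_END (1UL << 28)	/* 256MB, as in asm/phyp_dump.h */

struct reservation {
	unsigned long start;	/* first reserved byte */
	unsigned long size;	/* number of bytes reserved */
};

/* Models reserve_crashed_mem(): reserve everything above the RMR. */
struct reservation reserve_above_rmr(unsigned long end_of_dram)
{
	struct reservation r;

	r.start = PHYP_DUMP_RMR_END;
	/* Sketch-only guard: avoid underflow on tiny machines. */
	r.size = end_of_dram > r.start ? end_of_dram - r.start : 0;
	return r;
}

/* Models the release loop: counts the frames that would be freed. */
unsigned long count_released_frames(unsigned long start_pfn,
				    unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long i, freed = 0;

	for (i = start_pfn; i < end_pfn; i++)
		freed++;	/* the kernel would free pfn_to_page(i) here */
	return freed;
}
```

On a 4GB partition, this reserves 0xf0000000 bytes starting at 256MB, which is 0xf0000 4K frames.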

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

----
 arch/powerpc/kernel/prom.c                 |   33 +++++++++++++
 arch/powerpc/platforms/pseries/Makefile    |    1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +++++++++++++++++++++++++++++
 include/asm-powerpc/phyp_dump.h            |   32 +++++++++++++
 4 files changed, 137 insertions(+)

Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h	2007-11-19 17:44:21.000000000 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-19 19:07:49.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	unsigned long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile	2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile	2007-11-19 17:44:21.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
 obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c	2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c	2007-11-19 17:44:21.000000000 -0600
@@ -51,6 +51,7 @@
 #include <asm/machdep.h>
 #include <asm/pSeries_reconfig.h>
 #include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
 #include <asm/kexec.h>
 
 #ifdef DEBUG
@@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi
 #endif
 }
 
+#ifdef CONFIG_PHYP_DUMP
+
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped memory
+ *
+ * This routine will reserve almost all of the memory in the
+ * system, except for a few hundred megabytes used to boot the
+ * new kernel. As the reserved memory is dumped to the dump
+ * device (by userland tools), it will be freed and made available.
+ */
+static void __init reserve_crashed_mem(void)
+{
+	unsigned long crashed_base, crashed_size;
+
+	/* Reserve *everything* above the RMR. We'll free this real soon. */
+	crashed_base = PHYP_DUMP_RMR_END;
+	crashed_size = lmb_end_of_DRAM() - crashed_base;
+
+	/* XXX crashed_size may be wrong, since the region may extend
+	 * beyond the memory_limit; it will need to be adjusted. */
+	lmb_reserve(crashed_base, crashed_size);
+
+	phyp_dump_info->init_reserve_start = crashed_base;
+	phyp_dump_info->init_reserve_size = crashed_size;
+}
+
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
 void __init early_init_devtree(void *params)
 {
 	DBG(" -> early_init_devtree(%p)\n", params);
@@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par
 	reserve_kdump_trampoline();
 	reserve_crashkernel();
 	early_reserve_mem();
+	reserve_crashed_mem();
 
 	lmb_enforce_memory_limit(memory_limit);
 	lmb_analyze();


* [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
                   ` (2 preceding siblings ...)
  2007-11-21 22:40 ` [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept Linas Vepstas
@ 2007-11-21 22:41 ` Linas Vepstas
  2007-11-21 22:43 ` [PATCH/RFC 5/6]: phyp dump: register the dump area Linas Vepstas
  2007-11-21 22:45 ` [PATCH/RFC 6/6]: phyp dump: debugging print routines Linas Vepstas
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:41 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space 
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo "0x40000000 0x10000000" > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.
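
The range check in store_release_region() clamps each request to the memory reserved in early boot. Below is a pure-function sketch of that clamp that can be exercised outside the kernel; the names are hypothetical. As in the patch, raising the start address does not shorten the length. Two details are assumptions of this sketch rather than part of the patch: a request starting at or above the end of the reservation is rejected (otherwise end_addr - start_addr would underflow), and the length check is written as a subtraction so that a huge length cannot wrap around.

```c
#include <errno.h>

/* Hypothetical request type: one "<start> <length>" pair from userspace. */
struct region {
	unsigned long start;
	unsigned long length;
};

/* Clamp a release request to [resv_start, resv_start + resv_size),
 * in the same order as store_release_region(). Returns 0 on success,
 * or -EINVAL if the request lies entirely above the reservation. */
int clamp_release(unsigned long resv_start, unsigned long resv_size,
		  struct region *req)
{
	unsigned long end_addr = resv_start + resv_size;

	/* Don't free reserved memory that wasn't reserved for phyp-dump. */
	if (req->start < resv_start)
		req->start = resv_start;

	/* Sketch-only guard: without it, end_addr - start would underflow. */
	if (req->start >= end_addr)
		return -EINVAL;

	/* Compared by subtraction so a huge length cannot wrap around. */
	if (req->length > end_addr - req->start)
		req->length = end_addr - req->start;
	return 0;
}
```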

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +++++++++++++++++++++++++++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 13:15:05.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 13:24:30.000000000 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * store_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ *   "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
 
-	/* If no memory was reserved in early boot, there is nothing to do */
-	if (phyp_dump_info->init_reserve_size == 0)
-		return 0;
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range (start_pfn, nr_pages);
+
+	return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+	return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+					 show_release_region,
+					 store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+	unsigned long start_pfn, nr_pages;
 
-	/* Release memory that was reserved in early boot */
+	/* Release all memory that was reserved in early boot */
 	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
 	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
 	release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header;
+	int header_len = 0;
+	int rc;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Return if phyp dump not supported */
+	ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+	if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+		release_all();
+		return -ENOSYS;
+	}
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+	if (dump_header == NULL) {
+		release_all();
+		return 0;
+	}
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = subsys_create_file(&kernel_subsys, &rr);
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+		release_all();
+		return 0;
+	}
 
 	return 0;
 }


* [PATCH/RFC 5/6]: phyp dump: register the dump area
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
                   ` (3 preceding siblings ...)
  2007-11-21 22:41 ` [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem Linas Vepstas
@ 2007-11-21 22:43 ` Linas Vepstas
  2007-11-21 22:45 ` [PATCH/RFC 6/6]: phyp dump: debugging print routines Linas Vepstas
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:43 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja


Set up the actual dump header, register it with the hypervisor.
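
As a worked example of the layout this patch sets up, the sketch below parses an ibm,configure-kernel-dump-sizes property laid out the way init_dump_header() reads it (two records, each a 32-bit type followed by a 64-bit size, big-endian), then places the save area at the top of the early reservation, page-aligned down, as phyp_dump_setup() does. Reading the second 64-bit size needs 24 bytes of property data. The helper names, the 4K page size, the `total > resv_size` guard, and the sample sizes used below are assumptions of this sketch, not values taken from firmware.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096UL		/* assumption: 4K pages */
#define RMR_LEN (1UL << 28)		/* PHYP_DUMP_RMR_END, 256MB */

/* Read a big-endian integer of 'bytes' width (pSeries firmware is BE). */
static uint64_t be_read(const unsigned char *p, int bytes)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < bytes; i++)
		v = (v << 8) | p[i];
	return v;
}

struct dump_layout {
	uint64_t cpu_size, hpte_size;
	uint64_t cpu_dest, hpte_dest, rmr_dest;	/* save destinations */
	uint64_t free_len;	/* bytes releasable below the save area */
};

/* Returns 0 on success, -1 if the property is too short or the save
 * area does not fit in the reservation. */
int plan_dump_area(const unsigned char *prop, size_t len,
		   uint64_t resv_start, uint64_t resv_size,
		   struct dump_layout *out)
{
	uint64_t total, start;

	if (len < 24)		/* two (u32 type, u64 size) records */
		return -1;
	out->cpu_size = be_read(prop, 4) == 1 ? be_read(prop + 4, 8) : 0;
	out->hpte_size = be_read(prop + 12, 4) == 2 ? be_read(prop + 16, 8) : 0;

	total = out->cpu_size + out->hpte_size + RMR_LEN;
	if (total > resv_size)
		return -1;
	/* Top of the reservation, aligned down to a page boundary. */
	start = (resv_start + resv_size - total) &
		~(uint64_t)(PAGE_SIZE_4K - 1);

	out->cpu_dest = start;
	out->hpte_dest = start + out->cpu_size;
	out->rmr_dest = out->hpte_dest + out->hpte_size;
	out->free_len = start - resv_start;
	return 0;
}
```

With a hypothetical 0x2100-byte CPU-state area, a 16MB HPTE area, and a reservation spanning 256MB to 4GB, the save area starts at 0xeeffd000, and the 0xdeffd000 bytes between 256MB and that address can be released.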

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |  169 +++++++++++++++++++++++++++--
 1 file changed, 163 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 15:55:37.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 16:06:52.000000000 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
 static int ibm_configure_kernel_dump;
 
 /* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	struct device_node *rtas;
+	const unsigned int *sizes;
+	int len;
+	unsigned long cpu_state_size = 0;
+	unsigned long hpte_region_size = 0;
+	unsigned long addr_offset = 0;
+
+	/* Get the required dump region sizes */
+	rtas = of_find_node_by_path("/rtas");
+	sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+	if (!sizes || len < 24)
+		return 0;
+
+	if (sizes[0] == 1)
+		cpu_state_size = *((unsigned long *) &sizes[1]);
+
+	if (sizes[3] == 2)
+		hpte_region_size = *((unsigned long *) &sizes[4]);
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32) offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+		               1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+	{
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+	}
+}
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -125,7 +253,11 @@ static void release_all (void)
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header;
+	const struct phyp_dump_header *dump_header;
+	unsigned long dump_area_start;
+	unsigned long dump_area_length;
+	unsigned long free_area_length;
+	unsigned long start_pfn, nr_pages;
 	int header_len = 0;
 	int rc;
 
@@ -140,22 +272,47 @@ static int __init phyp_dump_setup(void)
 		return -ENOSYS;
 	}
 
-	/* Is there dump data waiting for us? */
+	/* Is there dump data waiting for us? If there isn't,
+	 * then register a new dump area, and release all of
+	 * the rest of the reserved ram.
+	 *
+	 * The /rtas/ibm,kernel-dump rtas node is present only
+	 * if there is dump data waiting for us.
+	 */
 	rtas = of_find_node_by_path("/rtas");
 	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+
+	dump_area_length = init_dump_header (&phdr);
+	free_area_length = phyp_dump_info->init_reserve_size - dump_area_length;
+	dump_area_start = phyp_dump_info->init_reserve_start + free_area_length;
+	dump_area_start = dump_area_start & PAGE_MASK; /* align down */
+	free_area_length = dump_area_start - phyp_dump_info->init_reserve_start;
+
 	if (dump_header == NULL) {
-		release_all();
-		return 0;
+		register_dump_area (&phdr, dump_area_start);
+		goto release_mem;
 	}
 
+	/* Don't allow user to release the 256MB scratch area */
+	phyp_dump_info->init_reserve_size = free_area_length;
+
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = subsys_create_file(&kernel_subsys, &rr);
 	if (rc) {
 		printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
-		release_all();
-		return 0;
+		goto release_mem;
 	}
 
+	/* ToDo: re-register the dump area, for next time. */
+
+	return 0;
+
+release_mem:
+	/* release everything except the top 256 MB scratch area */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(free_area_length);
+	release_memory_range(start_pfn, nr_pages);
+
 	return 0;
 }
 

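Taken together, patches 4/6 and 5/6 give phyp_dump_setup() four possible outcomes at boot. The sketch below restates that decision as a pure function so it can be reasoned about, and tested, without firmware; the enum and function names are hypothetical. A failure to create the sysfs file is not modelled: in the patch it falls back to releasing everything except the save area, without registering, and re-registering after a dump is still a ToDo.

```c
#include <stdbool.h>

/* Hypothetical summary of phyp_dump_setup() as of patch 5/6. */
enum phyp_dump_action {
	PHYP_NOTHING_RESERVED,		/* no early reservation: nothing to do */
	PHYP_RELEASE_ALL,		/* no ibm,configure-kernel-dump token */
	PHYP_REGISTER_AND_RELEASE,	/* no dump waiting: register, keep save area */
	PHYP_KEEP_FOR_USERSPACE,	/* dump waiting: expose release_region */
};

enum phyp_dump_action phyp_dump_decide(unsigned long init_reserve_size,
				       bool have_rtas_token,
				       bool have_dump_property)
{
	if (init_reserve_size == 0)
		return PHYP_NOTHING_RESERVED;
	if (!have_rtas_token)
		return PHYP_RELEASE_ALL;
	if (!have_dump_property)	/* /rtas/ibm,kernel-dump absent */
		return PHYP_REGISTER_AND_RELEASE;
	return PHYP_KEEP_FOR_USERSPACE;
}
```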

* [PATCH/RFC 6/6]: phyp dump: debugging print routines.
  2007-11-21 22:36 [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump Linas Vepstas
                   ` (4 preceding siblings ...)
  2007-11-21 22:43 ` [PATCH/RFC 5/6]: phyp dump: register the dump area Linas Vepstas
@ 2007-11-21 22:45 ` Linas Vepstas
  5 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2007-11-21 22:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mahuja, lkessler, strosake


Provide some basic debugging support.
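
One possible way to tighten print_dump_header() is to print the three dump_section entries through a single helper instead of three near-identical blocks of printk() calls. The userspace sketch below formats one section into a buffer with snprintf(); the helper name and output format are hypothetical. In the kernel, the same helper could be built on pr_debug(), which compiles away unless DEBUG is defined, so the #ifdef DEBUG guards around each call site would no longer be needed.

```c
#include <stdint.h>
#include <stdio.h>

struct dump_section {		/* field layout copied from patch 5/6 */
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

/* Hypothetical helper: format one section on a single line. */
int format_dump_section(char *buf, size_t n, const char *label,
			const struct dump_section *s)
{
	return snprintf(buf, n,
			"%s: flags=0x%x type=0x%x err=0x%x src=0x%llx "
			"len=0x%llx copied=0x%llx dest=0x%llx",
			label, (unsigned)s->dump_flags,
			(unsigned)s->source_type, (unsigned)s->error_flags,
			(unsigned long long)s->source_address,
			(unsigned long long)s->source_length,
			(unsigned long long)s->length_copied,
			(unsigned long long)s->destination_address);
}
```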

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----

 arch/powerpc/platforms/pseries/phyp_dump.c |   51 +++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 16:12:21.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 16:12:46.000000000 -0600
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO " CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+	printk(KERN_INFO " HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+	printk(KERN_INFO " SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
 	if (rc)
 	{
 		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+		print_dump_header (ph);
+#endif
 	}
 }
 
@@ -292,6 +340,9 @@ static int __init phyp_dump_setup(void)
 		register_dump_area (&phdr, dump_area_start);
 		goto release_mem;
 	}
+#ifdef DEBUG
+	print_dump_header (dump_header);
+#endif
 
 	/* Don't allow user to release the 256MB scratch area */
 	phyp_dump_info->init_reserve_size = free_area_length;
