LinuxPPC-Dev Archive on lore.kernel.org
* Re: Sequoia NAND
From: Josh Boyer @ 2008-02-17 13:18 UTC
  To: Steve Heflin; +Cc: linuxppc-embedded
In-Reply-To: <20080217113932.02EBDDE127@ozlabs.org>

On Sun, 17 Feb 2008 06:39:43 -0500
Steve Heflin <sheflin@newagemicro.com> wrote:

> I don't see where the new powerpc architecture and drivers/mtd/nand 
> contains support for the NAND chip on the Sequoia platform.  We have 

It's not.

> Is the Sequoia's NAND Flash Controller supported in the current Linux-2.6.25?

No.

There are a couple different patches floating around for it, all of
which need work.  The driver is drivers/mtd/nand/ndfc.c and work needs
to be done to parse a device tree and present the proper platform
devices so that driver will work.  Stefan has something like this
somewhere, I've just been lax in getting it into my tree.
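For readers wondering what "parse a device tree and present the proper platform devices" involves at the lowest level: a controller's register window normally comes from the node's "reg" property, which is a list of big-endian 32-bit cells grouped by the parent's #address-cells and #size-cells. The following is a minimal userspace sketch of that decoding only, not the ndfc glue itself. The helper names and example values are hypothetical, and in-kernel code would use the kernel's own OF address helpers rather than hand-rolled parsing.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded (address, size) pair from a "reg" property. */
struct reg_entry {
	uint64_t addr;
	uint64_t size;
};

/* Device tree property cells are 32-bit big-endian words. */
static uint32_t be32_at(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Combine 'ncells' consecutive cells (at most 2) into one 64-bit value. */
static uint64_t read_cells(const uint8_t *p, int ncells)
{
	uint64_t v = 0;

	for (int i = 0; i < ncells; i++)
		v = (v << 32) | be32_at(p + 4 * i);
	return v;
}

/* Decode a raw "reg" property of 'len' bytes into at most 'max' entries,
 * using the parent's #address-cells and #size-cells. Returns the number
 * of entries decoded, or -1 for unsupported cell counts or a length that
 * is not a whole number of entries. */
static int decode_reg(const uint8_t *prop, size_t len, int addr_cells,
		      int size_cells, struct reg_entry *out, int max)
{
	size_t stride;
	int n = 0;

	if (addr_cells < 1 || addr_cells > 2 || size_cells < 0 || size_cells > 2)
		return -1;
	stride = 4u * (size_t)(addr_cells + size_cells);
	if (len % stride != 0)
		return -1;

	for (size_t off = 0; off < len && n < max; off += stride, n++) {
		out[n].addr = read_cells(prop + off, addr_cells);
		out[n].size = read_cells(prop + off + 4 * addr_cells, size_cells);
	}
	return n;
}
```

For example, a 12-byte property holding the cells 0x1 0x90000000 0x2000, decoded with two address cells and one size cell, yields one entry with address 0x190000000 and size 0x2000 (values invented for illustration).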

Too late for .25, but I have 4 boards that use this now so I'll be a
bit more interested in getting it into .26.

josh


* Re: Sequoia NAND - others missing?
From: Steve Heflin @ 2008-02-17 15:27 UTC
  To: linuxppc-embedded, Josh Boyer; +Cc: linuxppc-embedded
In-Reply-To: <20080217071851.0edbf07f@vader.jdub.homelinux.org>

Are there other devices (besides the NAND Flash Controller) that exist
on the AMCC-440EPx chip and are not supported by the current 
Linux-2.6.25 ARCH=powerpc?

thanks for your help,
Steve

At 08:18 AM 2/17/2008, Josh Boyer wrote:
>On Sun, 17 Feb 2008 06:39:43 -0500
>Steve Heflin <sheflin@newagemicro.com> wrote:
>
> > I don't see where the new powerpc architecture and drivers/mtd/nand
> > contains support for the NAND chip on the Sequoia platform.  We have
>
>It's not.
>
> > Is the Sequoia's NAND Flash Controller supported in the current
> > Linux-2.6.25?
>
>No.
>
>There are a couple different patches floating around for it, all of
>which need work.  The driver is drivers/mtd/nand/ndfc.c and work needs
>to be done to parse a device tree and present the proper platform
>devices so that driver will work.  Stefan has something like this
>somewhere, I've just been lax in getting it into my tree.
>
>Too late for .25, but I have 4 boards that use this now so I'll be a
>bit more interested in getting it into .26.
>
>josh


* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
From: Jens Axboe @ 2008-02-17 19:29 UTC
  To: Kamalesh Babulal
  Cc: Dhaval Giani, Linux Kernel Mailing List, Srivatsa Vaddagiri,
	linuxppc-dev, Ingo Molnar, Balbir Singh
In-Reply-To: <47B67E5E.4010001@linux.vnet.ibm.com>

On Sat, Feb 16 2008, Kamalesh Babulal wrote:
> Hi,
> 
> The softlockup is seen from 2.6.25-rc1-git{1,3} and is visible in the 2.6.24-rc2 kernel,
> While booting up with the 2.6.25-rc1-git{1,3} and 2.6.25-rc2 kernel(s) on the powerbox
> 
> Loading st.ko module
> BUG: soft lockup - CPU#1 stuck for 61s! [insmod:379]
> NIP: c0000000001b0620 LR: c0000000001a5dcc CTR: 0000000000000040
> REGS: c00000077caab8a0 TRAP: 0901   Not tainted  (2.6.25-rc2-autotest)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 84004088  XER: 20000000
> TASK = c00000077cb450a0[379] 'insmod' THREAD: c00000077caa8000 CPU: 1
> GPR00: c00000077c9d4000 c00000077caabb20 c000000000538a40 000000000000000b 
> GPR04: ffc0000000000000 c00000077e0c0000 0000000000000036 000000000000000a 
> GPR08: 0040000000000000 c00000077c9d4250 c000000000000000 0000000000000000 
> GPR12: c00000077c9d4230 c000000000481d00 
> NIP [c0000000001b0620] .radix_tree_gang_lookup+0x100/0x1e4
> LR [c0000000001a5dcc] .call_for_each_cic+0x50/0x10c
> Call Trace:
> [c00000077caabb20] [c0000000001a5e2c] .call_for_each_cic+0xb0/0x10c (unreliable)
> [c00000077caabc60] [c00000000019dba4] .exit_io_context+0xf0/0x110
> [c00000077caabcf0] [c000000000061e38] .do_exit+0x820/0x850
> [c00000077caabda0] [c000000000061f34] .do_group_exit+0xcc/0xe8
> [c00000077caabe30] [c00000000000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 7d296214 39290018 e8090000 7caa2038 39290008 2fa00000 409e0018 7caa4215 
> 396b0001 418200cc 424000b8 4bffffdc <79691f24> 7d296214 e9690018 2fab0000 

It's odd stuff. Could you perhaps try and add some printks to
block/cfq-iosched.c:call_for_each_cic(), like dumping the 'nr' return value
from radix_tree_gang_lookup() and the pointer value of cics[i] in the
for() loop after the lookup?

How many SCSI devices are online?
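To make the suggested instrumentation concrete, here is a small userspace model of a batched lookup loop of the kind being debugged: look up a batch of entries starting at an index, process them, advance the index past the last key returned, and repeat while the batch came back full. It is not the cfq code; gang_lookup(), walk_all(), struct entry and the batch size of 16 are hypothetical stand-ins, and the real call_for_each_cic() may differ in detail. The debug printf() calls sit where the requested printks would go, and the loop also flags a start index that fails to advance, which is one way such a loop could spin until the soft-lockup detector fires.

```c
#include <stdio.h>

#define BATCH 16	/* assumed batch size for the lookup array */

/* Hypothetical stand-in for a tree entry keyed by an unsigned long. */
struct entry {
	unsigned long key;
};

/* Hypothetical stand-in for a gang lookup: copy up to 'max' entries whose
 * key is >= 'first' from the key-sorted table 'tbl' into 'out'. */
static unsigned int gang_lookup(struct entry *tbl, unsigned int n,
				struct entry **out, unsigned long first,
				unsigned int max)
{
	unsigned int i, nr = 0;

	for (i = 0; i < n && nr < max; i++)
		if (tbl[i].key >= first)
			out[nr++] = &tbl[i];
	return nr;
}

/* Walk all entries batch by batch, optionally printing 'nr' and each
 * returned pointer. Returns the number of entries visited, or -1 if the
 * next start index fails to advance (e.g. a key of ~0UL wraps to 0). */
static long walk_all(struct entry *tbl, unsigned int n, int debug)
{
	struct entry *batch[BATCH];
	unsigned long index = 0, prev;
	unsigned int nr, i;
	long visited = 0;

	do {
		prev = index;
		nr = gang_lookup(tbl, n, batch, index, BATCH);
		if (debug)
			printf("lookup from %lu: nr=%u\n", index, nr);
		for (i = 0; i < nr; i++) {
			if (debug)
				printf("  batch[%u]=%p key=%lu\n", i,
				       (void *)batch[i], batch[i]->key);
			index = batch[i]->key + 1;
			visited++;
		}
		if (nr > 0 && index <= prev)
			return -1;	/* would loop forever otherwise */
	} while (nr == BATCH);
	return visited;
}
```

Running walk_all() with debug set prints the same two pieces of data requested for the kernel loop: the count from each lookup and every pointer it returned.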

-- 
Jens Axboe


* Re: [BUG] Linux 2.6.25-rc2 - Regression from 2.6.24-rc1-git1 softlockup while bootup on powerpc
From: Rafael J. Wysocki @ 2008-02-17 20:08 UTC
  To: Kamalesh Babulal
  Cc: Dhaval Giani, Srivatsa Vaddagiri, Linux Kernel Mailing List,
	linuxppc-dev, Jens Axboe, Ingo Molnar, Balbir Singh
In-Reply-To: <47B67E5E.4010001@linux.vnet.ibm.com>

On Saturday, 16 of February 2008, Kamalesh Babulal wrote:
> Hi,

Hi,
 
> The softlockup is seen from 2.6.25-rc1-git{1,3} and is visible in the 2.6.24-rc2 kernel,
> While booting up with the 2.6.25-rc1-git{1,3} and 2.6.25-rc2 kernel(s) on the powerbox

Can you update the Bugzilla entry at:
http://bugzilla.kernel.org/show_bug.cgi?id=9948
with the above information, please?

Rafael


> Loading st.ko module
> BUG: soft lockup - CPU#1 stuck for 61s! [insmod:379]
> NIP: c0000000001b0620 LR: c0000000001a5dcc CTR: 0000000000000040
> REGS: c00000077caab8a0 TRAP: 0901   Not tainted  (2.6.25-rc2-autotest)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 84004088  XER: 20000000
> TASK = c00000077cb450a0[379] 'insmod' THREAD: c00000077caa8000 CPU: 1
> GPR00: c00000077c9d4000 c00000077caabb20 c000000000538a40 000000000000000b 
> GPR04: ffc0000000000000 c00000077e0c0000 0000000000000036 000000000000000a 
> GPR08: 0040000000000000 c00000077c9d4250 c000000000000000 0000000000000000 
> GPR12: c00000077c9d4230 c000000000481d00 
> NIP [c0000000001b0620] .radix_tree_gang_lookup+0x100/0x1e4
> LR [c0000000001a5dcc] .call_for_each_cic+0x50/0x10c
> Call Trace:
> [c00000077caabb20] [c0000000001a5e2c] .call_for_each_cic+0xb0/0x10c (unreliable)
> [c00000077caabc60] [c00000000019dba4] .exit_io_context+0xf0/0x110
> [c00000077caabcf0] [c000000000061e38] .do_exit+0x820/0x850
> [c00000077caabda0] [c000000000061f34] .do_group_exit+0xcc/0xe8
> [c00000077caabe30] [c00000000000872c] syscall_exit+0x0/0x40
> Instruction dump:
> 7d296214 39290018 e8090000 7caa2038 39290008 2fa00000 409e0018 7caa4215 
> 396b0001 418200cc 424000b8 4bffffdc <79691f24> 7d296214 e9690018 2fab0000 
> INFO: task insmod:387 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> insmod        D 000000001000e144 12144   387      1
> Call Trace:
> [c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
> [c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
> [c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
> [c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
> [c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
> [c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
> [c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
> [c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
> [c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
> -- 0:conmux-control -- time-stamp -- Feb/15/08 16:04:12 --
> INFO: task insmod:387 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> insmod        D 000000001000e144 12144   387      1
> Call Trace:
> [c00000077cb97600] [c0000000008fae80] 0xc0000000008fae80 (unreliable)
> [c00000077cb977d0] [c000000000010c7c] .__switch_to+0x11c/0x154
> [c00000077cb97860] [c000000000344498] .schedule+0x5d0/0x6b0
> [c00000077cb97950] [c0000000003447d8] .schedule_timeout+0x3c/0xe8
> [c00000077cb97a20] [c000000000343d34] .wait_for_common+0x150/0x22c
> [c00000077cb97ae0] [c00000000008ef00] .__stop_machine_run+0xbc/0xf0
> [c00000077cb97bb0] [c00000000008ef70] .stop_machine_run+0x3c/0x80
> [c00000077cb97c50] [c0000000000891f0] .sys_init_module+0x14e4/0x1af4
> [c00000077cb97e30] [c00000000000872c] syscall_exit+0x0/0x40
> -- 0:conmux-control -- time-stamp -- Feb/15/08 16:06:21 --



-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: Sequoia NAND - others missing?
From: Josh Boyer @ 2008-02-17 20:56 UTC
  To: Steve Heflin; +Cc: linuxppc-embedded
In-Reply-To: <47b8524c.1286460a.1307.7530SMTPIN_ADDED@mx.google.com>

On Sun, 17 Feb 2008 10:27:22 -0500
Steve Heflin <sheflin@newagemicro.com> wrote:

> Are there other devices (besides the NAND Flash Controller) that exist
> on the AMCC-440EPx chip and are not supported by the current 
> Linux-2.6.25 ARCH=powerpc?

i2c, GPIO, the security stuff (if your version has that), and GPT.
Though GPT has never really been supported in any kernel that I
remember.

Patches for i2c and GPIO are floating around somewhere I think.  Just
need to get them polished up and device-tree compliant.

josh


* Does anyone have a simple UIO driver that uses an interrupt handler?
From: Nick Droogh @ 2008-02-17 23:13 UTC
  To: linuxppc-embedded

[-- Attachment #1: Type: text/plain, Size: 244 bytes --]

Hi everyone,

I am looking for a driver example for a user space driver utilizing an 
interrupt handler.  I am having trouble registering a handler in my 
driver attempt and would like to see an example of a working UIO driver.
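For reference, the userspace half of a UIO driver is small: once the kernel-side module has registered a struct uio_info carrying the IRQ number and a handler via uio_register_device(), a process waits for interrupts by calling read() on /dev/uioN, which blocks and then returns the 32-bit total interrupt count. The sketch below shows only that userspace wait; uio_wait_irq() is a hypothetical helper name, and the kernel-side handler (which typically has to quiet the device's interrupt before returning IRQ_HANDLED) is not shown.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Block until the UIO device behind 'fd' reports an interrupt.
 * A read() on /dev/uioN returns a 32-bit total interrupt count; this
 * returns how many interrupts arrived since the previous call (more
 * than 1 means some were not seen individually) and updates *last.
 * Returns -1 on error or end of file. */
static long uio_wait_irq(int fd, uint32_t *last)
{
	uint32_t count, delta;
	ssize_t n;

	do {
		n = read(fd, &count, sizeof(count));
	} while (n < 0 && errno == EINTR);
	if (n != (ssize_t)sizeof(count))
		return -1;

	delta = count - *last;	/* unsigned arithmetic tolerates wraparound */
	*last = count;
	return (long)delta;
}
```

A typical caller opens the device node (for example /dev/uio0; the number depends on registration order) and calls uio_wait_irq() in a loop, servicing the device each time it returns a positive count.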

Thanks,

Nick


[-- Attachment #2: ndroogh.vcf --]
[-- Type: text/x-vcard, Size: 220 bytes --]

begin:vcard
fn:Nick Droogh
n:Droogh;Nick
org:CADlink Technology
adr:Suite 100;;2440 Don Reid Drive;Ottawa;Ontario;K1H 1E1;Canada
email;internet:ndroogh@cadlink.com
title:Senior Technical Architect
version:2.1
end:vcard



* [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
From: Manish Ahuja @ 2008-02-18  4:53 UTC
  To: ppc-dev, paulus, Linas Vepstas

The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was tested 
on are still, ahem, incomplete; this work is also pending.

I have included most of the requested changes, although one or
two of them are fixed in a later patch file rather than in the
patch where they first appeared.

Also, it no longer reserves any memory except on Power6 boxes
that have the requisite firmware. The following boot output is
from jal-lp6, a Power5 machine:
.........
Phyp-dump not supported on this hardware
Using pSeries machine description
console [udbg-1] enabled
.......

I think I have incorporated everyone's comments so far.


-- Manish & Linas.


* libfdt: More tests of NOP handling behaviour
From: David Gibson @ 2008-02-18  5:09 UTC
  To: Jon Loeliger; +Cc: linuxppc-dev

In light of the recently discovered bug with NOP handling, this adds
some more testcases for NOP handling.  Specifically, it adds a helper
program which will add a NOP tag after every existing tag in a dtb,
and runs the standard battery of tests over trees mangled in this way.

For now, this does not add a NOP at the very beginning of the
structure block.  This causes problems for libfdt at present, because
we assume in many places that the root node's BEGIN_NODE tag is at
offset 0.  I'm still contemplating what to do about this (with one
option being simply to declare such dtbs invalid).
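The transformation the new nopulate test applies can be seen without libfdt. Below is a standalone sketch of the same copy-each-tag-then-append-FDT_NOP loop over a hand-built structure block. next_tag() is a simplified stand-in for libfdt's fdt_next_tag() that understands only the five structure tags and does no bounds checking, and all helper names are hypothetical. It also shows why nopulate.c allocates twice the structure-block size: every tag occupies at least 4 bytes, and each one gains exactly one 4-byte NOP.

```c
#include <stdint.h>
#include <string.h>

/* Structure-block tag values from the flattened device tree format. */
#define TAG_BEGIN_NODE 0x1u
#define TAG_END_NODE   0x2u
#define TAG_PROP       0x3u
#define TAG_NOP        0x4u
#define TAG_END        0x9u
#define TAGSIZE        4

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static int align4(int x)
{
	return (x + 3) & ~3;
}

/* Return the tag at 'offset' and store the offset of the following tag. */
static uint32_t next_tag(const uint8_t *blk, int offset, int *next)
{
	uint32_t tag = get_be32(blk + offset);
	int off = offset + TAGSIZE;

	if (tag == TAG_BEGIN_NODE)	/* NUL-terminated name, padded to 4 */
		off = align4(off + (int)strlen((const char *)(blk + off)) + 1);
	else if (tag == TAG_PROP)	/* len, nameoff, then value, padded */
		off = align4(off + 8 + (int)get_be32(blk + off));
	*next = off;
	return tag;
}

/* Copy every tag of 'in' to 'out', following each one with a NOP tag.
 * 'out' must hold at least twice the size of 'in'. Returns the new size. */
static int nopulate(uint8_t *out, const uint8_t *in)
{
	int offset, next = 0, len = 0;
	uint32_t tag;

	do {
		offset = next;
		tag = next_tag(in, offset, &next);
		memcpy(out + len, in + offset, (size_t)(next - offset));
		len += next - offset;
		put_be32(out + len, TAG_NOP);
		len += TAGSIZE;
	} while (tag != TAG_END);
	return len;
}
```

For a root node holding a single 4-byte property, the 32-byte block (BEGIN_NODE with an empty name, PROP, END_NODE, END) grows to 48 bytes, one NOP per tag.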

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: dtc/tests/nopulate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ dtc/tests/nopulate.c	2008-02-14 17:01:10.000000000 +1100
@@ -0,0 +1,107 @@
+/*
+ * libfdt - Flat Device Tree manipulation
+ *	Testcase/tool for inserting NOP tags into a dtb
+ * Copyright (C) 2006 David Gibson, IBM Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <limits.h>
+#include <stdint.h>
+
+#include <fdt.h>
+#include <libfdt.h>
+
+#include "tests.h"
+#include "testdata.h"
+
+int nopulate_struct(char *buf, const void *fdt)
+{
+	int offset, nextoffset = 0;
+	uint32_t tag;
+	char *p;
+
+	p = buf;
+
+	do {
+		offset = nextoffset;
+		tag = fdt_next_tag(fdt, offset, &nextoffset);
+
+		memcpy(p, fdt + fdt_off_dt_struct(fdt) + offset,
+		       nextoffset - offset);
+		p += nextoffset - offset;
+
+		*((uint32_t *)p) = cpu_to_fdt32(FDT_NOP);
+		p += FDT_TAGSIZE;
+
+	} while (tag != FDT_END);
+
+	return p - buf;
+}
+
+int main(int argc, char *argv[])
+{
+	void *fdt, *fdt2;
+	void *buf;
+	int newsize, struct_start, struct_end_old, struct_end_new, delta;
+	const char *inname;
+	char outname[PATH_MAX];
+
+	test_init(argc, argv);
+	if (argc != 2)
+		CONFIG("Usage: %s <dtb file>", argv[0]);
+
+	inname = argv[1];
+	fdt = load_blob(argv[1]);
+	sprintf(outname, "noppy.%s", inname);
+
+	if (fdt_version(fdt) < 17)
+		FAIL("Can't deal with version <17");
+
+	buf = xmalloc(2 * fdt_size_dt_struct(fdt));
+
+	newsize = nopulate_struct(buf, fdt);
+
+	verbose_printf("Nopulated structure block has new size %d\n", newsize);
+
+	/* Replace the old structure block with the new one */
+
+	fdt2 = xmalloc(fdt_totalsize(fdt) + newsize);
+
+	struct_start = fdt_off_dt_struct(fdt);
+	delta = newsize - fdt_size_dt_struct(fdt);
+	struct_end_old = struct_start + fdt_size_dt_struct(fdt);
+	struct_end_new = struct_start + newsize;
+
+	memcpy(fdt2, fdt, struct_start);
+	memcpy(fdt2 + struct_start, buf, newsize);
+	memcpy(fdt2 + struct_end_new, fdt + struct_end_old,
+	       fdt_totalsize(fdt) - struct_end_old);
+
+	fdt_set_totalsize(fdt2, fdt_totalsize(fdt) + delta);
+	fdt_set_size_dt_struct(fdt2, newsize);
+
+	if (fdt_off_mem_rsvmap(fdt) > struct_start)
+		fdt_set_off_mem_rsvmap(fdt2, fdt_off_mem_rsvmap(fdt) + delta);
+	if (fdt_off_dt_strings(fdt) > struct_start)
+		fdt_set_off_dt_strings(fdt2, fdt_off_dt_strings(fdt) + delta);
+
+	save_blob(outname, fdt2);
+
+	PASS();
+}
Index: dtc/tests/Makefile.tests
===================================================================
--- dtc.orig/tests/Makefile.tests	2008-02-14 16:49:55.000000000 +1100
+++ dtc/tests/Makefile.tests	2008-02-14 17:01:10.000000000 +1100
@@ -7,7 +7,7 @@
 	notfound \
 	setprop_inplace nop_property nop_node \
 	sw_tree1 \
-	move_and_save mangle-layout \
+	move_and_save mangle-layout nopulate \
 	open_pack rw_tree1 set_name setprop del_property del_node \
 	string_escapes references path-references \
 	dtbs_equal_ordered \
Index: dtc/tests/run_tests.sh
===================================================================
--- dtc.orig/tests/run_tests.sh	2008-02-14 16:49:55.000000000 +1100
+++ dtc/tests/run_tests.sh	2008-02-14 17:01:10.000000000 +1100
@@ -126,6 +126,13 @@
     tree1_tests rw_tree1.test.dtb
     tree1_tests_rw rw_tree1.test.dtb
 
+    for basetree in test_tree1.dtb sw_tree1.test.dtb rw_tree1.test.dtb; do
+	run_test nopulate $basetree
+	run_test dtbs_equal_ordered $basetree noppy.$basetree
+	tree1_tests noppy.$basetree
+	tree1_tests_rw noppy.$basetree
+    done
+
     # Tests for behaviour on various sorts of corrupted trees
     run_test truncated_property
 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* [PATCH 1/8] pseries: phyp dump: Documentation
From: Manish Ahuja @ 2008-02-18  5:34 UTC
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

----
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.000000000 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal production mode, on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTEs.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or to network storage
+   such as NAS, SAN or iSCSI, as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools finish saving a portion of the
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes,
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released as userland scripts collect the
+dump. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of RAM, plus room for the saved CPU state and HPTEs,
+is reserved as a scratch area. This area is *not* released:
+this region will be kept permanently reserved, so that it
+can act as a receptacle for a copy of the low 256MB in the
+case a crash does occur. See, however, "open issues" below,
+as to whether such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point are provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+-----------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead? There is a dump_subsys being created by the s390 code,
+   perhaps the pseries code should use a similar layout as well.
+
+ o Is reserving a 256MB region really required? The goal of
+   reserving a 256MB scratch area is to make sure that no
+   important crash data is clobbered when the hypervisor
+   saves low memory to the scratch area. But if one could ensure
+   that nothing important is located in some 256MB area, then
+   it would not need to be reserved. Something that can be
+   improved in subsequent versions.
+
+ o Still working with the kdump team to integrate this with kdump;
+   some work remains but this would not affect the current
+   patches.
+
+ o Still need to write a shell script, to copy the dump away.
+   Currently I am parsing it manually.

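The release step described above can be sketched as a small C helper. This is an illustration under assumptions, not the planned shell script: release_range(), format_release() and the 256MB chunk size are hypothetical, and the copy out of /proc/vmcore is left as a comment. Each "offset size" command is written with its own open()/write()/close(), mirroring one echo per release, because a sysfs attribute treats each write() call as a separate request.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed release granularity; the documentation's example uses 256MB. */
#define RELEASE_CHUNK (256ULL << 20)

/* Format one release command in the "offset size" form shown above,
 * e.g. "0x40000000 0x10000000\n" releases 256MB at the 1GB boundary. */
static int format_release(char *buf, size_t len,
			  unsigned long long off, unsigned long long size)
{
	return snprintf(buf, len, "0x%llx 0x%llx\n", off, size);
}

/* Release [start, start + len) chunk by chunk, one write per command.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int release_range(const char *path, unsigned long long start,
			 unsigned long long len)
{
	char cmd[64];

	while (len > 0) {
		unsigned long long sz = len < RELEASE_CHUNK ? len : RELEASE_CHUNK;
		int n = format_release(cmd, sizeof(cmd), start, sz);
		int fd;

		/* A real tool would first copy this chunk out of /proc/vmcore. */
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return -1;
		if (write(fd, cmd, (size_t)n) != n) {
			close(fd);
			return -1;
		}
		close(fd);
		start += sz;
		len -= sz;
	}
	return 0;
}
```

For instance, release_range("/sys/kernel/release_region", 0x40000000, 0x10000000) would issue the single command from the echo example above.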

* [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
From: Manish Ahuja @ 2008-02-18  5:36 UTC
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
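The reservation arithmetic in reserve_crashed_mem() and the page walk in release_memory_range(), both in the patch below, can be modeled in a few lines of userspace C. In this sketch plan_reservation() and release_frames() are hypothetical names, a byte array stands in for PageReserved(), and RMR_END mirrors PHYP_DUMP_RMR_END. Note the half-open bound: [start_pfn, start_pfn + nr_pages) covers exactly nr_pages frames, whereas a <= bound would touch one frame past the reserved range.

```c
#define RMR_END (1ULL << 28)	/* 256MB, mirroring PHYP_DUMP_RMR_END */

/* Mirror of the two branches in reserve_crashed_mem(): with a dump
 * pending, reserve everything above the first 256MB; otherwise reserve
 * a scratch area at the top of RAM large enough for the CPU state, the
 * HPTE region and a copy of the low 256MB. */
static void plan_reservation(unsigned long long dram_end, int dump_active,
			     unsigned long long cpu_size,
			     unsigned long long hpte_size,
			     unsigned long long *base, unsigned long long *size)
{
	if (dump_active) {
		*base = RMR_END;
		*size = dram_end - *base;
	} else {
		*size = cpu_size + hpte_size + RMR_END;
		*base = dram_end - *size;
	}
}

/* Release nr_pages frames starting at start_pfn, visiting the half-open
 * range [start_pfn, start_pfn + nr_pages). 'reserved' is a toy per-frame
 * flag array. Returns how many frames were released. */
static unsigned long release_frames(unsigned char *reserved,
				    unsigned long start_pfn,
				    unsigned long nr_pages)
{
	unsigned long i, end_pfn = start_pfn + nr_pages, freed = 0;

	for (i = start_pfn; i < end_pfn; i++) {
		if (reserved[i]) {
			reserved[i] = 0;
			freed++;
		}
	}
	return freed;
}
```

On a hypothetical 8GB system, with the 0x10000-byte CPU area and 0x1000-byte HPTE area from the documentation's example, the scratch reservation is 256MB + 0x11000 bytes ending at the top of RAM; with a dump pending, everything from 256MB upward is reserved.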

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

----
 arch/powerpc/kernel/prom.c                 |   49 ++++++++++++++++++++
 arch/powerpc/kernel/rtas.c                 |   34 +++++++++++++
 arch/powerpc/platforms/pseries/Makefile    |    1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +++++++++++++++++++++++++++++
 include/asm-powerpc/phyp_dump.h            |   38 +++++++++++++++
 include/asm-powerpc/rtas.h                 |    3 +
 6 files changed, 196 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-18 04:30:28.000000000 -0600
@@ -0,0 +1,38 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_configured;
+	unsigned long phyp_dump_is_active;
+	/* store cpu & hpte size */
+	unsigned long cpu_state_size;
+	unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:32:17.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i < end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/Makefile	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/Makefile	2008-02-18 03:23:47.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
 obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-02-18 03:23:47.000000000 -0600
@@ -51,6 +51,7 @@
 #include <asm/machdep.h>
 #include <asm/pSeries_reconfig.h>
 #include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
 #include <asm/kexec.h>
 
 #ifdef DEBUG
@@ -1039,6 +1040,51 @@ static void __init early_reserve_mem(voi
 #endif
 }
 
+#ifdef CONFIG_PHYP_DUMP
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped memory
+ *
+ * This routine may reserve memory regions in the kernel only
+ * if the system is supported and a dump was taken in last
+ * boot instance or if the hardware is supported and the
+ * scratch area needs to be setup. In other instances it returns
+ * without reserving anything. The memory in case of dump being
+ * active is freed when the dump is collected (by userland tools).
+ */
+static void __init reserve_crashed_mem(void)
+{
+	unsigned long base, size;
+	if (!phyp_dump_info->phyp_dump_configured) {
+		printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
+		return;
+	}
+
+	if (phyp_dump_info->phyp_dump_is_active) {
+		/* Reserve *everything* above RMR. Area freed by userland tools. */
+		base = PHYP_DUMP_RMR_END;
+		size = lmb_end_of_DRAM() - base;
+
+		/* XXX crashed_ram_end is wrong, since it may be beyond
+		 * the memory_limit, it will need to be adjusted. */
+		lmb_reserve(base, size);
+
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	} else {
+		size = phyp_dump_info->cpu_state_size +
+			phyp_dump_info->hpte_region_size +
+			PHYP_DUMP_RMR_END;
+		base = lmb_end_of_DRAM() - size;
+		lmb_reserve(base, size);
+		phyp_dump_info->init_reserve_start = base;
+		phyp_dump_info->init_reserve_size = size;
+	}
+}
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
 void __init early_init_devtree(void *params)
 {
 	DBG(" -> early_init_devtree(%p)\n", params);
@@ -1050,6 +1096,8 @@ void __init early_init_devtree(void *par
 	/* Some machines might need RTAS info for debugging, grab it now. */
 	of_scan_flat_dt(early_init_dt_scan_rtas, NULL);
 #endif
+	/* scan tree to see if dump occurred during last boot */
+	of_scan_flat_dt(early_init_dt_scan_phyp_dump, NULL);
 
 	/* Retrieve various informations from the /chosen node of the
 	 * device-tree, including the platform type, initrd location and
@@ -1071,6 +1119,7 @@ void __init early_init_devtree(void *par
 	reserve_kdump_trampoline();
 	reserve_crashkernel();
 	early_reserve_mem();
+	reserve_crashed_mem();
 
 	lmb_enforce_memory_limit(memory_limit);
 	lmb_analyze();
Index: 2.6.25-rc1/arch/powerpc/kernel/rtas.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/rtas.c	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/kernel/rtas.c	2008-02-18 03:23:47.000000000 -0600
@@ -39,6 +39,7 @@
 #include <asm/syscalls.h>
 #include <asm/smp.h>
 #include <asm/atomic.h>
+#include <asm/phyp_dump.h>
 
 struct rtas_t rtas = {
 	.lock = SPIN_LOCK_UNLOCKED
@@ -883,6 +884,39 @@ void __init rtas_initialize(void)
 #endif
 }
 
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+	const unsigned int *sizes;
+
+	phyp_dump_info->phyp_dump_configured = 0;
+	phyp_dump_info->phyp_dump_is_active = 0;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return 0;
+
+	if (of_get_flat_dt_prop(node, "ibm,configure-kernel-dump", NULL))
+		phyp_dump_info->phyp_dump_configured++;
+
+	if (of_get_flat_dt_prop(node, "ibm,dump-kernel", NULL))
+		phyp_dump_info->phyp_dump_is_active++;
+
+	sizes = of_get_flat_dt_prop(node, "ibm,configure-kernel-dump-sizes",
+									NULL);
+	if (!sizes)
+		return 0;
+
+	if (sizes[0] == 1)
+		phyp_dump_info->cpu_state_size = *((unsigned long *)&sizes[1]);
+
+	if (sizes[3] == 2)
+		phyp_dump_info->hpte_region_size =
+						*((unsigned long *)&sizes[4]);
+#endif
+	return 1;
+}
+
 int __init early_init_dt_scan_rtas(unsigned long node,
 		const char *uname, int depth, void *data)
 {
Index: 2.6.25-rc1/include/asm-powerpc/rtas.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/rtas.h	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/include/asm-powerpc/rtas.h	2008-02-18 03:23:47.000000000 -0600
@@ -183,6 +183,9 @@ extern unsigned int rtas_busy_delay(int 
 
 extern int early_init_dt_scan_rtas(unsigned long node,
 		const char *uname, int depth, void *data);
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+
 
 extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 

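The early-boot logic added above, parsing `ibm,configure-kernel-dump-sizes` in `early_init_dt_scan_phyp_dump()` and sizing the reservation in `reserve_crashed_mem()`, can be modelled in user space. This is a minimal sketch, not the kernel code: it assumes the property holds (type, size-high, size-low) cell triples, as the `sizes[0]`/`sizes[1]` and `sizes[3]`/`sizes[4]` indexing suggests, and `RMR_END` is a hypothetical stand-in for `PHYP_DUMP_RMR_END`, whose value is not shown here. The 64-bit size is assembled explicitly from two cells, which is what the `*(unsigned long *)&sizes[1]` cast yields on a big-endian 64-bit kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for PHYP_DUMP_RMR_END (real value not shown). */
#define RMR_END 0x10000000ULL

struct dump_sizes {
	uint64_t cpu_state_size;
	uint64_t hpte_region_size;
};

/* Combine two 32-bit device-tree cells (high word first) into 64 bits. */
static uint64_t cells_to_u64(const uint32_t *cells)
{
	return ((uint64_t)cells[0] << 32) | cells[1];
}

/* Parse (type, size_hi, size_lo) triples: type 1 = CPU state, 2 = HPTE. */
static void parse_dump_sizes(const uint32_t *sizes, struct dump_sizes *out)
{
	out->cpu_state_size = 0;
	out->hpte_region_size = 0;
	if (sizes[0] == 1)
		out->cpu_state_size = cells_to_u64(&sizes[1]);
	if (sizes[3] == 2)
		out->hpte_region_size = cells_to_u64(&sizes[4]);
}

/* Mirror of the two branches in reserve_crashed_mem(). */
static void compute_reserve(int dump_is_active, uint64_t end_of_dram,
			    const struct dump_sizes *s,
			    uint64_t *base, uint64_t *size)
{
	if (dump_is_active) {
		/* A dump is waiting: keep everything above the RMR. */
		*base = RMR_END;
		*size = end_of_dram - *base;
	} else {
		/* No dump: reserve room at the top of memory for the next one. */
		*size = s->cpu_state_size + s->hpte_region_size + RMR_END;
		*base = end_of_dram - *size;
	}
}
```

With the cells `{ 1, 0, 0x1000, 2, 0, 0x2000 }` and 2 GiB of RAM, the inactive case reserves `0x10003000` bytes at the top of memory, while the active case keeps everything from `RMR_END` upward.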
^ permalink raw reply

* [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
From: Manish Ahuja @ 2008-02-18  5:38 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>



Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo "0x40000000 0x10000000" > /sys/kernel/release_region

will release 256MB starting at 1GB.  The released memory
becomes free for general use.
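The same release request can be issued from a C tool instead of the shell. A minimal sketch: `format_release_request()` and `release_region()` are hypothetical helpers; only the `/sys/kernel/release_region` path and the two-hex-values format come from the patch. Writing the real file needs root and a kernel carrying this series.

```c
#include <stdio.h>

/* Build "0x<start> 0x<length>\n", the two hex values the kernel side
 * parses with sscanf("%lx %lx"). Returns the snprintf() result. */
static int format_release_request(char *buf, size_t len,
				  unsigned long start, unsigned long length)
{
	return snprintf(buf, len, "0x%lx 0x%lx\n", start, length);
}

/* Write one release request to the given sysfs file.
 * Returns 0 on success, -1 on any formatting or I/O error. */
static int release_region(const char *path,
			  unsigned long start, unsigned long length)
{
	char buf[64];
	FILE *f;
	int n = format_release_request(buf, sizeof(buf), start, length);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(buf, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write; a kernel-side error surfaces on flush. */
	return fclose(f) == 0 ? 0 : -1;
}
```

For the example above, a tool would call `release_region("/sys/kernel/release_region", 0x40000000UL, 0x10000000UL)`. Checking the result of `fclose()` matters because the buffered data only reaches the kernel when it is flushed.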

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |   81 +++++++++++++++++++++++++++--
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:23:47.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:32:13.000000000 -0600
@@ -12,18 +12,23 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -53,18 +58,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ *   "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					 NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+								&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+									rc);
+		return 0;
+	}
 
 	return 0;
 }
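The range check in `store_release_region()` can be exercised outside the kernel with a small model. A sketch under stated assumptions: `parse_and_clamp()` and `struct release_req` are hypothetical names, and `SKETCH_PAGE_SHIFT` assumes 4 KiB pages, whereas a pSeries kernel may be built with 64 KiB pages. The clamping follows the patch line by line.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages; pSeries kernels may use 64 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)

struct release_req {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Parse "<start> <length>" and clamp it the way store_release_region()
 * does. Returns 0 on success, -1 if two hex values were not found. */
static int parse_and_clamp(const char *buf,
			   unsigned long reserve_start,
			   unsigned long reserve_size,
			   struct release_req *out)
{
	unsigned long start_addr, length, end_addr;

	if (sscanf(buf, "%lx %lx", &start_addr, &length) != 2)
		return -1;

	/* Never free below the phyp-dump reservation... */
	if (start_addr < reserve_start)
		start_addr = reserve_start;

	/* ...or past its end. */
	end_addr = reserve_start + reserve_size;
	if (start_addr + length > end_addr)
		length = end_addr - start_addr;

	out->start_pfn = SKETCH_PFN_DOWN(start_addr);
	out->nr_pages = SKETCH_PFN_DOWN(length);
	return 0;
}
```

A model like this makes two properties of the logic, as written, easy to see: a request that starts below the reservation keeps its full length after the start is raised (against a reservation starting at 0x40000000, "0x30000000 0x10000000" frees 0x40000000 to 0x50000000), and a start address past the end of the reservation makes `end_addr - start_addr` wrap around.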

^ permalink raw reply

* [PATCH 4/8] pseries: phyp dump: register dump area.
From: Manish Ahuja @ 2008-02-18  5:40 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


Set up the actual dump header and register it with the hypervisor.
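The offset bookkeeping in `init_dump_header()` and `register_dump_area()` can be sketched as follows. The structures are copied from the patch with `<stdint.h>` types; `layout_sections()` and `rebase_sections()` are hypothetical helpers, and `rmr_size` stands in for `PHYP_DUMP_RMR_END`. Because every field is naturally aligned, the first section starts at byte 40 and the header is 160 bytes with no padding on common ABIs. The sketch uses host byte order, which matches a big-endian pSeries kernel but not a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout copied from the patch, with <stdint.h> types for u16/u32/u64. */
struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

struct phyp_dump_header {
	uint32_t version;
	uint16_t num_of_sections;
	uint16_t status;
	uint32_t first_offset_section;
	uint32_t dump_disk_section;
	uint64_t block_num_dd;
	uint64_t num_of_blocks_dd;
	uint32_t offset_dd;
	uint32_t maxtime_to_auto;
	struct dump_section cpu_data;
	struct dump_section hpte_data;
	struct dump_section kernel_data;
};

/* Lay the three save areas out back to back from offset 0, as
 * init_dump_header() does; returns the total save-area length. */
static uint64_t layout_sections(struct phyp_dump_header *ph,
				uint64_t cpu_size, uint64_t hpte_size,
				uint64_t rmr_size)
{
	uint64_t off = 0;

	ph->first_offset_section =
		(uint32_t)offsetof(struct phyp_dump_header, cpu_data);
	ph->cpu_data.source_length = cpu_size;
	ph->cpu_data.destination_address = off;
	off += cpu_size;
	ph->hpte_data.source_length = hpte_size;
	ph->hpte_data.destination_address = off;
	off += hpte_size;
	ph->kernel_data.source_length = rmr_size;
	ph->kernel_data.destination_address = off;
	off += rmr_size;
	return off;
}

/* register_dump_area() then turns the offsets into real addresses. */
static void rebase_sections(struct phyp_dump_header *ph, uint64_t base)
{
	ph->cpu_data.destination_address += base;
	ph->hpte_data.destination_address += base;
	ph->kernel_data.destination_address += base;
}
```

In this version of the patch the base is added unconditionally, so rebasing the same header twice would shift every address twice; a later patch in the series guards the addition with a `cpu_data.destination_address == 0` check.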

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

------
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++++++++++++++++++++++++++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:26:56.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:30:28.000000000 -0600
@@ -28,6 +28,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO  0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+						"register\n", rc);
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -106,7 +217,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+ 	const struct phyp_dump_header *dump_header = NULL;
+ 	unsigned long dump_area_start;
+ 	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -118,7 +231,13 @@ static int __init phyp_dump_setup(void)
 	if (!phyp_dump_info->phyp_dump_configured)
 		return -ENOSYS;
 
-	/* Is there dump data waiting for us? */
+	/* Is there dump data waiting for us? If there isn't,
+	 * then register a new dump area, and release all of
+	 * the rest of the reserved ram.
+	 *
+	 * The /rtas/ibm,kernel-dump rtas node is present only
+	 * if there is dump data waiting for us.
+	 */
 	rtas = of_find_node_by_path("/rtas");
 	if (rtas) {
 		dump_header = of_get_property(rtas, "ibm,kernel-dump",
@@ -126,17 +245,23 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
-	if (dump_header == NULL)
+	dump_area_length = init_dump_header(&phdr);
+
+	/* align down */
+	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
+
+	if (dump_header == NULL) {
+		register_dump_area(&phdr, dump_area_start);
 		return 0;
+	}
 
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
-	if (rc) {
+	if (rc)
 		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
 									rc);
-		return 0;
-	}
 
+	/* ToDo: re-register the dump area, for next time. */
 	return 0;
 }
 machine_subsys_initcall(pseries, phyp_dump_setup);

^ permalink raw reply

* [PATCH 5/8] pseries: phyp dump: debugging print routines.
From: Manish Ahuja @ 2008-02-18  5:41 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


Provide some basic debugging support.

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>
-----

 arch/powerpc/platforms/pseries/phyp_dump.c |   61 ++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:30:53.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.000000000 -0600
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+						ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/* set cpu state and hpte states as well as scratch pad area */
+	printk(KERN_INFO " CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+						ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+						ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+						ph->cpu_data.length_copied);
+
+	printk(KERN_INFO " HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+						ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+						ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+						ph->hpte_data.length_copied);
+
+	printk(KERN_INFO " SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+						ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+						ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+						ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -134,9 +189,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* ------------------------------------------------- */
@@ -245,8 +302,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
 

^ permalink raw reply

* [PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
From: Manish Ahuja @ 2008-02-18  5:42 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


Routines to:
a. invalidate the dump;
b. calculate the region that is reserved and needs to be freed. This
   is exported through the sysfs interface.

Unregister has been removed for now, as it wasn't being used.
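The decision this patch adds to `phyp_dump_setup()` can be modelled as below: invalidate and re-register when the previous dump carries `DUMP_ERROR_FLAG`, otherwise record the scratch area that `show_release_region()` reports. A sketch only: `struct dump_status_view`, `classify_dump()` and `second_range()` are hypothetical, and only the flag values come from the patch; `DUMP_TRIGGERED` and `DUMP_PERFORMED` are defined there but not tested in this code path.

```c
#include <stdint.h>

/* Status bits from the patch. */
#define DUMP_ERROR_FLAG 0x2000
#define DUMP_TRIGGERED  0x4000
#define DUMP_PERFORMED  0x8000

/* Hypothetical reduced view of the firmware header: only the fields
 * phyp_dump_setup() reads in this patch. */
struct dump_status_view {
	uint16_t status;
	uint64_t cpu_dest;
	uint64_t cpu_len, hpte_len, kernel_len;
};

enum dump_action { DUMP_REREGISTER, DUMP_EXPORT };

/* Decide what to do with a dump left by the previous boot and, when it
 * is usable, compute the scratch area that still holds it. */
static enum dump_action classify_dump(const struct dump_status_view *h,
				      uint64_t *scratch_addr,
				      uint64_t *scratch_size)
{
	if (h->status & DUMP_ERROR_FLAG)
		return DUMP_REREGISTER;	/* invalidate, then register anew */

	*scratch_addr = h->cpu_dest;
	*scratch_size = h->cpu_len + h->hpte_len + h->kernel_len;
	return DUMP_EXPORT;		/* expose via release_region */
}

/* What show_release_region() prints as the second range's end value:
 * total reservation size minus the scratch area size. */
static uint64_t second_range(uint64_t reserve_size, uint64_t scratch_size)
{
	return reserve_size - scratch_size;
}
```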

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++++++++++++++++++++++++++---
 include/asm-powerpc/phyp_dump.h            |    3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:32.000000000 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO  0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -196,6 +206,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+						"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -206,8 +240,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -268,8 +302,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					 NULL, store_release_region);
+					show_release_region,
+					store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -312,6 +367,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0;
+	}
+
+	if (dump_header) {
+		phyp_dump_info->reserved_scratch_addr =
+				dump_header->cpu_data.destination_address;
+		phyp_dump_info->reserved_scratch_size =
+				dump_header->cpu_data.source_length +
+				dump_header->hpte_data.source_length +
+				dump_header->kernel_data.source_length;
+	}
+
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
 	rc = sysfs_create_file(kernel_kobj, &rr.attr);
 	if (rc)
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h	2008-02-18 04:24:16.000000000 -0600
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-18 04:25:32.000000000 -0600
@@ -30,6 +30,9 @@ struct phyp_dump {
 	/* store cpu & hpte size */
 	unsigned long cpu_state_size;
 	unsigned long hpte_region_size;
+	/* previous scratch area values */
+	unsigned long reserved_scratch_addr;
+	unsigned long reserved_scratch_size;
 };
 
 extern struct phyp_dump *phyp_dump_info;

^ permalink raw reply

* [PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
From: Manish Ahuja @ 2008-02-18  5:44 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>


This patch tracks the amount of memory freed. For now it does a
simple, rudimentary calculation of the ranges freed. The idea is
to keep the external shell script simple and have it send in
large chunks for now.
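The bookkeeping in `track_freed_range()`, added by this patch, can be modelled with explicit state instead of function-local statics, which makes it easier to test. A minimal sketch: `struct free_tracker` and `track_freed()` are hypothetical names, and returning `true` stands in for the point where the patch calls `invalidate_last_dump()` and `register_dump_area()`.

```c
#include <stdbool.h>

/* Hypothetical state object replacing the function-local statics. */
struct free_tracker {
	unsigned long reserve_start, reserve_size;
	unsigned long scratch_addr, scratch_size;
	unsigned long reserved_freed, scratch_freed;
};

/* Same bookkeeping as track_freed_range(): returns true once both
 * totals match exactly, i.e. when the patch would re-register. */
static bool track_freed(struct free_tracker *t,
			unsigned long addr, unsigned long length)
{
	if (addr < t->reserve_start)
		return false;

	/* Only the start address is range-checked, as in the patch. */
	if (addr <= t->reserve_start + t->reserve_size)
		t->reserved_freed += length;

	if (addr >= t->scratch_addr &&
	    addr <= t->scratch_addr + t->scratch_size)
		t->scratch_freed += length;

	return t->reserved_freed == t->reserve_size &&
	       t->scratch_freed == t->scratch_size;
}
```

Because the totals must match exactly, a script that releases the same range twice, or ranges that overlap, overshoots the totals and the dump is never re-registered; that is one reason to keep the requests simple.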

Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
-----

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:31:22.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:31:30.000000000 -0600
@@ -260,6 +260,39 @@ void release_memory_range(unsigned long 
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -284,6 +317,8 @@ static ssize_t store_release_region(stru
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)

^ permalink raw reply

* [PATCH 8/8] pseries: phyp dump: config file
From: Manish Ahuja @ 2008-02-18  5:45 UTC (permalink / raw)
  To: ppc-dev, paulus, Linas Vepstas; +Cc: mahuja
In-Reply-To: <47B90F55.2080606@austin.ibm.com>



Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas <linasvepstas@gmail.com>

-----
 arch/powerpc/Kconfig |   11 +++++++++++
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig	2008-02-18 03:22:06.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig	2008-02-18 03:22:45.000000000 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say "Y"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP

^ permalink raw reply

* libfdt: Trivial cleanup for CHECK_HEADER()
From: David Gibson @ 2008-02-18  7:06 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: linuxppc-dev

Currently the CHECK_HEADER() macro is defined local to fdt_ro.c.
However, there are a handful of functions (fdt_move, rw_check_header,
fdt_open_into) from other files which could also use it (currently
they open-code something more-or-less identical).  Therefore, this
patch moves CHECK_HEADER() to libfdt_internal.h and uses it in those
places.
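The pattern `CHECK_HEADER()` captures, validating the header and returning its error code from the caller, can be tried in isolation. In this sketch `sketch_check_header()` is a hypothetical stand-in that only checks the FDT magic number (0xd00dfeed) in host byte order, and `SKETCH_ERR_BADMAGIC` is an arbitrary value; the real `fdt_check_header()` does more than this and reads the big-endian header.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the libfdt pieces the macro relies on. */
#define SKETCH_FDT_MAGIC 0xd00dfeedU
#define SKETCH_ERR_BADMAGIC 9

static int sketch_check_header(const uint32_t *fdt)
{
	return fdt[0] == SKETCH_FDT_MAGIC ? 0 : -SKETCH_ERR_BADMAGIC;
}

/* Same shape as the shared CHECK_HEADER(): return early from the
 * calling function with the error code if the header is bad. */
#define SKETCH_CHECK_HEADER(fdt) \
	{ \
		int err; \
		if ((err = sketch_check_header(fdt)) != 0) \
			return err; \
	}

/* A caller in the style of fdt_move(): header check, then real work. */
static int sketch_total_words(const uint32_t *fdt)
{
	SKETCH_CHECK_HEADER(fdt);
	return (int)fdt[1];
}
```

Since the macro expands to a bare `{ ... }` block, the trailing semicolon becomes an empty statement; it has to be used as a standalone statement, as at all four call sites, rather than as the unbraced body of an `if` that has an `else`.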

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

---
 libfdt/fdt.c             |    5 +----
 libfdt/fdt_ro.c          |    7 -------
 libfdt/fdt_rw.c          |    8 ++------
 libfdt/libfdt_internal.h |    7 +++++++
 4 files changed, 10 insertions(+), 17 deletions(-)

Index: dtc/libfdt/fdt.c
===================================================================
--- dtc.orig/libfdt/fdt.c	2008-02-18 18:01:59.000000000 +1100
+++ dtc/libfdt/fdt.c	2008-02-18 18:02:01.000000000 +1100
@@ -184,10 +184,7 @@ const char *_fdt_find_string(const char 
 
 int fdt_move(const void *fdt, void *buf, int bufsize)
 {
-	int err = fdt_check_header(fdt);
-
-	if (err)
-		return err;
+	CHECK_HEADER(fdt);
 
 	if (fdt_totalsize(fdt) > bufsize)
 		return -FDT_ERR_NOSPACE;
Index: dtc/libfdt/fdt_ro.c
===================================================================
--- dtc.orig/libfdt/fdt_ro.c	2008-02-18 18:01:59.000000000 +1100
+++ dtc/libfdt/fdt_ro.c	2008-02-18 18:02:01.000000000 +1100
@@ -55,13 +55,6 @@
 
 #include "libfdt_internal.h"
 
-#define CHECK_HEADER(fdt) \
-	{ \
-		int err; \
-		if ((err = fdt_check_header(fdt)) != 0) \
-			return err; \
-	}
-
 static int nodename_eq(const void *fdt, int offset,
 		       const char *s, int len)
 {
Index: dtc/libfdt/libfdt_internal.h
===================================================================
--- dtc.orig/libfdt/libfdt_internal.h	2008-02-18 18:01:59.000000000 +1100
+++ dtc/libfdt/libfdt_internal.h	2008-02-18 18:02:01.000000000 +1100
@@ -58,6 +58,13 @@
 #define memeq(p, q, n)	(memcmp((p), (q), (n)) == 0)
 #define streq(p, q)	(strcmp((p), (q)) == 0)
 
+#define CHECK_HEADER(fdt) \
+	{ \
+		int err; \
+		if ((err = fdt_check_header(fdt)) != 0) \
+			return err; \
+	}
+
 uint32_t _fdt_next_tag(const void *fdt, int startoffset, int *nextoffset);
 const char *_fdt_find_string(const char *strtab, int tabsize, const char *s);
 int _fdt_node_end_offset(void *fdt, int nodeoffset);
Index: dtc/libfdt/fdt_rw.c
===================================================================
--- dtc.orig/libfdt/fdt_rw.c	2008-02-18 18:02:52.000000000 +1100
+++ dtc/libfdt/fdt_rw.c	2008-02-18 18:04:00.000000000 +1100
@@ -69,10 +69,8 @@ static int _blocks_misordered(const void
 
 static int rw_check_header(void *fdt)
 {
-	int err;
+	CHECK_HEADER(fdt);
 
-	if ((err = fdt_check_header(fdt)))
-		return err;
 	if (fdt_version(fdt) < 17)
 		return -FDT_ERR_BADVERSION;
 	if (_blocks_misordered(fdt, sizeof(struct fdt_reserve_entry),
@@ -399,9 +397,7 @@ int fdt_open_into(const void *fdt, void 
 	int newsize;
 	void *tmp;
 
-	err = fdt_check_header(fdt);
-	if (err)
-		return err;
+	CHECK_HEADER(fdt);
 
 	mem_rsv_size = (fdt_num_mem_rsv(fdt)+1)
 		* sizeof(struct fdt_reserve_entry);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* libfdt: Remove no longer used code from fdt_node_offset_by_compatible()
From: David Gibson @ 2008-02-18  7:09 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: linuxppc-dev

Since fdt_node_offset_by_compatible() was converted to the new
fdt_next_node() iterator, a chunk of initialization code became
redundant, but was not removed due to an oversight.  This patch cleans it up.
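With the iterator doing the positioning, the function reduces to a loop of "next node, check compatible", so the removed start-offset tag check has nothing left to do. The toy model below shows that shape: `struct toy_node`, `toy_next_node()` and `toy_offset_by_compatible()` are hypothetical, nodes are plain array indices, and -1 stands in for `-FDT_ERR_NOTFOUND`.

```c
#include <string.h>

/* Toy stand-in for a flattened tree: nodes in depth-first order,
 * each with at most one compatible string (NULL if none). */
struct toy_node {
	const char *compatible;
};

/* Like fdt_next_node(): next node offset, or -1 at the end.
 * A negative offset means "start before the first node". */
static int toy_next_node(int count, int offset)
{
	int next = offset < 0 ? 0 : offset + 1;

	return next < count ? next : -1;
}

/* Search strictly after startoffset for a matching node; the iterator
 * handles the starting position, so no separate check is needed. */
static int toy_offset_by_compatible(const struct toy_node *nodes, int count,
				    int startoffset, const char *compat)
{
	int offset;

	for (offset = toy_next_node(count, startoffset); offset >= 0;
	     offset = toy_next_node(count, offset))
		if (nodes[offset].compatible &&
		    strcmp(nodes[offset].compatible, compat) == 0)
			return offset;
	return -1;
}
```

Callers walk all matches by starting from -1 and feeding each result back in as the next `startoffset` until a negative value comes back.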

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: dtc/libfdt/fdt_ro.c
===================================================================
--- dtc.orig/libfdt/fdt_ro.c	2008-02-18 18:02:01.000000000 +1100
+++ dtc/libfdt/fdt_ro.c	2008-02-18 18:06:33.000000000 +1100
@@ -453,20 +453,10 @@ int fdt_node_check_compatible(const void
 int fdt_node_offset_by_compatible(const void *fdt, int startoffset,
 				  const char *compatible)
 {
-	uint32_t tag;
-	int offset, nextoffset;
-	int err;
+	int offset, err;
 
 	CHECK_HEADER(fdt);
 
-	if (startoffset >= 0) {
-		tag = fdt_next_tag(fdt, startoffset, &nextoffset);
-		if (tag != FDT_BEGIN_NODE)
-			return -FDT_ERR_BADOFFSET;
-	} else {
-		nextoffset = 0;
-	}
-
 	/* FIXME: The algorithm here is pretty horrible: we scan each
 	 * property of a node in fdt_node_check_compatible(), then if
 	 * that didn't find what we want, we scan over them again

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

^ permalink raw reply

* RE: [PATCH 5/6] Add OF-tree support to RapidIO controller driver.
From: Zhang Wei @ 2008-02-18  7:24 UTC (permalink / raw)
  To: Kumar Gala, Stephen Rothwell; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4A9328DB-F68C-4C09-B762-44FEA85DD12E@kernel.crashing.org>



> -----Original Message-----
> From: Kumar Gala [mailto:galak@kernel.crashing.org]
>
>
> On Feb 4, 2008, at 11:44 PM, Stephen Rothwell wrote:
>
> >>
> >> +	aw = *(u32 *)of_get_property(dev->node, "#address-cells", NULL);
> >> +	sw = *(u32 *)of_get_property(dev->node, "#size-cells", NULL);
> >
> > What happens if either of these properties is missing?
>
> Should we add __must_check to of_get_property?
>

You are right; I'll add the check here.

Thanks!
Wei.
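The fix agreed on here is to test the pointer returned by `of_get_property()` before dereferencing it. Since the OF API is kernel-only, this sketch uses hypothetical user-space stand-ins (`struct toy_prop`, `toy_get_property()`, `read_cell()`) to show the check, including a length test so that a too-short property is rejected as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree node's property table. */
struct toy_prop {
	const char *name;
	const void *value;
	int len;
};

/* Stand-in for of_get_property(): NULL when the property is absent. */
static const void *toy_get_property(const struct toy_prop *props, int n,
				    const char *name, int *lenp)
{
	for (int i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0) {
			if (lenp)
				*lenp = props[i].len;
			return props[i].value;
		}
	return NULL;
}

/* Read a one-cell property, failing cleanly instead of dereferencing
 * NULL; returns 0 on success, -1 if missing or too short. */
static int read_cell(const struct toy_prop *props, int n,
		     const char *name, uint32_t *out)
{
	int len = 0;
	const uint32_t *p = toy_get_property(props, n, name, &len);

	if (!p || len < (int)sizeof(*p))
		return -1;
	*out = *p;
	return 0;
}
```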

^ permalink raw reply

* RE: [PATCH 4/6] Add multi mport support.
From: Zhang Wei @ 2008-02-18  7:33 UTC (permalink / raw)
  To: Matt Porter; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20080205162942.GB20177@gate.crashing.org>

Hi, Matt,

So glad to see you again!

> -----Original Message-----
> From: Matt Porter [mailto:mporter@kernel.crashing.org]
> On Thu, Jan 31, 2008 at 02:30:13PM +0800, Zhang Wei wrote:
> > > -----Original Message-----
> > > From: Kumar Gala [mailto:galak@kernel.crashing.org]
> > > when we have multiple ports are the device IDs on the ports intended
> > > to be unique only to a port or unique across all ports?
> > >
> > I consider each RIO controller will has its own network, the device IDs
> > should be unique only in its port network.
>=20
> This is a bad assumption IMHO. It pushes policy on to the system
> designer of a RapidIO network.

I know it is a really bad assumption. However, the initial RIO host
device ID is currently passed to the driver only through the "riohdid"
kernel parameter, which cannot distinguish between multiple RIO
controllers. It would be better to add a "rio-id" property to the RIO
node in the dts, but U-Boot needs some changes to support the rio-id
assignment.
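A hypothetical sketch of the lookup order suggested above: prefer a per-controller "rio-id" value from the device tree and fall back to the single global "riohdid" kernel parameter. The struct mport_desc type and mport_host_id() are invented for illustration, and "rio-id" is only a proposed property here, not an existing binding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-controller description. has_rio_id/rio_id model the
 * proposed "rio-id" device-tree property. */
struct mport_desc {
	bool has_rio_id;
	int rio_id;
};

/* Pick the host device ID for one RapidIO controller: a per-node
 * "rio-id" wins; otherwise every controller falls back to the one
 * global value taken from the "riohdid" kernel parameter. */
static int mport_host_id(const struct mport_desc *mp, int riohdid)
{
	if (mp->has_rio_id)
		return mp->rio_id;
	return riohdid;
}
```

Every controller that lacks a rio-id ends up with the same riohdid value, which is exactly the limitation described above.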

Cheers!
Wei.

^ permalink raw reply

* Re: [Linux-fbdev-devel] [PATCH 1/2] fb: add support for foreign endianness
From: Krzysztof Helt @ 2008-02-18  7:18 UTC (permalink / raw)
  To: linux-fbdev-devel
  Cc: adaplas, linux-kernel, linuxppc-dev, Geert Uytterhoeven,
	Andrew Morton
In-Reply-To: <Pine.LNX.4.64.0802171030330.6848@anakin>

On Sun, 17 Feb 2008 10:44:32 +0100 (CET)
Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> On Fri, 15 Feb 2008, Anton Vorontsov wrote:
> > On Thu, Feb 14, 2008 at 10:49:42PM -0800, Andrew Morton wrote:
> > > On Tue, 5 Feb 2008 18:44:32 +0300 Anton Vorontsov <avorontsov@ru.mvista.com> wrote:

> > > Actually...  should CONFIG_FB_FOREIGN_ENDIAN exist, or should this feature
> > > be permanently enabled?
> > 
(...)
> 
> The notion of `FOREIGN_ENDIAN' is relative, as it depends on the
> architecture you're compiling for.
> 
> Suppose you have a PCI graphics card with a frame buffer that's always
> big endian. When compiling for a big endian platform, the driver won't
> depend on FB_FOREIGN_ENDIAN. When compiling for a little endian
> platform, it will.
> 
> Shouldn't we add LITTLE_ENDIAN and BIG_ENDIAN Kconfig vars first, just
> like we have 64BIT?
> 

I disagree here. FOREIGN_ENDIAN is enough. It is determined only by the
graphics chip's endianness and the CPU (arch) endianness.
I know two fb drivers which use endianness information (pm2fb and s3c2410fb).
Both resolve endianness at the driver level. Actually, both handle it by setting
special bits so that the graphics chip itself reorders bytes to convert the
foreign endianness. I understand that this patch is for chips which cannot
reorder bytes by themselves.

So the FOREIGN_ENDIAN flag should be set by the driver if it is needed
(if the graphics chip is BE and the CPU is LE, a simple #ifdef will add the flag).
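A minimal sketch of such an #ifdef for a driver whose graphics chip is big-endian. SKETCH_FOREIGN_ENDIAN and CHIP_IS_BIG_ENDIAN are placeholders, not names from the patch, and the CPU byte order is read from the GCC/Clang __BYTE_ORDER__ macro so the example builds in userspace; in-kernel code would typically test __BIG_ENDIAN / __LITTLE_ENDIAN instead.

```c
#include <assert.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ macro"
#endif

/* Placeholder for the flag the fb patch adds. */
#define SKETCH_FOREIGN_ENDIAN 0x1u

/* Assumption: this example driver's graphics chip is big-endian. */
#define CHIP_IS_BIG_ENDIAN 1

/* Flags a driver would set at probe time: the flag is needed only when
 * the chip's byte order differs from the CPU's. */
static unsigned int endian_flags(void)
{
	unsigned int flags = 0;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	if (CHIP_IS_BIG_ENDIAN)		/* BE chip on an LE CPU */
		flags |= SKETCH_FOREIGN_ENDIAN;
#else
	if (!CHIP_IS_BIG_ENDIAN)	/* LE chip on a BE CPU */
		flags |= SKETCH_FOREIGN_ENDIAN;
#endif
	return flags;
}
```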

> 
> I'd like to handle this in Kconfig (cfr. above).
> 

Again, it is possible. It is enough to add one rule which enables
FOREIGN_ENDIAN if the architecture's endianness is "foreign" for the
driver. The advantage here is that it can be set only for drivers
which need it (as some drivers can handle it without this code). It
should be a hidden option, set only internally when needed (not
user-selectable).

I tested this patch on the s3c2410fb with the graphics chip's own
byte-order correction disabled. It worked at 8-bit depth but not at
16-bit depth (pixel positions seemed OK, but Tux's colors were wrong).
I will investigate. The s3c2410fb is BE and the kernel was ARM LE.

I would like to extend this patch to fb depths below 8 bits. The
s3c2410fb cannot handle these correctly with an LE kernel.

Kind regards,
Krzysztof


^ permalink raw reply

* "make ARCH=ppc defconfig" fails looking for common_defconfig
From: Robert P. J. Day @ 2008-02-18  9:32 UTC (permalink / raw)
  To: Linux PPC Mailing List


  (AIUI, the entire ppc/ architecture is going away, yes?  which means
i probably shouldn't care about any errors within.  is that correct?
even so, build errors should probably still be avoided for now.)

$ make ARCH=ppc defconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/basic/docproc
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/kxgettext.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/lex.zconf.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'common_defconfig'
***
*** Can't find default configuration "arch/ppc/configs/common_defconfig"!
***
make[1]: *** [defconfig] Error 1
make: *** [defconfig] Error 2



rday
--

========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry:
    Have classroom, will lecture.

http://crashcourse.ca                          Waterloo, Ontario, CANADA
========================================================================

^ permalink raw reply

* Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
From: Jan-Bernd Themann @ 2008-02-18  9:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Q Klein, ossthema, Greg KH, apw, linux-kernel,
	linuxppc-dev, Christoph Raisch, Badari Pulavarty, netdev, tklein
In-Reply-To: <1203094538.8142.23.camel@nimitz.home.sr71.net>


Dave Hansen <haveblue@us.ibm.com> wrote on 15.02.2008 17:55:38:

> I've been thinking about that, and I don't think you really *need* to
> keep a comprehensive map like that. 
> 
> When the memory is in a particular configuration (range of memory
> present along with unique set of holes) you get a unique ehea_bmap
> configuration.  That layout is completely predictable.
> 
> So, if at any time you want to figure out what the ehea_bmap address for
> a particular *Linux* virtual address is, you just need to pretend that
> you're creating the entire ehea_bmap, use the same algorithm and figure
> out how you would have placed things, and use that result.
> 
> Now, that's going to be a slow, crappy linear search (but maybe not as
> slow as recreating the silly thing).  So, you might eventually run into
> some scalability problems with a lot of packets going around.  But, I'd
> be curious if you do in practice.
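A minimal sketch of the "recompute the layout" lookup, assuming (hypothetically) that the ehea map packs present memory sections back to back in ascending order. Each lookup rescans every lower section, which is the linear cost being discussed; ehea_slot_linear() is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the map packs present sections contiguously, in order,
 * skipping holes. Recompute by linear scan the slot a present Linux
 * section would have been given; -1 for a section in a hole. */
static long ehea_slot_linear(const bool *present, size_t nsections,
			     size_t section)
{
	long slot = 0;

	if (section >= nsections || !present[section])
		return -1;
	for (size_t i = 0; i < section; i++)
		if (present[i])
			slot++;
	return slot;
}
```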

Up to 14 address translations per packet (sg_list) might be required on
the transmit side; on the receive side only 1 is needed. Most packets
require very few translations (1 or sometimes a few more). However, at
more than 700,000 packets per second, this approach does not seem
reasonable from a performance perspective when memory is fragmented as
you described.
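A back-of-the-envelope upper bound for the transmit side, assuming every packet needed the maximum of 14 translations (most need far fewer):

```latex
14\,\frac{\text{translations}}{\text{packet}} \times 700\,000\,\frac{\text{packets}}{\text{s}} = 9.8 \times 10^{6}\,\frac{\text{translations}}{\text{s}}
```

Under the linear-search scheme each of those translations would be a scan over the memory layout.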

> 
> The other idea is that you create a mapping that is precisely 1:1 with
> kernel memory.  Let's say you have two sections present, 0 and 100.  You
> have a high_section_index of 100, and you vmalloc() a 100 entry array.
> 
> You need to create a *CONTIGUOUS* ehea map?  Create one like this:
> 
> EHEA_VADDR->Linux Section
> 0->0
> 1->0
> 2->0
> 3->0
> ...
> 100->100
> 
> It's contiguous.  Each area points to a valid Linux memory address.
> It's also discernable in O(1) to what EHEA address a given Linux address
> is mapped.  You just have a couple of duplicate entries. 
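A sketch of the duplicate-entry map from the quoted example, with holes filled by the nearest lower present section. build_dup_map() is hypothetical and assumes section 0 is present; note that section indices 0 through 100 need 101 entries. Finding the map index of a Linux section is then O(1), because it is simply the section index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Build the "contiguous" map: one entry per section index up to the
 * highest present one. A hole gets a duplicate of the nearest lower
 * present section, so every entry points at real memory.
 * Assumes present[0] is true, as in the example. */
static void build_dup_map(const bool *present, size_t n, size_t *map)
{
	size_t last = 0;

	for (size_t i = 0; i < n; i++) {
		if (present[i])
			last = i;
		map[i] = last;	/* duplicate entry when i is a hole */
	}
}
```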

This has a serious issue with a constraint I mentioned in the previous mail:

"- MRs can have a maximum size of the memory available under linux"

It violates the requirement that the memory region must not be larger
than the memory available to that partition. The "create MR" H_CALL
will fail (we tried this and discussed it with FW development).
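Applying that constraint to the quoted example, with S standing for the (unspecified) section size and only sections 0 and 100 present:

```latex
\text{MR size} = (100 + 1)\,S = 101\,S \;>\; 2\,S = \text{memory present}
```

so a region covering the duplicate-entry map would be roughly 50 times larger than the partition's memory.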


Regards,
Jan-Bernd & Christoph


^ permalink raw reply

* Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier
From: Jan-Bernd Themann @ 2008-02-18 10:00 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Thomas Q Klein, ossthema, Jan-Bernd Themann, Greg KH, Dave Hansen,
	apw, linux-kernel, Christoph Raisch, Badari Pulavarty, netdev,
	tklein
In-Reply-To: <1203094538.8142.23.camel@nimitz.home.sr71.net>

switching to proper mail client...

Dave Hansen <haveblue@us.ibm.com> wrote on 15.02.2008 17:55:38:

> I've been thinking about that, and I don't think you really *need* to
> keep a comprehensive map like that.
>
> When the memory is in a particular configuration (range of memory
> present along with unique set of holes) you get a unique ehea_bmap
> configuration.  That layout is completely predictable.
>
> So, if at any time you want to figure out what the ehea_bmap address for
> a particular *Linux* virtual address is, you just need to pretend that
> you're creating the entire ehea_bmap, use the same algorithm and figure
> out how you would have placed things, and use that result.
>
> Now, that's going to be a slow, crappy linear search (but maybe not as
> slow as recreating the silly thing).  So, you might eventually run into
> some scalability problems with a lot of packets going around.  But, I'd
> be curious if you do in practice.

Up to 14 address translations per packet (sg_list) might be required on
the transmit side; on the receive side only 1 is needed. Most packets
require very few translations (1 or sometimes a few more). However, at
more than 700,000 packets per second, this approach does not seem
reasonable from a performance perspective when memory is fragmented as
you described.

>
> The other idea is that you create a mapping that is precisely 1:1 with
> kernel memory.  Let's say you have two sections present, 0 and 100.  You
> have a high_section_index of 100, and you vmalloc() a 100 entry array.
>
> You need to create a *CONTIGUOUS* ehea map?  Create one like this:
>
> EHEA_VADDR->Linux Section
> 0->0
> 1->0
> 2->0
> 3->0
> ...
> 100->100
>
> It's contiguous.  Each area points to a valid Linux memory address.
> It's also discernable in O(1) to what EHEA address a given Linux address
> is mapped.  You just have a couple of duplicate entries.

This has a serious issue with a constraint I mentioned in the previous mail:

"- MRs can have a maximum size of the memory available under linux"

It violates the requirement that the memory region must not be larger
than the memory available to that partition. The "create MR" H_CALL
will fail (we tried this and discussed it with FW development).


Regards,
Jan-Bernd & Christoph

^ permalink raw reply

