* Re: [PATCH 1/8] ibm_newemac: Fix possible lockup on close
From: Benjamin Herrenschmidt @ 2007-11-21 19:53 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: netdev, jgarzik, linuxppc-dev
In-Reply-To: <20071121154123.GB23589@lst.de>
On Wed, 2007-11-21 at 16:41 +0100, Christoph Hellwig wrote:
> On Wed, Nov 21, 2007 at 05:06:39PM +1100, Benjamin Herrenschmidt wrote:
> > It's a bad idea to call flush_scheduled_work from within a
> > netdev->stop because the linkwatch will occasionally take the
> > rtnl lock from a workqueue context, and thus that can deadlock.
> >
> > This reworks things a bit in that area to avoid the problem.
>
> So from the name of the driver you want to keep the previous emac
> driver around. Is there a good reason for that?
Until arch/ppc is gone... the previous driver works with arch/ppc the
new one with arch/powerpc.
If we kill arch/ppc in .25, then we'll remove the old driver and rename
the new one. If not, that will wait until .26.
I'm hard at work porting as much of 4xx over as I can to get to the point
where we -can- kill arch/ppc but I'm not done yet.
Cheers,
Ben.
^ permalink raw reply
* [PATCH] PPC: fix missed increment on device interface counter
From: Cyrill Gorcunov @ 2007-11-21 19:56 UTC (permalink / raw)
To: Kumar Gala; +Cc: PPCML, LKML
From: Cyrill Gorcunov <gorcunov@gmail.com>
This patch adds the missing increment of the device interface counter
(it seems to have been accidentally omitted).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
arch/powerpc/platforms/pasemi/electra_ide.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/platforms/pasemi/electra_ide.c b/arch/powerpc/platforms/pasemi/electra_ide.c
index 12fb0c9..8e73086 100644
--- a/arch/powerpc/platforms/pasemi/electra_ide.c
+++ b/arch/powerpc/platforms/pasemi/electra_ide.c
@@ -42,7 +42,7 @@ static int __devinit electra_ide_init(void)
np = of_find_compatible_node(NULL, "ide", "electra-ide");
i = 0;
- while (np && i < MAX_IFS) {
+ while (np && i++ < MAX_IFS) {
memset(r, 0, sizeof(r));
/* pata_platform wants two address ranges: one for the base registers,
^ permalink raw reply related
* Re: [RFC/PATCH 5/14] powerpc: Fix 440/440A machine check handling
From: Benjamin Herrenschmidt @ 2007-11-21 20:05 UTC (permalink / raw)
To: Josh Boyer; +Cc: linuxppc-dev
In-Reply-To: <20071121135109.73b9e98f@weaponx>
On Wed, 2007-11-21 at 13:51 -0600, Josh Boyer wrote:
> > Hrm... it's per processor, not per board. I didn't feel like digging
> > which board uses which processor and go fixup all the ppc_md's
>
> Sounds like something a generic function could probe for from the DTS.
> I'll look at doing something here when I start making 44x
> multiplatform
> (soon).
Well... we already probe the CPU type.... from cputable.
So if there was a place to put that, it would be the cputable.
Ben.
^ permalink raw reply
* [RFC] PPC: convert for(...) cycles into for_each... form
From: Cyrill Gorcunov @ 2007-11-21 20:09 UTC (permalink / raw)
To: PPCML; +Cc: Paul Mackerras, LKML
From: Cyrill Gorcunov <gorcunov@gmail.com>
This patch converts open-coded loops over of_find_compatible_node()
and of_find_node_by_type() into the corresponding for_each_*() macros.
This reduces the code a bit.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
WARNING: I have no PowerPC hardware to test this on - please review the
patch closely. Thanks.
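As an illustration of the conversion described above, here is a minimal
userspace sketch of how a for_each-style macro can wrap the open-coded
"find next match" loop. The struct node type, the all_nodes list and
find_compatible() are hypothetical stand-ins, not the kernel's
struct device_node or of_find_compatible_node(); the real helpers also
manage node reference counts, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node; not the kernel type. */
struct node {
	const char *compatible;
	struct node *next;
};

/* Hypothetical list head standing in for the device tree. */
static struct node *all_nodes;

/*
 * Mock of a "find next compatible node" helper: return the first match
 * after 'from', or the first match overall when 'from' is NULL.
 */
static struct node *find_compatible(struct node *from, const char *compat)
{
	struct node *np = from ? from->next : all_nodes;

	for (; np; np = np->next)
		if (!strcmp(np->compatible, compat))
			return np;
	return NULL;
}

/* Same iteration as the open-coded loops, expressed once as a macro. */
#define for_each_compatible(np, compat) \
	for (np = find_compatible(NULL, compat); np; \
	     np = find_compatible(np, compat))

/* Count matching nodes using the macro form. */
static int count_compatible(const char *compat)
{
	struct node *np;
	int n = 0;

	for_each_compatible(np, compat)
		n++;
	return n;
}
```

Because the macro expands to an ordinary for statement, a "break" in the
body leaves np pointing at the current node, which is what loops such as
the one in mv64x60_udbg_init() rely on.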
arch/powerpc/kernel/legacy_serial.c | 8 ++++----
arch/powerpc/platforms/82xx/pq2.c | 4 ++--
arch/powerpc/platforms/celleb/scc_sio.c | 5 ++---
arch/powerpc/platforms/powermac/low_i2c.c | 3 +--
arch/powerpc/sysdev/mv64x60_pci.c | 4 ++--
arch/powerpc/sysdev/mv64x60_udbg.c | 4 ++--
arch/powerpc/sysdev/uic.c | 17 +++++------------
7 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/legacy_serial.c b/arch/powerpc/kernel/legacy_serial.c
index 4ed5887..b5dc646 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -307,7 +307,7 @@ void __init find_legacy_serial_ports(void)
}
/* First fill our array with SOC ports */
- for (np = NULL; (np = of_find_compatible_node(np, "serial", "ns16550")) != NULL;) {
+ for_each_compatible_node(np, "serial", "ns16550") {
struct device_node *soc = of_get_parent(np);
if (soc && !strcmp(soc->type, "soc")) {
index = add_legacy_soc_port(np, np);
@@ -318,7 +318,7 @@ void __init find_legacy_serial_ports(void)
}
/* First fill our array with ISA ports */
- for (np = NULL; (np = of_find_node_by_type(np, "serial"));) {
+ for_each_node_by_type(np, "serial") {
struct device_node *isa = of_get_parent(np);
if (isa && !strcmp(isa->name, "isa")) {
index = add_legacy_isa_port(np, isa);
@@ -329,7 +329,7 @@ void __init find_legacy_serial_ports(void)
}
/* First fill our array with tsi-bridge ports */
- for (np = NULL; (np = of_find_compatible_node(np, "serial", "ns16550")) != NULL;) {
+ for_each_compatible_node(np, "serial", "ns16550") {
struct device_node *tsi = of_get_parent(np);
if (tsi && !strcmp(tsi->type, "tsi-bridge")) {
index = add_legacy_soc_port(np, np);
@@ -340,7 +340,7 @@ void __init find_legacy_serial_ports(void)
}
/* First fill our array with opb bus ports */
- for (np = NULL; (np = of_find_compatible_node(np, "serial", "ns16550")) != NULL;) {
+ for_each_compatible_node(np, "serial", "ns16550") {
struct device_node *opb = of_get_parent(np);
if (opb && (!strcmp(opb->type, "opb") ||
of_device_is_compatible(opb, "ibm,opb"))) {
diff --git a/arch/powerpc/platforms/82xx/pq2.c b/arch/powerpc/platforms/82xx/pq2.c
index a497cba..9e74393 100644
--- a/arch/powerpc/platforms/82xx/pq2.c
+++ b/arch/powerpc/platforms/82xx/pq2.c
@@ -72,11 +72,11 @@ err:
void __init pq2_init_pci(void)
{
- struct device_node *np = NULL;
+ struct device_node *np;
ppc_md.pci_exclude_device = pq2_pci_exclude_device;
- while ((np = of_find_compatible_node(np, NULL, "fsl,pq2-pci")))
+ for_each_compatible_node(np, NULL, "fsl,pq2-pci")
pq2_pci_add_bridge(np);
}
#endif
diff --git a/arch/powerpc/platforms/celleb/scc_sio.c b/arch/powerpc/platforms/celleb/scc_sio.c
index 6100082..5e43bac 100644
--- a/arch/powerpc/platforms/celleb/scc_sio.c
+++ b/arch/powerpc/platforms/celleb/scc_sio.c
@@ -42,14 +42,13 @@ static struct {
static int __init txx9_serial_init(void)
{
extern int early_serial_txx9_setup(struct uart_port *port);
- struct device_node *node = NULL;
+ struct device_node *node;
int i;
struct uart_port req;
struct of_irq irq;
struct resource res;
- while ((node = of_find_compatible_node(node,
- "serial", "toshiba,sio-scc")) != NULL) {
+ for_each_compatible_node(node, "serial", "toshiba,sio-scc") {
for (i = 0; i < ARRAY_SIZE(txx9_scc_tab); i++) {
if (!(txx9_serial_bitmap & (1<<i)))
continue;
diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index da2007e..864fbf4 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -585,8 +585,7 @@ static void __init kw_i2c_probe(void)
struct device_node *np, *child, *parent;
/* Probe keywest-i2c busses */
- for (np = NULL;
- (np = of_find_compatible_node(np, "i2c","keywest-i2c")) != NULL;){
+ for_each_compatible_node(np, "i2c","keywest-i2c") {
struct pmac_i2c_host_kw *host;
int multibus, chans, i;
diff --git a/arch/powerpc/sysdev/mv64x60_pci.c b/arch/powerpc/sysdev/mv64x60_pci.c
index 6933f9c..d21ab8f 100644
--- a/arch/powerpc/sysdev/mv64x60_pci.c
+++ b/arch/powerpc/sysdev/mv64x60_pci.c
@@ -164,8 +164,8 @@ static int __init mv64x60_add_bridge(struct device_node *dev)
void __init mv64x60_pci_init(void)
{
- struct device_node *np = NULL;
+ struct device_node *np;
- while ((np = of_find_compatible_node(np, "pci", "marvell,mv64x60-pci")))
+ for_each_compatible_node(np, "pci", "marvell,mv64x60-pci")
mv64x60_add_bridge(np);
}
diff --git a/arch/powerpc/sysdev/mv64x60_udbg.c b/arch/powerpc/sysdev/mv64x60_udbg.c
index 367e7b1..35c77c7 100644
--- a/arch/powerpc/sysdev/mv64x60_udbg.c
+++ b/arch/powerpc/sysdev/mv64x60_udbg.c
@@ -85,10 +85,10 @@ static void mv64x60_udbg_init(void)
if (!stdout)
return;
- for (np = NULL;
- (np = of_find_compatible_node(np, "serial", "marvell,mpsc")); )
+ for_each_compatible_node(np, "serial", "marvell,mpsc") {
if (np == stdout)
break;
+ }
of_node_put(stdout);
if (!np)
diff --git a/arch/powerpc/sysdev/uic.c b/arch/powerpc/sysdev/uic.c
index 5149716..815d6db 100644
--- a/arch/powerpc/sysdev/uic.c
+++ b/arch/powerpc/sysdev/uic.c
@@ -326,28 +326,23 @@ void __init uic_init_tree(void)
const u32 *interrupts;
/* First locate and initialize the top-level UIC */
-
- np = of_find_compatible_node(NULL, NULL, "ibm,uic");
- while (np) {
+ for_each_compatible_node(np, NULL, "ibm,uic") {
interrupts = of_get_property(np, "interrupts", NULL);
- if (! interrupts)
+ if (!interrupts)
break;
-
- np = of_find_compatible_node(np, NULL, "ibm,uic");
}
BUG_ON(!np); /* uic_init_tree() assumes there's a UIC as the
* top-level interrupt controller */
primary_uic = uic_init_one(np);
- if (! primary_uic)
+ if (!primary_uic)
panic("Unable to initialize primary UIC %s\n", np->full_name);
irq_set_default_host(primary_uic->irqhost);
of_node_put(np);
/* The scan again for cascaded UICs */
- np = of_find_compatible_node(NULL, NULL, "ibm,uic");
- while (np) {
+ for_each_compatible_node(np, NULL, "ibm,uic") {
interrupts = of_get_property(np, "interrupts", NULL);
if (interrupts) {
/* Secondary UIC */
@@ -355,7 +350,7 @@ void __init uic_init_tree(void)
int ret;
uic = uic_init_one(np);
- if (! uic)
+ if (!uic)
panic("Unable to initialize a secondary UIC %s\n",
np->full_name);
@@ -373,8 +368,6 @@ void __init uic_init_tree(void)
/* FIXME: setup critical cascade?? */
}
-
- np = of_find_compatible_node(np, NULL, "ibm,uic");
}
}
^ permalink raw reply related
* oops trying to execute sh
From: John Charles Tyner @ 2007-11-21 19:54 UTC (permalink / raw)
To: linuxppc-dev
I'm trying to boot linux 2.6.22.9 on an mpc860c rev d4.
When init tries to spawn sh, during the exec, the kernel oopses as seen
below:
## Starting application at 0x00400000 ...
loaded at: 00400000 004EF15C
board data at: 03F9FBC0 03F9FBFC
relocated to: 00404044 00404080
zimage at: 00404E74 004EC662
avail ram: 004F0000 04000000
Linux/PPC load: console=ttyCPM,38400
Uncompressing Linux...done.
Now booting the kernel
Linux version 2.6.22.9 (jtyner@johnnyedge) (gcc version 4.2.1) #113 Wed Nov 21 10:49:36 PST 2007
Zone PFN ranges:
DMA 0 -> 16384
Normal 16384 -> 16384
early_node_map[1] active PFN ranges
0: 0 -> 16384
Built 1 zonelists. Total pages: 16256
Kernel command line: console=ttyCPM,38400
PID hash table entries: 256 (order: 8, 1024 bytes)
Decrementer Frequency = 183750000/60
Console: colour dummy device 80x25
cpm_uart: console: compat mode
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 63244k available (880k kernel code, 268k data, 444k init, 0k highmem)
Mount-cache hash table entries: 512
ADDSI: Init
io scheduler noop registered (default)
Serial: CPM driver $Revision: 0.02 $
ttyCPM0 at MMIO 0xc5000a80 (irq = 20) is a CPM UART
mice: PS/2 mouse device common for all mice
Freeing unused kernel memory: 444k init
init started: BusyBox v1.8.0 (2007-11-16 14:24:51 PST)
starting pid 103, tty '': '/bin/sh'
Oops: kernel access of bad area, sig: 11 [#1]
NIP: c0044ed0 LR: c0044ff0 CTR: 00000001
REGS: c3c0bd00 TRAP: 0300 Not tainted (2.6.22.9)
MSR: 00009032 <EE,ME,IR,DR> CR: 30099099 XER: a0008c7f
DAR: ff80103f, DSISR: c0000000
TASK = c0288070[103] 'init' THREAD: c3c0a000
GPR00: c0044ff0 c3c0bdb0 c0288070 ff800fff 00000000 7faf8000 00000000 00000000
GPR08: c01a8f58 c017d91c 00000002 c0179cd0 30099093 1007687c 00000002 c00f8744
GPR16: 00000000 c00f0a64 c011d1ac c00f0aa4 c00f0a90 c0120000 00000001 00000003
GPR24: c3c1ce00 00000000 c0180000 c0247550 00000000 c3c0bdc8 c0179cd0 ff800fff
NIP [c0044ed0] remove_vma+0x14/0x70
LR [c0044ff0] exit_mmap+0xc4/0xf0
Call Trace:
[c3c0bdb0] [c3c0bdc8] 0xc3c0bdc8 (unreliable)
[c3c0bdc0] [c0044ff0] exit_mmap+0xc4/0xf0
[c3c0bdf0] [c000f74c] mmput+0x50/0xd4
[c3c0be00] [c00591f4] flush_old_exec+0x3b8/0x7a8
[c3c0be50] [c0086cc0] load_elf_binary+0x2e8/0x1454
[c3c0bee0] [c005892c] search_binary_handler+0x58/0x12c
[c3c0bf00] [c0059d64] do_execve+0x13c/0x1f0
[c3c0bf20] [c00089b4] sys_execve+0x50/0x90
[c3c0bf40] [c0002a40] ret_from_syscall+0x0/0x38
Instruction dump:
7d808120 38210040 4e800020 83c30000 4bffff18 38a00000 4bffff9c 7c0802a6
9421fff0 bfc10008 90010014 7c7f1b78 <81230040> 83c3000c 2f890000 419e0018
The interesting thing is that r3 points to something funny. While tracing
this problem down, I replaced the remove_vma function with the following:
/*
* Close a vm structure and free it, returning the next.
*/
static struct vm_area_struct * __attribute__((__noinline__)) __remove_vma(struct vm_area_struct *vma)
{
struct vm_area_struct *next = vma->vm_next;
might_sleep();
if (vma->vm_ops && vma->vm_ops->close)
vma->vm_ops->close(vma);
if (vma->vm_file)
fput(vma->vm_file);
mpol_free(vma_policy(vma));
kmem_cache_free(vm_area_cachep, vma);
return next;
}
static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)
{
asm volatile (
"lis 4,-128\n"
"ori 4,4,4095\n"
"tweq 3,4\n"
"lwz 5,0(1)\n"
"tweq 3,4\n"
);
return __remove_vma( vma );
}
With this code, the kernel oopses on the *second* tweq instruction:
Kernel BUG at c0045fd4 [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
NIP: c0045fd4 LR: c00460a0 CTR: 00000001
REGS: c3c0bd10 TRAP: 0700 Not tainted (2.6.22.9)
MSR: 00029032 <EE,ME,IR,DR> CR: 30099099 XER: a0008c7f
TASK = c0292b40[103] 'init' THREAD: c3c0a000
GPR00: 00000001 c3c0bdc0 c0292b40 ff800fff ff800fff c3c0bdf0 00000000 00000000
GPR08: c0219398 c017d91c 00000002 c0179cd0 30099093 1007687c 00000002 c00f8744
GPR16: 00000000 c00f0a64 c011d1ac c00f0aa4 c00f0a90 c0120000 00000001 00000003
GPR24: c3c32e00 00000000 c0180000 c0247080 00000000 c3c0bdc8 c0179cd0 c017641c
NIP [c0045fd4] remove_vma+0x10/0x18
LR [c00460a0] exit_mmap+0xc4/0xf0
Call Trace:
[c3c0bdc0] [c0046074] exit_mmap+0x98/0xf0 (unreliable)
[c3c0bdf0] [c000f74c] mmput+0x50/0xd4
[c3c0be00] [c005920c] flush_old_exec+0x3b8/0x7a8
[c3c0be50] [c0086cd8] load_elf_binary+0x2e8/0x1454
[c3c0bee0] [c0058944] search_binary_handler+0x58/0x12c
[c3c0bf00] [c0059d7c] do_execve+0x13c/0x1f0
[c3c0bf20] [c00089b4] sys_execve+0x50/0x90
[c3c0bf40] [c0002a40] ret_from_syscall+0x0/0x38
Instruction dump:
7fe4fb78 4800a0ed 80010014 7fc3f378 7c0803a6 bbc10008 38210010 4e800020
3c80ff80 60840fff 7c832008 80a10000 <7c832008> 4bffff7c 7c0802a6 9421ffd0
The access of memory through r1 seems to corrupt r3, and always with the
same value. The problem isn't necessarily here, though. If I modify my
remove_vma function to cause and correct the problem (by saving r3 prior
to the memory access and restoring it afterwards), I just get the same
problem in some other part of the code, but the oops is always caused
because the base register for some memory access is set to ff800fff.
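For reference, the constant that the two tweq instructions compare r3
against can be checked with a small userspace sketch. The helper names
lis(), ori() and trap_constant() are illustrative only; they model the
32-bit effect of the two instructions in the asm block above.

```c
#include <stdint.h>

/* lis rD,imm: sign-extend the 16-bit immediate, then shift left by 16. */
static uint32_t lis(int16_t imm)
{
	return (uint32_t)(int32_t)imm << 16;
}

/* ori rA,rS,imm: OR in a zero-extended 16-bit immediate. */
static uint32_t ori(uint32_t rs, uint16_t imm)
{
	return rs | imm;
}

/* Value loaded into r4 by "lis 4,-128" followed by "ori 4,4,4095". */
static uint32_t trap_constant(void)
{
	return ori(lis(-128), 4095);
}
```

The result is 0xff800fff, the same bad base-register value reported in
the oopses, so the first tweq passing and the second one trapping means
r3 took on that value between the two compares.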
I applied a recent patch I found that corrects the address returned by
cpm_dpram_addr and its use in cpm_uart_cpm1.h, and I've created my own
platform setup file by copying the mpc866ads setup enough to get the
console uart (SMC1) to work.
If there is any other information I can or need to provide, let me
know. Any help would be greatly appreciated.
Thanks,
John
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Paul Mackerras @ 2007-11-21 21:31 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: linuxppc-dev
In-Reply-To: <Pine.LNX.4.62.0711211407260.12720@pademelon.sonytel.be>
Geert Uytterhoeven writes:
> @@ -721,31 +729,38 @@ static int emulate_instruction(struct pt
>
> /* Emulate the mfspr rD, PVR. */
> if ((instword & INST_MFSPR_PVR_MASK) == INST_MFSPR_PVR) {
> + WARN_EMULATE("mfpvr");
mfpvr is a bit different from the others in that it is actually a
privileged instruction, so I don't think it helps to warn about it.
Also, I think the warnings should be optional.
Paul.
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Paul Mackerras @ 2007-11-21 21:36 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: linuxppc-dev
In-Reply-To: <Pine.LNX.4.62.0711211407260.12720@pademelon.sonytel.be>
Geert Uytterhoeven writes:
> +#define WARN_EMULATE(type) \
> + do { \
> + static unsigned int count; \
> + if (count++ < 10) \
> + pr_warning("%s used emulated %s instruction\n", \
> + current->comm, type); \
Thinking about this a bit more, if an instruction gets emulated 10
times then I don't care, since it's probably only cost me 10
microseconds or so. If it gets emulated a million times then I might
want to look at it. So in fact this approach doesn't give me the
information I need to know whether there is a real problem or not.
Paul.
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Paul Mackerras @ 2007-11-21 21:38 UTC (permalink / raw)
To: Geert Uytterhoeven; +Cc: linuxppc-dev
In-Reply-To: <Pine.LNX.4.62.0711211407260.12720@pademelon.sonytel.be>
Geert Uytterhoeven writes:
> Question: do we want it for emulate_single_step(), too?
No, because that's not emulating an instruction.
Paul.
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Scott Wood @ 2007-11-21 21:41 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Geert Uytterhoeven, linuxppc-dev
In-Reply-To: <18244.42181.426804.662877@cargo.ozlabs.ibm.com>
Paul Mackerras wrote:
> Geert Uytterhoeven writes:
>
>> +#define WARN_EMULATE(type) \
>> + do { \
>> + static unsigned int count; \
>> + if (count++ < 10) \
>> + pr_warning("%s used emulated %s instruction\n", \
>> + current->comm, type); \
>
> Thinking about this a bit more, if an instruction gets emulated 10
> times then I don't care, since it's probably only cost me 10
> microseconds or so. If it gets emulated a million times then I might
> want to look at it. So in fact this approach doesn't give me the
> information I need to know whether there is a real problem or not.
Maybe print the first time, then when it's happened 10 times, then 100,
then 1000, etc.
-Scott
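A minimal sketch of the scheme suggested above (warn on the 1st, 10th,
100th, ... occurrence), written as plain userspace C. The function names
and the caller-owned counter are assumptions for illustration; this is
not code from the patch under discussion.

```c
#include <stdbool.h>

/*
 * Return true when 'count' is an exact power of ten (1, 10, 100, ...),
 * i.e. the occurrences at which this scheme would print.
 */
static bool is_power_of_ten(unsigned long count)
{
	if (count == 0)
		return false;
	while (count % 10 == 0)
		count /= 10;
	return count == 1;
}

/*
 * Bump a hypothetical per-instruction counter and report whether this
 * occurrence should be logged.
 */
static bool should_warn(unsigned long *counter)
{
	return is_power_of_ten(++*counter);
}
```

With this policy a million emulations produce seven messages, and the
last message printed gives the order of magnitude of the total.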
^ permalink raw reply
* Re: [PATCH] [POWERPC] Emulate isel (Integer Select) instruction
From: Kim Phillips @ 2007-11-21 21:48 UTC (permalink / raw)
To: Scott Wood; +Cc: Geert Uytterhoeven, linuxppc-dev, Paul Mackerras
In-Reply-To: <4744A5EC.1090201@freescale.com>
On Wed, 21 Nov 2007 15:41:00 -0600
Scott Wood <scottwood@freescale.com> wrote:
> Paul Mackerras wrote:
> > Geert Uytterhoeven writes:
> >
> >> +#define WARN_EMULATE(type) \
> >> + do { \
> >> + static unsigned int count; \
> >> + if (count++ < 10) \
> >> + pr_warning("%s used emulated %s instruction\n", \
> >> + current->comm, type); \
> >
> > Thinking about this a bit more, if an instruction gets emulated 10
> > times then I don't care, since it's probably only cost me 10
> > microseconds or so. If it gets emulated a million times then I might
> > want to look at it. So in fact this approach doesn't give me the
> > information I need to know whether there is a real problem or not.
>
> Maybe print the first time, then when it's happened 10 times, then 100,
> then 1000, etc.
>
or just use printk_ratelimit().
Kim
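To illustrate what time-based rate limiting means here, the sketch
below implements a simple fixed-window limiter in userspace C. It is an
assumption-laden illustration only: the struct, its field names and
ratelimit_ok() are hypothetical, and this is not the kernel's
printk_ratelimit() implementation or its default parameters.

```c
#include <stdbool.h>

/* Hypothetical limiter state; field names are illustrative only. */
struct ratelimit {
	unsigned long interval;     /* window length, in caller-defined ticks */
	unsigned int burst;         /* messages allowed per window */
	unsigned long window_start; /* tick at which the current window began */
	unsigned int printed;       /* messages emitted in the current window */
};

/* Return true if a message may be emitted at time 'now'. */
static bool ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
	/* Start a fresh window once the old one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

A limiter like this bounds log volume per unit of time, but unlike a
counter it does not by itself report how many events were suppressed.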
^ permalink raw reply
* pseries (power3) boot hang (pageblock_nr_pages==0)
From: Will Schmidt @ 2007-11-21 21:55 UTC (permalink / raw)
To: Mel Gorman, Stephen Rothwell, Linux Memory Management List,
linuxppc-dev
Hi Folks,
I've been seeing a boot hang/crash on power3 systems for a few weeks.
(hangs on a 270, drops to SP on a p610). This afternoon I got around
to tracking it down to the changes in
commit d9c2340052278d8eb2ffb16b0484f8f794def4de
Do not depend on MAX_ORDER when grouping pages by mobility
cpu 0x0: Vector: 100 (System Reset) at [c00000006e803ae0]
pc: c00000000009bf50: .setup_per_zone_pages_min+0x298/0x34c
lr: c00000000009be38: .setup_per_zone_pages_min+0x180/0x34c
[c00000006e803e20] c0000000005e3898 .init_per_zone_pages_min+0x80/0xa0
[c00000006e803ea0] c0000000005c9c04 .kernel_init+0x214/0x3d8
[c00000006e803f90] c000000000026cac .kernel_thread+0x4c/0x68
I narrowed it down to the for loop within setup_zone_migrate_reserve(),
called by setup_per_zone_pages_min(). The loop spins forever due to
pageblock_nr_pages being 0.
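The failure mode can be reproduced in isolation: a loop of the form
"pfn += stride" never terminates when the stride is 0. The sketch below
is a userspace illustration; count_pageblocks() is a hypothetical
helper, not a kernel function, and it bails out instead of spinning.

```c
/*
 * Count the pageblocks visited between start_pfn and end_pfn.
 * Returns -1 instead of looping forever when the stride is zero.
 */
static long count_pageblocks(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned long pageblock_nr_pages)
{
	unsigned long pfn;
	long n = 0;

	if (pageblock_nr_pages == 0)
		return -1;	/* pfn += 0 would never reach end_pfn */

	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		n++;
	return n;
}
```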
I imagine this would be properly fixed with something similar to the
change for iSeries. Depending on how obvious, quick and easy it is for
the experts to come up with a proper fix, I'll be able to do additional
debug and hacking after turkey-day. :-)
For the moment, I've hacked it with the following patch. (tested on
both the 270 and the p610):
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2454,6 +2454,9 @@ static void setup_zone_migrate_reserve(struct zone
*zone)
reserve = roundup(zone->pages_min, pageblock_nr_pages) >>
pageblock_order;
+/* this is a cheap and dirty bailout, probably not a proper fix. */
+ if (pageblock_nr_pages==0) return;
+
for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
{
if (!pfn_valid(pfn))
continue;
^ permalink raw reply
* Re: pseries (power3) boot hang (pageblock_nr_pages==0)
From: Mel Gorman @ 2007-11-21 22:03 UTC (permalink / raw)
To: Will Schmidt; +Cc: linuxppc-dev, Stephen Rothwell, Linux Memory Management List
In-Reply-To: <1195682111.4421.23.camel@farscape.rchland.ibm.com>
On (21/11/07 15:55), Will Schmidt didst pronounce:
> Hi Folks,
>
> I've been seeing a boot hang/crash on power3 systems for a few weeks.
> (hangs on a 270, drops to SP on a p610). This afternoon I got around
> to tracking it down to the changes in
>
> commit d9c2340052278d8eb2ffb16b0484f8f794def4de
> Do not depend on MAX_ORDER when grouping pages by mobility
>
> cpu 0x0: Vector: 100 (System Reset) at [c00000006e803ae0]
> pc: c00000000009bf50: .setup_per_zone_pages_min+0x298/0x34c
> lr: c00000000009be38: .setup_per_zone_pages_min+0x180/0x34c
> [c00000006e803e20] c0000000005e3898 .init_per_zone_pages_min+0x80/0xa0
> [c00000006e803ea0] c0000000005c9c04 .kernel_init+0x214/0x3d8
> [c00000006e803f90] c000000000026cac .kernel_thread+0x4c/0x68
>
> I narrowed it down to the for loop within setup_zone_migrate_reserve(),
> called by setup_per_zone_pages_min(). The loop spins forever due to
> pageblock_nr_pages being 0.
>
> I imagine this would be properly fixed with something similar to the
> change for iSeries.
Have you tried with the patch that fixed the iSeries boot problem?
Thanks for tracking down the problem to such a specific place.
Here is the iSeries fix in case it applies to this as well.
======
Ordinarily, the size of a pageblock is determined at compile-time based on
the hugepage size. On PPC64, the hugepage size is determined at runtime based
on what is supported by the machine. On legacy machines such as iSeries which
do not support hugepages, HPAGE_SHIFT is 0. This results in pageblock_order
being set to -PAGE_SHIFT and a crash results shortly afterwards.
This patch checks that HPAGE_SHIFT is a sensible value before using the
hugepage size. If it is 0, MAX_ORDER-1 is used instead as this is a sensible
value of pageblock_order.
This is a fix for 2.6.24.
Credit goes to Stephen Rothwell for identifying the bug and testing on
iSeries. Additional credit goes to David Gibson for testing with the
libhugetlbfs test suite.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
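The selection logic described above can be sketched as standalone C.
The function names and the plain integer parameters are assumptions made
for illustration; in the kernel these values come from HPAGE_SHIFT,
PAGE_SHIFT and MAX_ORDER rather than from function arguments.

```c
/* What the unpatched path effectively computes: HPAGE_SHIFT - PAGE_SHIFT. */
static int naive_pageblock_order(unsigned int hpage_shift,
				 unsigned int page_shift)
{
	return (int)hpage_shift - (int)page_shift;
}

/*
 * Follow the check in the patch: use the hugepage-based order only when
 * the hugepage shift is larger than the page shift, else MAX_ORDER-1.
 */
static int choose_pageblock_order(unsigned int hpage_shift,
				  unsigned int page_shift,
				  unsigned int max_order)
{
	if (hpage_shift > page_shift)
		return (int)(hpage_shift - page_shift);
	return (int)max_order - 1;
}
```

With a hugepage shift of 0 and a page shift of 12, the naive value is
-12, while the checked version falls back to max_order - 1.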
arch/powerpc/Kconfig | 5 +++++
mm/page_alloc.c | 11 ++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 18f397c..232c298 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -187,6 +187,11 @@ config FORCE_MAX_ZONEORDER
default "9" if PPC_64K_PAGES
default "13"
+config HUGETLB_PAGE_SIZE_VARIABLE
+ bool
+ depends on HUGETLB_PAGE
+ default y
+
config MATH_EMULATION
bool "Math emulation"
depends on 4xx || 8xx || E200 || PPC_MPC832x || E500
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index da69d83..14e0ac3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3386,7 +3386,16 @@ static void __meminit free_area_init_core(struct pglist_data *pgdat,
if (!size)
continue;
- set_pageblock_order(HUGETLB_PAGE_ORDER);
+ /*
+ * If HPAGE_SHIFT is a sensible value, base the size of a
+ * pageblock on the hugepage size. Otherwise MAX_ORDER-1
+ * is a sensible choice
+ */
+ if (HPAGE_SHIFT > PAGE_SHIFT)
+ set_pageblock_order(HUGETLB_PAGE_ORDER);
+ else
+ set_pageblock_order(MAX_ORDER-1);
+
setup_usemap(pgdat, zone, size);
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
--
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply related
* Re: [PATCH 3/3] [POWERPC] Add docs for Freescale DMA & DMA channel device tree nodes
From: David Gibson @ 2007-11-21 22:28 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Timur Tabi
In-Reply-To: <47448687.6010106@freescale.com>
On Wed, Nov 21, 2007 at 01:27:03PM -0600, Scott Wood wrote:
> Kumar Gala wrote:
> > On Nov 21, 2007, at 11:35 AM, Scott Wood wrote:
> >> A cell-index property would be useful here for indexing into the summary
> >> status register.
> >
> > Divide by 0x80.
>
> :-P
>
> Using cell-index for things like this is reasonably common, and endorsed
> by current ePAPR drafts.
Indeed, indexing or writing into shared registers is exactly what
cell-index is for.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply
* [PATCH/RFC 0/6]: phyp dump: hypervisor-assisted dump
From: Linas Vepstas @ 2007-11-21 22:36 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
The following series of patches implement a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.
The patches mostly sort-of work; a list of open issues
is included in the documentation. It also appears that
the not-yet-released firmware versions this was tested
on are still, ahem, incomplete; this work is also pending.
-- Linas & Manish
^ permalink raw reply
* [PATCH/RFC 1/6]: phyp dump: Documentation
From: Linas Vepstas @ 2007-11-21 22:37 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Basic documentation for hypervisor-assisted dump.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
Documentation/powerpc/phyp-assisted-dump.txt | 126 +++++++++++++++++++++++++++
1 file changed, 126 insertions(+)
Index: linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc3-git1/Documentation/powerpc/phyp-assisted-dump.txt 2007-11-21 16:26:44.000000000 -0600
@@ -0,0 +1,126 @@
+
+ Hypervisor-Assisted Dump
+ ------------------------
+ November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+ immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+ required; the system will be fully usable, and running
+ in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+ the low 256MB of RAM to a previously registered
+ save region. It will also save system state, system
+ registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+ hypervisor will reset PCI and other hardware state.
+ It will *not* clear RAM. It will then launch the
+ bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+ is a new node (ibm,dump-kernel) in the device tree,
+ indicating that there is crash data available from
+ a previous boot. It will boot into only 256MB of RAM,
+ reserving the rest of system memory.
+
+-- Userspace tools will read /proc/kcore to obtain the
+ contents of memory, which holds the previous crashed
+ kernel. The userspace tools may copy this info to
+ disk, or network, nas, san, iscsi, etc. as desired.
+
+-- As the userspace tools complete saving a portion of
+ dump, they echo an offset and size to
+ /sys/kernel/release_region to release the reserved
+ memory back to general use.
+
+ An example of this is:
+ "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues:
+------------
+ o User-space dump tool integration is completely unresolved.
+
+ o The various code paths that tell the hypervisor that a crash
+ occurred, vs. it simply being a normal reboot, should be
+ reviewed, and possibly clarified/fixed.
+
+ o The real-virtual mapping is awkward and unaddressed. There
+ is currently no clear way of matching up the contents of
+ /proc/kcore to the values that need to be sent to
+ /sys/kernel/release_region
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+ instead? There is a dump_subsys being created by the s390 code,
+ perhaps the pseries code should use a similar layout as well.
+
+ o Saved system registers and HPTE tables will be located in high
+ memory. There is currently no way of telling user-space where
+ these are located.
+
+ o The post-dump procedures are incomplete. In particular,
+ after a dump has been taken, the system should re-register
+ with the hypervisor, so that a subsequent crash can be handled.
+
+ o The hypervisor may have an error preserving the dump data.
+ The current code does not check for this error, and does
+ not handle it.
+
+ o Is reserving a 256MB region really required? The goal of
+ reserving a 256MB scratch area is to make sure that no
+ important crash data is clobbered when the hypervisor
+ saves low mem to the scratch area. But, if one could assure
+ that nothing important is located in some 256MB area, then
+ it would not need to be reserved.
+
^ permalink raw reply
* [PATCH/RFC 2/6]: phyp dump: config file
From: Linas Vepstas @ 2007-11-21 22:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Add hypervisor-assisted dump to kernel config
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
arch/powerpc/Kconfig | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig 2007-11-14 16:39:20.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig 2007-11-15 14:27:33.000000000 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
Don't change this unless you know what you are doing.
+config PHYP_DUMP
+ bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+ depends on PPC_PSERIES && EXPERIMENTAL
+ default y
+ help
+ Hypervisor-assisted dump is meant to be a kdump replacement
+ offering robustness and speed not possible without system
+ hypervisor assistance.
+
+ If unsure, say "Y"
+
config PPCBUG_NVRAM
bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
default y if PPC_PREP
^ permalink raw reply
* [PATCH/RFC 3/6]: phyp dump: reserve-release proof-of-concept
From: Linas Vepstas @ 2007-11-21 22:40 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Initial rough-in/proof of concept of reserving memory in
early boot, and freeing it later. If the previous boot
had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
----
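A minimal userspace sketch of the byte-range to page-frame bookkeeping
described above. It assumes 4KB pages purely for illustration (the real
page size depends on the kernel configuration), and the names
SKETCH_PAGE_SHIFT, byte_range_to_pfns and count_frames are hypothetical,
not part of this patch. It treats the reserved region as the half-open
frame range [start_pfn, start_pfn + nr_pages).

```c
/* Assumption for illustration only: 4KB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)
#define SKETCH_RMR_END (1UL << 28)	/* 256MB, mirrors PHYP_DUMP_RMR_END */

struct pfn_range {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical helper: convert a byte range into a page-frame range. */
static struct pfn_range byte_range_to_pfns(unsigned long start,
					   unsigned long size)
{
	struct pfn_range r;

	r.start_pfn = SKETCH_PFN_DOWN(start);
	r.nr_pages = SKETCH_PFN_DOWN(size);
	return r;
}

/* Walk the half-open range and count the frames that would be visited. */
static unsigned long count_frames(struct pfn_range r)
{
	unsigned long end_pfn = r.start_pfn + r.nr_pages;
	unsigned long pfn, visited = 0;

	for (pfn = r.start_pfn; pfn < end_pfn; pfn++)
		visited++;
	return visited;
}
```

For a hypothetical machine with 1GB of RAM, everything above the 256MB
RMR is byte_range_to_pfns(SKETCH_RMR_END, (1UL << 30) - SKETCH_RMR_END),
and the walk visits exactly nr_pages frames.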
arch/powerpc/kernel/prom.c | 33 +++++++++++++
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/phyp_dump.c | 71 +++++++++++++++++++++++++++++
include/asm-powerpc/phyp_dump.h | 32 +++++++++++++
4 files changed, 137 insertions(+)
Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h 2007-11-19 17:44:21.000000000 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END (1UL<<28)
+
+struct phyp_dump {
+ /* Memory that is reserved during very early boot. */
+ unsigned long init_reserve_start;
+ unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-19 19:07:49.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+ struct page *rpage;
+ unsigned long end_pfn;
+ long i;
+
+ end_pfn = start_pfn + nr_pages;
+
+ for (i = start_pfn; i < end_pfn; i++) {
+ rpage = pfn_to_page(i);
+ if (PageReserved(rpage)) {
+ ClearPageReserved(rpage);
+ init_page_count(rpage);
+ __free_page(rpage);
+ totalram_pages++;
+ }
+ }
+}
+
+static int __init phyp_dump_setup(void)
+{
+ unsigned long start_pfn, nr_pages;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Release memory that was reserved in early boot */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+ release_memory_range(start_pfn, nr_pages);
+
+ return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile 2007-11-19 17:44:21.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU) += hotplug-cpu
obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
obj-$(CONFIG_HVCS) += hvcserver.o
obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP) += phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/kernel/prom.c 2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c 2007-11-19 17:44:21.000000000 -0600
@@ -51,6 +51,7 @@
#include <asm/machdep.h>
#include <asm/pSeries_reconfig.h>
#include <asm/pci-bridge.h>
+#include <asm/phyp_dump.h>
#include <asm/kexec.h>
#ifdef DEBUG
@@ -1011,6 +1012,37 @@ static void __init early_reserve_mem(voi
#endif
}
+#ifdef CONFIG_PHYP_DUMP
+
+/**
+ * reserve_crashed_mem() - reserve all not-yet-dumped memory
+ *
+ * This routine will reserve almost all of the memory in the
+ * system, except for a few hundred megabytes used to boot the
+ * new kernel. As the reserved memory is dumped to the dump
+ * device (by userland tools), it will be freed and made available.
+ */
+static void __init reserve_crashed_mem(void)
+{
+ unsigned long crashed_base, crashed_size;
+
+ /* Reserve *everything* above the RMR. We'll free this real soon. */
+ crashed_base = PHYP_DUMP_RMR_END;
+ crashed_size = lmb_end_of_DRAM() - crashed_base;
+
+ /* XXX crashed_ram_end is wrong, since it may be beyond
+ * the memory_limit, it will need to be adjusted. */
+ lmb_reserve(crashed_base, crashed_size);
+
+ phyp_dump_info->init_reserve_start = crashed_base;
+ phyp_dump_info->init_reserve_size = crashed_size;
+}
+
+#else
+static inline void __init reserve_crashed_mem(void) {}
+#endif /* CONFIG_PHYP_DUMP */
+
+
void __init early_init_devtree(void *params)
{
DBG(" -> early_init_devtree(%p)\n", params);
@@ -1043,6 +1075,7 @@ void __init early_init_devtree(void *par
reserve_kdump_trampoline();
reserve_crashkernel();
early_reserve_mem();
+ reserve_crashed_mem();
lmb_enforce_memory_limit(memory_limit);
lmb_analyze();
^ permalink raw reply
* [PATCH/RFC 4/6]: phyp dump: use sysfs to release reserved mem
From: Linas Vepstas @ 2007-11-21 22:41 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,
echo "0x40000000 0x10000000" > /sys/kernel/release_region
will release 256MB starting at 1GB. The released memory
becomes free for general use.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
------
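A minimal userspace sketch of the parse-and-clamp step behind the
release_region interface described above. The helper name
parse_release_request and struct region are hypothetical. Note that this
sketch clamps both ends of the request to the reserved window, which is
a slightly different choice from the patch (the patch keeps the
requested length after raising the start address). It also assumes that
start + length does not overflow.

```c
#include <stdio.h>

struct region {
	unsigned long start;
	unsigned long length;
};

/*
 * Hypothetical helper: parse "<start> <length>" (both hex) and clamp
 * the request to the window [res_start, res_start + res_size).
 * Returns 0 on success, -1 if the input does not parse.
 */
static int parse_release_request(const char *buf,
				 unsigned long res_start,
				 unsigned long res_size,
				 struct region *out)
{
	unsigned long start, length, end, res_end;

	/* %lx accepts an optional "0x" prefix */
	if (sscanf(buf, "%lx %lx", &start, &length) != 2)
		return -1;

	res_end = res_start + res_size;
	end = start + length;	/* assumes no overflow */

	if (start < res_start)
		start = res_start;
	if (end > res_end)
		end = res_end;
	if (end < start)	/* request lies entirely outside the window */
		end = start;

	out->start = start;
	out->length = end - start;
	return 0;
}
```

With a hypothetical reserved window of 0x10000000..0x80000000, the
example request "0x40000000 0x10000000" passes through unchanged, while
a request starting below the window is trimmed to begin at 0x10000000.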
arch/powerpc/platforms/pseries/phyp_dump.c | 101 +++++++++++++++++++++++++++--
1 file changed, 96 insertions(+), 5 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:15:05.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 13:24:30.000000000 -0600
@@ -12,17 +12,24 @@
*/
#include <linux/init.h>
+#include <linux/kobject.h>
#include <linux/mm.h>
+#include <linux/of.h>
#include <linux/pfn.h>
#include <linux/swap.h>
+#include <linux/sysfs.h>
#include <asm/page.h>
#include <asm/phyp_dump.h>
+#include <asm/rtas.h>
/* Global, used to communicate data between early boot and late boot */
static struct phyp_dump phyp_dump_global;
struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
}
}
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ * "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ * "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
{
+ unsigned long start_addr, length, end_addr;
unsigned long start_pfn, nr_pages;
+ ssize_t ret;
- /* If no memory was reserved in early boot, there is nothing to do */
- if (phyp_dump_info->init_reserve_size == 0)
- return 0;
+ ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+ if (ret != 2)
+ return -EINVAL;
+
+ /* Range-check - don't free any reserved memory that
+ * wasn't reserved for phyp-dump */
+ if (start_addr < phyp_dump_info->init_reserve_start)
+ start_addr = phyp_dump_info->init_reserve_start;
+
+ end_addr = phyp_dump_info->init_reserve_start +
+ phyp_dump_info->init_reserve_size;
+ if (start_addr+length > end_addr)
+ length = end_addr - start_addr;
+
+ /* Release the region of memory passed in by the user */
+ start_pfn = PFN_DOWN(start_addr);
+ nr_pages = PFN_DOWN(length);
+ release_memory_range (start_pfn, nr_pages);
+
+ return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+ return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+ show_release_region,
+ store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+ unsigned long start_pfn, nr_pages;
- /* Release memory that was reserved in early boot */
+ /* Release all memory that was reserved in early boot */
start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+ struct device_node *rtas;
+ const int *dump_header;
+ int header_len = 0;
+ int rc;
+
+ /* If no memory was reserved in early boot, there is nothing to do */
+ if (phyp_dump_info->init_reserve_size == 0)
+ return 0;
+
+ /* Return if phyp dump not supported */
+ ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+ if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+ release_all();
+ return -ENOSYS;
+ }
+
+ /* Is there dump data waiting for us? */
+ rtas = of_find_node_by_path("/rtas");
+ dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+ if (dump_header == NULL) {
+ release_all();
+ return 0;
+ }
+
+ /* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+ rc = subsys_create_file(&kernel_subsys, &rr);
+ if (rc) {
+ printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+ release_all();
+ return 0;
+ }
return 0;
}
^ permalink raw reply
* Re: annoying prinkts during vmemmap initialization
From: Stephen Rothwell @ 2007-11-21 22:41 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linuxppc-dev
In-Reply-To: <20071121153526.GA23589@lst.de>
[-- Attachment #1: Type: text/plain, Size: 861 bytes --]
Hi Christoph,
On Wed, 21 Nov 2007 16:35:26 +0100 Christoph Hellwig <hch@lst.de> wrote:
>
> Hi Andi,
>
> your patch 'ppc64: SPARSEMEM_VMEMMAP support' adds the following two lines:
>
> + printk(KERN_WARNING "vmemmap %08lx allocated at %p, "
> + "physical %p.\n", start, p, __pa(p));
>
> in a loop around basically every page. That's a lot of flooding (with
> the wrong printk level, btw) and really slows down booting my cell blade
> a lot (these only have a very slow serial over lan console).
>
> Any reason to keep this? And if yes can we please make it conditional
> on some kind of vmemmap_debug boot option?
These have been changed to pr_debug() in 2.6.24-rc3 kernel.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* [PATCH/RFC 5/6]: phyp dump: register the dump area
From: Linas Vepstas @ 2007-11-21 22:43 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake, mahuja
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Set up the actual dump header, register it with the hypervisor.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
------
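A minimal userspace sketch of how the dump save area is carved out of
the top of the reserved region: the three sections (CPU state, HPTE,
low kernel memory) are packed back to back, and the start of the area
is aligned down to a page boundary. The function name plan_dump_area,
struct dump_layout, and the section sizes used in the example are
hypothetical; the real sizes come from the
ibm,configure-kernel-dump-sizes property.

```c
#include <assert.h>

struct dump_layout {
	unsigned long cpu_dest;		/* destination of CPU state */
	unsigned long hpte_dest;	/* destination of HPTE data */
	unsigned long kernel_dest;	/* destination of low kernel memory */
	unsigned long dump_area_start;
	unsigned long free_area_length;	/* reserved bytes below the dump area */
};

/* Hypothetical helper: place the dump area at the top of the reservation. */
static struct dump_layout plan_dump_area(unsigned long reserve_start,
					 unsigned long reserve_size,
					 unsigned long cpu_size,
					 unsigned long hpte_size,
					 unsigned long kernel_size,
					 unsigned long page_size)
{
	struct dump_layout l;
	unsigned long total = cpu_size + hpte_size + kernel_size;

	/* page_size must be a power of two; the area must fit */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(total <= reserve_size);

	/* Top of the reservation minus the area size, aligned down */
	l.dump_area_start = (reserve_start + reserve_size - total)
				& ~(page_size - 1);
	l.free_area_length = l.dump_area_start - reserve_start;

	/* Sections are packed back to back from the start of the area */
	l.cpu_dest = l.dump_area_start;
	l.hpte_dest = l.cpu_dest + cpu_size;
	l.kernel_dest = l.hpte_dest + hpte_size;
	return l;
}
```

For example, with a reservation of 768MB starting at 256MB, made-up
section sizes of 0x1800 (CPU) and 0x100000 (HPTE), a 256MB kernel
region and 4KB pages, the dump area starts at 0x2FEFE000 and everything
below it (0x1FEFE000 bytes) can be released.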
arch/powerpc/platforms/pseries/phyp_dump.c | 169 +++++++++++++++++++++++++++--
1 file changed, 163 insertions(+), 6 deletions(-)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 15:55:37.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:06:52.000000000 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
static int ibm_configure_kernel_dump;
/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+ u32 dump_flags;
+ u16 source_type;
+ u16 error_flags;
+ u64 source_address;
+ u64 source_length;
+ u64 length_copied;
+ u64 destination_address;
+};
+
+struct phyp_dump_header {
+ u32 version;
+ u16 num_of_sections;
+ u16 status;
+
+ u32 first_offset_section;
+ u32 dump_disk_section;
+ u64 block_num_dd;
+ u64 num_of_blocks_dd;
+ u32 offset_dd;
+ u32 maxtime_to_auto;
+ /* No dump disk path string used */
+
+ struct dump_section cpu_data;
+ struct dump_section hpte_data;
+ struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+ struct device_node *rtas;
+ const unsigned int *sizes;
+ int len;
+ unsigned long cpu_state_size = 0;
+ unsigned long hpte_region_size = 0;
+ unsigned long addr_offset = 0;
+
+ /* Get the required dump region sizes */
+ rtas = of_find_node_by_path("/rtas");
+ sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+ if (!sizes || len < 20)
+ return 0;
+
+ if (sizes[0] == 1)
+ cpu_state_size = *((unsigned long *) &sizes[1]);
+
+ if (sizes[3] == 2)
+ hpte_region_size = *((unsigned long *) &sizes[4]);
+
+ /* Set up the dump header */
+ ph->version = DUMP_HEADER_VERSION;
+ ph->num_of_sections = NUM_DUMP_SECTIONS;
+ ph->status = 0;
+
+ ph->first_offset_section =
+ (u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+ ph->dump_disk_section = 0;
+ ph->block_num_dd = 0;
+ ph->num_of_blocks_dd = 0;
+ ph->offset_dd = 0;
+
+ ph->maxtime_to_auto = 0; /* disabled */
+
+ /* The first two sections are mandatory */
+ ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+ ph->cpu_data.source_address = 0;
+ ph->cpu_data.source_length = cpu_state_size;
+ ph->cpu_data.destination_address = addr_offset;
+ addr_offset += cpu_state_size;
+
+ ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+ ph->hpte_data.source_address = 0;
+ ph->hpte_data.source_length = hpte_region_size;
+ ph->hpte_data.destination_address = addr_offset;
+ addr_offset += hpte_region_size;
+
+ /* This section describes the low kernel region */
+ ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+ ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+ ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+ ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+ ph->kernel_data.destination_address = addr_offset;
+ addr_offset += ph->kernel_data.source_length;
+
+ return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+ int rc;
+ ph->cpu_data.destination_address += addr;
+ ph->hpte_data.destination_address += addr;
+ ph->kernel_data.destination_address += addr;
+
+ do {
+ rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+ 1, ph, sizeof(struct phyp_dump_header));
+ } while (rtas_busy_delay(rc));
+
+ if (rc)
+ {
+ printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+ }
+}
+
+/* ------------------------------------------------- */
/**
* release_memory_range -- release memory previously lmb_reserved
* @start_pfn: starting physical frame number
@@ -125,7 +253,11 @@ static void release_all (void)
static int __init phyp_dump_setup(void)
{
struct device_node *rtas;
- const int *dump_header;
+ const struct phyp_dump_header *dump_header;
+ unsigned long dump_area_start;
+ unsigned long dump_area_length;
+ unsigned long free_area_length;
+ unsigned long start_pfn, nr_pages;
int header_len = 0;
int rc;
@@ -140,22 +272,47 @@ static int __init phyp_dump_setup(void)
return -ENOSYS;
}
- /* Is there dump data waiting for us? */
+ /* Is there dump data waiting for us? If there isn't,
+ * then register a new dump area, and release all of
+ * the rest of the reserved ram.
+ *
+ * The /rtas/ibm,kernel-dump rtas node is present only
+ * if there is dump data waiting for us.
+ */
rtas = of_find_node_by_path("/rtas");
dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+
+ dump_area_length = init_dump_header (&phdr);
+ free_area_length = phyp_dump_info->init_reserve_size - dump_area_length;
+ dump_area_start = phyp_dump_info->init_reserve_start + free_area_length;
+ dump_area_start = dump_area_start & PAGE_MASK; /* align down */
+ free_area_length = dump_area_start - phyp_dump_info->init_reserve_start;
+
if (dump_header == NULL) {
- release_all();
- return 0;
+ register_dump_area (&phdr, dump_area_start);
+ goto release_mem;
}
+ /* Don't allow user to release the 256MB scratch area */
+ phyp_dump_info->init_reserve_size = free_area_length;
+
/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
rc = subsys_create_file(&kernel_subsys, &rr);
if (rc) {
printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
- release_all();
- return 0;
+ goto release_mem;
}
+ /* ToDo: re-register the dump area, for next time. */
+
+ return 0;
+
+release_mem:
+ /* release everything except the top 256 MB scratch area */
+ start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+ nr_pages = PFN_DOWN(free_area_length);
+ release_memory_range(start_pfn, nr_pages);
+
return 0;
}
^ permalink raw reply
* [PATCH/RFC 6/6]: phyp dump: debugging print routines.
From: Linas Vepstas @ 2007-11-21 22:45 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mahuja, lkessler, strosake
In-Reply-To: <20071121223639.GB4374@austin.ibm.com>
Provide some basic debugging support.
Signed-off-by: Manish Ahuja <mahuja@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
-----
arch/powerpc/platforms/pseries/phyp_dump.c | 51 +++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:21.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c 2007-11-21 16:12:46.000000000 -0600
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
return addr_offset;
}
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+ printk(KERN_INFO "dump header:\n");
+ /* setup some ph->sections required */
+ printk(KERN_INFO "version = %d\n", ph->version);
+ printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+ printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+ /* No ph->disk, so all should be set to 0 */
+ printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+ printk(KERN_INFO "dump disk sections should be zero\n");
+ printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+ printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+ printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+ printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+ printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+ /*set cpu state and hpte states as well scratch pad area */
+ printk(KERN_INFO " CPU AREA \n");
+ printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+ printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+ printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+ printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+ printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+ printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+ printk(KERN_INFO " HPTE AREA \n");
+ printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+ printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+ printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+ printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+ printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+ printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+ printk(KERN_INFO " SRSD AREA \n");
+ printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+ printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+ printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+ printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+ printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+ printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
{
int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
if (rc)
{
printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+ print_dump_header (ph);
+#endif
}
}
@@ -292,6 +340,9 @@ static int __init phyp_dump_setup(void)
register_dump_area (&phdr, dump_area_start);
goto release_mem;
}
+#ifdef DEBUG
+ print_dump_header (dump_header);
+#endif
/* Don't allow user to release the 256MB scratch area */
phyp_dump_info->init_reserve_size = free_area_length;
^ permalink raw reply
* Re: annoying prinkts during vmemmap initialization
From: Christoph Hellwig @ 2007-11-21 22:49 UTC (permalink / raw)
To: Stephen Rothwell; +Cc: linuxppc-dev, Christoph Hellwig
In-Reply-To: <20071122094145.e79e1084.sfr@canb.auug.org.au>
On Thu, Nov 22, 2007 at 09:41:45AM +1100, Stephen Rothwell wrote:
> > Any reason to keep this? And if yes can we please make it conditional
> > on some kind of vmemmap_debug boot option?
>
> These have been changed to pr_debug() in 2.6.24-rc3 kernel.
Ah, sorry for not checking. Looks like the spufs tree lags a little
behind.
^ permalink raw reply
* Re: 2.6.24-rc3-mm1- powerpc link failure
From: Stephen Rothwell @ 2007-11-21 22:52 UTC (permalink / raw)
To: Kamalesh Babulal; +Cc: linuxppc-dev, Andrew Morton, Balbir Singh, linux-kernel
In-Reply-To: <4743E706.6010504@linux.vnet.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 681 bytes --]
On Wed, 21 Nov 2007 13:36:30 +0530 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>
> The kernel build fails on powerpc while linking,
Only for allyesconfig (or maybe some other config that builds a lot of
stuff in).
> AS .tmp_kallsyms3.o
> LD vmlinux.o
> ld: TOC section size exceeds 64k
> make: *** [vmlinux.o] Error 1
>
> The patch posted at http://lkml.org/lkml/2007/11/13/414, solves this
> failure.
However, that patch needs more testing especially to figure out what
performance effects it has. i.e. not for merging, yet.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply
* Re: [RFC/PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: David Gibson @ 2007-11-21 22:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20071121061720.83D5ADEBE3@ozlabs.org>
On Wed, Nov 21, 2007 at 05:16:30PM +1100, Benjamin Herrenschmidt wrote:
> This adds some basic real mode based early udbg support for 40x
> in order to debug things more easily
Shouldn't we be able to share code with the Maple realmode udbg()?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply
* Re: [PATCH 12/14] powerpc: Add early udbg support for 40x processors
From: Grant Likely @ 2007-11-21 23:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <20071121061555.55B06DDFA8@ozlabs.org>
On 11/20/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> This adds some basic real mode based early udbg support for 40x
> in order to debug things more easily
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> --- linux-work.orig/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:16.000000000 +1100
> +++ linux-work/arch/powerpc/platforms/Kconfig.cputype 2007-11-21 12:50:18.000000000 +1100
> @@ -43,6 +43,7 @@ config 40x
> bool "AMCC 40x"
> select PPC_DCR_NATIVE
> select WANT_DEVICE_TREE
> + select PPC_UDBG_16550
Unfortunately, this isn't always true. The Xilinx Virtex parts use
config 40x, but not all FPGA bitstreams have a 16550 serial port.
Sometimes it's a uartlite instead.
Cheers,
g.
--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
grant.likely@secretlab.ca
(403) 399-0195
^ permalink raw reply